Writing AlexNet from Scratch in PyTorch

Sep 14, 2024

Introduction

This post is a continuation of our series on building classic and popular convolutional neural networks from scratch in PyTorch. You can see the previous post here, where we built LeNet-5. In this post, we will build AlexNet, one of the earliest breakthrough algorithms in computer vision and still one of the most popular.

We will start by examining and understanding the architecture of AlexNet. We will then dive straight into code by loading our dataset, CIFAR-10, and applying some pre-processing to the data. Then, we will build AlexNet from scratch using PyTorch and train it on the pre-processed data. Finally, the trained model will be evaluated on unseen (test) data.

Prerequisites

Knowledge of neural networks will be helpful in understanding this article. This includes being familiar with the different layers of neural networks (input layer, hidden layers, output layer), activation functions, optimization algorithms (variants of gradient descent), loss functions, etc. Additionally, familiarity with Python syntax and the PyTorch library is essential for understanding the code snippets presented in this article.

An understanding of CNNs is also recommended. This includes knowledge of convolutional layers, pooling layers, and their role in extracting features from input data. Understanding concepts like stride, padding, and the effect of kernel/filter size is also beneficial.

AlexNet

AlexNet is a deep convolutional neural network that was developed by Alex Krizhevsky and his colleagues back in 2012. It was designed to classify images for the ImageNet LSVRC-2010 competition, where it achieved state-of-the-art results. You can read about the model in detail in the original research paper here.

Here, we’ll summarize the key takeaways about the AlexNet network. Firstly, it operated on 3-channel images that were (224x224x3) in size. It used max pooling along with ReLU activations when subsampling. The kernels used for convolutions were either 11x11, 5x5, or 3x3, while the kernels used for max pooling were 3x3 in size. It classified images into 1000 classes. It also made use of multiple GPUs.
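As a quick way to see how stride, padding, and kernel size interact, you can compute the spatial output size of a layer with the standard formula floor((n + 2*padding - kernel) / stride) + 1. Below is a small helper (not from the original post, just a sketch) applied to AlexNet's first convolution and pooling layers on the 227x227 inputs we use later in this article:

def conv_output_size(n, kernel, stride=1, padding=0):
    # floor((n + 2*padding - kernel) / stride) + 1
    return (n + 2 * padding - kernel) // stride + 1

size = conv_output_size(227, kernel=11, stride=4)   # first conv layer  -> 55
size = conv_output_size(size, kernel=3, stride=2)   # first max pooling -> 27
print(size)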

Dataset

Let’s start by loading and then pre-processing the data. For our purposes, we will be using the CIFAR-10 dataset. The dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

Here are the classes in the dataset, as well as 10 random sample images from each:

[Figure: 10 random sample images from each of the 10 CIFAR-10 classes]

The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. “Automobile” includes sedans, SUVs, and things of that sort. “Truck” includes only big trucks. Neither includes pickup trucks.
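If you want to verify the class names and counts yourself, torchvision exposes them directly on the dataset object. A quick check might look like this (the ./data path is just an example location):

from torchvision import datasets

train_set = datasets.CIFAR10(root='./data', train=True, download=True)
test_set = datasets.CIFAR10(root='./data', train=False, download=True)

print(train_set.classes)               # ['airplane', 'automobile', 'bird', ...]
print(len(train_set), len(test_set))   # 50000 10000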

Importing the Libraries

Let’s start by importing the required libraries and defining a variable device, so that the notebook knows to use a GPU to train the model if one is available.

import numpy as np
import torch
import torch.nn as nn
from torchvision import datasets
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Loading the Dataset

Using torchvision (a helper library for computer vision tasks), we will load our dataset. It provides helper functions that make pre-processing pretty easy and straightforward. Let’s define the functions get_train_valid_loader and get_test_loader, and then call them to load in and process our CIFAR-10 data:

def get_train_valid_loader(data_dir,
                           batch_size,
                           augment,
                           random_seed,
                           valid_size=0.1,
                           shuffle=True):
    normalize = transforms.Normalize(
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.2010],
    )

    # define transforms
    valid_transform = transforms.Compose([
        transforms.Resize((227, 227)),
        transforms.ToTensor(),
        normalize,
    ])
    if augment:
        train_transform = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ])
    else:
        train_transform = transforms.Compose([
            transforms.Resize((227, 227)),
            transforms.ToTensor(),
            normalize,
        ])

    # load the dataset
    train_dataset = datasets.CIFAR10(
        root=data_dir, train=True,
        download=True, transform=train_transform,
    )

    valid_dataset = datasets.CIFAR10(
        root=data_dir, train=True,
        download=True, transform=valid_transform,
    )

    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))

    if shuffle:
        np.random.seed(random_seed)
        np.random.shuffle(indices)

    train_idx, valid_idx = indices[split:], indices[:split]
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, sampler=train_sampler)

    valid_loader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=batch_size, sampler=valid_sampler)

    return (train_loader, valid_loader)


def get_test_loader(data_dir,
                    batch_size,
                    shuffle=True):
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    )

    # define transform
    transform = transforms.Compose([
        transforms.Resize((227, 227)),
        transforms.ToTensor(),
        normalize,
    ])

    dataset = datasets.CIFAR10(
        root=data_dir, train=False,
        download=True, transform=transform,
    )

    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, shuffle=shuffle
    )

    return data_loader


# CIFAR-10 dataset
train_loader, valid_loader = get_train_valid_loader(data_dir='./data', batch_size=64,
                                                    augment=False, random_seed=1)

test_loader = get_test_loader(data_dir='./data', batch_size=64)

Let’s break the code down into bits:

  • We define two functions, get_train_valid_loader and get_test_loader, to load the train/validation and test sets respectively
  • We start by defining the variable normalize with the mean and standard deviation of each channel (red, green, and blue) in the dataset. These can be calculated manually, but they are also available online since CIFAR-10 is quite popular
  • For our training dataset, we add the option to augment the data for more robust training and to increase the effective number of images. Note: augmentation is applied only to the training subset and not to the validation and testing subsets, as those are used purely for evaluation
  • We split the training dataset into train and validation sets (90:10 ratio), randomly subsetting it from the full training set
  • We define the batch size and shuffle the dataset when loading, so that each batch has some variance in the types of labels it contains. This will increase the efficacy of our eventual model
  • Finally, we make use of data loaders. This might not affect performance with a small dataset like CIFAR-10, but it can really impede performance with large datasets and is generally considered good practice. Data loaders allow us to iterate through the data in batches; the data is loaded while iterating rather than all at once into RAM at the start (see the quick sanity check after this list)
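To make sure the loaders behave as expected, you can pull a single batch and inspect its shape. This is just a quick sanity check, assuming the loaders defined above:

images, labels = next(iter(train_loader))
print(images.shape)   # torch.Size([64, 3, 227, 227])
print(labels.shape)   # torch.Size([64])
print(labels[:8])     # a few integer class labels between 0 and 9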

AlexNet from Scratch

Let’s start with the code first:

class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super(AlexNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
            nn.BatchNorm2d(96),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(384),
            nn.ReLU())
        self.layer4 = nn.Sequential(
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(384),
            nn.ReLU())
        self.layer5 = nn.Sequential(
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2))
        self.fc = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(9216, 4096),
            nn.ReLU())
        self.fc1 = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU())
        self.fc2 = nn.Sequential(
            nn.Linear(4096, num_classes))

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.layer5(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out

Defining the AlexNet Model

Let’s dive into how the code above works:

  • The first step in defining any neural network (whether a CNN or not) in PyTorch is to define a class that inherits from nn.Module, as it contains many of the methods that we will need
  • There are two main steps after that. The first is initializing the layers that we are going to use in our CNN inside __init__, and the other is defining the sequence in which those layers will process the image. This is defined inside the forward function
  • For the architecture itself, we first define the convolutional layers using the nn.Conv2d function with the appropriate kernel size and input/output channels. We also apply max pooling using the nn.MaxPool2d function. The nice thing about PyTorch is that we can combine the convolutional layer, activation function, and max pooling into one single layer (they are still applied separately, but it helps with organization) using the nn.Sequential function
  • Then we define the fully connected layers using linear (nn.Linear) and dropout (nn.Dropout) layers along with the ReLU activation function (nn.ReLU), again combining them with the nn.Sequential function
  • Finally, our last layer outputs 10 neurons, which are our final predictions for the 10 classes of objects (a quick shape check follows this list)
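If you are wondering where the 9216 input features of the first fully connected layer come from, you can pass a dummy batch through the convolutional layers and inspect the shape. A minimal check, assuming the AlexNet class defined above:

model = AlexNet(num_classes=10)
dummy = torch.randn(1, 3, 227, 227)   # one fake 227x227 RGB image

out = model.layer1(dummy)
out = model.layer2(out)
out = model.layer3(out)
out = model.layer4(out)
out = model.layer5(out)
print(out.shape)                      # torch.Size([1, 256, 6, 6]) -> 256*6*6 = 9216

print(model(dummy).shape)             # torch.Size([1, 10])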

Setting Hyperparameters

Before training, we need to set some hyperparameters, such as the loss function and the optimizer to be used, along with the batch size, learning rate, and number of epochs.

num_classes = 10
num_epochs = 20
batch_size = 64
learning_rate = 0.005

model = AlexNet(num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=0.005, momentum=0.9)

# Train the model
total_step = len(train_loader)

We start by defining simple hyperparameters (epochs, batch size, and learning rate) and initializing our model using the number of classes as an argument, which in this case is 10, along with transferring the model to the appropriate device (CPU or GPU). Then we define our cost function as cross-entropy loss and our optimizer as SGD with momentum and weight decay. There are a lot of choices for these, but this combination tends to give good results with the model and the given data. Finally, we define total_step to keep better track of steps when training.
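Since the choice of optimizer is largely interchangeable here, you could, for example, swap in Adam instead of SGD and compare results. This is an optional variation, not part of the original setup, and the learning rate below is just a typical starting point:

# Optional: Adam usually works well with a smaller learning rate than SGD
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.005)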

Training

We are ready to train our model at this point:

total_step = len(train_loader)

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move tensors to the configured device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
          .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

    # Validation
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in valid_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        del images, labels, outputs

        print('Accuracy of the network on the {} validation images: {} %'.format(5000, 100 * correct / total))

Let’s see what the code does:

  • We start by iterating through the number of epochs, and then through the batches in our training data
  • We move the images and the labels to the device we are using, i.e., GPU or CPU
  • In the forward pass, we make predictions using our model and calculate the loss based on those predictions and the actual labels
  • Next, we do the backward pass, where we actually update our weights to improve the model
  • We set the gradients to zero before every update using the optimizer.zero_grad() function
  • Then, we calculate the new gradients using the loss.backward() function
  • And finally, we update the weights with the optimizer.step() function
  • Also, at the end of each epoch we use our validation set to calculate the accuracy of the model. In this case, we don’t need gradients, so we use with torch.no_grad() for faster evaluation (see the note after this list)
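One detail worth calling out: because the model contains Dropout and BatchNorm layers, it is good practice to switch it to evaluation mode before computing validation accuracy and back to training mode afterwards. The loop above omits this for simplicity; a minimal adjustment to the validation step could look like this:

model.eval()                      # disables dropout; BatchNorm uses running statistics
with torch.no_grad():
    correct, total = 0, 0
    for images, labels in valid_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
model.train()                     # back to training behaviour for the next epoch
print('Validation accuracy: {:.2f} %'.format(100 * correct / total))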

We can see the output as follows:

[Figure: training loss and validation accuracy over the epochs]

As we can see, the loss decreases with each epoch, which shows that our model is indeed learning. Note that this loss is measured on the training set, and if it becomes very small it can indicate overfitting. This is why we are using the validation set as well. The accuracy keeps increasing on the validation set, which suggests that overfitting is unlikely. Let’s now test our model to see how it performs.

Testing

Now, we see how our model performs on unseen data:

with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    del images, labels, outputs

    print('Accuracy of the network on the {} test images: {} %'.format(10000, 100 * correct / total))

Note that the code is exactly the same as the one used for validation.

Using this model, and training for only 6 epochs, we seem to get around 78.8% accuracy on the validation set, which seems good enough.

[Figure: testing accuracy]
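If you want to dig a little deeper into the results, you could also compute per-class accuracy on the test set. This is an optional extra, not part of the original walkthrough; the class names come straight from the dataset object:

classes = test_loader.dataset.classes          # ['airplane', 'automobile', ...]
correct_per_class = [0] * len(classes)
total_per_class = [0] * len(classes)

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        _, predicted = torch.max(model(images), 1)
        for label, pred in zip(labels, predicted):
            total_per_class[label] += 1
            correct_per_class[label] += int(label == pred)

for name, correct, total in zip(classes, correct_per_class, total_per_class):
    print('{}: {:.1f} %'.format(name, 100 * correct / total))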

Conclusion

Let’s now recap what we did in this article:

  • We started by understanding the architecture and the different kinds of layers in the AlexNet model
  • Next, we loaded and pre-processed the CIFAR-10 dataset using torchvision
  • Then, we used PyTorch to build our AlexNet model from scratch
  • Finally, we trained and tested our model on the CIFAR-10 dataset, and the model seemed to perform well on the test dataset with minimal training (6 epochs)

Future Work

This article gives you a good introduction and some hands-on learning, but you’ll learn much more if you extend it and see what else you can do:

  • You can try using different datasets. One such dataset is CIFAR-100, which is an extension of the CIFAR-10 dataset with 100 classes (see the sketch after this list)
  • You can experiment with different hyperparameters and find the best combination of them for the model
  • Finally, you can try adding or removing layers from the model to see their impact on its performance
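For instance, switching to CIFAR-100 mostly comes down to swapping the dataset class and the number of output classes. A rough sketch of the changes, keeping everything else the same (the normalization statistics below are placeholders, not the true CIFAR-100 values):

# Same preprocessing idea as before: resize to 227x227, convert to tensor, normalize
transform = transforms.Compose([
    transforms.Resize((227, 227)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # placeholder stats
])

train_dataset = datasets.CIFAR100(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.CIFAR100(root='./data', train=False, download=True, transform=transform)

# The model only needs a different number of output classes
model = AlexNet(num_classes=100).to(device)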