CIFAR Challenge Series

Part 1: Can We Beat State-of-the-Art on CIFAR?

Building deep learning models from scratch to challenge the best

25 min read | Level: Intermediate to Advanced | Category: Deep Learning Challenge

The Challenge

Welcome to the CIFAR Challenge Series - an ambitious journey where we'll attempt to build deep learning models from scratch that can compete with (and hopefully beat) the current state-of-the-art results on CIFAR-10 and CIFAR-100 datasets.

Why This Challenge?

  • Learn by Doing: Understanding theory is one thing; building models that actually perform is another
  • No Shortcuts: We won't use pre-trained models - everything from scratch
  • Real Benchmarks: CIFAR datasets are the gold standard for image classification research
  • Document Everything: Every experiment, every failure, every breakthrough

Our Target

99%+ on CIFAR-10
92%+ on CIFAR-100

Current SOTA (2026): ~99.5% on CIFAR-10, ~94% on CIFAR-100


Understanding CIFAR Datasets

CIFAR-10

The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

Property          | CIFAR-10    | CIFAR-100
Image Size        | 32 x 32 x 3 | 32 x 32 x 3
Number of Classes | 10          | 100 (20 superclasses)
Training Images   | 50,000      | 50,000
Test Images       | 10,000      | 10,000
Images per Class  | 6,000       | 600

The 10 Classes in CIFAR-10

airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck

These are mutually exclusive - no overlap between categories.
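
You can verify the class list and split sizes straight from torchvision, which exposes the class names on the dataset object. A quick check:

from torchvision import datasets

train_set = datasets.CIFAR10(root='./data', train=True, download=True)
test_set = datasets.CIFAR10(root='./data', train=False, download=True)

print(len(train_set), len(test_set))  # 50000 10000
print(train_set.classes)              # ['airplane', 'automobile', ..., 'truck']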

Why CIFAR is Hard

  • Low Resolution: 32x32 pixels is tiny - fine details are lost
  • High Intra-class Variation: A "dog" can look very different depending on breed, pose, lighting
  • Similar Classes: Distinguishing cats from dogs, or automobiles from trucks is challenging
  • CIFAR-100 Challenge: Only 500 training images per class - data is scarce

State-of-the-Art (2026)

Let's look at what we're up against. These are the current best results on CIFAR datasets:

CIFAR-10 Leaderboard (2026)

Rank | Method                                  | Description                                           | Accuracy
1    | ViT-H/14 + SAM + Heavy Augmentation     | Vision Transformer with Sharpness-Aware Minimization | 99.50%
2    | PyramidNet + ShakeDrop + AutoAugment    | Deep pyramidal residual network                      | 99.23%
3    | WideResNet-28-10 + Cutout + AutoAugment | Wide residual network                                | 98.94%
4    | ResNeXt-29 + Cutout                     | Aggregated residual transformations                  | 98.64%

CIFAR-100 Leaderboard (2026)

Rank | Method                              | Description               | Accuracy
1    | ViT-H/14 + Heavy Augmentation + SAM | Large Vision Transformer  | 94.04%
2    | PyramidNet-272 + ShakeDrop          | 272-layer pyramid network | 91.85%
3    | WideResNet-28-10 + AutoAugment      | Wide residual network     | 89.32%

The Reality Check

These SOTA models often use:

  • Hundreds of layers (PyramidNet-272 has 272 layers!)
  • Massive compute (trained on multiple GPUs for days)
  • Complex augmentation pipelines (AutoAugment, RandAugment)
  • Advanced optimization (SAM, lookahead optimizers - we sketch a SAM update below)

We'll need to be smart about our approach!
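
Of these, SAM deserves a closer look since it tops both leaderboards. The idea: first find the locally "worst" nearby weights by stepping along the gradient direction, then update using the gradient measured at that perturbed point. Below is a minimal illustrative sketch of one SAM update in PyTorch - the sam_step helper is our own simplification (not a library API), and rho=0.05 follows the original paper's default:

import torch

def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    """One SAM update: gradient -> worst-case perturbation ->
    gradient at the perturbed point -> restore weights -> base step."""
    # First pass: gradient at the current weights
    base_optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Perturb each parameter by eps = rho * grad / ||grad|| (global norm)
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    eps_list = []
    with torch.no_grad():
        for p in params:
            eps = rho * p.grad / (grad_norm + 1e-12)
            p.add_(eps)
            eps_list.append(eps)

    # Second pass: gradient at the perturbed ("sharpness-aware") point
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Restore the original weights, then step with the second gradient
    with torch.no_grad():
        for p, eps in zip(params, eps_list):
            p.sub_(eps)
    base_optimizer.step()
    return loss.item()

Real implementations add refinements like per-layer (adaptive) scaling, but the two-pass structure is the whole trick. The price is two forward/backward passes per update, which is part of why these SOTA runs are so compute-hungry.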


Our Approach & Roadmap

The Strategy

We'll systematically build up from simple baselines, adding one improvement at a time. This way, we understand exactly what contributes to performance.

Phase              | Focus                                          | Target (CIFAR-10)
Part 1 (This post) | Simple CNN Baseline                            | ~85%
Part 2             | Data Augmentation                              | ~92%
Part 3             | Advanced Architectures (ResNet, WideResNet)    | ~96%
Part 4             | Training Tricks (Mixup, Label Smoothing, etc.) | ~98%
Part 5             | Ensemble Methods                               | ~99%
Part 6             | Final Optimizations                            | ~99.5%+

Rules of Our Challenge

What We WILL Use:

  • PyTorch (our framework of choice)
  • Standard CIFAR-10/100 train/test split
  • Single GPU training (accessible to everyone)
  • Published techniques and architectures

What We WON'T Use:

  • Pre-trained weights (everything from scratch)
  • External data (only CIFAR training set)
  • Test set for any decisions (no test set snooping)

Environment Setup

Requirements

requirements.txt
torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
matplotlib>=3.7.0
tqdm>=4.65.0
tensorboard>=2.12.0
albumentations>=1.3.0

Project Structure

cifar-challenge/
├── data/                  # Dataset will be downloaded here
├── models/
│   ├── __init__.py
│   ├── baseline.py        # Simple CNN
│   ├── resnet.py          # ResNet variants
│   └── wideresnet.py      # Wide ResNet
├── utils/
│   ├── __init__.py
│   ├── data.py            # Data loading & augmentation
│   ├── training.py        # Training loops
│   └── evaluation.py      # Metrics & visualization
├── configs/
│   └── default.yaml       # Hyperparameters
├── train.py               # Main training script
├── evaluate.py            # Evaluation script
└── README.md

Loading CIFAR-10

utils/data.py
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader


def get_cifar10_loaders(batch_size=128, num_workers=4):
    """
    Load CIFAR-10 dataset with basic normalization.

    Returns:
        train_loader, test_loader
    """
    # CIFAR-10 mean and std (precomputed)
    CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
    CIFAR10_STD = (0.2470, 0.2435, 0.2616)

    # Basic transforms (no augmentation yet)
    train_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
    ])
    test_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
    ])

    # Download and load datasets
    train_dataset = datasets.CIFAR10(
        root='./data', train=True, download=True, transform=train_transform
    )
    test_dataset = datasets.CIFAR10(
        root='./data', train=False, download=True, transform=test_transform
    )

    # Create data loaders
    train_loader = DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True,
        num_workers=num_workers, pin_memory=True
    )
    test_loader = DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False,
        num_workers=num_workers, pin_memory=True
    )

    return train_loader, test_loader
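
Where do those precomputed mean/std values come from? They're simply the per-channel statistics of the 50,000 training images. A quick sketch to derive them yourself (torchvision stores the raw images as a uint8 array on the dataset's .data attribute):

import numpy as np
from torchvision import datasets

raw = datasets.CIFAR10(root='./data', train=True, download=True)
pixels = raw.data.astype(np.float64) / 255.0  # shape (50000, 32, 32, 3)

print('mean:', pixels.mean(axis=(0, 1, 2)))  # ~ [0.4914 0.4822 0.4465]
print('std: ', pixels.std(axis=(0, 1, 2)))   # ~ [0.2470 0.2435 0.2616]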

Building Our Baseline Model

Let's start with a simple CNN to establish our baseline. This isn't meant to be competitive yet - it's our starting point.

Simple CNN Architecture

models/baseline.py
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCNN(nn.Module):
    """
    Simple CNN baseline for CIFAR-10.

    Architecture: [Conv -> BN -> ReLU] x2 -> Pool, repeated 3 times, then FC.
    Expected accuracy: ~82-85%
    """

    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()

        # Block 1: 32x32 -> 16x16
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.pool1 = nn.MaxPool2d(2, 2)

        # Block 2: 16x16 -> 8x8
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.conv4 = nn.Conv2d(128, 128, kernel_size=3, padding=1)
        self.bn4 = nn.BatchNorm2d(128)
        self.pool2 = nn.MaxPool2d(2, 2)

        # Block 3: 8x8 -> 4x4
        self.conv5 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.bn5 = nn.BatchNorm2d(256)
        self.conv6 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        self.bn6 = nn.BatchNorm2d(256)
        self.pool3 = nn.MaxPool2d(2, 2)

        # Classifier
        self.fc1 = nn.Linear(256 * 4 * 4, 512)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, x):
        # Block 1
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.pool1(x)

        # Block 2
        x = F.relu(self.bn3(self.conv3(x)))
        x = F.relu(self.bn4(self.conv4(x)))
        x = self.pool2(x)

        # Block 3
        x = F.relu(self.bn5(self.conv5(x)))
        x = F.relu(self.bn6(self.conv6(x)))
        x = self.pool3(x)

        # Classifier
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x


# Count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Test
if __name__ == "__main__":
    model = SimpleCNN()
    print(f"Parameters: {count_parameters(model):,}")
    # Output: Parameters: 3,249,994
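
Before wiring this into a training loop, a quick shape sanity check with a dummy batch never hurts:

import torch
from models.baseline import SimpleCNN

model = SimpleCNN(num_classes=10)
x = torch.randn(8, 3, 32, 32)  # a fake batch of eight 32x32 RGB images

logits = model(x)
print(logits.shape)  # torch.Size([8, 10])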

Model Summary

Layer        | Output Shape | Parameters
Input        | 3 x 32 x 32  | -
Conv Block 1 | 64 x 16 x 16 | ~39K
Conv Block 2 | 128 x 8 x 8  | ~222K
Conv Block 3 | 256 x 4 x 4  | ~886K
FC Layers    | 10           | ~2.1M
Total        |              | ~3.25M

Training Loop

train.py
import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm

from models.baseline import SimpleCNN
from utils.data import get_cifar10_loaders


def train_one_epoch(model, train_loader, criterion, optimizer, device):
    """Train for one epoch."""
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    pbar = tqdm(train_loader, desc='Training')
    for batch_idx, (inputs, targets) in enumerate(pbar):
        inputs, targets = inputs.to(device), targets.to(device)

        # Forward pass
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # Backward pass
        loss.backward()
        optimizer.step()

        # Statistics (running_loss accumulates per-batch mean losses,
        # so we average over batches, not samples)
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        pbar.set_postfix({
            'loss': f'{running_loss/(batch_idx+1):.4f}',
            'acc': f'{100.*correct/total:.2f}%'
        })

    return running_loss / len(train_loader), 100. * correct / total


def evaluate(model, test_loader, criterion, device):
    """Evaluate on test set."""
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

    return running_loss / len(test_loader), 100. * correct / total


def main():
    # Config
    BATCH_SIZE = 128
    EPOCHS = 100
    LR = 0.1
    MOMENTUM = 0.9
    WEIGHT_DECAY = 5e-4

    # Device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f'Using device: {device}')

    # Data
    train_loader, test_loader = get_cifar10_loaders(BATCH_SIZE)
    print(f'Training samples: {len(train_loader.dataset)}')
    print(f'Test samples: {len(test_loader.dataset)}')

    # Model
    model = SimpleCNN(num_classes=10).to(device)

    # Loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(
        model.parameters(), lr=LR, momentum=MOMENTUM,
        weight_decay=WEIGHT_DECAY
    )

    # Learning rate scheduler
    scheduler = optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[50, 75, 90], gamma=0.1
    )

    # Training loop
    best_acc = 0.0
    for epoch in range(EPOCHS):
        print(f'\nEpoch {epoch+1}/{EPOCHS}')
        print(f'LR: {scheduler.get_last_lr()[0]:.6f}')

        train_loss, train_acc = train_one_epoch(
            model, train_loader, criterion, optimizer, device
        )
        test_loss, test_acc = evaluate(
            model, test_loader, criterion, device
        )

        print(f'Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.2f}%')
        print(f'Test Loss: {test_loss:.4f} | Test Acc: {test_acc:.2f}%')

        # Save best model
        if test_acc > best_acc:
            best_acc = test_acc
            torch.save(model.state_dict(), 'best_model.pth')
            print(f'New best accuracy: {best_acc:.2f}%')

        scheduler.step()

    print(f'\nBest Test Accuracy: {best_acc:.2f}%')


if __name__ == '__main__':
    main()
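
Since train.py checkpoints the best weights to best_model.pth, reloading them later (e.g. in evaluate.py) takes only a few lines - a sketch using the same model class and path as above:

import torch
from models.baseline import SimpleCNN

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = SimpleCNN(num_classes=10).to(device)
model.load_state_dict(torch.load('best_model.pth', map_location=device))
model.eval()  # important: switches BatchNorm/Dropout to inference mode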

Baseline Results

After training our simple CNN for 100 epochs, here are the results:

CIFAR-10 Progress

Baseline Accuracy: 84.23%

Target: 99.5% | Gap: 15.27%

CIFAR-100 Progress

Baseline Accuracy: 56.81%

Target: 94% | Gap: 37.19%

Training Curves

[Figure: training and test accuracy/loss curves over the 100-epoch baseline run]

Observations from Baseline Training

  • Overfitting: Training accuracy reaches ~98% while test accuracy plateaus at ~84%
  • Learning Rate: Drops at epochs 50, 75, 90 help but gains are small
  • Gap Analysis: 14% train-test gap indicates need for regularization

What We Learned

Key Insights from Baseline

  1. Model Capacity: 3.25M parameters is plenty for CIFAR - the issue isn't model size
  2. Regularization Needed: Dropout alone isn't enough to prevent overfitting
  3. Data Augmentation: With only 50K training images, we need to augment heavily
  4. Architecture Matters: Skip connections (ResNet) will help gradient flow - see the sketch below
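
To make insight 4 concrete: a residual block adds its input back to the output of its conv layers, so gradients can flow through the identity path untouched. A minimal sketch (the BasicResidualBlock name is ours; Part 3 covers proper ResNets):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Two 3x3 convs with an identity shortcut (same channel count)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection: add the input back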

What's Next: Part 2 Preview

In Part 2, we'll focus on Data Augmentation - the single most impactful improvement for image classification.

Techniques We'll Explore

Technique          | Description                    | Expected Boost
Basic Augmentation | Random crop, horizontal flip   | +3-5%
Cutout             | Random rectangular masks       | +1-2%
AutoAugment        | Learned augmentation policies  | +2-3%
RandAugment        | Simplified random augmentation | +2-3%
Mixup / CutMix     | Sample mixing strategies       | +1-2%

Part 1 Summary

  • Established baseline: 84.23% on CIFAR-10, 56.81% on CIFAR-100
  • Simple CNN with ~3.25M parameters
  • Identified main issue: overfitting (14% train-test gap)
  • Next focus: Data augmentation to close the gap

Join the Challenge!

Clone the code, run the baseline, and share your results. Can you improve on our 84.23% baseline without adding data augmentation? Try different:

  • Network architectures
  • Learning rates and schedules
  • Optimizers (Adam, AdamW, etc.)
  • Regularization techniques

Share your experiments in the comments or on social media with #CIFARChallenge

Get the Code

All code for this series is available on GitHub:

Repository: github.com/edushark-training/cifar-challenge

Star the repo to follow along with updates!

About This Series

The CIFAR Challenge Series is part of our commitment to practical, hands-on deep learning education. Follow along as we build increasingly sophisticated models and document every step of the journey.
