Training Models with Limited Data: Techniques for Transfer Learning

Training machine learning models often requires vast amounts of data, but what if you only have a small dataset to work with? Whether you’re a data scientist at a retail company analyzing product reviews or an engineer in healthcare building a diagnostic tool, limited data can pose a significant challenge. Transfer learning offers a powerful solution, enabling you to leverage pre-trained models and adapt them to your specific task with minimal data.

In this article, we’ll explore transfer learning techniques for training models with limited data, focusing on methods like fine-tuning, feature extraction, and domain adaptation, and showcasing real-world applications. We’ll follow two European companies—a Dutch retail firm classifying product reviews and a German healthcare startup detecting anomalies in medical images—to see how transfer learning helps overcome the challenge of limited data.

What You’ll Learn

  • What transfer learning is and why it’s ideal for limited data

  • Key techniques: fine-tuning, feature extraction, and domain adaptation

  • Real-world applications through the lens of two companies

  • Practical tips and additional methods for effective transfer learning

What Is Transfer Learning and Why Use It?

Transfer learning involves taking a model trained on a large, general dataset and adapting it to a smaller, specific task. It’s like borrowing knowledge from an expert and customizing it for your needs. This approach is particularly valuable when you have limited data, as it reduces training time, requires fewer labeled examples, and often outperforms training from scratch.

For example, a Dutch retail company, ReviewBoost, had only 800 labeled product reviews to classify as positive or negative. Training a model from scratch on such a small dataset would likely lead to overfitting, where the model memorizes the data instead of learning general patterns. Instead, they used transfer learning with a pre-trained model, leveraging its general knowledge of language to achieve better results with their limited data. Similarly, a German healthcare startup, MedScan, used transfer learning to detect anomalies in medical images with just 1,000 labeled examples, avoiding the need for a massive dataset.

Transfer Learning Techniques for Limited Data

Let’s explore three core transfer learning techniques—fine-tuning, feature extraction, and domain adaptation—using ReviewBoost and MedScan as examples.

Fine-Tuning: Adapting the Model to Your Task

Fine-tuning takes a pre-trained model and retrains some of its layers on your dataset to better fit your specific task. You typically freeze the early layers, which capture general features (like edges or shapes in images), and retrain the later layers to learn task-specific patterns. During training, the weights of the unfrozen layers and of any newly added layers are updated.

MedScan used fine-tuning to detect anomalies in their medical images. They started with a pre-trained VGG16 model, which had been trained on millions of images from ImageNet. Using TensorFlow, they froze the first 10 layers to retain the model’s ability to recognize general features, then retrained the remaining layers on their medical images. After fine-tuning, their model achieved a 92% accuracy, significantly better than the 65% accuracy they obtained when training a model from scratch.

Steps to Implement:

  • Load a pre-trained model (e.g., VGG16) without top layers.

  • Freeze early layers to preserve general features.

  • Add new layers for your task (e.g., a classifier).

  • Train the model on your dataset with a small learning rate.

Here’s how MedScan could have implemented fine-tuning in TensorFlow:


import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# Load the pre-trained VGG16 model without the top layers
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the first 10 layers to retain general features
for layer in base_model.layers[:10]:
    layer.trainable = False

# Add new layers for the specific task (anomaly detection)
x = Flatten()(base_model.output)
x = Dense(256, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)  # Binary classification (normal vs. anomaly)

# Create the new model
model = Model(inputs=base_model.input, outputs=x)

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
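
To complete the workflow, the compiled model is then trained on the target images. Here is a minimal sketch of that step, assuming the medical images and labels have already been loaded as NumPy arrays (train_images, train_labels, val_images, and val_labels are placeholder names, not MedScan's actual variables):

# Train the unfrozen layers on the small medical dataset; the low learning rate set above
# helps avoid destroying the pre-trained features (variable names here are placeholders)
model.fit(train_images, train_labels,
          validation_data=(val_images, val_labels),
          epochs=10,
          batch_size=32)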

Feature Extraction: Using Pre-Trained Features Without Retraining

Feature extraction uses a pre-trained model as a fixed "feature extractor" without changing its weights. You pass your data through the model to extract features (like embeddings or high-level representations) and then train a separate, simpler model on those features for your task. In effect, two models are involved: the pre-trained model produces features but is never updated, while the second, lightweight model is the only one that learns.

ReviewBoost used feature extraction to classify their product reviews as positive or negative. They leveraged BERT, a pre-trained language model available through Hugging Face Transformers, to extract text embeddings from the reviews. These embeddings captured the semantic meaning of the text, which they then used as input to train a simple logistic regression classifier. This approach yielded an F1-score of 0.88—much better than the 0.72 F1-score achieved by training a model from scratch.

Steps to Implement:

  • Load a pre-trained model (e.g., BERT) and freeze all layers.

  • Pass your data through the model to extract features (e.g., embeddings).

  • Train a simpler model (e.g., logistic regression) on the extracted features.

  • Evaluate performance using a validation set.

Here’s how ReviewBoost could have implemented feature extraction using BERT and scikit-learn’s logistic regression in Python:


import numpy as np
from transformers import BertTokenizer, TFBertModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Load BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = TFBertModel.from_pretrained('bert-base-uncased')

# Freeze BERT model to prevent weight updates
bert_model.trainable = False

# Example function to extract embeddings from reviews
def extract_bert_embeddings(reviews, max_length=128):
    embeddings = []
    for review in reviews:
        inputs = tokenizer(review, return_tensors="tf", max_length=max_length, padding="max_length", truncation=True)
        outputs = bert_model(**inputs)
        embedding = outputs.last_hidden_state[:, 0, :].numpy()  # CLS token embedding
        embeddings.append(embedding[0])
    return np.array(embeddings)

# Example reviews and labels (replace with your dataset)
reviews = ["This product is great!", "I hated this item.", "Amazing quality!", "Very disappointing."]
labels = [1, 0, 1, 0]  # 1: positive, 0: negative

# Extract embeddings
embeddings = extract_bert_embeddings(reviews)

# Train a logistic regression classifier on the embeddings
clf = LogisticRegression()
clf.fit(embeddings, labels)

# Predict and evaluate (example with same data for simplicity)
predictions = clf.predict(embeddings)
f1 = f1_score(labels, predictions)
print(f"F1-score: {f1}")

Domain Adaptation: Aligning Source and Target Domains

Domain adaptation aligns the feature distributions of the source domain (the pre-trained model's data) and the target domain (your data) to handle differences between them. This technique ensures the model performs well despite domain shifts by making the features extracted from both domains more similar, rather than by adding new layers to the feature extractor itself.

MedScan used domain-adversarial training to adapt VGG16 for detecting pneumonia in chest X-rays, aligning features between general images (source) and X-rays (target). They added an adversarial network to make the features indistinguishable between domains, improving their anomaly detection accuracy from 78% to 85%. For a deeper dive into domain adaptation, including a detailed code example, check out our article Domain Adaptation in Deep Learning: Bridging the Gap Between Domains.

Steps to Implement:

  • Load a pre-trained model and split it into a feature extractor and classifier.

  • Add a domain classifier to predict the domain (source vs. target).

  • Train with two goals: minimize domain differences (adversarial loss) and maximize task accuracy.

  • Use a gradient reversal layer to balance the objectives (a minimal sketch of this layer follows after these steps).

  • Fine-tune on your target dataset, monitoring performance.
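
The linked article walks through a complete implementation, so the snippet below is only a minimal sketch of the gradient reversal layer from the steps above, written in TensorFlow. The layer name and the commented-out wiring are illustrative assumptions, not MedScan's actual code:

import tensorflow as tf

@tf.custom_gradient
def reverse_gradient(x):
    # Forward pass is the identity; the gradient is flipped on the way back,
    # so the feature extractor learns to confuse the domain classifier
    def grad(dy):
        return -dy
    return tf.identity(x), grad

class GradientReversalLayer(tf.keras.layers.Layer):
    """Passes inputs through unchanged but reverses gradients during backpropagation."""
    def call(self, inputs):
        return reverse_gradient(inputs)

# Hypothetical wiring: shared features feed both the task head and the domain classifier
# features = feature_extractor(inputs)                                   # e.g., a VGG16 backbone
# task_output = task_classifier(features)                                # normal vs. anomaly
# domain_output = domain_classifier(GradientReversalLayer()(features))   # source vs. target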

Can You Combine These Techniques?

Yes, combining fine-tuning, feature extraction, and domain adaptation can enhance performance in various scenarios:

  • Feature Extraction + Fine-Tuning: Start with feature extraction to train a lightweight classifier on stable, pre-trained features, then fine-tune the pre-trained model itself for better task adaptation as your dataset grows over time. This staged combination is standard practice in real-world projects, especially in NLP with models like BERT and in computer vision with ResNet or VGG (a sketch of the staged approach follows after this list).

  • Domain Adaptation + Fine-Tuning: Align the source and target domains first, then fine-tune for your task. This is ideal for significant domain shifts and is often used in specialized projects, such as medicine or autonomous systems, where domain adaptation is actively applied alongside fine-tuning to achieve robust results.

  • Feature Extraction + Domain Adaptation: Keep the pre-trained model fixed, align the extracted features across domains, and train a classifier on the result. This combination suits settings with limited computational resources and moderate domain mismatches; it is more experimental and appears most often in academic research, though it can work in practice.
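
To make the first combination concrete, here is a minimal two-phase sketch in Keras, assuming an image task like MedScan's; train_ds and val_ds are placeholder datasets, and the number of unfrozen layers is an illustrative choice:

import tensorflow as tf

# Phase 1: feature extraction - keep the pre-trained base frozen and train only the new head
base = tf.keras.applications.VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=5)

# Phase 2: fine-tuning - unfreeze the top of the base and retrain with a much smaller learning rate
base.trainable = True
for layer in base.layers[:-4]:
    layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=5)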

Real-World Applications of Transfer Learning

Transfer learning’s ability to work with limited data makes it valuable across industries. Let’s see how ReviewBoost and MedScan applied these techniques to solve real-world problems.

For ReviewBoost, the Dutch retail company, transfer learning enabled them to improve their product recommendation system. Using feature extraction with BERT to classify their 800 reviews as positive or negative, they identified products with consistently positive feedback. This improved their recommendation accuracy by 25%, helping them suggest better products to customers and boost sales.

MedScan used transfer learning to advance medical diagnostics. With their 1,000 labeled medical images, they applied fine-tuning and domain adaptation to a pre-trained VGG16 model, achieving a 92% accuracy in detecting anomalies. This high performance allowed them to assist doctors in identifying potential issues more effectively, reducing diagnostic errors and improving patient outcomes.

Enhancing Transfer Learning: Additional Techniques and Practical Tips

To maximize the effectiveness of transfer learning with limited data, you can explore additional methods and best practices:

  • Few-Shot Learning: Recognize new classes with 1–5 examples using meta-learning (e.g., MAML) or matching methods (e.g., Relation Networks).

  • Self-Supervised Learning: Pre-train on unlabeled data with tasks like predicting the next word (e.g., BERT), then fine-tune on your labeled dataset.

  • Data Augmentation: Expand your dataset with transformations, such as rotating images for MedScan or replacing words with synonyms in ReviewBoost's reviews (a small image-augmentation sketch follows after this list).

  • Synthetic Data Generation: Generate synthetic examples using GANs, especially in healthcare where real data is scarce, as MedScan could have done.

  • Prompt Tuning and Adapter Layers (for NLP): Fine-tune efficiently by training small adapter layers or prefixes, reducing computational cost for ReviewBoost’s task.

  • Choose the Right Model: Use BERT for text tasks (ReviewBoost) or ResNet for images (MedScan).

  • Monitor Overfitting: Use validation sets and early stopping to avoid overfitting.

  • Leverage Cloud Resources: Use platforms like Google Colab for resource-intensive training, as MedScan did.

  • Start Small: Begin with feature extraction if your dataset is small, as ReviewBoost did.

  • Use Open-Source Tools: Access pre-trained models via Hugging Face Transformers, TensorFlow Hub, or PyTorch.
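
As an example of the augmentation tip above, here is a minimal image-augmentation sketch for a case like MedScan's, using Keras preprocessing layers. Which transformations are safe depends on the imaging modality, so treat the exact choices as assumptions:

import tensorflow as tf

augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.05),           # small rotations
    tf.keras.layers.RandomZoom(0.1),                # slight zoom in or out
    tf.keras.layers.RandomTranslation(0.05, 0.05),  # small shifts
])

# Applied on the fly inside a tf.data pipeline (train_ds is a placeholder):
# train_ds = train_ds.map(lambda x, y: (augmentation(x, training=True), y))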

Challenges and Solutions in Transfer Learning

While transfer learning is powerful, it comes with challenges. Here’s how ReviewBoost and MedScan addressed them:

  • Domain Mismatch: MedScan encountered a mismatch between their medical images and the general images VGG16 was trained on. They used domain-adversarial training to bridge this gap, as mentioned earlier.

  • Overfitting: ReviewBoost faced the risk of overfitting during their classification task. They mitigated this by paraphrasing reviews to create synthetic examples and applying dropout during training.

  • Computational Resources: Fine-tuning large models like VGG16 can be resource-intensive. MedScan leveraged cloud platforms like Google Colab to access the necessary computational power for training.

Bringing It All Together

Transfer learning empowers you to train models with limited data by leveraging pre-trained knowledge. Techniques like fine-tuning, feature extraction, and domain adaptation—along with enhancements like few-shot learning and data augmentation—allow you to adapt models to your specific task, delivering high performance with minimal data. ReviewBoost improved product recommendations, while MedScan enhanced medical diagnostics—both with small datasets.

Whether you’re classifying reviews or detecting anomalies, transfer learning can help you achieve your goals. Start with a pre-trained model, apply the right techniques, and unlock the potential of your limited data.
