Applying Parameter-Efficient Fine-Tuning with Hugging Face
By Gemma Lara Savill
Published on August 20, 2025
This is a follow-up to my previous post, where I explored how ChatGPT's Study Mode helped me understand foundational AI concepts. Now, I'm ready to put that theory into practice.
In this project, I'm applying LoRA (Low-Rank Adaptation) to fine-tune DistilBERT for text classification on the AG News dataset. The goal is to show how we can drastically improve a model's performance without the massive computational cost of full-scale training.
From Text to Tokens: The First Step
Before we can train a model, we need to convert human language into something a machine can understand. This process is called tokenization. Think of tokens as the fundamental building blocks of text for a transformer model: they can be words, subwords, or punctuation.
Here's a quick look at how the Hugging Face BertTokenizer handles this:
from transformers import BertTokenizer
# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tokenize a sentence
sentence = "I heart Generative AI"
tokens = tokenizer.tokenize(sentence)
print("Tokens:", tokens)
# ['i', 'heart', 'genera', '##tive', 'ai']
# Convert tokens to ids
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print("Token IDs:", token_ids)
# [1045, 2540, 11416, 6024, 9932]
Notice how "Generative" gets split into two tokens: genera and ##tive. This is a clever way for the model to handle both common and rare words efficiently. The model recognizes these subwords and assigns them specific numerical token IDs (e.g., 11416 and 6024). In my project, this raw text is converted into a sequence of these numerical IDs, and the original text is then discarded, as it's no longer needed for training.
The dataset used for this project is the AG News corpus. It is a widely used benchmark for text classification, containing over 120,000 news articles organized into four distinct categories: World, Sports, Business, and Sci/Tech. This dataset is perfect for fine-tuning because it provides a clear, well-structured task for the model to learn: identifying the correct category for each news headline and description. This makes it a great way to test the model's ability to specialize and improve its performance on a specific, real-world task.
The Baseline: Why Fine-Tuning is Necessary
The base DistilBERT model is a fantastic general-purpose language model, but it isn't specifically trained to classify news articles. When I tested it on the AG News dataset, its untrained classification head performed essentially at chance level for a four-class task, with a Base Model Accuracy of just ~22%.
- Base Model Accuracy: ~22%
This poor initial performance highlights a crucial point: a generalist model often needs to be specialized for a specific task. And that's where parameter-efficient fine-tuning (PEFT) comes in.
So, what exactly is DistilBERT? It's a smaller, faster, and more efficient version of the well-known BERT (Bidirectional Encoder Representations from Transformers) model. While BERT is a powerful model, its large size can make it slow and computationally expensive. To solve this, DistilBERT was created using a technique called knowledge distillation.
Think of it as a student learning from a teacher. The large, complex BERT model is the "teacher," and DistilBERT is the "student." The student model is trained to mimic the teacher's behavior, learning to make similar predictions and understand the underlying language patterns. This process allows DistilBERT to retain 97% of BERT's original performance while being 40% smaller and 60% faster. For this project, DistilBERT is the perfect baseline because it gives us a high-performing model to start with, while still being efficient enough to fine-tune on a consumer-grade GPU.
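The size claim is easy to verify by counting parameters (the numbers below are approximate, and distilbert-base-uncased is assumed to be the checkpoint in use):

```python
from transformers import AutoModel

distilbert = AutoModel.from_pretrained("distilbert-base-uncased")
n_params = sum(p.numel() for p in distilbert.parameters())

# ~66M parameters, versus ~110M for bert-base-uncased:
# roughly 40% smaller, as the distillation papers report.
print(f"{n_params / 1e6:.0f}M parameters")
```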
The LoRA Advantage: Fine-Tuning with Minimal Resources
LoRA is a game-changer because it lets us adapt a foundation model without altering its core weights. Instead, we add and train only a small number of new, lightweight adapters. The original model weights are "frozen," which saves an incredible amount of computational power and memory.
In my notebook, this process looked like this:
- Load the DistilBERT base model.
- Apply the LoRA configuration to add trainable adapters.
- Fine-tune the model on the tokenized AG News dataset.
The result? The model learned to classify news articles with impressive accuracy, all while using a fraction of the resources required for traditional fine-tuning.
The Final Results
After fine-tuning with LoRA, the model's performance skyrocketed:
- LoRA Fine-Tuned Model Accuracy: ~90% 🚀
This dramatic improvement demonstrates the power of PEFT techniques. You don't need a supercomputer to get excellent results from a large model!
My Learning Environment: Colab vs. Local
A significant part of this project was figuring out the best environment. I used both Google Colab and my local machine, a MacBook with GPU acceleration.
- Google Colab: Great for a quick start, with pre-configured environments and easy access to a GPU.
- Local machine: Gave me more control over dependencies and often provided faster training times, especially with my MacBook's GPU. For my local setup, I used VS Code with the Python and Jupyter Notebook extensions. A virtual environment (venv) kept the project's required libraries and their specific versions isolated from my system's other Python projects, giving me a consistent, full-featured workflow independent of any cloud service.
The project runs in both environments with the same notebook, thanks to careful environment and dependency management.
Key Takeaways
This project reinforced some critical lessons:
- Tokenization is the non-negotiable first step in preparing text for a model.
- Parameter-Efficient Fine-Tuning (PEFT), especially LoRA, is a powerful and accessible method for specializing large models.
- You can achieve significant performance gains (from 22% to 90% accuracy!) with minimal resources.
- The choice of environment (Colab or local) depends on your needs, but setting up a portable notebook is key for a smooth workflow.
I hope this walkthrough inspires you to try fine-tuning a model yourself! You can find the full code and project details on my GitHub repository.