Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

This paper demonstrates that for resource-constrained single-label text classification, fine-tuning causal LLMs with a classification head on final-token embeddings is significantly more parameter-efficient than instruction tuning while achieving comparable or superior performance to both instruction-tuned LLMs and domain-specific BERT models.

Original authors: Amirhossein Yousefiramandi, Ciaran Cooney

Published 2026-05-25 (author reviewed)

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

Imagine you have a giant, incredibly smart library assistant (a Large Language Model, or LLM) who has read almost everything in the world. You want to hire this assistant to sort a massive pile of patent documents into specific categories. The problem? This assistant is huge, expensive to run, and usually trained to write stories, not sort files.

This paper is a guide on how to teach this giant assistant to sort files efficiently, using just one standard computer graphics card (GPU) instead of a supercomputer. The authors tested two different ways to train the assistant and found that one method is much better than the other for this specific job.

Here is the breakdown of their findings using simple analogies:

The Two Training Methods

The researchers tried two different "training camps" for the assistant:

1. The "File Folder" Method (Embedding-Based)

  • How it works: Imagine you ask the assistant to read a document and then hand you a single, perfect summary note written on the last page. You then attach a small, simple label maker (a "classification head") to that note to decide which folder the document goes into.
  • The trick: They didn't retrain the whole assistant. They just taught the assistant how to write that one perfect summary note and how to use the label maker. They used a technique called "LoRA" (Low-Rank Adaptation), which is like giving the assistant a set of sticky notes to write on instead of rewriting their entire brain.
  • Result: This method was fast, cheap, and accurate. It trained only a small number of parameters (a small "budget") and still matched or beat the alternatives on single-label sorting.
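
For readers who want to see the mechanics behind the analogy, here is a minimal NumPy sketch of the three pieces: taking the final-token embedding (the "summary note"), a LoRA-style low-rank update (the "sticky notes"), and a linear classification head (the "label maker"). It is an illustration under stated assumptions, not the authors' code: the tensor sizes, rank, and scaling value are made up, sequences are assumed to be right-padded, and in a real model the LoRA update would sit inside the transformer's own layers rather than stand alone.

```python
import numpy as np

def last_token_embedding(hidden, attention_mask):
    """Return the hidden state of the last non-padding token per sequence.

    hidden: (batch, seq_len, dim). attention_mask: (batch, seq_len) of 0/1.
    Assumes right-padding, so the last real token is at index sum(mask) - 1.
    """
    last_idx = attention_mask.sum(axis=1) - 1
    return hidden[np.arange(hidden.shape[0]), last_idx]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha, rng):
        d_out, d_in = weight.shape
        self.weight = weight                           # frozen, never updated
        self.A = rng.normal(0.0, 0.01, (rank, d_in))   # trainable
        self.B = np.zeros((d_out, rank))               # trainable, starts at zero
        self.scale = alpha / rank

    def __call__(self, x):
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size

def classify(embeddings, head_w, head_b):
    """Linear classification head: one logit per class, pick the largest."""
    logits = embeddings @ head_w.T + head_b
    return logits.argmax(axis=1)

rng = np.random.default_rng(0)

# Toy stand-in for the LLM's output: 2 documents, 5 token positions, 8 dims.
hidden = rng.normal(size=(2, 5, 8))
mask = np.array([[1, 1, 1, 0, 0],    # first document has 3 real tokens
                 [1, 1, 1, 1, 1]])   # second document has 5 real tokens
emb = last_token_embedding(hidden, mask)

# The head maps each 8-dim embedding to 3 hypothetical classes.
head_w = rng.normal(size=(3, 8))
head_b = np.zeros(3)
preds = classify(emb, head_w, head_b)

# A LoRA-wrapped 8x8 layer trains 2*8 + 8*2 = 32 values instead of 64.
layer = LoRALinear(rng.normal(size=(8, 8)), rank=2, alpha=4.0, rng=rng)
print(layer.trainable_params(), "trainable vs", layer.weight.size, "frozen")
```

Because B starts at zero, the wrapped layer initially behaves exactly like the frozen one; training only moves A and B, which is where the small "budget" comes from.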

2. The "Chatbot" Method (Instruction-Based)

  • How it works: Instead of asking for a summary note, you talk to the assistant like a chatbot. You say, "Here is a document. Please tell me what category it belongs to." The assistant then has to type out the answer word by word.
  • The trick: This requires the assistant to learn how to follow instructions and generate text in a specific format.
  • Result: This method was slower and required a much larger budget (more "trainable" resources) to get good results. It worked okay for complex tasks with many categories, but it was often picky about how you asked the question. If the prompt was slightly off, the assistant might get confused or write extra words that broke the system.
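
To make the "picky" behavior concrete, here is a small sketch of what the instruction-based pipeline has to do around the model: build a prompt, then map whatever text comes back onto a known label. The category names, prompt wording, and parsing rule are all hypothetical and chosen for illustration; the paper's actual prompts and labels are not reproduced here.

```python
import re

# Hypothetical category names, for illustration only.
LABELS = ["chemistry", "electricity", "mechanical engineering"]

def build_prompt(document, labels):
    """Format a single-label classification instruction for a chat-style model."""
    options = ", ".join(labels)
    return (
        "Classify the following document into exactly one category.\n"
        f"Categories: {options}\n"
        f"Document: {document}\n"
        "Answer with the category name only.\n"
        "Category:"
    )

def parse_label(generated, labels):
    """Map free-form generated text to a known label.

    Returns None when zero or several labels appear, since the answer
    cannot then be scored as a single clean prediction.
    """
    text = generated.strip().lower()
    matches = [
        label for label in labels
        if re.search(r"\b" + re.escape(label.lower()) + r"\b", text)
    ]
    return matches[0] if len(matches) == 1 else None

prompt = build_prompt("A rechargeable battery with a solid electrolyte.", LABELS)

print(parse_label("Electricity", LABELS))
print(parse_label("The category is: Chemistry.", LABELS))
print(parse_label("It could be chemistry or electricity.", LABELS))
```

The third call shows the failure mode described above: extra words that mention more than one category leave nothing usable, whereas a classification head always returns exactly one label.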

The Big Showdown: What They Found

The authors tested these methods on patent data (legal documents about inventions) and compared them to older, smaller models (like BERT) that were built specifically for sorting tasks.

  • For Single-Label Sorting (One category per document):
    The "File Folder" method won hands down. It matched or even beat the older, specialized models and the "Chatbot" method while using 10 to 30 times fewer trainable resources. It was like a compact pocket knife that cuts as well as a full-size chef's knife but is much lighter and cheaper to carry.

  • For Multi-Label Sorting (Multiple categories per document):
    The "Chatbot" method had a slight edge, but only if you were willing to spend a lot more money on training (using a huge budget of resources). Even then, the "File Folder" method was still very competitive.
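
The difference between the two sorting tasks can be sketched at the point where a classification head turns scores into labels. The decoding rules below (largest score for single-label, a sigmoid with a 0.5 threshold for multi-label) are a common convention and an assumption here; this summary does not say which rule the authors used, and the scores are made up.

```python
import numpy as np

def predict_single_label(logits):
    """One category per document: take the class with the largest score."""
    return logits.argmax(axis=1)

def predict_multi_label(logits, threshold=0.5):
    """Several categories per document: score each class independently."""
    probs = 1.0 / (1.0 + np.exp(-logits))   # element-wise sigmoid
    return (probs >= threshold).astype(int)

# Made-up scores for 2 documents over 3 classes.
logits = np.array([[2.0, -1.0, 0.5],
                   [-3.0, 1.5, 2.5]])

print(predict_single_label(logits))   # → [0 2]
print(predict_multi_label(logits))    # → [[1 0 1], [0 1 1]]
```

In the single-label case every document lands in exactly one folder; in the multi-label case each folder gets its own yes/no decision, so a document can land in several.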

  • Speed and Efficiency:
    The "File Folder" method was much faster at both training and running. The "Chatbot" method was slower because it had to generate its answer piece by piece (token by token), whereas the "File Folder" method read the summary note once and picked a label.

The "Magic" of the Small Budget

One of the coolest findings is that you don't need a massive, expensive model to get great results.

  • They used a relatively small model (3 billion parameters) with the "File Folder" method, and it beat the "Chatbot" method using a much larger model.
  • They even tested the "Chatbot" method on the most expensive, state-of-the-art models available from big tech companies (like GPT-5 and Claude Opus) without training them at all. Even these super-smart, frozen models couldn't beat the small, trained "File Folder" model. It's like a well-trained local mechanic outperforming a brilliant engineer who has never worked on that particular car.

The Catch (Limitations)

The paper is honest about where this method isn't perfect:

  • Speed vs. Accuracy: While the "File Folder" method is great, it is still about 20 times slower than the older, specialized models (BERT). If you need to sort documents at very high throughput, the older models are still the kings of speed.
  • Statistical Confidence: The "File Folder" method scored numerically higher, but the difference was not statistically significant in every single test. It is consistently ahead, though sometimes only by a small margin.
  • Training Instability: Sometimes, the "File Folder" method would fail to learn if the random starting point (the "seed") was unlucky, requiring the researchers to try a few times to get a good result.
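
One simple way to keep an unlucky seed from distorting results is to run several seeds, flag any run that ends up near chance level, and report the mean and spread of the rest. The sketch below is an assumed bookkeeping routine, not the authors' procedure; the scores, the chance level, and the margin are invented for illustration.

```python
import statistics

def summarize_runs(scores_by_seed, chance_level, margin=0.05):
    """Separate collapsed runs from healthy ones and summarize the healthy ones.

    A run counts as collapsed when its score is within `margin` of chance,
    which is a hypothetical rule of thumb rather than a standard threshold.
    """
    healthy = {
        seed: score for seed, score in scores_by_seed.items()
        if score > chance_level + margin
    }
    collapsed = sorted(set(scores_by_seed) - set(healthy))
    values = list(healthy.values())
    return {
        "mean": statistics.mean(values) if values else None,
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "collapsed_seeds": collapsed,
    }

# Invented accuracy scores for four seeds on a hypothetical 8-class task.
scores = {0: 0.81, 1: 0.12, 2: 0.79, 3: 0.80}
summary = summarize_runs(scores, chance_level=1 / 8)
print(summary["collapsed_seeds"])
```

Reporting the collapsed seeds alongside the averages keeps the instability visible instead of hiding it behind a single best run.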

The Bottom Line

If you need to sort text documents (like patents) and you have limited computer power (like a single graphics card), the best strategy is to treat the giant AI model like a feature extractor (the "File Folder" method). Don't try to make it chat or write essays; just ask it to summarize the document and attach a simple label maker. This approach is cheaper and faster than teaching the AI to follow complex instructions, and it is often as accurate as, or more accurate than, both that approach and the older, specialized models, although those older models still run faster.
