GATech at AbjadMed: Bidirectional Encoders vs. Causal Decoders: Insights from 82-Class Arabic Medical Classification

This paper demonstrates that fine-tuned bidirectional encoders, specifically a hybrid AraBERTv2 architecture, significantly outperform large-scale causal decoders on 82-class Arabic medical text classification, capturing global semantic context more effectively despite data imbalance and label noise.

Ahmed Khaled Khamis

Published Thu, 12 Ma

Imagine you are the head librarian of a massive, chaotic library filled with 82 different sections of medical books. People keep walking in and shouting short, messy requests like, "My knee hurts," or "I feel dizzy after eating herbs." Your job is to instantly shout out exactly which of the 82 sections these requests belong to.

This paper is a report from a team (GATech) who built a robot librarian to solve this exact problem for Arabic medical queries. They tested two very different types of robots to see which one could sort the books best.

Here is the breakdown of their experiment, explained simply:

1. The Two Types of Robots

The team compared two different "brain" architectures:

  • The "All-Seeing" Robot (Bidirectional Encoder): Think of this robot as a person who can read a sentence backwards and forwards at the same time. If you say, "I have a headache and I took aspirin," this robot sees the connection between the pain and the medicine instantly, understanding the whole picture as a single unit. They used a specialized version called AraBERT, which is like a librarian who has read every Arabic medical book ever written.
  • The "One-Way" Robot (Causal Decoder): This robot is like a person reading a story one word at a time, from left to right, trying to guess what word comes next. These are the massive "Generative AI" models (like Llama or Qwen) that are famous for writing essays or chatting. They are huge and smart, but they are trained to predict the future, not to categorize the present.
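The difference between the two "reading styles" boils down to the attention mask each architecture uses. Here is a minimal plain-Python sketch (the real models implement this as matrix operations over learned attention scores, but the visibility pattern is the same):

```python
def bidirectional_mask(n):
    """Encoder (BERT-style): every token may attend to every other
    token, so position i sees the whole sentence at once."""
    return [[True] * n for _ in range(n)]

def causal_mask(n):
    """Decoder (GPT/Llama-style): token i may only attend to
    positions 0..i, i.e. it reads strictly left to right."""
    return [[j <= i for j in range(n)] for i in range(n)]
```

For a 3-token query, `causal_mask(3)` lets the first token see only itself, while `bidirectional_mask(3)` lets every position see all three tokens. That is why the "All-Seeing" robot connects "headache" and "aspirin" in one pass.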

2. The Big Problem: A Messy Library

The team faced two huge hurdles:

  • The "Long-Tail" Problem: Some sections of the library had hundreds of books (like "General Medicine"), but others had only 7 books (like "IVF" or "Vascular Surgery"). It's like trying to learn to recognize a rare bird when you've only seen it seven times in your life.
  • The "Confused Labels" Problem: Sometimes, the books were misfiled. A note about a skin rash might be labeled "General Medicine" instead of "Dermatology." The robot had to learn to ignore these mistakes and find the real meaning.

3. The Winning Strategy: The "Hybrid" Librarian

The team didn't just use the standard AraBERT robot; they gave it a special upgrade kit to handle the mess:

  • The "Spotlight" and the "Average" (Hybrid Pooling):
    • Imagine the robot has two ways of looking at a sentence.
    • Method A (Mean): It takes the "average" feeling of the whole sentence to get the general vibe.
    • Method B (Attention): It puts on a spotlight to zoom in on the most important words (like "heart attack" or "fever") and ignores the boring filler words.
    • They combined these two views to get a super-rich understanding of the query.
  • The "Practice Run" (Multi-Sample Dropout):
    • To stop the robot from memorizing the few examples of rare diseases, they made it practice the same task five times with different "blindfolds" (randomly ignoring parts of the data). This forced the robot to learn the core concepts rather than just memorizing specific examples.
  • The "Soft" Teacher (Label Smoothing):
    • Since the labels were sometimes wrong, they taught the robot not to be 100% confident in the labels it was given. Instead of saying "This is 100% Dermatology," it learned to say, "This is mostly Dermatology, but maybe a little bit of General Medicine." This helped it handle the messy data better.
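Putting the three upgrades together, here is a minimal plain-Python sketch of the classification head. The vector sizes, dropout rate, and `classify` function are illustrative assumptions; the paper builds these pieces on top of AraBERTv2's hidden states using a deep-learning framework:

```python
import math
import random

NUM_CLASSES = 82  # one per medical specialty

def mean_pool(tokens):
    """Method A: average every token vector -> the 'general vibe'."""
    d = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(d)]

def attention_pool(tokens, w):
    """Method B: score each token, softmax the scores, and take a
    weighted average -> the 'spotlight' on important words."""
    scores = [sum(ti * wi for ti, wi in zip(t, w)) for t in tokens]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    d = len(tokens[0])
    return [sum((wt / z) * t[i] for wt, t in zip(weights, tokens))
            for i in range(d)]

def hybrid_pool(tokens, w):
    """Concatenate both views into one richer sentence vector."""
    return mean_pool(tokens) + attention_pool(tokens, w)

def dropout(vec, p, rng):
    """The 'blindfold': randomly zero a fraction p of the features,
    rescaling the survivors (inverted dropout)."""
    return [0.0 if rng.random() < p else v / (1 - p) for v in vec]

def smoothed_targets(label, num_classes, eps=0.1):
    """Label smoothing: spread eps of the probability mass over all
    classes instead of trusting the (possibly misfiled) label 100%."""
    t = [eps / num_classes] * num_classes
    t[label] += 1.0 - eps
    return t

def multi_sample_loss(pooled, label, classify, n_samples=5, p=0.3, seed=0):
    """Multi-sample dropout: run the same pooled vector through
    n_samples different dropout masks and average the label-smoothed
    cross-entropy losses."""
    rng = random.Random(seed)
    target = smoothed_targets(label, NUM_CLASSES)
    total = 0.0
    for _ in range(n_samples):
        probs = classify(dropout(pooled, p, rng))
        total += -sum(t * math.log(pr + 1e-12)
                      for t, pr in zip(target, probs))
    return total / n_samples
```

Averaging the loss over five masked views is cheap (the expensive encoder runs once) but acts like a small ensemble, which is exactly what you want when a rare class has only seven examples.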

4. The Results: Why the "Big" Robot Lost

The team was surprised. They thought the massive, super-smart "One-Way" robots (like Llama 3.3 70B) would win because they are so big and powerful. They didn't.

  • The "One-Way" Robot failed because it was too focused on the end of the sentence. If a patient mentioned a symptom at the very beginning of a long query, the big robot forgot it by the time it finished reading. It was also too "general." It knew what "skin disease" meant, but it didn't understand the specific, tiny differences between the 82 specific categories the library required.
  • The "All-Seeing" Robot won because it could see the whole picture at once. It was better at compressing the messy, short medical queries into a clear, precise category.

5. The "Re-Ranking" Experiment

They tried a clever trick: Let the small robot pick the top 15 guesses, then ask the giant "One-Way" robot to pick the winner from those 15.

  • Result: This made things worse. The giant robot was too smart for its own good; it used its general knowledge to pick a "logical" answer that was actually wrong according to the library's specific, weird rules. The small, specialized robot knew the specific rules better.
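The re-ranking pipeline can be sketched in two steps. The shortlist step is straightforward; the prompt text handed to the decoder is a hypothetical placeholder, since this summary does not reproduce the authors' exact prompt:

```python
def shortlist(probs, labels, k=15):
    """Step 1: the specialized encoder proposes its top-k classes."""
    ranked = sorted(zip(probs, labels), key=lambda t: t[0], reverse=True)
    return [lab for _, lab in ranked[:k]]

def build_rerank_prompt(query, candidates):
    """Step 2: hand the shortlist to the decoder LLM to pick a winner.
    The wording below is an illustrative assumption, not the paper's."""
    options = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (f"Patient query: {query}\n"
            f"Choose the single best specialty from these options:\n"
            f"{options}")
```

The failure mode reported above lives entirely in step 2: the decoder's "logical" general-knowledge pick can override the encoder's ranking, which was already tuned to the dataset's specific, sometimes idiosyncratic label boundaries.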

The Bottom Line

In the world of sorting specific, messy medical data:

  • Specialized, focused tools (Bidirectional Encoders) are better than massive, general-purpose tools (Causal Decoders).
  • Just because a robot is huge and can write poetry doesn't mean it's good at filing a specific medical form.
  • By combining different ways of looking at words and teaching the robot to be humble about the messy data, the team built the best "librarian" for this specific job.

Final Score: The specialized robot (AraBERT) scored 0.39 on the task's demanding evaluation metric, while adding the giant robot as a re-ranker actually dragged the score down to 0.30.