Imagine you are trying to teach a very smart, but slightly confused, robot how to read a textbook written in Bengali. You want this robot to be able to answer questions about the book. But there's a catch: sometimes the book doesn't have the answer to the question you ask.
If the robot is too eager, it might just make up a fake answer to look smart. This is called "hallucinating," and in a school setting, that's a disaster. It could confuse a student or make them lose trust in the system.
This paper introduces a new tool called NCTB-QA to solve this problem. Here is the story of how they built it and what they found, explained simply:
1. The Problem: The "Eager Beaver" Robot
Before this study, most question-answering systems for Bengali were like eager beavers. They were trained only on questions where the answer was in the text. So, when they encountered a question the text couldn't answer, they panicked and guessed anyway.
Also, the existing datasets (collections of practice questions) were too small. Imagine trying to learn to play the piano by practicing on just three songs. You wouldn't be ready for a real concert. The old Bengali datasets were like those three songs—too small and too simple to train a really smart AI.
2. The Solution: The "NCTB-QA" Training Ground
The researchers built a massive new training ground called NCTB-QA. Think of it as a giant library of 50 official textbooks from Bangladesh's National Curriculum and Textbook Board, covering grades 1 through 10.
They didn't just copy-paste the books; they turned them into a sophisticated video game for AI:
- The Scale: They created nearly 88,000 question-and-answer pairs. That's huge!
- The Twist (The "Trick" Questions): This is the most important part. They made sure that about 43% of the questions were "trick questions." These are questions where the answer is not in the text.
- Analogy: Imagine a teacher asking, "What is the capital of France?" while showing a picture of a banana. A good student should say, "I don't know, that's not in the picture." A bad student would guess "Paris" anyway. NCTB-QA forces the AI to learn to say, "I don't know," when the answer isn't there.
- The Distractors: They even added "plausible distractors." These are trick questions that almost have the answer, but not quite. It's like asking, "Who won the 2024 World Cup?" when the text only talks about the 2022 World Cup. The AI has to be sharp enough to spot the difference.
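The three kinds of questions above can be sketched as SQuAD-2.0-style records. This is a minimal illustration only: the field names ("context", "is_impossible", etc.) and the English stand-in texts are assumptions for readability, not the dataset's actual schema or contents.

```python
# Sketch of three kinds of QA records in a SQuAD-2.0-like shape.
# Field names and texts are illustrative assumptions, not the real schema.

answerable = {
    "context": "Dhaka is the capital of Bangladesh.",
    "question": "What is the capital of Bangladesh?",
    "answers": [{"text": "Dhaka", "answer_start": 0}],
    "is_impossible": False,
}

unanswerable = {
    # The context says nothing about the question's topic at all.
    "context": "Dhaka is the capital of Bangladesh.",
    "question": "What is the capital of France?",
    "answers": [],          # no span to extract
    "is_impossible": True,  # the model should abstain
}

plausible_distractor = {
    # The context *almost* answers the question: same topic, wrong year.
    "context": "Argentina won the 2022 FIFA World Cup.",
    "question": "Who won the 2024 FIFA World Cup?",
    "answers": [],
    "is_impossible": True,
}

def has_answer(record):
    """An extractive QA model should only return a span when one exists."""
    return not record["is_impossible"]

for r in (answerable, unanswerable, plausible_distractor):
    print(r["question"], "->", "answerable" if has_answer(r) else "abstain")
```

Mixing all three kinds in one dataset is what forces the model to actually read the context instead of pattern-matching the question.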
3. The Experiment: Teaching the Robots
The researchers took three famous AI models (think of them as three different types of student brains: BERT, RoBERTa, and ELECTRA) and put them through a training camp using this new dataset.
- Before Training: The robots were okay at reading, but they were terrible at knowing when not to answer. They were confident but often wrong.
- The Training: The robots worked through the NCTB-QA dataset, practicing on both normal questions and the trick questions.
- The Result: The improvement was massive. One model, BERT, went from being barely able to answer correctly (15% success) to being very reliable (62% success). That's a relative improvement of about 313%, since (62 − 15) ÷ 15 ≈ 3.13.
- The "Don't Know" Skill: The robots learned to stop guessing. When the answer wasn't in the text, they successfully admitted they didn't know, rather than making things up.
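Mechanically, the "don't know" skill in SQuAD-2.0-style extractive QA usually comes down to comparing the model's best answer-span score against a "no answer" (null) score and abstaining when the null option wins by some margin. Here is a minimal sketch of that decision rule; the scores and threshold are made up, and the paper may implement abstention differently.

```python
# Sketch of the standard SQuAD-2.0-style abstention rule: answer only when
# the best span outscores the "no answer" (null) option by a margin.
# All numbers here are illustrative, not taken from the paper.

def decide(best_span_score: float, null_score: float,
           span_text: str, threshold: float = 0.0) -> str:
    """Return the extracted span, or abstain if the null option wins."""
    if null_score - best_span_score > threshold:
        return "I don't know"  # the honest answer when no span fits
    return span_text

# An answerable case: the span clearly beats the null option.
print(decide(best_span_score=8.2, null_score=1.5, span_text="Dhaka"))
# -> Dhaka

# A trick question: the null option wins, so the model abstains.
print(decide(best_span_score=2.0, null_score=6.7, span_text="Paris"))
# -> I don't know
```

The threshold is typically tuned on held-out data: raising it makes the model abstain more often, trading a few missed answers for fewer made-up ones.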
4. Why This Matters
This isn't just about getting better scores on a test. It's about trust.
- For Students: If a student uses an AI tutor to study for a math or science exam, they need to know that if the AI says "I don't know," it's because the answer really isn't there, not because the AI is broken.
- For Low-Resource Languages: Bengali is spoken by over 230 million people, but in the world of AI, it's considered "low-resource" (not enough data). This paper shows that by using official textbooks and creating smart "trick" questions, we can build powerful AI tools for these languages without needing millions of dollars in data.
The Bottom Line
The researchers built a giant, tricky practice test based on real school books. They used it to teach AI models how to read Bengali and how to know when to stop talking. The result is a much smarter, more honest AI that won't try to fake its way through a student's homework.
In short: They taught the AI to say "I don't know" when it's the right thing to do, making it a much safer and more useful tool for education.