Predictive Authoring for Brazilian Portuguese Augmentative and Alternative Communication

This paper proposes using the BERTimbau model to predict pictograms in Brazilian Portuguese Augmentative and Alternative Communication (AAC) systems, demonstrating that representing pictograms via captions yields the highest accuracy while also exploring the potential of using images for prediction.

Jayr Pereira, Rodrigo Nogueira, Cleber Zanchettin, Robson Fidalgo

Published 2026-03-04

Here is a plain-language explanation of the paper, with some creative analogies to help visualize the concepts.

The Big Picture: Helping People Speak Without Words

Imagine a person who cannot speak out loud due to a disability. To communicate, they use a special "talking board" (an AAC system). This board is filled with thousands of pictures (pictograms) representing words like "eat," "ball," "happy," or "mom."

To say a sentence like "I want an apple," the user has to hunt through a giant grid of pictures, find the "I," then the "want," then the "apple," and tap them one by one. It's like trying to write a letter by digging through a massive box of Scrabble tiles to find the right letters, one by one. It takes a long time and can be frustrating.

The Goal of this Paper:
The researchers wanted to build a "smart assistant" for this talking board. Just like your phone predicts the next word you might type, they wanted a system that could look at the pictures the user has already tapped and say, "Hey, you probably want to tap the apple picture next!"

The Challenge: The "Dictionary" Problem

The tricky part is that these talking boards use pictures, not text. Computers are great at reading words, but they struggle to understand that a picture of a red fruit means the word "apple."

Furthermore, the researchers were working with Brazilian Portuguese. While there are huge databases of English text to teach computers, there wasn't a big library of Portuguese sentences specifically written by people using these picture boards. The computer needed to learn the "language of the pictures."

The Solution: Teaching a Robot to Read Pictures

The team used a powerful AI brain called BERTimbau (a version of the famous BERT model trained on Portuguese). Here is how they taught it to predict the next picture:

1. Building the Training Library (The Corpus)

You can't teach a chef to cook without ingredients. The researchers needed a "recipe book" of sentences.

  • Step A: They asked real experts (speech therapists and parents) to write down common sentences these users say.
  • Step B: They used a super-smart AI (GPT-3) to read those sentences and write thousands of new ones that sounded just like them.
  • Step C: They converted these text sentences back into picture sequences.
  • Analogy: Imagine teaching a dog to fetch. First, you show it a real ball (human sentences). Then, you ask a robot to draw thousands of pictures of balls (synthetic sentences) so the dog gets lots of practice.
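Step C above boils down to a lookup: each word in a generated sentence is mapped back to the pictogram whose caption matches it. A minimal sketch of that idea, using a tiny invented vocabulary (the real system would use the AAC board's full Portuguese pictogram set, not these four entries):

```python
# Toy sketch of Step C: turning generated text back into pictogram sequences.
# The vocabulary below is hypothetical, for illustration only.
PICTOGRAM_VOCAB = {
    "eu": 101,      # "I"
    "quero": 102,   # "want"
    "comer": 103,   # "eat"
    "maçã": 104,    # "apple"
}

def sentence_to_pictograms(sentence: str) -> list[int]:
    """Map each word to its pictogram ID, skipping words with no pictogram."""
    words = sentence.lower().split()
    return [PICTOGRAM_VOCAB[w] for w in words if w in PICTOGRAM_VOCAB]

print(sentence_to_pictograms("Eu quero comer maçã"))  # → [101, 102, 103, 104]
```

In practice the mapping is harder than this (inflected words, multi-word pictograms), but the principle is the same: the training data is text that can be read as a sequence of picture taps.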

2. The "Translation" Test

Now they had to teach the AI how to "see" a picture. They tested four different ways to describe a picture to the computer:

  • Method A (The Caption): Just use the word written under the picture (e.g., "Cat").
  • Method B (The Synonyms): Use a list of similar words (e.g., "Cat," "Feline," "Kitty").
  • Method C (The Definition): Use a dictionary definition (e.g., "A small domesticated carnivorous mammal").
  • Method D (The Image): Show the computer the actual picture file.
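The first three strategies are really just three ways of serializing the same pictogram into text the language model can read. A hedged sketch (the pictogram record here is invented for illustration, not taken from the paper's dataset):

```python
# One pictogram, three textual representations. Method D (raw image) is the
# odd one out: it needs a separate vision encoder rather than a text string.
pictogram = {
    "caption": "gato",
    "synonyms": ["gato", "felino", "bichano"],
    "definition": "pequeno mamífero carnívoro domesticado",
    "image_path": "pictograms/gato.png",
}

def represent(p: dict, method: str) -> str:
    if method == "caption":     # Method A: the word under the picture
        return p["caption"]
    if method == "synonyms":    # Method B: caption plus similar words
        return " ".join(p["synonyms"])
    if method == "definition":  # Method C: dictionary gloss
        return p["definition"]
    raise ValueError(f"unknown method: {method}")

print(represent(pictogram, "synonyms"))  # → gato felino bichano
```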

The Results: What Worked Best?

The researchers ran the AI through a test to see which method made the best predictions.

  • The Winner: Captions (The Words) and Synonyms.
    • The Analogy: It turns out the computer learns best when you tell it the name of the picture. If you say "Cat," it knows what to do. If you give it a list of synonyms ("Cat," "Feline"), it gets even better at guessing the context (lower "perplexity," which is a fancy way of saying the computer is less confused).
  • The Loser: Definitions and Images.
    • The Analogy: Trying to teach the computer by showing it the actual picture or a long dictionary definition was like trying to teach someone to drive by reading a manual on engine mechanics. It was too complicated and the computer didn't learn as fast. The "image" method was particularly bad because the computer's "brain" for reading text and its "brain" for seeing pictures speak different languages.
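"Perplexity" has a precise meaning behind the analogy: it is the exponential of the average negative log-probability the model assigns to the correct next pictogram. Lower perplexity means the model is, on average, less surprised by what comes next:

```python
import math

def perplexity(probs: list[float]) -> float:
    """exp of the mean negative log-probability assigned to the true items."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# A model that always gives the correct pictogram probability 0.5 has
# perplexity 2: it behaves as if choosing between two options each step.
print(round(perplexity([0.5, 0.5, 0.5]), 2))  # → 2.0
# A more confused model (lower probabilities on the right answers) scores higher.
print(round(perplexity([0.1, 0.2, 0.3]), 2))
```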

The Takeaway

The paper concludes that the best way to build a smart picture-predictor is to treat the pictures like words.

  • If you have a dictionary of synonyms: Use that! It helps the computer understand the meaning better.
  • If you don't: Just use the simple word written under the picture. It works almost as well and is much easier to set up.
  • Don't bother with the actual images for this specific task; it's too heavy and doesn't help the computer guess the next word.

Why This Matters

This research is like giving a new set of glasses to people who rely on picture boards. Instead of scrolling through hundreds of pictures to find the one they need, the system can now suggest the top 5 or 10 most likely pictures right at the top of the screen. This saves time, reduces frustration, and helps people with complex communication needs share their thoughts, feelings, and needs much faster.
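Surfacing "the top 5 or 10 most likely pictures" comes down to ranking the model's scores over the pictogram vocabulary and keeping the k best. A sketch with made-up scores standing in for BERTimbau's masked-language-model probabilities:

```python
def top_k_pictograms(scores: dict[str, float], k: int = 5) -> list[str]:
    """Return the k pictogram captions with the highest model scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical scores for the slot after "eu quero comer ..."; in the real
# system these would come from the fine-tuned BERTimbau prediction head.
scores = {
    "maçã": 0.41, "banana": 0.22, "pão": 0.18,
    "sopa": 0.09, "bola": 0.05, "gato": 0.05,
}
print(top_k_pictograms(scores, k=3))  # → ['maçã', 'banana', 'pão']
```

The AAC interface would then render these top-k captions as their pictograms at the top of the screen, so the user taps once instead of scrolling.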

In short: They taught a computer to speak "Picture Language" by translating pictures into words, and they found that the simplest translation (just the word under the picture) is often the most effective.