PRAGMA: Revolut Foundation Model

Original authors: Maxim Ostroukhov, Ruslan Mikhailov, Vladimir Iashin, Artem Sokolov, Andrei Akshonov, Vitaly Protasov, Dmitrii Beloborodov, Vince Mullin, Roman Yokunda Enzmann, Georgios Kolovos, Jason Renders, Pavel N

Published 2026-04-13

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine a massive, chaotic library where every book is a person's entire financial life. These books aren't written in neat paragraphs; they are a jumbled mix of receipts, calendar notes, text messages, bank balances, and shopping lists, all written in different languages and formats. Some pages are missing, some are written in code, and the order changes every day.

For years, banks tried to understand these books by hiring a team of specialists for every single question. One team read the books to guess if you'd pay back a loan (Credit Scoring). Another team read them to spot a thief (Fraud Detection). A third team tried to guess what you'd buy next (Product Recommendations). Each team had to manually re-read the same messy pages, highlight different parts, and build their own unique filing system. It was slow, expensive, and they couldn't easily share what they learned with each other.

Enter PRAGMA: The "Financial Super-Reader."

The Revolut team built a new kind of AI called PRAGMA. Think of it not as a specialist, but as a genius librarian who has read every financial book in the world.

Here is how it works, broken down into simple concepts:

1. The Translator (Tokenization)

The problem with financial data is that it's messy. A bank transfer isn't just a sentence; it's a mix of numbers, dates, codes, and text.

The Old Way: Imagine trying to read a receipt by turning every number into a string of letters (e.g., "$100" becomes "dollar, one, zero, zero"). You lose the meaning of the number's size.
The PRAGMA Way: PRAGMA has a special translator. It looks at a transaction and says, "Ah, this is a Key (what happened?), a Value (how much?), and a Time (when?)." It treats "Amount: $100" not as text, but as a structured concept. It understands that "Metal Plan" is a specific status, not just random words. This allows the AI to "read" the financial language fluently without getting confused by the formatting.
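To make the Key / Value / Time idea concrete, here is a minimal sketch in Python. It is not the paper's tokenizer: the Token class, the amount_bucket helper, the field names, and the order-of-magnitude bucketing are all assumptions made up for this illustration. The point is only that a number keeps its sense of size instead of being spelled out character by character.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Token:
    key: str        # which field this is, e.g. "amount"
    value: str      # categorical value or numeric bucket, e.g. "bucket_2"
    timestamp: int  # event time, seconds since the Unix epoch (UTC assumed)


def amount_bucket(amount: float) -> str:
    """Map a number to an order-of-magnitude bucket so its size is preserved."""
    if amount <= 0:
        return "bucket_nonpositive"
    return f"bucket_{math.floor(math.log10(amount))}"


def tokenize_event(event: dict) -> List[Token]:
    """Turn one raw event into (key, value, time) tokens (hypothetical schema)."""
    ts = int(
        datetime.fromisoformat(event["time"])
        .replace(tzinfo=timezone.utc)
        .timestamp()
    )
    tokens = []
    for key, value in event.items():
        if key == "time":
            continue  # time is attached to every token instead of being its own token
        if isinstance(value, (int, float)):
            value = amount_bucket(float(value))
        tokens.append(Token(key=key, value=str(value), timestamp=ts))
    return tokens


event = {
    "time": "2025-01-15T09:30:00",
    "type": "card_payment",
    "amount": 100.0,
    "plan": "Metal",
}
tokens = tokenize_event(event)
for token in tokens:
    print(token)
```

In this sketch "Amount: $100" becomes a token with key "amount" and a bucket for amounts in the hundreds, while "Metal" stays a single categorical value rather than a string of letters.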

2. The Two-Brain System (Architecture)

PRAGMA is designed with two distinct "brains" that work together, plus a third component that combines what they produce:

Brain A (The Event Encoder): This brain reads the story. It looks at the sequence of events: "You bought coffee, then you transferred money, then you checked your balance." It understands the flow and timing of your life.
Brain B (The Profile Encoder): This brain reads the context. It looks at static facts that don't change often: "You are 25, you live in the UK, and you have a premium metal card."
The History Encoder: This is the conductor. It takes the story from Brain A and the context from Brain B and blends them together to create a single, deep understanding of who you are at this specific moment.
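The flow of data between these three components can be sketched in a few lines of Python. This is a toy under loud assumptions: the real encoders are presumably learned neural networks, whereas here a hash-based embed function and simple averaging stand in for them. DIM, embed, and the example fields are hypothetical. Only the wiring (events and profile encoded separately, then combined) mirrors the description above.

```python
import hashlib
from typing import Dict, List

DIM = 8  # embedding width, chosen arbitrarily for this sketch


def embed(text: str) -> List[float]:
    """Deterministic toy embedding: hash bytes scaled into [0, 1)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 256.0 for b in digest[:DIM]]


def mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def event_encoder(event: Dict[str, str]) -> List[float]:
    """Brain A stand-in: one vector per event, pooled over its fields."""
    return mean([embed(f"{k}={v}") for k, v in sorted(event.items())])


def profile_encoder(profile: Dict[str, str]) -> List[float]:
    """Brain B stand-in: one vector for the slowly changing attributes."""
    return mean([embed(f"profile:{k}={v}") for k, v in sorted(profile.items())])


def history_encoder(event_vecs: List[List[float]], profile_vec: List[float]) -> List[float]:
    """Combine both: recency-weighted event average, concatenated with the profile."""
    weights = [i + 1 for i in range(len(event_vecs))]  # later events weigh more
    total = sum(weights)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, event_vecs)) / total
        for d in range(DIM)
    ]
    return pooled + profile_vec


events = [
    {"type": "card_payment", "merchant": "coffee"},
    {"type": "transfer", "direction": "out"},
    {"type": "balance_check"},
]
profile = {"age": "25", "country": "UK", "plan": "Metal"}

event_vecs = [event_encoder(e) for e in events]
user_vec = history_encoder(event_vecs, profile_encoder(profile))
print(len(user_vec))
```

Because later events get larger weights, reordering the same events produces a different user vector, which is a crude stand-in for the idea that the model is sensitive to the flow and timing of events.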

3. The Training Method (Masked Modelling)

How do you teach a computer to understand money without giving it the answers?
Imagine a game of "Financial Mad Libs."

The AI is shown a user's history, but it has to guess the missing pieces.
"The user transferred money to [BLANK] on [BLANK] for [BLANK]."
The AI has to use the surrounding clues to fill in the blanks.
By playing this game billions of times on billions of fake (but realistic) financial histories, the AI learns the hidden patterns of human behavior. It learns that people who buy a lot of coffee on weekends often have a certain spending pattern, or that a sudden drop in balance followed by a specific app click might signal fraud.
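The fill-in-the-blank game can be sketched as a small masking routine. The snippet below only prepares a training example: it hides some tokens and remembers what was hidden, so that a model could later be scored on its guesses. The mask rate of 0.3, the "[MASK]" placeholder, and the token strings are assumptions for illustration, not values taken from the paper.

```python
import random
from typing import Dict, List, Tuple

MASK = "[MASK]"  # placeholder symbol, assumed for this sketch


def mask_tokens(
    tokens: List[str], mask_prob: float, rng: random.Random
) -> Tuple[List[str], Dict[int, str]]:
    """Hide each token with probability mask_prob.

    Returns the masked sequence plus a map from position to the original
    token, which is what the model would be trained to predict.
    """
    masked = list(tokens)
    targets: Dict[int, str] = {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = token
            masked[i] = MASK
    return masked, targets


history = [
    "type=transfer",
    "recipient=landlord",
    "day=2025-02-01",
    "amount=bucket_2",
    "type=card_payment",
    "merchant=coffee",
]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
masked, targets = mask_tokens(history, mask_prob=0.3, rng=rng)
print(masked)
print(targets)
```

During pre-training, a model would see the masked sequence and be penalised whenever its guess at a hidden position differs from the stored target.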

4. The "One-Size-Fits-All" Magic (Downstream Tasks)

Once PRAGMA has learned the language of money, it becomes a universal adapter.

The Old Way: To build a fraud detector, you had to build a new model from scratch.
The PRAGMA Way: You take the "Super-Reader," freeze its brain, and just attach a tiny, simple "decision head" on top.
- Need to predict Credit Score? Attach a "Loan" head.
- Need to predict Fraud? Attach a "Thief" head.
- Need to guess Lifetime Value? Attach a "Profit" head.

Because the heavy lifting (understanding the data) was already done during training, these new heads learn incredibly fast and perform better than the old specialist teams.
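The "freeze the brain, attach a small head" recipe can be sketched as follows. Everything here is a stand-in: frozen_encoder is a fixed hand-written function rather than a pre-trained model, LinearHead is a plain logistic regression, and the histories and labels are made up for the example. What the sketch does show accurately is the pattern: features are computed once by a component that never changes, and each task trains only its own few weights.

```python
import math
from typing import Dict, List, Sequence


def frozen_encoder(history: Sequence[float]) -> List[float]:
    """Stand-in for the pre-trained model; nothing below ever modifies it."""
    return [sum(history) / len(history), max(history), min(history)]


class LinearHead:
    """A tiny task-specific head: logistic regression on frozen features."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def predict_proba(self, x: Sequence[float]) -> float:
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 500) -> None:
        """Plain stochastic gradient descent on the logistic loss."""
        for _ in range(epochs):
            for x, target in zip(X, y):
                err = self.predict_proba(x) - target
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err


# Hypothetical normalised balance histories for four users.
histories = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.1], [0.8, 0.9, 1.0], [0.2, 0.1, 0.0]]
features = [frozen_encoder(h) for h in histories]  # computed once, shared by all heads

# Made-up labels for two different tasks on the same users.
labels: Dict[str, List[int]] = {
    "credit_risk": [0, 1, 0, 1],
    "high_value": [1, 0, 1, 0],
}

heads: Dict[str, LinearHead] = {}
for task, y in labels.items():
    head = LinearHead(dim=len(features[0]))
    head.fit(features, y)
    heads[task] = head

new_user = frozen_encoder([0.1, 0.1, 0.2])
for task, head in heads.items():
    print(task, round(head.predict_proba(new_user), 3))
```

Each head has only four trainable numbers here (three weights and a bias), which is why adding a new task is cheap once the shared features exist.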

5. The Results: Why It Matters

The paper tested PRAGMA on six different banking tasks. The results were like finding a Swiss Army Knife that works better than six separate tools:

Credit Scoring: It became 130% better at spotting risky borrowers than the old methods.
Fraud Detection: It caught 65% more fraudsters while wrongly flagging fewer legitimate transactions.
Efficiency: Instead of training six different massive models, the bank now trains one giant model and reuses it across tasks.

The One Weakness

The paper admits one limitation: Anti-Money Laundering (AML).
PRAGMA is great at reading one person's story. But money laundering often involves a network of people passing money back and forth. It's like trying to solve a conspiracy by reading only one suspect's diary; you miss the connections between the suspects. For this specific task, PRAGMA currently struggles because it doesn't "see" the whole network, just the individual.
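A toy example shows why a per-person view can miss this. In the sketch below, three accounts pass money around in a loop. The account names, the transfer list, and the choice of "money returns to its sender" as the suspicious pattern are all assumptions for illustration; real money-laundering detection is far more involved, and this is not a method from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Transfer = Tuple[str, str]  # (sender, receiver)

transfers: List[Transfer] = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),  # closes the loop alice -> bob -> carol -> alice
    ("dave", "erin"),
]


def per_user_view(all_transfers: List[Transfer], user: str) -> List[Transfer]:
    """What a single-user model sees: only transfers touching that user."""
    return [t for t in all_transfers if user in t]


def accounts_on_cycles(all_transfers: List[Transfer]) -> Set[str]:
    """Accounts whose money can flow back to them through the network."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for sender, receiver in all_transfers:
        graph[sender].add(receiver)

    def reachable(start: str) -> Set[str]:
        # Depth-first search over everything downstream of start.
        seen: Set[str] = set()
        stack = list(graph[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        return seen

    return {account for account in list(graph) if account in reachable(account)}


print(per_user_view(transfers, "alice"))
print(sorted(accounts_on_cycles(transfers)))
```

From alice's own history there is just one payment out and one payment in, which looks ordinary. The loop only appears once all three histories are joined into a graph.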

The Big Picture

PRAGMA is a shift from "hiring a specialist for every job" to "hiring one genius who can do most of them." It takes the messy, chaotic reality of how we use money and turns it into a clear, understandable signal. It suggests that even in the rigid world of finance, a foundation model can learn the "grammar" of human behavior and apply it to a wide range of problems, with network-level tasks such as AML as the current exception.
