Deterministic Preprocessing and Interpretable Fuzzy Banding for Cost-per-Student Reporting from Extracted Records

This paper presents a deterministic, rule-based Python workflow that processes administrative academic data to generate reproducible cost-per-student reports enriched with an interpretable fuzzy banding system for classifying school-year performance into Low, Medium, or High categories.

Shane Lee, Stella Ng

Published 2026-03-06

Imagine you are the head chef of a massive, chaotic kitchen (a university). Every week, you get a giant, messy box of receipts from your suppliers (the "Casual Academic Database"). These receipts tell you how much you spent on temporary cooks and how many students they fed.

The problem? The receipts are messy. Some are torn, some have blank prices, and some are just notes saying "Total" or "Sum", which are summaries rather than actual receipts. If you try to calculate the "cost per student" just by glancing at this box, you might make a mistake, or worse, you might make a different mistake than your colleague did last week.

This paper describes a robotic kitchen assistant (a computer script) that solves this problem. Here is how it works, broken down into simple parts:

1. The "Perfect Memory" Robot (Deterministic Preprocessing)

The authors built a robot named cad_processor.py. Its most important rule is: "If I see the exact same box of receipts, I will always produce the exact same report."

  • The Fingerprint: Before the robot even starts cooking, it takes a digital "fingerprint" (a SHA-256 hash) of the entire box of receipts. This is like taking a photo of the receipt box so that if anyone tries to swap a receipt later, the photo won't match.
  • The Cleaning Crew: The robot goes through the receipts one by one:
    • If a receipt has no price, it treats the price as $0 (but adds it to a count of receipts with missing prices).
    • If a receipt says "Total" or "Sum," it throws it away because that's just a summary, not a real transaction.
    • If a receipt says you fed "-5 students," it throws that receipt in the trash (you can't have negative students!).
  • The Result: It creates a clean, organized ledger. Because the robot follows strict rules and never "guesses," you can run the same box of receipts through it a thousand times, and you will get the exact same answer every time. This makes the report easy to audit: anyone with the same box can re-run the robot and check the result.
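
To make the fingerprint and the cleaning rules concrete, here is a minimal Python sketch. It is not the code of cad_processor.py: the field names (subject, cost, students), the set of summary markers, and the order in which the rules are applied are all assumptions made for illustration.

```python
import hashlib

# Assumed spellings of summary rows; the real script may match others.
SUMMARY_MARKERS = {"total", "sum"}


def fingerprint(raw_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of the raw extract."""
    return hashlib.sha256(raw_bytes).hexdigest()


def clean(rows):
    """Apply the three cleaning rules in a fixed order.

    Returns (kept_rows, counts) so the report can say what was dropped.
    """
    kept = []
    counts = {"summary_rows": 0, "negative_students": 0, "missing_cost": 0}
    for row in rows:
        label = str(row.get("subject", "")).strip().lower()
        if label in SUMMARY_MARKERS:
            counts["summary_rows"] += 1   # summary line, not a transaction
            continue
        if row["students"] < 0:
            counts["negative_students"] += 1  # impossible head count
            continue
        cost = row.get("cost")
        if cost is None:
            counts["missing_cost"] += 1   # keep the row, price it at 0
            cost = 0.0
        kept.append({**row, "cost": float(cost)})
    return kept, counts
```

Nothing in this sketch depends on the clock, random numbers, or the order of a set, so feeding it the same rows twice gives the same ledger and the same counts, which is the property the robot's "perfect memory" rule asks for.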

2. The "Color-Coded Map" (Trend Analysis & Reporting)

Once the robot has cleaned the data, it creates a new report book with four specific pages:

  1. The Receipt Log: A summary of what happened (e.g., "We threw away 5 bad receipts, and 3 had missing prices").
  2. The Heat Map: A colorful chart showing which schools (departments) are spending the most per student.
  3. The Detailed List: A long list of every single subject and its specific costs.
  4. The "Fuzzy" Labels: A Low, Medium, or High band for each school and year. This is the most creative part, and section 3 explains it.
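
The numbers behind the heat map can be sketched as a simple roll-up of the cleaned ledger. This is an illustrative guess, not the authors' code: the field names (school, year, cost, students) are hypothetical, and skipping groups with zero students is an assumption made to avoid dividing by zero.

```python
from collections import defaultdict


def cost_per_student(rows):
    """Aggregate cleaned rows into {(school, year): cost per student}."""
    totals = defaultdict(lambda: [0.0, 0])  # (school, year) -> [cost, students]
    for row in rows:
        key = (row["school"], row["year"])
        totals[key][0] += row["cost"]
        totals[key][1] += row["students"]
    # Sort the keys so the output order never depends on the input order.
    return {
        key: cost / students
        for key, (cost, students) in sorted(totals.items())
        if students > 0  # assumption: a group with no students is skipped
    }
```

Each (school, year) pair in the result corresponds to one cell of the heat map, and the per-subject rows that went into it are what the detailed list would show.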

3. The "Traffic Light" System (Interpretable Fuzzy Banding)

Looking at a spreadsheet full of numbers like "$12,450.32" or "$14,200.10" is hard for humans to understand quickly. Is that expensive? Is that cheap?

The robot adds a Traffic Light System to help you understand the numbers relative to that specific year.

  • The Anchors (The Traffic Lights): For each year, the robot looks at all the costs and picks three special numbers:
    • The Minimum (Green Light): The cheapest school.
    • The Median (Yellow Light): The "middle" school (not the average, but the one right in the middle of the pack).
    • The Maximum (Red Light): The most expensive school.
  • The "Fuzzy" Logic: Rather than stopping at "This is exactly $12,000," the robot also asks: "How close is this number to the Green, Yellow, or Red light?"
    • If a school's cost is very close to the cheapest, it gets a "Low" label (Green).
    • If it's right in the middle, it gets a "Medium" label (Yellow).
    • If it's near the most expensive, it gets a "High" label (Red).
  • The "Fuzzy" Part: What if a school is exactly halfway between "Low" and "Medium"? In normal math, you have to pick one. But this robot uses Fuzzy Logic. It says, "You are 50% Low and 50% Medium." It gives you both numbers so you can see the nuance.
  • The Tie-Breaker: If the robot must pick a single color for a label (like for a quick summary), it has a strict rule: Always pick "Medium" first. It's like a referee who always favors the middle ground when the call is too close to see.
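
One way to picture the traffic lights in code is with triangular membership functions anchored at the year's minimum, median, and maximum. The sketch below is an assumption about the shape of the curves, chosen because it reproduces the "50% Low and 50% Medium" example above; the function names are hypothetical, and only the "Medium first" part of the tie-break order comes from the text.

```python
from statistics import median

# "Medium" first follows the tie-breaker rule; the order after it is assumed.
TIE_ORDER = ("Medium", "Low", "High")


def memberships(value, lo, mid, hi):
    """Triangular memberships anchored at lo=min, mid=median, hi=max."""
    value = min(max(value, lo), hi)  # clamp into the year's range
    if value <= mid:
        span = mid - lo
        # If min == median, the value sits on the Medium anchor.
        medium = 1.0 if span == 0 else (value - lo) / span
        return {"Low": 1.0 - medium, "Medium": medium, "High": 0.0}
    high = (value - mid) / (hi - mid)  # hi > mid here, so no zero division
    return {"Low": 0.0, "Medium": 1.0 - high, "High": high}


def band(value, lo, mid, hi):
    """Pick one label; ties go to the first name in TIE_ORDER."""
    m = memberships(value, lo, mid, hi)
    best = max(m.values())
    return next(name for name in TIE_ORDER if m[name] == best)


def band_year(costs):
    """costs: {school: cost per student} for a single year."""
    values = list(costs.values())
    lo, mid, hi = min(values), median(values), max(values)
    return {school: band(v, lo, mid, hi) for school, v in costs.items()}


year = {"A": 100.0, "B": 150.0, "C": 200.0, "D": 400.0, "E": 600.0}
print(memberships(150.0, 100.0, 200.0, 600.0))
# {'Low': 0.5, 'Medium': 0.5, 'High': 0.0}
print(band_year(year))
# {'A': 'Low', 'B': 'Medium', 'C': 'Medium', 'D': 'Medium', 'E': 'High'}
```

School B sits exactly halfway between the cheapest school and the median, so it is half Low and half Medium, and the referee rule hands it the "Medium" label. Because the anchors are recomputed for each year, the same dollar figure can land in different bands in different years.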

Why Does This Matter?

In the real world, university budgets are huge, and people argue about them.

  • Without this robot: "I think School A is too expensive!" "No, I think School B is!" (Everyone is arguing over messy spreadsheets).
  • With this robot: "Here is the report. We used the exact same box of receipts as last time (here is the fingerprint). Here is the clean data. And here is the Traffic Light map: School A is 'Medium' this year, but School B is 'High'."

The Big Picture

This paper is about trust.

  1. Trust in the Math: Because the robot is "deterministic," you know the math isn't changing based on who is running it.
  2. Trust in the Meaning: Because of the "Fuzzy Banding," you don't just see a scary number; you see a clear, color-coded label that tells you where you stand compared to your peers, while still keeping the exact number visible if you want to check the details.

It turns a messy pile of receipts into a clear, fair, and checkable story about how money is being spent.