Using Artificial Neural Networks to Predict Claim Duration in a Work Injury Compensation Environment

The Big Picture: Predicting the "Recovery Time" of a Broken Bone

Imagine you work for a giant insurance company that pays workers when they get hurt on the job. Every time someone gets hurt, a claim is filed. The company needs to know one crucial thing: How long will this person be out of work?

If they guess too short, the company runs out of money. If they guess too long, they hold onto money they don't need to.

The problem is that the data they have is incredibly messy. They have thousands of different codes describing injuries (e.g., "burn from a chemical," "sprained finger," "fall from a ladder"), plus details about the worker's age, gender, and job. Trying to find a pattern in this mess using standard math (like simple averages or basic regression) is like trying to solve a Rubik's Cube while blindfolded and wearing gloves. It's too complex.

The Solution: The author, Anthony Almudevar, built a Neural Network. Think of this as a digital "brain" that learns from past mistakes to predict the future.

The Ingredients: The "Recipe" for a Prediction

To make a prediction, the model needs ingredients. The paper lists 10 main ingredients:

What happened? (Nature of Injury)
Where did it happen? (Part of Body)
What caused it? (Source of Injury)
How did it happen? (Type of Accident)
Who is the worker? (Age, Gender)
Where do they work? (Job type, Company size, Location)

The Challenge: These aren't just numbers like "5" or "10." They are categories. There are 154 different types of injuries, 119 body parts, and so on. It's a massive, tangled web of information.

The Secret Sauce: The "Neural Network" vs. The "Linear Calculator"

Most traditional statistical models are like a linear calculator. If "Females" take longer to recover than "Males" on average, the calculator applies that same gap to every injury, whether it is a broken finger or a burned back.

But life isn't that simple.

The Reality: A female might recover faster than a male if she breaks a finger, but slower if she burns her back. The relationship changes depending on the specific injury.
The Old Way: A standard calculator misses these subtle twists and turns.
The New Way (The Neural Network): Imagine a giant, multi-layered spiderweb.
- The input (the injury details) hits the first layer of the web.
- The signal ripples through hidden layers, where the "spider" (the computer) looks for complex patterns and connections that a human might miss.
- It learns that "Female + Finger Injury" = Fast Recovery, but "Female + Back Burn" = Slow Recovery.
- It doesn't just give you a single number; it gives you a probability distribution. Instead of saying "3 weeks," it says, "There's a 50% chance it's 2 weeks, a 30% chance it's 4 weeks, and a 20% chance it's 6 weeks."

The Tricky Part: The "Unfinished Stories"

In the real world, not all claims are closed yet. Some people are still out of work when the data is analyzed. In statistics, this is called Censoring.

The Analogy: Imagine you are counting how long people stay at a party. You walk in at 10:00 PM. Some people have already left (finished claims). Others are still dancing (open claims). You know the people still dancing have been there at least until 10:00 PM, but you don't know when they will leave.
The Problem: If you just ignore the people still dancing, your average party length will be wrong (too short).
The Fix: The author used a special type of math called Cox Proportional Hazards. Think of this as a "time-traveling accountant" that knows the people still dancing are still there, and adjusts the math so the unfinished stories don't ruin the prediction for the finished ones.

How Did They Test It?

The Training: They fed the "brain" 10,000 past claims. The brain tried to guess the duration, got it wrong, adjusted its internal "weights" (like tightening or loosening the spiderweb), and tried again.
The Test: They gave it the remaining 7,026 claims, which it had never seen before.
The Result: The Neural Network clearly outperformed the simple linear calculator on those unseen claims. It captured the complex interactions (like the gender/injury mix-ups) that the simple model missed.

What If We Don't Have All the Info?

Sometimes, when a claim first comes in, you might not have the full 10 ingredients. Maybe you know the injury and the gender, but not the specific job code yet.

The author tested two ways to handle this:

Method A (The Average): "Okay, we don't know the job code. Let's look at everyone else with this injury and gender, and use their average recovery time as a guess."
Method B (The Curve): "Let's look at the entire history of everyone with this injury and gender and average out their whole timeline."

The Winner: Method A was simpler and worked just as well. It's like saying, "I don't know your exact height, but since you're a basketball player, I'll guess you're tall based on the average height of all basketball players."

The Bottom Line

This paper makes the case that when you have a massive, messy dataset with many categories and complex relationships, Artificial Neural Networks are a strong tool for the job.

Old Tool: A hammer (good for simple nails, bad for complex screws).
New Tool: A Swiss Army Knife (can handle the complexity, the missing pieces, and the unfinished stories).

By using this "digital brain," insurance boards can predict how long a worker will be out of work much more accurately, helping them manage their money better and get workers the right support at the right time.

1. Problem Statement

The primary objective of workers' compensation programs is to reimburse expenses related to time lost from work due to injury. The claim duration (time loss) is the principal driver of program costs and a key indicator of injury severity.

The Challenge: Traditional statistical modeling techniques (e.g., standard Cox proportional hazards regression, logistic regression) struggle with the complexity of the input data. The data consists of 10 categorical covariates (including injury codes, demographics, and workplace details) with class numbers ranging from 2 to 80.
Data Complexity: The National Work Injury Statistics Program (NWISP) codes have a hierarchical structure. The sheer number of potential interactions between these high-cardinality categorical variables makes standard regression models (which would require modeling all interaction terms) computationally infeasible and prone to overfitting.
Censoring: A significant portion of claims in the training dataset are "open" (not yet closed) at the time of analysis. This introduces right-censoring, requiring survival analysis techniques rather than simple regression.
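
To illustrate why right-censoring matters, the following minimal sketch compares a naive median computed from closed claims only with a Kaplan-Meier estimate that keeps open claims in the risk set until their last observed week. The toy durations, the `closed` flags, and the function names are hypothetical and are not taken from the WHSCC data.

```python
from statistics import median

def kaplan_meier(durations, closed):
    """Return [(t, S(t))] at each distinct time where a claim closed."""
    event_times = sorted({t for t, c in zip(durations, closed) if c})
    surv, curve = 1.0, []
    for t in event_times:
        # Claims still open just before week t, censored or not.
        at_risk = sum(1 for d in durations if d >= t)
        # Claims that actually closed in week t.
        events = sum(1 for d, c in zip(durations, closed) if c and d == t)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve

def km_median(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the median is not reached within the observed follow-up

# Hypothetical toy data: duration in weeks; closed=False marks an open claim.
durations = [1, 2, 2, 3, 4, 6, 8, 8, 10, 12]
closed = [True, True, True, True, False, True, False, True, False, False]

curve = kaplan_meier(durations, closed)
naive_median = median(d for d, c in zip(durations, closed) if c)
censoring_aware_median = km_median(curve)
print(naive_median, censoring_aware_median)
```

In this toy example the closed-claims-only median is shorter than the Kaplan-Meier median, which is the direction of bias one would expect when long-running open claims are dropped.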

2. Methodology

The author proposes a hybrid modeling approach that combines the interpretability and handling of censoring found in survival analysis with the non-linear modeling capabilities of Artificial Neural Networks (ANN).

A. Data Structure

Source: Administrative database of the Workplace Health, Safety and Compensation Commission of Newfoundland and Labrador (WHSCC).
Sample: 17,026 claims opened on or after January 1, 1998 (to ensure coding homogeneity).
Predictors (Inputs): 10 categorical variables:
1. Nature of Injury (NOI)
2. Part of Body (POB)
3. Source of Injury (SOI)
4. Type of Accident (TOA)
5. Age (categorized by quartiles)
6. Sex
7. Employer Type (SIC)
8. Occupation (OCC)
9. Employer Size (Payroll quartiles)
10. Region (Postal code prefix)
Response: Short-term time loss duration (weeks), subject to right-censoring.

B. The Model: ANN-based Cox Proportional Hazards

The core innovation is replacing the linear predictor term ( $\eta = \beta^T x$ ) in the standard Cox model with the output of a Multi-Layer Perceptron (MLP).

Hazard Function: $h_x(t) = h_0(t)e^{\eta(x)}$
- $h_0(t)$ : Baseline hazard.
- $\eta(x)$ : Output of the Neural Network.
Network Architecture:
- Input Layer: One node per category of the 10 predictors (one-hot encoding). Includes a bias node.
- Hidden Layer: Fully connected layer with $n_h$ nodes using a sigmoid activation function $\phi(u) = \frac{e^u}{1+e^u}$ .
- Output Layer: Single node representing $\eta(x)$ . Includes "skip layer" connections (direct input-to-output).
Training Objective: The network weights ( $W$ ) are optimized by minimizing a penalized negative partial log-likelihood:
$H(W) = -L(W) + \lambda \sum w_{ij}^2 + \lambda_b \sum w_{0j}^2$
Where $L(W)$ is the Cox partial likelihood, and $\lambda, \lambda_b$ are decay (regularization) parameters to prevent overfitting.
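
The architecture and objective above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the author's implementation: the function and parameter names are hypothetical, ties are handled with the Breslow form of the partial likelihood, and the output node has no bias term because a constant in $\eta(x)$ would be absorbed by the baseline hazard $h_0(t)$.

```python
import math

def one_hot(levels, sizes):
    """Concatenate one indicator vector per categorical predictor."""
    x = []
    for level, size in zip(levels, sizes):
        x.extend(1.0 if k == level else 0.0 for k in range(size))
    return x

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))  # same as e^u / (1 + e^u)

def eta(x, W_hidden, b_hidden, v_out, w_skip):
    """Network output: sigmoid hidden layer plus skip-layer connections."""
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(W_hidden, b_hidden)]
    return (sum(v * h for v, h in zip(v_out, hidden))
            + sum(w * xi for w, xi in zip(w_skip, x)))

def neg_partial_loglik(etas, durations, closed):
    """-L(W): negative Cox partial log-likelihood (Breslow ties)."""
    total = 0.0
    for i, (t_i, c_i) in enumerate(zip(durations, closed)):
        if not c_i:
            continue  # censored claims contribute only through risk sets
        risk = sum(math.exp(e) for e, t in zip(etas, durations) if t >= t_i)
        total += etas[i] - math.log(risk)
    return -total

def penalized_objective(etas, durations, closed, weights, biases, lam, lam_b):
    """H(W) = -L(W) + lam * sum(w_ij^2) + lam_b * sum(w_0j^2)."""
    return (neg_partial_loglik(etas, durations, closed)
            + lam * sum(w * w for w in weights)
            + lam_b * sum(b * b for b in biases))

# Hypothetical toy input: two predictors with 3 and 2 classes.
x = one_hot([1, 0], [3, 2])
print(eta(x, [[0.0] * 5], [0.0], [2.0], [0.1] * 5))
```

Minimizing `penalized_objective` over the weights would require a gradient-based optimizer, which is omitted here to keep the sketch short.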

C. Model Selection and Validation

Data Split: 10,000 claims for training, 7,026 for testing.
Selection Criteria: The model was tuned by varying the number of hidden nodes ( $n_h$ ) and the decay parameter ( $\lambda$ ). Performance was evaluated using the Generalized $R^2$ coefficient based on the test set's univariate Cox fit using the ANN output as a predictor.
Handling Partial Inputs: The paper proposes two methods for making predictions when some covariates are missing:
- Method A: Average the prediction term $\eta$ over all training records matching the available partial inputs.
- Method B: Average the survival curves directly.
- Result: Method A was selected for its simplicity and comparable accuracy.
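
The difference between the two partial-input methods can be sketched as follows, assuming a baseline survival curve $S_0(t)$ tabulated on a grid of weeks and the proportional hazards relation $S_x(t) = S_0(t)^{e^{\eta(x)}}$. The function names, the baseline values, and the three $\eta$ values (standing in for training records that match the known inputs) are hypothetical.

```python
import math

def survival_curve(eta, base_surv):
    """S_x(t) = S_0(t) ** exp(eta) at each tabulated time point."""
    return [s0 ** math.exp(eta) for s0 in base_surv]

def method_a(etas, base_surv):
    """Average eta over the matching records, then build one curve."""
    eta_bar = sum(etas) / len(etas)
    return survival_curve(eta_bar, base_surv)

def method_b(etas, base_surv):
    """Build one curve per matching record, then average the curves."""
    curves = [survival_curve(e, base_surv) for e in etas]
    return [sum(col) / len(col) for col in zip(*curves)]

# Hypothetical baseline survival at weeks 1..4 and matching-record etas.
base_surv = [0.9, 0.7, 0.5, 0.3]
matching_etas = [-0.5, 0.0, 0.5]

curve_a = method_a(matching_etas, base_surv)
curve_b = method_b(matching_etas, base_surv)
print(curve_a)
print(curve_b)
```

Method A needs only one number per group of matching records, while Method B has to evaluate and store a full curve for each record. The two curves generally differ because the map from $\eta$ to $S_x(t)$ is non-linear.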

3. Key Contributions

Hybrid Modeling Framework: Successfully integrates ANN into the Cox proportional hazards framework, allowing for the modeling of complex, non-linear relationships and high-order interactions between categorical variables without explicitly defining interaction terms.
Handling High-Cardinality Categorical Data: Demonstrates a solution for datasets where the number of categories (e.g., 655 Source of Injury codes) renders standard regression with interaction terms impossible.
Censoring Accommodation: The model naturally handles right-censored data, a critical feature for insurance and claims management where claims are often open during analysis.
Distributional Output: Unlike point-estimate models, this approach allows for the reconstruction of the full duration distribution for a specific claim, enabling the assessment of statistical ranges and risk.
Interaction Discovery: The model effectively captures interactions (e.g., between Sex and Injury Type) that a main-effects-only model would miss.
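
The distributional output follows from standard proportional hazards identities. As a worked sketch, write $H_0(t) = \int_0^t h_0(u)\,du$ for the cumulative baseline hazard and $S_0(t) = e^{-H_0(t)}$ for the baseline survival function (both symbols are introduced here for illustration and do not appear in the summary above). Then the survival function for a claim with inputs $x$ is

$S_x(t) = \exp\left(-\int_0^t h_x(u)\,du\right) = \exp\left(-e^{\eta(x)} H_0(t)\right) = S_0(t)^{e^{\eta(x)}}$

and any quantile of the duration distribution can be read off this curve. For example, the predicted median duration is the smallest $t$ satisfying

$S_0(t)^{e^{\eta(x)}} \le 0.5$

so a larger $\eta(x)$ means a higher closure hazard and therefore a shorter predicted duration.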

4. Results

Predictive Performance:
- The selected ANN model (Full model, $n_h=12$ , $\lambda=6$ ) achieved a Generalized $R^2$ of 0.206 on the test set.
- For comparison, a standard main-effects Cox model (no interactions) on the same data achieved an $R^2$ of only 0.15, demonstrating the ANN's superior ability to capture structural complexity.
Validation:
- Decile Analysis: Boxplots of actual durations grouped by predicted deciles showed a clear monotonic relationship, confirming the model's ranking ability.
- Quintile Transition: Conditional probabilities showed that high predicted quintiles corresponded strongly to high actual duration quintiles (e.g., 40% of actuals in the top quintile were predicted to be in the top quintile).
- Interaction Capture: The model successfully predicted the direction of sex-based duration differences across different injury codes (Kendall's rank correlation, $P = 0.0003$ ), whereas a main-effects model would predict a constant sex difference regardless of injury type.
Partial Input Performance: Method A (averaging $\eta$ ) provided accurate estimates for means and medians even when only partial inputs (e.g., Body Part + Sex) were available.
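
For readers unfamiliar with the generalized $R^2$ reported above, one common definition is the likelihood-ratio form $R^2 = 1 - \exp(-2(\ell_1 - \ell_0)/n)$, where $\ell_0$ and $\ell_1$ are the log-likelihoods of the null and fitted models. The sketch below assumes this form; the summary does not state which variant the paper uses (for example, whether $n$ counts all claims or only closed ones), and the function name and input values are hypothetical.

```python
import math

def generalized_r2(loglik_null, loglik_model, n):
    """Likelihood-ratio based generalized R^2 (assumed definition)."""
    return 1.0 - math.exp(-2.0 * (loglik_model - loglik_null) / n)

# Hypothetical log-likelihoods for a test set of 100 claims.
r2 = generalized_r2(loglik_null=-100.0, loglik_model=-90.0, n=100)
print(round(r2, 4))
```

Under this definition a model that does not improve on the null likelihood scores exactly 0, and larger likelihood gains per observation push the value toward 1.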

5. Significance

This paper provides a robust framework for claims management in workers' compensation.

Operational Utility: Because the inputs are recorded when a claim is filed, and predictions can still be made from partial inputs, the model can be used immediately to forecast claim duration. This aids in resource allocation, reserve setting, and early intervention for long-duration claims.
Statistical Rigor: It bridges the gap between the flexibility of machine learning (ANNs) and the statistical requirements of survival analysis (handling censoring and expressing predictions on the hazard scale).
Scalability: The approach offers a pathway to utilize rich, detailed administrative coding systems (like NWISP) that were previously considered too complex for predictive modeling due to the "curse of dimensionality" in interaction terms.

In conclusion, the study supports ANN-based Cox models as a stronger alternative to a main-effects Cox model for predicting claim durations in environments characterized by complex, hierarchical categorical data and censored outcomes.