Performance comparison of Python, MATLAB and R for numerical solutions of SI and SIR epidemiological models

Imagine you are a doctor trying to predict how a flu virus will spread through a school. You have a mathematical "recipe" (a model) that tells you how the disease moves from healthy kids to sick ones, and then to those who have recovered. This recipe is written in the language of calculus, which is great for theory but impossible to solve with a pencil and paper for real-world scenarios.

To get a prediction, you need a computer to do the heavy lifting. But here's the catch: there are different ways to tell the computer how to calculate the numbers, and there are different "computers" (software programs) you can use to run those calculations.

This paper is essentially a race between three popular software tools (Python, MATLAB, and R) to see which one is the fastest and most accurate at solving these disease models.

Here is the breakdown of the race using some everyday analogies:

1. The Race Track: The Disease Models

The researchers used two specific "tracks" for the race:

The SI Track (Susceptible-Infected): Imagine a simple game where people are either healthy or sick, and once they get sick, they stay sick. This track is like a straight line; we actually know the exact answer (the "finish line") beforehand. This allowed the researchers to check how close the software got to the truth.
The SIR Track (Susceptible-Infected-Recovered): This is a more complex game. People get sick, but then they recover and become immune. This track is like a winding mountain road with no map. Since there is no "exact answer" to check against, the researchers used a super-precise, high-end calculator (MATLAB's built-in ODE45 solver) as the "Gold Standard" to see how well the other tools performed.

2. The Running Styles: The Numerical Methods

To run the race, the software had to use specific "running styles" (mathematical methods) to take steps forward in time:

Euler's Method: Think of this as walking. You take small, simple steps. It's easy to understand, but if you take big steps, you might miss the curve of the road and end up in the wrong place. It's fast, but not very precise.
RK4 (Runge-Kutta 4th Order): This is like driving a sports car with a GPS. It doesn't just look ahead once; within each step it checks the slope of the road at four points (the start, the middle twice, and the end) and blends them before deciding where to go. It takes more work per step, but it stays very close to the road, even on sharp turns.
Predictor-Corrector (P-C): This is like taking a guess and then double-checking. You take a step, guess where you'll land, and then immediately correct your path to make sure you're right. It's a middle ground between walking and driving.
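To make the three running styles concrete, here is a minimal sketch of one step of each. The paper's exact Predictor-Corrector variant is not given in this summary, so Heun's method (an Euler guess corrected by averaging two slopes) is used as an assumption. The function names and the toy problem $y' = y$ are illustrative only.

```python
import math

def euler_step(f, t, y, h):
    # Walk: follow the slope measured at the start of the step.
    return y + h * f(t, y)

def predictor_corrector_step(f, t, y, h):
    # Guess, then double-check (Heun's variant, chosen here as an assumption).
    slope_start = f(t, y)
    y_guess = y + h * slope_start                   # predictor: an Euler guess
    slope_end = f(t + h, y_guess)
    return y + h * (slope_start + slope_end) / 2    # corrector: average both slopes

def rk4_step(f, t, y, h):
    # Sample the slope four times inside the step and blend the samples.
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(step, f, y0, t_end, h):
    # Apply the chosen step repeatedly from t = 0 to t = t_end.
    t, y = 0.0, y0
    for _ in range(round(t_end / h)):
        y = step(f, t, y, h)
        t += h
    return y

def growth(t, y):
    # Hypothetical test problem y' = y with y(0) = 1, so y(1) = e exactly.
    return y

errors = {
    name: abs(integrate(step, growth, 1.0, 1.0, 0.1) - math.e)
    for name, step in [
        ("euler", euler_step),
        ("predictor_corrector", predictor_corrector_step),
        ("rk4", rk4_step),
    ]
}
for name, err in errors.items():
    print(f"{name}: error at t = 1 is {err:.2e}")
```

With the same step size, the error shrinks from walking to guess-and-check to the sports car, which is the pattern the race results describe.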

3. The Competitors: Python, MATLAB, and R

These are the three "athletes" in the race:

Python: The Olympic Sprinter. It is free, very popular, and turned out to be the fastest in this race.
MATLAB: The Professional Engineer. It is commercial (paid) software, widely used in universities, and very reliable, but in this race it ran slower than Python.
R: The Statistician's Powerhouse. It is amazing for analyzing data and making charts, but when it comes to raw speed in crunching numbers for these specific models, it tends to be the slowest.

4. The Results of the Race

Who was the most accurate?

The Winner: RK4 (The Sports Car) won hands down. No matter which software was used, this method got the closest to the "truth." Even when taking big steps, it was almost perfect.
The Runner-up: Predictor-Corrector was also very good, almost as accurate as the sports car.
The Loser: Euler's Method (The Walker) was the least accurate. If you tried to walk too fast (take big steps), you ended up way off course.

Who was the fastest?

The Speed King: Python crushed the competition. Whether the step size was big or tiny, Python finished the calculations significantly faster than the others. It was like a cheetah compared to a house cat.
The Middle Ground: MATLAB was decent. It wasn't the fastest, but it was reliable and got the job done in a reasonable time.
The Slowpoke: R consistently took the longest to finish the race. While it did the math correctly, it just took its time doing it.

5. The Big Takeaway

The researchers concluded that if you are a scientist or a student trying to model a disease outbreak:

Use Python if you want the best balance of speed and accuracy. It's the most efficient tool for the job.
Use the RK4 method if you need high precision. Don't rely on the simple "walking" method (Euler) if accuracy matters.
Don't worry too much about R for this specific type of heavy number-crunching, as it's slower than the others, though it's still capable of doing the job.

In short: Python for speed, RK4 for accuracy. Together they were the best combination in this study for predicting disease spread.

Here is a detailed technical summary of the paper "Performance Comparison of Python, MATLAB and R for Numerical Solutions of SI and SIR Epidemiological Models" by Berkay Özışık and Elif Demirci.

1. Problem Statement

Mathematical modeling is essential for understanding and controlling infectious disease spread, primarily through compartmental models like SI (Susceptible–Infected) and SIR (Susceptible–Infected–Recovered). While simple cases may have analytical solutions, most real-world scenarios require numerical methods such as Euler's method, the fourth-order Runge-Kutta (RK4) method, and Predictor–Corrector (P–C) methods.

Although Python, MATLAB, and R are the dominant tools for scientific computing, the literature lacks a comprehensive, simultaneous comparison of their computational efficiency (run-time) and numerical accuracy when solving these specific epidemiological models. Previous studies often focused on specific models or software without benchmarking execution times across all three platforms using identical numerical algorithms. This study aims to fill that gap.

2. Methodology

The authors implemented three numerical algorithms to solve the SI and SIR systems of differential equations across three software environments: Python, MATLAB, and R.

Models:
- SI Model: Defined by $dS/dt = -\alpha SI$ and $dI/dt = \alpha SI$. This model has an exact analytical solution, allowing for direct error calculation.
- SIR Model: Defined by $dS/dt = -\alpha SI$, $dI/dt = \alpha SI - \beta I$, and $dR/dt = \beta I$. This model lacks a simple analytical solution.
Numerical Methods:
- Euler's Method: First-order iterative approach.
- RK4 (Runge-Kutta 4th Order): A higher-order method offering better accuracy.
- Predictor–Corrector (P–C): An iterative technique using a predictor step followed by a corrector refinement.
Experimental Setup:
- Hardware: MacBook Air with Apple M4 chip and 16 GB RAM.
- Parameters: Used data from a historical influenza outbreak (Murray, 2002) with $S(0)=762$, $I(0)=1$, $R(0)=0$, transmission rate $\alpha = 2.18 \times 10^{-3}$, and recovery rate $\beta = \alpha \times 202$.
- Step Sizes ($h$): Tested at $0.25$, $0.10$, and $0.01$ days.
- Measurement: Only pure computational time was recorded, excluding setup, plotting, or variable initialization.
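As an illustration of the setup above, the following sketch integrates the SIR system with RK4 using the quoted initial values and rates. The 14-day horizon, the helper names, and the pure-Python tuple arithmetic are assumptions made for this example and are not taken from the paper's code.

```python
ALPHA = 2.18e-3            # transmission rate quoted in the setup
BETA = ALPHA * 202         # recovery rate quoted in the setup (about 0.44 per day)
Y0 = (762.0, 1.0, 0.0)     # S(0), I(0), R(0)

def sir_rhs(y):
    # Right-hand side of the SIR system: (dS/dt, dI/dt, dR/dt).
    s, i, _ = y
    new_infections = ALPHA * s * i
    recoveries = BETA * i
    return (-new_infections, new_infections - recoveries, recoveries)

def shifted(y, k, a):
    # Component-wise y + a * k for tuple-valued states.
    return tuple(yi + a * ki for yi, ki in zip(y, k))

def rk4_step(y, h):
    # One classical fourth-order Runge-Kutta step for the SIR system.
    k1 = sir_rhs(y)
    k2 = sir_rhs(shifted(y, k1, h / 2))
    k3 = sir_rhs(shifted(y, k2, h / 2))
    k4 = sir_rhs(shifted(y, k3, h))
    return tuple(
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    )

def simulate(h=0.01, t_end=14.0):
    # t_end = 14 days is an assumed horizon, not a value from the summary.
    trajectory = [Y0]
    for _ in range(round(t_end / h)):
        trajectory.append(rk4_step(trajectory[-1], h))
    return trajectory

trajectory = simulate()
peak_infected = max(i for _, i, _ in trajectory)
total_final = sum(trajectory[-1])
print(f"peak infected: {peak_infected:.1f}, final S + I + R: {total_final:.6f}")
```

Because the three derivatives sum to zero, $S + I + R$ should stay at 763 throughout, which gives a quick sanity check on any implementation regardless of language.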

3. Key Contributions

First Simultaneous Benchmark: This is the first study to compare Python, MATLAB, and R side-by-side using Euler, RK4, and P-C methods for both SI and SIR models.
Dual-Metric Evaluation: The study evaluates both accuracy (via $R^2$ values) and efficiency (via execution time).
Reference Standard: For the SIR model, the authors established a high-accuracy reference solution using MATLAB's built-in ODE45 solver to validate the RK4 results.
Practical Guidance: Provides actionable data for researchers to select the optimal tool based on their specific trade-off between speed and precision.

4. Key Results

A. Accuracy ($R^2$ Values)

SI Model (vs. Analytical Solution):
- RK4: Achieved near-perfect accuracy ($R^2 = 1.0$) across all software and step sizes.
- P–C Method: Highly accurate, approaching $R^2 = 1.0$ as step size decreased.
- Euler's Method: The least accurate, with $R^2$ dropping to ~0.958 for larger step sizes ($h=0.25$), though it improved significantly with smaller steps.
- Observation: Accuracy was consistent across all three software platforms for the same algorithm and step size.
SIR Model (vs. ODE45 Reference):
- The RK4 method in MATLAB showed extremely high correlation with the ODE45 reference ($R^2 \approx 0.9999998$), confirming its reliability even without an exact analytical solution.
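The accuracy comparison for the SI model can be sketched as follows. Two things here are assumptions: $R^2$ is taken to be the usual coefficient of determination $1 - SS_{res}/SS_{tot}$ with the analytical solution as the reference, and the time horizon is set to 14 days. The closed form used is the logistic solution of $dI/dt = \alpha (N - I) I$ with $N = S(0) + I(0)$. The numbers this prints are not expected to reproduce the paper's table exactly.

```python
import math

ALPHA, S0, I0 = 2.18e-3, 762.0, 1.0   # values quoted in the setup
N = S0 + I0                           # S + I stays constant in the SI model
T_END = 14.0                          # assumed horizon in days

def exact_infected(t):
    # Closed-form (logistic) solution of dI/dt = ALPHA * (N - I) * I.
    return N * I0 / (I0 + (N - I0) * math.exp(-ALPHA * N * t))

def slope(i):
    return ALPHA * (N - i) * i

def euler_path(h, n_steps):
    path = [I0]
    for _ in range(n_steps):
        i = path[-1]
        path.append(i + h * slope(i))
    return path

def rk4_path(h, n_steps):
    path = [I0]
    for _ in range(n_steps):
        i = path[-1]
        k1 = slope(i)
        k2 = slope(i + h / 2 * k1)
        k3 = slope(i + h / 2 * k2)
        k4 = slope(i + h * k3)
        path.append(i + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return path

def r_squared(reference, approx):
    # Coefficient of determination of approx against the reference values.
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - a) ** 2 for r, a in zip(reference, approx))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

scores = {}
for h in (0.25, 0.10, 0.01):
    n_steps = round(T_END / h)
    reference = [exact_infected(k * h) for k in range(n_steps + 1)]
    scores[h] = (
        r_squared(reference, euler_path(h, n_steps)),
        r_squared(reference, rk4_path(h, n_steps)),
    )
    print(f"h = {h}: Euler R^2 = {scores[h][0]:.6f}, RK4 R^2 = {scores[h][1]:.10f}")
```

Since the arithmetic is the same in any language, a port of this sketch to MATLAB or R should give matching $R^2$ values up to floating-point rounding, in line with the observation that accuracy did not depend on the platform.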

B. Computational Performance (Run-Time)

The study revealed significant performance disparities between the software:

Python: Consistently the fastest across all methods and step sizes. It demonstrated significantly lower execution times, particularly as step sizes decreased (increasing the number of iterations).
MATLAB: Showed moderate performance, generally slower than Python but faster than R.
R: Consistently the slowest among the three, with execution times often several times higher than Python's.
Algorithm Complexity: As expected, RK4 and P-C methods required more time than Euler's method due to their more complex formulations, but the relative performance ranking of the software remained consistent.

Sample Data Trends (SI Model, $h=0.01$):

Euler Method: Python (~0.0007 s), MATLAB (~0.0187 s), R (~0.0094 s).
RK4 Method: Python (~0.0039 s) < MATLAB (~0.0257 s) < R (~0.0314 s).
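A timing measurement of this kind might be sketched as below. The paper's actual harness is not described in this summary, so the choices here are assumptions: `time.perf_counter` as the clock, a preallocated list filled in place, the best of five repeats, and a 14-day horizon. Only the stepping loop sits inside the timed region, mirroring the stated rule that setup and initialization are excluded.

```python
import time

ALPHA = 2.18e-3   # transmission rate quoted in the setup
N = 763.0         # total population, S(0) + I(0)

def euler_si(infected, h):
    # Fill the preallocated list in place; infected[0] already holds I(0).
    for k in range(len(infected) - 1):
        i = infected[k]
        infected[k + 1] = i + h * ALPHA * (N - i) * i

def time_euler(h, t_end, repeats=5):
    # Return the best wall-clock time over several repeats and the last result.
    n_steps = round(t_end / h)
    best = float("inf")
    infected = []
    for _ in range(repeats):
        infected = [1.0] * (n_steps + 1)   # setup, outside the timed region
        start = time.perf_counter()
        euler_si(infected, h)              # only the stepping loop is timed
        best = min(best, time.perf_counter() - start)
    return best, infected

best_seconds, infected = time_euler(h=0.01, t_end=14.0)
print(f"Euler, h = 0.01: best of 5 runs = {best_seconds:.6f} s")
```

Absolute times from a sketch like this depend on the machine and on whether the loop is vectorized or compiled, so they are best read as relative comparisons on one system.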

5. Significance and Conclusion

The paper concludes that all three software packages solve these epidemiological models effectively and to the same accuracy, but Python does so in the least time, making it the preferred choice for computationally intensive tasks such as high-precision runs with fine step sizes.

For Speed: Python is the clear winner, outperforming MATLAB and R significantly.
For Accuracy: All three platforms yield identical numerical results for the same algorithm; the choice of software does not impact the mathematical accuracy, only the time to compute it.
Recommendation: Researchers should prioritize Python for large-scale simulations or real-time modeling where run-time is critical. MATLAB remains a viable option for moderate complexity, while R, despite its statistical strengths, may be less efficient for pure numerical integration tasks in this context.

This study provides a critical reference for epidemiologists and data scientists to optimize their computational workflows when modeling infectious disease dynamics.
