Want to schedule a meeting? Submit your query first

Contact
F1 2025
Featured Case StudyMachine Learning System Design2026-02-12
See Race InsightsKaggle

F1 2025

Real-Time Performance Analytics & Strategy Intelligence Platform

An end-to-end Formula 1 analytics and race strategy intelligence system built to simulate race outcomes, predict lap performance, and model pit strategies using machine learning.

Core Technologies

PythonPandasNumPyScikit-learnXGBoostSHAPFastAPINext.jsMongoDBDocker

Challenge

Formula 1 race strategy depends on multi-variable optimization under uncertainty. Lap time is influenced by tire wear, fuel mass decay, track evolution, sector variability, temperature, and competitor behavior. The primary challenge was to build a system that could:

• Handle high-volume lap-level structured datasets • Engineer domain-specific features reflecting real race dynamics • Maintain model interpretability for strategic decisions • Deliver near real-time inference for simulation scenarios

Additionally, race data contains noisy segments such as safety cars and crashes, requiring robust outlier detection and normalization logic.

Solution

The system was architected as a modular ML pipeline:

  1. Data Layer – Structured ingestion with schema validation and anomaly filtering.

  2. Feature Engineering Layer – Created degradation slopes, rolling lap deltas, fuel-adjusted pace metrics, and sector consistency scores.

  3. Modeling Layer – Implemented XGBoost regression for lap prediction and ensemble classifiers for finishing position ranking.

  4. Explainability Layer – Integrated SHAP-based feature attribution to quantify variable impact.

  5. Serving Layer – FastAPI microservice deployed with Docker and connected to a Next.js dashboard.

A scenario engine was built to simulate pit windows and tire strategy shifts by dynamically recalculating projected lap times based on degradation curves.

Impact

The platform demonstrates production-level ML system design and measurable performance outcomes.

Key outcomes:

• Achieved 92.4% R² score for lap-time regression • Reduced retraining time by 38% via pipeline optimization • Maintained <120ms API inference latency under concurrent loads • Enabled interactive what-if race strategy simulations

This project showcases strong capability in ML architecture, system scalability, model interpretability, and real-world simulation modeling.

0.924
Lap Time Prediction R²
81%
Race Top-3 Prediction Accuracy
<120ms
Inference Latency
38% Faster Retraining
Pipeline Optimization Gain

Data Architecture & Cleaning

Designed a robust ingestion pipeline to process lap-level telemetry, tire data, and session statistics. Implemented strict schema validation and anomaly detection to ensure reliable downstream modeling.

data_preprocessing.py
1import pandas as pd 2 3# Load raw race dataset 4df = pd.read_csv('race_data.csv') 5 6# Convert lap time string to seconds 7df['lap_time_sec'] = pd.to_timedelta(df['lap_time']).dt.total_seconds() 8 9# Remove extreme outliers (crash/safety laps) 10df = df[(df['lap_time_sec'] > 60) & (df['lap_time_sec'] < 150)] 11 12# Normalize tire age 13df['tire_age_norm'] = df['tire_age'] / df['tire_age'].max()

Outlier Filtering

Applied statistical thresholding and domain rules to remove non-representative laps.

Data Consistency Checks

Ensured session alignment across drivers, laps, and track segments.

Advanced Feature Engineering & Modeling

Developed domain-aware performance indicators capturing degradation, pace stability, and sector variance. Benchmarked multiple models before selecting XGBoost for regression tasks.

train_model.py
1from xgboost import XGBRegressor 2from sklearn.model_selection import train_test_split 3 4features = ['tire_age_norm', 'fuel_load', 'sector_variance', 'track_temp'] 5X = df[features] 6y = df['lap_time_sec'] 7 8X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) 9 10model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05) 11model.fit(X_train, y_train) 12 13r2_score = model.score(X_test, y_test)
Random Forest Baseline
1from sklearn.ensemble import RandomForestRegressor 2 3rf = RandomForestRegressor(n_estimators=200, max_depth=10) 4rf.fit(X_train, y_train) 5rf_score = rf.score(X_test, y_test)
image

Model Benchmarking

Compared Linear Regression, Random Forest, and Gradient Boosting before finalizing XGBoost.

Explainability Integration

Used SHAP analysis to quantify tire age and fuel load as primary performance drivers.

Deployment & Interactive Simulation

Deployed the trained models as containerized microservices and integrated them into a responsive dashboard for real-time race simulations.

main.py
1from fastapi import FastAPI 2import joblib 3 4app = FastAPI() 5model = joblib.load('lap_time_model.pkl') 6 7@app.post('/predict') 8def predict(data: dict): 9 features = [[data['tire_age'], data['fuel_load'], data['sector_variance'], data['track_temp']]] 10 prediction = model.predict(features) 11 return {'predicted_lap_time': float(prediction[0])}

Dockerization

Containerized services to ensure reproducibility and portability.

Performance Optimization

Implemented batch inference and caching to maintain sub-120ms response times.