Available · Pune, India · Updated May 2026

Pranav
Gawale

ML Engineer · Data Scientist / Python · XGBoost · FastAPI · Production ML

I build ML systems that ship: from the Actuarial Risk Pricing System (RMSE $3,934, 91.1% conformal coverage, served live via FastAPI on Hugging Face Spaces) to a 4-model IPL pipeline with temporal splits and a live win-probability AUC of 0.864. My focus is production: real inference latency, model observability, and the gap between a trained model and a maintainable service.


pranav@ml-workstation ~ zsh

❯ python train.py --model hybrid --task actuarial-pricing

5-model benchmark done · Hybrid DEPLOY gate: 4/5 KPI wins ✓

RMSE=$3,934 · R²=0.8788 · Conformal coverage=91.1% ✓

❯ uvicorn app.main:app --host 0.0.0.0 --port 8000

FastAPI running · /predict · /batch (1–10k) · /docs ✓

❯ pytest tests/ -v --cov=src # 6-gate regression check

G4/G6/G7 + RMSE cap + no-NaN + no-negative → all passed ✓

❯

Selected Work

Five systems.
Real outcomes.

01 · MLOps · Insurance Pricing

Actuarial Risk Pricing System

RMSE $3,934 · Cost-Weighted R²=0.8788 · 91.1% Conformal Coverage (target 90%) · Net Profit uplift +$309,396 (+6.5%) · 51,337 samples · 46 features

Live on Hugging Face Spaces. Two-model hybrid routing: an XGBoost base model plus a HighValueSpecialist, with a hard threshold at $16,701 and a soft blend window extending to $21,695 so predicted prices have no sharp discontinuity at the boundary. The hybrid won 4 of 5 KPIs, passing the DEPLOY HYBRID gate. Heteroscedastic conformal prediction with 10 bins, winsorized at the 99th percentile. 3-tier Yeo-Johnson bias correction. SHA-256 checksum verification before joblib.load(), closing the remote-code-execution window that deserializing a tampered pickle would open. Full MLOps: MLflow · DVC · Optuna HPO · Docker multi-stage. CI: Lint → Type check ∥ Tests → 6-gate regression check (G4/G6/G7 + RMSE cap + no-NaN + no-negative) → CD push to GHCR. All GitHub Actions are pinned to full commit SHAs.
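The hard-threshold-plus-blend-window routing can be sketched as a weight that ramps from the base model to the specialist. This is a minimal sketch, not the project's code. It assumes the router keys on the base model's own prediction and that the blend is linear between $16,701 and $21,695. The names `blend_weight` and `route_prediction` are hypothetical.

```python
# Hypothetical sketch: the two thresholds come from the project summary;
# routing on the base model's prediction and the linear ramp are assumptions.
HARD_THRESHOLD = 16_701.0  # at or below: XGBoost base model only
BLEND_UPPER = 21_695.0     # at or above: HighValueSpecialist only


def blend_weight(base_pred: float,
                 lo: float = HARD_THRESHOLD,
                 hi: float = BLEND_UPPER) -> float:
    """Specialist weight in [0, 1], rising linearly across the blend window."""
    if base_pred <= lo:
        return 0.0
    if base_pred >= hi:
        return 1.0
    return (base_pred - lo) / (hi - lo)


def route_prediction(base_pred: float, specialist_pred: float) -> float:
    """Convex blend of the two models; the weight itself has no jumps."""
    w = blend_weight(base_pred)
    return (1.0 - w) * base_pred + w * specialist_pred
```

Consider a base prediction at the window midpoint ($19,198). It gets weight 0.5, so `route_prediction(19_198.0, 24_000.0)` returns `21599.0`. A linear ramp is the simplest choice that removes the jump at the threshold. A smoother ramp, such as a sigmoid, would behave similarly.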

XGBoost · LightGBM · SHAP · FastAPI · Streamlit · MLflow · Optuna · DVC · Docker · GitHub Actions

▶ Live Demo 📄 API Docs 📊 GitHub

02 · Data Science · Match Intelligence

IPL Match Intelligence Pipeline

Live Win Probability AUC 0.864 · POTM Classifier PR-AUC 0.641 · Recall 92.9% · Score Predictor MAE 16 runs · Match Winner AUC 0.501 (chance level, reported honestly: pre-match T20 outcomes are genuinely hard to call)

4 production ML models on 150,460 deliveries across 636 matches from 12 IPL seasons (2008–2019). Zero notebooks, zero shortcuts. Temporal train/test splits with strict data-leakage guards on all models. 30 interactive Plotly charts. 6 data-integrity bugs found and fixed during EDA. The pipeline covers live win probability (ball-by-ball), Player of the Match classification, half-innings score prediction, and pre-match outcome modelling, with metrics reported as-is even where T20 proves genuinely unpredictable.
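A temporal split with leakage guards might look like the sketch below. It is an illustration built on several assumptions:

- Each row is a dict with hypothetical `match_id` and `season` keys.
- The post-match column names in `POST_MATCH_COLUMNS` are examples, not the project's schema.
- The caller chooses the cutoff season.

```python
# Hypothetical sketch: column names and the cutoff season are illustrative,
# not taken from the project.
POST_MATCH_COLUMNS = {"winner", "result", "player_of_match", "win_by_runs", "win_by_wickets"}


def temporal_split(rows: list[dict], feature_cols: list[str],
                   test_from_season: int) -> tuple[list[dict], list[dict]]:
    """Train on seasons before `test_from_season`, test on that season onward."""
    # Guard 1: refuse features that are only known after the match ends.
    leaked = POST_MATCH_COLUMNS.intersection(feature_cols)
    if leaked:
        raise ValueError(f"post-match columns used as features: {sorted(leaked)}")

    train = [r for r in rows if r["season"] < test_from_season]
    test = [r for r in rows if r["season"] >= test_from_season]

    # Guard 2: a match must never contribute deliveries to both sides
    # (catches rows whose season label disagrees within one match).
    shared = {r["match_id"] for r in train} & {r["match_id"] for r in test}
    if shared:
        raise ValueError(f"matches present in both splits: {sorted(shared)}")
    return train, test
```

Splitting by season rather than shuffling rows keeps every test delivery strictly later than the training data. That ordering is what makes a held-out AUC a fair estimate of live performance.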

XGBoost · LightGBM · Plotly · scikit-learn · joblib

📄 GitHub

03 · EDA + ML · Wealth Analytics

Billionaire Wealth Distribution Analysis

Self-Made Classifier — ROC-AUC 0.8436 (XGBoost + Optuna 40 trials, 5-fold CV) · Worth Regressor R²=0.08 (honest: demographics have minimal signal at this scale) · χ² p<0.001

EDA + ML on 2,640 billionaires (Kaggle 2023), expanding 35 raw columns into 43 engineered features. Self-made share: 69.2% · Gini >0.6 even within the billionaire class itself. A χ² test (gender × self-made) and an ANOVA (net worth across industry categories) both give p<0.001. K-Means clustering (k=4) with a 2D PCA projection. SHAP beeswarm plots for both models. The Worth Regressor's R²=0.08 is reported as-is: demographics alone carry little signal about the scale of wealth.
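The Gini coefficient quoted above can be computed directly from a list of net-worth values. The sketch uses the sorted-rank formula G = 2·Σ i·x₍ᵢ₎ / (n·Σ x) - (n + 1)/n. It is a minimal standard-library sketch. `gini` is a hypothetical helper, not the project's implementation.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 = perfect equality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive total")
    # Sorted-rank form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

`gini([5, 5, 5, 5])` is 0 (perfect equality). `gini([0, 0, 0, 1])` is 0.75, the maximum possible for four holders, reached when one holder owns everything.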

XGBoost · Optuna · SHAP · K-Means · scikit-learn · Plotly · Seaborn

📄 GitHub

04 · EDA + ML · Clean Energy

EV Adoption Forecasting Dashboard

Tesla 45.7% market share · HHI >2,500 (highly concentrated) · King County = 52.5% of WA total · ARIMA(2,1,1) + LightGBM walk-forward forecasting

EDA + ML + ARIMA/LightGBM forecasting + live Streamlit dashboard on ~177,866 EV registrations from the Washington State DOL (2010–2023). Most-registered year: 2022 (28,013 vehicles). A CAFV (Clean Alternative Fuel Vehicle) Eligibility Classifier and a Range Regressor (XGBoost + Optuna + SHAP) handle static prediction. ARIMA(2,1,1) and LightGBM, evaluated walk-forward, handle adoption forecasting. The full interactive dashboard is live on Streamlit Cloud.
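Walk-forward evaluation grows the training window one step at a time and scores only one-step-ahead forecasts, so no future value ever reaches the model. The sketch below shows that loop with a naive last-value forecaster standing in for ARIMA(2,1,1) or LightGBM. `walk_forward_mae` and `naive_last_value` are hypothetical names, and MAE is an assumed metric.

```python
from typing import Callable, Sequence

Forecaster = Callable[[Sequence[float]], float]


def naive_last_value(history: Sequence[float]) -> float:
    """Stand-in forecaster: predict the most recent observation."""
    return history[-1]


def walk_forward_mae(series: Sequence[float], forecaster: Forecaster,
                     min_train: int) -> float:
    """Expanding-window walk-forward MAE.

    At step t the forecaster sees only series[:t] and predicts series[t],
    so no future observation ever enters the fit.
    """
    if not 1 <= min_train < len(series):
        raise ValueError("min_train must leave at least one point to forecast")
    errors = [abs(series[t] - forecaster(series[:t]))
              for t in range(min_train, len(series))]
    return sum(errors) / len(errors)
```

`walk_forward_mae([1, 2, 3, 4, 5], naive_last_value, min_train=2)` returns `1.0`. If statsmodels is installed, an ARIMA(2,1,1) forecaster could be passed in as `lambda h: ARIMA(list(h), order=(2, 1, 1)).fit().forecast(1)[0]`, with `from statsmodels.tsa.arima.model import ARIMA`. That version pays the cost of one refit per step.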

XGBoost · LightGBM · Optuna · SHAP · ARIMA · Streamlit · Plotly · GitHub Actions CI

▶ Live Dashboard 📄 GitHub

05 · Research · Synthetic Data Generation

Synthetic Tabular Data Augmentation Suite

TabDDPM TSTR R²=0.88 (best fidelity) · TVAE R²=0.86 · CTGAN R²=0.73 · DP-CTGAN R²=0.59 · 1,337 → 50,000 synthetic rows · MIA AUC evaluated at 37× scale

Synthetic tabular data pipeline benchmarking 4 generative models for insurance data augmentation. An 11-section QC suite covers fidelity, privacy (membership-inference attack AUC), and structural checks. TabDDPM achieves the best TSTR (train-on-synthetic, test-on-real) fidelity but requires PyTorch. DP-CTGAN provides differential-privacy guarantees at the cost of fidelity. Full MLflow experiment tracking across all generative runs. Enables downstream model training at 37× the original scale, with privacy exposure audited via MIA AUC.
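TSTR trains a model only on synthetic rows and scores it against held-out real rows. If the generator has captured the real relationships, the score approaches what training on real data would achieve. The sketch below is minimal: a one-feature least-squares model stands in for the project's downstream model (an assumption), and `tstr_r2` and its helpers are hypothetical.

```python
from typing import Sequence


def fit_ols_1d(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination on the evaluation rows."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def tstr_r2(synth_x: Sequence[float], synth_y: Sequence[float],
            real_x: Sequence[float], real_y: Sequence[float]) -> float:
    """Train on synthetic rows, test on real rows, report R^2 on the real rows."""
    slope, intercept = fit_ols_1d(synth_x, synth_y)
    preds = [slope * x + intercept for x in real_x]
    return r2_score(real_y, preds)
```

A generator that reproduces the real relationship exactly scores R² = 1.0. The spread between TabDDPM's 0.88 and DP-CTGAN's 0.59 above is the same kind of score, measured on the insurance data.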

TabDDPM · CTGAN · TVAE · SDV · PyTorch · MLflow · Pandas

📄 GitHub


📍 Pune, India · Open to Work

Who I Am

Notebook to production.
That's my domain.

I approach ML engineering as a systems problem, not a modelling problem. A strong RMSE is table stakes. What matters is whether the model behaves correctly on the tail, ships calibrated uncertainty bounds (not ad hoc "±20%" bands), and stays maintainable six months after the last Optuna run. The Actuarial Risk Pricing System, live on Hugging Face Spaces, is my clearest proof of that discipline. It reaches 91.1% conformal coverage against a 90% target, verifies its model artifacts via SHA-256 before deserialization, and gates every change behind a 6-gate CI regression check.
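Value-dependent (heteroscedastic) bounds like these are commonly built with binned, Mondrian-style split conformal prediction. The method calibrates one residual quantile per band of predicted value, so high-cost predictions get wider intervals. The minimal sketch below makes these assumptions:

- The conformity score is the absolute residual.
- The bins are equal-count bins over the calibration predictions.
- α = 0.10, for a 90% target.

`BinnedConformal` is a hypothetical class, and the project's 99th-percentile winsorization step is not reproduced.

```python
import math
from bisect import bisect_right


def conformal_quantile(scores: list[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    # Small tolerance so float error cannot push k up by one.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if k > n:
        return math.inf  # too few calibration points: interval must be unbounded
    return sorted(scores)[k - 1]


class BinnedConformal:
    """Mondrian-style split conformal: one residual quantile per prediction bin."""

    def __init__(self, n_bins: int = 10, alpha: float = 0.10) -> None:
        self.n_bins = n_bins
        self.alpha = alpha
        self.edges: list[float] = []
        self.q: list[float] = []

    def fit(self, y_pred_cal: list[float], y_true_cal: list[float]) -> "BinnedConformal":
        preds = sorted(y_pred_cal)
        n = len(preds)
        # Equal-count bin edges taken from the calibration predictions.
        self.edges = [preds[(i * n) // self.n_bins] for i in range(1, self.n_bins)]
        buckets: list[list[float]] = [[] for _ in range(self.n_bins)]
        for p, y in zip(y_pred_cal, y_true_cal):
            buckets[bisect_right(self.edges, p)].append(abs(y - p))
        self.q = [conformal_quantile(b, self.alpha) if b else math.inf for b in buckets]
        return self

    def interval(self, y_pred: float) -> tuple[float, float]:
        """Symmetric interval using the residual quantile of y_pred's bin."""
        q = self.q[bisect_right(self.edges, y_pred)]
        return y_pred - q, y_pred + q
```

Under exchangeability, split conformal coverage is at least 1 - α. Here the guarantee holds only approximately, because the bin edges are drawn from the same calibration data. Either way, landing slightly above target is the expected behavior.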

My current focus is production ML for high-stakes, high-variance domains where a wrong prediction has a real cost: insurance pricing, sports analytics, and financial data. I build towards full MLOps maturity: DVC data versioning, MLflow experiment tracking, SHAP-driven explainability, Optuna HPO, Docker multi-stage builds, and GitHub Actions CI/CD pinned to full commit SHAs. Next up: SentinelSumm, a legal NLP pipeline using DistilBERT with clause extraction.

🎯 RMSE $3,934 · Actuarial Risk Pricing

⚡ 91.1% Conformal Coverage

🏏 AUC 0.864 · IPL Win Probability

📦 5 Production Systems

🚀 Open to Work


Technical Stack

The full ML lifecycle.
Tools that prove it.

Languages & Frameworks

Python · SQL · PyTorch · TensorFlow · Keras · Scikit-learn · XGBoost

Machine Learning & Deep Learning

Supervised Learning · Unsupervised Learning · Classification · Regression · Clustering (K-Means) · ANN / CNN / RNN / LSTM · Feature Engineering · Conformal Prediction · Yeo-Johnson Transforms · SHAP Explainability · Hyperparameter Tuning (Optuna) · Time Series (ARIMA) · Synthetic Data (TabDDPM) · AUC-ROC / PR-AUC

Data Science & Statistics

EDA · Data Cleaning & Preprocessing · Missing Data & Outliers · Dimensionality Reduction · PCA · t-SNE · Hypothesis Testing · Linear Regression · Logistic Regression

MLOps, Deployment & Dev Tools

FastAPI · Streamlit · Flask · MLflow · DVC 3.63.0 · Docker (multi-stage) · GitHub Actions CI/CD · Hugging Face Spaces · Plotly / Seaborn / Matplotlib · MySQL / Oracle SQL · Git / GitHub / VS Code

Open Source

Actuarial-Pricing-Engine · Cricket · Data-Augmentation

Journey

The career arc.
Built on real outcomes.

Apr 2023 — Present
Machine Learning Engineer

Pune, Maharashtra

Built and deployed production ML systems end-to-end, from data pipeline to live inference.

Flagship: Actuarial Risk Pricing System (RMSE $3,934 · Cost-Weighted R²=0.8788 · 91.1% conformal coverage · net profit uplift +$309,396), live on Hugging Face Spaces via FastAPI + Streamlit. Highlights:

- Two-model hybrid routing with heteroscedastic conformal prediction.
- 3-tier Yeo-Johnson bias correction.
- SHA-256 checksum verification before deserialization.
- A full MLOps stack: MLflow tracking, DVC versioning, Optuna HPO (268.6 s run; +4.2% train/val gap, indicating minimal overfitting), Docker multi-stage builds, and GitHub Actions CI/CD (Lint → Type check ∥ Tests → 6-gate regression check).

Also engineered the IPL Match Intelligence Pipeline on 150,460 deliveries, reaching a live win-probability AUC of 0.864 with temporal splits and strict data-leakage guards.

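Checksum-gated loading can be sketched in a few lines. The loader reads the artifact once, hashes those bytes, and deserializes the same bytes only if the digest matches. This is a minimal sketch. `load_verified` is a hypothetical helper, and the source does not say where the expected digest is stored (a sidecar file, DVC metadata, or configuration).

```python
import hashlib
import hmac
import io
from pathlib import Path
from typing import Any, BinaryIO, Callable, Union


def load_verified(path: Union[str, Path], expected_sha256: str,
                  loader: Callable[[BinaryIO], Any]) -> Any:
    """Deserialize an artifact only if its SHA-256 digest matches the expected value."""
    data = Path(path).read_bytes()  # read once: the hashed bytes are the loaded bytes
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest compares the two hex strings in constant time
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError(
            f"SHA-256 mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
    return loader(io.BytesIO(data))
```

In this setting the loader would be `joblib.load`, which accepts a file object, for example `load_verified("model.joblib", expected_digest, joblib.load)`. Both the file name and `expected_digest` are placeholders. Hashing the in-memory bytes, rather than re-opening the file, avoids a window in which the file could be swapped between the check and the load.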
Oct 2021 — Apr 2023
Data Analyst

Pune, Maharashtra

Analyzed 100K+ bank loan records, using SQL for extraction and preprocessing. Resolved a 15% missing-value rate with median imputation and winsorized outliers at the 1st/99th percentiles. EDA revealed a nonlinear drop in approval rates below a 650 credit score and elevated default concentration in high debt-to-income segments. These findings were handed off to inform downstream risk modelling. Built stakeholder-facing dashboards in Power BI and exploratory visualizations in Matplotlib and Seaborn covering loan distribution, repayment patterns, and customer demographics. The findings fed into a revised risk-scoring framework that reduced the portfolio's projected default exposure by ~12%.
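The two cleaning steps described above, median imputation and 1st/99th-percentile winsorizing, can be sketched with the standard library. The sketch makes three assumptions:

- Missing entries arrive as `None`.
- Both the median and the clip bounds are computed from observed values only.
- Percentiles use linear interpolation.

The original work's exact implementation (SQL or otherwise) is not described, and `impute_and_winsorize` is a hypothetical name.

```python
import statistics
from typing import Optional, Sequence


def impute_and_winsorize(values: Sequence[Optional[float]]) -> list[float]:
    """Median-impute missing entries, then clip to the 1st/99th percentiles."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    median = statistics.median(observed)
    # 99 cut points; "inclusive" uses linear interpolation between order statistics
    cuts = statistics.quantiles(observed, n=100, method="inclusive")
    lower, upper = cuts[0], cuts[-1]  # 1st and 99th percentiles
    return [min(max(median if v is None else v, lower), upper) for v in values]
```

Computing the bounds before imputation stops a large block of imputed medians from pulling the 1st/99th percentiles inward and narrowing the clip range.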

Now

Currently
Engineering.

focus.py — pranav@ml-workstation

focus = {
    "active": "Actuarial Risk Pricing System: ONNX export · async batch (Celery) · churn model integration",
    "learning": ["conformal prediction theory", "QLoRA fine-tuning", "speculative decoding"],
    "next_build": "SentinelSumm: legal NLP pipeline with DistilBERT + clause extraction",
    "hardware": "NVIDIA RTX 3050 4GB · 16GB RAM · Windows 11 + WSL2",
    "philosophy": "Production or it doesn't count",
}

Get In Touch

Building something ambitious?
Let's talk.

Actively exploring ML engineering roles in production AI, actuarial/insurance systems, and data science. If you're working on something that needs ML that actually ships, reach out; I respond within 24 hours.

⭐ RMSE $3,934 · Actuarial Risk Pricing

⭐ 91.1% Conformal Coverage · 90% target

🏏 AUC 0.864 · IPL Win Probability

⚡ Available · Response in 24hrs

Email · LinkedIn · GitHub · Resume (on request)


Writing that ships knowledge,
not just opinion.

91.1% Coverage on a 90% Target: How I Built Distribution-Free Prediction Intervals for Insurance Pricing

Why Your ML Model Isn't Production-Ready (And Mine Is)
