Available · Pune, India · Updated May 2026

Pranav
Gawale

ML Engineer · Data Scientist  /  Python · XGBoost · FastAPI · Production ML

I build ML systems that ship: from the Actuarial Risk Pricing System (RMSE $3,934, 91.1% conformal coverage, served live via FastAPI on Hugging Face Spaces) to a 4-model IPL pipeline with temporal splits and a live win-probability AUC of 0.864. My focus is production: real inference latency, model observability, and closing the gap between a trained model and a maintainable service.

Five systems.
Real outcomes.

01 · MLOps · Insurance Pricing

Actuarial Risk Pricing System

RMSE $3,934  ·  Cost-Weighted R²=0.8788  ·  91.1% Conformal Coverage (target 90%)  ·  Net Profit uplift +$309,396 (+6.5%)  ·  51,337 samples  ·  46 features

Live on Hugging Face Spaces. Two-model hybrid routing (XGBoost base + HighValueSpecialist): hard threshold at $16,701 with a soft blend window up to $21,695 to prevent sharp discontinuities; the hybrid won 4 of 5 KPIs, passing the DEPLOY HYBRID gate. Heteroscedastic conformal prediction (10 bins, winsorized at the 99th percentile). 3-tier Yeo-Johnson bias correction. SHA-256 checksum verification before joblib.load(), so a tampered artifact is rejected before unpickling can execute arbitrary code. Full MLOps: MLflow · DVC · Optuna HPO · Docker multi-stage. CI: Lint → Type check ∥ Tests → 6-gate regression check (G4/G6/G7 + RMSE cap + no-NaN + no-negative) → CD push to GHCR. All Actions pinned to full commit SHAs.
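
The soft blend window can be pictured with a short sketch. The two dollar thresholds come from the description above; everything else is an assumption made for illustration: the linear ramp, routing on the base model's own prediction, and the function names are hypothetical and may differ from the deployed logic.

```python
HARD_THRESHOLD = 16_701.0  # from the text: hard routing threshold (USD)
BLEND_UPPER = 21_695.0     # from the text: upper edge of the soft blend window (USD)


def specialist_weight(base_pred: float) -> float:
    """Weight given to the specialist model.

    ASSUMPTION: a linear ramp from 0 at the hard threshold to 1 at the
    upper edge of the window. The real system may use another shape.
    """
    if base_pred <= HARD_THRESHOLD:
        return 0.0
    if base_pred >= BLEND_UPPER:
        return 1.0
    return (base_pred - HARD_THRESHOLD) / (BLEND_UPPER - HARD_THRESHOLD)


def hybrid_prediction(base_pred: float, specialist_pred: float) -> float:
    """Blend the two model outputs so the result is continuous in base_pred."""
    w = specialist_weight(base_pred)
    return (1.0 - w) * base_pred + w * specialist_pred


if __name__ == "__main__":
    # Below the threshold the base model is used as is; inside the window
    # the specialist is phased in gradually instead of switching abruptly.
    for base in (12_000.0, 19_198.0, 25_000.0):
        print(base, specialist_weight(base), hybrid_prediction(base, base * 1.1))
```

Because the weight is continuous at both edges of the window, a small change in the base prediction never produces a jump in the final price, which is the point of blending rather than hard switching.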

XGBoost · LightGBM · SHAP · FastAPI · Streamlit · MLflow · Optuna · DVC · Docker · GitHub Actions

02 · Data Science · Match Intelligence

IPL Match Intelligence Pipeline

Live Win Probability — AUC 0.864  ·  POTM Classifier PR-AUC 0.641 · Recall 92.9%  ·  Score Predictor MAE 16 runs  ·  Match Winner AUC 0.501 (honest: T20 is genuinely unpredictable)

4 production ML models on 150,460 deliveries across 636 matches from 12 IPL seasons (2008–2019). Zero notebooks, zero shortcuts. Temporal train/test splits with strict data-leakage guards on all models. 30 interactive Plotly charts. 6 data-integrity bugs found and fixed during EDA. Pipeline covers live win probability (ball-by-ball), Player of the Match classification, half-innings score prediction, and pre-match outcome modelling — with honest metric reporting where T20 is genuinely unpredictable.
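
A temporal split with a leakage guard can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the `season` field name, the cutoff year, and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def temporal_split(rows: List[Row], cutoff_season: int) -> Tuple[List[Row], List[Row]]:
    """Train on seasons up to and including the cutoff, test on later seasons."""
    train = [r for r in rows if r["season"] <= cutoff_season]
    test = [r for r in rows if r["season"] > cutoff_season]

    # Leakage guard: every training row must precede every test row in time.
    if train and test:
        latest_train = max(r["season"] for r in train)
        earliest_test = min(r["season"] for r in test)
        if latest_train >= earliest_test:
            raise ValueError("temporal leakage: train overlaps test period")
    return train, test


if __name__ == "__main__":
    # Toy rows, one per season, standing in for ball-by-ball deliveries.
    rows = [{"season": year, "match_id": i} for i, year in enumerate(range(2008, 2020))]
    train, test = temporal_split(rows, cutoff_season=2017)  # cutoff is an assumption
    print(len(train), "train rows,", len(test), "test rows")
```

Unlike a random shuffle, this keeps future matches out of the training set, so reported metrics reflect how the model would have performed on seasons it had not yet seen.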

XGBoost · LightGBM · Plotly · scikit-learn · joblib

03 · EDA + ML · Wealth Analytics

Billionaire Wealth Distribution Analysis

Self-Made Classifier — ROC-AUC 0.8436 (XGBoost + Optuna 40 trials, 5-fold CV)  ·  Worth Regressor R²=0.08 (honest: demographics have minimal signal at this scale)  ·  χ² p<0.001

EDA + ML on 2,640 billionaires, expanding 35 raw features to 43 engineered features (Kaggle, 2023). Self-made share: 69.2%. Gini >0.6 within the billionaire class itself. χ² test (gender × self-made) and ANOVA (net worth × industry category) both give p<0.001. K-Means (k=4) clustering with a 2D PCA projection. SHAP beeswarm plots for both models. Worth Regressor R²=0.08, reported as is: demographics alone carry little signal about the scale of net worth.
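
For readers unfamiliar with the Gini coefficient quoted above, here is a minimal sketch of the standard sorted-values formula. The function name and the toy inputs are illustrative and are not taken from the project.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfect equality).

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted ascending and i running from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    print(gini([5, 5, 5, 5]))   # everyone holds the same amount
    print(gini([0, 0, 0, 1]))   # one holder owns everything
```

A value above 0.6 among people who are all billionaires indicates that wealth stays heavily concentrated even inside that group.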

XGBoost · Optuna · SHAP · K-Means · scikit-learn · Plotly · Seaborn

04 · EDA + ML · Clean Energy

EV Adoption Forecasting Dashboard

Tesla 45.7% market share · HHI >2,500 (highly concentrated) · King County = 52.5% of WA total · ARIMA(2,1,1) + LightGBM walk-forward forecasting

EDA, static ML models, and time-series forecasting on ~177,866 EV registrations from the Washington State DOL (2010–2023), served through an interactive Streamlit dashboard live on Streamlit Cloud. Year with the most registrations: 2022 (28,013 vehicles). CAFV Eligibility Classifier and Range Regressor (XGBoost + Optuna + SHAP) for static prediction; ARIMA(2,1,1) + LightGBM walk-forward forecasting for adoption.
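
Walk-forward evaluation can be sketched as follows. This is an illustration under stated assumptions: an expanding training window, a naive last-value forecaster standing in for the ARIMA and LightGBM models, and a made-up toy series. None of the names or numbers come from the project.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    n_obs: int, min_train: int, horizon: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    start = min_train
    while start + horizon <= n_obs:
        yield list(range(start)), list(range(start, start + horizon))
        start += horizon


def walk_forward_mae(series: Sequence[float], min_train: int) -> float:
    """Mean absolute error of a naive last-value forecast, one step ahead."""
    errors = []
    for train_idx, test_idx in walk_forward_splits(len(series), min_train):
        forecast = series[train_idx[-1]]  # stand-in for a fitted model
        errors.append(abs(series[test_idx[0]] - forecast))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    toy_counts = [10, 12, 15, 19, 24, 30]  # invented values, not real registrations
    print(walk_forward_mae(toy_counts, min_train=3))
```

Each fold trains only on observations that precede the forecast point, which mirrors how the model would be used once deployed.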

XGBoost · LightGBM · Optuna · SHAP · ARIMA · Streamlit · Plotly · GitHub Actions CI

05 · Research · Synthetic Data Generation

Synthetic Tabular Data Augmentation Suite

TabDDPM TSTR R²=0.88 (best fidelity)  ·  TVAE R²=0.86  ·  CTGAN R²=0.73  ·  DP-CTGAN R²=0.59  ·  1,337 → 50,000 synthetic rows  ·  MIA AUC evaluated at 37× scale

Synthetic tabular data pipeline benchmarking 4 generative models for insurance data augmentation. 11-section QC suite covering fidelity, privacy (MIA AUC), and structural checks. TabDDPM achieves the best TSTR (train on synthetic, test on real) fidelity and requires PyTorch; DP-CTGAN provides differential privacy guarantees at the cost of fidelity. Full MLflow experiment tracking across all generative runs. Enables downstream model training at 37× the original scale, with privacy exposure checked via MIA AUC.
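
The TSTR scores above follow a simple protocol: fit a model on synthetic rows only, then score it on held-out real rows. A minimal sketch, assuming a one-feature least-squares model and invented toy data (the project's actual downstream model and features are not described here):

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def tstr_r2(synth_x, synth_y, real_x, real_y) -> float:
    """Train on Synthetic, Test on Real: fit on synthetic, score on real."""
    slope, intercept = fit_line(synth_x, synth_y)
    preds = [slope * v + intercept for v in real_x]
    return r2_score(real_y, preds)


if __name__ == "__main__":
    # Toy data only: synthetic rows follow y = 2x, real rows are noisier.
    print(tstr_r2([1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3], [2.5, 3.5, 6.5]))
```

A high TSTR R² means a model that never saw real data still predicts real outcomes well, which is why it serves as a fidelity measure for the generator.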

TabDDPM · CTGAN · TVAE · SDV · PyTorch · MLflow · Pandas

📍 Pune, India · Open to Work

Notebook to production.
That's my domain.

I approach ML engineering as a systems problem, not a modelling problem. A strong RMSE is table stakes; what matters is whether the model behaves correctly on the tail, ships calibrated uncertainty bounds (not "±20% bands"), and stays maintainable six months after the last Optuna run. The Actuarial Risk Pricing System, live on Hugging Face Spaces, is my clearest proof of that discipline: 91.1% conformal coverage against a 90% target, model artifacts verified via SHA-256 before deserialization, and every release gated behind a 6-gate CI regression check.
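
To make "conformal coverage against a target" concrete, here is a minimal sketch of plain split conformal prediction. It is a simplification: it uses one global quantile of absolute residuals, whereas the deployed system is described as heteroscedastic with 10 bins. Function names and toy numbers are illustrative.

```python
import math
from typing import Sequence


def conformal_quantile(calib_residuals: Sequence[float], alpha: float = 0.10) -> float:
    """Half-width q so that [pred - q, pred + q] targets 1 - alpha coverage.

    Takes the ceil((n + 1) * (1 - alpha))-th smallest absolute residual
    from a held-out calibration set (the finite-sample correction).
    """
    n = len(calib_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(abs(r) for r in calib_residuals)[rank - 1]


def empirical_coverage(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Share of test points whose true value falls inside pred ± q."""
    hits = sum(1 for y, p in zip(y_true, y_pred) if abs(y - p) <= q)
    return hits / len(y_true)


if __name__ == "__main__":
    calib = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # toy calibration residuals
    q = conformal_quantile(calib, alpha=0.5)
    print(q, empirical_coverage([10, 20, 30, 40], [11, 18, 30, 50], q))
```

Comparing the empirical coverage on held-out data with the nominal target is the check behind a statement such as "91.1% on a 90% target".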

My current focus is production ML for high-stakes, high-variance domains — insurance pricing, sports analytics, and financial data where a wrong prediction has a real cost. I build towards full MLOps maturity: DVC data versioning, MLflow experiment tracking, SHAP-driven explainability, Optuna HPO, Docker multi-stage builds, and GitHub Actions CI/CD pinned to full commit SHAs. Next: SentinelSumm — legal NLP pipeline with DistilBERT + clause extraction.

🎯 RMSE $3,934  ·  Actuarial Risk Pricing
⚡ 91.1% Conformal Coverage
🏏 AUC 0.864 · IPL Win Probability
📦 5 Production Systems
🚀 Open to Work

The full ML lifecycle.
Tools that prove it.

Languages & Frameworks

Python, SQL, PyTorch, TensorFlow, Keras, Scikit-learn, XGBoost

Machine Learning & Deep Learning

Supervised Learning, Unsupervised Learning, Classification, Regression, Clustering · K-Means, ANN · CNN · RNN · LSTM, Feature Engineering, Conformal Prediction, Yeo-Johnson Transforms, SHAP Explainability, Hyperparameter Tuning · Optuna, Time Series · ARIMA, Synthetic Data · TabDDPM, AUC-ROC · PR-AUC

Data Science & Statistics

EDA, Data Cleaning & Preprocessing, Missing Data & Outliers, Dimensionality Reduction, PCA, t-SNE, Hypothesis Testing, Linear Regression, Logistic Regression

MLOps, Deployment & Dev Tools

FastAPI, Streamlit, Flask, MLflow, DVC 3.63.0, Docker · Multi-stage, GitHub Actions CI/CD, Hugging Face Spaces, Plotly · Seaborn · Matplotlib, MySQL · Oracle SQL, Git · GitHub · VS Code

The career arc.
Built on real outcomes.

  1. Machine Learning Engineer

    Pune, Maharashtra

    Built and deployed production ML systems end-to-end — from data pipeline to live inference. Flagship: Actuarial Risk Pricing System (RMSE $3,934 · Cost-Weighted R²=0.8788 · 91.1% conformal coverage · net profit uplift +$309,396) live on Hugging Face Spaces via FastAPI + Streamlit. Two-model hybrid routing with heteroscedastic conformal prediction, 3-tier Yeo-Johnson bias correction, and SHA-256 checksum verification before deserialization. Full MLOps stack: MLflow tracking, DVC versioning, Optuna HPO (268.6s run · +4.2% train/val gap — Minimal Overfitting), Docker multi-stage, GitHub Actions CI/CD (Lint → Type check ∥ Tests → 6-gate regression check). Also engineered the IPL Match Intelligence Pipeline on 150,460 deliveries achieving live win-probability AUC 0.864 with temporal splits and strict data-leakage guards.

  2. Data Analyst

    Pune, Maharashtra

    Analyzed 100K+ bank loan records using SQL for extraction and preprocessing — resolving a 15% missing-value rate via median imputation and winsorizing outliers at the 1st/99th percentile. Conducted EDA revealing nonlinear approval rate drops below a 650 credit score threshold and elevated default concentration in high debt-to-income segments, with findings handed off to inform downstream risk modeling. Built stakeholder-facing dashboards in Power BI and exploratory visualizations in Matplotlib and Seaborn illustrating loan distribution, repayment patterns, and customer demographics. Findings contributed to a revised risk scoring framework, reducing the portfolio's projected default exposure by ~12%.
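
The SHA-256 check before deserialization mentioned in the Machine Learning Engineer entry can be sketched as below. This is a minimal illustration, not the project's code: `load_verified` is a hypothetical helper, `pickle.load` stands in for `joblib.load`, and the expected digest is assumed to come from a trusted source such as a value pinned in the repository.

```python
import hashlib
import hmac
import io
import pickle
from pathlib import Path
from typing import Any, BinaryIO, Callable


def load_verified(
    path: Path,
    expected_sha256: str,
    loader: Callable[[BinaryIO], Any] = pickle.load,
) -> Any:
    """Deserialize a model artifact only if its SHA-256 digest matches.

    The bytes are read once, hashed, and then handed to the loader from
    memory, so the file cannot be swapped between the check and the load.
    """
    data = Path(path).read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError(f"checksum mismatch for {path}: refusing to deserialize")
    return loader(io.BytesIO(data))
```

The check only helps if the expected digest is stored somewhere an attacker who can replace the artifact cannot also edit. It guards against tampered files, not against a malicious artifact that was hashed and pinned in the first place.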

Currently
Engineering.

focus.py — pranav@ml-workstation
focus = {
    "active": "Actuarial Risk Pricing System — ONNX export · async batch (Celery) · churn model integration",
    "learning": ["conformal prediction theory", "QLoRA fine-tuning", "speculative decoding"],
    "next_build": "SentinelSumm — legal NLP pipeline with DistilBERT + clause extraction",
    "hardware": "NVIDIA RTX 3050 4GB · 16GB RAM · Windows 11 + WSL2",
    "philosophy": "Production or it doesn't count",
}

Building something ambitious?
Let's talk.

Actively exploring ML engineering roles — production AI, actuarial/insurance systems, data science. If you're working on something that needs ML that actually ships, I respond within 24 hours.

RMSE $3,934  ·  Actuarial Risk Pricing
91.1% Conformal Coverage · 90% target
🏏 AUC 0.864 · IPL Win Probability
Available · Response in 24hrs