ETA Prediction System
ETA prediction is one of the most consequential ML systems at a ride-sharing company: inaccurate estimates erode user trust, hurt driver efficiency, and break marketplace matching. I'll work through business and ML objectives, system architecture, data and features, modeling, infrastructure, evaluation, and robustness.
Solution Walkthrough
Business Objective
The objective is to provide accurate real-time ETA predictions that enable optimal marketplace decisions (rider pickup times, trip duration estimates, driver routing, and pricing) while maintaining user trust through consistent accuracy and appropriately communicated uncertainty. ETAs directly impact core business metrics: conversion (accurate ETAs lead to more completed rides), satisfaction (users value reliability over optimistic estimates), driver efficiency (accurate predictions enable better schedules), and marketplace balance (pricing and matching depend on accurate time estimates).
There's a critical tradeoff between accuracy and user experience. Overly pessimistic ETAs (padding heavily so we never over-promise) lower conversion: users may choose a competitor or not ride at all. Overly optimistic ETAs damage trust when they're consistently wrong. The optimal ETA balances realism against these two failure modes to maintain user confidence.
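One way to make this tradeoff concrete is with an asymmetric cost. This is an illustrative framing, not necessarily how a production system sets its quote. It assumes hypothetical per-minute penalties `cost_late` (the trip runs past the quote) and `cost_early` (the quote was padded beyond the actual time), plus a set of sampled plausible travel times. For this piecewise-linear cost, the quote that minimizes average cost is the empirical quantile at `cost_late / (cost_late + cost_early)`, so making lateness more expensive pushes the quote upward.

```python
import math
from typing import Sequence


def optimal_quoted_eta(samples: Sequence[float], cost_late: float, cost_early: float) -> float:
    """Quote (minutes) minimizing average asymmetric cost over sampled travel times.

    cost_late: penalty per minute the trip runs past the quote (optimistic ETA).
    cost_early: penalty per minute the quote exceeds the trip (padded ETA).
    The minimizer is the empirical quantile at tau = cost_late / (cost_late + cost_early).
    """
    if not samples:
        raise ValueError("need at least one sample")
    tau = cost_late / (cost_late + cost_early)
    ordered = sorted(samples)
    # Smallest sample whose empirical CDF reaches tau.
    k = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[k]


def expected_cost(quote: float, samples: Sequence[float], cost_late: float, cost_early: float) -> float:
    """Average cost of quoting `quote` when the actual time is drawn from `samples`."""
    total = 0.0
    for actual in samples:
        if actual > quote:
            total += cost_late * (actual - quote)   # rider waited longer than promised
        else:
            total += cost_early * (quote - actual)  # quote was needlessly padded
    return total / len(samples)


# Hypothetical sampled travel times for one trip, in minutes.
samples = [10, 11, 12, 13, 14, 15, 16, 17, 18, 30]
print(optimal_quoted_eta(samples, cost_late=1, cost_early=1))  # 14 (median)
print(optimal_quoted_eta(samples, cost_late=3, cost_early=1))  # 17 (75th percentile)
```

The penalty values are business choices rather than something the model learns; the model's job is to supply a good distribution of travel times to choose from.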
ETAs serve multiple use cases with different precision requirements: rider pickup ETA (how long until the driver arrives?), which impacts conversion and needs precision within 1-2 minutes; trip duration ETA (how long will the ride take?), which impacts pricing and scheduling and can tolerate 5-10% error; driver arrival at destination; and delivery ETAs for food and package delivery.
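Because these use cases are judged differently (absolute minutes for pickup, relative error for trip duration), an evaluation harness needs a per-use-case tolerance. A minimal sketch, assuming hypothetical use-case keys and budgets set at the upper end of the ranges stated above (2 minutes for pickup, 10% for trip duration):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tolerance:
    abs_minutes: Optional[float] = None   # absolute error budget, in minutes
    rel_fraction: Optional[float] = None  # error budget as a fraction of the actual time


# Hypothetical budgets, taken from the upper ends of the stated ranges.
TOLERANCES = {
    "pickup": Tolerance(abs_minutes=2.0),
    "trip_duration": Tolerance(rel_fraction=0.10),
}


def within_tolerance(use_case: str, predicted_min: float, actual_min: float) -> bool:
    """True if the prediction error fits every budget defined for the use case."""
    tol = TOLERANCES[use_case]
    error = abs(predicted_min - actual_min)
    if tol.abs_minutes is not None and error > tol.abs_minutes:
        return False
    if tol.rel_fraction is not None and error > tol.rel_fraction * actual_min:
        return False
    return True


# The same 3-minute miss passes for a 40-minute trip but fails as a pickup ETA.
print(within_tolerance("trip_duration", 43.0, 40.0))  # True
print(within_tolerance("pickup", 43.0, 40.0))         # False
```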
ML Objective
From an ML perspective, this is time-series regression on graphs with real-time updates. Given current conditions (driver location, destination, traffic, weather, historical patterns), predict the remaining travel time. The prediction must update every few seconds as conditions change: we're not predicting once but continuously refining the estimate as the trip progresses.
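Continuous refinement can be illustrated by recomputing the remaining time on every update. This sketch assumes the route is fixed and already split into segments, that an upstream model (hypothetical here) supplies a predicted traversal time per segment, and that progress within a segment is linear. On each tick, both the driver's progress and the segment predictions may change.

```python
from typing import Sequence


def remaining_eta(segment_minutes: Sequence[float], current_index: int,
                  fraction_done: float) -> float:
    """Remaining travel time along a fixed route, in minutes.

    segment_minutes: predicted traversal time of each route segment, in order.
    current_index: index of the segment the driver is currently on.
    fraction_done: share (0 to 1) of the current segment already traversed.
    """
    if not 0 <= current_index < len(segment_minutes):
        raise IndexError("current_index out of range")
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be in [0, 1]")
    # Linear-progress assumption for the segment in flight.
    current_left = segment_minutes[current_index] * (1.0 - fraction_done)
    return current_left + sum(segment_minutes[current_index + 1:])


# Tick 1: trip starts; hypothetical per-segment predictions.
segments = [2.0, 4.0, 3.0, 1.0]
print(remaining_eta(segments, 0, 0.0))  # 10.0

# Tick N: driver is halfway through segment 1, and a traffic refresh raised segment 2.
segments = [2.0, 4.0, 4.5, 1.0]
print(remaining_eta(segments, 1, 0.5))  # 7.5
```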
The core challenge is modeling complex spatial-temporal dynamics: road networks are graphs (not Euclidean space), traffic conditions change dynamically, driver behavior varies (speed, routing choices), and external factors (weather, events, time of day) affect travel time. We need representations that capture these interdependencies.
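Treating the road network as a graph means travel time is a path computation over road segments rather than a function of straight-line distance. A minimal sketch using Dijkstra's algorithm, assuming a hypothetical directed graph of free-flow segment times and a static snapshot of per-segment congestion multipliers. This is a simplification: it ignores traffic changing during the trip, and a learned model would normally replace the simple multiplier.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical road graph: node -> list of (neighbor, free_flow_minutes, segment_id).
Graph = Dict[str, List[Tuple[str, float, str]]]


def fastest_time(graph: Graph, source: str, target: str,
                 congestion: Dict[str, float]) -> float:
    """Dijkstra over travel times; congestion[segment_id] scales free-flow time (default 1.0).

    Assumes all multipliers are non-negative, as Dijkstra requires.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == target:
            return t
        if t > best.get(node, float("inf")):
            continue  # stale heap entry; a faster path to this node was already found
        for nbr, free_flow, seg in graph.get(node, []):
            cand = t + free_flow * congestion.get(seg, 1.0)
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                heapq.heappush(heap, (cand, nbr))
    return float("inf")  # target unreachable


graph: Graph = {
    "A": [("B", 5, "ab"), ("C", 4, "ac")],
    "B": [("D", 5, "bd")],
    "C": [("D", 8, "cd")],
}
print(fastest_time(graph, "A", "D", {}))             # 10.0 via A-B-D
print(fastest_time(graph, "A", "D", {"bd": 2.0}))    # 12.0 via A-C-D
```

Doubling the congestion on one segment flips the fastest route from A-B-D to A-C-D, the kind of interdependency a Euclidean-distance model cannot express.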
We're predicting distributions, not point estimates. A highway route might be 15 minutes ±2 minutes with high confidence. A route through downtown during rush hour might be 20 minutes ±10 minutes with high uncertainty. Communicating uncertainty appropriately is critical for user trust.
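Communicating uncertainty only helps if the intervals are honest. A common check is empirical coverage: the share of completed trips whose actual time fell inside the predicted interval, reported alongside mean interval width, since a trivially wide interval always covers. The sketch assumes the intervals are meant as nominal 80% intervals (a hypothetical choice) and reuses the two example routes above as (13, 17) and (10, 30) minutes, with made-up actual times.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def interval_coverage(intervals: Sequence[Interval], actuals: Sequence[float]) -> float:
    """Fraction of actual travel times that fall inside their predicted interval."""
    if len(intervals) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, actuals))
    return hits / len(actuals)


def mean_width(intervals: Sequence[Interval]) -> float:
    """Average interval width in minutes; narrower is better at equal coverage."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


intervals = [(13, 17), (13, 17), (10, 30), (10, 30), (13, 17)]
actuals = [14.5, 18.0, 27.0, 12.0, 16.0]
print(interval_coverage(intervals, actuals))  # 0.8
print(mean_width(intervals))                  # 10.4
```

Coverage well below the nominal level means the intervals are overconfident; coverage well above it, with large widths, means they are padded.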