Video Content Recommendation
Video recommendation differs from other kinds of content recommendation in important ways: consumption is sequential and session-based, completion rates matter as much as clicks, and you need to balance immediate engagement against long-term viewer satisfaction. I'll work through business and ML objectives, system architecture, data and features, modeling, infrastructure, evaluation, and robustness.
Solution Walkthrough
Business Objective
The objective is to maximize quality-adjusted watch time while maintaining content diversity and platform health, subject to creator sustainability and user retention constraints. This is more nuanced than simply maximizing raw watch hours. We could show addictive low-quality content that keeps users watching but leaves them feeling terrible, and they would ultimately churn. We need sustainable engagement where users feel good about their time spent.
Quality-adjusted means weighting watch time by satisfaction signals like completion rate, positive reactions, saves for later, shares with friends, and crucially, whether users return for another session within 24 hours. A user who watches three videos they love, completes them all, and comes back tomorrow is far more valuable than someone who doomscrolls through 50 videos, completes none, and doesn't return.
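To make the weighting concrete, here is a minimal sketch of one way to compute quality-adjusted watch time for a session. The `Session` fields, the `WEIGHTS` values, and the linear form of the multiplier are all assumptions for illustration; the text names the satisfaction signals but does not specify how they are combined.

```python
from dataclasses import dataclass

# Hypothetical weights: the signals come from the text, the numbers do not.
WEIGHTS = {
    "completion": 0.40,
    "reaction": 0.15,
    "save": 0.15,
    "share": 0.15,
    "returned_24h": 0.15,
}


@dataclass
class Session:
    watch_seconds: float
    videos_started: int
    videos_completed: int
    positive_reactions: int
    saves: int
    shares: int
    returned_within_24h: bool


def quality_multiplier(s: Session) -> float:
    """Satisfaction score in [0, 1] built from per-video rates of each signal."""
    if s.videos_started == 0:
        return 0.0
    n = s.videos_started
    rates = {
        "completion": s.videos_completed / n,
        "reaction": min(s.positive_reactions / n, 1.0),
        "save": min(s.saves / n, 1.0),
        "share": min(s.shares / n, 1.0),
        "returned_24h": 1.0 if s.returned_within_24h else 0.0,
    }
    return sum(WEIGHTS[name] * rates[name] for name in WEIGHTS)


def quality_adjusted_watch_time(s: Session) -> float:
    """Raw watch seconds scaled by the satisfaction multiplier."""
    return s.watch_seconds * quality_multiplier(s)


# Three loved videos, all completed, user returns the next day.
engaged = Session(900, 3, 3, 2, 1, 1, True)
# Fifty videos skimmed, none completed, no return.
doomscroll = Session(1500, 50, 0, 0, 0, 0, False)

print(quality_adjusted_watch_time(engaged))
print(quality_adjusted_watch_time(doomscroll))
```

Under these assumed weights, the doomscrolling session has more raw watch seconds but scores zero, while the engaged session keeps most of its watch time.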
The creator sustainability piece is critical. If the recommendation system only surfaces mega-hit content from established creators, new creators can never break through, and eventually content diversity dies. We need discovery mechanisms that give new, high-quality content a chance to find its audience even without initial engagement signals.
Content diversity matters for long-term retention. Users who only see one type of content (say, cooking videos) eventually saturate. We need to help them discover new interests and categories, expanding their engagement surface over time. This requires explicitly modeling diversity and serendipity, not just immediate relevance.
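One simple way to model diversity explicitly is a greedy re-ranker that discounts a candidate's relevance each time its category has already been selected, in the spirit of maximal marginal relevance. This is a sketch under assumptions: the text does not say which diversification method is used, and the function name `rerank_with_diversity`, the `penalty` value, and the tuple layout are hypothetical.

```python
def rerank_with_diversity(candidates, k, penalty=0.3):
    """Greedily pick k videos, trading relevance against category repetition.

    candidates: list of (video_id, category, relevance) tuples.
    penalty: hypothetical discount applied once per already-selected video
             from the same category.
    """
    remaining = list(candidates)
    selected = []
    category_counts = {}
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: c[2] - penalty * category_counts.get(c[1], 0),
        )
        selected.append(best)
        remaining.remove(best)
        category_counts[best[1]] = category_counts.get(best[1], 0) + 1
    return [video_id for video_id, _, _ in selected]


candidates = [
    ("cook_1", "cooking", 0.90),
    ("cook_2", "cooking", 0.85),
    ("cook_3", "cooking", 0.80),
    ("travel_1", "travel", 0.70),
    ("music_1", "music", 0.60),
]

top3 = rerank_with_diversity(candidates, k=3)
print(top3)  # ['cook_1', 'travel_1', 'music_1']
```

Ranking by relevance alone would return three cooking videos; with the penalty, the slate keeps the best cooking video and fills the rest with other categories. Setting `penalty=0` recovers pure relevance ordering.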
ML Objective
From an ML perspective, this is a ranking problem with sequential dependencies. Given a user's watch history, current session context, and millions of candidate videos, we need to rank them by predicted value. But video recommendation has unique characteristics that distinguish it from other recommendation problems.
First, consumption is sequential and session-based. Users watch one video, then another, forming viewing sessions. The context evolves as they watch: after two cooking videos, they might want a third or might want something different. We need to model both immediate next-video prediction and longer-term session optimization.
Second, engagement has multiple dimensions. A video can be clicked but immediately abandoned (low dwell time), watched partially (moderate engagement), or completed and rewatched (high engagement). We need to predict the full engagement profile, not just clicks. Completion rate is particularly important for video because it captures whether content delivered on its promise.
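The engagement levels described above can be turned into training labels from a raw watch event. The sketch below is illustrative only: the function name `engagement_labels`, the 3-second abandon threshold, and the 90% completion threshold are assumptions, not values from the text.

```python
def engagement_labels(watched_seconds, duration_seconds,
                      abandon_threshold=3.0, completion_threshold=0.9):
    """Derive engagement labels for one clicked video.

    Thresholds are hypothetical; real systems tune them per surface.
    """
    fraction = watched_seconds / duration_seconds if duration_seconds > 0 else 0.0
    if fraction > 1.0:
        bucket = "rewatched"   # watched for longer than the video's length
    elif fraction >= completion_threshold:
        bucket = "completed"
    elif watched_seconds < abandon_threshold:
        bucket = "abandoned"   # clicked but left almost immediately
    else:
        bucket = "partial"
    return {
        "bucket": bucket,
        "watch_seconds": watched_seconds,
        "watch_fraction": fraction,
        "completed": fraction >= completion_threshold,
    }


for watched in (2, 30, 58, 130):
    print(watched, engagement_labels(watched, 60)["bucket"])
```

Checking completion before abandonment keeps very short videos from being mislabeled as abandoned when the viewer actually finished them.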
Third, video has content richness that text posts lack. We have visual content, audio, speech, music, editing style, and pacing, all of which affect engagement but are expensive to model. Our representations need to capture these multi-modal signals efficiently.
We're predicting multiple outcomes for each candidate video: probability of click, expected watch time (regression on seconds), probability of completion, probability of engagement actions (like, share, save), and probability of session continuation (will they watch another video?). These predictions feed into a value model that balances immediate engagement with long-term satisfaction.
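As a rough illustration of how these predictions might feed a value model, the sketch below combines them into a single ranking score. Everything specific here is an assumption: the `Predictions` fields mirror the outcomes listed above, but the weights, the watch-time cap, and the choice to treat the non-click predictions as conditional on a click are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Predictions:
    p_click: float
    expected_watch_seconds: float
    p_completion: float
    p_engagement_action: float
    p_session_continuation: float


# Hypothetical weights and cap; in practice these would be tuned against
# long-term metrics such as retention.
VALUE_WEIGHTS = {"watch": 0.4, "completion": 0.3, "action": 0.1, "continuation": 0.2}
WATCH_CAP_SECONDS = 600.0


def value_score(p: Predictions) -> float:
    """Expected value of showing a video: P(click) times post-click value.

    Assumes the non-click predictions are conditional on the click happening.
    """
    watch_term = min(p.expected_watch_seconds / WATCH_CAP_SECONDS, 1.0)
    post_click_value = (
        VALUE_WEIGHTS["watch"] * watch_term
        + VALUE_WEIGHTS["completion"] * p.p_completion
        + VALUE_WEIGHTS["action"] * p.p_engagement_action
        + VALUE_WEIGHTS["continuation"] * p.p_session_continuation
    )
    return p.p_click * post_click_value


candidates = {
    # High click probability, but viewers leave quickly.
    "clickbait": Predictions(0.30, 10.0, 0.05, 0.01, 0.20),
    # Fewer clicks, but viewers watch, finish, and keep going.
    "satisfying": Predictions(0.15, 240.0, 0.70, 0.10, 0.60),
}

ranked = sorted(candidates, key=lambda vid: value_score(candidates[vid]), reverse=True)
print(ranked)  # ['satisfying', 'clickbait']
```

Even with half the click probability, the satisfying video ranks first here because the post-click terms dominate, which is the behavior a click-only ranker would miss.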