ML Systems — Recommendations

Dec 11, 2023

1. Scaling the Instagram Explore recommendations system

Explore is one of the largest recommendation systems on Instagram. It is powered by a multi-stage ranking approach with several well-defined stages, each focusing on different objectives and algorithms: 1. Retrieval, 2. First-stage ranking, 3. Second-stage ranking, and 4. Final reranking.
Retrieval: The retrieval stage consists of multiple candidate retrieval sources. These sources select hundreds of relevant items from a media pool of billions of items. The candidates are then combined and passed to the ranking models.
a. Candidate sources can be heuristic-based or ML-based, and they can be real-time or pre-generated. All of these source types are used together and mixed with tunable weights. For example, a real-time heuristic source can be recent media from followed authors, and a real-time ML source can be candidates generated by a Two-tower network.
b. Two-tower networks are trained on user and item features with the objective of predicting engagement events (such as liking a post). The prediction is based on the dot product of the learned user and item embeddings. For real-time inference, either (i) the freshest user-side features are used to generate a user embedding, and an approximate nearest neighbor (ANN) search finds the most similar items, or (ii) for items the user has interacted with, an ANN search finds the most similar items. This helps to trade off between different engagement types.
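To make the two retrieval modes concrete, here is a minimal sketch in plain Python. It is not Instagram's implementation: the embeddings are hand-written stand-ins for the outputs of trained towers, the item names are hypothetical, the sigmoid that maps the dot product to a probability is an assumption, and an exhaustive scan replaces the ANN index that a production system would use.

```python
import math

# Hypothetical, hand-written embeddings standing in for the outputs of
# trained user and item towers; real embeddings would be learned.
ITEM_EMBEDDINGS = {
    "reel_cats": [0.9, 0.1, 0.0],
    "reel_dogs": [0.8, 0.3, 0.1],
    "photo_food": [0.1, 0.9, 0.2],
    "photo_travel": [0.0, 0.2, 0.9],
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def engagement_probability(user_emb, item_emb):
    """Map the user-item dot product to a probability (sigmoid is an assumption)."""
    return 1.0 / (1.0 + math.exp(-dot(user_emb, item_emb)))


def top_k_items(query_emb, k, exclude=()):
    """Exact (brute-force) nearest-neighbour search by dot product.

    A production system would query an approximate index instead; this
    exhaustive scan only illustrates what such an index approximates.
    """
    scored = [
        (dot(query_emb, emb), item_id)
        for item_id, emb in ITEM_EMBEDDINGS.items()
        if item_id not in exclude
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]


# User-based query: embed the freshest user features, then search items.
user_embedding = [1.0, 0.2, 0.0]
user_based = top_k_items(user_embedding, k=2)

# Item-based query: start from an item the user interacted with.
seed = "photo_food"
item_based = top_k_items(ITEM_EMBEDDINGS[seed], k=2, exclude={seed})

print(user_based)
print(item_based)
```

The same search function serves both modes; only the query vector changes, which is what lets one item index back several retrieval sources.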

Ranking: Ranking gradually reduces the number of candidates from a few thousand to a few hundred in multiple stages.
a. The first-stage ranker is a lightweight model that is less precise and less computationally intensive. Here, the same Two-tower network architecture is used, but the objective is to predict the output of the second-stage ranker, which serves as the label (similar to knowledge distillation).
b. The second-stage ranker predicts the probability of different engagement events (click, like, etc.) using a multi-task multi-label (MTML) neural network model. Recommendations are precomputed for some users during off-peak hours. The final score used to order the ranked items is a weighted sum: W_click * P(click) + W_like * P(like) - W_see_less * P(see less) + etc.
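The weighted-sum scoring can be sketched as follows. The weights, event names, and probabilities are made up for illustration; in the real system the probabilities would come from the MTML model and the weights would be tuned.

```python
# Hypothetical weights; in practice these are tuned, not hand-picked.
WEIGHTS = {"click": 1.0, "like": 2.0, "see_less": 5.0}

# Events whose probability should lower the score (subtracted in the sum).
NEGATIVE_EVENTS = {"see_less"}


def final_score(probs, weights=WEIGHTS):
    """Weighted sum of predicted event probabilities.

    Positive events add W * P(event); negative events subtract it.
    """
    score = 0.0
    for event, weight in weights.items():
        sign = -1.0 if event in NEGATIVE_EVENTS else 1.0
        score += sign * weight * probs.get(event, 0.0)
    return score


# Made-up per-candidate probabilities standing in for model predictions.
candidates = {
    "post_a": {"click": 0.30, "like": 0.10, "see_less": 0.01},
    "post_b": {"click": 0.50, "like": 0.05, "see_less": 0.08},
    "post_c": {"click": 0.20, "like": 0.20, "see_less": 0.00},
}

# Order candidates by descending final score.
ranked = sorted(candidates, key=lambda c: final_score(candidates[c]), reverse=True)
print(ranked)
```

With these numbers, post_b has the highest click probability but ranks last, because its "see less" probability carries a large negative weight.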
Final reranking: Applying certain rules gives much better control over the final recommendations, e.g. a. Do not show items from the same author in a sequence, to maintain diversity. b. Filter out or downrank some items based on integrity-related scores.
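A rough sketch of how these two rules could be applied to an already ranked list is shown below. The field names, the threshold, and the convention that a higher integrity_risk means a more problematic item are all assumptions for illustration; the greedy author-separation strategy is one simple choice, not necessarily the one Instagram uses.

```python
def rerank(ranked_items, integrity_threshold=0.8):
    """Apply two rule-based passes to a list ordered best-first.

    1. Drop items whose (hypothetical) integrity_risk reaches the threshold.
    2. Greedily avoid placing two items by the same author back to back.
    """
    remaining = [
        item for item in ranked_items
        if item["integrity_risk"] < integrity_threshold
    ]
    result = []
    while remaining:
        last_author = result[-1]["author"] if result else None
        # Take the highest-ranked item by a different author; if every
        # remaining item shares the author, fall back to the top one.
        pick = next(
            (i for i, item in enumerate(remaining)
             if item["author"] != last_author),
            0,
        )
        result.append(remaining.pop(pick))
    return result


# Made-up ranked list, best first.
items = [
    {"id": "m1", "author": "ana", "integrity_risk": 0.1},
    {"id": "m2", "author": "ana", "integrity_risk": 0.2},
    {"id": "m3", "author": "bo", "integrity_risk": 0.9},
    {"id": "m4", "author": "cy", "integrity_risk": 0.1},
    {"id": "m5", "author": "ana", "integrity_risk": 0.0},
]

reranked_ids = [item["id"] for item in rerank(items)]
print(reranked_ids)
```

Here m3 is filtered out, and m4 is pulled ahead of m2 so that the two items by the same author are not adjacent. The last two items still share an author because no alternative remains, which is the fallback case.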
Hyperparameters such as W_click are tuned using offline tuning (learning these parameters from data) and online Bayesian optimization.