I’ve never personally used TikTok, but I have heard of its legendary sway over people: some users who scroll for hours get messages from the app itself “saying they should put the phone down.”[1] It is so powerful, in fact, that projects offering a “TikTok for X” recommendation experience could disrupt other spheres, such as web search and browsing, local events, generative AI art and video games, and text or audio social media.
The most in-depth paper on TikTok’s algorithm is “Analysis on the ‘Douyin (Tiktok) Mania’ Phenomenon Based on Recommendation Algorithms” by Zhengwei Zhao,[2] who published it while an undergraduate at Sun Yat-sen University in Guangzhou, China.
A paper by authors at ByteDance titled “Monolith: Real Time Recommendation System With Collisionless Embedding Table” focuses more on the infrastructure components of model serving and training at scale.[3] I recommend reading the papers in that order for a more in-depth view of the summary that follows.
Video features: First, TikTok computes many ML features for each new piece of video content. It runs NLP algorithms on the description and tags, object detection on the video, and other models on the audio. It then classifies content into hierarchical interest groups, with broad categories like Tech, Sports, or Entertainment containing specific categories like Chinese Football or Bundesliga. The features also include data about the video creator, such as their location.[4]
User features: Separately, TikTok creates features about the user, drawing on device details, geolocation, and a social graph of other TikTok users built from contact data. These features also contain detailed data about each user’s interactions, including watch time, shares, and likes, at very fine granularity: for example, which objects were in a frame while the user was watching a video, and exactly when they scrolled to the next one. Higher-level computed features are also included, such as “same_author_seen” or “same_tag_today”, which help prevent boredom.[5] These user features are continuously updated, and time is an important component.[6]
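As a rough illustration of computed features like “same_author_seen” and “same_tag_today”, the sketch below derives them from a user’s watch history. Only the two feature names come from the sources; the `WatchEvent` record, the function name, and the exact counting rules (all-time for authors, last 24 hours for tags) are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class WatchEvent:
    """One hypothetical interaction record; the real schema is not public."""
    author: str
    tags: frozenset
    watched_at: datetime


def repetition_features(history, candidate_author, candidate_tags, now):
    """Count how often the candidate's author and tags already appeared.

    Assumed semantics: same_author_seen counts every past video by the same
    author; same_tag_today counts videos from the last 24 hours that share
    at least one tag with the candidate.
    """
    day_ago = now - timedelta(days=1)
    same_author_seen = sum(1 for e in history if e.author == candidate_author)
    same_tag_today = sum(
        1 for e in history
        if e.watched_at >= day_ago and e.tags & candidate_tags
    )
    return {"same_author_seen": same_author_seen, "same_tag_today": same_tag_today}


now = datetime(2023, 1, 2, 12, 0)
history = [
    WatchEvent("alice", frozenset({"climbing"}), now - timedelta(hours=2)),
    WatchEvent("alice", frozenset({"cooking"}), now - timedelta(days=3)),
    WatchEvent("bob", frozenset({"climbing", "japan"}), now - timedelta(hours=30)),
]
features = repetition_features(history, "alice", frozenset({"climbing"}), now)
print(features)
```

A ranking model could take counts like these as inputs and learn to down-weight a candidate when they grow large, which is one plausible way such features would help against boredom.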
Recall and ranking: Next, TikTok uses collaborative filtering based on the hierarchical groups to recommend pieces of content viewed by similar users. For example, if two users are both interested in Japanese rock climbing and a video is well received by the first, it may be shown to the second. A recall step also filters content by genre, topic, and popularity. The resulting list is then ranked using a formula that combines the outputs of specialized models, each designed to predict a key metric. Those models are several terabytes in size and are based on DeepFM, or deep factorization machines.[7] FMs are useful because:[8]
In contrast to SVMs, FMs model all interactions between variables using factorized parameters. Thus they are able to estimate interactions even in problems with huge sparsity (like recommender systems) where SVMs fail. We show that the model equation of FMs can be calculated in linear time and thus FMs can be optimized directly.
Lundquist’s write-up is a good introduction to FMs with context on collaborative filtering.[9] A large-scale hash table is used to store the feature embeddings so the pairwise/cross-encoder rankings can be computed efficiently.[10]
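To make the “linear time” claim in the quote concrete, here is a minimal pure-Python sketch of a second-order FM score computed two ways: with an explicit loop over all feature pairs, and with Rendle’s reformulation, which rewrites the pair term as half of (square of the sum minus sum of the squares) per latent dimension. The function names and the toy numbers are illustrative assumptions. This is not TikTok’s model, which per the Monolith paper builds on DeepFM at far larger scale.

```python
def fm_predict_naive(x, w0, w, v):
    """Second-order FM score with an explicit loop over all feature pairs."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # Interaction weight is the dot product of the two latent vectors
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            score += dot * x[i] * x[j]
    return score


def fm_predict_linear(x, w0, w, v):
    """Same score in O(k * n) time using the reformulated pair term."""
    n, k = len(x), len(v[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


# Toy example: 3 features, 2 latent dimensions (all values made up)
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
v = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]

naive = fm_predict_naive(x, w0, w, v)
linear = fm_predict_linear(x, w0, w, v)
print(naive, linear)
```

Both functions return the same score (3.2 here, up to floating-point rounding). The second avoids the quadratic pair loop, which matters when the feature vector is huge and sparse.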
Ranking formula: According to an internal TikTok document obtained by the New York Times, the formula is roughly as follows:[11]
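$$P_{\text{like}} \times V_{\text{like}} + P_{\text{comment}} \times V_{\text{comment}} + E_{\text{playtime}} \times V_{\text{playtime}} + P_{\text{play}} \times V_{\text{play}}$$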
where P is the predicted probability of a binary action, E is the predicted time spent, and V is the value or weight assigned to a given action.[12] The Wall Street Journal found experimentally that time spent watching a video, number of repeat viewings, and whether the video was paused during playback were the most important metrics.[13]
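The formula is a plain weighted sum, so it is easy to sketch. In the example below the structure follows the reported formula, but the weights (V) and the per-video predictions (P and E) are made-up numbers, and `rank_score` is a hypothetical helper name. The real values are not public.

```python
ACTIONS = ("like", "comment", "playtime", "play")


def rank_score(predictions, weights):
    """P_like*V_like + P_comment*V_comment + E_playtime*V_playtime + P_play*V_play."""
    return sum(predictions[a] * weights[a] for a in ACTIONS)


# Hypothetical weights: the V terms in the formula
weights = {"like": 1.0, "comment": 2.0, "playtime": 0.1, "play": 0.5}

# Hypothetical model outputs per candidate video:
# probabilities for like/comment/play, expected seconds for playtime
candidates = {
    "a": {"like": 0.2, "comment": 0.05, "playtime": 30.0, "play": 0.9},
    "b": {"like": 0.6, "comment": 0.1, "playtime": 8.0, "play": 0.7},
}

scores = {vid: rank_score(p, weights) for vid, p in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

With these numbers video “a” outranks “b” despite a lower like probability, because its expected playtime dominates the sum. That is in line with the reporting that watch time is among the most important signals.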
Batch rollouts based on interactions: Content is not immediately eligible to be retrieved by every user. A video is first “seeded” to just one user, and then shown to progressively larger batches of users as long as the majority of users in the previous batch respond positively.[14]
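The rollout logic can be sketched as a loop over growing batches that stops as soon as a batch fails the majority test. The batch sizes, the strict-majority threshold, and the `respond_positively` callback are assumptions for illustration. The sources describe the general mechanism, not these specifics.

```python
def rollout(batch_sizes, respond_positively):
    """Show a video to successively larger batches while feedback stays positive.

    batch_sizes: hypothetical audience sizes, e.g. [1, 10, 100].
    respond_positively: callable taking a user index and returning True or
        False; it stands in for real engagement signals.
    Returns the total number of users reached.
    """
    reached = 0
    for size in batch_sizes:
        responses = [respond_positively(reached + i) for i in range(size)]
        reached += size
        # Promote to the next batch only on a strict majority of positives
        if 2 * sum(responses) <= size:
            break
    return reached


# Toy audience: only the first 8 users (indices 0..7) like the video
total = rollout([1, 10, 100, 1000], lambda user: user < 8)
print(total)
```

Here the seed user and 7 of the next 10 users respond positively, so the video reaches the batch of 100. Nobody in that batch likes it, so the rollout stops at 111 users.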
Model drift and in-flight updates: While users are scrolling, liking, and sharing, TikTok has to incorporate this real-time feedback about users’ changing interests and the quality of content into its models; otherwise they will make suboptimal predictions on key metrics and recommend the wrong content. TikTok transfers a sparse subset of model weights to the parameter servers every hour to keep models up to date while continuing to serve requests.[15]
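The idea of shipping only a sparse subset of weights can be illustrated with a toy embedding store that tracks which rows changed since the last sync. This is a simplified sketch under assumed names (`TrainingStore`, `sync`) using plain dictionaries. It is not how Monolith’s parameter servers are actually implemented.

```python
class TrainingStore:
    """Toy training-side embedding table that remembers which keys changed."""

    def __init__(self):
        self.embeddings = {}
        self.touched = set()

    def update(self, key, vector):
        """Record a new embedding for key and mark it as changed."""
        self.embeddings[key] = list(vector)
        self.touched.add(key)

    def pop_delta(self):
        """Return copies of the rows changed since the previous sync, then reset."""
        delta = {k: list(self.embeddings[k]) for k in self.touched}
        self.touched.clear()
        return delta


def sync(training, serving):
    """Copy only the changed rows into the serving table; return the row count."""
    delta = training.pop_delta()
    serving.update(delta)
    return len(delta)


training = TrainingStore()
serving = {}

training.update("user:1", [0.1, 0.2])
training.update("video:9", [0.3, 0.4])
first = sync(training, serving)   # both rows are new, so both are sent

training.update("user:1", [0.5, 0.6])
second = sync(training, serving)  # only the one changed row is sent
print(first, second, serving)
```

Because each sync touches only the changed rows, the serving table can keep answering lookups for every other key while the update is applied.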
Footnotes
1. Hern, “How TikTok’s Algorithm Made It a Success.”
2. Zhao, “Analysis on the ‘Douyin (Tiktok) Mania’ Phenomenon Based on Recommendation Algorithms.”
3. Liu et al., “Monolith.”
4. Zhao, “Analysis on the ‘Douyin (Tiktok) Mania’ Phenomenon Based on Recommendation Algorithms”; Guo et al., “Multi-Modal Representation Learning for Short Video Understanding and Recommendation.”
5. Smith, “How TikTok Reads Your Mind.”
6. Cheng et al., “ICME Grand Challenge on Short Video Understanding.”
7. Liu et al., “Monolith.”
8. Rendle, “Factorization Machines.”
9. Lundquist, “Factorization Machines for Item Recommendation with Implicit Feedback Data.”
10. Liu et al., “Monolith.”
11. Smith, “How TikTok Reads Your Mind.”
12. Singh, “Is There Really a Secret Sauce behind TikTok’s Algorithm?”
13. “TikTok’s Recommender Revealed, DeepMind’s Not-So-Large...”
14. Zhao, “Analysis on the ‘Douyin (Tiktok) Mania’ Phenomenon Based on Recommendation Algorithms.”
15. Liu et al., “Monolith.”