Twitter, MAU = 300M, DAU = 150M
Scenario
Concurrent User:
- QPS = DAU * average requests per user per day / seconds per day = 150M * 60 / 100k = 90k (86,400 s/day rounded up to 100k)
- Peak QPS = 3 * QPS = 270k
- Read QPS = 300k
- Write QPS = 5k
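The estimate above can be checked with a few lines of arithmetic. This is only a sketch of the back-of-envelope calculation; the constant names are made up for illustration, and the exact figure using 86,400 seconds is included for comparison.

```python
# Back-of-envelope QPS estimate using the numbers from the notes.
DAU = 150_000_000                  # daily active users
REQUESTS_PER_USER_PER_DAY = 60     # average requests per user per day
SECONDS_PER_DAY_ROUNDED = 100_000  # 86,400 rounded up for easy arithmetic
PEAK_FACTOR = 3                    # peak traffic assumed to be 3x average

avg_qps = DAU * REQUESTS_PER_USER_PER_DAY // SECONDS_PER_DAY_ROUNDED
peak_qps = PEAK_FACTOR * avg_qps

# Without rounding the day length, the average comes out slightly higher.
exact_avg_qps = DAU * REQUESTS_PER_USER_PER_DAY / 86_400
```

With the rounded day length the average is 90k and the peak 270k; with the exact 86,400 seconds the average is roughly 104k, so the rounding slightly underestimates the load.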
Service
split into micro-services
define features for each service
e.g.
- user service
  - login
  - register
- tweet service
  - post tweet
  - news feed timeline
- media service
  - upload image
  - upload video
- friendship service
  - follow
  - unfollow
Storage
SQL / NoSQL / File System?
schema, STAR schema, (de-)normalize
e.g.
- user service: SQL (id, username, email, password)
- tweet service: NoSQL (id, userId, content, createdAt)
- media service: File System
- friendship service: SQL / NoSQL (from_userId, to_userId)
Scale
Optimize
- pull / push? normalize / de-normalize
- more features?
- special cases, such as celebrity accounts (e.g. Lady Gaga) and inactive users
Maintenance
- handle failure
- scalability
News Feed
- Twitter / Facebook / RSS reader / WeChat Moments...
pull model
- read news feed
  - get followings (friendship table)
  - get tweets from followings (tweets table)
  - merge N sorted arrays
  - analysis: N DB reads + one k-way merge (k = N)
- post a tweet
  - 1 DB write (tweets table)
- cons:
  - N * DB.getTweets(following, 100) // N DB reads are expensive and block the read request
  - how to mitigate? -> cache
    - cache each user's timeline (top 100 tweets) (reduces pull time)
    - cache each user's news_feed (reduces merge time)
- pros:
  - 1 DB write
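The pull model above can be sketched in memory as follows. This is a minimal illustration, not a real storage layer: the dictionaries `followings` and `tweets_by_user` are hypothetical stand-ins for the friendship and tweets tables, and it assumes each user's tweets arrive with non-decreasing `created_at`.

```python
import heapq
from collections import defaultdict
from itertools import islice

# Hypothetical in-memory stand-ins for the friendship and tweets tables.
followings = defaultdict(set)       # user_id -> set of followed user_ids
tweets_by_user = defaultdict(list)  # user_id -> [(created_at, tweet_id, content)], oldest first


def post_tweet(user_id, tweet_id, content, created_at):
    # Posting is a single write; nothing is pushed to followers.
    # Assumption: created_at never decreases for a given user.
    tweets_by_user[user_id].append((created_at, tweet_id, content))


def get_timeline(user_id, limit=100):
    # One "DB read" per followed user: that user's latest tweets, newest first.
    return tweets_by_user[user_id][-limit:][::-1]


def get_news_feed(user_id, limit=100):
    # N reads (one per following), then a k-way merge of the sorted timelines.
    timelines = [get_timeline(f, limit) for f in followings[user_id]]
    merged = heapq.merge(*timelines, reverse=True)  # inputs are newest first
    return list(islice(merged, limit))
```

`heapq.merge` consumes the timelines lazily, so only the first `limit` entries are actually materialized. Caching the result of `get_timeline` or `get_news_feed` corresponds to the two cache options listed in the notes.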
push model
- news feed table
  - id, ownerId, tweetId, createdAt
- read news feed
  - 1 DB read from news feed table, top k latest
- post a tweet
  - insert into tweet table
  - get followers from friendship table
  - fan out: insert one row per follower into news feed table
- cons:
  - when the follower count is huge (e.g. Lady Gaga), the push may take very long
  - how to mitigate? ->
    - rank followers by weight (e.g. last login time) so that active followers are pushed to first
    - mark as "star" user: a "star" user's tweets are not pushed to the news feed table; followers pull from the "star" user's timeline instead
    - merge results from news feed table and "star" user timelines
- pros:
  - get followers and fan out can be async
  - 1 DB read
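A rough in-memory sketch of the push model with the "star" user fallback described above. All names here are hypothetical, `STAR_FOLLOWER_THRESHOLD` is an arbitrary cutoff kept tiny for demonstration, and the fan-out loop runs inline where a real system would hand it to an async worker.

```python
from collections import defaultdict

STAR_FOLLOWER_THRESHOLD = 2  # hypothetical cutoff, tiny for demonstration

followers = defaultdict(set)   # followee_id -> follower ids (friendship table)
followings = defaultdict(set)  # follower_id -> followee ids
timeline = defaultdict(list)   # user_id -> [(created_at, tweet_id)] (tweet table)
news_feed = defaultdict(list)  # owner_id -> [(created_at, tweet_id)] (news feed table)


def follow(from_user, to_user):
    followers[to_user].add(from_user)
    followings[from_user].add(to_user)


def is_star(user_id):
    return len(followers[user_id]) >= STAR_FOLLOWER_THRESHOLD


def post_tweet(user_id, tweet_id, created_at):
    timeline[user_id].append((created_at, tweet_id))  # insert into tweet table
    if is_star(user_id):
        return  # star users skip fan-out; followers pull at read time
    for follower in followers[user_id]:  # fan out (async in a real system)
        news_feed[follower].append((created_at, tweet_id))


def read_news_feed(user_id, k=100):
    # Pushed entries, plus entries pulled from followed star users' timelines.
    entries = set(news_feed[user_id])
    for followee in followings[user_id]:
        if is_star(followee):
            entries.update(timeline[followee])
    # A set removes duplicates if a user became a star after earlier pushes.
    return [tweet_id for _, tweet_id in sorted(entries, reverse=True)[:k]]
```

For ordinary users the read is a single lookup in `news_feed`; only followers of star users pay the extra pull-and-merge cost.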
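When to use Push?
- limited resources
- want to keep it simple and write less code
- low real-time requirement
- users post infrequently
- bidirectional friendships with no celebrity problem (e.g. WeChat Moments)
When to use Pull?
- ample resources
- high real-time requirement
- users post frequently
- one-way follow relationships with a celebrity problem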
Follow / Unfollow
- Follow: after following a user, merge their timeline into your news feed asynchronously.
- Unfollow: after unfollowing a user, remove their tweets from your news feed asynchronously.
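One way to picture the asynchronous follow / unfollow handling is a job queue that a background worker drains later. The sketch below is an assumption-heavy illustration: the `jobs` deque stands in for a real message queue, and each news feed entry carries an `author_id` field (not in the news feed table listed earlier) so that an unfollow can find the tweets to remove.

```python
from collections import defaultdict, deque

timeline = defaultdict(list)   # user_id -> [(created_at, tweet_id)]
news_feed = defaultdict(list)  # owner_id -> [(created_at, tweet_id, author_id)]
jobs = deque()                 # hypothetical stand-in for a message queue


def merge_timeline_into_feed(owner_id, followee_id):
    existing = set(news_feed[owner_id])
    for created_at, tweet_id in timeline[followee_id]:
        entry = (created_at, tweet_id, followee_id)
        if entry not in existing:
            news_feed[owner_id].append(entry)
    news_feed[owner_id].sort(reverse=True)  # keep newest first


def remove_followee_from_feed(owner_id, followee_id):
    news_feed[owner_id] = [e for e in news_feed[owner_id] if e[2] != followee_id]


def follow(owner_id, followee_id):
    # Return immediately; the feed is updated later by the worker.
    jobs.append((merge_timeline_into_feed, owner_id, followee_id))


def unfollow(owner_id, followee_id):
    jobs.append((remove_followee_from_feed, owner_id, followee_id))


def run_pending_jobs():
    # What a background worker would do: drain the queue in order.
    while jobs:
        func, owner_id, followee_id = jobs.popleft()
        func(owner_id, followee_id)
```

Until `run_pending_jobs()` executes, the news feed is briefly stale, which is the trade-off for keeping the follow and unfollow requests fast.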
Store Likes
- Tweet table
  - id, userId, content, createdAt, likeNums, commentNums, retweetNums
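Storing the counters on the tweet row is a form of denormalization: the count is read directly instead of being computed on every request. A minimal sketch, assuming a separate like table keyed by `(userId, tweetId)` (the `likes` set below is hypothetical and not part of the notes):

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    id: int
    userId: int
    content: str
    createdAt: int
    likeNums: int = 0     # denormalized counters stored on the tweet row
    commentNums: int = 0
    retweetNums: int = 0


likes = set()  # hypothetical like table: {(userId, tweetId)}


def like(tweet, user_id):
    key = (user_id, tweet.id)
    if key in likes:
        return False      # already liked; keep the counter unchanged
    likes.add(key)
    tweet.likeNums += 1   # update the counter together with the like row
    return True


def unlike(tweet, user_id):
    key = (user_id, tweet.id)
    if key not in likes:
        return False
    likes.remove(key)
    tweet.likeNums -= 1
    return True
```

In a real database the like row and the counter update would need to happen in one transaction (or be reconciled periodically) so the two do not drift apart.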