Assumption: 100M DAU

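QPS (register / login / logout) = 100M * 0.1 / 100k = 100
- 0.1 = average number of register / login / logout operations per user per day; 100k ≈ 86,400 seconds per day, rounded up
Peak QPS = 100 * 3 = 300
QPS (lookup) = 100M * 100 / 100k = 100k
- 100 = average number of operations per user per day that involve looking up user info (viewing friends, sending messages, refreshing the feed)
Peak QPS = 100k * 3 = 300k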
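
The arithmetic above can be reproduced with a small script. This is only a back-of-envelope sketch: the function names are made up for illustration, and the peak factor of 3 and the 100k-second day are the same rough assumptions used in the notes.

```python
# Back-of-envelope QPS estimate using the numbers from the notes.
DAU = 100_000_000          # 100M daily active users (stated assumption)
SECONDS_PER_DAY = 100_000  # 86,400 rounded up to keep the arithmetic simple
PEAK_FACTOR = 3            # peak traffic assumed to be 3x the average


def average_qps(ops_per_user_per_day: float) -> float:
    """Average queries per second for one class of operation."""
    return DAU * ops_per_user_per_day / SECONDS_PER_DAY


def peak_qps(ops_per_user_per_day: float) -> float:
    """Peak queries per second, assuming a fixed peak-to-average ratio."""
    return average_qps(ops_per_user_per_day) * PEAK_FACTOR


auth_qps = average_qps(0.1)    # register / login / logout
lookup_qps = average_qps(100)  # user info lookups

print(f"auth:   avg {auth_qps:,.0f} QPS, peak {peak_qps(0.1):,.0f} QPS")
print(f"lookup: avg {lookup_qps:,.0f} QPS, peak {peak_qps(100):,.0f} QPS")
```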

Read-heavy, write-light system -> Cache
Heavy on both reads and writes ->
- Redis, cache-through database (cf. memcached + MySQL, cache-aside)
- add servers to share traffic
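
To make the cache-aside vs cache-through distinction concrete, here is a minimal sketch that uses in-memory dicts as stand-ins for the real stores. It makes no real memcached, MySQL, or Redis calls. `FakeDatabase`, `CacheAsideUserService`, and `CacheThroughStore` are hypothetical names, and the write path shown for cache-aside (write the database, then invalidate the cache) is one common choice, not the only one.

```python
class FakeDatabase:
    """Hypothetical stand-in for a database such as MySQL."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts how often the database is actually read

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def set(self, key, value):
        self.rows[key] = value


class CacheAsideUserService:
    """Cache-aside: the application talks to the cache and the database separately."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # stand-in for memcached

    def get_user(self, user_id):
        if user_id in self.cache:       # cache hit
            return self.cache[user_id]
        user = self.db.get(user_id)     # cache miss: read the database
        if user is not None:
            self.cache[user_id] = user  # populate the cache for later reads
        return user

    def set_user(self, user_id, user):
        self.db.set(user_id, user)      # write the database first
        self.cache.pop(user_id, None)   # then invalidate the stale cache entry


class CacheThroughStore:
    """Cache-through: the application only talks to the cache layer,
    which reads and writes the database on its behalf."""

    def __init__(self, db):
        self._db = db
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            value = self._db.get(key)
            if value is None:
                return None
            self._cache[key] = value
        return self._cache[key]

    def set(self, key, value):
        self._db.set(key, value)  # the store itself forwards the write
        self._cache[key] = value
```

In the cache-aside version the caller owns the consistency logic; in the cache-through version that logic lives inside the store, so the caller sees a single `get` / `set` interface.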

AuthService

FriendshipService

One-way friendship (Twitter, Instagram, Weibo)
- Friendship table: fromUserId, toUserId
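Two-way friendship (WhatsApp, Facebook, WeChat)
- Option 1: store two rows: A follows B, and B follows A
  - Many NoSQL stores generally do not support multiple indexes, so the relationship has to be split into two rows
- Option 2: store one row, but every lookup has to query twice
Friendship operations are very simple, basically all key-value:
- get everyone a given user follows
- get all followers of a given user
- A follows B -> insert one row
- A unfollows B -> delete one row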
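
The operations above can be sketched against an in-memory SQLite database. This is only an illustration: the table names `friendship` and `friend_pair`, the column names `userA` / `userB`, and all function names are assumptions, with only `fromUserId` / `toUserId` taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One-way relationship: one row per "from follows to".
conn.execute(
    "CREATE TABLE friendship ("
    " fromUserId INTEGER NOT NULL,"
    " toUserId INTEGER NOT NULL,"
    " PRIMARY KEY (fromUserId, toUserId))"
)
# Second index so that follower lookups by toUserId are also indexed.
conn.execute("CREATE INDEX idx_friendship_to ON friendship (toUserId)")

# Two-way relationship, option 2: a single row per pair of friends.
conn.execute(
    "CREATE TABLE friend_pair ("
    " userA INTEGER NOT NULL,"
    " userB INTEGER NOT NULL,"
    " PRIMARY KEY (userA, userB))"
)


def follow(a, b):
    conn.execute("INSERT OR IGNORE INTO friendship VALUES (?, ?)", (a, b))


def unfollow(a, b):
    conn.execute(
        "DELETE FROM friendship WHERE fromUserId = ? AND toUserId = ?", (a, b)
    )


def followings(user):
    rows = conn.execute(
        "SELECT toUserId FROM friendship WHERE fromUserId = ? ORDER BY toUserId",
        (user,),
    ).fetchall()
    return [r[0] for r in rows]


def followers(user):
    rows = conn.execute(
        "SELECT fromUserId FROM friendship WHERE toUserId = ? ORDER BY fromUserId",
        (user,),
    ).fetchall()
    return [r[0] for r in rows]


def add_friend_two_rows(a, b):
    """Two-way, option 1: store both directions, then read with one query."""
    follow(a, b)
    follow(b, a)


def add_friend_one_row(a, b):
    """Two-way, option 2: store the pair once, smaller id first."""
    low, high = sorted((a, b))
    conn.execute("INSERT OR IGNORE INTO friend_pair VALUES (?, ?)", (low, high))


def friends_one_row(user):
    """Option 2 needs two queries, one per column."""
    first = conn.execute(
        "SELECT userB FROM friend_pair WHERE userA = ?", (user,)
    ).fetchall()
    second = conn.execute(
        "SELECT userA FROM friend_pair WHERE userB = ?", (user,)
    ).fetchall()
    return sorted(r[0] for r in first + second)
```

With option 1, `followings(user)` already returns the full friend list in a single keyed lookup, at the cost of twice the rows. Option 2 halves the storage but pays with the second query.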

SQL vs NoSQL

need transaction? -> SQL
easy configuration for scaling, such as replication and sharding? -> NoSQL
high performance with few machines? -> NoSQL
- NoSQL supports higher QPS on a single machine (>10x)
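- However, with multiple machines, SQL can be sharded and replicated to improve its performance and availability/reliability. A consistent hashing scheme can even be used for SQL sharding/replication. In other words, when multiple machines are available, column-family (CF) NoSQL and SQL can reach roughly comparable performance.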
Data is highly non-relational (requires no joins or few joins)? -> NoSQL
Data is highly relational (requires lots of joins), or a large number of columns need to be indexed? -> SQL
- Using CF NoSQL here may require heavy de-normalization. Disk is cheap, but too much duplicated data might still exhaust capacity, and every update has to keep the de-normalized copies consistent. (reference: http://www.jiuzhang.com/qa/1836/)

HashKey mod n,
- when n ~ number of machines -> inconsistent hash
- when n is constant, not related to number of machines -> consistent hash
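
A small experiment shows why taking the hash mod the number of machines is called inconsistent. With a uniform hash, a key keeps its machine when going from 3 to 4 machines only if `h mod 3 == h mod 4`, which holds for 3 of every 12 hash values, so about 3/4 of keys are expected to move. The sketch below measures this; the MD5-based `stable_hash` and the `user:<id>` key format are assumptions chosen for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def machine_for(key: str, n_machines: int) -> int:
    """Naive placement: n is tied to the number of machines."""
    return stable_hash(key) % n_machines


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose machine changes when the cluster is resized."""
    moved = sum(1 for k in keys if machine_for(k, old_n) != machine_for(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_n=3, new_n=4)
print(f"{fraction:.0%} of keys change machine when going from 3 to 4 machines")
```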
Hash Ring: n = 0 ~ 2^64-1
1 physical machine -> 1000 virtual nodes
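
A hash ring with virtual nodes can be sketched as follows. This is a minimal illustration under assumptions, not a production implementation: `HashRing`, the `machine#i` virtual-node naming, the `db-1` style machine names, and the use of the first 8 bytes of MD5 as the ring position are all hypothetical choices. Only the ring size (0 to 2^64-1) and the 1000 virtual nodes per machine come from the notes.

```python
import bisect
import hashlib

VIRTUAL_NODES = 1000  # virtual nodes per physical machine, as in the notes


def ring_hash(key: str) -> int:
    """Position on the ring, an integer in [0, 2^64 - 1]."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, virtual_nodes: int = VIRTUAL_NODES):
        self.virtual_nodes = virtual_nodes
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical machine

    def add_machine(self, machine: str) -> None:
        for i in range(self.virtual_nodes):
            point = ring_hash(f"{machine}#{i}")
            if point in self._owner:
                continue  # position already taken (very unlikely); skip it
            self._owner[point] = machine
            bisect.insort(self._points, point)

    def remove_machine(self, machine: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != machine]
        self._owner = {p: m for p, m in self._owner.items() if m != machine}

    def machine_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing()
for name in ("db-1", "db-2", "db-3"):
    ring.add_machine(name)

keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.machine_for(k) for k in keys}
ring.add_machine("db-4")
after = {k: ring.machine_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding db-4")
```

Only keys whose nearest clockwise virtual node now belongs to the new machine move, so the moved share is expected to be close to 1/4 here, and every moved key lands on `db-4`. The many virtual nodes per machine are what spread that share roughly evenly across the existing machines.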

User System
