Distributed File System (Google File System)
File store
- Similar to Registrar Service
- file info (name, attributes, schema, timestamp, size...)
- physical location
==> metadata info
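A minimal sketch of one metadata entry kept on the master, in Python; the field names here are just for illustration, not the real GFS layout:

```python
from dataclasses import dataclass, field

@dataclass
class FileMetadata:
    # file info
    name: str
    size: int          # bytes
    mtime: float       # last-modified timestamp
    # physical location: chunk index -> list of chunk servers holding a replica
    chunk_locations: dict = field(default_factory=dict)
```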
Pattern
- Peer-to-Peer
- pro: no single point of failure
- con: hard to maintain consistency
- Master - Slave
- pro: simple design
- con: single point of failure
how to save a file?
- metadata + block
how to save a large file?
- metadata + chunk
how to save an extra large file?
- Master metadata + Slave chunk servers
- how to save traffic: the master only serves metadata (chunk index -> chunk servers); the chunk data itself flows between client and chunk servers, as sketched below
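A rough sketch of splitting a file into fixed-size chunks on the client side (GFS uses 64 MB chunks); the function name is made up for illustration:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size GFS uses

def split_into_chunks(path):
    """Yield (chunk_index, chunk_bytes) pairs for a local file."""
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            yield index, data
            index += 1
```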
how to read?
- master + chunk server + client: the client asks the master for the chunk locations, then reads the chunk data directly from the chunk servers
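A sketch of the read path, assuming hypothetical `master` and chunk-server stubs with `get_chunk_locations` / `read_chunk` methods; the point is that the master only hands out locations and the bytes come from the chunk servers:

```python
def read_file(master, filename):
    """Ask the master where each chunk lives, then fetch the bytes
    directly from one of the chunk servers that holds it."""
    locations = master.get_chunk_locations(filename)  # {chunk_index: [chunk_server, ...]}
    data = b""
    for index in sorted(locations):
        for server in locations[index]:
            chunk = server.read_chunk(filename, index)
            if chunk is not None:  # first healthy replica wins
                data += chunk
                break
    return data
```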
how to write?
- master + chunk server + client: the client asks the master which chunk servers should hold each chunk, then sends the chunk data to those servers directly
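A matching sketch of the write path under the same hypothetical stubs (`assign_chunk_servers`, `write_chunk`); the chunk data never passes through the master:

```python
def write_file(master, filename, chunks):
    """chunks: iterable of (chunk_index, chunk_bytes) pairs,
    e.g. produced by split_into_chunks above."""
    for index, data in chunks:
        # the master only picks the chunk servers (e.g. 3 replicas);
        # the client then ships the bytes to them itself
        servers = master.assign_chunk_servers(filename, index)
        for server in servers:
            server.write_chunk(filename, index, data)
```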
master fail?
- restart
- recover from / switch to replica / backup
- double master
- multi master -> Paxos algorithm
how to determine if a chunk on the machine is broken?
- checksum
- when write: write the file together with its checksum
- when read: read the file and re-calculate the checksum; if it does not match the stored checksum -> the chunk is broken
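A small sketch of chunk-level checksums using Python's built-in zlib.crc32; the file layout and names are assumptions (real GFS checksums 64 KB blocks inside each chunk):

```python
import zlib

def store_chunk(path, data):
    """Write the chunk and its checksum side by side."""
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".crc", "w") as f:
        f.write(str(zlib.crc32(data)))

def load_chunk(path):
    """Re-compute the checksum on every read; a mismatch means the chunk is broken."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path + ".crc") as f:
        expected = int(f.read())
    if zlib.crc32(data) != expected:
        raise IOError("chunk %s is broken (checksum mismatch)" % path)
    return data
```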
how to avoid data loss when chunk server is down?
- replica * 3
how to recover when a chunk is broken?
- ask the Master which servers hold replicas of the broken chunk, then copy a healthy replica from one of them
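A sketch of that repair path under the same hypothetical stubs: the server that found the bad checksum asks the master for the other replicas and copies a healthy one over:

```python
def repair_chunk(master, local_server, filename, index):
    """Replace a broken local copy with a healthy replica from another server."""
    for server in master.get_chunk_locations(filename)[index]:
        if server is local_server:
            continue  # skip the broken local copy
        data = server.read_chunk(filename, index)
        if data is not None:
            local_server.write_chunk(filename, index, data)
            return True
    return False  # no healthy replica found
```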
how to find if a chunk server is down?
- heartbeat
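A sketch of heartbeat tracking on the master side; the timeout value and class name are assumptions:

```python
import time

HEARTBEAT_TIMEOUT = 30  # seconds of silence before a server counts as down (assumed value)

class HeartbeatTracker:
    def __init__(self):
        self.last_seen = {}  # chunk server id -> time of last heartbeat

    def on_heartbeat(self, server_id):
        """Chunk servers call this periodically."""
        self.last_seen[server_id] = time.time()

    def down_servers(self):
        """Servers silent for too long are treated as down;
        their chunks should be re-replicated elsewhere."""
        now = time.time()
        return [s for s, t in self.last_seen.items()
                if now - t > HEARTBEAT_TIMEOUT]
```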
hot spot?
- make more replicas of this chunk so reads spread across more chunk servers
// the client splits the file into n pieces, each piece gets its own chunk index
// if a write fails, let the client retry; keep the server logic simple, do not over-complicate it
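A sketch of the "client retries, server stays simple" note above; the retry count is an arbitrary choice:

```python
def write_chunk_with_retry(server, filename, index, data, retries=3):
    """The chunk server just writes or raises; the client owns the retry loop."""
    for _ in range(retries):
        try:
            server.write_chunk(filename, index, data)
            return True
        except IOError:
            continue  # simple client-side retry; no retry logic on the server
    return False
```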