Distributed File System (Google File System)

File store
  • Similar to a Registrar Service
  • file info (name, attributes, schema, timestamp, size...)
  • physical location

==> metadata info
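
A minimal sketch of such a metadata record (the field names and types are assumptions for illustration, not the actual GFS schema):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FileMetadata:
    # Hypothetical fields; a real master tracks more (permissions, versions, ...).
    name: str                                   # file name
    size: int                                   # total size in bytes
    created_at: float                           # timestamp (epoch seconds)
    # chunk index -> chunk servers holding a replica (the physical location)
    chunk_locations: Dict[int, List[str]] = field(default_factory=dict)
```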

Pattern
  • Peer-to-Peer
    • pro: no single point of failure
    • con: hard to maintain consistency
  • Master - Slave
    • pro: simple design
    • con: single point of failure

how to save a file?

  • metadata + block

how to save a large file?

  • metadata + chunk

how to save an extra large file?

  • Master metadata + Slave chunk servers
  • how to save traffic: the master keeps only metadata; chunk data moves directly between clients and chunk servers (see the chunk-splitting sketch below)
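
A rough sketch of the chunk-splitting step on the client (GFS uses 64 MB chunks; the function name here is an assumption):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size used by GFS

def split_into_chunks(path: str):
    """Yield (chunk_index, chunk_bytes) pairs for a local file."""
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            yield index, data
            index += 1
```

The master only records which chunk servers hold each chunk index; the chunk bytes themselves never pass through the master.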

how to read?

  • client asks the master for chunk locations (metadata only), then reads chunk data directly from the chunk servers (sketch below)
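
A sketch of the read path, assuming hypothetical RPC stubs (master.get_chunk_locations, connect, read_chunk) that are not a real GFS API: the client asks the master only for metadata, then pulls the data directly from chunk servers.

```python
def read_file(master, connect, filename: str) -> bytes:
    """Read a file by fetching each chunk directly from a chunk server.

    `master` and `connect` are hypothetical stubs: the master returns
    {chunk_index: [chunk_server_address, ...]}, and connect(address)
    returns an RPC client for that chunk server.
    """
    parts = []
    locations = master.get_chunk_locations(filename)
    for index in sorted(locations):
        # Pick any replica (a real client might prefer the closest one).
        server = connect(locations[index][0])
        parts.append(server.read_chunk(filename, index))
    return b"".join(parts)
```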

how to write?

  • client asks the master which chunk servers to write each chunk to, then sends the data directly to those chunk servers; on error the client retries (sketch below)
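
A sketch of the write path under the same assumptions, reusing split_into_chunks from the sketch above: the master only decides placement, the client pushes chunk data straight to the chunk servers, and the client retries a failed chunk write so the servers can stay simple. This is not the real GFS write pipeline, just an illustration of the flow.

```python
def write_file(master, connect, filename: str, path: str,
               max_retries: int = 3) -> None:
    """Write a file chunk by chunk; retry failed chunk writes on the client side.

    `master.assign_chunk_servers` and `write_chunk` are hypothetical RPC stubs.
    """
    for index, data in split_into_chunks(path):
        # Master only decides placement; data goes straight to chunk servers.
        replicas = master.assign_chunk_servers(filename, index)
        for attempt in range(max_retries):
            try:
                for address in replicas:
                    connect(address).write_chunk(filename, index, data)
                break
            except IOError:
                if attempt == max_retries - 1:
                    raise  # give up; the client owns the retry logic
```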

master fail?

  • restart
  • recover from / switch to replica / backup
  • double master
  • multi master: -> Paxos algorithm

how to determine if a chunk on the machine is broken?

  • checksum
  • when write: write the file (chunk) together with its checksum
  • when read: read the file and re-calculate the checksum; if it does not match the stored checksum -> the chunk is broken
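
A small checksum sketch using CRC32 (GFS actually checksums 64 KB blocks inside each chunk; one checksum per chunk is used here for brevity, and the function names are assumptions):

```python
import zlib
from typing import Tuple

def store_chunk(data: bytes) -> Tuple[bytes, int]:
    """On write: keep the chunk data together with its checksum."""
    return data, zlib.crc32(data)

def verify_chunk(data: bytes, stored_checksum: int) -> bool:
    """On read: recompute the checksum; a mismatch means the chunk is broken."""
    return zlib.crc32(data) == stored_checksum
```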

how to avoid data loss when chunk server is down?

  • keep 3 replicas of each chunk (replica * 3)
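
One simple way the master might place the 3 replicas of a new chunk (random placement is an assumption; a real master also considers disk usage and rack locality):

```python
import random

REPLICATION_FACTOR = 3

def pick_replica_servers(chunk_servers: list) -> list:
    """Choose 3 distinct chunk servers to hold copies of a new chunk."""
    return random.sample(chunk_servers, k=REPLICATION_FACTOR)
```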

how to recover when a chunk is broken?

  • ask the Master which servers hold a replica of the broken chunk, then copy it back from one of them (sketch below)
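
A sketch of the repair step, reusing verify_chunk from the checksum sketch above; every master and chunk-server call here is a hypothetical stub:

```python
def repair_chunk(master, connect, local_store, filename: str, index: int) -> None:
    """Replace a corrupted local chunk with a healthy replica."""
    expected = master.get_checksum(filename, index)           # hypothetical RPC
    # The master knows which other servers hold a replica of this chunk.
    for address in master.get_replica_servers(filename, index):
        data = connect(address).read_chunk(filename, index)
        if verify_chunk(data, expected):
            local_store.write_chunk(filename, index, data)
            return
    raise RuntimeError("no healthy replica found for chunk %d" % index)
```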

how to find if a chunk server is down?

  • heartbeat
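
A minimal master-side heartbeat tracker (the 10-second timeout is an assumed value): chunk servers ping periodically, and any server silent for longer than the timeout is treated as down.

```python
import time

class HeartbeatMonitor:
    def __init__(self, timeout_seconds: float = 10.0):
        self.timeout = timeout_seconds
        self.last_seen = {}            # chunk server address -> last ping time

    def ping(self, server: str) -> None:
        """Called whenever a heartbeat arrives from a chunk server."""
        self.last_seen[server] = time.time()

    def dead_servers(self) -> list:
        """Servers whose last heartbeat is older than the timeout."""
        now = time.time()
        return [s for s, t in self.last_seen.items() if now - t > self.timeout]
```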

hot spot?

  • make more replicas of this chunk (and spread reads across them)
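
A sketch of hot-spot handling: count reads per chunk and request one extra replica once a chunk crosses a threshold, so reads spread over more servers (the threshold and master.add_replica are assumptions):

```python
from collections import Counter

READ_THRESHOLD = 1000          # assumed threshold; tune for the real workload
read_counts = Counter()

def on_chunk_read(master, filename: str, index: int) -> None:
    """Track reads and ask for one more replica when a chunk becomes hot."""
    read_counts[(filename, index)] += 1
    if read_counts[(filename, index)] == READ_THRESHOLD:
        master.add_replica(filename, index)    # hypothetical master RPC
```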

// The client splits a file into n chunks; each chunk gets a chunk index.

// If a write fails, have the client retry; the server should handle only simple logic rather than becoming complicated.
