My other computer is a datacentre - 2012 edition


An updated version of the "my other computer is a datacentre" talk, presented at a Bristol University HPC seminar.

Because it is targeted at universities, it emphasises some of the interesting problems: the classic CS ones of scheduling, new ones of availability and failure handling within what is now a single computer, and emergent problems of power and heterogeneity. It also includes references, all of which are worth reading; being mostly Google and Microsoft papers, they are free to download without needing ACM or IEEE library access.

Comments welcome.

Published in: Technology, Education

My other computer is a datacentre - 2012 edition

  1. Julio & Steve: Our other computer is a datacentre
  2. Their other computer is a datacentre
  3. That sorts a terabyte in 82s
  4. Their other computer is a datacentre
  5. Facebook: 45 petabytes
  6. Why?
  7. Big Data
     • Petabytes of "unstructured" data
     • Datamining: analysing past events
     • Prediction, inference, clustering
     • Graph analysis
     • Facebook-scale NoSQL databases
  8. Big Data vs HPC
     • Big Data: petabytes vs HPC: petaflops
     • Storage of low-value data vs storage for checkpointing
     • H/W failure common vs surprised by H/W failure
     • Code: frequency, graphs, machine-learning, rendering vs code: simulation, rendering
     • Less persistent data, ingress & egress vs ingress/egress problems
     • Dense storage of data vs dense compute
     • Mix CPU and data vs CPU + GPU
     • Spindle:core ratio vs bandwidth to other servers
  9. Architectural Issues
     • Failure is inevitable – design for it.
     • Bandwidth is finite – be topology aware.
     • SSD: expensive, low seek times, best written to sequentially.
     • HDD: less $, more W; awful random access – stripe data across disks, read/write in bulk.
  10. Coping with Failure
     • Avoid SPOFs
     • Replicate data
     • Restart work
     • Redeploy applications onto live machines
     • Route traffic to live front ends
     • Decoupled connection to back end: queues, scatter/gather
     Failure-tolerance must be designed in
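The tactics on this slide can be sketched without any real cluster framework. A minimal Python illustration of "restart work" and "route around dead nodes" — the worker names and the `run_with_failover` helper are invented for this sketch, not anything from the talk:

```python
import random

def run_with_failover(task, workers, max_attempts=3):
    """Run `task(worker)` on a live worker, retrying elsewhere on failure.

    Sketch only: ConnectionError stands in for a node failure.
    """
    last_error = None
    for attempt in range(max_attempts):
        worker = random.choice(workers)
        try:
            return task(worker)
        except ConnectionError as e:
            last_error = e
            # drop the dead node; if none are left, retry the full set
            workers = [w for w in workers if w != worker] or workers
    raise RuntimeError("all attempts failed") from last_error
```

Real systems layer this with replicated data and decoupled queues, so a restarted task can find its input and its output channel on whatever node it lands on.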
  11. Scale
     • Goal: linear performance scaling
     • Hide the problems from (most) developers
     • Design applications to scale across thousands of servers, tens of thousands of cores
     • Massive parallelisation, minimal communication
     Scalability must be designed in
  12. Algorithms and Frameworks
     • MapReduce – Hadoop, CouchDB, (Dryad)
     • BSP – Pregel, Giraph, Hama
     • Column tables – Cassandra, HBase, BigTable
     • Location-aware filesystem – GFS, HDFS
     • State service – Chubby, ZooKeeper, Anubis
     • Scatter/gather – search engines
     • (MPI)
  13. MapReduce: Hadoop
  14. Bath Bluetooth Dataset
     gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17
     gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20
     gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:21
     gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:23
     gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:24
     • 2006-2009
     • Multiple sites
     • 10GB data
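Each record in the sample is a comma-separated sighting: gate ID, hashed device address, an empty field, date, and time. A small Python parser for that format — a sketch; the `parse_event` name is invented here and is not the talk's Groovy `EventParser`:

```python
from datetime import datetime

def parse_event(line):
    """Split one sighting record into (gate, device hash, timestamp)."""
    gate, device, _, day, time = line.strip().split(",")
    when = datetime.strptime(day + " " + time, "%Y-%m-%d %H:%M:%S")
    return gate, device, when

gate, device, when = parse_event(
    "gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17")
# when.weekday() == 0: this sighting was on a Monday
```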
  15. Map to device ID
     class DeviceMapper extends Mapper {
       def parser = new EventParser()
       def one = new IntWritable(1)
       def map(LongWritable k, Text v, Mapper.Context ctx) {
         def event = parser.parse(v)
         ctx.write(event.device, one)
       }
     }
     (gate, device, day, hour) → (deviceID, 1)
  16. Reduce to device count
     class CountReducer2 extends Reducer {
       def iw = new IntWritable()
       def reduce(Text k, Iterable values, Reducer.Context ctx) {
         def sum = values.collect() { it.get() }.sum()
         iw.set(sum)
         ctx.write(k, iw)
       }
     }
     (device, [count1, count2, ...]) → (deviceID, count)
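What this mapper/reducer pair computes can be simulated in plain Python, without Hadoop. A sketch, not the talk's code: the first two records are sample lines from the dataset slide, and the third repeats a device synthetically so that aggregation is visible:

```python
from collections import Counter

RECORDS = [
    "gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17",
    "gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20",
    # repeat of the first device, added here only to show counts summing
    "gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:21",
]

# map phase: each record -> (deviceID, 1)
pairs = [(line.split(",")[1], 1) for line in RECORDS]

# shuffle + reduce phase: sum the 1s for each deviceID
counts = Counter()
for device, one in pairs:
    counts[device] += one

# counts["b46cca4d3f5f313176e50a0e38e7fde3"] == 2
```

Hadoop does exactly this, except the map and reduce phases run on different machines and the shuffle moves each device's 1s to the reducer that owns that key.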
  17. Hadoop running MapReduce
  18. HDFS Filesystem
     • Commodity disks
     • Scatter data across machines
     • Replicate data across racks
     • Trickle-feed backup to other sites
     • Archive unused files
     • In idle time: check health of data, rebalance
     • Report data location for placement of work
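"Replicate data across racks" is the key availability trick: HDFS puts the first replica near the writer and the remaining replicas on a different rack, so losing a whole rack never loses every copy of a block. A much-simplified sketch — rack and node names are made up, and real HDFS placement is more involved:

```python
def place_replicas(racks, local_rack, replication=3):
    """Pick target nodes for a block's replicas (HDFS-style sketch).

    Replica 1 goes on the writer's rack; the rest go on one
    remote rack, so a single rack failure cannot lose the block.
    """
    targets = [racks[local_rack][0]]               # replica 1: local rack
    remote = next(r for r in racks if r != local_rack)
    targets += racks[remote][:replication - 1]     # replicas 2..n: remote rack
    return targets

racks = {"rack1": ["r1n1", "r1n2"], "rack2": ["r2n1", "r2n2"]}
# place_replicas(racks, "rack1") -> ["r1n1", "r2n1", "r2n2"]
```

The same placement map is what "report data location for placement of work" exposes: the scheduler asks which nodes hold a block and tries to run the map task on one of them.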
  19. Results
     0072adec0c1699c0af152c3cdb6c018e 2128
     0120df42306097c70384501ebbdd888c 243
     01541fef30e606ce88f8b0e931f010d2 5
     0161257b1b0b8d1884975dd7b62f4387 15
     01ad97908c53712e58894bc7009f5aa0 22
     0225a2b080a4ac8f18344edd6108c46c 3
     0276aba603a2aead55fe67bc48839cec 9
     027973a027d85ad4dd4a15efa5142204 1
     02e73779c77fcd4e9f90a193c4f3e7ff 3953
     02e9a7bef5ba4c1caf5f35e8ada226ed 2
     ...
  20. Map: day of week; reduce: count
     [bar chart: sightings per day of week, y-axis 0 to 14000]
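The weekday histogram is the same pipeline with a different map function: emit (day-of-week, 1) instead of (deviceID, 1), and the reducer again sums. Locally, over the two sample records from the dataset slide — a sketch, not the full 10GB run:

```python
from collections import Counter
from datetime import datetime

RECORDS = [
    "gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17",
    "gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20",
]

# map: record -> (weekday, 1); reduce: sum per weekday
by_day = Counter(
    datetime.strptime(line.split(",")[3], "%Y-%m-%d").weekday()
    for line in RECORDS)

# by_day[0] == 2: both sample sightings fell on a Monday (weekday 0)
```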
  21. Other Questions
     • Peak hours for devices
     • Predictability of device
     • Time to cross gates/transit city
     • Routes they take
     • Which devices are often co-sighted?
     • When do they stop being co-sighted?
     • Clustering: resident, commuter, student, tourist
  22. Hardware Challenges
     • Energy Proportional Computing [Barroso07]
     • Energy Proportional Datacenter Networks [Abts10]
     • Dark Silicon and the End of Multicore Scaling [Esmaeilzadeh10]
     • Scaling of CPU, storage, network
  23. CS Problems
     • Scheduling
     • Placement of data and work
     • Graph theory
     • Machine learning
     • Heterogeneous parallelism
     • Algorithms that run on large, unreliable clusters
     • Dealing with availability
  24. New problem: availability
     • Availability in Globally Distributed Storage Systems [Ford10]
     • Failure Trends in a Large Disk Drive Population [Pinheiro07]
     • DRAM Errors in the Wild [Schroeder09]
     • Characterizing Cloud Computing Hardware Reliability [Vishwanath10]
     • Understanding Network Failures in Data Centers [Gill11]
  25. What will your datacentre do?