Process data on distributed systems

Presentation for Data Analytics Day at Microsoft

Published in: Software

  1. https://en.wikipedia.org/wiki/Five_Ws
  2. https://en.wikipedia.org/wiki/Big_data#Characteristics
  3. Does the data fit on a single machine? Can the data be processed fast enough on a single machine?
  4. Designing Distributed Systems
  5. Designing Distributed Systems
  6. MapReduce: Simplified Data Processing on Large Clusters
  7. Hadoop: The Definitive Guide
  8. Can we separate tasks into a work queue? Does the data need to be aggregated/merged?
  9. Spark: The Definitive Guide
  10. Do you have the resources to tune, debug, and profile the processing pipeline? Do you need a response in seconds?
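
The questions on slide 8 correspond to the two halves of the MapReduce model named on slide 6: map tasks are independent and can be handed out like items in a work queue, while the reduce step covers the case where results must be aggregated or merged. The following is a minimal single-process sketch of that idea in Python, using word count as the example. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions and are not taken from the slides, Hadoop, or Spark.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split.

    Each call is independent of the others, so calls could be
    distributed to workers through a work queue.
    """
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate all values emitted for a single key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in one process (illustration only)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

If a job needs no aggregation, only the map step remains and a plain work queue may be enough. The shuffle and reduce steps are where a framework such as Hadoop or Spark would move data between machines.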
