Introduction to Hadoop, HBase, and NoSQL

  1. Nick Dimiduk (@xefyr), Founder, Drawn to Scale, nick@drawntoscalehq.com, April 28, 2010
  2. Agenda: what NoSQL is not | motivation | Hadoop | HBase
  3. whoami: Computer Science & Engineering at Ohio State (Artificial Intelligence, Programming Languages, Systems Engineering). Applied Technical Systems: hierarchical, non-relational data storage and analysis systems (no-sql before there was NoSQL); information retrieval; wire serialization/RPC (before there was Thrift/Avro); data visualization (GBs). Visible Technologies: social media storage, processing, and analytics; monitoring, engagement, warehousing, and BI (TBs). Drawn to Scale: big data storage, processing, retrieval, and analytics (TBs, PBs).
  4. Agenda, next up: what NoSQL is not
  5. What NoSQL is not: movement
  6. What NoSQL is not: movement (no ANSI NoSQL-2010) | one-size-fits-all
  7. It’s not anti-RDBMS
  8. It’s about choice! (photo: http://www.flickr.com/photos/zakh/337938459/)
  9. What NoSQL is not: movement (no ANSI NoSQL-2010) | one-size-fits-all (it’s about choice) | silver bullet
  10. What NoSQL is not: movement (no ANSI NoSQL-2010) | one-size-fits-all (it’s about choice) | silver bullet (guarantees are hard)
  11. Agenda, next up: motivation
  12. motivation: more, More, MORE Data!
  13. motivation: more, More, MORE Data! | ACID Burns
  14. motivation: more, More, MORE Data! | ACID Burns | BASE is good enough
  15. motivation: more, More, MORE Data! | ACID Burns | BASE is good enough | Life’s too short
  17. “typical” application
  18. “typical” application (diagram: Data Server, Village People, App Server)
  19. growing pains (diagram: Data Server, Villages of People, App Servers)
  20. vertical partitioning (diagram: two sets of Data Server, Villages of People, and App Servers)
  21. vertical partitioning (diagram: four sets of Data Server, Villages of People, and App Servers)
  22. vertical partitioning (diagram: two sets of Data Server, Villages of People, and App Servers)
  23. “typical” application
  24. growing pains (diagram: Data Servers, Villages of People, App Servers)
  25. horizontal partitioning (diagram: Villages of People)
  27. horizontal partitioning (diagram: Villages of People, Data Layer, Application Layer)
  28. Agenda, next up: Hadoop
  29. “open source, reliable, distributed computing”
  31. MapReduce: API for parallel computing
  32. MapReduce: API for parallel computing | HDFS: distributed, replicated file system
  33. MapReduce: API for parallel computing | HDFS: distributed, replicated file system | ZooKeeper: distributed synchronization
  34. MapReduce: API for parallel computing | HDFS: distributed, replicated file system | ZooKeeper: distributed synchronization | Avro: data serialization/RPC
  35. Agenda, next up: HBase
  36. structured, distributed database for your horizontally scalable FS
  38. random access
  39. random access | real-time reads/writes
  40. random access | real-time reads/writes | simple API
  41. random access | real-time reads/writes | simple API | big table
  42. References:
      - http://www.nosql-database.org
      - Eventually Consistent: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
      - Soft State: http://mercury.lcs.mit.edu/~jnc/tech/hard_soft.html
      - Accuracy and Precision: http://en.wikipedia.org/wiki/Accuracy_and_precision
      - Compare and Swap: http://en.wikipedia.org/wiki/Compare-and-swap
      - Apache Hadoop: http://hadoop.apache.org
      - Google MapReduce: http://labs.google.com/papers/mapreduce.html
      - Google FS: http://labs.google.com/papers/gfs.html
      - Apache Thrift: http://incubator.apache.org/thrift/
      - Protobuf: http://code.google.com/p/protobuf/
      - Google BigTable: http://labs.google.com/papers/bigtable.html
      - Google Chubby: http://labs.google.com/papers/chubby.html
  43. Questions? Nick Dimiduk (@xefyr), Founder, Drawn to Scale, nick@drawntoscalehq.com, April 28, 2010
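The horizontal-partitioning slides (25 to 27) replace one data server per application with a single data layer spread across machines. Below is a minimal Python sketch of that idea, assuming range partitioning on a sorted row key (the scheme HBase uses for its regions; hash partitioning is a common alternative). `RangePartitionedStore` and its methods are hypothetical illustrations, and each in-memory dict merely stands in for one data server.

```python
import bisect


class RangePartitionedStore:
    """Toy horizontally partitioned key-value store (hypothetical, for illustration).

    Partition i holds keys k with split_points[i-1] <= k < split_points[i].
    """

    def __init__(self, split_points):
        # Split points must be sorted; n split points yield n + 1 partitions.
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        # Each dict stands in for one data server in the data layer.
        self.partitions = [dict() for _ in range(len(self.split_points) + 1)]

    def partition_for(self, key):
        # bisect_right counts the split points <= key, which is exactly
        # the index of the partition that owns the key.
        return bisect.bisect_right(self.split_points, key)

    def put(self, key, value):
        self.partitions[self.partition_for(key)][key] = value

    def get(self, key, default=None):
        return self.partitions[self.partition_for(key)].get(key, default)

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order.

        Only partitions whose ranges can overlap [start, stop) are visited.
        """
        first = self.partition_for(start)
        last = self.partition_for(stop)
        results = []
        for part in self.partitions[first:last + 1]:
            results.extend((k, v) for k, v in part.items() if start <= k < stop)
        return sorted(results)


if __name__ == "__main__":
    store = RangePartitionedStore(["g", "p"])  # partitions: [.., g), [g, p), [p, ..)
    for user in ["alice", "henry", "zoe", "mallory"]:
        store.put(user, len(user))
    print(store.partition_for("henry"))  # 1
    print(store.scan("a", "n"))          # [('alice', 5), ('henry', 5), ('mallory', 7)]
```

Because neighbouring keys share a partition, a range scan touches only the partitions that overlap the request. Hash partitioning tends to spread point lookups more evenly, but a range scan would then have to ask every partition.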

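Slide 31 introduces MapReduce as an API for parallel computing: the programmer supplies a map function that turns each input record into intermediate key/value pairs and a reduce function that combines all values sharing a key. The single-process Python sketch below imitates that contract for the classic word-count example. It is not Hadoop's API (Hadoop's Java API is built around `Mapper` and `Reducer` classes); `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and a thread pool stands in for a cluster of map tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_fn(line):
    # Map: emit (word, 1) for every word in one input record (a line of text).
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    # Reduce: combine all intermediate values that share a key.
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer, workers=4):
    """Toy, single-machine imitation of the MapReduce flow (hypothetical helper)."""
    # Map phase: mapper calls are independent, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(chain.from_iterable(pool.map(mapper, records)))
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: one reducer call per key, in sorted key order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_mapreduce(["the quick brown fox", "The lazy dog"], map_fn, reduce_fn))
    # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In Hadoop the framework, not the programmer, splits the input across machines (typically reading from HDFS), moves intermediate pairs between nodes during the shuffle, and reruns failed tasks; this toy keeps only the map, shuffle, and reduce phases.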
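Slides 36 to 41 describe HBase as a structured, distributed database with random access, real-time reads and writes, a simple API, and a BigTable-style data model. The BigTable paper (see the references slide) describes that model as a sparse, distributed, persistent, multidimensional sorted map indexed by row key, column key, and timestamp. The single-process Python sketch below models only that sorted map, plus a compare-and-swap write in the spirit of the Compare and Swap reference (HBase offers a similar atomic check-and-put on a single row). `ToyBigTable`, its methods, and the logical timestamps are hypothetical; they do not reproduce HBase's Java client (which is built around `Get`, `Put`, and `Scan` operations), its storage format, or its distribution.

```python
import bisect
import itertools


class ToyBigTable:
    """In-memory sketch of the BigTable/HBase data model (hypothetical, not HBase code).

    Conceptually a sorted map: row key -> "family:qualifier" -> timestamp -> value.
    """

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._rows = {}                   # row -> {column -> [(ts, value), ...] newest first}
        self._sorted_keys = []            # row keys kept sorted for range scans
        self._clock = itertools.count(1)  # logical timestamps stand in for wall-clock time

    def put(self, row, column, value):
        if row not in self._rows:
            self._rows[row] = {}
            bisect.insort(self._sorted_keys, row)
        cells = self._rows[row].setdefault(column, [])
        cells.insert(0, (next(self._clock), value))
        del cells[self.max_versions:]  # keep only the newest versions

    def get(self, row, column, versions=1):
        """Random read: up to `versions` values of one cell, newest first."""
        cells = self._rows.get(row, {}).get(column, [])
        return [value for _, value in cells[:versions]]

    def scan(self, start_row, stop_row):
        """Yield (row, {column: newest value}) for start_row <= row < stop_row."""
        lo = bisect.bisect_left(self._sorted_keys, start_row)
        hi = bisect.bisect_left(self._sorted_keys, stop_row)
        for row in self._sorted_keys[lo:hi]:
            yield row, {col: cells[0][1] for col, cells in self._rows[row].items()}

    def check_and_put(self, row, column, expected, new_value):
        """Compare-and-swap on one cell: write only if the newest value equals `expected`.

        Pass expected=None to require that the cell does not exist yet. This toy is
        single-threaded; a real store must make the check and the write one atomic step.
        """
        current = self.get(row, column)
        if (current[0] if current else None) != expected:
            return False
        self.put(row, column, new_value)
        return True


if __name__ == "__main__":
    table = ToyBigTable(max_versions=2)
    for version in ["v1", "v2", "v3"]:
        table.put("com.example/a", "info:title", version)
    print(table.get("com.example/a", "info:title", versions=5))      # ['v3', 'v2']
    print(table.check_and_put("com.example/a", "info:title", "v3", "v4"))  # True
```

Keeping row keys sorted is what makes range scans cheap, which is also why the BigTable paper's examples store web pages under reversed domain names: related rows end up adjacent.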