Introduction to Hadoop, HBase, and NoSQL


  • I’m Not an RDBMS Guy!
  • squish the FUD
  • no central point of organization
    no committee or standardizing body
    no plan/strategy/illuminati to take down the RDBMS; lots of "in-fighting"
  • central tenet - there IS NO one-size-fits-all
    unlike RDBMS assumptions, each engineering effort must be evaluated for data needs

  • is it “anti-RDBMS”?
  • not so much

  • will not magically solve all your data or performance problems
    applications won’t magically stop crashing or corrupting data
    Big Data is still hard. These tools make it possible/affordable/approachable

  • data persistence comes down to guarantees
  • why are we here?
  • "web scale"
    more users, content, connections
    more trends, insight, knowledge

  • Atomicity: fault-tolerance is moving to the application layer - smaller atomic units
    Consistency: yes! but not necessarily immediate - "availability" (latency, reads) is more important.
    Isolation: smaller atomic units (multi-step transaction vs. compare-and-swap), greater availability, denormalization => reduced dependency on isolation
    Durability: some things are more important than getting every last detail, e.g. latency of response, view in aggregate
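
The "smaller atomic units" point can be made concrete with a compare-and-swap sketch. This is a minimal illustration, not any particular store's API: the `Cell` class, its `compare_and_swap` method, and the `increment` helper are hypothetical names, and a lock stands in for whatever atomicity the data layer provides on a single value. The retry loop is where the application takes on the conflict handling that a multi-step transaction would otherwise do.

```python
import threading


class Cell:
    """A single value guarded by a compare-and-swap primitive (hypothetical helper)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for the store's single-value atomicity

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Write `new` only if the current value still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(cell):
    """Retry until our write wins; the application handles the conflict."""
    while True:
        current = cell.read()
        if cell.compare_and_swap(current, current + 1):
            return current + 1


def worker(cell, times):
    for _ in range(times):
        increment(cell)


counter = Cell(0)
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_count = counter.read()
```

No update is lost even though no lock is held across the read and the write; a writer that loses the race simply reads again and retries.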

  • Basically Available: is the data layer up or not? are we serving content to our users or not?
    Soft State: shifting burden of "correctness" up to application layer. availability is more important than precision. accuracy (correct) vs. precision (repeatable).
    Eventual Consistency: all operations are recorded and ordered. played back as resources permit.
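
One way to picture "recorded and ordered, played back as resources permit" is an append-only log that replicas replay at their own pace. The sketch below is an assumption-laden toy: `Replica`, `catch_up`, `write`, and the `budget` parameter are invented for illustration and do not correspond to a specific system.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries already played back

    def catch_up(self, log, budget):
        """Apply at most `budget` pending operations, in log order."""
        for key, value in log[self.applied:self.applied + budget]:
            self.data[key] = value
            self.applied += 1


log = []  # the ordered record of every write, as (key, value) pairs
replicas = [Replica(), Replica()]


def write(key, value):
    # The write is accepted immediately; replicas see it later.
    log.append((key, value))


write("user:1", "alice")
write("user:2", "bob")
write("user:1", "alicia")

replicas[0].catch_up(log, budget=3)   # this replica had spare resources
replicas[1].catch_up(log, budget=1)   # this one is lagging
stale = replicas[1].data.get("user:1")

replicas[1].catch_up(log, budget=10)  # later, it replays the rest
```

Between the two `catch_up` calls on the lagging replica, a reader can observe the older value; once the log has been replayed in full, both replicas agree.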

  • agile dev moves too fast for schema and constraints - this isn’t waterfall
    data models change quickly
    up-front schema modeling is akin to waterfall development - not always practical/feasible/possible
    data is messy - record what you have and leave constraints up to the application
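
"Record what you have and leave constraints up to the application" might look like the following sketch. The in-memory `store` list, `save`, and `valid_email_records` are hypothetical stand-ins; the point is only that the data layer accepts records of any shape while the application decides what counts as usable.

```python
store = []  # stands in for a schemaless store


def save(record):
    """The data layer accepts whatever fields the record happens to have."""
    store.append(dict(record))


def valid_email_records(records):
    """An application-level constraint, applied when the data is read."""
    return [r for r in records if "@" in r.get("email", "")]


save({"name": "Ada", "email": "ada@example.com"})
save({"name": "Bob"})  # older record, written before email was collected
save({"name": "Cy", "email": "cy@example.com", "twitter": "@cy"})  # newer field

usable = valid_email_records(store)
```

Adding the `twitter` field required no migration, and the record without an email was still stored; only the code that needs an email filters it out.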

  • at scale, data services look like a DHT anyway!
    isolated independent services
    introduced caching layers
    partitioned data by logical and range boundaries.
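
Partitioning by logical and range boundaries amounts to a two-step lookup, which is roughly what a DHT does for you. In this sketch the service names, shard names, and range boundaries are all made up for illustration.

```python
import bisect

# Hypothetical layout: one logical partition per service, each split by key range.
RANGE_STARTS = ["a", "h", "p"]  # shard i holds keys >= RANGE_STARTS[i]
SHARDS = {
    "profiles": ["profiles-0", "profiles-1", "profiles-2"],
    "messages": ["messages-0", "messages-1", "messages-2"],
}


def locate(service, key):
    """Route a key to a shard: logical boundary first, then range boundary."""
    index = bisect.bisect_right(RANGE_STARTS, key.lower()) - 1
    index = max(index, 0)  # keys sorting before "a" fall into the first shard
    return SHARDS[service][index]
```

Every client that needs data has to carry this routing table, which is the configuration burden the later notes describe.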

  • webapp

  • app servers/session self-contained - load-balanced
    data’s in one spot - what do you do?

  • 37signals approach - DHH: “scaling is a good thing because scaling => users => $$$”
  • more users, more instances. easy!
  • doesn’t work for social applications:
    - users cannot interact
    - old MMOs vs. new social games

  • redesign data server as “data services”
    separate independent logical components
  • knowing each service by name becomes “vexing”

  • configuration/logistical nightmare!

  • abstractions!
    wouldn’t it be nice if...

  • Distributed Computing Made Less Hard (not "Easy")

  • programming model/API for parallel computing
    Google's MapReduce paper
  • replicated, high throughput, fairly UNIX-y (not POSIX).
    Google FS Paper
  • Distributed Group Services - coordination, synchronization, configuration, naming.
    Google Chubby Paper
  • efficient, cross-language messaging
    Facebook/Apache Thrift
    Google Protobufs
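
The MapReduce programming model mentioned above can be sketched in a few lines of single-process Python. This is not the Hadoop API: `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are illustrative names, and a real job would run the map and reduce steps on many machines with the framework doing the shuffle.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key."""
    return key, sum(values)


def run_job(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(["big data is hard", "Big Data is still hard"])
```

Because each `map_phase` call sees one record and each `reduce_phase` call sees one key, both steps can be spread across machines without the author writing any coordination code.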

  • Google BigTable
  • Addresses limitations of raw M/R and HDFS access
  • request by key vs. HDFS sequential reads
  • low-latency, ms response times vs. high-latency M/R
  • row/column concepts
    DHT semantics
    Java, REST, Thrift
  • Billions of rows, millions of columns
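
The row/column concepts and request-by-key access can be pictured as a sorted, sparse map. The toy class below is only a mental model, not the HBase client API; `TinyTable`, its methods, and the `family:qualifier` column names are assumptions made for the example.

```python
import bisect


class TinyTable:
    """Toy sorted, sparse row/column map; not the HBase client API."""

    def __init__(self):
        self._keys = []  # row keys kept sorted so range scans are cheap
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        """Random access by key."""
        return dict(self._rows.get(row, {}))

    def scan(self, start, stop):
        """Rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, dict(self._rows[k])) for k in self._keys[lo:hi]]


table = TinyTable()
table.put("user:alice", "info:email", "alice@example.com")
table.put("user:bob", "info:email", "bob@example.com")
table.put("user:bob", "stats:logins", 42)
table.put("video:1", "info:title", "intro")

users = table.scan("user:", "user;")  # ";" sorts just after ":" in ASCII
```

Rows only store the columns they actually have, which is what makes "millions of columns" workable: an absent column costs nothing.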


  • Transcript

    • 1. Nick Dimiduk - @xefyr Founder, Drawn to Scale nick@drawntoscalehq.com April 28, 2010
    • 2. Agenda what NoSQL is not motivation Hadoop HBase
    • 3. whoami
      Computer Science & Engineering at Ohio State: Artificial Intelligence, Programming Languages, Systems Engineering
      Applied Technical Systems: Hierarchical, non-relational data storage and analysis systems (no-sql before there was NoSQL). Information Retrieval, Wire Serialization/RPC (before there was Thrift/Avro), Data Visualization (GBs)
      Visible Technologies: Social Media Storage, Processing, Analytics. Monitoring, Engagement, Warehousing, and BI. (TBs)
      Drawn to Scale: Big Data Storage, Processing, Retrieval, Analytics (TBs, PBs)
    • 4. Agenda what NoSQL is not motivation Hadoop HBase
    • 5. What NoSQL is not. movement
    • 6. What NoSQL is not. movement - no ANSI NoSQL-2010 one-size-fits-all
    • 7. It’s not Anti-RDBMS
    • 8. It’s about Choice! http://www.flickr.com/photos/zakh/337938459/
    • 9. What NoSQL is not. movement - no ANSI NoSQL-2010 one-size-fits-all - it’s about choice silver bullet
    • 10. What NoSQL is not. movement - no ANSI NoSQL-2010 one-size-fits-all - it’s about choice silver bullet - guarantees are hard
    • 11. Agenda what NoSQL is not motivation Hadoop HBase
    • 12. motivation more, More, MORE Data!
    • 13. motivation more, More, MORE Data! ACID Burns
    • 14. motivation more, More, MORE Data! ACID Burns BASE is good enough
    • 15. motivation more, More, MORE Data! ACID Burns BASE is good enough Life’s too short
    • 16. motivation more, More, MORE Data! ACID Burns BASE is good enough Life’s too short
    • 17. “typical” application
    • 18. “typical” application Data Server Village People App Server
    • 19. growing pains Data Server Villages of People App Servers
    • 20. vertical partitioning Data Server Villages of People App Servers Data Server Villages of People App Servers
    • 21. vertical partitioning Data Server Villages of People Data Server Villages of People App Servers App Servers Data Server Villages of People Data Server Villages of People App Servers App Servers
    • 22. vertical partitioning Data Server Villages of People App Servers Data Server Villages of People App Servers
    • 23. “typical” application
    • 24. growing pains Data Servers Villages of People App Servers
    • 25. horizontal partitioning Villages of People
    • 26. horizontal partitioning Villages of People
    • 27. horizontal partitioning Villages of People Data Layer Application Layer
    • 28. Agenda what NoSQL is not motivation Hadoop HBase
    • 29. “open source, reliable, distributed computing”
    • 30. “open source, reliable, distributed computing”
    • 31. MapReduce - API for parallel computing
    • 32. MapReduce - API for parallel computing HDFS - distributed, replicated file system
    • 33. MapReduce - API for parallel computing HDFS - distributed, replicated file system ZooKeeper - distributed synchronization
    • 34. MapReduce - API for parallel computing HDFS - distributed, replicated file system ZooKeeper - distributed synchronization Avro - Data Serialization / RPC
    • 35. Agenda what NoSQL is not motivation Hadoop HBase
    • 36. structured, distributed database for your horizontally scalable FS
    • 37. structured, distributed database for your horizontally scalable FS
    • 38. random access
    • 39. random access real-time reads/writes
    • 40. random access real-time reads/writes simple API
    • 41. random access real-time reads/writes simple API big table
    • 42. references
      http://www.nosql-database.org
      Eventually Consistent: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
      Soft State: http://mercury.lcs.mit.edu/~jnc/tech/hard_soft.html
      Accuracy and Precision: http://en.wikipedia.org/wiki/Accuracy_and_precision
      Compare and Swap: http://en.wikipedia.org/wiki/Compare-and-swap
      Apache Hadoop: http://hadoop.apache.org
      Google MapReduce: http://labs.google.com/papers/mapreduce.html
      Google FS: http://labs.google.com/papers/gfs.html
      Apache Thrift: http://incubator.apache.org/thrift/
      Protobuf: http://code.google.com/p/protobuf/
      Google BigTable: http://labs.google.com/papers/bigtable.html
      Google Chubby: http://labs.google.com/papers/chubby.html
    • 43. Questions? Nick Dimiduk - @xefyr Founder, Drawn to Scale nick@drawntoscalehq.com April 28, 2010
