This document summarizes a presentation on NoSQL databases given by Nick Dimiduk. It opens with an introduction to the speaker and his background, then covers what NoSQL is not, the motivations for NoSQL databases, an overview of Hadoop and its components, and a description of HBase as a structured, distributed database built on Hadoop.
3. whoami
- Computer Science & Engineering at Ohio State: Artificial Intelligence, Programming Languages, Systems Engineering
- Applied Technical Systems: hierarchical, non-relational data storage and analysis systems (no-sql before there was NoSQL); Information Retrieval, Wire Serialization/RPC (before there was Thrift/Avro), Data Visualization (GB's)
- Visible Technologies: Social Media Storage, Processing, Analytics; Monitoring, Engagement, Warehousing, and BI (TB's)
- Drawn to Scale: Big Data Storage, Processing, Retrieval, Analytics (TB's, PB's)
22. vertical partitioning
[diagram: load-balanced app servers, each routed to a data server holding one partition ("village") of people]
no central point of organization
no committee or standardizing body
no plan/strategy/illuminati to take down the RDBMS; lots of "in-fighting"
central tenet - there IS NO one-size-fits-all
unlike the RDBMS's one-size-fits-all assumption, each engineering effort must evaluate its own data needs
is it “anti-RDBMS”?
not so much
will not magically solve all your data or performance problems
applications won't magically stop crashing or corrupting data
Big Data is still hard. These tools make it possible/affordable/approachable
data persistence comes down to guarantees
why are we here?
"web scale"
more users, content, connections
more trends, insight, knowledge
Atomicity: fault-tolerance is moving to the application layer - smaller atomic units
Consistency: yes! but not necessarily immediate - "availability" (latency, reads) is more important.
Isolation: smaller atomic units (multi-step transaction vs. compare-and-swap), greater availability, denormalization => reduced dependency on isolation
Durability: some things are more important than getting every last detail, e.g. latency of response, view in aggregate
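The "smaller atomic units" idea can be made concrete with a toy compare-and-swap cell. This is purely illustrative: the `Cell` class and its lock-backed CAS are my own sketch, not any HBase API, though HBase exposes the same pattern through operations along the lines of checkAndPut.

```python
import threading

class Cell:
    """A single value supporting compare-and-swap: the smallest atomic unit."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for storage-level atomicity

    def compare_and_swap(self, expected, new):
        """Atomically replace the value with `new` only if it still equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    @property
    def value(self):
        return self._value

def increment(cell):
    """No multi-step transaction: retry a single atomic CAS until it lands."""
    while True:
        current = cell.value
        if cell.compare_and_swap(current, current + 1):
            return

counter = Cell(0)
threads = [threading.Thread(target=lambda: [increment(counter) for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter.value == 4000  # correct under concurrency, no transaction needed
```

Four concurrent writers reach the correct total with nothing bigger than a single-value atomic operation, which is why smaller atomic units reduce the need for cross-row isolation.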
Basically Available: is the data layer up or not? are we serving content to our users or not?
Soft State: shifting burden of "correctness" up to application layer. availability is more important than precision. accuracy (correct) vs. precision (repeatable).
Eventual Consistency: all operations are recorded and ordered. played back as resources permit.
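That last point can be sketched in a few lines, assuming a single globally ordered operation log (the `Replica` class and the log format are invented for illustration): replicas replay the same log at their own pace and converge once caught up.

```python
class Replica:
    """A replica that catches up by replaying a shared, ordered operation log."""
    def __init__(self):
        self.state = {}
        self.applied = 0  # index of the next log entry to apply

    def catch_up(self, log):
        """Replay outstanding operations as resources permit."""
        for op, key, value in log[self.applied:]:
            if op == "put":
                self.state[key] = value
            else:  # "delete"
                self.state.pop(key, None)
        self.applied = len(log)

log = [("put", "user:1", "alice"), ("put", "user:2", "bob")]
a, b = Replica(), Replica()
a.catch_up(log)                        # a is up to date
log.append(("delete", "user:2", None))
# b has applied nothing yet: reads from b are stale, but the system stays available
b.catch_up(log)
a.catch_up(log)
assert a.state == b.state == {"user:1": "alice"}
```

In between catch-ups the replicas disagree (soft state); consistency is eventual, not immediate.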
agile dev moves too fast for schema and constraints - this isn’t waterfall
data models change quickly
up-front schema modeling is akin to waterfall development - not always practical/feasible/possible
data is messy - record what you have and leave constraints up to the application
at scale, data services look like a DHT anyway!
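A DHT assigns each key to a node by hashing; here is a minimal consistent-hashing sketch (node names are hypothetical, and real DHTs add virtual nodes and replication on top of this):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal DHT-style ring: a key lives on the first node at or after its hash."""
    def __init__(self, nodes):
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # wrap around the ring if the key hashes past the last node
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["data-server-1", "data-server-2", "data-server-3"])
node = ring.node_for("user:42")        # deterministic for a fixed node set
assert node == ring.node_for("user:42")
```

The payoff over naive `hash(key) % N` is that adding or removing one node only remaps the keys adjacent to it on the ring, not nearly everything.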
isolated independent services
introduced caching layers
partitioned data by logical and range boundaries.
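Range partitioning can be sketched with a sorted list of split keys, in the spirit of how BigTable/HBase split a table into regions (the split points and helper below are invented for illustration):

```python
import bisect

# Split boundaries define half-open key ranges, one partition per range:
# partition 0: (-inf, "g"), 1: ["g", "n"), 2: ["n", "t"), 3: ["t", +inf)
SPLITS = ["g", "n", "t"]

def partition_for(key):
    """Route a row key to the partition whose range contains it."""
    return bisect.bisect_right(SPLITS, key)

assert partition_for("alice") == 0
assert partition_for("mallory") == 1
assert partition_for("zed") == 3
```

Because each partition holds a contiguous key range, range scans stay local to one partition, which hash partitioning cannot offer.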
webapp
app servers/session self-contained - load-balanced
data’s in one spot - what do you do?
37-signals approach - DHH “scaling is a good thing because scaling => users => $$$”
more users, more instances. easy!
doesn’t work for social applications:
- users cannot interact
- old MMO’s vs. new social games
redesign data server as “data services”
separate independent logical components
knowing each service by name becomes “vexing”
configuration/logistical nightmare!
abstractions!
wouldn’t it be nice if...
Distributed Computing Made Easy Less Hard
programming model/API for parallel computing
Google's MapReduce paper
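The programming model reduces to two user-supplied functions plus a framework-provided shuffle; a single-process word-count sketch of the model (no parallelism or fault tolerance, just the shape of the API):

```python
from collections import defaultdict

def map_fn(doc):
    """map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Group values by key -- the step the framework performs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(word, counts):
    """reduce: collapse all values for one key into a single result."""
    return word, sum(counts)

docs = ["hello hadoop", "hello world"]
pairs = [p for d in docs for p in map_fn(d)]
result = dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())
assert result == {"hello": 2, "hadoop": 1, "world": 1}
```

The framework's value is that map calls and reduce calls are independent, so it can fan them out across machines and retry failures without the programmer writing any coordination code.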
replicated, high throughput, fairly UNIX-y (not POSIX).
Google FS Paper
Distributed Group Services - coordination, synchronization, configuration, naming.
Google Chubby Paper
efficient, cross-language messaging
Facebook/Apache Thrift
Google Protobufs
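The wire-format idea behind both can be sketched as length-prefixed, field-tagged binary records. This is a deliberately toy format, not Thrift's or Protobuf's actual encoding:

```python
import struct

HEADER = ">HI"  # big-endian: 2-byte field id, 4-byte payload length

def encode_field(field_id, payload):
    """Tag the payload with a field id and its byte length, then append the bytes."""
    return struct.pack(HEADER, field_id, len(payload)) + payload

def decode_field(buf):
    """Read the header back, then slice out exactly `length` payload bytes."""
    field_id, length = struct.unpack_from(HEADER, buf)
    start = struct.calcsize(HEADER)
    return field_id, buf[start:start + length]

msg = encode_field(1, "hello".encode("utf-8"))
assert decode_field(msg) == (1, b"hello")
```

Because the header has a fixed layout and fields are identified by number rather than name, any language with a struct/byte library can read the stream, which is what makes such formats cross-language.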
Google BigTable
Addresses limitations of raw M/R and HDFS access:
request by key vs. HDFS sequential reads
low-latency (ms response times) vs. high-latency M/R
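The key-lookup advantage comes from keeping rows sorted by key, so a point read is a binary search rather than a full scan; a sketch, with plain Python lists standing in for a region's sorted store files:

```python
import bisect

# Rows kept sorted by key, as in a sorted on-disk store file.
rows = [("row-%05d" % i, i) for i in range(100_000)]
keys = [k for k, _ in rows]

def get(key):
    """Point lookup: binary search the sorted keys instead of scanning them all."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[i][1]
    return None  # key absent

assert get("row-04242") == 4242
assert get("no-such-row") is None
```

A sequential M/R scan touches all 100,000 rows to answer the same question; the sorted layout answers it in ~17 comparisons, which is the difference between batch latency and millisecond response times.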