I’m Not an RDBMS Guy!
squish the FUD
no central point of organization; no committee or standardizing body; no plan/strategy/illuminati to take down the RDBMS; lots of "in-fighting"
central tenet - there IS NO one-size-fits-all. unlike RDBMS assumptions, each engineering effort must be evaluated for its data needs
is it “anti-RDBMS”?
not so much
will not magically solve all your data or performance problems. applications won’t magically stop crashing, corrupting data, etc. Big Data is still hard. These tools make it possible/affordable/approachable
data persistence comes down to guarantees
why are we here?
"web scale": more users, content, connections; more trends, insight, knowledge
Atomicity: fault-tolerance is moving to the application layer - smaller atomic units
Consistency: yes! but not necessarily immediate - "availability" (latency, reads) is more important
Isolation: smaller atomic units (multi-step transaction vs. compare-and-swap), greater availability, denormalization => reduced dependency on isolation
Durability: some things are more important than getting every last detail, e.g. latency of response, view in aggregate
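The "multi-step transaction vs. compare-and-swap" point can be sketched in a few lines. This is only an illustration, not how any particular store implements it: the `Cell` class and `increment` helper are hypothetical names, and a lock stands in for whatever atomicity the store or hardware provides.

```python
import threading


class Cell:
    """A single value with an atomic compare-and-swap (hypothetical helper)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for the store's own atomicity

    def get(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Set the value to `new` only if it still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(cell):
    """Retry loop: read, compute, swap only if nobody wrote in between."""
    while True:
        seen = cell.get()
        if cell.compare_and_swap(seen, seen + 1):
            return seen + 1


counter = Cell(0)


def worker():
    for _ in range(100):
        increment(counter)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The atomic unit here is one value, not a multi-row transaction. A writer that loses the race simply retries, which is the kind of fault handling that moves up into the application.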
Basically Available: is the data layer up or not? are we serving content to our users or not?
Soft State: shifting burden of "correctness" up to application layer. availability is more important than precision. accuracy (correct) vs. precision (repeatable).
Eventual Consistency: all operations are recorded and ordered, then played back as resources permit.
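A toy model of "recorded and ordered, played back as resources permit", assuming a single shared, ordered log and replicas that apply it at their own pace. `Replica`, `write`, and the `budget` parameter are made-up names for illustration; real systems differ a lot in how the log is ordered and shipped.

```python
class Replica:
    """Holds a copy of the data and remembers how far into the log it has read."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries already played back

    def catch_up(self, log, budget):
        """Apply at most `budget` pending operations, in log order."""
        pending = log[self.applied:self.applied + budget]
        for key, value in pending:
            self.state[key] = value
            self.applied += 1


log = []  # ordered record of every write


def write(key, value):
    log.append((key, value))


a, b = Replica(), Replica()
write("user:1", "alice")
write("user:1", "alicia")
write("user:2", "bob")

a.catch_up(log, budget=3)   # a has the resources to apply everything
b.catch_up(log, budget=1)   # b lags and still serves the older value
stale = b.state["user:1"]
b.catch_up(log, budget=10)  # later, b drains the rest of the log
```

While `b` lags it stays available and answers with a stale value. Once it has played back the whole log, both replicas agree.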
agile dev moves too fast for schema and constraints - this isn’t waterfall. data models change quickly. up-front schema modeling is akin to waterfall development - not always practical/feasible/possible. data is messy - record what you have and leave constraints up to the application
at scale, data services look like a DHT anyway! isolated, independent services. introduced caching layers. partitioned data by logical and range boundaries.
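A rough sketch of what routing looks like once data is split by logical and range boundaries, plus the hash-style placement a DHT would use. The service names, split points, and function names below are all invented for illustration.

```python
import bisect
import hashlib

# Logical boundaries: each kind of data lives in its own service (hypothetical names).
SERVICES = {
    "users": ["users-0", "users-1", "users-2"],
    "messages": ["messages-0", "messages-1", "messages-2"],
}

# Range boundaries: node 0 owns keys below "h", node 1 owns "h" up to "p",
# node 2 owns everything from "p" on.
SPLITS = ["h", "p"]


def route_by_range(kind, key):
    """Pick the service by kind, then the node by where the key sorts."""
    nodes = SERVICES[kind]
    return nodes[bisect.bisect_right(SPLITS, key)]


def route_by_hash(kind, key):
    """DHT-style alternative: hash the key and take it modulo the node count."""
    nodes = SERVICES[kind]
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]
```

For example, `route_by_range("users", "alice")` lands on `users-0` and `route_by_range("users", "mallory")` on `users-1`. Range placement keeps neighbouring keys together, while hash placement spreads them evenly at the cost of ordered scans.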
webapp
app servers/session are self-contained - load-balanced. data’s in one spot - what do you do?
37-signals approach - DHH “scaling is a good thing because scaling => users => $$$”
more users, more instances. easy!
doesn’t work for social applications - users cannot interact. old MMOs vs. new social games
redesign data server as “data services” separate independent logical components
knowing each service by name becomes “vexing”
configuration/logistical nightmare!
abstractions! wouldn’t it be nice if...
Distributed Computing Made Easy Less Hard
programming model/API for parallel computing. Google's MapReduce paper
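The programming model in miniature, run in a single process. This is a teaching sketch of the map, shuffle, and reduce steps, not the Hadoop API; every function name below is made up for the example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper over every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Run the reducer once per key over that key's values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    return sum(counts)


lines = ["big data is hard", "big data is still hard"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
```

The user writes only `word_mapper` and `count_reducer`. Because each mapper call and each reducer call is independent, a framework is free to spread them across many machines.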
replicated, high throughput, fairly UNIX-y (not POSIX). Google FS Paper
Distributed Group Services - coordination, synchronization, configuration, naming. Google Chubby Paper
efficient, cross-language messaging. Facebook/Apache Thrift, Google Protobufs
Google BigTable
Addresses limitations of raw M/R, HDFS access
request by key vs. HDFS sequential reads
low-latency, ms response times vs. M/R high latency
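The difference between sequential reads and request-by-key can be shown with plain lists. This is an in-memory analogy only, assuming rows kept sorted by key; it says nothing about how HBase or HDFS store data on disk, and the function names are invented.

```python
import bisect

# Rows kept sorted by key, the way a sorted table would hold them.
rows = sorted((f"row-{i:05d}", i * i) for i in range(10000))
keys = [key for key, _ in rows]


def scan_lookup(key):
    """Sequential read: walk every row until the key turns up."""
    for k, v in rows:
        if k == key:
            return v
    return None


def keyed_lookup(key):
    """Request by key: binary search over the sorted keys."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[i][1]
    return None
```

Both return the same answer, but the scan touches up to every row while the keyed lookup touches roughly log2(n) of them. That gap is what separates a batch-style read from a millisecond response.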
Introduction to Hadoop, HBase, and NoSQL
Nick Dimiduk - @xefyr
Founder, Drawn to Scale
nick@drawntoscalehq.com
April 28, 2010
Agenda
what NoSQL is not
motivation
Hadoop
HBase
whoami
Computer Science & Engineering at Ohio State: Artificial Intelligence, Programming Languages, Systems Engineering
Applied Technical Systems: Hierarchical, non-relational data storage and analysis systems (no-sql before there was NoSQL). Information Retrieval, Wire Serialization/RPC (before there was Thrift/Avro), Data Visualization (GBs)
Visible Technologies: Social Media Storage, Processing, Analytics. Monitoring, Engagement, Warehousing, and BI. (TBs)
Drawn to Scale: Big Data Storage, Processing, Retrieval, Analytics (TBs, PBs)
Agenda
what NoSQL is not
motivation
Hadoop
HBase
What NoSQL is not.
movement - no ANSI NoSQL-2010
one-size-fits-all
It’s not Anti-RDBMS
It’s about Choice!
http://www.flickr.com/photos/zakh/337938459/
What NoSQL is not.
movement - no ANSI NoSQL-2010
one-size-fits-all - it’s about choice
silver bullet - guarantees are hard
Agenda
what NoSQL is not
motivation
Hadoop
HBase
motivation
more, More, MORE Data!
ACID Burns
BASE is good enough
Life’s too short
“typical” application
[diagram labels: Village People, App Server, Data Server]
growing pains
[diagram labels: Villages of People, App Servers, Data Server]
vertical partitioning
[diagram: the Villages of People / App Servers / Data Server group repeated side by side]
“typical” application
growing pains
[diagram labels: Villages of People, App Servers, Data Servers]
horizontal partitioning
[diagram labels: Villages of People, Application Layer, Data Layer]
Agenda
what NoSQL is not
motivation
Hadoop
HBase
“open source, reliable, distributed
computing”
MapReduce - API for parallel computing
HDFS - distributed, replicated file system
ZooKeeper - distributed synchronization
Avro - Data Serialization / RPC
Agenda
what NoSQL is not
motivation
Hadoop
HBase
structured, distributed database for your
horizontally scalable FS
random access
real-time reads/writes
simple API
big table
references
NoSQL: http://www.nosql-database.org
Eventually Consistent: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
Soft State: http://mercury.lcs.mit.edu/~jnc/tech/hard_soft.html
Accuracy and Precision: http://en.wikipedia.org/wiki/Accuracy_and_precision
Compare and Swap: http://en.wikipedia.org/wiki/Compare-and-swap
Apache Hadoop: http://hadoop.apache.org
Google MapReduce: http://labs.google.com/papers/mapreduce.html
Google FS: http://labs.google.com/papers/gfs.html
Apache Thrift: http://incubator.apache.org/thrift/
Protobuf: http://code.google.com/p/protobuf/
Google BigTable: http://labs.google.com/papers/bigtable.html
Google Chubby: http://labs.google.com/papers/chubby.html
Questions?