NoSQL intro for YaJUG / NoSQL UG Luxembourg

Presentation Transcript

  • NOSQL. BIG DATA. NEW DATABASES. Stuff. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • Hey!
  • I won’t do a demo.
  • I’m @stevenn or stevenn@outerthought.org
  • Houston, we have a problem.
  • We’re drowning.
  • Drowning in a Sea of Data.
  • Mountains of Metadata.
  • The firehose of UGC.
  • Still, we can’t make much sense of it.
  • ... and we throw a lot of it away.
  • We regard content as cost.
  • But data is an opportunity.
  • Think about it.
  • advertisements
  • recommendations
  • anything that sells
  • profile harvesting
  • The future is for data nerds.
  • Nerdy enough?
  • This is what Big Data is about: new insights, new business.
  • But first, some history.
  • How did NOSQL happen? 1. standardization 2. simplification [Diagram labels: hierarchical databases, IMS, XMLDB, RDBMS, OODBMS]
  • How did NOSQL happen? 3. pain 4. rethinking the problem [Diagram labels: RDBMS, NOSQL, caching, denormalisation, sharding, replication, ...]
  • Numbers of scale http://qos.doubleclick.net/counters/
  • Numbers of scale
    » Twitter does 12 M tweet displays
    » ... per second.
  • Types of scaling
    » scaling for usage
      » volume of users
      » volume of data
    » scaling types of ops
      » concurrent read
      » concurrent write
    [Diagram labels: availability, partitioning, replication, consistency, distributed systems]
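As an illustration of the partitioning and replication labels on this slide, here is a minimal sketch, not taken from the talk. It assumes a fixed number of shards and a simple "primary plus next shards in ring order" replica placement; the function names `shard_for` and `replicas_for` are hypothetical, and real stores use more elaborate schemes.

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (Python's built-in hash() is randomized per process)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(key: str, num_shards: int, replication_factor: int) -> List[int]:
    """Return the primary shard followed by the next shards in ring order that hold copies."""
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(replication_factor)]


if __name__ == "__main__":
    for user in ["alice", "bob", "carol"]:
        print(user, replicas_for(user, num_shards=8, replication_factor=3))
```

Partitioning spreads the volume of data and users over nodes, while replication adds read capacity and availability; keeping the copies consistent is where the distributed-systems difficulty starts.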
  • ... and distributed systems are HARD.
  • 8 fallacies of distributed computing (Peter Deutsch and James Gosling)
    » The network is reliable.
    » Latency is zero.
    » Bandwidth is infinite.
    » The network is secure.
    » Topology doesn't change.
    » There is one administrator.
    » Transport cost is zero.
    » The network is homogeneous.
  • Data.
  • Trend 1: Data size. Exabytes (10^18 bytes) of data stored per year. Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that. [Bar chart: 2006: 161, 2007: 253, 2008: 397, 2009: 623, 2010: 988. Data source: IDC 2007]
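To make the growth on this chart concrete, the sketch below computes year-over-year and compound annual growth. It assumes the five bar values on the slide (161, 253, 397, 623, 988 exabytes) belong to 2006 through 2010 in ascending order; that mapping is read off the chart residue and is not stated explicitly in the transcript. The function names are hypothetical.

```python
# Exabytes stored per year, assuming the chart's bars map to 2006..2010 in order.
exabytes_per_year = {2006: 161, 2007: 253, 2008: 397, 2009: 623, 2010: 988}


def yearly_growth(series):
    """Fractional growth from each year to the next, keyed by the later year."""
    years = sorted(series)
    return {later: series[later] / series[earlier] - 1
            for earlier, later in zip(years, years[1:])}


def compound_annual_growth(series):
    """Constant yearly rate that would turn the first value into the last."""
    years = sorted(series)
    first, last = years[0], years[-1]
    return (series[last] / series[first]) ** (1 / (last - first)) - 1


if __name__ == "__main__":
    for year, rate in yearly_growth(exabytes_per_year).items():
        print(year, f"{rate:.1%}")
    print("CAGR", f"{compound_annual_growth(exabytes_per_year):.1%}")
```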
  • Trend 2: Connectedness. Over time data has evolved to be more and more interlinked and connected. Hypertext has links, blogs have pingback, tagging groups all related data. [Chart: information connectivity over time, 1990 to 2020. web 1.0: text documents, hypertext; web 2.0: RSS, blogs, wikis, user-generated content, tagging, folksonomies; “web 3.0”: RDF, ontologies, Giant Global Graph (GGG)]
  • Trend 3: Semi-structure
    » Individualization of content
      » In the salary lists of the 1970s, all elements had exactly one job
      » In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
    » All encompassing “entire world views”
      » Store more data about each entity
    » Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)
  • Trend 4: Architecture. 1980s: Mainframe applications. [Diagram: one Application on one DB]
  • Trend 4: Architecture. 1990s: Database as integration hub. [Diagram: three Applications sharing one DB]
  • Trend 4: Architecture. 2000s: (moving towards) Decoupled services with their own backend. [Diagram: three Applications, each with its own DB]
  • Trend 4: Architecture. 2000s: (moving towards) Decoupled services with their own backend. [Diagram: three Applications, each with its own DB; the DBs are labeled DATA TIER]
  • For years, we tried to squeeze data into a one-size-fits-all container.
  • Also, the cost perspective
  • NOSQL, the Data Liberation Front (or: Polyglot Persistency)
  • Cambrian Explosion
  • Cambrian Explosion N-O-SQL
  • The NOSQL footprint [Diagram labels: free-structured or sparse data; highly scalable and available; simple operational constraints (complexity); ACID, SQL, referential integrity, typed data. Stores shown: MongoDB, CouchDB, neo4j, Cassandra, HBase]
  • Other axes of classification
    » Data Model
    » Consistency
    » Atomic test-and-set
    » Secondary indexes
    » Manageability
    » Latency vs. Durability
    » Read vs. Write Performance
    » Dynamic Scaling
    » Auto failover
    » Compression Support
    » Range Scanning
    » Failure Scenarios
  • Data Model
    » Key/Value
    » Document
    » Row Stores with Column Families
    » Graphs
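The four data models listed above can be contrasted by storing the same small record in each. The sketch below uses plain Python structures only; the record (a user with two jobs), the key formats and the helper names are all made up for illustration and do not reflect any particular product's API.

```python
import json

# Key/value: the store sees an opaque value behind a key; the client parses it.
key_value = {"user:42": json.dumps({"name": "Ann", "jobs": ["editor", "dj"]})}

# Document: the store understands the nested structure and could query inside it.
documents = [{"_id": 42, "name": "Ann", "jobs": [{"title": "editor"}, {"title": "dj"}]}]

# Row store with column families: row key -> family -> qualifier -> value.
# Rows are sparse: user:43 simply has no "jobs" family.
rows = {
    "user:42": {"info": {"name": "Ann"}, "jobs": {"job1": "editor", "job2": "dj"}},
    "user:43": {"info": {"name": "Bob"}},
}

# Graph: nodes plus typed edges between them.
nodes = {
    "user:42": {"name": "Ann"},
    "job:editor": {"title": "editor"},
    "job:dj": {"title": "dj"},
}
edges = [("user:42", "HAS_JOB", "job:editor"), ("user:42", "HAS_JOB", "job:dj")]


def jobs_key_value(key):
    return set(json.loads(key_value[key])["jobs"])


def jobs_document(doc_id):
    doc = next(d for d in documents if d["_id"] == doc_id)
    return {job["title"] for job in doc["jobs"]}


def jobs_rows(row_key):
    return set(rows[row_key].get("jobs", {}).values())


def jobs_graph(node_id):
    return {nodes[dst]["title"] for src, rel, dst in edges
            if src == node_id and rel == "HAS_JOB"}
```

All four helpers return the same set of jobs; what differs is how much of the structure the store itself can see, and therefore index or traverse.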
  • Other axes of classification (source: http://huanliu.wordpress.com/2011/01/21/dimensions-to-use-to-compare-nosql-data-stores/)
  • Hire a good consultant. (or become one, like Xebia, SFEIR, Cloudera ...)
  • Data Processing
  • Map + Reduce
  • Hadoop: HDFS + MapReduce
    » single filesystem + single execution-space
  • Processing large datasets with MR
    » Benefit from parallelisation
    » Less modelling upfront (ad-hoc processing)
    » Compartmentalized approach reduces operational risks (aka robustness)
    » AsterData et al. have SQL/MR hybrids for huge-scale BI
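The map and reduce steps named on these slides can be sketched in a few lines. This is a single-process word count in plain Python that only illustrates the programming model; it is not Hadoop code, and the function names are hypothetical. In a real cluster the map and reduce calls would run in parallel on different nodes, with the framework doing the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "new data"]))
```

Because each map call sees only one document and each reduce call sees only one key, the work splits cleanly across machines, which is the parallelisation benefit the slide refers to.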
  • LILY
  • Cloud-scale content storage & search
  • LILY +
  • Lily
    » provides scalable storage
    » and scalable search
    » with a fault-tolerant, distributed architecture
    » automated index maintenance
    » versioning, rich data types, Java+REST API
    » based on HBase (NOSQL) and SOLR (Lucene)
  • Choosing a NoSQL store for Lily: step I
    » automatic scaling to large data sets
    » fault-tolerance
    » flexible datamodel with sparse data
    » commodity hardware
    » efficient random access
    » community-based open source
    » Java if possible
  • Choosing a NoSQL store for Lily: step II
    » need for consistency
    » atomic single-row updates
    » M/R for index regeneration
  • Choosing a NoSQL store for Lily: step III: HBase
    » datamodel with column families and cell versioning
    » ordered tables with range scans
    » HDFS for blob storage
    » Apache
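The HBase properties listed in step III (column families, cell versioning, ordered tables with range scans) can be illustrated with a toy in-memory table. This is a sketch of the data model only, under the assumption that rows are kept sorted by key and each cell keeps a few timestamped versions; `TinyTable` and its methods are hypothetical and are not the HBase client API.

```python
import bisect


class TinyTable:
    """Toy ordered table: row key -> (family, qualifier) -> versions, newest first."""

    def __init__(self, max_versions=3):
        self._keys = []   # row keys, kept sorted so range scans are cheap
        self._rows = {}   # row key -> {(family, qualifier): [(timestamp, value), ...]}
        self._max_versions = max_versions

    def put(self, row, family, qualifier, value, timestamp):
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        cells = self._rows[row].setdefault((family, qualifier), [])
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)
        del cells[self._max_versions:]   # drop versions beyond the limit

    def get(self, row, family, qualifier, versions=1):
        cells = self._rows.get(row, {}).get((family, qualifier), [])
        return [value for _, value in cells[:versions]]

    def scan(self, start, stop):
        """Return row keys in [start, stop) in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return self._keys[lo:hi]


if __name__ == "__main__":
    table = TinyTable()
    table.put("doc:002", "data", "title", "draft", timestamp=1)
    table.put("doc:002", "data", "title", "final", timestamp=2)
    table.put("doc:001", "data", "title", "hello", timestamp=1)
    table.put("user:1", "info", "name", "Ann", timestamp=1)
    print(table.get("doc:002", "data", "title", versions=2))
    print(table.scan("doc:", "doc;"))
```

Since rows are ordered by key, a well-chosen key prefix such as `doc:` turns "all documents" into one contiguous scan, which is why keyspace design matters so much in this kind of store.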
  • Lily
    » scales to infinity, and beyond
    » open source
      » Apache license (no strings attached)
      » Java and REST API
    » www.lilyproject.org
    » subscription- and partnership-based business model
  • Lily simplified architecture [Diagram labels: query, update, indexer and M/R clients; Lily Server / Lily store nodes with WAL / MQ and secondary indexes; HBase Region Server holding documents and indexes; Hadoop DFS replicas; SOLR inverted index replicas behind REST; distributed process coordination and configuration (ZooKeeper)]
  • Key lessons learned
    » unlearning normalization is very difficult
    » integrity checking in code = not so bad
    » doing joins in code can be very liberating
    » importance of keyspace design
      » secondary indexing
    » data de-normalization = size! (x3)
    » schema vs. code flexibility?
    » distribution is everywhere and you shouldn’t forget about it
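Three of these lessons (integrity checking in code, joins in code, secondary indexing) can be shown together in a small sketch. It assumes a plain key/value-style store modelled as Python dicts; the `Store` class, its methods and the author/post example are hypothetical and not Lily's API.

```python
class Store:
    """Toy store where the application, not the database, owns integrity, indexes and joins."""

    def __init__(self):
        self.authors = {}            # author_id -> record
        self.posts = {}              # post_id -> record
        self.posts_by_author = {}    # secondary index maintained by application code

    def add_author(self, author_id, name):
        self.authors[author_id] = {"name": name}

    def add_post(self, post_id, author_id, title):
        # Integrity check in code: the store itself would accept a dangling reference.
        if author_id not in self.authors:
            raise KeyError(f"unknown author: {author_id}")
        self.posts[post_id] = {"author_id": author_id, "title": title}
        # Keep the secondary index in step with the primary data.
        self.posts_by_author.setdefault(author_id, []).append(post_id)

    def posts_with_author_name(self, author_id):
        # Join in code: look up the author once, then follow the index to the posts.
        name = self.authors[author_id]["name"]
        return [(name, self.posts[post_id]["title"])
                for post_id in self.posts_by_author.get(author_id, [])]


if __name__ == "__main__":
    store = Store()
    store.add_author("a1", "Ann")
    store.add_post("p1", "a1", "Hello NoSQL")
    store.add_post("p2", "a1", "Keyspace design")
    print(store.posts_with_author_name("a1"))
```

The cost of this freedom is that every write path must remember to update the index, which is one reason the deck lists automated index maintenance as a Lily feature.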
  • Pssst. :-) If you absolutely, positively want to see a demo, go check http://outerthought.blip.tv/
  • Reading material
    » Amazon Dynamo, Google BigTable, CAP
    » http://nosql.mypopescu.com/
    » http://nosql-database.org/
    » http://twitter.com/nosqlupdate
    » http://highscalability.com/
  • We’re growing. We’re hiring.
  • Thank you! For your attention, for your questions.
    » stevenn@outerthought.org
    » @stevenn