Oracle vs NoSQL – The good, the bad and the ugly

A good understanding of NoSQL database technologies that can be used to support a Big Data implementation is essential for today’s Oracle professional. This was discussed in detail in a two-hour deep-dive technical session at COLLABORATE 2014 - The Oracle User Group Conference. In this slide deck, you will learn what Big Data brings to the table as well as the concepts behind the underlying NoSQL data stores, in comparison to the ancestor you know well - the Oracle RDBMS. We will determine where and how to employ these NoSQL data stores effectively, and point out some of the issues that you will have to think through (and prepare for) before your organization rushes headlong into a “Big Data” implementation. We will look specifically at MongoDB, CouchBase and Cassandra in this context. At the end of the session, we will provide pointers and links to help the audience take the next step in learning about these technologies for themselves.


Usage Rights

© All Rights Reserved


    Presentation Transcript

    • REMINDER Check in on the COLLABORATE mobile app Oracle vs. NoSQL The good, the bad and the ugly John Kanagaraj Member of Technical Staff, PayPal Database Engineering, An eBay Inc. company
    • Housekeeping ■  Check the font sizes ▪  Can you read this at the back of the room? ▪  Can you read this at the back of the room? ▪  Just kidding! ■  Silence your Phones! ■  Q & A : Ask as we go along (and I will repeat the question) ▪  Keep it relevant to the slide at hand ▪  I might defer the question to a later slide if I believe it is addressed later ▪  If it gets too long, I humbly request we deal with it after the break or after the session ■  It is a long day, so if you nod off it is ok (hopefully no snoring!)
    • Agenda ■  Big Data – What it is, why should we care ■  NoSQL – What it is, and why do we need it ■  Concepts you need to understand ▪  CAP Theorem (and why it is important) ▪  Unstructured Data ▪  Sharding and Replication ▪  Data Modeling in the brave new world of NoSQL ■  Introduction to some popular NoSQL stores ■  A look into the (immediate) future: Moving forward
    • Not on the Agenda ■  Not a Tutorial on various NoSQL datastores ■  NotAnInstallationGuide ■  NotAnAdministrationManual ■  If you already know the CAP Theorem and NoSQL: ▪  I will be covering the basics (so you know!) ▪  We are all here to share and learn: Maybe I can learn from your questions/inputs (time and context permitting) ▪  Let’s talk after the talk (or during the break)
    • Speaker Qualifications ■  Currently Database Engineer @ PayPal ■  Has been working with Oracle Databases and UNIX for too many years :) ■  Author and Technical editor ■  Frequent speaker at OOW, IOUG COLLABORATE and regional OUGs ■  Oracle ACE ■  Contributing Editor, IOUG SELECT Journal ■  Loves to mentor new speakers and authors! ■  http://www.linkedin.com/in/johnkanagaraj
    • Big Data
    • Big Data – The Why ■  2.5 quintillion bytes of data are generated every day ▪  (1 quintillion = 10^18 bytes): so that is ~2.3 billion GB ▪  Humans (using devices) as well as Machines (IoT) —  Location data emitted by your smart phone —  “Web-scale” Webserver logs and interactions —  Sensor data emitted by almost every networked device: E.g. Cars’ fuel/pressure gauges, Personal fitness devices (wearables) —  Multi-media sources: Security cameras, Face/Plate recognition —  Data that matters to you: Medical, Scientific, Weather ▪  Lots of value in this data, but mostly untapped ▪  Most of this is never stored: Too big to store, but not too big to understand :)
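To keep the arithmetic above honest, the daily-volume figure converts in a couple of lines (a back-of-the-envelope sketch; the 2.5 × 10^18 number is the slide's own claim):

```python
# Back-of-the-envelope check of the "2.5 quintillion bytes per day" claim.
QUINTILLION = 10 ** 18
daily_bytes = 2.5 * QUINTILLION

GIB = 2 ** 30                      # one binary gigabyte (GiB)
daily_gib = daily_bytes / GIB      # works out to roughly 2.3 billion GiB
```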
    • Big Data – The Why ■  Plummeting cost of technology ▪  Storage Cost/GB – 1980 : $437,500, 2013 : $0.05 ▪  Computing Cost – Moore’s law ▪  Network transportation Cost – WiFi, BLE, etc. ■  What is driving this? ▪  Cheaper to store data than to delete/ignore it ▪  Minimal cost to generate, transport and store ▪  Ubiquity of network, storage and data generation ▪  Accelerating advances in science and technology ▪  Machine learning and intelligence is growing Source for storage cost: http://www.statisticbrain.com/average-cost-of-hard-drive-storage/
    • Big Data – The Why Infographic: http://www.ibmbigdatahub.com/infographic/four-vs-big-data
    • Big Data Characteristics: 4 V’s + 1 ■  Volume – Scale at which data is generated ▪  Cannot be stored using traditional methods ▪  Cannot be stored in a monolithic store ■  Variety – Different forms of data ▪  Big Data is usually not structured; structure not known in advance; structure not controlled by consumer ▪  May not always be in text form (more than just binary) ■  Velocity – Data arrives in a continuous stream ▪  Multiple, varied source produce data continuously ▪  Peaks and bursts unpredictable ▪  “Always on”: No down time for maintenance or re-orgs ▪  No “Known Users” – unpredictable, unknown patterns/scale
    • Big Data Characteristics: 4 V’s + 1 ■  Veracity – Uncertainty: Data is not always accurate ▪  Multiplicity of sources creates convergence of truth ▪  Eventual consistency (versus immediate consistency) ■  Value – Immediacy and hidden relationships ▪  In many use cases, value of Big Data declines quickly —  Traffic reports do not matter after 30 minutes —  Routing resupply trucks is counterproductive after the fact —  However, some historical value may be derived post the event ▪  Concept of “Near Line” data (neither fully online nor offline) ▪  Easy to miss hidden relationships —  Most data sets are correlated to other data sets, implicitly or explicitly —  Not easy to detect due to volume and variety —  Mine data using various techniques (Data Science)
    • So how do we store this storm? ■  Big Data impossible to store using RDBMS ▪  Too big, too fast for RDBMS to ingest ▪  RDBMS needs “schema before write” ▪  Unknown structures = “schema during read” ■  So what is limiting RDBMS? ▪  ACID requirement drives “protection” mechanism ▪  Redo and Undo in Oracle provides ACID ▪  “Relational” imposes “schema before write” ▪  Easy to get “small bits”; hard to get “large pieces”
    • So how do we store this storm? ■  RDBMS’ are essentially ACID ▪  Atomic: Transactions fully succeed or fully fail ▪  Consistent: Transactions move the database from one consistent state to another ▪  Isolated: Transactions cannot interfere with each other ▪  Durable: Committed transactions persist even during failure ■  RDBMS Clusters = “Shared everything” for ACID ■  Atomicity in a distributed database: Two Phase commit ▪  Essential for splitting workload ▪  Reduction in availability though! ■  New concept! BASE (Basically Available, Soft state, Eventual Consistency)
    • A common scalability inhibitor ■  Heap table with one or more “right growing” indexes −  Primary Key: Unique index on a NUMBER column −  Key value generated from an Oracle Sequence (NEXTVAL = 1) −  I.e. “monotonically” increasing ID value −  High rate of insert (> 5000 inserts/second) from multiple sessions −  Multiple indexes, typically leading date/time series or mono-valued −  E.g. Oracle E-Business Suite’s FND_CONCURRENT_REQUESTS ■  Here’s the Problem: −  All INSERTing sessions need one particular index block in CURRent mode (as well as one particular data block in CURRent mode) −  Question: Would you use RAC to scale out this particular workload?
    • A quick deep dive ■  Here’s what happens to accommodate the INSERT −  Assume the current value of the PK is 100, and NEXTVAL = 1 −  Assume we have ‘N’ sessions simultaneously inserting into that table −  Session 1 needs to update the Index block (add the Index entry for 100) −  Session 2 wants the same block in CURRent mode (add another entry for 101; needs the same block because the entry fits in the same block) −  Session 3… N also want the same block in CURRent mode at the very same time (as all sessions will have “nearby” values for index entry) −  Block level pins/unpins (+ lots of other work – Redo/Undo) required…. −  Same memory location (SGA buffer for Index block) accessed −  Smaller but still impacting work for buffer for Data block −  Rate of work constrained by CPU speed and RAM access speeds
    • A quick deep dive ■  What if you use RAC to “scale out” this workload? −  Assume “N” sessions simultaneously inserting from 2 RAC nodes (2xN) −  In addition to previously described work, you need to −  Obtain the Index block from remote node in CURRent mode −  Session 1 (Node 1) updates Index block with value 100 −  Session 2 (Node 2) requests block in CURRent mode (value 101) −  LMS processes on both nodes churn CPU co-ordinating messages and block transfers back and forth on the interconnect −  Flush redo changes to disk on Node 1 before shipping CURRent block to Node 2 (gated by RedoWriter response!!!) −  Sessions block on “gc current <state>” waits during this process −  CPU, Redo IO, Interconnect, LMS/LMD processes involved
    • A quick deep dive ■  Some solutions −  Spread the pain for the right growing index −  Use Reverse Indexes (cons: Range scan not possible) −  Use Hash partitioned indexes (cons: All partitions probed for Range scan, Need Partitioning Option, Additional administration) −  Prefix RAC node # (or some identifier per node) to key −  Use a modified key: Use Java UUID, Other distinct prefix/suffixes −  Use Range-Hash Partitioned tables with Time based ID as key −  E.g. Epoch Time (# of seconds from Jan 1, 1970) + Sequence value for lower bits −  Enables Date/Time based partitioning key −  Unique values allow Local Index to be unique
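The time-based key scheme in the last bullet can be sketched in a few lines of Python (illustrative only: the 20-bit split between epoch and sequence, and the function name, are assumptions, not the slide's exact layout):

```python
import itertools
import time

_seq = itertools.count()  # stand-in for a per-node sequence generator

def make_time_based_id(epoch_seconds=None):
    """Compose a key: epoch seconds in the high bits, sequence in the low bits.

    Spreads a monotonically increasing key over time-ordered ranges while
    keeping values unique, so a Range-Hash partitioned table can use it as
    a date/time-based partitioning key (toy sketch only).
    """
    if epoch_seconds is None:
        epoch_seconds = int(time.time())
    return (epoch_seconds << 20) | (next(_seq) & 0xFFFFF)  # 20 low bits for sequence

k1 = make_time_based_id(1_700_000_000)
k2 = make_time_based_id(1_700_000_000)
```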
    • Relaxing ACID – Skip the Redo/Undo ☺ ■  BASE Model ▪  “In partitioned databases, trading some consistency for availability can lead to dramatic improvements in scalability” ▪  Proposed by Dan Pritchett (eBay) in 2008 ▪  ACID is pessimistic; enforces consistency at the end of a transaction ▪  BASE is optimistic; accepts eventual consistency ▪  Supports partial failure without total failure ■  Enabled new paradigms ▪  New patterns for distributing workload emerges —  Sharding and Replication —  Less than perfect (but good enough) consistency
    • A New Beginning - NoSQL ■  A new dawn emerges… ▪  Brewer proposes CAP theorem (2000) ▪  Google creates BigTable (~ 2006) ▪  Amazon creates Dynamo (~ 2007) ▪  eBay shards over Oracle Databases (2008) ▪  Inspires a new set of alternate data storage projects ▪  NoSQL databases start appearing… (~2008 – 2010) ▪  Becomes a buzz word (~ 2011 – 2013) ■  Now we all want “in”… ■  Picture courtesy Kamran Agayev via Twitter
    • So What is NoSQL? ■  NoSQL – supposed to be “No SQL”, but it is NOT ■  NoSQL – Loosely it is “Not Only SQL” (i.e. NOSQL) ▪  Term coined by Eric Evans (developer at Rackspace) ▪  Adopted by Johan Oskarsson (another developer) ▪  For a meetup of like minds at SF, 2009 ▪  Meetup for “open-source, distributed, nonrelational databases” [Voldemort, Cassandra, CouchDB, MongoDB, etc.] ■  NoSQL does not mean there is no “SQL-Like” interface ▪  Cassandra supports CQL (Cassandra Query Language) ■  NoSQL does NOT always mean Big Data ▪  But Big Data stores are almost always NoSQL based ▪  That is, if you count Hadoop as a NoSQL datastore * * See: http://wiki.apache.org/hadoop/HadoopIsNot
    • A small diversion: The Hadoop ecosystem ■  Let’s understand Hadoop vs. the Rest ■  Hadoop – The real Big Data Store ▪  Real Big platform to store data ▪  Store almost anything and everything ▪  Key components of Hadoop: —  HDFS: A unified file system that combines all storage in the cluster —  MapReduce: A programming model to handle large data sets —  An extensible ecosystem: Other components to control, schedule and manage processing and the cluster ▪  Is NOT a database (although there is HBase)…. ▪  But supports SQL-like interface using Hive ▪  Not really meant for Online, Web-site facing implementation
    • A small diversion: The Hadoop ecosystem
    • Big Data / NoSQL Landscape From http://www.bigdata-startups.com/open-source-tools/
    • Why NoSQL? ■  Impedance Mismatch ▪  Real world data does not naturally possess structure ▪  A “Person” has many variable characteristics ▪  Applications deal with a “person” object ▪  This is then a set of In-memory structures ▪  Relational Databases require structured table/columns though…. ▪  Thus, an “impedance mismatch” between Dev and DBA ▪  Which ORMs try to bridge (the gap between Dev and DBA) —  Cultural mismatch: “Agile” (Dev) seems to be “Fragile” (for a DBA) —  Technical mismatch: “Objects” to “Relational Tables” —  Storage structure mismatch: “Un-/Semi-structured” to “Structured”
    • Why NoSQL? ■  Rapid “web-scale” growth for external entities/users ▪  Ability to support viral/burst traffic patterns ■  Most data does not (usually) need immediate consistency ▪  It is ok to lose some data; It is Ok not to have ACID ■  Commodity hardware and the Cloud ▪  RDBMS’ don’t run well on clusters (apologies: RAC world) —  Shared Disk clusters are both a SPOF and expensive! —  License costs for RDBMS on clusters —  Failure of one component brings everything down ▪  Clustering cheaper commodity hardware is economical —  Single or even a small number of failures affect a portion of workload, not the whole application (due to sharding) ▪  Easier to create a “cloud” with commodity hardware
    • Why NoSQL? ■  Open patterns ▪  Almost all NoSQL products are open-source ▪  Relatively open learning —  Meetups; Open seminars run by vendors —  Lively blogs and passionate contributors ▪  Quick-and-easy installs ▪  Community versions from vendors ▪  Easy to install on for-rent cloud environments ▪  Monitoring/Alerting through open frameworks (Nagios, Ganglia) ■  Enterprise support through vendor ▪  10gen for MongoDB; DataStax for Cassandra; CouchBase ▪  Cloudera, Hortonworks, MapR for Hadoop ■  Large Webscale companies building own NoSQL databases
    • NoSQL Characteristics ■  “Schema before write” vs. “Schema during read” ▪  Caters to “unstructured” need ▪  Primarily solves Impedance mismatch ▪  Creates its own challenges ■  Modeled by read and write patterns ▪  “customer and orders” together for a customer centric view ▪  “product and orders” for a production/supply-chain centric view ▪  Alternative: Store twice ■  Data modeling driven by physical storage model ■  Read patterns ▪  Secondary indexing (overheads) ▪  Brute-force access via MapReduce jobs ▪  Store multiple, denormalized copies (“disk is cheap”)
    • NoSQL Characteristics ■  ACID is “relaxed” ▪  A transaction is limited to an aggregate (k-v pair) ▪  Enables distributed, shared-nothing architectures ▪  Ideal for clustered deployments ▪  Optimistic locking ▪  Some loss of data and consistency is expected (and catered to) ■  Write patterns ▪  UPDATEs converted to INSERTs (timestamped/tombstoned) ▪  Time-To-Live (TTL) based DELETE’s/Purges ▪  Compaction based garbage collection ▪  Reduced Write latency due to memory only writes ▪  Transaction logging supported in some NoSQL stores
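The write pattern above (UPDATEs become timestamped INSERTs, DELETEs become tombstones, compaction garbage-collects) can be shown with a toy Python sketch; all class and method names here are illustrative, not any particular store's API:

```python
import time

class AppendOnlyStore:
    """Toy sketch: updates are appended as new timestamped versions,
    deletes are tombstones, and compaction drops superseded entries."""

    def __init__(self):
        self._log = []  # (key, value, timestamp); value None == tombstone

    def put(self, key, value, ts=None):
        self._log.append((key, value, ts if ts is not None else time.time()))

    def delete(self, key, ts=None):
        self.put(key, None, ts)  # append a tombstone, no in-place delete

    def get(self, key):
        # Read resolves to the newest version of the key.
        latest = max((e for e in self._log if e[0] == key),
                     key=lambda e: e[2], default=None)
        return latest[1] if latest else None

    def compact(self):
        # Garbage collection: keep only the newest version per key,
        # and drop keys whose newest version is a tombstone.
        newest = {}
        for key, value, ts in self._log:
            if key not in newest or ts > newest[key][2]:
                newest[key] = (key, value, ts)
        self._log = [e for e in newest.values() if e[1] is not None]

store = AppendOnlyStore()
store.put("user:1", "Alice", ts=1)
store.put("user:1", "Alicia", ts=2)   # an "update" is just another insert
store.delete("user:2", ts=3)
store.compact()
```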
    • Why use an RDBMS then?! ■  ACID may be a hard business requirement ▪  Data loss can never be tolerated ▪  Data inconsistency can never be tolerated (e.g. Money movement) ■  Complex data models favor RDBMS ▪  Try modeling Oracle EBS in NoSQL :) ■  Standardized interface via SQL ▪  Broadly same across all RDBMS ▪  Well understood, skills availability ■  Inter-application integration ▪  Single platform for data created its own ecosystem ■  Cost to change is prohibitive
    • Introducing the CAP Theorem ■  Eric Brewer’s conjecture at the July 2000 ACM Symposium ■  Formalized by Seth Gilbert and Nancy Lynch in 2002 ■  Any networked shared-data system can have at most two of three desirable properties: ▪  At least one Consistent (C) up-to-date copy of the data ▪  high Availability (A) of that data (for both reads and updates) ▪  tolerance to network Partitions (P) ■  Core systemic requirements in a distributed environment ▪  Special symbiotic relationship ▪  Present during design and deployment of applications in a distributed environment (whether acknowledged or not) ■  Applies well to the distributed NoSQL world
    • Components of the CAP Theorem ■  (C)onsistency ▪  All clients see the same results from a query, even in the presence of an update at the same time as the query ■  High (A)vailability ▪  All clients can write or access data, even in the presence of system failures. Requestors receive acknowledgment of success or failure ▪  Performance may degrade, but consuming applications are able to access data even though some parts of the system may not be operational at the time of a query ■  (P)artition Tolerance ▪  The system returns results regardless of failures in communication between partitions in the distributed system; i.e. system property holds true even if there is a network partition
    • General CAP Theorem
    • Illustrating the CAP Theorem (adapted) ■  You start a small business: Provide phone reminders/information ■  Customers call with information; You call back/respond to remind ■  Start small: All information written down in your (single) notebook ■  Business grows: Wife is recruited (scale out, PBX shards calls) ■  Inconsistency: Response misses info updated in Wife’s notebook ■  Resolve inconsistency: All notebooks updated when call ends (lock) ■  Wife’s day off: You leave sticky notes (Inconsistent until next day) ■  Wife fights with you: Network Partition (sticky notes thrown away) ■  You have a choice here: CAP Theorem in play – Pick two ▪  (C) Always provide consistent information to clients ▪  (A) Business is always open if at least one of you is present ▪  (P) Business is open even during a loss of communication between 2 ■  Run around clerk: Eventual consistency and Compaction
    • Examples of CAP Theorem pairs ■  Consistency and Partition Tolerance (CP): Banking Transaction at an ATM ▪  Data needs to be consistent in the presence of updates ▪  If there is a network failure, dispense cash but limit the transaction amount ▪  Transaction still available, but system property changed due to network partition ■  Consistency and Availability (CA): Database System-of-Record ▪  Data Consistency is key ▪  During a network failure, clients stop writing (no redo), no write availability ▪  Present in Oracle Data Guard’s Maximum protection mode/Single node DB ■  Availability and Partition Tolerance (AP): Shopping cart in Amazon.com ▪  Spread data across multiple partitions to be always available ▪  Reconcile cart at checkout (may result in dual purchases!) ▪  Sacrifices consistency, but works for most cases, most of the time
    • CAP Theorem in the Oracle World ■  Application Scalability: Some well-known techniques ▪  Partition workload by function —  Schema level split: data unrelated to each other is segregated —  Typically provides headroom for main workload/environment ▪  Distribute transactions —  For related data that still needs to be viewed together —  Typically using Database links —  Typically for master lookups and remote writes —  Introduces dependencies (more on that soon) ▪  Decouple work asynchronously —  Use AQ to write tokens or keys to process later —  Introduces a “delay”: Data not immediately consistent
    • CAP Theorem in the Oracle World ■  Application Scalability: Some well-known techniques ▪  Offload reads using Active Data Guard (DB 11g and above) ▪  DG copy opened for reads during Real Time Apply ▪  DG allows Redo Data shipping in 3 modes —  Maximum Protection: Zero loss but dependent on remote redo write —  Maximum Performance: Remote redo written asynchronously —  Maximum Availability: Switches to Max Performance mode on remote redo write failure, operates in Max protection mode otherwise ▪  Offers multiple shades of availability and protection ▪  ADG and “read your writes” pattern —  RTA apply is not equal to “instant” apply —  Not “immediately consistent” but “eventually consistent”
    • CAP Theorem in the NoSQL World ■  Realization of CAP enabled NoSQL to “break free” ▪  Opened minds of database developers ■  However, the “2 of 3” rule was somewhat misleading ▪  NoSQL datastores offer options to vary consistency/durability and availability levels ▪  MongoDB has “Write Concern” – Unacknowledged, Acknowledged, Journaled, Replica Acknowledged ▪  Cassandra has Write Consistency: From ANY to ALL ■  Reality is a spectrum between C and A in the presence of P ▪  Eventual Consistency is a given ▪  Some data loss is expected ▪  Application code/other techniques will need to cater for this
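The consistency spectrum above is often reasoned about with the quorum rule used by Dynamo-style stores: with N replicas, W write acknowledgements and R read acknowledgements, reads are guaranteed to overlap the latest write whenever R + W > N. A minimal sketch (the level pairings in the comments are illustrative):

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Dynamo-style quorum rule of thumb: with N replicas, W write acks
    and R read acks, a read overlaps the latest write iff R + W > N;
    otherwise consistency is only eventual."""
    return r + w > n

# Illustrative Cassandra-style pairings on a 3-replica keyspace:
all_one = is_strongly_consistent(n=3, w=3, r=1)      # write ALL, read ONE
quorum  = is_strongly_consistent(n=3, w=2, r=2)      # QUORUM writes and reads
one_one = is_strongly_consistent(n=3, w=1, r=1)      # ONE/ONE: eventual only
```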
    • Sharding and Replication in NoSQL ■  NoSQL datastores: essentially shared-nothing clusters ■  Relaxing ACID allows distributed processing (CAP applies!) ■  Ability to scale out reads/writes is the key ■  Achieved using two techniques: Sharding and Replication ■  Sharding: Divide and Rule ▪  Data is read/written to different servers (“shards”) ▪  Location determined applying a fixed function on a known key ▪  Different functions: Modulo, Hash, Range, Programmatic ▪  Efficacy of load balancing dependent on function and data ▪  Typically used for Write-scaling (more than Read-scaling) ▪  (Hash partitioned tables/indexes are essentially object level sharding in Oracle databases to enable write scaling)
    • Sharding and Replication in NoSQL ■  Sharding (contd.) ▪  Difficult, if not impossible to change function once implemented ▪  No consistency across shards, or across aggregates ▪  No joins allowed – no cross-shard dependencies ▪  Resilience does not improve (but enables partial availability) ▪  Not to be implemented lightly: Start single if you can ▪  Many NoSQL stores allow auto-sharding (e.g. CouchBase) ■  Replication: Allow multiple copies ▪  Master-Slave model: Simplest, Scales out reads only; Read resilience; May need to cater for eventual consistency ▪  Peer-to-Peer or Multi-Master model: Scales out reads and writes, but consistency/conflict resolution is a big problem ■  Can combine Sharding and Replication!
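The "fixed function on a known key" routing described above can be sketched with hash-modulo placement (shard names are invented; real stores often prefer consistent hashing so that changing the shard count moves less data, which is one reason the function is so hard to change later):

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(key: str) -> str:
    """Hash-modulo routing: a fixed function of the key picks the shard.
    Deterministic, so the same key always lands on the same server."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

# Every read/write for a key routes to one shard; no cross-shard joins.
placement = {k: shard_for(k) for k in ("user:1", "user:2", "user:3")}
```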
    • The NoSQL Datastore Landscape ■  Generally four types: ▪  Key-Value ▪  Document ▪  Column Family ▪  Graph ■  Not using the relational model, i.e. schema-less ▪  But not without a Data Model! ■  Runs on clusters of commodity hardware ■  Generally Open Source ■  Can be considered as storing/retrieving “aggregates” ▪  a collection of related objects that can be treated as a unit ■  Usually described by “Keys” and “Values” (i.e. K-V pairs)
    • Key-Value NoSQL stores ■  The most basic of NoSQL stores ■  Simple K-V structure: A “blob” of data (“Value”) indexed and accessed via a “Key” ■  “Value” part also known as Aggregate ■  Aggregate is a collection of related objects treated as a unit ■  Written/Updated/Read/Consistent as single, smallest unit ■  Typically, aggregate is limited in size (BLOB in Oracle) ■  Typically, expressed in JSON, and sometimes in XML ■  JSON/XML aggregates are self-describing ■  Value is “opaque” in a K-V store, but is simple ■  Scale out with sharding ■  Examples of K-V store: Riak, Oracle NoSQL
    • Key-Value NoSQL stores ■  Typical Use cases ▪  Shines when you need simple GET/PUT operations ▪  Session state; Tokens – Enables web-scale ▪  User profiles and preferences – Typically latent caching layer ▪  Latency bridge: Support RYOW’s in some cases ■  Anti-patterns ▪  No ad-hoc query patterns - (i.e. need key to access) ▪  Not meant for analytics type workload ▪  When multi-key/multi-operation consistency is required ▪  Set based operations (i.e. related data)
    • Document NoSQL stores ■  Datastore able to understand and manipulate structures ■  Needs to follow an agreed format ▪  usually JSON, but BSON, XML and YAML ■  Support for secondary indexes ▪  Needs ability to understand/index K-V pairs in the aggregate ▪  Secondary indexes may throttle write rate ■  Aggregate size usually limited ■  Scale-out again supported via sharding ▪  Some stores support multiple sharding methods (MongoDB) ■  K-V store sometimes evolve into Document stores ▪  E.g. CouchBase evolution ■  Needs embedding/linking support (size/other limitations)
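The secondary-index trade-off above (the store must understand fields inside the aggregate, and each index write adds overhead) can be illustrated with a toy Python sketch; the field names and structures are invented for illustration:

```python
from collections import defaultdict

documents = {}                  # primary access path: _id -> document
city_index = defaultdict(set)   # secondary index on the "city" field

def insert(doc):
    """Insert a document and maintain the secondary index; this extra
    index write is the kind of overhead that can throttle write rate."""
    documents[doc["_id"]] = doc
    if "city" in doc:           # schema-less: the field may simply be absent
        city_index[doc["city"]].add(doc["_id"])

insert({"_id": 1, "name": "John", "city": "San Jose"})
insert({"_id": 2, "name": "Jane", "city": "Austin"})
insert({"_id": 3, "name": "Jo"})                    # no city field at all

# Query by a non-key field without scanning every document:
by_city = [documents[i] for i in city_index["San Jose"]]
```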
    • Document NoSQL stores ■  Typical Use cases ▪  Of course, any collection of document-type models ▪  Easy-to-start NoSQL projects when moving from RDBMS ▪  Almost any NoSQL use case needing secondary index access ▪  Content and Metadata store: typically multiple keys ▪  Queries using materialized views (CouchBase) ▪  Non-trivial sharding (MongoDB) ▪  Horizontally scaled or Cached reads (MongoDB, CouchBase) ▪  Models requiring simple relationships (Blogs, User modeling) ■  Anti-patterns: ▪  Not a drop-in replacement for RDBMS ▪  Evolving relationships or query patterns ▪  Usually not good for write-heavy
    • Column Family NoSQL stores ■  Characteristics of CF Stores ▪  Data is mostly organized by sets of columns ▪  Key – Value based access ▪  “Value” consists of sets or ranges of columns ▪  Still unstructured ▪  No joins (except via another keyed table, using MapReduce) ■  Cassandra, HBase, Amazon SimpleDB are prime examples ▪  HDFS on a Hadoop cluster underlies HBase ▪  HBase evolved from Google’s BigTable ▪  Cassandra evolved from Facebook ▪  Cassandra also supports CQL (a SQL like language)
    • Column Family NoSQL stores ■  Typical Use cases ▪  Data is mostly organized by sets of columns (super columns) ▪  Key – Value based access ▪  “Value” consists of sets of columns (but still unstructured) ▪  Lots of repeated sets of values (e.g. Customer transactions) ▪  No joins (except via another keyed table, using MapReduce) ▪  Write-intensive patterns (Internet-of-Things type data) ▪  Rolling expiry patterns such as Time series data ■  Anti-patterns ▪  IMHO Low-latency reads (in comparison to other NoSQL stores) ▪  Need access via secondary or other keys
    • Graph NoSQL stores ■  Stores Nodes and Edges ■  Provides “Index-free Adjacency” ■  Nodes are entities: People, Accounts, Items, Locations ■  Edges connect Nodes to other Nodes ■  Edges have properties ■  Can mine patterns present in these relationships ■  Supports graph-like queries: ▪  Shortest distance between two locations ▪  Social Graphing: Connecting people ▪  Products that your friends liked ■  Neo4j is a well-known graph database ■  Giraph: An open source graph processing system (FB!)
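The "shortest distance" query above is a plain breadth-first traversal; a toy Python sketch with a plain adjacency dict standing in for the store's index-free adjacency (the social graph and names are invented):

```python
from collections import deque

# Nodes are entities, edges connect them (tiny invented social graph).
edges = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave"],
    "Dave": [],
}

def shortest_path(start, goal):
    """Breadth-first search over the adjacency structure; graph stores
    make each hop cheap by storing neighbors with the node itself."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route (a "dangling end-point" anti-pattern case)

route = shortest_path("Alice", "Dave")
```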
    • Graph NoSQL stores ■  Typical Use Cases ▪  Social Graphs ▪  Recommendation Engines ▪  Graph traversal use cases ▪  Relationships with defined end-points ▪  Routing and Location based solutions ▪  Account Linking (e.g. for fraud detection; peer risk checking) ■  Anti-patterns ▪  Scale out via sharding typically not supported in some products ▪  Update all/Update most patterns ▪  Dangling end-points
    • Some more concepts: JSON ■  You need to understand JSON ▪  JavaScript Object Notation ▪  Self describing, English text key-value pairs ▪  In other words, a simpler version of XML ▪  No externally imposed structure (hint: No tab/column mapping!) { "id": 101, "first_name": "John", "second_name": "Kanagaraj", "residential_address": [{"add1": "20 First St", "city": "San Jose", "state": "CA"}], "phone": "408-555-9999" } ▪  Can you spot some optimization here?
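The document above parses with any JSON library; a quick Python check (the short-key-names idea at the end is one common answer to the slide's closing question, since every document repeats its own keys — hedged here as an assumption about the intended answer):

```python
import json

# The slide's self-describing document: no table/column mapping imposed.
doc_text = """
{ "id": 101,
  "first_name": "John",
  "second_name": "Kanagaraj",
  "residential_address": [ {"add1": "20 First St", "city": "San Jose", "state": "CA"} ],
  "phone": "408-555-9999" }
"""
doc = json.loads(doc_text)

# One candidate optimization: shorter key names, because keys are stored
# in full inside every single document.
compact = {"id": doc["id"], "fn": doc["first_name"], "ln": doc["second_name"]}
```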
    • Some more concepts: Languages ■  You need to understand JVMs and some Java ▪  Many NoSQL stores use JVM based programs ▪  E.g. Hadoop, Cassandra ▪  Ability to understand JVMs and their internals is key ▪  JVM’s Garbage Collection needs to be managed ▪  Need to understand/configure JMX (Java Management Extensions) ▪  Most NoSQL stores support Java APIs out of the box ■  Most NoSQL stores support more than just Java ▪  E.g. Python, Ruby, Perl, C/C++, Node.js, Go ▪  Less-well known ones such as Erlang, Haskell, Scala ▪  Need to be able to install and troubleshoot app issues ■  Deploy/Management: Puppet, Nagios, Ganglia, Fab ▪  Frameworks can do more than just NoSQL!
    • MongoDB: Document datastore [Architecture diagram: a Client connects through MongoS routers (driven by Config Servers) to sharded Replica Sets; each Replica Set has a MongoD Master and two MongoD Slaves. Write scaling via sharding through MongoS; read scaling via Replica Sets; writes go to the Master node, reads from Master and (optionally) Slave nodes.]
    • MongoDB: Data Modeling ■  RDBMS → MongoDB term mapping: Database → Database; Table → Collection; Row → Document; RowID → _id; Index → Index; Join → Embedded Document (DBRef); Foreign Key → Reference ■  Example: an order (Order ID: 1001, Customer: John; Line Items: 20001 – Tires – 2 x $84 = $168, 45320 – Pump – 1 x $54 = $54; Payment: Card Amex, CC 3425268768, Exp 03/17; Total: $222) that spans Order, Customer, Line Items, Financial Instrument, FinTrans and Journal tables in an RDBMS becomes a single document: { "order_id": "1001", "customer": "John", "orderitems": [ {"prodid": "20001", "prodname": "Tires", "Qty": 2, "price": 168}, {"prodid": "45320", "prodname": "Pump", "Qty": 1, "price": 54} ], "pcard": "Amex", "pcc": "3425268768", "pexp": "03/17", "ord_tot": 222 }
    • MongoDB: Essentials ■  Stands for “huMONGOus DataBase” ■  Reads and Writes using memory-mapped files ▪  Try and fit working set in memory ▪  Use SSDs for faster I/O ■  Very good index support on identified JSON fields ▪  Allows Key-Value, Range and text search queries ▪  Unique as well as Compound Indexes ▪  Special TTL (Time-to-Live) index to retire data ■  Stores documents in BSON format (Binary JSON) ■  Interact, manage, program through Mongo Shell ■  Many other drivers and interfaces ■  Support for Geospatial data and queries ■  Aggregation Framework and MapReduce support
    • MongoDB Physical/Memory Mapping
• MongoDB: Essentials
■  Query optimizer exposes the execution plan
■  Multiple sharding methods:
▪  Range-based sharding: Optimized for range queries
▪  Hash-based sharding: Ensures uniform distribution
▪  Tag-aware sharding: Partitioned by user-specified configuration
■  Write-ahead journaling
▪  Journal commits every 100ms (the oplog is a capped collection)
■  Configurable write availability via Write Concern
▪  Unacknowledged (memory only)
▪  Acknowledgement at specific levels, e.g.:
—  Write to at least 2 replicas in the same datacenter
—  Write to at least 1 replica in a remote datacenter
■  Commercially supported by 10gen (now MongoDB, Inc.)
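The range-vs-hash trade-off above is worth seeing concretely. This toy sketch (chunk boundaries and shard names are invented) shows why monotonically increasing keys create a hotspot under range sharding while hash sharding spreads them out:

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

def range_shard(key: int) -> str:
    # Range-based: contiguous key ranges map to chunks, so range scans touch
    # few shards, but ever-increasing keys all land on the last chunk.
    if key < 1000:
        return "shard0"
    if key < 2000:
        return "shard1"
    return "shard2"

def hash_shard(key: int) -> str:
    # Hash-based: uniform spread, but a range query must fan out to every shard.
    digest = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

keys = range(2500, 2600)                       # monotonically increasing inserts
print({range_shard(k) for k in keys})          # {'shard2'}  -- write hotspot
print(len({hash_shard(k) for k in keys}) > 1)  # True -- writes spread out
```

Tag-aware sharding is the third option on the slide: you pin ranges to shards yourself (e.g. by geography), trading automation for control.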
• MongoDB: The Not-so-good…
■  Reads block writes (albeit for very short periods, ~microseconds)
▪  Be careful with aggregation/MapReduce: read-intensive
▪  The read lock yields when a read has to go to disk
▪  Read locks can be shared by multiple readers
■  Writes block reads (writer-greedy, for very short periods)
■  Locks are at the database level
▪  Be careful with your data model!
▪  Typically restrict one collection per database if possible
▪  Writes to multiple documents will yield periodically
■  Index creation (writes) locks the entire database
■  Replication to slaves locks all slaves in the Replica Set
■  Compaction also locks the database
■  Secondaries block on replication writes
• Couchbase – Another Document Store
[Diagram: Clients read/write from/to Data Buckets (user/application data, partitioned by bucket), which live on Server Nodes that form a dynamically scalable, multitenant Couchbase Cluster]
• Couchbase Single-Node Architecture
[Diagram: each node runs a Cluster Manager (Erlang/OTP) and a Data Manager]
▪  Port 8091: REST management API / Web UI (Admin Console)
▪  Ports 11210/11211: Data access (object-managed cache and storage engine)
▪  Port 8092: Query API over HTTP, served by the Query Engine
▪  Cluster Manager handles replication, rebalance and shard-state management
• Couchbase: Background and Use cases
■  Created as a merge of code and ideas:
▪  Memcached – An excellent memory-only cache
▪  CouchDB – A document store
▪  The result: a persistent cache
▪  Code in Erlang and C/C++
▪  Different ports for both products – now merging
▪  Existing Memcached deployments can upgrade to Couchbase quickly via the Moxi client
■  Used primarily as a caching solution
▪  Very fast for reads and writes
▪  Some concerns with cross-datacenter replication
▪  IMHO – not yet suited for Read-Your-Own-Writes (RYOW) via a secondary key
• Cassandra: Column-Family datastore
[Diagram: a Client connected to a six-node ring (Node 1 … Node 6)]
▪  Hash function(Key) => Token
▪  The client writes to the node selected by the token
▪  The coordinator node replicates to other nodes (timed per the Quorum setting)
▪  Each node acknowledges to the coordinator
▪  An acknowledgement is returned to the client
▪  Data is written to an internal commit log
▪  If a node goes offline, writes to it stop
▪  When the node rejoins, a “hinted handoff” process completes the pending writes, plus “read repair”
▪  Requests can range from ANY to ALL
—  ANY: Write to the commit log on at least 1 node
—  ALL: Writes complete to memory and commit log on ALL replicas
▪  Availability precedes Consistency (AP)
▪  Read and Write paths are separate
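The hash(Key) => Token step above is a consistent-hash ring. The sketch below is a toy version (the token values, ring size, and node names are invented): a key's token is hashed onto the ring, the first node whose token is at or past it becomes the coordinator replica, and the next RF-1 nodes on the ring hold the other replicas.

```python
import bisect
import hashlib

# Toy ring: five nodes, each owning one token on a 0..99 ring.
NODES = sorted([(10, "node1"), (30, "node2"), (50, "node3"),
                (70, "node4"), (90, "node5")])
TOKENS = [t for t, _ in NODES]

def token_for(key: str, ring_size: int = 100) -> int:
    """Hash a partition key onto the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size

def replicas_for(key: str, rf: int = 3):
    """First node clockwise from the key's token, plus the next rf-1 nodes."""
    i = bisect.bisect_left(TOKENS, token_for(key))
    return [NODES[(i + k) % len(NODES)][1] for k in range(rf)]

print(replicas_for("K1"))
```

Because replica placement depends only on the hash, any node can act as coordinator for any request; it just forwards to the computed replica set.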
• Cassandra: Column-Family datastore
[Diagram: the write path. Four writes arrive in order:
(1) Write (K1, {C1:V1})
(2) Write (K1, {C2:V2})
(3) Write (K2, {C1:V3, C2:V4})
(4) Write (K1, {C1:V5, C3:V6})
Each write updates the in-memory Memtable and is appended to the on-disk Commit Log. On flush, the Memtable is written as an immutable, indexed SSTable on disk, merged per key: K1 → C2:V2, C1:V5, C3:V6; K2 → C1:V3, C2:V4]
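The memtable merge in the diagram can be replayed directly: later values for the same column of the same key overwrite earlier ones (last write wins per column). This is a minimal sketch using a plain dict as the memtable, ignoring the commit log and timestamps.

```python
# In-memory memtable: key -> {column: value}
memtable = {}

def write(key, columns):
    """Apply a write: merge new column values into the row, last write wins."""
    memtable.setdefault(key, {}).update(columns)

# The four writes from the slide, in order:
write("K1", {"C1": "V1"})
write("K1", {"C2": "V2"})
write("K2", {"C1": "V3", "C2": "V4"})
write("K1", {"C1": "V5", "C3": "V6"})

print(memtable["K1"])  # {'C1': 'V5', 'C2': 'V2', 'C3': 'V6'}
```

This merged state is exactly what gets flushed to the SSTable: K1 ends up with C1:V5 (the later write superseded C1:V1), C2:V2, and C3:V6.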
• Cassandra: Essentials
■  The Write path is simple; Reads are a little more complex
▪  Merge the Memtable (Row/Key cache) with row reads from disk
▪  Uses a Bloom Filter to decide which SSTables to skip (false positives possible)
▪  In-memory caches are stored in the Java heap (GC!)
▪  Can return inconsistent data for Read-Your-Own-Writes (depending on Quorum)
▪  Consistent when: (nodes_written + nodes_read) > replication_factor
■  Compaction: merges SSTables; expires tombstoned data (TTL)
■  Data Modeling:
▪  Model your queries – optimize for reads
▪  Denormalize – Reads: slow; Writes: fast; Disk: cheap
▪  Columns within a row are stored sorted, and each value carries a timestamp
■  CQL: Cassandra Query Language – a familiar, SQL-like interface
■  Maintaining the cluster: Gossip and Snitch ☺
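The consistency rule on this slide is worth a worked example. If the set of nodes written and the set of nodes read must overlap in at least one replica, a read is guaranteed to see the latest write; that overlap exists exactly when writes plus reads exceed the replication factor. A one-function sketch:

```python
def is_consistent(nodes_written: int, nodes_read: int, replication_factor: int) -> bool:
    """True when the write set and read set must overlap in at least one replica."""
    return nodes_written + nodes_read > replication_factor

RF = 3
# QUORUM writes (2) + QUORUM reads (2): 2 + 2 > 3, so reads see the latest write.
print(is_consistent(2, 2, RF))  # True
# ONE/ONE: 1 + 1 is not > 3, so a read may hit a replica the write never reached.
print(is_consistent(1, 1, RF))  # False
```

This is why QUORUM/QUORUM gives Read-Your-Own-Writes while ONE/ONE (faster, more available) can return stale data, exactly the trade-off the slide warns about.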
• Choosing the right NoSQL database: ASCII the right question!
■  Is this a site-facing, P1 application?
■  Is this a BI/Analytics-type problem waiting to be solved?
■  Is this write-intensive or read-intensive?
■  Is this a caching problem?
■  Can the application afford some data loss?
■  What about data consistency?
■  Which is more important – consistency or availability?
■  How many datacenters need to be supported?
■  What are the query patterns? Are they widely varying?
■  How many distinct clusters of data are present, and how are they related?
■  Is my organization ready to support this product?
• Generic problems
■  Consistency is, and will remain, a problem in the NoSQL world
■  Data loss will occur – the application should cater to this
▪  Consider the cost of workarounds vs. the cost of data loss
■  The world of NoSQL is evolving:
▪  Maturing slowly: past the hype peak, sliding into the trough
▪  Too many choices: 150+ listed at http://nosql-database.org/
▪  Many pick the wrong product…
—  (and have to change it later: check my Delicious stream, tag #nosql)
▪  Most NoSQL vendors are still VC-funded
▪  New versions/features every 6 months!
▪  We will learn lessons the hard way…
• Real World problems
■  Need to break out of the RDBMS/ACID world
▪  Imagine a world with no COMMITs and no “transactions”
▪  Data loss and data inconsistency are inevitable
▪  Data Owners/Architects shy away: FUD, plus real dangers
■  Everyone wants to become (or already is!) a NoSQL expert
▪  Spell NoSQL and earn $$$ ☺
▪  Best way to learn: create a “Big Data” need and fulfill it
▪  Who makes the decisions?
■  Lack of skills and maturity
▪  Product choice: knowledge, experience and forethought required
▪  Many NoSQL products are still basic in functionality
▪  Be prepared to back out of your initial choice
• How to get there (from here)?
■  This presentation is just the beginning
■  Lots and lots of reading and experimenting required
■  Recommended reading:
▪  NoSQL Distilled by Fowler and Sadalage
▪  Seven Databases in Seven Weeks by Redmond and Wilson
▪  Many NoSQL books – browse at Safari Online
■  Lots of links to read – live links:
▪  Follow me on http://delicious.com/jkanagaraj – tag #nosql
■  Play with the community versions:
▪  Available from the vendors – no support, though
▪  Spin up/use cloud-based VMs on Rackspace or AWS
• A warning – And some advice
“Some people, when confronted with a big data problem, think, ‘I’ll use Hadoop.’ Now they have a big data problem and a big Hadoop cluster.”
– Dmitry Ryaboy, Engineering Manager, Twitter
▪  Start small
▪  Grow with success
▪  Create your own expertise
▪  It is about the untapped potential in your data
• Please fill in the feedback form!
Link up with me on LinkedIn
John Kanagaraj, PayPal, an eBay Inc. Company