Yes, SQL!

            Mickey Alon
> SELECT * FROM qcon.speakers WHERE
  name=„Mickey Alon‟
+-----------------------------------------------------+
| Name      | Company    | Role            | Twitter |
+-----------------------------------------------------+
| Mickey Alon | GigaSpaces | Director | @mickey_alon |
+-----------------------------------------------------+



> db.speakers.find({name:”Mickey Alon”})
{
    “name”:”Mickey Alon”,
    “company”: {
                  name:”GigaSpaces”,
                  products:[“XAP”, “IMDG”]
                  domain: “In memory data grids”
               }
    “role”:”director”,
    “twitter”:”@mickey_alon”
}


                                           ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Agenda
• SQL
  – What it is and isn’t good for

• NoSQL
  – Motivation & Main Concepts of Modern Distributed Data Stores
  – Common interaction models
        • Key/Value, Column, Document
        • NOT consistency and distribution algorithms

• One Data Store, Multiple APIs
  – (Really) brief intro to GigaSpaces
  – SQL challenges: Add-hoc querying, Relationships (JPA)


                                              ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
A FEW (MORE) WORDS ABOUT SQL


                 ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
SQL 

  • Transactional, consistent
  • Hard to Scale
    – Disk Based




                          ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
SQL

Static, normalized
data schema
• Don’t duplicate, use FKs




                             ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
SQL


 Add hoc query support
  Model first, query later




                         ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
SQL



  Standard
   Well known
   Rich ecosystem




                     ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
(BRIEF) NOSQL RECAP




             ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
NoSql (or a Naive Attempt to Define It)



A loosely coupled
collection of
non-relational
multiple data stores

                          ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
NoSQL – some key concepts
 • Takes care of data-scaling
 • Distributed by nature (mostly)
 • Can scale up-to thousands of nodes
 • Complements SQL – not replacing it




                                ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Few words about scaling




    scal     abl   e   (Up & Out)
                       CPU speed/Cores
         !
     Concurrency




                          ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
What are the options?
 • Hardware based
   – Use extreme hardware such as RAC - 99,999% uptime
   – Non intrusive – minimal change
   – Can you afford it? How far does it scale?

 • Software (NoSQL) based on commodity HW
   – Failures are more likely to happen (due to number of nodes)
   – Design for failure scenarios
       • Putting data in multiple nodes
       • Client support for transparent failover
   – Eventually consistent (CAP theorem)


                                             ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Why Now?
  The Big Data Era
   • Exponential Increase in data & throughput
      – We generate increasingly growing amount content
      – Data is being pushed to consumers
      – Caching ?

   • Software is delivered as service
      – 24/7 + schema evolution + agility = leading constraints ?

   • Competition is growing while price decrease




                                          ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
The Current Leading Data Models


Key / Value   Column                          Document
                                                   {
                                                       “name”:”uri”,
                                                       “ssn”:”213445”,
                                                       “hobbies”:[”…”,“…”],
                                                       “…”: {
                                                           “…”:”…”
                                                           “…”:”…”
                                                        }
                                                   }


                                                   {
                                                       { ... }
                                                   }

                                                   {
                                                       { ... }
                                                   }




                        ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Key/Value

• Have the key? Get the value
  – That’s about it when it comes to querying
  – Map/Reduce (sometimes)
  – Good for
     • cache aside (e.g. Hibernate 2nd level cache)
                                                                                  K1           V1
     • Simple, id based interactions (e.g. user profiles)                         K2           V2

• In most cases, values are Opaque                                                K3           V3
                                                                                  K4           V1




                                        ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Key/Value

Scaling out is relatively easy
(just hash the keys)
• Some will do that automatically for you
• Fixed vs. consistent hashing




                                 ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Key/Value

• Implementations:
  – Memcached, Redis, Riak
  – In memory data grids (mostly Java-based) started this way
     • GigaSpaces, Oracle Coherence, WebSphere XS,
       JBoss Infinispan, etc.




                                      ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Column Based




               ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Column Based

• Google’s BigTable / Amazon Dynamo
• One giant table of rows and columns
  – Column == pair (name and a value, sometimes timestamp)
  – Each row can have a different number of
    columns = flexible schema
  Table ->* Rows ->* Columns ->* Values




                                     ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Better query support

• Query on row key
  – Or column value (aka secondary index)

• Good for a constantly changing,
  (albeit flat) domain model
• Can joins and relations be
  replaced by map/reduce?



                                     ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Document

Think JSON
(or BSON, or XML)    {
                         “name”:”Lady Gaga”,
                         “ssn”:”213445”,
                         “hobbies”:[”Dressing up”,“Singing”],
                         “albums”:
                            [{“name”:”The fame”
                              “release_year”:”2008”},
             _id:1
                             {“name”:”Born this way”
             _id:2            “release_year”:”2011”}]
                     }
             _id:3
                     {
                         { ... }
                     }

                     {
                         { ... }
                     }




                                   ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Document

• Built-in support for hierarchal model
   – Arrays, nested documents

• Better support for ad hoc queries
Great power comes with great
   – MongoDB excels at this
responsibility!
• Very intuitive model
   – normally comes with restful and map/reduce API

• Flexible schema




                                            ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
What if you
didn’t have to choose?
                                    JPA




                  JDBC

                     ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
A Brief Intro to GigaSpaces

In Memory Data Grid
• With optional write behind to
  a secondary storage



                                             write-behind
                   JPA
    Multiple API

    Map/Reduce




                                  ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
A Brief Intro to GigaSpaces


    Transparent partitioning & HA
    • Fixed hashing based on a chosen
      property


                             Replication                                                 Replication



            Backup 1                             Primary 1             Primary 2                             Backup 2

      JAVA Virtual Machine                 JAVA Virtual Machine   JAVA Virtual Machine                 JAVA Virtual Machine




                                                                     ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
A Brief Intro to GigaSpaces


    Transactional (Like, ACID)
    • Local (single partition)
    • Distributed (multiple partitions)
    • Durability via memory replication




                                  ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Use the Right API for the Job

• Even for the same data…
  – POJO & JPA for Java apps with complex domain model
  – Document for a more dynamic view
  – Memcached for simple, language neutral
    data access
  – JDBC for:
     • Interaction with legacy apps
     • Flexible ad-hoc querying (e.g. projections)



                                       ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
SQL/JDBC – Query Them All

 Query may involve Map/Reduce
 • Reduce phase includes merging and sorting




                     JDBC


                            ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
SQL/JDBC – Things to Consider

• Unique and FK constraints are not practically
  inforceable
• Sorting and aggregation may be expensive
• Distributed transactions are evil
  – Stay local…




                               ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
Summary

• One API doesn’t fit all
  – Use the right API for the job

• Know the tradeoffs
  – Always ask what you’re giving up, not just what you’re
    gaining




                                      ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
THANK YOU!
      @mickey_alon
http://blog.gigaspaces.com




                 ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved

NoSql-YesSQL mickey alon

  • 1.
    Yes, SQL! Mickey Alon
  • 2.
    > SELECT *FROM qcon.speakers WHERE name=„Mickey Alon‟ +-----------------------------------------------------+ | Name | Company | Role | Twitter | +-----------------------------------------------------+ | Mickey Alon | GigaSpaces | Director | @mickey_alon | +-----------------------------------------------------+ > db.speakers.find({name:”Mickey Alon”}) { “name”:”Mickey Alon”, “company”: { name:”GigaSpaces”, products:[“XAP”, “IMDG”] domain: “In memory data grids” } “role”:”director”, “twitter”:”@mickey_alon” } ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 3.
    Agenda • SQL – What it is and isn’t good for • NoSQL – Motivation & Main Concepts of Modern Distributed Data Stores – Common interaction models • Key/Value, Column, Document • NOT consistency and distribution algorithms • One Data Store, Multiple APIs – (Really) brief intro to GigaSpaces – SQL challenges: Add-hoc querying, Relationships (JPA) ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 4.
    A FEW (MORE)WORDS ABOUT SQL ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 5.
    SQL  • Transactional, consistent • Hard to Scale – Disk Based ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 6.
    SQL Static, normalized data schema •Don’t duplicate, use FKs ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 7.
    SQL Add hocquery support  Model first, query later ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 8.
    SQL Standard  Well known  Rich ecosystem ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 9.
    (BRIEF) NOSQL RECAP ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 10.
    NoSql (or aNaive Attempt to Define It) A loosely coupled collection of non-relational multiple data stores ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 11.
    NoSQL – somekey concepts • Takes care of data-scaling • Distributed by nature (mostly) • Can scale up-to thousands of nodes • Complements SQL – not replacing it ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 12.
    Few words aboutscaling scal abl e (Up & Out) CPU speed/Cores ! Concurrency ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 13.
    What are theoptions? • Hardware based – Use extreme hardware such as RAC - 99,999% uptime – Non intrusive – minimal change – Can you afford it? How far does it scale? • Software (NoSQL) based on commodity HW – Failures are more likely to happen (due to number of nodes) – Design for failure scenarios • Putting data in multiple nodes • Client support for transparent failover – Eventually consistent (CAP theorem) ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 14.
    Why Now? The Big Data Era • Exponential Increase in data & throughput – We generate increasingly growing amount content – Data is being pushed to consumers – Caching ? • Software is delivered as service – 24/7 + schema evolution + agility = leading constraints ? • Competition is growing while price decrease ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 15.
    The Current LeadingData Models Key / Value Column Document { “name”:”uri”, “ssn”:”213445”, “hobbies”:[”…”,“…”], “…”: { “…”:”…” “…”:”…” } } { { ... } } { { ... } } ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 16.
    Key/Value • Have thekey? Get the value – That’s about it when it comes to querying – Map/Reduce (sometimes) – Good for • cache aside (e.g. Hibernate 2nd level cache) K1 V1 • Simple, id based interactions (e.g. user profiles) K2 V2 • In most cases, values are Opaque K3 V3 K4 V1 ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 17.
    Key/Value Scaling out isrelatively easy (just hash the keys) • Some will do that automatically for you • Fixed vs. consistent hashing ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 18.
    Key/Value • Implementations: – Memcached, Redis, Riak – In memory data grids (mostly Java-based) started this way • GigaSpaces, Oracle Coherence, WebSphere XS, JBoss Infinispan, etc. ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 19.
    Column Based ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 20.
    Column Based • Google’sBigTable / Amazon Dynamo • One giant table of rows and columns – Column == pair (name and a value, sometimes timestamp) – Each row can have a different number of columns = flexible schema Table ->* Rows ->* Columns ->* Values ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 21.
    Better query support •Query on row key – Or column value (aka secondary index) • Good for a constantly changing, (albeit flat) domain model • Can joins and relations be replaced by map/reduce? ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 22.
    Document Think JSON (or BSON,or XML) { “name”:”Lady Gaga”, “ssn”:”213445”, “hobbies”:[”Dressing up”,“Singing”], “albums”: [{“name”:”The fame” “release_year”:”2008”}, _id:1 {“name”:”Born this way” _id:2 “release_year”:”2011”}] } _id:3 { { ... } } { { ... } } ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 23.
    Document • Built-in supportfor hierarchal model – Arrays, nested documents • Better support for ad hoc queries Great power comes with great – MongoDB excels at this responsibility! • Very intuitive model – normally comes with restful and map/reduce API • Flexible schema ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 24.
    What if you didn’thave to choose? JPA JDBC ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 25.
    A Brief Introto GigaSpaces In Memory Data Grid • With optional write behind to a secondary storage write-behind JPA Multiple API Map/Reduce ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 26.
    A Brief Introto GigaSpaces Transparent partitioning & HA • Fixed hashing based on a chosen property Replication Replication Backup 1 Primary 1 Primary 2 Backup 2 JAVA Virtual Machine JAVA Virtual Machine JAVA Virtual Machine JAVA Virtual Machine ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 27.
    A Brief Introto GigaSpaces Transactional (Like, ACID) • Local (single partition) • Distributed (multiple partitions) • Durability via memory replication ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 28.
    Use the RightAPI for the Job • Even for the same data… – POJO & JPA for Java apps with complex domain model – Document for a more dynamic view – Memcached for simple, language neutral data access – JDBC for: • Interaction with legacy apps • Flexible ad-hoc querying (e.g. projections) ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 29.
    SQL/JDBC – QueryThem All Query may involve Map/Reduce • Reduce phase includes merging and sorting JDBC ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 30.
    SQL/JDBC – Thingsto Consider • Unique and FK constraints are not practically inforceable • Sorting and aggregation may be expensive • Distributed transactions are evil – Stay local… ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 31.
    Summary • One APIdoesn’t fit all – Use the right API for the job • Know the tradeoffs – Always ask what you’re giving up, not just what you’re gaining ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved
  • 32.
    THANK YOU! @mickey_alon http://blog.gigaspaces.com ® Copyright 2010 Gigaspaces Ltd. All Rights Reserved

Editor's Notes

  • #6 Easy to handle
  • #7 Efficient in disk space
  • #8 You don’t have to think about “how do you need to fetch the data”
  • #12 Takes care of data scalbilty
  • #13 Touches programing model and hardware configuration
  • #14 Does your business allow this cost, why companies like google / facebook chose non HW!?
  • #19 Map/reduce