Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Big Data Platforms:
      An Overview
                          C. Scyphers
                   Chief Technical Architect

...
What Is “Big Data”?
• Big Data is not simply a huge pile of information

• A good starting place is the following thought:...
What Is A Big Data Platform?




 Putting it simply, it is any platform which supports
             those kind of large da...
It doesn’t have to be cutting edge technology.
Lots of legacy technologies can address the problem.
If only at a sizeable cost.
SQL
Of the new technologies, the most promising are from
                the “NoSQL” family.
What Is “NoSQL”?

SQL
A family of non-relational data storage technologies
Horizontal scalability
Distributed processing
Faster throughput
Usually with less cost than more traditional approaches.
Some of these
technologies
are new and
innovative
Others have been around for decades.
NoSQL Does Not Mean “SQL Is Bad”




When the trend was just starting, “NoSQL” was coined. It’s
unfortunate, because it im...
NoSQL Means “Not Only SQL”
        RELATIONAL




                                  RELATIONAL
                           ...
Why Won’t
 SQL Do?
Scale is very hard without ridiculous expense
SQL can get very complex, very quickly
Changing a schema for a large production system is
             both risky and expensive
Throughput can be a challenge
How Does
NoSQL Do It?
Scale is achieved through a shared-nothing architecture,
                  removing bottlenecks
Schemaless design means change becomes much less
          risky and significantly cheaper
Most solutions use simple RESTful interfaces
NoSQL is based upon a better understanding
 of data storage, usually referred to as the

           “CAP Theorem”
The CAP Theorem
        Grossly simplified (with apologies to Brewer):

A database can be
• Consistent (All clients see th...
CAP In Practice
• Consistent & Available (no Partition Tolerance)
  • Either single machines or single site clusters.
  • ...
CAP In Practice
• Consistent & Partition Tolerant (no Availability)
  • Some data may be inaccessible, but the remainder
 ...
CAP In Practice
• Available & Partition Tolerant (no Consistency)
  • Some data may be inaccurate; a conflict resolution
 ...
CAP From A Vendor POV
• C-A (no P) – this is generally how most
               RDBMS vendors operate

• C-P (no A) – this ...
ACID vs BASE
   Traditional Databases                        NoSQL Databases Tend
    Are ACID Compliant                  ...
SQL Strengths




Very well known technology
Very mature technology
Interoperability across vendors
Large talent pool from which to choose
Ad hoc operations common, if not encouraged
NoSQL Strengths




 Built to address massive scale
Through horizontal scalability
While remaining highly available
And handling unstructured data
NoSQL Pros/Cons
       Pros                         Cons

• Schema Evolution         • Querying the data is
• Horizontal S...
A Disclaimer Before We Continue
• I am not an expert on every possible Big Data
  Platform
• There are hundreds of them; t...
Flavors Of NoSQL

The major four divisions of NoSQL are:

•   Key-Value
•   Document Store
•   Columnar
•   Other
Key-Value
• At a very high level, key-value works essentially
  by pairing a index token (a key) with a data
  element (a ...
A Key-Value Example

“John Smith”, “100 Century Dr. Alexandria VA 22304”


“John Doe”, “16 Kozyak Street, Lozenets Distric...
Document Store
• Document stores extend the key-value paradigm
  into values with multiple attributes.
• The document valu...
A Document Store Example
“John Smith”, “<address><street>100 Century Dr.</street>
                        <city>Alexandria...
Columnar Family

• Usually has “rows” and “columns”
  • Or, at least, their logical equivalents
• Not a traditional, “pure...
The Others
• Hierarchical Databases
  • LDAP, Active Directory
• Graph Databases
  • Neo4j, Flock DB, InfiniteGraph
• XML
...
Key-Value Recap

 Pairing a index token (a key)
 with a data element (a value)
Key-Value Pro/Con
         Pros                                   Cons
•   Schema Evolution                •   Packing & u...
Where Did Key-Value
       Come From?

The concept is quite old, but most people trace the
lineage back to Amazon and the ...
Dynamo
Amazon devised the Dynamo engine as a way to
address their scalability issues in a reliable way.

• Communication b...
Eventually Consistent
• Rather than expending the runtime resources
  to ensure that all nodes are aware of a change
  bef...
Can I Use Dynamo?

  No. It’s an Amazon only internal product.
  However, AWS S3 is largely based upon it.


Amazon did an...
Riak
• Riak is a key-value database largely
  modeled after the Dynamo model.
• Open source (free) with paid support
  fro...
Riak Pro/Con
         Pros                                   Cons
•   All nodes are equal – no       •   Not meant for sma...
Riak Users
Redis
• Redis is a key-value in-memory datastore.
• Open source (free) with support from the
  community.

• Main claims t...
Redis Pro/Con
         Pros                                   Cons
•   Transactional support           •   Entirely in mem...
Redis Users
Voldemort
• Voldemort is a key-value in-memory database
  built by LinkedIn.
• Open source (free) with support from the
  ...
Voldemort Pro/Con
         Pros                                 Cons
•   Highly customizable – each    •   Versioning mean...
Voldemort Users
Key/Value
        “Big Vendors”
• Microsoft Azure Table Storage
• Oracle NoSQL
• BerkleyDB (Oracle)
Document Store Recap

Document stores store an index token
with a grouping of attributes in a semi-
         structured do...
Document Store Pro/Con
         Pros                                  Cons
•   Tends to support a more       •   The entir...
CouchDB
• CouchDB is a document store database.
• Open source (free), part of the Apache
  foundation with paid support av...
CouchDB Pro/Con
         Pros                                 Cons
•   Very simple API for           •   The simple API fo...
CouchDB Users
MongoDB
• MongoDB is a document store database.
• Open source (free) with paid support available
  from 10Gen.

• Main cla...
MongoDB Pro/Con
         Pros                                   Cons
•   Auto-sharding                   •   Does not supp...
MongoDB Users
Document Store
        “Big Vendors”
• Lotus Domino
Columnar Family Recap

• A key with many values attached
• Usually presenting as “rows” and “columns”
  • Or, at least, th...
Columnar Pro/Con
         Pros                                  Cons
•   Tend to have some level of     •   Is much less e...
Where Did Columnar
      Come From?

The concept has been around for a while, but most
 people trace the NoSQL lineage bac...
BigTable
Google devised the BigTable engine as a way to
address their search related scalability issues in a
reliable way....
Can I Use BigTable?

No. It’s a Google only internal product. However,
 quite a few open source products are built upon
  ...
Cassandra
• Cassandra is a hybrid of Big Table built on
  Dynamo infrastructure
• Open source (free), built by Facebook wi...
Cassandra Pro/Con
         Pros                                      Cons
•   Designed to span multiple         •   No joi...
Cassandra Users
HBase
• Hbase is a columnar database built on top of the
  Hadoop environment.
• Open source (free) with paid support from...
HBase Pro/Con
         Pros                                  Cons
•   Map/Reduce support             •   Secondary indexes...
HBase Users
Hadoop
• Hadoop is not a columnar store as such.
• Rather, Hadoop is a massively parallel data
  processing engine

• Main...
Hadoop Pro/Con
         Pros                                   Cons
•   While written in Java, almost   •   Large amounts ...
Hadoop Users




         Plus lots, lots more
Columnar “Big Vendor”

• EMC Greenplum
• Teradata Aster


 In so far as both of these solutions are grafting
Map/Reduce in...
Which One Do
            I Use Where?
•   Key-Value for (relatively) simple, volitile data
•   Document store for more com...
Questions?
@scyphers


            Additional Information At
http://www.daemonconsulting.net/BDC-FOSE-2012


Daemon Consulting, LLC
 ...
Upcoming SlideShare
Loading in …5
×

of

Big Data Platforms: An Overview Slide 1 Big Data Platforms: An Overview Slide 2 Big Data Platforms: An Overview Slide 3 Big Data Platforms: An Overview Slide 4 Big Data Platforms: An Overview Slide 5 Big Data Platforms: An Overview Slide 6 Big Data Platforms: An Overview Slide 7 Big Data Platforms: An Overview Slide 8 Big Data Platforms: An Overview Slide 9 Big Data Platforms: An Overview Slide 10 Big Data Platforms: An Overview Slide 11 Big Data Platforms: An Overview Slide 12 Big Data Platforms: An Overview Slide 13 Big Data Platforms: An Overview Slide 14 Big Data Platforms: An Overview Slide 15 Big Data Platforms: An Overview Slide 16 Big Data Platforms: An Overview Slide 17 Big Data Platforms: An Overview Slide 18 Big Data Platforms: An Overview Slide 19 Big Data Platforms: An Overview Slide 20 Big Data Platforms: An Overview Slide 21 Big Data Platforms: An Overview Slide 22 Big Data Platforms: An Overview Slide 23 Big Data Platforms: An Overview Slide 24 Big Data Platforms: An Overview Slide 25 Big Data Platforms: An Overview Slide 26 Big Data Platforms: An Overview Slide 27 Big Data Platforms: An Overview Slide 28 Big Data Platforms: An Overview Slide 29 Big Data Platforms: An Overview Slide 30 Big Data Platforms: An Overview Slide 31 Big Data Platforms: An Overview Slide 32 Big Data Platforms: An Overview Slide 33 Big Data Platforms: An Overview Slide 34 Big Data Platforms: An Overview Slide 35 Big Data Platforms: An Overview Slide 36 Big Data Platforms: An Overview Slide 37 Big Data Platforms: An Overview Slide 38 Big Data Platforms: An Overview Slide 39 Big Data Platforms: An Overview Slide 40 Big Data Platforms: An Overview Slide 41 Big Data Platforms: An Overview Slide 42 Big Data Platforms: An Overview Slide 43 Big Data Platforms: An Overview Slide 44 Big Data Platforms: An Overview Slide 45 Big Data Platforms: An Overview Slide 46 Big Data Platforms: An Overview Slide 47 Big Data Platforms: An Overview Slide 48 Big Data Platforms: An Overview Slide 49 Big Data Platforms: An Overview Slide 50 Big Data Platforms: An Overview Slide 51 Big Data Platforms: An Overview Slide 52 Big Data Platforms: An Overview Slide 53 Big Data Platforms: An Overview Slide 54 Big Data Platforms: An Overview Slide 55 Big Data Platforms: An Overview Slide 56 Big Data Platforms: An Overview Slide 57 Big Data Platforms: An Overview Slide 58 Big Data Platforms: An Overview Slide 59 Big Data Platforms: An Overview Slide 60 Big Data Platforms: An Overview Slide 61 Big Data Platforms: An Overview Slide 62 Big Data Platforms: An Overview Slide 63 Big Data Platforms: An Overview Slide 64 Big Data Platforms: An Overview Slide 65 Big Data Platforms: An Overview Slide 66 Big Data Platforms: An Overview Slide 67 Big Data Platforms: An Overview Slide 68 Big Data Platforms: An Overview Slide 69 Big Data Platforms: An Overview Slide 70 Big Data Platforms: An Overview Slide 71 Big Data Platforms: An Overview Slide 72 Big Data Platforms: An Overview Slide 73 Big Data Platforms: An Overview Slide 74 Big Data Platforms: An Overview Slide 75 Big Data Platforms: An Overview Slide 76 Big Data Platforms: An Overview Slide 77 Big Data Platforms: An Overview Slide 78 Big Data Platforms: An Overview Slide 79 Big Data Platforms: An Overview Slide 80 Big Data Platforms: An Overview Slide 81 Big Data Platforms: An Overview Slide 82 Big Data Platforms: An Overview Slide 83 Big Data Platforms: An Overview Slide 84 Big Data Platforms: An Overview Slide 85 Big Data Platforms: An Overview Slide 86 Big Data Platforms: An Overview Slide 87 Big Data Platforms: An Overview Slide 88 Big Data Platforms: An Overview Slide 89 Big Data Platforms: An Overview Slide 90 Big Data Platforms: An Overview Slide 91 Big Data Platforms: An Overview Slide 92 Big Data Platforms: An Overview Slide 93
Upcoming SlideShare
Top 5 Considerations for a Big Data Solution
Next

107 Likes

Share

Big Data Platforms: An Overview

A high level overview of NoSQL Big Data platforms. This presentation is targeted more towards business people than technical people.

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all

Big Data Platforms: An Overview

  1. 1. Big Data Platforms: An Overview C. Scyphers Chief Technical Architect Daemon Consulting, LLC
  2. 2. What Is “Big Data”? • Big Data is not simply a huge pile of information • A good starting place is the following thought: “Big Data describes datasets so large they become very difficult to manage with traditional database tools.”
  3. 3. What Is A Big Data Platform? Putting it simply, it is any platform which supports those kind of large datasets.
  4. 4. It doesn’t have to be cutting edge technology.
  5. 5. Lots of legacy technologies can address the problem.
  6. 6. If only at a sizeable cost.
  7. 7. SQL Of the new technologies, the most promising are from the “NoSQL” family.
  8. 8. What Is “NoSQL”? SQL A family of non-relational data storage technologies
  9. 9. Horizontal scalability
  10. 10. Distributed processing
  11. 11. Faster throughput
  12. 12. Usually with less cost than more traditional approaches.
  13. 13. Some of these technologies are new and innovative
  14. 14. Others have been around for decades.
  15. 15. NoSQL Does Not Mean “SQL Is Bad” When the trend was just starting, “NoSQL” was coined. It’s unfortunate, because it implies antagonism towards SQL.
  16. 16. NoSQL Means “Not Only SQL” RELATIONAL RELATIONAL NON- NoSQL is a complement to a traditional RDBMS, not necessarily as a replacement of them.
  17. 17. Why Won’t SQL Do?
  18. 18. Scale is very hard without ridiculous expense
  19. 19. SQL can get very complex, very quickly
  20. 20. Changing a schema for a large production system is both risky and expensive
  21. 21. Throughput can be a challenge
  22. 22. How Does NoSQL Do It?
  23. 23. Scale is achieved through a shared-nothing architecture, removing bottlenecks
  24. 24. Schemaless design means change becomes much less risky and significantly cheaper
  25. 25. Most solutions use simple RESTful interfaces
  26. 26. NoSQL is based upon a better understanding of data storage, usually referred to as the “CAP Theorem”
  27. 27. The CAP Theorem Grossly simplified (with apologies to Brewer): A database can be • Consistent (All clients see the same data) • Available (All clients can find some available node) • Partition-Tolerant (the database will continue to function even if split into disconnected sets – e.g. a network disruption) Pick Any Two.
  28. 28. CAP In Practice • Consistent & Available (no Partition Tolerance) • Either single machines or single site clusters. • Typically uses 2 phase commits
  29. 29. CAP In Practice • Consistent & Partition Tolerant (no Availability) • Some data may be inaccessible, but the remainder is available and consistent • Sharding is an example of this implementation Customers Customers Customers A-F G-R S-Z
  30. 30. CAP In Practice • Available & Partition Tolerant (no Consistency) • Some data may be inaccurate; a conflict resolution strategy is required. • DNS is an example of this, as well as standard master-slave replication
  31. 31. CAP From A Vendor POV • C-A (no P) – this is generally how most RDBMS vendors operate • C-P (no A) – this is how many RDBMS’ attempt to address scale without incurring large costs • A-P (no C) – this is how most NoSQL approaches solve the problem
  32. 32. ACID vs BASE Traditional Databases NoSQL Databases Tend Are ACID Compliant To Be BASE Compliant Atomicity – either the entire transaction Basically completes or none of it does Consistent – any transaction will take the Available database from one consistent state to another, with no broken constraints Isolation – changes do not affect other users Scalable until committed Durability – committed transactions can be Eventually consistent recovered in case of system failure Eventually consistent is the key phrase here
  33. 33. SQL Strengths Very well known technology
  34. 34. Very mature technology
  35. 35. Interoperability across vendors
  36. 36. Large talent pool from which to choose
  37. 37. Ad hoc operations common, if not encouraged
  38. 38. NoSQL Strengths Built to address massive scale
  39. 39. Through horizontal scalability
  40. 40. While remaining highly available
  41. 41. And handling unstructured data
  42. 42. NoSQL Pros/Cons Pros Cons • Schema Evolution • Querying the data is • Horizontal Scalability much harder • Simple Protocols • Paradigm Shift • Security is a big issue • May or may not support data types (BLOBs, spatial) • Generally, uniqueness cannot be enforced
  43. 43. A Disclaimer Before We Continue • I am not an expert on every possible Big Data Platform • There are hundreds of them; these are the ones I consider the leaders in the field and recommend • If you have a favorite, please let me know and I’ll update the deck for next time • The internal details on how these systems work are rather complex; I would prefer to take those questions offline
  44. 44. Flavors Of NoSQL The major four divisions of NoSQL are: • Key-Value • Document Store • Columnar • Other
  45. 45. Key-Value • At a very high level, key-value works essentially by pairing a index token (a key) with a data element (a value). • Both index token and the data value can be of any structure. • Such a pairing is arbitrary and up to the developer of the system to determine.
  46. 46. A Key-Value Example “John Smith”, “100 Century Dr. Alexandria VA 22304” “John Doe”, “16 Kozyak Street, Lozenets District, 1408 Sofia Bulgaria” In both examples, the key is a name and the value is an address. However, the structure of the address differs between the two.
  47. 47. Document Store • Document stores extend the key-value paradigm into values with multiple attributes. • The document values tend to be semi-structured data (XML, JSON, et al) but can also be Word or PDF documents.
  48. 48. A Document Store Example “John Smith”, “<address><street>100 Century Dr.</street> <city>Alexandria</city> <state>VA</state> <postalCode>22304</postalCode> </address>” “John Doe”, “{ “address”: { “street”: “16 Kozyak Street” “district”: “Lozenets, 1408” “city”: “Sofia” “country”: “Bulgaria” } }”
  49. 49. Columnar Family • Usually has “rows” and “columns” • Or, at least, their logical equivalents • Not a traditional, “pure” column store • More of a hybridized approach leveraging key-value pairs • A key with many values attached
  50. 50. The Others • Hierarchical Databases • LDAP, Active Directory • Graph Databases • Neo4j, Flock DB, InfiniteGraph • XML • MarkLogic • Object Oriented Databases • Versant • Lotus Notes • HPCC (LexisNexis)
  51. 51. Key-Value Recap Pairing a index token (a key) with a data element (a value)
  52. 52. Key-Value Pro/Con Pros Cons • Schema Evolution • Packing & unpacking each key • Horizontal Scalability • Keys typically are not related • Simple Protocols to each other • Works well for volatile data • The entire value must be • High throughput, typically returned, not just a part of it optimized for reads or writes • Security tends to be an issue • Keys become meaningful • Hard to support reporting, rather than arbitrary analytics, aggregation or • Application logic defines ordered values object model • Generally does not support updates in place • Application logic defines object model
  53. 53. Where Did Key-Value Come From? The concept is quite old, but most people trace the lineage back to Amazon and the Dynamo paper.
  54. 54. Dynamo Amazon devised the Dynamo engine as a way to address their scalability issues in a reliable way. • Communication between nodes is peer to peer (P2P) • Replication occurs with the end client addressing conflict resolution • Quorum Reads/Writes • Always writable (Hinted Handoff) • Eventually Consistent
  55. 55. Eventually Consistent • Rather than expending the runtime resources to ensure that all nodes are aware of a change before continuing, Dynamo uses an eventually consistent model. • In this model, a subset of nodes are changed • Those nodes then inform their neighbors until all nodes are changed (grossly simplifying).
  56. 56. Can I Use Dynamo? No. It’s an Amazon only internal product. However, AWS S3 is largely based upon it. Amazon did announce a DynamoDB offering for their AWS customers. While it’s probably the same, I cannot guarantee that it is.
  57. 57. Riak • Riak is a key-value database largely modeled after the Dynamo model. • Open source (free) with paid support from Basho. • Main claims to fame: • Extreme reliability • Performance speed
  58. 58. Riak Pro/Con Pros Cons • All nodes are equal – no • Not meant for small, discrete single point of failure and numerous datapoints. • Horizontal Scalability • Getting data in is great; • Full Text Search getting it out, not so much • RESTful interface (and HTTP) • Security is non-existent: • Consistency level tunable on “Riak assumes the internal each operation environment is trusted” • Secondary indexes available • Conflict resolution can bubble • Map/Reduce (JavaScript & up to the client if not careful. Erlang only) • Erlang is fast, but it’s got a serious learning curve.
  59. 59. Riak Users
  60. 60. Redis • Redis is a key-value in-memory datastore. • Open source (free) with support from the community. • Main claims to fame: • Fast. So very, very fast. • Transactional support • Best for rapidly changing data
  61. 61. Redis Pro/Con Pros Cons • Transactional support • Entirely in memory • Blob storage • Master-slave replication • Support for sets, lists and (instead of master-master) sorted sets • Security is non-existent: • Support for Publish-Subscribe designed to be used in (Pub-Sub) messaging trusted environments • Robust set of operators • Does not support encryption • Support can be hard to find
  62. 62. Redis Users
  63. 63. Voldemort • Voldemort is a key-value in-memory database built by LinkedIn. • Open source (free) with support from the community • Main claims to fame: • Low latency • Highly Available • Very fast reads
  64. 64. Voldemort Pro/Con Pros Cons • Highly customizable – each • Versioning means lots of disk layer of the stack can be space being used. replaced as needed • Does not support range • Data elements are versioned queries during changes • No complex query filters • All nodes are independent – • All joins must be done in no single point of failure code • Very, very fast reads • No foreign key constraints • No triggers • Support can be hard to find
  65. 65. Voldemort Users
  66. 66. Key/Value “Big Vendors” • Microsoft Azure Table Storage • Oracle NoSQL • BerkleyDB (Oracle)
  67. 67. Document Store Recap Document stores store an index token with a grouping of attributes in a semi- structured document
  68. 68. Document Store Pro/Con Pros Cons • Tends to support a more • The entire value must be complex data model than returned, not just a part of it key/value • Security tends to be an issue • Good at content • Joins are not available within management the database • Usually supports multiple • No foreign keys indexes • Application logic defines • Schemaless (can be nested) object model • Typically low latency reads • Application logic defines object model
  69. 69. CouchDB • CouchDB is a document store database. • Open source (free), part of the Apache foundation with paid support available from several vendors. • Main claims to fame: • Simple and easy to use • Good read consistency • Master-master replication
  70. 70. CouchDB Pro/Con Pros Cons • Very simple API for • The simple API for development development is somewhat • MVCC support for read limited consistency • No foreign keys • Full Map/Reduce support • Conflict resolution devolves • Data is versioned to the application • Secondary indexes supported • Versioning requires extensive • Some security support disk space • RESTful API, JSON support • Versioning places large load • Materialized views with on I/O channels incremental update support • Replication for performance, not availability
  71. 71. CouchDB Users
  72. 72. MongoDB • MongoDB is a document store database. • Open source (free) with paid support available from 10Gen. • Main claims to fame: • Index anything • Ad hoc query support • SQL like operations (not SQL syntax)
  73. 73. MongoDB Pro/Con Pros Cons • Auto-sharding • Does not support JSON: BSON • Auto-failover instead • Update in place • Master-slave replication • Spatial index support • Has had some growing pains • Ad hoc query support (e.g. Foursquare outage) • Any field in Mongo can be • Not RESTful by default indexed • Failures require a manual • Very, very popular (lots of database repair operation production deployments) (similar to MySQL) • Very easy transition from SQL • Replication for availability, not performance
  74. 74. MongoDB Users
  75. 75. Document Store “Big Vendors” • Lotus Domino
  76. 76. Columnar Family Recap • A key with many values attached • Usually presenting as “rows” and “columns” • Or, at least, their logical equivalents
  77. 77. Columnar Pro/Con Pros Cons • Tend to have some level of • Is much less efficient when rudimentary security support processing many columns • Usually include a degree of simultaneously versioning • Joins tend to not be • Can be more efficient than supported row databases when • Referential integrity not processing a limited number available of columns over a large amount of rows
  78. 78. Where Did Columnar Come From? The concept has been around for a while, but most people trace the NoSQL lineage back to Google.
  79. 79. BigTable Google devised the BigTable engine as a way to address their search related scalability issues in a reliable way. • Data is organized through a set of keys: • Row • Column • Timestamp • A hybrid row/column store with a single master • Versioning is handled through the time key • Tablets are a dynamic partition of a sequence of rows – supports very efficient range scans • Columns can be grouped into column families • Column families can have access control
  80. 80. Can I Use BigTable? No. It’s a Google only internal product. However, quite a few open source products are built upon the concepts.
  81. 81. Cassandra • Cassandra is a hybrid of Big Table built on Dynamo infrastructure • Open source (free), built by Facebook with paid support available from several vendors. • Main claims to fame: • An Apache project • Very, very fast writes • Spans multiple datacenters
  82. 82. Cassandra Pro/Con Pros Cons • Designed to span multiple • No joins datacenters • No referential integrity • Peer to peer communication • Written in Java – quite between nodes complex to administer • No single point of failure and configure • Always writeable • Last update wins • Consistency level is tunable at run time • Supports secondary indexes • Supports Map/Reduce • Supports range queries
  83. 83. Cassandra Users
  84. 84. HBase • Hbase is a columnar database built on top of the Hadoop environment. • Open source (free) with paid support from numerous vendors • Main claims to fame: • Ad hoc type abilities • Easy integration with Map/Reduce
  85. 85. HBase Pro/Con Pros Cons • Map/Reduce support • Secondary indexes generally • More of a CA approach and not supported an AP • Security is non-existent • Supports predicate push • Requires a Hadoop down for performance gains infrastructure to function • Automatic partitioning and rebalancing of regions • Data is stored in a sorted order (not indexed) • RESTful API • Strong and vibrant ecosystem
  86. 86. HBase Users
  87. 87. Hadoop • Hadoop is not a columnar store as such. • Rather, Hadoop is a massively parallel data processing engine • Main claims to fame: • Specializes in unstructured data • Very flexible and popular
  88. 88. Hadoop Pro/Con Pros Cons • While written in Java, almost • Large amounts of disk space any language can leverage and bandwidth required Hadoop • Paradigm shift for IT staff • Runs on commodity servers • Quality talent is highly in • Horizontally scalable demand and expensive • Very fast and powerful • Security is non-existent • Where Map/Reduce • Name node is a single point originated of failure • Ample support from vendors • More or less only supporting • “Helper” languages like Hive batch processing and Pig • Not user friendly to anyone • Strong and vibrant ecosystem other than developers
  89. 89. Hadoop Users Plus lots, lots more
  90. 90. Columnar “Big Vendor” • EMC Greenplum • Teradata Aster In so far as both of these solutions are grafting Map/Reduce into a (more or less) SQL environment
  91. 91. Which One Do I Use Where? • Key-Value for (relatively) simple, volitile data • Document store for more complex data • Columnar for analytical processing • RBDMS for traditional processing – particularly where a lazy consistency is not acceptable • Point Of Sale, for example
  92. 92. Questions?
  93. 93. @scyphers Additional Information At http://www.daemonconsulting.net/BDC-FOSE-2012 Daemon Consulting, LLC http://www.daemonconsulting.net/ Specializing In The Hard Stuff
  • dungtk4

    Dec. 14, 2020
  • SanuraHettiarachchi

    Feb. 4, 2020
  • VirajWannige1

    Oct. 15, 2019
  • shanugandhi7

    Oct. 6, 2019
  • WangLiangLin

    Sep. 14, 2019
  • AntonioFelicio1

    Apr. 11, 2019
  • tigerkdp

    Aug. 29, 2018
  • ssuserf76ffb

    Aug. 23, 2018
  • vincenza63

    Mar. 3, 2018
  • annamartmuller

    Dec. 13, 2017
  • rajshri_pote

    Dec. 8, 2017
  • manvapradhan

    Sep. 10, 2017
  • LeVanThanh2

    Aug. 8, 2017
  • asutosh333

    Jun. 22, 2017
  • tuananhma

    May. 24, 2017
  • RajNavani

    Apr. 17, 2017
  • MayZu2

    Mar. 7, 2017
  • meetdebasish

    Feb. 24, 2017
  • HanzaHong

    Feb. 7, 2017
  • TonyLiu92

    Jan. 5, 2017

A high level overview of NoSQL Big Data platforms. This presentation is targeted more towards business people than technical people.

Views

Total views

30,120

On Slideshare

0

From embeds

0

Number of embeds

187

Actions

Downloads

0

Shares

0

Comments

0

Likes

107

×