Database Architectures and Hypertable

This presentation covers common terminology used to describe NoSQL databases, goes into depth on some popular scalable database architectures, and includes an overview of Hypertable.

Published in: Technology

    1. Database Architectures & Hypertable
        Doug Judd, CEO, Hypertable, Inc.
    2. Database Terminology
    3. Structured, Semi-Structured, and Unstructured Data
        • Structured data is what an RDBMS stores
            • Data is broken into discrete components
            • Types associated with each component: integer, floating point, date, string
        • Unstructured data is free-form text
        • Semi-structured data is a combination of structured and unstructured
        www.hypertable.org
    4. Document-Oriented
        • Semi-structured documents
        • Accepts documents in a format such as JSON, XML, YAML
        • Often schema-less
        • Auto-index fields
        • Examples: CouchDB, MongoDB
        • Best fit: XML or Web documents
    5. Graph Databases
        • Database designed to represent graphs
        • APIs for performing graph operations
            • Traversal (depth-first, breadth-first)
            • Shortest/cheapest path
            • Partitioning
        • Some allow hypergraphs
        • Examples: Neo4j, HyperGraphDB, InfoGrid, AllegroGraph, Sones, DEX, FlockDB, OrientDB, VertexDB, InfiniteGraph, Filament
        • More info: sones graphdb landscape
    6. Column-Oriented
        • Data physically stored by column
        • RDBMS typically row-oriented
        • Improved performance for column operations
        • Better data compression
        • Examples: Hypertable, HBase, Cassandra, Vertica
    7. In-Memory
        • Data set stored in RAM
        • Extremely fast access
        • Limited capacity
        • Examples: Memcached, Redis, MonetDB, VoltDB
    8. Horizontal Scalability
        • Scale out
        • Increase capacity by adding machines
        • Opposite of vertical scalability (scale up)
        • Commodity hardware
    9. Distributed Hash Table (DHT)
        • Horizontally scalable
        • Decentralized
        • Fast access
        • Restricted API: GET, SET, DELETE
        • Peer-to-peer file sharing systems: BitTorrent, Napster, Gnutella, Freenet
        • Examples: Dynamo, Cassandra, Riak, Project Voldemort, SimpleDB, S3, Redis, Scalaris, Membase
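The restricted GET/SET/DELETE API on the DHT slide can be illustrated with a small sketch. This is a toy under stated assumptions, not how any of the listed systems is implemented: the class name `TinyDHT` is hypothetical, each "node" is an in-process dictionary rather than a server, and keys are placed with a simple hash-modulo rule instead of the consistent hashing covered on slide 17.

```python
import hashlib


class TinyDHT:
    """Toy key-value store partitioned across in-process 'nodes' by key hash."""

    def __init__(self, node_names):
        # Each hypothetical node is just a dict; a real DHT node is a server.
        self.names = sorted(node_names)
        self.nodes = {name: {} for name in self.names}

    def _owner(self, key):
        # Hash the key and map it onto one node (simple modulo placement).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.names[int(digest, 16) % len(self.names)]

    def set(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._owner(key)].get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self.nodes[self._owner(key)].pop(key, None)
```

Every operation touches exactly one node, chosen from the key alone. That is what keeps access fast and lets capacity grow by adding machines, and it is also why the API stays so narrow: there is no cheap way to answer a query that spans keys.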
    10. Amazon AWS
        • S3
            • Online storage web service
            • Designed for larger amounts of data
            • Cost: $0.15/GB per month
        • SimpleDB
            • Designed for smaller amounts of data
            • Provides indexing and richer query capability
            • Cost: $0.27/GB per month + machine utilization fee
        • RDS
            • Managed MySQL instances
    11. Scalable Database Architectures
    12. Auto-Sharding
        • Splits table data into horizontal “shards”
        • Shards managed by traditional RDBMS (e.g. MySQL, Postgres)
        • Automated “glue” code to handle sharding and request routing
        • Examples: MongoDB, AsterData, Greenplum
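The request-routing half of that "glue" code can be sketched in a few lines. This is an illustration under assumptions, not the design of MongoDB, AsterData, or Greenplum: SQLite in-memory databases stand in for the MySQL or Postgres shards, the `users` table and the class name `ShardedUsers` are hypothetical, and the shard count is fixed, so the automated splitting and rebalancing that make sharding "auto" are left out.

```python
import sqlite3
import zlib


class ShardedUsers:
    """Routes rows of one logical table to several independent SQL databases."""

    def __init__(self, shard_count):
        # One in-memory SQLite database per shard (stand-in for a real RDBMS).
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")

    def _shard_for(self, user_id):
        # Stable hash of the shard key picks the database that owns the row.
        return self.shards[zlib.crc32(user_id.encode("utf-8")) % len(self.shards)]

    def insert(self, user_id, name):
        db = self._shard_for(user_id)
        db.execute("INSERT OR REPLACE INTO users VALUES (?, ?)", (user_id, name))
        db.commit()

    def lookup(self, user_id):
        row = self._shard_for(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def count(self):
        # A query without the shard key has to visit every shard.
        return sum(
            db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
            for db in self.shards
        )
```

Single-key reads and writes go to one shard, while `count` shows the cost of anything that is not keyed: the router must fan out to all shards and merge the results.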
    13. MongoDB
    14. Dynamo
        • Developed by Amazon.com for their Shopping Cart
        • Designed for high write availability
        • Eventually consistent DHT
        • Implementations:
            • Cassandra
            • Project Voldemort
            • Riak
            • Dynomite
    15. Eventual Consistency
        • Database update semantics in a distributed system with data replication
        • Strong Consistency - after an update completes, all processes see the updated value
        • Eventual Consistency - eventually all processes will see the updated value
        • Most well-known eventual consistency system is DNS
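The difference between the two definitions is easiest to see in a small simulation. The sketch below is an assumption-laden toy, not Dynamo's protocol: the names `Replica` and `EventuallyConsistentRegister` are hypothetical, replication is modeled as a queue of undelivered updates, and conflicts are resolved by a simple last-write-wins counter.

```python
class Replica:
    """One copy of a single value, tagged with the version that wrote it."""

    def __init__(self):
        self.value = None
        self.version = 0


class EventuallyConsistentRegister:
    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # undelivered updates: (replica index, version, value)
        self.clock = 0     # logical clock used for last-write-wins

    def write(self, value, coordinator=0):
        # The write completes as soon as one replica has applied it.
        self.clock += 1
        target = self.replicas[coordinator]
        target.value, target.version = value, self.clock
        for i in range(len(self.replicas)):
            if i != coordinator:
                self.pending.append((i, self.clock, value))

    def read(self, replica_index):
        # Reads are served locally, so they may return a stale value.
        return self.replicas[replica_index].value

    def propagate(self):
        # Deliver queued updates; newer versions overwrite older ones.
        for i, version, value in self.pending:
            replica = self.replicas[i]
            if version > replica.version:
                replica.value, replica.version = value, version
        self.pending.clear()
```

Between `write` and `propagate`, a read from another replica still returns the old value; once propagation has run, every replica agrees. A strongly consistent system would not report the write as complete until that agreement had been reached.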
    16. Eventual Consistency
    17. Consistent Hashing
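The slide's diagram is not reproduced in this transcript, so here is a minimal sketch of the general consistent hashing technique. It assumes MD5 as the ring hash and 64 virtual nodes per machine; the names `ring_hash` and `HashRing` are hypothetical and do not come from any of the systems named in the deck.

```python
import bisect
import hashlib


def ring_hash(text):
    """Map a string to a point on the ring (a 128-bit integer from MD5)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._points, (ring_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def owner(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point at or after the key's position, wrapping around the ring.
        i = bisect.bisect_left(self._points, (ring_hash(key),))
        return self._points[i % len(self._points)][1]
```

The property that matters for scaling out is that adding a node only takes keys away from existing nodes and hands them to the new one. No key moves between two machines that were already in the ring, which is not true of plain hash-modulo placement.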
    18. Order Preserving Partitioner (Cassandra)

          www.recipezaar.com        1091721999…629750272
        + www.ribbonprinters.com    1091721999…965293103
        / 2 =
          www.rgb????i?pQdp?.???    1091721999…297521687
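The arithmetic on this slide treats each key as one large integer, averages the two, and turns the result back into bytes. The sketch below reproduces that idea under assumptions: keys are ASCII byte strings, the shorter key is right-padded with zero bytes so integer order matches byte order, and the helper names are hypothetical. It is not Cassandra's partitioner code.

```python
def key_to_int(key: bytes, width: int) -> int:
    # Right-pad with zero bytes so integer order matches byte-string order.
    return int.from_bytes(key.ljust(width, b"\x00"), "big")


def int_to_key(value: int, width: int) -> bytes:
    return value.to_bytes(width, "big")


def midpoint(a: bytes, b: bytes) -> bytes:
    """Return a byte string that sorts between keys a and b."""
    width = max(len(a), len(b))
    total = key_to_int(a, width) + key_to_int(b, width)
    return int_to_key(total // 2, width)


mid = midpoint(b"www.recipezaar.com", b"www.ribbonprinters.com")
print(mid)
```

The result begins with `www.rgb` and then continues with bytes outside the printable ASCII range, which is consistent with the `?` characters on the slide. The midpoint is a valid split point in key order, but it is not a key anyone stored and it says nothing about how many rows fall on either side.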
    19. Order Preserving Partitioner Balance Problem
    20. Bigtable: the infrastructure that Google is built on
        • Bigtable underpins 100+ Google services, including: YouTube, Blogger, Google Earth, Google Maps, Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database…
        • Implementations
            • Hypertable
            • HBase
    21. Google Stack
        • GFS - Replicates data inter-machine
        • MapReduce - Efficiently process data in GFS
        • Bigtable - Indexed table structure
    22. Google File System
    23. Google File System
    24. Table: Visual Representation
    25. Table: Actual Representation
    26. Scaling (part I)
    27. Scaling (part II)
    28. Scaling (part III)
    29. Request Routing
    30. Hypertable
    31. Hypertable Overview
        • Massively scalable database
        • Modeled after Google’s Bigtable
        • High performance implementation (C++)
        • Thrift interface for all popular high-level languages: Java, Ruby, Python, PHP, etc.
        • Open source (GPL license)
        • Project started March 2007 @ Zvents
    32. Hypertable In Use Today
    33. Hypertable vs. HBase
    34. Hypertable vs. HBase

        Test                               Hypertable Advantage Relative to HBase (%)
        Random Read Zipfian 80 GB          925
        Random Read Zipfian 20 GB          777
        Random Read Zipfian 2.5 GB         100
        Random Write 10KB values           51
        Random Write 1KB values            102
        Random Write 100 byte values       427
        Random Write 10 byte values        931
        Sequential Read 10KB values        1060
        Sequential Read 1KB values         68
        Sequential Read 100 byte values    129
        Scan 10KB values                   2
        Scan 1KB values                    58
        Scan 100 byte values               75
        Scan 10 byte values                220
    35. Annual EC2 Cost Savings
        • Assuming 200% improvement
        • Extra large reserved instances
    36. Resources
        • Project Site: www.hypertable.org
        • Twitter: hypertable
        • Commercial Support: www.hypertable.com
        • Performance Evaluation Write-up: blog.hypertable.com/?p=14
    37. Q&A
    38. System Overview
    39. Data Model
        • Sparse, two-dimensional table with cell versions
        • Cells are identified by a 4-part key
            • Row (string)
            • Column Family (byte)
            • Column Qualifier (string)
            • Timestamp (long integer)
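The 4-part key on the Data Model slide can be sketched as a sorted map. This is a minimal in-memory illustration, not Hypertable's storage format or API: the names `CellKey` and `SparseTable` are hypothetical, and sorting versions newest-first within a column is an assumption borrowed from the Bigtable design the deck says Hypertable is modeled on.

```python
import bisect
from typing import NamedTuple


class CellKey(NamedTuple):
    row: str
    column_family: int      # a small integer, standing in for the 1-byte family id
    column_qualifier: str
    timestamp: int


class SparseTable:
    """Cells kept in one sorted list; absent cells simply take no space."""

    def __init__(self):
        self._keys = []    # sorted internal keys
        self._values = {}  # internal key -> cell value

    @staticmethod
    def _sort_key(key):
        # Negate the timestamp so newer versions sort first (assumption).
        return (key.row, key.column_family, key.column_qualifier, -key.timestamp)

    def put(self, key, value):
        sk = self._sort_key(key)
        if sk not in self._values:
            bisect.insort(self._keys, sk)
        self._values[sk] = value

    def get_latest(self, row, family, qualifier):
        prefix = (row, family, qualifier)
        i = bisect.bisect_left(self._keys, prefix)
        if i < len(self._keys) and self._keys[i][:3] == prefix:
            return self._values[self._keys[i]]
        return None

    def scan_row(self, row):
        # All cells of a row are adjacent in key order.
        i = bisect.bisect_left(self._keys, (row,))
        while i < len(self._keys) and self._keys[i][0] == row:
            sk = self._keys[i]
            yield CellKey(sk[0], sk[1], sk[2], -sk[3]), self._values[sk]
            i += 1
```

Because the row is the leading key component, a whole row is one contiguous run in sort order, and a range of rows can be split off as a unit when a table grows.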
