Database Architectures & Hypertable
Doug Judd
CEO, Hypertable, Inc.

Database Terminology
www.hypertable.org
Structured, Semi-Structured, and Unstructured Data
• Structured data is what an RDBMS stores
  - Data is broken into discrete components
  - A type is associated with each component:
    integer, floating point, date, string
• Unstructured data is free-form text
• Semi-structured data is a combination of
  structured and unstructured data
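To make the three categories concrete, here is a minimal Python sketch. The `Order` record, its fields, and the sample values are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass
from datetime import date

# Structured: discrete components, each with a fixed type (like an RDBMS row).
@dataclass
class Order:
    order_id: int
    amount: float
    placed: date
    customer: str

structured = Order(order_id=42, amount=19.99, placed=date(2010, 5, 1), customer="alice")

# Semi-structured: self-describing field names, but no fixed schema;
# fields may be nested, optional, or differ from record to record.
semi_structured = json.loads(
    '{"customer": "alice", "tags": ["gift"], "shipping": {"city": "Paris"}}'
)

# Unstructured: free-form text with no component boundaries or types.
unstructured = "Alice ordered a gift and asked us to ship it to Paris."

print(type(structured.amount).__name__)
print(semi_structured["shipping"]["city"])
```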

Document-Oriented
• Semi-structured documents
• Accepts documents in a format such as
  JSON, XML, or YAML
• Often schema-less
• Auto-indexed fields
• Examples: CouchDB, MongoDB
• Best fit: XML or Web documents
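A minimal sketch of the idea, assuming JSON input: a hypothetical `TinyDocStore` accepts documents with any fields (no schema) and automatically indexes every top-level scalar field, so lookups by field value avoid scanning all documents. This is not the API of CouchDB or MongoDB.

```python
import json
from collections import defaultdict

class TinyDocStore:
    """Hypothetical schema-less store that auto-indexes top-level fields."""

    def __init__(self):
        self.docs = {}                 # doc_id -> document (dict)
        self.index = defaultdict(set)  # (field, value) -> set of doc_ids
        self.next_id = 0

    def insert(self, json_text):
        doc = json.loads(json_text)    # no schema check: any fields accepted
        doc_id = self.next_id
        self.next_id += 1
        self.docs[doc_id] = doc
        for field, value in doc.items():
            # Only hashable scalar values are indexed in this sketch.
            if isinstance(value, (str, int, float, bool)):
                self.index[(field, value)].add(doc_id)
        return doc_id

    def find(self, field, value):
        # Index lookup instead of a full scan over every document.
        ids = self.index.get((field, value), ())
        return [self.docs[i] for i in sorted(ids)]

store = TinyDocStore()
store.insert('{"type": "post", "author": "doug", "title": "Hello"}')
store.insert('{"type": "comment", "author": "doug", "body": "Nice"}')
print(len(store.find("author", "doug")))
```

Note that the two documents have different fields; nothing had to be declared up front.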

Graph Databases
• Database designed to represent graphs
• APIs for performing graph operations
  - Traversal (depth-first, breadth-first)
  - Shortest/cheapest path
  - Partitioning
• Some allow hypergraphs
• Examples:
  Neo4j, HyperGraphDB, InfoGrid,
  AllegroGraph, Sones, DEX, FlockDB,
  OrientDB, VertexDB, InfiniteGraph, Filament
• More info: sones graphdb landscape
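The traversal and cheapest-path operations listed above can be sketched on a small in-memory graph. The adjacency-list graph is hypothetical, cheapest path uses Dijkstra's algorithm, and partitioning is left out; real graph databases expose these operations through their own APIs.

```python
import heapq
from collections import deque

# Hypothetical weighted, directed graph: node -> {neighbor: cost}
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"D": 5, "C": 2},
    "C": {"D": 1},
    "D": {},
}

def bfs(start):
    """Breadth-first traversal order from start."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

def dfs(start, seen=None):
    """Depth-first traversal order from start (recursive)."""
    if seen is None:
        seen = []
    seen.append(start)
    for nxt in graph[start]:
        if nxt not in seen:
            dfs(nxt, seen)
    return seen

def cheapest_path(src, dst):
    """Dijkstra's algorithm: returns (total_cost, path), or (inf, []) if unreachable."""
    heap = [(0, src, [src])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, weight in graph[node].items():
            if nxt not in done:
                heapq.heappush(heap, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []

print(bfs("A"))                 # ['A', 'B', 'C', 'D']
print(dfs("A"))                 # ['A', 'B', 'D', 'C']
print(cheapest_path("A", "D"))  # (4, ['A', 'B', 'C', 'D'])
```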

Column-Oriented
• Data physically stored by column
• RDBMS typically row-oriented
• Improved performance for column
  operations
• Better data compression
• Examples:
  Hypertable, HBase, Cassandra, Vertica
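A minimal sketch contrasting the two layouts, using a hypothetical three-column table. It shows why a column operation only needs one column's values, and why run-length encoding (one simple compression scheme, chosen here for illustration) works well on a column whose adjacent values repeat.

```python
# Hypothetical table: (user, country, visits)
rows = [
    ("u1", "US", 3),
    ("u2", "US", 5),
    ("u3", "US", 2),
    ("u4", "FR", 7),
    ("u5", "FR", 1),
]

# Row-oriented layout: each record's fields are stored together.
row_store = list(rows)

# Column-oriented layout: each column's values are stored together.
column_store = {
    "user": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "visits": [r[2] for r in rows],
}

# On disk, SUM(visits) over a row store reads every field of every record;
# over a column store it reads only the contiguous "visits" column.
total_row = sum(r[2] for r in row_store)
total_col = sum(column_store["visits"])

def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

print(run_length_encode(column_store["country"]))  # [['US', 3], ['FR', 2]]
```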

In-Memory
• Data set stored in RAM
• Extremely fast access
• Limited capacity
• Examples:
  Memcached, Redis, MonetDB, VoltDB
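Because RAM capacity is limited, an in-memory cache has to decide what to drop when it fills up. A common answer is least-recently-used (LRU) eviction, the policy Memcached is known for. The sketch below is a hypothetical illustration of that policy, not Memcached's implementation.

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical in-memory key/value store with a fixed capacity.

    When full, the least recently used entry is evicted to make room.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")          # "a" becomes most recently used
cache.set("c", 3)       # capacity exceeded: evicts "b"
print(cache.get("b"))   # None
```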

Horizontal Scalability
• Scale out
• Increase capacity by adding machines
• Opposite of vertical scalability (scale up)
• Commodity hardware

Distributed Hash Table (DHT)
• Horizontally scalable
• Decentralized
• Fast access
• Restricted API: GET, SET, DELETE
• Peer-to-peer file sharing systems:
  BitTorrent, Napster, Gnutella, Freenet
• Examples:
  Dynamo, Cassandra, Riak, Project Voldemort,
  SimpleDB, S3, Redis, Scalaris, Membase
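A minimal sketch of a DHT client exposing only the restricted GET/SET/DELETE API. Each key is routed to a node by hashing it; the node names are hypothetical and each node is modeled as a local dict. For brevity this uses hash-modulo-N placement, whereas Dynamo-style systems use consistent hashing so that adding a node does not remap most keys. There is no range scan, which is exactly the restriction noted above.

```python
import hashlib

class SimpleDHT:
    """Hypothetical DHT client exposing only GET, SET and DELETE."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.nodes = {name: {} for name in self.node_names}

    def _node_for(self, key):
        # Stable hash (Python's built-in hash() is randomized per process).
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.node_names)
        return self.nodes[self.node_names[index]]

    def get(self, key):
        return self._node_for(key).get(key)

    def set(self, key, value):
        self._node_for(key)[key] = value

    def delete(self, key):
        self._node_for(key).pop(key, None)

dht = SimpleDHT(["node-a", "node-b", "node-c"])
dht.set("user:42", "Doug")
print(dht.get("user:42"))  # Doug
dht.delete("user:42")
print(dht.get("user:42"))  # None
```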

Amazon AWS
• S3
  - Online storage web service
  - Designed for larger amounts of data
  - Cost: $0.15/GB per month
• SimpleDB
  - Designed for smaller amounts of data
  - Provides indexing and richer query capability
  - Cost: $0.27/GB per month + machine utilization fee
• RDS
  - Managed MySQL instances
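The per-GB prices above translate into monthly storage cost by simple multiplication. The sketch uses the slide's (historical) prices and a hypothetical 500 GB data set; SimpleDB's machine utilization fee is not modeled because the slide gives no figure for it.

```python
# Storage prices per GB per month, as quoted on the slide (historical).
S3_PER_GB_MONTH = 0.15
SIMPLEDB_PER_GB_MONTH = 0.27

def monthly_storage_cost(gb, price_per_gb):
    """Storage-only cost; SimpleDB's machine utilization fee is NOT included."""
    return gb * price_per_gb

gb = 500  # hypothetical data set size
print(f"S3:       ${monthly_storage_cost(gb, S3_PER_GB_MONTH):.2f}/month")
print(f"SimpleDB: ${monthly_storage_cost(gb, SIMPLEDB_PER_GB_MONTH):.2f}/month + utilization fee")
```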

Scalable Database Architectures

Auto-Sharding
• Splits table data into horizontal “shards”
• Shards managed by traditional RDBMS
  (e.g. MySQL, Postgres)
• Automated “glue” code to handle sharding
  and request routing
• Examples:
  MongoDB, AsterData, Greenplum
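The “glue” code's request-routing job can be sketched as a lookup from key to shard. Range-based sharding on sorted split points is assumed here (the slide does not say which scheme each product uses), `ShardRouter` is hypothetical, and each shard is a dict standing in for a separate MySQL or Postgres instance.

```python
import bisect

class ShardRouter:
    """Hypothetical glue layer that routes requests to range-based shards."""

    def __init__(self, split_points):
        # Shard 0 holds keys < split_points[0]; shard i holds keys in
        # [split_points[i-1], split_points[i]); the last shard holds the rest.
        self.split_points = sorted(split_points)
        self.shards = [{} for _ in range(len(self.split_points) + 1)]

    def shard_index(self, key):
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key, row):
        self.shards[self.shard_index(key)][key] = row

    def lookup(self, key):
        return self.shards[self.shard_index(key)].get(key)

router = ShardRouter(split_points=["g", "p"])  # 3 shards
router.insert("alice", {"plan": "free"})
router.insert("mallory", {"plan": "pro"})
router.insert("zoe", {"plan": "free"})
print([len(s) for s in router.shards])  # [1, 1, 1]
```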

MongoDB

Dynamo
• Developed by Amazon.com for their
  shopping cart
• Designed for high write availability
• Eventually consistent DHT
• Implementations:
  - Cassandra
  - Project Voldemort
  - Riak
  - Dynomite

Eventual Consistency
• Database update semantics in a
  distributed system with data replication
• Strong Consistency - after an update
  completes, all processes see the updated
  value
• Eventual Consistency - eventually all
  processes will see the updated value
• The best-known eventually consistent
  system is DNS
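A minimal sketch of the difference: a write lands on one replica, a read from another replica briefly returns a stale value, and after a propagation step every replica agrees. The class names, the global version counter, and last-writer-wins conflict resolution are simplifying assumptions, not how Dynamo itself reconciles versions.

```python
class Replica:
    def __init__(self):
        self.value, self.version = None, 0

class EventuallyConsistentRegister:
    """Hypothetical replicated register: writes land on one replica and
    reach the others later, during an explicit anti-entropy step."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.clock = 0  # global counter; a simplification for the sketch

    def write(self, replica_id, value):
        self.clock += 1
        r = self.replicas[replica_id]
        r.value, r.version = value, self.clock

    def read(self, replica_id):
        return self.replicas[replica_id].value

    def anti_entropy(self):
        # Every replica adopts the newest version (last-writer-wins).
        newest = max(self.replicas, key=lambda r: r.version)
        for r in self.replicas:
            r.value, r.version = newest.value, newest.version

reg = EventuallyConsistentRegister()
reg.write(0, "v1")
print(reg.read(2))  # None: stale, the update has not propagated yet
reg.anti_entropy()
print(reg.read(2))  # v1: eventually every replica sees the update
```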

Eventual Consistency

Consistent Hashing
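Only this slide's title survived extraction. As an illustration of the general technique (the partitioning scheme Dynamo uses), here is a minimal sketch: nodes and keys are hashed onto the same ring, and each key belongs to the first node found clockwise from its position. The MD5-based positions and 100 virtual nodes per physical node are assumptions. Adding a node moves only the keys that now fall to the new node.

```python
import bisect
import hashlib

def ring_position(name):
    """Map a string to a point on the hash ring [0, 2**32)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node, for balance
        self.ring = []        # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_position(f"{node}#{i}"), node))

    def node_for(self, key):
        # First virtual node at or after the key's position, wrapping around.
        pos = ring_position(key)
        idx = bisect.bisect_left(self.ring, (pos, ""))
        return self.ring[idx % len(self.ring)][1]

keys = [f"key{i}" for i in range(1000)]
ring = ConsistentHashRing(["A", "B", "C"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("D")
after = {k: ring.node_for(k) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(moved)  # expected to be around len(keys) / 4, all moved to "D"
```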

Order Preserving Partitioner (Cassandra)
  www.recipezaar.com        1091721999…629750272
+ www.ribbonprinters.com    1091721999…965293103
/ 2 =
  www.rgb????i?pQdp?.???    1091721999…297521687
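The slide's arithmetic treats each key as a large integer, averages two keys, and turns the result back into a key. A minimal sketch, assuming big-endian byte order and zero-padding of the shorter key to equal length (the exact trailing bytes depend on that padding choice): the midpoint sorts between the two inputs but is not a meaningful hostname, since it begins with `www.rgb` followed by non-ASCII bytes.

```python
def key_to_int(key: bytes, width: int) -> int:
    # Right-pad with zero bytes so keys of different lengths compare consistently.
    return int.from_bytes(key.ljust(width, b"\x00"), "big")

def int_to_key(value: int, width: int) -> bytes:
    return value.to_bytes(width, "big")

def midpoint_key(a: bytes, b: bytes) -> bytes:
    """Average two keys numerically, as a range-splitting step might."""
    width = max(len(a), len(b))
    mid = (key_to_int(a, width) + key_to_int(b, width)) // 2
    return int_to_key(mid, width)

lo = b"www.recipezaar.com"
hi = b"www.ribbonprinters.com"
mid = midpoint_key(lo, hi)
print(mid)
```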

Order Preserving Partitioner
Balance Problem

Bigtable: the infrastructure that Google is built on
• Bigtable underpins 100+ Google
  services, including:
  YouTube, Blogger, Google Earth, Google
  Maps, Orkut, Gmail, Google Analytics,
  Google Book Search, Google Code,
  Crawl Database…
• Implementations
  - Hypertable
  - HBase

Google Stack
• GFS - Replicates data across machines
• MapReduce - Efficiently processes data in GFS
• Bigtable - Indexed table structure
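The MapReduce programming model can be illustrated with the classic word-count example, run here in a single process. In the real system the input splits would be GFS chunks and map and reduce tasks would run on many machines; the function names below are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)

splits = ["the quick brown fox", "the lazy dog", "The end"]
intermediate = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["the"])  # 3
```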

Google File System

Google File System

Table: Visual Representation

Table: Actual Representation

Scaling (part I)

Scaling (part II)

Scaling (part III)

Request Routing

Hypertable

Hypertable Overview
• Massively scalable database
• Modeled after Google’s Bigtable
• High-performance implementation (C++)
• Thrift interface for all popular high-level
  languages: Java, Ruby, Python, PHP, etc.
• Open source (GPL license)
• Project started March 2007 @ Zvents

Hypertable In Use Today

Hypertable vs. HBase

Hypertable vs. HBase
Test                              Hypertable advantage relative to HBase (%)
Random Read Zipfian 80 GB         925
Random Read Zipfian 20 GB         777
Random Read Zipfian 2.5 GB        100
Random Write 10KB values          51
Random Write 1KB values           102
Random Write 100 byte values      427
Random Write 10 byte values       931
Sequential Read 10KB values       1060
Sequential Read 1KB values        68
Sequential Read 100 byte values   129
Scan 10KB values                  2
Scan 1KB values                   58
Scan 100 byte values              75
Scan 10 byte values               220

Annual EC2 Cost Savings
• Assuming 200% improvement
• Extra large reserved instances
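The chart behind this slide did not survive, but the arithmetic of such an estimate can be sketched. Assumptions: a P% improvement means (1 + P/100) times the throughput per instance, so the same workload needs proportionally fewer instances (rounded up). The cluster size and per-instance annual cost are placeholders, not actual EC2 prices.

```python
import math

def instances_needed(current_instances, improvement_pct):
    # A P% improvement is read as (1 + P/100)x throughput per instance.
    speedup = 1 + improvement_pct / 100
    return math.ceil(current_instances / speedup)

def annual_savings(current_instances, annual_cost_per_instance, improvement_pct):
    saved = current_instances - instances_needed(current_instances, improvement_pct)
    return saved * annual_cost_per_instance

HYPOTHETICAL_CLUSTER_SIZE = 30
HYPOTHETICAL_ANNUAL_COST = 1000.0  # placeholder, NOT a real EC2 price
print(annual_savings(HYPOTHETICAL_CLUSTER_SIZE, HYPOTHETICAL_ANNUAL_COST, 200))
```

Under these assumptions a 200% improvement cuts the instance count to one third, so two thirds of the instance cost is saved.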

Resources
Project Site: www.hypertable.org
Twitter: hypertable
Commercial Support: www.hypertable.com
Performance Evaluation Write-up: blog.hypertable.com/?p=14

Q&A

System Overview

Data Model
• Sparse, two-dimensional table with cell versions
• Cells are identified by a 4-part key
  - Row (string)
  - Column Family (byte)
  - Column Qualifier (string)
  - Timestamp (long integer)
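A minimal in-memory model of this data model: only cells that exist are stored (sparse), every cell is addressed by the 4-part key, and multiple timestamps give multiple versions of a cell. `SparseTable` is hypothetical and is not Hypertable's API. The column family is kept as an integer that must fit in one byte, and a row scan returns newest versions first, as Bigtable orders versions.

```python
import time

class SparseTable:
    """Hypothetical model of a Bigtable/Hypertable-style table."""

    def __init__(self):
        self.cells = {}  # (row, family_id, qualifier, timestamp) -> value

    def put(self, row, family_id, qualifier, value, timestamp=None):
        if not 0 <= family_id <= 255:
            raise ValueError("column family id must fit in one byte")
        if timestamp is None:
            timestamp = time.time_ns()
        self.cells[(row, family_id, qualifier, timestamp)] = value

    def get_latest(self, row, family_id, qualifier):
        versions = [(ts, v) for (r, f, q, ts), v in self.cells.items()
                    if (r, f, q) == (row, family_id, qualifier)]
        return max(versions, key=lambda tv: tv[0])[1] if versions else None

    def scan_row(self, row):
        # Sorted by family, then qualifier, then timestamp (newest first).
        items = [(k, v) for k, v in self.cells.items() if k[0] == row]
        return sorted(items, key=lambda kv: (kv[0][1], kv[0][2], -kv[0][3]))

t = SparseTable()
t.put("com.example/www", 1, "title", "Old title", timestamp=1)
t.put("com.example/www", 1, "title", "New title", timestamp=2)
t.put("com.example/www", 2, "cnn.com", "Example", timestamp=1)
print(t.get_latest("com.example/www", 1, "title"))  # New title
```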

Editor's Notes

  • #21 Describe the 360 degree panoramic view feature of Google Maps
  • #25 Spend some time.
  • #40 Spend some time