SlideShare a Scribd company logo
1 of 61
Download to read offline
Scaling with HiveDB
The Problem
• Hundreds of
  millions of
  products

• Millions of new
  products every
  week

• Accelerating
  growth
Scaling up is no
longer an option.
Requirements
➡   OLTP optimized
➡   Constant response time is more
    important than low-latency
➡   Expand capacity online
➡   Ability to move records
➡   No re-sharding
2 years ago
   Clustered      Non-relational   Unstructured



   Oracle RAC
 MS SQL Server                      Hadoop
 MySQL Cluster                      MogileFS
Teradata (OLAP)                       S3
Today
   Clustered      Non-relational   Unstructured



   Oracle RAC
 MS SQL Server     Hypertable
                                    Hadoop
 MySQL Cluster       HBase
                                    MogileFS
      DB2           CouchDB
                                      S3
Teradata (OLAP)    SimpleDB
    HiveDB
System      Storage   Interface   Partitioning    Expansion   Node Types     Maturity


                                                        No
 Oracle RAC     Shared      SQL       Transparent                 Identical      7 years
                                                     downtime

                Memory                               Requires
MySQL Cluster               SQL       Transparent                  Mixed         3 years
                / Local                               Restart

                                                                                    ?
                                                        No
 Hypertable      Local     HQL        Transparent                  Mixed        (released
                                                     downtime
                                                                                  2/08)
                                                     Degraded                    3 years
    DB2          Local      SQL       Fixed Hash     Performan    Identical     (25 years
                                                         ce                       total)
                                      Key-based    No                           18 months
   HiveDB        Local      SQL                                    Mixed
                                      Directory downtime                       (+13 years!)
Operating a
distributed system
is hard.
Non-functional
concerns quickly
dominate functional
concerns.
Replication
Fail over
Monitoring
Our approach:
minimize cleverness
Build atop solid,
easy to manage
technologies.
So we can
get back
to selling
t-shirts.
MySQL is fast.
MySQL is well
understood.
MySQL replication
works.
MySQL is free.
The same for JAVA
The simplest thing
➡   HiveDB is a JDBC Gatekeeper for
    applications.
➡   Lightest integration
➡   Smallest possible implementation
The eBay approach
➡   Pro: Internally consistent data
➡   Con: Re-sharding of data required
➡   Con: Re-sharding is a DBA
    operation
Partition by key
A simple bucketed partitioning scheme
expressed in code.
Partition by key
Directory
➡   No broadcasting
➡   No re-partitioning
➡   Easy to relocate records
➡   Easy to add capacity
Disadvantages
➡   Intelligence moved from the
    database to the application
    (Queries have to be planned and
    indexed)
➡   Can’t join across partitions
➡   NO OLAP! (We consider this a
    benefit.)
Hibernate Shards
Partitioned Hibernate from Google
Why did we build
this thing again?
Oh wait, you have
to tell it where
things are.
Benefits of Shards
➡   Unified data access layer
➡   Result set aggregation across
    partitions
➡   Everyone in the JAVA-world knows
    Hibernate.
Working on
simplifying it even
further.
But does it go?
Case Study:
CafePress
➡   Leader in User-generated
    Commerce
➡   More products than eBay
    (>250,000,000)
➡   24/7/365
Case Study:
Performance Requirements
➡   Thousands of queries/sec
➡   10 : 1 read/write ratio
➡   Geographically distributed
Case Study:
Test Environment
➡   Real schema
➡   Production-class hardware
➡   Partial data (~40M records)
CafePress HiveDB 2007
Performance Test Environment


                             JMeter (1 thread)                                       JMeter (no threads)
 command &



                                 client.jar                                                client.jar
   control




                              Measurement                                              Test Controller
                               Workstation                                              Workstation
                                                      100MBit switch
    load generators




                       JMeter (100s of threads)                                     JMeter (100s of threads)
                                 client.jar                                                 client.jar

                                                    48GB backplane non-
                        Dell 2950 / 2x2 Xeon        blocking gigabit switch          Dell 2950 / 2x2 Xeon
                        16GB, 6x72GB 15k                                             16GB, 6x72GB 15k
 web service




                                                         Hardware LB
  (hivedb)




                        Dell 1950 / 2x2 Xeon        Dell 1950 / 2x2 Xeon            Dell 1950 / 2x2 Xeon
                             Tomcat 5.5                  Tomcat 5.5                      Tomcat 5.5



                                   Directory              Partition 0              Partition 1
 databases
  (mysql)




                             Dell 2950 / 2x2 Xeon   Dell 2950 / 2x2 Xeon      Dell 2950 / 2x2 Xeon
                             16GB, 6x72GB 15k       16GB, 6x72GB 15k          16GB, 6x72GB 15k

   jmccarthy@cafepress.com                                                                               Modified on April 09 2007
Case Study:
Performance Goals
➡   Large object reads: 1500/s


➡   Large object writes: 200/s


➡   Response time: 100ms
Case Study:
Performance Results
➡   Large object reads: 1500/s
    Actual result: 2250/s
➡   Large object writes: 200/s
    Actual result: 300/s
➡   Response time: 100ms
    Actual result: 8ms
Case Study:
Performance Results
➡   Max read throughput
    Actual result: 4100/s
    (CPU limited in Java layer;
    MySQL <25% utilized)
Case Study:
Results
➡   Billions of queries served
➡   Highest DB uptime at CafePress
➡   Hundreds of millions of updates
    performed
Show don’t tell!
High Availability
➡   We don’t specify a fail over
    strategy.
➡   We don’t specify a backup or
    replication strategy.
Fail Over Options
➡   Load balancer
➡   MySQL Proxy
➡   Flipper / MMM
➡   HAProxy
Replication
➡   You should use MySQL replication.
➡   It really works.
➡   You can add in LVM snapshots for
    even more disastrous disaster
    recovery.
The Bottleneck
There is one problem with HiveDB. The
directory is a scaling bottleneck.
There’s no reason
it has to be a
database.
MemcacheD
➡   Hive directory implemented atop
    memcached
Roadmap
➡   MemcacheD directory
➡   Simplified Hibernate support
➡   Management web service
➡   Command-line tools
➡   Framework Adapters
Call for developers!
➡   Grails (GORM)
➡   ActiveRecord (JRuby on Rails)
➡   Ambition (JRuby or web service)
➡   iBatis
➡   JPA
➡   ... Postgres?
Contributing
➡   Post to the mailing list
    http://groups.google.com/group/hivedb-dev

➡   Comment on our site
    http://www.hivedb.org

➡   Submit a patch / pull request
    git clone git://github.com/britt/hivedb.git
Photo Credits
➡   http://www.flickr.com/photos/7362313@N07/1240245941/
    sizes/o

➡   http://www.flickr.com/photos/
    99287245@N00/2229322675/sizes/o

➡   http://www.flickr.com/photos/
    51035555243@N01/26006138/

➡   http://www.flickr.com/photos/vgm8383/2176897085/
    sizes/o/

➡   http://t-shirts.cafepress.com/item/money-organic-cotton-
    tee/73439294
Blobject
➡   Gets around the problem of ALTER
    statements. No data set of this size
    can be transformed synchronously.
➡   The hive can contain multiple
    versions of a serialized record.
➡   Compression
Hadoop!
➡   Query and transform blobbed data
    asynchronously.

More Related Content

What's hot

Non-Relational Postgres
Non-Relational PostgresNon-Relational Postgres
Non-Relational PostgresEDB
 
High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016Vlad Mihalcea
 
Parallel Query on Exadata
Parallel Query on ExadataParallel Query on Exadata
Parallel Query on ExadataEnkitec
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
 
Siebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for SiebelSiebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for SiebelJeroen Burgers
 
BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003Sanjeev Kumar
 
J2EE Performance And Scalability Bp
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability BpChris Adkin
 
Scaling Twitter 12758
Scaling Twitter 12758Scaling Twitter 12758
Scaling Twitter 12758davidblum
 
The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012Lucas Jellema
 
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John NaylorPGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John NaylorEqunix Business Solutions
 
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsContinuent
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.EDB
 
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...Continuent
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 
Evolutionary Database Design
Evolutionary Database DesignEvolutionary Database Design
Evolutionary Database DesignAndrei Solntsev
 
Tungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten ClustersTungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten ClustersContinuent
 
Dbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easyDbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easyFranck Pachot
 
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...Stephan H. Wissel
 

What's hot (19)

Non-Relational Postgres
Non-Relational PostgresNon-Relational Postgres
Non-Relational Postgres
 
High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016
 
Parallel Query on Exadata
Parallel Query on ExadataParallel Query on Exadata
Parallel Query on Exadata
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
Siebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for SiebelSiebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for Siebel
 
Get to know PostgreSQL!
Get to know PostgreSQL!Get to know PostgreSQL!
Get to know PostgreSQL!
 
BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003
 
J2EE Performance And Scalability Bp
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability Bp
 
Scaling Twitter 12758
Scaling Twitter 12758Scaling Twitter 12758
Scaling Twitter 12758
 
The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012
 
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John NaylorPGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
 
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.
 
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDB
 
Evolutionary Database Design
Evolutionary Database DesignEvolutionary Database Design
Evolutionary Database Design
 
Tungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten ClustersTungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten Clusters
 
Dbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easyDbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easy
 
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
 

Viewers also liked

Presentation1class Project Scm
Presentation1class Project ScmPresentation1class Project Scm
Presentation1class Project ScmLenny Wijaya
 
Cloudcon East Presentation
Cloudcon East PresentationCloudcon East Presentation
Cloudcon East Presentationbr7tt
 
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...Avishai Ziv
 
Slidesare Class Project Revisi
Slidesare Class Project RevisiSlidesare Class Project Revisi
Slidesare Class Project RevisiLenny Wijaya
 
Jawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 DmaicJawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 DmaicLenny Wijaya
 
Class Project MBA 26 Biodisel
Class Project MBA 26 BiodiselClass Project MBA 26 Biodisel
Class Project MBA 26 BiodiselLenny Wijaya
 
Class Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil IndustryClass Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil IndustryLenny Wijaya
 
Class Project 29: Me
Class Project 29: MeClass Project 29: Me
Class Project 29: MeLenny Wijaya
 

Viewers also liked (9)

Presentation1class Project Scm
Presentation1class Project ScmPresentation1class Project Scm
Presentation1class Project Scm
 
Cloudcon East Presentation
Cloudcon East PresentationCloudcon East Presentation
Cloudcon East Presentation
 
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
 
Slidesare Class Project Revisi
Slidesare Class Project RevisiSlidesare Class Project Revisi
Slidesare Class Project Revisi
 
Tugas Pa 2 B
Tugas Pa 2 BTugas Pa 2 B
Tugas Pa 2 B
 
Jawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 DmaicJawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 Dmaic
 
Class Project MBA 26 Biodisel
Class Project MBA 26 BiodiselClass Project MBA 26 Biodisel
Class Project MBA 26 Biodisel
 
Class Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil IndustryClass Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil Industry
 
Class Project 29: Me
Class Project 29: MeClass Project 29: Me
Class Project 29: Me
 

Similar to Cloudcon East Presentation

[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practice[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practicejavablend
 
Clustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityClustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityConSanFrancisco123
 
Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29liufabin 66688
 
J2EE Batch Processing
J2EE Batch ProcessingJ2EE Batch Processing
J2EE Batch ProcessingChris Adkin
 
Squeak DBX
Squeak DBXSqueak DBX
Squeak DBXESUG
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffyAnuradha
 
Sql 2012 always on
Sql 2012 always onSql 2012 always on
Sql 2012 always ondilip nayak
 
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod ColledgeDb As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledgesqlserver.co.il
 
OOW09 Ebs Tuning Final
OOW09 Ebs Tuning FinalOOW09 Ebs Tuning Final
OOW09 Ebs Tuning Finaljucaab
 
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Miguel Araújo
 
2012 replication
2012 replication2012 replication
2012 replicationsqlhjalp
 
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)mezis
 
download it from here
download it from heredownload it from here
download it from herewebhostingguy
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...LarryZaman
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 

Similar to Cloudcon East Presentation (20)

[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practice[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practice
 
The RDBMS You Should Be Using
The RDBMS You Should Be UsingThe RDBMS You Should Be Using
The RDBMS You Should Be Using
 
Clustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityClustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And Availability
 
Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29
 
J2EE Batch Processing
J2EE Batch ProcessingJ2EE Batch Processing
J2EE Batch Processing
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
 
Squeak DBX
Squeak DBXSqueak DBX
Squeak DBX
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffy
 
MySQL高可用
MySQL高可用MySQL高可用
MySQL高可用
 
Sql 2012 always on
Sql 2012 always onSql 2012 always on
Sql 2012 always on
 
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod ColledgeDb As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
 
OOW09 Ebs Tuning Final
OOW09 Ebs Tuning FinalOOW09 Ebs Tuning Final
OOW09 Ebs Tuning Final
 
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
 
cosbench-openstack.pdf
cosbench-openstack.pdfcosbench-openstack.pdf
cosbench-openstack.pdf
 
2012 replication
2012 replication2012 replication
2012 replication
 
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
 
11g R2
11g R211g R2
11g R2
 
download it from here
download it from heredownload it from here
download it from here
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 

Recently uploaded

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Cloudcon East Presentation

  • 2. The Problem • Hundreds of millions of products • Millions of new products every week • Accelerating growth
  • 3. Scaling up is no longer an option.
  • 4. Requirements ➡ OLTP optimized ➡ Constant response time is more important than low-latency ➡ Expand capacity online ➡ Ability to move records ➡ No re-sharding
  • 5. 2 years ago Clustered Non-relational Unstructured Oracle RAC MS SQL Server Hadoop MySQL Cluster MogileFS Teradata (OLAP) S3
  • 6. Today Clustered Non-relational Unstructured Oracle RAC MS SQL Server Hypertable Hadoop MySQL Cluster HBase MogileFS DB2 CouchDB S3 Teradata (OLAP) SimpleDB HiveDB
  • 7. System Storage Interface Partitioning Expansion Node Types Maturity No Oracle RAC Shared SQL Transparent Identical 7 years downtime Memory Requires MySQL Cluster SQL Transparent Mixed 3 years / Local Restart ? No Hypertable Local HQL Transparent Mixed (released downtime 2/08) Degraded 3 years DB2 Local SQL Fixed Hash Performan Identical (25 years ce total) Key-based No 18 months HiveDB Local SQL Mixed Directory downtime (+13 years!)
  • 12. Build atop solid, easy to manage technologies.
  • 13. So we can get back to selling t-shirts.
  • 18. The same for JAVA
  • 19.
  • 20. The simplest thing ➡ HiveDB is a JDBC Gatekeeper for applications. ➡ Lightest integration ➡ Smallest possible implementation
  • 21. The eBay approach ➡ Pro: Internally consistent data ➡ Con: Re-sharding of data required ➡ Con: Re-sharding is a DBA operation
  • 22. Partition by key A simple bucketed partitioning scheme expressed in code.
  • 24. Directory ➡ No broadcasting ➡ No re-partitioning ➡ Easy to relocate records ➡ Easy to add capacity
  • 25.
  • 26. Disadvantages ➡ Intelligence moved from the database to the application (Queries have to be planned and indexed) ➡ Can’t join across partitions ➡ NO OLAP! (We consider this a benefit.)
  • 28. Why did we build this thing again?
  • 29. Oh wait, you have to tell it where things are.
  • 30.
  • 31. Benefits of Shards ➡ Unified data access layer ➡ Result set aggregation across partitions ➡ Everyone in the JAVA-world knows Hibernate.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38. Working on simplifying it even further.
  • 39.
  • 40. But does it go?
  • 41. Case Study: CafePress ➡ Leader in User-generated Commerce ➡ More products than eBay (>250,000,000) ➡ 24/7/365
  • 42. Case Study: Performance Requirements ➡ Thousands of queries/sec ➡ 10 : 1 read/write ratio ➡ Geographically distributed
  • 43. Case Study: Test Environment ➡ Real schema ➡ Production-class hardware ➡ Partial data (~40M records)
  • 44. CafePress HiveDB 2007 Performance Test Environment JMeter (1 thread) JMeter (no threads) command & client.jar client.jar control Measurement Test Controller Workstation Workstation 100MBit switch load generators JMeter (100s of threads) JMeter (100s of threads) client.jar client.jar 48GB backplane non- Dell 2950 / 2x2 Xeon blocking gigabit switch Dell 2950 / 2x2 Xeon 16GB, 6x72GB 15k 16GB, 6x72GB 15k web service Hardware LB (hivedb) Dell 1950 / 2x2 Xeon Dell 1950 / 2x2 Xeon Dell 1950 / 2x2 Xeon Tomcat 5.5 Tomcat 5.5 Tomcat 5.5 Directory Partition 0 Partition 1 databases (mysql) Dell 2950 / 2x2 Xeon Dell 2950 / 2x2 Xeon Dell 2950 / 2x2 Xeon 16GB, 6x72GB 15k 16GB, 6x72GB 15k 16GB, 6x72GB 15k jmccarthy@cafepress.com Modified on April 09 2007
  • 45. Case Study: Performance Goals ➡ Large object reads: 1500/s ➡ Large object writes: 200/s ➡ Response time: 100ms
  • 46. Case Study: Performance Results ➡ Large object reads: 1500/s Actual result: 2250/s ➡ Large object writes: 200/s Actual result: 300/s ➡ Response time: 100ms Actual result: 8ms
  • 47. Case Study: Performance Results ➡ Max read throughput Actual result: 4100/s (CPU limited in Java layer; MySQL <25% utilized)
  • 48. Case Study: Results ➡ Billions of queries served ➡ Highest DB uptime at CafePress ➡ Hundreds of millions of updates performed
  • 50. High Availability ➡ We don’t specify a fail over strategy. ➡ We don’t specify a backup or replication strategy.
  • 51. Fail Over Options ➡ Load balancer ➡ MySQL Proxy ➡ Flipper / MMM ➡ HAProxy
  • 52. Replication ➡ You should use MySQL replication. ➡ It really works. ➡ You can add in LVM snapshots for even more disastrous disaster recovery.
  • 53. The Bottleneck There is one problem with HiveDB. The directory is a scaling bottleneck.
  • 54. There’s no reason it has to be a database.
  • 55. MemcacheD ➡ Hive directory implemented atop memcached
  • 56. Roadmap ➡ MemcacheD directory ➡ Simplified Hibernate support ➡ Management web service ➡ Command-line tools ➡ Framework Adapters
  • 57. Call for developers! ➡ Grails (GORM) ➡ ActiveRecord (JRuby on Rails) ➡ Ambition (JRuby or web service) ➡ iBatis ➡ JPA ➡ ... Postgres?
  • 58. Contributing ➡ Post to the mailing list http://groups.google.com/group/hivedb-dev ➡ Comment on our site http://www.hivedb.org ➡ Submit a patch / pull request git clone git://github.com/britt/hivedb.git
  • 59. Photo Credits ➡ http://www.flickr.com/photos/7362313@N07/1240245941/ sizes/o ➡ http://www.flickr.com/photos/ 99287245@N00/2229322675/sizes/o ➡ http://www.flickr.com/photos/ 51035555243@N01/26006138/ ➡ http://www.flickr.com/photos/vgm8383/2176897085/ sizes/o/ ➡ http://t-shirts.cafepress.com/item/money-organic-cotton- tee/73439294
  • 60. Blobject ➡ Gets around the problem of ALTER statements. No data set of this size can be transformed synchronously. ➡ The hive can contain multiple versions of a serialized record. ➡ Compression
  • 61. Hadoop! ➡ Query and transform blobbed data asynchronously.