August 8, 2012




Cassandra at eBay - Cassandra Summit 2012
    Time left: 29m 59s




                     Jay Patel
                     Architect, Platform Systems
                     @pateljay3001
eBay Marketplaces
 97 million active buyers and sellers
 200+ million items
 2 billion page views each day
 80 billion database calls each day
 5+ petabytes of site storage capacity
 80+ petabytes of analytics storage capacity

                                                2
How do we scale databases?
 Shard
   – Patterns: Modulus, lookup-based, range, etc.
   – Application sees only logical shard/database
 Replicate
   – Disaster recovery, read availability/scalability
 Big NOs
   – No transactions
   – No joins
   – No referential integrity constraints
                                                        3
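
The three shard-selection patterns named above can be sketched roughly as follows. This is a minimal illustration, not eBay's routing code: the shard count, the lookup table contents, and the range boundaries are all hypothetical values chosen for the example.

```python
from bisect import bisect_right

# Hypothetical values for illustration only; none come from the talk.
NUM_SHARDS = 4
LOOKUP_TABLE = {"seller_a": 0, "seller_b": 2}   # explicit key -> shard directory
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]         # exclusive upper bounds, sorted


def modulus_shard(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Modulus pattern: the shard index is the key modulo the shard count."""
    return key % num_shards


def lookup_shard(key: str, table: dict = LOOKUP_TABLE) -> int:
    """Lookup-based pattern: a directory maps each key to its shard."""
    return table[key]


def range_shard(key: int, bounds: list = RANGE_UPPER_BOUNDS) -> int:
    """Range pattern: keys below bounds[0] go to shard 0, and so on.

    Keys at or above the last bound fall into one extra overflow shard.
    """
    return bisect_right(bounds, key)
```

In each case the application would address only the logical shard index returned here; mapping that index to a physical database is left to a separate layer.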
We like Cassandra
 Multi-datacenter (active-active)
 Availability - No SPOF
 Scalability
 Write performance
 Distributed counters
 Hadoop support



We also utilize MongoDB & HBase




                                                              4
Are we replacing RDBMS with NoSQL?

          Not at all! But, complementing.
 Some use cases don’t fit well - sparse data, big data, schema
  optional, real-time analytics, …
 Many use cases don’t need top-tier set-ups - logging, tracking, …




                                                                  5
A glimpse of our Cassandra deployment
 Dozens of nodes across multiple clusters
 200 TB+ storage provisioned
 400M+ writes & 100M+ reads per day, and growing
 QA, LnP, and multiple Production clusters




                                                    6
Use Cases on Cassandra
      Social Signals on eBay product & item pages
      Hunch taste graph for eBay users & items
      Time series use cases (many):
     Mobile notification logging and tracking
     Tracking for fraud detection
     SOA request/response payload logging
     RedLaser server logs and analytics

                                                    7
Served by
Cassandra




            8
Manage signals via “Your Favorites”




                                      Whole page is
                                      served by
                                      Cassandra




                                                9
Why Cassandra for Social Signals?
 Need scalable counters
 Need real (or near) time analytics on collected social data
 Need good write performance
 Reads are not latency sensitive




                                                                10
Deployment
                 User request has no datacenter affinity


                           Non-sticky load balancing




Topology - NTS (NetworkTopologyStrategy)
RF - 2:2
Read CL - ONE
Write CL - ONE

Data is backed up periodically to protect against human or software error

                                                       11
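
To see what these settings imply, a common rule of thumb is that a read is guaranteed to see the latest write only when the number of replicas read (R) plus the number written (W) exceeds the total replica count (N). The sketch below applies that rule to RF 2:2 with reads and writes at ONE. The datacenter names are placeholders, and the function is an illustration rather than anything from the Cassandra API.

```python
# RF 2:2 is from the slide; "dc1" and "dc2" are hypothetical datacenter names.
RF_PER_DC = {"dc1": 2, "dc2": 2}
TOTAL_REPLICAS = sum(RF_PER_DC.values())  # N = 4

CL_ONE = 1  # consistency level ONE waits for a single replica


def reads_overlap_writes(read_cl: int, write_cl: int, total_replicas: int) -> bool:
    """Return True when every read set must intersect every write set (R + W > N)."""
    return read_cl + write_cl > total_replicas


# With R = 1 and W = 1 against N = 4, a read may miss the latest write,
# so this deployment is eventually consistent.
EVENTUALLY_CONSISTENT = not reads_overlap_writes(CL_ONE, CL_ONE, TOTAL_REPLICAS)
```

This trades consistency for low write latency and for the ability to serve requests from either datacenter.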
Data Model
             depends on query patterns




                                         12
Data Model (simplified)




                          13
Wait…



                    Duplicates!




        Oh, toggle button!
        Signal --> De-signal --> Signal…
                                       14
Yes, eventual consistency!
One scenario that produces duplicate signals in UserLike CF:
   1. Signal
   2. De-signal (the 1st operation has not propagated to all replicas)
   3. Signal, again (the 1st operation has still not propagated!)



 So, what’s the solution? Later…

                                                                   15
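
The duplicate scenario can be reproduced with a toy model. The sketch below assumes two replicas modeled as Python sets, column names of the form (timeuuid, item), and reads and writes that each touch a single replica (as with CL ONE). The function names are invented for the illustration and the model ignores tombstones and timestamps.

```python
import uuid


def signal(replica: set, item: str) -> tuple:
    """Write a new (timeuuid, item) column to the one replica that takes the write."""
    column = (uuid.uuid1(), item)
    replica.add(column)
    return column


def de_signal(replica: set, item: str) -> int:
    """Read the item's columns from one replica and delete what was found."""
    found = {col for col in replica if col[1] == item}
    replica.difference_update(found)
    return len(found)


def run_scenario(item: str = "item1"):
    replica_a, replica_b = set(), set()
    signal(replica_a, item)                # 1. Signal lands on replica A only
    deleted = de_signal(replica_b, item)   # 2. De-signal reads B, finds nothing
    signal(replica_b, item)                # 3. Signal again, with a new timeuuid
    converged = replica_a | replica_b      # replicas eventually exchange columns
    return deleted, [col for col in converged if col[1] == item]
```

Because the de-signal found nothing to delete, both signal columns survive once the replicas converge, and the user appears to have signaled the same item twice.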
Social Signals, next phase: Real-time Analytics
 Most signaled or popular items per affinity group (category, etc.)
 Aggregated item count per affinity group



                                                     Example affinity group




                                                                              16
Initial Data Model for real-time analytics

                                               Items in an affinity group
                                               are physically stored
                                               sorted by their signal
                                               count




                           Update counters for both individual item
                           and all the affinity groups that item
                           belongs to
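
The counter update described above can be sketched in memory as follows. This is an assumption-laden illustration: the class and attribute names are invented, Python counters stand in for Cassandra distributed counters, and the ranking is computed at read time, whereas the slide describes items stored physically sorted by signal count.

```python
from collections import Counter, defaultdict


class SignalCounters:
    """Toy stand-in for the counter column families (names are hypothetical)."""

    def __init__(self):
        self.item_counts = Counter()              # item -> signal count
        self.group_signal_totals = Counter()      # affinity group -> total signals
        self.group_items = defaultdict(Counter)   # affinity group -> item -> count

    def record_signal(self, item: str, groups: list) -> None:
        """Update the item's counter and every affinity group it belongs to."""
        self.item_counts[item] += 1
        for group in groups:
            self.group_signal_totals[group] += 1
            self.group_items[group][item] += 1

    def most_signaled(self, group: str, n: int = 10) -> list:
        """Return the n most signaled items in a group, highest count first."""
        return self.group_items[group].most_common(n)
```

One signal therefore fans out into one write per affinity group plus one for the item, which is part of why write performance matters for this use case.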
Deployment, next phase




Topology - NTS
RF - 2:2:2
[Figure: example graph with user nodes (user1, user2) and item nodes (item1, item2) connected by bid, buy, watch, and sell edges]




                                          19
Graph in Cassandra
Event consumers listen for site events (sell/bid/buy/watch) & populate graph in Cassandra




   30 million+ writes daily
   14 billion+ edges already
   Batch-oriented reads (for taste vector updates)
                                                                                    20
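
As a rough picture of how event consumers might populate such a graph, the sketch below turns site events into edges. The talk does not describe the actual row layout, so the event fields, the class name, and the choice to store each edge under both endpoints are assumptions made for illustration.

```python
from collections import defaultdict

EDGE_TYPES = {"sell", "bid", "buy", "watch"}  # event types named on the slide


class TasteGraph:
    """Toy adjacency store: one row per node, one column per edge (hypothetical)."""

    def __init__(self):
        self.rows = defaultdict(set)  # node -> {(edge_type, other_node), ...}

    def consume(self, event: dict) -> None:
        """Handle one site event of the assumed form {"type", "user", "item"}."""
        edge_type = event["type"]
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unsupported event type: {edge_type}")
        user, item = event["user"], event["item"]
        # Store the edge under both endpoints so either side can be scanned.
        self.rows[user].add((edge_type, item))
        self.rows[item].add((edge_type, user))

    def edges(self, node: str) -> set:
        """Return all edges of a node, as a batch-style read would."""
        return set(self.rows[node])
```

Writes are small and frequent while reads scan whole rows, which matches the write-heavy, batch-read profile quoted above.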
 Mobile notification logging and tracking
 Tracking for fraud detection
 SOA request/response payload logging
 RedLaser server logs and analytics




                                             21
A glimpse of the data model
RedLaser tracking & monitoring console




                                         23
That’s all about the use cases.
Remember the duplicate problem in Use Case #1?




  Let’s see some options we considered to solve this…
                                                    24
Option 1 – Make ‘Like’ idempotent for UserLike
 Remove time (timeuuid) from the composite column name:
    Multiple signal operations are now Idempotent
    No need to read before de-signaling (deleting)




    X            Need timeuuid for ordering!
                 Already have a user with more than 1300 signals
                                                                   25
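
The trade-off in Option 1 can be shown with a row modeled as a Python dict keyed by column name. The helper names are invented for this sketch, and the dict only approximates a Cassandra row.

```python
import uuid


def like_with_time(row: dict, item: str) -> None:
    """Column name (timeuuid, item): every call creates a distinct column."""
    row[(uuid.uuid1(), item)] = ""


def like_without_time(row: dict, item: str) -> None:
    """Column name is the item alone: repeated calls overwrite the same column."""
    row[item] = ""


def unlike_without_time(row: dict, item: str) -> None:
    """Deleting by item needs no prior read because the column name is known."""
    row.pop(item, None)
```

Without the timeuuid the row can no longer be sliced in time order, so listing a user's most recent signals would require reading and sorting the whole row, which is costly for a user with more than 1300 signals.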
Option 2 – Use strong consistency

 Local Quorum
  – Won’t help us. User requests are not geo-load balanced
    (no DC affinity).
 Quorum
  – Won’t survive a partition between DCs (or one of the DCs being
    down). Also adds latency.

              X      Need to survive!
                                                             26
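
The arithmetic behind the QUORUM objection is short. A quorum is a majority of all replicas, so with RF 2:2 it needs 3 of 4 replicas and cannot be met from one datacenter alone. The sketch below assumes the RF 2:2 deployment from earlier in the talk, uses placeholder datacenter names, and treats a datacenter outage as the loss of all its replicas.

```python
def quorum(replicas: int) -> int:
    """Majority of the given replica count."""
    return replicas // 2 + 1


def quorum_available(rf_per_dc: dict, down_dcs: set) -> bool:
    """Can a cluster-wide QUORUM be met if every replica in down_dcs is unreachable?"""
    total = sum(rf_per_dc.values())
    alive = sum(rf for dc, rf in rf_per_dc.items() if dc not in down_dcs)
    return alive >= quorum(total)


RF_PER_DC = {"dc1": 2, "dc2": 2}  # hypothetical names; RF 2:2 is from the talk
```

LOCAL_QUORUM, which needs only a majority within one datacenter (2 of 2 here), would survive the outage, but it gives no cross-datacenter guarantee when consecutive requests from the same user can land in different datacenters.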
Option 3 – Adapt to eventual consistency
If you desire survival!




                                                                              27
                      http://www.strangecosmos.com/content/item/101254.html
Adjustments to eventual consistency
 De-signal steps:
      – Don’t check whether the item is already signaled by the user
      – Read all (duplicate) signals from UserLike_unordered (new CF to avoid reading
        whole row from UserLike)
      – Delete those signals from UserLike_unordered and UserLike




Still, can get duplicate signals or false positives as there is a ‘read before delete’.
To shield further, do ‘repair on read’.

Not a full story!
                                                                                     28
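
The de-signal steps can be sketched with two dicts standing in for the two column families. The talk names UserLike_unordered but does not show its layout, so the (item, timeuuid) column order used here is an assumption, as are the function names. In Cassandra, step 2 would be a column slice on the item prefix rather than the linear scan shown.

```python
import uuid


def signal(user_like: dict, user_like_unordered: dict, item: str) -> None:
    """Write the signal to both rows; always adds a new column, never reads first."""
    t = uuid.uuid1()
    user_like[(t, item)] = ""             # ordered by time, used for display
    user_like_unordered[(item, t)] = ""   # keyed by item first, used for de-signal


def de_signal(user_like: dict, user_like_unordered: dict, item: str) -> int:
    """Delete every signal for the item, duplicates included."""
    # Read all (possibly duplicate) signals for the item from the unordered row.
    duplicates = [col for col in user_like_unordered if col[0] == item]
    # Delete each one from both rows.
    for it, t in duplicates:
        del user_like_unordered[(it, t)]
        user_like.pop((t, it), None)
    return len(duplicates)
```

A de-signal therefore cleans up whatever duplicates are visible to the replica it reads, while duplicates that have not yet reached that replica remain until a later pass.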
Lessons & Best Practices
• Choose proper Replication Factor and Consistency Level.
    – They alter latency, availability, durability, consistency and cost.
    – Cassandra supports tunable consistency, but remember strong consistency is not free.
• Consider all overheads in capacity planning.
    – Replicas, compaction, secondary indexes, etc.
• De-normalize and duplicate for read performance.
    – But don’t de-normalize if you don’t need to.
• Many ways to model data in Cassandra.
    – The best way depends on your use case and query patterns.
                More on http://ebaytechblog.com?p=1308
Thank You
  @pateljay3001
  #cassandra12
                  30

 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Cassandra at eBay - Cassandra Summit 2012

  • 1. Cassandra at eBay. August 8, 2012. Time left: 29m 59s. Jay Patel, Architect, Platform Systems, @pateljay3001
  • 2. eBay Marketplaces
      - 97 million active buyers and sellers
      - 200+ million items
      - 2 billion page views each day
      - 80 billion database calls each day
      - 5+ petabytes of site storage capacity
      - 80+ petabytes of analytics storage capacity
  • 3. How do we scale databases?
      - Shard
          – Patterns: Modulus, lookup-based, range, etc.
          – Application sees only logical shard/database
      - Replicate
          – Disaster recovery, read availability/scalability
      - Big NOs
          – No transactions
          – No joins
          – No referential integrity constraints
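The three sharding patterns named on slide 3 can be sketched as small routing functions. This is only an illustration of the general techniques, not eBay's implementation; the shard count, the lookup directory, and the range boundaries are all hypothetical values.

```python
import bisect

NUM_SHARDS = 4  # hypothetical number of logical shards


def modulus_shard(key: int) -> int:
    """Modulus pattern: the shard is the key modulo the shard count."""
    return key % NUM_SHARDS


# Hypothetical directory mapping a key to its shard (lookup-based pattern).
LOOKUP = {"alice": 2, "bob": 0}


def lookup_shard(key: str) -> int:
    """Lookup-based pattern: a directory decides where each key lives."""
    return LOOKUP[key]


# Hypothetical boundaries: shard 0 holds keys < 1000, shard 1 holds
# 1000..1999, shard 2 holds 2000..2999, shard 3 holds everything else.
RANGE_BOUNDS = [1000, 2000, 3000]


def range_shard(key: int) -> int:
    """Range pattern: contiguous key ranges map to shards."""
    return bisect.bisect_right(RANGE_BOUNDS, key)
```

In each case the application only deals with the logical shard number returned by the routing function; where that shard physically lives is resolved elsewhere.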
  • 4. We like Cassandra
      - Multi-datacenter (active-active)
      - Availability - No SPOF
      - Scalability
      - Write performance
      - Distributed counters
      - Hadoop support
    We also utilize MongoDB & HBase
  • 5. Are we replacing RDBMS with NoSQL? Not at all! But, complementing.
      - Some use cases don’t fit well: sparse data, big data, schema optional, real-time analytics, …
      - Many use cases don’t need top-tier set-ups: logging, tracking, …
  • 6. A glimpse of our Cassandra deployment
      - Dozens of nodes across multiple clusters
      - 200 TB+ storage provisioned
      - 400M+ writes & 100M+ reads per day, and growing
      - QA, LnP, and multiple Production clusters
  • 7. Use Cases on Cassandra
      - Social Signals on eBay product & item pages
      - Hunch taste graph for eBay users & items
      - Time series use cases (many):
          – Mobile notification logging and tracking
          – Tracking for fraud detection
          – SOA request/response payload logging
          – RedLaser server logs and analytics
  • 9. Manage signals via “Your Favorites”. Whole page is served by Cassandra.
  • 10. Why Cassandra for Social Signals?
      - Need scalable counters
      - Need real (or near) time analytics on collected social data
      - Need good write performance
      - Reads are not latency sensitive
  • 11. Deployment
      - User request has no datacenter affinity (non-sticky load balancing)
      - Topology - NTS
      - RF - 2:2
      - Read CL - ONE
      - Write CL - ONE
      - Data is backed up periodically to protect against human or software error
  • 12. Data Model depends on query patterns
  • 14. Wait… Duplicates! Oh, toggle button! Signal --> De-signal --> Signal…
  • 15. Yes, eventual consistency! One scenario that produces duplicate signals in UserLike CF:
      1. Signal
      2. De-signal (1st operation is not propagated to all replicas)
      3. Signal, again (1st operation is not propagated yet!)
    So, what’s the solution? Later…
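The scenario on slide 15 can be replayed with a toy model. The sketch below assumes two replicas modeled as plain dictionaries, each request served by a single replica (as with CL ONE), and a column name made of a timeuuid plus the item id. The function names and the `propagate` step are hypothetical stand-ins for Cassandra's real replication, and tombstones are left out because no delete is ever issued in this scenario.

```python
import uuid


def signal(replica: dict, item_id: str) -> None:
    """Write a new (timeuuid, item_id) column to one replica."""
    replica[(uuid.uuid1(), item_id)] = ""


def de_signal(replica: dict, item_id: str) -> int:
    """Read before delete: remove only the columns this replica can see."""
    visible = [col for col in replica if col[1] == item_id]
    for col in visible:
        del replica[col]
    return len(visible)


def propagate(a: dict, b: dict) -> None:
    """Stand-in for replication eventually catching up in both directions."""
    a.update(b)
    b.update(a)


def run_scenario() -> int:
    replica_a, replica_b = {}, {}
    signal(replica_a, "item1")                # 1. Signal lands on replica A
    deleted = de_signal(replica_b, "item1")   # 2. De-signal reads replica B, finds nothing
    assert deleted == 0
    signal(replica_b, "item1")                # 3. Signal again lands on replica B
    propagate(replica_a, replica_b)           # replication catches up
    # Both replicas now hold two columns for the same item: a duplicate signal.
    return sum(1 for col in replica_a if col[1] == "item1")
```

Calling `run_scenario()` returns 2: the de-signal could not see the first column, so nothing was deleted, and the second signal created a second column with a different timeuuid.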
  • 16. Social Signals, next phase: Real-time Analytics
      - Most signaled or popular items per affinity group (category, etc.)
      - Aggregated item count per affinity group
    [Figure: example affinity group]
  • 17. Initial Data Model for real-time analytics
      - Items in an affinity group are physically stored sorted by their signal count
      - Update counters for both the individual item and all the affinity groups that the item belongs to
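The counter updates described on slide 17 can be sketched in memory as follows. This is an illustration under stated assumptions, not the actual column family layout: `item_count`, `group_count`, `group_items`, `record_signal`, and `top_items` are hypothetical names, and the sketch sorts at read time, whereas the slide describes items being stored physically sorted by count.

```python
from collections import Counter, defaultdict

item_count = Counter()           # signals per item
group_count = Counter()          # aggregated signals per affinity group
group_items = defaultdict(dict)  # group -> {item_id: signal count}


def record_signal(item_id: str, groups: list) -> None:
    """Bump the item's counter and the counter of every group it belongs to."""
    item_count[item_id] += 1
    for group in groups:
        group_count[group] += 1
        group_items[group][item_id] = item_count[item_id]


def top_items(group: str, n: int) -> list:
    """Most signaled items in a group, highest count first (ties by item id)."""
    ranked = sorted(group_items[group].items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]
```

One signal therefore fans out into one write for the item plus one write per affinity group, which is why good counter and write performance matter for this use case.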
  • 19. [Diagram: graph of user nodes (user1, user2) and item nodes (item1, item2) connected by bid, buy, watch, and sell edges]
  • 20. Graph in Cassandra. Event consumers listen for site events (sell/bid/buy/watch) & populate graph in Cassandra
      - 30 million+ writes daily
      - 14 billion+ edges already
      - Batch-oriented reads (for taste vector updates)
  • 21. Time series use cases:
      - Mobile notification logging and tracking
      - Tracking for fraud detection
      - SOA request/response payload logging
      - RedLaser server logs and analytics
  • 22. A glimpse of the data model
  • 23. RedLaser tracking & monitoring console
  • 24. That’s all about the use cases. Remember the duplicate problem in Use Case #1? Let’s see some options we considered to solve this…
  • 25. Option 1 – Make ‘Like’ idempotent for UserLike
      - Remove time (timeuuid) from the composite column name:
          – Multiple signal operations are now idempotent
          – No need to read before de-signaling (deleting)
      - X Need timeuuid for ordering! Already have a user with more than 1300 signals
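A toy sketch of what Option 1 buys and what it costs, assuming a row is modeled as a dictionary keyed by column name; the function names are hypothetical. With the item id alone as the column name, repeating a signal overwrites the same column, and a de-signal can delete by name without reading first.

```python
def signal_idempotent(row: dict, item_id: str) -> None:
    """The column name is just the item id, so a repeated signal overwrites itself."""
    row[item_id] = ""


def de_signal_idempotent(row: dict, item_id: str) -> None:
    """The delete is addressed by item id alone, so no prior read is needed."""
    row.pop(item_id, None)


row = {}
signal_idempotent(row, "item1")
signal_idempotent(row, "item1")  # same column again, not a duplicate
print(len(row))                  # 1
de_signal_idempotent(row, "item1")
print(len(row))                  # 0
```

The trade-off is the one the slide flags: without the timeuuid in the column name, columns sort by item id rather than by the time of the signal, so the row can no longer be read in time order.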
  • 26. Option 2 – Use strong consistency
      - Local Quorum – Won’t help us. User requests are not geo-load balanced (no DC affinity).
      - Quorum – Won’t survive a partition between DCs (or one of the DCs being down). Also adds latency.
      - X Need to survive!
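The arithmetic behind the Quorum objection can be checked in a few lines. A quorum is a majority of replicas, i.e. floor(replicas / 2) + 1; with the RF of 2:2 from the deployment slide, a cluster-wide quorum needs 3 of 4 replicas, which one datacenter alone cannot supply. The datacenter names and the helper `quorum_survives` below are hypothetical.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of the given number of replicas."""
    return replicas // 2 + 1


def quorum_survives(rf_per_dc: dict, down_dcs: set) -> bool:
    """Can a cluster-wide QUORUM still be met when some datacenters are unreachable?"""
    total = sum(rf_per_dc.values())
    alive = sum(rf for dc, rf in rf_per_dc.items() if dc not in down_dcs)
    return alive >= quorum(total)


RF = {"dc1": 2, "dc2": 2}  # RF 2:2; datacenter names are made up

print(quorum(sum(RF.values())))      # 3  (QUORUM needs 3 of 4 replicas)
print(quorum(RF["dc1"]))             # 2  (LOCAL_QUORUM needs both local replicas)
print(quorum_survives(RF, {"dc2"}))  # False
print(quorum_survives(RF, set()))    # True
```

Local Quorum fails for a different reason, which the arithmetic does not capture: because requests have no datacenter affinity, a user's next request may be served by the other datacenter before the earlier write has arrived there.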
  • 27. Option 3 – Adapt to eventual consistency, if you desire survival! (Image: http://www.strangecosmos.com/content/item/101254.html)
  • 28. Adjustments to eventual consistency
      - De-signal steps:
          – Don’t check whether the item is already signaled by the user
          – Read all (duplicate) signals from UserLike_unordered (new CF to avoid reading the whole row from UserLike)
          – Delete those signals from UserLike_unordered and UserLike
      - Still, can get duplicate signals or false positives, as there is a ‘read before delete’.
      - To shield further, do ‘repair on read’.
    Not a full story!
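The de-signal steps above can be sketched against two in-memory stand-ins for the column families. The column layouts are assumptions made for illustration: the slides do not spell out how UserLike_unordered is keyed, so here UserLike uses (timeuuid, item id) column names and UserLike_unordered uses (item id, timeuuid) so that all signals for one item can be found without scanning the time-ordered row.

```python
import uuid

user_like = {}            # user_id -> {(timeuuid, item_id): ""}, time-ordered in Cassandra
user_like_unordered = {}  # user_id -> {(item_id, timeuuid): ""}, assumed layout


def signal(user_id: str, item_id: str) -> None:
    """Write the signal to both column families under the same timeuuid."""
    ts = uuid.uuid1()
    user_like.setdefault(user_id, {})[(ts, item_id)] = ""
    user_like_unordered.setdefault(user_id, {})[(item_id, ts)] = ""


def de_signal(user_id: str, item_id: str) -> int:
    """Delete every signal (duplicates included) the user has on the item."""
    unordered = user_like_unordered.get(user_id, {})
    ordered = user_like.get(user_id, {})
    # Read all signals for this item, without first checking "is it signaled?"
    found = [col for col in unordered if col[0] == item_id]
    for item, ts in found:
        del unordered[(item, ts)]      # delete from UserLike_unordered
        ordered.pop((ts, item), None)  # delete from UserLike
    return len(found)
```

Because the delete is still driven by a read, a signal that has not yet reached the replica being read will survive the de-signal, which is the residual risk the slide points out.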
  • 29. Lessons & Best Practices
      - Choose proper Replication Factor and Consistency Level.
          – They alter latency, availability, durability, consistency and cost.
          – Cassandra supports tunable consistency, but remember strong consistency is not free.
      - Consider all overheads in capacity planning.
          – Replicas, compaction, secondary indexes, etc.
      - De-normalize and duplicate for read performance.
          – But don’t de-normalize if you don’t need to.
      - Many ways to model data in Cassandra.
          – The best way depends on your use case and query patterns.
    More on http://ebaytechblog.com?p=1308
  • 30. Thank You. @pateljay3001 #cassandra12