Cassandra at eBay - Cassandra Summit 2013

"Buy It Now! Cassandra at eBay" talk at Cassandra Summit 2013

This session will cover various use cases for Cassandra at eBay. It will start with an overview of eBay’s heterogeneous data platform, which comprises SQL and NoSQL databases, and where Cassandra fits into it. For each use case, Jay will go into detail on the system design, data model, and multi-datacenter deployment. To conclude, Jay will summarize the best practices that guide Cassandra utilization at eBay.

http://www.datastax.com/company/news-and-events/events/cassandrasummit2013

Published in: Technology, Business

Transcript

  • 1. Cassandra @ eBay. Jay Patel, Architect, Platform Systems. @pateljay3001
  • 2. eBay Marketplaces: thousands of servers; petabytes of data; billions of SQLs/day; 24x7x365; 99.98+% availability; turning over a TB every second; multiple datacenters; near-real-time; always online; 400+ million items for sale; $75 billion+ per year in goods are sold on eBay; Big Data; 112 million active users; billions of page views/day.
  • 3. eBay Site Data Infrastructure. Don’t force! One size does not fit all. It’s a mixture of multiple SQL & NoSQL databases. We use the right database for the right problem.
  • 4. eBay Site Data Infrastructure: a heterogeneous mixture
      – Thousands of nodes; > 2K sharded logical hosts; > 16K tables; > 27K indexes; > 140 billion SQLs/day; > 5 PB provisioned
      – Hundreds of nodes; persistent & in-memory; > 40 billion SQLs/day
      – 10+ clusters, 100+ nodes; > 250 TB provisioned (local HDD + shared SSD); > 9 billion writes/day; > 5 billion reads/day
      – Hundreds of nodes; > 50 TB; > 2 billion ops/day
      – Thousands of nodes; the world’s largest cluster with 2K+ nodes
      – Dozens of nodes
  • 5. How do we scale RDBMS?
      • Shard
        – Patterns: modulus, lookup-based, range, etc.
        – Application sees only logical shard/database
      • Replicate
        – Disaster recovery, read availability & read scalability
      • Big NOs
        – No transactions
        – No joins
        – No referential integrity constraints
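
The three shard-routing patterns named on slide 5 (modulus, lookup-based, range) can be sketched as follows. This is a minimal illustration, not eBay's routing layer: the shard count, the lookup table entries, the range boundaries, and all function names are hypothetical.

```python
import bisect
import zlib

NUM_SHARDS = 4  # hypothetical number of logical shards


def modulus_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Modulus pattern: a stable hash of the key, modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Lookup-based pattern: an explicit directory maps each key to a shard.
LOOKUP_TABLE = {"user:alice": 2, "user:bob": 0}  # hypothetical entries


def lookup_shard(key: str) -> int:
    """Return the shard recorded for this key in the directory."""
    return LOOKUP_TABLE[key]


# Range pattern: sorted, exclusive upper bounds; ids >= 3000 go to shard 3.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]  # hypothetical boundaries


def range_shard(numeric_id: int) -> int:
    """Return the index of the range that contains numeric_id."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, numeric_id)
```

In each case the application only handles a logical shard number. Mapping that number to a physical host is a separate step that this sketch leaves out.
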
  • 6. Why Cassandra?
      • Multi-datacenter (active-active)
      • Always Available - No SPOF
      • Easy to scale up & down
      • Write performance
      • Distributed counters
      • Hadoop support
      • Not replacing RDBMS, but complementing!
        – Some use cases don’t fit well in RDBMS: sparse data, big data, flexible schema, real-time analytics, …
        – Many use cases don’t need top-tier set-ups.
  • 7. Cassandra Growth [chart: writes, async. reads, and sync. site reads in billions per day, plus storage capacity in terabytes, plotted for Aug 2011, Aug 2012, and May 2013]. Doesn’t predict business.
  • 8. eBay Use Cases on Cassandra
      • Time-series data, real-time insights & immediate actions
        – Fraud detection & prevention
        – Quality Click Pricing for affiliates
        – Order & shipment tracking and insights
        – Mobile notification logging & tracking
        – Cloud CMS change history storage
        – RedLaser server logs and analytics
      • Server metrics collection for monitoring & alerting
      • Taste graph based next-gen recommendation system
      • Personalization Data Service
      • Social Signals on eBay Product & Item pages
      • Milo’s store-item availability inventory (evaluation phase)
  • 9. Real-time insights & actions for: Fraud Prevention; Reporting; Quality Click Pricing; More…
  • 10. System Overview (diagram labels): Business Event Stream; Checkout, Shipping, Refund & Recoup, …; order placed (bin/bid), paid, shipped, refunded; raw data; simple in-memory aggregations +/ Complex Event Processing +/ Cassandra’s distributed counters; label printed per day per user; user segmentation for affiliate pricing; orders per hour, …; multiple Cassandra clusters; Payment; act in real time; Fraud Prevention; Affiliate Pricing Engine (eBay Partner Network); order tracking; real-time reporting; …; (kept from several months to years).
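
The aggregations listed on slide 10, such as "label printed per day per user" and "orders per hour", amount to time-bucketed counters. The sketch below shows that idea only. It uses an in-memory `Counter` as a stand-in for Cassandra counter columns, and the key layout, metric names, and function names are assumptions rather than the schema from the talk.

```python
from collections import Counter
from datetime import datetime, timezone

# In-memory stand-in for counter columns (an assumption for illustration).
counters = Counter()

# strftime patterns that truncate a timestamp to its bucket.
BUCKET_FORMATS = {"hour": "%Y-%m-%dT%H", "day": "%Y-%m-%d"}


def bucket_key(metric: str, user_id: str, ts: datetime, granularity: str) -> tuple:
    """Build a (metric, user, time-bucket) key; ts must be timezone-aware."""
    bucket = ts.astimezone(timezone.utc).strftime(BUCKET_FORMATS[granularity])
    return (metric, user_id, bucket)


def record_event(metric: str, user_id: str, ts: datetime) -> None:
    """Increment the hourly and daily counters for one business event."""
    for granularity in BUCKET_FORMATS:
        counters[bucket_key(metric, user_id, ts, granularity)] += 1
```

Because each event is a pure increment, writers do not need to read the current value first, which is the property that makes counters a natural fit for high-volume event streams.
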
  • 11. A glimpse of the data model. Historic & real-time insights per user per carrier. Sudden & drastic change might be suspicious. User bucketing based on historic & real-time buying activity.
  • 12. A glimpse of the data model
  • 13. Fraud Detection & Prevention. Shop with Confidence.
  • 14. System Overview (diagram labels): Cassandra; Fraud Detection & Prevention System; sign-in info; business events (checkout, sell, …); StaaS; Oracle; Checkout, Shipping, …; Payment; Selling; real-time beacons data; real-time insights; other data; machine-learned models.
  • 15. A glimpse of the data model. Collected at sign-in & stored as key-value. Pulled periodically to StaaS for training machine-learned models.
  • 16. Metrics collection for monitoring & alerting
  • 17. System Overview (diagram labels): transport (HTTP, …); scalable NIO servers based on Netty; thousands of production machines; Cassandra; stats for CPU, memory, disk, …; agents; servers; in-memory grid (Hazelcast) for rollups.
  • 18. A glimpse of the data model. Granular data points. Rolled-up metrics for various time intervals.
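
Slides 17 and 18 describe rolling granular data points up into metrics for various time intervals. A minimal sketch of such a rollup is shown below. The function name, the `(epoch_seconds, value)` input shape, and the choice of min/max/avg/count as the summary are assumptions, since the transcript does not show the actual rollup logic.

```python
from collections import defaultdict


def rollup(points, interval_seconds):
    """Group (epoch_seconds, value) points into fixed-width time buckets.

    Returns a dict mapping each bucket start time to summary statistics.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Truncate the timestamp to the start of its interval.
        buckets[ts - ts % interval_seconds].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }
```

Calling `rollup` once per interval (for example 60, 300, and 3600 seconds) would give one rolled-up series per granularity, alongside the raw points.
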
  • 19. Taste graph based recommendation system
  • 20. Data Model: Taste Graph; Taste Vector. 50 billion+ edges, 600 million+ writes, 3 billion+ reads, 30 TB+ of data on SSD.
  • 21. System Overview: Business Event Stream; Recommendation system; Taste Graph; Taste Vector
      1. Item purchased.
      2a. Write purchase edge.
      2b. Read other edges for this user & item.
      4. Request recommendations.
      5. Finds other items close to user’s coordinates.
      6. Recommendations shown to user.
      More: http://www.slideshare.net/planetcassandra/e-bay-nyc
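
Step 5 on slide 21, finding items close to the user's coordinates, is a nearest-neighbor lookup over taste vectors. The sketch below illustrates the idea with a brute-force scan. The distance metric is not stated in the transcript, so Euclidean distance is an assumption, as are the function name and the in-memory dictionary of item vectors. A system at the scale quoted on slide 20 would need an indexed or approximate search instead.

```python
import heapq
import math


def recommend(user_vector, item_vectors, exclude=(), k=3):
    """Return the ids of the k items whose vectors are nearest to user_vector.

    item_vectors maps item id -> coordinate tuple; ids in `exclude`
    (for example items already purchased) are skipped.
    """
    candidates = (
        (math.dist(user_vector, vector), item_id)
        for item_id, vector in item_vectors.items()
        if item_id not in exclude
    )
    return [item_id for _, item_id in heapq.nsmallest(k, candidates)]
```
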
  • 22. Real-time Personalization Data Service. User performs search using keyword. User gets personalized pages based on implicit/explicit profile.
  • 23. System Overview (diagram labels): Personalization Data Service; CacheMesh (write-back cache); heavy writes; eBay site pages (personalized); every few mins; in-memory; MySQL & XMP DB; Cassandra; Oracle (scaled out); heavy reads; cache miss; user profiles; application SOA services (multiple); data warehouse.
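
Slide 23 labels CacheMesh as a write-back cache that absorbs heavy writes and is flushed every few minutes. The general write-back pattern can be sketched as below. This is not CacheMesh: the class name, the dict-like backing store, and the explicit `flush` call (standing in for the periodic flush) are assumptions, and the transcript does not make clear which database sits behind the cache.

```python
class WriteBackCache:
    """Writes land in memory first and reach the backing store on flush."""

    def __init__(self, store):
        self.store = store   # dict-like backing store (hypothetical stand-in)
        self.cache = {}      # in-memory copies of entries
        self.dirty = set()   # keys written since the last flush

    def get(self, key):
        """Serve from memory; on a cache miss, load from the backing store."""
        if key not in self.cache:
            self.cache[key] = self.store[key]
        return self.cache[key]

    def put(self, key, value):
        """Record the write in memory only and mark the key dirty."""
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Persist all dirty entries; returns how many were written."""
        for key in self.dirty:
            self.store[key] = self.cache[key]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed
```

The trade-off of this pattern is that writes made since the last flush are lost if the cache fails, in exchange for many in-memory updates collapsing into one store write per key.
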
  • 24. Data Model: static column families
      • Keep column names short.
      • Don’t overload one CF with all the data:
        – Split hot & cold data in separate CF.
        – Splitting & sharding can help compaction.
  • 25. Social Signals: served by Cassandra
  • 26. Manage signals via “Your Favorites”. Whole page is served by Cassandra. More: http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
  • 27. Multi-Datacenter Deployment
      • Topology - NTS (NetworkTopologyStrategy)
      • RF - 1:1 or 2:2 or 3:3
      • Read CL - ONE/QUORUM
      • Write CL - ONE
      • Data is backed up periodically to protect against human or software error
      • User request has no datacenter affinity
      • Non-sticky load balancing
  • 28. Multi-Datacenter Deployment
      • Topology - NTS
      • RF - 1:1:1 or 2:2:2
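
The replication factors and consistency levels on slides 27 and 28 can be related with a little arithmetic. Cassandra's QUORUM level is a majority of the summed replication factors across all datacenters, and a read is guaranteed to overlap the latest write only when the replicas read plus the replicas written exceed that sum. The helper names below are hypothetical and the sketch only does the arithmetic; it does not call any driver API.

```python
def quorum(per_dc_rf):
    """Replicas needed for QUORUM: a majority of the summed per-DC RFs."""
    return sum(per_dc_rf) // 2 + 1


def read_sees_latest_write(per_dc_rf, read_replicas, write_replicas):
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > sum(per_dc_rf)
```

For example, with RF 3:3 a QUORUM read contacts 4 replicas, and pairing it with Write CL ONE gives 4 + 1 = 5, which does not exceed 6. That combination therefore favors latency and availability over strong consistency.
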
  • 29. Lessons & Best Practices
      • One size does not fit all
        – Use Cassandra for the right use cases.
      • Choose proper Replication Factor and Consistency Level
        – They alter latency, availability, durability, consistency and cost.
        – Cassandra supports tunable consistency, but remember strong consistency is not free.
      • Many ways to model data in Cassandra
        – The best way depends on your use case and query patterns.
      • De-normalize and duplicate for read performance
        – But don’t de-normalize if you don’t need to.
      http://www.slideshare.net/jaykumarpatel/cassandra-data-modeling-best-practices
  • 30. Are you excited? Come Join Us! Thank You. @pateljay3001 #cassandra13
