Real World Cassandra

Published in: Technology, Business

  1. Cassandra in Online Advertising: Real Time Bidding
       the prospect engine for brands.
  2. Who are we?
       Costa Sevdinoglou & Edward Capriolo
  3. Impressions look like…
  4. A High Level look at RTB
       1. Browsers visit Publishers and create impressions.
       2. Publishers sell impressions via Exchanges.
       3. Exchanges serve as auction houses for the impressions.
       4. On behalf of the marketer, m6d bids on the impressions via the auction house. If m6d wins, we display our ad to the browser.
  5. Performance and Data
       • Billions and billions of bid requests a day
           • A single request can result in multiple Cassandra operations!
           • One cluster is just under 10 TB and growing
       • Low latency requirement: typically below 120 ms
       • Limited data available to m6d via the exchange
  6. Segment Data
       Segments are how we assign product or service affinity to a group of users. Users we consider to be like-minded with respect to a given brand will be placed in the same segment.
       Segment Data is just one component of our overarching data model.
       Segments help to reduce the number of calculations we do in real time.
  7. Old Approach for Segment Data
       Diagram labels: Application Nodes (Tomcat + MySQL), MySQL Data Push, Event Logs, Aggregation, Hadoop
       Limitations
       • Periodically updated.
       • Only a subsection of the data.
       • Cluster performance is affected during a data push.
  8. Cassandra Approach for Segment Data
       Diagram labels: Application Nodes (Tomcat + Less MySQL Usage), Cassandra
       Better!
       • Updating in real time now possible
       • Distributed, not duplicated
       • Less complexity to manage
       • Storing more information
       • We can now bid on users sooner!
  9. One Ring to rule them all
       http://askyyy.blog.163.com/blog/static/1234575992010428819399/
  10. Peer to Peer
       • Per-operation replication
       • Fail fast, self-healing
       • Each write goes to all natural endpoints
       • Hinted handoff if destination is down
       • Repair on Read
       • No more:
           STOP SLAVE;
           SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
           START SLAVE;
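
The write path on the slide above (send each write to all natural endpoints, keep a hint for any endpoint that is down) can be illustrated with a toy in-memory model. This is a minimal sketch, not Cassandra's implementation: the `Node` class, the `write` and `replay_hints` functions, and the `user:42` / `segment:7` values are all hypothetical names made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica: an in-memory key/value store plus hints held for peers."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)
    hints: list = field(default_factory=list)  # (target_name, key, value)


def write(coordinator, replicas, key, value):
    """Send the write to every replica; store a hint for each one that is down."""
    acks = 0
    for node in replicas:
        if node.up:
            node.data[key] = value
            acks += 1
        else:
            # Hinted handoff: remember the write until the target returns.
            coordinator.hints.append((node.name, key, value))
    return acks


def replay_hints(coordinator, nodes_by_name):
    """Deliver stored hints to targets that are back up; keep the rest."""
    remaining = []
    for target, key, value in coordinator.hints:
        node = nodes_by_name[target]
        if node.up:
            node.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining


# Hypothetical three-replica example with one replica down at write time.
a, b, c = Node("a"), Node("b"), Node("c", up=False)
acks = write(a, [a, b, c], "user:42", "segment:7")
c.up = True
replay_hints(a, {"a": a, "b": b, "c": c})
```

In this model the write succeeds on the two live replicas, and the third catches up once the hint is replayed, with no manual replication repair step.
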
  11. Multi Data Center
       • No designing and managing complex replication topologies
           create keyspace world
             with placement_strategy = org.apache.cassandra.locator.NetworkTopologyStrategy
             and strategy_options={1:3, 2:3, 3:3};
       • The same process as single data center
       • No log shipping, or separate processes to run
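
The `strategy_options={1:3, 2:3, 3:3}` setting above asks for three replicas in each of three data centers named `1`, `2` and `3`. The arithmetic below is a small sketch of what that implies; the quorum sizes are an addition not stated on the slide and assume the usual majority rule of `floor(RF / 2) + 1`. The function names are hypothetical.

```python
# Replicas per data center, copied from the keyspace definition on the slide.
strategy_options = {"1": 3, "2": 3, "3": 3}


def total_replicas(options):
    """Total copies of each row across all data centers."""
    return sum(options.values())


def quorum(replication_factor):
    """Majority of replicas, assuming the floor(RF / 2) + 1 rule."""
    return replication_factor // 2 + 1


total = total_replicas(strategy_options)
global_quorum = quorum(total)                 # majority across all data centers
local_quorum = quorum(strategy_options["1"])  # majority within data center "1"
print(total, global_quorum, local_quorum)
```
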
  12. Monitoring & Management
       • Many, many things to monitor with JMX
       • Nice command line tools
       • Most values can be tweaked at run time
  13. Capacity Planning
       • How many
           • Rows
           • Columns
           • Size of Average Column
       • Latency requirements
       • Throughput: reads and writes per second
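
The capacity questions above can be turned into a back-of-envelope estimate. The sketch below is only a starting point under stated assumptions: every input number is hypothetical, and it ignores per-column storage overhead, compaction headroom and read load. The one structural assumption taken from the deck is that each write goes to every replica.

```python
def estimate_storage_bytes(rows, columns_per_row, avg_column_bytes, replication_factor):
    """Raw data size multiplied by the number of copies kept in the cluster."""
    raw = rows * columns_per_row * avg_column_bytes
    return raw * replication_factor


def per_node(total, node_count):
    """Even share of a cluster-wide total, assuming a balanced ring."""
    return total / node_count


# Hypothetical inputs, not figures from the talk.
rows = 1_000_000_000
columns_per_row = 10
avg_column_bytes = 100
replication_factor = 3
node_count = 12
client_writes_per_sec = 50_000

total_bytes = estimate_storage_bytes(rows, columns_per_row, avg_column_bytes, replication_factor)
bytes_per_node = per_node(total_bytes, node_count)

# Each client write lands on every replica, so replica-level write load scales with RF.
replica_writes_per_sec = client_writes_per_sec * replication_factor
writes_per_node = per_node(replica_writes_per_sec, node_count)

print(f"{total_bytes / 1e12:.1f} TB total, {bytes_per_node / 1e9:.0f} GB per node")
print(f"{writes_per_node:.0f} replica writes per second per node")
```
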
  14. Unit Tests FTW!
  15. Max 2 billion columns per row
       • Awesome
           • Unless you accidentally write 2 billion columns to a row key named “null”
       • Check maxRowSize JMX
       • Watch logs for messages about compacting large rows
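
One way to avoid the "null" row key accident described above is to validate keys in application code before any write is issued, and to cover that check with a unit test. The helper below is a hypothetical sketch, not code from the talk; `validate_row_key` and the sample keys are made-up names.

```python
def validate_row_key(key):
    """Return the key unchanged, or raise ValueError if it looks like a missing id."""
    if key is None:
        raise ValueError("row key is None")
    text = str(key).strip()
    if text == "" or text.lower() == "null":
        # A stringified missing value would pile every such write into one row.
        raise ValueError(f"suspicious row key: {key!r}")
    return key


def test_rejects_missing_ids():
    """Unit test: missing or placeholder ids must never reach the write path."""
    for bad in (None, "null", " NULL ", ""):
        try:
            validate_row_key(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted bad key: {bad!r}")


test_rejects_missing_ids()
good_key = validate_row_key("user:42")
```
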
  16. Local (NYC) Meetups
       www.meetup.com/NYC-Cassandra-User-Group/
