The Big Data Revolution is an Evolution

Dealing with data doesn't only require a data store, it requires an infrastructure. At SimpleReach, we have 5 data storage layers to service all of our data needs. These range from high volume, high velocity data ingestion with real-time analytics to ad-hoc style historical analysis with search capabilities. To communicate effectively between applications, data stores sit behind a service architecture for consistent data access patterns and failover/redundancy. This talk is a story of how we came to this architecture and some of the lessons we learned along the way.
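The idea of keeping data stores behind a service layer, with failover between replicas, can be sketched roughly as follows. This is a minimal illustration and not SimpleReach's implementation: `DataService`, `InMemoryStore`, and `StoreUnavailable` are hypothetical names, and a plain dict stands in for a real data store.

```python
from typing import Dict, List, Optional


class StoreUnavailable(Exception):
    """Raised by a backend that cannot serve requests (hypothetical)."""


class InMemoryStore:
    """Stand-in for a real data store: a dict plus an 'up' flag."""

    def __init__(self, name: str, data: Optional[Dict[str, int]] = None,
                 up: bool = True) -> None:
        self.name = name
        self.data = dict(data or {})
        self.up = up

    def get(self, key: str) -> Optional[int]:
        if not self.up:
            raise StoreUnavailable(self.name)
        return self.data.get(key)


class DataService:
    """Single access point: callers never talk to a store directly."""

    def __init__(self, replicas: List[InMemoryStore]) -> None:
        self.replicas = replicas

    def get(self, key: str) -> Optional[int]:
        # Try each replica in order and fall through to the next on failure.
        for store in self.replicas:
            try:
                return store.get(key)
            except StoreUnavailable:
                continue
        raise StoreUnavailable("all replicas down")


primary = InMemoryStore("primary", {"pageviews": 42}, up=False)
secondary = InMemoryStore("secondary", {"pageviews": 42})
service = DataService([primary, secondary])
result = service.get("pageviews")  # served by the secondary replica
```

Because applications only see `DataService.get`, a store can be swapped or taken offline without touching the callers.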

Published in: Technology

Transcript of "The Big Data Revolution is an Evolution"

  1. The Big Data Revolution is an Evolution. Eric Lubow, @elubow, elubow@simplereach.co
  2. Overview
     • Evolution
     • SimpleReach
     • Data Stores / Languages
     • Architecture Implementation
     Big Data Revolution is an Evolution | Eric Lubow @elubow | #NYCassandra2013
  3. We’re in the midst of an evolution, not a revolution.
  4. The 2 Truths
  5. The Real Truth: Even with the right tools, 80% of the work of building a big data system is acquiring and refining
  6. 30m plays/day + 4m user ratings + 75k movies metadata + 24.4m user metadata =
     • David Fincher + Kevin Spacey + British House of Cards
     • Mitch Hurwitz + Will Arnett + Jason Bateman + Arrested Development
  7. BRING IT TOGETHER
  8. revolution / evolution (diagram): Insufficient Capabilities, New Products, Scale/Need Changes, Development & Integration
  9. (no slide text)
  10. (no slide text)
  11. SimpleReach
      • Millions of URLs per day
      • Over 1 billion pageviews per month
      • 250m events per day (~3k events/second)
      • Auto-scale 90-130 machines depending on traffic
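As a quick sanity check on the figures on this slide, the per-second rates can be derived from the daily and monthly totals. These are averages only; the 30-day month is an assumption, and the slide says nothing about peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures taken from the slide.
events_per_day = 250_000_000
pageviews_per_month = 1_000_000_000

# Average rates; a 30-day month is assumed for the pageview figure.
events_per_second = events_per_day / SECONDS_PER_DAY
pageviews_per_second = pageviews_per_month / (30 * SECONDS_PER_DAY)

print(f"{events_per_second:.0f} events/second on average")
print(f"{pageviews_per_second:.0f} pageviews/second on average")
```

The event rate works out to roughly 2,900 per second, which matches the "~3k events/second" on the slide.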
  12. HUMBLE BEGINNINGS
  13. Scale
  14. AND THEN... C*
  15. Cassandra (C*)
      • Large data volume ingestion at high velocity
      • Really fast writes to many locations (eventual consistency)
      • Query by column groups within rows (slicing)
      • TTLs for small group aggregation
      • Wrote Helenus, a Node.js driver for Cassandra
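The "slicing" bullet refers to reading a contiguous range of columns out of one wide row. The sketch below is a toy in-memory model of that idea, not Cassandra's API or Helenus: `WideRow` is a hypothetical class, and the one-row-per-URL, one-column-per-hour layout is an assumed example schema.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


class WideRow:
    """Toy model of one wide row: columns kept sorted by name."""

    def __init__(self) -> None:
        self._names: List[str] = []
        self._values: List[int] = []

    def insert(self, name: str, value: int) -> None:
        i = bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            self._values[i] = value  # overwrite an existing column
        else:
            self._names.insert(i, name)
            self._values.insert(i, value)

    def slice(self, start: str, finish: str) -> List[Tuple[str, int]]:
        """Return columns with start <= name <= finish, in name order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return list(zip(self._names[lo:hi], self._values[lo:hi]))


# Assumed layout: one row per URL, one column per hour bucket.
row = WideRow()
for hour, views in [("2013-03-01T10", 120), ("2013-03-01T09", 80),
                    ("2013-03-01T11", 95), ("2013-03-01T12", 60)]:
    row.insert(hour, views)

morning = row.slice("2013-03-01T09", "2013-03-01T11")
```

Because the columns are stored in sorted order, a time-range read is a single contiguous scan rather than a lookup per column.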
  16. MongoDB
      • Fast atomic increments (Node.js is native JSON)
      • Sharding
      • Solid ORM for Rails (Mongoid)
      • B-Tree Indexes
      • Document based via JSON
      • TTLs for ephemeral data
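The atomic increments mentioned on this slide map onto MongoDB's `$inc` update operator. The sketch below only builds the filter and update documents as plain dicts, so it runs without a server; `build_increment` and the field names (`url`, `pageviews`, `shares.twitter`) are hypothetical. With PyMongo, the resulting pair could be passed to `collection.update_one(query, update, upsert=True)`.

```python
from typing import Dict, Tuple


def build_increment(url: str, deltas: Dict[str, int]) -> Tuple[dict, dict]:
    """Build a (filter, update) pair that bumps counters with $inc."""
    if not deltas:
        raise ValueError("at least one counter is required")
    query = {"url": url}             # hypothetical document key
    update = {"$inc": dict(deltas)}  # the server applies this atomically per document
    return query, update


query, update = build_increment(
    "http://example.com/story", {"pageviews": 1, "shares.twitter": 1}
)
```

Since the update is expressed as a delta rather than a read-modify-write, concurrent writers do not overwrite each other's counts.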
  17. Redis
      • Supports hundreds of thousands of transactions per second
      • Great caching engine
      • Supports useful variable types like sets, sorted sets, and lists
      • Everything is guaranteed to be Memory Mapped
  18. Infobright
      • Works with standard MySQL driver
      • Column store for ad-hoc analytics queries in SQL
      • Heavy compression of data (avg 12:1)
  19. The c0dez
      • Polyglottany doesn’t only apply to data stores
      • Each language has its own benefit to each stack layer
      • Each language has its own individual benefits
      • Each language has its own development benefits
  20. (no slide text)
  21. Cons
      • Redis - Can only utilize a single core. SerDe price.
      • Infobright - DELETE/UPDATEs are VERY expensive
      • Cassandra - No btree indexes or probabilistic counters
      • Mongo - Indexes must fit in memory. Forced Replica ping times
      • Python - Whitespace. Community
      • Ruby - Not high performance enough for our standards
  22. Evolution Takes Work
      • Service Oriented Architecture (Internal API)
      • Data accuracy checks: visual and programmatic
      • Built framework for testing out engines (Storage, Queueing, etc)
      • Access to many toolsets (for all languages, DBs, Engines)
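One way to picture a framework for testing out engines, combined with a programmatic accuracy check, is a shared interface plus a replay harness. The talk does not describe how SimpleReach built theirs; everything below (`CounterEngine`, `DictEngine`, `LossyEngine`, `check_accuracy`) is a hypothetical sketch.

```python
from typing import Dict, List, Optional, Protocol


class CounterEngine(Protocol):
    """Hypothetical contract every candidate storage engine must meet."""

    def incr(self, key: str, amount: int = 1) -> None: ...
    def get(self, key: str) -> Optional[int]: ...


class DictEngine:
    """In-memory reference engine."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def incr(self, key: str, amount: int = 1) -> None:
        self._data[key] = self._data.get(key, 0) + amount

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)


class LossyEngine(DictEngine):
    """Deliberately broken engine: drops every third write."""

    def __init__(self) -> None:
        super().__init__()
        self._writes = 0

    def incr(self, key: str, amount: int = 1) -> None:
        self._writes += 1
        if self._writes % 3:
            super().incr(key, amount)


def check_accuracy(engine: CounterEngine, events: List[str]) -> bool:
    """Replay events, then compare the engine against a plain tally."""
    expected: Dict[str, int] = {}
    for key in events:
        engine.incr(key)
        expected[key] = expected.get(key, 0) + 1
    return all(engine.get(k) == v for k, v in expected.items())


events = ["a", "b", "a", "c", "a", "b"]
results = {
    "dict": check_accuracy(DictEngine(), events),
    "lossy": check_accuracy(LossyEngine(), events),
}
```

A new engine only has to implement `incr` and `get` to be run through the same check, which keeps comparisons between candidates fair.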
  23. Service (diagram): Solr, C*, Real-time, C*, Internal API
  24. Path of a Packet (diagram): Fire, Solr, Hos, C*, Internal API, Consumers, EP, Queue, Internet, Mong, API, Redis, SC, IB
  25. Architecture Distribution
      Availability zones: US-EAST-1a, US-EAST-1b, US-EAST-1e
      • CASSANDRA-0001, CASSANDRA-0002, CASSANDRA-0003
      • CASSANDRA-0010, CASSANDRA-0011, CASSANDRA-0012
      • REDIS-0001A, REDIS-0001B
      • INFOBRIGHT-0001, INFOBRIGHT-0002
      • MONGO-SHARD-0000-A, MONGO-SHARD-0000-B
      • MONGO-SHARD-0001-B, MONGO-SHARD-0001-A
      • MONGO-SHARD-0002-B, MONGO-SHARD-0002-A
      • iAPI-0001, iAPI-0002, iAPI-0003
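The slide shows each tier spread across three availability zones. The talk does not say how nodes were assigned to zones; the sketch below assumes a simple round-robin placement, with `place_round_robin` as a hypothetical helper, and uses the Cassandra node names from the slide.

```python
from collections import defaultdict
from typing import Dict, List

ZONES = ["us-east-1a", "us-east-1b", "us-east-1e"]  # zones named on the slide


def place_round_robin(nodes: List[str], zones: List[str]) -> Dict[str, List[str]]:
    """Assign nodes to zones in turn so each zone gets an even share."""
    placement: Dict[str, List[str]] = defaultdict(list)
    for i, node in enumerate(nodes):
        placement[zones[i % len(zones)]].append(node)
    return dict(placement)


cassandra = [f"CASSANDRA-{n:04d}" for n in (1, 2, 3, 10, 11, 12)]
placement = place_round_robin(cassandra, ZONES)
```

With six nodes and three zones, each zone ends up with two Cassandra nodes, so losing one zone leaves two thirds of the ring available.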
  26. The Schrute of the Problem
  27. Evolving Amazon Tools
      • Full Featured API
      • Simple Queue Service
      • Data Pipelining
      • OpsWorks
      • CloudFormation
      • Redshift Analytics
      • CloudSearch
      • Elastic Beanstalk
      • Elastic MapReduce
      • Simple Workflow Coordinator
      • S3 / Glacier
  28. DevOps Wizardry
      • Extensive use of AWS
      • Monitor: Nagios, Statsd, and Graphite
      • Manage: Chef, OpsWorks, cSSHx
      • Deployments
  29. Summary
      • Solutions Require Evolution
      • Build, Use, and Integrate Tools
      • Abstraction
      • Distribution
      • Monitoring & Automation
  30. Evolution Takes Time: A revolution only lasts fifteen years, a period which coincides with the
  31. We’re (Ask us about Food Coma Fridays)
  32. Questions are guaranteed in life. Answers aren’t. Thank you. Eric Lubow, @elubow, elubow@simplereach.co