C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Having many different technologies within an organization can be problematic for developers and operations alike. Structuring those systems into discrete modules not only abstracts away much of the complexity of a heterogeneous architecture but also allows systems to evolve using common access and storage patterns. This session will discuss how to think about, architect, and maintain a service architecture for a big data system.

  1. Big Architectures for Big Data
     Eric Lubow | @elubow | elubow@simplereach.com
  2. Big Architectures for Big Data | Eric Lubow @elubow | #Cassandra13
     Overview
     • SimpleReach
     • Goals
     • Tools
     • Architecture Implementation
  3. The 2 Truths
  4. The Real Truth
     Even with the right tools, 80% of the work of building a big data system is acquiring and refining
  7. SimpleReach
     • Millions of URLs per day
     • Over 1.25 billion page views per month
     • 500m events per day (~6k events/second)
     • Auto-scale 125-160 machines depending on traffic
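The per-second figure on the SimpleReach slide follows from spreading the daily event count evenly over the 86,400 seconds in a day. A quick check in Python, using only the numbers on the slide:

```python
# Back-of-the-envelope check of the slide's throughput figure:
# 500 million events per day, averaged over 24 hours.
EVENTS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
print(f"{events_per_second:,.0f} events/second")  # 5,787 events/second
```

This is a daily average; traffic peaks will sit above it, which fits with the fleet auto-scaling between 125 and 160 machines.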
  8. And It Goes Like This...
     (diagram: C*, Vertica)
  9. Goals
     • Consistent non-data storage layer access patterns
     • Data accuracy across storage engines
     • Minimize downtime / minimize cost of downtime
     • High availability
     • Allow access to many toolsets (for all languages, DBs, engines)
     • Clients should have minimal architecture knowledge
  10. Consistent Access Patterns
      realtime_score('score', 'realtime')
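One way to read the `realtime_score('score', 'realtime')` example is that every client call is named by a resource and a variant, and the internal API decides which engine answers it. The sketch below assumes that reading; `make_endpoint`, the `/<resource>/<variant>` path scheme, and `fake_transport` are hypothetical stand-ins, not SimpleReach's actual client.

```python
from typing import Any, Callable, Dict

# Assumed transport signature: (path, params) -> decoded JSON response.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def make_endpoint(resource: str, variant: str,
                  transport: Transport) -> Callable[..., Dict[str, Any]]:
    """Build a client call for the hypothetical internal-API route
    /<resource>/<variant>. Callers only see the function name and its
    parameters; which storage engine answers is decided behind the API."""
    path = f"/{resource}/{variant}"

    def call(**params: Any) -> Dict[str, Any]:
        return transport(path, params)

    return call


def fake_transport(path: str, params: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for an HTTP request to the internal API; echoes its input.
    return {"path": path, "params": params}


realtime_score = make_endpoint("score", "realtime", fake_transport)
print(realtime_score(url="http://example.com/story"))
# {'path': '/score/realtime', 'params': {'url': 'http://example.com/story'}}
```

With this shape, moving real-time scores from one engine to another only changes what serves `/score/realtime`; client code that calls `realtime_score` stays the same, which is the "clients should have minimal architecture knowledge" goal.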
  11. Authentication, Tracking,
      • Per-service access keys
      • Track call volume by access key
      • Prevent internal denial of service
      • Monitor availability and performance
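Per-service access keys make it possible to count calls per caller and to stop one internal service from flooding the API. The slide does not say which limiting mechanism was used. The sketch below swaps in a common one, a token bucket per access key; `AccessKeyGate`, the rates, and the key name `reporting-svc` are all hypothetical.

```python
import time
from collections import Counter
from typing import Callable, Dict


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AccessKeyGate:
    """Hypothetical gate in front of the internal API: one call counter and
    one token bucket per service access key."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}
        self.calls: Counter = Counter()     # call volume per key (tracking)
        self.rejected: Counter = Counter()  # calls refused per key

    def check(self, access_key: str) -> bool:
        self.calls[access_key] += 1
        bucket = self.buckets.get(access_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[access_key] = bucket
        if bucket.allow():
            return True
        self.rejected[access_key] += 1
        return False


# Usage with a fake clock so the result is deterministic.
now = [0.0]
gate = AccessKeyGate(rate=1.0, capacity=2.0, clock=lambda: now[0])
results = [gate.check("reporting-svc") for _ in range(3)]
print(results)  # [True, True, False]
```

The `calls` and `rejected` counters are the kind of per-key numbers that could be shipped to a monitoring system to watch availability and performance per caller.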
  12. Controlled Data Flow
      (diagram labels: Social; Event Collector; Social Data; Batch & Write; Processed Data; Batch & Write; Raw Data; Calculate Score; Write; NSQ Multicast; NSQ; NSQ)
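The "NSQ Multicast" box in the data-flow diagram relies on NSQ's routing model: every channel on a topic gets its own copy of each message, and the consumers attached to one channel share that channel's messages. The in-memory model below illustrates that behavior only; it is not the NSQ client API, and the channel names are hypothetical labels modeled on the diagram's stages. It hands messages out round-robin for simplicity, whereas real NSQ distributes according to each consumer's readiness.

```python
from collections import deque
from typing import Deque, Dict, List


class Topic:
    """In-memory model of NSQ-style routing (not the NSQ client API):
    every channel receives its own copy of each message (multicast), and
    consumers attached to the same channel split that channel's messages."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.channels: Dict[str, Deque[str]] = {}

    def add_channel(self, channel: str) -> None:
        self.channels.setdefault(channel, deque())

    def publish(self, message: str) -> None:
        for queue in self.channels.values():
            queue.append(message)  # one copy per channel

    def drain(self, channel: str, consumers: int) -> List[List[str]]:
        """Hand a channel's backlog to `consumers` workers, round-robin."""
        received: List[List[str]] = [[] for _ in range(consumers)]
        queue = self.channels[channel]
        i = 0
        while queue:
            received[i % consumers].append(queue.popleft())
            i += 1
        return received


# Hypothetical channel names modeled on the stages in the diagram.
events = Topic("events")
for name in ("raw_data", "social_data", "calculate_score"):
    events.add_channel(name)
for n in range(4):
    events.publish(f"event-{n}")

print(events.drain("raw_data", consumers=1))
# [['event-0', 'event-1', 'event-2', 'event-3']]
print(events.drain("calculate_score", consumers=2))
# [['event-0', 'event-2'], ['event-1', 'event-3']]
```

Because each stage reads from its own channel, a slow "calculate score" consumer does not hold back the raw-data writer, and adding consumers to one channel scales that stage alone.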
  13. NSQ by Bit.ly
      • Distributed and de-centralized topology
      • At-least-once delivery guaranteed
      • Multicast-style message routing
      • Runtime discovery for consumers to find producers
      • Allow for maintenance windows with no downtime
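At-least-once delivery means a message can reach a consumer more than once, so consumers need to make duplicates harmless. A minimal sketch, assuming each message carries a stable ID; `IdempotentHandler` and `count_pageview` are hypothetical, and in practice you might key on an event ID inside the payload rather than on the queue's own message ID.

```python
from typing import Callable, Dict, Set


class IdempotentHandler:
    """Wrap a message handler so that redelivered messages take effect once.

    Under at-least-once delivery the same message can arrive more than once
    (for example, when a consumer times out before acknowledging it), so the
    consumer has to make duplicates harmless.
    """

    def __init__(self, handler: Callable[[str, bytes], None]) -> None:
        self.handler = handler
        # In-process and unbounded for brevity; a shared store with expiry
        # would be needed across several consumer processes.
        self.seen: Set[str] = set()

    def __call__(self, message_id: str, body: bytes) -> None:
        if message_id in self.seen:
            return  # duplicate: skip the side effect
        self.handler(message_id, body)
        # Record the ID only after success, so a failure can be retried.
        self.seen.add(message_id)


totals: Dict[str, int] = {}


def count_pageview(message_id: str, body: bytes) -> None:
    url = body.decode()
    totals[url] = totals.get(url, 0) + 1


handle = IdempotentHandler(count_pageview)
for mid, body in [("m1", b"/a"), ("m2", b"/a"), ("m1", b"/a")]:  # m1 arrives twice
    handle(mid, body)
print(totals)  # {'/a': 2}
```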
  14. Path of a Packet
      (diagram: Internet; EC; Internal API; Solr; C*; Mongo; Redis; Vertica; API; Firehose; SC; Consumers; Queue)
  15. Evolution Takes Work
      • Know your access patterns
      • Service Oriented Architecture (Internal API)
      • Data accuracy checks: visual and programmatic
      • Built framework for testing out engines (storage, queueing, etc.)
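A programmatic data accuracy check can be as simple as reading the same aggregates from two storage engines and flagging keys that disagree beyond a tolerance or exist in only one of them. The sketch below assumes hourly counts have already been fetched into dictionaries; `compare_counts`, the 1% tolerance, and the sample numbers are hypothetical.

```python
from typing import Dict, List


def compare_counts(primary: Dict[str, int], secondary: Dict[str, int],
                   tolerance: float = 0.01) -> List[str]:
    """Return keys whose counts differ by more than `tolerance` (as a
    fraction of the primary value) or that exist in only one store."""
    problems: List[str] = []
    for key in sorted(primary.keys() | secondary.keys()):
        if key not in primary or key not in secondary:
            problems.append(key)  # missing from one engine
            continue
        expected, actual = primary[key], secondary[key]
        if abs(expected - actual) > abs(expected) * tolerance:
            problems.append(key)  # present in both, but too far apart
    return problems


# Hypothetical hourly pageview totals read from two engines.
cassandra_counts = {"10:00": 1000, "11:00": 1200}
vertica_counts = {"10:00": 1005, "11:00": 1100, "12:00": 50}
print(compare_counts(cassandra_counts, vertica_counts))
# ['11:00', '12:00']
```

A tolerance rather than exact equality allows for data that is still in flight between engines; the right threshold depends on how far behind each engine is allowed to be.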
  16. Homogeneous Machines at Base
      Base image layout, Producer | Consumer:
      • Application: Event Collection | Consumer
      • Application Group: NSQ, Mongos | NSQ, Mongos
      • Organizational Base: App Config, Users, Monitoring | App Config, Users, Monitoring
      • Base AMI: Amazon Linux | Amazon Linux
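The layered base image can be thought of as configuration stacked from most general to most specific, with every machine sharing the lower layers. The sketch below models that idea as a deep merge of dictionaries; the layer contents are hypothetical, and this is not how Chef or OpsWorks resolve attributes, only an illustration of why identical lower layers keep machines homogeneous.

```python
import copy
from typing import Any, Dict, List

Layer = Dict[str, Any]


def merge_layers(layers: List[Layer]) -> Layer:
    """Deep-merge configuration layers ordered from most general (base AMI)
    to most specific (application); later layers win on conflicting keys."""
    result: Layer = {}
    for layer in layers:
        _merge_into(result, layer)
    return result


def _merge_into(target: Layer, overlay: Layer) -> None:
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge_into(target[key], value)  # merge nested sections
        else:
            target[key] = copy.deepcopy(value)  # copy so layers stay untouched


# Hypothetical layers for the "Producer" column of the slide.
base_ami = {"os": "Amazon Linux"}
org_base = {"users": ["deploy"], "monitoring": {"nagios": True, "statsd": True}}
producer_group = {"packages": ["nsqd", "mongos"]}
event_collection = {"app": "event-collection", "monitoring": {"checks": ["http"]}}

producer = merge_layers([base_ami, org_base, producer_group, event_collection])
print(producer["monitoring"])  # {'nagios': True, 'statsd': True, 'checks': ['http']}
```

A consumer machine would reuse `base_ami`, `org_base`, and the same group layer, changing only the top application layer.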
  17. DevOps Wizardry
      • Extensive use of AWS
      • Monitor: Nagios, StatsD, and Graphite
      • Manage: Chef, OpsWorks, cSSHx, Vagrant
      • Deployments
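StatsD accepts metrics as small plain-text UDP datagrams of the form `name:value|type`, optionally followed by `|@rate` for sampled metrics, and flushes aggregates to a backend such as Graphite. A minimal sketch of formatting and sending such lines; the metric names are hypothetical, and the host and port assume a local StatsD daemon on its conventional port 8125.

```python
import socket
from typing import Optional, Union

Number = Union[int, float]
METRIC_TYPES = {"c", "ms", "g", "s"}  # counter, timer, gauge, set


def statsd_line(name: str, value: Number, metric_type: str,
                sample_rate: Optional[float] = None) -> str:
    """Format one metric in the StatsD plain-text protocol,
    e.g. "iapi.calls:1|c" or "iapi.latency:32|ms|@0.5"."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unknown StatsD metric type: {metric_type!r}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None and sample_rate < 1:
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP datagram; 8125 is StatsD's conventional port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


print(statsd_line("iapi.calls", 1, "c"))           # iapi.calls:1|c
print(statsd_line("iapi.latency", 32, "ms", 0.5))  # iapi.latency:32|ms|@0.5
```

UDP keeps instrumentation cheap: if the StatsD daemon is down, `send_metric` does not block or fail the application call it is measuring.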
  18. Evolving Amazon Tools
      • Full-featured API
      • OpsWorks
      • CloudFormation
      • S3 / CloudFront
      • Elastic Beanstalk
      • Elastic MapReduce
  19. Service
      (diagram: Internal API; Solr; Real-time C*; C*; Vertica)
  20. Service Architecture Machines
      Base image layout, Proxy Machines | Storage Machines:
      • Application / Application Group: iAPI Front End, nginx | Data Store
      • Organizational Base: App Config, Users, Monitoring | App Config, Users, Monitoring
      • Base AMI: Amazon Linux | Amazon Linux
  21. Anatomy of an Endpoint
      (diagram: "hourly content" and "ten minute content" endpoints, each backed by Mongo, Mongo, Vertica, C*, C*; Querying Machines running Helen, Helen, PyVertic, PyMon, PyMon, PyVertic)
  22. Endpoint Breakout
      • Availability
      • Consistent Access Patterns
      • Minimal downtime changes
      • Smaller code deploys
      • Non-monolithic code base
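Breaking the API into small endpoints, each owning its own backends (as in the "hourly content" and "ten minute content" anatomy), lets one endpoint fail over or be redeployed without touching the others. A minimal sketch, assuming each endpoint tries an ordered list of replica backends; `Endpoint`, `primary_down`, `replica`, and the registry key `hourly_content` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Backend = Callable[[dict], dict]


class Endpoint:
    """A small, independently deployable endpoint that owns its backends.
    Backends are tried in order, so one unavailable replica does not take
    the endpoint down."""

    def __init__(self, name: str, backends: Sequence[Backend]) -> None:
        self.name = name
        self.backends = list(backends)

    def query(self, params: dict) -> dict:
        errors: List[str] = []
        for backend in self.backends:
            try:
                return backend(params)
            except ConnectionError as exc:
                errors.append(str(exc))  # remember why, then try the next one
        raise ConnectionError(f"{self.name}: all backends failed: {errors}")


def primary_down(params: dict) -> dict:
    raise ConnectionError("primary unreachable")


def replica(params: dict) -> dict:
    return {"source": "replica", "url": params["url"]}


# Hypothetical registry keyed by endpoint name, modeled on "hourly content".
endpoints: Dict[str, Endpoint] = {
    "hourly_content": Endpoint("hourly_content", [primary_down, replica]),
}
print(endpoints["hourly_content"].query({"url": "/a"}))
# {'source': 'replica', 'url': '/a'}
```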
  23. Architecture Distribution
      • US-EAST-1a: MONGO-SHARD-0001-B, MONGO-SHARD-0000-A, CASSANDRA-0001, CASSANDRA-0010, REDIS-0001A, VERTICA-0001, iAPI-0001
      • US-EAST-1b: MONGO-SHARD-0002-B, MONGO-SHARD-0001-A, CASSANDRA-0002, CASSANDRA-0011, REDIS-0001B, iAPI-0002
      • US-EAST-1e: MONGO-SHARD-0002-A, MONGO-SHARD-0000-B, CASSANDRA-0003, CASSANDRA-0012, VERTICA-0003, iAPI-0003, VERTICA-0002
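The host names in the distribution slide suggest that paired replicas (MONGO-SHARD-0000-A and -B, REDIS-0001A and 0001B) are deliberately placed in different availability zones. Assuming that A/B suffixes mark members of the same replica group, a small check can confirm that no group has two members in one zone. The placement data is copied from the slide; `replica_group` and `colocated_replicas` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List

# Host placement as listed on the "Architecture Distribution" slide.
PLACEMENT: Dict[str, List[str]] = {
    "us-east-1a": ["MONGO-SHARD-0001-B", "MONGO-SHARD-0000-A", "CASSANDRA-0001",
                   "CASSANDRA-0010", "REDIS-0001A", "VERTICA-0001", "iAPI-0001"],
    "us-east-1b": ["MONGO-SHARD-0002-B", "MONGO-SHARD-0001-A", "CASSANDRA-0002",
                   "CASSANDRA-0011", "REDIS-0001B", "iAPI-0002"],
    "us-east-1e": ["MONGO-SHARD-0002-A", "MONGO-SHARD-0000-B", "CASSANDRA-0003",
                   "CASSANDRA-0012", "VERTICA-0003", "iAPI-0003", "VERTICA-0002"],
}


def replica_group(host: str) -> str:
    """Map a host to its replica group, assuming a trailing A/B suffix marks
    members of the same shard (e.g. MONGO-SHARD-0000-A and -B)."""
    if host.startswith("MONGO-SHARD-"):
        return host.rsplit("-", 1)[0]
    if host.startswith("REDIS-"):
        return host.rstrip("AB")
    return host  # other hosts are not paired replicas in this naming scheme


def colocated_replicas(placement: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return replica groups with two or more members in the same zone."""
    problems: Dict[str, List[str]] = {}
    for zone, hosts in placement.items():
        counts: Dict[str, int] = defaultdict(int)
        for host in hosts:
            counts[replica_group(host)] += 1
        for group, n in counts.items():
            if n > 1:
                problems.setdefault(group, []).append(zone)
    return problems


print(colocated_replicas(PLACEMENT))  # {}
```

An empty result means losing any single zone leaves at least one member of every Mongo shard and the Redis pair running; a check like this could run whenever the placement changes.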
  24. Problems?
  25. The Schrute of the Problem
  26. New Service Questions
      • Can its host be completely homogeneous?
      • Can it accept downtime (and what should downtime look like)?
      • Does it fit into an existing service?
      • Does it require datacenter distribution?
  27. Summary
      • Solutions Require Evolution
      • Build, Use, and Integrate Tools
      • Abstraction
      • Homogeneous Distribution
      • Monitoring & Automation
  28. We’re (Ask about Food Coma Fridays)
  29. Questions are guaranteed in life. Answers aren’t.
      Eric Lubow | @elubow | elubow@simplereach.com
      Thank you.
