What We Learned About Scaling with Apache Storm: Pushing the Performance Envelope

Log management isn’t easy to do at scale. We designed Loggly Gen2 using the latest social-media-scale technologies—including ElasticSearch, Kafka from LinkedIn, and Apache Storm—as the backbone of ingestion processing for our multi-tenant, geo-distributed, and real-time log management system.

Since we launched Gen2, we've learned a lot more about these technologies. We regularly contribute back to the open source community, so we decided it was time to share an update on our experience with Storm and explain why we have dropped it from our platform, at least for now.

Read full blog post here: http://bit.ly/ScaleApacheStorm

Published in: Technology
1 Comment
  • Wouldn't you have to get some kind of acking even in your Kafka--Module--Kafka system? Unless you are using AsyncProducer without acks, in which case it is not totally reliable (and is similar to no acking in Storm). Plus at every stage you have to have Kafka storage (with replication) meaning you have to have more data nodes. Wouldn't Storm get similar performance with acks in Storm if we add more nodes and increase parallelism?

  1. What We Learned About Scaling with Apache Storm
     Manoj Chaudhary, CTO & VP of Engineering
     August 2014
     Log management as a service | Simplify Log Management
  2. What Loggly Does
     We're the world's most popular cloud-based log management service
     • More than 5,000 customers
     • Near real-time indexing of events
     Distributed architecture, built on AWS
     • Initial production services in 2011
     • Loggly Generation 2 released in September 2013
  3. Agenda for This Presentation
     • The unique challenges of log management
     • Overview of the Loggly event pipeline
     • Use of open source technologies
     • Lessons we have learned
     • Why we removed Storm
     • Conclusions: the Storm 411
  4. How Log Management Starts
     Everyone starts with…
     • A bunch of log files (syslog, application-specific)
     • On a bunch of machines
     Management consists of doing the simple stuff:
     • Rotate, compress, and delete files
     • The information is there, but specific events are awkward to find
     • Log retention policies evolve over time
  5. As Log Data Grows
     (Chart: self-inflicted pain rising with log volume, annotated with these quotes in order of growing volume)
     • "…hmmm, our logs are getting a bit bloated"
     • "…let's spend time managing our log capacity"
     • "…how can I make this someone else's problem!"
  6. Loggly Makes Log Management Much Easier
     Use existing logging infrastructure
     • Real-time syslog forwarding is built in
     • Application log file watching
     Store logs in the cloud
     • Accessible when there is a system failure
     • Cost-effective data retention
     Log messages in a machine-parsable format
     • JSON encoding when logging structured information
     • Key-value pairs
  7. Loggly's Evolution: Incremental Improvements and Scale
     Gen1
     • 2011-2013
     • AWS EC2 deployment
     • SolrCloud
     • ZeroMQ for the message queue
     Gen2
     • Launched September 2013
     • AWS deployment
     • Utilized ElasticSearch, Kafka, Storm
  8. The Challenges of Log Management at Scale
     • Big data
     • >750 billion events logged to date
     • Sustained bursts of 100,000+ events per second
     • Data space measured in petabytes
     • Need for high fault tolerance
     • Near real-time indexing requirements
     • Time-series index management
  9. About Apache Storm
     • Open-sourced by Twitter in September 2011
     • Now an Apache Software Foundation project, in Incubator status at the time of this talk
     A framework for stream processing:
     • Distributed, fault-tolerant computation
     • Fail-fast components
  10. Storm Logical View
      (Diagram: example topology in which a spout feeds several bolts)
      • Spouts emit the source stream
      • Bolts perform stream processing
  11. Storm Physical View
      (Diagram: Nimbus and a ZooKeeper ensemble coordinating several worker nodes, each running a Supervisor and its Workers)
      • Nimbus: master daemon; distributes code, assigns tasks, monitors failures
      • ZooKeeper: stores operational cluster state
      • Supervisor: daemon listening for work assigned to its node
      • Worker process: Java process executing a subset of a topology
      • Executor: Java thread spawned by a worker; runs tasks of the same component
      • Task: component (spout/bolt) instance; performs the actual data processing
  12. Log Ingestion and Processing Overview
      (Diagram: load balancing, Kafka, Storm event processing, Kafka Stage 2)
  13. Event Pipeline in Summary
      • Storm provides complex event processing
      • This is where we run much of our secret sauce
      • Stage 1 contains the raw events
      • Stage 2 contains processed events
      • Snapshot the last day of Stage 2 events to S3
  14. What Attracted Us to Storm
      • The spout-and-bolt model fit our network approach, in which logs may move from bolt to bolt sequentially or need to be consumed by several bolts in parallel
      • Guaranteed processing of the data stream
      • Allowed us to focus on writing the best possible code for each bolt
      • Dynamic deployment makes it easy to add or remove nodes to match actual load and requirements
      • Log data has peaks and valleys
  15. Loggly Gen2 at Launch: Where Storm Fits In
      (Diagram components: Kafka Stage 1, Identify Customer, Summary Statistics, Kafka Stage 2, S3 bucket)
  16. What We Learned
  17. Guaranteed Delivery Causes Big Performance Hit
      The guaranteed delivery feature is needed for log management resilience, but…
      (Diagram: the example topology with an ack on every spout-to-bolt and bolt-to-bolt edge)
      • 2.5x hit to performance!
  18. Our Performance Testing
      Preload Kafka broker
      • 50 GB of raw log data from the production cluster
      • Kafka partitions with 8 spouts and 20 mapper bolts
      • AWS backend instance with 4K provisioned IOPS
      Deploy Storm topology with Kafka spout
      Ack'ing per tuple turned off:
      • TOPOLOGY_ACKERS set to 0
      • Kafka disks red hot
      Ack'ing per tuple enabled:
      • Kafka disks not saturated
      • Bolts not running at high capacity
      (Chart: average events per second processed per cluster, axis 0 to 250,000, comparing "Without guaranteed delivery" and "With guaranteed delivery")
  19. Potential Workaround: Batch Logs
      • Ack a set of logs instead of individual events
      • PROBLEM: not consistent with Storm's semantics of a "message"
      It is not trivial to change the Kafka spout, as well as each bolt, to reinterpret a single message as a bunch of logs.
  20. Ultimate Solution: Build a Custom Queue for Module-to-Module Communication
      (Diagram: load balancing, Kafka, Loggly custom module, Kafka Stage 2)
  21. Benefits of the New Approach
      • High-performance, reliable communication that implements our workflow
      • Supports sustained rates of 100K+ events per second
      • Relatively easy to port
  22. Conclusions
      Storm 0.8.2 has plenty of potential, but…
      Log management's unique challenges drive the need for a custom framework
  23. Log Management Is Our Full-Time Job. It Shouldn't Be Yours.
      Try Loggly for Free! → http://bit.ly/ScaleApacheStorm
      Unless you want it to be (join us!): check out our career page at loggly.com/careers to see if there's a great match for your skills.
      About Us: Loggly is the world's most popular cloud-based log management solution, used by more than 5,000 happy customers to effortlessly spot problems in real time, easily pinpoint root causes, and resolve issues faster to ensure application success. Visit us at loggly.com or follow @loggly on Twitter.
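
The "Guaranteed Delivery Causes Big Performance Hit" slide attributes the 2.5x slowdown to per-tuple acking. Storm's documentation on guaranteeing message processing describes the bookkeeping involved. Each spout tuple gets an "ack val", which is the XOR of every tuple id created or acked in its tree. Every id is therefore XORed in twice, and the tree counts as fully processed when the value returns to zero. The sketch below is a simplified, single-process Python model of that idea. `AckerModel`, its methods, and the message counter are hypothetical names for illustration, not Storm's API. Real Storm uses random 64-bit ids and dedicated acker tasks.

```python
from typing import Dict, Iterable


class AckerModel:
    """Simplified model of per-tuple ack tracking (hypothetical, not Storm's API).

    For each spout tuple ("root") we keep one integer: the XOR of every tuple id
    emitted into that root's tree and every id acked. Each id is XORed in twice
    (once on emit, once on ack), so the value returns to 0 when every tuple in
    the tree has been acked. With random 64-bit ids an accidental 0 is very unlikely.
    """

    def __init__(self) -> None:
        self.pending: Dict[int, int] = {}  # root id -> running XOR ("ack val")
        self.messages = 0                  # ack-related messages sent to the acker

    def init(self, root: int, spout_tuple_id: int) -> None:
        """The spout registers a new tuple tree."""
        self.messages += 1
        self.pending[root] = spout_tuple_id

    def ack(self, root: int, input_id: int, child_ids: Iterable[int] = ()) -> bool:
        """A bolt acks its input and, in the same message, reports anchored children.

        Returns True once the whole tuple tree for `root` has been processed.
        """
        self.messages += 1
        val = self.pending[root] ^ input_id
        for child in child_ids:
            val ^= child
        if val == 0:
            del self.pending[root]
            return True
        self.pending[root] = val
        return False


# One log event: the spout emits 0xA1, the first bolt emits two children, and
# downstream bolts ack each child without emitting anything further.
acker = AckerModel()
acker.init(root=1, spout_tuple_id=0xA1)
first = acker.ack(1, 0xA1, child_ids=[0xB2, 0xC3])  # tree still open
second = acker.ack(1, 0xB2)                         # tree still open
third = acker.ack(1, 0xC3)                          # last leaf acked: tree complete
```

Even this tiny tree costs four messages to the acker for a single log event, on top of the data tuples themselves. That per-event bookkeeping is the overhead acking adds. How much it slows a given cluster depends on the deployment.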

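The "Potential Workaround: Batch Logs" slide proposes acking a set of logs instead of each event. The trade-off can be sketched without Storm. One ack per batch cuts acknowledgement traffic roughly by the batch size, but a failure anywhere in a batch forces the whole batch to be replayed. The function names, the retry limit, and the in-process retry loop below are assumptions for illustration only. They do not model the Kafka spout or Storm's message semantics, and that mismatch is exactly the problem the slide points out.

```python
from typing import Callable, List, Sequence, Tuple


def make_batches(events: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split events into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [list(events[i:i + batch_size]) for i in range(0, len(events), batch_size)]


def process_with_batch_acks(
    events: Sequence[str],
    batch_size: int,
    handle: Callable[[List[str]], None],
    max_attempts: int = 3,
) -> Tuple[int, int, int]:
    """Hypothetical batch-level acking: one ack per successfully handled batch.

    Returns (acks_sent, replays, events_reprocessed). A failing batch is retried
    whole, so every event in it is processed again, not just the one that failed.
    """
    acks = replays = reprocessed = 0
    for batch in make_batches(events, batch_size):
        for attempt in range(1, max_attempts + 1):
            try:
                handle(batch)
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up; a real system would fail the batch upstream
                replays += 1
                reprocessed += len(batch)
                continue
            acks += 1  # a single ack covers the whole batch
            break
    return acks, replays, reprocessed


# Ten events in batches of four; the batch containing "e5" fails once.
failed_once = set()


def flaky_handler(batch: List[str]) -> None:
    if "e5" in batch and "e5" not in failed_once:
        failed_once.add("e5")
        raise RuntimeError("transient failure")


acks, replays, reprocessed = process_with_batch_acks(
    [f"e{i}" for i in range(10)], 4, flaky_handler
)
```

Here ten events need three acks instead of ten. The single transient failure, however, causes four events to be processed again.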
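
The "Ultimate Solution" and "Benefits of the New Approach" slides replace Storm with custom modules that talk through queues, but they do not say how a module tracks its progress. One common pattern is assumed here, not taken from Loggly's design. A module reads a batch from its input log, writes its results to the output log, and only then commits its input offset. A crash between the write and the commit makes the batch run again (at-least-once delivery), which is the kind of acknowledgement the reader comment on this page asks about. `InMemoryLog` and `Module` are hypothetical stand-ins for Kafka partitions and a pipeline stage; no Kafka client is involved.

```python
from typing import Callable, List


class InMemoryLog:
    """Append-only list standing in for one Kafka partition (hypothetical)."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def append(self, record: str) -> None:
        self.records.append(record)

    def read(self, offset: int, max_records: int) -> List[str]:
        return self.records[offset:offset + max_records]


class Module:
    """One pipeline stage: consume from src, transform, produce to dst."""

    def __init__(self, src: InMemoryLog, dst: InMemoryLog,
                 transform: Callable[[str], str], batch_size: int) -> None:
        self.src, self.dst = src, dst
        self.transform = transform
        self.batch_size = batch_size
        self.committed = 0  # next input offset to read (would be durable in a real system)

    def step(self, crash_before_commit: bool = False) -> int:
        """Process one batch and return its size (0 when caught up)."""
        batch = self.src.read(self.committed, self.batch_size)
        for record in batch:
            self.dst.append(self.transform(record))
        if crash_before_commit:
            raise RuntimeError("simulated crash before commit")
        self.committed += len(batch)  # one commit per batch, not one ack per event
        return len(batch)


stage1, stage2 = InMemoryLog(), InMemoryLog()
for line in ["a", "b", "c", "d", "e"]:
    stage1.append(line)

module = Module(stage1, stage2, str.upper, batch_size=2)
try:
    module.step(crash_before_commit=True)  # writes "A", "B" but never commits
except RuntimeError:
    pass  # on restart, reading resumes from the last committed offset
while module.step():
    pass
```

The duplicated "A" and "B" in the output are the cost of at-least-once delivery, so downstream consumers must tolerate or deduplicate repeats. Note also that each stage boundary adds a replicated Kafka hop. The reader comment points out the same trade-off.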