Scaling Up WSO2 BAM for Billions of Requests and Terabytes of Data

  1. 1. Scaling Up WSO2 BAM for Billions of Requests and Terabytes of Data – Buddhika Chamith, Software Engineer, WSO2 BAM
  2. 2. Business Activity Monitoring “The aggregation, analysis, and presentation of real-time information about activities inside organizations and involving customers and partners.” - Gartner
  3. 3. Aggregation ● Capturing data ● Data storage ● What data to capture?
  4. 4. Analysis ● Data operations ● Building KPIs ● Operate on large amounts of historic data or new data ● Building BI
  5. 5. Presentation ● Visualizing KPIs/BI ● Custom Dashboards ● Visualization tools ● Not just dashboards!
  6. 6. Need for Scalability
  7. 7. BAM 2.x - Component Architecture
  8. 8. Data Agents ● Push data to BAM ● Collecting ● Service data ● Mediation data ● Logs etc. ● Various interceptors used ● Axis2 Handlers ● Synapse Mediators ● Tomcat Valves ● Log4j Appenders
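
On the agent side, each interceptor is essentially a thin hook that hands events off to a publisher. The sketch below, in the spirit of the Log4j appender agent, shows what such a hook looks like; the publishAsync method is a hypothetical placeholder for the real data publisher client, not a WSO2 API.

```java
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

// Illustrative only: a Log4j 1.x appender that forwards log events to BAM
// instead of (or in addition to) writing them locally.
public class BamLogAppender extends AppenderSkeleton {

    @Override
    protected void append(LoggingEvent event) {
        // Hand the log record to the data agent without blocking the application thread.
        publishAsync(event.getLoggerName(),
                     event.getLevel().toString(),
                     event.getRenderedMessage(),
                     event.getTimeStamp());
    }

    /** Hypothetical placeholder: the real agent would batch this and send it over Thrift. */
    private void publishAsync(String logger, String level, String message, long timestamp) {
        System.out.printf("%d [%s] %s - %s%n", timestamp, level, logger, message);
    }

    @Override
    public void close() { }

    @Override
    public boolean requiresLayout() { return false; }
}
```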
  9. 9. Performance Considerations ● Should be asynchronous ● Event batching ● SOAP? ● Apache Thrift (Binary protocol)
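
A minimal sketch of the asynchronous, batching behaviour described above (not the actual BAM agent code): callers enqueue events without blocking, and a background thread flushes them in batches so that many events share a single network round trip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BatchingPublisher {
    private static final int BATCH_SIZE = 200;
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10000);

    public BatchingPublisher() {
        Thread worker = new Thread(this::drainLoop, "bam-publisher");
        worker.setDaemon(true);
        worker.start();
    }

    /** Non-blocking from the caller's point of view; drops events if the buffer is full. */
    public void publish(String event) {
        queue.offer(event);
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        while (true) {
            try {
                String first = queue.poll(1, TimeUnit.SECONDS);
                if (first == null) continue;
                batch.add(first);
                queue.drainTo(batch, BATCH_SIZE - 1);
                sendBatch(batch);   // one network call for the whole batch
                batch.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void sendBatch(List<String> batch) {
        // In the real agent this would be a single Thrift call to the data receiver.
        System.out.println("sending " + batch.size() + " events");
    }
}
```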
  10. 10. Apache Thrift ● An RPC framework ● With a pluggable architecture for mixing different transports with different protocols ● Has multiple language bindings (Java, C++, Python, Perl, C#, etc.) ● We mainly use the Java binding
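
For context, this is what Thrift's pluggable layering looks like from a client: pick a transport, wrap it in a protocol, and hand the stack to a generated service stub. The DataReceiverService stub below is hypothetical (it would come from a Thrift IDL), so that line is left commented out.

```java
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ThriftClientSketch {
    public static void main(String[] args) throws Exception {
        // Transport and protocol can be mixed and matched independently.
        TTransport transport = new TFramedTransport(new TSocket("bam-host", 7611));
        TProtocol protocol = new TBinaryProtocol(transport);   // swap in another protocol if desired
        transport.open();
        // DataReceiverService.Client client = new DataReceiverService.Client(protocol);
        // client.publish(eventBatch);
        transport.close();
    }
}
```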
  11. 11. Not Just Performance... ● Load balancing ● Failover ● All available within a Java SDK library. ● You can use it too.
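
The failover behaviour can be pictured roughly as follows. This is not the WSO2 agent API, just a sketch of the idea: keep a list of receiver URLs and rotate to the next one when a publish fails.

```java
import java.util.List;

public class FailoverSender {
    private final List<String> receiverUrls;
    private int current = 0;

    public FailoverSender(List<String> receiverUrls) {
        this.receiverUrls = receiverUrls;
    }

    public void send(String event) {
        for (int attempt = 0; attempt < receiverUrls.size(); attempt++) {
            String url = receiverUrls.get(current);
            try {
                sendTo(url, event);   // would be a Thrift publish in practice
                return;
            } catch (Exception e) {
                // Fail over to the next configured receiver.
                current = (current + 1) % receiverUrls.size();
            }
        }
        throw new IllegalStateException("all receivers unreachable");
    }

    private void sendTo(String url, String event) {
        System.out.println("sent to " + url + ": " + event);
    }
}
```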
  12. 12. Data Receiver ● Captures and transfers data to subscribed sinks. ● Not just the database. ● Can be clustered. ● Load balancing is handled on the client side.
  13. 13. Data Bridge
  14. 14. Data Storage ● Apache Cassandra ● A NoSQL column family implementation ● Scalable, highly available, with no single point of failure ● Very high write throughput and good read throughput ● Tunable consistency with data replication
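
BAM writes to Cassandra through its data bridge, so the snippet below is only a standalone illustration of tunable consistency using the DataStax Java driver against a hypothetical keyspace and table: the client chooses, per statement, how many replicas must acknowledge a write.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class CassandraWriteSketch {
    public static void main(String[] args) {
        // Keyspace and table names here are made up for the example.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("bam_event_store")) {
            SimpleStatement insert = new SimpleStatement(
                "INSERT INTO events (stream_id, event_time, payload) VALUES (?, ?, ?)",
                "org.wso2.bam.service.stats", System.currentTimeMillis(), "{}");
            // QUORUM: the write is acknowledged by a majority of replicas.
            insert.setConsistencyLevel(ConsistencyLevel.QUORUM);
            session.execute(insert);
        }
    }
}
```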
  15. 15. Deployment – Storage Cluster
  16. 16. Receiver Cluster
  17. 17. Results With a single receiver node allocated a 2 GB heap on a quad-core machine running RHEL.
  18. 18. Disk Growth
  19. 19. Analyzer Engine ● Idea: distribute processing to multiple nodes to run in parallel ● Obvious choice: Hadoop ● Uses the MapReduce programming paradigm
  20. 20. Map Reduce ● Process multiple data chunks in parallel at mappers. ● Aggregate map outputs that share the same key at reducers and store the result. ● Let's think of a useful example (see the sketch below).
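
For instance, counting requests per service over raw event records has exactly this shape: mappers emit (serviceName, 1) pairs and reducers sum them. A sketch in Hadoop's Java API follows; the CSV field layout is assumed for illustration.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class RequestCount {

    // Mapper: one (serviceName, 1) pair per request record.
    public static class RequestMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            String serviceName = fields[0];            // assumed CSV layout
            context.write(new Text(serviceName), ONE);
        }
    }

    // Reducer: sum the counts for each service.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text serviceName, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable c : counts) {
                total += c.get();
            }
            context.write(serviceName, new IntWritable(total));
        }
    }
}
```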
  21. 21. Hadoop Components ● Job Tracker ● Name Node ● Secondary Name Node ● Task Trackers ● Data Nodes
  22. 22. It's Cool, But... ● Do we need to have a Hadoop cluster in order to try out BAM? ● Are we supposed to code Hadoop jobs to get BAM to summarize something? ● Answers: 1) No. 2) No. OK, maybe very rarely at best. (Image courtesy: http://goo.gl/QEnpN)
  23. 23. Apache Hive ● You write SQL (almost). ● Let Hive convert it to MapReduce jobs. ● So Hive does two things: ● Provides an abstraction over Hadoop MapReduce ● Submits the analytic jobs to Hadoop ● Hive may spawn a Hadoop JVM locally or delegate to a Hadoop cluster
  24. 24. A Typical Hive Script
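
The script from this slide isn't reproduced here, but a typical summarization has this flavour: define a table over the raw events, then periodically aggregate into a summary table. The sketch below runs an illustrative version of such a script over the Hive JDBC driver; all table and column names are made up, and in BAM the raw-event table would normally map onto the Cassandra column families described earlier rather than plain HDFS files.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveScriptSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {

            // Raw events as an external table (illustrative layout).
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS service_requests "
                       + "(service_name STRING, response_time BIGINT, event_time BIGINT) "
                       + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                       + "LOCATION '/bam/events/service_requests'");

            // Summary table populated by the analytic job.
            stmt.execute("CREATE TABLE IF NOT EXISTS service_stats_hourly "
                       + "(service_name STRING, time_bucket STRING, request_count BIGINT, avg_response_time DOUBLE)");

            // Hourly rollup: request count and average response time per service.
            stmt.execute("INSERT OVERWRITE TABLE service_stats_hourly "
                       + "SELECT service_name, "
                       + "from_unixtime(CAST(event_time/1000 AS BIGINT), 'yyyy-MM-dd HH'), "
                       + "COUNT(*), AVG(response_time) "
                       + "FROM service_requests "
                       + "GROUP BY service_name, from_unixtime(CAST(event_time/1000 AS BIGINT), 'yyyy-MM-dd HH')");
        }
    }
}
```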
  25. 25. Results
  26. 26. Task Framework ● Runs Hive scripts periodically ● Schedules can be specified as cron expressions or predefined templates ● Handles task failover in case of node failure ● Uses ZooKeeper for coordination
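
BAM's task framework handles the scheduling and failover itself; purely as an illustration of cron-style scheduling, here is a minimal standalone Quartz sketch (the job class and its body are placeholders):

```java
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class ScriptSchedulerSketch {

    /** Placeholder job body; this is where a Hive script would be submitted. */
    public static class HiveScriptJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            System.out.println("running analytics script at " + context.getFireTime());
        }
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(HiveScriptJob.class).withIdentity("hourlyStats").build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0/15 * * * ?"))  // every 15 minutes
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}
```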
  27. 27. ZooKeeper ● Can be run separately or embedded within BAM
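
The coordination needed here boils down to the standard leader-election recipe. The sketch below is illustrative, not BAM's internal code: each analyzer node creates an ephemeral sequential znode, and whichever node owns the lowest sequence number takes charge of scheduling; if it dies, its znode vanishes and another node takes over.

```java
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });

        // Assumes the /bam/analyzers parent path already exists.
        // Ephemeral node disappears automatically if this analyzer node dies.
        String myNode = zk.create("/bam/analyzers/node-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        List<String> children = zk.getChildren("/bam/analyzers", false);
        Collections.sort(children);
        boolean leader = myNode.endsWith(children.get(0));
        System.out.println(leader ? "I schedule the tasks" : "standing by for failover");

        zk.close();
    }
}
```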
  28. 28. Analyzer Cluster
  29. 29. Dashboard ● Making the dashboard scale.
  30. 30. Deployment Patterns Single Node
  31. 31. High Availability
  32. 32. Fully Distributed Setup
  33. 33. Summary ● BAM ● Need for scalability ● Scaling BAM components ● Results ● BAM deployment patterns
