1.
Scaling Up WSO2 BAM for Billions of
Requests and Terabytes of Data
Buddhika Chamith
Software Engineer – WSO2 BAM
2.
Business Activity Monitoring
“The aggregation, analysis, and
presentation of real-time information
about activities inside organizations
and involving customers and partners.”
- Gartner
3.
Aggregation
● Capturing data
● Data storage
● What data to capture?
4.
Analysis
● Data operations
● Building KPIs
● Operate on large amounts of historic data or new data
● Building BI
5.
Presentation
● Visualizing KPIs/BI
● Custom Dashboards
● Visualization tools
● Not just dashboards!
8.
Data Agents
● Push data to BAM
● Collecting
  ● Service data
  ● Mediation data
  ● Logs etc.
● Various interceptors used
  ● Axis2 Handlers
  ● Synapse Mediators
  ● Tomcat Valves
  ● Log4j Appenders (see the appender sketch below)
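As an illustration of the interceptor idea, here is a minimal Log4j 1.x appender sketch. It is not the actual BAM agent code: it simply buffers each log event in a bounded queue so that a separate publisher thread (such as the batching sketch on the next slide) can ship it to BAM without ever blocking the application thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

// Minimal Log4j 1.x appender sketch: intercepts log events and buffers them
// for a separate publisher thread that forwards them to BAM.
public class BamLogAppender extends AppenderSkeleton {

    // Bounded buffer so a slow or unavailable receiver cannot exhaust memory.
    private static final BlockingQueue<String> BUFFER = new LinkedBlockingQueue<>(10000);

    public static BlockingQueue<String> buffer() {
        return BUFFER;
    }

    @Override
    protected void append(LoggingEvent event) {
        // Never block the application thread; drop the event if the buffer is full.
        BUFFER.offer(event.getLoggerName() + " " + event.getLevel() + " "
                + event.getRenderedMessage());
    }

    @Override
    public void close() {
        // Nothing to release here; the publisher thread owns the network connection.
    }

    @Override
    public boolean requiresLayout() {
        return false;
    }
}
```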
9.
Performance Considerations
● Publishing should be asynchronous
● Batch events before sending (sketch below)
● SOAP? Too heavyweight at high event rates
● Apache Thrift (binary protocol) instead
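The sketch below shows the asynchronous batching pattern in plain Java: application threads only enqueue events, while one background thread drains the queue and sends whole batches in a single network round trip. The sendBatch() body is a placeholder; the real BAM agent would serialize the batch and send it over Thrift.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of asynchronous, batched event publishing.
public class BatchingPublisher implements Runnable {

    private static final int MAX_BATCH = 200;
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10000);

    // Called from application threads; never blocks on the network.
    public void publish(String event) {
        queue.offer(event);
    }

    @Override
    public void run() {
        List<String> batch = new ArrayList<>(MAX_BATCH);
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Wait for at least one event, then grab whatever else is queued.
                String first = queue.poll(1, TimeUnit.SECONDS);
                if (first == null) {
                    continue;
                }
                batch.add(first);
                queue.drainTo(batch, MAX_BATCH - 1);
                sendBatch(batch); // one network round trip for the whole batch
                batch.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void sendBatch(List<String> batch) {
        // Placeholder: serialize and send the batch to the BAM receiver.
        System.out.println("Sending " + batch.size() + " events");
    }
}
```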
10.
Apache Thrift
● An RPC framework with a pluggable architecture for mixing different transports with different protocols
● Has multiple language bindings (Java, C++, Python, Perl, C#, etc.)
● We mainly use the Java binding (sketch below)
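The following sketch shows Thrift's pluggable design in the Java binding: a transport (a framed TCP socket) is composed with a wire protocol (binary). The host, port, and the EventService client named in the comment are illustrative only; a real client class is generated from a .thrift service definition.

```java
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

// Composing a Thrift transport with a protocol: the two are independent and
// can be mixed and matched.
public class ThriftClientSketch {
    public static void main(String[] args) throws TTransportException {
        TTransport transport = new TFramedTransport(new TSocket("bam.example.com", 7611));
        TProtocol protocol = new TBinaryProtocol(transport);
        transport.open();

        // A real client would be generated from a .thrift service definition,
        // e.g. EventService.Client client = new EventService.Client(protocol);
        // client.publish(...);

        transport.close();
    }
}
```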
11.
Not Just Performance...
● Load balancing
● Failover
● All available within a Java SDK library.
● You can use it too.
12.
Data Receiver
● Captures data and transfers it to subscribed sinks.
● Not just the database.
● Can be clustered.
● Load balancing is handled from the client side (see the round-robin sketch below).
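A minimal sketch of client-side load balancing with failover across a clustered set of receivers: the agent round-robins over the configured receiver URLs and skips to the next one when a send fails. The URLs and the send() call are placeholders, not the actual SDK API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Client-side round-robin selection over a cluster of data receivers.
public class ReceiverSelector {

    private final List<String> receiverUrls;
    private final AtomicInteger next = new AtomicInteger(0);

    public ReceiverSelector(List<String> receiverUrls) {
        this.receiverUrls = receiverUrls;
    }

    // Pick the next receiver in round-robin order.
    public String nextReceiver() {
        int index = (next.getAndIncrement() & 0x7fffffff) % receiverUrls.size();
        return receiverUrls.get(index);
    }

    public void sendWithFailover(String event) {
        // Try each receiver at most once before giving up.
        for (int attempt = 0; attempt < receiverUrls.size(); attempt++) {
            String url = nextReceiver();
            try {
                send(url, event);
                return;
            } catch (RuntimeException e) {
                // Receiver unreachable: fail over to the next one.
            }
        }
    }

    private void send(String url, String event) {
        // Placeholder for the actual Thrift publish call.
        System.out.println("Publishing to " + url + ": " + event);
    }
}
```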
14.
Data Storage
● Apache Cassandra
● NoSQL column family implementation
● Scalable, HA, and no SPOF
● Very high write throughput and good read throughput
● Tunable consistency with data replication (sketch below)
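To make "tunable consistency" concrete, here is a small illustration using the DataStax Java driver and CQL. This is only an example of the Cassandra feature, not how BAM itself writes data (BAM uses its own data access layer and column-family schema); the keyspace, table, and column names are made up.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

// Writing one event row with an explicit consistency level.
public class CassandraWriteSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("event_store")) {

            SimpleStatement insert = new SimpleStatement(
                    "INSERT INTO service_events (event_id, payload) VALUES (?, ?)",
                    "evt-1", "{\"service\":\"OrderService\",\"latencyMs\":42}");

            // QUORUM: the write is acknowledged by a majority of replicas,
            // trading a little latency for stronger consistency.
            insert.setConsistencyLevel(ConsistencyLevel.QUORUM);
            session.execute(insert);
        }
    }
}
```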
19.
Analyzer Engine
● Idea: distribute processing to multiple nodes to run in parallel
● Obvious choice: Hadoop
● Uses the MapReduce programming paradigm
20.
Map Reduce
● Process multiple data chunks in parallel at the Mappers
● Aggregate map outputs with the same key at the Reducers and store the result
● Let's look at a useful example below
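A useful example in this setting: count the number of requests per service. Mappers emit (serviceName, 1) for each request record in their chunk, and reducers sum the counts per service. The sketch below uses the standard org.apache.hadoop.mapreduce API and assumes one "serviceName,latency,..." line per request; it is an illustration, not a BAM-shipped job.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Count requests per service with MapReduce.
public class RequestCountJob {

    public static class RequestMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text service = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // First column of each input line is assumed to be the service name.
            service.set(value.toString().split(",")[0]);
            context.write(service, ONE);
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) {
                total += c.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "request-count");
        job.setJarByClass(RequestCountJob.class);
        job.setMapperClass(RequestMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```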
21.
Hadoop Components
● Job Tracker
● Name Node
● Secondary Name Node
● Task Trackers
● Data Nodes
22.
It's Cool, But...
● Do we need to have a Hadoop cluster in order to try out BAM?
● Are we supposed to code Hadoop jobs to get BAM to summarize something?
● Answers:
  1) No
  2) No. OK, maybe very rarely at best.
(Image courtesy: http://goo.gl/QEnpN)
23.
Apache Hive
● You write SQL (almost)
● Let Hive convert it to MapReduce jobs
● So Hive does two things:
  ● Provides an abstraction over Hadoop MapReduce
  ● Submits the analytic jobs to Hadoop
● Hive may spawn a Hadoop JVM locally or delegate to a Hadoop cluster (see the sketch below)
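To show the "you write SQL" idea, here is a sketch that submits a typical summarization query over JDBC to a HiveServer2 endpoint; Hive compiles the statement into MapReduce jobs behind the scenes. The connection URL, credentials, and table names are illustrative, and BAM itself ships its own mechanism for running Hive scripts rather than plain JDBC.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Submitting a Hive query: roll raw per-request rows up into per-service
// daily counts. Hive turns the SQL into MapReduce jobs.
public class HiveSummarySketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            stmt.execute(
                "INSERT OVERWRITE TABLE service_daily_summary " +
                "SELECT service_name, to_date(event_time), count(*) " +
                "FROM service_events " +
                "GROUP BY service_name, to_date(event_time)");
        }
    }
}
```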
26.
Task Framework
● Runs Hive scripts periodically
● Schedules can be specified as cron expressions or predefined templates (sketch below)
● Handles task failover in case of node failure
● Uses ZooKeeper for coordination
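As a generic illustration of cron-style scheduling (using Quartz, not the actual BAM task framework API), the sketch below triggers a job every 15 minutes; in BAM the job body would submit the configured Hive script, and the task framework additionally coordinates through ZooKeeper so that only one node in the cluster runs the task.

```java
import static org.quartz.CronScheduleBuilder.cronSchedule;
import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;

import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.impl.StdSchedulerFactory;

// Cron-scheduled task that stands in for a periodic Hive summarization run.
public class HiveScriptTask implements Job {

    @Override
    public void execute(JobExecutionContext context) {
        // Placeholder: submit the configured Hive script here.
        System.out.println("Running summarization script");
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = newJob(HiveScriptTask.class).withIdentity("summarize").build();
        Trigger trigger = newTrigger()
                .withSchedule(cronSchedule("0 0/15 * * * ?")) // every 15 minutes
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}
```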
27.
ZooKeeper
● Can be run separately or embedded within BAM