In today’s connected world, organizations have access to an enormous amount of data. They often don’t know what it means or how to use it, in terms of hindsight, oversight, insight and foresight, to gain competitive advantage in the market. Use cases ranging from simple system monitoring to complex fraud analysis demand this.
The WSO2 Data Analytics platform lets you collect data, explore it through batch, real-time, interactive and predictive processing technologies, and communicate your results. In this talk, we will discuss the WSO2 Data Analytics platform and how it brings together all analytics technologies into a single platform and user experience.
WSO2Con ASIA 2016: An Introduction to the WSO2 Analytics Platform
1. An Introduction to the
WSO2 Analytics Platform
Srinath Perera
VP Research WSO2, Apache Member
(@srinath_perera)
srinath@wso2.com
2. A Story line
• Paul is a frequent user of the London
transport bus on route 29.
• Gets to work by bus, walking, or cycling
• What if Paul could:
• Know whether buses have been delayed recently?
• Get information about delays of the next bus, based
on current traffic and historical data?
• Subscribe to a Twitter feed that will alert him when a bus
is predicted to be late and get a recommendation on the
fastest route (bus/bike/foot)?
9. Collect Data
§ One Sensor API to publish
events
- REST, Thrift, Java, JMS,
Kafka
- Java clients, JavaScript clients*
§ First you define streams
(think of a stream as an infinite table in
a SQL DB)
§ Then publish events via
Sensor API
11. Collecting Data: Example
§ Java example: create and send events
§ Events are sent asynchronously
§ See client given in http://goo.gl/vIJzqc for more info
// Initialize Stream
Agent agent = new Agent(agentConfiguration);
publisher = new AsyncDataPublisher("tcp://hostname:7612", .. );
// Define Stream
StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
definition.addPayloadData("sid", STRING);
...
publisher.addStreamDefinition(definition);
...
// Send events
Event event = new Event();
event.setPayloadData(eventData);
publisher.publish(STREAM_NAME, VERSION, event);
12. Data Collection
Examples
• Collect data from inbuilt agents in
WSO2 products, Tomcat etc.
• Collecting your log data via Logstash
• Collecting JVM and JMX stats via agent
• Ingesting data from message queues
such as JMS or Kafka
• Pulling data from an RSS feed, or
scraping a web page
• Write a custom agent to collect data
from your system and push it to DAS
Photo credit: http://www.torange.us/ (CC license)
13. Analysis: Batch Analytics
• Batch analytics reads data from disk (or some other
storage) and processes it record by record
• “MapReduce” is the most widely used technology for batch
analytics
– Apache Hadoop
– Apache Spark (30X faster than Hadoop and much more flexible)
• Analytics (min, max, average, correlation, histograms; may
join or group data in many ways)
• Key Performance Indicators (KPIs)
– E.g. profit per square foot for retail
• Presented as a Dashboard
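The record-by-record style of batch analytics can be sketched in plain Java, without Hadoop or Spark. This is only an illustration: the `BatchStats` class and the sample speed readings are hypothetical and are not part of the WSO2 platform.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;

class BatchStats {
    // One pass over the stored records, accumulating min, max and average.
    static DoubleSummaryStatistics summarize(List<Double> records) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (double record : records) {
            stats.accept(record);
        }
        return stats;
    }

    public static void main(String[] args) {
        // Hypothetical bus speed readings, standing in for data read from storage.
        List<Double> speeds = Arrays.asList(12.0, 30.0, 18.0);
        DoubleSummaryStatistics s = summarize(speeds);
        System.out.println("min=" + s.getMin() + " max=" + s.getMax()
                + " avg=" + s.getAverage());
    }
}
```

A real batch job would distribute this pass over many machines, but each worker does essentially the same accumulation.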
14. SQL like Queries: Spark SQL
§ Since many people understand SQL, Hive made
large-scale (Big Data) processing
accessible to many
§ Expressive, short, and sweet.
§ Defines core operations that cover 90%
of problems
§ Lets experts dig in when they like! (via
User Defined Functions)
insert overwrite table BusSpeed
select getHour(ts) as hour, avg(v) as avgV, busID
from BusStream group by busID, getHour(ts);
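To show what the query above computes, here is a rough plain-Java equivalent of the group-by and average. The `Reading` type, the `"busID@hour"` key format, and the behavior of `getHour` (hour of day in UTC from epoch milliseconds) are assumptions for illustration; the slide does not define them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BusSpeedQuery {
    // Hypothetical row type mirroring the BusStream columns used in the query.
    static final class Reading {
        final String busID;
        final long ts;   // epoch milliseconds (assumed)
        final double v;  // speed
        Reading(String busID, long ts, double v) {
            this.busID = busID;
            this.ts = ts;
            this.v = v;
        }
    }

    // Assumed behavior of the getHour UDF: hour of day in UTC.
    static int getHour(long tsMillis) {
        return (int) ((tsMillis / 3_600_000L) % 24);
    }

    // Group by (busID, hour) and average v; keys look like "29@0".
    static Map<String, Double> avgSpeedByBusAndHour(List<Reading> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Reading r) -> r.busID + "@" + getHour(r.ts),
                TreeMap::new,
                Collectors.averagingDouble((Reading r) -> r.v)));
    }

    public static void main(String[] args) {
        List<Reading> rows = Arrays.asList(
                new Reading("29", 0L, 10.0),
                new Reading("29", 1_800_000L, 20.0),
                new Reading("29", 3_600_000L, 30.0));
        System.out.println(avgSpeedByBusAndHour(rows));
    }
}
```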
15. Use Case: API Usage
• Looking at different API calls by countries
• Designed to draw attention to what APIs are used and where
16. Value of Some Insights Degrades Fast!
§ For some use cases (e.g. stock
markets, traffic, surveillance, patient
monitoring) the value of insights
degrades very quickly with time.
§ We need technology that can produce
outputs fast
§ Static queries that need very fast output
(alerts, real-time control)
§ Dynamic and interactive queries (data
exploration)
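A static query with fast output can be pictured as a sliding time window that is updated on every event and raises an alert as soon as a threshold is crossed. The sketch below is plain Java, not the WSO2 CEP query language; the class name, the window length and the threshold are hypothetical, and it assumes events arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SlidingWindowAlert {
    private static final class Sample {
        final long ts;
        final double value;
        Sample(long ts, double value) {
            this.ts = ts;
            this.value = value;
        }
    }

    private final Deque<Sample> window = new ArrayDeque<>();
    private final long windowMillis;
    private final double threshold;
    private double sum = 0.0;

    SlidingWindowAlert(long windowMillis, double threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    // Feed one event; returns true if the windowed average exceeds the threshold.
    boolean onEvent(long ts, double value) {
        window.addLast(new Sample(ts, value));
        sum += value;
        // Evict events that have fallen out of the time window.
        while (window.peekFirst().ts <= ts - windowMillis) {
            sum -= window.removeFirst().value;
        }
        return sum / window.size() > threshold;
    }

    public static void main(String[] args) {
        // Hypothetical: alert when the average delay over 10 seconds exceeds 5 minutes.
        SlidingWindowAlert alert = new SlidingWindowAlert(10_000L, 5.0);
        System.out.println(alert.onEvent(0L, 2.0));
        System.out.println(alert.onEvent(5_000L, 4.0));
        System.out.println(alert.onEvent(12_000L, 10.0));
    }
}
```

Because the running sum is updated incrementally, each event costs a small constant amount of work, which is what makes very fast output feasible.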
18. People Tracking via BLE
• Track people through BLE via
triangulation
• Higher level logic via Complex
Event Processing
• Traffic Monitoring
• Smart retail
• Airport management
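One way to locate a person from BLE beacons is sketched below. It uses 2D trilateration from three beacon distances, which is an assumption about what the slide means by triangulation; the slide does not give the actual method. The beacon coordinates and distances are hypothetical, and real distance estimates from signal strength are noisy.

```java
class Trilateration {
    // Returns {x, y} for a point at distances r1, r2, r3 from three known beacons.
    // Subtracting the circle equations pairwise gives two linear equations in x and y.
    static double[] locate(double x1, double y1, double r1,
                           double x2, double y2, double r2,
                           double x3, double y3, double r3) {
        double a = 2 * (x2 - x1);
        double b = 2 * (y2 - y1);
        double c = r1 * r1 - r2 * r2 - x1 * x1 + x2 * x2 - y1 * y1 + y2 * y2;
        double d = 2 * (x3 - x2);
        double e = 2 * (y3 - y2);
        double f = r2 * r2 - r3 * r3 - x2 * x2 + x3 * x3 - y2 * y2 + y3 * y3;
        double det = a * e - b * d;
        if (Math.abs(det) < 1e-12) {
            throw new IllegalArgumentException("beacons must not be collinear");
        }
        // Cramer's rule for a*x + b*y = c and d*x + e*y = f.
        return new double[] { (c * e - b * f) / det, (a * f - c * d) / det };
    }

    public static void main(String[] args) {
        // Hypothetical beacons at (0,0), (10,0), (0,10); true position is (3,4).
        double[] p = locate(0, 0, 5.0,
                            10, 0, Math.sqrt(65.0),
                            0, 10, Math.sqrt(45.0));
        System.out.println(p[0] + ", " + p[1]);
    }
}
```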
19. Case Study: Realtime Soccer Analysis
Watch at:
https://www.youtube.com/watch?v=nRI6buQ0NOM
20. Scaling CEP Queries on top of Storm
▪ Accepts CEP queries with hints about how to partition streams
▪ Partition streams, build an Apache Storm topology running CEP nodes as Storm
Spouts, and run it. See http://goo.gl/pP3kdX for more info.
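The partitioning idea can be sketched as follows: events are routed by a key so that every event for the same key reaches the same CEP node. This is plain Java and does not use the Storm API; the class name, the choice of bus ID as the key, and the hash-based routing are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StreamPartitioner {
    // Maps a partition key to one of `partitions` workers; the same key always
    // maps to the same worker.
    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Splits a list of event keys into per-worker lists, preserving order.
    static List<List<String>> partition(List<String> keys, int partitions) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            out.add(new ArrayList<>());
        }
        for (String key : keys) {
            out.get(partitionFor(key, partitions)).add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical event keys (bus IDs) spread over 4 workers.
        System.out.println(partition(Arrays.asList("bus29", "bus73", "bus29"), 4));
    }
}
```

Keeping all events for one key on one node lets per-key windows and patterns be evaluated locally, without coordination between nodes.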
21. Interactive Analytics
§ The best way to explore data is by
asking ad-hoc questions
§ Interactive Analytics (Search)
lets you query the system and
receive fast results (<10s)
§ Shows data in context (e.g. by
grouping events from the
same transaction together)
§ Built using Lucene-based
indexes.
SparkSQL> SELECT * FROM TWITTER_DATA
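The reason index-based search answers quickly is that it looks up terms in a prebuilt inverted index instead of scanning every record. The sketch below shows that idea in plain Java; it is not the Lucene API, and the class name and sample events are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyInvertedIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Adds a document and returns its id.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Looks up a single term; cost depends on the matches, not on the corpus size.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Bus 29 delayed at Camden");
        index.add("Route 29 on time");
        index.add("Bus 73 delayed");
        System.out.println(index.search("delayed"));
    }
}
```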
22. Predictive Analytics
§ Can you “Write a program to drive a Car?”
§ Machine learning
§ Takes in a lot of examples and builds a program
that matches those examples
§ We call that program a “model”
§ Lots of tools
- R (statistical language)
- scikit-learn (Python)
- Apache Spark’s MLlib and Apache Mahout
(Java)
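As a small illustration of building a "model" from examples, the sketch below fits a straight line to example points by ordinary least squares and then uses it to predict. It is plain Java and does not use any of the tools listed above; the class name and the data are hypothetical.

```java
class LeastSquaresModel {
    final double slope;
    final double intercept;

    private LeastSquaresModel(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }

    // "Training": find the line y = slope * x + intercept that best matches the examples.
    static LeastSquaresModel fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) examples");
        }
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        return new LeastSquaresModel(slope, meanY - slope * meanX);
    }

    // "Prediction": apply the learned model to a new input.
    double predict(double x) {
        return slope * x + intercept;
    }

    public static void main(String[] args) {
        // Hypothetical examples that happen to lie on y = 2x + 1.
        LeastSquaresModel model = fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println(model.predict(5));
    }
}
```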
23. Predictive Analytics in DAS
• Building models
– With the WSO2 Machine
Learner product via a
wizard (powered by
MLlib)
– Build models using R and
export them as PMML
• Built models can be used
with both WSO2 CEP
and ESB
24. Use Case: Predictive Maintenance
• Fix the problem before it happens,
avoiding expensive downtimes
– Airplanes, turbines, windmills
– Construction Equipment
– Cars, golf carts
• How
– Build a model for normal operation
and detect deviations from it
– Match against known error patterns
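A very simple version of "model normal operation, then compare deviation" is sketched below: the model is just the mean and standard deviation of readings taken during normal operation, and a new reading is flagged when it lies too many standard deviations away. The class name, the sample readings and the threshold of 3 are assumptions; real predictive maintenance models are usually richer than this.

```java
class DeviationDetector {
    private final double mean;
    private final double stdDev;

    // Learns "normal" from readings captured while the equipment was healthy.
    DeviationDetector(double[] normalReadings) {
        if (normalReadings.length < 2) {
            throw new IllegalArgumentException("need at least two readings");
        }
        double sum = 0.0;
        for (double r : normalReadings) {
            sum += r;
        }
        mean = sum / normalReadings.length;
        double squares = 0.0;
        for (double r : normalReadings) {
            squares += (r - mean) * (r - mean);
        }
        stdDev = Math.sqrt(squares / normalReadings.length);
        if (stdDev == 0.0) {
            throw new IllegalArgumentException("readings must vary");
        }
    }

    // Distance from normal, in standard deviations.
    double zScore(double reading) {
        return (reading - mean) / stdDev;
    }

    boolean isAnomalous(double reading, double threshold) {
        return Math.abs(zScore(reading)) > threshold;
    }

    public static void main(String[] args) {
        // Hypothetical vibration readings from normal operation.
        DeviationDetector detector = new DeviationDetector(new double[] {10, 12, 8, 10});
        System.out.println(detector.isAnomalous(10.5, 3.0));
        System.out.println(detector.isAnomalous(20.0, 3.0));
    }
}
```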
25. Predict wait time in the Airport
• Predicting the time
to go through the airport
• Real-time updates
and events to
passengers
• Lets the airport manage
by allocating resources
26. Predict Super Bowl
• Predicted 7 of the 11
games
• Done with Random
Forest Algorithm
• Even the ones we missed
are instructive
See Yuda’s post: Predicting the Super Bowl with Machine Learning
27. Communicate: Dashboards
• Dashboards give an “overall idea”
at a glance (e.g. car dashboard)
– Boring when everything is good!!
• Build your own dashboard.
– WSO2 DAS supports a gadget
generation wizard
– You can write your own gadgets
using D3 and JavaScript.
28. Gadget Generation Wizard
• Starts with data in tabular format
• Map each column to a dimension in your plot
like X, Y, color, point size, etc.
• Create a chart with a few clicks
Powered by the VizGrammar lib, which uses Vega underneath
(see https://github.com/wso2/VizGrammar)
29. Communicate: Alerts
▪ Done with CEP Queries
▪ Last Mile
- Email, SMS
- Push notifications to a UI
- Pager
- Trigger physical Alarm
30. Real Life Use Cases
▪ Cisco (OEM the platform with Cisco
solutions, Health, Smart Parking)
▪ Experian (Digital Marketing) - see video
▪ Pacific Controls (Smart City Platform, Vehicle
tracking, building monitoring) - see video
▪ Financial Monitoring (large bank)
▪ Throttling and Anomaly Detection (by group
of Telco companies)
▪ API Analytics (13+ customers)
“No battle plan survives contact with the enemy”
-- Helmuth von Moltke
31. Key Differentiators
• Open Source, under Apache 2 license
• Publish data once, analyze it any way you like
experience.
• Flexible packaging or as a scalable cluster
• Rich, extensible, SQL-like configuration language
• Compact, easy-to-learn syntax addressing complex
requirements such as time windows, patterns and
sequences, which would be complex to develop in a
programming language such as Java.
• Rich set of data connectors, which can be easily
extended