To view the recording of this webinar, please use the URL below:
WSO2 Data Analytics Server (WSO2 DAS) version 3.0 is the successor of WSO2 Business Activity Monitor 2.5. It is based on the latest technologies and is an evolutionary upgrade to the current system. WSO2 DAS comes with a comprehensive set of new features, including support for pluggable data sources, batch processing with Apache Spark, distributed data indexing, a new dashboard, and unified data querying with analytics REST APIs.
WSO2 DAS combines real-time, batch, interactive, and predictive (via machine learning) analysis of data into a single integrated platform. This webinar will present and demonstrate the following key features and capabilities in detail:
Pluggable data sources support with its new data abstraction layer
Batch analytics using the Apache Spark analytics engine
Interactive analysis powered by Apache Lucene
An analytics dashboard to visualize results
Activity monitoring capabilities for tracking related events in a system
5. Introducing WSO2 Data Analytics Server
● Fully open source solution for building systems and applications that
collect and analyze data and communicate the results.
● Embodies the WSO2 Analytics Platform by combining batch, real-time,
interactive, and predictive analytics capabilities
● High performance data capture framework
● Highly available and scalable by design
6. Advantages of DAS 3.0 over WSO2 BAM 2.5.0
● Complete rewrite from the ground up, with performance and extensibility as
core values
● Faster analytics powered by Apache Spark, 10x - 100x speedup
● Rich indexing support, with near real-time text search
● Pluggable data store support, from lightweight embedded RDBMS to highly
scalable HBase/HDFS
● Revamped Analytics Dashboard with wizard-based gadget generation
9. Data Model
{
'name': 'stream.name',
'version': '1.0.0',
'nickName': 'stream nickname',
'description': 'description of the stream',
'metaData':[
{'name':'meta_data_1','type':'STRING'}
],
'correlationData':[
{'name':'correlation_data_1','type':'STRING'}
],
'payloadData':[
{'name':'payload_data_1','type':'BOOL'},
{'name':'payload_data_2','type':'LONG'}
]
}
● Published data conforms to a strongly typed data stream
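To make "strongly typed" concrete, here is a minimal sketch of checking an event against the stream definition above. The `validate_event` helper, the `PYTHON_TYPES` mapping, and the event layout (one list of values per section) are assumptions made for illustration; they are not part of any WSO2 client library.

```python
# Hypothetical validator: not a WSO2 API, just an illustration of typed streams.
STREAM_DEFINITION = {
    "name": "stream.name",
    "version": "1.0.0",
    "metaData": [{"name": "meta_data_1", "type": "STRING"}],
    "correlationData": [{"name": "correlation_data_1", "type": "STRING"}],
    "payloadData": [
        {"name": "payload_data_1", "type": "BOOL"},
        {"name": "payload_data_2", "type": "LONG"},
    ],
}

# Assumed mapping from stream attribute types to Python types.
PYTHON_TYPES = {"STRING": str, "BOOL": bool, "LONG": int}

SECTIONS = ("metaData", "correlationData", "payloadData")


def validate_event(definition, event):
    """Return a list of problems; an empty list means the event conforms."""
    problems = []
    for section in SECTIONS:
        attributes = definition.get(section, [])
        values = event.get(section, [])
        if len(values) != len(attributes):
            problems.append(
                f"{section}: expected {len(attributes)} values, got {len(values)}"
            )
            continue
        for attribute, value in zip(attributes, values):
            expected = PYTHON_TYPES[attribute["type"]]
            # bool is a subclass of int in Python, so reject it explicitly for LONG.
            wrong_type = not isinstance(value, expected)
            bool_as_long = expected is int and isinstance(value, bool)
            if wrong_type or bool_as_long:
                problems.append(
                    f"{section}.{attribute['name']}: expected {attribute['type']}"
                )
    return problems
```

An event whose values match the declared order and types yields an empty problem list; anything else is rejected before it is published.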
10. Data Receiver
● One API for batch and real-time analytics
● Asynchronous, non-blocking design enables extremely fast writes
● Supports multiple transport adapters for data collection
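The asynchronous, non-blocking write path can be sketched as a bounded queue drained by a background thread. This is a generic illustration of the idea, not the DAS data publisher: the `AsyncPublisher` class, its `send` callback, and the `capacity` parameter are all hypothetical.

```python
import queue
import threading


class AsyncPublisher:
    """Hypothetical publisher: publish() only enqueues; a worker thread sends."""

    def __init__(self, send, capacity=1000):
        self._send = send  # callable that performs the (slow) write
        self._queue = queue.Queue(maxsize=capacity)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def publish(self, event):
        """Queue an event without blocking; return False if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _drain(self):
        while True:
            event = self._queue.get()
            if event is None:  # sentinel: stop the worker
                break
            self._send(event)

    def close(self):
        """Let the worker flush everything queued so far, then stop it."""
        self._queue.put(None)
        self._worker.join()
```

The caller never waits on the transport: `publish` returns immediately, and a full buffer is reported to the caller instead of stalling it.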
12. Data Persistence
● Data Abstraction Layer to enable pluggable data connectors
○ RDBMS, Cassandra, and HBase/HDFS connectors are provided; custom connectors can also be written
● Analytics Table
○ The data persistence entity in WSO2 Data Analytics Server
○ Provides a backend data source agnostic way of storing and retrieving data
○ Allows applications to be written without depending on a specific data source API, e.g. JDBC
(RDBMS) or the Cassandra APIs
○ WSO2 DAS provides a standard REST API for accessing the Analytics Tables
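The idea of a data-source-agnostic layer can be sketched as an abstract interface with interchangeable backends. The `RecordStore` interface, the `InMemoryRecordStore` backend, and the `record_sale` function below are hypothetical names used only for illustration; they do not reproduce the actual DAS connector API.

```python
from abc import ABC, abstractmethod


class RecordStore(ABC):
    """Hypothetical connector interface; illustrative only, not the DAS API."""

    @abstractmethod
    def put(self, table, record_id, values):
        """Store one record in the named table."""

    @abstractmethod
    def get(self, table, record_id):
        """Return the stored record, or None if it does not exist."""


class InMemoryRecordStore(RecordStore):
    """Simplest possible backend; an RDBMS or HBase connector would look the same from outside."""

    def __init__(self):
        self._tables = {}

    def put(self, table, record_id, values):
        self._tables.setdefault(table, {})[record_id] = dict(values)

    def get(self, table, record_id):
        return self._tables.get(table, {}).get(record_id)


def record_sale(store, record_id, product, amount):
    """Application code depends only on RecordStore, never on a concrete backend."""
    store.put("SALES", record_id, {"product": product, "amount": amount})
```

Swapping the backend means passing a different `RecordStore` implementation to `record_sale`; the application code itself does not change.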
13. Data Persistence
● Analytics Record Stores
○ An Analytics Record Store houses a specific set of Analytics Tables
○ The Analytics Record Stores used for incoming events and for query processing output
are configurable
○ Single Analytics Table namespace; the target record store is specified only at table
creation time
○ Useful for creating Analytics Tables whose data is stored in multiple target databases
● Analytics File System
○ The location where the indexing data is stored
○ Multiple implementations provided OOTB, or custom implementations can be written
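A single table namespace spread over several record stores can be sketched as a catalog that remembers, per table, which store was chosen at creation time. The `TableCatalog` class and the store names used in the usage note are assumptions for illustration, with plain dictionaries standing in for real backends.

```python
class TableCatalog:
    """Hypothetical single table namespace spread over several record stores."""

    def __init__(self, record_stores):
        # store name -> dict acting as a stand-in backend (table -> list of records)
        self._record_stores = record_stores
        self._table_to_store = {}

    def create_table(self, table, record_store):
        """The target record store is chosen once, when the table is created."""
        if table in self._table_to_store:
            raise ValueError(f"table {table!r} already exists")
        if record_store not in self._record_stores:
            raise KeyError(f"unknown record store {record_store!r}")
        self._table_to_store[table] = record_store
        self._record_stores[record_store][table] = []

    def insert(self, table, record):
        """Callers name only the table; routing to the store is internal."""
        store = self._record_stores[self._table_to_store[table]]
        store[table].append(record)

    def store_of(self, table):
        return self._table_to_store[table]
```

For example, a catalog built over two stores named `EVENT_STORE` and `PROCESSED_DATA_STORE` could place raw events in one database and summaries in another, while every later read or write refers to tables by name alone.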
16. Batch Analytics - Overview
● Powered by Apache Spark for 10x-100x higher performance than Hadoop
● Parallel, distributed with optimized in-memory processing
● Scalable script-based analytics written using an easy-to-learn, SQL-like
query language powered by Spark SQL
● Interactive, built-in web interface for ad-hoc query execution
● Scheduled query script execution support with high-availability and failover
● Run Spark on a single node, on a Spark-embedded Carbon server cluster, or
connect to an external Spark cluster
17. Batch Analytics - Spark SQL
create temporary table product_data using CarbonAnalytics
options (schema …)
create temporary table products using CarbonAnalytics
options (schema …)
insert into products select product_name from product_data
group by …
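The shape of this script (declare two tables, then aggregate one into the other with `insert into … select … group by`) can be tried locally. The sketch below uses Python's built-in SQLite as a stand-in for Spark SQL, so it does not use the `CarbonAnalytics` provider, and the `quantity` and `total_quantity` columns, the sample rows, and the grouping column are assumptions, since the slide elides the schema and the `group by` clause.

```python
import sqlite3

# SQLite stands in for Spark SQL here; only the query pattern is the same.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TEMPORARY TABLE product_data (product_name TEXT, quantity INTEGER);
    CREATE TEMPORARY TABLE products (product_name TEXT, total_quantity INTEGER);
    """
)

# Assumed sample rows, for illustration only.
conn.executemany(
    "INSERT INTO product_data VALUES (?, ?)",
    [("apple", 2), ("pear", 1), ("apple", 3)],
)

# Aggregate the raw table into the summary table in one batch statement.
conn.execute(
    """
    INSERT INTO products
    SELECT product_name, SUM(quantity) FROM product_data
    GROUP BY product_name
    """
)

summary = conn.execute(
    "SELECT product_name, total_quantity FROM products ORDER BY product_name"
).fetchall()
```

After the script runs, `summary` holds one row per product with its summed quantity.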
21. Interactive Analytics
● Full-text data indexing support powered by Apache Lucene
● Drill-down search support
● Distributed data indexing
○ Designed to support scalability
● Near real-time data indexing and retrieval
○ Data indexed immediately as received
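Full-text search of this kind rests on an inverted index, which maps each term to the records that contain it. The toy `TinyIndex` below is a hypothetical illustration of that data structure only; it is not Lucene and has none of its analysis, scoring, or distribution features.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> ids of the records containing that term."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, record_id, text):
        """Index a record as soon as it arrives, so it is searchable immediately."""
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add(record_id)

    def search(self, query):
        """Return ids of records that contain every term in the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

Because `add` updates the postings in place, a record is visible to `search` right after it is received, which is the behavior the "near real-time" bullet describes.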
25. What is Real-time Analytics?
● Gather data from multiple sources
● Correlate data streams over time
● Find interesting occurrences
● Notify
● All in real time
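The gather, correlate, detect, and notify loop can be sketched with a sliding time window per key. The `BurstDetector` class, its thresholds, and the `notify` callback are hypothetical, and the sketch assumes events arrive in timestamp order; it illustrates the pattern rather than how DAS or its CEP engine implements it.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Hypothetical rule: notify when one key yields `threshold` events within `window` seconds."""

    def __init__(self, threshold, window, notify):
        self._threshold = threshold
        self._window = window
        self._notify = notify
        self._recent = defaultdict(deque)  # key -> timestamps still inside the window

    def on_event(self, key, timestamp):
        """Process one event; assumes timestamps are non-decreasing per key."""
        timestamps = self._recent[key]
        timestamps.append(timestamp)
        # Drop timestamps that have slid out of the time window.
        while timestamps and timestamp - timestamps[0] > self._window:
            timestamps.popleft()
        if len(timestamps) >= self._threshold:
            self._notify(key, len(timestamps))
```

Each event is handled as it arrives, and only the timestamps inside the window are kept in memory, so the detector never needs to re-scan stored data.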
28. What is Predictive Analytics?
● Extract, pre-process, and explore data
● Create models, tune algorithms, and make predictions
● Integrate for better intelligence
30. Dashboards
● “Overall idea” at a glance (e.g. car
dashboard)
● Support for personalization: you can
build your own dashboard
● The entry point for Drill-down
● Building a custom dashboard
○ Dashboard via Google Gadgets and content
via HTML5 + JavaScript
○ Leverages WSO2 User Engagement Server to
build a dashboard.
○ Uses charting libraries like Vega, D3.js
31. Dashboards: Gadget Generation Wizard
● Start with data in tabular format
● Map each column to a dimension of your plot, such as X, Y,
color, or point size
● Drill-downs are also supported
● Create a chart with a few clicks
32. Alerts
● Detecting conditions can be
done via CEP Queries
● “Last Mile” is key
○ Email
○ SMS
○ Push notifications to a UI
○ Pager
○ Trigger physical Alarm
33. APIs
● With mobile apps, most data is
exposed and shared with end users
as APIs (REST/JSON)
● Analytics results can be exposed
through APIs
○ REST API
○ JavaScript API
39. Activity Monitoring
Activity monitoring is for tracking events from multiple nodes in a
flow to understand a specific activity
● Example:
○ A client initiates a web service request that travels through multiple ESBs and application
servers and then returns. The flow is uniquely identified and visualized in DAS
● Used for tracing messages and finding performance hotspots in the flow
● Implemented with a correlation-ID-based mechanism built on
Interactive Analytics
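Correlation-ID-based tracking can be sketched as grouping events by a shared ID and ordering each group by time. The field names `activity_id`, `node`, and `timestamp`, and both helper functions, are assumptions made for this illustration rather than the DAS event format.

```python
from collections import defaultdict


def group_activities(events):
    """Group events by correlation id and order each group by timestamp."""
    activities = defaultdict(list)
    for event in events:
        activities[event["activity_id"]].append(event)
    for steps in activities.values():
        steps.sort(key=lambda step: step["timestamp"])
    return dict(activities)


def slowest_hop(steps):
    """Return (from_node, to_node, elapsed) for the largest gap between consecutive steps."""
    hops = [
        (current["node"], following["node"], following["timestamp"] - current["timestamp"])
        for current, following in zip(steps, steps[1:])
    ]
    return max(hops, key=lambda hop: hop[2]) if hops else None
```

Once events from different nodes are grouped this way, the ordered steps give the message trace, and the largest gap between consecutive steps points at a likely performance hotspot.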
45. Fraud Detection
● Built for detecting credit card fraud
● The rules are extensible with
customized Siddhi execution plans
for any type of fraud detection
● Currently leverages Real-time and
Interactive Analytics features
Source: multichannelmerchant.com
46. Log Analysis
● Distributed indexing and searching
of any type of logs stored in the
system
● Notifications support with Real-time
event processing features
● Application / Server health prediction
with Machine Learning
● Utilizes Interactive + Real-time
Analytics + Machine Learning
features
Source: www.retrospective.centeractive.com