The Big Data Challenge
“Data is the new oil.” This phrase captures the many possibilities buried in large amounts of data. But it also contains another truth: data alone will not solve any problems. There must be pipelines to bring the oil to where it is needed and refineries to process it for different uses. In this session we will show how an algorithm transforms “crude data” into actionable insights. Before demonstrating the power of algorithms, we will also explore some essential questions that should be answered before every data project, whether it deals with small or Big Data.
Integrating large amounts of data and combining analytical algorithms are the beginning. With Cubeware Solutions Platform C8 and its component C8 Importer, our customers build homogeneous information hubs on top of their heterogeneous IT landscapes. With its robust yet easy-to-use ETL functions, C8 Importer is the powerhouse of the C8 platform. Together with C8 SAP Connect, this tool can even integrate complex SAP solutions. In addition to powerful relational warehouses, the hubs can also include analytical (OLAP) data marts that are built and maintained with C8 Importer. Users can access this hub and design dashboards and reports with C8 Cockpit, the visual interface to their data. Once designed, C8 reports can be reused many times, shared through C8 Server, and accessed instantly through C8 Mobile.
Tom Martens - Cube Ware - The big data challenge - bo
1. Big Data Challenge
Real example in industry
Tom Martens
Bussum, 25th November 2014
Business
Analytics
2. Content
The challenge of defining your Big Data vision
Growth of data volume & unstructured data sources
Do I need to invest in Big Data & how can I use it?
Do I have the right solution for it?
Predictive Analytics: an operational area for Big Data
Main arguments for Predictive Analytics
Fields of operation
Conclusion
Real example based on the Cubeware solution and the EXASOL analytical DB
3. The data truth!
Data is the new oil
But please keep in mind: “The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.” (John Tukey)
4. Growth of data volume & unstructured data sources
Sensor Data
Social Media
Server Logs
5. The challenge of defining your Big Data vision
Do I need to invest in Big Data?
The answer is YES, if
You have a large volume of data from different source systems
You need to analyze all of this data at high speed
You believe that analyzing this data for specific purposes can generate additional profit, reduce costs, or increase efficiency
How can I make use of my historical data?
Automatic transformation of unstructured* data into a structured form
Qualitative and quantitative analysis of the structured data
The results of the analysis extend the basis for decision support systems
*: e.g. machine sensor data, social media data, mobile communication data
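The transformation of unstructured data into a structured form can be illustrated with a minimal Python sketch. It assumes a hypothetical log line layout (timestamp, source name, one-letter event code, free text); the layout, the field names, and the sample lines are illustrative assumptions, not the format used in the project described here.

```python
import re
from datetime import datetime

# Hypothetical line layout (an assumption for illustration only):
# "<YYYY-MM-DD HH:MM:SS> <source> <one-letter event code> <free text>"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<source>\S+) "
    r"(?P<event>[A-Z]) "
    r"(?P<message>.*)"
)


def parse_line(line):
    """Turn one raw log line into a structured record, or None if it does not fit."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["timestamp"] = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    return record


# Invented sample input; the second line is deliberately malformed.
raw_lines = [
    "2014-11-25 08:15:00 sensor-01 B temperature above threshold",
    "garbage without structure",
    "2014-11-25 08:17:30 sensor-01 C emergency stop",
]

# Keep only the lines that could be parsed into the structured shape.
records = [r for r in map(parse_line, raw_lines) if r is not None]
for record in records:
    print(record["timestamp"], record["source"], record["event"])
```

Lines that do not match the assumed layout are dropped here; a real pipeline would more likely route them to an error table for inspection.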
6. The Enterprise Information Hub
[Figure: the Enterprise Information Hub. Adapted from: Bitkom 2012 - Combination of traditional BI landscape with Big Data solution]
Data sources (new and available; structured and unstructured data): Business Applications (ERP, CRM, …); Cloud Data; OLTP DBMS; Hadoop, NoSQL, log data, machine data; streaming data (real time); statistical data
Hub components: ETL; Complex Event Processing; Data Warehouse(s); Data Marts, OLAP Databases, Virtual Cubes (e.g. EXASOL / SAP HANA)
Business Intelligence offering: Reporting; Dashboarding; Analysis (OLAP); Data & Text Mining; Business Analytics; Operational Intelligence
7. Predictive Analytics
What can be targeted with Predictive Analytics (examples):
Increase delivery capacity and adherence to delivery dates
More precise planning of resources
Improve product quality and increase productivity
Efficient forecasting and planning of product maintenance
In line with the saying: You cannot change your past, but you can change your future!
8. Predictive Analytics
Predictive Analytics can deliver positive results if it is applied in the right domains. The most recommended operational areas are:
Early detection of churn by analyzing customer behavior in specific situations or time frames
Recognition of relationships and patterns to uncover insurance fraud
Forecasting product sales for capacity and resource planning
Keeping stock at a reasonable level so that as little capital as possible is tied up
Optimized marketing campaigns that address customers with the right offering
Avoiding machine outages through timely repair & maintenance
And more …
9. The Data (Source Data)
Different, but similar sources
Time series of events (occurrences)
Treatment of occurrences
Categorization of occurrences
Critical category: affecting net income
Additional source data (attributes)
10. The Approach
Identify sequences
Cluster the features of the occurrences in a sequence (prediction patterns)
“Prediction” of the next critical occurrence
Algorithm: SPADE (Sequential PAttern Discovery using Equivalence classes)
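The slides name SPADE but do not show an implementation. The following is a minimal Python sketch of the core idea only: a vertical id-list per item, temporal joins to extend a prefix, and support counted over distinct sequences. It assumes single-letter events and leaves out the itemset extensions and time constraints of the full algorithm. The function names (`build_idlists`, `temporal_join`, `mine`) and the toy sequences are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_idlists(sequences):
    """Vertical format: item -> set of (sequence id, position) occurrences."""
    idlists = defaultdict(set)
    for sid, sequence in enumerate(sequences):
        for position, item in enumerate(sequence):
            idlists[item].add((sid, position))
    return idlists


def support(idlist):
    """Number of distinct sequences in which the pattern occurs."""
    return len({sid for sid, _ in idlist})


def temporal_join(prefix_idlist, item_idlist):
    """Occurrences of the item that follow the prefix within the same sequence."""
    earliest_end = {}
    for sid, position in prefix_idlist:
        earliest_end[sid] = min(position, earliest_end.get(sid, position))
    return {
        (sid, position)
        for sid, position in item_idlist
        if sid in earliest_end and position > earliest_end[sid]
    }


def mine(sequences, min_support):
    """Depth-first enumeration of frequent sequential patterns, prefix by prefix."""
    idlists = build_idlists(sequences)
    frequent_items = {
        item: idlist for item, idlist in idlists.items()
        if support(idlist) >= min_support
    }
    patterns = {}

    def extend(pattern, idlist):
        patterns[pattern] = support(idlist)
        for item, item_idlist in frequent_items.items():
            joined = temporal_join(idlist, item_idlist)
            if support(joined) >= min_support:
                extend(pattern + (item,), joined)

    for item, idlist in frequent_items.items():
        extend((item,), idlist)
    return patterns


# Toy sequences of letter-coded occurrences (illustrative assumption).
sequences = ["BDC", "BFC", "BEGC", "AAMLC"]
patterns = mine(sequences, min_support=3)
for pattern, count in sorted(patterns.items()):
    print(" -> ".join(pattern), count)
```

With a minimum support of 3, the pattern "B followed later by C" is frequent in this toy input, which is the kind of regularity a prediction pattern would build on.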
11. The Approach – Overview
[Figure: from events to sequences, to clusters of sequences, to a prediction]
Events: A B D C B F B E G C
Sequences: B D C B F C B E G C
Cluster of Sequences: B D C B E G C A A M L C
Prediction: B D C
12. The Approach – Challenge
Search space (number of frequent sequences)
Objects (O) = sources
Attributes (A) = occurrences that a source reports
Length of frequent sequences (k) ~ average number of events in a sequence
Theoretical “search space” = O(A^k)
Example: 10 * (1000^5) = 1E+16 possible frequent sequences
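The arithmetic behind this figure can be checked with a short Python sketch. Reading the three numbers as 10 sources, 1000 distinct occurrence types, and a sequence length of 5 is an assumption based on the definitions above; the function name `search_space` is hypothetical.

```python
def search_space(num_objects, num_attributes, sequence_length):
    """Upper bound on candidate sequences: objects * attributes ** length."""
    return num_objects * num_attributes ** sequence_length


# Assumed reading of the slide: 10 sources, 1000 occurrence types, length 5.
size = search_space(10, 1000, 5)
print(f"{size:.0E}")

# The bound grows by a factor of 1000 for every additional event in a sequence.
for k in range(1, 6):
    print(k, f"{search_space(10, 1000, k):.0E}")
```

The exponential growth in k is why candidate sequences cannot simply be enumerated and why pruning by minimum support matters.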
[Figure labels: Sensor Data Source 1 … 10; EXASolution (CPU, Memory, Storage); C8 Server; C8 Cockpit; Data Visualization; Data Distribution]
13. The Solution Architecture
[Figure: solution architecture]
Events: A B D C B F B E G C A A M L C
C8 Solutions Platform: C8 Server; C8 Cockpit; C8 Importer; Cubeware Analyzer
EXASolution: Virtual Cube; Compute Node (CPU, Memory); Storage
Data sources (new and available): Business Applications (ERP, CRM, …); Cloud Data; Hadoop, NoSQL, log data, machine data; OLTP DBMS
ETL Processing (C8 Importer)
14. Achievements
Reduction of critical events by ~19% (model to be improved further)
Reduction of maintenance costs by ~10% (further decline expected with an improved model)
Unexpected relationships found
Detection of a design issue in one machine type