Business intelligence 3.0 and the data lake
1. Business Intelligence 3.0
(and the Emergence of Data Lake)
Cheow Lan Lake, Thailand
Komes Chandavimol
January 31, 2016 (B.E. 2559)
Master's Program in Big Data Engineering, Faculty of Engineering
Dhurakij Pundit University
2. Business Intelligence
The set of techniques and tools for the transformation of
raw data into meaningful and useful information for business
analysis purposes (Wikipedia)
Raw data -> Techniques and Tools (BI) -> Information
3. Business Intelligence 1.0
"When will I get my report? That's not what I want!"
The business requests information
An IT/data analyst queries the database/data warehouse
The IT/data analyst provides the report
BI 1.0 - Delivery to Customer
Defining Business Intelligence 3.0, Lachlan James (2014)
4. BI 1.0 – Delivery to Customer
Present
Aggregate
Defining Business Intelligence 3.0, Lachlan James (2014)
Batch Processing
BI tool usage: limited to a specific group or community of users
IT Control
Stand Alone Reports
5. Business Intelligence 2.0
"I can explore a wide range of data assets; I prefer to
blend data, but my report differs from yours"
ERP, CRM, Data Warehouse
A single point of view
With a centralized BI tool
Business Analysts can explore the data
in the Web Portal with BI Tools and predefined reports
BI 2.0 - Creation and Delivery for Consumers
Defining Business Intelligence 3.0, Lachlan James (2014)
6. BI 2.0 Creation and Delivery to Consumers
Explore
Predict
Defining Business Intelligence 3.0, Lachlan James (2014)
Real-time access via a web portal on any device
Centralized BI tools: business analysts can explore the data
Hybrid Control
Web Portal
7. Business Intelligence 2.5
I can use any tool, I can blend data rapidly (if I can find it), it was
so simple but now
BI 2.0++ (Agile BI, SOA, Enterprise Search, Visualization)
Gives business users more power over BI tools
Less complicated for IT
On-the-fly data federation
8. Business Intelligence 3.0
"I collaborate via any device with content, harness information on the
fly, and drive outcomes"
Focus on collaborative workgroups
Self-regulation and self-governance in data management
Interaction among customers, employees, regulators, and third parties
No bottleneck from IT
Includes Big Data, Cloud, IoT, and social integration
BI 3.0 - Creation, Delivery and Management for Consumers
Defining Business Intelligence 3.0, Lachlan James (2014)
9. BI 3.0 - Creation Delivery and Management
Anticipate
Enrich
Self-Service BI with Analytics 3.0
10. The Journey of Business Intelligence 3.0
                      BI 1.0                  BI 2.0                  BI 3.0
Functionality         Present and Aggregate   Explore and Predict     Anticipate and Enrich
Frequency             Monthly/Detail          Weekly/Daily/Summary    Real-time/Process
Level of Focus        Community               Enterprise              Collaborative
Processing            Batch                   Near Real-time          In-Process
Data Products         Information             Intelligence            Insight
Foundation/Influence  Delivery Only           Creation + Delivery     Creation + Delivery + Automation
Defining Business Intelligence 3.0, Lachlan James (2014)
12. Understand the needs of BI 3.0 users
A set of tools and techniques that:
Deliver intelligence without making users work for it
Deliver the right data to the right users at the right time
Focus on scalability and usability
Provide self-guided content creation, delivery, and analysis
Support a multi-device user interface, anywhere, anytime
Support a collaborative methodology
Include Analytics 3.0, Data Discovery, Advanced Visualization, Visual
Analytics, Business Discovery, Self Serve Business
Source: Forrester Research’s James Kobielus
13. Understand the needs of the BI 3.0 platform
Technology should be in place to enable the organization
to acquire, store, combine, and enrich huge volumes
of unstructured and structured data in raw format
Ability to perform real-time and near-real-time analytics
at scale on these huge volumes in an iterative way
14. Find the tools!
Spreadsheet?
Database?
Data Mart?
Data Warehouse?
Source: Forrester Research’s James Kobielus
15. Tools that support these!
http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427
https://www.domo.com/learn/data-never-sleeps-3-0
25. Data Lake
A single place to store every type of data in its native format
with no fixed limits on account size or file size, high throughput
to increase analytic performance and native integration with the
Hadoop ecosystem.
Reference: James Serra's Blog
Data Lake Development with Big Data, Pradeep Pasupuleti (2015)
https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
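The "native format" idea in the definition above can be illustrated with a minimal sketch. It assumes a plain local directory standing in for the lake's storage; the function name `land_file`, the `raw/<source>/<date>/` layout, and the sample file are hypothetical and not taken from the slides. The point is only that the file is copied byte-for-byte, with no parsing or schema applied at ingestion time.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def land_file(source: Path, lake_root: Path, source_system: str, day: date) -> Path:
    """Copy a file into the lake's raw area unchanged (hypothetical layout)."""
    # Hypothetical convention: raw/<source system>/<ingestion date>/<file name>
    target_dir = lake_root / "raw" / source_system / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    # copy2 copies the bytes as-is; nothing is parsed or converted.
    shutil.copy2(source, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sample = root / "orders.csv"  # made-up sample file
        sample.write_text("id,amount\n1,9.5\n")
        landed = land_file(sample, root / "lake", "crm", date(2016, 1, 31))
        print(landed.relative_to(root))
```

In a real deployment the target would more likely be HDFS or an object store; the local directory here only keeps the sketch self-contained.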
30. Data Lake
Type of Data
Raw Data
Derived Data
Aggregated Data
Type of Environment
Discovery Environment
Production Environment
The Definition of Data Lake, John O’Brien (2015)
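The three types of data on this slide can be sketched as successive steps over the same records. This is an illustrative example only: the sample lines, field names, and the functions `derive` and `aggregate` are assumptions made for the sketch, not part of the cited definition.

```python
from collections import defaultdict

# Raw data: records kept exactly as they arrived (made-up sample values).
RAW_LINES = [
    "2016-01-30,bangkok,120.50",
    "2016-01-30,phuket,80.00",
    "2016-01-31,bangkok,99.50",
    "bad line",
]


def derive(raw_lines):
    """Derived data: parsed, typed records; malformed lines are skipped."""
    records = []
    for line in raw_lines:
        parts = line.split(",")
        if len(parts) != 3:
            continue
        day, city, amount = parts
        try:
            records.append({"day": day, "city": city, "amount": float(amount)})
        except ValueError:
            continue
    return records


def aggregate(records):
    """Aggregated data: total amount per city."""
    totals = defaultdict(float)
    for record in records:
        totals[record["city"]] += record["amount"]
    return dict(totals)


if __name__ == "__main__":
    derived = derive(RAW_LINES)
    print(len(derived), "derived records")
    print(aggregate(derived))
```

The raw lines stay untouched, so a different derivation can be run later against the same input, which is one reason a discovery environment can sit alongside production.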
31. How does the Data Lake work?
http://www.clearpeaks.com/blog/category/tableau
Traditional Enterprise Data warehouse
32. New Data Management Architecture
http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
36. 4 Maturity Stages of Data Lake
Stage 1 – Pilot Project (Understand the Technology)
Stage 2 – Productionize Hadoop and its capabilities
Stage 3 – Proactively consolidate data for (Big) Data Analytics
Stage 4 – Platform the Data Lake to Core Competency
The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
37. Stage 1 – Pilot Project
Handling data at scale
Involves getting the plumbing in place and learning to acquire
and transform data at scale.
The analytics may be quite simple, but much is learned about
making Hadoop work the way you desire.
The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
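As a rough illustration of the kind of "simple analytics" a pilot might start with, the sketch below imitates the map, shuffle, and reduce phases of Hadoop MapReduce in plain Python on a word count. It runs in a single process and uses no Hadoop API; the function names are hypothetical, and the example is an assumption about what a first exercise could look like, not something the cited sources prescribe.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["data lake", "Data warehouse"]))
```

On a cluster, the map and reduce steps would run in parallel across many machines; the logic per record stays this simple, which is why the learning at this stage is mostly about the plumbing.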
38. Stage 2 – Productionize Hadoop and its capabilities
Involves improving the ability to transform and analyze data.
Teams find the tools that are most appropriate to their skill sets.
They acquire more data and build applications.
The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
39. Stage 3 – Proactively consolidate data for (Big) Data Analytics
Involves getting data and analytics into the hands of as many
people as possible.
It is in this stage that the data lake and the enterprise data
warehouse start to work in unison, each playing its role.
Organizations that started with a data lake eventually add an
enterprise data warehouse to operationalize their data.
The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
41. Data Lake and Big Data Analytics
http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/
42. Stage 4 – Platform the Data Lake to Core Competency
Enterprise capabilities are added to the data lake.
Few companies have reached this level of maturity, but many
will as the use of big data grows.
Requires data governance, compliance, security, and auditing
(incorporated into the company data strategy)
The Technology of the Business Data Lake, Capgemini (2013)
44. The Data Lake Unifies Data Discovery, Data Science, and BI 3.0
[Word cloud: Big Data, Self Serve Business, Data Science, Machine Learning, Visual Analytics, Business Discovery, Deep Learning, Hadoop, Feature Engineering, Spark, Business Intelligence 3.0, YARN, Predictive Analytics, Hive, Data Lake, Data Visualization, Graph Analytics]