Driven by the need to make more informed, data-driven decisions, today’s business intelligence (BI) and analytics leaders face tough challenges. As more organizations turn to “data scientists” to analyze massive amounts of disparate data for deeper, actionable insights that streamline business processes and improve decision-making, a new wave of BI is emerging.
At the recent Second Annual BI Leadership Summit in New York, Joe Caserta delivered a keynote presentation, The Impact of the Data Lake on Enterprise BI.
To learn more about our services, visit http://casertaconcepts.com/.
3. @joe_Caserta#BILeaderSummit
About Caserta Concepts
• Consulting Data Innovation and Modern Data Engineering
• Award-winning company
• Internationally recognized work force
• Trusted thought leaders
• Innovation Partner
  • Strategic Consulting
  • Advanced Architecture
  • Execution & Implementation
• Leader in Enterprise Data Solutions
  • Big Data Analytics
  • Data Warehousing
  • Business Intelligence
  • Data Science
  • Cloud Computing
  • Data Governance
The Future of Data is Today
As a Mindful Cyborg, Chris Dancy utilizes up to 700 sensors, devices, applications, and services to track, analyze, and optimize as many areas of his existence as possible. Data quantification enables him to see the connections of otherwise invisible data, resulting in dramatic upgrades to his health, productivity, and quality of life.
The Evolution of Data Analytics
• Descriptive Analytics: What happened? (Reports)
• Diagnostic Analytics: Why did it happen? (Correlations)
• Predictive Analytics: What will happen? (Predictions)
• Prescriptive Analytics: How can we make it happen? (Recommendations)
Axes: Data Analytics Sophistication, Business Value
Source: Gartner
Cognitive Computing / Cognitive Data Analytics
Traditional Data Analytics Methods
• Design – Top Down, Bottom Up
  • Customer Interviews and requirements gathering
  • Data Profiling
• Create Data Models
  • Facts and Dimensions
• Extract Transform Load (ETL)
  • Copy data from sources to data warehouse
• Data Governance
  • Stewardship, business rules, data quality
• Put a BI Tool on Top
  • Design semantic layer
  • Develop reports
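The structure-first workflow listed above can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database as a stand-in for the data warehouse; the table names (`dim_customer`, `fact_sales`), the function names, and the sample rows are hypothetical and do not come from the presentation.

```python
import sqlite3

# Hypothetical source rows: (customer name, region, sale amount).
SOURCE_ROWS = [
    ("Acme", "East", 120.0),
    ("Acme", "East", 80.0),
    ("Globex", "West", 200.0),
]


def load_star_schema(rows):
    """Model first, then load: one dimension table and one fact table."""
    conn = sqlite3.connect(":memory:")
    # Step 1: create the data model (facts and dimensions) up front.
    conn.execute(
        "CREATE TABLE dim_customer ("
        "customer_key INTEGER PRIMARY KEY, name TEXT UNIQUE, region TEXT)"
    )
    conn.execute(
        "CREATE TABLE fact_sales ("
        "customer_key INTEGER REFERENCES dim_customer(customer_key), "
        "amount REAL)"
    )
    # Step 2: ETL. Each source row is conformed to the model before storage.
    for name, region, amount in rows:
        conn.execute(
            "INSERT OR IGNORE INTO dim_customer (name, region) VALUES (?, ?)",
            (name, region),
        )
        (key,) = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales (customer_key, amount) VALUES (?, ?)",
            (key, amount),
        )
    conn.commit()
    return conn


def sales_by_region(conn):
    """Step 3: the kind of query a BI tool's semantic layer might issue."""
    return conn.execute(
        "SELECT d.region, SUM(f.amount) FROM fact_sales f "
        "JOIN dim_customer d USING (customer_key) "
        "GROUP BY d.region ORDER BY d.region"
    ).fetchall()
```

Calling `sales_by_region(load_star_schema(SOURCE_ROWS))` returns one total per region. Note that nothing can be queried until the model exists and the load has run, which is the rigidity the following slides describe.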
A Day in the Life
• Onboarding new data is difficult!
• Rigid Structures and Data Governance
• IT Disconnected from business requirements:
  “Hey – I need to analyze some new data”
  • IT conforms and profiles the data
  • Loads it into dimensional models
  • Builds a semantic layer nobody is going to use
  • Creates a dashboard we hope someone will notice
  ...and then you can access your data 3-6 months later to see if it has value!
Houston, we have a Problem: Data Sprawl
• There is one application for every 5-10 employees generating copies of the same files leading to massive amounts of duplicate idle data strewn all across the enterprise. - Michael Vizard, ITBusinessEdge.com
• Employees spend 35% of their work time searching for information... finding what they seek 50% of the time or less. - “The High Cost of Not Finding Information,” IDC
The Paradigm Shift
OLD WAY:
• Structure → Ingest → Analyze
• Fixed Capacity
• Monolithic
NEW WAY:
• Ingest → Analyze → Structure
• Dynamic Capacity
• Ecosystem
RECIPE:
• Data Lake
• Cloud
• Polyglot Data Landscape
Big Data is not the problem, it’s the Change Agent
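The "new way" ordering (ingest, then analyze, then structure) is often described as schema-on-read. A minimal sketch of that ordering follows, assuming raw events arrive as JSON lines; the sample records, field names, and the three helper functions are hypothetical and only illustrate the sequence, not any particular data lake product.

```python
import json
from collections import Counter

# Hypothetical raw events; records need not share the same fields.
RAW_LINES = [
    '{"user": "a", "action": "click", "ms": 120}',
    '{"user": "b", "action": "view"}',
    '{"user": "a", "action": "click", "device": "phone"}',
]


def ingest(lines):
    """Ingest: keep every record as-is, with no schema enforced up front."""
    return [json.loads(line) for line in lines]


def analyze(records, field):
    """Analyze: explore the raw data, tolerating records that lack the field."""
    return Counter(r[field] for r in records if field in r)


def structure(records, columns):
    """Structure: project a fixed schema only once the data has shown value."""
    return [tuple(r.get(c) for c in columns) for r in records]
```

For example, `analyze(ingest(RAW_LINES), "action")` counts actions directly on the raw records, and `structure(...)` is applied only afterwards to the columns that turned out to matter.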
Technology:
• Scalable distributed storage: Hadoop, S3
• Pluggable fit-for-purpose processing: Spark, LAMBDA, EMR
Functional Capabilities:
• Remove barriers from data ingestion and analysis
• Storage and processing for all data
• Tunable Governance
Wait! What about Data Governance?
• Organization: This is the ‘people’ part. Establishing Enterprise Data Council, Data Stewards, etc.
• Metadata: Definitions, lineage (where does this data come from), business definitions, technical metadata
• Privacy/Security: Identify and control sensitive data, regulatory compliance
• Data Quality and Monitoring: Data must be complete and correct. Measure, improve, certify
• Business Process Integration: Policies around data frequency, source availability, etc.
• Master Data Management: Ensure consistent business critical data, e.g. Members, Providers, Agents, etc.
• Information Lifecycle Management (ILM): Data retention, purge schedule, storage/archiving
Data Governance for the Data Lake
• Add Big Data to overall framework and assign responsibility
• Assign data scientists as stewards. New data sets (Twitter, call center logs, etc.)
• Graph databases are more flexible than relational
• Lower latency service required
• Data Quality and Monitoring (probably home grown)
• Quality checks not only SQL: Python, Java, machine learning
• Larger scale
• Integrate with Hive Metastore, HCatalog, emerging products, custom tables
• Ephemeral Data, Fleeting
• Automated retention and purge for regulatory requirements
• Take advantage of compression and archiving (like AWS Glacier)
• Data detection and masking on unstructured data upon ingest
• Near-zero latency, DevOps, Core component of business operations
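One of the governance items listed for the data lake, detection and masking of sensitive data on unstructured input at ingest time, can be sketched as follows. This is a minimal illustration under stated assumptions: the two regular expressions (email addresses and US-style social security numbers) and the `mask_on_ingest` function are hypothetical, and real detection would need far broader coverage than two patterns.

```python
import re

# Illustrative patterns only; these are assumptions, not a complete detector.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_on_ingest(text):
    """Return (masked_text, counts), replacing each detected value with a tag."""
    counts = {}
    for label, pattern in PATTERNS.items():
        # subn returns the rewritten text and the number of replacements made.
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts
```

Running the masking step before the raw text is written to storage means downstream users never see the original values, and the returned counts can feed the data quality and monitoring checks mentioned in the same list.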
The Big Data Pyramid
[Figure: pyramid annotated with two columns, Usage Pattern and Data Governance. Labels recovered from the figure:]
• Ingest Raw Data
• Organize, Define, Complete
• Munging, Blending
• Machine Learning
• Data Quality and Monitoring
• Metadata, ILM, Security
• Data Catalog
• Data Integration
• Fully Governed (trusted)
• Arbitrary/Ad-hoc Queries and Reporting
• Metadata, ILM, Security
The Refinery
• The feedback loop between Data Science, Data Warehouse and Data Lake is critical
• Ephemeral Data Science Workbench
• Successful work products of science must graduate into the appropriate layers of the Data Lake
[Figure labels: Cool New Data, New Insights, Governance, Refinery]
Caution: Assembly Required
Some of the most hopeful tools are brand new or in incubation!
Enterprise big data implementations typically combine products with custom-built components
People, Processes and Business commitment are still critical!
Governance Tools: Data Integration, Data Catalog & Governance, Emerging Solutions
Move to the Cloud?
Existing On-Premise Solution
• Challenges with operations of data servers in Data Center
• Increasing infrastructure complexity
• Keeping up with data growth
Cloud Advantages
• Reduced upfront capital investment
• Faster speed to value
• Elasticity
“Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on.” - Matt Wood, AWS
Come out and Play
CIL - Caserta Innovations Lab
Experience
Big Data Warehousing Meetup
• Meet monthly to share data best practices, experiences
• 3,300+ Members
http://www.meetup.com/Big-Data-Warehousing/
Examples of Previous Topics
• Data Governance, Compliance & Security in Hadoop w/Cloudera
• Real Time Trade Data Monitoring with Storm & Cassandra
• Predictive Analytics
• Exploring Big Data Analytics Techniques w/Datameer
• Using a Graph DB for MDM & Relationship Mgmt
• Data Science w/Claudia Perlich & Revolution Analytics
• Processing 1.4 Trillion Events in Hadoop
• Building a Relevance Engine using Hadoop, Mahout & Pig
• Big Data 2.0 – YARN Distributed ETL & SQL w/Hadoop
• Intro to NoSQL w/10GEN
Next Meetup: Introduction to Kudu
December 9, 2015