SlideShare a Scribd company logo
1 of 26
@joe_Caserta#BILeaderSummit
The Impact of the Data Lake on Enterprise BI
Presented by:
Joe Caserta
@joe_Caserta#BILeaderSummit
@joe_Caserta#BILeaderSummit
About Caserta Concepts
• Consulting Data Innovation and Modern Data Engineering
• Award-winning company
• Internationally recognized work force
• Trusted thought leaders
• Innovation Partner
• Strategic Consulting
• Advanced Architecture
• Execution & Implementation
• Leader in Enterprise Data Solutions
• Big Data Analytics
• Data Warehousing
• Business Intelligence
• Data Science
• Cloud Computing
• Data Governance
@joe_Caserta#BILeaderSummit
Client Portfolio
Retail/eCommerce
& Manufacturing
Digital Media/AdTech
Education & Services
Finance. Healthcare
& Insurance
@joe_Caserta#BILeaderSummit
The Future of Data is Today
As a Mindful Cyborg, Chris
Dancy utilizes up to
700 sensors, devices,
applications, and services to
track, analyze, and optimize as
many areas of his existence.
Data quantification enables
him to see the connections of
otherwise invisible data,
resulting in dramatic upgrades
to his health, productivity, and
quality of life.
@joe_Caserta#BILeaderSummit
The Evolution of Data Analytics
Descriptive
Analytics
Diagnostic
Analytics
Predictive
Analytics
Prescriptive
Analytics
What
happened?
Why did it
happen?
What will
happen?
How can we make
It happen?
Data Analytics Sophistication
BusinessValue
Source: Gartner
Reports  Correlations  Predictions  Recommendations
@joe_Caserta#BILeaderSummit
The Evolution of Data Analytics
Source: Gartner
Reports  Correlations  Predictions  Recommendations
Cognitive Computing / Cognitive Data Analytics
@joe_Caserta#BILeaderSummit
Traditional Data Analytics Methods
• Design – Top Down, Bottom Up
• Customer Interviews and requirements gathering
• Data Profiling
• Create Data Models
• Facts and Dimensions
• Extract Transform Load (ETL)
• Copy data from sources to data warehouse
• Data Governance
• Stewardship, business rules, data quality
• Put a BI Tool on Top
• Design semantic layer
• Develop reports
@joe_Caserta#BILeaderSummit
A Day in the Life
• Onboarding new data is difficult!
• Rigid Structures and Data Governance
• IT Disconnected from business requirements:
“Hey – I need to analyze some new data”
 IT Conforms and profiles the data
 Loads it into dimensional models
 Builds a semantic layer nobody is going to use
 Creates a dashboard we hope someone will notice
...and then you can access your data 3-6 months later to see if it has
value!
@joe_Caserta#BILeaderSummit
Houston, we have a Problem: Data Sprawl
• There is one application for every 5-10 employees generating copies of
the same files leading to massive amounts of duplicate idle data strewn all
across the enterprise. - Michael Vizard, ITBusinessEdge.com
• Employees spend 35% of their work time searching for information...
finding what they seek 50% of the time or less.
- “The High Cost of Not Finding Information,” IDC
@joe_Caserta#BILeaderSummit
@joe_Caserta#BILeaderSummit
@joe_Caserta#BILeaderSummit
OLD WAY:
• Structure  Ingest  Analyze
• Fixed Capacity
• Monolithic
NEW WAY:
• Ingest  Analyze  Structure
• Dynamic Capacity
• Ecosystem
RECIPE:
• Data Lake
• Cloud
• Polyglot Data Landscape
The Paradigm Shift
Big Data is not the problem,
It’s the Change Agent
@joe_Caserta#BILeaderSummit
Enrollments
Claims
Finance
ETL
Horizontally Scalable Environment - Optimized for Analytics
Data Lake
Big Data Analytics
NoSQL
DatabasesETL
Ad-Hoc/Canned
Reporting/Dashboards
Traditional BI
Spark MapReduce Pig/Hive
N1 N2 N4N3 N5
Hadoop Distributed File System (HDFS)
Traditional
EDW
Others…
The Evolution of Data Analytics
@joe_Caserta#BILeaderSummit
Innovation is the only sustainable competitive advantage a company can have
Innovations may fail, but companies that don’t innovate will fail
@joe_Caserta#BILeaderSummit
@joe_Caserta#BILeaderSummit
Technology:
• Scalable distributed storage  Hadoop, S3
• Pluggable fit-for-purpose processing  Spark, LAMBDA, EMR
Functional Capabilities:
• Remove barriers from data ingestion and analysis
• Storage and processing for all data
• Tunable Governance
@joe_Caserta#BILeaderSummit
•This is the ‘people’ part. Establishing Enterprise Data Council, Data Stewards, etc.Organization
•Definitions, lineage (where does this data come from), business definitions, technical
metadataMetadata
•Identify and control sensitive data, regulatory compliancePrivacy/Security
•Data must be complete and correct. Measure, improve, certifyData Quality and Monitoring
•Policies around data frequency, source availability, etc.Business Process Integration
•Ensure consistent business critical data i.e. Members, Providers, Agents, etc.Master Data Management
•Data retention, purge schedule, storage/archiving
Information Lifecycle
Management (ILM)
Wait! What about Data Governance?
@joe_Caserta#BILeaderSummit
•This is the ‘people’ part. Establishing Enterprise Data Council, Data Stewards, etc.Organization
•Definitions, lineage (where does this data come from), business definitions, technical
metadataMetadata
•Identify and control sensitive data, regulatory compliancePrivacy/Security
•Data must be complete and correct. Measure, improve, certifyData Quality and Monitoring
•Policies around data frequency, source availability, etc.Business Process Integration
•Ensure consistent business critical data i.e. Members, Providers, Agents, etc.Master Data Management
•Data retention, purge schedule, storage/archiving
Information Lifecycle
Management (ILM)
• Add Big Data to overall framework and assign responsibility
• Assign data scientists as stewards. New data sets (twitter, call center logs, etc.)
• Graph databases are more flexible than relational
• Lower latency service required
• Data Quality and Monitoring (probably home grown)
• Quality checks not only SQL: python, java, machine learning
• Larger scale
• Integrate with Hive Metastore, HCatalog, emerging products, custom tables
• Ephemeral Data, Fleeting
• Automated retention and purge for regulatory requirements
• Take advantage of compression and archiving (like AWS Glacier)
• Data detection and masking on unstructured data upon ingest
• Near-zero latency, DevOps, Core component of business operations
Data Governance for the Data Lake
@joe_Caserta#BILeaderSummit
The Big Data Pyramid
Ingest Raw
Data
Organize, Define,
Complete
Munging, Blending
Machine Learning
Data Quality and Monitoring
Metadata, ILM , Security
Data Catalog
Data Integration
Fully Governed ( trusted)
Arbitrary/Ad-hoc Queries and
Reporting
Usage Pattern Data Governance
Metadata, ILM,
Security
@joe_Caserta#BILeaderSummit
The Refinery
• The feedback loop between Data Science, Data Warehouse and Data Lake is
critical
• Ephemeral Data Science Workbench
• Successful work products of science must Graduate into the appropriate layers
of the Data Lake
Cool New
Data
New
Insights
Governance
Refinery
@joe_Caserta#BILeaderSummit
Caution: Assembly Required
 Some of the most hopeful tools are brand new or in
incubation!
 Enterprise big data implementations typically combine
products with custom built components
Governance Tools
People, Processes and Business commitment is still critical!
Data Integration Data Catalog & Governance Emerging Solutions
@joe_Caserta#BILeaderSummit
Existing On-Premise Solution
• Challenges with operations of data servers in Data Center
• Increasing infrastructure complexity
• Keeping up with data growth
Cloud Advantages
• Reduced upfront capital investment
• Faster speed to value
• Elasticity
“Those that go out and buy expensive
infrastructure find that the problem scope and
domain shift really quickly. By the time they get
around to answering the original question, the
business has moved on.” - Matt Wood, AWS
Move to the Cloud?
@joe_Caserta#BILeaderSummit
Come out and Play
CIL - Caserta
Innovations Lab
Experience
Big Data Warehousing Meetup
• Meet monthly to share data best
practices, experiences
• 3,300+ Members
http://www.meetup.com/Big-Data-Warehousing/
Examples of Previous Topics
• Data Governance, Compliance &
Security in Hadoop w/Cloudera
• Real Time Trade Data Monitoring
with Storm & Cassandra
• Predictive Analytics
• Exploring Big Data Analytics
Techniques w/Datameer
• Using a Graph DB for MDM &
Relationship Mgmt
• Data Science w/Claudia
Perlcih & Revolution Analytics
• Processing 1.4 Trillion Events
in Hadoop
• Building a Relevance Engine
using Hadoop, Mahout & Pig
• Big Data 2.0 – YARN Distributed
ETL & SQL w/Hadoop
• Intro to NoSQL w/10GEN
Next Meetup: Introduction to Kudu
December 9, 2015
Co-host:
@joe_Caserta#BILeaderSummit
Thank You / Q&A
Joe Caserta
President, Caserta Concepts
joe@casertaconcepts.com
@joe_Caserta
@joe_Caserta#BILeaderSummit
The Data Scientist Winning Trifecta
Modern Data
Engineering/Data
Preparation
Domain
Knowledge/Business
Expertise
Advanced
Mathematics/
Statistics

More Related Content

Viewers also liked

Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalCaserta
 
Data Driven Decisions - Big Data Warehousing Meetup, FICO
Data Driven Decisions - Big Data Warehousing Meetup, FICOData Driven Decisions - Big Data Warehousing Meetup, FICO
Data Driven Decisions - Big Data Warehousing Meetup, FICOCaserta
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data LakeCaserta
 
Neo4j Solutions - Master Data Management
Neo4j Solutions - Master Data ManagementNeo4j Solutions - Master Data Management
Neo4j Solutions - Master Data ManagementCaserta
 
DGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityDGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityCaserta
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
 
Deploying a Governed Data Lake
Deploying a Governed Data LakeDeploying a Governed Data Lake
Deploying a Governed Data LakeWaterlineData
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Data Lake vs. Data Warehouse: Which is Right for Healthcare?
Data Lake vs. Data Warehouse: Which is Right for Healthcare?Data Lake vs. Data Warehouse: Which is Right for Healthcare?
Data Lake vs. Data Warehouse: Which is Right for Healthcare?Health Catalyst
 
Big MDM Part 2: Using a Graph Database for MDM and Relationship Management
Big MDM Part 2: Using a Graph Database for MDM and Relationship ManagementBig MDM Part 2: Using a Graph Database for MDM and Relationship Management
Big MDM Part 2: Using a Graph Database for MDM and Relationship ManagementCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 
Webinar: Initiating a Customer MDM/Data Governance Program
Webinar: Initiating a Customer MDM/Data Governance ProgramWebinar: Initiating a Customer MDM/Data Governance Program
Webinar: Initiating a Customer MDM/Data Governance ProgramDATAVERSITY
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 

Viewers also liked (17)

Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobal
 
Data Driven Decisions - Big Data Warehousing Meetup, FICO
Data Driven Decisions - Big Data Warehousing Meetup, FICOData Driven Decisions - Big Data Warehousing Meetup, FICO
Data Driven Decisions - Big Data Warehousing Meetup, FICO
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data Lake
 
Neo4j Solutions - Master Data Management
Neo4j Solutions - Master Data ManagementNeo4j Solutions - Master Data Management
Neo4j Solutions - Master Data Management
 
DGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityDGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data Quality
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
 
Deploying a Governed Data Lake
Deploying a Governed Data LakeDeploying a Governed Data Lake
Deploying a Governed Data Lake
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Data Lake vs. Data Warehouse: Which is Right for Healthcare?
Data Lake vs. Data Warehouse: Which is Right for Healthcare?Data Lake vs. Data Warehouse: Which is Right for Healthcare?
Data Lake vs. Data Warehouse: Which is Right for Healthcare?
 
Big MDM Part 2: Using a Graph Database for MDM and Relationship Management
Big MDM Part 2: Using a Graph Database for MDM and Relationship ManagementBig MDM Part 2: Using a Graph Database for MDM and Relationship Management
Big MDM Part 2: Using a Graph Database for MDM and Relationship Management
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 
Webinar: Initiating a Customer MDM/Data Governance Program
Webinar: Initiating a Customer MDM/Data Governance ProgramWebinar: Initiating a Customer MDM/Data Governance Program
Webinar: Initiating a Customer MDM/Data Governance Program
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 

More from Caserta

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsCaserta
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupCaserta
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It? Caserta
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWSCaserta
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeCaserta
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
 

More from Caserta (17)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It?
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

The Impact of the Data Lake on Enterprise BI

  • 1. @joe_Caserta#BILeaderSummit The Impact of the Data Lake on Enterprise BI Presented by: Joe Caserta
  • 3. @joe_Caserta#BILeaderSummit About Caserta Concepts • Consulting Data Innovation and Modern Data Engineering • Award-winning company • Internationally recognized work force • Trusted thought leaders • Innovation Partner • Strategic Consulting • Advanced Architecture • Execution & Implementation • Leader in Enterprise Data Solutions • Big Data Analytics • Data Warehousing • Business Intelligence • Data Science • Cloud Computing • Data Governance
  • 4. @joe_Caserta#BILeaderSummit Client Portfolio Retail/eCommerce & Manufacturing Digital Media/AdTech Education & Services Finance. Healthcare & Insurance
  • 5. @joe_Caserta#BILeaderSummit The Future of Data is Today As a Mindful Cyborg, Chris Dancy utilizes up to 700 sensors, devices, applications, and services to track, analyze, and optimize as many areas of his existence. Data quantification enables him to see the connections of otherwise invisible data, resulting in dramatic upgrades to his health, productivity, and quality of life.
  • 6. @joe_Caserta#BILeaderSummit The Evolution of Data Analytics Descriptive Analytics Diagnostic Analytics Predictive Analytics Prescriptive Analytics What happened? Why did it happen? What will happen? How can we make It happen? Data Analytics Sophistication BusinessValue Source: Gartner Reports  Correlations  Predictions  Recommendations
  • 7. @joe_Caserta#BILeaderSummit The Evolution of Data Analytics Source: Gartner Reports  Correlations  Predictions  Recommendations Cognitive Computing / Cognitive Data Analytics
  • 8. @joe_Caserta#BILeaderSummit Traditional Data Analytics Methods • Design – Top Down, Bottom Up • Customer Interviews and requirements gathering • Data Profiling • Create Data Models • Facts and Dimensions • Extract Transform Load (ETL) • Copy data from sources to data warehouse • Data Governance • Stewardship, business rules, data quality • Put a BI Tool on Top • Design semantic layer • Develop reports
  • 9. @joe_Caserta#BILeaderSummit A Day in the Life • Onboarding new data is difficult! • Rigid Structures and Data Governance • IT Disconnected from business requirements: “Hey – I need to analyze some new data”  IT Conforms and profiles the data  Loads it into dimensional models  Builds a semantic layer nobody is going to use  Creates a dashboard we hope someone will notice ...and then you can access your data 3-6 months later to see if it has value!
  • 10. @joe_Caserta#BILeaderSummit Houston, we have a Problem: Data Sprawl • There is one application for every 5-10 employees generating copies of the same files leading to massive amounts of duplicate idle data strewn all across the enterprise. - Michael Vizard, ITBusinessEdge.com • Employees spend 35% of their work time searching for information... finding what they seek 50% of the time or less. - “The High Cost of Not Finding Information,” IDC
  • 13. @joe_Caserta#BILeaderSummit OLD WAY: • Structure  Ingest  Analyze • Fixed Capacity • Monolithic NEW WAY: • Ingest  Analyze  Structure • Dynamic Capacity • Ecosystem RECIPE: • Data Lake • Cloud • Polyglot Data Landscape The Paradigm Shift Big Data is not the problem, It’s the Change Agent
  • 14. @joe_Caserta#BILeaderSummit Enrollments Claims Finance ETL Horizontally Scalable Environment - Optimized for Analytics Data Lake Big Data Analytics NoSQL DatabasesETL Ad-Hoc/Canned Reporting/Dashboards Traditional BI Spark MapReduce Pig/Hive N1 N2 N4N3 N5 Hadoop Distributed File System (HDFS) Traditional EDW Others… The Evolution of Data Analytics
  • 15. @joe_Caserta#BILeaderSummit Innovation is the only sustainable competitive advantage a company can have Innovations may fail, but companies that don’t innovate will fail
  • 17. @joe_Caserta#BILeaderSummit Technology: • Scalable distributed storage  Hadoop, S3 • Pluggable fit-for-purpose processing  Spark, LAMBDA, EMR Functional Capabilities: • Remove barriers from data ingestion and analysis • Storage and processing for all data • Tunable Governance
  • 18. @joe_Caserta#BILeaderSummit •This is the ‘people’ part. Establishing Enterprise Data Council, Data Stewards, etc.Organization •Definitions, lineage (where does this data come from), business definitions, technical metadataMetadata •Identify and control sensitive data, regulatory compliancePrivacy/Security •Data must be complete and correct. Measure, improve, certifyData Quality and Monitoring •Policies around data frequency, source availability, etc.Business Process Integration •Ensure consistent business critical data i.e. Members, Providers, Agents, etc.Master Data Management •Data retention, purge schedule, storage/archiving Information Lifecycle Management (ILM) Wait! What about Data Governance?
  • 19. @joe_Caserta#BILeaderSummit •This is the ‘people’ part. Establishing Enterprise Data Council, Data Stewards, etc.Organization •Definitions, lineage (where does this data come from), business definitions, technical metadataMetadata •Identify and control sensitive data, regulatory compliancePrivacy/Security •Data must be complete and correct. Measure, improve, certifyData Quality and Monitoring •Policies around data frequency, source availability, etc.Business Process Integration •Ensure consistent business critical data i.e. Members, Providers, Agents, etc.Master Data Management •Data retention, purge schedule, storage/archiving Information Lifecycle Management (ILM) • Add Big Data to overall framework and assign responsibility • Assign data scientists as stewards. New data sets (twitter, call center logs, etc.) • Graph databases are more flexible than relational • Lower latency service required • Data Quality and Monitoring (probably home grown) • Quality checks not only SQL: python, java, machine learning • Larger scale • Integrate with Hive Metastore, HCatalog, emerging products, custom tables • Ephemeral Data, Fleeting • Automated retention and purge for regulatory requirements • Take advantage of compression and archiving (like AWS Glacier) • Data detection and masking on unstructured data upon ingest • Near-zero latency, DevOps, Core component of business operations Data Governance for the Data Lake
  • 20. @joe_Caserta#BILeaderSummit The Big Data Pyramid Ingest Raw Data Organize, Define, Complete Munging, Blending Machine Learning Data Quality and Monitoring Metadata, ILM , Security Data Catalog Data Integration Fully Governed ( trusted) Arbitrary/Ad-hoc Queries and Reporting Usage Pattern Data Governance Metadata, ILM, Security
  • 21. @joe_Caserta#BILeaderSummit The Refinery • The feedback loop between Data Science, Data Warehouse and Data Lake is critical • Ephemeral Data Science Workbench • Successful work products of science must Graduate into the appropriate layers of the Data Lake Cool New Data New Insights Governance Refinery
  • 22. @joe_Caserta#BILeaderSummit Caution: Assembly Required  Some of the most hopeful tools are brand new or in incubation!  Enterprise big data implementations typically combine products with custom built components Governance Tools People, Processes and Business commitment is still critical! Data Integration Data Catalog & Governance Emerging Solutions
  • 23. @joe_Caserta#BILeaderSummit Existing On-Premise Solution • Challenges with operations of data servers in Data Center • Increasing infrastructure complexity • Keeping up with data growth Cloud Advantages • Reduced upfront capital investment • Faster speed to value • Elasticity “Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on.” - Matt Wood, AWS Move to the Cloud?
  • 24. @joe_Caserta#BILeaderSummit Come out and Play CIL - Caserta Innovations Lab Experience Big Data Warehousing Meetup • Meet monthly to share data best practices, experiences • 3,300+ Members http://www.meetup.com/Big-Data-Warehousing/ Examples of Previous Topics • Data Governance, Compliance & Security in Hadoop w/Cloudera • Real Time Trade Data Monitoring with Storm & Cassandra • Predictive Analytics • Exploring Big Data Analytics Techniques w/Datameer • Using a Graph DB for MDM & Relationship Mgmt • Data Science w/Claudia Perlcih & Revolution Analytics • Processing 1.4 Trillion Events in Hadoop • Building a Relevance Engine using Hadoop, Mahout & Pig • Big Data 2.0 – YARN Distributed ETL & SQL w/Hadoop • Intro to NoSQL w/10GEN Next Meetup: Introduction to Kudu December 9, 2015 Co-host:
  • 25. @joe_Caserta#BILeaderSummit Thank You / Q&A Joe Caserta President, Caserta Concepts joe@casertaconcepts.com @joe_Caserta
  • 26. @joe_Caserta#BILeaderSummit The Data Scientist Winning Trifecta Modern Data Engineering/Data Preparation Domain Knowledge/Business Expertise Advanced Mathematics/ Statistics