Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
The Great Data Lakes: How to Approach a Big Data Implementation
Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
Mission
•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today's innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!
Topics
April: BIG DATA
May: CLOUD
June: INNOVATORS
Will History Repeat Itself Again?
•  Partitioning matters
•  File formats matter
•  Metadata matters
•  Access patterns matter
Hadoop may be schema-agnostic, but that doesn’t mean you shouldn’t carefully plan your implementation!
“I’ve always found that plans are useless, but planning is indispensable.”
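As a rough illustration of the "partitioning matters" and "access patterns matter" points, the sketch below lays files out in Hive-style `key=value` date directories, so a query bounded by date only has to touch the matching directories. The base path, dataset name, and helper functions are hypothetical and not taken from the talk.

```python
from datetime import date
from typing import List


def partition_path(base: str, dataset: str, day: date) -> str:
    """Build a Hive-style partition directory (key=value segments) for one day."""
    return (
        f"{base}/{dataset}"
        f"/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}"
    )


def paths_for_days(base: str, dataset: str, days: List[date]) -> List[str]:
    """Return only the partition directories a date-bounded query needs to scan."""
    return [partition_path(base, dataset, d) for d in days]


# Hypothetical dataset and base path, for illustration only.
example = partition_path("/lake/raw", "clickstream", date(2015, 4, 7))
```

If the dominant access pattern were by customer rather than by date, a different partition key would be the better choice, which is why the layout is worth planning up front.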
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
Think Big, A Teradata Company
•  Last year Teradata acquired Think Big Analytics, Inc., a consulting and solutions company focused on big data solutions
•  Think Big has expertise in implementing a variety of open source technologies, such as Hadoop, HBase, Cassandra, MongoDB and Storm, as well as experience with Hortonworks, Cloudera and MapR
•  Its consultants can assist with the planning, management and deployment of big data implementations
Guest: Rick Stellwagen
Rick Stellwagen is Data Lake Program Director at Think Big, A Teradata Company. Rick is responsible for defining and rolling out a Data Lake Solution portfolio, identifying and integrating internal and external best-in-class technologies. He is defining the deployment model, offerings, skills, career path and integrated capabilities required for data lake construction and rollout. He also works with product management, engineering, marketing and external partner alliances to define thought leadership positions and shape product plans both internally and externally.
MAKING BIG DATA COME ALIVE
Data Lake Deployment Best Practices
Rick Stellwagen, Data Lake Program Director
April 7, 2015
CONFIDENTIAL | 11
What is a Data Lake?
A centralized repository of raw data into which all data-producing streams flow and from which downstream facilities may draw
[Diagram: Information Sources, Data Lake, Downstream Facilities]
Data Variety is the driving factor in building a Data Lake
Data Lake: Swamp or Reservoir?
[Images: Swamp, Reservoir]
Primary Data Lake Use Cases
•  Corporate Data Sourcing – Repository – System of Record
   - Govern who, what and when data is accessed or provisioned
   - Track usage, resolve anomalies, visualize, optimize and clarify data lineage
•  Historical Data Offload
   - Offload history of operational and analytical data platforms
   - Centralized control of restore capabilities and leverage deep data history
•  Data Discovery, Organization and Identification
   - Gain ultimate flexibility in data use and access (schema on read)
   - Lightly conditioned, un-modeled, flexible modeling
•  ETL Offload
   - Foundation for Data Integration – push staging to Hadoop
   - Data Quality and validation
•  Business Reporting
   - OLAP analysis sourced & processed directly from the data lake
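To make "schema on read" from the use cases above concrete: raw records are stored untouched, and each consumer applies its own schema when reading. The following is a minimal sketch under that assumption. The sample records, field names, and the `read_with_schema` helper are invented for illustration.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# A schema here is just a mapping from field name to a casting function.
Schema = Dict[str, Callable[[Any], Any]]


def read_with_schema(raw_lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Any]]:
    """Parse raw JSON lines and project/cast them using the reader's schema."""
    for line in raw_lines:
        record = json.loads(line)
        # Fields missing from a raw record become None instead of failing the read.
        yield {
            name: (cast(record[name]) if record.get(name) is not None else None)
            for name, cast in schema.items()
        }


# Made-up raw data, stored as-is with everything as strings.
RAW = [
    '{"id": "1", "amount": "19.99", "note": "first"}',
    '{"id": "2", "amount": "5"}',
]

# Two consumers read the same raw data with different schemas.
billing_view = list(read_with_schema(RAW, {"id": int, "amount": float}))
audit_view = list(read_with_schema(RAW, {"id": str, "note": str}))
```

The trade-off is that every reader has to cope with missing or malformed fields, which is one reason the later slides stress metadata.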
Data Lake: Swamp or Reservoir?
•  A Data Reservoir is a managed Data Lake that seeks to guarantee quality, access, provenance, and governance.
•  An important extra guarantee that makes a Data Reservoir is the presence of metadata that enables non-subject-matter experts to easily find the various forms of stored data and know their entitlements to it.
•  Schema Metadata is always a given, but…
Business-Ontology
•  How does this data relate to other data?
•  How do we classify this data within the business?
Business-Security
•  Who can read the data?
•  Who owns the data?
•  Who belongs to what group?
•  Who can see a column?
[Diagram labels: LDAP, Argus, Unix bitmask permissions]
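For the "Unix bitmask permissions" label on this slide, a small sketch of how the question "Who can read the data?" maps onto permission bits, using Python's standard `stat` constants. The `describe_access` helper and the example mode are illustrative assumptions.

```python
import stat


def describe_access(mode: int) -> dict:
    """Decode the read bits of a Unix permission bitmask for owner, group, and others."""
    return {
        "owner_can_read": bool(mode & stat.S_IRUSR),
        "group_can_read": bool(mode & stat.S_IRGRP),
        "others_can_read": bool(mode & stat.S_IROTH),
    }


# 0o640: owner read/write, group read, no access for others.
access = describe_access(0o640)
```

A file-level bitmask cannot answer "Who can see a column?", so column-level visibility would need a separate policy layer on top.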
Operational
•  Where did my data come from?
•  Any environmental context about the landing zone, OS, where my data came from?
•  What processes touched my data?
•  When did my data get ingested? ... get transformed? ... get exported?
•  Identity?
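One way to picture the operational questions above is as fields on a record that travels with each dataset. This is only a sketch. The class name, field names, and sample values are assumptions, not a schema from the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class OperationalMetadata:
    source_system: str                          # where did my data come from?
    landing_zone: str                           # environmental context
    ingested_at: datetime                       # when did it get ingested?
    ingested_by: str                            # identity
    transformed_at: Optional[datetime] = None   # ... get transformed?
    exported_at: Optional[datetime] = None      # ... get exported?
    processes: List[str] = field(default_factory=list)  # what touched my data?

    def touch(self, process_name: str) -> None:
        """Record, in order, that a process touched the data."""
        self.processes.append(process_name)


# Hypothetical values for illustration.
meta = OperationalMetadata(
    source_system="crm",
    landing_zone="/lake/ingest/crm",
    ingested_at=datetime(2015, 4, 7, 12, 0, tzinfo=timezone.utc),
    ingested_by="svc-ingest",
)
meta.touch("de-dup")
meta.touch("key-generation")
```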
Business-Index
•  What contents are in a file?
•  What is the data serialization?
•  Where can we find certain content in the file?
•  What terms are in the contents?
[Diagram labels: e-Discovery, Solr, a lot of NoSQL, file magic number]
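The "file magic number" label hints at one cheap way to answer "What is the data serialization?": inspect the first few bytes of a file. The sketch below checks a handful of well-known signatures. The lookup table is deliberately small and the function names are hypothetical.

```python
from typing import Optional

# Leading bytes ("magic numbers") of a few common file formats.
MAGIC_NUMBERS = [
    (b"PAR1", "parquet"),
    (b"Obj\x01", "avro"),
    (b"ORC", "orc"),
    (b"SEQ", "sequencefile"),
    (b"\x1f\x8b", "gzip"),
]


def sniff_serialization(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known magic number."""
    for magic, name in MAGIC_NUMBERS:
        if header.startswith(magic):
            return name
    return None  # unknown: plain text, CSV, JSON, or an unlisted format


def sniff_file(path: str) -> Optional[str]:
    """Read only the first bytes of a file and sniff its serialization."""
    with open(path, "rb") as handle:
        return sniff_serialization(handle.read(8))
```

Text formats such as CSV or JSON have no magic number, so a result of `None` means "unknown", not "invalid".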
Business-Schema
•  How does my data denormalize?
•  How should I interpret my data?
•  What are my column names?
•  Are there any “important” dimensions?
[Diagram labels: Metarepository, HCatalog]
Assembling the Reservoir
[Diagram: Information Sources, Data Lake, and Downstream Facilities inside a Perimeter-Authentication-Authorization boundary. Labels: Evaluate, Source Data, Prepare Source Metadata, Prepare Data for Ingest, Ingest, Collect & Manage Metadata, Profile - Structure, Sequence, Compress, Automate, Protect, Discovery Signals, Data Hub, Generate Reports]
Enterprise Data Lake Architecture
•  Each Region has different “areas”
•  Three areas for three types of usage
   - Data Treatment
   - Data Reservoir
   - Data Lab
[Diagram: three regional areas, with metadata capture marked on the flows between them.
 Regional Data Treatment Facility: Collection Pools (continuous, bulk), Ingest Zone, SOR Zone, Export Zone, Op Meta Data Index, Orchestration VM, Orchestration DB, Monitoring, Master, Compute Cluster. Processes: op md index, HAR Compactor, Ingestion/SOR Reconciliation, de-dup, key generation.
 Regional Reservoir: Lake, Master Data, <LOB> Zone, Export Zone, Biz Meta Data Index, Orchestration VM, Orchestration DB, Monitoring, Master, Compute Cluster. Processes: correlate, co-locate, cleanse, de-ident.
 Regional Lab: Lake, Master Data, <Insight A>, <Insight B>, Virtual Compute Clusters (VCC) X and Y, de-identification.]
Key: Validate that Ingestion captures Metadata
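The architecture diagram lists "de-dup" and "key generation" among the Data Treatment processes. One possible reading, sketched below, is to derive a deterministic key from chosen business fields and keep the first record seen per key. The hashing scheme, field names, and `_key` column are assumptions for illustration, not the presenter's implementation.

```python
import hashlib
from typing import Dict, Iterable, List


def record_key(record: Dict[str, str], key_fields: List[str]) -> str:
    """Derive a deterministic key by hashing the chosen business fields."""
    # The unit separator keeps ("ab", "c") and ("a", "bc") from colliding.
    joined = "\x1f".join(record[name] for name in key_fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def de_dup(records: Iterable[Dict[str, str]], key_fields: List[str]) -> List[Dict[str, str]]:
    """Keep the first record seen for each generated key and attach the key."""
    seen = set()
    unique = []
    for rec in records:
        key = record_key(rec, key_fields)
        if key not in seen:
            seen.add(key)
            unique.append({**rec, "_key": key})
    return unique
```

Because the key depends only on the record contents, re-ingesting the same source file would produce the same keys, which keeps reconciliation repeatable.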
Data Treatment
•  Used by Operations only
•  Restricted
•  Non-business process
•  Lowest-Common-Denominator Data Serialization
•  The entry point for ALL your data
[Diagram: Regional Data Treatment Facility. Collection Pools (continuous, bulk) with metadata capture. Labels: Ingest Zone, SOR Zone, Export Zone, Op Meta Data Index, Master, Compute Cluster, Monitoring, Orchestration DB, Orchestration VM]
Make sure you capture Metadata! Or you risk a swamp downstream
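The warning about capturing metadata could be enforced as a simple gate at the ingest zone. The sketch below rejects any dataset whose metadata is missing required fields. The required field list and function names are hypothetical, and a real deployment would choose its own.

```python
from typing import Dict, List

# Hypothetical minimum set of metadata fields required at ingest.
REQUIRED_FIELDS = ("source", "ingested_at", "serialization", "owner")


def missing_metadata(metadata: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


def admit_to_ingest_zone(metadata: Dict[str, str]) -> bool:
    """Admit a dataset only when every required metadata field is present."""
    return not missing_metadata(metadata)
```

Failing fast at the entry point is cheaper than trying to reconstruct lineage after the data has spread downstream.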
Data Reservoir
•  Used by Business AND Operations
•  Marting!
•  Business processes
•  DSS
•  No Ad Hoc
•  Business Restricted
•  First Introduction of SME
[Diagram: Regional Reservoir. Labels: Lake, Master Data, <LOB> Zone, Export Zone, Biz Meta Data Index, Master, Compute Cluster, Monitoring, Orchestration DB, Orchestration VM, MPP Fast Analytics. Processes: correlate, co-locate, cleanse, de-ident]
Don’t let in un-vetted data!
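The reservoir diagram lists "de-ident" next to "correlate" and "co-locate". One common approach that keeps records joinable after de-identification is keyed hashing: the same identifier always maps to the same token, but the token cannot be reversed without the key. This is a sketch of that general technique, not necessarily what the presenter used. The field names and key are placeholders.

```python
import hashlib
import hmac
from typing import Dict, Set


def de_identify(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (a stable pseudonym)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify_record(
    record: Dict[str, str], sensitive_fields: Set[str], secret_key: bytes
) -> Dict[str, str]:
    """Pseudonymize only the sensitive fields and leave the rest untouched."""
    return {
        name: de_identify(value, secret_key) if name in sensitive_fields else value
        for name, value in record.items()
    }
```

Pseudonymized data can still be re-identified by anyone holding the key or enough correlating attributes, so this alone is not full anonymization.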
Data Lab
•  Used by business primarily
•  “Un-Safe” Data
•  Ephemeral (think virtualization)
•  Highly experimental
•  New technologies
•  Ad Hoc
[Diagram: Regional Lab. Labels: Lake, Master Data, <Insight A>, <Insight B>, Virtual Compute Clusters (VCC) X and Y]
Data Lake Best Practices
•  Know where you are headed – build on Roadmap or Optimizer Planning
•  Quickly put into practice references for company-wide Data Lake ingest
•  Establish data lineage and governance tracking with metadata services
•  Establish standards and practices to scale out your data ingest
•  Develop standards for doing profiling and discovery
•  Build out a pipeline framework for data transformations
•  Develop a Security Plan (perimeter, authentication & authorization)
•  Develop an archive and information security approach
•  Plan out next steps and approach for discovery and reporting
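For the "pipeline framework for data transformations" item, the smallest useful shape is a list of single-purpose steps composed into one callable, so that steps can be reordered, reused, and tested in isolation. The step functions and sample record below are invented for illustration.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[Record], Record]


def build_pipeline(steps: List[Step]) -> Step:
    """Compose steps left to right into a single record transformation."""
    def run(record: Record) -> Record:
        return reduce(lambda current, step: step(current), steps, record)
    return run


def strip_whitespace(record: Record) -> Record:
    """Trim leading and trailing whitespace from every value."""
    return {name: value.strip() for name, value in record.items()}


def lowercase_email(record: Record) -> Record:
    """Normalize the email field to lowercase."""
    return {**record, "email": record["email"].lower()}


pipeline = build_pipeline([strip_whitespace, lowercase_email])
result = pipeline({"name": " Ada ", "email": " Ada@Example.COM "})
```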
Perceptions & Questions
Analyst: Robin Bloor
Robin Bloor, PhD
There Has Been a Clear Shift
Analytics & BI were previously EDW-centric
They are becoming Data Lake-centric
Hadoop vs Data Mgmt Engine

Hadoop               DBMS/EDW
-------------------  ----------------------
Inexpensive (?)      Expensive
Any data             Prepared data
May have metadata    Will have metadata
Poor performance     Optimized performance
Weak scheduling      Optimized scheduling
Weak data mgmt       Good data mgmt
Security?            Secure
Data Lake            Data workhorse
Big Data Architecture - 1
Think Logical, Implement Physical
Big Data Architecture - 2
Big Data Architecture - 3
Straws in the Wind
Operational Concerns
•  Multiple local instances of Hadoop
•  Weak data placement
•  Metadata chaos
•  Lack of tuning capability
•  Security (expense)
•  User self-service becoming a file system nightmare
The Need for Best Practices
This is clear:
Data Lake is a new idea
•  Is a data lake really just a multiplicity of data marts growing wild?
•  Aside from performance-critical workloads, what should Hadoop not be used for?
•  Do you have any specific recommendations for metadata management in a data lake?
•  Is there a need for enforced provenance & lineage?
•  Security question: Encryption?
•  Where does streaming fit into the picture?
Upcoming Topics
www.insideanalysis.com
April: BIG DATA
May: CLOUD
June: INNOVATORS
THANK YOU for your ATTENTION!
Some images provided courtesy of
Wikimedia Commons

More Related Content

What's hot

Google v Oracle: The Future of Software and Fair Use
Google v Oracle: The Future of Software and Fair UseGoogle v Oracle: The Future of Software and Fair Use
Google v Oracle: The Future of Software and Fair UseAurora Consulting
 
apidays LIVE New York 2021 - Designing API's: Less Data is More! by Damir Svr...
apidays LIVE New York 2021 - Designing API's: Less Data is More! by Damir Svr...apidays LIVE New York 2021 - Designing API's: Less Data is More! by Damir Svr...
apidays LIVE New York 2021 - Designing API's: Less Data is More! by Damir Svr...apidays
 
Punta Dreamin 17 Generic Apex and Tooling Api
Punta Dreamin 17 Generic Apex and Tooling ApiPunta Dreamin 17 Generic Apex and Tooling Api
Punta Dreamin 17 Generic Apex and Tooling ApiAdam Olshansky
 
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...Sri Ambati
 
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...apidays
 
LilaPapiernikResume.DeveloperII
LilaPapiernikResume.DeveloperIILilaPapiernikResume.DeveloperII
LilaPapiernikResume.DeveloperIILila Papiernik
 
apidays LIVE Paris - GraphQL meshes by Jens Neuse
apidays LIVE Paris - GraphQL meshes by Jens Neuseapidays LIVE Paris - GraphQL meshes by Jens Neuse
apidays LIVE Paris - GraphQL meshes by Jens Neuseapidays
 
Building & scaling a live streaming mobile platform - Gr8 road to fame
Building & scaling a live streaming mobile platform - Gr8 road to fameBuilding & scaling a live streaming mobile platform - Gr8 road to fame
Building & scaling a live streaming mobile platform - Gr8 road to fameIndicThreads
 
apidays LIVE London 2021 - Consumer-first APIs in Open Banking by Chris Dudle...
apidays LIVE London 2021 - Consumer-first APIs in Open Banking by Chris Dudle...apidays LIVE London 2021 - Consumer-first APIs in Open Banking by Chris Dudle...
apidays LIVE London 2021 - Consumer-first APIs in Open Banking by Chris Dudle...apidays
 
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...apidays
 
.NET executable requirements
.NET executable requirements.NET executable requirements
.NET executable requirementsGodfrey Nolan
 
apidays LIVE Paris 2021 - Learning the Language of HTTP for a Better Data Exp...
apidays LIVE Paris 2021 - Learning the Language of HTTP for a Better Data Exp...apidays LIVE Paris 2021 - Learning the Language of HTTP for a Better Data Exp...
apidays LIVE Paris 2021 - Learning the Language of HTTP for a Better Data Exp...apidays
 
Whitebox Testing for Blackbox Testers: Simplifying API Testing
Whitebox Testing for Blackbox Testers: Simplifying API TestingWhitebox Testing for Blackbox Testers: Simplifying API Testing
Whitebox Testing for Blackbox Testers: Simplifying API TestingQASymphony
 
apidays LIVE Paris - Augmenting a Legacy REST API with GraphQL by Clément Vil...
apidays LIVE Paris - Augmenting a Legacy REST API with GraphQL by Clément Vil...apidays LIVE Paris - Augmenting a Legacy REST API with GraphQL by Clément Vil...
apidays LIVE Paris - Augmenting a Legacy REST API with GraphQL by Clément Vil...apidays
 
Successfully Implementing BDD in an Agile World
Successfully Implementing BDD in an Agile WorldSuccessfully Implementing BDD in an Agile World
Successfully Implementing BDD in an Agile WorldSmartBear
 
Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?VMware Tanzu
 
Feature Flagging to Reduce Risk in Database Migrations
Feature Flagging to Reduce Risk in Database Migrations Feature Flagging to Reduce Risk in Database Migrations
Feature Flagging to Reduce Risk in Database Migrations LaunchDarkly
 
Cultivating Your Design Heuristics
Cultivating Your Design HeuristicsCultivating Your Design Heuristics
Cultivating Your Design HeuristicsRebecca Wirfs-Brock
 

What's hot (20)

Google v Oracle: The Future of Software and Fair Use
Google v Oracle: The Future of Software and Fair UseGoogle v Oracle: The Future of Software and Fair Use
Google v Oracle: The Future of Software and Fair Use
 
apidays LIVE New York 2021 - Designing API's: Less Data is More! by Damir Svr...
apidays LIVE New York 2021 - Designing API's: Less Data is More! by Damir Svr...apidays LIVE New York 2021 - Designing API's: Less Data is More! by Damir Svr...
apidays LIVE New York 2021 - Designing API's: Less Data is More! by Damir Svr...
 
Punta Dreamin 17 Generic Apex and Tooling Api
Punta Dreamin 17 Generic Apex and Tooling ApiPunta Dreamin 17 Generic Apex and Tooling Api
Punta Dreamin 17 Generic Apex and Tooling Api
 
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
 
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
 
LilaPapiernikResume.DeveloperII
LilaPapiernikResume.DeveloperIILilaPapiernikResume.DeveloperII
LilaPapiernikResume.DeveloperII
 
apidays LIVE Paris - GraphQL meshes by Jens Neuse
apidays LIVE Paris - GraphQL meshes by Jens Neuseapidays LIVE Paris - GraphQL meshes by Jens Neuse
apidays LIVE Paris - GraphQL meshes by Jens Neuse
 
Building & scaling a live streaming mobile platform - Gr8 road to fame
Building & scaling a live streaming mobile platform - Gr8 road to fameBuilding & scaling a live streaming mobile platform - Gr8 road to fame
Building & scaling a live streaming mobile platform - Gr8 road to fame
 
apidays LIVE London 2021 - Consumer-first APIs in Open Banking by Chris Dudle...
apidays LIVE London 2021 - Consumer-first APIs in Open Banking by Chris Dudle...apidays LIVE London 2021 - Consumer-first APIs in Open Banking by Chris Dudle...
apidays LIVE London 2021 - Consumer-first APIs in Open Banking by Chris Dudle...
 
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...
apidays LIVE Helsinki - Implementing OpenAPI and GraphQL Services with gRPC b...
 
.NET executable requirements
.NET executable requirements.NET executable requirements
.NET executable requirements
 
apidays LIVE Paris 2021 - Learning the Language of HTTP for a Better Data Exp...
apidays LIVE Paris 2021 - Learning the Language of HTTP for a Better Data Exp...apidays LIVE Paris 2021 - Learning the Language of HTTP for a Better Data Exp...
apidays LIVE Paris 2021 - Learning the Language of HTTP for a Better Data Exp...
 
Agile Not Fragile
Agile Not FragileAgile Not Fragile
Agile Not Fragile
 
Whitebox Testing for Blackbox Testers: Simplifying API Testing
Whitebox Testing for Blackbox Testers: Simplifying API TestingWhitebox Testing for Blackbox Testers: Simplifying API Testing
Whitebox Testing for Blackbox Testers: Simplifying API Testing
 
apidays LIVE Paris - Augmenting a Legacy REST API with GraphQL by Clément Vil...
apidays LIVE Paris - Augmenting a Legacy REST API with GraphQL by Clément Vil...apidays LIVE Paris - Augmenting a Legacy REST API with GraphQL by Clément Vil...
apidays LIVE Paris - Augmenting a Legacy REST API with GraphQL by Clément Vil...
 
Successfully Implementing BDD in an Agile World
Successfully Implementing BDD in an Agile WorldSuccessfully Implementing BDD in an Agile World
Successfully Implementing BDD in an Agile World
 
Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?
 
Domain Driven Design
Domain Driven DesignDomain Driven Design
Domain Driven Design
 
Feature Flagging to Reduce Risk in Database Migrations
Feature Flagging to Reduce Risk in Database Migrations Feature Flagging to Reduce Risk in Database Migrations
Feature Flagging to Reduce Risk in Database Migrations
 
Cultivating Your Design Heuristics
Cultivating Your Design HeuristicsCultivating Your Design Heuristics
Cultivating Your Design Heuristics
 

Similar to Grab coffee and enjoy pre-show banter before top of hour briefing

Moving Targets: Harnessing Real-time Value from Data in Motion
Moving Targets: Harnessing Real-time Value from Data in Motion Moving Targets: Harnessing Real-time Value from Data in Motion
Moving Targets: Harnessing Real-time Value from Data in Motion Inside Analysis
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarHortonworks
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
A Tale of Two BI Standards
A Tale of Two BI StandardsA Tale of Two BI Standards
A Tale of Two BI StandardsArcadia Data
 
Big Data in Action – Real-World Solution Showcase
 Big Data in Action – Real-World Solution Showcase Big Data in Action – Real-World Solution Showcase
Big Data in Action – Real-World Solution ShowcaseInside Analysis
 
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...Data Con LA
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Unlocking the Power of the Data Lake
Unlocking the Power of the Data LakeUnlocking the Power of the Data Lake
Unlocking the Power of the Data LakeArcadia Data
 
Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsArcadia Data
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationInside Analysis
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudInside Analysis
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Zaloni
 
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesA Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesArcadia Data
 
One Large Data Lake, Hold the Hype
One Large Data Lake, Hold the HypeOne Large Data Lake, Hold the Hype
One Large Data Lake, Hold the HypeJared Winick
 
One Large Data Lake, Hold the Hype
One Large Data Lake, Hold the HypeOne Large Data Lake, Hold the Hype
One Large Data Lake, Hold the HypeKoverse, Inc.
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIDenodo
 
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESBig Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESMatt Stubbs
 

Similar to Grab coffee and enjoy pre-show banter before top of hour briefing (20)

Moving Targets: Harnessing Real-time Value from Data in Motion
Moving Targets: Harnessing Real-time Value from Data in Motion Moving Targets: Harnessing Real-time Value from Data in Motion
Moving Targets: Harnessing Real-time Value from Data in Motion
 
The Power of Data
The Power of DataThe Power of Data
The Power of Data
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
A Tale of Two BI Standards
A Tale of Two BI StandardsA Tale of Two BI Standards
A Tale of Two BI Standards
 
Big Data in Action – Real-World Solution Showcase
 Big Data in Action – Real-World Solution Showcase Big Data in Action – Real-World Solution Showcase
Big Data in Action – Real-World Solution Showcase
 
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Unlocking the Power of the Data Lake
Unlocking the Power of the Data LakeUnlocking the Power of the Data Lake
Unlocking the Power of the Data Lake
 
Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time Analytics
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the Cloud
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
 
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesA Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
 
One Large Data Lake, Hold the Hype
One Large Data Lake, Hold the HypeOne Large Data Lake, Hold the Hype
One Large Data Lake, Hold the Hype
 
One Large Data Lake, Hold the Hype
One Large Data Lake, Hold the HypeOne Large Data Lake, Hold the Hype
One Large Data Lake, Hold the Hype
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESBig Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
 

More from Inside Analysis

An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIInside Analysis
 
Agile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessAgile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessInside Analysis
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationInside Analysis
 
Fit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data LetdownFit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data LetdownInside Analysis
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security Inside Analysis
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeInside Analysis
 
Introducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataIntroducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataInside Analysis
 
The Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionThe Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionInside Analysis
 
Ahead of the Stream: How to Future-Proof Real-Time Analytics
Grab coffee and enjoy pre-show banter before top of hour briefing

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. The Briefing Room The Great Data Lakes: How to Approach a Big Data Implementation
  • 3. Twitter Tag: #briefr The Briefing Room Welcome Host: Eric Kavanagh eric.kavanagh@bloorgroup.com @eric_kavanagh
  • 4. Twitter Tag: #briefr The Briefing Room Mission: Reveal the essential characteristics of enterprise software, good and bad; provide a forum for detailed analysis of today's innovative technologies; give vendors a chance to explain their product to savvy analysts; allow audience members to pose serious questions... and get answers!
  • 5. Twitter Tag: #briefr The Briefing Room Topics April: BIG DATA May: CLOUD June: INNOVATORS
  • 6. Twitter Tag: #briefr The Briefing Room Will History Repeat Itself Again? Partitioning matters; file formats matter; metadata matters; access patterns matter. Hadoop may be schema-agnostic, but that doesn’t mean you shouldn’t carefully plan your implementation! “I’ve always found that plans are useless, but planning is indispensable.”
  • 7. Twitter Tag: #briefr The Briefing Room Analyst: Robin Bloor Robin Bloor is Chief Analyst at The Bloor Group robin.bloor@bloorgroup.com @robinbloor
  • 8. Twitter Tag: #briefr The Briefing Room Think Big, A Teradata Company: Last year Teradata acquired Think Big Analytics, Inc., a consulting and solutions company focused on big data. Think Big has expertise in implementing a variety of open source technologies, such as Hadoop, HBase, Cassandra, MongoDB and Storm, as well as experience with Hortonworks, Cloudera and MapR. Its consultants can assist with the planning, management and deployment of big data implementations.
  • 9. Twitter Tag: #briefr The Briefing Room Guest: Rick Stellwagen Rick Stellwagen is Data Lake Program Director at Think Big, A Teradata Company. Rick is responsible for defining and rolling out a Data Lake Solution portfolio, identifying and integrating internal and external best in class technologies. He is defining the deployment model, offerings, skills, career path and integrated capabilities required for data lake construction and rollout. He also works with product management, engineering, marketing and external partner alliances to define thought leadership positions and shape product plans both internally and externally.
  • 10. MAKING BIG DATA COME ALIVE. Data Lake Deployment Best Practices. Rick Stellwagen, Data Lake Program Director. April 7, 2015
  • 11. CONFIDENTIAL. What is a Data Lake? A centralized repository of raw data into which all data-producing streams flow and from which downstream facilities may draw. Data variety is the driving factor in building a Data Lake. (Diagram labels: Information Sources, Data Lake, Downstream Facilities.)
  • 12. Data Lake: Swamp or Reservoir? (Image labels: Swamp, Reservoir.)
  • 13. Primary Data Lake Use Cases. Corporate Data Sourcing, Repository, System of Record: govern who, what and when data is accessed or provisioned; track usage, resolve anomalies, visualize, optimize and clarify data lineage. Historical Data Offload: offload history of operational and analytical data platforms; centralized control of restore capabilities and leverage of deep data history. Data Discovery, Organization and Identification: gain ultimate flexibility in data use and access with schema on read; lightly conditioned, un-modeled, flexible modeling. ETL Offload: foundation for data integration (push staging to Hadoop); data quality and validation. Business Reporting: OLAP analysis sourced and processed directly from the data lake.
  • 14. Data Lake: Swamp or Reservoir? A Data Reservoir is a managed Data Lake that seeks to guarantee quality, access, provenance, and governance. An important extra guarantee that makes a Data Reservoir is the presence of metadata that lets people who are not subject matter experts easily find the various forms of stored data and see their entitlements to it. Schema metadata is always a given, but…
  • 15. Business-Ontology: How does this data relate to other data? How do we classify this data within the business?
  • 16. Business-Security: Who can read the data? Who owns the data? Who belongs to what group? Who can see a column? (Diagram labels: LDAP, Argus, Unix bitmask permissions.)
  • 17. Operational: Where did my data come from? Any environmental context about the landing zone, OS, where my data came from? What processes touched my data? When did my data get ingested? ... get transformed? ... get exported? Identity?
  • 18. Business-Index: What contents are in a file? What is the data serialization? Where can we find certain content in the file? What terms are in the contents? (Diagram labels: e-Discovery, Solr, a lot of NoSQL, file magic number.)
  • 19. Business-Schema: How does my data denormalize? How should I interpret my data? What are my column names? Are there any “important” dimensions? (Diagram labels: Metarepository, HCatalog.)
  • 20. Assembling the Reservoir. (Diagram labels: Information Sources; Evaluate Source Data; Prepare Data for Ingest; Prepare Source Metadata; Ingest; Collect & Manage Metadata; Profile - Structure; Sequence; Compress; Automate; Protect; Data Lake; Data Hub; Downstream Facilities; Generate Reports; Discovery; Signals; Perimeter-Authentication-Authorization.)
  • 21. Enterprise Data Lake Architecture. Each Region has different “areas”; three areas for three types of usage: Data Treatment, Data Reservoir, Data Lab. Key: validate that ingestion captures metadata. (Diagram labels. Regional Data Treatment Facility: Collection Pools, Ingest Zone, SOR Zone, Export Zone, Op Meta Data Index, Orchestration VM, Orchestration DB, Monitoring, Master Compute Cluster. Regional Reservoir: Lake, Master Data, <LOB> Zone, Export Zone, Biz Meta Data Index, Orchestration VM, Orchestration DB, Monitoring, Master Compute Cluster. Regional Lab: Lake, Master Data, <Insight A>, <Insight B>, VCC, VCC, X, Y, Virtual Compute Cluster. Processes: op md index, HAR Compactor, Ingestion/SOR Reconciliation, de-dup, key generation. Processes: correlate, co-locate, cleanse, de-ident. Flow labels: continuous, bulk, metadata capture, de-identification.)
  • 22. Data Treatment. Used by Operations only; restricted; non-business process; lowest-common-denominator data serialization; the entry point for ALL your data. Make sure you capture metadata, or you risk a swamp downstream! (Diagram labels. Regional Data Treatment Facility: Collection Pools, Ingest Zone, SOR Zone, Export Zone, Op Meta Data Index, Monitoring, Orchestration DB, Orchestration VM, Master Compute Cluster. Flow labels: continuous, bulk, metadata capture.)
  • 23. Data Reservoir. Used by Business AND Operations; marting!; business processes; DSS; no ad hoc; business restricted; first introduction of SME. Don’t let in un-vetted data! (Diagram labels. Regional Reservoir: Lake, Master Data, <LOB> Zone, Export Zone, Biz Meta Data Index, Monitoring, Orchestration DB, Orchestration VM, Master Compute Cluster, MPP Fast Analytics. Processes: correlate, co-locate, cleanse, de-ident.)
  • 24. Data Lab. Used by business primarily; “un-safe” data; ephemeral (think virtualization); highly experimental; new technologies; ad hoc. (Diagram labels. Regional Lab: Lake, Master Data, <Insight A>, <Insight B>, VCC, VCC, X, Y, Virtual Compute Cluster.)
  • 25. Data Lake Best Practices. Know where you are headed: build on Roadmap or Optimizer Planning. Quickly put into practice references for company-wide Data Lake ingest. Establish data lineage and governance tracking with metadata services. Establish standards and practices to scale out your data ingest. Develop standards for doing profiling and discovery. Build out a pipeline framework for data transformations. Develop a Security Plan (perimeter, authentication & authorization). Develop an archive and information security approach. Plan out next steps and approach for discovery and reporting.
  • 26. Twitter Tag: #briefr The Briefing Room Perceptions & Questions Analyst: Robin Bloor
  • 28. There Has Been a Clear Shift: Analytics & BI were previously EDW-centric. They are becoming Data Lake-centric.
  • 29. Hadoop vs Data Mgmt Engine. Hadoop: inexpensive (?); any data; may have metadata; poor performance; weak scheduling; weak data mgmt; security?; data lake. DBMS/EDW: expensive; prepared data; will have metadata; optimized performance; optimized scheduling; good data mgmt; secure; data workhorse.
  • 30. Big Data Architecture - 1 Think Logical, Implement Physical
  • 33. Straws in the Wind: Operational Concerns. Multiple local instances of Hadoop; weak data placement; metadata chaos; lack of tuning capability; security (expense); user self-service becoming a file system nightmare.
  • 34. The Need for Best Practices. This is clear: Data Lake is a new idea.
  • 35. Is a data lake really just a multiplicity of data marts growing wild? Aside from performance-critical workloads, what should Hadoop not be used for? Do you have any specific recommendations for metadata management in a data lake? Is there a need for enforced provenance & lineage?
  • 36. Security question: Encryption? Where does streaming fit into the picture?
  • 37. Twitter Tag: #briefr The Briefing Room
  • 38. Twitter Tag: #briefr The Briefing Room Upcoming Topics www.insideanalysis.com April: BIG DATA May: CLOUD June: INNOVATORS
  • 39. Twitter Tag: #briefr The Briefing Room THANK YOU for your ATTENTION! Some images provided courtesy of Wikimedia Commons
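
Slides 17 and 22 stress capturing operational metadata at the point of ingest (where the data came from, when it landed, which process touched it, and what kind of file it is). The deck does not show how, so the following is only a minimal sketch under stated assumptions: the function name `capture_operational_metadata`, the record fields, and the use of a SHA-256 checksum are all hypothetical choices for illustration, not part of the Think Big approach.

```python
import hashlib
import socket
from datetime import datetime, timezone
from pathlib import Path


def capture_operational_metadata(path, source_system, process_name):
    """Build one operational-metadata record for a file entering the ingest zone.

    Hypothetical helper: the field names are illustrative, not from the deck.
    """
    p = Path(path)
    data = p.read_bytes()  # acceptable for a sketch; stream large files in practice
    return {
        "file_name": p.name,
        "source_system": source_system,            # where did my data come from?
        "landing_host": socket.gethostname(),      # environmental context of the landing zone
        "ingested_at": datetime.now(timezone.utc).isoformat(),  # when was it ingested?
        "touched_by": [process_name],              # which processes touched it?
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # identity of the content
        "magic_number": data[:4].hex(),            # first bytes hint at the serialization
    }
```

A record like this could be written to whatever metadata index the ingest zone uses; later steps (transform, export) would append their own process names and timestamps rather than overwrite the ingest entry.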