Building a modern data architecture
March 31, 2016
Ben Sharma | CEO and Founder
ben@zaloni.com
•  Award-winning provider of enterprise data lake management solutions:
Integrated data lake management platform
Self-service data preparation
•  Data Lake Design and Implementation Services
•  Data Science Professional Services
Zaloni Proprietary
Delivering on the business of big data
Funded by top-tier technology investors:
Data lakes will be central to the modern data architecture
Agility Insight Scalability
•  Store all types of data: structured and unstructured
•  Store raw data in its original form for an extended period of time
•  Use various tools to correlate, enrich and query the data for insights
•  Provides democratized access via a single unified view
across the Enterprise
The promise of a data lake: All data is welcome….
Data architecture modernization
Traditional: Sources, ETL, EDW; consumed through Data Discovery, Analytics, BI
New: Various Sources, Streaming, Unstructured Data; Data Lake with Derived (Transformed) data and Discovery Sandbox, alongside EDW; consumed through Data Discovery, Analytics, BI, Data Science
Data lake challenges and complications
•  Ingestion
•  Lack of Visibility
•  Privacy and Compliance
•  Quality Issues
•  Reliance on IT
•  Reusability
•  Rate of Change
•  Skills Gap
•  Complexity
Building: Managing: Delivering:
Engage the business
• Discover
• Enrich
• Provision
Govern the data in the lake
• Cleanse
• Secure
• Operationalize
Enable the data lake
• Ingest
• Organize
• Catalog
Data lake reference architecture
Consumption Zone
Source System
File Data
DB Data
ETL Extracts
Streaming
Transient Loading Zone
Raw Data
Refined Data
Trusted Data
Discovery Sandbox
Original unaltered data attributes
Tokenized Data
APIs
Reference Data, Master Data
Data Wrangling
Data Discovery
Exploratory Analytics
Metadata, Data Quality, Data Catalog, Security
Data Lake
Integrate to common format
Data Validation
Data Cleansing
Aggregations
OLTP or ODS
Enterprise Data Warehouse
Logs (or other unstructured data)
Cloud Services
Business Analysts
Researchers
Data Scientists
Data lake management platform
Unified Data Management
Managed Ingestion
Data Reliability
Data Visibility
Data Security and Privacy
Integrated Data Lake Management
•  Ability to ingest vast amounts of data
•  Ability to handle a wide variety of formats
(streaming, files, custom)
•  Ability to handle a wide variety of sources
•  Capture operational metadata implicitly
as new data arrives
•  Build in repeatability through automation to pick up
incoming data and apply pre-defined processing
First things first….managed ingestion
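As a rough illustration of capturing operational metadata implicitly as data arrives, the sketch below copies a file into a raw area unchanged and records its source, target, size, record count, checksum and arrival time. The function name `ingest_file`, the metadata keys and the one-record-per-line assumption are illustrative only and are not taken from the slides or from any Zaloni product.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def ingest_file(source: Path, raw_zone: Path) -> dict:
    """Copy one file into the raw zone unchanged and return operational metadata.

    Hypothetical helper: names and metadata keys are assumptions for this sketch.
    """
    raw_zone.mkdir(parents=True, exist_ok=True)
    target = raw_zone / source.name
    shutil.copy2(source, target)  # raw data is stored in its original form

    data = target.read_bytes()
    return {
        "source": str(source),
        "target": str(target),
        "size_bytes": len(data),
        "record_count": len(data.splitlines()),  # assumes one record per line
        "sha256": hashlib.sha256(data).hexdigest(),  # simple file-level watermark
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
```

Calling the same function for every arriving file is one simple way to get the repeatability the slide asks for: the metadata is produced as a side effect of ingestion rather than by a separate manual step.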
Various Sources
Streaming
Unstructured Data
•  Reduced time to insight for analytics
•  File and record level watermarking provides data lineage
Capture metadata to improve data visibility and reliability
Type of Metadata | Description | Example
Technical | Captures the form and structure of each data set | Type of data (text, JSON, Avro), structure of the data (fields and their types)
Operational | Captures lineage, quality, profile and provenance of the data | Source and target locations of data, size, number of records, lineage
Business | Captures what it all means to the user | Business names, descriptions, tags, quality and masking rules
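One possible way to picture the three metadata types is as a single catalog entry with a technical, an operational and a business part. The sketch below is only an assumed data model: the class names, field names and the `find_by_tag` helper are hypothetical and do not describe any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    data_format: str              # e.g. "text", "JSON", "Avro"
    fields: dict[str, str]        # field name -> field type


@dataclass
class OperationalMetadata:
    source: str                   # where the data came from
    target: str                   # where it was stored
    size_bytes: int
    record_count: int
    lineage: list[str] = field(default_factory=list)


@dataclass
class BusinessMetadata:
    business_name: str
    description: str
    tags: list[str] = field(default_factory=list)
    masking_rules: dict[str, str] = field(default_factory=dict)


@dataclass
class CatalogEntry:
    """Hypothetical catalog record combining the three metadata types."""
    technical: TechnicalMetadata
    operational: OperationalMetadata
    business: BusinessMetadata


def find_by_tag(catalog: list[CatalogEntry], tag: str) -> list[CatalogEntry]:
    """Return the entries whose business tags include the given tag."""
    return [entry for entry in catalog if tag in entry.business.tags]
```

Keeping the business part next to the technical and operational parts is what lets a non-technical user search by tag or business name and still land on the physical data set.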
Diagram derived from Gartner report on Self Service Data Preparation
•  Interactive data preparation to address errors, corrupted formats, duplicates
•  Data enrichment to go from raw to refined
•  Self service to prepare data without IT request/SQL knowledge
Data ready: Data preparation required for actionable data
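A minimal sketch of the raw-to-refined step described above, assuming records arrive as dictionaries and that an `email` field identifies duplicates. The function name `prepare` and the choice of key field are assumptions made for illustration; an interactive tool would expose the same operations without code.

```python
def prepare(records: list[dict], key_field: str = "email") -> tuple[list[dict], list[dict]]:
    """Trim whitespace, reject records missing the key, and drop duplicates.

    Returns (refined, rejected). Hypothetical helper for illustration only.
    """
    seen: set[str] = set()
    refined: list[dict] = []
    rejected: list[dict] = []
    for record in records:
        # Normalise string values so " Ann " and "Ann" compare equal.
        clean = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        key = str(clean.get(key_field) or "").lower()
        if not key:
            rejected.append(record)   # keep the original for later review
            continue
        if key in seen:
            continue                  # duplicate of a record already kept
        seen.add(key)
        clean[key_field] = key
        refined.append(clean)
    return refined, rejected
```

Returning the rejected records separately, rather than silently dropping them, keeps the errors visible to whoever owns the source data.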
Orchestrate and automate workflows
Transform
Refined Data
Explore
BI Reports
Enterprise Data Integrations
Data Science
Data Discovery
Analytics
Raw Data
Automation
Reusable Transformations
Data Preparation
•  Data lakes enable multiple groups to share access
to centrally stored data
•  Differing permissions require enhanced data security
§  Mask or tokenize data before it is published in the lake for consumption
§  Policy-based security
•  Metadata management enables audit and traceability
•  End result: more open and democratized access to
data in the lake for those with permission
Protect sensitive data
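To make the mask-or-tokenize idea concrete, here is a small sketch in which a policy maps field names to an action that is applied before a record is published. It uses a keyed HMAC-SHA256 digest as a stand-in for tokenization; the function names, the policy format and the 16-character token length are assumptions for this example, and a production system would typically use a managed tokenization or key-management service instead.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Replace a value with a deterministic keyed digest (illustrative tokenization)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters; short values are fully hidden."""
    if len(value) <= visible:
        return "*" * len(value)
    keep_from = len(value) - visible
    return "*" * keep_from + value[keep_from:]


def protect_record(record: dict, policy: dict, key: bytes) -> dict:
    """Apply the policy ("tokenize" or "mask" per field); other fields pass through."""
    actions = {"tokenize": lambda v: tokenize(v, key), "mask": mask}
    return {
        name: actions[policy[name]](str(value)) if name in policy else value
        for name, value in record.items()
    }
```

Because the token is deterministic for a given key, the same input always maps to the same token, so tokenized columns can still be joined across data sets without exposing the original value.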
Discover, Enrich, Provision
Self Service Data Preparation for Analytics: Catalog, Wrangling, Collaboration
•  See what data is available across your enterprise
•  Blend data in the lake without a costly IT project
•  Perform interactive data-driven transformations
•  Collaborate and share data assets and transformations with peers
EXPLORE PREPARE OPERATIONALIZE
Catalog with KPIs
•  Rapid increase of big data in the cloud
•  Leverage cloud platforms as complementary to on-premises platforms
•  Support sensitive data on premises and external data in the cloud (e.g. client data, machine-generated)
Key data challenges for hybrid environments:
“Ground to Cloud” hybrid architectures
VISIBILITY: Need enterprise-wide data catalog (logical data lake)
GOVERNANCE: Need consistent data governance requirements for hybrid platforms
INGEST: Manage data ingestion so you know what is in your Hadoop Data Lake
ORGANIZE: Define and capture metadata for ease of searching and browsing
ENRICH: Orchestrate and manage the data preparation process
ENGAGE: Data visibility and self-service data preparation
Manage the complete data pipeline
Network Data Lake architecture
BI Tools
Network Data Lake
Custom Apps
Data Warehouse
Custom Applications:
•  Subscriber Usage
•  Network Usage Exploration & Ad-hoc Analytics
Data Lake
Manage Ingestion Manage Metadata Manage, Monitor, Schedule
Operations and Metadata Store
Data Quality & Rules Engine
Transformation Engine
Workflow Executor
Enterprise Data Warehouse
Network Data:
•  CDR
•  DPI
•  IPFIX
•  SNMP
•  RADIUS
Enterprise Data:
•  CRM
•  Billing
•  Inventory
Managed data lake for healthcare payers
Data Lake Management
Edge Node
Data Sources
Relational
Streaming
Files
Data Lake
Configure Ingestion Administer Metadata Manage, Monitor, Schedule
Operations and Metadata Store
Data Quality & Rules Engine
Transformation Engine
Workflow Executor
Analytical Applications
Enterprise Data Warehouse
Consumers
Data Lake
Data Sets:
•  Claims
•  EMR
•  Lab/Pathology
•  Pharmacy
•  Member
•  Social
•  Enterprise Data
Applications:
•  HEDIS Reporting
•  Bundle Payments
•  Medical Benefits Management
•  Scorecards
•  Enterprise Reports
Batch Ingestion
Streaming Ingestion
Change Data Capture
Data Lake for BCBS 239 Compliance (RDARR)
Register/update metadata
RDBMS
Mainframes
Flat files
Binary files
Source Systems
Metadata repositories
Metadata Management solution
Extract/read metadata
Data Ingestion
Data Quality and Validation
Layout Standardization
Operational Metadata Generation
Data at Rest
Data Acquisition Automation
•  Automated Data Acquisition Framework providing timeliness of data
•  Capture Metadata in all phases: Ingestion, Transformation
•  Integration with Enterprise Metadata Management
•  Integrated Data Quality Analysis
Getting Started
Roadmap
Prototype
Analytics Strategy
Business drivers AND Business Questions:
Where is fraud occurring?
How to optimize inventory?
Data
Use Cases
Platform
Subject areas
Source system
Capabilities, Process
Ingest, Organize, Enrich, Explore
1 Questions, 2 Inputs, 3 Outcomes
Stop by booth #1335 and ask for a copy of our new book and a free t-shirt!
DON’T GO IN THE DATA LAKE WITHOUT US

Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World San Jose 2016

Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Dataconomy Media
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma Presentation
Zaloni
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
Denodo
 
Data Lakes - The Key to a Scalable Data Architecture
Data Lakes - The Key to a Scalable Data ArchitectureData Lakes - The Key to a Scalable Data Architecture
Data Lakes - The Key to a Scalable Data Architecture
Zaloni
 
Operationalizing your Data Lake: Get Ready for Advanced Analytics
Operationalizing your Data Lake: Get Ready for Advanced AnalyticsOperationalizing your Data Lake: Get Ready for Advanced Analytics
Operationalizing your Data Lake: Get Ready for Advanced Analytics
IDEAS - Int'l Data Engineering and Science Association
 
Next Gen Analytics Going Beyond Data Warehouse
Next Gen Analytics Going Beyond Data WarehouseNext Gen Analytics Going Beyond Data Warehouse
Next Gen Analytics Going Beyond Data Warehouse
Denodo
 
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...
Denodo
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
Denodo
 
The Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data ImplementationThe Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data Implementation
Inside Analysis
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
Torsten Steinbach
 
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...
Denodo
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
Denodo
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
datastack
 
How to Quickly and Easily Draw Value from Big Data Sources_Q3 symposia(Moa)
How to Quickly and Easily Draw Value  from Big Data Sources_Q3 symposia(Moa)How to Quickly and Easily Draw Value  from Big Data Sources_Q3 symposia(Moa)
How to Quickly and Easily Draw Value from Big Data Sources_Q3 symposia(Moa)Moacyr Passador
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
Nathan Bijnens
 
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Denodo
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
DATAVERSITY
 




Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World San Jose 2016

  • 1. Building a modern data architecture March 31, 2016 Ben Sharma | CEO and Founder ben@zaloni.com
  • 2. Delivering on the business of big data • Award-winning provider of enterprise data lake management solutions: integrated data lake management platform, self-service data preparation • Data Lake Design and Implementation Services • Data Science Professional Services • Funded by top-tier technology investors. Zaloni Proprietary
  • 3. Data lakes will be central to the modern data architecture • Agility • Insight • Scalability
  • 4. The promise of a data lake: all data is welcome • Stores all types of data, structured and unstructured • Stores raw data in its original form for an extended period of time • Uses various tools to correlate, enrich and query the data for insights • Provides democratized access via a single unified view across the enterprise
  • 5. Data architecture modernization: Traditional vs. New. Diagram labels: Data Lake; Sources; ETL; EDW; Derived (Transformed); Discovery Sandbox; EDW; Streaming; Unstructured Data; Various Sources; Data Discovery; Analytics; BI; Data Science; Data Discovery; Analytics; BI
  • 6. Data lake challenges and complications (Building, Managing, Delivering) • Ingestion • Lack of Visibility • Privacy and Compliance • Quality Issues • Reliance on IT • Reusability • Rate of Change • Skills Gap • Complexity. Enable the data lake: Ingest, Organize, Catalog. Govern the data in the lake: Cleanse, Secure, Operationalize. Engage the business: Discover, Enrich, Provision.
  • 7. Data lake reference architecture. Diagram labels: Consumption Zone; Source System; File Data; DB Data; ETL Extracts; Streaming; Transient Loading Zone; Raw Data; Refined Data; Trusted Data; Discovery Sandbox; Original unaltered data attributes; Tokenized Data; APIs; Reference Data; Master Data; Data Wrangling; Data Discovery; Exploratory Analytics; Metadata; Data Quality; Data Catalog; Security; Data Lake; Integrate to common format; Data Validation; Data Cleansing; Aggregations; OLTP or ODS; Enterprise Data Warehouse; Logs (or other unstructured data); Cloud Services; Business Analysts; Researchers; Data Scientists
  • 8. Data lake management platform: Integrated Data Lake Management • Unified Data Management • Managed Ingestion • Data Reliability • Data Visibility • Data Security and Privacy
  • 9. First things first: managed ingestion • Ability to ingest vast amounts of data • Ability to handle a wide variety of formats (streaming, files, custom) • Ability to handle a wide variety of sources • Capture operational metadata implicitly as new data arrives • Build in repeatability through automation to pick up incoming data and apply pre-defined processing. Diagram labels: Various Sources; Streaming; Unstructured Data
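Slide 9's idea of capturing operational metadata implicitly as data arrives can be illustrated with a minimal sketch. It assumes files land in a local directory and that one line equals one record; the function names `ingest_file` and `ingest_all` and the metadata keys are hypothetical and are not part of any Zaloni product API.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest_file(source: Path, raw_zone: Path) -> dict:
    """Copy one incoming file into the raw zone and return operational metadata."""
    raw_zone.mkdir(parents=True, exist_ok=True)
    target = raw_zone / source.name
    shutil.copy2(source, target)  # raw data is stored unaltered

    data = target.read_bytes()
    return {
        "source": str(source),
        "target": str(target),
        "size_bytes": len(data),
        "record_count": len(data.splitlines()),  # assumption: one record per line
        "sha256": hashlib.sha256(data).hexdigest(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def ingest_all(landing: Path, raw_zone: Path) -> list:
    """Pick up every file currently in the landing directory, in name order."""
    return [ingest_file(p, raw_zone) for p in sorted(landing.iterdir()) if p.is_file()]


# Demo with a throwaway directory and a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    landing = Path(tmp) / "landing"
    landing.mkdir()
    (landing / "claims.csv").write_bytes(b"id,amount\n1,10.0\n2,12.5\n")
    records = ingest_all(landing, Path(tmp) / "raw")

print(records[0]["record_count"], records[0]["size_bytes"])
```

Because the metadata is produced by the same call that moves the file, no ingested file can exist without a matching record, which is the point of capturing it implicitly rather than as a separate manual step.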
  • 10. Capture metadata to improve data visibility and reliability • Reduced time to insight for analytics • File and record level watermarking provides data lineage
      Type of Metadata | Description | Example
      Technical | Captures the form and structure of each data set | Type of data (text, JSON, Avro), structure of the data (fields and their types)
      Operational | Captures lineage, quality, profile and provenance of the data | Source and target locations of data, size, number of records, lineage
      Business | Captures what it all means to the user | Business names, descriptions, tags, quality and masking rules
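The three metadata types on slide 10 could be modeled as one catalog record per data set. The sketch below is illustrative only: the class names, field names, and the `infer_fields` helper are assumptions made for this example, and the sample records are invented.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    data_format: str   # e.g. "text", "JSON", "Avro"
    fields: dict       # field name -> type name


@dataclass
class OperationalMetadata:
    source: str
    target: str
    size_bytes: int
    record_count: int


@dataclass
class BusinessMetadata:
    business_name: str
    description: str = ""
    tags: list = field(default_factory=list)
    masking_rules: dict = field(default_factory=dict)


@dataclass
class CatalogEntry:
    technical: TechnicalMetadata
    operational: OperationalMetadata
    business: BusinessMetadata


def infer_fields(records: list) -> dict:
    """Infer field names and Python type names from sample records (first value seen wins)."""
    fields = {}
    for record in records:
        for name, value in record.items():
            fields.setdefault(name, type(value).__name__)
    return fields


# Invented sample data for the demo.
sample = [{"claim_id": 1, "amount": 12.5}, {"claim_id": 2, "amount": 3.0, "note": "x"}]
entry = CatalogEntry(
    technical=TechnicalMetadata("JSON", infer_fields(sample)),
    operational=OperationalMetadata("landing/claims.json", "raw/claims.json", 64, len(sample)),
    business=BusinessMetadata("Claims", tags=["healthcare"], masking_rules={"note": "mask"}),
)
print(entry.technical.fields)
```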
  • 11. Data ready: data preparation required for actionable data • Interactive data preparation to address errors, corrupted formats, duplicates • Data enrichment to go from raw to refined • Self-service data preparation without an IT request or SQL knowledge. Diagram (derived from Gartner report on Self Service Data Preparation) labels: Orchestrate and automate workflows; Transform; Refined Data; Explore; BI Reports; Enterprise Data Integrations; Data Science; Data Discovery; Analytics; Raw Data; Automation; Reusable Transformations; Data Preparation
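As a rough sketch of the raw-to-refined step on slide 11, the function below trims whitespace, rejects rows with missing required fields, and drops exact duplicates. It assumes flat records with hashable values; `refine` and the sample rows are hypothetical and only stand in for what an interactive preparation tool would do.

```python
def refine(rows: list, required: tuple) -> tuple:
    """Return (refined_rows, rejected_rows) from a list of flat dict records."""
    refined, rejected, seen = [], [], set()
    for row in rows:
        # Normalize string values so " Ann " and "Ann" compare equal.
        cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

        # Reject rows missing any required field instead of silently dropping them.
        if any(cleaned.get(name) in (None, "") for name in required):
            rejected.append(row)
            continue

        # Deduplicate on the full cleaned record (keys are unique, so sorting is safe).
        key = tuple(sorted(cleaned.items()))
        if key in seen:
            continue
        seen.add(key)
        refined.append(cleaned)
    return refined, rejected


raw_rows = [
    {"id": "1", "name": " Ann "},
    {"id": "1", "name": "Ann"},   # duplicate once whitespace is trimmed
    {"id": "", "name": "Bob"},    # missing required id
]
refined_rows, rejected_rows = refine(raw_rows, required=("id",))
print(refined_rows, rejected_rows)
```

Keeping the rejected rows rather than discarding them leaves a trail for data quality review, in line with the deck's emphasis on visibility.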
  • 12. Protect sensitive data • Data lakes enable multiple groups to share access to centrally stored data • Differing permissions require enhanced data security: mask or tokenize data before it is published in the lake for consumption; apply policy-based security • Metadata management enables audit and traceability • End result: more open and democratized access to data in the lake for those with permission
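Slide 12's "mask or tokenize before publishing" step might look like the following sketch. It uses a keyed hash (HMAC-SHA256) as a stand-in for tokenization, which is an assumption of this example: it yields a stable, non-reversible token, whereas a vault-based tokenization service would allow authorized detokenization. The policy format, function names, and sample record are all hypothetical.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Replace a value with a deterministic keyed hash (same input, same token)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def protect(record: dict, policy: dict, key: bytes) -> dict:
    """Apply a per-field policy ("tokenize", "mask", or default "keep") to one record."""
    out = {}
    for name, value in record.items():
        action = policy.get(name, "keep")
        if action == "tokenize":
            out[name] = tokenize(str(value), key)
        elif action == "mask":
            out[name] = mask(str(value))
        else:
            out[name] = value
    return out


key = b"demo-key-not-for-production"  # assumption: a real key would come from a secrets store
policy = {"member_id": "tokenize", "ssn": "mask"}
protected = protect({"member_id": "M-1001", "ssn": "123456789", "plan": "gold"}, policy, key)
print(protected["ssn"], protected["plan"])
```

Deterministic tokens keep joins on `member_id` possible across data sets without exposing the original identifier.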
  • 13. Discover, Enrich, Provision. Self Service Data Preparation for Analytics: Catalog, Wrangling, Collaboration • See what data is available across your enterprise • Blend data in the lake without a costly IT project • Perform interactive data-driven transformations • Collaborate and share data assets and transformations with peers. EXPLORE, PREPARE, OPERATIONALIZE
  • 14. Catalog with KPIs (Zaloni Confidential and Proprietary)
  • 15. “Ground to Cloud” hybrid architectures • Seeing a rapid increase of big data in the cloud • Leverage cloud platforms as complementary to on-premises • Support sensitive data on premises and external data in the cloud (e.g. client data, machine-generated). Key data challenges for hybrid environments: VISIBILITY: need an enterprise-wide data catalog (logical data lake). GOVERNANCE: need consistent data governance requirements for hybrid platforms
  • 16. Manage the complete data pipeline • INGEST: Manage data ingestion so you know what is in your Hadoop Data Lake • ORGANIZE: Define and capture metadata for ease of searching and browsing • ENRICH: Orchestrate and manage the data preparation process • ENGAGE: Data visibility and self-service data preparation
  • 17. Network Data Lake architecture. Diagram labels: BI Tools; Network Data Lake; Custom Apps; Data Warehouse; Custom Applications: Subscriber Usage, Network Usage; Exploration & Ad-hoc Analytics; Data Lake; Manage Ingestion; Manage Metadata; Manage, Monitor, Schedule; Operations and Metadata Store; Data Quality & Rules Engine; Transformation Engine; Workflow Executor; Enterprise Data Warehouse; Network Data: CDR, DPI, IPFIX, SNMP, RADIUS; Enterprise Data: CRM, Billing, Inventory
  • 18. Managed data lake for healthcare payers. Diagram labels: Data Lake Management; Edge Node; Data Sources; Relational; Streaming; Files; Data Lake; Configure Ingestion; Administer Metadata; Manage, Monitor, Schedule; Operations and Metadata Store; Data Quality & Rules Engine; Transformation Engine; Workflow Executor; Analytical Applications; Enterprise Data Warehouse; Consumers; Data Lake • Claims • EMR • Lab/Pathology • Pharmacy • Member • Social • Enterprise Data Applications: • HEDIS Reporting • Bundle Payments • Medical Benefits Management • Scorecards • Enterprise Reports; Batch Ingestion; Streaming Ingestion; Change Data Capture; Data Sets
  • 19. Data Lake for BCBS239 Compliance (RDARR) • Automated Data Acquisition Framework providing timeliness of data • Capture metadata in all phases: ingestion, transformation • Integration with Enterprise Metadata Management • Integrated Data Quality Analysis. Diagram labels: Register/update metadata; RDBMS; Mainframes; Flat files; Binary files; Source Systems; Metadata repositories; Metadata Management solution; Extract/read metadata; Data Ingestion; Data Quality and Validation; Layout Standardization; Operational Metadata Generation; Data at Rest; Data Acquisition Automation
  • 20. Getting Started: 1 Questions, 2 Inputs, 3 Outcomes. Diagram labels: Strategy; Business drivers AND Business Questions: Where is fraud occurring? How to optimize inventory?; Data; Use Cases; Platform; Subject areas; Source system; Capabilities, Process; Ingest, Organize, Enrich, Explore; Roadmap; Prototype; Analytics Strategy
  • 21. Stop by booth #1335 and ask for a copy of our new book and a free t-shirt! DON’T GO IN THE DATA LAKE WITHOUT US