© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Damon Feldman, Ph.D
@damon.feldman
http://www.marklogic.com/blog/author/dfeldman/
Data Lake, Virtual Database, or Data Hub
How to Choose?
SLIDE: 2
Who am I?
• Solutions Director at MarkLogic
• About 8 years in the Big Data and Data Integration space
• Previously in the OOP and JEE worlds
• Focus on Data Hub and Customer or Person-360° systems
SLIDE: 3
But Why?
• Data Silos
• Usually work well for a single, operational purpose
• Turn any cross-line-of-business question into a data integration effort
SLIDE: 4
How about EDW?
• For a while, Enterprise Data Warehouses were the go-to solution for silos
• One master schema to rule them all
• Data Modeler’s Dream!
• Implementor’s Nightmare!
• BMUF
• Rigid and tightly coupled
SLIDE: 5
Data Incompatibilities
• Three forms of data incompatibilities
• Naming is the simplest
• firstName vs. GIVEN_NAME
• Structural is somewhat harder
• Semantic differences are the most challenging
• Status: {in cart, ordered, shipped, delivered}
• Status: {selected, paid, complete}
PERSON
- PERS_ID
- DOB
- FNAME
- LNAME
PERS_ADDR_REL
- PERS_ID
- ADDR_ID
ADDRESS
- ADDR_ID
- LINE1
- CITY
- ZIP
- TYPE: {US, UK}
PERSON
- PERS_ID
- DOB
- FNAME
- LNAME
- ADDR_L1
- ADDR_CITY
- ADDR_ZIP
- ADDR_MAILING_L1
- ADDR_MAILING_ZIP
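The three kinds of incompatibility can be sketched in a few lines of Python. This is only an illustration and not part of the original deck: the function names, the common field names, and especially the status mapping are assumptions made for the example, since the slide does not say how the two status vocabularies line up.

```python
# Naming: two sources use different names for the same field.
NAME_MAP = {"firstName": "given_name", "GIVEN_NAME": "given_name"}

def fix_naming(record):
    """Rename known fields to the common name; leave other fields alone."""
    return {NAME_MAP.get(key, key): value for key, value in record.items()}

# Structural: one source keeps addresses in separate tables joined through
# PERS_ADDR_REL, the other stores address columns on the PERSON row itself.
def addresses_from_normalized(person, rels, addresses):
    addr_ids = [r["ADDR_ID"] for r in rels if r["PERS_ID"] == person["PERS_ID"]]
    return [{"line1": a["LINE1"], "city": a["CITY"], "zip": a["ZIP"]}
            for a in addresses if a["ADDR_ID"] in addr_ids]

def addresses_from_flat(person):
    return [{"line1": person["ADDR_L1"], "city": person["ADDR_CITY"],
             "zip": person["ADDR_ZIP"]}]

# Semantic: the two status vocabularies do not line up one-to-one.
# ASSUMPTION: this mapping is invented purely for illustration.
STATUS_TO_COMMON = {
    "in cart": "open", "selected": "open",
    "ordered": "committed", "paid": "committed", "shipped": "committed",
    "delivered": "done", "complete": "done",
}
```

Note that the semantic mapping loses information ("ordered" and "shipped" collapse into one value), which is part of why semantic differences are the hardest of the three.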
SLIDE: 6
Three New Approaches
• Data Lakes
• Put it all somewhere else
• Virtual Databases (AKA Federated Databases)
• Pretend it is somewhere else
• Data Hubs
• Put it all somewhere else, Harmonize, and Index it for operational use
And a Framework to understand and choose approaches
SLIDE: 7
A Use Case
Consider a customer churn use case
• Review high-value customers
• ... who are at-risk customers
• ... particularly if they are dropping or cancelling services
• Proactively address their trouble tickets or complaints
[Figure: three data sources for the use case: Customer Lifetime Value, Customer Support, and Order/Change/Drop, with sample customer comments such as “Need more …”, “please upgrade …”, “Abysmal…”, “dissatisfied …”]
SLIDE: 8
Data Lakes
• Copy the data to a new infrastructure
• Typically Hadoop, but perhaps MarkLogic or other NoSQL
• Difficult with SQL because of many sources → Load “as-is”
• Operational Separation
[Figure: the Support, CLV, and Orders sources are copied into the DATA LAKE, where processing feeds BI/Analytics. Data is Moved to one place, but still in varied structures.]
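A minimal Python sketch of what "moved to one place, but still in varied structures" means in practice. The source names come from the slide; the record fields (`CUST`, `customerId`, and so on) are hypothetical.

```python
# The "lake" is just a landing area: every record is stored exactly as received.
lake = []

def load_as_is(source, record):
    lake.append({"source": source, "raw": record})

load_as_is("orders", {"CUST": "C1", "STATUS": "shipped"})
load_as_is("support", {"customerId": "C1", "ticket": "Network upgrade needed"})

def records_for_customer(customer_id):
    """A downstream job has to know every source's structure by itself."""
    found = []
    for item in lake:
        raw = item["raw"]
        if item["source"] == "orders" and raw["CUST"] == customer_id:
            found.append(item)
        elif item["source"] == "support" and raw["customerId"] == customer_id:
            found.append(item)
    return found
```

Loading is trivial, but the per-source branching in `records_for_customer` has to be repeated in every job that reads the lake.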
SLIDE: 9
Virtual Database
• Query everything in real time
• Transparent to the caller
• True real-time
• Data is not Moved or Harmonized (except in memory during processing)
[Figure: the Retain/intervene, Churn Analysis, and Reporting consumers query the virtual database, which performs query conversion and data harmonization through a Query Transform in front of each of the Support, CLV, and Orders sources. Data remains in the source systems.]
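The query-conversion and in-memory harmonization steps can be sketched as follows. This is a toy model under stated assumptions: the two lists stand in for live source systems, and all field names and function names are hypothetical.

```python
# Stand-ins for live source systems; in a real federation these would be
# remote databases queried at request time.
SUPPORT = [{"customerId": "C1", "ticket": "Network upgrade needed"}]
ORDERS = [{"CUST": "C1", "ACTION": "drop"}, {"CUST": "C2", "ACTION": "order"}]

def query_support(customer_id):
    # Query conversion: express the caller's question in this source's terms.
    rows = [r for r in SUPPORT if r["customerId"] == customer_id]
    # Transform: harmonize each result in memory; nothing is stored.
    return [{"customer": r["customerId"], "kind": "ticket", "detail": r["ticket"]}
            for r in rows]

def query_orders(customer_id):
    rows = [r for r in ORDERS if r["CUST"] == customer_id]
    return [{"customer": r["CUST"], "kind": "order", "detail": r["ACTION"]}
            for r in rows]

def virtual_query(customer_id):
    """Fan the query out to every source and merge the harmonized results."""
    results = []
    for source_query in (query_support, query_orders):
        results.extend(source_query(customer_id))
    return results
```

Every call to `virtual_query` repeats the conversion and transformation work and puts load on each source.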
SLIDE: 10
Data Hubs
• Copy as with a Data Lake
• Harmonize and Index
• Regular structures for analytics, reporting, consumption
• Indexes atop the common structures
[Figure: the Support, CLV, and Orders sources are copied into the DATA HUB, harmonized, and served to BI/Analytics and other consumers. Data is Moved to one place, and also Harmonized and Indexed.]
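A minimal sketch of the hub idea applied to the churn use case: copy, harmonize once, persist the result, and index the common structure. The record shapes, the `harmonize` function, and the thresholds are assumptions for illustration, not a MarkLogic API.

```python
from collections import defaultdict

# Raw copies from the three sources (hypothetical shapes).
raw = [
    ("clv", {"id": "C1", "lifetime_value": 5000}),
    ("clv", {"id": "C2", "lifetime_value": 900}),
    ("orders", {"CUST": "C1", "ACTION": "drop"}),
    ("support", {"customerId": "C1", "ticket": "Network upgrade needed"}),
]

def harmonize(source, rec):
    """Map one raw record into the common structure."""
    if source == "clv":
        return {"customer": rec["id"], "kind": "clv", "value": rec["lifetime_value"]}
    if source == "orders":
        return {"customer": rec["CUST"], "kind": "order", "value": rec["ACTION"]}
    return {"customer": rec["customerId"], "kind": "ticket", "value": rec["ticket"]}

# Harmonize once and persist the result ...
hub = [harmonize(source, rec) for source, rec in raw]

# ... then index the common structure.
by_customer = defaultdict(list)
for doc in hub:
    by_customer[doc["customer"]].append(doc)

def at_risk_high_value(min_value):
    """High-value customers who dropped a service and have a trouble ticket."""
    hits = []
    for customer, docs in by_customer.items():
        value = max((d["value"] for d in docs if d["kind"] == "clv"), default=0)
        dropped = any(d["kind"] == "order" and d["value"] == "drop" for d in docs)
        has_ticket = any(d["kind"] == "ticket" for d in docs)
        if value >= min_value and dropped and has_ticket:
            hits.append(customer)
    return hits
```

The consumer query no longer needs to know anything about the source structures; it works only against the harmonized form and its index.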
SLIDE: 11
Beneath and Beyond the Terms
The terms are useful, but vague, and don’t tell us what works for our next project
Consider all these approaches in terms of:
• Movement
• Harmonization
• Indexing
SLIDE: 12
Data Movement
• Data Movement is copying data to new, physical storage so it can be accessed via new servers and processes
• Operational Separation
• Organizational Separation
[Figure: the Orders System, run by Sales Department IT, alongside the Retain / Intervene, Churn Analysis, and Reporting workloads]
SLIDE: 13
Data Movement and the Three Approaches
• Data Lakes are all but defined by Movement
• Operational and Organizational separation
• Virtual Databases - unique in not Moving data
• Load is pushed to the source systems
• Backup, HA/DR, Security implemented on all source systems
• Data Hubs also Move data
SLIDE: 14
Data Harmonization
• Recall: Three forms of data incompatibility
• Naming
• Structural
• Semantic
PERSON
- PERS_ID
- DOB
- FNAME
- LNAME
PERS_ADDR_REL
- PERS_ID
- ADDR_ID
ADDRESS
- ADDR_ID
- LINE1
- CITY
- ZIP
- TYPE: {US, UK}
PERSON
- PERS_ID
- DOB
- FNAME
- LNAME
- ADDR_L1
- ADDR_CITY
- ADDR_ZIP
- ADDR_MAILING_L1
- ADDR_MAILING_ZIP
SLIDE: 15
Data Harmonization
• Harmonization is mapping into a common structure for key data elements
• Eventually, data must be consumed, aggregated and analyzed in a common form
Orders System
• $1400 equipment order
• £270/month – 36 month contract
• Exchange Rate: 1.28
Maintenance/trouble tickets
• Network upgrade needed
• Projected cost $3,000
Customer          Expected Net Revenue
Oren Wilkins      $4,280
Sarah Ravnick     $17,200
David Perez       …
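As a worked example of why a common form is needed before aggregating, the figures above can be combined once they share a currency. This is a sketch under two assumptions that the slide does not state: the exchange rate is US dollars per British pound, and expected net revenue is order revenue minus projected maintenance cost. The slide does not say which customer these figures belong to, so the result is not expected to match any row of the table.

```python
from decimal import Decimal

# ASSUMPTION: the exchange rate is US dollars per British pound.
USD_PER_GBP = Decimal("1.28")

def to_usd(amount, currency):
    """Convert an amount into the common currency (USD)."""
    if currency == "USD":
        return amount
    if currency == "GBP":
        return amount * USD_PER_GBP
    raise ValueError(f"no rate for {currency}")

equipment = to_usd(Decimal("1400"), "USD")
contract = to_usd(Decimal("270") * 36, "GBP")   # 36 monthly payments
upgrade_cost = to_usd(Decimal("3000"), "USD")

# ASSUMPTION: net revenue = order revenue minus projected maintenance cost.
expected_net_revenue = equipment + contract - upgrade_cost
print(expected_net_revenue)
```

`Decimal` is used instead of floats so that the currency arithmetic stays exact.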
SLIDE: 16
Data Harmonization
• Harmonization is the “value add” in the process
• The earlier the better for maximum use
• Store it
• Index it
• Yet BMUF fails often
• Progressive Harmonization
[Figure: Progressive Harmonization of Person records]
Source Person records:
- Fname, Lname, BIRTH, PHYSATTR, PHYSATTR
- Given-name, Family-name, Eye-color, Demographics, DOB
Iteration 1 Person:
- Harmonized: Name, Address, DoB
- Source: Eye color, Height, Credit Risk
Iteration 2 Person:
- Harmonized: Name, Address, DoB, EyeColor, Height
- Source: Credit Risk
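Progressive Harmonization can be sketched as a mapping that grows between iterations while the unmapped source fields are kept alongside. The "harmonized"/"source" split follows the figure; the function, the common field names, and the sample values are hypothetical.

```python
def harmonize(source_record, mapping):
    """Build an envelope: mapped fields are harmonized, the rest stay as source."""
    harmonized = {common: source_record[field]
                  for field, common in mapping.items() if field in source_record}
    leftover = {field: value for field, value in source_record.items()
                if field not in mapping}
    return {"harmonized": harmonized, "source": leftover}

record = {"Given-name": "Ann", "Family-name": "Lee", "DOB": "1980-02-01",
          "Eye-color": "brown"}

# Iteration 1: only the fields the first use case needs.
ITERATION_1 = {"Given-name": "givenName", "Family-name": "familyName", "DOB": "dob"}
# Iteration 2: extend the mapping; nothing already harmonized changes.
ITERATION_2 = {**ITERATION_1, "Eye-color": "eyeColor"}

first = harmonize(record, ITERATION_1)
second = harmonize(record, ITERATION_2)
```

Because iteration 2 only adds mappings, consumers written against the iteration 1 fields keep working.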
SLIDE: 17
Harmonization and the Approaches
• Data Lakes don’t Harmonize
• Harmonization is pushed downstream, or implicit in the jobs
• Often ETL copies from format to format (particularly in Hadoop)
• Virtual Databases Harmonize in real time
• Each source query and result is harmonized in memory
• Pushes the load to the source systems
• Data Hubs Harmonize and Persist
• Explicit storage and management of Harmonized data
• Governable
[Figure: a Data Lake with Job 1 and Job 2; a Query across Silo 1 and Silo 2; a Data Lake next to a Data Hub]
SLIDE: 19
Indexing
“Who Said Databases Weren’t a Good Idea?”
- Ken Krupa, Enterprise CTO, MarkLogic
• Indexing is a decision to make something fast
• Finding, totaling, sorting, grouping, correlating, analyzing
• Sometimes also accessing
• Less obviously
• Caching and memory use
• Reference data usage
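A small Python illustration of "a decision to make something fast": the same question answered by a full scan and by a sorted range index. The records and values are made up for the example, and a real database index is far more elaborate than a sorted list.

```python
import bisect

# Hypothetical harmonized records: (customer, expected net revenue in USD).
records = [("C1", 4000), ("C2", 17000), ("C3", 900), ("C4", 8000)]

def scan_at_least(threshold):
    """No index: examine every record on every query."""
    return sorted(c for c, value in records if value >= threshold)

# Build the index once: values sorted, with the customer alongside.
index = sorted((value, c) for c, value in records)
index_values = [value for value, _ in index]

def indexed_at_least(threshold):
    """With the index: binary-search to the first qualifying entry."""
    start = bisect.bisect_left(index_values, threshold)
    return sorted(c for _, c in index[start:])
```

Both functions return the same answer; the indexed version pays the sorting cost once, up front, instead of touching every record per query.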
SLIDE: 20
Indexing Benefits
• Advance from Batch to Operational
• Micro-service or SOA architectures
• Find the latest address
• A 360° summary record of a customer
• Human Services: reviewing FSA recipients – interactive dashboard
• “Run your business”
SLIDE: 21
Three Approaches Revisited – Virtual Databases
Issues
• Least-common-denominator Query
• Paradox: more systems = less power
• Coupling to source systems – schema change = broken DB
• Weakest link problem - HA/DR, overload
• Complexity
• Paging, sorting, relevance, dealing with a down federate
Benefit
• Real Time is easy
• May be ok for small or initial systems
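The paging, sorting, and down-federate issues can be made concrete with a short sketch. The three federate functions are hypothetical stand-ins for remote sources, one of which is deliberately unavailable.

```python
import heapq
from itertools import islice

def federate_a():
    return [("2016-01-03", "A: ticket opened"), ("2016-02-10", "A: order dropped")]

def federate_b():
    raise ConnectionError("source B is down")

def federate_c():
    return [("2016-01-20", "C: upgrade requested")]

def sorted_page(federates, offset, limit):
    """Return one page of date-sorted results plus the names of failed sources."""
    streams, failed = [], []
    for federate in federates:
        try:
            # Sort each source's results so the streams can be merged in order.
            streams.append(sorted(federate()))
        except ConnectionError:
            # The federation layer must decide: fail, or answer partially?
            failed.append(federate.__name__)
    merged = heapq.merge(*streams)
    return list(islice(merged, offset, offset + limit)), failed
```

Even to return the second page, every live source has to be queried and merged again, and the caller has to cope with a result that silently omits the down source.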
SLIDE: 22
Three Approaches Revisited - Data Lakes
Issues
• Still need to Harmonize the data
• Typically in every batch job, ETL (Pig/Hive) job, query, analysis
• Risk of the “Data Swamp”
• Batch focus
• In-memory helps, but still batch
• Frankenbeast workarounds create more silos, rather than solving the problem
Benefit
• The data is moved
• Storage is cheap
• One team and process to add functionality
SLIDE: 23
Three Approaches Revisited – Data Hubs
Data Hubs - Advantages
• Most powerful solution – all of: Movement, Harmonization, Indexing
• “Run your business”
• Indexing builds on Harmonization
• Harmonization is the value add, so index it!
• Grow by regularizing, not by complicating
• More data sources to the Harmonized form
• Progressive Harmonization to increase the Harmonized data elements
• HA/DR, scale, security, query power, batch efficiency, governance
Tradeoffs
• Dedicated hardware
• Change detection or data push needed for real-time
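One simple way to approximate the change detection mentioned above is to poll a source and compare content fingerprints. This is a generic sketch, not a MarkLogic feature; it assumes each source row carries an `id` field, and it does not detect deletions.

```python
import hashlib
import json

def fingerprint(record):
    """Stable hash of a record's content."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def detect_changes(source_rows, seen):
    """Return rows that are new or changed since the last poll.

    `seen` maps a record id to the fingerprint stored at the last poll
    and is updated in place.
    """
    changed = []
    for row in source_rows:
        digest = fingerprint(row)
        if seen.get(row["id"]) != digest:
            seen[row["id"]] = digest
            changed.append(row)
    return changed
```

Only the rows returned by `detect_changes` need to be copied and re-harmonized; how fresh the hub is then depends on the polling interval.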
SLIDE: 24
Data Lake vs Data Hub
“The fact is, you don't put everything into a datastore and then go looking for something to do.”
- Ted Dunning, MapR Chief Applications Architect
Data Hubs are Operational and “Purpose-driven”
Use case → API → Progressive Harmonization → Data Integration
They do not merely have Harmonized data and Indexes; they are about serving Harmonized data, with Indexes to drive that serving.
SLIDE: 25
Value Over Time
[Chart: ROI (vertical axis) against Time, Evolution, Range of Data (horizontal axis), with curves for Data Lake, Data Hub, and Virtual Database]
SLIDE: 26
Evaluating MarkLogic with the Three Criteria
SLIDE: 27
MarkLogic Operational Data Hub Pattern
Some say: “A Data Lake and EDW are better together”
Translation: “This Data Lake is not doing a very good job, and never will”
• MarkLogic brings database/data warehouse functions into the Data Lake, making it “Operational” and a “Data Hub” by virtue of Harmonization and Indexing
• but not by trying to build a (smaller) EDW
SLIDE: 28
MarkLogic for Operational Data Hubs
• MarkLogic supports all three paradigms
• Our product direction, consulting team, and experience are focused on Data Hubs
• MarkLogic is a database
• Allowing an “Operational Data Hub”
• Run your business AND observe your business
• One place for the latest data – address, income, account status, health
• Integrated data for 360° views
SLIDE: 29
MarkLogic ODH Features - Movement
• Ingest data “as-is”
• Native support for JSON, XML, Binary, RDF, Text, SQL, Geo
• Data Loading tools for MPP batch ingest
• Index latent structure in each
• Commodity hardware, commodity disk
• Tiered storage for cost effective storage
SLIDE: 30
Operational Data Hub Pattern in MarkLogic
[Figure: Data Flow in three stages.
INGEST: Source Systems (RDBMS, Message Bus, Content Feed) are loaded into Staging as raw, as-is data (Source 1 Documents, Source 2 Documents, … Source N Documents).
HARMONIZE: Staging data is transformed into Final, the harmonized and indexed data, stored as Enveloped Documents (Entity 1, Entity 2, … Entity N).
SERVE: Final data is served to Consuming Applications (Operational Apps, Analysis/BI, Data Feeds).
Supporting capabilities: Discovery, Harmonization, Indexes, Query, Services.]
SLIDE: 31
MarkLogic ODH Features - Harmonization
• Best in class data Transform capabilities
• XSLT, XQuery implemented to spec from the ground up
• JavaScript via V8 engine
• Triggers, data extraction from binaries, MPP processing
• Multi-modal processing of many data formats
• Ontology processing – RDFS, OWL
SLIDE: 32
MarkLogic ODH Features - Indexing
• MarkLogic is built on the “Universal Index”
• Text, document structure, fields, and security in one index
• Columnar range indexes for analysis and SQL processing
• Triple index for RDF, SPARQL and semantic query
• Geospatial index
• Projection operations to expose one structure (e.g. JSON or XML) as SQL or RDF
• Operational vs. purely analytical. You can run your business on MarkLogic
SLIDE: 33
Summary
• Data Lakes and Hubs are on a continuum
• Primarily distinguished by level of indexing
• Virtual databases are a very different animal – and not usually in a good way
• Within each pattern, Movement, Harmonization and Indexing are knobs to turn
• Movement – for isolation and data access
• Harmonization – for micro-services and value-add
• Indexing – for speed and operational use cases
• Consider your goals and requirements, and plan accordingly
SLIDE: 34
More Info
MarkLogic Data Hub Framework (quick start): https://marklogic.github.io/marklogic-data-hub/
MarkLogic Data Hub information: http://www.marklogic.com/solutions/operational-data-hub/
Damon’s blog on data lakes: http://www.marklogic.com/blog/data-lakes-data-hubs-federation-one-best/
Follow Damon on Twitter: https://twitter.com/damonfeldman

More Related Content

What's hot

The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT Strategy
Thomas Kelly, PMP
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
Caserta
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data Architecture
Perficient, Inc.
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Introduction to Microsoft’s Master Data Services (MDS)
Introduction to Microsoft’s Master Data Services (MDS)Introduction to Microsoft’s Master Data Services (MDS)
Introduction to Microsoft’s Master Data Services (MDS)
James Serra
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
 
Hadoop World 2011: I Want to Be BIG - Lessons Learned at Scale - David "Sunny...
Hadoop World 2011: I Want to Be BIG - Lessons Learned at Scale - David "Sunny...Hadoop World 2011: I Want to Be BIG - Lessons Learned at Scale - David "Sunny...
Hadoop World 2011: I Want to Be BIG - Lessons Learned at Scale - David "Sunny...
Cloudera, Inc.
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
MapR Technologies
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
DATAVERSITY
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data Warehouse
Bui Ha
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
Kent Graziano
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
Caserta
 
2022 02 Integration Bootcamp
2022 02 Integration Bootcamp2022 02 Integration Bootcamp
2022 02 Integration Bootcamp
Michael Stephenson
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
MetroStar
 
NoSQL – Beyond the Key-Value Store
NoSQL – Beyond the Key-Value StoreNoSQL – Beyond the Key-Value Store
NoSQL – Beyond the Key-Value Store
DATAVERSITY
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Zaloni
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
One Slide Overview: ORCL Big Data Integration and Governance
One Slide Overview: ORCL Big Data Integration and GovernanceOne Slide Overview: ORCL Big Data Integration and Governance
One Slide Overview: ORCL Big Data Integration and Governance
Jeffrey T. Pollock
 
Data Lakes - The Key to a Scalable Data Architecture
Data Lakes - The Key to a Scalable Data ArchitectureData Lakes - The Key to a Scalable Data Architecture
Data Lakes - The Key to a Scalable Data Architecture
Zaloni
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
Ricky Barron
 

What's hot (20)

The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT Strategy
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data Architecture
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Introduction to Microsoft’s Master Data Services (MDS)
Introduction to Microsoft’s Master Data Services (MDS)Introduction to Microsoft’s Master Data Services (MDS)
Introduction to Microsoft’s Master Data Services (MDS)
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
Hadoop World 2011: I Want to Be BIG - Lessons Learned at Scale - David "Sunny...
Hadoop World 2011: I Want to Be BIG - Lessons Learned at Scale - David "Sunny...Hadoop World 2011: I Want to Be BIG - Lessons Learned at Scale - David "Sunny...
Hadoop World 2011: I Want to Be BIG - Lessons Learned at Scale - David "Sunny...
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data Warehouse
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
2022 02 Integration Bootcamp
2022 02 Integration Bootcamp2022 02 Integration Bootcamp
2022 02 Integration Bootcamp
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
NoSQL – Beyond the Key-Value Store
NoSQL – Beyond the Key-Value StoreNoSQL – Beyond the Key-Value Store
NoSQL – Beyond the Key-Value Store
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
One Slide Overview: ORCL Big Data Integration and Governance
One Slide Overview: ORCL Big Data Integration and GovernanceOne Slide Overview: ORCL Big Data Integration and Governance
One Slide Overview: ORCL Big Data Integration and Governance
 
Data Lakes - The Key to a Scalable Data Architecture
Data Lakes - The Key to a Scalable Data ArchitectureData Lakes - The Key to a Scalable Data Architecture
Data Lakes - The Key to a Scalable Data Architecture
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 

Viewers also liked

Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes
Denodo
 
「YDNの広告のCTRをオンライン学習で予測してみた」#yjdsw4
「YDNの広告のCTRをオンライン学習で予測してみた」#yjdsw4「YDNの広告のCTRをオンライン学習で予測してみた」#yjdsw4
「YDNの広告のCTRをオンライン学習で予測してみた」#yjdsw4
Yahoo!デベロッパーネットワーク
 
Managing a Multi-Tenant Data Lake
Managing a Multi-Tenant Data LakeManaging a Multi-Tenant Data Lake
Managing a Multi-Tenant Data Lake
DataWorks Summit/Hadoop Summit
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
Kai Sasaki
 
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Cambridge Semantics
 
AWS 유안타증권 HPC 적용사례 :: 유안타 증권 추정호 박사 :: AWS Finance Seminar
AWS 유안타증권 HPC 적용사례 :: 유안타 증권 추정호 박사 :: AWS Finance SeminarAWS 유안타증권 HPC 적용사례 :: 유안타 증권 추정호 박사 :: AWS Finance Seminar
AWS 유안타증권 HPC 적용사례 :: 유안타 증권 추정호 박사 :: AWS Finance SeminarAmazon Web Services Korea
 
オープンソースとコミュニティによる価値の創造
オープンソースとコミュニティによる価値の創造オープンソースとコミュニティによる価値の創造
オープンソースとコミュニティによる価値の創造
Rakuten Group, Inc.
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
shrey mehrotra
 
マルチテナント化に向けたHadoopの最新セキュリティ事情 #hcj2014
マルチテナント化に向けたHadoopの最新セキュリティ事情 #hcj2014マルチテナント化に向けたHadoopの最新セキュリティ事情 #hcj2014
マルチテナント化に向けたHadoopの最新セキュリティ事情 #hcj2014
Cloudera Japan
 
ブラックボックスなアドテクを機械学習で推理してみた Short ver
ブラックボックスなアドテクを機械学習で推理してみた Short verブラックボックスなアドテクを機械学習で推理してみた Short ver
ブラックボックスなアドテクを機械学習で推理してみた Short ver
尚行 坂井
 
Enterprise Data Hub: The Next Big Thing in Big Data
Enterprise Data Hub: The Next Big Thing in Big DataEnterprise Data Hub: The Next Big Thing in Big Data
Enterprise Data Hub: The Next Big Thing in Big Data
Cloudera, Inc.
 
Amazon Redshift ベンチマーク Hadoop + Hiveと比較
Amazon Redshift ベンチマーク  Hadoop + Hiveと比較 Amazon Redshift ベンチマーク  Hadoop + Hiveと比較
Amazon Redshift ベンチマーク Hadoop + Hiveと比較
FlyData Inc.
 
The Future of Application integration
The Future of Application integrationThe Future of Application integration
The Future of Application integration
Richard Seroter
 
Awsでつくるapache kafkaといろんな悩み
Awsでつくるapache kafkaといろんな悩みAwsでつくるapache kafkaといろんな悩み
Awsでつくるapache kafkaといろんな悩み
Keigo Suda
 
SparkとCassandraの美味しい関係
SparkとCassandraの美味しい関係SparkとCassandraの美味しい関係
SparkとCassandraの美味しい関係
datastaxjp
 
DI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data WarehouseDI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data Warehouse
DATAVERSITY
 
Data Feed Landscape 1.0 - データフィード版 カオスマップ (日本)
Data Feed Landscape 1.0 - データフィード版 カオスマップ (日本)Data Feed Landscape 1.0 - データフィード版 カオスマップ (日本)
Data Feed Landscape 1.0 - データフィード版 カオスマップ (日本)
feedforce (株式会社フィードフォース)
 
リクルートにおけるデータのインフラ化への取組
リクルートにおけるデータのインフラ化への取組リクルートにおけるデータのインフラ化への取組
リクルートにおけるデータのインフラ化への取組
Recruit Technologies
 

Viewers also liked (20)

Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes
 
「YDNの広告のCTRをオンライン学習で予測してみた」#yjdsw4
「YDNの広告のCTRをオンライン学習で予測してみた」#yjdsw4「YDNの広告のCTRをオンライン学習で予測してみた」#yjdsw4
「YDNの広告のCTRをオンライン学習で予測してみた」#yjdsw4
 
Managing a Multi-Tenant Data Lake
Managing a Multi-Tenant Data LakeManaging a Multi-Tenant Data Lake
Managing a Multi-Tenant Data Lake
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
 
AWS 유안타증권 HPC 적용사례 :: 유안타 증권 추정호 박사 :: AWS Finance Seminar
AWS 유안타증권 HPC 적용사례 :: 유안타 증권 추정호 박사 :: AWS Finance SeminarAWS 유안타증권 HPC 적용사례 :: 유안타 증권 추정호 박사 :: AWS Finance Seminar
AWS 유안타증권 HPC 적용사례 :: 유안타 증권 추정호 박사 :: AWS Finance Seminar
 
オープンソースとコミュニティによる価値の創造
オープンソースとコミュニティによる価値の創造オープンソースとコミュニティによる価値の創造
オープンソースとコミュニティによる価値の創造
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
マルチテナント化に向けたHadoopの最新セキュリティ事情 #hcj2014
マルチテナント化に向けたHadoopの最新セキュリティ事情 #hcj2014マルチテナント化に向けたHadoopの最新セキュリティ事情 #hcj2014
マルチテナント化に向けたHadoopの最新セキュリティ事情 #hcj2014
 
ブラックボックスなアドテクを機械学習で推理してみた Short ver
ブラックボックスなアドテクを機械学習で推理してみた Short verブラックボックスなアドテクを機械学習で推理してみた Short ver
ブラックボックスなアドテクを機械学習で推理してみた Short ver
 
Enterprise Data Hub: The Next Big Thing in Big Data
Enterprise Data Hub: The Next Big Thing in Big DataEnterprise Data Hub: The Next Big Thing in Big Data
Enterprise Data Hub: The Next Big Thing in Big Data
 
これがCassandra
これがCassandraこれがCassandra
これがCassandra
 
Amazon Redshift ベンチマーク Hadoop + Hiveと比較
Amazon Redshift ベンチマーク  Hadoop + Hiveと比較 Amazon Redshift ベンチマーク  Hadoop + Hiveと比較
Amazon Redshift ベンチマーク Hadoop + Hiveと比較
 
The Future of Application integration
The Future of Application integrationThe Future of Application integration
The Future of Application integration
 
Awsでつくるapache kafkaといろんな悩み
Awsでつくるapache kafkaといろんな悩みAwsでつくるapache kafkaといろんな悩み
Awsでつくるapache kafkaといろんな悩み
 
SparkとCassandraの美味しい関係
SparkとCassandraの美味しい関係SparkとCassandraの美味しい関係
SparkとCassandraの美味しい関係
 
はじめての Elastic Beanstalk
はじめての Elastic Beanstalkはじめての Elastic Beanstalk
はじめての Elastic Beanstalk
 
DI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data WarehouseDI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data Warehouse
 
Data Feed Landscape 1.0 - データフィード版 カオスマップ (日本)
Data Feed Landscape 1.0 - データフィード版 カオスマップ (日本)Data Feed Landscape 1.0 - データフィード版 カオスマップ (日本)
Data Feed Landscape 1.0 - データフィード版 カオスマップ (日本)
 
リクルートにおけるデータのインフラ化への取組
リクルートにおけるデータのインフラ化への取組リクルートにおけるデータのインフラ化への取組
リクルートにおけるデータのインフラ化への取組
 

Similar to Data Lake, Virtual Database, or Data Hub - How to Choose?

Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
Precisely
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
DATAVERSITY
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
DataWorks Summit
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB
 
A New Way of Thinking About MDM
A New Way of Thinking About MDMA New Way of Thinking About MDM
A New Way of Thinking About MDM
DATAVERSITY
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
DataWorks Summit
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
DATAVERSITY
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
MapR Technologies
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
South West Data Meetup
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
DataWorks Summit/Hadoop Summit
 
Journey to the Cloud: What I Wish I Knew Before I Started
 Journey to the Cloud: What I Wish I Knew Before I Started Journey to the Cloud: What I Wish I Knew Before I Started
Journey to the Cloud: What I Wish I Knew Before I Started
Datavail
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
DataWorks Summit
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Precisely
 
Key Methodologies for Migrating from Oracle to Postgres
Key Methodologies for Migrating from Oracle to PostgresKey Methodologies for Migrating from Oracle to Postgres
Key Methodologies for Migrating from Oracle to Postgres
EDB
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
Cloudera, Inc.
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-Purpose
DATAVERSITY
 
Northeastern DB Class Introduction to Marklogic NoSQL april 2016
Northeastern DB Class Introduction to Marklogic NoSQL april 2016Northeastern DB Class Introduction to Marklogic NoSQL april 2016
Northeastern DB Class Introduction to Marklogic NoSQL april 2016
Matt Turner
 
Journey to the Cloud: What I Wish I Knew Before I Started
Journey to the Cloud: What I Wish I Knew Before I Started Journey to the Cloud: What I Wish I Knew Before I Started
Journey to the Cloud: What I Wish I Knew Before I Started
Datavail
 

Data Lake, Virtual Database, or Data Hub - How to Choose?

  • 1. © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Damon Feldman, Ph.D @damon.feldman http://www.marklogic.com/blog/author/dfeldman/ Data Lake, Virtual Database, or Data Hub How to Choose?
  • 2. SLIDE: 2 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Who am I? • Solutions Director at MarkLogic • About 8 years in the Big Data and Data Integration space • Previously, in OOP, JEE worlds • Focus on Data Hub and Customer or Person-360° systems
  • 3. SLIDE: 3 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. But Why? • Data Silos • Usually work well for a single, operational purpose • Turn any cross-line-of-business question into a data integration effort
  • 4. SLIDE: 4 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. How about EDW • For a while, Enterprise Data Warehouses were the go-to solution for silos • One master schema to rule them • Data Modeler’s Dream! • Implementor’s Nightmare! • BMUF • Rigid and tightly coupled
  • 5. SLIDE: 5 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Data Incompatibilities • Three forms of data incompatibilities • Naming is the simplest • firstName vs. GIVEN_NAME • Structural is somewhat harder • Semantic differences are the most challenging • Status: {in cart, ordered, shipped, delivered} • Status: {selected, paid, complete} PERSON - PERS_ID - DOB - FNAME - LNAME PERS_ADDR_REL - PERS_ID - ADDR_ID ADDRESS - ADDR_ID - LINE1 - CITY - ZIP - TYPE: {US, UK} PERSON - PERS_ID - DOB - FNAME - LNAME - ADDR_L1 - ADDR_CITY - ADDR_ZIP - ADDR_MAILING_L1 - ADDR_MAILING_ZIP
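The naming and structural differences on slide 5 can be made concrete with a small sketch. This is a minimal Python illustration, assuming two hypothetical sources shaped like the two PERSON layouts on the slide. The target field names (`firstName`, `lastName`, `addresses`) and both function names are illustrative choices, not something the slides define.

```python
def from_normalized(person, rels, addresses):
    """Source A: PERSON, PERS_ADDR_REL and ADDRESS live in separate tables."""
    addr_by_id = {a["ADDR_ID"]: a for a in addresses}
    linked = [addr_by_id[r["ADDR_ID"]] for r in rels
              if r["PERS_ID"] == person["PERS_ID"]]
    return {
        "id": person["PERS_ID"],
        "dob": person["DOB"],
        "firstName": person["FNAME"],
        "lastName": person["LNAME"],
        "addresses": [
            {"line1": a["LINE1"], "city": a["CITY"], "zip": a["ZIP"]}
            for a in linked
        ],
    }


def from_flat(person):
    """Source B: the address columns are folded into the PERSON row."""
    addresses = [{"line1": person["ADDR_L1"],
                  "city": person["ADDR_CITY"],
                  "zip": person["ADDR_ZIP"]}]
    if person.get("ADDR_MAILING_L1"):
        # The flat layout has no mailing city column, so it stays unknown.
        addresses.append({"line1": person["ADDR_MAILING_L1"],
                          "city": None,
                          "zip": person.get("ADDR_MAILING_ZIP")})
    return {
        "id": person["PERS_ID"],
        "dob": person["DOB"],
        "firstName": person["FNAME"],
        "lastName": person["LNAME"],
        "addresses": addresses,
    }


# Made-up sample rows, one per source layout.
a = from_normalized(
    {"PERS_ID": 1, "DOB": "1980-01-01", "FNAME": "Ann", "LNAME": "Lee"},
    [{"PERS_ID": 1, "ADDR_ID": 10}],
    [{"ADDR_ID": 10, "LINE1": "1 Main St", "CITY": "Springfield",
      "ZIP": "12345", "TYPE": "US"}],
)
b = from_flat(
    {"PERS_ID": 2, "DOB": "1975-06-30", "FNAME": "Bo", "LNAME": "Diaz",
     "ADDR_L1": "9 Oak Ave", "ADDR_CITY": "Dover", "ADDR_ZIP": "54321",
     "ADDR_MAILING_L1": "PO Box 7", "ADDR_MAILING_ZIP": "54322"},
)
```

Both functions return the same shape, so downstream code no longer needs to know which layout a record came from. Semantic differences, such as the two Status vocabularies, are not covered here because mapping them requires business decisions the slides do not spell out.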
  • 6. SLIDE: 6 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Three New Approaches • Data Lakes • Put it all somewhere else • Virtual Databases (AKA Federated Databases) • Pretend it is somewhere else • Data Hubs • Put it all somewhere else, Harmonize, and Index it for operational use And a Framework to understand and choose approaches
  • 7. SLIDE: 7 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. A Use Case Consider a customer churn use case ▪ Review high-value customers ▪ .. Who are at-risk customers ▪ .. Particularly if they are dropping or cancelling services ▪ Proactively address their trouble tickets or complaints. Customer Lifetime Value $$$ $ $$ Customer Support !@#&!!%! !@# Order/Change/Drop → ↑ 😠😠↓ Need more … please upgrade … Abysmal… dissatisfied …
  • 8. SLIDE: 8 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Data Lakes • Copy the data to a new infrastructure • Typically Hadoop, but perhaps MarkLogic or other NoSQL • Difficult with SQL because many sources → Load “as-is” • Operational Separation Copy Process Support CLV Orders DATA LAKE Data is Moved to one place, but still in varied structures BI/Analytics
  • 9. SLIDE: 9 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Virtual Database • Query everything in real time • Transparent to the caller • True real-time • Data is not Moved or Harmonized (except in memory during processing) Support CLV Orders Data Remains in source systems Query Transform Query Transform Query Transform Retain/intervene Churn Analysis Reporting Query Conversion Data Harmonization
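The query-and-transform flow on slide 9 can be sketched as follows. This is a minimal Python illustration, assuming three in-memory lists as stand-ins for the Support, CLV and Orders systems. Every field name, the `FEDERATES` table and `customer_view` are hypothetical, and a real virtual database would issue live queries instead of looping over lists.

```python
# Hypothetical in-memory stand-ins for the three source systems on the slide.
SUPPORT = [{"cust": "C1", "open_tickets": 3}]
CLV = [{"customer_id": "C1", "lifetime_value": 90000}]
ORDERS = [{"CUST_NO": "C1", "ACTION": "DROP"}]

# One (rows, key field, transform) entry per federate. Each transform renames
# a source row into the shared vocabulary; nothing is written back or stored.
FEDERATES = [
    (SUPPORT, "cust", lambda r: {"openTickets": r["open_tickets"]}),
    (CLV, "customer_id", lambda r: {"lifetimeValue": r["lifetime_value"]}),
    (ORDERS, "CUST_NO", lambda r: {"lastAction": r["ACTION"].lower()}),
]


def customer_view(customer_id):
    """Query every source at call time and merge the transformed rows."""
    view = {"customerId": customer_id}
    for rows, key, transform in FEDERATES:
        for row in rows:  # stands in for a live query against the source
            if row[key] == customer_id:
                view.update(transform(row))
    return view


print(customer_view("C1"))
```

The harmonized view exists only in memory for the duration of the call, so every request repeats the work against every source, which is the load and coupling trade-off the slides return to later.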
  • 10. SLIDE: 10 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Data Hubs • Copy as with a Data Lake • Harmonize and Index • Regular structures for analytics, reporting, consumption • Indexes atop the common structures Copy Support CLV Orders DATA HUB Data is Moved to one place Also Harmonized and Indexed Harmonize BI/Analytics Consumer Consumer Consumers
  • 11. SLIDE: 11 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Beneath and Beyond the Terms The terms are useful, but vague, and don’t tell us what works for our next project Consider all these approaches in terms of: • Movement • Harmonization • Indexing
  • 12. SLIDE: 12 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Data Movement • Data Movement is copying data to new, physical storage so it can be accessed via new servers and processes • Operational Separation • Organizational Separation Orders System Retain / Intervene Churn Analysis Reporting Sales Department IT
  • 13. SLIDE: 13 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Data Movement and the Three Approaches • Data Lakes are all but defined by Movement • Operational and Organizational separation • Virtual Databases - unique in not Moving data • Load is pushed to the source systems • Backup, HA/DR, Security implemented on all source systems • Data Hubs also Move data
  • 14. SLIDE: 14 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Data Harmonization • Recall: Three forms of data incompatibility • Naming • Structural • Semantic PERSON - PERS_ID - DOB - FNAME - LNAME PERS_ADDR_REL - PERS_ID - ADDR_ID ADDRESS - ADDR_ID - LINE1 - CITY - ZIP - TYPE: {US, UK} PERSON - PERS_ID - DOB - FNAME - LNAME - ADDR_L1 - ADDR_CITY - ADDR_ZIP - ADDR_MAILING_L1 - ADDR_MAILING_ZIP
  • 15. SLIDE: 15 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Data Harmonization • Harmonization is mapping into a common structure for key data elements • Eventually, data must be consumed, aggregated and analyzed in a common form Orders System  $1400 equipment order  £ 270/month – 36 month contract  Exchange Rate: 1.28 Maintenance/trouble tickets  Network upgrade needed  Projected cost $3,000 Customer Expected Net Revenue Oren Wilkins $4,280 Sarah Ravnick $17,200 David Perez …
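To show why a common form is needed before aggregation, here is a minimal Python sketch that converts the figures on slide 15 to one currency and combines them. The formula (equipment order plus contract value minus projected cost) and the names `to_usd` and `expected_net` are assumptions made for illustration. The slide does not state how its table values were derived, and this sketch is not meant to reproduce them.

```python
from decimal import Decimal

GBP_TO_USD = Decimal("1.28")  # exchange rate shown on the slide


def to_usd(amount, currency):
    """Convert an amount to USD so that all sources can be summed."""
    if currency == "USD":
        return amount
    if currency == "GBP":
        return amount * GBP_TO_USD
    raise ValueError(f"unsupported currency: {currency}")


# Orders system: a one-off USD order and a 36-month GBP contract.
equipment_order = to_usd(Decimal("1400"), "USD")
contract = to_usd(Decimal("270") * 36, "GBP")

# Trouble-ticket system: projected cost of the network upgrade.
upgrade_cost = to_usd(Decimal("3000"), "USD")

# Assumed formula: revenue from both order lines minus the projected cost.
expected_net = equipment_order + contract - upgrade_cost
print(expected_net)
```

Without the currency conversion step, the monthly contract amount and the equipment order could not be added meaningfully, which is the kind of semantic gap harmonization closes.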
  • 16. SLIDE: 16 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Person Harmonized Name Address DoB Source Eye color Height Credit Risk Data Harmonization • Harmonization is the “value add” in the process • The earlier the better for maximum use • Store it • Index it • Yet BMUF fails often • Progressive Harmonization Person Harmonized Name Address DoB Source Eye color Height Credit Risk Person Fname Lname BIRTH PHYSATTR PHYSATTR Person Given-name Family-name Eye-color Demographics DOB Person Harmonized Name Address DoB EyeColor Height Source Credit Risk Iteration 1 Iteration 2
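Progressive Harmonization, as described on slide 16, can be sketched as a mapping that grows between iterations. This is a minimal Python illustration. The source field names `Fname`, `Lname`, `BIRTH` and `PHYSATTR` come from the slide, but the nested shape of `PHYSATTR`, the sample values, and the `harmonize` function are assumptions.

```python
# Iteration 1: harmonize only the elements the first use case needs.
ITERATION_1 = {
    "name": lambda raw: f"{raw['Fname']} {raw['Lname']}",
    "dob": lambda raw: raw["BIRTH"],
}

# Iteration 2: add an element without touching the existing mappings.
ITERATION_2 = {
    **ITERATION_1,
    "eyeColor": lambda raw: raw["PHYSATTR"]["eye"],
}


def harmonize(raw, mapping, source):
    """Return the harmonized fields next to the untouched source record."""
    return {
        "harmonized": {field: extract(raw)
                       for field, extract in mapping.items()},
        "source": source,
        "raw": raw,
    }


# Made-up sample record in the first source's layout.
record = {"Fname": "Ann", "Lname": "Lee", "BIRTH": "1980-01-01",
          "PHYSATTR": {"eye": "brown", "height_cm": 170}}
first = harmonize(record, ITERATION_1, "silo-1")
second = harmonize(record, ITERATION_2, "silo-1")
```

Because the raw record is kept alongside the harmonized fields, a later iteration can harmonize more elements by re-running the mapping, with no need to go back to the source system.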
  • 17. SLIDE: 17 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Harmonization and the Approaches • Data Lakes don’t Harmonize • Harmonization is pushed downstream, or implicit in the jobs • Often ETL copies from format to format (particularly in Hadoop) • Virtual Databases Harmonize in real time • Each source query and result is harmonized in memory • Pushes the load to the source systems • Data Hubs Harmonize and Persist • Explicit storage and management of Harmonized data • Governable Data Lake Job 1 Job 2 Silo 1 Silo 2 Query Data Lake Data Hub
  • 18. SLIDE: 19 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Indexing “Who Said Databases Weren’t a Good Idea?” - Ken Krupa, Enterprise CTO, MarkLogic • Indexing is a decision to make something fast ▪ Finding, totaling, sorting, grouping, correlating, analyzing ▪ Sometimes also accessing • Less obviously ▪ Caching and memory use ▪ Reference data usage
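The idea that indexing is a decision to make something fast can be sketched in a few lines. This is a minimal Python illustration that uses a dictionary as a stand-in for a database index. The function names and sample records are hypothetical and say nothing about how any particular database implements its indexes.

```python
from collections import defaultdict


def build_index(records, field):
    """Pay the cost once: map each value of `field` to record positions."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return index


def scan(records, field, value):
    """Without an index, every lookup reads every record."""
    return [r for r in records if r[field] == value]


def lookup(records, index, value):
    """With an index, a lookup touches only the matching records."""
    return [records[i] for i in index.get(value, [])]


# Made-up harmonized customer records.
customers = [
    {"id": 1, "status": "at-risk"},
    {"id": 2, "status": "active"},
    {"id": 3, "status": "at-risk"},
]
status_index = build_index(customers, "status")
```

Both `scan` and `lookup` return the same records. The difference is that the index moves the work to load time, which is what lets the same data serve interactive, operational requests as well as batch jobs.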
  • 19. SLIDE: 20 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Indexing Benefits • Advance from Batch to Operational • Micro-service or SOA architectures • find the latest address • A 360° summary record of a customer • Human Services: reviewing FSA recipients – interactive dashboard • “Run your business”
  • 20. SLIDE: 21 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Three Approaches Revisited – Virtual Databases Issues • Least-common-denominator Query • Paradox: more systems = less power • Coupling to source systems – schema change = broken DB • Weakest link problem - HA/DR, overload • Complexity • Paging, sorting, relevance, dealing with a down federate Benefit • Real Time is easy • May be ok for small or initial systems
  • 21. SLIDE: 22 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Three Approaches Revisited - Data Lakes Issues • Still need to Harmonize the data • Typically in every batch job, ETL (PIG/HIVE) job, query, analysis • Risk of the “Data Swamp” • Batch focus • In-memory helps, but still batch • Frankenbeast workarounds create more silos, rather than solving the problem Benefit • The data is moved • Storage is cheap • One team and process to add functionality
  • 22. SLIDE: 23 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Three Approaches Revisited – Data Hubs Data Hubs - Advantages • Most powerful solution – all of: Movement, Harmonization, Indexing • “Run your business” • Indexing builds on Harmonization • Harmonization is the value add, so index it! • Grow by regularizing, not by complicating • More data sources to the Harmonized form • Progressive Harmonization to increase the Harmonized data elements • HA/DR, scale, security, query power, batch efficiency, governance Tradeoffs • Dedicated hardware • Change detection or data push needed for real-time
  • 23. SLIDE: 24 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Data Lake vs Data Hub “The fact is, you don't put everything into a datastore and then go looking for something to do.” - Ted Dunning, MapR Chief Applications Architect Data Hubs are Operational and “Purpose-driven” Use case → API → Progressive Harmonization → Data Integration They do not merely have Harmonized data and Indexes, they are about serving Harmonized data and indexes to drive them.
  • 24. Value Over Time [Chart: ROI plotted against Time, Evolution, Range of Data, with one curve each for Data Lake, Data Hub, and Virtual Database]
  • 25. Evaluating MarkLogic with the Three Criteria
  • 26. MarkLogic Operational Data Hub Pattern Some say: “A Data Lake and EDW are better together” Translation: “This Data Lake is not doing a very good job, and never will” • MarkLogic brings database/data warehouse functions into the Data Lake, making it “Operational” and a “Data Hub” by virtue of Harmonization and Indexing • but not by trying to build a (smaller) EDW
  • 27. MarkLogic for Operational Data Hubs • MarkLogic supports all three paradigms • Our product direction, consulting team, experience are focused on Data Hubs • MarkLogic is a database • Allowing an “Operational Data Hub” • Run your business AND observe your business • One place for the latest data – address, income, account status, health • Integrated data for 360° views
  • 28. MarkLogic ODH Features - Movement • Ingest data “as-is” • Native support for JSON, XML, Binary, RDF, Text, SQL, Geo • Data Loading tools for MPP batch ingest • Index latent structure in each • Commodity hardware, commodity disk • Tiered storage for cost-effective storage
  • 29. Operational Data Hub Pattern in MarkLogic [Diagram: Data Flow from Source Systems (RDBMS, Message Bus, Content Feed) through INGEST into Staging (raw, as-is data: Source 1 Documents, Source 2 Documents, … Source N Documents), through HARMONIZE into Final (harmonized, indexed data: Enveloped Documents for Entity 1, Entity 2, … Entity N), and through SERVE to Consuming Applications (Operational Apps, Analysis/BI, Data Feeds). Other labels: Discovery, Harmonization, Indexes, Query, Services]
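The INGEST, HARMONIZE, SERVE flow in the diagram can be sketched as below. It is a rough illustration under stated assumptions: two dictionaries stand in for the Staging and Final stores, the `envelope` layout with `headers`, `instance`, and `attachments` keys is one plausible shape for an enveloped document, and the URIs, source names, and transforms are hypothetical. `FNAME` and `LNAME` echo the PERSON table shown earlier in the deck.

```python
def ingest(staging, uri, source, raw):
    """INGEST: store the source record as-is, tagged with where it came from."""
    staging[uri] = {"source": source, "raw": raw}


def harmonize_to_final(staging, final, transforms):
    """HARMONIZE: wrap each raw record in an envelope next to its harmonized form."""
    for uri, doc in staging.items():
        transform = transforms[doc["source"]]
        final[uri] = {
            "envelope": {
                "headers": {"source": doc["source"]},
                "instance": transform(doc["raw"]),  # harmonized fields
                "attachments": doc["raw"],          # original data, untouched
            }
        }


def serve(final, field, value):
    """SERVE: answer a query against harmonized fields only."""
    return [
        uri
        for uri, doc in final.items()
        if doc["envelope"]["instance"].get(field) == value
    ]


# Hypothetical per-source transforms into the harmonized form.
TRANSFORMS = {
    "source1": lambda raw: {"first_name": raw["FNAME"], "last_name": raw["LNAME"]},
    "source2": lambda raw: {"first_name": raw["firstName"], "last_name": raw["lastName"]},
}

staging, final = {}, {}
ingest(staging, "/source1/1", "source1", {"FNAME": "Ada", "LNAME": "Lovelace"})
ingest(staging, "/source2/9", "source2", {"firstName": "A.", "lastName": "Lovelace"})
harmonize_to_final(staging, final, TRANSFORMS)
print(serve(final, "last_name", "Lovelace"))  # ['/source1/1', '/source2/9']
```

Keeping the raw record inside the envelope means a later round of harmonization can add fields without going back to the source systems.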
  • 30. MarkLogic ODH Features - Harmonization • Best-in-class data Transform capabilities • XSLT, XQuery implemented to spec from the ground up • JavaScript via V8 engine • Triggers, data extraction from binaries, MPP processing • Multi-modal processing of many data formats • Ontology processing – RDFS, OWL
  • 31. MarkLogic ODH Features - Indexing • MarkLogic is built on the “Universal Index” • Text, document structure, fields, and security in one index • Columnar range indexes for analysis and SQL processing • Triple index for RDF, SPARQL and semantic query • Geospatial index • Projection operations to expose one structure (e.g. JSON or XML) as SQL or RDF • Operational vs. purely analytical. You can run your business on MarkLogic
  • 32. Summary • Data Lakes and Hubs are on a continuum • Primarily distinguished by level of indexing • Virtual databases are a very different animal – and not usually in a good way • Within each pattern, Movement, Harmonization and Indexing are knobs to turn • Movement – for isolation and data access • Harmonization – for micro-services and value-add • Indexing – for speed and operational use cases • Consider your goals and requirements, and plan accordingly
  • 33. More Info MarkLogic Data Hub Framework (quick start): https://marklogic.github.io/marklogic-data-hub/ MarkLogic Data Hub information: http://www.marklogic.com/solutions/operational-data-hub/ Damon’s blog on data lakes: http://www.marklogic.com/blog/data-lakes-data-hubs-federation-one-best/ Follow Damon on Twitter: https://twitter.com/damonfeldman