How Data Virtualization Puts Enterprise Machine Learning Programs into Production
Chris Day
Director, APAC Sales Engineering
cday@denodo.com
29 September 2020
Agenda
1. What are Advanced Analytics?
2. The Data Challenge
3. The Rise of Logical Data Architectures
4. Tackling the Data Pipeline Problem
5. Customer Stories
6. Key Takeaways
7. Q&A
8. Next Steps
Advanced Analytics & Machine Learning Exercises Need Data
Improving Patient Outcomes
Data includes patient demographics, family history, patient vitals, lab test results, claims data, etc.
Predictive Maintenance
Data includes maintenance logs and sensor data such as temperature, running time, power level duration, etc.
Predicting Late Payment
Data includes company or individual demographics, payment history, customer support logs, etc.
Preventing Fraud
Data includes the location where the claim originated, the time of day, claimant history and any recent adverse events.
Reducing Customer Churn
Data includes customer demographics, products purchased, products used, past transactions, company size, history, revenue, etc.
VentureBeat AI, July 2019
87% of data science projects never make it
into production.
McCormick Uses Denodo to Provide Data to Its AI Project
Background
• McCormick’s AI and machine-learning-based project required data that was stored in spreadsheets and in internal systems spread across 4 different continents.
• Portions of the data in the internal systems and spreadsheets needed to be masked when shared with McCormick's research partner firms, yet remain unmasked when shared internally.
• McCormick wanted to create a data service that could simplify data access and data sharing across the organisation and could be used by the analytics teams for their machine learning projects.
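One common way to meet a "masked for partners, unmasked internally" requirement is to expose the same data through one view per audience. The sketch below illustrates that idea with Python's built-in sqlite3 standing in for a virtual data layer; it is not how Denodo implements masking, and the supplier table, both views and the fetch_suppliers helper are hypothetical.

```python
import sqlite3

# sqlite3 stands in for a virtual data layer; the table, views and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT, unit_cost REAL);
INSERT INTO supplier VALUES (1, 'Acme Spices', 4.25), (2, 'Blue Herb Co', 3.10);

-- Internal consumers see the data unmasked.
CREATE VIEW supplier_internal AS
SELECT id, name, unit_cost FROM supplier;

-- Partner firms see the same rows with sensitive columns masked.
CREATE VIEW supplier_partner AS
SELECT id, substr(name, 1, 1) || '***' AS name, NULL AS unit_cost
FROM supplier;
""")

# Map each audience to the only view it is allowed to query.
VIEW_FOR_AUDIENCE = {"internal": "supplier_internal", "partner": "supplier_partner"}


def fetch_suppliers(conn, audience):
    """Return supplier rows through the view assigned to the given audience."""
    view = VIEW_FOR_AUDIENCE[audience]  # raises KeyError for unknown audiences
    return conn.execute(f"SELECT id, name, unit_cost FROM {view} ORDER BY id").fetchall()


print(fetch_suppliers(conn, "internal"))  # [(1, 'Acme Spices', 4.25), (2, 'Blue Herb Co', 3.1)]
print(fetch_suppliers(conn, "partner"))   # [(1, 'A***', None), (2, 'B***', None)]
```

Because the masking rule lives in the view definition rather than in each consumer, partners and internal teams read from one source without a second, manually masked copy of the data.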
• Data Quality
• Multiple Brands
• Which Data to Use?
McCormick – Multi-purpose Platform
Solution Highlights
• Agile Data Delivery
• High Level of Reuse
• Single Discovery & Consumption Platform
Data Virtualization Benefits for McCormick
• Machine learning workloads and applications were able to access refreshed, validated and indexed data in real time, without replication, from the Denodo enterprise data service.
• The Denodo enterprise data service gave business users the ability to compare data across multiple systems.
• Spreadsheets are now the exception.
• Ensures the quality of proposed data and services.
Logical Data Warehouse
Gartner, Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs, May 2018
“When designed properly, Data Virtualization can speed data integration, lower data
latency, offer flexibility and reuse, and reduce data sprawl across
dispersed data sources. Due to its many benefits, data virtualization is often the first
step for organizations evolving a traditional, repository-style data
warehouse into a Logical Architecture.”
Logical Data Warehouse Reference Architecture
[Diagram: a Logical Data Warehouse exposing a SQL interface; components shown include ETL, Data Warehouse, Kafka, Physical Data Lake, Distributed Storage, Files, Machine Learning and Streaming Analytics.]
Why A Logical Architecture Is Needed
✓ The analytical technology landscape has shifted over time.
✓ You need a flexible architecture that allows you to embrace those shifts rather than tying you down to a monolithic approach.
✓ Only a logical architecture, not a physical one, will easily accommodate such changes.
✓ IT should be able to adopt newer technologies without impacting business users.
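The last point can be illustrated with a small sketch: consumers query a logical view, IT migrates the underlying store to a new schema, and the consumer query keeps working unchanged. Python's sqlite3 stands in for the logical layer here; the tables, the sales view and the data are hypothetical, and a real migration would involve far more than redefining one view.

```python
import sqlite3

# sqlite3 stands in for a logical layer; all table and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE legacy_sales (region TEXT, amount REAL);
INSERT INTO legacy_sales VALUES ('APAC', 100.0), ('EMEA', 50.0), ('APAC', 25.5);

-- Business users only ever query this logical view.
CREATE VIEW sales AS SELECT region, amount FROM legacy_sales;
""")

CONSUMER_QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
before = conn.execute(CONSUMER_QUERY).fetchall()

# IT moves the data to a new store with a different schema, retires the old table
# and redefines the view so that its columns are unchanged.
conn.executescript("""
CREATE TABLE cloud_sales (sales_region TEXT, amount_cents INTEGER);
INSERT INTO cloud_sales
SELECT region, CAST(round(amount * 100) AS INTEGER) FROM legacy_sales;
DROP VIEW sales;
DROP TABLE legacy_sales;
CREATE VIEW sales AS
SELECT sales_region AS region, amount_cents / 100.0 AS amount FROM cloud_sales;
""")

after = conn.execute(CONSUMER_QUERY).fetchall()
print(before == after)  # True: the consumer's query and results are unchanged
```

The design choice being shown is the indirection: the view is the contract with business users, so the physical storage behind it can change without them noticing.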
Tackling the Data Pipeline Problem
Typical Data Science Workflow
A typical workflow for a data scientist is:
1. Gather the requirements for the business problem
2. Identify useful data
   • Ingest data
3. Cleanse data into a useful format
4. Analyze data
5. Prepare input for your algorithms
6. Execute data science algorithms (ML, AI, etc.)
   • Iterate steps 2 to 6 until valuable insights are produced
7. Visualize and share
Source: http://sudeep.co/data-science/Understanding-the-Data-Science-Lifecycle/
Where Does Your Time Go?
• 80% of time – Finding and
preparing the data
• 10% of time – Analysis
• 10% of time – Visualizing data
Source: http://sudeep.co/data-science/Understanding-the-Data-Science-Lifecycle/
Where Does Your Time Go?
A large amount of time and effort goes into tasks not intrinsically related to data science:
• Finding where the right data may be
• Getting access to the data
   • Bureaucracy
   • Understanding access methods and technologies (NoSQL, REST APIs, etc.)
• Transforming data into a format that is easy to work with
• Combining data originally available in different sources and formats
• Profiling and cleansing data to eliminate incomplete or inconsistent data points
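To make the cost of these steps concrete, the sketch below does by hand what the last three bullets describe: it reads a CSV export and a JSON payload (both hypothetical and inlined as strings), combines them on a shared key, and separates incomplete records from complete ones. It uses only the Python standard library; the field names, data and the combine helper are invented for illustration.

```python
import csv
import io
import json

# Hypothetical extracts: a CSV export from one system and a JSON payload from a REST API.
CSV_EXPORT = """customer_id,country
1,SG
2,AU
3,
"""
JSON_PAYLOAD = '[{"customer_id": 1, "tickets": 4}, {"customer_id": 2, "tickets": null}]'


def combine(csv_text, json_text):
    """Join both sources on customer_id and separate complete from incomplete records."""
    countries = {
        int(row["customer_id"]): row["country"] or None  # empty CSV cell -> None
        for row in csv.DictReader(io.StringIO(csv_text))
    }
    tickets = {item["customer_id"]: item["tickets"] for item in json.loads(json_text)}

    complete, incomplete = [], []
    for cid, country in sorted(countries.items()):
        record = {"customer_id": cid, "country": country, "tickets": tickets.get(cid)}
        # Incomplete if a value is missing in either source (or the key is absent from one).
        if None in record.values():
            incomplete.append(record)
        else:
            complete.append(record)
    return complete, incomplete


complete, incomplete = combine(CSV_EXPORT, JSON_PAYLOAD)
print(complete)         # [{'customer_id': 1, 'country': 'SG', 'tickets': 4}]
print(len(incomplete))  # 2
```

Each new source format or access method adds more code of this kind, which is the overhead the following slides propose moving into a shared virtual layer.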
Data Scientist Workflow
1. Identify useful data
2. Modify data into a useful format
3. Analyze data
4. Prepare for ML algorithm
5. Execute data science algorithms (ML, AI, etc.)
Identify Useful Data
If the company has a virtual layer with good coverage of its data sources, this task is greatly simplified.
• A data virtualization tool like Denodo can offer unified access to all data available in the company.
• It abstracts the underlying technologies, offering a standard SQL interface to query and manipulate data.
To simplify the challenge further, Denodo offers a Data Catalog to search, find and explore your data assets.
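The sketch below imitates the two ideas on this slide with Python's sqlite3: several sources reachable through one connection and one SQL dialect, plus a crude keyword search over table names. It is not Denodo's API or its Data Catalog. The attached in-memory databases stand in for separate source systems, and the schema names, tables and the find_tables helper are hypothetical.

```python
import sqlite3

# Two attached in-memory databases stand in for separate source systems;
# the schema, table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")
conn.executescript("""
CREATE TABLE crm.customers (customer_id INTEGER, segment TEXT);
CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL, days_late INTEGER);
INSERT INTO crm.customers VALUES (1, 'SME'), (2, 'Enterprise');
INSERT INTO billing.invoices VALUES (1, 120.0, 0), (1, 80.0, 15), (2, 900.0, 3);
""")


def find_tables(conn, keyword, schemas=("crm", "billing")):
    """Crude catalog search: return 'schema.table' names that contain keyword."""
    hits = []
    for schema in schemas:
        rows = conn.execute(f"SELECT name FROM {schema}.sqlite_master WHERE type = 'table'")
        hits += [f"{schema}.{name}" for (name,) in rows if keyword in name]
    return hits


# One standard SQL query spans both "sources".
LATE_PAYMENT_BY_SEGMENT = """
SELECT c.segment, AVG(i.days_late) AS avg_days_late
FROM crm.customers AS c JOIN billing.invoices AS i USING (customer_id)
GROUP BY c.segment ORDER BY c.segment
"""

print(find_tables(conn, "invoice"))                      # ['billing.invoices']
print(conn.execute(LATE_PAYMENT_BY_SEGMENT).fetchall())  # [('Enterprise', 3.0), ('SME', 7.5)]
```

With a real virtual layer, a data scientist would connect once (for example over ODBC or JDBC) and query views built over the underlying systems in the same way, without needing to know each source's native access method.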
Ingestion And Data Manipulation Tasks
Data virtualization offers the unique opportunity of using standard SQL (joins, aggregations, transformations, etc.) to access, manipulate and analyze any data.
Cleansing and transformation steps can be easily accomplished in SQL.
Its modeling capabilities enable the definition of views that embed this logic to foster reuse.
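As a sketch of "cleansing in SQL, embedded in reusable views", the example below defines one view that trims, normalizes, casts and filters raw sensor data, and a second view that reuses it for aggregation. Python's sqlite3 stands in for the virtual layer; the raw_sensor table, both views and the data are hypothetical, and Denodo's own view-definition syntax may differ.

```python
import sqlite3

# sqlite3 stands in for the virtual layer; table, view and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw source with inconsistent casing, padding and missing readings.
CREATE TABLE raw_sensor (device TEXT, temp_c TEXT, reading_ts TEXT);
INSERT INTO raw_sensor VALUES
  (' pump-01 ', '71.5', '2020-09-01 10:00'),
  ('PUMP-01',   '',     '2020-09-01 11:00'),
  ('pump-02',   '68',   '2020-09-01 10:00'),
  ('Pump-02',   '70',   '2020-09-01 11:00');

-- Cleansing and transformation logic, written once as a view.
CREATE VIEW clean_sensor AS
SELECT lower(trim(device)) AS device,
       CAST(temp_c AS REAL) AS temp_c,
       reading_ts
FROM raw_sensor
WHERE trim(temp_c) <> '';

-- A second view reuses the first instead of repeating its logic.
CREATE VIEW device_avg_temp AS
SELECT device, AVG(temp_c) AS avg_temp_c, COUNT(*) AS n_readings
FROM clean_sensor
GROUP BY device;
""")

rows = conn.execute("SELECT * FROM device_avg_temp ORDER BY device").fetchall()
print(rows)  # [('pump-01', 71.5, 1), ('pump-02', 69.0, 2)]
```

Any later model or report that needs clean readings can select from clean_sensor, so a fix to the cleansing rule is made in one place.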
Prologis Launches Data Analytics Program for Cost Optimization
Background
• Create a single governed data access layer to deliver reusable and consistent analytical assets that the rest of the business teams could use to run their own analytics.
• Save data scientists time in finding, transforming and analysing data sets without having to learn new skills, and create data models that could be refreshed on demand.
• Efficiently maintain its new data architecture with minimal downtime and configuration management.
Prologis is the largest industrial real estate company in the world, serving 5,000 customers in over 20 countries, with USD 87 billion in assets under management.
Prologis Architecture Diagram
[Diagram components: Snowflake, Informatica Cloud, ShareHouse, ODBC/JDBC and API access, WAF and AWS Lambda APIs; tables and views shown include peoplesoft_gl_actuals, yardi_unit_leasing, p360_property, wc_monthly_occupancy_rpt_f, wc_lease_amendment_d, w_day_d, wc_property_d, MARKET_AVAILABILITY_CURRENT and MARKET_AVAILABILITY_FUTURE.]
Data Virtualization Benefits Experienced by Prologis
• The analytics team was able to create business-focused subject areas with consistent data sets, achieving 30% faster speed to analytics.
• Denodo made it possible for Prologis to quick-start advanced analytics projects.
• Deploying the Denodo Platform was as easy as a click of a button, with centralized configuration management. This simplified Prologis’s data architecture and also helped bring down the overall maintenance cost.
Data Virtualization Benefits for AI and Machine Learning Projects
✓ Denodo can play a key role in the data science ecosystem, reducing data exploration and analysis timeframes.
✓ It extends and integrates with the capabilities of notebooks, Python, R, etc. to improve the data scientist's toolset.
✓ It provides a modern “SQL-on-Anything” engine.
✓ It can leverage Big Data technologies like Spark (as a data source, as an ingestion tool and for external processing) to work efficiently with large data volumes.
✓ It offers new and expanded tools for data scientists and citizen analysts, such as the “Apache Zeppelin for Denodo” notebook.
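To show what "integrates with notebooks and Python" can look like at the last step of the workflow, the sketch below pulls a feature set with one SQL query and splits it into training and test sets. Python's sqlite3 stands in for a connection to the virtual layer (a notebook would more likely use an ODBC/JDBC or DB-API driver); the customer_features table, its synthetic rows and the load_xy and split_train_test helpers are hypothetical.

```python
import random
import sqlite3

# sqlite3 stands in for a connection to the virtual layer; the table, columns
# and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_features "
    "(customer_id INTEGER, tenure_months INTEGER, products_used INTEGER, churned INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_features VALUES (?, ?, ?, ?)",
    [(i, i % 36, 1 + i % 4, int(i % 5 == 0)) for i in range(1, 101)],  # synthetic rows
)


def load_xy(conn):
    """Fetch features (X) and churn labels (y) with one SQL query."""
    rows = conn.execute(
        "SELECT tenure_months, products_used, churned "
        "FROM customer_features ORDER BY customer_id"
    ).fetchall()
    X = [[row[0], row[1]] for row in rows]
    y = [row[2] for row in rows]
    return X, y


def split_train_test(X, y, test_fraction=0.2, seed=42):
    """Shuffle row indices reproducibly and split X and y into train and test sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - round(len(idx) * test_fraction)
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


X, y = load_xy(conn)
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test), sum(y))  # 80 20 20
```

The split is kept in plain Python only to stay dependency-free; in practice the same lists could be handed to any modelling library.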
More Information?
Next Steps
Access Denodo Platform in the Cloud
Take a Test Drive today!
bit.ly/32rJ8JQ
GET STARTED TODAY
Denodo Virtual Lunch & Learn ASEAN:
Respond Quickly in a Crisis with a Logical Data Layer
Thursday, 15 October 2020
12.00pm - 1.30pm SGT
REGISTER YOUR INTEREST
bit.ly/2Ro9PZF
Elaine Chan
Regional Vice President,
Sales, ASEAN & Korea
Chris Day
Director,
APAC Sales Engineering
Bridging the Last Mile: Getting Data to the People Who Need It
Thursday, 22 October 2020 | 10.00am SGT | 1.00pm AEDT REGISTER HERE
bit.ly/3hn6eWs
Chris Day
Director, APAC Sales Engineering, Denodo
Sushant Kumar
Product Marketing Manager, Denodo
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.
