Watch full webinar here: https://bit.ly/3xj6fnm
Presented at Chief Data Officer Live 2021 A/NZ
The world is changing faster than ever, and to compete and succeed, companies need to be agile enough to respond quickly to market changes and emerging opportunities. Data plays an integral role in achieving this business agility. However, given the complex nature of enterprise data architecture, finding and analysing data is an increasingly challenging task. Data virtualization is a modern data integration technique that integrates data in real time, without having to physically replicate it.
Watch this on-demand session to understand what data virtualization is and how it:
- Delivers data in real-time, and without replication
- Creates a logical architecture to provide a single view of truth
- Centralises the data governance and security framework
- Democratises data for faster decision making and business agility
Advanced Analytics & Machine Learning Projects Need Data
Improving patient outcomes
• Data includes patient demographics, family history, patient vitals, lab test results, claims data, etc.
Predictive maintenance
• Maintenance data logs and data coming in from sensors, including temperature, running time, power level duration, etc.
Predicting late payment
• Data includes company or individual demographics, payment history, customer support logs, etc.
Preventing fraud
• Data includes the location where the claim originated, time of day, claimant history and any recent adverse events.
Reducing customer churn
• Data includes customer demographics, products purchased, products used, past transactions, company size, history, revenue, etc.
Data Virtualization: Unified Data Integration and Delivery
• Data Abstraction: decoupling applications from data sources
• Data Integration without replication or relocation of physical data
• Easy Access to Any Data, with high performance and real-time/right-time delivery
• Unified metadata, security & governance across all data assets
• Dynamic Data Catalog for self-service data services and easy discovery
• Data Delivery in any format, with intelligent query optimization
Typical Data Science Workflow
A typical workflow for a data scientist (sketched in code after this list) is:
1. Gather the requirements for the business problem
2. Identify useful data
▪ Ingest data
3. Cleanse data into a useful format
4. Analyze data
5. Prepare input for your algorithms
6. Execute data science algorithms (ML, AI, etc.)
▪ Iterate steps 2 to 6 until valuable insights are produced
7. Visualize and share
Source: http://sudeep.co/data-science/Understanding-the-Data-Science-Lifecycle/
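As a concrete illustration, here is a minimal sketch of steps 2 to 7 in Python using pandas and scikit-learn. The file name, column names and the churn label are hypothetical placeholders; any tabular dataset and estimator would follow the same shape.

```python
# Minimal sketch of the workflow above (steps 2-7) in Python.
# "customers.csv" and all column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 2-3. Ingest data and cleanse it into a useful format
df = pd.read_csv("customers.csv")
df = df.dropna(subset=["revenue", "churned"])  # drop incomplete rows

# 4-5. Analyze the data and prepare input for the algorithm
X = pd.get_dummies(df[["company_size", "revenue"]])  # encode features
y = df["churned"]

# 6. Execute the data science algorithm; in practice, iterate
#    steps 2-6 until valuable insights are produced
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# 7. Visualize and share (here, just the holdout accuracy)
print("Holdout accuracy:", model.score(X_test, y_test))
```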
Where Does Your Time Go?
• 80% of time – Finding and preparing the data
• 10% of time – Analysis
• 10% of time – Visualizing data
Source: http://sudeep.co/data-science/Understanding-the-Data-Science-Lifecycle/
Where Does Your Time Go?
A large amount of time and effort goes into tasks not intrinsically related to data science:
• Finding where the right data may be
• Getting access to the data
▪ Bureaucracy
▪ Understanding access methods and technologies (noSQL, REST APIs, etc.)
• Transforming data into a format easy to work with
• Combining data originally available in different sources and formats
• Profiling and cleansing data to eliminate incomplete or inconsistent data points
Data Scientist Workflow
Identify useful data → Modify data into a useful format → Analyze data → Prepare for ML algorithm → Execute data science algorithms (ML, AI, etc.)
Identify Useful Data
If the company has a virtual layer with good coverage of data sources, this task is greatly simplified.
▪ A data virtualization tool like Denodo can offer unified access to all data available in the company.
▪ It abstracts the technologies underneath, offering a standard SQL interface to query and manipulate the data.
To further simplify the challenge, Denodo offers a Data Catalog to search, find and explore your data assets.
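As a hedged sketch: since the virtual layer speaks standard SQL, a data scientist can explore it straight from Python. The host name, port (9996 is a common default for Denodo's PostgreSQL-compatible ODBC endpoint), credentials and the view name customer_360 below are all assumptions for illustration.

```python
# Hedged sketch: querying a Denodo virtual database from Python through
# its PostgreSQL-compatible interface. Host, port, credentials and the
# view name "customer_360" are assumptions for illustration.
import pandas as pd
import psycopg2

conn = psycopg2.connect(
    host="denodo-server",   # assumed hostname
    port=9996,              # assumed PostgreSQL-compatible (ODBC) port
    dbname="analytics",     # assumed virtual database
    user="data_scientist",
    password="***",
)

# One standard SQL interface, regardless of where the underlying data
# physically lives (RDBMS, noSQL stores, REST APIs, files, ...).
df = pd.read_sql("SELECT * FROM customer_360 LIMIT 100", conn)
print(df.head())
conn.close()
```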
Data Scientist Workflow
Identify useful data → Modify data into a useful format → Analyze data → Prepare for ML algorithm → Execute data science algorithms (ML, AI, etc.)
Ingestion And Data Manipulation Tasks
Data Virtualization offers the unique opportunity of using standard SQL (joins, aggregations, transformations, etc.) to access, manipulate and analyze any data.
Cleansing and transformation steps can be easily accomplished in SQL, as the sketch below illustrates.
Its modeling capabilities enable the definition of views that embed this logic to foster reusability.
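Here is a minimal sketch of cleansing and combining data in standard SQL through the virtual layer, reusing the Python connection pattern from the previous sketch. All view and column names are hypothetical.

```python
# Hedged sketch: cleansing and joining through the virtual layer with
# plain SQL. Connection parameters, view names and columns are all
# hypothetical, as in the previous example.
import pandas as pd
import psycopg2

conn = psycopg2.connect(host="denodo-server", port=9996, dbname="analytics",
                        user="data_scientist", password="***")

sql = """
SELECT c.customer_id,
       UPPER(TRIM(c.country))     AS country,       -- normalise text values
       COALESCE(p.total_spend, 0) AS total_spend,   -- fill missing amounts
       COUNT(t.transaction_id)    AS n_transactions
FROM customers c
LEFT JOIN payments p     ON p.customer_id = c.customer_id
LEFT JOIN transactions t ON t.customer_id = c.customer_id
WHERE c.signup_date IS NOT NULL                     -- drop incomplete rows
GROUP BY c.customer_id, UPPER(TRIM(c.country)), COALESCE(p.total_spend, 0)
"""

features = pd.read_sql(sql, conn)  # ready for the "prepare for ML" step
conn.close()
```

In practice, a query like this would itself be published as a virtual view, so the cleansing logic is defined once and reused by every consumer.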
McCormick Uses Denodo to Provide Data to Its AI Project
Background
▪ McCormick's AI and machine learning-based project required data that was stored in spreadsheets and in internal systems spread across 4 different continents.
▪ Portions of the data in those internal systems and spreadsheets needed to be masked when shared with McCormick's research partner firms, yet remain unmasked when shared internally.
▪ McCormick wanted to create a data service that could simplify data access and data sharing across the organisation and be used by the analytics teams for their machine learning projects.
McCormick – Multi-purpose Platform
Solution Highlights
▪ Agile Data Delivery
▪ High Level of Reuse
▪ Single Discovery & Consumption Platform
Data Virtualization Benefits for McCormick
▪ Machine learning models and applications were able to access refreshed, validated and indexed data in real time, without replication, from the Denodo enterprise data service.
▪ The Denodo enterprise data service gave business users the capability to compare data across multiple systems.
▪ Spreadsheets are now the exception rather than the rule.
▪ The quality of proposed data and services is ensured.
Data Virtualization Benefits for AI and Machine Learning Projects
✓ Denodo can play a key role in the data science ecosystem, reducing data exploration and analysis timeframes.
✓ It extends and integrates with the capabilities of notebooks, Python, R, etc., improving the data scientist's toolset.
✓ It provides a modern “SQL-on-Anything” engine.
✓ It can leverage Big Data technologies like Spark (as a data source, an ingestion tool and for external processing) to work efficiently with large data volumes, as sketched below.
✓ It offers new and expanded tools for data scientists and citizen analysts, such as the “Apache Zeppelin for Denodo” notebook.
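As a final hedged sketch of the Spark point above: pulling a large virtual view from Denodo into Spark over JDBC, so heavy processing runs on the cluster. The JDBC URL format, driver class, credentials and view name are assumptions, and the Denodo JDBC driver jar must be on the Spark classpath.

```python
# Hedged sketch: using Spark as an external engine on top of a Denodo
# virtual view via JDBC. URL format, driver class, credentials and the
# view name are assumptions for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("denodo-ml").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:vdb://denodo-server:9999/analytics")  # assumed
      .option("driver", "com.denodo.vdp.jdbc.Driver")            # assumed
      .option("dbtable", "customer_360")                         # assumed view
      .option("user", "data_scientist")
      .option("password", "***")
      .load())

# Push the heavy aggregation to the Spark cluster rather than the notebook
df.groupBy("country").count().show()
```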