Watch full webinar here: https://bit.ly/3kVmYJl
As we move into a world driven by AI initiatives, we face new and diverse challenges in operationalization. Creating a solution and putting it into practice are certainly not the same thing. The challenges span various organizational and data facets. In many instances, data scientists work in silos, and connecting to live data is not always possible. But how can one guarantee that a model developed in a silo is still relevant to live data? How can we manage data flow and data access across the entire AI operationalization cycle?
Watch on-demand to explore:
- The journey and challenges of the Data Scientist
- How Denodo data virtualization with data movement streamlines operationalization
- The best practices and techniques when dealing with siloed data
- How customers have used data virtualization in their data science initiatives
3. Agenda
1. Business Needs in AI-Driven Organizations
2. Enterprise Data Science Lifecycle and Challenges
3. Data-Driven Techniques for Successful Deployment
4. Preview of the follow-up Tech-Talk!
5. Q & A
5. 5
Business Drivers
Why are we talking about Machine Learning, Artificial Intelligence, and Data Science?
• Ever-growing collection of valuable data from diverse sources
• Self-service initiatives and expansion of the consumer and user base
• Cloud migration and infrastructure modernization for the future
• Data discovery and collaboration within and outside the organization
6. 6
Data Science Needs DATA!
Common use cases across the industry:
• Improving Patient Outcomes: data includes patient demographics, family history, patient vitals, lab test results, claims data, etc.
• Predictive Maintenance: data includes maintenance data logs and data coming in from sensors, including temperature, running time, power level duration, etc.
• Predicting Late Payment: data includes company or individual demographics, payment history, customer support logs, etc.
• Preventing Fraud: data includes the location where the claim originated, time of day, claimant history, and any recent adverse events.
• Reducing Customer Churn: data includes customer demographics, products purchased, products used, past transactions, company size, history, revenue, etc.
11. 11
Data Scientist Workflow
[Figure: workflow stages: Data Discovery, Data Wrangling, Analysis, Model Preparation, Validation & Execution]
• Data Scientists spend 80% of their time identifying and getting access to useful data
• Data Consistency is critical, and data copies can cause data isolation and skewed models
12. 12
Data Scientist Workflow
• Data Preparation is time consuming and can introduce additional inconsistencies
• Data Governance and Security play a critical role in data access and unification
13. 13
Data Scientist Workflow
• Model training and deployment is not a one-off exercise, but an iterative process
• Deployment and maintenance of the model is key to operationalization
14. 14
Where do we stand?
✓ We know the business problem and why we need Data Science projects
✓ We have ML models which are working!
✓ We know the challenges with diverse data and various consumers
➢ Needs a solution, traditional pipelines will not scale
✓ We know the importance of getting trusted and curated data to the Data Scientist
➢ Needs a solution, possibly a semantics layer with centralized governance
✓ Are we missing anything?
Yes, operationalization of the Models we have created…
15. 15
Data Science Operationalization
“Data science operationalization is most simply
defined as the application and maintenance of
predictive and prescriptive models. Both clients and
vendors are placing an emphasis on the importance of
moving data science out of a prototype environment
and into a state of production and continuous
improvement.”
https://blogs.gartner.com/peter-krensky/2018/08/01/operationalization-is-the-shibboleth-of-data-science/
16. 16
Data Science Operationalization - Challenges
▪ Integrate Models with Live and Current data
▪ Continuous model enhancements driven by data
▪ Data consistency across all models and consumers
▪ Implement Governance and Security across teams
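
The first challenge, keeping a model developed on an extract relevant to live data, can be monitored by comparing how a feature is distributed in the training snapshot and in current data. The following is a minimal sketch, assuming the Population Stability Index (PSI) as the comparison; the webinar does not prescribe a specific drift measure, and the function name and sample values below are hypothetical.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal

    def bucket_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the training range into the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    exp_shares = bucket_shares(expected)
    act_shares = bucket_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_shares, act_shares))


# Hypothetical data: sensor temperatures from a training extract and from live data.
training_snapshot = [60 + (i % 20) for i in range(1000)]
live_same = [60 + (i % 20) for i in range(500)]
live_shifted = [70 + (i % 20) for i in range(500)]

psi_same = population_stability_index(training_snapshot, live_same)
psi_shifted = population_stability_index(training_snapshot, live_shifted)
print(f"PSI unchanged: {psi_same:.4f}, PSI shifted: {psi_shifted:.4f}")
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a shift large enough to revisit the model, but the threshold is a convention, not a guarantee, and should be tuned per feature.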
18. 18
Getting Data to Consumers
[Diagram: Data Sources (Data Warehouse, NoSQL, RDBMS, Data Mart) reach the consumers (Visualization, ML / AI, Data Science) through Data Access, Data Virtualization, and Data Services, together with Data Quality, Governance, Metadata Management, and Security.]
19. 19
Data Virtualization for the Enterprise
✓ Virtualizing data without moving it guarantees current data for the models
✓ A semantics-driven layer ensures data consistency for all consumers and applications
✓ A centralized governance and security layer ensures managed data access
✓ ETL/ELT and data movement support is critical for Data Science projects
  ▪ Ability to automate data movement and join the moved data with the original data
  ▪ On-the-fly data movement ensures optimized execution
  ▪ Remote Tables provide for data migration pipelines
21. 21
Denodo Platform – How does virtualization work?
[Diagram, CONNECT step: Consumers (BI tools and data science tools via SQL; Data as a Service via RESTful / OData / GraphQL / GeoJSON; a Data Catalog to discover, explore, and document) sit on top of the Logical Data Fabric. At the bottom of the fabric, an abstraction layer of Base Views connects through 150+ data adapters to the sources: traditional DB & DW, cloud stores, Hadoop & NoSQL, OLAP, files, apps, streaming, and SaaS.]
22. 22
Denodo Platform – How does virtualization work?
[Diagram, COMBINE step: consumers and sources are unchanged from the previous slide. Inside the Logical Data Fabric, the Base Views of the abstraction layer feed a Transformation & Cleansing layer of Derived Views, which are in turn combined into Unified Views (operators labeled J, A, and S in the diagram).]
23. 23
Denodo Platform: Data Virtualization and Semantics
[Diagram, CONSUME step: consumers and sources are unchanged from the previous slide. On top of the Base Views, Derived Views, and Unified Views, the Logical Data Fabric publishes consumer-facing views such as a Customer 360 View and a Virtual Data Mart View (operators labeled U and J in the diagram).]
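
The CONNECT, COMBINE, and CONSUME layering can be illustrated with ordinary SQL views. This is a minimal sketch using Python's built-in `sqlite3` module as a stand-in; it is not Denodo's API, both "sources" live in one in-memory database for brevity, and every table and view name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-ins for two separate source systems (hypothetical tables).
    CREATE TABLE crm_customers (id INTEGER, name TEXT, segment TEXT);
    CREATE TABLE billing_invoices (customer_id INTEGER, amount REAL, paid INTEGER);
    INSERT INTO crm_customers VALUES (1, ' acme corp ', 'enterprise'), (2, 'Globex', 'smb');
    INSERT INTO billing_invoices VALUES (1, 1200.0, 1), (1, 300.0, 0), (2, 80.0, 1);

    -- CONNECT: base views expose each source as-is.
    CREATE VIEW bv_customers AS SELECT * FROM crm_customers;
    CREATE VIEW bv_invoices AS SELECT * FROM billing_invoices;

    -- COMBINE: derived views clean and aggregate the base views.
    CREATE VIEW dv_customers AS
        SELECT id, UPPER(TRIM(name)) AS name, segment FROM bv_customers;
    CREATE VIEW dv_invoice_totals AS
        SELECT customer_id,
               SUM(amount) AS billed,
               SUM(CASE WHEN paid = 0 THEN amount ELSE 0 END) AS outstanding
        FROM bv_invoices GROUP BY customer_id;

    -- CONSUME: one business-facing view joins the derived views.
    CREATE VIEW customer_360 AS
        SELECT c.id, c.name, c.segment, t.billed, t.outstanding
        FROM dv_customers AS c JOIN dv_invoice_totals AS t ON t.customer_id = c.id;
    """
)

# Views are evaluated at query time, so a new source row shows up immediately.
conn.execute("INSERT INTO billing_invoices VALUES (2, 40.0, 0)")
rows = conn.execute("SELECT * FROM customer_360 ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Because nothing is copied, the consumer-facing view reflects the invoice inserted after the views were defined, which is the property the slides describe as live data access.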
24. 24
Where do we stand?
✓ We know the business problem and why we need Data Science projects
✓ We have ML models which are working!
✓ We know the challenges with diverse data and various consumers
➢ Needs a solution, traditional pipelines will not scale
✓ We know the importance of getting trusted and curated data to the Data Scientist
➢ Needs a solution, possibly a semantics layer with centralized governance
➢ SOLUTION: Centralized and Governed Semantics Layer with live data access
26. 26
Denodo Platform - Integrated ETL / ELT Pipelines
▪ Real-time logical integration is not always the right answer for every use case
▪ Support the integration technique that fits your enterprise environment
▪ For scenarios where real-time integration does not fit, Denodo also offers integrated ETL/ELT replication and ingestion pipelines:
  ▪ Create a table in any location
  ▪ Load it with data from any other data source
Examples:
▪ Data Lake management
  ▪ Load data where and when needed
  ▪ Materialize data in different zones (ELT processing)
▪ Data Science
  ▪ Move data to Spark after initial analysis for model creation and training
▪ Cloud and Hybrid Architecture
  ▪ Replicate and refresh data to cloud systems
  ▪ Refresh data for external consumers and models
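
The "create a table in any location, load it with data from any other data source" pattern can be sketched generically as follows. This is an illustration with Python's built-in `sqlite3` module, not Denodo's Remote Tables feature; the two in-memory databases, the `materialize` helper, and all table names are hypothetical.

```python
import sqlite3

# Two in-memory databases stand in for a source system and a target location
# (for example a data lake zone or a cloud store); both are hypothetical.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.executescript(
    """
    CREATE TABLE sensor_readings (device_id TEXT, temperature REAL, running_hours REAL);
    INSERT INTO sensor_readings VALUES
        ('pump-1', 71.5, 120.0), ('pump-1', 88.0, 121.0), ('pump-2', 65.2, 40.0);
    """
)


def materialize(query: str, table: str, columns: str) -> int:
    """Create `table` in the target and reload it from a query on the source.

    `table` and `columns` are interpolated into SQL, so they must come from
    trusted configuration, never from user input.
    """
    rows = source.execute(query).fetchall()
    with target:  # commit the refresh as one transaction
        target.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
        target.execute(f"DELETE FROM {table}")  # full refresh, not incremental
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)


# Materialize per-device features for model training in the target location.
loaded = materialize(
    "SELECT device_id, MAX(temperature), MAX(running_hours) "
    "FROM sensor_readings GROUP BY device_id ORDER BY device_id",
    "device_features",
    "device_id TEXT, max_temperature REAL, running_hours REAL",
)
features = target.execute("SELECT * FROM device_features ORDER BY device_id").fetchall()
print(loaded, features)
```

Running `materialize` again replaces the target table's contents, which corresponds to the scheduled "replicate and refresh" examples above; an incremental load would need a change-tracking column that this sketch does not assume.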
27. 27
Where do we stand?
✓ We know the business problem and why we need Data Science projects
✓ We have ML models which are working!
✓ We know the challenges with diverse data and various consumers
➢ Needs a solution, traditional pipelines will not scale
➢ SOLUTION: Flexibility of ETL/ELT/DV enables diverse data access patterns
✓ We know the importance of getting trusted and curated data to the Data Scientist
➢ Needs a solution, possibly a semantics layer with centralized governance
➢ SOLUTION: Centralized and Governed Semantics Layer with live data access
28. 28
Data Virtualization Benefits
✓ Denodo plays a key role in the data science ecosystem by reducing data exploration and analysis timeframes
✓ Enables governed data access for all Data Science needs and other consumer applications
✓ Provides curated data sets and a semantics-driven modeling approach to ensure data coherency
✓ Facilitates collaboration across the data community as a single platform for all data requirements
31. 31
Next Steps and Q&A
Access Denodo Platform in the Cloud!
Try the 30-day free trial today in the cloud marketplaces
GET STARTED TODAY
• Choice: Under your cloud account
• Support: Community forum and remote sales engineer
• Optional: Free 30-minute consultation with a Denodo Cloud specialist
www.denodo.com/free-trials