Sponsors and collaborators
DataOps
The continuous deployment cycle
in data analytics
Olivier Perard | Data Scientist at Oracle
DataOps
Definitions
VP Technology Strategy, MapR:
DataOps is an agile methodology for developing and deploying data-intensive applications, including data science and machine learning. A DataOps workflow supports cross-functional collaboration and fast time to value.
http://www.gartner.com/it-glossary/data-ops/:
A hub for collecting and distributing data, with a mandate to provide controlled access to systems of record for customer and marketing performance data, while protecting privacy, usage restrictions, and data integrity.
Tamr CEO Andy Palmer:
DataOps is an enterprise collaboration framework that aligns data-management objectives with data-consumption ideals to maximize data-derived value.
Nexla CEO:
DataOps is the function within an organization that controls the data journey from source to value.
DataOps
Gartner Data & Analytics Summit 2018
DataOps, private cloud database platform as a service (dbPaaS), and machine-learning-enabled data management.
DataOps is a new practice with no standards or frameworks
Nick Heudecker, research vice president at Gartner
COMPARING DEVOPS AND DATAOPS
WHAT’S DIFFERENT OR THE SAME?
Developers & Architects
Data Engineers
Data Scientists
Security & Governance
Operations
[Figure: the roles above, grouped under DevOps and DataOps]
DataOps
Brings Flexibility & Focus
Expands DevOps to include data-heavy roles
Organized around data-related goals
Better collaboration and communication between roles
DataOps
AN AGILE METHODOLOGY FOR DATA-DRIVEN ORGANIZATIONS
DataOps Goals:
Continuous model deployment
Promote repeatability
Promote productivity -- focus on core competencies
Promote agility
Promote self-service
AXIOMS:
Data is central to disruptive enterprise applications
• Lightweight, stateless functions do not represent the majority of workloads
Data science and machine learning are an important paradigm
• Scientists become active users -- no longer just application developers
• Iterative workflow with different data usage patterns
Data volumes continue to grow
Moving data is a performance bottleneck
DataOps
[Figure: three stages, Connect and Integrate, Store and Process, Analyze and Visualize; labeled Data Platform, Data Stream, Data Analytics]
Structured Data
Unstructured Data
Sandboxes
Data lakes
Data marts / warehouses
Varying data types
Data available from structured and unstructured sources
Focus on algorithms, not infrastructure
Quick and actionable business insights
Data Science Platforms
[Figure labels: Cloud Providers, ETL & Data Engineering, Vertical Applications, BI & Visualization Tools, Security, Infrastructure, Libraries, Tools, Data Platforms, Data Science Platforms]
DataOps
Approach Advantages
Data Self-Service
• Data scientists need to develop use cases quickly using the enterprise’s data, without restrictions from IT.
Improved efficiency and better use of the team’s time
• Deploy an analytics platform in one click
Faster Time-to-Value
Improve productivity
• Implement use cases in parallel using the same data, but with a dedicated platform for each analytics team.
[Figure labels: Storage, Compute, Libraries, Tools, Data Science Platforms]
DataOps
Continuous Model Deployment
Key Building Blocks for Agility:
• Unified data platform
• Data governance
• Self-service data and compute access
• Multitenancy and resource management
Data Engineering
Model Development
Model Management
Model Deployment
Model Monitoring & Rescoring
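The stages listed above, from data engineering through model monitoring and rescoring, can be read as a loop: a deployed model is checked against fresh data and rebuilt when it stops performing. The sketch below is only an illustration of that idea under simple assumptions. The threshold "model", the function names, and the 0.9 accuracy floor are all hypothetical and do not come from the talk or from any Oracle product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Model:
    version: int
    threshold: float  # toy model: predict 1 when x >= threshold

    def predict(self, x: float) -> int:
        return int(x >= self.threshold)


def engineer(rows):
    """Data engineering: drop rows with a missing feature or label."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def develop(rows, version):
    """Model development: place the threshold midway between class means."""
    positives = [x for x, y in rows if y == 1]
    negatives = [x for x, y in rows if y == 0]
    return Model(version, (mean(positives) + mean(negatives)) / 2)


def accuracy(model, rows):
    """Share of rows where the prediction matches the label."""
    return mean(int(model.predict(x) == y) for x, y in rows)


def monitor_and_rescore(model, fresh_rows, min_accuracy=0.9):
    """Monitoring: keep the model if it still scores well, else retrain."""
    rows = engineer(fresh_rows)
    if accuracy(model, rows) >= min_accuracy:
        return model
    return develop(rows, model.version + 1)


training = [(1.0, 0), (2.0, 0), (None, 1), (8.0, 1), (9.0, 1)]
initial = develop(engineer(training), version=1)

# Fresh data whose distribution has shifted upward (hypothetical drift).
drifted = [(11.0, 0), (12.0, 0), (18.0, 1), (19.0, 1)]
current = monitor_and_rescore(initial, drifted)
print(initial, current)
```

Model management here is reduced to a version counter. A real setup would also store the model artifact and its metrics.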
DataOps
[Figure labels: Storage, Compute, Data Lab, Sandbox, Data Pod]
DataOps
Data Platform Deployment
Oracle GitHub OCI Ansible Modules
[Figure, three numbered steps; labels: Oracle Database 12c; Jupyter, Zeppelin, OML; Data Integration (CDC / ETL); Data Lab]
DataOps
Data-Driven Architecture
Traditional and Modern
Legacy, Custom, Mainframe, SaaS, Microservices, …
Source: Oracle Insight
Data Platform
Analytics
• Advanced Analytics
• Self-service
• Predictive
Data Science
• Machine Learning
• Deep Learning
Modern Data Platform
Security & Compliance
X Data
Applications
Real-time Analytics
• Real-time Marketing
• Fraud detection
• Exec Dashboarding
Real-time
Real-time Services
{OOP}
SparklineData
• Accessing multiple sources of data (technologies, silos/locations, clouds) …
• … with high performance …
• … for broader cross multi-model queries/algorithms on real-time data as well as historical data
Applications
BigData SQL
DataOps
Cloud Native & Open Source
Community
Artificial Intelligence, Blockchain, Internet of Things
Container Native Microservices
Open Serverless Computing DevOps
Prometheus
Open Source Cloud Native Innovation
Open Source Cloud Native Development
Istio
Cloud-Native and Community Driven Innovation
Open Source Managed and Autonomous Cloud Native
DataOps
Data Stream
Data Preparation
Data Replication
Data ETL
Logs
Oracle Cloud Infrastructure
Analytics
Consumers
Data Platform
BI
NL / AI
Data Integration
CDC / ETL
Discovering → Structuring → Cleaning → Enriching → Validating → Deploying
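The preparation steps named above can be pictured as small functions applied in order. The sketch below covers only structuring, cleaning, enriching, and validating. Discovering (profiling the raw data) and deploying (handing the accepted records to a consumer) are left out. The sample records, field names, and country lookup table are hypothetical, invented purely for illustration.

```python
# Hypothetical raw input: semicolon-separated name;country;amount
RAW = ["  Ana ;ES;120", "bob;fr;  80", "carol;ES;", "dave;xx;50"]

# Hypothetical reference data used by the enriching step.
COUNTRY_NAMES = {"ES": "Spain", "FR": "France"}


def structure(line):
    """Structuring: split a raw line into named fields."""
    name, country, amount = (part.strip() for part in line.split(";"))
    return {"name": name, "country": country, "amount": amount}


def clean(record):
    """Cleaning: normalize casing and convert the amount to a number."""
    return {
        "name": record["name"].title(),
        "country": record["country"].upper(),
        "amount": float(record["amount"]) if record["amount"] else None,
    }


def enrich(record):
    """Enriching: add the country name from reference data."""
    return {**record, "country_name": COUNTRY_NAMES.get(record["country"])}


def validate(record):
    """Validating: require an amount and a known country."""
    return record["amount"] is not None and record["country_name"] is not None


def prepare(lines):
    """Run the steps in order and separate accepted from rejected records."""
    records = [enrich(clean(structure(line))) for line in lines]
    accepted = [r for r in records if validate(r)]
    rejected = [r for r in records if not validate(r)]
    return accepted, rejected


accepted, rejected = prepare(RAW)
print(accepted)
print(rejected)
```

With this sample, the records for Ana and Bob are accepted, while Carol (missing amount) and Dave (unknown country code) are rejected. Keeping rejected records rather than dropping them silently makes data quality visible.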
DataOps
Data Stream
Lineage
Pipeline
Quality
Speed
Efficiency
Oracle Data Science
Data Science Requires a Comprehensive Platform to Simplify Operations and Deliver Value at Scale
• Accelerate use of proper tools, frameworks and infrastructure
• Overcome restricted skillsets with a simple, collaborative platform
• Quickly leverage predictive analytics to drive positive business outcomes
A Robust, Easy-to-Use Data Science Platform Removes Barriers to Deploying Valuable Machine Learning Models in Production
• Collaborate securely
• Power business
• Work in standardized environments
• Manage data and tools
Oracle Data Science
Projects LifeCycle
Reproducibility
• Data Versioning
• Code Versioning
• Model Versioning
• Environment Management
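One way to picture the reproducibility items above is a run record that fingerprints everything that determines a model: the data, the code, the parameters, and the environment. The sketch below is a minimal illustration using only the Python standard library. The `run_record` helper, its field names, and the 12-character fingerprints are hypothetical choices and do not describe how Oracle Data Science implements versioning.

```python
import hashlib
import json
import platform


def fingerprint(payload: bytes) -> str:
    """Short content hash used as a version identifier."""
    return hashlib.sha256(payload).hexdigest()[:12]


def run_record(data: bytes, code: str, params: dict) -> dict:
    """Describe a training run so it can be reproduced later."""
    record = {
        "data_version": fingerprint(data),
        "code_version": fingerprint(code.encode("utf-8")),
        "environment": {"python": platform.python_version()},
        "params": params,
    }
    # The model version is derived from every input recorded above,
    # so changing data, code, parameters, or environment changes it.
    serialized = json.dumps(record, sort_keys=True).encode("utf-8")
    record["model_version"] = fingerprint(serialized)
    return record


first = run_record(b"x,y\n1,2\n", "fit(x, y)", {"depth": 3})
repeat = run_record(b"x,y\n1,2\n", "fit(x, y)", {"depth": 3})
new_data = run_record(b"x,y\n1,3\n", "fit(x, y)", {"depth": 3})
print(first["model_version"], new_data["model_version"])
```

Identical inputs yield the same model version, while any change in data produces a new one.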
Model Deployment: Operationalize Models as Scalable APIs
Model Management: Monitor and Optimize Model Performance
Data Exploration: Collaborative Data Analysis / Feature Engineering
Model Build and Train with Open Source Frameworks
Collaborators
∙ Data Scientists
∙ Business Stakeholders
∙ App Developers
∙ IT Admins
Business Analyst/Leader: Defines the business problem and the objective of the analyses.
Data Engineer: Prepares data, builds pipelines, and provides data access for analytical or operational uses.
IT Admin: Oversees the underlying processes, architecture, operations, and resource constraints.
Data Scientist: Analyzes data using statistical methods and coding languages like Python, R, Scala.
Application Developer: Deploys data science models into applications. Builds data products.
Oracle Data Science
Modules
Oracle Data Science Cloud
Oracle PaaS & IaaS
[Figure labels: Projects, Notebooks, Open Source Languages & Libraries, Version Control, Use Case Templates, Model Build & Train, Self-Service Scalable Compute (OCI), Object Store, Catalog, Data Lake, Streaming, Autonomous Data Warehouse, Model Deployment, Model Monitoring, Access Controls & Security]
Collaborative: Project-driven UI enables teams to easily work together on end-to-end modeling workflows with self-service access to data and resources
Integrated: Support for latest open source tools, version control, and tight integration with OCI and Oracle Big Data Platform
Enterprise-Grade: A fully managed platform built to meet the needs of the modern enterprise
Oracle Data Science
Environment complexity
Oracle Data Science
Configure, Train & Deploy
Oracle PaaS
Language, Image, Video, HR, Emotion
1 Configure
2 Train
3 Deploy
Easy setup, Easy Development, Easy Deployment
IT Persona, DevOps, Data Scientist
Autonomous Setup
[Figure labels: Data Select, Code Notebook, Data Definition, Model Train, Model Test, Publish API]
• Frameworks
• AI libraries
• Samples
• GPU clusters
• Connect to data
• Auto scale, updates
• HS network, storage
Easy Data Access
• Object Stores
• Database CS
• Spark
Model Sharing, Model Library, APIs, Model Analytics
Oracle Data Science
Build & Train
DEV, TEST, PROD
Oracle Data Science
Deploy
DEV, TEST, PROD
DataOps
Conclusions
Multi-Model Data Access
Interoperability
Data preparation and pipeline automation
Elasticity
Multidimensional agility
Automated governance
ORACLE PROVIDES
Next Generation Platform for All Data
Complete, Integrated, Open
AI and Machine Learning
ALL IN ONE
Thank you very much
Olivier Perard
https://twitter.com/oracle_es?lang=es

DevOps Spain 2019. Olivier Perard-Oracle
