Best Practices: Implementing DataOps with a Data Science Platform

With the growing number of data-driven organizations, new approaches are needed to drive innovation in scaling and implementing data science. We will discuss how data and data science platforms take advantage of what we are calling DataOps. We will share background on this approach and how it supports putting data science models into production. We will provide best practices and a roadmap for implementing these techniques to become a leader in machine learning and data science. More: http://info.mapr.com/WB_Implementing-DataOps-BestPractices_Global_DG_17.11.07_RegistrationPage.html

  1. Learn more at datascience.com | Empower Your Data Scientists. November 7, 2017. Best Practices: Implementing DataOps with a Data Science Platform
  2. Agenda
     • Evolving data science landscape
     • Data growth and impacts
     • Defining DataOps
     • DataOps vs. DevOps
     • Best practices in applying DataOps
     • Q&A
     Presenters: Crystal Valentine, VP Technology Strategy, MapR (cvalentine@mapr.com); William Merchan, CSO, DataScience.com (william@datascience.com)
  3. EVOLVING LANDSCAPE
  4. DOING DATA SCIENCE HAS GROWN IN COMPLEXITY (diagram labels, in extraction order: Windows, OSX, Cloud, On Prem, Laptops, Remote, Environments, Security, AWS, Google, Azure, Notebooks, Jupyter, R Studio, Zeppelin, Languages, Python, Scala, R, SAS, Tools, Libraries, Sharing & Collaboration ?, Results, Models, Chat, Email, .ppt, Code, Email, Shared Drives, Deployments, Monitoring, Support, Logging Style A, Logging Style B, Tools, PMML, Flask, Lineage and Repeatability ?, Data Lake, Database, Data Inventory, Spark, Pig, Hive, Data Tools, ETL, Cron, Users)
  5. DATA SCIENCE TRENDS: GROWING TEAMS & OPEN SOURCE AS THE NEW STANDARD. 2017: 2,350,000 data science and analytics job listings*. *Source: Kaggle 2017 data science trend report, Burning Glass Quant Crunch Report, Microsoft Revolutions Blog 2017
  6. DATA SCIENCE PLATFORMS ARE AN EMERGING CATEGORY BRINGING TOGETHER ESSENTIAL ELEMENTS FOR DATA SCIENCE SCALING (diagram labels: CLOUD PROVIDERS, ETL & DATA ENGINEERING, VERTICAL APPLICATIONS, BI & VISUALIZATION TOOLS, SECURITY, INFRASTRUCTURE, LIBRARIES, TOOLS, DATA PLATFORMS, DATA SCIENCE PLATFORMS)
  7. DATA GROWTH
  8. DATA IS THE LEVERAGE POINT FOR COMPETITIVE ADVANTAGE
  9. DATA VOLUMES GROWING FASTER THAN MOORE’S LAW. Source: McKinsey Global Institute. 1987: 3 Exabytes of Data. 2010: 1.2 Zettabytes of Data. 2020: 44 Zettabytes of Data. Data Diversity: Emails, Call Detail Records, Click stream, CSV, Documents, Data, PDF, Billing Data, Meta Data, JSON, Network Data, Mobile Data, XML, Product Catalog, Medical Records, Text Files, Video, Text Messages, Merchant Listings, Sensor Data, Server Logs, Set Top Box, Social Media, Audio
  10. THE VALUE OF DATA (figure: two charts, Legacy Value Model and Next-Gen Value Model; horizontal axis: Size; vertical axis: $; curves: Value, Cost, Net Value; each chart marks a point labeled OPT)
  11. WE HAVE PASSED AN INFLECTION POINT. INVESTMENT IN NEXT-GEN VS. LEGACY TECHNOLOGIES FOR DATA, $ (millions). Chart series: Legacy technology investment, Next-Gen technology investment, Total $ growth of IT market. 90% of data is on next-gen technology by 2020. Next-gen consists of cloud, big data, software and hardware related expenses. Source: IDC, Gartner; Analysis & Estimates: MapR
  12. DATAOPS
  13. DATAOPS: AN AGILE METHODOLOGY FOR DATA-DRIVEN ORGANIZATIONS
      Axioms:
      1. Data is central to disruptive enterprise applications
         a. Lightweight, stateless functions do not represent the majority of workloads
      2. Data science and machine learning are an important paradigm
         a. Scientists become active users -- no longer just application developers
         b. Iterative workflow with different data usage patterns
      3. Data volumes continue to grow
      4. Moving data is a performance bottleneck
      DataOps Goals:
      • Continuous model deployment
      • Promote repeatability
      • Promote productivity -- focus on core competencies
      • Promote agility
      • Promote self-service
  14. COMPARING DEVOPS AND DATAOPS: WHAT’S DIFFERENT OR THE SAME? (diagram labels: Developers & Architects, Data Engineers, Data Scientists, Security & Governance, Operations; DataOps, DevOps, DataOps)
  15. CONTINUOUS MODEL DEPLOYMENT
      Stages: Data Engineering, Model Development, Model Management, Model Deployment, Model Monitoring & Rescoring
      Key Building Blocks for Agility:
      1) Unified data platform
      2) Data governance
      3) Self-service data and compute access
      4) Multitenancy and resource management
  16. BEST PRACTICES
  17. INDUSTRY LEADING DATA SCIENCE ORGANIZATIONS ADOPTING DATAOPS: Versioning; Platform approach; Team makeup and organization; Self service
  18. DataOps Platform Checklist
      • Unified platform for all data -- historical and real-time production
      • Multitenancy and resource utilization
      • Single security and access model for governance and self-service access
      • Enterprise-grade for mission-critical applications and open source tools
      • Run compute on the data platform -- leverage data locality
  19. Thank you!
  20. NEW DATAOPS APPROACH FOR DATA SCIENCE TEAMS: DataOps
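
The continuous model deployment stages named on slide 15 (data engineering, model development, model management, model deployment, model monitoring and rescoring) could be wired together roughly as follows. This is only a minimal sketch under stated assumptions, not the presenters' implementation: the `ModelRegistry` class, the helper functions, the one-parameter model, and the `tolerance` threshold are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional, Tuple

Row = Tuple[Optional[float], Optional[float]]


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for model management and deployment."""
    versions: Dict[int, float] = field(default_factory=dict)
    deployed: int = 0  # 0 means no model has been deployed yet

    def register(self, slope: float) -> int:
        # Model management: every trained model gets a new version number.
        version = len(self.versions) + 1
        self.versions[version] = slope
        return version

    def deploy(self, version: int) -> None:
        # Model deployment: point serving at a registered version.
        if version not in self.versions:
            raise KeyError(f"unknown model version {version}")
        self.deployed = version

    def predict(self, x: float) -> float:
        return self.versions[self.deployed] * x


def engineer(rows: List[Row]) -> List[Tuple[float, float]]:
    # Data engineering: drop records with missing values.
    return [(x, y) for x, y in rows if x is not None and y is not None]


def develop(rows: List[Tuple[float, float]]) -> float:
    # Model development: least-squares slope for y = slope * x.
    # Assumes at least one row has a non-zero x.
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)


def needs_retraining(registry: ModelRegistry,
                     rows: List[Tuple[float, float]],
                     tolerance: float) -> bool:
    # Model monitoring and rescoring: score the deployed model on fresh data.
    error = mean(abs(registry.predict(x) - y) for x, y in rows)
    return error > tolerance


def run_cycle(registry: ModelRegistry, raw_rows: List[Row],
              tolerance: float = 0.5) -> int:
    """Run one pass through the stages and return the deployed version."""
    rows = engineer(raw_rows)
    if registry.deployed == 0 or needs_retraining(registry, rows, tolerance):
        registry.deploy(registry.register(develop(rows)))
    return registry.deployed
```

Calling `run_cycle` repeatedly with fresh batches keeps the deployed version unchanged while the monitored error stays within `tolerance`, and registers and deploys a new version once the data drifts. Keeping every version in the registry is one simple way to support the versioning and repeatability goals the slides mention.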