Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo... - Thoughtworks
We are in the midst of an exciting time. There is an explosion of very interesting data, along with the emergence of powerful new technologies for harnessing data and devices that let people benefit tremendously from it. What is required are innovative processes that enable the creation and delivery of value from all of that data. More often than not, it is the predictive (what will happen?) and prescriptive (how do we make it happen?) analytics that produce this value, not the raw data itself.
Agile software teams frequently work on projects that involve rich, complex, and messy data. Often this data represents innovative analytics opportunities. Being analytics-aware lets these teams collaborate with stakeholders to create additional value from the data. This session aims to make Agile software teams more analytics-aware so that they recognize these innovation opportunities.
The trouble with conventional analytics (like conventional software development) is that it involves long, phased, sequential steps that delay delivery and often fail to produce actionable results. This talk examines the convergence of the following elements in an emerging field called Agile Analytics:
•sophisticated analytics techniques, plus
•lean learning principles, plus
•agile delivery methods, plus
•so-called "big data" technologies
Learn:
•The analytical modeling process and techniques
•How analytical models are deployed using modern technologies
•The complexities of data discovery, harvesting, and preparation
•How to apply agile techniques to shorten the analytics development cycle
•How to apply lean learning principles to develop actionable and valuable analytics
•How to apply continuous delivery techniques to operationalize analytical models
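To make the last point more concrete, here is a minimal Python sketch of one piece of a continuous-delivery pipeline for models: an automated promotion gate that decides whether a newly trained model may replace the one in production. The `EvaluationResult` type, the accuracy metric, and the thresholds are hypothetical choices for illustration. They are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluationResult:
    """Holdout metric for one trained model (higher is better)."""
    model_id: str
    accuracy: float


def should_promote(candidate: EvaluationResult,
                   production: Optional[EvaluationResult],
                   min_accuracy: float = 0.70,
                   min_gain: float = 0.0) -> bool:
    """Decide whether `candidate` may replace the production model.

    Hypothetical policy: the candidate must clear an absolute accuracy floor
    and must beat the current production model by at least `min_gain`.
    """
    if candidate.accuracy < min_accuracy:
        return False
    if production is None:
        # First deployment: the absolute floor is the only check.
        return True
    return candidate.accuracy - production.accuracy >= min_gain


# Example: a CI job would run this after training and fail the build on False.
print(should_promote(EvaluationResult("v2", 0.82), EvaluationResult("v1", 0.80)))
```

Because the gate is a plain function, it can run as an ordinary build step, so every retrained model passes through the same check before deployment.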
Back to Square One: Building a Data Science Team from Scratch - Klaas Bosteels
Generally speaking, big data and data science originated in the west and are coming to Europe with a bit of a delay. There is at least one exception though: The London-based music discovery website Last.fm is a data company at heart and has been doing large-scale data processing and analysis for years. It started using Hadoop in early 2006, for instance, making it one of the earliest adopters worldwide. When I left Last.fm to join Massive Media, the social media company behind Netlog.com and Twoo.com, I basically moved from a data science forerunner to a newcomer. Massive Media had at least as much data to play with and tremendous potential, but they were not doing much with it yet. The data science team had to be built from the ground up, and every step had to be argued for and justified along the way. Having evaluated everything I learned at Last.fm and started over with a clean slate at Massive Media, I developed a pretty clear perspective on how to find good data scientists, what they should be doing, what tools they should be using, and how to organize them to work together efficiently as a team, which is precisely what I would like to share in this talk.
This presentation anchors best practices for Enterprise Data Science in Microsoft's "Team Data Science Process". The talk introduces the concepts, offers real-world advice for project planning, and discusses the typical titles of professionals who make enterprise data science successful. These techniques also apply to AI (artificial intelligence), deep learning, machine learning, and advanced analytics.
APPLIED DATA SCIENCE: DEVELOPING SMART ICT PRODUCTS THAT LEARN FROM ... - webwinkelvakdag
As a lecturer at Fontys ICT, I see more and more students who have to deliver software that applies machine learning, for example as their graduation assignment. We have also set up an Applied Data Science specialization in which we teach students how to apply machine learning in ICT products. This has given us a lot of knowledge about best practices for developing machine learning applications. We also have a number of interesting cases that show the added value machine learning can bring to applications. Since February 2019, I have continued these activities in a postdoctoral research project titled "Applied data science: Developing ICT products that learn from data". In this research, I collect best practices from education, the literature, and industry to arrive at a "toolbox" for software engineers who want to build machine learning applications. In this talk, I will discuss why developing machine learning applications differs from developing traditional software. Which methods, techniques, and tools do you need? Which process should you follow? We will discuss a number of concrete projects to give a clear picture of the problems you may run into. After this talk, you will have a set of practical guidelines to apply in your own software development practice.
Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u... - Domino Data Lab
The prevailing issue with Operating Room (OR) scheduling in a hospital setting is that available OR block times are difficult to schedule and predict. This leaves operating rooms empty and unused, which in turn means longer waiting times for patients awaiting procedures. Using multivariate linear regression, we will show how hospitals can predict available OR block times with Spark MLlib, resulting in better OR utilization and shorter wait times for patients. Presented by Denny Lee, Data Scientist and Evangelist at Databricks.
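Below is a dependency-free Python sketch of what the regression step computes: ordinary least squares with several predictors, solved through the normal equations. The feature names and numbers are invented for illustration. In Spark the corresponding estimator is `pyspark.ml.regression.LinearRegression`, which this sketch does not use.

```python
from typing import List


def fit_linear_regression(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares with an intercept, via the normal equations.

    Solves (A^T A) b = A^T y where A is X with a leading column of 1s.
    Returns [intercept, coef_1, ..., coef_k].
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(A, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("predictors are collinear; A^T A is singular")
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for row in range(k):
            if row != col:
                factor = M[row][col]
                M[row] = [a - factor * b for a, b in zip(M[row], M[col])]
    return [M[i][k] for i in range(k)]


def predict(coefs: List[float], x: List[float]) -> float:
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Hypothetical features per OR block: [scheduled cancellations, days until block date].
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
# Hypothetical target: unused minutes in the block (synthetic: 30 + 2*x1 + 5*x2).
y = [42, 39, 61, 53, 60]
coefs = fit_linear_regression(X, y)
print([round(c, 3) for c in coefs])
```

The synthetic target is an exact linear function of the features, so the fit recovers the generating coefficients (30, 2, 5). Real scheduling data is noisy and would need a held-out evaluation set.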
Nadine Schöne, Dataiku. The Complete Data Value Chain in a Nutshell - IT Arena
Dr. Nadine Schöne is a Senior Solutions Architect at Dataiku in Berlin. In this role, she deals with all aspects of the data value chain for all users, including integration of data sources, ETL, collaboration, statistics, and modelling, but also operationalization, monitoring, automation, and security in production. She regularly speaks at conferences, holds webinars, and writes articles.
Speech Overview:
How can you get the most out of your data while staying flexible in your choice of infrastructure and without having to integrate a multitude of tools for the different personas involved? Maximizing the value you get out of your data is a necessity today. Looking at the whole picture and planning carefully are key to success. We will look at the complete data value chain from end to end: from the data stores, collaboration features, data preparation, visualization and automation capabilities, and external compute to scheduling, operationalization, monitoring, and security.
Slow down. Be Human. Building trust across teams with data - Matthew Eng
IBM Design’s mission was to shift how it approached product strategy, but the shift led to friction between multidisciplinary teams grasping for a unified vision. Learn lessons from assembling a research team that broke bad data analysis habits and adopted inclusive generative and evaluative techniques.
This presentation was delivered at the Makati Testers Meetup hosted by Sandstone Technology on 4 August 2016.
The information in this presentation and some of the slides are taken directly from James Bach & Michael Bolton’s Rapid Software Testing (RST) class and the notes from that class (which are publicly available from satisfice.com).
This presentation is intended to provide an overview of some ideas presented in that class; I am not claiming ownership of these ideas.
Q: Can I simply hire one rockstar data scientist to cover all this kind of work?
A: No, interdisciplinary work requires teams
A: Hire leads who can speak the lingo of each required discipline
A: Hire individual contributors who cover 2+ roles, when possible
Statistical Thinking – Solve the Whole Problem
BONUS: Meta Organization – Integration with Adjacent Teams
Co-authors Allen Day @allenday and Paco Nathan @pacoid
How Data Science Builds Better Products - Data Science Pop-up Seattle - Domino Data Lab
Data Science and Big Data are ushering in a new era in adaptive applications that learn from large and varied datasets and adjust their features based on the changing environment. This talk will look at how Data Science can be successfully bridged with Big Data Architectures and Agile Software Delivery to create a new class of software that answers the demands of today's rapidly-changing enterprises. Practical techniques and real-world case studies will highlight the approaches required to successfully build these exciting new enterprise tools. Presented by Sean McClure, Ph.D. Data Scientist, Senior Consultant at ThoughtWorks.
How to effectively deliver Data Science projects. This presentation aims to improve collaboration and communication between data science and data engineering. With Agile discipline, we can further improve the process of incremental value delivery.
Lean Analytics is a set of rules to make data science more streamlined and productive. It touches on many aspects of what a data scientist should be and how a data science project should be defined to be successful. During this presentation, Richard will present where data science projects go wrong, how you should think about data science projects, what constitutes success in data science, and how you can measure progress. This session will be loaded with terms, stories, and descriptions of project successes and failures. If you're wondering whether you're getting value out of data science, how to get more value out of it, or even whether you need it at all, then this talk is for you!
What you will take away from this session
Learn how to make your data science projects successful
Evaluate how to track progress and report on the efficacy of data science solutions
Understand the role of engineering and data scientists
Understand your options for processes and software
In this presentation, Microsoft data scientists Ben Keen and Shahzia Holtom cover an introduction to data science with respect to:
- What is a data scientist?
- What data does a data scientist need?
- AI ethics and responsibility
- What is MLOps and how does it drive value?
From Lab to Factory: Or how to turn data into value - Peadar Coyle
We've all heard of 'big data' and data science, but how do we convert these trends into actual business value? I share case studies and technology tips, and talk about the challenges of the data science process. This is all based on two years of in-the-field research on deploying models and going from prototypes to production.
These are slides from my talk at PyCon Ireland 2015
NDC Oslo: A Practical Introduction to Data Science - Mark West
Data Science has been described as the sexiest job of the 21st Century. But what is Data Science? And what has Machine Learning got to do with all this?
In this talk I will share insights and knowledge that I have gained from building up a Data Science department from scratch. This talk will be split into three sections:
(1) I’ll begin by defining what Data Science is, how it is related to Machine Learning and share some tips for introducing Data Science to your organisation.
(2) Next up we’ll run through some Machine Learning algorithms commonly used by Data Scientists, along with example use cases where these algorithms can be applied.
(3) The final third of the talk will be a demonstration of how you can quickly get started with Data Science and Machine Learning using Python and the Open Source scikit-learn Library.
Turn Data Into Actionable Insights - StampedeCon 2016 - StampedeCon
At Monsanto, emerging technologies such as IoT, advanced imaging and geo-spatial platforms; molecular breeding, ancestry and genomics data sets have made us rethink how we approach developing, deploying, scaling and distributing our software to accelerate predictive and prescriptive decisions. We created a Cloud based Data Science platform for the enterprise to address this need. Our primary goals were to perform analytics@scale and integrate analytics with our core product platforms.
As part of this talk, we will share our transformation journey and show how we enabled: a collaborative discovery analytics environment in which data science teams develop models; provisioning of data through APIs and streams; deployment of models to production on our auto-scaling big-data compute in the cloud, to perform streaming, cognitive, predictive, prescriptive, historical, and batch analytics@scale; and integration of analytics with our core product platforms to turn data into actionable insights.
JavaZone 2018 - A Practical(ish) Introduction to Data Science - Mark West
Code: https://github.com/markwest1972/titanic
Video: https://vimeo.com/289705893
Data Science has been described as the sexiest job of the 21st Century. But what is Data Science? And what has Machine Learning got to do with all of this?
In this talk I will share insights and knowledge that I have gained from building up a Data Science department from scratch. This talk will be split into three sections:
1. I’ll begin by defining what Data Science is, how it is related to Machine Learning and share some tips for introducing Data Science to your organisation.
2. Next up we’ll run through some Machine Learning algorithms commonly used by Data Scientists, along with example use cases where these algorithms can be applied.
3. The final third of the talk will be a demonstration of how you can quickly get started with Data Science and Machine Learning using Python and the Open Source scikit-learn Library.
Big Data for Data Scientists - Info Session - WeCloudData
In this talk, WeCloudData introduces the Hadoop/Spark ecosystem and how businesses use big data tools and platforms. For more details about WeCloudData's Big Data for Data Scientists course, please visit: https://weclouddata.com/data-science/
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder - Altinity Ltd
OSA Con 2022: Scaling your Pandas Analytics with Modin
Doris Lee - Ponder
Pandas is one of the most commonly used data science libraries in Python, with a convenient set of APIs for data cleaning, visualization, analysis, and exploration. However, despite its widespread adoption, Pandas suffers from severe scalability issues on large datasets. We developed the open-source project Modin, which is a fast, scalable drop-in replacement for pandas. Modin has been downloaded more than 4 million times and is used by leading data science teams, including Fortune 100 companies.
Every company eventually encounters that “do-or-die” moment when the product cycle reaches the maturity stage and it’s necessary to pursue innovation. Borys Pratsiuk, Ph.D., Head of R&D Engineering at Ciklum, describes how companies across all sectors and industries can use Research and Development as a Service to update and improve their products or services.
As data science workloads grow, so does their need for infrastructure. But, is it fair to ask data scientists to also become infrastructure experts? If not the data scientists, then, who is responsible for spinning up and managing data science infrastructure? This talk will address the context in which ML infrastructure is emerging, walk through two examples of ML infrastructure tools for launching hyperparameter optimization jobs, and end with some thoughts for building better tools in the future.
Originally given as a talk at the PyData Ann Arbor meetup (https://www.meetup.com/PyData-Ann-Arbor/events/260380989/)
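To illustrate what a hyperparameter optimization job does, here is a minimal random-search loop in Python. The toy objective and the search ranges are hypothetical. In the kind of infrastructure the talk describes, each trial would typically be dispatched to separate compute rather than run in a single local loop.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def random_search(objective: Callable[[Params], float],
                  space: Dict[str, Tuple[float, float]],
                  n_trials: int = 50,
                  seed: int = 0) -> Tuple[Params, float]:
    """Sample each hyperparameter uniformly from its (low, high) range and
    return the configuration with the lowest objective value."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_params: Params = {}
    best_loss = float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        # One "trial"; a real job would train and validate a model here.
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_objective(p: Params) -> float:
    # Hypothetical stand-in for validation loss, minimized at lr=0.1, reg=0.5.
    return (p["lr"] - 0.1) ** 2 + (p["reg"] - 0.5) ** 2


best, loss = random_search(toy_objective, {"lr": (0.0, 1.0), "reg": (0.0, 1.0)}, n_trials=500)
print(best, loss)
```

Random search parallelizes trivially because trials are independent. That independence is one reason it is a common first target for ML infrastructure tooling.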
The Complexity to "Yes" in Analytics Software and the Possibilities with Dock... - Docker, Inc.
When you have to say "YES" to everything, what does that really mean?
The Boston Consulting Group (BCG) is a global management consulting firm that advises industry-leading companies on value creation strategies, innovation, transformation, supply chain management, and much more. BCG provides recommendations and analytics that are custom-tailored to the needs of each client. To do this, BCG is transforming what used to be presentations and manual models into software that is prototyped and distributed to its clients to run on their own infrastructure, in effect changing the way BCG delivers value.
However, the productization process poses unique challenges and opportunities. BCG's clients have every type of infrastructure and application stack within their IT environments. How can BCG ensure that custom-built analytics applications will operate well at scale in a client's production environment, especially one that BCG doesn't control? The bespoke nature of BCG's business led the team to embark on an engineering-led journey to containerization with Docker. Attend this session to learn more about the approach, the challenges, and how BCG is enabling transformation with Docker Enterprise Edition.
Data Workflows for Machine Learning - Seattle DAML - Paco Nathan
First public meetup at Twitter Seattle, for Seattle DAML:
http://www.meetup.com/Seattle-DAML/events/159043422/
We compare/contrast several open source frameworks which have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Py libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc. The analysis develops several points for "best of breed" and what features would be great to see across the board for many frameworks... leading up to a "scorecard" to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
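To show what migrating a model via PMML involves, the Python sketch below serializes a linear regression (an intercept plus coefficients) into a minimal PMML 4.4 `RegressionModel` document, using only the standard library. It covers a small subset of the standard and the field names are hypothetical. Validate the output against the DMG schema before handing it to another tool.

```python
import xml.etree.ElementTree as ET
from typing import Dict

PMML_NS = "http://www.dmg.org/PMML-4_4"


def _q(tag: str) -> str:
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def linear_model_to_pmml(intercept: float, coefficients: Dict[str, float],
                         target: str) -> str:
    """Serialize y = intercept + sum(coef * x) as a minimal PMML RegressionModel."""
    ET.register_namespace("", PMML_NS)  # emit xmlns="..." instead of ns0: prefixes
    root = ET.Element(_q("PMML"), version="4.4")
    ET.SubElement(root, _q("Header"), description="illustrative linear model")
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(root, _q("DataDictionary"), numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dictionary, _q("DataField"), name=name,
                      optype="continuous", dataType="double")
    model = ET.SubElement(root, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, _q("MiningField"), name=name)
    ET.SubElement(schema, _q("MiningField"), name=target, usageType="target")
    table = ET.SubElement(model, _q("RegressionTable"), intercept=repr(float(intercept)))
    for name, coef in coefficients.items():
        ET.SubElement(table, _q("NumericPredictor"), name=name,
                      coefficient=repr(float(coef)))
    return ET.tostring(root, encoding="unicode")


print(linear_model_to_pmml(30.0, {"cancellations": 2.0, "days_ahead": 5.0},
                           target="unused_minutes"))
```

The point of the format is that the consuming system only needs a PMML scoring engine, not the training environment (SAS, Python, or otherwise) that produced the coefficients.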
Speaker: Venkatesh Umaashankar
LinkedIn: https://www.linkedin.com/in/venkateshumaashankar/
What will be discussed?
What is Data Science?
Types of data scientists
What makes a Data Science Team? Who are its members?
Why does a DS team need Full Stack Developer?
Who should lead the DS Team?
Building a Data Science team in a Startup vs. an Enterprise
Case studies on:
Evolution Of Airbnb’s DS Team
How Facebook on-boards DS team and trains them
Apple’s Acqui-hiring Strategy to build DS team
Spotify -‘Center of Excellence’ Model
Who should attend?
Managers
Technical Leaders who want to get started with Data Science
Similar to How to deliver effective data science projects (20)
Have you ever wondered how search works while visiting an e-commerce site, internal website, or searching through other types of online resources? Look no further than this informative session on the ways that taxonomies help end-users navigate the internet! Hear from taxonomists and other information professionals who have first-hand experience creating and working with taxonomies that aid in navigation, search, and discovery across a range of disciplines.
This presentation by Morris Kleiner (University of Minnesota) was made during the discussion “Competition and Regulation in Professions and Occupations” held at the Working Party No. 2 on Competition and Regulation on 10 June 2024. More papers and presentations on the topic can be found at oe.cd/crps.
This presentation was uploaded with the author’s consent.
0x01 - Newton's Third Law: Static vs. Dynamic Abusers - OWASP Beja
If you offer a service on the web, odds are that someone will abuse it. Be it an API, a SaaS, a PaaS, or even a static website, someone somewhere will try to figure out a way to use it for their own needs. In this talk we'll compare measures that are effective against static attackers with how to battle a dynamic attacker who adapts to your countermeasures.
About the Speaker
===============
Diogo Sousa, Engineering Manager @ Canonical
An opinionated individual with an interest in cryptography and its intersection with secure software development.
Acorn Recovery: Restore IT infra within minutes - IP ServerOne
Introducing Acorn Recovery as a Service, a simple, fast, and secure managed disaster recovery (DRaaS) by IP ServerOne. A DR solution that helps restore your IT infra within minutes.
This presentation, created by Syed Faiz ul Hassan, explores the profound influence of media on public perception and behavior. It delves into the evolution of media from oral traditions to modern digital and social media platforms. Key topics include the role of media in information propagation, socialization, crisis awareness, globalization, and education. The presentation also examines media influence through agenda setting, propaganda, and manipulative techniques used by advertisers and marketers. Furthermore, it highlights the impact of surveillance enabled by media technologies on personal behavior and preferences. Through this comprehensive overview, the presentation aims to shed light on how media shapes collective consciousness and public opinion.
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives... - Orkestra
UIIN Conference, Madrid, 27-29 May 2024
James Wilson, Orkestra and Deusto Business School
Emily Wise, Lund University
Madeline Smith, The Glasgow School of Art
2. Patterns I came across
• “We are on the path to transforming ourselves into a tech company; we must explore some data science PoCs”
• “We need to make better use of our data; there must be a good case for data science”
• “We can automate many tasks using machine learning; let's do a PoC”
• “Let’s build a cool machine learning model and take it to the business”
6. The Proof of Concept (PoC) Mode
• Data Science projects often start as PoCs
• Works great to mitigate the hype
• Low cost
7. Limited Value with PoC
• Not always business value focussed
• Suffers from ad hoc prioritization
• Expectation mismatch
• Often unclear roadmap/vision
8. From PoC to MVP
• Business first approach
• Empower Data team with product mindset
• Focus on reusability (code, infrastructure)
• Poly-skilled team
• Avoids standalone tabletop solutions
• Iterate with a vision
• Enables building a platform to support multiple solutions
10. DSLC: The Data Science Life Cycle
• Data Science projects are not requirement driven (well, for the most part)
• Data Science projects are not always test driven (still very important to write tests)
• Data Science projects are always hypothesis driven
[Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Domain Knowledge]
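One way to keep a project hypothesis driven is to write down the hypothesis, its success metric, and the smallest improvement worth acting on before any modeling starts, then check results against that record. The minimal Python sketch below assumes a single metric where higher is better. The class, field names, example statement, and thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A testable statement written down before modeling starts."""
    statement: str
    metric: str
    baseline: float         # value of the metric for the current approach
    min_improvement: float  # smallest gain the business agreed is worth acting on


def assess(h: Hypothesis, observed: float) -> str:
    """Classify an experiment result against its pre-registered hypothesis.

    Assumes higher metric values are better.
    """
    gain = observed - h.baseline
    if gain >= h.min_improvement:
        return "supported"
    if gain > 0:
        return "inconclusive"  # better, but not by enough to justify building it
    return "rejected"


h = Hypothesis(
    statement="Adding booking-history features improves no-show prediction",
    metric="AUC on last month's holdout",
    baseline=0.75,
    min_improvement=0.05,
)
print(assess(h, 0.82))
```

Fixing the threshold up front keeps a marginal result from being rationalized as a win after the fact, which ties the experiment back to business value rather than to ad hoc prioritization.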
16. Agile DPLC in action
• Getting data, data quality checks, begin to build the pipeline
• Exploration, experimentation and model building
• Now that we have data and certain transformations, time to start on the underlying computation framework
• Feature engineering, experimentation, and model building & evaluation
• Build CD, model management, integrate
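The first stage listed above starts with data quality checks. Here is a minimal Python sketch of such checks, assuming records arrive as dictionaries. The column names and valid ranges are hypothetical and would come from domain knowledge.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def check_quality(records: List[Record],
                  required: List[str],
                  ranges: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return a list of problems found in a batch; an empty list means it passes."""
    if not records:
        return ["batch is empty"]
    problems = []
    for i, rec in enumerate(records):
        for col in required:
            if rec.get(col) is None:
                problems.append(f"row {i}: missing {col}")
        for col, (low, high) in ranges.items():
            value = rec.get(col)
            if value is not None and not (low <= value <= high):
                problems.append(f"row {i}: {col}={value} outside [{low}, {high}]")
    return problems


batch = [
    {"patient_id": 1, "age": 34},
    {"patient_id": 2, "age": None},
    {"patient_id": 3, "age": 140},
]
issues = check_quality(batch, required=["patient_id", "age"], ranges={"age": (0, 120)})
# A pipeline step would stop here (or quarantine the batch) when `issues` is non-empty.
print(issues)
```

Because the checks return plain data, the same function can run in unit tests, in the pipeline, and in later monitoring, so the first iteration's work carries forward.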
17. Accelerate with Platform
Product Thinking + Platform Approach
• A machine learning platform allows rapid experimentation
• Allows feature sharing between teams
• Model management and versioning
• Faster path to production
• A collaborative and shareable infrastructure
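Model management and versioning can be illustrated with a minimal in-memory registry in Python. This is a hypothetical sketch, not the API of any particular platform. Tools such as the MLflow Model Registry add persistent storage, access control, and richer stage transitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    artifact: Any              # e.g. serialized weights; opaque to the registry
    metrics: Dict[str, float]
    stage: str = "staging"     # staging -> production -> archived


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: every registration gets a new version number."""
    _models: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, artifact: Any, metrics: Dict[str, float]) -> ModelVersion:
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(len(versions) + 1, artifact, dict(metrics))
        versions.append(mv)
        return mv

    def promote(self, name: str, version: int) -> None:
        """Make `version` the single production version, archiving the previous one."""
        versions = self._models[name]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        for mv in versions:
            if mv.stage == "production":
                mv.stage = "archived"
        versions[version - 1].stage = "production"

    def production(self, name: str) -> Optional[ModelVersion]:
        return next((mv for mv in self._models.get(name, []) if mv.stage == "production"), None)


registry = ModelRegistry()
registry.register("or_block_time", artifact="weights-v1", metrics={"rmse": 12.0})
registry.register("or_block_time", artifact="weights-v2", metrics={"rmse": 10.5})
registry.promote("or_block_time", 2)
print(registry.production("or_block_time"))
```

Keeping old versions as "archived" instead of deleting them is what makes rollback and side-by-side comparison cheap. Those two capabilities are a large part of the "faster path to production" that a shared platform offers.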
18. Pay attention to Underlying Math
“I would rather have questions that can't be answered than answers that can't be questioned.” - Richard P. Feynman, Physicist