How to deliver effective data science projects

IDEAS - Int'l Data Engineering and Science Association

Presentation by Aravind Chiruvelli, Lead Data Scientist at ThoughtWorks, at the IDEAS Dallas Data Science Conference

Un-siloing Data Science Teams
Aravind Chiruvelli, PhD
Lead Data Scientist, ThoughtWorks

Patterns I came across
“We are on the pace of transforming ourselves into a tech company, we must explore some data science PoCs”
“We need to make better use of our data, there must be a good case for data science”
“We can automate many tasks using machine learning, let’s do a PoC”
“Let’s build a cool machine learning model and take it to the business”

Hindsight, Insight, Foresight
Dr. Ken Collier, Director, Agile Analytics, ThoughtWorks
Value from Data

Data is mine Model is mine
Data First Businesses

● Data Science projects often start as PoCs
● Works great to mitigate the hype
● Low cost
The Proof of Concept (PoC) Mode

● Not always business value focussed
● Suffers from ad hoc prioritization
● Expectation mismatch
● Often unclear roadmap/vision
Limited Value with PoC

● Business first approach
● Empower Data team with product mindset
● Focus on reusability (code, infrastructure)
● Poly-skilled team
● Avoids standalone tabletop solutions
● Iterate with a vision
● Enables building a platform to support multiple solutions
From PoC to MVP

● Data Science projects are not requirement driven (well, for the most part)
● Data Science projects are not always test driven (still very important to write tests)
● Data Science projects are always hypothesis driven
Diagram labels: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Domain Knowledge
DSLC: The Data Science Life Cycle
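
One way to read "hypothesis driven" is that each pass through the life cycle starts from a falsifiable statement and ends with a measurement that supports or rejects it. The sketch below is only an illustration of that idea, not something taken from the talk: the `Hypothesis` structure, the `evaluate` helper, the threshold rule, and the toy hold-out rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Row = Tuple[float, int]  # (feature value, true label); toy format assumed for this sketch


@dataclass(frozen=True)
class Hypothesis:
    """A falsifiable statement about a candidate model (hypothetical structure)."""
    statement: str
    min_improvement: float  # smallest accuracy gain over the baseline that counts as support


def accuracy(predict: Callable[[float], int], rows: Sequence[Row]) -> float:
    """Fraction of rows whose label the predictor gets right."""
    correct = sum(1 for x, y in rows if predict(x) == y)
    return correct / len(rows)


def evaluate(h: Hypothesis,
             baseline: Callable[[float], int],
             candidate: Callable[[float], int],
             holdout: Sequence[Row]) -> Dict[str, object]:
    """Measure both predictors on held-out rows and record whether the hypothesis held."""
    base = accuracy(baseline, holdout)
    cand = accuracy(candidate, holdout)
    return {
        "hypothesis": h.statement,
        "baseline": base,
        "candidate": cand,
        "supported": (cand - base) >= h.min_improvement,
    }


# Made-up hold-out rows, for illustration only.
holdout = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1),
           (0.3, 0), (0.8, 1), (0.55, 1), (0.45, 0)]

h = Hypothesis("A threshold on the feature beats always predicting 0", min_improvement=0.05)
result = evaluate(h,
                  baseline=lambda x: 0,
                  candidate=lambda x: 1 if x > 0.5 else 0,
                  holdout=holdout)
print(result)
```

Keeping the statement, the baseline, and the acceptance margin together makes it easier to tell later whether an experiment was a success, a failure, or simply inconclusive.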

Diagram labels: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Domain Knowledge
But this is PoC
DSLC: The Data Science Life Cycle

Diagram labels: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Productionalize Model, Domain Knowledge
DPLC: The Data Product Life Cycle

Diagram labels: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Productionalize Model, Domain Knowledge
● Productionalizing advanced analytics is not an afterthought
● Requires Data Scientists to work with Data Engineers
● Pushes product thinking for the entire team
● Enables production-ready code
DPLC: The Data Product Life Cycle

Diagram labels: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Productionalize Model
Iteration 1
Learn, Measure
Adopt Agile Discipline

Diagram labels: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Productionalize Model
Iteration 1, Iteration 2
Learn, Measure
Adopt Agile Discipline

Getting data, data quality checks, begin to build the pipeline
Exploration, experimentation, and model building
Now that we have data and certain transformations, time to start on the underlying computation framework
Feature engineering, experimentation, and model building & evaluation
Build CD, model management, integrate
Agile DPLC in action
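
The first step above mentions data quality checks at the start of the pipeline. As a rough sketch of what such a check might look like, the snippet below counts a few common problems in a batch of records. The column names, the age range, and the sample rows are all assumptions made for illustration; the talk does not specify a schema or a tool.

```python
from typing import Any, Dict, List

# Hypothetical schema, assumed only for this sketch.
REQUIRED_COLUMNS = ("customer_id", "age", "signup_date")


def check_quality(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count rows with missing required values, out-of-range ages, and repeated ids."""
    report = {"rows": len(rows), "missing_values": 0,
              "age_out_of_range": 0, "duplicate_ids": 0}
    seen_ids = set()
    for row in rows:
        # A row counts once if any required column is absent or None.
        if any(row.get(col) is None for col in REQUIRED_COLUMNS):
            report["missing_values"] += 1
        age = row.get("age")
        if age is not None and not (0 <= age <= 120):
            report["age_out_of_range"] += 1
        cid = row.get("customer_id")
        if cid is not None:
            if cid in seen_ids:
                report["duplicate_ids"] += 1
            seen_ids.add(cid)
    return report


# Made-up records, for illustration only.
sample = [
    {"customer_id": 1, "age": 34, "signup_date": "2019-01-05"},
    {"customer_id": 2, "age": None, "signup_date": "2019-02-11"},
    {"customer_id": 2, "age": 150, "signup_date": "2019-02-12"},
    {"customer_id": 3, "age": 41, "signup_date": None},
]
quality_report = check_quality(sample)
print(quality_report)
```

Running a report like this on every new batch gives the team a number to watch from the first iteration, well before any model exists.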

● A machine learning platform allows rapid experimentation
● Allows feature sharing between teams
● Model management and versioning
● Faster path to production
● A collaborative and shareable infrastructure
Product Thinking + Platform Approach
Accelerate with Platform
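
To make "model management and versioning" a little more concrete, here is a minimal in-memory sketch of a model registry. It is an assumption-laden toy, not the platform described in the talk: the `ModelRegistry` and `ModelVersion` names, the "churn" model, and the metric values are all invented for illustration, and a real platform would persist models and metadata rather than hold them in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ModelVersion:
    """One registered version of a model together with its evaluation metrics."""
    version: int
    model: Any
    metrics: Dict[str, float]


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: each model name maps to an ordered list of versions."""
    _versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, model: Any, metrics: Dict[str, float]) -> int:
        """Store a new version under `name` and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(version=len(versions) + 1, model=model, metrics=dict(metrics))
        versions.append(entry)
        return entry.version

    def latest(self, name: str) -> ModelVersion:
        """Return the most recently registered version."""
        return self._versions[name][-1]

    def best(self, name: str, metric: str) -> ModelVersion:
        """Return the version with the highest value for `metric`."""
        return max(self._versions[name], key=lambda v: v.metrics[metric])


registry = ModelRegistry()
# Plain dicts stand in for trained models; the AUC values are made up.
v1 = registry.register("churn", {"threshold": 0.5}, {"auc": 0.71})
v2 = registry.register("churn", {"threshold": 0.4}, {"auc": 0.68})

print(registry.latest("churn").version)       # most recent version
print(registry.best("churn", "auc").version)  # best version by AUC
```

Separating "latest" from "best" is a deliberate choice in this sketch: the newest experiment is not always the one that should go to production.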

Pay attention to Underlying Math
“I would rather have questions that can't be answered than answers that can't be questioned.” - Richard P. Feynman, Physicist

“Data Science is a team sport” - DJ Patil

How to deliver effective data science projects

1. Un-siloing Data Science Teams. Aravind Chiruvelli, PhD, Lead Data Scientist, ThoughtWorks

2. Patterns I came across: “We are on the pace of transforming ourselves into a tech company, we must explore some data science PoCs” “We need to make better use of our data, must be good case for data science” “We can automate many tasks using machine learning, let's do a PoC” “Let’s build a cool machine learning model and take it to the business”

3. Value from Data: Hindsight, Insight, Foresight. Dr. Ken Collier, Director, Agile Analytics, ThoughtWorks.

4. Data First Businesses

5. Data First Businesses: Data is mine; Model is mine

6. The Proof of Concept (PoC) Mode: Data science projects often start as PoCs; Works great to mitigate the hype; Low cost

7. Limited Value with PoC: Not always focused on business value; Suffers from ad hoc prioritization; Expectation mismatch; Often unclear roadmap/vision

8. From PoC to MVP: Business-first approach; Empower the data team with a product mindset; Focus on reusability (code, infrastructure); Poly-skilled team; Avoids standalone tabletop solutions; Iterate with a vision; Enables building a platform to support multiple solutions

9. POC MVP

10. DSLC: The Data Science Life Cycle. Data science projects are not requirement driven (well, for the most part); Data science projects are not always test driven (still very important to write tests); Data science projects are always hypothesis driven. Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Domain Knowledge

11. DSLC: The Data Science Life Cycle. Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Domain Knowledge. But this is PoC

12. DPLC: The Data Product Life Cycle. Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Productionalize Model, Domain Knowledge

13. DPLC: The Data Product Life Cycle. Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Productionalize Model, Domain Knowledge. Productionalizing advanced analytics is not an afterthought; Requires data scientists to work with data engineers; Pushes product thinking for the entire team; Enables production-ready code

14. Adopt Agile Discipline. Iteration 1: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Productionalize Model. Learn, Measure

15. Adopt Agile Discipline. Iteration 1, Iteration 2: Data Exploration, Feature Analysis, Feature Engineering, Build/Evaluate Model, Productionalize Model. Learn, Measure

16. Agile DPLC in action: Getting data, data quality checks, begin to build the pipeline; Exploration, experimentation, and model building; Now that we have data and certain transformations, time to start on the underlying computation framework; Feature engineering, experimentation, and model building and evaluation; Build CD, model management, integrate
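
The "data quality checks" step named on slide 16 is not spelled out in the deck. As an illustration only, a minimal sketch of such a check might look like the following; the `QualityReport` and `check_quality` names, the choice of checks (missing values, duplicate rows), and the 10% threshold are all assumptions, not something taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    """Summary of a basic quality pass over a batch of records (hypothetical)."""
    row_count: int
    missing_by_column: dict
    duplicate_rows: int

    def passed(self, max_missing_ratio: float = 0.1) -> bool:
        # An empty batch fails; otherwise require no duplicates and a
        # bounded share of missing values in the worst column.
        if self.row_count == 0:
            return False
        worst = max(self.missing_by_column.values(), default=0)
        return self.duplicate_rows == 0 and worst / self.row_count <= max_missing_ratio


def check_quality(rows, required_columns):
    """Count missing required values and exact duplicate rows.

    `rows` is a list of dicts with hashable values (an assumption of this sketch).
    """
    missing = {col: 0 for col in required_columns}
    seen = set()
    duplicates = 0
    for row in rows:
        for col in required_columns:
            if row.get(col) in (None, ""):
                missing[col] += 1
        key = tuple(sorted(row.items()))  # order-independent row identity
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return QualityReport(len(rows), missing, duplicates)


rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 1, "amount": 10.0},
]
report = check_quality(rows, ["id", "amount"])
print(report, report.passed())
```

A check like this could run at the start of each iteration, so that the pipeline fails early instead of feeding bad data into feature engineering.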

17. Accelerate with Platform (Product Thinking + Platform Approach): A machine learning platform allows rapid experimentation; Allows feature sharing between teams; Model management and versioning; Faster path to production; A collaborative and shareable infrastructure
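
Slide 17 lists "model management and versioning" as a platform capability without describing a concrete design. The sketch below is one hypothetical, in-memory way to picture it; `ModelRegistry`, `ModelVersion`, and the fingerprint scheme are illustrative assumptions and do not correspond to any tool named in the talk.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """One registered version of a named model (hypothetical record)."""
    name: str
    version: int
    params: dict
    metrics: dict
    fingerprint: str


class ModelRegistry:
    """In-memory registry; a real platform would persist this state."""

    def __init__(self):
        self._versions = {}

    def register(self, name, params, metrics):
        # A short hash of params and metrics makes versions easy to compare.
        payload = json.dumps({"params": params, "metrics": metrics}, sort_keys=True)
        fingerprint = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(history) + 1, params, metrics, fingerprint)
        history.append(entry)
        return entry

    def latest(self, name):
        return self._versions[name][-1]

    def best(self, name, metric):
        # Assumes a higher metric value is better.
        return max(self._versions[name], key=lambda v: v.metrics[metric])


registry = ModelRegistry()
first = registry.register("churn", {"depth": 3}, {"auc": 0.81})
second = registry.register("churn", {"depth": 5}, {"auc": 0.78})
print(registry.latest("churn").version, registry.best("churn", "auc").version)
```

Keeping version history per model name lets a team promote the best-scoring version to production rather than simply the most recent one.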

18. Pay attention to the underlying math. “I would rather have questions that can't be answered than answers that can't be questioned.” - Richard P. Feynman, Physicist

19. “Data Science is a team sport” - DJ Patil

20. Thank you. archiru@thoughtworks.com
