Data Science in Action
Applications of Data Science in Drug
Discovery, Financial Services, Project
Management, Human Resources and
Marketing
What this talk is about
My Journey
- Change of career
- Lessons learned
Industries covered
- Pharmaceutical
- Project management
- Financial services within a major UK bank
- Human resources and recruitment
- Marketing
Druuuuugssssss *the boring kind
PubMed: a repository of medical and biological
literature
Every day hundreds of papers are published.
How can an expert in ontology know about
advances in diabetes research that will impact
his/her research?
Expedite the arrival at a Eureka! moment
Benevolent AI
Headquartered in London UK
Startup
Partnered with AstraZeneca
AI to the rescue!
- GPUs
- TPUs
- Lots of money
Combining free unstructured text with hand-curated
chemical-drug, protein-gene and gene-drug
databases to gain insight and help the drug
discovery scientist reach their Eureka! moment.
NLP
- POS tagging
- Syntactic parsing
- Entity detection
Graph Theory
- Inferred edges
- Path analysis
Reinforcement Learning
- Software engineers
- Data Scientists
- Bioinformaticians
- Drug discovery scientists
- Clever business type people
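The "inferred edges" and "path analysis" bullets under Graph Theory can be illustrated with a minimal sketch. It assumes a toy knowledge graph of (source, relation, target) triples; the entity names, the `EDGES` list and both helper functions are hypothetical and are not taken from the talk or from any real system.

```python
from collections import deque

# Hypothetical toy knowledge graph: each edge is (source, relation, target).
# The entity names are made up for illustration only.
EDGES = [
    ("drug_A", "activates", "protein_P"),
    ("protein_P", "activates", "gene_G"),
    ("drug_B", "inhibits", "protein_Q"),
]


def build_adjacency(edges):
    """Map each node to the list of (relation, neighbour) pairs leaving it."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def shortest_path(edges, start, goal):
    """Path analysis: breadth-first search for a shortest path, or None."""
    adjacency = build_adjacency(edges)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relation, neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


def infer_two_hop_edges(edges, relation="activates"):
    """Inferred edges: propose A -> C whenever A -rel-> B and B -rel-> C."""
    adjacency = build_adjacency(edges)
    inferred = []
    for source, first_relation, middle in edges:
        if first_relation != relation:
            continue
        for second_relation, target in adjacency.get(middle, []):
            if second_relation == relation:
                inferred.append((source, "indirectly_" + relation, target))
    return inferred
```

Calling `shortest_path(EDGES, "drug_A", "gene_G")` walks from the drug through the protein to the gene, and `infer_two_hop_edges(EDGES)` proposes a single candidate edge from `drug_A` to `gene_G` for a scientist to review.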
Show me the money!
Onset of open banking in the UK
Major banks want to get ahead of the curve by
extracting maximal insight from their data
Millions of transaction records
Transaction Classification -- what is a salary
payment? Are we losing money to competitors
when it comes to savings?
Pensions: how to identify trends in pensions and
attract/retain customers
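As a minimal sketch of the "what is a salary payment?" question above, a rule-based baseline is one simple place to start. The keyword list, the function name and the sign convention (positive amounts are incoming) are all assumptions for illustration; a real classifier would be built and validated on labelled transactions.

```python
import re

# Hypothetical keyword list; in practice this would be derived from labelled data.
SALARY_KEYWORDS = ("salary", "payroll", "wages")


def is_salary_payment(description, amount):
    """Rule-based baseline: an incoming payment whose description mentions pay.

    Assumes positive amounts are money coming into the account.
    """
    if amount <= 0:  # outgoing payments cannot be salary
        return False
    words = set(re.findall(r"[a-z]+", description.lower()))
    return any(keyword in words for keyword in SALARY_KEYWORDS)
```

A baseline like this gives a reference point against which any later, more complex model can be compared.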
Mudano
Consultancy
Startup
Visionaries in implementing AI in the project
management domain.
Give me time! Give me clarity!
Think Kanban, Atlassian JIRA
Think like a project manager -- how do I keep
track of all tickets of all projects of all my
human resources?
How do I streamline a project to cut waste
and maximise production?
How do I identify pain points in a project? I.e.
issues that might delay delivery?
I want to hire the best!
Think like a recruiter.
How do I identify the optimum person for a job?
- Specific experience
- Specific qualification
- Likely to be ready to move
- Go beyond keyword matches
Often the client knows the person they want to
hire -- and wants someone similar!
Pre-seed startup
Emerged from a startup incubator in London
(Founders Factory)
What to look for?
- Compare the companies a candidate worked for
- Large companies are distinctly different to smaller ones
- Similarity matrix -- dimensionality reduction
- Look for candidates who have similar job titles
- Semantic search
- From job descriptions
- Propensity to move: who is likely to be open to new job opportunities?
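The "find someone similar" idea can be sketched as ranking candidates by cosine similarity to a reference profile. This sketch uses plain bag-of-words vectors as a stand-in: it is still keyword-based, whereas the semantic search mentioned above would typically swap in learned text embeddings. The function names and sample profiles are hypothetical.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Bag-of-words vector: lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(reference, candidates):
    """Return candidate names sorted from most to least similar to `reference`."""
    reference_vector = to_vector(reference)
    scored = [
        (cosine_similarity(reference_vector, to_vector(profile)), name)
        for name, profile in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _score, name in scored]


# Hypothetical profiles, keyed by candidate name.
profiles = {
    "a": "senior data scientist",
    "b": "pastry chef",
    "c": "data scientist",
}
ranking = rank_candidates("senior data scientist", profiles)
```

Computing the same score for every pair of candidates would give the similarity matrix mentioned in the list, which could then be passed to a dimensionality reduction step.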
Who should I advertise to?
- Identifying target audiences online
- Demographics
- Online behaviour
- Location data
Want to know:
- Who is likely to visit a store
- Who is likely to click on a link
- Keeping things in line with GDPR
Part of the Omnicom Group
Operates like a startup (flat management, fluid
job descriptions, room to innovate)
With the backing of a large organisation
The common theme? Recommender Systems
Recommend --
- The protein which activates a gene
- A drug that activates the protein
- The ideal candidate for a job
- The most relevant notifications for a project
- The right audience for an advertisement
Recommender systems: deconstructed
Getting the data--
- Buy it
- Scrape it
- Mine it (via apps, cookies, etc.)
- Download it
Identifying data quality: startup pitfall
- Many startups jump straight into the DS model
- Don't allow time for data quality checks
- Or understanding the data
- Decide on the list of desirables in advance
- Check for missing variables
- Correlations
- Check the volume of data when joined to other data sets
- Ask: can we impute the data?
- What can we do with missing/incomplete/inaccurate data?
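Two of the checks listed above, missing variables and the volume of data after a join, can be sketched in a few lines before any modelling starts. The sketch assumes each data set is a list of dictionaries; the function names and sample records are hypothetical.

```python
def missing_rate(rows, columns):
    """Fraction of rows in which each column is absent, None or an empty string."""
    total = len(rows)
    rates = {}
    for column in columns:
        missing = sum(1 for row in rows if row.get(column) in (None, ""))
        rates[column] = missing / total if total else 0.0
    return rates


def inner_join_count(left, right, key):
    """Number of rows an inner join of `left` and `right` on `key` would produce."""
    right_counts = {}
    for row in right:
        right_counts[row[key]] = right_counts.get(row[key], 0) + 1
    return sum(right_counts.get(row[key], 0) for row in left)


# Hypothetical sample data: three customers, three transactions.
customers = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3}]
transactions = [{"id": 1}, {"id": 1}, {"id": 4}]

rates = missing_rate(customers, ["id", "age"])
joined_rows = inner_join_count(customers, transactions, "id")
```

Here two of the three customers have no usable age, and only one customer matches any transaction, so the joined data set is much smaller than either input suggests.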
The Data Science Model: circle of life
- Don't be clever
- Start simple
- Iterate
- Play with the data! Feature selection, feature engineering
- Understand the data -- gain a little domain knowledge
- Measured by whether you can hold a conversation with a domain expert
- Understand what is required!
- Determine what the desired outcome is
- Good precision or recall
- AUC
- If unsupervised, how will quality be determined?
- Specific gain for the business? (More revenue? More efficient work? New discoveries?)
- Precision might come at the expense of discovery!
- Recall might be at the expense of efficiency
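The precision/recall trade-off in the last two bullets can be made concrete with a short sketch. It assumes binary labels where 1 marks a positive case; the function name and the example labels are illustrative only.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    true_positives = sum(1 for truth, guess in pairs if truth == 1 and guess == 1)
    false_positives = sum(1 for truth, guess in pairs if truth == 0 and guess == 1)
    false_negatives = sum(1 for truth, guess in pairs if truth == 1 and guess == 0)
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall


labels = [1, 1, 1, 0, 0]
cautious = [1, 0, 0, 0, 0]  # flags only the most certain case
generous = [1, 1, 1, 1, 1]  # flags everything

cautious_scores = precision_recall(labels, cautious)
generous_scores = precision_recall(labels, generous)
```

The cautious model is never wrong when it flags something but finds only one of the three positives, so discoveries are missed. The generous model finds every positive but also flags both negatives, so someone has to spend time reviewing them.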
Supervised vs unsupervised
- Do you have training data?
- Is that training data reliable?
- What is the source?
- Mechanical Turk?
- Expert annotation?
- Is it biased?
- Is my training data copious?
- Can I combine golden corpora with silver?
ML vs DL
- Do you need transparency or explainability?
- E.g. legal or financial services
- How much data do you have?
- Does it support DL?
- Do you have the technology?
- Time?
- Money?
- DL models on GPUs are expensive
Presenting results
Conveying the:
- Significance
- Importance
- Limitations
Of a project to stakeholders/clients/management
- People who are non-experts in the field
Presenting the results: pearls of wisdom
- Don’t lie
- Don’t exaggerate
- Be clear
- Be honest
- Try to think like a stakeholder
- What do they want from the project?
- How do I present the importance and usefulness of the results?
- Explain the benefit of the complicated/time-consuming/expensive DS approach compared with the
easier/cheaper/faster methods
