HUMAN + MACHINE
COLLABORATION
FOR IMPROVED ANALYTICAL PROCESSES
TONY OJEDA (ME)
• Data Scientist @ Follett
• Founder @ District Data Labs
• Co-Author
• Applied Text Analysis with Python
(O’Reilly, Fall 2017)
• Practical Data Science Cookbook
(Packt, Fall 2014)
• Conference Speaker
• Data Day Seattle 2016
• PyData - Carolinas & DC 2016
SOME BACKGROUND…
RECENT AI HEADLINES
• Can The Rise Of Artificial Intelligence End Humanity?
• Will Artificial Intelligence Leave You Jobless?
• Essential Skills To Keep Your Job In The Era Of
Artificial Intelligence
• Artificial intelligence probably won't kill you, but it
could take your job
How can we combine
human & machine abilities
to produce better outcomes
than either could on their own?
Not Human vs. Machine
But Human + Machine
DESIGNING COLLABORATIVE
ANALYTICAL PROCESSES
WHAT IS AN ANALYTICAL PROCESS?
• A series of tasks for ingesting, transforming, analyzing, modeling, or
visualizing data.
Data Science Pipeline: Ingestion → Wrangling → Analysis → Modeling → Visualization
DECONSTRUCTING A PROCESS
Process → Tasks → Steps
What types of tasks are humans better at?
What types of tasks are machines better at?
TYPES OF TASKS HUMANS ARE GOOD AT
• Sensory Tasks
• Social/Language/Communication Tasks
• General or Domain Knowledge Tasks
• Tasks Requiring Flexibility, Adaptability, or Creativity
• Exploratory or Investigative Tasks
TYPES OF TASKS MACHINES ARE GOOD AT
• Tasks Where Precision is Important
• Tasks that Require Processing Vast Amounts of Information
• Memory and Recollection Tasks
• Repetitive Tasks Where Consistency is Important
DESIGNING COLLABORATIVE PROCESSES
• Deconstruct the process into tasks and steps.
• Determine which steps should be performed by the human and
which should be performed by the machine.
• Identify the points of interaction and ensure those are intuitive.
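The three design steps above can be sketched as a small data structure. This is only an illustration, not code from the talk: the `Step` class, the `interaction_points` helper, and the example step names and human/machine assignments are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    task: str       # the task this step belongs to
    name: str       # what the step does
    performer: str  # "human" or "machine"


def interaction_points(steps):
    """Return consecutive step pairs where work passes between human and machine."""
    return [
        (before, after)
        for before, after in zip(steps, steps[1:])
        if before.performer != after.performer
    ]


# Hypothetical deconstruction of one task into steps, each assigned a performer.
process = [
    Step("aggregate categories", "find categorical columns", "machine"),
    Step("aggregate categories", "list unique values", "machine"),
    Step("aggregate categories", "group values by meaning", "human"),
    Step("aggregate categories", "apply the mapping to every row", "machine"),
]

# Each hand-off is a point of interaction that needs an intuitive interface.
for before, after in interaction_points(process):
    print(f"{before.performer} -> {after.performer}: {before.name} | {after.name}")
```

Listing the hand-offs explicitly makes it easier to see where interface design effort should go.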
THE INTERFACE IS IMPORTANT
COLLABORATIVE DATA EXPLORATION
DATA EXPLORATION FRAMEWORK
Prep Phase
Explore Phase
CREATE: CATEGORY AGGREGATIONS
Categorical variables with a lot of categories (e.g., more than 10)
Distill down into fewer categories
CATEGORY AGGREGATION REQUIREMENTS
• Identification of categorical variables and unique values
• Natural language understanding
• General and/or domain knowledge
• Similarity in meaning
• Sometimes creativity
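One way these requirements might split between machine and human, as a rough sketch: the machine finds the high-cardinality columns and applies the result, while the human supplies the grouping. The example data, the `human_mapping` dictionary, and the "other" fallback are assumptions made for illustration; only the "more than 10" threshold comes from the slide.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
data = pd.DataFrame({
    "product": ["latte", "mocha", "espresso", "green tea", "chai", "bagel",
                "muffin", "scone", "cola", "juice", "water", "latte"],
    "price": [4.5, 5.0, 3.0, 3.5, 4.0, 2.5, 3.0, 3.25, 2.0, 3.5, 1.5, 4.5],
})

MAX_CATEGORIES = 10  # threshold from the slide ("more than 10")

# Machine step: find non-numeric columns with too many unique values
# and list those values for the human to review.
non_numeric = data.select_dtypes(exclude="number").columns
candidates = {
    col: sorted(data[col].unique())
    for col in non_numeric
    if data[col].nunique() > MAX_CATEGORIES
}

# Human step: group values by meaning, using general or domain knowledge.
human_mapping = {
    "latte": "coffee", "mocha": "coffee", "espresso": "coffee",
    "green tea": "tea", "chai": "tea",
    "bagel": "bakery", "muffin": "bakery", "scone": "bakery",
    "cola": "cold drink", "juice": "cold drink", "water": "cold drink",
}

# Machine step: apply the mapping to every row; unmapped values become "other".
data["product group"] = data["product"].map(human_mapping).fillna("other")
```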
CREATE: CONTINUOUS BINS
Identify continuous variables.
Assign them to buckets or bins based on how high or low their values are.
Bins: Very Low, Low, Moderate, High, Very High
BINNING REQUIREMENTS
• Identification of continuous variables
• Comparison, ordering, and segregation
• Knowing whether higher or lower values are better
• Meaningful naming of resulting categories
CONTINUOUS BINNING EXAMPLE
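import numpy as np
import pandas as pd

# `data` is assumed to be an existing pandas DataFrame.
quint_levels = ['Very Low', 'Low', 'Moderate', 'High', 'Very High']
numeric_cols = data.select_dtypes(include=[np.number]).columns.values

for column in numeric_cols:
    # Quantile-based bins: named quintiles, deciles (1-10), percentiles (1-100).
    # pd.qcut raises ValueError when a column's quantile edges are not unique.
    data[column + ' Level'] = pd.qcut(data[column], 5, labels=quint_levels)
    data[column + ' Decile'] = pd.qcut(data[column], 10, labels=list(range(1, 11)))
    data[column + ' Perc'] = pd.qcut(data[column], 100, labels=list(range(1, 101)))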
CREATE: CLUSTER CATEGORIES
CLUSTERING REQUIREMENTS
• Identification of numeric variables
• Clustering similar records together
• Determining quality and appropriate numbers of clusters
• Meaningful naming of resulting categories
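A possible sketch of the machine side of these requirements, assuming scikit-learn is available. The talk does not name a clustering algorithm or quality measure, so k-means and the silhouette score are stand-ins chosen here, and the synthetic `spend`/`visits` data is hypothetical. Naming the clusters is left to the human, who can read the per-cluster averages.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical data: three well-separated groups of records.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 10], [0, 10]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
data = pd.DataFrame(points, columns=["spend", "visits"])

# Machine step: identify numeric variables and put them on a common scale.
numeric = data.select_dtypes(include="number")
scaled = StandardScaler().fit_transform(numeric)

# Machine step: try several cluster counts and score each one.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)

# Keep the cluster count with the highest silhouette score.
best_k = max(scores, key=scores.get)
data["cluster"] = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(scaled)

# Human step: inspect the per-cluster averages and give each cluster a meaningful name.
profile = data.groupby("cluster")[["spend", "visits"]].mean()
print(profile)
```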
DATA EXPLORATION FRAMEWORK
Prep Phase
Explore Phase
EXPLORE: FILTER + AGGREGATE
FILTER + AGGREGATE REQUIREMENTS
• Identifying categorical and numeric variables.
• Filtering/sub-setting the data set by categories.
• Aggregating on categories and calculation of numeric fields.
• Interpreting results and determining what is useful.
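The first three requirements map fairly directly onto pandas operations. The sketch below assumes a small hypothetical data set, and `filter_aggregate` is a helper name invented for this example. Interpreting the resulting table remains the human's job.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
data = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West", "East"],
    "segment": ["Retail", "Online", "Retail", "Online", "Retail", "Retail"],
    "sales": [100.0, 150.0, 80.0, 120.0, 60.0, 90.0],
    "units": [10, 12, 8, 11, 5, 9],
})


def filter_aggregate(df, filters, group_by, agg="mean"):
    """Subset df by {column: value} filters, then aggregate numeric fields by category."""
    subset = df
    for column, value in filters.items():
        subset = subset[subset[column] == value]
    numeric_cols = subset.select_dtypes(include="number").columns.tolist()
    return subset.groupby(group_by)[numeric_cols].agg(agg)


# Filter to one segment, then total the numeric fields for each region.
summary = filter_aggregate(data, {"segment": "Retail"}, "region", agg="sum")
print(summary)
```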
EXPLORE: FIELD RELATIONSHIPS
FIELD RELATIONSHIP REQUIREMENTS
• Identifying numeric fields.
• Comparing cross-distributions of values across all combinations of
numeric fields.
• Identifying existence, direction, strength, and type of relationship.
• Determining which relationships (or lack thereof) are interesting or
insightful.
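As a rough sketch of the machine's share of this work, the code below compares every pair of numeric fields and ranks the pairs by strength. It uses the Pearson correlation, which is an assumption made here and only captures linear relationships, so the type of relationship and whether it is interesting still call for plots and human judgment. The synthetic fields are hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical data: two fields tied to x, plus one unrelated field.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
data = pd.DataFrame({
    "x": x,
    "up": 2 * x + rng.normal(scale=0.1, size=200),
    "down": -x + rng.normal(scale=0.1, size=200),
    "noise": rng.normal(size=200),
})

# Machine step: identify the numeric fields.
numeric = data.select_dtypes(include="number")

# Machine step: measure every pair of fields once.
rows = []
for a, b in combinations(numeric.columns, 2):
    r = numeric[a].corr(numeric[b])  # Pearson correlation coefficient
    rows.append({
        "field_a": a,
        "field_b": b,
        "r": r,
        "strength": abs(r),
        "direction": "positive" if r >= 0 else "negative",
    })

# Rank pairs so the human reviews the strongest relationships first.
pairs = (
    pd.DataFrame(rows)
    .sort_values("strength", ascending=False)
    .reset_index(drop=True)
)
print(pairs)
```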
EXPLORE: ENTITY RELATIONSHIPS
GRAPH ANALYSIS REQUIREMENTS
• Identifying hierarchical entity levels in the data.
• Identifying similarities and strength of similarities between entities.
• Identifying clusters, communities, sub-networks and other important
groupings within the network.
• Interpreting those relationships and what they mean in the real
world.
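A minimal sketch of the similarity and grouping requirements, using only the standard library. The entities, their attributes, the Jaccard similarity measure, and the 0.4 threshold are all assumptions for illustration. Connected components stand in here for more sophisticated community detection, and hierarchical entity levels are not covered.

```python
from itertools import combinations

# Hypothetical entities, each described by a set of attributes.
entities = {
    "alice": {"python", "pandas", "statistics"},
    "bob": {"python", "pandas", "visualization"},
    "carol": {"python", "statistics", "pandas"},
    "dave": {"sales", "negotiation"},
    "erin": {"sales", "negotiation", "marketing"},
}

THRESHOLD = 0.4  # assumed cutoff: weaker similarities are not treated as links


def jaccard(a, b):
    """Similarity strength: shared attributes divided by all attributes."""
    return len(a & b) / len(a | b)


# Machine step: score every pair of entities and keep the strong links.
edges = {
    (u, v): jaccard(entities[u], entities[v])
    for u, v in combinations(sorted(entities), 2)
}
edges = {pair: weight for pair, weight in edges.items() if weight >= THRESHOLD}

# Build an adjacency map for the resulting network.
neighbors = {name: set() for name in entities}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)


def connected_components(neighbors):
    """Group entities that are linked directly or through other entities."""
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Human step: interpret what each grouping means in the real world.
groups = connected_components(neighbors)
print(groups)
```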
KEY TAKE-AWAYS
• Human-machine collaboration is important and very useful.
• We can design these processes via deconstruction into tasks and
steps.
• Pay special attention to the interfaces.
• There is plenty of room for development and advancement in this
area, and Python already contains a lot of the tools we need to make
progress.
WHERE TO LEARN MORE & GET INVOLVED
• Blog: blog.districtdatalabs.com
• Cultivar: github.com/DistrictDataLabs/cultivar
• Yellowbrick: github.com/DistrictDataLabs/yellowbrick
• Twitter: @tonyojeda3
• LinkedIn: linkedin.com/in/tonyojeda
THANK YOU!
