The Power of Unstructured Data
Olga Scrivner, PhD
Research Scientist, CNS, Indiana University
Visiting Lecturer, Data Science Program, Indiana University
Corporate Faculty, Data Analytics, Harrisburg University of Science & Technology
Recommendation Systems
Transforming Data into Insights
80% of data will be unstructured
(IDC)
Data-Driven Decision Making (credits: PwC)
“Information is the currency of this digital age”
Carly Fiorina, Former CEO of HP
2025
175 zettabytes of data globally (IDC)
1 zettabyte = 10^21 bytes
85% of customer interactions will occur without human involvement (Gartner)
Use Cases
Jim Kitterman. 2018. The Why behind the What.
Banking (Fraud prediction & Recommendations)
Human Resources (Automated HR)
Marketing (Automated Customer service)
Retail (Product Recommendations)
Two of the leading drivers for AI adoption are delivering a better customer experience and helping employees get better at their jobs (IDC, 2020).
Leading AI Use Cases: automated customer service agents, recommendation, and automation
Text Mining Landscape
(Zhai, 2016)
Real World Text Data
Observed World
(English)
Formal Language vs. Natural Language (Chiang, 2018)
Ambiguity
Natural language:
- Full of ambiguity
- Uses contextual clues and other information
Formal language:
- Nearly or completely unambiguous
- Any statement has exactly one meaning, regardless of context
Redundancy
Natural language:
- Verbose to reduce ambiguity
- Redundant
Formal language:
- Concise
- Less redundant
Literalness
Natural language:
- More than one meaning
- Many idioms and metaphors
Formal language:
- Exactly one meaning
She spilled the beans
http://www.idioms4you.com/complete-idioms/spill-the-beans.html
https://www.quora.com/When-was-the-first-English-idiom-used-Why-was-it-used
Dan Jurafsky. 2012. Slides – Introduction to NLP
Natural Language Challenges
NLP Feature Extractions
Bag-of-Words Models
Count-based: TF, TF-IDF, N-grams
Prediction-Based Models
Based on distributed representations (dense representations of words in a low-dimensional vector space): Word2Vec, FastText
Each word is associated with a continuous vector representation
Sarkar, D. 2018. Deep Learning Methods for Text Data – Word2Vec, GloVe, FastText. Towards Data Science
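The count-based features named on this slide (TF, TF-IDF, N-grams) can be illustrated with a minimal sketch in plain Python. The toy corpus, the function names, and the smoothed IDF formula are assumptions chosen for illustration; the slides do not specify a particular variant or library.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_frequency(tokens):
    """Relative frequency of each token within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

def inverse_document_frequency(corpus):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1 (one common variant, assumed here)."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()}

def tf_idf(corpus):
    """TF-IDF weights for every document in a tokenized corpus."""
    idf = inverse_document_frequency(corpus)
    return [{term: tf * idf[term] for term, tf in term_frequency(doc).items()}
            for doc in corpus]

# Hypothetical toy corpus of tokenized documents.
docs = [
    "data scientist python".split(),
    "data engineer python sql".split(),
    "nurse hospital".split(),
]
weights = tf_idf(docs)
bigrams = ngrams(docs[1], 2)
```

In this toy corpus, "scientist" occurs in one document and "data" in two, so "scientist" receives the higher TF-IDF weight in the first document even though both terms have the same raw frequency there.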
NLP Landscape
(Zhai, 2016)
Real World Text Data
Observed World
AI Cognitive Application
NLP Application – Recommender System
1. Improving with Use: Customer retention
2. Improving Cart Value: Filter system (Amazon)
3. Improving Engagement: Using subscriptions (YouTube)
Corinna Underwood. 2020. Use Cases of Recommendation Systems.
Recommendation System Types
Collaborative Filtering
- User-Based: Users Similarity (Classification task)
- Item-Based: Items Similarity based on Ratings (Pearson)
- Shortcoming: Cold Start Problem
Content-Based Systems
- Similarity between Features (Nearest Neighbor)
- User Likes and Feedback
Rounak Banik. 2018. Hands-On Recommendation Systems with Python.
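As a rough illustration of item-based collaborative filtering with Pearson similarity, the sketch below compares items by the ratings of users who rated both. The ratings table, user names, and function names are hypothetical, and treating an undefined correlation as 0.0 is an assumption of this sketch, not something stated in the slides.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length rating lists; 0.0 if undefined."""
    n = len(x)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def item_similarity(ratings, item_a, item_b):
    """Pearson similarity of two items over the users who rated both."""
    common = [u for u in ratings if item_a in ratings[u] and item_b in ratings[u]]
    return pearson([ratings[u][item_a] for u in common],
                   [ratings[u][item_b] for u in common])

def most_similar_items(ratings, target):
    """Rank all other items by their similarity to the target item."""
    items = {item for user_ratings in ratings.values() for item in user_ratings}
    scores = [(other, item_similarity(ratings, target, other))
              for other in items if other != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical user -> {item: rating} table.
ratings = {
    "ann":  {"A": 5, "B": 4, "C": 1},
    "bob":  {"A": 4, "B": 5, "C": 2},
    "cara": {"A": 1, "B": 2, "C": 5},
    "dan":  {"A": 2, "B": 1, "C": 4},
}
ranked = most_similar_items(ratings, "A")
```

An item that nobody has rated yet, such as a hypothetical item "D", has no co-rating users, so its similarity to every other item falls back to 0.0. That is one way the cold start problem shows up in practice.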
NLP Content-Based Recommendation
Job-recommendation System
Armand Olivares. 2019. NLP Content-Based Recommendation Systems.
Data: Kaggle - job-recommendation-datasets
Job Description Preprocessing
1. Remove stop words
2. Remove non-alphanumeric characters
3. Lemmatize the columns
4. Extract features (TF-IDF)
5. Use cosine similarity (scores close to one = more similarity between items)
Combined title, company, city, job type, description
Figure: vector1, vector2, Euclidean distance, components of vectors
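The preprocessing and similarity steps on this slide can be sketched in plain Python under a few stated assumptions. The job postings, company names, and stop-word list are invented for illustration and are not taken from the Kaggle dataset. Lemmatization is omitted because it normally relies on an external library such as NLTK or spaCy. The smoothed IDF formula is one common variant, not necessarily the one used in the cited work.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def preprocess(text):
    """Lowercase, replace non-alphanumeric characters, drop stop words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def tfidf_vectors(token_lists):
    """Sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for toks in token_lists for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in counts.items()})
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 for empty vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(jobs, query_index, top_n=2):
    """Return (index, score) pairs for the jobs most similar to the query job."""
    vectors = tfidf_vectors([preprocess(job) for job in jobs])
    scores = [(i, cosine_similarity(vectors[query_index], vec))
              for i, vec in enumerate(vectors) if i != query_index]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical postings: title, company, city, job type, and description combined.
jobs = [
    "Data Scientist, Acme, Boston, Full-time: build models with Python and SQL",
    "Machine Learning Engineer, Initech, Boston, Full-time: deploy Python models",
    "Registered Nurse, City Hospital, Denver, Part-time: patient care",
]
top = recommend(jobs, query_index=0)
```

Here the first posting shares several terms with the second (city, job type, Python, models) and only one with the third, so the second posting ranks first with a score closer to one.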
What is Next?
Career path recommendation
Skill recommendation
Course recommendation
e-recruiting
Graph-Based approach + NLP
Job recommendation
(Zhu et al., 2020)

More Related Content

What's hot

Structured and Unstructured Big Data ebook
Structured and Unstructured Big Data ebookStructured and Unstructured Big Data ebook
Structured and Unstructured Big Data ebookEmcien Corporation
 
Techniques for Context-Aware and Cold-Start Recommendations
Techniques for Context-Aware and Cold-Start RecommendationsTechniques for Context-Aware and Cold-Start Recommendations
Techniques for Context-Aware and Cold-Start RecommendationsMatthias Braunhofer
 
How to Build Data Science Teams
How to Build Data Science TeamsHow to Build Data Science Teams
How to Build Data Science TeamsGanes Kesari
 
My Dissertation Proposal Defense
My Dissertation Proposal DefenseMy Dissertation Proposal Defense
My Dissertation Proposal DefenseLaura Pasquini
 
Presentation on Research Proposal (Qualitative); Digitalisation & Top Managem...
Presentation on Research Proposal (Qualitative); Digitalisation & Top Managem...Presentation on Research Proposal (Qualitative); Digitalisation & Top Managem...
Presentation on Research Proposal (Qualitative); Digitalisation & Top Managem...Himanish Kar Purkayastha
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data PlatformsTreasure Data, Inc.
 
The Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradoxThe Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradoxStephen Senn
 
Dissertation defense power point
Dissertation defense power pointDissertation defense power point
Dissertation defense power pointKelly Dodson
 
Over 100 Eye Opening Stats About Generation Z
Over 100 Eye Opening Stats About Generation ZOver 100 Eye Opening Stats About Generation Z
Over 100 Eye Opening Stats About Generation ZGregg L. Witt
 
DAS Slides: Best Practices in Metadata Management
DAS Slides: Best Practices in Metadata ManagementDAS Slides: Best Practices in Metadata Management
DAS Slides: Best Practices in Metadata ManagementDATAVERSITY
 
Data Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working TogetherData Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working TogetherDATAVERSITY
 
Workshop 2 using nvivo 12 for qualitative data analysis
Workshop 2 using nvivo 12 for qualitative data analysisWorkshop 2 using nvivo 12 for qualitative data analysis
Workshop 2 using nvivo 12 for qualitative data analysisDr. Yaar Muhammad
 
Google Knowledge Graph
Google Knowledge GraphGoogle Knowledge Graph
Google Knowledge Graphkarthikzinavo
 
Sequential Kernel Association Test (SKAT) for rare and common variants
Sequential Kernel Association Test (SKAT) for rare and common variantsSequential Kernel Association Test (SKAT) for rare and common variants
Sequential Kernel Association Test (SKAT) for rare and common variantsDaisuke Yoneoka
 
Sample Ppt For Thesis Defense Powerpoint Presentation Slides
Sample Ppt For Thesis Defense Powerpoint Presentation SlidesSample Ppt For Thesis Defense Powerpoint Presentation Slides
Sample Ppt For Thesis Defense Powerpoint Presentation SlidesSlideTeam
 
FINAL-Vol-2-20160216
FINAL-Vol-2-20160216FINAL-Vol-2-20160216
FINAL-Vol-2-20160216alyssaduncan
 

What's hot (20)

Research proposal presentation
Research proposal presentationResearch proposal presentation
Research proposal presentation
 
Structured and Unstructured Big Data ebook
Structured and Unstructured Big Data ebookStructured and Unstructured Big Data ebook
Structured and Unstructured Big Data ebook
 
Techniques for Context-Aware and Cold-Start Recommendations
Techniques for Context-Aware and Cold-Start RecommendationsTechniques for Context-Aware and Cold-Start Recommendations
Techniques for Context-Aware and Cold-Start Recommendations
 
Data lake ppt
Data lake pptData lake ppt
Data lake ppt
 
How to Build Data Science Teams
How to Build Data Science TeamsHow to Build Data Science Teams
How to Build Data Science Teams
 
My Dissertation Proposal Defense
My Dissertation Proposal DefenseMy Dissertation Proposal Defense
My Dissertation Proposal Defense
 
Presentation on Research Proposal (Qualitative); Digitalisation & Top Managem...
Presentation on Research Proposal (Qualitative); Digitalisation & Top Managem...Presentation on Research Proposal (Qualitative); Digitalisation & Top Managem...
Presentation on Research Proposal (Qualitative); Digitalisation & Top Managem...
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data Platforms
 
Thesis Defense Presentation
Thesis Defense PresentationThesis Defense Presentation
Thesis Defense Presentation
 
The Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradoxThe Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradox
 
Dissertation defense power point
Dissertation defense power pointDissertation defense power point
Dissertation defense power point
 
Over 100 Eye Opening Stats About Generation Z
Over 100 Eye Opening Stats About Generation ZOver 100 Eye Opening Stats About Generation Z
Over 100 Eye Opening Stats About Generation Z
 
DAS Slides: Best Practices in Metadata Management
DAS Slides: Best Practices in Metadata ManagementDAS Slides: Best Practices in Metadata Management
DAS Slides: Best Practices in Metadata Management
 
Data Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working TogetherData Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working Together
 
Big Data Strategies
Big Data StrategiesBig Data Strategies
Big Data Strategies
 
Workshop 2 using nvivo 12 for qualitative data analysis
Workshop 2 using nvivo 12 for qualitative data analysisWorkshop 2 using nvivo 12 for qualitative data analysis
Workshop 2 using nvivo 12 for qualitative data analysis
 
Google Knowledge Graph
Google Knowledge GraphGoogle Knowledge Graph
Google Knowledge Graph
 
Sequential Kernel Association Test (SKAT) for rare and common variants
Sequential Kernel Association Test (SKAT) for rare and common variantsSequential Kernel Association Test (SKAT) for rare and common variants
Sequential Kernel Association Test (SKAT) for rare and common variants
 
Sample Ppt For Thesis Defense Powerpoint Presentation Slides
Sample Ppt For Thesis Defense Powerpoint Presentation SlidesSample Ppt For Thesis Defense Powerpoint Presentation Slides
Sample Ppt For Thesis Defense Powerpoint Presentation Slides
 
FINAL-Vol-2-20160216
FINAL-Vol-2-20160216FINAL-Vol-2-20160216
FINAL-Vol-2-20160216
 

Similar to The power of unstructured data: Recommendation systems

How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?andrea huang
 
Artificial Intelligence adoption factor in the University libraries of Pakist...
Artificial Intelligence adoption factor in the University libraries of Pakist...Artificial Intelligence adoption factor in the University libraries of Pakist...
Artificial Intelligence adoption factor in the University libraries of Pakist...Muhammad Yousuf Ali
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementTrey Grainger
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesCodePolitan
 
Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Gabriel Moreira
 
Recommendation system (1).pptx
Recommendation system (1).pptxRecommendation system (1).pptx
Recommendation system (1).pptxprathammishra28
 
recommendationsystem1-221109055232-c8b46131.pdf
recommendationsystem1-221109055232-c8b46131.pdfrecommendationsystem1-221109055232-c8b46131.pdf
recommendationsystem1-221109055232-c8b46131.pdf13DikshaDatir
 
AI in Multi Billion Search Engines. Career building in AI / Search. What make...
AI in Multi Billion Search Engines. Career building in AI / Search. What make...AI in Multi Billion Search Engines. Career building in AI / Search. What make...
AI in Multi Billion Search Engines. Career building in AI / Search. What make...Andrei Lopatenko
 
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment AnalysisClassification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment AnalysisSHAILENDRA KUMAR SINGH
 
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach
Coping with Data Variety in the Big Data Era: The Semantic Computing ApproachCoping with Data Variety in the Big Data Era: The Semantic Computing Approach
Coping with Data Variety in the Big Data Era: The Semantic Computing ApproachAndre Freitas
 
Human-centered AI: how can we support end-users to interact with AI?
Human-centered AI: how can we support end-users to interact with AI?Human-centered AI: how can we support end-users to interact with AI?
Human-centered AI: how can we support end-users to interact with AI?Katrien Verbert
 
Using NLP Approach for Analyzing Customer Reviews
Using NLP Approach for Analyzing Customer Reviews Using NLP Approach for Analyzing Customer Reviews
Using NLP Approach for Analyzing Customer Reviews cscpconf
 
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWSUSING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWScsandit
 
Building a Career in Data Science -WiCDS meetup
Building a Career in Data Science -WiCDS meetupBuilding a Career in Data Science -WiCDS meetup
Building a Career in Data Science -WiCDS meetupParul Pandey
 
A Recommendation Engine For Predicting Movie Ratings Using A Big Data Approach
A Recommendation Engine For Predicting Movie Ratings Using A Big Data ApproachA Recommendation Engine For Predicting Movie Ratings Using A Big Data Approach
A Recommendation Engine For Predicting Movie Ratings Using A Big Data ApproachFelicia Clark
 
Human vs AI Quality Raters for Search Engines.pdf
Human vs AI Quality Raters for Search Engines.pdfHuman vs AI Quality Raters for Search Engines.pdf
Human vs AI Quality Raters for Search Engines.pdfDawn Anderson MSc DigM
 
Projection Multi Scale Hashing Keyword Search in Multidimensional Datasets
Projection Multi Scale Hashing Keyword Search in Multidimensional DatasetsProjection Multi Scale Hashing Keyword Search in Multidimensional Datasets
Projection Multi Scale Hashing Keyword Search in Multidimensional DatasetsIRJET Journal
 
A1hfjjfjfjfifififiififififififififififfi8.pptx
A1hfjjfjfjfifififiififififififififififfi8.pptxA1hfjjfjfjfifififiififififififififififfi8.pptx
A1hfjjfjfjfifififiififififififififififfi8.pptxTamilArasan564275
 
Andjjdjdjdjdjdjdjfjfjkdkfkfjdkfjfjfjfjfjf18.pptx
Andjjdjdjdjdjdjdjfjfjkdkfkfjdkfjfjfjfjfjf18.pptxAndjjdjdjdjdjdjdjfjfjkdkfkfjdkfjfjfjfjfjf18.pptx
Andjjdjdjdjdjdjdjfjfjkdkfkfjdkfjfjfjfjfjf18.pptxTamilArasan564275
 

Similar to The power of unstructured data: Recommendation systems (20)

SMART Seminar Series: "From Big Data to Smart data"
SMART Seminar Series: "From Big Data to Smart data"SMART Seminar Series: "From Big Data to Smart data"
SMART Seminar Series: "From Big Data to Smart data"
 
How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?
 
Artificial Intelligence adoption factor in the University libraries of Pakist...
Artificial Intelligence adoption factor in the University libraries of Pakist...Artificial Intelligence adoption factor in the University libraries of Pakist...
Artificial Intelligence adoption factor in the University libraries of Pakist...
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge Management
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & Opportunities
 
Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação
 
Recommendation system (1).pptx
Recommendation system (1).pptxRecommendation system (1).pptx
Recommendation system (1).pptx
 
recommendationsystem1-221109055232-c8b46131.pdf
recommendationsystem1-221109055232-c8b46131.pdfrecommendationsystem1-221109055232-c8b46131.pdf
recommendationsystem1-221109055232-c8b46131.pdf
 
AI in Multi Billion Search Engines. Career building in AI / Search. What make...
AI in Multi Billion Search Engines. Career building in AI / Search. What make...AI in Multi Billion Search Engines. Career building in AI / Search. What make...
AI in Multi Billion Search Engines. Career building in AI / Search. What make...
 
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment AnalysisClassification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
 
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach
Coping with Data Variety in the Big Data Era: The Semantic Computing ApproachCoping with Data Variety in the Big Data Era: The Semantic Computing Approach
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach
 
Human-centered AI: how can we support end-users to interact with AI?
Human-centered AI: how can we support end-users to interact with AI?Human-centered AI: how can we support end-users to interact with AI?
Human-centered AI: how can we support end-users to interact with AI?
 
Using NLP Approach for Analyzing Customer Reviews
Using NLP Approach for Analyzing Customer Reviews Using NLP Approach for Analyzing Customer Reviews
Using NLP Approach for Analyzing Customer Reviews
 
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWSUSING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
 
Building a Career in Data Science -WiCDS meetup
Building a Career in Data Science -WiCDS meetupBuilding a Career in Data Science -WiCDS meetup
Building a Career in Data Science -WiCDS meetup
 
A Recommendation Engine For Predicting Movie Ratings Using A Big Data Approach
A Recommendation Engine For Predicting Movie Ratings Using A Big Data ApproachA Recommendation Engine For Predicting Movie Ratings Using A Big Data Approach
A Recommendation Engine For Predicting Movie Ratings Using A Big Data Approach
 
Human vs AI Quality Raters for Search Engines.pdf
Human vs AI Quality Raters for Search Engines.pdfHuman vs AI Quality Raters for Search Engines.pdf
Human vs AI Quality Raters for Search Engines.pdf
 
Projection Multi Scale Hashing Keyword Search in Multidimensional Datasets
Projection Multi Scale Hashing Keyword Search in Multidimensional DatasetsProjection Multi Scale Hashing Keyword Search in Multidimensional Datasets
Projection Multi Scale Hashing Keyword Search in Multidimensional Datasets
 
A1hfjjfjfjfifififiififififififififififfi8.pptx
A1hfjjfjfjfifififiififififififififififfi8.pptxA1hfjjfjfjfifififiififififififififififfi8.pptx
A1hfjjfjfjfifififiififififififififififfi8.pptx
 
Andjjdjdjdjdjdjdjfjfjkdkfkfjdkfjfjfjfjfjf18.pptx
Andjjdjdjdjdjdjdjfjfjkdkfkfjdkfjfjfjfjfjf18.pptxAndjjdjdjdjdjdjdjfjfjkdkfkfjdkfjfjfjfjfjf18.pptx
Andjjdjdjdjdjdjdjfjfjkdkfkfjdkfjfjfjfjfjf18.pptx
 

More from Olga Scrivner

Engaging Students Competition and Polls.pptx
Engaging Students Competition and Polls.pptxEngaging Students Competition and Polls.pptx
Engaging Students Competition and Polls.pptxOlga Scrivner
 
HICSS ATLT: Advances in Teaching and Learning Technologies
HICSS ATLT: Advances in Teaching and Learning TechnologiesHICSS ATLT: Advances in Teaching and Learning Technologies
HICSS ATLT: Advances in Teaching and Learning TechnologiesOlga Scrivner
 
Cognitive executive functions and Opioid Use Disorder
Cognitive executive functions and Opioid Use DisorderCognitive executive functions and Opioid Use Disorder
Cognitive executive functions and Opioid Use DisorderOlga Scrivner
 
Introduction to Web Scraping with Python
Introduction to Web Scraping with PythonIntroduction to Web Scraping with Python
Introduction to Web Scraping with PythonOlga Scrivner
 
Call for paper Collaboration Systems and Technology
Call for paper Collaboration Systems and TechnologyCall for paper Collaboration Systems and Technology
Call for paper Collaboration Systems and TechnologyOlga Scrivner
 
Jupyter machine learning crash course
Jupyter machine learning crash courseJupyter machine learning crash course
Jupyter machine learning crash courseOlga Scrivner
 
R and RMarkdown crash course
R and RMarkdown crash courseR and RMarkdown crash course
R and RMarkdown crash courseOlga Scrivner
 
The Impact of Language Requirement on Students' Performance, Retention, and M...
The Impact of Language Requirement on Students' Performance, Retention, and M...The Impact of Language Requirement on Students' Performance, Retention, and M...
The Impact of Language Requirement on Students' Performance, Retention, and M...Olga Scrivner
 
If a picture is worth a thousand words, Interactive data visualizations are w...
If a picture is worth a thousand words, Interactive data visualizations are w...If a picture is worth a thousand words, Interactive data visualizations are w...
If a picture is worth a thousand words, Interactive data visualizations are w...Olga Scrivner
 
Introduction to Interactive Shiny Web Application
Introduction to Interactive Shiny Web ApplicationIntroduction to Interactive Shiny Web Application
Introduction to Interactive Shiny Web ApplicationOlga Scrivner
 
Introduction to Overleaf Workshop
Introduction to Overleaf WorkshopIntroduction to Overleaf Workshop
Introduction to Overleaf WorkshopOlga Scrivner
 
R crash course for Business Analytics Course K303
R crash course for Business Analytics Course K303R crash course for Business Analytics Course K303
R crash course for Business Analytics Course K303Olga Scrivner
 
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisOlga Scrivner
 
Gender Disparity in Employment and Education
Gender Disparity in Employment and EducationGender Disparity in Employment and Education
Gender Disparity in Employment and EducationOlga Scrivner
 
CrashCourse: Python with DataCamp and Jupyter for Beginners
CrashCourse: Python with DataCamp and Jupyter for BeginnersCrashCourse: Python with DataCamp and Jupyter for Beginners
CrashCourse: Python with DataCamp and Jupyter for BeginnersOlga Scrivner
 
Optimizing Data Analysis: Web application with Shiny
Optimizing Data Analysis: Web application with ShinyOptimizing Data Analysis: Web application with Shiny
Optimizing Data Analysis: Web application with ShinyOlga Scrivner
 
Data Analysis and Visualization: R Workflow
Data Analysis and Visualization: R WorkflowData Analysis and Visualization: R Workflow
Data Analysis and Visualization: R WorkflowOlga Scrivner
 
Reproducible visual analytics of public opioid data
Reproducible visual analytics of public opioid dataReproducible visual analytics of public opioid data
Reproducible visual analytics of public opioid dataOlga Scrivner
 
Building Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVFBuilding Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVFOlga Scrivner
 
Building Shiny Application Series - Layout and HTML
Building Shiny Application Series - Layout and HTMLBuilding Shiny Application Series - Layout and HTML
Building Shiny Application Series - Layout and HTMLOlga Scrivner
 

More from Olga Scrivner (20)

Engaging Students Competition and Polls.pptx
Engaging Students Competition and Polls.pptxEngaging Students Competition and Polls.pptx
Engaging Students Competition and Polls.pptx
 
HICSS ATLT: Advances in Teaching and Learning Technologies
HICSS ATLT: Advances in Teaching and Learning TechnologiesHICSS ATLT: Advances in Teaching and Learning Technologies
HICSS ATLT: Advances in Teaching and Learning Technologies
 
Cognitive executive functions and Opioid Use Disorder
Cognitive executive functions and Opioid Use DisorderCognitive executive functions and Opioid Use Disorder
Cognitive executive functions and Opioid Use Disorder
 
Introduction to Web Scraping with Python
Introduction to Web Scraping with PythonIntroduction to Web Scraping with Python
Introduction to Web Scraping with Python
 
Call for paper Collaboration Systems and Technology
Call for paper Collaboration Systems and TechnologyCall for paper Collaboration Systems and Technology
Call for paper Collaboration Systems and Technology
 
Jupyter machine learning crash course
Jupyter machine learning crash courseJupyter machine learning crash course
Jupyter machine learning crash course
 
R and RMarkdown crash course
R and RMarkdown crash courseR and RMarkdown crash course
R and RMarkdown crash course
 
The Impact of Language Requirement on Students' Performance, Retention, and M...
The Impact of Language Requirement on Students' Performance, Retention, and M...The Impact of Language Requirement on Students' Performance, Retention, and M...
The Impact of Language Requirement on Students' Performance, Retention, and M...
 
If a picture is worth a thousand words, Interactive data visualizations are w...
If a picture is worth a thousand words, Interactive data visualizations are w...If a picture is worth a thousand words, Interactive data visualizations are w...
If a picture is worth a thousand words, Interactive data visualizations are w...
 
Introduction to Interactive Shiny Web Application
Introduction to Interactive Shiny Web ApplicationIntroduction to Interactive Shiny Web Application
Introduction to Interactive Shiny Web Application
 
Introduction to Overleaf Workshop
Introduction to Overleaf WorkshopIntroduction to Overleaf Workshop
Introduction to Overleaf Workshop
 
R crash course for Business Analytics Course K303
R crash course for Business Analytics Course K303R crash course for Business Analytics Course K303
R crash course for Business Analytics Course K303
 
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
 
Gender Disparity in Employment and Education
Gender Disparity in Employment and EducationGender Disparity in Employment and Education
Gender Disparity in Employment and Education
 
CrashCourse: Python with DataCamp and Jupyter for Beginners
CrashCourse: Python with DataCamp and Jupyter for BeginnersCrashCourse: Python with DataCamp and Jupyter for Beginners
CrashCourse: Python with DataCamp and Jupyter for Beginners
 
Optimizing Data Analysis: Web application with Shiny
Optimizing Data Analysis: Web application with ShinyOptimizing Data Analysis: Web application with Shiny
Optimizing Data Analysis: Web application with Shiny
 
Data Analysis and Visualization: R Workflow
Data Analysis and Visualization: R WorkflowData Analysis and Visualization: R Workflow
Data Analysis and Visualization: R Workflow
 
Reproducible visual analytics of public opioid data
Reproducible visual analytics of public opioid dataReproducible visual analytics of public opioid data
Reproducible visual analytics of public opioid data
 
Building Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVFBuilding Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVF
 
Building Shiny Application Series - Layout and HTML
Building Shiny Application Series - Layout and HTMLBuilding Shiny Application Series - Layout and HTML
Building Shiny Application Series - Layout and HTML
 

Recently uploaded

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
The power of unstructured data: Recommendation systems

  • 1. The Power of Unstructured Data: Recommendation Systems. Olga Scrivner, PhD. Research Scientist, CNS, Indiana University; Visiting Lecturer, Data Science Program, Indiana University; Corporate Faculty, Data Analytics, Harrisburg University of Science & Technology.
  • 2. Transforming Data into Insights. Data-Driven Decision Making (credits: PwC). “Information is the currency of this digital age” (Carly Fiorina, former CEO of HP). 2025: 175 zettabytes of data globally (IDC); 1 zettabyte = 10^21 bytes. 80% of data will be unstructured (IDC). 85% of customer interactions will take place without human interaction (Gartner).
  • 3. Use Cases: Banking (fraud prediction and recommendations); Human Resources (automated HR); Marketing (automated customer service); Retail (product recommendations). Two of the leading drivers for AI adoption are delivering a better customer experience and helping employees get better at their jobs (IDC, 2020). Leading AI use cases: automated customer service agents, recommendation, and automation. Source: Jim Kitterman. 2018. The Why behind the What.
  • 4. Text Mining Landscape (Zhai, 2016). Figure labels: Real World; Text Data; Observed World (English).
  • 5. Natural Language vs. Formal Language (Chiang, 2018). Ambiguity: natural language is full of ambiguity and relies on contextual clues and other information; formal language is nearly or completely unambiguous, so any statement has exactly one meaning, regardless of context. Redundancy: natural language is verbose and redundant to reduce ambiguity; formal language is concise and less redundant. Literalness: natural language can carry more than one meaning and has many idioms and metaphors; formal language has exactly one meaning. Example idiom: “She spilled the beans”. http://www.idioms4you.com/complete-idioms/spill-the-beans.html https://www.quora.com/When-was-the-first-English-idiom-used-Why-was-it-used
  • 6. Natural Language Challenges. Source: Dan Jurafsky. 2012. Slides – Introduction to NLP.
  • 7. NLP Feature Extractions. Bag-of-Words Models: count-based (TF, TF-IDF, N-grams). Prediction-Based Models: based on distributed representations (dense representations of words in a low-dimensional vector space), such as Word2Vec and FastText; each word is associated with a continuous vector representation. Source: Sarkar, D. 2018. Deep Learning Methods for Text Data – Word2Vec, GloVe, FastText. Towards Data Science.
  • 8. NLP Landscape (Zhai, 2016). Figure labels: Real World; Text Data; Observed World; AI Cognitive Application.
  • 9. NLP Application – Recommender System. (1) Improving with Use: customer retention. (2) Improving Cart Value: filter system (Amazon). (3) Improving Engagement: using subscriptions (YouTube). Source: Corinna Underwood. 2020. Use Cases of Recommendation Systems.
  • 10. Recommendation System Types. Collaborative Filtering (shortcoming: cold start problem): User-Based, built on user similarity (classification task); Item-Based, built on item similarity based on ratings (Pearson). Content-Based Systems: similarity between features (nearest neighbor); user likes and feedback. Source: Rounak Banik. 2018. Hands-On Recommendation Systems with Python.
  • 11. NLP Content-Based Recommendation: Job-recommendation System. Data: Kaggle - job-recommendation-datasets. Source: Armand Olivares. 2019. NLP Content-Based Recommendation Systems.
  • 12. Job Description Preprocessing. Input: combined title, company, city, job type, and description. Steps: (1) Remove stop words. (2) Remove non-alphanumeric characters. (3) Lemmatize the columns. (4) Extract features (TF-IDF). (5) Use cosine similarity (scores close to one = more similarity between items). [Figure labels: vector1, vector2, Euclidean distance, components of vectors.] Data: Kaggle - job-recommendation-datasets. Source: Armand Olivares. 2019. NLP Content-Based Recommendation Systems.
  • 13. What is Next? Career path recommendation; skill recommendation; course recommendation; e-recruiting; graph-based approach + NLP; job recommendation (Zhu et al., 2020).
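
The preprocessing and scoring steps listed on slide 12 can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration, not the pipeline from the cited article. The stop-word list, the sample job postings, and the names `preprocess`, `tfidf_vectors`, `cosine_similarity`, and `recommend` are all assumptions made for the example. The TF-IDF weighting shown is one common smoothed variant, and lemmatization (step 3) is left out because it needs an external library.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def preprocess(text):
    """Lowercase, replace non-alphanumeric characters with spaces, drop stop words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (TF times smoothed IDF)."""
    tokenized = [preprocess(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v2.get(term, 0.0) for term, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def recommend(query, jobs, top_n=2):
    """Rank job postings by cosine similarity to a free-text query."""
    vectors = tfidf_vectors(jobs + [query])  # the query is vectorized with the jobs
    query_vec = vectors[-1]
    scored = [(cosine_similarity(query_vec, vec), job)
              for vec, job in zip(vectors, jobs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_n]


# Hypothetical postings: title, company, city, job type, and description combined.
jobs = [
    "Data Scientist, Acme, Boston, full-time: build NLP models in Python",
    "Line Cook, Bistro, Austin, part-time: prepare meals in a busy kitchen",
    "Machine Learning Engineer, Initech, Boston, full-time: deploy Python models",
]
results = recommend("Python NLP data scientist in Boston", jobs)
```

Here `results` holds the two highest-scoring postings as (score, posting) pairs. Scores fall between 0 and 1, and a posting that shares no terms with the query scores 0.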