© 2020 Minitab, LLC.
Meet the Presenter:
Mikhail Golovnya
Minitab Senior Advisory Data Scientist
Mikhail has been prototyping new machine learning algorithms and modeling automation for 20 years, and he has been a major contributor to technological improvements in the most important machine learning algorithms: CART® Decision Trees, MARS® Non-linear Regression, TreeNet® gradient boosting, and Random Forests®. He holds master’s degrees in both rocket science from Kharkov State Polytechnic University in Ukraine and statistical computing from the University of Central Florida.
The Challenge of Text Mining
► Data sets often have a character variable that contains possibly long text (user feedback, comments, etc.)
► Such a variable will usually have as many distinct values as there are records in the dataset, so it cannot be used directly for modeling
► Core objective of Text Mining: find ways to extract numeric measures from a text variable that can be used in quantitative modeling
Wine Review
Excellent wine! HIGHLY Recomended
LOVE IT; AWESOME!
Too bitter, fordettable
Love this wine
Had better wine before
Simple Text Statistics
► The following simple numeric summaries of the raw text itself can be extracted and used in quantitative
analysis as derived numeric variables
▪ Total count of words
▪ Total count of characters
▪ Average word length (in characters)
▪ Count of stop-words (commonly occurring words)
▪ Count of numeric words (series of digits)
▪ Count of words written in all upper-case
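The summaries listed above can be sketched in a few lines of Python. This is only an illustration, not the presenter's script: the function name `simple_text_stats`, the tiny `STOP_WORDS` set, and the rule for stripping punctuation from word edges are all assumptions made for the example.

```python
# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "the", "it", "this", "too", "had", "before", "is", "and"}

def simple_text_stats(text):
    """Return simple numeric summaries of one raw text record."""
    # Split on whitespace, then strip punctuation from the word edges
    # so that "wine!" is measured as "wine" (an assumed convention).
    tokens = [w.strip(".,;:!?\"'()") for w in text.split()]
    tokens = [t for t in tokens if t]
    n_words = len(tokens)
    return {
        "word_count": n_words,
        "char_count": len(text),  # counts every character, including spaces
        "avg_word_length": sum(len(t) for t in tokens) / n_words if n_words else 0.0,
        "stopword_count": sum(t.lower() in STOP_WORDS for t in tokens),
        "numeric_count": sum(t.isdigit() for t in tokens),
        "upper_count": sum(t.isupper() for t in tokens),
    }

stats = simple_text_stats("LOVE IT; AWESOME!")
print(stats)
```

Each key of the returned dictionary would become one derived numeric variable per record.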
Simple Stats
Wine Review
Excellent wine! HIGHLY Recomended
LOVE IT; AWESOME.
Too bitter, forgettable
Love this wine
Had beter wine before
Text Cleaning Steps
► Raw text stats summarize the original text in its raw form
► The following steps (cleaning up) are normally employed to prepare a raw text variable for further
analysis
▪ Converting all characters to lower case only
▪ Removing all punctuation
▪ Removing all stop-words
▪ Correcting spelling errors
▪ Removing infrequent words
► More advanced analyses (semantic extraction, etc.) might omit some of the above steps
Cleaning Up Process
Wine Review (raw text)
Excellent wine! Highly Recomended
Love it; awesome.
Too bitter, forgettable
Love this wine
Had beter wine before
Wine Review (after cleaning)
excellent wine highly recommended
love awesome
bitter forgettable
love wine
better wine
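The cleaning steps can be illustrated with a minimal Python sketch that reproduces the wine review example. The stop-word set and the spelling-correction dictionary below are hypothetical and hand-picked to match this example; a real script would rely on a full stop-word list and a proper spelling corrector. Removal of infrequent words is left out here.

```python
import string

# Hypothetical illustration values, chosen to reproduce the wine review example.
STOP_WORDS = {"it", "too", "this", "had", "before"}
SPELLING_FIXES = {"recomended": "recommended", "beter": "better"}

def clean_text(text):
    """Lower-case, strip punctuation, fix known misspellings, drop stop-words."""
    text = text.lower()
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = [SPELLING_FIXES.get(w, w) for w in text.split()]
    return " ".join(w for w in words if w not in STOP_WORDS)

reviews = [
    "Excellent wine! Highly Recomended",
    "Love it; awesome.",
    "Too bitter, forgettable",
    "Love this wine",
    "Had beter wine before",
]
cleaned = [clean_text(r) for r in reviews]
for row in cleaned:
    print(row)
```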
Summary Statistics
► The following summary statistics can now be computed and visualized for a
“beautified” text variable
▪ Total word count for each word that “survived the beautification process”
▪ Inverse Document Frequency (IDF) for each word
$$\mathrm{IDF} = \log\left(\frac{N}{DF}\right)$$
where N is the number of observations and DF is the number of documents in which a given word occurs
A word present in all observations has IDF = 0
A word present in only one observation has the largest possible IDF, log N
▪ Bar chart of the most frequently occurring words and their IDFs
▪ Word-cloud image of the most frequently occurring words
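As a rough sketch, word counts and IDFs for the cleaned wine reviews could be computed as follows. The natural logarithm is an assumption, since the slide does not state the base, and the variable names are illustrative only.

```python
import math
from collections import Counter

# Cleaned wine reviews from the cleaning example.
cleaned = [
    "excellent wine highly recommended",
    "love awesome",
    "bitter forgettable",
    "love wine",
    "better wine",
]

docs = [text.split() for text in cleaned]
n_docs = len(docs)  # N in the IDF formula

# Total number of occurrences of each word across all records.
word_counts = Counter(word for doc in docs for word in doc)

# Document frequency (DF): number of records containing the word at least once.
doc_freq = Counter(word for doc in docs for word in set(doc))

# IDF = log(N / DF); natural log is an assumption.
idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}

for word, count in word_counts.most_common(3):
    print(word, count, round(idf[word], 3))
```

Here "wine" occurs in 3 of 5 reviews, so its IDF is log(5/3), while any word that occurs in a single review gets the maximum, log 5.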
Summary Statistics
Word Counts
Word IDFs
Extracting Sentiment Values
► Sentiment value is a number that summarizes the writer’s overall
attitude based on the linguistic analysis of the text
▪ Positive sentiment reflects positive attitude
▪ Negative sentiment reflects negative attitude
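The slides do not say which sentiment method is used, so the following is only a toy, lexicon-based sketch of the idea. The `LEXICON` scores and the averaging rule are invented for illustration; practical sentiment scoring uses much larger lexicons or trained models.

```python
# Hypothetical mini-lexicon: word -> polarity score (values invented for illustration).
LEXICON = {
    "excellent": 1.0,
    "awesome": 1.0,
    "love": 1.0,
    "recommended": 0.5,
    "bitter": -0.5,
    "forgettable": -1.0,
}

def sentiment_value(cleaned_text):
    """Average polarity of the words found in the lexicon; 0.0 if none match."""
    scores = [LEXICON[w] for w in cleaned_text.split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment_value("love awesome"))        # positive attitude
print(sentiment_value("bitter forgettable"))  # negative attitude
```

A word-level lexicon like this cannot capture context: "Had better wine before" is a negative remark even though no single word in it is negative.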
Creating a Bag of Words
► For each word create a new variable that reports how many times the word
occurs in the text field
► To avoid explosion of new variables, the user might want to exclude
infrequent words
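A minimal bag-of-words sketch using only the Python standard library is shown below. The threshold name `min_count` and its value of 2 are assumptions chosen to keep the output small.

```python
from collections import Counter

# Cleaned wine reviews from the cleaning example.
cleaned = [
    "excellent wine highly recommended",
    "love awesome",
    "bitter forgettable",
    "love wine",
    "better wine",
]
min_count = 2  # hypothetical threshold: keep words occurring at least twice overall

docs = [text.split() for text in cleaned]
totals = Counter(word for doc in docs for word in doc)

# One new variable (column) per word that is frequent enough.
vocabulary = sorted(word for word, count in totals.items() if count >= min_count)

# Each row reports how many times each vocabulary word occurs in that review.
bag_of_words = [[doc.count(word) for word in vocabulary] for doc in docs]

print(vocabulary)
for row in bag_of_words:
    print(row)
```

With this threshold only "love" and "wine" survive, so five text records become a 5 x 2 numeric table; lowering `min_count` to 1 would create one column for every distinct word.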
Extracting Singular Vectors
Summary
► Reporting stage (text_summary.py)
▪ Word frequencies and IDFs
▪ Bar charts and word cloud
► Extracting stage (text_convert.py)
▪ Created original raw text statistics variables
▪ Cleaning up stage
▪ Created sentiment value variable
▪ Created bag of words variables
▪ Created singular vector variables
► We have solved the original text mining challenge:
all these numeric variables summarize the original text variable and can be
used in predictive modeling algorithms along with the rest of the predictors!
Reporting Stage
► LET K1 = "reviews.csv" – input data set
► LET K2 = "Review" – text variable
► LET K3 = 1 – word count limit
► PYSC "text_summary.py" – reporting script
Extracting Stage
► LET K1 = "reviews.csv" – input data set
► LET K2 = "Review" – text variable
► LET K3 = 1 – word count limit
► LET K5 = 5 – number of singular vectors
► LET K6 = "reviews_bow.csv" – bag of words dataset
► LET K7 = "reviews_svd.csv" – singular vector dataset
► LET K8 = "reviews_lds.csv" – word loadings
► PYSC "text_convert.py" – extracting script
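The extracting stage refers to a number of singular vectors, a singular vector dataset, and word loadings. One plausible way to obtain such quantities is a singular value decomposition (SVD) of the bag-of-words matrix, sketched below with NumPy. This is an assumption about the general technique, not a description of what text_convert.py actually does; the small matrix, the hand-picked vocabulary, and `k = 2` are illustrative only.

```python
import numpy as np

# Rows are the five cleaned wine reviews, columns are a hand-picked subset of words.
vocabulary = ["awesome", "bitter", "love", "wine"]
bow = np.array([
    [0, 0, 0, 1],  # excellent wine highly recommended
    [1, 0, 1, 0],  # love awesome
    [0, 1, 0, 0],  # bitter forgettable
    [0, 0, 1, 1],  # love wine
    [0, 0, 0, 1],  # better wine
], dtype=float)

k = 2  # number of singular vectors to keep (illustrative value)

# Thin SVD: bow = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(bow, full_matrices=False)

scores = U[:, :k] * s[:k]  # one row per review: k new numeric variables
loadings = Vt[:k].T        # one row per word: its weight in each singular vector

print(scores.shape, loadings.shape)
```

The scores compress many word-count columns into a few numeric predictors, and the loadings indicate which words drive each of them.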
Our Approach: More Than Business Analytics… Solutions Analytics
Solutions analytics is our integrated approach to providing software and services that enable organizations to make better decisions that drive business excellence.
Software
▪ Data Analysis: Powerful statistical software everyone can use
▪ Predictive Modeling: Machine learning and predictive analytics software
▪ Visual Business Tools: Visual tools for process and product excellence
▪ Project Oversight: Start, track, manage and execute improvement projects with real-time dashboards
▪ Online Training: Master statistics and Minitab anywhere with online training
Services
▪ Training: Learn first-hand by attending public or customized training sessions tailored to your requirements.
▪ Statistical Consulting: Personalized help with statistical challenges, from collecting the right data to interpreting analysis and more.
▪ Support: Assistance with installation, implementation, version updates and license management.
Upcoming Webinar Wednesdays
Continue learning and working efficiently with our free webinar series:
• A TEDx Coach’s Secrets To Developing Innovative Leaders
and Ensuring They Thrive at Your Organization – July 15
info.minitab.com/resources/webinars/webinar-wednesdays
Minitab Training is now virtual!
Learn more at minitab.com/training
Thank You!
From all of us at Minitab

 
Minitab Preview Training: Introduction to t-Tests for Manufacturing
Minitab Preview Training: Introduction to t-Tests for ManufacturingMinitab Preview Training: Introduction to t-Tests for Manufacturing
Minitab Preview Training: Introduction to t-Tests for Manufacturing
 

Performing at Your Best: Turning Words into Numbers and Numbers into Data-Driven Insights with Minitab, Python and Text Mining

  • 2. © 2020 Minitab, LLC. Meet the Presenter: Mikhail Golovnya, Minitab Senior Advisory Data Scientist
    Mikhail has been prototyping new machine learning algorithms and modeling automation for 20 years, and he has been a major contributor to developing technological improvements to the most important algorithms in Machine Learning: CART® Decision Trees, MARS® Non-linear Regression, TreeNet® gradient boosting, and Random Forests®. He holds master’s degrees both in rocket science from Kharkov State Polytechnic University in Ukraine and statistical computing from the University of Central Florida.
  • 3. The Challenge of Text Mining
    ► Data sets often have a character variable that contains a possibly long text (user feedback, comments, etc.)
    ► Such a variable will usually have as many distinct values as there are records in the dataset; thus, it cannot be used directly for modeling
    ► Core objective of Text Mining: find ways to extract numeric measures from a text variable that can be used in quantitative modeling
    Wine Review (sample data)
      ▪ Excellent wine! HIGHLY Recomended
      ▪ LOVE IT; AWESOME!
      ▪ Too bitter, fordettable
      ▪ Love this wine
      ▪ Had better wine before
  • 4. Simple Text Statistics
    ► The following simple numeric summaries of the raw text itself can be extracted and used in quantitative analysis as derived numeric variables
      ▪ Total count of words
      ▪ Total count of characters
      ▪ Average word length (in characters)
      ▪ Count of stop-words (commonly occurring words)
      ▪ Count of numeric words (series of digits)
      ▪ Count of words written in all upper-case
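The summaries listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the webinar: the function name `simple_text_stats`, the short `STOP_WORDS` set, the tokenizing regular expression, and the choice to ignore single-letter words when counting upper-case words are all assumptions made for the example.

```python
import re

# Assumption: a tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "it", "this", "is", "too", "had", "before"}


def simple_text_stats(text):
    """Return the raw-text summary statistics for a single text value."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    n_words = len(words)
    return {
        "word_count": n_words,
        "char_count": len(text),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "stop_word_count": sum(w.lower() in STOP_WORDS for w in words),
        "numeric_word_count": sum(w.isdigit() for w in words),
        # Assumption: single-letter words such as "I" are not counted as upper-case.
        "upper_case_word_count": sum(w.isupper() and len(w) > 1 for w in words),
    }


stats = simple_text_stats("LOVE IT; AWESOME.")
print(stats)
```

Applying the function to every row of the text column yields one derived numeric variable per statistic.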
  • 5. Simple Stats
    Wine Review
      ▪ Excellent wine! HIGHLY Recomended
      ▪ LOVE IT; AWESOME.
      ▪ Too bitter, forgettable
      ▪ Love this wine
      ▪ Had beter wine before
  • 7. Text Cleaning Steps
    ► Raw text stats summarize the original text in its raw form
    ► The following steps (cleaning up) are normally employed to prepare a raw text variable for further analysis
      ▪ Converting all characters to lower case
      ▪ Removing all punctuation
      ▪ Removing all stop-words
      ▪ Correcting spelling errors
      ▪ Removing infrequent words
    ► More advanced analyses (semantic extraction, etc.) might omit some of the above steps
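The cleaning steps above can be sketched as a small Python function. This is an illustration under stated assumptions, not the webinar's script: `clean_reviews`, the `STOP_WORDS` set, and the `SPELLING_FIXES` lookup table are hypothetical, and a real pipeline would use a full stop-word list and a proper spelling corrector.

```python
import re
from collections import Counter

# Assumptions: illustrative stop-word list and spelling lookup, not real resources.
STOP_WORDS = {"it", "this", "too", "had", "before"}
SPELLING_FIXES = {"recomended": "recommended", "beter": "better"}


def clean_reviews(reviews, min_count=1):
    """Apply the cleaning steps in order and return one cleaned string per review."""
    cleaned = []
    for text in reviews:
        text = text.lower()                                    # lower case
        text = re.sub(r"[^\w\s]", " ", text)                   # remove punctuation
        words = [w for w in text.split() if w not in STOP_WORDS]  # remove stop-words
        words = [SPELLING_FIXES.get(w, w) for w in words]      # correct spelling
        cleaned.append(words)
    # Remove infrequent words: keep words seen at least min_count times overall.
    counts = Counter(w for words in cleaned for w in words)
    return [" ".join(w for w in words if counts[w] >= min_count) for words in cleaned]


reviews = [
    "Excellent wine! Highly Recomended",
    "Love it; awesome.",
    "Too bitter, forgettable",
    "Love this wine",
    "Had beter wine before",
]
cleaned = clean_reviews(reviews)
print(cleaned)
```

With `min_count=1` no word is dropped as infrequent; raising it trims the vocabulary at the cost of emptying some short reviews.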
  • 8. Cleaning Up Process
    Wine Review (raw) → Wine Review (cleaned)
      ▪ Excellent wine! Highly Recomended → excellent wine highly recommended
      ▪ Love it; awesome. → love awesome
      ▪ Too bitter, forgettable → bitter forgettable
      ▪ Love this wine → love wine
      ▪ Had beter wine before → better wine
  • 9. Summary Statistics
    ► The following summary statistics can now be computed and visualized for a “beautified” text variable
      ▪ Total word count for each word that “survived the beautification process”
      ▪ Inverse Document Frequency (IDF) for each word: $\mathrm{IDF} = \log\left(\frac{N}{DF}\right)$, where N is the number of observations and DF is the number of documents in which a given word occurs. A word present in all observations has IDF = 0; a word present in only one observation has the largest possible IDF.
      ▪ Bar chart of the most frequently occurring words and their IDFs
      ▪ Word-cloud image of the most frequently occurring words
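The IDF definition on this slide translates directly into code. The sketch below is illustrative: the function name `inverse_document_frequency` is hypothetical, and the natural logarithm is an assumption, since the slide does not state the base.

```python
import math


def inverse_document_frequency(documents):
    """Return {word: log(N / DF)} for cleaned, space-separated documents."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        # set() ensures each word is counted once per document.
        for word in set(doc.split()):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    # Assumption: natural logarithm; the slide does not specify the base.
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


docs = [
    "excellent wine highly recommended",
    "love awesome",
    "bitter forgettable",
    "love wine",
    "better wine",
]
idf = inverse_document_frequency(docs)
for word, value in sorted(idf.items(), key=lambda item: item[1]):
    print(f"{word:12s} {value:.3f}")
```

In the cleaned wine reviews, "wine" occurs in three of the five documents and so receives the smallest IDF, while words that occur in a single review share the largest value.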
  • 10. Summary Statistics
  • 11. Word Counts
  • 12. Word IDFs
  • 13. Extracting Sentiment Values
    ► A sentiment value is a number that summarizes the writer’s overall attitude, based on linguistic analysis of the text
      ▪ Positive sentiment reflects a positive attitude
      ▪ Negative sentiment reflects a negative attitude
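The slide does not say how the sentiment value is computed. One simple possibility, shown here purely as an illustration, is a lexicon-based score: each known word carries a polarity and the review's sentiment is the average over the words found. The `LEXICON` table and the `sentiment_value` function are hypothetical and are not taken from the webinar scripts.

```python
# Assumption: a tiny hand-made polarity lexicon, for illustration only.
LEXICON = {
    "excellent": 1.0,
    "love": 1.0,
    "awesome": 1.0,
    "recommended": 0.5,
    "bitter": -1.0,
    "forgettable": -1.0,
}


def sentiment_value(cleaned_text):
    """Average polarity of the lexicon words in the text; 0.0 if none are found."""
    scores = [LEXICON[w] for w in cleaned_text.split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


for review in ["love awesome", "bitter forgettable", "better wine"]:
    print(review, "->", sentiment_value(review))
```

A word-level lexicon cannot capture context: "had better wine before" is a negative remark, yet it scores as neutral here, which is one reason production tools rely on richer linguistic analysis.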
  • 14. Creating a Bag of Words
    ► For each word, create a new variable that reports how many times the word occurs in the text field
    ► To avoid an explosion of new variables, the user might want to exclude infrequent words
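A bag of words with a frequency cutoff can be sketched as follows. The function name `bag_of_words` and the parameter `min_total_count` are assumptions for this example; the parameter plays the role of the word count limit that excludes infrequent words.

```python
from collections import Counter


def bag_of_words(documents, min_total_count=1):
    """Return (vocabulary, rows): one count column per word that is kept."""
    per_doc = [Counter(doc.split()) for doc in documents]
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)
    # Keep only words whose total count reaches the limit.
    vocabulary = sorted(w for w, c in totals.items() if c >= min_total_count)
    # A Counter returns 0 for words that are absent from a document.
    rows = [[counts[w] for w in vocabulary] for counts in per_doc]
    return vocabulary, rows


docs = [
    "excellent wine highly recommended",
    "love awesome",
    "bitter forgettable",
    "love wine",
    "better wine",
]
vocabulary, rows = bag_of_words(docs, min_total_count=2)
print(vocabulary)
for row in rows:
    print(row)
```

With a limit of 2, only "love" and "wine" survive, so five reviews become two numeric columns instead of nine.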
  • 15. Extracting Singular Vectors
  • 16. Summary
    ► Reporting stage (text_summary.py)
      ▪ Word frequencies and IDFs
      ▪ Bar charts and word cloud
    ► Extracting stage (text_convert.py)
      ▪ Created original raw text statistics variables
      ▪ Cleaning up stage
      ▪ Created sentiment value variable
      ▪ Created bag of words variables
      ▪ Created singular vector variables
    ► We have solved the original text mining challenge: all these numeric variables summarize the original text variable and can be used in predictive modeling algorithms along with the rest of the predictors!
  • 17. Reporting Stage
    ► LET K1 = "reviews.csv" – input data set
    ► LET K2 = "Review" – text variable
    ► LET K3 = 1 – word count limit
    ► PYSC "text_summary.py" – reporting script
  • 18. Extracting Stage
    ► LET K1 = "reviews.csv" – input data set
    ► LET K2 = "Review" – text variable
    ► LET K3 = 1 – word count limit
    ► LET K5 = 5 – number of singular vectors
    ► LET K6 = "reviews_bow.csv" – bag of words dataset
    ► LET K7 = "reviews_svd.csv" – singular vector dataset
    ► LET K8 = "reviews_lds.csv" – word loadings
    ► PYSC "text_convert.py" – extracting script
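The extracting stage produces a singular vector dataset and a word loadings dataset, but the slides do not show how text_convert.py computes them. One common way to obtain both from a bag-of-words matrix is a truncated singular value decomposition, sketched below with NumPy. The function `singular_vectors`, the choice not to center or scale the counts, and the small example matrix are all assumptions.

```python
import numpy as np


def singular_vectors(bow_rows, n_vectors):
    """Truncated SVD of a documents-by-words count matrix.

    Returns (scores, loadings, singular_values):
      scores   - one row per document, one column per singular vector
      loadings - one row per word, one column per singular vector
    """
    # Assumption: raw counts are decomposed without centering or scaling.
    X = np.asarray(bow_rows, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_vectors, len(s))
    scores = U[:, :k] * s[:k]
    loadings = Vt[:k].T
    return scores, loadings, s[:k]


# Bag-of-words counts for the columns ["love", "wine"] of the five wine reviews.
rows = [[0, 1], [1, 0], [0, 0], [1, 1], [0, 1]]
scores, loadings, s = singular_vectors(rows, n_vectors=2)
print(scores.round(3))
print(loadings.round(3))
```

Here `scores` corresponds to the per-document singular vector variables and `loadings` to the per-word loadings; keeping fewer vectors than there are words compresses the bag of words into a handful of numeric predictors.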
  • 19. Our Approach: More Than Business Analytics… Solutions Analytics
    Solutions analytics is our integrated approach to providing software and services that enable organizations to make better decisions that drive business excellence.
    ► Services
      ▪ Training: Learn first-hand by attending public trainings or customized trainings according to your requirements.
      ▪ Statistical Consulting: Personalized help with statistical challenges, from collecting the right data to interpreting analysis and more.
      ▪ Support: Assistance with installation, implementation, version updates and license management.
    ► Software
      ▪ Data Analysis: Powerful statistical software everyone can use
      ▪ Predictive Modeling: Machine learning and predictive analytics software
      ▪ Visual Business Tools: Visual tools to process and product excellence
      ▪ Project Oversight: Start, track, manage and execute improvement projects with real-time dashboards
      ▪ Online Training: Master statistics and Minitab anywhere with online training
  • 20. Upcoming Webinar Wednesdays
    Continue learning and working efficiently with our free webinar series:
      ▪ A TEDx Coach’s Secrets To Developing Innovative Leaders and Ensuring They Thrive at Your Organization – July 15
    info.minitab.com/resources/webinars/webinar-wednesdays
    Minitab Training is now virtual! Learn more at minitab.com/training
  • 21. Thank You! From all of us at