ENRON EMAIL TEXT ANALYTICS
AUTHOR : RADHIKA KINI
BACKGROUND : ENRON
• United States company based in Texas, engaged in the production and distribution of energy
• Major bankruptcy scandal surfaced in 2001, which led to the dissolution of the accounting firm Arthur Andersen
• Synonymous with fraud in the finance sector
• Played a key role in the California energy crisis
ENRON DATA SET
• Repository of emails exchanged between 158 users
• Contents of the dataset
1. Email: contents of the email
2. Responsive: was the email related to energy?
• Use predictive analytics (CART model) to understand the behavior of the emails
• Capture the sentiments of the data set
STEPS INVOLVED
1. Cleaning the data
2. Pre-processing
3. Bag of words creation
4. Extracting the frequent words
5. Classifying the sentiments
6. Applying the CART Model on the training and test data sets
7. Checking for accuracy (Confusion Matrix, ROC, AUC)
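
Steps 1 to 4 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the tooling used for the slides; the stop-word list, the helper names (`preprocess`, `bag_of_words`, `frequent_terms`) and the three toy emails are assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list and toy emails, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "for", "on", "in", "at"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens only, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return one term-count Counter per document."""
    return [Counter(preprocess(d)) for d in docs]


def frequent_terms(bags, n):
    """Sum term counts over all documents and return the n most common."""
    total = Counter()
    for bag in bags:
        total.update(bag)
    return total.most_common(n)


emails = [
    "The gas bid for California is due on Friday.",
    "Lunch at noon?",
    "California gas prices and the bid schedule.",
]
bags = bag_of_words(emails)
top = frequent_terms(bags, 3)
print(top)
```

Each `Counter` is one row of the bag-of-words matrix; the summed counts give the frequent-word ranking that the first graph plots.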
GRAPHS
• The 20 most frequent words in the data set
• The sentiments of the corpus
• CART tree
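
How a CART tree picks a term as a decision point can be sketched as below. The sketch assumes Gini impurity as the split criterion (a common choice for CART; the slides do not state which criterion was used), and the labels and term indicators are toy values invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = responsive)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def gini_gain(term_present, labels):
    """Impurity decrease from splitting on whether a term is present."""
    left = [y for x, y in zip(term_present, labels) if x]
    right = [y for x, y in zip(term_present, labels) if not x]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Toy data: six emails, two of them responsive.
labels = [1, 1, 0, 0, 0, 0]
has_california = [True, True, False, False, False, False]
has_lunch = [True, False, True, False, False, False]

gain_california = gini_gain(has_california, labels)
gain_lunch = gini_gain(has_lunch, labels)
print(gain_california, gain_lunch)
```

The term with the larger impurity decrease separates responsive from non-responsive emails more cleanly, so it would be chosen as the split at that node.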
GRAPHS
• Responsiveness of the test data set
• Graph comparing the actual accuracy with the model accuracy
• ROC curve of the CART model
PARAMETERS
• Confusion Matrix (CART Model)
• Accuracy_Model : 84.04%
• Actual_Accuracy : 83.67%
• AUC : 83.57%
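
The accuracy figures can be derived from a confusion matrix as sketched below. The counts are toy values, not the confusion matrix from the slides, and the baseline is assumed to be a model that always predicts the majority class.

```python
def accuracy(tn, fp, fn, tp):
    """Share of documents the model classified correctly."""
    return (tn + tp) / (tn + fp + fn + tp)


def baseline_accuracy(tn, fp, fn, tp):
    """Accuracy of always predicting the majority class (assumed baseline)."""
    negatives = tn + fp
    positives = fn + tp
    return max(negatives, positives) / (negatives + positives)


def sensitivity(fn, tp):
    """True positive rate: responsive documents correctly flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: non-responsive documents correctly passed over."""
    return tn / (tn + fp)


# Toy confusion-matrix counts, for illustration only.
tn, fp, fn, tp = 80, 5, 7, 8

model_acc = accuracy(tn, fp, fn, tp)
base_acc = baseline_accuracy(tn, fp, fn, tp)
print(model_acc, base_acc, sensitivity(fn, tp), specificity(tn, fp))
```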
CONCLUSION
• The CART model uses terms such as California, Gas, Bid, and Jeff as decision points, as these terms featured heavily in emails during the scandal
• The accuracy of the CART model exceeds the accuracy of the baseline model
• The ROC curve shows the trade-off between sensitivity and specificity. From the graph we can clearly see that
• An AUC of 83.57% means that, given a randomly selected responsive document and a randomly selected non-responsive document, the model ranks the responsive one higher 83.57% of the time.
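
The pairwise reading of AUC can be checked directly, as in this minimal sketch. The function name `pairwise_auc` and the labels and scores are made up for illustration; ties are counted as half a win, which is the usual convention.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (responsive, non-responsive) pairs in which
    the responsive document receives the higher score."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one document of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities for five documents (1 = responsive).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]

auc = pairwise_auc(labels, scores)
print(auc)
```

In the toy data, five of the six responsive/non-responsive pairs are ordered correctly, so the AUC is 5/6.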
THANK YOU
