Python vs R for Data Analytics Final

Python vs R for data analytics capstone final (videos are password protected)

PYTHON VS R
BY: KENNAN DUFFY, DARIA GBOR, CHRIS LUKENS,
JOHN SAVIELLO, & JAMES SCHEUREN
http://project.mis.temple.edu/pythonvsranalytics/final-deliverables/

AGENDA
1. Our Process
2. Use Cases
3. Sentiment Analysis - Python
4. Sentiment Analysis - R
5. Scorecard
6. Recommendation
7. Q & A

OUR PROCESS
DEFINE
Define the business purpose of the use case and completion plan
RESEARCH
Conduct web research on the use case and the language
PYTHON
Complete the use case in Python
R CODE
Complete the use case in R
ANALYZE
Review and analyze the results of Python & R as a team
SCORE
Fill out the scorecard based on previously defined scoring criteria

SCORECARD
Criteria               Weight (%)
Package Requirement    10%
Lines of Code          5%
Simplicity             10%
Popularity             5%
Development Sources    10%
Data Visualization     15%
Functionality          45%
Total                  100%
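
A minimal sketch of how these weights could turn per-criterion scores into one number. The deck does not state the exact formula, so a plain weighted average of 0-10 scores is assumed here, and the example scores are invented for illustration (the filled-in scorecard is not part of this text).

```python
# Weights copied from the scorecard slide (percent of the total).
WEIGHTS = {
    "Package Requirement": 10,
    "Lines of Code": 5,
    "Simplicity": 10,
    "Popularity": 5,
    "Development Sources": 10,
    "Data Visualization": 15,
    "Functionality": 45,
}


def weighted_total(scores):
    """Combine 0-10 criterion scores into a 0-10 weighted average (assumed formula)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / sum(WEIGHTS.values())


# Hypothetical scores, NOT the team's actual results.
example_scores = {
    "Package Requirement": 7,
    "Lines of Code": 8,
    "Simplicity": 7,
    "Popularity": 10,
    "Development Sources": 10,
    "Data Visualization": 7,
    "Functionality": 4,
}

print(round(weighted_total(example_scores), 2))
```

Because Functionality carries 45% of the weight, a low score there pulls the total down more than any other criterion.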

USE CASE #1 - PREDICTIVE ANALYTICS
What
➔ An NFL franchise wants to ensure that the player it selects in the draft will be a high performer
How
➔ Linear regression using the NFL Combine dataset from 1985-2015
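
A rough sketch of the technique named above: ordinary least squares with a single predictor, written in plain Python. The variable names and the numbers are made up for illustration and are not taken from the NFL combine dataset; the team's actual model and predictors are not shown in this deck.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: 40-yard dash time (seconds) vs. a hypothetical performance rating.
forty_times = [4.4, 4.5, 4.6, 4.7, 4.8]
performance = [9.0, 8.0, 7.0, 6.0, 5.0]

slope, intercept = fit_line(forty_times, performance)
predicted = slope * 4.55 + intercept  # predicted rating for a 4.55 s prospect
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```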

USE CASE #2 - TEXT MINING
What
➔ Justin Trudeau’s campaign team wants to stay up to date on public opinion of him
How
➔ Sentiment analysis using a Twitter feed as the dataset

USE CASE #3 - IMAGE ANALYTICS
What
➔ England wants to keep track of what is going on in its busy streets for security purposes
How
➔ Object detection using a picture of a busy street in England

SENTIMENT ANALYSIS - PYTHON
csv
Writes output to a CSV file for analysis
Tweepy
Python library that provides access to the Twitter API and its functions
TextBlob
Natural language processor that returns the subjectivity and polarity of tweets
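
A small sketch of the last two steps of this pipeline: turning a polarity score into a label and writing the result out with the csv module. The polarity values below are hard-coded placeholders; in the setup described on this slide they would come from TextBlob (a score between -1 and 1) for tweets fetched through Tweepy. The 0.1 cutoff for "neutral" and the function names are assumptions, not the team's settings.

```python
import csv
import io


def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a label.

    The neutral band (+/- threshold) is an assumed cutoff, not taken from the deck.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def write_results(rows, handle):
    """Write (tweet, polarity) pairs plus their label as CSV rows."""
    writer = csv.writer(handle)
    writer.writerow(["tweet", "polarity", "label"])
    for text, polarity in rows:
        writer.writerow([text, polarity, label_polarity(polarity)])


# Hypothetical tweets and scores for illustration only.
sample = [
    ("Great speech today", 0.8),
    ("Terrible policy", -1.0),
    ("He spoke at noon", 0.0),
]

buffer = io.StringIO()  # stands in for an open CSV file
write_results(sample, buffer)
print(buffer.getvalue())
```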

PERFORMANCE - PYTHON
Overall Accuracy: 28%
▰ Negative Accuracy: 52% (11/21)
▰ Positive Accuracy: 27% (7/26)
▰ Neutral Accuracy: 19% (10/53)

SENTIMENT ANALYSIS - R
Syuzhet
Sentiment analysis
TwitteR
Twitter API
SnowballC
Concision (word stemming)
tm
Text mining

PERFORMANCE - R
Overall Accuracy: 50%
▰ Negative Accuracy: 77% (30/39)
▰ Positive Accuracy: 27% (9/33)
▰ Neutral Accuracy: 39% (11/28)
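
The percentages on the two performance slides can be reproduced from the counts shown in parentheses. A short sketch, assuming each pair is (tweets where the team agreed with the package, tweets in that class) and that overall accuracy is total agreed over total tweets:

```python
# (agreed, total) counts per class, copied from the two performance slides.
RESULTS = {
    "Python": {"negative": (11, 21), "positive": (7, 26), "neutral": (10, 53)},
    "R": {"negative": (30, 39), "positive": (9, 33), "neutral": (11, 28)},
}


def accuracy(agreed, total):
    """Percentage of tweets where the team agreed with the package."""
    return 100 * agreed / total


def summarize(per_class):
    """Per-class and overall accuracy, rounded to whole percentages."""
    summary = {label: round(accuracy(a, t)) for label, (a, t) in per_class.items()}
    agreed = sum(a for a, _ in per_class.values())
    total = sum(t for _, t in per_class.values())
    summary["overall"] = round(accuracy(agreed, total))
    return summary


for language, per_class in RESULTS.items():
    print(language, summarize(per_class))
```

Both samples total 100 tweets, so the overall figure is simply the number of agreements: 28 for Python and 50 for R.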

OUR RECOMMENDATION: R
- Built for data analytics
- Package accuracy
- Usability

GRADING CRITERIA
1. Package Requirement:
0 packages = 10 points
1 package = 9 points
2 packages = 8 points
3 packages = 7 points
4 packages = 6 points
5 packages = 5 points
6 packages = 4 points
7 packages = 3 points
8 packages = 2 points
9 packages = 1 point
10 packages = 0 points
2. Lines of Code:
0-10 lines = 10 points
11-20 lines = 9 points
21-30 lines = 8 points
31-40 lines = 7 points
41-50 lines = 6 points
51-60 lines = 5 points
61-70 lines = 4 points
71-80 lines = 3 points
81-90 lines = 2 points
91-100 lines = 1 point
101+ lines = 0 points
3. Simplicity:
Quick, really simple to write, really simple to read = 10
Took a while to complete, but pretty simple, easy to understand = 7
Took a long time to complete, not very simple, hard to understand = 4
Hard to write, almost impossible, not able to read = 1
4. Popularity:
Very popular in the industry = 10
A lot of people use this language = 7
Some people use this language = 4
No one uses it = 1
5. Development Sources:
A lot of help in the online community = 10
Some resources available, decently helpful sources = 7
Not many resources available = 4
No help available online = 1
6. Data Visualization:
Easy to manipulate, clean, visually appealing = 10
Harder to manipulate, messy, not exciting = 7
Harder to manipulate, difficult to read = 4
Unable to manipulate, unreadable = 1
7. Functionality:
Accurate data, does everything it needs to do = 10
Mostly accurate data, does most of what it needs to do = 7
Inaccurate data, barely does what it needs to do = 4
Is not able to complete the task = 1
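
The Package Requirement and Lines of Code rubrics are step functions, so they can be written as two small helpers. This is a sketch with hypothetical function names; the slide stops at 10 packages, so scoring anything above 10 as 0 points is an assumption.

```python
def package_points(package_count):
    """Package Requirement rubric: 10 points minus one per package, floor of 0."""
    if package_count < 0:
        raise ValueError("package_count cannot be negative")
    # Assumption: more than 10 packages also scores 0 (the slide stops at 10).
    return max(0, 10 - package_count)


def lines_of_code_points(line_count):
    """Lines of Code rubric: 0-10 lines = 10, then one point less per 10 lines."""
    if line_count < 0:
        raise ValueError("line_count cannot be negative")
    if line_count <= 10:
        return 10
    return max(0, 10 - (line_count - 1) // 10)


print(package_points(3), lines_of_code_points(55))
```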

Editor's Notes

Lines of code - we set up standard criteria for this measurement, so if it was between 1-10 lines it got a 10, 11-20 lines it got a 9, and so forth.
Development sources - how strong is the online support community, and how many helpful sources are out there to help us complete the use case and solve problems if issues arise?
Functionality - is it able to do what we want it to, and how well is it able to accomplish that?
TextBlob struggled to identify positive and neutral tweets.
Explain how we got accuracy - retrieved 100 tweets, compared them (as a team) to the package results, and checked whether we agreed with the outcome.
Neutral = 10/53, Negative = 11/21, Positive = 7/26
Syuzhet - Used for sentiment analysis; it is what reads the tweets.
tm - Works with SnowballC and TwitteR to mine text.
TwitteR - Interacts with the Twitter API to get tweets for analysis.
SnowballC - Makes words more concise so that they are easier for other packages to read.
Explain how we got accuracy - retrieved 100 tweets, compared them (as a team) to the package results, and checked whether we agreed with the outcome.
Load packages, Import Twitter API, Scrape, Cleaning, Analyze, Apply
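
The cleaning step in the list above is not spelled out in these notes. A minimal sketch of what tweet cleaning commonly involves, written in Python with the standard re module; the specific rules (dropping URLs, mentions, the retweet marker, and non-letters) and the sample tweet are assumptions, not the team's code.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")
RETWEET_PATTERN = re.compile(r"^rt\s+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, the RT marker, and non-letters."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    text = RETWEET_PATTERN.sub("", text)
    text = NON_LETTER_PATTERN.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


# Hypothetical tweet for illustration.
raw = "RT @JustinTrudeau: Great news today! https://t.co/abc123 #cdnpoli"
print(clean_tweet(raw))
```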
50% overall. 77% negative (30/39). 27% positive (9/33). 39% neutral (11/28).
Accuracy. Mention: Functionality. Use for lessons learned.
Found out that it is more accurate with negative tweets. It is not perfect: it picks out certain words to decide whether a tweet is positive or negative, so sarcasm is difficult. We shouldn’t trust its analyses of positive tweets.
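
To illustrate why word-based scoring trips over sarcasm, here is a deliberately simplified word-counting classifier. It is a stand-in for the general idea only, not how Syuzhet or TextBlob work internally, and the word lists and example sentences are invented.

```python
# Tiny made-up word lists; real sentiment lexicons are far larger.
POSITIVE_WORDS = {"great", "love", "good", "wonderful"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful"}


def lexicon_label(text):
    """Label text by counting positive and negative words (no context awareness)."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# A sarcastic sentence a human would read as negative.
sarcastic = "Oh great, another broken promise. Just wonderful."
print(lexicon_label(sarcastic))
print(lexicon_label("This policy is terrible"))
```

The sarcastic sentence is labeled positive because "great" and "wonderful" are counted at face value, which matches the note above about positive results being the least trustworthy.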
R is built for predictive analytics and is ready to run it, whereas Python needs to be molded into running the linear regression.
The packages we ran for R were much more accurate than the Python packages for sentiment analysis.
R had more functionality available than Python for image analytics and was very simple to change: it was a matter of changing only 2 lines of code to switch between face detection, landmark detection, logo detection, and object detection.
