Python vs R for Data Analytics Final

Python and R were compared on a sentiment analysis use case. Python's accuracy was 28% overall while R had 50% accuracy. The document outlines the process used to complete the analysis in both Python and R, including required packages, code demonstrations, and performance results for each language. A scorecard was used to evaluate and compare Python and R based on criteria like package requirements, lines of code, simplicity, popularity, and functionality. Based on the scorecard results, R was recommended for data analytics tasks due to its accuracy, usability, and support from online resources and communities.

PYTHON VS R
BY: KENNAN DUFFY, DARIA GBOR, CHRIS LUKENS,
JOHN SAVIELLO, & JAMES SCHEUREN
http://project.mis.temple.edu/pythonvsranalytics/final-deliverables/

AGENDA
1. Our Process
2. Use Cases
3. Sentiment Analysis - Python
4. Sentiment Analysis - R
5. Scorecard
6. Recommendation
7. Q & A

OUR PROCESS
1. DEFINE: Define the business purpose of the use case and the completion plan
2. RESEARCH: Conduct web research on the use case and the language
3. PYTHON: Complete the use case in Python
4. R CODE: Complete the use case in R
5. ANALYZE: Review and analyze the results of Python & R as a team
6. SCORE: Fill out the scorecard based on previously defined scoring criteria

SCORECARD
Criteria               Weight (%)
Package Requirement    10%
Lines of Code          5%
Simplicity             10%
Popularity             5%
Development Sources    10%
Data Visualization     15%
Functionality          45%
Total                  100%
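
The weights above can be combined into a single score per language. The following is a minimal sketch, assuming each criterion is graded on the 0-10 scale used in the grading criteria; the function name and the example scores are hypothetical and are not the team's actual scorecard values.

```python
# Weights from the scorecard slide, expressed as fractions of 1.
WEIGHTS = {
    "Package Requirement": 0.10,
    "Lines of Code": 0.05,
    "Simplicity": 0.10,
    "Popularity": 0.05,
    "Development Sources": 0.10,
    "Data Visualization": 0.15,
    "Functionality": 0.45,
}


def weighted_total(scores):
    """Combine per-criterion scores (0-10 each) into one weighted score (0-10)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for illustration only; not the team's actual results.
example_scores = {name: 7 for name in WEIGHTS}
example_total = weighted_total(example_scores)
```

Because Functionality carries 45% of the weight, a language that scores well there can outrank one that wins most of the smaller criteria.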

USE CASE #1 - PREDICTIVE ANALYTICS
What
➔ An NFL franchise wants to ensure that the player it selects
in the draft will be a high performer
How
➔ Linear Regression using the NFL combine dataset from 1985-2015
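
As a rough illustration of the technique named above, here is a minimal sketch of simple linear regression (ordinary least squares with one predictor). The deck does not show the team's code or which combine measurements were used, so the function name and the toy numbers are assumptions made purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up numbers for illustration; NOT taken from the NFL combine dataset.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]  # lies exactly on y = 1 + 2x
intercept, slope = fit_line(toy_x, toy_y)
predicted = intercept + slope * 5.0  # prediction for a new x value
```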

USE CASE #2 - TEXT MINING
What
➔ Justin Trudeau’s campaign team wants to stay updated on
public opinion of him
How
➔ Sentiment analysis using a Twitter feed as our dataset

USE CASE #3 - IMAGE ANALYTICS
What
➔ England wants to monitor activity on its
busy streets for security purposes
How
➔ Object detection using a picture of a busy street in England

SENTIMENT ANALYSIS - PYTHON
▰ csv: Lets us write output to a CSV file for analysis
▰ Tweepy: Python library that provides access to the Twitter API and its functions
▰ TextBlob: Natural language processor used to get the subjectivity and polarity of tweets
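
The way these three packages fit together can be sketched roughly as follows. This is not the team's demonstration code, which the deck does not include. The Tweepy retrieval step is left out because it needs API credentials, and the `neutral_band` cutoff and all function names are assumptions. Only the `TextBlob(text).sentiment` call, with its `polarity` and `subjectivity` fields, comes from the TextBlob library itself.

```python
import csv
import io


def label_from_polarity(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a sentiment label.

    neutral_band is an assumed cutoff: scores within +/- neutral_band
    count as neutral. The deck does not state the thresholds used.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


def score_tweet(text):
    """Return (polarity, subjectivity) for one tweet using TextBlob."""
    # Imported lazily so the rest of the sketch runs without the
    # third-party package installed (pip install textblob).
    from textblob import TextBlob
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def write_rows(rows, handle):
    """Write (text, polarity, subjectivity, label) rows as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["text", "polarity", "subjectivity", "label"])
    writer.writerows(rows)


# Hand-entered scores, so this runs without TextBlob or Twitter access.
buffer = io.StringIO()
write_rows([("example tweet", 0.5, 0.6, label_from_polarity(0.5))], buffer)
```

In a real run, each tweet's text would pass through `score_tweet`, and `handle` would be a file opened with `newline=""` rather than an in-memory buffer.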

PERFORMANCE - PYTHON
Overall Accuracy: 28%
▰ Negative Accuracy: 52% (11/21)
▰ Positive Accuracy: 27% (7/26)
▰ Neutral Accuracy: 19% (10/53)

SENTIMENT ANALYSIS - R
▰ Syuzhet: Sentiment analysis
▰ TwitteR: Twitter API
▰ SnowballC: Concision (word stemming)
▰ TM: Text mining

PERFORMANCE - R
Overall Accuracy: 50%
▰ Negative Accuracy: 77% (30/39)
▰ Positive Accuracy: 27% (9/33)
▰ Neutral Accuracy: 39% (11/28)
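
The percentages on both performance slides follow directly from the counts in parentheses. A short sketch of that arithmetic, using only the counts reported in the deck (the function names are illustrative):

```python
# (correct, total) per class, copied from the two performance slides.
PYTHON_COUNTS = {"negative": (11, 21), "positive": (7, 26), "neutral": (10, 53)}
R_COUNTS = {"negative": (30, 39), "positive": (9, 33), "neutral": (11, 28)}


def class_accuracy(counts):
    """Per-class accuracy as a whole-number percentage."""
    return {
        label: round(100 * correct / total)
        for label, (correct, total) in counts.items()
    }


def overall_accuracy(counts):
    """Accuracy over all tweets, as a whole-number percentage."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return round(100 * correct / total)


python_overall = overall_accuracy(PYTHON_COUNTS)  # 28 correct of 100 tweets
r_overall = overall_accuracy(R_COUNTS)            # 50 correct of 100 tweets
```

Each language was evaluated on 100 tweets, so the overall figure is simply the number of correct labels.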

OUR RECOMMENDATION: R
- Built for Data Analytics
- Package Accuracy
- Usability

GRADING CRITERIA
1. Package Requirement:
0 packages = 10 points
1 package = 9 points
2 packages = 8 points
3 packages = 7 points
4 packages = 6 points
5 packages = 5 points
6 packages = 4 points
7 packages = 3 points
8 packages = 2 points
9 packages = 1 point
10 packages = 0 points
2. Lines of Code:
0-10 lines = 10 points
11-20 lines = 9 points
21-30 lines = 8 points
31-40 lines = 7 points
41-50 lines = 6 points
51-60 lines = 5 points
61-70 lines = 4 points
71-80 lines = 3 points
81-90 lines = 2 points
91-100 lines = 1 point
101+ lines = 0 points
3. Simplicity:
Quick, really simple to write, really simple to read = 10
Took a while to complete, but pretty simple, easy to understand = 7
Took a long time to complete, not very simple, hard to understand = 4
Hard to write, almost impossible, not able to read = 1
4. Popularity:
Very popular in the industry = 10
A lot of people use this language = 7
Some people use this language = 4
No one uses it = 1
5. Development Sources:
A lot of help in the online community = 10
Some resources available, decently helpful sources = 7
Not many resources available = 4
No help available online = 1
6. Data Visualization:
Easy to manipulate, clean, visually appealing = 10
Harder to manipulate, messy, not exciting = 7
Harder to manipulate, difficult to read = 4
Unable to manipulate, unreadable = 1
7. Functionality:
Accurate data, does everything it needs to do = 10
Mostly accurate data, does most of what it needs to do = 7
Inaccurate data, barely does what it needs to do = 4
Is not able to complete the task = 1
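
The two numeric criteria (Package Requirement and Lines of Code) are simple step functions and can be written down directly. A minimal sketch with illustrative function names follows. Scoring more than 10 packages as 0 points is an assumption, since the slide stops at 10.

```python
def package_points(package_count):
    """10 points for 0 packages, one point fewer per extra package.

    Counts above 10 are assumed to score 0; the slide stops at 10.
    """
    if package_count < 0:
        raise ValueError("package_count must be non-negative")
    return max(0, 10 - package_count)


def lines_of_code_points(line_count):
    """10 points for 0-10 lines, one point fewer per further 10 lines."""
    if line_count < 0:
        raise ValueError("line_count must be non-negative")
    # Lines 0-10 fall in bucket 0, 11-20 in bucket 1, and so on.
    bucket = max(0, (line_count - 1) // 10)
    return max(0, 10 - bucket)
```

For example, a solution that needs 3 packages and 45 lines of code would earn 7 and 6 points on these two criteria.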

Editor's Notes

Lines of code: we set up standard criteria for this measurement, so 1-10 lines got a 10, 11-20 lines got a 9, and so forth.
Development sources: how strong the online support community is, and how many helpful sources are out there to help us complete the use case and solve problems if issues arise.
Functionality: whether the language is able to do what we want it to, and how well it is able to accomplish that.
TextBlob struggled to identify positive and neutral tweets.
Explain how we got accuracy: we retrieved 100 tweets, compared them (as a team) to the package results, and checked whether we agreed with the outcome.
Neutral = 10/53, Negative = 11/21, Positive = 7/26
Syuzhet: used for sentiment analysis; it is what reads the tweets.
TM: works with SnowballC and TwitteR to mine text.
TwitteR: interacts with the Twitter API to get tweets for analysis.
SnowballC: makes words more concise so that they are easier for other packages to read.
Explain how we got accuracy: we retrieved 100 tweets, compared them (as a team) to the package results, and checked whether we agreed with the outcome.
Load packages, Import Twitter API, Scrape, Cleaning, Analyze, Apply
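
The cleaning step in this sequence is not spelled out in the notes. As a hedged illustration of what tweet cleaning commonly involves, the sketch below lowercases the text and strips retweet markers, URLs, mentions, and punctuation. These particular rules and the function name are assumptions, not the team's actual R code.

```python
import re

# Assumed cleaning rules; the notes only say that a cleaning step exists.
RETWEET_RE = re.compile(r"^rt\s+")       # leading retweet marker
URL_RE = re.compile(r"https?://\S+")     # links
MENTION_RE = re.compile(r"@\w+")         # user mentions
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, symbols


def clean_tweet(text):
    """Normalize a raw tweet into lowercase words separated by single spaces."""
    text = text.lower()
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())


cleaned = clean_tweet("RT @user: Great speech today! https://t.co/abc #cdnpoli")
```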
50% overall. 77% negative (30/39). 27% positive (9/33). 39% neutral (11/28).
Accuracy: MENTION Functionality.
USE FOR LESSONS LEARNED: We found out that the package is more accurate with negative tweets. It is not perfect; it picks out certain words to decide whether a tweet is positive or negative. Sarcasm is difficult. We shouldn’t trust the analyses of positive tweets.
R is built for predictive analytics and is ready to run it, whereas Python needs to be molded into running the linear regression.
The packages we ran for R were much more accurate than the Python packages for running sentiment analysis.
R offered more functionality than Python when running image analytics and was very simple to change: it was a matter of changing only 2 lines of code to switch between face detection, landmark detection, logo detection, and object detection.
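
The accuracy check described in these notes (100 tweets labeled by the team and compared with the package output) can be sketched as below. Grouping the counts by the package's label is an assumption; the notes do not say whether the denominators are package labels or team labels. The function names and the four toy labels are hypothetical.

```python
from collections import Counter


def agreement_by_label(team_labels, package_labels):
    """Return {label: (agreed, total)} grouped by the package's label."""
    if len(team_labels) != len(package_labels):
        raise ValueError("both label lists must have the same length")
    totals = Counter(package_labels)
    agreed = Counter(
        package for team, package in zip(team_labels, package_labels)
        if team == package
    )
    return {label: (agreed[label], totals[label]) for label in totals}


def overall_agreement(team_labels, package_labels):
    """Fraction of tweets where the team and the package agree."""
    if not team_labels or len(team_labels) != len(package_labels):
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(t == p for t, p in zip(team_labels, package_labels))
    return matches / len(team_labels)


# Toy labels for illustration only; not the team's 100 tweets.
team = ["negative", "positive", "neutral", "negative"]
package = ["negative", "neutral", "neutral", "positive"]
per_label = agreement_by_label(team, package)
overall = overall_agreement(team, package)
```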
