SlideShare a Scribd company logo
What StackOverflow Tells Us
About Programming Languages
Rahul Thankachan Nada Aldarrab
Prathmesh Gat
University of Southern California
1
2
Agenda
Agenda
1. Introduction
2. Problem Statement
3. Temporal Based Trend Analysis
4. Topic Analysis
5. Predicting Time To Answer
6. Summary and Q&A
3
4
Introduction
Introduction
• Dataset Used : Stackoverflow - Internet Archive.
• Why? It is one of the largest developer focused
open collaborative platform currently.
• Through our study we intend to answers some
interesting questions
• Study the rise and fall of popular programming
languages
• Can be used to predict future enhancements
• Study effectiveness of Stack Overflow model
5
Problem Definition
• Basic Analysis: What are the most popular
programming languages?
• What are the trends in programming languages?
• What are the most popular topics discussed in a
programming language?
• Can we accurately predict the time it takes until a
questioner gets an answer?
6
Related Work
Miltiadis Allamanis and Charles Sutton. 2013. Why, When and What: Analyzing Stack Overflow Questions by
Topic, Type & Code In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 53-
56.
• Topic modeling analysis
• Used Latent Dirichlet Allocation (LDA)
• Modeled Java Topics of Questions
• Can evaluate the orthogonality of different languages
• Stack Overflow questions are about the code and are not application domain specific
7
Related Work
V. Bhat, A. Gokhale, R. Jadhav, J. Pudipeddi, and L. Akoglu. Min (e) d your tags: Analysis of question response time in
stackoverflow. In Proceedings of ASONAM 2014, pages 328–335. IEEE, 2014.
• Two linear classifiers: logistic regression and SVM with linear kernel
• Two non- linear classifiers: decision tree (DT) and SVM with radial basis function kernel
8
Related Work
Prediction Accuracy:
9
10
Basic Analysis &
Temporal Trends
StackOverFlow Activity
11
Approx. 45% of all programming languages in world are
discussed on StackOverflow!
Post Type
12
Top Ten Languages
13
Question Count
14
Answer Count
15
Answer Fraction
16
Question Fraction
17
Questioners/Answerers Distribution
18
19
Topic Analysis
Topic Analysis
20
21
Predicting time until
Answer
Approach
Following attributes selected for study:
1. Tag (Only top 10 programming languages)
2. Creation Month
3. Body Length
4. Tag Length
5. Introduced new Nominal class - Time_Answer
6. (less6, bet6and20, 20andmore)
22
Approach
Tools
Weka - Weka is a collection of machine learning
algorithms for data mining tasks.
Data Preprocessing:
Challenging!
23
Approach(Data Pre -processing)
1. Parse all the answers and link first answer’s creation
time to creation time of question. We called this
field delta-answer.
2. Remove all the Questions which had delta answer
negative or zero
3. We developed a Python script which develops .arff
file On the fly (Wish to contribute this file)
24
Evaluation
• Subset Size: 4490947 - Subset - 449000
• Classify response time into 3 types: less than 6
minutes, between 6 and 20 minutes, 20 minutes
and more.
• 10-fold cross-validation
• Results are obtained using different feature
combinations and different classifiers
25
Evaluation
Results of classifier J48 (all Attributes)
26
Evaluation
27
Results of classifier (body_length/ tag_length)
Summary
• We were successfully able to find interesting
temporal trends for major programming languages
• Using tag based topic analysis we were able to find
major discussion topics and to some extent the
difficult topics in a programming language
• Using machine learning techniques we were
successfully able to predict - time to answer with
good accuracy
28
Future Scope of Work
• Contribute the .arff on the fly generator script.
• Adding Parts of speech as an attribute
• Showcasing the results on a website
29
Questions?
30

More Related Content

What's hot

Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
Guido Schmutz
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Amazon Web Services
 
Automated hardware testing using python
Automated hardware testing using pythonAutomated hardware testing using python
Automated hardware testing using python
Yuvaraja Ravi
 
Lambda Expressions in Java
Lambda Expressions in JavaLambda Expressions in Java
Lambda Expressions in Java
Erhan Bagdemir
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
Whiteklay
 
第9回rest勉強会 ダウンロード・アップロード編
第9回rest勉強会 ダウンロード・アップロード編第9回rest勉強会 ダウンロード・アップロード編
第9回rest勉強会 ダウンロード・アップロード編
ksimoji
 
Class introduction in java
Class introduction in javaClass introduction in java
Class introduction in java
yugandhar vadlamudi
 
Multithreading in java
Multithreading in javaMultithreading in java
Multithreading in java
Monika Mishra
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
Aljoscha Krettek
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Flink Forward
 
Virtual machine and javascript engine
Virtual machine and javascript engineVirtual machine and javascript engine
Virtual machine and javascript engine
Duoyi Wu
 
Core java complete ppt(note)
Core java  complete  ppt(note)Core java  complete  ppt(note)
Core java complete ppt(note)
arvind pandey
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
Databricks
 
Java reflection
Java reflectionJava reflection
Java reflection
NexThoughts Technologies
 
Real Time UI with Apache Kafka Streaming Analytics of Fast Data and Server Push
Real Time UI with Apache Kafka Streaming Analytics of Fast Data and Server PushReal Time UI with Apache Kafka Streaming Analytics of Fast Data and Server Push
Real Time UI with Apache Kafka Streaming Analytics of Fast Data and Server Push
Lucas Jellema
 
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data FrameworkApache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
Wes McKinney
 
Gradle Introduction
Gradle IntroductionGradle Introduction
Gradle Introduction
Dmitry Buzdin
 
Eventually, Scylla Chooses Consistency
Eventually, Scylla Chooses ConsistencyEventually, Scylla Chooses Consistency
Eventually, Scylla Chooses Consistency
ScyllaDB
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
 

What's hot (20)

Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
Automated hardware testing using python
Automated hardware testing using pythonAutomated hardware testing using python
Automated hardware testing using python
 
Lambda Expressions in Java
Lambda Expressions in JavaLambda Expressions in Java
Lambda Expressions in Java
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
第9回rest勉強会 ダウンロード・アップロード編
第9回rest勉強会 ダウンロード・アップロード編第9回rest勉強会 ダウンロード・アップロード編
第9回rest勉強会 ダウンロード・アップロード編
 
Class introduction in java
Class introduction in javaClass introduction in java
Class introduction in java
 
Multithreading in java
Multithreading in javaMultithreading in java
Multithreading in java
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
 
Virtual machine and javascript engine
Virtual machine and javascript engineVirtual machine and javascript engine
Virtual machine and javascript engine
 
Core java complete ppt(note)
Core java  complete  ppt(note)Core java  complete  ppt(note)
Core java complete ppt(note)
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Java reflection
Java reflectionJava reflection
Java reflection
 
Real Time UI with Apache Kafka Streaming Analytics of Fast Data and Server Push
Real Time UI with Apache Kafka Streaming Analytics of Fast Data and Server PushReal Time UI with Apache Kafka Streaming Analytics of Fast Data and Server Push
Real Time UI with Apache Kafka Streaming Analytics of Fast Data and Server Push
 
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data FrameworkApache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
 
Gradle Introduction
Gradle IntroductionGradle Introduction
Gradle Introduction
 
Eventually, Scylla Chooses Consistency
Eventually, Scylla Chooses ConsistencyEventually, Scylla Chooses Consistency
Eventually, Scylla Chooses Consistency
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 

Viewers also liked

STACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSISSTACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSIS
Shrinivasaragav Balasubramanian
 
Understanding Stack Overflow
Understanding Stack OverflowUnderstanding Stack Overflow
Understanding Stack Overflow
Alexander Serebrenik
 
Stackoverflow Data Analysis-Homework3
Stackoverflow Data Analysis-Homework3Stackoverflow Data Analysis-Homework3
Stackoverflow Data Analysis-Homework3
Ayush Tak
 
StackOverflow Architectural Overview
StackOverflow Architectural OverviewStackOverflow Architectural Overview
StackOverflow Architectural Overview
Folio3 Software
 
Analyzing Stack Overflow - Problem
Analyzing Stack Overflow - ProblemAnalyzing Stack Overflow - Problem
Analyzing Stack Overflow - Problem
Amrith Krishna
 
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Ontico
 
Software Engineering and Social media
Software Engineering and Social mediaSoftware Engineering and Social media
Software Engineering and Social media
Jorge Melegati
 
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
Margaret-Anne Storey
 
Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag...
 Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag... Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag...
Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag...
Departamento de Informática Educativa UNAN-Managua
 
groovy & grails - lecture 13
groovy & grails - lecture 13groovy & grails - lecture 13
groovy & grails - lecture 13
Alexandre Masselot
 
Implementación Repositorio De Objetos De Aprendizajes Basado En
Implementación Repositorio De Objetos De Aprendizajes Basado EnImplementación Repositorio De Objetos De Aprendizajes Basado En
Implementación Repositorio De Objetos De Aprendizajes Basado En
f.cabrera1
 
Soluciones tecnológicas para REA
Soluciones tecnológicas para REASoluciones tecnológicas para REA
Soluciones tecnológicas para REA
Ricardo Corai
 
What is Node.js used for: The 2015 Node.js Overview Report
What is Node.js used for: The 2015 Node.js Overview ReportWhat is Node.js used for: The 2015 Node.js Overview Report
What is Node.js used for: The 2015 Node.js Overview Report
Gabor Nagy
 
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
Paola Amadeo
 
Responsive Design
Responsive DesignResponsive Design
Responsive Design
MRMtech
 
Stack_Overflow-Network_Graph
Stack_Overflow-Network_GraphStack_Overflow-Network_Graph
Stack_Overflow-Network_Graph
Yaopeng (Gyoho) Wu
 
Modern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureModern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & Structure
Raven Tools
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
Shubhra Kar
 
What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?
Matt Wood
 
Kaggle's WISE 2014 challenge
Kaggle's WISE 2014 challenge Kaggle's WISE 2014 challenge
Kaggle's WISE 2014 challenge
Eleftherios Spyromitros-Xioufis
 

Viewers also liked (20)

STACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSISSTACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSIS
 
Understanding Stack Overflow
Understanding Stack OverflowUnderstanding Stack Overflow
Understanding Stack Overflow
 
Stackoverflow Data Analysis-Homework3
Stackoverflow Data Analysis-Homework3Stackoverflow Data Analysis-Homework3
Stackoverflow Data Analysis-Homework3
 
StackOverflow Architectural Overview
StackOverflow Architectural OverviewStackOverflow Architectural Overview
StackOverflow Architectural Overview
 
Analyzing Stack Overflow - Problem
Analyzing Stack Overflow - ProblemAnalyzing Stack Overflow - Problem
Analyzing Stack Overflow - Problem
 
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
 
Software Engineering and Social media
Software Engineering and Social mediaSoftware Engineering and Social media
Software Engineering and Social media
 
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
 
Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag...
 Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag... Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag...
Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag...
 
groovy & grails - lecture 13
groovy & grails - lecture 13groovy & grails - lecture 13
groovy & grails - lecture 13
 
Implementación Repositorio De Objetos De Aprendizajes Basado En
Implementación Repositorio De Objetos De Aprendizajes Basado EnImplementación Repositorio De Objetos De Aprendizajes Basado En
Implementación Repositorio De Objetos De Aprendizajes Basado En
 
Soluciones tecnológicas para REA
Soluciones tecnológicas para REASoluciones tecnológicas para REA
Soluciones tecnológicas para REA
 
What is Node.js used for: The 2015 Node.js Overview Report
What is Node.js used for: The 2015 Node.js Overview ReportWhat is Node.js used for: The 2015 Node.js Overview Report
What is Node.js used for: The 2015 Node.js Overview Report
 
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
 
Responsive Design
Responsive DesignResponsive Design
Responsive Design
 
Stack_Overflow-Network_Graph
Stack_Overflow-Network_GraphStack_Overflow-Network_Graph
Stack_Overflow-Network_Graph
 
Modern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureModern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & Structure
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
 
What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?
 
Kaggle's WISE 2014 challenge
Kaggle's WISE 2014 challenge Kaggle's WISE 2014 challenge
Kaggle's WISE 2014 challenge
 

Similar to Stack Overflow slides Data Analytics

1. course introduction
1. course introduction1. course introduction
1. course introduction
Saeed Parsa
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
Tao Xie
 
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Preetha Chatterjee
 
Automatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow PostsAutomatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow Posts
Preetha Chatterjee
 
01.intro
01.intro01.intro
01.intro
Philip Johnson
 
Programming for Problem Solving
Programming for Problem SolvingProgramming for Problem Solving
Programming for Problem Solving
Kathirvel Ayyaswamy
 
Introduction.pptx
Introduction.pptxIntroduction.pptx
Introduction.pptx
Samar954063
 
Information-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataInformation-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic Data
Steffen Staab
 
Asms app planning
Asms app planningAsms app planning
Asms app planning
majapamaya
 
Devry CIS 355A Full Course Latest
Devry CIS 355A Full Course LatestDevry CIS 355A Full Course Latest
Devry CIS 355A Full Course Latest
Atifkhilji
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
openseesdays
 
Are High Level Programming Languages for Multicore and Safety Critical Conver...
Are High Level Programming Languages for Multicore and Safety Critical Conver...Are High Level Programming Languages for Multicore and Safety Critical Conver...
Are High Level Programming Languages for Multicore and Safety Critical Conver...
InfinIT - Innovationsnetværket for it
 
Lec 01 introduction
Lec 01   introductionLec 01   introduction
Lec 01 introduction
UmairMuzaffar9
 
Introduction to course
Introduction to courseIntroduction to course
Introduction to course
nikit meshram
 
6th sem
6th sem6th sem
6th sem
nastysuman009
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
alessio_ferrari
 
Seamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySeamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuity
Steffen Staab
 
A New Hiring Paradigm
A New Hiring ParadigmA New Hiring Paradigm
A New Hiring Paradigm
MaRS Discovery District
 
Texas.gov Presents: Battle of Programming Languages
Texas.gov Presents:  Battle of Programming LanguagesTexas.gov Presents:  Battle of Programming Languages
Texas.gov Presents: Battle of Programming Languages
Texas.gov
 
Visualising the world of competitive programming with Python (Codeforces)
Visualising the world of competitive programming with Python (Codeforces)Visualising the world of competitive programming with Python (Codeforces)
Visualising the world of competitive programming with Python (Codeforces)
Anuj Menta
 

Similar to Stack Overflow slides Data Analytics (20)

1. course introduction
1. course introduction1. course introduction
1. course introduction
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
 
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
 
Automatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow PostsAutomatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow Posts
 
01.intro
01.intro01.intro
01.intro
 
Programming for Problem Solving
Programming for Problem SolvingProgramming for Problem Solving
Programming for Problem Solving
 
Introduction.pptx
Introduction.pptxIntroduction.pptx
Introduction.pptx
 
Information-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataInformation-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic Data
 
Asms app planning
Asms app planningAsms app planning
Asms app planning
 
Devry CIS 355A Full Course Latest
Devry CIS 355A Full Course LatestDevry CIS 355A Full Course Latest
Devry CIS 355A Full Course Latest
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
 
Are High Level Programming Languages for Multicore and Safety Critical Conver...
Are High Level Programming Languages for Multicore and Safety Critical Conver...Are High Level Programming Languages for Multicore and Safety Critical Conver...
Are High Level Programming Languages for Multicore and Safety Critical Conver...
 
Lec 01 introduction
Lec 01   introductionLec 01   introduction
Lec 01 introduction
 
Introduction to course
Introduction to courseIntroduction to course
Introduction to course
 
6th sem
6th sem6th sem
6th sem
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
Seamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySeamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuity
 
A New Hiring Paradigm
A New Hiring ParadigmA New Hiring Paradigm
A New Hiring Paradigm
 
Texas.gov Presents: Battle of Programming Languages
Texas.gov Presents:  Battle of Programming LanguagesTexas.gov Presents:  Battle of Programming Languages
Texas.gov Presents: Battle of Programming Languages
 
Visualising the world of competitive programming with Python (Codeforces)
Visualising the world of competitive programming with Python (Codeforces)Visualising the world of competitive programming with Python (Codeforces)
Visualising the world of competitive programming with Python (Codeforces)
 

Recently uploaded

Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
gowrishankartb2005
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
shadow0702a
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
RamonNovais6
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
Gino153088
 
Data Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptxData Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptx
ramrag33
 
CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1
PKavitha10
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
KrishnaveniKrishnara1
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
AjmalKhan50578
 
People as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimalaPeople as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimala
riddhimaagrawal986
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
cnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classicationcnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classication
SakkaravarthiShanmug
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 

Recently uploaded (20)

Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
 
Data Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptxData Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptx
 
CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
 
People as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimalaPeople as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimala
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
 
cnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classicationcnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classication
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 

Stack Overflow slides Data Analytics

  • 1. What StackOverflow Tells Us About Programming Languages Rahul Thankachan Nada Aldarrab Prathmesh Gat University of Southern California 1
  • 3. Agenda 1. Introduction 2. Problem Statement 3. Temporal Based Trend Analysis 4. Topic Analysis 5. Predicting Time To Answer 6. Summary and Q&A 3
  • 5. Introduction • Dataset Used : Stackoverflow - Internet Archive. • Why? It is one of the largest developer focused open collaborative platform currently. • Through our study we intend to answers some interesting questions • Study the rise and fall of popular programming languages • Can be used to predict future enhancements • Study effectiveness of Stack Overflow model 5
  • 6. Problem Definition • Basic Analysis: What are the most popular programming languages? • What are the trends in programming languages? • What are the most popular topics discussed in a programming language? • Can we accurately predict the time it takes until a questioner gets an answer? 6
  • 7. Related Work Miltiadis Allamanis and Charles Sutton. 2013. Why, When and What: Analyzing Stack Overflow Questions by Topic, Type & Code In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 53- 56. • Topic modeling analysis • Used Latent Dirichlet Allocation (LDA) • Modeled Java Topics of Questions • Can evaluate the orthogonality of different languages • Stack Overflow questions are about the code and are not application domain specific 7
  • 8. Related Work V. Bhat, A. Gokhale, R. Jadhav, J. Pudipeddi, and L. Akoglu. Min (e) d your tags: Analysis of question response time in stackoverflow. In Proceedings of ASONAM 2014, pages 328–335. IEEE, 2014. • Two linear classifiers: logistic regression and SVM with linear kernel • Two non- linear classifiers: decision tree (DT) and SVM with radial basis function kernel 8
  • 11. StackOverFlow Activity 11 Approx. 45% of all programming languages in world are discussed on StackOverflow!
  • 22. Approach Following attributes selected for study: 1. Tag (Only top 10 programming languages) 2. Creation Month 3. Body Length 4. Tag Length 5. Introduced new Nominal class - Time_Answer 6. (less6, bet6and20, 20andmore) 22
  • 23. Approach Tools Weka - Weka is a collection of machine learning algorithms for data mining tasks. Data Preprocessing: Challenging! 23
  • 24. Approach(Data Pre -processing) 1. Parse all the answers and link first answer’s creation time to creation time of question. We called this field delta-answer. 2. Remove all the Questions which had delta answer negative or zero 3. We developed a Python script which develops .arff file On the fly (Wish to contribute this file) 24
  • 25. Evaluation • Subset Size: 4490947 - Subset - 449000 • Classify response time into 3 types: less than 6 minutes, between 6 and 20 minutes, 20 minutes and more. • 10-fold cross-validation • Results are obtained using different feature combinations and different classifiers 25
  • 26. Evaluation Results of classifier J48 (all Attributes) 26
  • 27. Evaluation 27 Results of classifier (body_length/ tag_length)
  • 28. Summary • We were successfully able to find interesting temporal trends for major programming languages • Using tag based topic analysis we were able to find major discussion topics and to some extent the difficult topics in a programming language • Using machine learning techniques we were successfully able to predict - time to answer with good accuracy 28
  • 29. Future Scope of Work • Contribute the .arff on the fly generator script. • Adding Parts of speech as an attribute • Showcasing the results on a website 29