SlideShare a Scribd company logo
1 of 14
Download to read offline
Towards Correlating Search on Google and Asking
on Stack Overflow
Chunyang Chen, Zhenchang Xing
๏ฝ Stack Overflow
๏ฝ 12m questions, 19m answers, 5.6m users;
๏ฝ A huge treasure with knowledge covering most parts in
Software Engineering;
๏ฝ Many research works had been carried out to mine Stack
Overflow
Background
๏ฝ Can Stack Overflow represent the interest of developers
around the world?
๏ฝ Many research works had been carried out to mining
Stack Overflow, but no one considered this question.
๏ฝ We expect to demonstrate the representativeness of
Stack Overflow so that to lay the solid foundation for
further mining in Stack Overflow.
Motivation
๏ฝ Are there relationships between the queries developers
use in search engines and the questions developers ask in
Q&A sites?
๏ฝ Google is the dominating search engine and most
developers use Google
๏ฝ Its search log can definitely indicates developersโ€™ interest
๏ฝ Google Trends provides the statistics of a search item that
people use as query to search Google
๏ฝ Once we can demonstrate that correlation exists
between Google Trends and Stack Overflow, Stack
Overflow is representative.
Goal
๏ฝ Select technical terms (i.e., tags in Stack Overflow):
๏ฝ E.g., c#, visual-studio, ios, jquery, android-layout (no ambiguiation)
๏ฝ Collect frequent search and asking technical terms
๏ฝ i.e., collect frequent co-occurring keywords of terms mentioned
above
๏ฝ Build search and asking trends for technical terms
๏ฝ Collect time-series data of term usage frequency both in Stack
Overflow (300 weeks) and Google (574 weeks) and normalize
them to 0~1 for comparison.
Dataset
๏ฝ Given 185 technical terms, 65% of their co-occurring
terms in Google and Stack Overflow overlap.
TECHNICAL TERMS SEARCHED AND ASKED
Google Stack Overflow
๏ฝ There are six trend patterns
๏ฝ Distribution of term number
in different trend patterns
Patterns of search and asking trends
Similar number
in each pattern
๏ฝ We adopt cross correlation to compute the Pearson
correlation coefficient and delay between two trends for
each term.
Alignment of search and asking trends
๏ฝ The relationship between correlation and beginning week
of the technology:
Alignment of search and asking trends
The newer technology,
the higher correlation
between two sources.
At the launch of
SO, no obvious
correlation
Alignment of search and asking trends
๏ฝ We also explore the delay between trends.
The newer technology,
the shorter delay
between two sources.
At the launch of
SO, it is always
behind Google
๏ฝ Some general terms also have different specific version
number
๏ฝ E.g., visual-studio: vs2008, vs2010, vs2012
python: python-2.7, python-3.x
๏ฝ High correlation for specific terms (vs2008, python-2.7), low
correlation for general terms (visual-studio)
Trends between general & specific technical terms
๏ฝ Replacement process
๏ฝ New technology is replacing old ones;
๏ฝ High correlation also exists between aggregated asking and
search trend (e.g., asp.net-mvc).
Trends between general & specific technical terms
Google Stack Overflow
๏ฝ There is correlation between asking in Stack Overflow
and searching in Google especially those new
technologies.
๏ฝ The results ensure the representativeness of Stack
Overflow as a software repository for further research.
Conclusion
Chen, Chunyang, and Zhenchang Xing. "Towards correlating search on google and
asking on stack overflow." In 2016 IEEE 40th Annual Computer Software and Applications
Conference (COMPSAC), vol. 1, pp. 83-92. IEEE, 2016.
Thanks for listening
Chen Chunyang

More Related Content

Similar to Towards Correlating Search on Google and Asking on Stack Overflow

Techniques For Deep Query Understanding
Techniques For Deep Query UnderstandingTechniques For Deep Query Understanding
Techniques For Deep Query Understanding
Abhay Prakash
ย 
Rรฉpondre ร  la question automatique avec le web
Rรฉpondre ร  la question automatique avec le webRรฉpondre ร  la question automatique avec le web
Rรฉpondre ร  la question automatique avec le web
Ahmed Hammami
ย 

Similar to Towards Correlating Search on Google and Asking on Stack Overflow (20)

Search Analytics for Fun and Profit
Search Analytics for Fun and ProfitSearch Analytics for Fun and Profit
Search Analytics for Fun and Profit
ย 
Interleaving, Evaluation to Self-learning Search @904Labs
Interleaving, Evaluation to Self-learning Search @904LabsInterleaving, Evaluation to Self-learning Search @904Labs
Interleaving, Evaluation to Self-learning Search @904Labs
ย 
User friendly pattern search paradigm
User friendly pattern search paradigmUser friendly pattern search paradigm
User friendly pattern search paradigm
ย 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
ย 
SLA Summer 2008
SLA Summer 2008SLA Summer 2008
SLA Summer 2008
ย 
Social Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIASocial Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIA
ย 
Using Search Analytics to Diagnose Whatโ€™s Ailing your Information Architecture
Using Search Analytics to Diagnose Whatโ€™s Ailing your Information ArchitectureUsing Search Analytics to Diagnose Whatโ€™s Ailing your Information Architecture
Using Search Analytics to Diagnose Whatโ€™s Ailing your Information Architecture
ย 
Intelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringIntelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software Engineering
ย 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use Cases
ย 
Techniques For Deep Query Understanding
Techniques For Deep Query UnderstandingTechniques For Deep Query Understanding
Techniques For Deep Query Understanding
ย 
Systematic Literature Reviews and Systematic Mapping Studies
Systematic Literature Reviews and Systematic Mapping StudiesSystematic Literature Reviews and Systematic Mapping Studies
Systematic Literature Reviews and Systematic Mapping Studies
ย 
Web 3.0 Emerging
Web 3.0 EmergingWeb 3.0 Emerging
Web 3.0 Emerging
ย 
Search Analytics: Diagnosing what ails your site
Search Analytics:  Diagnosing what ails your siteSearch Analytics:  Diagnosing what ails your site
Search Analytics: Diagnosing what ails your site
ย 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
ย 
IRJET- Semantic Question Matching
IRJET- Semantic Question MatchingIRJET- Semantic Question Matching
IRJET- Semantic Question Matching
ย 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosas
ย 
Faceted search using Solr and Ontopia
Faceted search using Solr and OntopiaFaceted search using Solr and Ontopia
Faceted search using Solr and Ontopia
ย 
professional fuzzy type-ahead rummage around in xml type-ahead search techni...
professional fuzzy type-ahead rummage around in xml  type-ahead search techni...professional fuzzy type-ahead rummage around in xml  type-ahead search techni...
professional fuzzy type-ahead rummage around in xml type-ahead search techni...
ย 
Rรฉpondre ร  la question automatique avec le web
Rรฉpondre ร  la question automatique avec le webRรฉpondre ร  la question automatique avec le web
Rรฉpondre ร  la question automatique avec le web
ย 
Working in NLP in the Age of Large Language Models
Working in NLP in the Age of Large Language ModelsWorking in NLP in the Age of Large Language Models
Working in NLP in the Age of Large Language Models
ย 

Recently uploaded

In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
ย 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
ย 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
ย 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
kumargunjan9515
ย 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
ย 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
ย 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
ย 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludรคscher
ย 
Sonagachi * best call girls in Kolkata | โ‚น,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | โ‚น,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | โ‚น,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | โ‚น,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
ย 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
HyderabadDolls
ย 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
ย 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
ย 

Recently uploaded (20)

In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ย 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
ย 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
ย 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
ย 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
ย 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
ย 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
ย 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
ย 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
ย 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ย 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
ย 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
ย 
Sonagachi * best call girls in Kolkata | โ‚น,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | โ‚น,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | โ‚น,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | โ‚น,9500 Pay Cash 8005736733 Free Home...
ย 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
ย 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
ย 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
ย 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
ย 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
ย 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
ย 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
ย 

Towards Correlating Search on Google and Asking on Stack Overflow

  • 1. Towards Correlating Search on Google and Asking on Stack Overflow Chunyang Chen, Zhenchang Xing
  • 2. ๏ฝ Stack Overflow ๏ฝ 12m questions, 19m answers, 5.6m users; ๏ฝ A huge treasure with knowledge covering most parts in Software Engineering; ๏ฝ Many research works had been carried out to mine Stack Overflow Background
  • 3. ๏ฝ Can Stack Overflow represent the interest of developers around the world? ๏ฝ Many research works had been carried out to mining Stack Overflow, but no one considered this question. ๏ฝ We expect to demonstrate the representativeness of Stack Overflow so that to lay the solid foundation for further mining in Stack Overflow. Motivation
  • 4. ๏ฝ Are there relationships between the queries developers use in search engines and the questions developers ask in Q&A sites? ๏ฝ Google is the dominating search engine and most developers use Google ๏ฝ Its search log can definitely indicates developersโ€™ interest ๏ฝ Google Trends provides the statistics of a search item that people use as query to search Google ๏ฝ Once we can demonstrate that correlation exists between Google Trends and Stack Overflow, Stack Overflow is representative. Goal
  • 5. ๏ฝ Select technical terms (i.e., tags in Stack Overflow): ๏ฝ E.g., c#, visual-studio, ios, jquery, android-layout (no ambiguiation) ๏ฝ Collect frequent search and asking technical terms ๏ฝ i.e., collect frequent co-occurring keywords of terms mentioned above ๏ฝ Build search and asking trends for technical terms ๏ฝ Collect time-series data of term usage frequency both in Stack Overflow (300 weeks) and Google (574 weeks) and normalize them to 0~1 for comparison. Dataset
  • 6. ๏ฝ Given 185 technical terms, 65% of their co-occurring terms in Google and Stack Overflow overlap. TECHNICAL TERMS SEARCHED AND ASKED Google Stack Overflow
  • 7. ๏ฝ There are six trend patterns ๏ฝ Distribution of term number in different trend patterns Patterns of search and asking trends Similar number in each pattern
  • 8. ๏ฝ We adopt cross correlation to compute the Pearson correlation coefficient and delay between two trends for each term. Alignment of search and asking trends
  • 9. ๏ฝ The relationship between correlation and beginning week of the technology: Alignment of search and asking trends The newer technology, the higher correlation between two sources. At the launch of SO, no obvious correlation
  • 10. Alignment of search and asking trends ๏ฝ We also explore the delay between trends. The newer technology, the shorter delay between two sources. At the launch of SO, it is always behind Google
  • 11. ๏ฝ Some general terms also have different specific version number ๏ฝ E.g., visual-studio: vs2008, vs2010, vs2012 python: python-2.7, python-3.x ๏ฝ High correlation for specific terms (vs2008, python-2.7), low correlation for general terms (visual-studio) Trends between general & specific technical terms
  • 12. ๏ฝ Replacement process ๏ฝ New technology is replacing old ones; ๏ฝ High correlation also exists between aggregated asking and search trend (e.g., asp.net-mvc). Trends between general & specific technical terms Google Stack Overflow
  • 13. ๏ฝ There is correlation between asking in Stack Overflow and searching in Google especially those new technologies. ๏ฝ The results ensure the representativeness of Stack Overflow as a software repository for further research. Conclusion Chen, Chunyang, and Zhenchang Xing. "Towards correlating search on google and asking on stack overflow." In 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 83-92. IEEE, 2016.