EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites
Shaowei Wang, David Lo, Bogdan Vasilescu, Alexander Serebrenik
@b_vasilescu @aserebrenik
/ department of mathematics and computer science 25-9-2014 Page 2
??? 
EnTagRec 
        EnTagRec   TagCombine (Xia et al., MSR’13)
r@5     0.805      0.595
p@5     0.346      0.221

r@5     0.815      0.568
p@5     0.358      0.251

r@5     0.88       0.675
p@5     0.369      0.278

r@5     0.64       0.639
p@5     0.382      0.381
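
The r@5 and p@5 figures are averages over all software objects in the test set. A minimal Python sketch of how such averages could be computed, following the definitions in the editor's notes; the function names and the toy data are illustrative assumptions, not taken from the talk.

```python
def recall_at_k(ranked, correct, k=5):
    """|top-k predicted ∩ correct| / |correct| for one software object."""
    top_k = set(ranked[:k])
    return len(top_k & set(correct)) / len(correct)


def precision_at_k(ranked, correct, k=5):
    """|top-k predicted ∩ correct| / |top-k predicted| for one software object."""
    top_k = set(ranked[:k])
    return len(top_k & set(correct)) / len(top_k)


def average(metric, predictions, truths, k=5):
    """Average a per-object metric over all objects in the test set."""
    scores = [metric(p, t, k) for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores)


# Toy test set (hypothetical data): ranked tag predictions and true tags.
predictions = [
    ["java", "daemon", "shell", "linux", "bash", "python"],
    ["python", "ide", "eclipse", "java", "project"],
]
truths = [["java", "shell"], ["python", "linux"]]

r5 = average(recall_at_k, predictions, truths)
p5 = average(precision_at_k, predictions, truths)
```

Because at most five tags are recommended while an object often has fewer correct tags, p@5 is bounded well below 1, which is consistent with the gap between the r@5 and p@5 rows above.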
EnTagRec: How have we done it? 
L-LDA [Ramage et al. 2009]
tokenization, identifier splitting, stop words, stemming
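
The four preprocessing steps can be sketched in a few lines of Python. This is a rough illustration only: the stop-word list and the suffix-stripping `crude_stem` are assumptions standing in for whatever stop-word list and stemmer were actually used.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"i", "have", "which", "to", "for", "a", "the", "is", "it"}


def split_identifier(token):
    """Split camelCase and snake_case identifiers into their parts."""
    parts = []
    for chunk in token.split("_"):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return parts


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ation", "ing", "es", "s", "e"):
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words such as "pass" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, split identifiers, lowercase, drop stop words, then stem."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)
    words = [part.lower() for tok in tokens for part in split_identifier(tok)]
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


terms = preprocess("I have Java daemon which I want to pass shell commands.")
```

On the example question used in the following slides, this yields `java daemon want pass shell command`.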
I have Java daemon which I want to pass shell commands. For example…
P(tag)?
Actually (preprocessing…)
P(tag)?
Java daemon want pass shell command exampl daemon load configur possibl
Tags = nouns (phrases)
P(tag)?
Java daemon want pass shell command exampl daemon load configur possibl
P(tag | Java)
P(tag | daemon)
P(tag | shell)
…
Estimate from the training data
Combine to get P for the entire text
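
A minimal sketch of this idea in Python, under stated assumptions: P(tag | word) is estimated as the fraction of training objects containing the word that carry the tag, and the per-word estimates are combined by simple averaging. The slides do not say how the combination is done, so the averaging, the function names and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_tag_given_word(training):
    """Estimate P(tag | word) as the fraction of training objects containing
    the word that are labelled with the tag."""
    word_count = Counter()
    pair_count = defaultdict(Counter)
    for words, tags in training:
        for word in set(words):
            word_count[word] += 1
            for tag in tags:
                pair_count[word][tag] += 1
    return {
        word: {tag: n / word_count[word] for tag, n in tag_counts.items()}
        for word, tag_counts in pair_count.items()
    }


def score_text(words, p_tag_given_word):
    """Combine per-word estimates into one score per tag for the whole text.
    Averaging over the known words is an assumption made for illustration."""
    known = [w for w in words if w in p_tag_given_word]
    scores = Counter()
    for word in known:
        for tag, p in p_tag_given_word[word].items():
            scores[tag] += p / len(known)
    return dict(scores)


# Hypothetical training data: (preprocessed words, tags) pairs.
training = [
    (["java", "daemon", "shell"], {"java", "linux"}),
    (["java", "load", "configur"], {"java"}),
    (["shell", "command"], {"bash", "linux"}),
]
model = estimate_tag_given_word(training)
scores = score_text(["java", "daemon", "shell"], model)
```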
Java daemon want pass shell command exampl daemon load configur possibl
P(tag | …)  P(tag | …)  P(tag | …)
Supercalifragilisticexpialidocious 
0.85 
EnTagRec: How have we done it? 
α*BIC + β*FIC 
Train α and β
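
The weighted combination and the training of the weights can be illustrated as follows. This is a sketch under assumptions: the grid search over [0, 1] and all names are hypothetical; the slides only state that α and β are trained, and the editor's notes say recall@5 is used as the criterion.

```python
def combine(bic, fic, alpha, beta):
    """Score each tag as alpha * BIC + beta * FIC; a missing tag counts as 0."""
    tags = set(bic) | set(fic)
    return {t: alpha * bic.get(t, 0.0) + beta * fic.get(t, 0.0) for t in tags}


def top_k(scores, k=5):
    """Return the k highest-scoring tags (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def recall_at_k(predicted, correct):
    """|predicted ∩ correct| / |correct| for one software object."""
    return len(set(predicted) & set(correct)) / len(correct)


def train_weights(objects, k=5, steps=10):
    """Grid-search alpha and beta in [0, 1] to maximise average recall@k.
    The grid search itself is an assumption made for this sketch."""
    best = (0.0, 0.0, -1.0)
    for i in range(steps + 1):
        for j in range(steps + 1):
            alpha, beta = i / steps, j / steps
            recall = sum(
                recall_at_k(top_k(combine(b, f, alpha, beta), k), truth)
                for b, f, truth in objects
            ) / len(objects)
            if recall > best[2]:
                best = (alpha, beta, recall)
    return best


# Hypothetical training objects: (BIC scores, FIC scores, correct tags).
objects = [
    ({"java": 0.9, "python": 0.2}, {"java": 0.1, "python": 0.3}, {"java"}),
    ({"linux": 0.4, "bash": 0.5}, {"linux": 0.9, "bash": 0.1}, {"linux"}),
]
alpha, beta, recall = train_weights(objects, k=1)
```

In the toy data neither component alone ranks the correct tag first for both objects, but a combination with both weights positive does, which is the intuition behind composing BIC and FIC.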
EnTagRec
• is better than BIC and FIC separately

        EnTagRec   BIC     FIC
r@5     0.805      0.565   0.593
p@5     0.346      0.232   0.258

r@5     0.815      0.505   0.637
p@5     0.358      0.212   0.282

r@5     0.88       0.523   0.713
p@5     0.369      0.212   0.298

r@5     0.64       0.391   0.545
p@5     0.382      0.230   0.322
EnTagRec 
• is better than BIC and FIC separately 
• is better than TagCombine 
So, what was all this about?

Editor's Notes

  1. Missing tags.
  2. As tagging is inherently a distributed and uncoordinated process, similar objects are often tagged differently [40]. Idiosyncrasy reduces the usefulness of tags, since related objects are not linked together by a common tag and relevant information becomes more difficult to retrieve. To illustrate the importance of tags for the proper functioning of a software information site, note the more than 4000 questions related to tags on Meta Stack Overflow, as opposed to, say, only about 1000 related to the user interface.
  3. Some synonyms are really strange.
  4. Stress that we have taken samples of these websites. For instance, for StackOverflow we have taken 47K questions with 437 tags… FreeCode: in addition to textual tags, there is further information such as the general application domain (e.g., Software Development for Apache Ant) and a more specific category within this domain (BuildTools).
  5. Precision and recall are *averages*. Recall: the average of |Top5Predicted ∩ Correct| / |Correct| over all software objects in the test set. Precision, similarly: the average of |Top5Predicted ∩ Correct| / |Top5Predicted|.
  6. BIC and FIC return a set of tags with associated probabilities, e.g., B_o(t) is the probability that object o will be tagged t according to the Bayesian approach. K_Bayesian and K_Frequentist are the numbers of tags returned by each of the components (default = 70). The Composer Component will return K tags (as in Recall@K, i.e., 5).
  7. D. Ramage, D. Hall, R. Nallapati, and C. D. Manning. Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. In EMNLP ’09, pages 248–256, 2009. EMNLP = Empirical Methods in Natural Language Processing.
  8. The tag [python] stresses that tags do not necessarily come literally from the text. This is where the weights come from. We infer no more than K_Frequentist tags; indeed, StackOverflow has more than 40,000 tags.
  9. Spreading activation network. Each node in the network corresponds to a tag (all tags considered for the training set), and each edge connecting two nodes corresponds to the relationship between the corresponding tags. The weight of a node is initially zero. The weight of an edge is the ratio of the number of objects in the training set tagged with both tags to the number of objects tagged with at least one of them. The set of starting tags consists of the tags inferred by the approach presented so far, with weights as calculated above. So in this example we assume that six tags are considered in general (Java, Linux, Eclipse, Python, IDE, Project), but for a given software object the previous approach has inferred two tags: Java with probability 0.5 and Python with probability 0.6. The max hop in this example is 2, which is why we do not update the weight of Project in the first activation.
  10. BIC and FIC return a set of tags with associated probabilities, e.g., B_o(t) is the probability that object o will be tagged t according to the Bayesian approach. Alpha and beta can be tuned for any evaluation criterion EC; we use recall@5, as used in the literature. Reminder: recall@5 is the *average* of |Top5Predicted ∩ Correct| / |Correct| over all software objects in the test set.
  11. These values are for the optimal values of K: K_Bayesian = K_Frequentist = 70. With both K's equal to 10, the recall values are lower but still better than those of BIC/FIC for the StackExchange datasets (recall between 70% and 80%). The r@5 and p@5 values in this table are averages, but we have also conducted a more thorough study using Dunnett-type contrasts, showing that EnTagRec indeed outperforms BIC and FIC, and that EnTagRec outperforms TagCombine on the StackExchange datasets.
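
The spreading activation network described in the notes above can be sketched in Python. The edge weights follow the ratio given in the notes; the update rule (a neighbour receives source weight times edge weight and keeps the largest value seen) and the toy training data are assumptions made for illustration, since the notes do not spell out the rule.

```python
from itertools import combinations


def edge_weights(training_tag_sets):
    """Edge weight = #objects tagged with both tags / #objects tagged with
    at least one of them."""
    tag_objects = {}
    for idx, tags in enumerate(training_tag_sets):
        for tag in tags:
            tag_objects.setdefault(tag, set()).add(idx)
    weights = {}
    for a, b in combinations(sorted(tag_objects), 2):
        both = len(tag_objects[a] & tag_objects[b])
        if both:
            either = len(tag_objects[a] | tag_objects[b])
            weights[(a, b)] = weights[(b, a)] = both / either
    return weights


def spread(start, weights, max_hop=2):
    """Propagate weights from the starting tags to tags at most max_hop edges
    away. Assumed rule: a neighbour receives source weight * edge weight and
    keeps the largest value it has seen."""
    node = dict(start)
    frontier = dict(start)
    for _ in range(max_hop):
        reached = {}
        for (a, b), w in weights.items():
            if a in frontier:
                candidate = frontier[a] * w
                if candidate > node.get(b, 0.0):
                    node[b] = candidate
                    reached[b] = candidate
        frontier = reached
    return node


# Hypothetical training tag sets over the six tags from the example.
training = [
    {"java", "eclipse"},
    {"java", "linux"},
    {"python", "ide"},
    {"eclipse", "ide"},
    {"ide", "project"},
]
weights = edge_weights(training)
start = {"java": 0.5, "python": 0.6}
one_hop = spread(start, weights, max_hop=1)
two_hops = spread(start, weights, max_hop=2)
```

With this toy data, Project is two edges away from Python, so it receives a weight only when the second hop is performed, mirroring the remark in the notes that Project is not updated in the first activation.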