SlideShare a Scribd company logo
1 of 16
Download to read offline
MINING ANALOGICAL LIBRARIES IN Q&A DISCUSSIONS
— INCORPORATING RELATIONAL & CATEGORICAL KNOWLEDGE INTO WORD EMBEDDING
CHUNYANG CHEN, SA GAO, ZHENCHANG XING
School of Computer, Nanyang Technological University, Singapore
Analogical Technique Questions
? is to A as b is to B?
• e.g., What are Java’s equivalent libraries to Python’s NLTK ?
 When do developers ask it?
• They are not satisfied with current libraries;
• They migrate from one language to another.
Analogical Technique Questions
Out of date & subjective
When we ask in Google:
Analogical Technique Questions
When we ask it in Q&A site in Stack Overflow
http://stackoverflow.com/questions/15478263/java-equivalent-for-python-nltk
o Not allowed in Stack Overflow as it is too subjective.
Can we answer such questions automatically and objectively?
A Key Observation
Analogical libraries often appear in similar context.
…… ……
Continuous Skip-Gram Model
word embedding of t3
𝑡1 𝑡5𝑡4𝑡2
𝑡3
Learn word vector representations that are good at predicting
the nearby words i.e., word to vector (w2v).
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in
neural information processing systems. 2013.
Relational Similarity
Word embedding encodes relation between pairs of words:
• E.g., Queen – King ~ Woman – Man (analogy reasoning)
Solve analogy question “? is to A as b is to B” by:
Apply the model to our task?
Original word embedding is designed for general text;
• Tag sentences are too short ! (<= 5);
• No linguistics in tag sentence !
With solely original word embedding:
• Too many false positives, so we need to customize it !
Association rule
mining
POS & Phrase
chunking &
Rule matching
Tag wiki
Tag
sentences
Vector space
Continuous skip-
gram model
Relational
Knowledge
Categorical
Knowledge
Analogical-library
knowledge base
Extract analogical
libraries
Overall Method
Not only semantic
knowledge learned by word
embedding, but also
incorporate:
Relational Knowledge;
Categorical Knowledge.
Mining Relational Knowledge
Get tag relations by association rule mining:
• E.g., java  opennlp, python  nltk
Link rules  a technology associative network. Part of the graph
Chen, Chunyang, and Zhenchang Xing. "Mining technology landscape from stack overflow." In Proceedings of the
10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, p. 14. ACM, 2016.
Categorical Knowledge
Tags in Stack Overflow belong to different categories:
• Java  programming language;
• Eclipse  IDE;
• Binary-search  algorithm;
• Caching  mechanism;
• D3.js  library.
 Original word embedding doesn’t know that we need library tags in
this work !
Mining Categorical Knowledge
Community TagWiki info
• First sentence
http://stackoverflow.com/tags/ios/info
Get the category of the tag
• Part-of-Speech tagging
and phrase chunking
• First noun phrase after
“be” is category label
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1 2 3 4 5
Precision@k
k
w2v+rc_kg w2v+c_kg w2v+r_kg w2v
Dataset
o 7 years Stack Overflow data (07/2008 - 08/2015)
o 9.97 million questions, 36,197 tags with TagWiki and
7,783 library tags
Evaluation
Evaluate three components in the approach:
o The accuracy of tag categorization
• 84% are correctly categorized
o The semantic distance of tag correlations
• 88% appear in Google Trends
o The recommendation of analogical libraries
• Precision@1 = 0.81, Precision@5 = 0.67
Our Website & Visiting Statistic
https://graphofknowledge.appspot.com/similarTech
THANKS A LOT FOR THE LISTENING
• Chen, Chunyang, Sa Gao, and Zhenchang Xing. "Mining analogical libraries in q&a
discussions--incorporating relational and categorical knowledge into word embedding."
In 2016 IEEE 23rd international conference on software analysis, evolution, and
reengineering (SANER), vol. 1, pp. 338-348. IEEE, 2016.
• Chen, Chunyang, and Zhenchang Xing. "Similartech: automatically recommend analogical
libraries across different programming languages." In 2016 31st IEEE/ACM International
Conference on Automated Software Engineering (ASE), pp. 834-839. IEEE, 2016.
• Chen, Chunyang, Zhenchang Xing, and Yang Liu. "What’s spain’s paris? mining analogical
libraries from q&a discussions." Empirical Software Engineering 24, no. 3 (2019): 1155-
1194.

More Related Content

What's hot

On the Use of Static Analysis to Safeguard Recursive Dependency Resolution
On the Use of Static Analysis to Safeguard Recursive Dependency ResolutionOn the Use of Static Analysis to Safeguard Recursive Dependency Resolution
On the Use of Static Analysis to Safeguard Recursive Dependency ResolutionKamil Jezek
 
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...Preetha Chatterjee
 
Automatic Assessment of Programming Assignments
Automatic Assessment of Programming AssignmentsAutomatic Assessment of Programming Assignments
Automatic Assessment of Programming AssignmentsPetri Ihantola
 
Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 1)
Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 1)Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 1)
Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 1)Binary Studio
 
ANTLR4 and its testing
ANTLR4 and its testingANTLR4 and its testing
ANTLR4 and its testingKnoldus Inc.
 
Mining Software Repositories
Mining Software RepositoriesMining Software Repositories
Mining Software RepositoriesIsrael Herraiz
 
Mining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software ArtifactsMining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software ArtifactsPreetha Chatterjee
 
Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Alexey Diyan
 
Getting Started with SPARK
Getting Started with SPARKGetting Started with SPARK
Getting Started with SPARKAdaCore
 
Extracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsExtracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsPreetha Chatterjee
 
Not Everything Is An Object
Not Everything Is An ObjectNot Everything Is An Object
Not Everything Is An ObjectGary Short
 
Seamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySeamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySteffen Staab
 
[Impl] neural machine translation
[Impl] neural machine translation[Impl] neural machine translation
[Impl] neural machine translationJaeHo Jang
 
DRONE: A Tool to Detect and Repair Directive Defects in Java APIs Documentation
DRONE: A Tool to Detect and Repair Directive Defects in Java APIs DocumentationDRONE: A Tool to Detect and Repair Directive Defects in Java APIs Documentation
DRONE: A Tool to Detect and Repair Directive Defects in Java APIs DocumentationSebastiano Panichella
 
Diving into Functional Programming
Diving into Functional ProgrammingDiving into Functional Programming
Diving into Functional ProgrammingLev Walkin
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesTao Xie
 

What's hot (20)

ANTLR4 in depth
ANTLR4 in depthANTLR4 in depth
ANTLR4 in depth
 
On the Use of Static Analysis to Safeguard Recursive Dependency Resolution
On the Use of Static Analysis to Safeguard Recursive Dependency ResolutionOn the Use of Static Analysis to Safeguard Recursive Dependency Resolution
On the Use of Static Analysis to Safeguard Recursive Dependency Resolution
 
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
 
Automatic Assessment of Programming Assignments
Automatic Assessment of Programming AssignmentsAutomatic Assessment of Programming Assignments
Automatic Assessment of Programming Assignments
 
Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 1)
Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 1)Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 1)
Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 1)
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
 
Java Code Quality Tools
Java Code Quality ToolsJava Code Quality Tools
Java Code Quality Tools
 
ANTLR4 and its testing
ANTLR4 and its testingANTLR4 and its testing
ANTLR4 and its testing
 
Mining Software Repositories
Mining Software RepositoriesMining Software Repositories
Mining Software Repositories
 
Mining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software ArtifactsMining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software Artifacts
 
Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...
 
Getting Started with SPARK
Getting Started with SPARKGetting Started with SPARK
Getting Started with SPARK
 
Extracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsExtracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related Chats
 
Haskell Tour (Part 1)
Haskell Tour (Part 1)Haskell Tour (Part 1)
Haskell Tour (Part 1)
 
Not Everything Is An Object
Not Everything Is An ObjectNot Everything Is An Object
Not Everything Is An Object
 
Seamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySeamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuity
 
[Impl] neural machine translation
[Impl] neural machine translation[Impl] neural machine translation
[Impl] neural machine translation
 
DRONE: A Tool to Detect and Repair Directive Defects in Java APIs Documentation
DRONE: A Tool to Detect and Repair Directive Defects in Java APIs DocumentationDRONE: A Tool to Detect and Repair Directive Defects in Java APIs Documentation
DRONE: A Tool to Detect and Repair Directive Defects in Java APIs Documentation
 
Diving into Functional Programming
Diving into Functional ProgrammingDiving into Functional Programming
Diving into Functional Programming
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
 

Similar to What's Spain's Paris? Mining Analogical Libraries from Q&A Discussions

Graph Databases and Web Frameworks (NodeJS, AngularJS, GridFS, OpenLink Virtu...
Graph Databases and Web Frameworks (NodeJS, AngularJS, GridFS, OpenLink Virtu...Graph Databases and Web Frameworks (NodeJS, AngularJS, GridFS, OpenLink Virtu...
Graph Databases and Web Frameworks (NodeJS, AngularJS, GridFS, OpenLink Virtu...João Rocha da Silva
 
Ontologies and semantic web
Ontologies and semantic webOntologies and semantic web
Ontologies and semantic webStanley Wang
 
Natural Language Interface to Knowledge Graph
Natural Language Interface to Knowledge GraphNatural Language Interface to Knowledge Graph
Natural Language Interface to Knowledge GraphVaticle
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowKaxil Naik
 
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" EcosystemsPyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" EcosystemsUwe Korn
 
Java 8 selected updates
Java 8 selected updatesJava 8 selected updates
Java 8 selected updatesVinay H G
 
Projects Valhalla and Loom at IT Tage 2021
Projects Valhalla and Loom at IT Tage 2021Projects Valhalla and Loom at IT Tage 2021
Projects Valhalla and Loom at IT Tage 2021Vadym Kazulkin
 
Expressive Query Answering For Semantic Wikis (20min)
Expressive Query Answering For  Semantic Wikis (20min)Expressive Query Answering For  Semantic Wikis (20min)
Expressive Query Answering For Semantic Wikis (20min)Jie Bao
 
Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13Brian Ulicny
 
LLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureAggregage
 
Building a Semantic search Engine in a library
Building a Semantic search Engine in a libraryBuilding a Semantic search Engine in a library
Building a Semantic search Engine in a librarySEECS NUST
 
Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...Takeshi Morita
 
A Portable Approach for Bidirectional Integration between a Logic and a Stati...
A Portable Approach for Bidirectional Integration between a Logic and a Stati...A Portable Approach for Bidirectional Integration between a Logic and a Stati...
A Portable Approach for Bidirectional Integration between a Logic and a Stati...Sergio Castro
 
Reactive Card Magic: Understanding Spring WebFlux and Project Reactor
Reactive Card Magic: Understanding Spring WebFlux and Project ReactorReactive Card Magic: Understanding Spring WebFlux and Project Reactor
Reactive Card Magic: Understanding Spring WebFlux and Project ReactorVMware Tanzu
 
Introduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John MulhallIntroduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John MulhallJohn Mulhall
 
Programming in Scala - Lecture One
Programming in Scala - Lecture OneProgramming in Scala - Lecture One
Programming in Scala - Lecture OneAngelo Corsaro
 
The WorldCat Search API
The WorldCat Search APIThe WorldCat Search API
The WorldCat Search APIOCLC Research
 

Similar to What's Spain's Paris? Mining Analogical Libraries from Q&A Discussions (20)

Graph Databases and Web Frameworks (NodeJS, AngularJS, GridFS, OpenLink Virtu...
Graph Databases and Web Frameworks (NodeJS, AngularJS, GridFS, OpenLink Virtu...Graph Databases and Web Frameworks (NodeJS, AngularJS, GridFS, OpenLink Virtu...
Graph Databases and Web Frameworks (NodeJS, AngularJS, GridFS, OpenLink Virtu...
 
Ontologies and semantic web
Ontologies and semantic webOntologies and semantic web
Ontologies and semantic web
 
Natural Language Interface to Knowledge Graph
Natural Language Interface to Knowledge GraphNatural Language Interface to Knowledge Graph
Natural Language Interface to Knowledge Graph
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
 
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" EcosystemsPyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems
 
Java 8 selected updates
Java 8 selected updatesJava 8 selected updates
Java 8 selected updates
 
Ontologies & linked open data
Ontologies & linked open dataOntologies & linked open data
Ontologies & linked open data
 
Projects Valhalla and Loom at IT Tage 2021
Projects Valhalla and Loom at IT Tage 2021Projects Valhalla and Loom at IT Tage 2021
Projects Valhalla and Loom at IT Tage 2021
 
Expressive Query Answering For Semantic Wikis (20min)
Expressive Query Answering For  Semantic Wikis (20min)Expressive Query Answering For  Semantic Wikis (20min)
Expressive Query Answering For Semantic Wikis (20min)
 
My life as a cyborg
My life as a cyborg My life as a cyborg
My life as a cyborg
 
Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13
 
Illustrated Code (ASE 2021)
Illustrated Code (ASE 2021)Illustrated Code (ASE 2021)
Illustrated Code (ASE 2021)
 
LLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team Structure
 
Building a Semantic search Engine in a library
Building a Semantic search Engine in a libraryBuilding a Semantic search Engine in a library
Building a Semantic search Engine in a library
 
Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...
 
A Portable Approach for Bidirectional Integration between a Logic and a Stati...
A Portable Approach for Bidirectional Integration between a Logic and a Stati...A Portable Approach for Bidirectional Integration between a Logic and a Stati...
A Portable Approach for Bidirectional Integration between a Logic and a Stati...
 
Reactive Card Magic: Understanding Spring WebFlux and Project Reactor
Reactive Card Magic: Understanding Spring WebFlux and Project ReactorReactive Card Magic: Understanding Spring WebFlux and Project Reactor
Reactive Card Magic: Understanding Spring WebFlux and Project Reactor
 
Introduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John MulhallIntroduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John Mulhall
 
Programming in Scala - Lecture One
Programming in Scala - Lecture OneProgramming in Scala - Lecture One
Programming in Scala - Lecture One
 
The WorldCat Search API
The WorldCat Search APIThe WorldCat Search API
The WorldCat Search API
 

Recently uploaded

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Recently uploaded (20)

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

What's Spain's Paris? Mining Analogical Libraries from Q&A Discussions

  • 1. MINING ANALOGICAL LIBRARIES IN Q&A DISCUSSIONS — INCORPORATING RELATIONAL & CATEGORICAL KNOWLEDGE INTO WORD EMBEDDING CHUNYANG CHEN, SA GAO, ZHENCHANG XING School of Computer, Nanyang Technological University, Singapore
  • 2. Analogical Technique Questions ? is to A as b is to B? • e.g., What are Java’s equivalent libraries to Python’s NLTK ?  When do developers ask it? • They are not satisfied with current libraries; • They migrate from one language to another.
  • 3. Analogical Technique Questions Out of date & subjective When we ask in Google:
  • 4. Analogical Technique Questions When we ask it in Q&A site in Stack Overflow http://stackoverflow.com/questions/15478263/java-equivalent-for-python-nltk o Not allowed in Stack Overflow as it is too subjective.
  • 5. Can we answer such questions automatically and objectively?
  • 6. A Key Observation Analogical libraries often appear in similar context. …… ……
  • 7. Continuous Skip-Gram Model word embedding of t3 𝑡1 𝑡5𝑡4𝑡2 𝑡3 Learn word vector representations that are good at predicting the nearby words i.e., word to vector (w2v). Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
  • 8. Relational Similarity Word embedding encodes relation between pairs of words: • E.g., Queen – King ~ Woman – Man (analogy reasoning) Solve analogy question “? is to A as b is to B” by:
  • 9. Apply the model to our task? Original word embedding is designed for general text; • Tag sentences are too short ! (<= 5); • No linguistics in tag sentence ! With solely original word embedding: • Too many false positives, so we need to customize it !
  • 10. Association rule mining POS & Phrase chunking & Rule matching Tag wiki Tag sentences Vector space Continuous skip- gram model Relational Knowledge Categorical Knowledge Analogical-library knowledge base Extract analogical libraries Overall Method Not only semantic knowledge learned by word embedding, but also incorporate: Relational Knowledge; Categorical Knowledge.
  • 11. Mining Relational Knowledge Get tag relations by association rule mining: • E.g., java  opennlp, python  nltk Link rules  a technology associative network. Part of the graph Chen, Chunyang, and Zhenchang Xing. "Mining technology landscape from stack overflow." In Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, p. 14. ACM, 2016.
  • 12. Categorical Knowledge Tags in Stack Overflow belong to different categories: • Java  programming language; • Eclipse  IDE; • Binary-search  algorithm; • Caching  mechanism; • D3.js  library.  Original word embedding doesn’t know that we need library tags in this work !
  • 13. Mining Categorical Knowledge Community TagWiki info • First sentence http://stackoverflow.com/tags/ios/info Get the category of the tag • Part-of-Speech tagging and phrase chunking • First noun phrase after “be” is category label
  • 14. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 3 4 5 Precision@k k w2v+rc_kg w2v+c_kg w2v+r_kg w2v Dataset o 7 years Stack Overflow data (07/2008 - 08/2015) o 9.97 million questions, 36,197 tags with TagWiki and 7,783 library tags Evaluation Evaluate three components in the approach: o The accuracy of tag categorization • 84% are correctly categorized o The semantic distance of tag correlations • 88% appear in Google Trends o The recommendation of analogical libraries • Precision@1 = 0.81, Precision@5 = 0.67
  • 15. Our Website & Visiting Statistic https://graphofknowledge.appspot.com/similarTech
  • 16. THANKS A LOT FOR THE LISTENING • Chen, Chunyang, Sa Gao, and Zhenchang Xing. "Mining analogical libraries in q&a discussions--incorporating relational and categorical knowledge into word embedding." In 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), vol. 1, pp. 338-348. IEEE, 2016. • Chen, Chunyang, and Zhenchang Xing. "Similartech: automatically recommend analogical libraries across different programming languages." In 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 834-839. IEEE, 2016. • Chen, Chunyang, Zhenchang Xing, and Yang Liu. "What’s spain’s paris? mining analogical libraries from q&a discussions." Empirical Software Engineering 24, no. 3 (2019): 1155- 1194.