Spotting automatically
cross-language relations

Federico Tomassetti (me)

Giuseppe Rizzo

Marco Torchiano
An algorithm (with code on GitHub) to identify cross-language relations. Welcome to polyglot software development!

  1. Spotting automatically cross-language relations
     Federico Tomassetti (me), Giuseppe Rizzo, Marco Torchiano
  2-3. data.sql
     CREATE TABLE Persons (
       ID int,
       FirstName varchar(255),
       LastName varchar(255),
       City varchar(255)
     );
     Person.java
     String query = "select ID, FirstName, LastName, " +
                    "City " +
                    "from " + dbName + ".Persons";
     try {
       ...
       while (rs.next()) {
         int id = rs.getInt("ID");
         String firstName = rs.getString("FirstName");
         String lastName = rs.getString("LastName");
         String city = rs.getString("City");
       }
     } catch (SQLException e) {
       // (Hopefully it does not happen)
     }
  4. …the overall system works, sometimes
  5-8. If we could automatically identify cross-language relations we could:
     • Recognize them: so I am aware that this ID is related to something else
     • Support refactoring: if I change one, the others are updated
     • Validate them: see broken relations as errors
     • Navigate them: click to see the other side of the relation
  9. CodeModels
     ASTs
  10. Embedded AST (take image from the paper)
  11-12. index.html
      <ul id="types">
        <li ng-repeat="t in types" ng-class="{'selected': t.id == type}">
          <a ng-href="#/{{t.id}}">{{t.title}}</a>
        </li>
      </ul>
      <div ng-repeat="puzzle in puzzles">
        <h2>{{puzzle.title}}</h2>
        …
      </div>
      app.js
      var types = [
        { id: 'sliding-puzzle', title: 'Sliding puzzle' },
        { id: 'word-search-puzzle', title: 'Word search puzzle' }
      ];
      app.controller('slidingAdvancedCtrl', function($scope) {
        $scope.puzzles = [
          { src: './img/misko.jpg', title: 'Miško Hevery', rows: 4, cols: 4 },
          { src: './img/igor.jpg', title: 'Igor Minár', rows: 3, cols: 3 },
          { src: './img/vojta.jpg', title: 'Vojta Jína', rows: 4, cols: 3 }
        ];
      });
  13-14. Context of a node:
      all the descendants
      +
      the siblings and their descendants
  15. How to compare contexts:
      1) Take all the values in the context (IDs, strings, numbers)
      +
      2) Employ different metrics
      Some metrics we use:
      • Number of shared values
      • Min and max number of different values
      • Tversky index: $TV(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha |X - Y| + \beta |Y - X|}$
      • Jaro, Jaccard, tf-idf and others
  16. How to combine those metrics: Random Tree tells us
      We built a golden set of 1200 candidate relations (around 140 real relations, the others just share the same ID)
      We train it with the golden set
      Random Tree finds out the best way to combine those metrics to decide whether a pair is related or not
      Output of Random Tree: a rule to understand whether two nodes with the same ID are connected
  17. How to evaluate it?
      10-fold cross-validation
  18. What we have
      • A tool that automatically spots cross-language relations with precision and recall > 90% (on a first in-house dataset)
      What now?
      • We want to build a larger golden set
      • We want to integrate support in editors
      Code available at: https://github.com/orgs/CrossLanguageProject
  19. Spotting Automatically Cross-Language Relations
      Federico Tomassetti, Giuseppe Rizzo, Marco Torchiano
      CSMR 2014, Antwerpen, Belgium
      Preprint at: http://www.di.unito.it/~rizzo/publications/Tomassetti_Rizzo-CSMRWCRE2014.pdf
      www.slideshare.net/FTomassetti
      Code available at: https://github.com/orgs/CrossLanguageProject
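
The "context of a node" slides (all the descendants, plus the siblings and their descendants) can be illustrated with a small sketch. This is not the CodeModels API: the `Node` class, its `add` helper, and the way the `rs.getInt("ID")` statement is modelled as a tree are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass(eq=False)
class Node:
    """Hypothetical AST node (an assumption, not the CodeModels API)."""
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, so trees can be built step by step."""
        child.parent = self
        self.children.append(child)
        return child


def descendants(node: Node) -> Iterator[Node]:
    """Yield every node below the given node, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)


def context(node: Node) -> List[Node]:
    """All the descendants, plus the siblings and their descendants."""
    result = list(descendants(node))
    if node.parent is not None:
        for sibling in node.parent.children:
            if sibling is not node:
                result.append(sibling)
                result.extend(descendants(sibling))
    return result


def context_values(node: Node) -> Set[str]:
    """Collect the values (IDs, strings, numbers) found in the context."""
    return {n.value for n in context(node) if n.value is not None}


# Assumed tree shape for the statement: int id = rs.getInt("ID");
stmt = Node()
id_decl = stmt.add(Node("id"))
call = stmt.add(Node("getInt"))
call.add(Node("rs"))
call.add(Node("ID"))

print(sorted(context_values(id_decl)))  # ['ID', 'getInt', 'rs']
```

In this sketch the node itself is left out of its own context, following the wording of the slide; whether the actual tool does the same is not stated in the deck.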

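Two of the metrics listed on the "How to compare contexts" slide, the number of shared values and the Tversky index, can be sketched over plain sets of context values. The weights `alpha` and `beta` and the two example value sets are assumptions; the slides do not say which weights the tool uses.

```python
from typing import Set


def shared_values(x: Set[str], y: Set[str]) -> int:
    """Number of values that appear in both contexts."""
    return len(x & y)


def tversky(x: Set[str], y: Set[str],
            alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky index: |X∩Y| / (|X∩Y| + alpha*|X-Y| + beta*|Y-X|).

    The default weights are an assumption; with alpha = beta = 1 the
    index coincides with the Jaccard index.
    """
    shared = len(x & y)
    denominator = shared + alpha * len(x - y) + beta * len(y - x)
    return shared / denominator if denominator else 0.0


# Hypothetical value sets for the two sides of the Persons example.
sql_context = {"Persons", "ID", "FirstName", "LastName", "City"}
java_context = {"ID", "getInt", "rs", "id"}

print(shared_values(sql_context, java_context))  # 1
print(tversky(sql_context, java_context))        # 0.125
```

Lowering `alpha` or `beta` penalises values found on only one side less heavily, which raises the score for contexts of very different sizes.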