WI-IAT: Bootstrapping the Analysis of Large-scale Web Service Networks (v3)

Transcript

  • 1. Bootstrapping the Analysis of Large-scale Web Service Networks
    Shahab Mokarizadeh, Royal Institute of Technology, Sweden
    Peep Küngas, Tartu University, Estonia
    Mihhail Matskin, Royal Institute of Technology, Sweden
    IEEE/WIC/ACM International Conference of Web Intelligence, 22-27 Aug 2011
  • 2. Background
    Why web service analysis?
    - Identifying missing but valuable web services (to be implemented)
    - Discovering correlations among public, governmental, and private sector web services
    - Discovering the most/least exploited concept(s), web service(s), web service provider(s), ...
    Initial challenge?
    - The vast majority of available services are not semantically annotated, and many do not even come with any sort of documentation!
  • 3. Analysis Roadmap
    - Generate reference ontology
    - Initially only WSDL web services
    - Web service annotation
    - Web service matching & network generation
    - Apply social network analysis algorithms
    - Information diffusion among web service communities
    - Analyze the impact of services/concepts on other services or concepts
  • 4. Reminder: WSDL Structure
    [Image from: Web Services and Security, 1/17/2006, Marco Cova]
  • 5. Ontology Learning from WSDL Interfaces [1]
    Ontology learning input:
    - Message part names of input/output parameters
    - XML Schema leaf element names of complex types
    Stages shown in the slide diagram, in order of appearance:
    - Information Elicitation
    - Term Extraction
    - Syntactic Refinement
    - Ontology Discovery
    - Pattern-based Semantic Analysis
    - Term Disambiguation
    - Class and Relation Determination
    - Ontology Organization
    - Adding Relations
    - Reference Ontology
    [1] "Ontology Learning for Cost-Effective Large-scale Semantic Annotation of XML Schemas and Web Service Interfaces", in Proc. EKAW 2010, LNAI 6317, pp. 401-410, 2010
  • 6. Annotation Heuristics [2]
    entity_reference ← synset{…}
    (entity_reference: concept in the ontology; synset: instances in the ontology, i.e. terms)
    Example:
    Password ← {password, pwd, strPassword, authPassword, pass}
    Address ← {addr, address1, postal_address}
    [2] P. Küngas and M. Dumas, "Cost-Effective Semantic Annotation of XML Schemas and Web Service Interfaces", Proc. IEEE Conference on Services Computing, 2009, pp. 372-379
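
The concept-to-synset mapping above can be sketched as a simple lookup table. This is only an illustration, assuming case-insensitive exact matching of terms; the names `SYNSETS`, `build_term_index`, and `annotate` are hypothetical and do not come from the slides or the cited paper.

```python
# Synsets copied from the slide's examples: concept -> set of terms.
# The dictionary layout and function names are assumptions for illustration.
SYNSETS = {
    "Password": {"password", "pwd", "strPassword", "authPassword", "pass"},
    "Address": {"addr", "address1", "postal_address"},
}


def build_term_index(synsets):
    """Invert concept -> terms into a case-insensitive term -> concept lookup."""
    index = {}
    for concept, terms in synsets.items():
        for term in terms:
            index[term.lower()] = concept
    return index


def annotate(term, index):
    """Return the ontology concept for a term, or None if no synset covers it."""
    return index.get(term.lower())


TERM_INDEX = build_term_index(SYNSETS)
password_concept = annotate("strPassword", TERM_INDEX)
unknown_concept = annotate("zip", TERM_INDEX)
```

In this sketch a term that belongs to no synset simply stays unannotated, which mirrors the fact that only part of the corpus ends up annotated.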
  • 7. Web Service Matching Scheme
    - Matching is performed on the basic elements of web service input and output parameters (ontological instances).
    - Web service matching is thereby simplified to instance matching.
    - Rule-based matching scheme: a matching rule reveals the existence of some kind of semantic relation between two given instances.
  • 8. Instance Matching Rules (1)
    Rule-1: Same concept.
    Example: (addr, addr_line): {addr, addr_line} instanceOf Address.
    Rule-2: Synonym concepts.
    Example: (loc, place): {loc} instanceOf Location, {place} instanceOf Place, Place isSynonymOf Location.
    Rule-3: Subclass concepts.
    Example: (loc, city): {loc} instanceOf Location, {city} instanceOf City, City isSubClassOf Location.
  • 9. Instance Matching Rules (2)
    Rule-4: Rule 2 + Rule 3.
    Example: (bidUId, id): {bidUId} instanceOf BidUniqueCode, {id} instanceOf ContractIdentifier, BidUniqueCode isSynonymOf ContractIdentifier.
    Rule-5: Concepts interrelated by an ontological relation (other than isSynonymOf).
    Example: Person hasPropertyXXX FirstName.
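
A minimal sketch of how such rules might be checked against a toy ontology follows. It covers Rules 1, 2, 3, and 5 only; Rule 4 is left out because the slides do not spell out how Rules 2 and 3 are combined. The data structures, the function `match_rule`, and the terms `person` and `fname` are hypothetical and chosen for illustration.

```python
# Toy ontology based on the slide examples. All structures here are assumptions.
INSTANCE_OF = {
    "addr": "Address", "addr_line": "Address",
    "loc": "Location", "place": "Place", "city": "City",
    "person": "Person", "fname": "FirstName",  # hypothetical terms for Rule 5
}
SYNONYMS = {frozenset({"Place", "Location"})}
SUBCLASS_OF = {("City", "Location")}   # (child, parent)
RELATED = {("Person", "FirstName")}    # any relation other than isSynonymOf


def match_rule(term_a, term_b):
    """Return the number of the first matching rule (1, 2, 3 or 5), else None."""
    concept_a = INSTANCE_OF.get(term_a)
    concept_b = INSTANCE_OF.get(term_b)
    if concept_a is None or concept_b is None:
        return None  # at least one term is not annotated
    if concept_a == concept_b:
        return 1  # Rule-1: same concept
    if frozenset({concept_a, concept_b}) in SYNONYMS:
        return 2  # Rule-2: synonym concepts
    if (concept_a, concept_b) in SUBCLASS_OF or (concept_b, concept_a) in SUBCLASS_OF:
        return 3  # Rule-3: subclass concepts
    if (concept_a, concept_b) in RELATED or (concept_b, concept_a) in RELATED:
        return 5  # Rule-5: other ontological relation
    return None


same_concept = match_rule("addr", "addr_line")
no_match = match_rule("addr", "city")
```

Checking the rules in a fixed order is a design choice of this sketch: it reports the most specific relation first and treats the relations as symmetric, which the slides do not state explicitly.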
  • 10. Evaluate Matching Scheme - 1
    Approach 1: Classical (Precision, Recall, F-measure)
    1. A golden annotation/ontology is needed to compare with.
    2. Identify:
    - True Positives (TP): the annotations common to the golden and the generated ontology
    - False Positives (FP): annotations made only by the generated ontology
    - False Negatives (FN): annotations made by the golden ontology but not discovered by the generated ontology
    3. Compute Precision, Recall, and F-measure from TP, FP, and FN.
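
The formulas on the original slide were lost in extraction. The sketch below uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F-measure as their harmonic mean) and assumes that each annotation is a (term, concept) pair. The function name and the sample annotations are hypothetical.

```python
def precision_recall_f1(golden, generated):
    """Compare two sets of (term, concept) annotations using the standard metrics."""
    golden, generated = set(golden), set(generated)
    tp = len(golden & generated)   # annotations found in both
    fp = len(generated - golden)   # annotations only in the generated ontology
    fn = len(golden - generated)   # golden annotations that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up annotations, for illustration only.
golden = {("pwd", "Password"), ("addr", "Address"),
          ("zip", "ZipCode"), ("city", "CityName")}
generated = {("pwd", "Password"), ("addr", "Address"), ("zip", "PostalCode")}
precision, recall, f_measure = precision_recall_f1(golden, generated)
```

With these toy sets there are two true positives, one false positive, and two false negatives.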
  • 11. Evaluate Matching Scheme - 2
    Approach 2: Tracking the performance of the matching scheme in a network model
    - Generate a semantic network model out of the annotated web service corpus.
    - Track the performance of the exploited annotation & matching scheme in the network properties.
    Web service (WSDL) networks (of small size) have been observed to exhibit:
    - Small-worldness model
    - Scale-free model
    - Correlation degree on nodes?
  • 12. Web Service Network Models
    Approach 2: Projecting matching scheme accuracy in the network model
    [Diagram: an annotated web service, shown as columns Web services, Operations, Parameters, and Concepts, projected onto a semantic network of concepts]
    Legend:
    - WS1 - WS3: Web services
    - OP1 - OP3: Web service operations
    - P1 - P6: Basic elements of input/output parameters
    - C1 - C5: Ontological concepts representing the parameters
  • 13. Evaluating Network Properties: Small-Worldness
    Small-world networks are networks with the following characteristics:
    1. LRandom ≤ LActual (L: shortest path length)
    2. CRandom << CActual (C: clustering coefficient)
    Sindex: small-worldness index
    In other words: γ > 1, λ > 1, Sindex > 1
    Small-worldness scales linearly with network size.
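
The slide does not define the index itself. A commonly used definition, assumed here, is γ = CActual / CRandom, λ = LActual / LRandom, and Sindex = γ / λ. The sketch below applies it to the h25 values reported in the small-worldness table later in this transcript; the function name is hypothetical.

```python
def small_world_index(l_actual, c_actual, l_random, c_random):
    """Sindex = gamma / lambda, with gamma = C ratio and lambda = L ratio.

    This is an assumed definition; the slides do not give the formula.
    """
    gamma = c_actual / c_random   # clustering relative to a random graph
    lam = l_actual / l_random     # path length relative to a random graph
    return gamma / lam


# h25 row of the small-worldness table: actual vs. Random-ER network.
s_h25 = small_world_index(l_actual=2.4256, c_actual=0.259,
                          l_random=2.4756, c_random=0.0348)
```

Under this assumption the h25 row yields roughly 7.6, close to but not identical with the 7.5769 in the table; the gap presumably comes from rounding in the tabulated L and C values.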
  • 14. Evaluating Network Properties: Scale-free Networks
    Scale-free networks:
    - The degree distribution is fitted to a power-law function y = c · x^(−α)
    - Many nodes with few links
    - A few nodes with many links
    [Plot: number of nodes with M links (log) versus number of links M (log)]
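
One simple way to estimate a power-law exponent is a least-squares line fit on the log-log degree histogram. The slides do not say which fitting method the authors used, so this is an assumed technique, and it is known to be a rough estimator on noisy tails. The function name and the synthetic histogram are hypothetical.

```python
import math


def fit_power_law(histogram):
    """Fit log(count) = log(c) - alpha * log(degree) by ordinary least squares.

    `histogram` maps a positive degree to the number of nodes with that degree.
    Returns (alpha, c).
    """
    points = [(math.log(k), math.log(n)) for k, n in histogram.items()
              if k > 0 and n > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    c = math.exp(mean_y - slope * mean_x)
    return -slope, c


# Synthetic histogram that follows y = 1000 * x^(-2) exactly.
synthetic = {k: 1000 * k ** -2 for k in (1, 2, 4, 5, 10)}
alpha, c = fit_power_law(synthetic)
```

On this noise-free input the fit recovers an exponent of 2 and a constant of 1000 up to floating-point error.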
  • 15. Evaluating Network Properties: Assortativity of Node Degree (Correlation Degree on Nodes)
    - Positive correlation: vertices with a high number of connections tend to be connected to other nodes that also have many links. Observed in social networks, e.g. networks of actors.
    - Negative correlation: highly connected vertices preferentially attach to nodes with few connections. Observed in technological and biological networks, e.g. the Internet, protein interactions.
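
Degree correlation is commonly measured as the Pearson correlation between the degrees found at the two ends of every edge. The sketch below assumes an undirected network given as an edge list; the slides may use a directed variant, and the function name is hypothetical.

```python
from collections import defaultdict


def degree_assortativity(edges):
    """Pearson correlation of the degrees at both ends of each undirected edge."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs.extend([degree[u], degree[v]])
        ys.extend([degree[v], degree[u]])
    mean = sum(xs) / len(xs)  # xs and ys share the same mean by symmetry
    covariance = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    variance = sum((x - mean) ** 2 for x in xs)
    if variance == 0:
        raise ValueError("undefined when all nodes have the same degree")
    return covariance / variance


# A star: one hub connected to four leaves, the extreme disassortative case.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
star_correlation = degree_assortativity(star)
```

The star is the extreme case of negative correlation, since its single hub connects only to degree-one nodes.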
  • 16. Experimental Datasets
    - SOATrader dataset: 1,000,000 terms from the SOATrader collection of 15,000 WSDLs collected from different repositories on the Web between 2005 and 2007. SOATrader: http://www.soatrader.com/web-services
    - ASSAM dataset [3]: 146 WSDLs collected by Heß et al. and annotated with the ASSAM tools. We use all unique terms (approx. 375) with any frequency from this collection. ASSAM: http://www.andreas-hess.info/projects/annotator/
    [3] A. Heß, N. Kushmerick, "Machine Learning for Annotating Semantic Web Services", AAAI Spring Symposium on Semantic Web Services, 2004
  • 17. Golden Ontology
    - SOATrader dataset: the golden annotation is handcrafted by the authors based on the top 2000 recurrent terms.
    - ASSAM: we use the golden annotation developed by the ASSAM developers, which served as the reference ontology in their experiment with the ASSAM web service annotation tool.
  • 18. Evaluation Result - 1
    [Bar chart: Precision, Recall, and F-Measure (y-axis from 0 to 0.6) on the Top2000 and ASSAM datasets for three rule sets: Rule-1, Rules 1-4, Rules 1-5]
  • 19. Dataset for Network Evaluation
    Ideal: use the full dataset of WSDL/XSD elements from the SOATrader collection (approx. 1,000,000 terms) and the ASSAM collection (approx. 10000 terms).
    Problems with a large dataset:
    - The larger the dataset, the bigger the ontology, and the harder it is to verify and enhance the quality of annotation.
    - Neither cost-effective (human and computation cost) nor scalable for analysis purposes.
    Proposal: limit the SOATrader experimental dataset using the following four arbitrarily chosen thresholds (minimum frequency of occurrence of a term): 10, 15, 20, and 25 (h10, h15, h20, h25), covering the 30000 most recurrent (unique) terms.
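
The frequency thresholds can be pictured as a filter over term counts. This sketch assumes the corpus is available as a flat list of extracted terms; the function name and the tiny sample corpus are hypothetical.

```python
from collections import Counter


def terms_above_threshold(terms, min_frequency):
    """Return the unique terms that occur at least `min_frequency` times."""
    counts = Counter(terms)
    return {term for term, count in counts.items() if count >= min_frequency}


# Made-up corpus: "id" occurs 25 times, "name" 15, "zip" 10, "foo" 3.
corpus = ["id"] * 25 + ["name"] * 15 + ["zip"] * 10 + ["foo"] * 3

# One vocabulary per threshold, keyed like the slide's h10 ... h25 labels.
vocabularies = {f"h{h}": terms_above_threshold(corpus, h) for h in (10, 15, 20, 25)}
```

Lower thresholds admit more terms, so the h10 vocabulary is a superset of the h25 vocabulary.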
  • 20. Annotation Progress
                              h25      h20      h15      h10
    Learned ontology size     4523     5614     7378     11610
    Annotated elements        588057   596625   621336   663618
    Total elements            998916   998916   998916   998916
    Percentage of total       59%      60%      62%      66%
  • 21. Analysis of Small-Worldness
    Dataset            Network                 Type        L         C         Sindex
    Entire SOATrader   Syntactic               Actual      3.283     0.2968    591.08
                                               Random-ER   3.9229    0.00062
    h25                Generated               Actual      2.4256    0.259     7.5769
                                               Random-ER   2.4756    0.0348
    h20                Generated               Actual      2.3882    0.2811    8.8148
                                               Random-ER   2.4851    0.0331
    h15                Generated               Actual      2.3724    0.2805    8.2753
                                               Random-ER   2.3396    0.0334
    h10                Generated               Actual      2.5322    0.2449    18.2709
                                               Random-ER   2.7662    0.0146
    Top2000            Golden                  Actual      2.1895    0.3761    2.8404
                                               Random-ER   1.8852    0.1146
                       Generated               Actual      2.08475   0.3209    3.3878
                                               Random-ER   2.0667    0.0939
    ASSAM              Golden                  Actual      4.5653    0.2147    3.1464
                                               Random-ER   3.546     0.05304
                       Generated (Rule 1)      Actual      3.0592    0.4803    21.4835
                                               Random-ER   3.8451    0.0281
                       Generated (Rules 1-4)   Actual      2.5732    0.4057    8.5288
                                               Random-ER   3.1267    0.0578
  • 22. Analysis of Scale-free Properties & Correlation Degree
    Category   Network                 Power-law Degree Exponent   #Nodes   Degree Correlation
    Entire     Syntactic               1.3722                      67622    -0.0413
    h25        Generated               1.1945                      2086     -0.1993
               Random Annotation       0.6332                      2086     0.019
    h20        Generated               1.1977                      2394     -0.2093
    h15        Generated               1.1448                      3239     -0.2222
    h10        Generated               1.2316                      4050     -0.1895
    Top2000    Golden                  1.1504                      856      -0.2238
               Generated               1.1483                      936      -0.2137
               Syntactic               1.1653                      828      -0.2229
    ASSAM      Golden                  1.5346                      170      -0.3079
               Generated (Rule 1)      1.5574                      413      0.3642
               Generated (Rules 1-4)   1.4566                      217      0.041
               Random Annotation       1.0755                      170      0.1151
               Syntactic               1.6105                      886      0.194
  • 23. Plot of Degree Distribution
    [Plots: out-degree distribution of random annotation; out-degree distribution of actual annotation]
  • 24. Conclusion & Future Work
    - The performance of a web service annotation scheme can be tracked in the properties of web service network models.
    - An efficient matching scheme eliminates, or at least minimizes, deviation from small-worldness conditions, shows a strong negative degree correlation, and follows the scale-free model.
    A major threat:
    - Network theories are incomplete: e.g. the emergence of power laws is too common to rely on!
    - The evaluated dataset may not represent the model governing the whole picture.
    Future work:
    - Benchmarking other WS annotation & matching methods
    - Investigating other network properties
  • 25. Thanks!
    Grateful for your questions, critiques, and suggestions.
    SHAHABM@KTH.SE
  • 26. Backup Slides
  • 27. What Is Going To Be Annotated?
    Note: We annotate ONLY the basic elements of web service input and output parameters (message part names and XML Schema basic element names).
    WSDL:
    <wsdl:types>
      <complexType name="Address">
        <sequence>
          ……
          <element name="Zip" type="string"/>
          …..
          <element name="City" type="string"/>
        </sequence>
      </complexType>
      (…)
    </wsdl:types>
    Semantic annotation (ontology):
    - Address hasZipCode ZipCode
    - Address hasCityName CityName
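
Collecting the basic element names from a WSDL types section could look like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The sample document is a well-formed variant of the slide's fragment with namespace declarations and an enclosing `xsd:schema` added, and the rule "an element without a nested complexType is a leaf" is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Well-formed variant of the slide's fragment; namespaces added by assumption.
SAMPLE_TYPES = """
<wsdl:types xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:schema>
    <xsd:complexType name="Address">
      <xsd:sequence>
        <xsd:element name="Zip" type="xsd:string"/>
        <xsd:element name="City" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:schema>
</wsdl:types>
"""


def leaf_element_names(xml_text):
    """Return names of schema elements that do not contain a nested complexType."""
    root = ET.fromstring(xml_text.strip())
    names = []
    for element in root.iter(f"{{{XSD_NS}}}element"):
        has_nested_type = element.find(f"{{{XSD_NS}}}complexType") is not None
        if not has_nested_type and "name" in element.attrib:
            names.append(element.attrib["name"])
    return names


terms = leaf_element_names(SAMPLE_TYPES)
```

The resulting terms (here `Zip` and `City`) are the kind of input that would then be mapped to ontology concepts such as ZipCode and CityName.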
  • 28. Example of Generated Ontology
    Input terms: "userId", "username", "Zip", "addr_line", "userPostalAddress", "online_usr", ….
    [Ontology diagram]
    - Concepts: OnlineUser, User, UserName, UserIdentifier, PostalAddress, Address, AddressLine, ZipCode, PostalCode
    - Relations: isSubClassOf, hasAddress, hasName, hasIdentifier, hasAddressLine, hasZipCode, isSynonymOf