
Adding Meaning To Your Data



Biosapiens talk 2007-12-04

Published in: Technology, Education


  1. 1. Making your data more meaningful: using BioMOBY and myGrid Taverna services as examples. Duncan Hull, University of Manchester. BioSapiens Network of Excellence, 2007-12-04
  2. 2. Outline <ul><li>Semantic Web redux </li></ul><ul><ul><li>Semantic web stack </li></ul></ul><ul><ul><li>There are lots of unavoidable TLAs (three-letter acronyms) in this talk </li></ul></ul><ul><ul><li>XML, URI, Namespaces, Unicode </li></ul></ul><ul><ul><li>RDF, RDFS, OWL </li></ul></ul><ul><li>In this talk when I say “data” I usually mean Web Services… </li></ul><ul><ul><li>Example: InterProScan </li></ul></ul><ul><ul><li>BioMOBY </li></ul></ul><ul><ul><li>myGrid Taverna </li></ul></ul><ul><ul><li>myExperiment </li></ul></ul><ul><li>Conclusions </li></ul><ul><ul><li>What did we learn? </li></ul></ul>
  3. 3. The semantic web “knows what you mean” thanks to added meaning <ul><li>According to Tim Berners-Lee, Ora Lassila and Jim Hendler </li></ul><ul><li>… and Mark Butler from HP labs </li></ul><ul><li>Vague, audacious, “visionary”, controversial and/or doomed (depending on who you ask) </li></ul><ul><li>in practice this means… </li></ul>
  4. 4. Semantic Web “stack” <ul><li>A suite of technology and standards* for adding meaning to data </li></ul><ul><li>Taken from “Semantic Web Architecture: Stack or Two Towers?” by Ian Horrocks et al see and </li></ul>
  5. 5. … But first, InterProScan… <ul><li>InterProScan: Protein domains identifier @ EBI </li></ul><ul><li> </li></ul><ul><li>That horrendously long URI submits a job to InterProScan with 4 parameters </li></ul><ul><ul><li>?tool=iprscan “use the InterProScan tool” </li></ul></ul><ul><ul><li>&sequence=uniprot:slpi_human “…with the sequence secretory leukocyte proteinase inhibitor (SLPI) in UniProt format” </li></ul></ul><ul><ul><li>&seqtype=P “sequence type is protein (e.g. not DNA)” </li></ul></ul><ul><ul><li>& “email results to Homer Simpson” </li></ul></ul><ul><ul><li>Returns a job identifier e.g. iprscan-20071203-18053660 </li></ul></ul><ul><ul><li>http://www.ebi.ac.uk/cgi-bin/iprscan/iprscan?tool=iprscan&jobid=iprscan-20071203-18053660 </li></ul></ul>
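The way such a request URI is assembled from its parameters can be sketched in Python. This is only an illustration: the base address is taken from the job-status URL on the slide, the helper names `build_submit_url` and `job_id_from_url` are hypothetical, and the fourth (email) parameter is omitted because its name is not shown in the source. Nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base address taken from the job-status URL quoted on the slide.
BASE_URL = "http://www.ebi.ac.uk/cgi-bin/iprscan/iprscan"


def build_submit_url(sequence: str, seqtype: str = "P") -> str:
    """Assemble a submission URL from the parameters named on the slide.

    Hypothetical helper: the email parameter is left out because its
    name does not appear in the source.
    """
    params = {"tool": "iprscan", "sequence": sequence, "seqtype": seqtype}
    # safe=":" keeps the "uniprot:slpi_human" form readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe=":")


def job_id_from_url(url: str) -> str:
    """Pull the jobid parameter back out of a job-status URL."""
    return parse_qs(urlsplit(url).query)["jobid"][0]


submit_url = build_submit_url("uniprot:slpi_human")
status_url = BASE_URL + "?tool=iprscan&jobid=iprscan-20071203-18053660"
print(submit_url)
print(job_id_from_url(status_url))
```

Each `key=value` pair after the `?` is one parameter, and pairs are joined with `&`, which is all the "horrendously long URI" really is.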
  6. 6. Back to the Stack
  7. 7. URI, Unicode, XML and Namespaces <ul><li>Bottom of semantic web stack: </li></ul><ul><li>Namespaces </li></ul><ul><li>eXtensible Markup Language (XML) </li></ul><ul><li>Uniform Resource Identifiers (URI) </li></ul><ul><li>Unicode </li></ul>
  8. 8. URI: Uniform Resource IDENTIFIER <ul><li>URIs include the Uniform Resource Locators (URLs) that most people are familiar with for locating things, usually just called “links”… </li></ul><ul><ul><li>E.g. locator for the biosapiens website </li></ul></ul><ul><ul><li>E.g. locates a biosapiens publication </li></ul></ul><ul><ul><li>Not persistent: sometimes unstable and can break, e.g. “404 not found” </li></ul></ul><ul><ul><li>Not guaranteed to be unique </li></ul></ul><ul><li>URIs also include Uniform Resource Names (URNs) for naming things. These are less familiar, e.g. ISBN, Digital Object Identifiers (DOI) and Life Science Identifiers (LSID) </li></ul><ul><ul><li>E.g. urn:doi:10.1038/sj.ejhg.5201470 names a publication using DOI </li></ul></ul><ul><ul><li>E.g. urn:isbn:0387484361 names a book using ISBN </li></ul></ul><ul><ul><li>E.g. names a biological sequence using LSID </li></ul></ul><ul><ul><li>Unlike URLs, URNs are UNIQUE and PERSISTENT </li></ul></ul><ul><li>URIs can be names, identifiers or locators (sometimes all three) </li></ul><ul><li>See URI Generic syntax and URN syntax from the Internet Engineering Task Force (IETF) </li></ul>
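The difference between a locator and a name shows up when a URI is split into its parts. The following sketch uses Python's standard `urllib.parse`; the function `describe_uri` is a hypothetical helper, and the `http://www.example.org/...` address is a placeholder, not one of the links from the slide.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> str:
    """Say whether a URI looks like a name (URN) or a locator (URL)."""
    parts = urlsplit(uri)
    if parts.scheme == "urn":
        # For URNs, everything after "urn:" is "<namespace>:<name>".
        namespace, _, name = parts.path.partition(":")
        return f"name in namespace '{namespace}': {name}"
    if parts.netloc:
        # A host part means the URI says where to fetch something from.
        return f"locator on host '{parts.netloc}' with path '{parts.path}'"
    return f"identifier with scheme '{parts.scheme}'"


for uri in (
    "urn:isbn:0387484361",
    "urn:doi:10.1038/sj.ejhg.5201470",
    "http://www.example.org/publications/1",  # placeholder URL
):
    print(uri, "->", describe_uri(uri))
```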
  9. 9. Unicode: Boring but important <ul><li>Unicode provides a unique number for every character </li></ul><ul><ul><li>no matter what the platform (Windows, Unix, iPhone, toaster etc) </li></ul></ul><ul><ul><li>no matter what the program (protein database, email client etc) </li></ul></ul><ul><ul><li>no matter what the language (English, Chinese, Swahili… you name it) </li></ul></ul><ul><li>E.g. U+0041 is the number for “LATIN CAPITAL LETTER A” </li></ul><ul><li>E.g. U+0F03 is the number for “TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA” </li></ul><ul><li>E.g. U+221E is the number for “INFINITY” </li></ul><ul><li>http://www.unicode.org/standard/WhatIsUnicode.html http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode </li></ul>
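The code points on this slide can be checked with Python's standard `unicodedata` module. This is a small sketch; `code_point` is a hypothetical helper that formats the number in the usual `U+XXXX` notation.

```python
import unicodedata


def code_point(ch: str) -> str:
    """Format a character's Unicode number in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for ch in ("A", "\u0f03", "\u221e"):
    # The code point is the same on every platform; the UTF-8 bytes are
    # just one possible way of storing that number.
    print(code_point(ch), unicodedata.name(ch), ch.encode("utf-8"))
```

The number identifies the character; an encoding such as UTF-8 only decides how that number is written down as bytes.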
  10. 10. XML: eXtensible Markup Language, boring but incredibly useful <ul><li>Data marked up using tags as “trees” </li></ul><ul><li><operation name="InterProScan" method="get"> </li></ul><ul><li><request> </li></ul><ul><li><parameter name="sequence" type="xsd:string" required="true"/> </li></ul><ul><li></request> </li></ul><ul><li><response> </li></ul><ul><li><representation mediaType="text/xml" element="yn:ResultSet"> </li></ul><ul><li><parameter name="totalResults" </li></ul><ul><li>type="xsd:nonNegativeInteger"/> </li></ul><ul><li></representation> </li></ul><ul><li></response> </li></ul><ul><li></operation> </li></ul>
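To see the “tree” in that markup, the fragment can be parsed with Python's standard `xml.etree.ElementTree`. In this sketch the fragment is completed with the closing `/>` and `</representation>` that well-formed XML requires (an assumption about what the slide intended), and `walk` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, completed so that it is well-formed XML.
DESCRIPTION = """
<operation name="InterProScan" method="get">
  <request>
    <parameter name="sequence" type="xsd:string" required="true"/>
  </request>
  <response>
    <representation mediaType="text/xml" element="yn:ResultSet">
      <parameter name="totalResults" type="xsd:nonNegativeInteger"/>
    </representation>
  </response>
</operation>
""".strip()


def walk(element, depth=0):
    """Yield (depth, tag, attributes) for an element and its descendants."""
    yield depth, element.tag, dict(element.attrib)
    for child in element:
        yield from walk(child, depth + 1)


root = ET.fromstring(DESCRIPTION)
for depth, tag, attrs in walk(root):
    # Indentation mirrors how deep each tag sits in the tree.
    print("  " * depth + tag, attrs)
```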
  11. 11. Namespaces <ul><li>“XML namespaces provide a simple method for qualifying element and attribute names used in XML by associating them with namespaces identified by URI references.” </li></ul><ul><li><?xml version="1.0" standalone="yes"?> </li></ul><ul><li>xmlns:xsi="…" </li></ul><ul><li>xmlns:xsd="http://…" </li></ul><ul><li>What this means is that </li></ul><ul><li><xsi:fred> and <xsd:fred> </li></ul>are different because they belong to different namespaces. “xsi:fred” is shorthand for … This allows us to have lots of different things called “fred”.
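A parser makes the point about two different “fred”s concrete. This sketch uses Python's standard `xml.etree.ElementTree`, which expands each prefix to its full namespace URI; the prefixes `a` and `b` and the `http://example.org/ns/...` URIs are placeholders chosen for illustration, not the ones from the slide.

```python
import xml.etree.ElementTree as ET

# Two elements share the local name "fred" but live in different
# namespaces. Both namespace URIs are illustrative placeholders.
DOC = """
<root xmlns:a="http://example.org/ns/a" xmlns:b="http://example.org/ns/b">
  <a:fred/>
  <b:fred/>
</root>
""".strip()

root = ET.fromstring(DOC)
tags = [child.tag for child in root]
for tag in tags:
    # ElementTree writes qualified names as "{namespace-uri}localname".
    print(tag)

print(tags[0] == tags[1])
```

The prefix is only shorthand; what the parser compares is the full URI plus the local name.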
  12. 12. Describing Web Services <ul><li>This (XML, URIs, namespaces + Unicode) gives us enough to describe Web Services </li></ul><ul><li>There are two styles of services on the Web: “RESTful” and “RESTless” </li></ul><ul><ul><li>“RESTful” (usually no SOAP): described with Web Application Description Language (WADL) </li></ul></ul><ul><ul><li>“RESTless” (uses SOAP): described with Web Services Description Language (WSDL) </li></ul></ul><ul><ul><li>Most services you’ll come across in bioinformatics are the latter… but that might change </li></ul></ul>
  13. 13. WSDL, WuzzDuLL, MiserabuL… <ul><li>WSDL </li></ul><ul><li>Describes inputs and outputs, tells us how to interact with a service </li></ul><ul><li>Registries of Web Services, like myGrid and BioMOBY, use these WSDLs to build their index </li></ul><ul><li>But it’s really difficult to find services based on information in their WSDL </li></ul><ul><ul><li>Poor metadata, e.g. input type “string”, names “in0”, “in1”, “out1” etc </li></ul></ul><ul><ul><li>Often auto-generated by tools, not humans </li></ul></ul><ul><ul><li>No constraints on what can be said </li></ul></ul><ul><ul><li>Machine readable but not very human readable </li></ul></ul>
  14. 14. RDF and OWL: Adding metadata and semantics <ul><li>Web Ontology Language (OWL) </li></ul><ul><li>Resource Description Framework (RDF) (M&S stands for model and syntax) </li></ul><ul><li>RDF Schema (RDFS) </li></ul>
  15. 15. RDF and RDF schema <ul><li>RDF is just triples of (subject, verb, object). We can say things about services like </li></ul><ul><ul><li>InterProScan isA service </li></ul></ul><ul><ul><li>InterProScan isA protein_domains_identifier </li></ul></ul><ul><ul><li>InterProScan hasInput protein_sequence </li></ul></ul><ul><ul><li>InterProScan hasOutput InterProScan_report </li></ul></ul><ul><li>The idea is simple … </li></ul><ul><li>… but unfortunately the specifications and syntax are horrible to read and write </li></ul><ul><ul><li>But see by Aaron Swartz </li></ul></ul><ul><ul><li>RDF Schema gives us “templates” for RDF </li></ul></ul>
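The triple idea itself needs no special syntax. As a minimal sketch, assuming plain Python tuples stand in for RDF statements (this is not an RDF library and uses no real RDF serialisation), the four statements from the slide can be stored and queried like this; `match` is a hypothetical helper.

```python
# Each statement is a (subject, verb, object) tuple, as on the slide.
TRIPLES = {
    ("InterProScan", "isA", "service"),
    ("InterProScan", "isA", "protein_domains_identifier"),
    ("InterProScan", "hasInput", "protein_sequence"),
    ("InterProScan", "hasOutput", "InterProScan_report"),
}


def match(subject=None, verb=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return sorted(
        t
        for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (verb is None or t[1] == verb)
        and (obj is None or t[2] == obj)
    )


# "Which services take a protein sequence as input?"
print(match(verb="hasInput", obj="protein_sequence"))
# "What do we know about InterProScan?"
print(match(subject="InterProScan"))
```

Leaving a field as `None` turns it into a wildcard, which is the basic shape of a query over a set of triples.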
  16. 16. <ul><li>A registry of annotated Web Services: </li></ul><ul><li>BioMOBY has three ontologies </li></ul><ul><ul><li>Namespace e.g. genbank </li></ul></ul><ul><ul><li>Object e.g. protein_sequence (inputs and outputs) </li></ul></ul><ul><ul><li>Services (tasks e.g. alignment ) </li></ul></ul><ul><ul><li>And an API too, which lets users add terms to ontology when they register services </li></ul></ul><ul><ul><li>Everything in BioMOBY is annotated (unlike myGrid and myExperiment) </li></ul></ul><ul><ul><li>Ontologies and Services are available from: </li></ul></ul><ul><ul><ul><li> </li></ul></ul></ul><ul><ul><ul><li> </li></ul></ul></ul><ul><ul><ul><li> </li></ul></ul></ul><ul><ul><ul><li> </li></ul></ul></ul>
  17. 17. myGrid Taverna <ul><li>myGrid has a registry of services </li></ul><ul><ul><li>Many aren’t annotated </li></ul></ul><ul><ul><li>… but arbitrary services can be added, not just BioMOBY </li></ul></ul><ul><ul><li>Lovingly curated by Franck Tanoh and Katy Wolstencroft </li></ul></ul><ul><ul><li>Using a single ontology </li></ul></ul><ul><li>Accessible from Taverna workflow engine </li></ul><ul><li>myGrid makes a bit more use of OWL but not much </li></ul>
  18. 18. Web Ontology Language (OWL) <ul><li>RDF and RDFS provide limited capabilities for reasoning </li></ul><ul><ul><li>All men are mortal </li></ul></ul><ul><ul><li>Socrates is a man </li></ul></ul><ul><ul><li>-------------------------------------- </li></ul></ul><ul><ul><li>Therefore Socrates is mortal </li></ul></ul><ul><li>Do this using deductive reasoners like FaCT++, Pellet, KAON2 etc </li></ul><ul><li>Ulrike Sattler’s list of reasoners </li></ul><ul><ul><li> </li></ul></ul>
  19. 19. What can a reasoner do? <ul><li>Subsumption: check that knowledge is correct, e.g. all protein_sequences are biological_sequences </li></ul><ul><li>Equivalence: check that knowledge is minimally redundant, e.g. SLPI and WAP4 are synonyms for “Secretory leukocyte protease inhibitor” </li></ul><ul><li>Consistency: check that knowledge is meaningful and no contradictions are made, e.g. SLPI can’t be both a DNA_sequence and a protein_sequence because these are disjoint classes </li></ul><ul><li>Instantiation: check whether an individual is an instance of a class, e.g. is myProtein an instance of SLPI? </li></ul><ul><li>If you have used Protégé, you have used a reasoner </li></ul>
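Three of these checks can be imitated with a toy class hierarchy in Python. This is only a sketch of the idea and not how description-logic reasoners such as FaCT++ or Pellet work: it assumes each class has at most one parent, that the hierarchy has no cycles, and that SLPI is modelled as a subclass of protein_sequence. All function names and the individual `brokenSeq` are hypothetical.

```python
# Toy knowledge base (all modelling choices here are assumptions).
SUBCLASS_OF = {
    "protein_sequence": "biological_sequence",
    "DNA_sequence": "biological_sequence",
    "SLPI": "protein_sequence",
}
DISJOINT = [{"DNA_sequence", "protein_sequence"}]
INSTANCE_OF = {
    "myProtein": {"SLPI"},
    "brokenSeq": {"DNA_sequence", "protein_sequence"},
}


def ancestors(cls):
    """Return cls followed by every class that subsumes it."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subsumed_by(sub, sup):
    """Subsumption: is every member of `sub` also a member of `sup`?"""
    return sup in ancestors(sub)


def inferred_classes(individual):
    """Instantiation: asserted classes plus everything they imply."""
    classes = set()
    for cls in INSTANCE_OF.get(individual, ()):
        classes.update(ancestors(cls))
    return classes


def is_consistent(individual):
    """Consistency: no individual may belong to two disjoint classes."""
    classes = inferred_classes(individual)
    return not any(pair <= classes for pair in DISJOINT)


print(is_subsumed_by("protein_sequence", "biological_sequence"))
print("protein_sequence" in inferred_classes("myProtein"))
print(is_consistent("myProtein"), is_consistent("brokenSeq"))
```

The point of the sketch is that facts such as “myProtein is a biological_sequence” are never stated directly; they follow from the hierarchy.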
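  20. 20. Semantic Web Services in a nutshell <table><tr><th></th><th>BioMOBY</th><th>myGrid</th><th>Taverna 2 / myExperiment</th></tr><tr><th>API?</th><td>yes</td><td>no</td><td>Maybe?</td></tr><tr><th>Reasoning / semantics</th><td>No no no!</td><td>A bit</td><td>possibly?</td></tr><tr><th>Community participation</th><td>Yes Yes Yes!</td><td>Kind of</td><td>yes</td></tr><tr><th>Metadata</th><td>Lots of user generated metadata</td><td>Bit of an afterthought</td><td>User driven</td></tr></table>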
  21. 21. <ul><li>Getting large quantities of high-quality metadata about services is time-consuming and expensive… </li></ul><ul><li>Many new web applications rely on users to provide metadata for them </li></ul><ul><ul><li>E.g. flickr, myspace, facebook, delicious etc </li></ul></ul><ul><li>People annotate services by uploading collections of services and workflows </li></ul><ul><li>They can also “tag” them </li></ul>
  22. 22. Conclusions <ul><li>We really need standard metadata to describe and find services </li></ul><ul><li>Standards are boring but important </li></ul><ul><li>You’re unlikely to win a Nobel prize for creating or using one… </li></ul><ul><ul><li>But science can’t work without them </li></ul></ul><ul><ul><li>Especially “data-driven” rather than “hypothesis-driven” Science </li></ul></ul><ul><li>We’ve looked at semantic web standards for describing Web Services, using InterProScan as an example </li></ul><ul><ul><li>And myGrid, BioMOBY and myExperiment too </li></ul></ul><ul><ul><li>But didn’t talk about DAS / BioDAS </li></ul></ul><ul><ul><li>Thanks for listening </li></ul></ul>
  23. 23. Acknowledgements and References <ul><li>Thanks to everyone I robbed stuff off: </li></ul><ul><ul><li>Carole Goble, Homer Simpson, David De Roure, Tim Bray, Mark Butler, Stian Soiland, Katy Wolstencroft, Franck Tanoh, Rod Page, Mark Wilkinson, myGrid team, myExperiment team, Ian Horrocks, Ulrike Sattler, Tim Berners-Lee, Ora Lassila, Jim Hendler, Steve Pettifer, Douglas Kell, IETF, W3C etc </li></ul></ul><ul><li>These slides are also available at </li></ul><ul><li>See Also: </li></ul><ul><ul><li>This talk is mostly about semantics rather than web services: see also “Web of Science - REST or SOAP?” at </li></ul></ul><ul><ul><li>BioMOBY </li></ul></ul><ul><ul><li>myGrid Ontology </li></ul></ul><ul><ul><li>Taverna workflow </li></ul></ul><ul><ul><li>myExperiment: social networking for workflow-using e-scientists (Goble and De Roure) </li></ul></ul><ul><li>Questions? </li></ul>