Realizing Semantic Web - Light Weight semantics and beyond

The talk "Realizing Semantic Web - Light Weight semantics and beyond" was given by Prof. T. K. Prasad at the ICMSE-MGI Digital Data Workshop, held at the Kno.e.sis Center on November 13-14, 2013. The talk emphasized an annotation and search framework.

Workshop page: http://wiki.knoesis.org/index.php/ICMSE-MGI_Digital_Data_Workshop

Published in: Education, Technology

  1. Realizing Semantic Web: Lightweight Semantics and Beyond
     Krishnaprasad Thirunarayan (T. K. Prasad)
     Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
     Wright State University, Dayton, OH-45435
  2. Outline
     • Domain Goals and Challenges
     • Cyberinfrastructure Investments in Science
     • Utility and Continuum of Machine-Processable Semantics: An Architecture
     • What?: Nature of Data and Granularity of Semantics
     • Why?: Lightweight semantics and its benefits
     • How?: Community-ratified Ontologies + Semantic Annotations of Data and Documents + Linked Open Materials Data
     • Research: Processing Tabular Data
  3. Domain Goals and Challenges
     • Sharing, discovery, and application of materials science and engineering data and information are possible only if domain scientists are able and willing to share.
     • Technological challenges
       – Computational tools and repositories conducive to easy exchange, curation, attribution, and analysis of data
     • Cultural challenges
       – Proper protection, control, and credit for sharing data
  4. Category of Geoscience Data: Characteristics, Strategy for Reuse, CI Strategy
     • Short tail science data created by large organizations and projects
       – Characteristics: Few, large (TB+), structured, spatially rich (e.g., remote sensing), largely homogeneous, highly visible, curated
       – Strategy for Reuse: Planned integration strategies, could use formal ontologies / domain models and vocabularies, visualization tools and APIs
       – CI Strategy: Data centers / grids generally using relational databases and files, maintained by people with significant IT skills
     • Long tail science data created by individual scientists and small groups
       – Characteristics: Many, small (GB+), heterogeneous, invisible (except via publications), poorly curated
       – Strategy for Reuse: Multi-domain and broad vocabularies (including community established ones), create semantic metadata (annotations) and optionally publish, search and download legacy data, or use an open data initiative
       – CI Strategy: Web-based easy to learn and use semantic tools for annotation, publication, search and download that can be used by individual scientists without significant IT skills
  5. Our Thesis
     Associating machine-processable semantics with materials science and engineering data and documents can help overcome challenges associated with data discovery, integration and interoperability caused by data heterogeneity.
  6. What?: Nature of Data and Documents
     • Structured Data (e.g., relational)
     • Semi-structured, Heterogeneous Documents (e.g., publications and technical specs usually include text, numerics, units of measure, images and equations)
     • Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)
  7. Fragment of Materials and Process spec for Ti Alloy Bars, Wire, Forgings, and Rings.
  8. What?: Granularity of Semantics and Applications: Examples
     • Synonyms
       – Chemistry, Chemical Composition, Chemical Analysis, ...
       – Bend Test, Bending, ...
       – Delivery Condition, Process/Surface Finish, Temper, "as received by purchaser", ...
     • Coreference vs broadening/narrowing
       – Tubing vs welded tubing vs flash-welded part
     • Capturing characteristic-value pairs
       – Recognize and Normalize: “0.1 inch and under in nominal thickness” is translated to “Thickness <= 0.1 in”.
       – Glean elided characteristic: controlled term “solution heat treated” implies the characteristic “heat treat type”.
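The "Recognize and Normalize" and "Glean elided characteristic" steps on this slide can be illustrated with a small sketch. It is not the tool shown in the talk: the regular expression, the phrase tables, and the names `Constraint`, `normalize`, `render`, and `implied_characteristic` are all assumptions made for illustration, and the pattern only covers phrases shaped like the slide's example.

```python
import re
from typing import NamedTuple, Optional


class Constraint(NamedTuple):
    characteristic: str
    operator: str
    value: float
    unit: str


# Hypothetical lookup tables; a real system would draw these from a
# community-ratified vocabulary rather than hard-code them.
COMPARATORS = {"and under": "<=", "and over": ">="}
UNITS = {"inch": "in", "inches": "in", "in": "in", "mm": "mm"}
IMPLIED = {"solution heat treated": "heat treat type"}

PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>inch(?:es)?|in|mm)\s+"
    r"(?P<cmp>and under|and over)\s+in\s+(?:nominal\s+)?(?P<char>\w+)",
    re.IGNORECASE,
)


def normalize(phrase: str) -> Optional[Constraint]:
    """Turn a phrase such as '0.1 inch and under in nominal thickness'
    into a characteristic/operator/value/unit tuple, or None if no match."""
    m = PATTERN.search(phrase)
    if m is None:
        return None
    return Constraint(
        characteristic=m.group("char").capitalize(),
        operator=COMPARATORS[m.group("cmp").lower()],
        value=float(m.group("value")),
        unit=UNITS[m.group("unit").lower()],
    )


def render(c: Constraint) -> str:
    """Format a constraint in the 'Thickness <= 0.1 in' style of the slide."""
    return f"{c.characteristic} {c.operator} {c.value:g} {c.unit}"


def implied_characteristic(term: str) -> Optional[str]:
    """Look up the characteristic that a controlled term leaves unstated."""
    return IMPLIED.get(term.lower())


print(render(normalize("0.1 inch and under in nominal thickness")))
print(implied_characteristic("solution heat treated"))
```

Keeping the phrase tables separate from the pattern means domain experts could extend the vocabulary without touching the matching logic, which fits the talk's emphasis on ease of use.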
  9. What?: Granularity of Semantics and Associated Applications
     • Lightweight semantics: File and document-level annotation to enable discovery and sharing
     • Richer semantics: Data-level annotation and extraction for semantic search and summarization
     • Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Materials Science Data
  10. Computer Assisted Document Extraction Tool
      • Typical view of the tagged Spec
      • Tree/Structure view of the Spec
  11. Computer Assisted Document Extraction Tool
      • Tag Editor
      • Few More Examples: Procedure, Melt Methods
      • View of the Original Spec
      • Tagged Spec
  12. Computer Assisted Document Extraction Tool
      • Tag Editor
      • Few More Examples: Procedure, Melt Methods
      • The SDL
  13. Why?: Benefits of Lightweight Semantics
      • Ease of use by domain experts
        – Faster and wider adoption, promoting evolution
      • Low upfront cost to support
      • Shallow semantics has wider applicability to a range of documents/data and appeals to a broader community of geoscientists
      • Bottom line: “Learn to Walk before we Run”
  14. How?: Using Semantic Web Technologies
      Machine-processable semantics is achieved by addressing:
      • Syntactic Heterogeneity: Using XML syntax and the RDF data model (labelled graph structure)
      • Semantic Heterogeneity:
        – Using “common” controlled vocabularies, taxonomies and ontologies
        – Using federated data sources, exchanges, querying, and services
  15. How?: Ingredients for Semantics-based Cyber Infrastructure
      • Use of community-ratified controlled vocabularies and lightweight ontologies (upper-level, hierarchies)
      • Ease registration, publishing, and discovery
      • Provide support for provenance and access control
      • Track data citation for credit for data sharing
      • Semi-automatic annotation of data and documents: Manual + Automatic
  16. How?: Search Continuum
      • Keyword-based full-text search
      • + Manually provided content and source metadata
        – Uses upper-level ontology
      • + Automatically extracted metadata
        – Map text to concepts/properties/values
        – Semantic + faceted search using background knowledge
      • + Deeper semi-automatic content annotation and extraction
        – Aggregating related pieces of information; conditioning
        – Integration and Interoperation
      • + Linked Open Material Science Data
      • + Federated and Faceted Querying and Services
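The first steps of this continuum, from keyword search to search narrowed by metadata facets, can be sketched as follows. The documents, facet names, and the functions `search` and `facet_counts` are invented for illustration and are not taken from the system described in the talk.

```python
from collections import Counter

# Hypothetical annotated documents: free text plus manually or
# automatically provided metadata ("facets").
DOCS = [
    {"id": "doc1", "text": "Bend test requirements for titanium alloy bars",
     "facets": {"form": "bar", "test": "bend"}},
    {"id": "doc2", "text": "Chemical composition of titanium alloy wire",
     "facets": {"form": "wire"}},
    {"id": "doc3", "text": "Bend test procedure for steel wire",
     "facets": {"form": "wire", "test": "bend"}},
]


def search(docs, keyword=None, **facets):
    """Keyword-based full-text match, optionally narrowed by facet values."""
    hits = []
    for doc in docs:
        if keyword and keyword.lower() not in doc["text"].lower():
            continue
        if all(doc["facets"].get(name) == value for name, value in facets.items()):
            hits.append(doc)
    return hits


def facet_counts(docs, facet):
    """Count how many of the given documents carry each value of one facet."""
    return Counter(doc["facets"][facet] for doc in docs if facet in doc["facets"])


print([d["id"] for d in search(DOCS, "titanium")])
print([d["id"] for d in search(DOCS, "titanium", form="wire")])
print(facet_counts(search(DOCS, "bend"), "form"))
```

The keyword step alone cannot separate bars from wire; the facet filter can, but only because someone or something attached the `form` metadata first, which is the point of the annotation steps in the list.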
  17. Linked Open Data – Why do we need data?
  18. Linked Open Data – Just data is not enough
      • More and more data are available, but isolated islands of data are not enough, akin to a web of documents without hyperlinks.
        [Figure: isolated data sets A to F]
      • Need to interlink data over the web to enable content-rich applications.
        [Figure: Linked Data, with data sets A to F interlinked]
  19. Linked Open Data – A Realization
      [Figure: RDF graph linking an example data set to DBpedia. Labels recovered from the figure:]
      • Resources: http://ex./John_Kennedy, http://ex./A_Nation_of_Immigrants, http://ex./non-fiction, http://dbpedia../John_F._Kennedy, http://dbpedia../politician, http://dbpedia../Massachusetts, http://dbpedia../Boston, http://dbpedia../United_States
      • Properties: owl:sameAs, http://ex./AuthoredBook, http://ex./publishedIn, http://ex./genre, http://dbpedia../Profession, http://dbpedia../BirthDate, http://dbpedia../BirthPlace, http://dbpedia../Capital, http://dbpedia../Country
      • Literals: 1964, 1917-05-29
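To show why the owl:sameAs link matters, here is a minimal sketch that stores the slide's statements as subject/predicate/object triples and follows owl:sameAs when describing a resource. The edge assignments are an assumed reading of the figure, the `ex:` and `dbpedia:` prefixes stand in for the truncated URIs, and `describe` is a hypothetical helper rather than part of any RDF library.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

# Assumed reading of the figure: one small local data set ...
EX = [
    ("ex:John_Kennedy", "ex:AuthoredBook", "ex:A_Nation_of_Immigrants"),
    ("ex:A_Nation_of_Immigrants", "ex:publishedIn", "1964"),
    ("ex:A_Nation_of_Immigrants", "ex:genre", "ex:non-fiction"),
    ("ex:John_Kennedy", SAME_AS, "dbpedia:John_F._Kennedy"),
]
# ... and a fragment of DBpedia about the same person.
DBPEDIA = [
    ("dbpedia:John_F._Kennedy", "dbpedia:Profession", "dbpedia:politician"),
    ("dbpedia:John_F._Kennedy", "dbpedia:BirthDate", "1917-05-29"),
    ("dbpedia:John_F._Kennedy", "dbpedia:BirthPlace", "dbpedia:Massachusetts"),
    ("dbpedia:Massachusetts", "dbpedia:Capital", "dbpedia:Boston"),
    ("dbpedia:Massachusetts", "dbpedia:Country", "dbpedia:United_States"),
]


def describe(resource, triples):
    """Collect the property values of `resource` and of every resource
    connected to it through owl:sameAs (in either direction)."""
    aliases = {resource}
    changed = True
    while changed:  # grow the alias set until no new sameAs link applies
        changed = False
        for s, p, o in triples:
            if p == SAME_AS and (s in aliases) != (o in aliases):
                aliases |= {s, o}
                changed = True
    facts = defaultdict(set)
    for s, p, o in triples:
        if s in aliases and p != SAME_AS:
            facts[p].add(o)
    return dict(facts)


for prop, values in sorted(describe("ex:John_Kennedy", EX + DBPEDIA).items()):
    print(prop, sorted(values))
```

With only the local data set, the author has no birth date or profession; once the two sets are merged, the single owl:sameAs statement makes the DBpedia facts reachable from the local identifier.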
  20. Linked Open Data
      “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
  21. Example: Lightweight Semantic Registration of Data
      • Title of data
      • Type of data: selected from five tier vocabulary provided
      • Keywords: maps, excel files, images, text
      • Data format: structured or unstructured
      • Description of data: brief unstructured description of content
      • Contact information of provider(s): name of provider(s), email for verification, lineage
      • Spatial extent of data and reference system: location
      • Temporal extent of data: date range in time or age range if not recent
      • Date and type of Related Publication(s): Journal, Thesis, Agency report, not published
      • Host site for publication: Journal, Library, Personal computer
      • Access restrictions: copyright regulations
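A registration form like this one maps naturally onto a small record type. The sketch below is one possible shape, not the schema used by the system in the talk: the class name `Registration`, the choice of required fields, the `DATA_TYPES` set standing in for the controlled vocabulary, and the validation rules are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Hypothetical stand-in for the controlled "type of data" vocabulary.
DATA_TYPES = {"map", "spreadsheet", "image", "text"}


@dataclass
class Registration:
    # Assumed to be required.
    title: str
    data_type: str
    description: str
    provider_name: str
    provider_email: str
    # Assumed to be optional.
    keywords: List[str] = field(default_factory=list)
    data_format: Optional[str] = None
    spatial_extent: Optional[str] = None
    temporal_extent: Optional[str] = None
    related_publication: Optional[str] = None
    host_site: Optional[str] = None
    access_restrictions: Optional[str] = None

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the record passes."""
        issues = []
        if not self.title.strip():
            issues.append("title is empty")
        if self.data_type not in DATA_TYPES:
            issues.append(f"unknown data type: {self.data_type}")
        if "@" not in self.provider_email:
            issues.append("provider email looks invalid")
        return issues

    def to_json(self) -> str:
        """Serialize the record so it can be published or indexed."""
        return json.dumps(asdict(self), sort_keys=True)


record = Registration(
    title="Example data set",
    data_type="spreadsheet",
    description="Brief unstructured description of content",
    provider_name="A. Provider",
    provider_email="provider@example.org",
    keywords=["titanium"],
)
print(record.problems())
print(record.to_json())
```

Only a handful of fields are checked, in keeping with the lightweight approach: the record is cheap to fill in, yet the controlled `data_type` already supports faceted discovery.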
  22. System Architecture and Components
  23. Deeper Issues: Semantic Formalization of Tabular Data
      Problems and a Practical Approach (“When rubber meets the road”)
  24. Nature of tables
      • Compact structures for sharing information
        – Minimize duplication
      • Types of Tables
        – Regular: Dense Grid with explicit schema information in terms of column and row headings => Tractable
        – Irregular: Sparse Grid with implicit schema and ad hoc placement of heading => Hard
  26. Challenges Associated with Typical Spreadsheet/Table
      • Meant for human consumption
      • Irregular
        – Not simple rectangular grid
      • Heterogeneous
        – All rows not interpreted similarly
      • Complex
        – Meaning of each row and each column context dependent
      • Footnotes modify meaning of entries (esp. in materials and process specifications)
  27. Practical Semi-Automatic Content Extraction
      • DESIGN: Develop regular data structures that can be used to formalize tabular information.
        – Provide a natural expression of data
        – Provide semantics to data, thereby removing potential ambiguities
        – Enable automatic translation
      • USE: Manual population of regular tables and automatic translation into LOD
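The "USE" step, translating a manually populated regular table into Linked Open Data, could look roughly like the sketch below. It assumes the regular table is a CSV with one header row and one key column, emits N-Triples-style lines with every cell as a plain string literal, and uses a made-up `http://example.org/materials/` namespace; the function name `table_to_ntriples` and the sample rows are hypothetical too.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/materials/"  # hypothetical namespace


def table_to_ntriples(csv_text, key_column):
    """Translate a regular table (explicit header row, one record per row)
    into N-Triples-style lines: the key column names the subject and every
    other non-empty cell becomes a plain string literal."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row[key_column])}>"
        for column, cell in row.items():
            if column == key_column or not cell:
                continue  # skip the key itself and empty cells
            predicate = f"<{BASE}prop/{quote(column)}>"
            escaped = cell.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{escaped}" .')
    return lines


# Made-up rows, only to exercise the translation.
TABLE = "spec_id,form,max_thickness_in\nSPEC-1,bar,0.1\nSPEC-2,wire,\n"

for line in table_to_ntriples(TABLE, key_column="spec_id"):
    print(line)
```

The translation is mechanical precisely because the table is regular: every row is interpreted the same way and the header supplies the schema. Irregular tables and footnotes would need the manual normalization step first.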
  28. Thank you, and please visit us at http://knoesis.org/
      Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
      Wright State University, Dayton, Ohio, USA
