ICIC 2013 Conference Proceedings Uwe Rosemann TIB

Text and Non-textual Objects: Seamless access for scientists
Uwe Rosemann (German National Library of Science and Technology (TIB), Germany)
The European High Level Expert Group on Scientific Data has formulated a vision for the scientific infrastructure to be achieved by 2030: “Our vision is a scientific e-infrastructure that supports seamless access, use, re-use, and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure – a valuable asset, on which science, technology, the economy and society can advance”.

Here, “data” is not restricted to primary data but also includes all non-textual material (graphs, spectra, videos, 3D-objects etc.).

The German National Library of Science and Technology (TIB) has developed a concept for a national competence center for non-textual materials, which is now funded by the German federal government and the federal states. The center's task is to develop solutions and services, together with the scientific community, that make such data available, citable, sharable and usable, including visual search tools and enhanced content-based retrieval.

With solutions such as DataCite and the modular development of tools for extracting, indexing and visually searching new scientific metadata, TIB will take up this challenge and make all data accessible to its users in a fast, convenient and easy-to-use way.
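
To illustrate what "citable" can mean for a dataset registered with a DOI, here is a minimal sketch that assembles a citation string from a handful of metadata fields. The layout is modelled on the "Creator (PublicationYear): Title. Publisher. Identifier" pattern recommended in the DataCite metadata schema documentation; the `Dataset` class, the `format_citation` function, and the DOI `10.1234/example-dataset` are hypothetical names introduced only for this example and are not part of any TIB or DataCite software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    """Hypothetical minimal metadata record for a research dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # DOI name without resolver prefix, e.g. "10.1234/example-dataset"


def format_citation(ds: Dataset) -> str:
    """Build 'Creator (Year): Title. Publisher. Identifier' from the record."""
    creators = "; ".join(ds.creators)
    return f"{creators} ({ds.year}): {ds.title}. {ds.publisher}. https://doi.org/{ds.doi}"


# Hypothetical record; none of these values refer to a real dataset.
example = Dataset(
    creators=["Doe, Jane", "Roe, Richard"],
    year=2013,
    title="Example temperature series",
    publisher="Example Data Centre",
    doi="10.1234/example-dataset",
)
print(format_citation(example))
```

Keeping the DOI as a bare name and adding the resolver prefix only when formatting means the same record can be reused if the display convention changes.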

The paper presents the tools TIB is developing for scientific AV media, 3D objects and research data.
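
For AV media, slide 28 of the deck names luminance, colour histograms and edges as cues for automatic hard-cut detection. The following is a minimal sketch of the histogram cue only, assuming each frame is a flat list of grayscale values from 0 to 255. The function names, the bin count and the threshold are illustrative assumptions, not the method used in TIB's portal.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalised grayscale histogram of one frame (pixel values 0-255)."""
    if not frame:
        raise ValueError("frame must contain at least one pixel")
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_hard_cuts(frames: Sequence[Sequence[int]],
                     threshold: float = 0.5,
                     bins: int = 8) -> List[int]:
    """Return each index i where a cut is assumed between frame i-1 and frame i."""
    hists = [histogram(f, bins) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


# Synthetic frames: two dark frames followed by two bright ones.
dark = [20, 25, 30, 22] * 4
dark_shifted = [21, 26, 29, 23] * 4
bright = [220, 225, 230, 222] * 4
frames = [dark, dark_shifted, bright, bright]
print(detect_hard_cuts(frames))  # [2]
```

A fixed threshold is the simplest possible choice; real footage with fades or fast motion would likely need the additional cues the slide lists.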

  1. Textual and non-textual objects: Seamless access for scientists • Uwe Rosemann • ICIC 2013 Vienna
  2. German National Library of Science and Technology (TIB) • Specialized Library for Architecture, Chemistry, Computer Science, Mathematics, Physics, Engineering and Technology • Financed by Federal Government and all Federal States • Member of the Leibniz Association • Global supplier for scientific and technical information
  3. Global Network TechLib
  4. Customers Germany Europe 10% 71% USA 14% World 5%
  5. Main Services • Provision of scientific content: full texts, document delivery, interlibrary loan • Scientific retrieval: portal GetInfo • Long-term preservation • DOI-Service for research data • Research and development
  6. Changes in the scientific process (Jim Gray, eScience Group, Microsoft Research)
  7. A gap • A widening gap in the scientific record between published research in a text document and the data that underlies it • As a result, datasets are difficult to discover and difficult to access • Scientific information gets lost
  8. Requirements: Politics • Knowledge is power. Europe must manage the digital assets its researchers generate.
  9. "Riding the wave": How Europe can gain from the rising tide of scientific data • Final report of the High Level Expert Group on Scientific Data
  10. Strategy: Move beyond text • Scientific Films • 3D Objects • Software • Simulation • Research Data • Text
  11. Move beyond text: Consequences for TIB • Research communities produce many types of scientific and technical information • Each has its own unique characteristics and life cycle • TIB must become capable of accepting and managing new media formats
  12. Competence Center for Non-textual Materials I • Develop a clear strategy for the use and integration of non-textual materials at the TIB • Systematically collect non-textual materials from research and teaching • Define, integrate and establish technical infrastructure • Define and establish workflows for indexing, cataloguing, digital preservation, DOI names, licensing
  13. Competence Center for Non-textual Materials II • Develop innovative media-specific portals enabled by, e.g., automated video analysis with scene, speech, text and image recognition • Link non-textual materials to other research information such as full texts and research data via the specialist portal GetInfo • Engage in communities, provide support and advice to media providers • TIB will establish its own research capacity
  14. How have we been preparing? • Infrastructure for research data • Visual search tools for AV-media • 3D-Objects • chemOCR
  15. Collaboration: Research Data • In 2005, the TIB became a non-commercial DOI registration agency for research data • In 2010, the TIB became co-founder of the international DataCite consortium to establish easier access to scientific research data on the Internet. Mission: • Citability of research data • High visibility of the data • Easy re-use and verification of the data sets • Increasing quality of published papers
  16. DataCite Members
  17. Example: EHEC virus
  18. Example: EHEC virus
  19. DOI Services • Contracts with 60 data centres • Research Institutes • Universities • Libraries • Publishers • 776,454 DOI registrations • 22,533 up to September 2013
  20. Research data: Further developments • KomFor: Centre of Expertise for Research Data from the "Earth and Environment" project • RADAR: Research Data Repository • Visual Analysis: VisInfo Methods
  21. Numerical data • Time [h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 • T [°C]: 12 13 12 12 13 35 17 11 10 12 13 13 12 12 12 11 11 10 10 11 11 10 12 12
  22. Visual access to research data
  23. How have we been preparing? • Infrastructure for research data • Visual search tools for AV-media • 3D-Objects • chemOCR
  24. TIB's portal for audiovisual media • Project: Development of a portal for audiovisual media • Aim: Improve access to AV-Media • Time: July 2011 – December 2013 • Partner: Hasso-Plattner-Institut für Softwaresystemtechnik GmbH
  25. TIB's portal for audiovisual media • How do I find what I'm looking for in videos? • Today: Manual annotation of the whole video • Metadata: Title • Author • Description • Publisher • Publication year • Rightsholder • …
  26. TIB's portal for audiovisual media • Future: Manual annotation plus content-based information • 1. Speech: Leibniz University Hannover (source: Scorupka, Sascha, Experiment der Woche, 2011) • 2. Visual features, e.g. Indoor, Experiment, Technology • 3. Textual information • 4. Structural information: Scenes, Shots, Segments
  27. TIB's portal for audiovisual media • Media analysis process • Upload
  28. TIB's portal for audiovisual media • Scene recognition: Hard cut • Automatic cut detection → luminance / contrast → colour distribution / colour histogram → edges • Source: Kopf, S. Computergestützte Inhaltsanalyse von digitalen Videoarchiven, Mannheim, 2006
  29. TIB's portal for audiovisual media • Automatic speech recognition • Example transcript: "this work is copy right ed nine teen thirty six" • Quality of results depends on • quality of the speaker • dialects • background noises • voice overlaps • Source: Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering
  30. TIB's portal for audiovisual media • Intelligent Character Recognition (ICR) • Character/Logo Detection • Character Filtering • Character Recognition
  31. TIB's portal for audiovisual media • Automated analysis: Image recognition • Method of analysis: Image recognition • Interview, experiment, animation, lecture • Extracted data is converted into text
  32. TIB's portal for audiovisual media • Keyframes • Annotation • Machine learning using visual features • Visual Concepts: Graphical: Animation • Graphical: Drawing • Graphical: Diagram • Real: Outdoor • Real: Indoor • Real: Lecture / Conference • Real: Interview • Real: Buildings • ...
  33. TIB's portal for audiovisual media
  34. How have we been preparing? • Infrastructure for research data • Visual search tools for AV-media • 3D Objects • chemOCR
  35. 3D Objects: an excursion to Architecture
  36. Visual search tools • visual search • content based indexing
  37. Content based indexing • segmentation with form-primitives • extraction of room connectivity graphs
  38. Visual search • attributed graph • 3D sketch • result visualization
  39. Further developments
  40. How have we been preparing? • Infrastructure for research data • Visual search tools for AV-media • 3D Objects • chemOCR
  41. Information retrieval in Chemistry • Search for chemical structures: how? • Chemists are used to drawing ?
  42. Textual and non-textual chemical information • Table with reaction scheme • Chemical Names • Linked entities from the table • 2a-i: Derivatives from the reaction • Chemical structure • Reaction scheme
  43. Non-textual data processing: chemOCR • image data • CLiDE • chemical structure data • chemOCR
  44. Information retrieval in chemistry • Text AND formulas
  45. Further subjects • Open Science Lab • Ontology
  46. Conclusion • Dissemination of scientific and technical information has been a foundational mission. The methods have completely changed, but the mission remains the same.
  47. Conclusion • Ultimate Goal: Interlinking and Search Across All Types of Digital Assets.
  48. GetInfo: Portal for Science and Technology • 58 million metadata records in the internal index • 390 million metadata records in external sources • 900,000 PDF full texts • Data, AV-Media, 3D Objects
  49. Development of media-specific portals • Provision • Portal for audiovisual Media • Probado 3D
  50. Questions?
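
Slides 21 and 22 contrast a raw table of hourly temperature readings with visual access to the same data. The sketch below illustrates that idea with the values from slide 21: it renders a plain-text bar chart and reports the hour of the highest reading, which is hard to spot in the table. The function names and the scaling are illustrative assumptions; the slides do not describe how VisInfo itself works.

```python
from typing import List, Sequence, Tuple

# Hourly readings as listed on slide 21 (time in hours, temperature in °C).
HOURS = list(range(1, 25))
TEMPS_C = [12, 13, 12, 12, 13, 35, 17, 11, 10, 12, 13, 13,
           12, 12, 12, 11, 11, 10, 10, 11, 11, 10, 12, 12]


def peak(hours: Sequence[int], temps: Sequence[int]) -> Tuple[int, int]:
    """Return (hour, temperature) of the first maximum reading."""
    i = max(range(len(temps)), key=temps.__getitem__)
    return hours[i], temps[i]


def bar_chart(hours: Sequence[int], temps: Sequence[int], width: int = 35) -> List[str]:
    """One text line per reading; bar length is proportional to the value."""
    top = max(temps)
    return [f"{h:2d} h | {'#' * round(t * width / top)} {t} °C"
            for h, t in zip(hours, temps)]


print("\n".join(bar_chart(HOURS, TEMPS_C)))
print("Peak reading:", peak(HOURS, TEMPS_C))
```

In the rendered chart the spike at hour 6 stands out immediately, which is the point the two slides make about visual access.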
