Life Science Database Cross Search and Metadata

Life science databases are sometimes difficult to understand because they lack descriptive information. I'd like to add metadata to databases to improve search results.

Published in: Technology

  1. Life Science Database Cross Search and Metadata
     Maori Ito @ NIBIO
  2. Database integration: a collaboration among four ministries with NBDC
     • Database Catalog
     • Life Science Database Cross Search
     • Life Science Database Archive
     • Database Reconstructive Integration
  3. Why cross search?
     • Easy to use
     • Familiar to users
     • Well suited to comparing many kinds of databases
  4. Sagace: Search for Biomedical Data & Resources in Japan
  5. A bad, skeptical reputation for the search results…
     • Useless…
     • Slow…
     • What is the advantage?
  6. What is the most important thing in cross search?
  7. The simple answer: speed and accuracy
  8. Mechanism of a search engine
     1) Crawling
     2) Indexing
     3) Query processing
     4) Scoring
  9. Crawling
     • A program (the crawler) automatically fetches pages from the databases
  10. Indexing
      • Split the fetched data into convenient units and store them on our own server (diagram: external data → internal server)
  11. Query processing and scoring
  12. The case of Hyper Estraier (search system)
      • NIBIO, AgriTogo, and NBDC / DBCLS (MEDALS architecture) collaborate using P2P
      • JCGGDB: under consideration
  13. Back to the simple answers: what to improve
      • Speed (thanks to Johan-san, Mizuguchi-san, and many collaborators)
        1) Relax the access limits on DBCLS (with a little ingenuity in the CSS and images)
      • Accuracy (diagram: NIBIO, NBDC / DBCLS)
  14. How can we improve accuracy?
      • What does accuracy mean for life science database cross search?
      • What does accuracy mean for life science specialists?
  15. • In general, developers focus on search algorithms and scoring.
      • However, general-purpose cross-search methods and results may not suit life science specialists.
      • Data (index files) from life science databases are sometimes hard to understand at a glance.
      • Writing and maintaining a separate crawler program for each database is hard.
      • (We have no extra … to build proper search pages like Entrez et al.…)
  16. To improve accuracy
      • Manually select databases
      • Assign weights to crawled databases to improve the ranking
  17. Metadata!
      • One way to solve these problems, such as data that is difficult to understand at a glance
  18. If metadata are added to the data…
      • Disease: Epithelial adenoma
      • Species: Mouse
      • Keywords: DNA sequence
      • Last Modified: 2013-01-19
  19. Easy for users to understand
      • Metadata can guide users and improve the user experience.
  20. Easy for crawlers to understand
      • Crawlers can read the same metadata (Disease: Epithelial adenoma; Species: Mouse; Keywords: DNA sequence; Last Modified: 2013-01-19)
  21. How to use it?
      • Mark up data with microdata, like tags
      • Example (image): Title, ID, and Last Modified marked up on http://www.pdbj.org/emnavi/emnavi_detail.php?id=1556&lang=en
  22. Is this a practical suggestion?
      • Google, Yahoo!, and Bing decided to use microdata to make search results more valuable.
      • Some vocabularies have already been applied to search results.
      • E.g. (image)
  23. Schema.org
      • Provides a collection of schemas (HTML tags)
      • “Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages.” (quoted from schema.org)
      • We proposed schema.org extensions for “BiologicalDatabaseEntry” and “Biological Database”.
      • Schema.org proposals: http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals
  24. Properties for BiologicalDatabaseEntry: entryID, additionalType, dateCreated, isEntryof, description, dateModified, taxon, image, keywords, seeAlso, url, provider, reference, alternativeHeadline, breadcrumb, name, inLanguage
  25. Related links for our proposal
      • WebSchemas proposal ‘Biological Databases’ for schema.org: http://www.w3.org/wiki/WebSchemas/BioDatabases
      • Discussions at BioHackathon: https://github.com/dbcls/bh12/wiki/Schema.org-extension
      • Discussions at BH12.12 (Japanese only): http://wiki.lifesciencedb.jp/mw/index.php/BH12.12/schema.org
  26. How to mark up?
      • Declaration: open the item with itemscope and itemtype
      • Specify each property with itemprop and mark it up with a normal tag
        <div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
          ID <span itemprop="entryID">1556</span>
          Species <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
            <span itemprop="name">Bacillus subtilis</span>
          </span>
          Deposition: <span itemprop="dateCreated">2008-09-08</span>
          Last update: <span itemprop="dateModified">2012-10-24</span>
        </div>
  27. And then
      • Crawl this microdata (at present)
      • Reflect it in the search results (within the fiscal year; in preparation)
  28. Asking for your help
      • If this approach gains traction, it may have a chance of being reflected in major search engines.
      • Please mark up your own site or database and give me feedback.
      • If you have any suggestions or comments, please let me know.
  29. Future perspective
      • Continue to focus on accuracy
      • Microdata
        – Discuss with many scientists and finalize the proposed schema.org extension
        – Increase the number of databases
        – Build support tools for microdata markup
      • Add appropriate data from high-quality databases
  30. Thank you for listening!
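Slide 9 describes crawling only at a high level. The sketch below is a minimal breadth-first crawler, assuming a hypothetical in-memory `fetch` callable in place of real HTTP requests so that it runs offline; the `db.example.org` URLs and the function names are made up for illustration. A production crawler would also need politeness delays, robots.txt handling, and per-database parsing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that stays on the start URL's host.

    `fetch(url)` returns the page's HTML as a string, or None if the page
    cannot be retrieved (hypothetical interface for this sketch).
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            target = urljoin(url, href)
            # Follow only unseen links on the same database host.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Hypothetical in-memory "database site" so the example runs offline.
SITE = {
    "http://db.example.org/": '<a href="/entry/1">1</a> <a href="/entry/2">2</a>',
    "http://db.example.org/entry/1": '<a href="/">home</a>',
    "http://db.example.org/entry/2": '<a href="http://other.example.com/">x</a>',
}

pages = crawl("http://db.example.org/", SITE.get)
print(sorted(pages))
# ['http://db.example.org/', 'http://db.example.org/entry/1', 'http://db.example.org/entry/2']
```

Restricting the crawl to one host keeps the crawler inside a single database site; a cross-search service would run one such crawl per registered database.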
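Slides 10 and 11 name indexing, query processing, and scoring without details. As an illustration only, here is an inverted index with a simple TF-IDF score. This is a generic textbook technique swapped in for clarity, not Hyper Estraier's actual scoring, and the document IDs and texts are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word tokens; real life-science text needs smarter rules."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Indexing: map each term to {doc_id: term frequency}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            postings[term][doc_id] = tf
    return postings


def search(query, postings, n_docs):
    """Query processing and scoring: sum tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        matches = postings.get(term, {})
        if not matches:
            continue
        # Smoothed IDF: rarer terms contribute more to the score.
        idf = math.log(n_docs / len(matches)) + 1.0
        for doc_id, tf in matches.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc_id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "e1": "Mouse DNA sequence",
    "e2": "Human protein structure",
    "e3": "Mouse epithelial adenoma in mouse",
}
index = build_index(docs)
ranking = search("mouse adenoma", index, len(docs))
print([doc_id for doc_id, _ in ranking])  # ['e3', 'e1']
```

Entry e3 ranks first because it contains both query terms, and "adenoma" is rarer than "mouse", so it carries a higher weight.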
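Slide 16 proposes manually selecting databases and assigning weights to crawled databases to improve the ranking. One hypothetical way to do this is to multiply each result's score by a per-database weight and drop results from databases that were not selected. The database names and weight values below are illustrative assumptions, not Sagace's actual settings.

```python
def rerank(results, db_of, weights, selected=None, default_weight=1.0):
    """Apply per-database weights to (doc_id, score) pairs.

    results:  list of (doc_id, score) from the search engine
    db_of:    doc_id -> name of the database the document was crawled from
    weights:  database name -> score multiplier (hypothetical values)
    selected: optional set of manually selected databases; others are dropped
    """
    reranked = []
    for doc_id, score in results:
        db = db_of[doc_id]
        if selected is not None and db not in selected:
            continue
        reranked.append((doc_id, score * weights.get(db, default_weight)))
    return sorted(reranked, key=lambda kv: (-kv[1], kv[0]))


results = [("a", 2.0), ("b", 1.5), ("c", 1.0)]
db_of = {"a": "DB1", "b": "DB2", "c": "DB3"}
weights = {"DB2": 2.0}  # boost a database judged to be high quality

print(rerank(results, db_of, weights))
# [('b', 3.0), ('a', 2.0), ('c', 1.0)]
print(rerank(results, db_of, weights, selected={"DB1", "DB2"}))
# [('b', 3.0), ('a', 2.0)]
```

A multiplicative weight keeps the text-relevance order within one database while letting curators shift whole databases up or down.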
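Slides 18 to 20 argue that metadata such as disease, species, keywords, and last-modified date make entries easier to understand for both users and crawlers. The following minimal sketch shows how a search interface might use such fields for faceted filtering. The `EntryMetadata` fields mirror the example on slide 18 and are hypothetical; they are not the proposed schema.org property names.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class EntryMetadata:
    """Hypothetical metadata record modeled on the slide 18 example."""
    entry_id: str
    disease: str = ""
    species: str = ""
    keywords: List[str] = field(default_factory=list)
    last_modified: date = date.min


def facet_filter(entries, species=None, disease=None, modified_since=None):
    """Keep entries matching every given facet (case-insensitive text)."""
    kept = []
    for e in entries:
        if species and e.species.lower() != species.lower():
            continue
        if disease and e.disease.lower() != disease.lower():
            continue
        if modified_since and e.last_modified < modified_since:
            continue
        kept.append(e)
    return kept


def facet_counts(entries, attr):
    """Count non-empty values of one metadata field, e.g. for a sidebar."""
    return Counter(getattr(e, attr) for e in entries if getattr(e, attr))


entries = [
    EntryMetadata("1", "Epithelial adenoma", "Mouse", ["DNA sequence"], date(2013, 1, 19)),
    EntryMetadata("2", "", "Human", ["protein"], date(2012, 5, 1)),
    EntryMetadata("3", "Epithelial adenoma", "Human", [], date(2013, 2, 1)),
]

print([e.entry_id for e in facet_filter(entries, species="mouse")])  # ['1']
print([e.entry_id for e in facet_filter(entries, disease="epithelial adenoma",
                                        modified_since=date(2013, 1, 1))])  # ['1', '3']
print(facet_counts(entries, "species"))  # Counter({'Human': 2, 'Mouse': 1})
```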
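Slides 26 and 27 describe marking up entries with microdata and then crawling that microdata. The sketch below extracts items from the slide 26 markup using only Python's standard `html.parser`. It is a simplified reading of the microdata model and assumes well-formed HTML. It takes property values from element text (or from `content`/`src`/`href` on void elements) and ignores details such as multiple space-separated `itemprop` names, `itemref`, and `<a href>` or `<time datetime>` values. The class and variable names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MicrodataParser(HTMLParser):
    """Simplified microdata extractor; assumes well-formed markup."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items
        self.stack = []   # one frame per open non-void element
        self.scopes = []  # currently open items, innermost last

    @staticmethod
    def _add(item, prop, value):
        item["properties"].setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        owner = self.scopes[-1] if self.scopes else None
        if tag in VOID_TAGS:
            # Void elements carry their value in an attribute.
            if prop and owner is not None:
                value = a.get("content") or a.get("src") or a.get("href") or ""
                self._add(owner, prop, value)
            return
        frame = {"prop": prop, "owner": owner, "item": None, "text": None}
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "properties": {}}
            if prop and owner is not None:
                self._add(owner, prop, item)  # nested item as a property value
            else:
                self.items.append(item)       # top-level item
            frame["item"] = item
            self.scopes.append(item)
        elif prop:
            frame["text"] = []                # collect text until the end tag
        self.stack.append(frame)

    def handle_data(self, data):
        for frame in self.stack:
            if frame["text"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        frame = self.stack.pop()
        if frame["item"] is not None:
            self.scopes.pop()
        elif frame["text"] is not None and frame["owner"] is not None:
            text = " ".join("".join(frame["text"]).split())
            self._add(frame["owner"], frame["prop"], text)


SLIDE_26_HTML = """
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  ID <span itemprop="entryID">1556</span>
  Species <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="name">Bacillus subtilis</span>
  </span>
  Deposition: <span itemprop="dateCreated">2008-09-08</span>
  Last update: <span itemprop="dateModified">2012-10-24</span>
</div>
"""

parser = MicrodataParser()
parser.feed(SLIDE_26_HTML)
parser.close()
entry = parser.items[0]
print(entry["properties"]["entryID"], entry["properties"]["dateModified"])  # ['1556'] ['2012-10-24']
print(entry["properties"]["taxon"][0]["properties"]["name"])  # ['Bacillus subtilis']
```

Because the markup uses ordinary tags, the same page still renders normally for human readers while a crawler recovers structured fields such as `entryID` and `dateModified`.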
