Life Science Database Cross Search and Metadata

Life science databases are sometimes difficult to understand because they lack descriptive information. I'd like to add metadata to databases and improve search results.

Published in: Technology

  1. Life Science Database Cross Search and Metadata
     Maori Ito @ NIBIO
  2. Database integration collaboration among 4 ministries with NBDC
     • Database Catalog
     • Life Science Database Cross Search
     • Life Science Database Archive
     • Database Reconstructive Integration
  3. Why Cross Search?
     • Easy to use
     • Familiar to users
     • Appropriate for comparing various kinds of databases
  4. Sagace
     • Search for Biomedical Data & Resources in Japan
  5. Skeptical Reputation of Search Results…
     • Useless…
     • Slow…
     • What is the advantage?
  6. What is the most important thing in cross search?
  7. Simple Answers
     • Speed and Accuracy
  8. Mechanism of a Search Engine
     1. Crawling
     2. Indexing
     3. Query Processing
     4. Scoring
  9. Crawling
     • Crawl databases and pages with a program
     (Figure label: Program)
  10. Indexing
      • Split data into pieces of a convenient size and store them on our own server
      (Figure labels: External Data, Internal Server)
  11. Query Processing and Scoring
  12. In the case of Hyper Estraier (search system)
      • Collaborate by using a P2P architecture
      (Diagram labels: NIBIO, AgriTogo, NBDC / DBCLS, MEDALS, JCGGDB, "Under Contemplation")
  13. Back to the simple answers for improvement
      • Speed (thanks to Johan-san, Mizuguchi-san and many collaborators)
        1. Relax limits on access to DBCLS (use a little ingenuity with CSS and images)
      • Accuracy
      (Diagram labels: NIBIO, NBDC / DBCLS)
  14. How to improve accuracy?
      • What is accuracy for life science database cross search?
      • What is accuracy for life science specialists?
  15. • In general, developers emphasize search algorithms and scoring.
      • However, general results and methods for cross search may not be suitable for life science specialists.
      • Data (index files) from life science databases are sometimes difficult to understand immediately.
      • It is hard to write a crawler program for each database and to maintain it.
      • (We have no extra …. to make a proper search page like Entrez et al.)
  16. To Improve Accuracy
      • Manually select databases
      • Assign weights to crawled databases to improve the ranking system
  17. Metadata!
      • One way to solve these problems
      (Figure label: Difficult to understand data immediately)
  18. If metadata are added to data…
      (Figure labels: Data, Metadata)
      Disease: Epithelial adenoma
      Species: Mouse
      Keywords: DNA sequence
      Last Modified: 2013-01-19
  19. Easy to understand for users
      • It can be a guide to improve user experience.
      (Figure label: Image)
  20. Easy to understand for crawlers
      Metadata
      Disease: Epithelial adenoma
      Species: Mouse
      Keywords: DNA sequence
      Last Modified: 2013-01-19
  21. How to use it?
      • Mark up data with microdata, like a tag
      (Figure labels: Image, Title, ID, Last Modified)
      http://www.pdbj.org/emnavi/emnavi_detail.php?id=1556&lang=en
  22. Is it a practical suggestion?
      • Google, Yahoo! and Bing decided to use microdata to make search results more valuable.
      • Some vocabularies have already been applied to search results.
      • E.g.
  23. Schema.org
      • Provides a collection of schemas (HTML tags)
      • "Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages." (quoted from schema.org)
      • We proposed schema.org extensions for "BiologicalDatabaseEntry" and "Biological Database".
      • Schema.org proposals: http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals
  24. Properties for BiologicalDatabaseEntry
      entryID      additionalType       dateCreated
      isEntryof    description          dateModified
      taxon        image                keywords
      seeAlso      url                  provider
      reference    alternativeHeadline  breadcrumb
      name         inLanguage
  25. Related links for our proposal
      • WebSchemas proposal 'Biological Databases' for schema.org
        – http://www.w3.org/wiki/WebSchemas/BioDatabases
      • Discussions at BioHackathon
        – https://github.com/dbcls/bh12/wiki/Schema.org-extension
      • Discussions at BH12.12 (Japanese only)
        – http://wiki.lifesciencedb.jp/mw/index.php/BH12.12/schema.org
  26. How to mark up?
      • Declaration (itemscope and itemtype on the enclosing element):
        <div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
          ID <span itemprop="entryID">1556</span>
          Species <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
            <span itemprop="name">Bacillus subtilis</span></span>
          Deposition: <span itemprop="dateCreated">2008-09-08</span>
          Last update: <span itemprop="dateModified">2012-10-24</span>
        </div>
      • Specify each property with itemprop and mark it up with a normal tag.
  27. And then
      • Crawl these microdata (at present)
      • Reflect them in search results (within the fiscal year; preparation to reflect)
      (Figure label: Image)
  28. Ask for your help
      • If this approach has some effect, there may be a chance for it to be reflected in major search engines.
      • Please mark up your own site or database and give me feedback.
      • If you have any suggestions or comments, please let me know.
  29. Future Perspective
      • Continue to focus on accuracy
      • Microdata
        – Discuss with many scientists and finalize the schema.org extension proposal
        – Increase the number of databases
        – Make support tools for marking up microdata
      • Add appropriate data from high-quality databases
  30. Thank you for listening!
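
Slides 26 and 27 describe marking up a database entry with microdata and then crawling it. As a rough illustration of the crawler side, the sketch below reads `itemscope`, `itemtype` and `itemprop` attributes from a page and collects them into nested dictionaries, using only Python's standard library. It is not the Sagace crawler: the names `MicrodataParser`, `extract_microdata` and `SAMPLE` are hypothetical, the sample markup is copied from slide 26, and the parser is deliberately simplified (it ignores `itemref`, repeated properties, and values carried in attributes such as `href` or `content`).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MicrodataParser(HTMLParser):
    """Collect microdata items into nested dicts (simplified sketch)."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items found in the page
        self._stack = []   # one frame per currently open element

    def _current_item(self):
        # Nearest enclosing element that declared itemscope, if any.
        for frame in reversed(self._stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        frame = {"tag": tag, "item": None, "prop": None, "text": []}
        parent = self._current_item()
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            frame["item"] = item
            if "itemprop" in attrs and parent is not None:
                # A nested item is the value of a property of its parent.
                parent["properties"][attrs["itemprop"]] = item
            else:
                self.items.append(item)
        elif "itemprop" in attrs and parent is not None:
            frame["prop"] = attrs["itemprop"]   # plain-text property
        self._stack.append(frame)

    def handle_data(self, data):
        # Text belongs to every open element that is a plain-text property.
        for frame in self._stack:
            if frame["prop"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if not any(frame["tag"] == tag for frame in self._stack):
            return  # stray closing tag: ignore it
        while self._stack:
            frame = self._stack.pop()
            if frame["prop"] is not None:
                owner = self._current_item()
                owner["properties"][frame["prop"]] = "".join(frame["text"]).strip()
            if frame["tag"] == tag:
                break


def extract_microdata(html):
    """Return the list of top-level microdata items found in an HTML string."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Markup from slide 26 (the taxon itemtype is kept exactly as shown there).
SAMPLE = """
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  ID <span itemprop="entryID">1556</span>
  Species <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="name">Bacillus subtilis</span></span>
  Deposition: <span itemprop="dateCreated">2008-09-08</span>
  Last update: <span itemprop="dateModified">2012-10-24</span>
</div>
"""

if __name__ == "__main__":
    for item in extract_microdata(SAMPLE):
        print(item["type"], item["properties"])
```

A crawler could store the resulting `properties` dictionary alongside the indexed text, so that fields such as `taxon` or `dateModified` are available for display or ranking without a database-specific parser.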