Presentation forpd bj_1

  1. Presentation Materials: http://l.bitcasa.com/ayav_jSQ
     Cross Search Service for Life Science and Semantic Web
     National Institute of Biomedical Innovation
     Maori Ito
  2. Sagace: Search for Biomedical Data & Resources in Japan
  3. Features
     •  Focus on biomedical databases
     •  Semi-automated ranking
     •  Refining search results with facets
     •  More informative search results with metadata
  4. http://integbio.jp/en/
  5. Mechanisms of a Search Engine
     1.  Crawling
     2.  Indexing
     3.  Query Processing
     4.  Scoring
  6. Crawling (diagram labels: Databases, Crawling Program)
  7. Indexing
     •  Split the data into convenient sizes and store it on our own server
     (diagram labels: Indexing Data, Internal Server)
  8. Query Processing and Scoring
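The four stages listed on slide 5 can be illustrated with a toy pipeline. This is only a generic sketch: the slides do not describe how Sagace tokenizes, indexes, or scores, so the whitespace tokenizer, the term-count score, and the page identifiers below are all assumptions made for illustration. Crawling is represented by a hard-coded dictionary of hypothetical page texts.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real engines use richer analysis."""
    return text.lower().split()


def build_index(pages):
    """Indexing: map each term to {page_id: number of occurrences}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][page_id] = count
    return index


def search(index, query):
    """Query processing and scoring: sum term counts per page, best first."""
    scores = Counter()
    for term in tokenize(query):
        for page_id, count in index.get(term, {}).items():
            scores[page_id] += count
    # Sort by descending score, then by page id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical crawled pages (stand-ins for what a crawler would fetch).
pages = {
    "db1/entry1": "kinase inhibitor drug target kinase",
    "db2/entry7": "drug interaction table",
    "db3/entry3": "glycan structure",
}
index = build_index(pages)
results = search(index, "kinase drug")
print(results)
```

A raw term count is the simplest possible score; it is used here only to keep the example short.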
  9. Search System
     NIBIO, NBDC / DBCLS, AgriTogo, MEDALS, JCGGDB
     Collaboration using a P2P architecture
  10. Log Analysis and Reflecting It in Search Results
     •  The members of the top 8 databases are almost the same.
        –  Patents
        –  KEGG MEDICUS
        –  Medicine and pharmaceutical proceedings
        –  Drug emergency call
        –  Ingredients information of health food
        –  Merck Manual
        –  Medical Information Network Distribution Service
        –  The Encyclopedia of Psychoactive Drugs
  11. Comparison of Databases
     •  Popular databases are medical or pharmaceutical "literal rich" databases.
     •  Top databases run away with the winnings!
     •  More than half of the databases have never been clicked!
  12. Unpopular databases
     •  Sagace started its service in March 2012.
     •  Some databases have never been clicked since then.
     •  Eliminate these databases.
     •  Databases
        –  272 DB -> 122 DB
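The pruning step on slide 12 amounts to counting clicks per database and dropping those with none. The sketch below assumes the click log is available as a simple list of database names, one per click; the log format and the database names are hypothetical, since the slides do not show them.

```python
from collections import Counter


def prune_unclicked(databases, click_log):
    """Split databases into those clicked at least once and those never clicked."""
    clicks = Counter(click_log)
    kept = [db for db in databases if clicks[db] > 0]
    removed = [db for db in databases if clicks[db] == 0]
    return kept, removed


# Hypothetical database names and click log (one entry per click).
databases = ["DB-A", "DB-B", "DB-C", "DB-D"]
click_log = ["DB-A", "DB-C", "DB-A"]

kept, removed = prune_unclicked(databases, click_log)
print(kept, removed)
```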
  13. Results
     •  Accuracy for users must have improved.
     •  Reducing the number of databases also sped up the search.
  14. Specific databases in life science
     •  Some life science databases lack "literal information".
     •  A cross search engine is suited to showing literal information.
     •  The Semantic Web will help these databases.
  15. Semantic Web?
  16. What is the Semantic Web?
     The Semantic Web is built from a web of meaningful, machine-understandable data.
  17. Web of Documents
     http://pdbj.org/mine/summary/2yi1
  18. Search Engine Results
     Query "2yi1 pdbj" searched on Google
     Search engines can reflect only text data.
  19. Web of Documents to Web of Data
     (diagram: many "Data" nodes) http://pdbj.org/mine/summary/2yi1
  20. How should the computer recognize these data?
  21. A. (Focus on search service) Mark-up with metadata by database developers
  22. What is metadata?
     •  Data about data
     Fields: Entry ID, See Also, Keywords, Species, Reference, Experimental method, Image
     Entry ID: 2YI1
     Species: HOMO SAPIENS
     Reference: PubMed ID 22343627
     See Also: 2YHY, 2YHW
     Experimental method: X-RAY DIFFRACTION
     Image: http://pdbj.org/pdb_images/2yi1.jpg
  23. Reflecting Metadata in Search Results
     •  Metadata helps users and databases find each other.
     (Image)
  24. How to mark up? (microdata)
     •  Add metadata with HTML tags
     Declare vocabulary:
       <div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
         <span itemprop="entryID">2YI1</span>
       </div>
     http://pdbj.org/mine/summary/2yi1
     Property (Predicate): http://schema.org/BiologicalDatabaseEntry/entryID
     Content (Object): 2YI1
  25. How to reflect?
     •  The crawler program can find the metadata easily!
       <div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
         <span itemprop="entryID">2YI1</span>
       </div>
     •  Add indexed data
        @BiologicalDatabaseEntry_entryID=2YI1
     •  Reflect it in search results
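How a crawler might turn the microdata on slide 25 into the indexed field `@BiologicalDatabaseEntry_entryID=2YI1` can be sketched with Python's standard `html.parser`. The class name `MicrodataFields` is hypothetical, and the sketch makes simplifying assumptions: it handles a single, non-nested `itemscope`, reads only text content, and never resets the item type when the scope closes. It is not Sagace's crawler, which the slides do not show.

```python
from html.parser import HTMLParser


class MicrodataFields(HTMLParser):
    """Collect itemprop values inside an itemscope as flat index fields."""

    def __init__(self):
        super().__init__()
        self.item_type = None     # local name taken from the itemtype URL
        self.current_prop = None  # itemprop whose text is being read
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and attrs.get("itemtype"):
            # Keep only the last path segment, e.g. "BiologicalDatabaseEntry".
            self.item_type = attrs["itemtype"].rstrip("/").rsplit("/", 1)[-1]
        if "itemprop" in attrs and self.item_type:
            self.current_prop = attrs["itemprop"]

    def handle_data(self, data):
        value = data.strip()
        if self.current_prop and value:
            self.fields.append(f"@{self.item_type}_{self.current_prop}={value}")
            self.current_prop = None

    def handle_endtag(self, tag):
        self.current_prop = None


markup = """
<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="entryID">2YI1</span>
</div>
"""
parser = MicrodataFields()
parser.feed(markup)
parser.close()
fields = parser.fields
print(fields)
```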
  26. Machine Understandable Data
     •  Declaration of the vocabulary is important.
     E.g. entryID: biological? book? products? recipe?
  27. Machine Understandable Data
     •  Declaration of the vocabulary is important.
       <div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
         <span itemprop="entryID">2YI1</span>
       </div>
     E.g. entryID=2YI1: BiologicalDatabaseEntry!!
  28. What is schema.org?
     •  "Schema.org is a set of extensible schemas that enables webmasters to embed structured data on their web pages for use by search engines and other applications."
        –  (http://schema.org/)
  29. It's not only in Sagace.
     •  "Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages." (http://schema.org/)
  30. •  Google supports these content types:
        –  Reviews
        –  People
        –  Products
        –  Businesses and organizations
        –  Recipes
        –  Events
        –  Music
  31. Current Situation
     •  Defined original properties for Biological Database and Biological Database Entry for schema.org
        –  entryID, isEntryOf, taxon, seeAlso, reference
        –  Schema.org proposal
        –  http://www.w3.org/wiki/WebSchemas/BioDatabases
     •  Sagace can reflect them in search results.
     •  Search collaboration organizations will also reflect them in search results.
        –  NBDC
        –  MEDALS (molprof)
     •  How to mark up, and search result examples in Sagace
     •  http://sagace.nibio.go.jp/press/metadata/markup/
  32. Sagace reflects these properties
     •  image
     •  isEntryOf (Database name)
     •  entryID
     •  taxon (Species)
     •  disease
     •  seeAlso (Reference database entry)
     •  dateModified (last modified)
     •  reference (Reference article)
  33. To have biological data reflected in major search engines, it must be added to schema.org.
     (diagram labels: schema.org, Reflect Search Results, Biological Database and Biological Database Entry, schema.org Proposal)
  34. •  To get our proposal added to schema.org, we "Need more people who think it is a good idea." (by the organizers @ schema.org)
     •  We need more databases!
  35. 9 DBs have applied microdata!
     •  DoBISCUIT (Database Of BIoSynthesis clusters CUrated and InTegrated)
     •  JCRB Cell Bank
     •  Functional Glycomics with KO mice database
     •  Glyco-Disease Genes Database
     •  Carbohydrate Interaction Database (Carint)
     •  JCGGDB Report
     •  MEDALS
     •  Integbio Database Catalog
     •  Life Science Database Archive
  36. Search Results Example 1
  37. Search Results Example 2
  38. Issues (Cons) for Microdata
     •  Microdata strongly recommends using the schema.org vocabulary.
     •  Microdata is a W3C working group document, not a Recommendation.
     •  If we integrate RDF data, we have to reconsider which vocabularies are suitable.
  39. RDFa Lite
     •  RDFa Lite is a minimal subset of RDFa, the Resource Description Framework in attributes (http://www.w3.org/TR/rdfa-lite/)
        –  Influenced by microdata
        –  W3C Recommendation, 07 June 2012
     •  Ability to specify more than one vocabulary (not only schema.org)
     •  Easy to mark up
  40. How to mark up? (RDFa Lite)
     •  Add metadata with HTML tags
     Declare vocabulary:
       <div vocab="http://schema.org/" typeof="BiologicalDatabaseEntry">
         <span property="entryID">2YI1</span>
       </div>
     http://pdbj.org/mine/summary/2yi1
     Property (Predicate): http://schema.org/BiologicalDatabaseEntry/entryID
     Content (Object): 2YI1
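The RDFa Lite markup on slide 40 can be read back as subject, predicate, object triples. The sketch below is a deliberately small reader, not a conforming RDFa processor: the class name `RDFaLiteTriples` is hypothetical, the page URL is passed in as the subject by assumption, and only `vocab`, `typeof`, and `property` with text content are handled. It expands a property by appending the term to the vocabulary IRI, which is how RDFa resolves terms, so the predicate comes out as `http://schema.org/entryID` rather than the type-qualified form written on the slide.

```python
from html.parser import HTMLParser


class RDFaLiteTriples(HTMLParser):
    """Read vocab/typeof/property attributes into (subject, predicate, object)."""

    def __init__(self, subject):
        super().__init__()
        self.subject = subject        # assumed: the URL of the page being parsed
        self.vocab = ""
        self.current_property = None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("vocab"):
            # Normalize so that vocab + term always joins with one slash.
            self.vocab = attrs["vocab"].rstrip("/") + "/"
        if attrs.get("typeof"):
            self.triples.append(
                (self.subject, "rdf:type", self.vocab + attrs["typeof"])
            )
        if attrs.get("property"):
            self.current_property = self.vocab + attrs["property"]

    def handle_data(self, data):
        value = data.strip()
        if self.current_property and value:
            self.triples.append((self.subject, self.current_property, value))
            self.current_property = None

    def handle_endtag(self, tag):
        self.current_property = None


markup = """
<div vocab="http://schema.org/" typeof="BiologicalDatabaseEntry">
  <span property="entryID">2YI1</span>
</div>
"""
parser = RDFaLiteTriples("http://pdbj.org/mine/summary/2yi1")
parser.feed(markup)
parser.close()
triples = parser.triples
for triple in triples:
    print(triple)
```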
  41. If you use PDBo as an extension vocabulary
     Declare vocabulary:
       <div prefix="PDBo: http://rdf.wwpdb.org/schema/pdbx-v40.owl#">
         <span property="PDBo:exptl.method">X-RAY DIFFRACTION</span>
       </div>
     Property (Predicate), Content (Object)
     (Image)
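The `prefix` attribute on slide 41 maps a short name to a vocabulary IRI, so that `PDBo:exptl.method` stands for a full property IRI. A minimal sketch of that expansion follows; the helper names `parse_prefixes` and `expand_curie` are hypothetical, and the parser assumes a well-formed attribute value of alternating `name:` and IRI tokens.

```python
def parse_prefixes(prefix_attr):
    """Parse an RDFa `prefix` attribute value into a {name: IRI} mapping."""
    tokens = prefix_attr.split()
    mapping = {}
    # Tokens alternate: "name:", IRI, "name:", IRI, ...
    for name, iri in zip(tokens[0::2], tokens[1::2]):
        mapping[name.rstrip(":")] = iri
    return mapping


def expand_curie(curie, prefixes):
    """Expand "prefix:reference" to a full IRI; raises KeyError if unknown."""
    prefix, _, reference = curie.partition(":")
    return prefixes[prefix] + reference


prefixes = parse_prefixes("PDBo: http://rdf.wwpdb.org/schema/pdbx-v40.owl#")
full_iri = expand_curie("PDBo:exptl.method", prefixes)
print(full_iri)
```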
  42. If metadata is added to databases...,
     •  Search engines can pick up many important data.
     •  Database developers can promote their services more effectively.
     •  Users can easily find what they are looking for.
  43. Current Situation
     •  KNApSAcK has applied RDFa Lite.
     •  We'd like to reflect more information by using RDFa Lite.
     •  If you add metadata to your databases, please contact NBDC or me (maori@nibio.go.jp).
     •  Please collaborate with us!
     •  Please tell me what kind of information is suitable to show and refine.
  44. Acknowledgement
     •  National Institute of Biomedical Innovation
        –  Mizuguchi Kenji
        –  Morita Mizuki
        –  Igarashi Yoshinobu
        –  Sakate Ryuichi
        –  Nagao Chioko
        –  Chen Yi-an
        –  Akiko Fukagawa
        –  Tohru Masui
        –  Johan Nystrom-Persson
     •  National Bioscience Database Center (NBDC)
     •  National Institute of Agrobiological Sciences database (NIAS)
     •  Molecular Profiling Research Center for Drug Discovery (molprof)
     •  Japan Consortium for Glycobiology and Glycotechnology DataBase (JCGGDB)
     •  This project is supported by a collaboration "Database integration in NIBIO and cooperation with outside organizations" with the NBDC.
  46. Web of Data (Concept)
  47. (diagram) Nodes: xxxx, http://pdbj.org/mine/summary/xxxx, PDBj, PubMed:xxxxxxx, http://databaseA.org/publication, Database A
     Links:
     http://schema.org/BiologicalDatabaseEntry/entryID
     http://schema.org/BiologicalDatabaseEntry/isEntryOf
     http://schema.org/BiologicalDatabaseEntry/reference
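One way to read the concept diagram on slide 47 is as a small set of triples in which two entries from different databases point at the same reference. The wiring below is an assumption about how the diagram's nodes connect, since the transcript preserves only the labels; the `xxxx` placeholders are kept exactly as they appear on the slide. The hypothetical helper `entries_sharing_reference` then finds entries linked through a common article.

```python
from collections import defaultdict

BASE = "http://schema.org/BiologicalDatabaseEntry/"

# Assumed reading of the diagram: (subject, predicate, object) triples.
triples = [
    ("http://pdbj.org/mine/summary/xxxx", BASE + "entryID", "xxxx"),
    ("http://pdbj.org/mine/summary/xxxx", BASE + "isEntryOf", "PDBj"),
    ("http://pdbj.org/mine/summary/xxxx", BASE + "reference", "PubMed:xxxxxxx"),
    ("http://databaseA.org/publication", BASE + "reference", "PubMed:xxxxxxx"),
    ("http://databaseA.org/publication", BASE + "isEntryOf", "Database A"),
]


def entries_sharing_reference(triples):
    """Group subjects by reference and keep references cited by more than one."""
    by_reference = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == BASE + "reference":
            by_reference[obj].add(subject)
    return {
        ref: sorted(subjects)
        for ref, subjects in by_reference.items()
        if len(subjects) > 1
    }


shared = entries_sharing_reference(triples)
print(shared)
```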
