SlideShare a Scribd company logo
LigerCat
Holly Miller, MBLWHOI Library, Marine Biological Laboratory
Catherine N. Norton, MBLWHOI Library, Marine Biological Laboratory
Indra Neil Sarkar, Biomedical Informatics, University of Vermont
LigerCat (http://ligercat.org) is a web application that allows a user to search for
articles in PubMed with key words or even a DNA/protein sequence and then
explore the resulting set via the MeSH descriptors and a publication time line.
LigerCat is freely available on the Internet and the source code is available under an
MIT license (http://github.com/mbl-cli/LigerCat).
LigerCat, Literature and Genomic Electronic Resource Catalogue, aggregates
multiple articles in PubMed, combining the associated MeSH descriptors into a tag
cloud, weighted by frequency -- more frequent terms are in a larger font size.
Through LigerCat a user can search the PubMed database using plain language text
or molecular sequences. The MeSH tag cloud then allows the user to explore the
collection and create subsets of articles simply by clicking on terms in the cloud.
The three ways to search in LigerCat: Journals, Articles and Genes.
Journal Search
Queries entered into the LigerCat “Journals” tab are used to search the NLM Journals
database. Searches are enhanced using words in titles of journals. This approach has
shown promise to categorize biomedical literature. In our implementation, stop
words (e.g., “the,” “journal,” “association”) are removed from the list of journal titles
returned from the initial search, and then the top 15% of remaining title words are
determined. The initial search is then enhanced with these additional title words.
Across all the journals indexed in MEDLINE, we observed that on average this
enhanced approach increased results by approximately 25% over just using subject
terms associated with journals. In a specific example, searching the NLM Journals
database for journals using the term “Aging” returns 41 journals; the enhanced
approach implemented in LigerCat returns 130.
LigerCat presents users with the resulting list of journal titles, and the user selects
those that they wish to include in the MeSH cloud generation process. Using a local
MEDLINE database, all PubMed identifiers (PMIDs) are then retrieved for the
selected list of journals.
The MeSH cloud is interactive. The user can click on terms in the cloud to create
subsets of articles in real time. The resulting subset can be displayed in PubMed by
clicking on “Go to Pubmed.”
Article Search
Traditional PubMed searches are fundamental to the identification of relevant
biomedical knowledge. Through the “Articles” tab in LigerCat, users enter a query as
they would enter in the PubMed interface. A series of eUtils-mediated searches are
then used to retrieve the list of PMIDs. In addition to the MeSH cloud, a Publication
History graph is displayed. Both features are interactive. Moussing over the
histogram will highlight years and display the number of articles from that year that
your search retrieved. Clicking on the bar for that year performs the search in
PubMed and takes you to that result page.
Gene Search
The third modality of search aims to identify relevant literature that is associated
with a molecular sequence. Within the context of GenBank, 71% of sequences with
citation information are associated with at least one MEDLINE citation. Each PMID
reference associated with GenBank records are extracted and stored within a local
GenBank database. Through the LigerCat “Genes” tab, the user can enter a FASTA-
formatted sequence of a single molecule (e.g., a gene). Related sequences are then
identified using netBLAST. Based on the returned results, sequences are examined
above a minimally stringent E-value (1.0). A list of PMIDs is then derived from the
local GenBank database. The MeSH descriptors from those articles are displayed as a
MeSH cloud similar the one generated from the Article Search. In addition to the
MeSH cloud, a Publication History graph is displayed, however, at this point it is not
interactive.
Under the hood
LigerCat makes use of a hybrid approach of local databases and Web accessible
resources at the National Library of Medicine (NLM Journals, PubMed, and
GenBank). This synergy between live and archived data is leveraged both for speed
(many of the queries and MeSH profile data are precompiled) and to maintain
currency. Additionally, the use of the eUtils for mediating MEDLINE queries enables
the leveraging of the synonym mapping features that are part of the Entrez
interface. As new items are discovered through the system, they are dynamically
added to the database and the MeSH ranks are appropriately adjusted.
LigerCat uses a fast and scalable architecture. The queries to NLM are mediated
through a load-balancing mechanism that off-loads tasks to a background queue.
This is necessary in part because eUtils requires that there is a minimum 1-second
interval between queries to the live data and the volume of data that may need to be
processed to create the resultant MeSH cloud. LigerCat utilizes a number of open
source tools to process data in parallel. By designing a modular, queue-based
architecture, we were able to eliminate single points of failure in the system while
also gaining the benefit of horizontal scalability. The modular nature of the
architecture allowed us to replace a traditional relational database system with the
Redis in-memory key-value store affording us a five fold decrease in processing
times, though at the cost of an increased memory requirement. By modularizing and
virtualizing components of the architecture, we were able to scale processing across
many processor cores of a single or many hosts or out onto public clouds where
required. Scaling can occur in parallel and rapidly due to the small footprint of the
processing components. LigerCat was developed using the Ruby on Rails
framework, which allowed for rapid prototyping, development, testing and
integration of the system’s various components. The use of messaging queues, in-
memory key-value stores and related tools offers informatics developers new
solutions to existing data management and processing challenges. The modular
nature enables components of this existing system to be reused in other research
projects that are faced with similar challenges.
Acknowledgements
LigerCat development was made possible through funds from The Ellison Medical
Foundation, NIH, and the Marine Biological Laboratory.

More Related Content

What's hot

Crossref Metadata and Metadata Services
Crossref Metadata and Metadata ServicesCrossref Metadata and Metadata Services
Crossref Metadata and Metadata Services
Crossref
 
Managing plagiarism: Similarity Check
Managing plagiarism: Similarity CheckManaging plagiarism: Similarity Check
Managing plagiarism: Similarity Check
Crossref
 
The Global reach of Crossref metadata
The Global reach of Crossref metadataThe Global reach of Crossref metadata
The Global reach of Crossref metadata
Crossref
 
Collecting and Using Funding Data Crossref
Collecting and Using Funding Data CrossrefCollecting and Using Funding Data Crossref
Collecting and Using Funding Data Crossref
Relawan Jurnal Indonesia
 
MENGGUNAKAN METADATA PADA CROSSREF
MENGGUNAKAN METADATA PADA CROSSREFMENGGUNAKAN METADATA PADA CROSSREF
MENGGUNAKAN METADATA PADA CROSSREF
Relawan Jurnal Indonesia
 
Reference linking and Cited-by
Reference linking and Cited-byReference linking and Cited-by
Reference linking and Cited-by
Crossref
 
Providing Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case studyProviding Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case study
inscit2006
 
Collecting and using funding data in your publications
Collecting and using funding data in your publicationsCollecting and using funding data in your publications
Collecting and using funding data in your publications
Crossref
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
Robert Grossman
 
Barcelona 2014: CrossRef System and Support Update by Chuck Koscher
Barcelona 2014: CrossRef System and Support Update by Chuck KoscherBarcelona 2014: CrossRef System and Support Update by Chuck Koscher
Barcelona 2014: CrossRef System and Support Update by Chuck Koscher
Crossref
 
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
Giannis Tsakonas
 
Metadata harvesting
Metadata harvestingMetadata harvesting
Metadata harvesting
AndrewLIS688
 
CEK KEMIRIPAN PADA CROSSREF
CEK KEMIRIPAN PADA CROSSREFCEK KEMIRIPAN PADA CROSSREF
CEK KEMIRIPAN PADA CROSSREF
Relawan Jurnal Indonesia
 
CPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data CommonsCPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data Commons
imgcommcall
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
Robert Grossman
 
CrossCheck iThenticate Admin Webinar
CrossCheck iThenticate Admin WebinarCrossCheck iThenticate Admin Webinar
CrossCheck iThenticate Admin Webinar
Crossref
 
Understanding Crossref Metadata
Understanding Crossref MetadataUnderstanding Crossref Metadata
Understanding Crossref Metadata
Crossref
 
The benefits of using Crossref metadata for libraries and scientists - Crossr...
The benefits of using Crossref metadata for libraries and scientists - Crossr...The benefits of using Crossref metadata for libraries and scientists - Crossr...
The benefits of using Crossref metadata for libraries and scientists - Crossr...
Crossref
 
Crossmark Update Webinar
Crossmark Update WebinarCrossmark Update Webinar
Crossmark Update Webinar
Crossref
 
Cited-by Linking
Cited-by Linking Cited-by Linking
Cited-by Linking
Crossref
 

What's hot (20)

Crossref Metadata and Metadata Services
Crossref Metadata and Metadata ServicesCrossref Metadata and Metadata Services
Crossref Metadata and Metadata Services
 
Managing plagiarism: Similarity Check
Managing plagiarism: Similarity CheckManaging plagiarism: Similarity Check
Managing plagiarism: Similarity Check
 
The Global reach of Crossref metadata
The Global reach of Crossref metadataThe Global reach of Crossref metadata
The Global reach of Crossref metadata
 
Collecting and Using Funding Data Crossref
Collecting and Using Funding Data CrossrefCollecting and Using Funding Data Crossref
Collecting and Using Funding Data Crossref
 
MENGGUNAKAN METADATA PADA CROSSREF
MENGGUNAKAN METADATA PADA CROSSREFMENGGUNAKAN METADATA PADA CROSSREF
MENGGUNAKAN METADATA PADA CROSSREF
 
Reference linking and Cited-by
Reference linking and Cited-byReference linking and Cited-by
Reference linking and Cited-by
 
Providing Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case studyProviding Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case study
 
Collecting and using funding data in your publications
Collecting and using funding data in your publicationsCollecting and using funding data in your publications
Collecting and using funding data in your publications
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Barcelona 2014: CrossRef System and Support Update by Chuck Koscher
Barcelona 2014: CrossRef System and Support Update by Chuck KoscherBarcelona 2014: CrossRef System and Support Update by Chuck Koscher
Barcelona 2014: CrossRef System and Support Update by Chuck Koscher
 
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
 
Metadata harvesting
Metadata harvestingMetadata harvesting
Metadata harvesting
 
CEK KEMIRIPAN PADA CROSSREF
CEK KEMIRIPAN PADA CROSSREFCEK KEMIRIPAN PADA CROSSREF
CEK KEMIRIPAN PADA CROSSREF
 
CPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data CommonsCPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data Commons
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
 
CrossCheck iThenticate Admin Webinar
CrossCheck iThenticate Admin WebinarCrossCheck iThenticate Admin Webinar
CrossCheck iThenticate Admin Webinar
 
Understanding Crossref Metadata
Understanding Crossref MetadataUnderstanding Crossref Metadata
Understanding Crossref Metadata
 
The benefits of using Crossref metadata for libraries and scientists - Crossr...
The benefits of using Crossref metadata for libraries and scientists - Crossr...The benefits of using Crossref metadata for libraries and scientists - Crossr...
The benefits of using Crossref metadata for libraries and scientists - Crossr...
 
Crossmark Update Webinar
Crossmark Update WebinarCrossmark Update Webinar
Crossmark Update Webinar
 
Cited-by Linking
Cited-by Linking Cited-by Linking
Cited-by Linking
 

Viewers also liked

チェンジメーカーカンファレンス2012
チェンジメーカーカンファレンス2012チェンジメーカーカンファレンス2012
チェンジメーカーカンファレンス2012Yasuhiro Toudou
 
Bai geneva eng invitation l (3)
Bai geneva eng invitation l (3)Bai geneva eng invitation l (3)
Bai geneva eng invitation l (3)
D T
 
ソーシャルベンチャーパートナーズ東京プレゼン111225
ソーシャルベンチャーパートナーズ東京プレゼン111225ソーシャルベンチャーパートナーズ東京プレゼン111225
ソーシャルベンチャーパートナーズ東京プレゼン111225Yasuhiro Toudou
 
U2plus 回復エピソードブログ用
U2plus 回復エピソードブログ用U2plus 回復エピソードブログ用
U2plus 回復エピソードブログ用Yasuhiro Toudou
 
U2plus the fanclub in Co-Ba 120609
U2plus the fanclub in Co-Ba 120609U2plus the fanclub in Co-Ba 120609
U2plus the fanclub in Co-Ba 120609Yasuhiro Toudou
 
120311メンタルヘルスと医療制度 toudou
120311メンタルヘルスと医療制度 toudou120311メンタルヘルスと医療制度 toudou
120311メンタルヘルスと医療制度 toudouYasuhiro Toudou
 
U2plusプレゼン資料
U2plusプレゼン資料U2plusプレゼン資料
U2plusプレゼン資料Yasuhiro Toudou
 
Multikulturel kompetence etnisk x-faktor - endelig
Multikulturel kompetence   etnisk x-faktor - endeligMultikulturel kompetence   etnisk x-faktor - endelig
Multikulturel kompetence etnisk x-faktor - endeligcifs
 
Smw0216
Smw0216Smw0216
うつ家族教室20150121
うつ家族教室20150121うつ家族教室20150121
うつ家族教室20150121
Yasuhiro Toudou
 
Cedulas septiembre
Cedulas septiembreCedulas septiembre
Cedulas septiembre
Asopalsat
 
U2plusプレゼン資料@日本うつ病学会、日本認知療法学会合同シンポジウム
U2plusプレゼン資料@日本うつ病学会、日本認知療法学会合同シンポジウムU2plusプレゼン資料@日本うつ病学会、日本認知療法学会合同シンポジウム
U2plusプレゼン資料@日本うつ病学会、日本認知療法学会合同シンポジウム
Yasuhiro Toudou
 
110917インターネットで行う認知行動療法
110917インターネットで行う認知行動療法110917インターネットで行う認知行動療法
110917インターネットで行う認知行動療法Yasuhiro Toudou
 
WEB認知行動療法サービス:Depress the Depression
WEB認知行動療法サービス:Depress the DepressionWEB認知行動療法サービス:Depress the Depression
WEB認知行動療法サービス:Depress the DepressionYasuhiro Toudou
 
うつ家族教室20151121
うつ家族教室20151121うつ家族教室20151121
うつ家族教室20151121
Yasuhiro Toudou
 
Formation Social Selling
Formation Social SellingFormation Social Selling
Formation Social Selling
Social Media Kapital
 
How IBM does Innovation
How IBM does InnovationHow IBM does Innovation
How IBM does Innovation
cifs
 

Viewers also liked (19)

MI perfil
MI perfilMI perfil
MI perfil
 
チェンジメーカーカンファレンス2012
チェンジメーカーカンファレンス2012チェンジメーカーカンファレンス2012
チェンジメーカーカンファレンス2012
 
Bai geneva eng invitation l (3)
Bai geneva eng invitation l (3)Bai geneva eng invitation l (3)
Bai geneva eng invitation l (3)
 
ソーシャルベンチャーパートナーズ東京プレゼン111225
ソーシャルベンチャーパートナーズ東京プレゼン111225ソーシャルベンチャーパートナーズ東京プレゼン111225
ソーシャルベンチャーパートナーズ東京プレゼン111225
 
U2plus 回復エピソードブログ用
U2plus 回復エピソードブログ用U2plus 回復エピソードブログ用
U2plus 回復エピソードブログ用
 
U2plus the fanclub in Co-Ba 120609
U2plus the fanclub in Co-Ba 120609U2plus the fanclub in Co-Ba 120609
U2plus the fanclub in Co-Ba 120609
 
120311メンタルヘルスと医療制度 toudou
120311メンタルヘルスと医療制度 toudou120311メンタルヘルスと医療制度 toudou
120311メンタルヘルスと医療制度 toudou
 
U2plusプレゼン資料
U2plusプレゼン資料U2plusプレゼン資料
U2plusプレゼン資料
 
Multikulturel kompetence etnisk x-faktor - endelig
Multikulturel kompetence   etnisk x-faktor - endeligMultikulturel kompetence   etnisk x-faktor - endelig
Multikulturel kompetence etnisk x-faktor - endelig
 
Smw0216
Smw0216Smw0216
Smw0216
 
うつ家族教室20150121
うつ家族教室20150121うつ家族教室20150121
うつ家族教室20150121
 
Cedulas septiembre
Cedulas septiembreCedulas septiembre
Cedulas septiembre
 
千葉大学110907
千葉大学110907千葉大学110907
千葉大学110907
 
U2plusプレゼン資料@日本うつ病学会、日本認知療法学会合同シンポジウム
U2plusプレゼン資料@日本うつ病学会、日本認知療法学会合同シンポジウムU2plusプレゼン資料@日本うつ病学会、日本認知療法学会合同シンポジウム
U2plusプレゼン資料@日本うつ病学会、日本認知療法学会合同シンポジウム
 
110917インターネットで行う認知行動療法
110917インターネットで行う認知行動療法110917インターネットで行う認知行動療法
110917インターネットで行う認知行動療法
 
WEB認知行動療法サービス:Depress the Depression
WEB認知行動療法サービス:Depress the DepressionWEB認知行動療法サービス:Depress the Depression
WEB認知行動療法サービス:Depress the Depression
 
うつ家族教室20151121
うつ家族教室20151121うつ家族教室20151121
うつ家族教室20151121
 
Formation Social Selling
Formation Social SellingFormation Social Selling
Formation Social Selling
 
How IBM does Innovation
How IBM does InnovationHow IBM does Innovation
How IBM does Innovation
 

Similar to Liger cat challenge

Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Dag Endresen
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD Editor
 
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
IJERD Editor
 
A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...
CSCJournals
 
Cloud web scale discovery services landscape an overview
Cloud web scale discovery services landscape an overviewCloud web scale discovery services landscape an overview
Cloud web scale discovery services landscape an overview
Nikesh Narayanan
 
A scalable hybrid research paper recommender system for micro
A scalable hybrid research paper recommender system for microA scalable hybrid research paper recommender system for micro
A scalable hybrid research paper recommender system for micro
aman341480
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
Carole Goble
 
Evaluation of Web Scale Discovery Services
Evaluation of Web Scale Discovery ServicesEvaluation of Web Scale Discovery Services
Evaluation of Web Scale Discovery Services
Nikesh Narayanan
 
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYINTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
cscpconf
 
bio data
bio databio data
bio data
007dcp
 
11-Big Data Application in Biomedical Research and Health Care.pptx
11-Big Data Application in Biomedical Research and Health Care.pptx11-Big Data Application in Biomedical Research and Health Care.pptx
11-Big Data Application in Biomedical Research and Health Care.pptx
shikhamittal42
 
Implementing web scale discovery services: special reference to Indian Librar...
Implementing web scale discovery services: special reference to Indian Librar...Implementing web scale discovery services: special reference to Indian Librar...
Implementing web scale discovery services: special reference to Indian Librar...
Nikesh Narayanan
 
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Syed Ahmad Chan Bukhari, PhD
 
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Ahmad C. Bukhari
 
Pharmacoinformatics Database basics(sree)
Pharmacoinformatics Database basics(sree)Pharmacoinformatics Database basics(sree)
Pharmacoinformatics Database basics(sree)
Sreekanth Gali
 
COLLABORATIVE BIBLIOGRAPHIC SYSTEM FOR REVIEW/SURVEY ARTICLES
COLLABORATIVE BIBLIOGRAPHIC SYSTEM FOR REVIEW/SURVEY ARTICLESCOLLABORATIVE BIBLIOGRAPHIC SYSTEM FOR REVIEW/SURVEY ARTICLES
COLLABORATIVE BIBLIOGRAPHIC SYSTEM FOR REVIEW/SURVEY ARTICLES
ijcsit
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
ijseajournal
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
ijseajournal
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
Kelly Lipiec
 
Chemspider Presentation at the ACS Meeting in New orleans
Chemspider Presentation at the ACS Meeting in New orleansChemspider Presentation at the ACS Meeting in New orleans

Similar to Liger cat challenge (20)

Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
 
A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...
 
Cloud web scale discovery services landscape an overview
Cloud web scale discovery services landscape an overviewCloud web scale discovery services landscape an overview
Cloud web scale discovery services landscape an overview
 
A scalable hybrid research paper recommender system for micro
A scalable hybrid research paper recommender system for microA scalable hybrid research paper recommender system for micro
A scalable hybrid research paper recommender system for micro
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
 
Evaluation of Web Scale Discovery Services
Evaluation of Web Scale Discovery ServicesEvaluation of Web Scale Discovery Services
Evaluation of Web Scale Discovery Services
 
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYINTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
 
bio data
bio databio data
bio data
 
11-Big Data Application in Biomedical Research and Health Care.pptx
11-Big Data Application in Biomedical Research and Health Care.pptx11-Big Data Application in Biomedical Research and Health Care.pptx
11-Big Data Application in Biomedical Research and Health Care.pptx
 
Implementing web scale discovery services: special reference to Indian Librar...
Implementing web scale discovery services: special reference to Indian Librar...Implementing web scale discovery services: special reference to Indian Librar...
Implementing web scale discovery services: special reference to Indian Librar...
 
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
 
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
 
Pharmacoinformatics Database basics(sree)
Pharmacoinformatics Database basics(sree)Pharmacoinformatics Database basics(sree)
Pharmacoinformatics Database basics(sree)
 
COLLABORATIVE BIBLIOGRAPHIC SYSTEM FOR REVIEW/SURVEY ARTICLES
COLLABORATIVE BIBLIOGRAPHIC SYSTEM FOR REVIEW/SURVEY ARTICLESCOLLABORATIVE BIBLIOGRAPHIC SYSTEM FOR REVIEW/SURVEY ARTICLES
COLLABORATIVE BIBLIOGRAPHIC SYSTEM FOR REVIEW/SURVEY ARTICLES
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
 
Chemspider Presentation at the ACS Meeting in New orleans
Chemspider Presentation at the ACS Meeting in New orleansChemspider Presentation at the ACS Meeting in New orleans
Chemspider Presentation at the ACS Meeting in New orleans
 

Recently uploaded

Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
FFragrant
 
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
shruti jagirdar
 
Nano-gold for Cancer Therapy chemistry investigatory project
Nano-gold for Cancer Therapy chemistry investigatory projectNano-gold for Cancer Therapy chemistry investigatory project
Nano-gold for Cancer Therapy chemistry investigatory project
SIVAVINAYAKPK
 
NAVIGATING THE HORIZONS OF TIME LAPSE EMBRYO MONITORING.pdf
NAVIGATING THE HORIZONS OF TIME LAPSE EMBRYO MONITORING.pdfNAVIGATING THE HORIZONS OF TIME LAPSE EMBRYO MONITORING.pdf
NAVIGATING THE HORIZONS OF TIME LAPSE EMBRYO MONITORING.pdf
Rahul Sen
 
Artificial Intelligence Symposium (THAIS)
Artificial Intelligence Symposium (THAIS)Artificial Intelligence Symposium (THAIS)
Artificial Intelligence Symposium (THAIS)
Josep Vidal-Alaball
 
Ophthalmic drugs latest. Xxxxxxzxxxxxx.pdf
Ophthalmic drugs latest. Xxxxxxzxxxxxx.pdfOphthalmic drugs latest. Xxxxxxzxxxxxx.pdf
Ophthalmic drugs latest. Xxxxxxzxxxxxx.pdf
MuhammadMuneer49
 
Top Travel Vaccinations in Manchester
Top Travel Vaccinations in ManchesterTop Travel Vaccinations in Manchester
Top Travel Vaccinations in Manchester
NX Healthcare
 
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl MumbaiCall Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Mobile Problem
 
How to choose the best dermatologists in Indore.
How to choose the best dermatologists in Indore.How to choose the best dermatologists in Indore.
How to choose the best dermatologists in Indore.
Gokuldas Hospital
 
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptxPost-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
FFragrant
 
13. PROM premature rupture of membranes
13.  PROM premature rupture of membranes13.  PROM premature rupture of membranes
13. PROM premature rupture of membranes
TigistuMelak
 
CBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdfCBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdf
suvadeepdas911
 
biomechanics of running. Dr.dhwani.pptx
biomechanics of running.   Dr.dhwani.pptxbiomechanics of running.   Dr.dhwani.pptx
biomechanics of running. Dr.dhwani.pptx
Dr. Dhwani kawedia
 
What is Obesity? How to overcome Obesity?
What is Obesity? How to overcome Obesity?What is Obesity? How to overcome Obesity?
What is Obesity? How to overcome Obesity?
Healthmedsrx.com
 
Physical demands in sports - WCSPT Oslo 2024
Physical demands in sports - WCSPT Oslo 2024Physical demands in sports - WCSPT Oslo 2024
Physical demands in sports - WCSPT Oslo 2024
Torstein Dalen-Lorentsen
 
Pune Call Girls 7339748667 AVAILABLE HOT GIRLS AUNTY BOOK NOW
Pune Call Girls 7339748667 AVAILABLE HOT GIRLS AUNTY BOOK NOWPune Call Girls 7339748667 AVAILABLE HOT GIRLS AUNTY BOOK NOW
Pune Call Girls 7339748667 AVAILABLE HOT GIRLS AUNTY BOOK NOW
Get New Sim
 
Local anesthetics 2024/ Medicinal Chemistry pdf
Local anesthetics 2024/ Medicinal Chemistry pdfLocal anesthetics 2024/ Medicinal Chemistry pdf
Local anesthetics 2024/ Medicinal Chemistry pdf
NarminHamaaminHussen
 
Full Handwritten notes of RA by Ayush Kumar M pharm - Al ameen college of pha...
Full Handwritten notes of RA by Ayush Kumar M pharm - Al ameen college of pha...Full Handwritten notes of RA by Ayush Kumar M pharm - Al ameen college of pha...
Full Handwritten notes of RA by Ayush Kumar M pharm - Al ameen college of pha...
ayushrajshrivastava7
 
K CỔ TỬ CUNG.pdf tự ghi chép, chữ hơi xấu
K CỔ TỬ CUNG.pdf tự ghi chép, chữ hơi xấuK CỔ TỬ CUNG.pdf tự ghi chép, chữ hơi xấu
K CỔ TỬ CUNG.pdf tự ghi chép, chữ hơi xấu
HongBiThi1
 
Patellar Instability: Diagnosis Management
Patellar Instability: Diagnosis  ManagementPatellar Instability: Diagnosis  Management
Patellar Instability: Diagnosis Management
Dr Nitin Tyagi
 

Recently uploaded (20)

Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
 
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
 
Nano-gold for Cancer Therapy chemistry investigatory project
Nano-gold for Cancer Therapy chemistry investigatory projectNano-gold for Cancer Therapy chemistry investigatory project
Nano-gold for Cancer Therapy chemistry investigatory project
 
NAVIGATING THE HORIZONS OF TIME LAPSE EMBRYO MONITORING.pdf
NAVIGATING THE HORIZONS OF TIME LAPSE EMBRYO MONITORING.pdfNAVIGATING THE HORIZONS OF TIME LAPSE EMBRYO MONITORING.pdf
NAVIGATING THE HORIZONS OF TIME LAPSE EMBRYO MONITORING.pdf
 
Artificial Intelligence Symposium (THAIS)
Artificial Intelligence Symposium (THAIS)Artificial Intelligence Symposium (THAIS)
Artificial Intelligence Symposium (THAIS)
 
Ophthalmic drugs latest. Xxxxxxzxxxxxx.pdf
Ophthalmic drugs latest. Xxxxxxzxxxxxx.pdfOphthalmic drugs latest. Xxxxxxzxxxxxx.pdf
Ophthalmic drugs latest. Xxxxxxzxxxxxx.pdf
 
Top Travel Vaccinations in Manchester
Top Travel Vaccinations in ManchesterTop Travel Vaccinations in Manchester
Top Travel Vaccinations in Manchester
 
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl MumbaiCall Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
 
How to choose the best dermatologists in Indore.
How to choose the best dermatologists in Indore.How to choose the best dermatologists in Indore.
How to choose the best dermatologists in Indore.
 
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptxPost-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
 
13. PROM premature rupture of membranes
13.  PROM premature rupture of membranes13.  PROM premature rupture of membranes
13. PROM premature rupture of membranes
 
CBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdfCBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdf
 
biomechanics of running. Dr.dhwani.pptx
biomechanics of running.   Dr.dhwani.pptxbiomechanics of running.   Dr.dhwani.pptx
biomechanics of running. Dr.dhwani.pptx
 
What is Obesity? How to overcome Obesity?
What is Obesity? How to overcome Obesity?What is Obesity? How to overcome Obesity?
What is Obesity? How to overcome Obesity?
 
Physical demands in sports - WCSPT Oslo 2024
Physical demands in sports - WCSPT Oslo 2024Physical demands in sports - WCSPT Oslo 2024
Physical demands in sports - WCSPT Oslo 2024
 
Pune Call Girls 7339748667 AVAILABLE HOT GIRLS AUNTY BOOK NOW
Pune Call Girls 7339748667 AVAILABLE HOT GIRLS AUNTY BOOK NOWPune Call Girls 7339748667 AVAILABLE HOT GIRLS AUNTY BOOK NOW
Pune Call Girls 7339748667 AVAILABLE HOT GIRLS AUNTY BOOK NOW
 
Local anesthetics 2024/ Medicinal Chemistry pdf
Local anesthetics 2024/ Medicinal Chemistry pdfLocal anesthetics 2024/ Medicinal Chemistry pdf
Local anesthetics 2024/ Medicinal Chemistry pdf
 
Full Handwritten notes of RA by Ayush Kumar M pharm - Al ameen college of pha...
Full Handwritten notes of RA by Ayush Kumar M pharm - Al ameen college of pha...Full Handwritten notes of RA by Ayush Kumar M pharm - Al ameen college of pha...
Full Handwritten notes of RA by Ayush Kumar M pharm - Al ameen college of pha...
 
K CỔ TỬ CUNG.pdf tự ghi chép, chữ hơi xấu
K CỔ TỬ CUNG.pdf tự ghi chép, chữ hơi xấuK CỔ TỬ CUNG.pdf tự ghi chép, chữ hơi xấu
K CỔ TỬ CUNG.pdf tự ghi chép, chữ hơi xấu
 
Patellar Instability: Diagnosis Management
Patellar Instability: Diagnosis  ManagementPatellar Instability: Diagnosis  Management
Patellar Instability: Diagnosis Management
 

Liger cat challenge

  • 1. LigerCat Holly Miller, MBLWHOI Library, Marine Biological Laboratory Catherine N. Norton, MBLWHOI Library, Marine Biological Laboratory Indra Neil Sarkar, Biomedical Informatics, University of Vermont LigerCat (http://ligercat.org) is a web application that allows a user to search for articles in PubMed with key words or even a DNA/protein sequence and then explore the resulting set via the MeSH descriptors and a publication time line. LigerCat is freely available on the Internet and the source code is available under an MIT license (http://github.com/mbl-cli/LigerCat). LigerCat, Literature and Genomic Electronic Resource Catalogue, aggregates multiple articles in PubMed, combining the associated MeSH descriptors into a tag cloud, weighted by frequency -- more frequent terms are in a larger font size. Through LigerCat a user can search the PubMed database using plain language text or molecular sequences. The MeSH tag cloud then allows the user to explore the collection and create subsets of articles simply by clicking on terms in the cloud. The three ways to search in LigerCat: Journals, Articles and Genes. Journal Search Queries entered into the LigerCat “Journals” tab are used to search the NLM Journals database. Searches are enhanced using words in titles of journals. This approach has shown promise to categorize biomedical literature. In our implementation, stop words (e.g., “the,” “journal,” “association”) are removed from the list of journal titles returned from the initial search, and then the top 15% of remaining title words are determined. The initial search is then enhanced with these additional title words. Across all the journals indexed in MEDLINE, we observed that on average this enhanced approach increased results by approximately 25% over just using subject terms associated with journals. In a specific example, searching the NLM Journals database for journals using the term “Aging” returns 41 journals; the enhanced approach implemented in LigerCat returns 130. LigerCat presents users with the resulting list of journal titles, and the user selects those that they wish to include in the MeSH cloud generation process. Using a local MEDLINE database, all PubMed identifiers (PMIDs) are then retrieved for the selected list of journals. The MeSH cloud is interactive. The user can click on terms in the cloud to create subsets of articles in real time. The resulting subset can be displayed in PubMed by clicking on “Go to Pubmed.” Article Search Traditional PubMed searches are fundamental to the identification of relevant biomedical knowledge. Through the “Articles” tab in LigerCat, users enter a query as
  • 2. they would enter in the PubMed interface. A series of eUtils-mediated searches are then used to retrieve the list of PMIDs. In addition to the MeSH cloud, a Publication History graph is displayed. Both features are interactive. Moussing over the histogram will highlight years and display the number of articles from that year that your search retrieved. Clicking on the bar for that year performs the search in PubMed and takes you to that result page. Gene Search The third modality of search aims to identify relevant literature that is associated with a molecular sequence. Within the context of GenBank, 71% of sequences with citation information are associated with at least one MEDLINE citation. Each PMID reference associated with GenBank records are extracted and stored within a local GenBank database. Through the LigerCat “Genes” tab, the user can enter a FASTA- formatted sequence of a single molecule (e.g., a gene). Related sequences are then identified using netBLAST. Based on the returned results, sequences are examined above a minimally stringent E-value (1.0). A list of PMIDs is then derived from the local GenBank database. The MeSH descriptors from those articles are displayed as a MeSH cloud similar the one generated from the Article Search. In addition to the MeSH cloud, a Publication History graph is displayed, however, at this point it is not interactive. Under the hood LigerCat makes use of a hybrid approach of local databases and Web accessible resources at the National Library of Medicine (NLM Journals, PubMed, and GenBank). This synergy between live and archived data is leveraged both for speed (many of the queries and MeSH profile data are precompiled) and to maintain currency. Additionally, the use of the eUtils for mediating MEDLINE queries enables the leveraging of the synonym mapping features that are part of the Entrez interface. As new items are discovered through the system, they are dynamically added to the database and the MeSH ranks are appropriately adjusted. LigerCat uses a fast and scalable architecture. The queries to NLM are mediated through a load-balancing mechanism that off-loads tasks to a background queue. This is necessary in part because eUtils requires that there is a minimum 1-second interval between queries to the live data and the volume of data that may need to be processed to create the resultant MeSH cloud. LigerCat utilizes a number of open source tools to process data in parallel. By designing a modular, queue-based architecture, we were able to eliminate single points of failure in the system while also gaining the benefit of horizontal scalability. The modular nature of the architecture allowed us to replace a traditional relational database system with the Redis in-memory key-value store affording us a five fold decrease in processing times, though at the cost of an increased memory requirement. By modularizing and virtualizing components of the architecture, we were able to scale processing across many processor cores of a single or many hosts or out onto public clouds where required. Scaling can occur in parallel and rapidly due to the small footprint of the processing components. LigerCat was developed using the Ruby on Rails
  • 3. framework, which allowed for rapid prototyping, development, testing and integration of the system’s various components. The use of messaging queues, in- memory key-value stores and related tools offers informatics developers new solutions to existing data management and processing challenges. The modular nature enables components of this existing system to be reused in other research projects that are faced with similar challenges. Acknowledgements LigerCat development was made possible through funds from The Ellison Medical Foundation, NIH, and the Marine Biological Laboratory.