Presentation regarding InterMine and its adoption by the AIP and MTGD projects, given at the Informatics Research WIPS meeting on 03 November 2014 at the J. Craig Venter Institute, Rockville, MD.
Presented by Vivek Krishnakumar
An On-line Collaborative Data Management System (Cameron Kiddle)
A presentation I prepared that was presented by Rob Simmonds at the Gateway Computing Environments 2010 Workshop in New Orleans on November 14, 2010. It provides an overview of a data management system that was developed for GeoChronos - an on-line collaborative platform for Earth observation scientists.
Data repositories -- Xiamen University 2012 06-08 (Jian Qin)
The document discusses data repositories and services. It begins by defining what a data repository is, noting that it is a logical and sometimes physical partitioning of data where multiple databases reside. It then outlines some key aspects of data repositories, including technical features like standards, software, and staffing requirements. The document also discusses functions of repositories like content management, archiving, dissemination and system maintenance. It provides examples of institutional repositories and data repositories, highlighting characteristics of each. Finally, it provides a case study on Dryad, an international repository for data and publications in biosciences.
Investigating plant systems using data integration and network analysis (Catherine Canevet)
The document discusses challenges in integrating plant data from multiple sources and proposes solutions. It notes that plant data is sparse, distributed across many databases in various formats, and focused primarily on the model plant Arabidopsis. Data integration is necessary to address key biological questions by consolidating information from pathway databases, gene annotations, protein interactions, and more. The document outlines approaches to data integration including controlled vocabularies, ontologies, data standards, and integration applications specifically designed to combine data sources like Ondex. Effective integration is important to fully leverage available plant data.
How Portable Are the Metadata Standards for Scientific Data? (Jian Qin)
The one-covers-all approach in current metadata standards for scientific data has serious limitations in keeping up with ever-growing data. This paper reports the findings of a survey of metadata standards in the scientific data domain and argues for the need for a metadata infrastructure. The survey collected 4400+ unique elements from 16 standards and categorized them into 9 categories. The highest counts of elements occurred in the descriptive category, and many of them overlapped with DC elements. The same pattern repeated among elements that co-occurred in different standards. A small number of semantically general elements appeared across the largest number of standards, while the rest of the element co-occurrences formed a long tail with a wide range of specific semantics. The paper discusses the implications of these findings for metadata portability and infrastructure, and points out that large, complex standards and widely varied naming practices are the major hurdles to building a metadata infrastructure.
Web Information Extraction for the DB Research Domain (liat_kakun)
A presentation describing my final project for an engineering degree at the Hebrew University of Jerusalem - a system for extracting information from web sites into instances of an XML schema, utilizing machine learning, structural analysis of documents and a divide & conquer strategy.
A presentation given by Manjula Patel (UKOLN) at the Repository Curation Environments (RECURSE) Workshop held at the 4th International Digital Curation Conference, Edinburgh, 1st December 2008:
http://www.dcc.ac.uk/events/dcc-2008/programme/
The document describes the eTRIKS Data Harmonization Service Platform, which aims to provide a common infrastructure and services to support cross-institutional translational research. It discusses challenges around data integration and harmonization. The platform utilizes standards and controlled vocabularies to syntactically and semantically harmonize data from various sources. It employs a metadata framework and modular workflow to structure, standardize, and integrate observational data into a harmonized repository for exploration and analysis. A demo of the platform's capabilities for project setup, data staging, exploration, export, and integration with tranSMART is also provided.
This document discusses Kno.e.sis' projects on a federated semantic services platform and material database knowledge discovery for material sciences. It proposes a federated architecture with provenance and access control to realize open digital data sharing while protecting private data. The architecture uses semantic mappings and query processing across distributed public, shared, and private material databases. Provenance metadata captured during experiments can improve reproducibility and trust in material products. Flexible access control policies allow custom sharing of semantic data at different granularities with public communities or collaborators.
The document introduces COBWEB, a research project that develops a crowdsourcing infrastructure for collecting and analyzing environmental data provided by citizens. The project aims to address data quality issues and support policy decisions. It has several pilot sites and partners, including UNESCO biosphere reserves. The framework includes mobile apps, QA processes, and a portal to view and analyze citizen-submitted data. It uses open standards and aims to be customizable for different use cases involving topics like biological monitoring and flooding.
Bioinformatics presentation to students University of Minho (introfini)
This document describes ProtoFilWW, a computational platform for analyzing relationships between microorganisms and environmental parameters in wastewater treatment plants. It has the following key components:
- A content management system for researchers to manage and analyze data from wastewater treatment plant samples.
- A text mining component to find additional information about microorganisms present in biological samples. It uses technologies like Lucene, Solr, and UIMA.
- User roles including visitors, collaborators, researchers, and administrators. Researchers can insert, analyze, and export data, while administrators manage users and data backups.
- Features like dynamic reporting, charting, geolocation of wastewater treatment plants, and
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus Poster (Globus)
This poster was presented at the 2019 NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium by Brigitte Raumann and Ian Foster of Globus, University of Chicago and Argonne National Lab.
Mica is a web platform that supports epidemiological studies through an integrated infrastructure called Maelstrom repository. It is a 3-step process: 1) Mica Catalogues document participating studies and their variables, 2) Mica Variables further identify variables for harmonization, 3) Processing, Integration and Dissemination harmonizes data across studies and allows dissemination through integration with other tools like Opal. The goal is to provide a comprehensive, customizable and open source software solution to support all stages of multi-study data harmonization and analysis.
Curation and Preservation of Crystallography Data (ManjulaPatel)
A presentation given by Manjula Patel (UKOLN) at "Chemistry in the Digital Age: A Workshop connecting research and education", June 11-12th 2009, Penn State University:
http://www.chem.psu.edu/cyberworkshop09
Images have an irrefutably central role in scientific discovery and discourse. However, the issues associated with knowledge management and utility operations unique to image data have only recently gained recognition. In our previous work, we developed the Yale Image Finder (YIF), a novel biomedical image search engine that indexes around two million biomedical images along with their associated metadata. While YIF is considered a veritable source of easily accessible biomedical images, a number of usability and interoperability challenges have yet to be addressed. To overcome these issues and to accelerate the adoption of YIF for next-generation biomedical applications, we have developed a publicly accessible semantic API for biomedical images with multiple modalities. The core API, called iCyrus, is powered by a dedicated semantic architecture that exposes the YIF content as linked data, permitting integration with related information resources and consumption by linked data-aware services. To facilitate the ad hoc integration of image data with other online data resources, we also built semantic web services for iCyrus, making it compatible with the SADI semantic web service framework. The utility of the combined infrastructure is illustrated with a number of compelling use cases and further extended through the incorporation of Domeo, a well-known tool for open annotation. Domeo facilitates enhanced search over the images using annotations provided through crowdsourcing. The iCyrus triplestore currently holds more than thirty-five million triples and can be accessed and operated through syntactic or semantic query interfaces. Core features of the iCyrus API, namely data reusability, system interoperability, semantic image search, automatic updates and a dedicated semantic infrastructure, make iCyrus a state-of-the-art resource for image data discovery and retrieval.
Web Information Extraction for the Database Research Domain (Michael Genkin)
A presentation describing my final project for an engineering degree at the Hebrew University of Jerusalem - a system for extracting information from web sites into instances of an XML schema, utilizing machine learning, structural analysis of documents and a divide & conquer strategy.
VIVO is a web resource created at Cornell University that provides a single point of access for information on scholarly activity. It uses an ontology and semantic web technologies like Jena to represent common university relationships and allows users to search for and browse faculty profiles and research activities. Current development focuses on improving import/export of OWL/RDF data, extending the unified data model, and using inference engines to classify and relate information in real-time without manual tagging. Future plans include leveraging SPARQL queries, integrating multiple ontologies, and using Vitro as a front-end for other data repositories and collections.
Integrated research data management in the Structural Sciences (ManjulaPatel)
A presentation given by Manjula Patel (UKOLN, University of Bath) at the I2S2 workshop "Scaling Up to Integrated Research Data Management", IDCC 2010, 6th December 2010, Chicago.
http://www.ukoln.ac.uk/projects/I2S2/events/IDCC-2010-ScalingUp-Wksp/
NADA is an open source web application for archiving, searching and browsing microdata using the Data Documentation Initiative (DDI).
Key features are:
- Supports DDI and RDF
- Search studies and variables
- Compare variables
- Provides data access for datasets using Public, Licensed, Direct and Data Enclave access modes
This document describes a design challenge to create a system for managing data flows and access within computational social science studies in a privacy-aware manner. The system should support multiple studies conducted by different researchers while reusing common functions like user management, informed consent processes, and data access controls. It should allow multiple users in different studies to continuously view collected data and manage their consent and authorizations. Privacy-aware approaches are needed as sensitive personal data is increasingly collected at scale, but current solutions are minimal; the goal is a simple yet effective system like Funf for data collection from phones.
Towards an Infrastructure for Mining Scientific Publications (petrknoth)
The document discusses increasing the discoverability and accessibility of open access content by adopting two principles for open repositories:
1) Providing dereferenceable identifiers in metadata that link to the full content files. This allows aggregation systems to accurately determine what content is truly open.
2) Ensuring universal access to repository content by machines, similar to human access. This enables services like text mining to reuse open content.
Validation tools are needed to help repositories comply with these principles and maximize the potential of open access by making content fully discoverable and accessible to both humans and machines.
Presentation about the agINFRA Germplasm Working Group (http://wiki.aginfra.eu/index.php/Germplasm_Working_Group). Presented during Session 1 of the 1st International e-Conference on Germplasm Data Interoperability (https://sites.google.com/site/germplasminteroperability/)
ETDs and Open Access for Research and Development: Issues and challenges (Bhojaraju Gunjal)
- ETDs (Electronic Theses and Dissertations) have grown enormously in recent years, with over 6 million items now available in open access repositories worldwide.
- Factors like knowledge organization systems (KOS) and discovery services have helped improve management and retrieval of ETDs, but issues around policies, metadata standards, and open access remain.
- Making ETDs openly accessible online can help research and development by increasing global awareness of universities' work, but many institutions still embargo access or do not make ETDs open at all.
- To address ongoing challenges, experts recommend developing uniform global policies modeled after the NDLTD, encouraging open access of scholarly works through institutional repositories, and providing training
SEEK is an open-source platform for scientists to store, share, and collaborate on heterogeneous data, models, and standard operating procedures. It was developed by researchers in the UK and Germany to facilitate data sharing across multi-group projects. SEEK allows scientists to organize experiments and data using ISA-TAB standards, interlink related assets, and control access to assets at various stages of research from private to public. Key features include hosting and simulating SBML models, exploring and annotating spreadsheets, and finding expertise and collaborators through people profiles.
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi... (sesrdm)
This document discusses the characteristics and challenges of managing life sciences data. It notes that bio-data lacks structure and grows rapidly in heterogeneous formats and file sizes. Data goes through multiple analysis stages and is associated with evolving metadata standards. Ensuring data is properly stored, shared and preserved requires significant effort in describing formats, preparing submissions to various specialized public repositories, and developing data management plans. Integrating data from different sources also poses major challenges.
The document discusses use cases for the ARCHIVER project at the European Bioinformatics Institute. It notes that the EBI's data is growing rapidly at around 40-50% per year and will likely continue doubling every two years. It aims to develop a hybrid multi-cloud storage solution to address this growth and enable cost-effective scaling, analysis in the cloud, and caching of frequently accessed data in public clouds. Key challenges include balancing cost and performance across on-premises and cloud storage as data and analysis needs increase.
The document discusses user experience (UX) design considerations for mobile and traditional web. It notes that mobile internet usage will surpass desktop by 2014. While not every site needs a mobile version, mobile UX faces different constraints like small screens and awkward input. The document explores these constraints and provides examples of mobile-optimized sites that reduce content, simplify layouts, and make the most important actions easily accessible. It emphasizes analyzing user context and building designs that are useful, usable and emotionally engaging for the mobile experience.
BizTalk is Microsoft's platform for enterprise application integration (EAI), business process management (BPM), and business-to-business (B2B) integration. It provides a development and runtime environment for integrating systems, applications, and services through its core components like ports, schemas, pipelines, orchestrations and adapters. BizTalk handles long-running business processes within and between businesses through its publish-subscribe architecture and supports transactions that can run for weeks or months.
Casmira Camilo was born in Marawi to parents Camilo and Baimonan Anongcar. Her father works hard in business to provide for their family, while her mother prays from the Qur'an for Casmira to have good health and a bright future. Casmira was named Casmira Camilo, with her name meaning "cash" and "beautiful girl" in reference to her father's belief that she will bring them luck and her mother's prayers from the Qur'an.
The document discusses persuasive design, which focuses on using emotional triggers to influence user behavior in a desired way. It involves understanding user emotions and acting on that information to create intriguing experiences that nudge users towards the desired actions. The research for persuasive design is more qualitative than traditional usability methods and aims to understand what drives user actions and when users are most receptive to influence.
This document tells the story of how a boy named Harlan Dave received his name. Fifteen years ago, a couple had a baby boy who was born on a starry morning in the Philippines. They found a piece of paper in their backyard with the name "Harlan" written on it, and decided to use that as the boy's first name. They added "Dave" to the end to honor the boy's grandfather's strong faith in David from the Bible, defining him as a soldier accompanied by God. The boy is now making his own history, accompanied by prayers to God.
Tutorial 1: Your First Science App - Araport Developer Workshop (Vivek Krishnakumar)
Slide deck pertaining to Tutorial 1 of the Araport Developer Workshop conducted at TACC, Austin TX on November 5, 2014.
Presented by Vivek Krishnakumar
This document is a resume for Victor Cassen summarizing his technical skills and work history as a software developer. It lists his proficiency with various programming languages, frameworks and databases. It then describes his two most recent roles developing network management software and biological research tools and databases. It provides examples of projects in each role involving web services, data processing pipelines, and user-facing applications. It concludes with his educational background of a Computer Science degree from the University of Washington.
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ... (Bonnie Hurwitz)
The document discusses extending the iPlant cyberinfrastructure to support microbes in addition to plants. It provides an overview of iPlant, including its funding from NSF, collaborations, resources like data storage and computing platforms, and applications for analysis. Future plans are outlined to build tools and streamline workflows for metagenomics and enable high-throughput computing for microbial data.
Tripal v3, the Collaborative Online Database Platform Supporting an Internati... (Bradford Condon)
Talk given by Dr. Bradford Condon at the NSRP10 session of the Plant and Animal Genomes conference (PAG) 2019. Covers the basics of the biological database toolkit Tripal, and how Tripal enables FAIR data.
New ICT Trends and Issues of Librarianship (Liaquat Rahoo)
The document summarizes a one-day workshop on new ICT trends and issues in librarianship. It will cover topics like the introduction of ICT in libraries, different types of libraries supported by ICT, necessary ICT infrastructure, software for library automation, digital repositories, and web applications. The workshop will be held at the Institute of Modern Sciences and Arts on April 17, 2016.
The document discusses the HarvestChoice project, which aims to improve agricultural productivity for the poor through policy evaluation and technology assessment. It outlines the development of an advanced web portal to integrate bibliographic, spatial, and other data through a Drupal/Solr platform integrated with a GeoNetwork platform. The portal architecture uses Solr for search and Drupal for content management, with GeoNetwork to access spatial and mapped data through metadata standards. The implementation process and next steps are also summarized.
ChemAxon’s consulting services have undergone substantial growth in the last several years. We now utilize talent from across the globe, servicing a wide range of project needs: from small short-term solutions to mega-pharma project management, and from small-scale software integration/migration to customized product development. This presentation will cover some of the range of projects we are engaged in, with some focus on a large web portal project we delivered for the European Lead Factory.
What is Data Commons and How Can Your Organization Build One? (Robert Grossman)
1. Data commons co-locate large biomedical datasets with cloud computing infrastructure and analysis tools to create shared resources for the research community.
2. The NCI Genomic Data Commons is an example of a data commons that makes over 2.5 petabytes of cancer genomics data available through web portals, APIs, and harmonized analysis pipelines.
3. The Gen3 platform is an open source software stack for building data commons that can interoperate through common APIs and data models to support reproducible, collaborative research across projects.
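As a concrete illustration of point 2, the GDC's harmonized data is reachable through a public REST API. The short sketch below queries its projects endpoint; the endpoint path and response fields follow the public GDC API documentation, but treat the details as a hedged example rather than an authoritative client.

```python
# Hedged sketch: listing projects from the NCI Genomic Data Commons public
# REST API (https://api.gdc.cancer.gov). Field names follow the public docs.
import requests

resp = requests.get("https://api.gdc.cancer.gov/projects", params={"size": 3})
resp.raise_for_status()
for project in resp.json()["data"]["hits"]:
    print(project["project_id"], "-", project["name"])
```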
The BlueBRIDGE approach to collaborative research (Blue BRIDGE)
Gianpaolo Coro, ISTI-CNR, at the BlueBRIDGE workshop on "Data Management services to support stock assessment", held during the Annual ICES Science Conference 2016
The document describes the DALICC Vocabulary, which was developed as part of the DALICC project to represent legal expressions from licenses in a machine-readable way. The vocabulary extends the ODRL and CCRel ontologies with additional properties needed to capture the full semantic spectrum of copyright statements. Examples are provided showing how the BSD 3.0, CC-BY, and Apache licenses can be represented using the DALICC vocabulary. The goal is to significantly reduce the costs of license clearance for derivative works by developing a framework that can understand and process license information.
Data commons bonazzi bd2 k fundamentals of science feb 2017 (Vivien Bonazzi)
Vivien Bonazzi leads the Data Commons efforts within NIH. She discussed how big data is characterized by volume, velocity, variety and veracity. She explained that data is becoming the central currency of a new digital economy and organizations must leverage their digital assets through platforms like the Data Commons to transform into digital enterprises. The Data Commons platform fosters development of a digital ecosystem by enabling interactions between producers and consumers of FAIR digital objects like data, software and publications.
Setting up an open access ICAR Institutional Repository: hardware, software, policies and personnel.
ICAR Initiatives
Under the NATP project, the Integrated National Agricultural Resources Information System (INARIS) was developed (Rai et al., 2007), and a Central Data Warehouse (CDW) of agricultural resources was established at IASRI. The project involved collaborations with 13 other ICAR organizations, for which 13 different data marts were designed. The project was available at http://agdw.iasri.res.in.
My outlook: the country should have an agri-search engine
An agri-search engine should be developed in the country to aggregate information from the internet and provide it to farmers in a meaningful manner using ICT tools. It should be coordinated with the Government of India's agricultural websites, monitoring each website daily.
The global need to securely derive (instant) insights has motivated data architectures ranging from distributed storage to data lakes, data warehouses and lakehouses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as domain centricity/ownership, data as products, and self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, and statistical and machine learning algorithms into decentralized data products that let users discover insights following FAIR principles. Researchers can use its point-and-click (no-code) system to instantly perform analysis and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook-based developer environments with individual workspaces.
Join us for a talk/demo session on the Tag.bio data mesh platform and learn how major pharma companies and university health systems are using this technology to promote value-based healthcare and precision healthcare, find cures for disease, and promote collaboration (without explicitly moving data around). The talk also outlines Tag.bio's secure data exchange features for real-world evidence datasets, privacy-centric data products (confidential computing), and integration with cloud services.
GBIF is an intergovernmental organization that facilitates open access to biodiversity data worldwide via the internet. It provides three main types of infrastructure: physical infrastructure including data workflows and datasets; information infrastructure such as data portals and products; and capability infrastructure like knowledge management and standard development. GBIF aims to make biodiversity data freely available under common standards to support scientific research, conservation, and sustainable development. It currently hosts over 400 million data records from more than 10,000 datasets contributed by its 52 member countries and 36 international organizations.
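To make the "information infrastructure" point concrete, GBIF's data portal is backed by a public web service. Below is a minimal hedged sketch of an occurrence search; the endpoint follows the public api.gbif.org documentation, and the species chosen is an arbitrary example.

```python
# Hedged sketch: searching occurrence records via GBIF's public API
# (api.gbif.org); parameters and response fields follow the public docs.
import requests

resp = requests.get(
    "https://api.gbif.org/v1/occurrence/search",
    params={"scientificName": "Arabidopsis thaliana", "limit": 3},
)
resp.raise_for_status()
for rec in resp.json()["results"]:
    print(rec.get("scientificName"), rec.get("country"), rec.get("eventDate"))
```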
The pulse of cloud computing with bioinformatics as an example (Enis Afgan)
The document discusses how cloud computing can enable large-scale genomic analysis by providing on-demand access to computational resources and petabytes of reference data. It describes how tools like Galaxy and CloudMan allow researchers to perform genomic analysis in the cloud through a web browser by automating the provisioning and configuration of cloud resources. This approach makes genomic research more accessible and enables the elastic scaling of analysis as needed.
Enabling knowledge management in the Agronomic Domain (Pierre Larmande)
This talk focuses mainly on ongoing projects at the Institute of Computational Biology:
Agronomic Linked Data (AgroLD): a Semantic Web knowledge base designed to integrate data from various publicly available plant-centric data sources.
GIGwA: a tool developed to manage large genomic, transcriptomic and genotyping data sets resulting from NGS analyses.
Similar to Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting (20)
Presented in the "New and Updated Bioinformatics Datasets, Tools and Resources" session at the 28th International Conference on Arabidopsis Research (ICAR 2017) held in St. Louis, MO.
Thursday, June 22nd, 2017
Lightning Talk about InterMine/JBrowse integration and extensions to Inter-"Mine" Communication, presented at the 2017 InterMine Developer Workshop and Hackathon (IMDEV 2017) held at the Joint Genome Institute (JGI) in Walnut Creek, CA
Thursday, March 30th
Integrate JBrowse REST API Framework with Adama Federation Architecture (Vivek Krishnakumar)
This presentation describes the work done to integrate the JBrowse REST API Framework with the Araport.org-developed Adama Federation Architecture, enabling community developers to package their published datasets and expose them in a manner which is compatible with JBrowse.
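For orientation, the sketch below shows the shape of a JBrowse-compatible REST feature endpoint of the kind such an adapter exposes, following the JBrowse REST store convention of GET <base>/features/<refseq> with start/end parameters returning a {"features": [...]} JSON body. Flask, the route layout and the sample feature are illustrative assumptions, not the actual Adama code.

```python
# Minimal sketch of a JBrowse REST feature-store endpoint (illustrative only).
# JBrowse queries GET <base>/features/<refseq> with start/end parameters and
# expects a {"features": [...]} JSON response.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory annotation set keyed by reference sequence name.
FEATURES = {
    "Chr1": [
        {"start": 3631, "end": 5899, "strand": 1, "type": "gene",
         "uniqueID": "AT1G01010", "name": "AT1G01010"},
    ],
}

@app.route("/features/<refseq>")
def features(refseq):
    start = int(request.args.get("start", 0))
    end = int(request.args.get("end", 2**31))
    # Return features overlapping the requested interval.
    hits = [f for f in FEATURES.get(refseq, [])
            if f["end"] > start and f["start"] < end]
    return jsonify({"features": hits})

if __name__ == "__main__":
    app.run(port=5000)
```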
Araport is an online resource for Arabidopsis research that integrates data from various sources through federation and warehousing. It provides updated gene annotations, including over 1,000 new protein-coding genes and 50k new splice variants identified from RNA-seq data. Araport's goal is to serve as a comprehensive "one-stop shop" for Arabidopsis genomic data, literature, and tools through its combination of software and state-of-the-art web technologies.
Overview of the InterMine infrastructure and its ability to interoperate with other InterMine instances via the IM 2.0 StairCase.
Presented at the LF Project Kickoff Meeting, 2015/06/22
InterMine is an open-source data warehouse software that allows for the integration of complex biological data. It provides parsers for common data formats and an extensible framework to customize data. The system uses a PostgreSQL database to store integrated data according to an object-oriented data model. It offers a customizable web interface for querying as well as programmatic access via a web service API. Building an InterMine instance involves configuring data sources, performing data integration and post-processing, and deploying the web application. InterMine facilitates data sharing across multiple biological "mines".
JBrowse within the Arabidopsis Information Portal - PAG XXIII (Vivek Krishnakumar)
Araport integrates the JBrowse visualization software from GMOD. In order to support diverse sets of locally and remotely sourced tracks, the "ComboTrackSelector" JBrowse plugin was developed, enabling metadata-rich tracks to be partitioned into the "Faceted" selector while the default "Hierarchical" selector is used for everything else.
A dynamic sequence viewer add-on, "SeqLighter", was developed using the BioJS framework (http://biojs.net/). It offers end-users the capability to view the genomic sequence underlying gene models (genic regions plus customizable flanking regions), highlight sub-features (like UTRs, exons, introns, start/stop codons) and export the annotated output in various formats (SVG, PNG, JPEG).
Tripal within the Arabidopsis Information Portal - PAG XXIII (Vivek Krishnakumar)
Araport plans to implement a Chado-backed data warehouse, fronted by Tripal, serving as our core database, used to track multiple versions of genome annotation (TAIR10, Araport11, etc.), evidentiary data (used by our annotation update pipeline), metadata such as publications collated from multiple sources like TAIR, NCBI PubMed and UniProtKB (curated and unreviewed), and stock/germplasm data linked to AGI loci via their associated polymorphisms.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste... (Sérgio Sacani)
Context. With a mass exceeding several 10⁴ M⊙ and a rich and dense population of massive stars, supermassive young star clusters represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars. The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically, the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec. Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a photon flux threshold of approximately 2 × 10⁻⁸ photons cm⁻² s⁻¹. The X-ray sources exhibit a highly concentrated spatial distribution, with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known massive stars of the cluster, and we have collected over 71,000 photons from the magnetar CXO J164710.20-455217.
The debris of the ‘last major merger’ is dynamically young (Sérgio Sacani)
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the ‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics is consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’ did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
This MS Word-generated PowerPoint presentation covers the major details of the micronucleus test: its significance and the assays used to conduct it. The test is used to detect micronuclei formation inside the cells of nearly every multicellular organism. Micronuclei form during chromosomal separation at metaphase.
ESR spectroscopy in liquid food and beverages.pptx (PRIYANKA PATEL)
With an increasing population, people need to rely on packaged foodstuffs. Packaging of food materials requires the preservation of food. There are various methods of treating food to preserve it, and irradiation treatment is one of them. It is the most common and most harmless method of food preservation, as it does not alter the essential micronutrients of food materials. Although irradiated food does not cause any harm to human health, quality assessment of food is still required to provide consumers with the necessary information about it. ESR spectroscopy is the most sophisticated way to investigate the quality of food and the free radicals induced during its processing. The ESR spin trapping technique is useful for detecting highly unstable radicals in food. The antioxidant capability of liquid foods and beverages is mainly assessed by the spin trapping technique.
8. Isolation of pure cultures and preservation of cultures.pdf
Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting
1. InterMine
Integrated Data Warehouse
Use Cases: Arabidopsis & Medicago Genome Projects
Vivek Krishnakumar
Plant Genomics Group (EUK)
IFX Research WIPS Meeting, 03 October 2014
2. Overview
• Introduction
• InterMine
  - Integrated data warehouse, extensible data model, flexible query system
  - Web and programmatic interfaces
  - Other InterMine instances
• Use cases
  - Arabidopsis Information Portal (AIP)
  - Medicago truncatula Genome Database (MTGD)
• Summary
  - Advantages
  - Caveats
3. Introduction
For genome projects that wish to expose their data via the web (query, visualize, warehouse) to foster scientific collaboration, there are several technologies available:
• JCVI-developed software
  - Manatee (backed by an RDBMS)
• Externally developed software
  - BioMart (federated from various databases)
  - Tripal (powered by Drupal, backed by a Chado database)
  - InterMine
4. InterMine
• Functions as a data warehouse for the integration of complex biological data. Integration across data types occurs based on a common identifier (e.g. gene primary ID)
• Uses a flexible and extensible data model, controlled by XML files and driven by ontologies (Sequence Ontology [SO], Gene Ontology [GO], etc.)
  - Genomics, proteomics, interactions, homology, expression, pathways (and more data types)
  - Parsers for commonly used biological data formats
  - Provides a framework for adding your own data
• Offers a flexible query system, optimized via precomputed tables (no need for schema denormalization)
Smith, RN. et al. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics (2012) 28(23): 3163-3165
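For illustration only: extensions to the data model are declared in small XML "additions" files that are merged into the core model when the mine is built. The snippet below is a hedged sketch in the style of InterMine's additions format; the extra attribute and reference on Gene are hypothetical, not part of the shipped model.

```xml
<!-- Hypothetical additions file in the style of InterMine model additions;
     the curatorNotes attribute and annotationVersion reference are made up. -->
<classes>
  <class name="Gene" is-interface="true">
    <attribute name="curatorNotes" type="java.lang.String"/>
    <reference name="annotationVersion" referenced-type="DataSet"/>
  </class>
</classes>
```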
5. InterMine (contd.)
• Provides a user-friendly web interface exposing powerful features:
  - Analysis of lists (facilitates enrichment studies)
  - Full-featured report pages (one-stop shop)
  - Interactive result tables (sort, filter, summarize)
  - Visual query builder (no need to write SQL!)
  - Quick search and region-based search
• Fosters development of external applications using data hosted within InterMine via Application Programming Interfaces (API): RESTful endpoints with client libraries for Perl, Python, Ruby, Java and JavaScript (a query sketch follows below)
Kalderimis, A. et al. InterMine: extensive web services for modern biology. Nucl. Acids Res. (1 July 2014) 42(W1): W468-W472
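To make the programmatic interface concrete, here is a minimal sketch using the InterMine Python client (the intermine package on PyPI) against the ThaleMine service URL shown later in the deck; the view fields and gene symbol queried are illustrative assumptions, not prescribed usage.

```python
# Minimal sketch using the InterMine Python client (pip install intermine).
# The service URL is ThaleMine's (see slide 8); the view fields and the
# example gene symbol are illustrative assumptions.
from intermine.webservice import Service

service = Service("https://apps.araport.org/thalemine/service")

query = service.new_query("Gene")
query.add_view("primaryIdentifier", "symbol", "briefDescription")
query.add_constraint("symbol", "=", "FLC")  # hypothetical example symbol

for row in query.rows():
    print(row["primaryIdentifier"], row["symbol"])
```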
6. Public “Mines”
• InterMine supports querying across mines for cross-database integration
• A vast number of warehouses powered by InterMine already exist
7. Arabidopsis Information Portal (AIP)
• AIP origins
  - Funded by NSF in response to community needs, following termination of funding to TAIR
• AIP objectives
  - Develop a community web resource that is sustainable, fundable and community-extensible, and that hosts analysis & visualization tools and user data spaces
  - Federation: integrate diverse data sets from distributed data sources; foster development of tools for and by the community
  - Maintenance of the Col-0 gold standard annotation
• AIP methods
  - Assimilate TAIR data
  - Host an InterMine instance devoted to Arabidopsis (thale cress)
  - Offer and consume RESTful web services
  - Integrate and utilize iPlant resources
8. ThaleMine
https://apps.araport.org/thalemine
• An InterMine interface to Arabidopsis genomic data
• Integrates a wide variety of data types (A-E, H), some of which are warehoused and others federated via web services
• Embedded elements visualizing gene structure (JBrowse, not shown), interaction networks (F), expression patterns (G)
9. Visual Query Builder
Image created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
10. Interactive Result Tables & Region-based Search
Images created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
11. MedicMine
http://medicmine.jcvi.org
• NSF-funded project to assist with the curation of the Medicago truncatula Genome Assembly and Annotation (funding ended August 2014)
• In order to warehouse and preserve the project data, an InterMine interface for Medicago was implemented (backed by a Chado database)
• Provides a similar kind of functionality to that available via ThaleMine
12. Summary
• Advantages
  - InterMine is a powerful biological data warehouse
  - Performs complex data integration
  - Allows fast and flexible querying
  - Well-documented programmatic interface
  - Cookie-cutter, user-friendly web interface
  - Facilitates cross-talk between “mines”
• Caveats
  - Adding more data requires a full database rebuild (incremental loading is not possible) because of the integration step
• About InterMine
  - Developed by the Micklem Lab at the University of Cambridge, UK
  - Written in Java, backed by a PostgreSQL database, deployed under Tomcat
  - Documentation and downloads available at http://www.intermine.org
13. Chris Town, PI
Chris Nelson, PM
Lisa McDonald, Education and Outreach Coordinator
Jason Miller, Co-PI, Technical Lead
Erik Ferlanti, SE
Vivek Krishnakumar, BE
Svetlana Karamycheva, BE
Maria Kim, BE
Ben Rosen, BA
Gos Micklem, co-PI
Sergio Contrino, Software Engineer
Eva Huala, Project lead, TAIR
Bob Muller, Technical lead, TAIR
Matt Vaughn, co-PI
Steve Mock, Advanced Computing Interfaces
Rion Dooley, Web and Cloud Services
Matt Hanlon, Web and Mobile Applications