Presentation for the San Francisco #IDCC14 conference (http://www.dcc.ac.uk/events/idcc14/day-two-papers). The presentation covers publishing zooarchaeology data with Open Context (http://opencontext.org) to study the spread of farming from the Near East to Europe through Anatolia. It looks at editorial processes, linked data annotation, and other workflow concerns relating to making raw data more usable for comparative analysis.
RuleML2015: Rule-Based Exploration of Structured Data in the Browser (RuleML)
We present Dexter, a browser-based, domain-independent structured-data explorer. Dexter enables users to explore data from multiple local and Web-accessible heterogeneous data sources, such as files, Web pages, APIs, and databases, in the form of tables. Dexter's users can also compute tables from existing ones, as well as validate the tables (base or computed) through declarative rules. Dexter enables users to perform ad hoc queries over their tables with higher expressivity than is supported by the underlying data sources. Dexter evaluates a user's query on the client side while delegating sub-queries to remote sources whenever possible. Dexter also allows users to visualize and share tables, and to export tables (e.g., in JSON, plain XML, and RuleML) along with their computation rules. Dexter has been tested on a variety of data sets from domains such as government and apparel manufacturing. Dexter is available online at http://dexter.stanford.edu.
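The evaluation strategy described above, running sub-queries on remote sources where possible and finishing the query client-side, can be sketched as follows. This is a minimal illustration with made-up data and helper names, not Dexter's actual API.

```python
# Hypothetical sketch: push a filter sub-query down to a source that can
# evaluate it, fetch everything from a source that cannot, then complete
# the query (a join) on the client side.

def remote_files_source(min_year):
    # Stands in for a remote source that supports server-side filtering.
    rows = [
        {"id": 1, "dataset": "census", "year": 2010},
        {"id": 2, "dataset": "apparel", "year": 2014},
    ]
    return [r for r in rows if r["year"] >= min_year]

def remote_api_source():
    # Stands in for a source with no query capability: fetch all rows.
    return [{"id": 2, "owner": "factory-a"}, {"id": 3, "owner": "factory-b"}]

def client_side_join(left, right, key):
    # The part of the query the sources cannot evaluate runs locally.
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

result = client_side_join(remote_files_source(2012), remote_api_source(), "id")
# result == [{"id": 2, "dataset": "apparel", "year": 2014, "owner": "factory-a"}]
```

The join itself never reaches either source, which is how a client-side evaluator can offer higher expressivity than the sources support individually.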
Presentation on the Resource Identification Pilot Project, an initiative to develop a machine-processable citation system for key research resources used in scientific studies
Managing Content-Driven Websites with Drupal (Andy Smith)
From 2011 CCCU Conference on Technology:
Your website is most often a prospective student's first impression of the quality of your institution. Not only does it need to be visually attractive, but it needs to have all the features they are expecting — an online application, campus visit requests, cost calculators, online catalogs, etc. — and of course you need to do this on a tight budget. Drupal to the rescue. Drupal is an open-source content management system (CMS) that is robust and flexible. In this session we will demonstrate the power of Drupal, look at some pre-built distributions built on Drupal, and provide you with a list of resources to get you started.
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge (Chunlei Wu)
My talk about the BioThings API project at ISMB 2018 in Chicago, as part of the BD2K special session. The BioThings API project provides a collection of high-performance APIs (MyGene.info, MyVariant.info, MyChem.info), an SDK for building new biomedical APIs (BioThings SDK), and a JSON-LD and OpenAPI based solution for cross-API interoperability and knowledge exploration.
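The three services named above share a common REST pattern, and the sketch below builds query URLs for each of them. The endpoint paths follow the public conventions of these APIs, but treat the exact parameters as illustrative rather than a complete reference.

```python
# Build query URLs for the BioThings APIs named in the talk.
from urllib.parse import urlencode

BIOTHINGS_HOSTS = {
    "gene": "https://mygene.info/v3",
    "variant": "https://myvariant.info/v1",
    "chem": "https://mychem.info/v1",
}

def query_url(entity, q, **params):
    # Each API exposes a /query endpoint with a 'q' search parameter.
    qs = urlencode({"q": q, **params})
    return f"{BIOTHINGS_HOSTS[entity]}/query?{qs}"

url = query_url("gene", "symbol:CDK2", species="human")
# url == "https://mygene.info/v3/query?q=symbol%3ACDK2&species=human"
```

Fetching that URL with any HTTP client returns JSON annotation documents, which is the uniform access pattern that makes cross-API interoperability tractable.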
Presentation at the Canadian Cancer Research Conference satellite bioinformatics.ca workshop. This one is an introduction to the TCGA, ICGC, and COSMIC databases.
Intended for a mixed/general audience of clinicians, business interests, and research scientists. No audio; however, the event was recorded and posted to YouTube by Genome Atlantic: http://www.youtube.com/watch?v=FLVjwOngu-Q
We isolated and analyzed, at single-nucleotide resolution, cancer-associated neochromosomes from well- and/or dedifferentiated liposarcomas. Neochromosomes, which can exceed 600 Mb in size, initially arise as circular structures following chromothripsis involving chromosome 12. The core of the neochromosome is amplified, rearranged, and corroded through hundreds of breakage-fusion-bridge cycles. Under selective pressure, amplified oncogenes are overexpressed, while coamplified passenger genes may be silenced epigenetically. New material may be captured during punctuated chromothriptic events. Centromeric corrosion leads to crisis, which is resolved through neocentromere formation or native centromere capture. Finally, amplification terminates, and the neochromosome core is stabilized in linear form by telomere capture. This study investigates the dynamic mutational processes underlying the life history of a special form of cancer mutation.
Science is rapidly being brought into the electronic realm and electronic laboratory notebooks (ELN) are a big part of this activity. The representation of the scientific process in the context of an ELN is an important component to making the data recorded in ELNs semantically integrated.
This presentation outlined initial developments of an Electronic Notebook Ontology (ENO) that will help tie together the ExptML ontology, HCLS Community Profile data descriptions, and the VIVO-ISF ontology.
BioThings SDK: a toolkit for building high-performance data APIs in biology (Chunlei Wu)
This is from my talk at BOSC 2017.
What’s BioThings?
We use “BioThings” to refer to objects of any biomedical entity-type represented in the biological knowledge space, such as genes, genetic variants, drugs, chemicals, diseases, etc.
BioThings SDK
SDK stands for "Software Development Kit". BioThings SDK provides a Python-based toolkit to build high-performance data APIs (or web services) from one or more data sources. It has a particular focus on building data APIs for biomedical entities, a.k.a. "BioThings", though it is not necessarily limited to the biomedical scope. For any given "BioThings" type, BioThings SDK helps developers aggregate annotations from multiple data sources and expose them as a clean, high-performance web API.
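The aggregation step described above can be illustrated with a toy merge: each data source yields partial annotations keyed by an entity ID, and the merge produces one document per "BioThing". This is an illustration of the idea, not BioThings SDK code; the source names and fields are made up.

```python
# Toy aggregation: combine per-source annotation records, keyed by entity
# ID, into one merged document per entity.
from collections import defaultdict

def merge_sources(*sources):
    docs = defaultdict(dict)
    for source_name, records in sources:
        for entity_id, annotation in records.items():
            # Namespace each source's fields so they never collide.
            docs[entity_id][source_name] = annotation
    return dict(docs)

merged = merge_sources(
    ("symbols", {"1017": {"symbol": "CDK2"}}),
    ("pathways", {"1017": {"pathway": "cell cycle"},
                  "1018": {"pathway": "unknown"}}),
)
# merged["1017"] == {"symbols": {"symbol": "CDK2"},
#                    "pathways": {"pathway": "cell cycle"}}
```

Namespacing by source keeps provenance visible in the merged document, which is one plausible design choice for serving aggregated annotations through a single API.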
VIVO: enabling the discovery of research and scholarship (Paul Albert)
An introduction to VIVO, an open-source, semantic-web application that enables discovery of research and scholarship across institutions, and one library's role in its implementation and development.
Connecting life sciences data at the European Bioinformatics Institute (Connected Data World)
Tony Burdett's slides from his talk at Connected Data London. Tony is a Senior Software Engineer at the European Bioinformatics Institute. He presented the complexity of data at EMBL-EBI and the institute's solution for making sense of all this data.
Using the Semantic Web to Support Ecoinformatics (ebiquity)
We describe our ongoing work in using the semantic web in support of ecological informatics, and demonstrate a distributed platform for constructing end-to-end use cases. Specifically, we describe ELVIS (the Ecosystem Location Visualization and Information System), a suite of tools for constructing food webs for a given location, and Triple Shop, a SPARQL query interface which allows scientists to semi-automatically construct distributed datasets relevant to the queries they want to ask. ELVIS functionality is exposed as a collection of web services, and all input and output data is expressed in OWL, thereby enabling its integration with Triple Shop and other semantic web resources.
Lightweight data engineering, tools, and software to facilitate data reuse an... (Sean Davis)
Lightweight tools, software, and publication processes that tie together data resources, analysis tools, and documentation can be powerful stimuli for high-quality reuse of available data. While developed with reproducibility as a core value, Bioconductor tooling and infrastructure have reduced barriers to data reuse and established best practices for rich data and metadata sharing in genomics and proteomics. In this talk, I give a few examples of, and motivation for, how the Bioconductor data ecosystem can be a model for other communities to enhance the value of available data.
2016 07 12_purdue_bigdatainomics_seandavis (Sean Davis)
Newer, faster, cheaper molecular assays are driving biomedical research. I discuss the history of biomedical data, including concepts of data sharing and hypothesis-driven versus hypothesis-generating research, and the potential to expand our thinking on biomedical research to be much more integrated through smart, creative, and open use of technologies and more flexible, longitudinal studies.
RNA-seq: A High-resolution View of the Transcriptome (Sean Davis)
The molecular microscopes that we use to examine human biology have advanced significantly with the advent of next generation sequencing. RNA-seq is one application of this technology that leads to a very high-resolution view of the transcriptome. With these new technologies come increased data analysis and data handling burdens as well as the promise of new discovery. These slides present a high-level overview of the RNA-seq technology with a focus on the analysis approaches, quality control challenges, and experimental design.
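One of the analysis steps such an overview typically covers is expression normalization. As a worked example (with made-up gene names and counts, not data from the talk), here is transcripts per million (TPM), a common within-sample normalization:

```python
# TPM: normalize read counts by transcript length, then scale so the
# values across genes sum to one million.

def tpm(counts, lengths_kb):
    # Rate: reads per kilobase of transcript, per gene.
    rates = {g: counts[g] / lengths_kb[g] for g in counts}
    total = sum(rates.values())
    # Scale rates so they sum to one million across all genes.
    return {g: rates[g] / total * 1e6 for g in rates}

values = tpm({"geneA": 100, "geneB": 300}, {"geneA": 1.0, "geneB": 3.0})
# Both genes have the same per-kilobase rate (100), so each gets 500000 TPM.
```

Length normalization matters because, at equal expression, longer transcripts yield proportionally more reads; here geneB has three times the counts only because it is three times as long.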
Jack Zhu presents a quick set of overview slides on the Bioconductor SRAdb package. The package allows SQL access to the Sequence Read Archive (SRA) repository at NCBI.
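SRAdb's core idea is SQL access to a local SQLite snapshot of SRA metadata. The sketch below reproduces that idea in miniature with an in-memory database; the table and column names are illustrative stand-ins, not the actual SRAdb schema, and the accessions are fabricated.

```python
# Toy version of the SRAdb access pattern: load SRA-style metadata into
# SQLite, then answer questions with plain SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE run (run_accession TEXT, study_accession TEXT, platform TEXT)"
)
conn.executemany(
    "INSERT INTO run VALUES (?, ?, ?)",
    [("SRR000001", "SRP000001", "ILLUMINA"),
     ("SRR000002", "SRP000001", "ILLUMINA"),
     ("SRR000003", "SRP000002", "PACBIO")],
)

# Find all runs belonging to one study.
rows = conn.execute(
    "SELECT run_accession FROM run WHERE study_accession = ?", ("SRP000001",)
).fetchall()
# rows == [("SRR000001",), ("SRR000002",)]
```

The appeal of this design is that any SQL-capable tool, in R or elsewhere, can slice the repository's metadata locally without repeated calls to NCBI.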
ShinySRAdb: an R package using shiny to wrap the SRAdb Bioconductor package (Sean Davis)
The Sequence Read Archive (SRA) is hosted at the National Center for Biotechnology Information (NCBI) at the National Institutes of Health. These slides showcase Olivia Zhang's summer internship work to wrap a Bioconductor package, SRAdb, using the shiny R package.
This slide set is meant to be a teaching guide to R functionality. It includes hands-on exercises meant to be used for an audience sitting in front of a computer.
This talk reviews some of the software packages available for R and Bioconductor to access NCBI Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). In particular, the GEOquery, GEOmetadb, and SRAdb packages are discussed.
Bioinformatics Resources: An Incomplete List. April 11, 2011, SS/SC Retreat. Sean Davis, M.D., Ph.D., Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health.
Topics covered: patient and population characteristics, gene expression, gene copy number, transcriptional regulation, phenotype, DNA methylation, chromatin structure and function, and sequence variation.
Training and education resources: Google; CIT/Helix training (http://training.cit.nih.gov); NIH Library (http://nihlibrary.nih.gov/ResearchTools/Pages/Bioinformatics.aspx); Cold Spring Harbor, AACR, OpenHelix, BioConductor, and others.