HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...Araport
The biological networks controlling plant signal transduction, metabolism and gene regulation are composed not only of genes, RNA, proteins and compounds but also of the complicated interactions among them. Yet, even in the most thoroughly studied model plant, Arabidopsis thaliana, knowledge of these interactions is scattered throughout the literature and various public databases. Thus, making new scientific discoveries by exploring these complex and heterogeneous data remains a challenging task for biologists.
We developed a graph-search empowered platform named HRGRN to search known and, more importantly, discover novel relationships among genes in Arabidopsis biological networks. The HRGRN includes over 51,000 “nodes” that represent very large sets of genes, proteins, small RNAs, and compounds and approximately 150,000 “edges” that are classified into nine types of interactions (interactions between proteins, compounds and proteins, transcription factors (TFs) and their downstream target genes, small RNAs and their target genes, kinases and downstream target genes, transporters and substrates, substrate/product compounds and enzymes, as well as gene pairs with similar expression patterns to provide deep insight into gene-gene relationships) to comprehensively model and represent the complex interactions between nodes.
The HRGRN allows users to discover novel interactions between genes and/or pathways, and to build sub-networks from user-specified seed nodes by searching the comprehensive collections of interactions stored in its back-end graph databases using graph traversal algorithms. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/. Currently, we are collaborating with the Araport team to develop REST-like web services and provide the HRGRN’s graph search functions to the Araport system.
ICAR 2015
Workshop 10 (TUESDAY, JULY 7, 2015, 4:30-6:00 PM)
The Arabidopsis information portal for users and developers
Agnes Chan (J. Craig Venter Institute)
A Guided Tour of Araport
ICAR 2015
Workshop 10 (TUESDAY, JULY 7, 2015, 4:30-6:00 PM)
The Arabidopsis information portal for users and developers
Blake Meyers (University of Delaware)
A Community Collaborator Perspective: Case study 2 - Small RNA DBs
Tripal within the Arabidopsis Information Portal - PAG XXIIIVivek Krishnakumar
Araport plans to implement a Chado-backed data warehouse, fronted by Tripal, serving as our core database. It will be used to track multiple versions of genome annotation (TAIR10, Araport11, etc.), evidentiary data (used by our annotation update pipeline), metadata such as publications collated from multiple sources like TAIR, NCBI PubMed and UniProtKB (curated and unreviewed), and stock/germplasm data linked to AGI loci via their associated polymorphisms.
ICAR 2015
Plenary session (MONDAY, JULY 6, 2015, 10:15-10:30 AM)
Chris Town (J. Craig Venter Institute)
Araport: your one-stop-shop for Arabidopsis data in the 21st century
JBrowse within the Arabidopsis Information Portal - PAG XXIIIVivek Krishnakumar
Araport integrates JBrowse visualization software from GMOD. To support diverse sets of locally and remotely sourced tracks, the “ComboTrackSelector” JBrowse plugin was developed to partition metadata-rich tracks into the “Faceted” selector while using the default “Hierarchical” selector for everything else.
A dynamic sequence viewer add-on, “SeqLighter”, was developed using the BioJS framework (http://biojs.net/). It is configured to let end users view the genomic sequence underlying the gene models (genic regions plus customizable flanking regions), highlight sub-features (like UTRs, exons, introns, start/stop codons) and export the annotated output in various formats (SVG, PNG, JPEG).
ICAR 2015
Workshop 10 (TUESDAY, JULY 7, 2015, 4:30-6:00 PM)
The Arabidopsis information portal for users and developers
Nick Provart (University of Toronto)
A Community Collaborator Perspective: Case study 1 - BioAnalytic Resource
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...Araport
The PMR database is a community resource for deposition and analysis of metabolomics data and related transcriptomics data. PMR currently houses metabolomics data from over 25 species of eukaryotes. In this talk, we introduce PMR's RESTful web APIs for data sharing, and demonstrate their application in research, using Araport to provide Arabidopsis metabolomics data.
Presented in the "New and Updated Bioinformatics Datasets, Tools and Resources" at the 28th International Conference on Arabidopsis Research (ICAR 2017) held in St. Louis, MO.
Thursday, June 22nd, 2017
Written and presented by Tom Ingraham (F1000), at the Reproducible and Citable Data and Model Workshop, in Warnemünde, Germany. September 14th -16th 2015.
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Alasdair Gray
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting HCLS community profile covers elements of description, identification, attribution, versioning, provenance, and content summarization. The HCLS community profile reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. The goal of this tutorial is to explain elements of the HCLS community profile and to enable users to craft and validate descriptions for datasets of interest.
ICAR 2015
Workshop 10 (TUESDAY, JULY 7, 2015, 4:30-6:00 PM)
The Arabidopsis information portal for users and developers
Matt Vaughn (Texas Advanced Computing Center)
Developing Apps: Exposing your data through Araport
Arabidopsis Information Portal overview from Plant Biology Europe 2014Matthew Vaughn
An overview of the design, technical decisions, and implementation of the Arabidopsis Information Portal community-extensible data sharing and analytics platform.
Arabidopsis Information Portal, Developer Workshop 2014, IntroductionJasonRafeMiller
The Arabidopsis Information Portal (araport.org) is a resource for the plant genomics research community. The AIP conducts developer workshops to help other labs get involved. This presentation introduces the web site with a case study about contributing a new module built around a legacy data set.
Arabidopsis Information Portal: A Community-Extensible Platform for Open DataMatthew Vaughn
Araport is an innovative model organism database resource that offers users the ability to bring their own visualizations, data sets, algorithms, and genome browser tracks and share them with their colleagues.
Vaughn aip walkthru_pag2015
1. araport.org
Extending the Arabidopsis
Information Portal: A Developer’s
Perspective
Matt Vaughn
Director, Life Sciences Computing
Texas Advanced Computing Center
vaughn@tacc.utexas.edu | @mattdotvaughn |
www.slideshare.net/mattdotvaughn
2. araport.org
Web APIs: Problem Statement
• Lack of web services for legacy data
– There are a lot of web SITES
• Existing web services don’t share
information architecture
– Negatively impacts interoperability,
discoverability, & usability
• Browser security models are punitively
complex
– Hard to build apps integrating multiple
sources
3. araport.org
Gold standard Data APIs
• Implement REST-like interfaces
• Served over HTTPS (with valid SSL certificate)
• Allow Cross-Origin Resource Sharing (CORS)
• Require authentication
– Understand and respond to client demographics
– Meter access to services
• Simple controlled vocabulary + metadata for query
parameters
• Responses conform to accepted JSON schemas*
• Support future AIP deep caching & mining efforts**
* Except where it makes sense not to
** Based on tech like ElasticSearch or neo4j
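As a rough illustration of what the "HTTPS plus authentication" requirements mean for a client, the sketch below builds (without sending) an authenticated request. The base URL is the community endpoint quoted elsewhere in this deck; the namespace, service name, `/search` path segment, and token are hypothetical placeholders, and the use of a bearer token is an assumption based on the deck's mention of OAuth2.

```python
import urllib.parse
import urllib.request

# Community endpoint quoted in these slides; everything appended to it below
# (namespace, service, "/search") is a hypothetical placeholder.
BASE_URL = "https://api.araport.org/community/v0.3"


def build_request(namespace, service, params, token):
    """Build, but do not send, an authenticated HTTPS GET request."""
    query = urllib.parse.urlencode(params)
    url = "%s/%s/%s/search?%s" % (BASE_URL, namespace, service, query)
    # Assumption: the service accepts an OAuth2 bearer token in this header.
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})


req = build_request(
    "example-namespace",       # hypothetical namespace
    "example_service_v0.1",    # hypothetical service name
    {"locus": "AT1G01010"},
    "MY_TOKEN",                # placeholder, not a real credential
)
print(req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without network access or credentials.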
4. araport.org
Araport Service Architecture
[Architecture diagram: CLI clients, scripts, and 3rd party applications access a RESTful API @ https://api.araport.org/. Behind the API sit Agave Core (apps, meta, files, profile, jobs, systems), backed by physical resources (computing, storage, database), and ADAMA (manage, enroll), which fronts services a-f from AIP + 3rd party data providers (REST*, CGI, SOAP, new web services, InterMine, Chado & Tripal).]
API Types
• Query
• Map*
• Generic
• Pass-through
API layer features
• Single-sign on
• Metering
• Unified logging
• API versioning
• Automatic HTTPS + CORS
5. araport.org
Araport Service Architecture
[Architecture diagram: CLI clients, scripts, and 3rd party applications access a RESTful API @ https://api.araport.org/. Behind the API sit Agave Core (apps, meta, files, profile, jobs, systems), backed by physical resources (computing, storage, database), and ADAMA (manage, enroll), which fronts services a-f from AIP + 3rd party data providers (REST*, CGI, SOAP, new web services, InterMine, Chado & Tripal).]
API Types
• Query
• Map*
• Generic
• Pass-through
API layer features
• Single-sign on
• Throttling
• Unified logging
• API versioning
• Automatic HTTPS
6. araport.org
Data API Types
Type | Inputs | Outputs | Notes
query | AIP parameters mandatory | AIP-aligned JSON | Gold standard data APIs
map | AIP parameters preferred | Transformed JSON | Ideal for implementing namespace transformations or filters
generic | AIP parameters preferred | Specified within code but can be any valid Content-type | Implement return of non-JSON data
passthrough | Specified by remote service | Specified by remote service | Allows existing services to be discoverable from AIP data store
7. araport.org
Data API Reserved Parameters
Name | Description | Validator (Case-insensitive)
locus | AGI Gene Locus Identifiers | AT[1-5GM][0-5]{5,5}$
transcript | AGI Transcript Identifiers | AT[1-5GM][0-9]{5,5}.[0-9]{1,3}$
identifier | Another string plausibly expected to identify a gene or transcript | Valid alphanumeric string. No whitespace.
chromosome | A. thaliana Col-0 chromosome identifiers | CHR[1-5MC]$
start/end | Coordinates within Col-0 assembly | Numeric. Should be range-checked.
strand | Defines genomic strand | [+-.]{1,1}
accession | Ecotypes or natural accessions | Not validated at present
term | Generic search term | Valid text string. Useful for
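A minimal sketch of how a service author might check the reserved parameters before running a query. The patterns here are not the validators listed on the slide: they are an assumption that encodes the familiar AGI layout (for example AT1G01010 and AT1G01010.1), and the function name `validate_params` is hypothetical.

```python
import re

# Assumed patterns for the common AGI identifier layout. These are
# illustrative only and are not the validators Araport itself applies.
VALIDATORS = {
    "locus": re.compile(r"AT[1-5CM]G[0-9]{5}", re.IGNORECASE),
    "transcript": re.compile(r"AT[1-5CM]G[0-9]{5}\.[0-9]{1,3}", re.IGNORECASE),
    "chromosome": re.compile(r"CHR[1-5CM]", re.IGNORECASE),
    "strand": re.compile(r"[+\-.]"),
}


def validate_params(params):
    """Return {parameter name: problem} for every parameter that fails a check."""
    errors = {}
    for name, value in params.items():
        pattern = VALIDATORS.get(name)
        # fullmatch requires the whole value to match, not just a prefix.
        if pattern is not None and not pattern.fullmatch(str(value)):
            errors[name] = "does not match expected format"
    # start/end are numeric and should be range-checked against each other.
    if "start" in params and "end" in params:
        try:
            start, end = int(params["start"]), int(params["end"])
        except (TypeError, ValueError):
            errors["start/end"] = "must be numeric"
        else:
            if start < 1 or end < start:
                errors["start/end"] = "out of range"
    return errors


print(validate_params({"locus": "at1g01010", "strand": "+"}))
print(validate_params({"locus": "not-a-locus", "start": 500, "end": 100}))
```

Parameters without an entry in `VALIDATORS` (such as `accession`) pass through unchecked, mirroring the "not validated at present" note.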
8. araport.org
Rationalized Responses via
lightweight JSON schemas
• Facilitate creation of mash-up client
applications
• Enable extraction and mining of the
Arabidopsis deep web
• Facilitate future interoperability with
semantic web technology without forcing
their adoption
Minimal, machine validated rules for what AIP
responses should look like
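To make "minimal, machine validated rules" concrete, here is a small hand-rolled check written in plain Python. It is a sketch under stated assumptions, not an implementation of the JSON Schema standard and not the AIP schema: the `result` envelope, the field names, and the function name `check_response` are all hypothetical.

```python
import json

# Hypothetical rules: field name -> Python type expected after json.loads.
RECORD_RULES = {"locus": str, "score": (int, float)}


def check_response(text, rules=RECORD_RULES):
    """Return a list of problems found in a JSON response body (empty if none)."""
    try:
        doc = json.loads(text)
    except ValueError as exc:
        return ["not valid JSON: %s" % exc]
    # Assumed envelope: an object holding a list of records under "result".
    if not isinstance(doc, dict) or not isinstance(doc.get("result"), list):
        return ["top level must be an object with a 'result' list"]
    problems = []
    for i, record in enumerate(doc["result"]):
        if not isinstance(record, dict):
            problems.append("record %d: not an object" % i)
            continue
        for field, expected in rules.items():
            if field not in record:
                problems.append("record %d: missing '%s'" % (i, field))
            elif not isinstance(record[field], expected):
                problems.append("record %d: '%s' has wrong type" % (i, field))
    return problems


good = '{"result": [{"locus": "AT1G01010", "score": 1.5}]}'
bad = '{"result": [{"locus": 5}]}'
print(check_response(good))
print(check_response(bad))
```

A production service would more likely hand this job to a dedicated JSON Schema validator; the point here is only that a handful of rules is enough for clients to rely on a predictable response shape.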
11. araport.org
Interacting with Araport APIs (2)
Araport web services are available in every Javascript console
Data API namespace
Individual Data API
> Agave.api.adama.getNamespaces()
13. araport.org
Creating an Araport Data API (1)
• Decide on a type of Data API to build
• Initialize a local Git repository
• Author a main function (Python only for now)
• Test that it works in your local Python
interpreter
• Write a metadata.yml file describing the
service
• Push the local repository up to Github*
• Perform an authenticated HTTP POST to the
ADAMA service with a link to your repo
• Verify that the service was created
successfully
• Test it out via HTTP request
* Or any public git server
Principles
A. All development is done on a local system
B. Almost no software dependencies beyond standard system contents
C. Source code is always public
D. Testing via same routes as usage
E. Easy to iterate if things go awry
1. Write code
2. Publish code
3. Register repository
4. Code deployed
5. Use web service
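The "author a main function and test it in your local Python interpreter" steps might look roughly like the sketch below. The function name `search`, the shape of its `args` dictionary, the returned list of records, and the in-memory lookup table are all assumptions made for illustration; the slides do not spell out the entry-point contract that ADAMA expects, so consult the ADAMA live docs for that.

```python
import json

# Toy stand-in for a legacy data source (hypothetical values).
_EXPRESSION = {"AT1G01010": 12.5, "AT1G01020": 3.1}


def search(args):
    """Look up one locus and return a list of JSON-serializable records.

    `args` is assumed to be a dict of query parameters such as
    {"locus": "AT1G01010"}.
    """
    locus = str(args.get("locus", "")).upper()
    if locus not in _EXPRESSION:
        return []
    return [{"locus": locus, "expression": _EXPRESSION[locus]}]


if __name__ == "__main__":
    # Local smoke test, run before publishing the repository.
    print(json.dumps(search({"locus": "at1g01010"})))
```

Keeping the function free of framework imports is what makes the local test meaningful: the same code path runs in the interpreter and, after registration, behind the web service.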
14. araport.org
Science Apps: Problem Statement
• Technical hurdles for developing web
applications
– Technology selection
– Development and testing environment setup
• The small number of applications that get
built are often not reusable
19. araport.org
App Security
• Apps deployed to AIP are sandboxed
– Only the user creating the app can
access/use
– Publication workflow for AIP staff to code
review and functionality review before making
public
• App code is partitioned
– Kept separate from the rest of AIP Portal code
– Only executes in user’s browser, not on
server
• App artifact hosting is limited
20. araport.org
Apps Workspace
• Drupal module
• Apps upload/ingest from public git
repositories
• User-created “workspaces”
• Private, shared*, public apps
30. araport.org
Developer Support
Online Tutorial Topic | Link
Getting started | http://bit.ly/aip-get-started
Technical overview | http://bit.ly/aip-overview
Your first AIP app | http://bit.ly/aip-first-app
Araport APIs and authentication | http://bit.ly/aip-agave-auth
Creating a data-driven application | http://bit.ly/aip-build-app
Deploying your app to Araport | http://bit.ly/aip-deploy
Creating web services for Araport | http://bit.ly/aip-websvcs
Linking to Araport content | http://bit.ly/aip-link
• Bookmark araport.org/devzone
• Follow @araport on Twitter
• Join araport-developers Google Group
• Follow Arabidopsis-Information-Portal GitHub
31. araport.org
Chris Town, PI
Lisa McDonald, Education and Outreach Coordinator
Chris Nelson, Project Manager
Jason Miller, Co-PI, JCVI Technical Lead
Erik Ferlanti, Software Engineer
Vivek Krishnakumar, Bioinf. Engineer
Svetlana Karamycheva, Bioinf. Engineer
Eva Huala, Project lead, TAIR
Bob Muller, Technical lead, TAIR
Gos Micklem, co-PI
Sergio Contrino, Software Engineer
Matt Vaughn, co-PI
Steve Mock, Portal Engineer
Rion Dooley, API Engineer
Matt Hanlon, Portal Engineer
Maria Kim, Bioinf. Engineer
Ben Rosen, Bioinf. Analyst
Joe Stubbs, API Engineer
Walter Moreira, API Engineer
33. araport.org
Araport Service Architecture
[Architecture diagram: CLI clients, scripts, and 3rd party applications access a RESTful API @ https://api.araport.org/. Behind the API sit Agave Core (apps, meta, files, profile, jobs, systems), backed by physical resources (computing, storage, database), and ADAMA (manage, enroll), which fronts services a-f from AIP + 3rd party data providers (REST*, CGI, SOAP, new web services, InterMine, Chado & Tripal).]
API Types
• Query
• Map*
• Generic
• Pass-through
API layer features
• Single-sign on
• Throttling
• Unified logging
• API versioning
• Automatic HTTPS
34. araport.org
ADAMA Road Map
• Automatic live documentation including
params
• Parameter validation at query time
• Response validation via JSON schema
• Automated provenance and attribution
• Language support (Java, Javascript, Perl)
• Full command line interface
• Status monitoring and notification
• Better “Data API Store”
• Per-namespace and per-service Access Control Lists
35. araport.org
Community Engagement
• Existing APIs + source turned over to the community
for additional development
• Community request for comment (RFC)
– Parameter metadata
– JSON Response schemas
– Provenance and attribution features
• Developing documentation, examples and tutorial
material
– Complete the entire API publication and usage lifecycle
without direct AIP intervention or personal support
• Assisting community in their development efforts
36. araport.org
Code Examples
• https://github.com/Arabidopsis-Information-Portal/jcvi-rtpcr-demos
• https://github.com/Arabidopsis-Information-Portal/aip_thalemine_webservices
• https://github.com/Arabidopsis-Information-Portal/atted_webservices
• https://github.com/Arabidopsis-Information-Portal/bar_webservices_demos
In addition to our tutorial code, these are good, illustrative examples of ADAMA
web services.
37. araport.org
ADAMA: Araport DAta Mediator API
AGAVE
API MANAGER
NoSQL intermediary
Endpoint
https://api.araport.org/community/v0.3/
Live Docs
https://adama-dev.tacc.utexas.edu/api/adama.html
38. araport.org
Araport architecture (2)
API Manager + Enterprise Service Bus
[Diagram: consumer applications call secure, rationalized REST services. Mediators (simple proxy, cache, XML-to-JSON, SOAP-to-REST, CGI-to-REST, throttle) sit between those services and the back ends: Legacy API A, Legacy API B, REST API C, and ThaleMine, data integration, and other services.]
• Single-sign on
• Throttling
• Unified logging
• API versioning
• Mediation and translation
• Dev-friendly interfaces
• Rationalized REST for consumer apps
39. araport.org
Science Objectives
• Make more, varied data available to the
Arabidopsis (and other) communities
within a unified user experience
• Enhance the innate value of data by
offering enhanced search, retrieval, and
display capabilities
• Facilitate analysis of user data
• Enable community participation in
functional annotation
40. araport.org
Technical Objectives
• Deploy a responsive, flexible community-
extensible system
• Provide APIs everywhere!
• Promote and facilitate data integration
• Enable language- and region-specific
presentation of scientific content
• Meet mobile computing on its own terms
41. araport.org
Local vs. Data-driven Apps
Local (e.g. Photoshop Express): Resources are local and inherently offline. Operating on local data using local computing.
Data-driven (e.g. KAYAK Pro): Resources are cloud-based and inherently online. Multiple data streams integrated, queried, presented in context of broader objective.
42. araport.org
Araport Bill of Materials
• Araport is currently built using
– Drupal 7.25
• Developer-oriented content management system
– Bootstrap.js and some other JavaScript toolkits
– InterMine (with modifications)
– Bioinformatics infrastructure + misc. other bits
– Agave 2.0 Software as a Service platform
• Developed by iPlant Collaborative project
• Bulk data, metadata, authentication, HPC app and job
management, notifications & events, and more
• OAuth2 out of the box
• Enterprise service bus (ESB) architecture
• http://agaveapi.co/
43. araport.org
Araport APIM Architecture (1)
[Diagram labels: Agave wso2 interface; Araport API Manager (Enroll, Manage); JSON Query, JSON Response; Cache (Technology TBD); ElasticSearch; Remote Services; POLYMORPH CGI Form; CSV. Mediator stages, shown for each of two services: Input Key Map, Input Transform, Output Transform, Output Key Map, Listen/Respond, Send/Listen. Resulting services: SNP by Locus REST, Indel by Position REST.]
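The Input Key Map and Output Key Map stages named in the diagram can be sketched as simple key renaming on either side of the remote call. All parameter names, column names, and the stand-in service below are hypothetical; the slides do not specify the real mappings.

```python
def apply_key_map(record, key_map):
    """Rename keys per key_map; keys without a mapping pass through unchanged."""
    return {key_map.get(key, key): value for key, value in record.items()}


# Hypothetical mappings between rationalized names and legacy names.
INPUT_KEY_MAP = {"locus": "gene_id", "variant_type": "vtype"}
OUTPUT_KEY_MAP = {"Chr": "chromosome", "Pos": "position"}


def mediate(query, legacy_service):
    """Map the query in, call the legacy service, map each result row out."""
    legacy_params = apply_key_map(query, INPUT_KEY_MAP)
    rows = legacy_service(legacy_params)
    return [apply_key_map(row, OUTPUT_KEY_MAP) for row in rows]


def fake_legacy_service(params):
    """Stand-in for a remote CGI call; returns one invented row."""
    return [{"Chr": "1", "Pos": "3631", "gene_id": params["gene_id"]}]


result = mediate({"locus": "AT1G01010", "variant_type": "snp"}, fake_legacy_service)
print(result)
```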
44. araport.org
Araport Architecture: Use Cases (1)
• 1001 Genomes POLYMORPH tools
– Provides variation data via locus or positional
search
– Total of seven variant types available for search
– Search parameterization depends a lot on variant
type
– Example of a plain-text CGI service
– Returns results as CSV with named columns
• Objective: Transform into a RESTful API that
expects and returns rationalized JSON
http://polymorph.weigelworld.org
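A minimal sketch of the stated objective, turning CSV with named columns into JSON records. The column names and values are invented for illustration and are not actual POLYMORPH output; the optional integer coercion is an assumption about what "rationalized" JSON might mean.

```python
import csv
import io
import json


def csv_to_records(csv_text, int_fields=()):
    """Parse CSV with a header row into a list of dicts, one per data row."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = dict(row)
        for name in int_fields:
            record[name] = int(record[name])  # coerce selected columns to int
        records.append(record)
    return records


# Invented sample; real column names depend on the variant type queried.
sample_csv = "chromosome,position,ref,alt\n1,3631,A,T\n1,3913,G,C\n"

records = csv_to_records(sample_csv, int_fields=("position",))
print(json.dumps(records, indent=2))
```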
45. araport.org
Araport Architecture: Use Cases (2)
• ThaleMine
– Has native REST interface for general queries
– Has templates which can form basis of specific
services
• Objective: Offer both Intermine-native and
AIP-conformant interfaces as Data APIs
• Current path
– Enroll native services in our APIM
– Develop template-based AIP-conformant services
http://polymorph.weigelworld.org
46. araport.org
Data APIs: Getting Started
Service           Queries          Notes
BAR eFP           Locus
BAR Expressologs  Locus
BAR Interactions  Locus
COGe              Position         Special case – output transform only
NASC $SERVICE     Locus            SOAP based but may be offline permanently
OrthologFinder    Locus            Based on a Thalemine template
POLYMORPH         Locus, Position  Actually seven CGI services
SUBA3             Locus

Compiling example queries, parameter mapping and description, and ideal results for use in implementing the system
47. araport.org
Developing a Data API
• In order of preference, we would like you to have ready:
– Well-documented REST
– Moderately well-documented REST
– SOAP services (plus WSDL or WADL)
– Plain Old XML
– Plaintext CGI
– HTML CGI
– No web services at all
• Work with us to enroll your services as a data source. This will involve a minor amount of coding.
48. araport.org
Computational App Model (1)
[Diagram labels: Host OS; Docker.io; Container (CentOS 6.4, custom-repo); Host file systems; /scratch; /database; Host FS (250 GB); TACC Corral (PB+); sftp; Agave apps, data, jobs; REST API x JSON objects.]
49. araport.org
Science Apps: Grid View
• Current Scheme
– 2-3 column view with draggable apps
– Apps are normal, full-size, or collapsed
– Single app screen
• Later in 2014
– N x X grid scheme implementing resizable app “tiles” like one sees in Android or Win8.x
– App SDK libraries will have “help” for enabling resizable design
– Multiple app screens
50. araport.org
Data API Details (2)
• For service-specific parameters
– Provide human-readable names mapped to original
parameter names
– Offer minimal descriptive text
– Specify validation
• Cardinality
• Pattern validator (regex)
• Type (number, string, etc.)
– Indicate whether required
– Indicate whether they should be visible in a UI
– Specify reasonable default values
• Seems familiar?
– This approach is used to abstract command line apps
– Allows automatic generation of minimally functional UI
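The parameter description outlined above could be sketched as a small specification object with a validator. The class, field names, and the locus pattern are assumptions made for this example; the slides do not define the actual schema.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ParamSpec:
    name: str                         # human-readable name shown to users
    maps_to: str                      # original parameter name on the service
    description: str = ""             # minimal descriptive text
    cast: Callable[[Any], Any] = str  # type (number, string, etc.)
    pattern: Optional[str] = None     # pattern validator (regex)
    max_values: int = 1               # cardinality
    required: bool = False
    visible: bool = True              # whether a UI should display it
    default: Any = None

    def validate(self, values):
        """Return the typed values, or raise ValueError if they are invalid."""
        if not values:
            if self.required and self.default is None:
                raise ValueError(f"{self.name} is required")
            values = [] if self.default is None else [self.default]
        if len(values) > self.max_values:
            raise ValueError(f"{self.name} accepts at most {self.max_values} value(s)")
        typed = []
        for value in values:
            if self.pattern and not re.fullmatch(self.pattern, str(value)):
                raise ValueError(f"{value!r} is not a valid {self.name}")
            typed.append(self.cast(value))
        return typed


# Hypothetical specs; 'gene' and 'n' stand in for legacy parameter names.
locus = ParamSpec(name="locus", maps_to="gene",
                  pattern=r"AT[1-5CM]G\d{5}", required=True)
limit = ParamSpec(name="limit", maps_to="n", cast=int, default=10)

print(locus.validate(["AT1G01010"]), limit.validate([]))
```

Because each spec carries its own name, description, visibility, and default, a form generator could iterate over a list of specs to produce the minimally functional UI mentioned above.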
51. araport.org
Data APIs: Response types (1)
• locus_relationship – pairwise
relationship between A and B
– Directionality
– Type
– Array of scores (weights, etc.)
• sequence_feature – positional attribute
– Extension of GFF model plus
• Build
• Attributes array
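A rough sketch of what these two response types might look like as JSON-serializable records. The field names are assumptions chosen to mirror the bullets above (directionality, type, scores; GFF columns plus build and attributes), and the values are illustrative only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class LocusRelationship:
    locus_a: str
    locus_b: str
    directed: bool            # directionality of the A -> B relationship
    relation_type: str        # e.g. "coexpression" (illustrative value)
    scores: List[Dict[str, float]] = field(default_factory=list)


@dataclass
class SequenceFeature:
    # GFF-style columns
    seqid: str
    source: str
    feature_type: str
    start: int
    end: int
    strand: str
    score: Optional[float] = None
    phase: Optional[int] = None
    # Extensions beyond GFF named on the slide
    build: str = ""
    attributes: List[Dict[str, str]] = field(default_factory=list)


rel = LocusRelationship("AT1G01010", "AT1G01020", directed=False,
                        relation_type="coexpression", scores=[{"weight": 0.82}])
feat = SequenceFeature(seqid="Chr1", source="example", feature_type="gene",
                       start=3631, end=5899, strand="+", build="TAIR10",
                       attributes=[{"ID": "AT1G01010"}])

print(json.dumps([asdict(rel), asdict(feat)], indent=2))
```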
52. araport.org
Data APIs: Response types (2)
• locus_feature – key-value attributes per locus
– Optional controlled vocabulary* for keys
– Support for both slots and arrays
• raw – for returning images or other binary formats
– Source and other metadata carried in X-headers instead of
JSON result
– Outbound transformation still supported
– Not a preferred response mode
• text – returning either native service response or a
non-conformant JSON document
– Source and other metadata carried in X-headers instead of
JSON result
– Not a preferred response mode
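For the raw and text modes, the idea of carrying metadata in X-headers rather than in a JSON envelope might look like the sketch below. The `X-Data-` prefix and the metadata keys are hypothetical; the slides do not name the actual headers.

```python
def wrap_raw(payload, content_type, metadata):
    """Return (headers, body) with metadata moved into X- headers.

    The body is passed through untouched, since it may be binary.
    """
    headers = {"Content-Type": content_type}
    for key, value in metadata.items():
        # service_version -> X-Data-Service-Version (hypothetical naming)
        header_name = "X-Data-" + key.replace("_", "-").title()
        headers[header_name] = str(value)
    return headers, payload


# Truncated PNG signature stands in for a real image payload.
headers, body = wrap_raw(b"\x89PNG\r\n", "image/png",
                         {"source": "BAR eFP", "service_version": "0.1"})
for name, value in headers.items():
    print(f"{name}: {value}")
```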
53. araport.org
Data API Details (6)
• Transparent caching will compensate for
transient remote service failures
• Automatic indexing of certain response
types via ElasticSearch, allowing for
sophisticated global search
– ElasticSearch allows us to index everything
we “know about” and return it quickly
– iPlant uses it to live-index >700 TB user data
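One way caching can compensate for transient remote failures is to keep the last good response and serve it when the remote call fails. This is a minimal in-memory sketch of that idea, not the Araport implementation (the slides list the cache technology as TBD); the class and the flaky stand-in service are invented for the example.

```python
class FallbackCache:
    """Serve the last good response when the remote service fails."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that performs the remote request
        self._store = {}      # key -> last successful response

    def get(self, key):
        try:
            value = self._fetch(key)
        except OSError:       # covers ConnectionError, timeouts, URLError
            if key in self._store:
                return self._store[key], "cache"
            raise             # nothing cached yet, so surface the failure
        self._store[key] = value
        return value, "remote"


calls = {"n": 0}


def flaky_service(locus):
    """Stand-in remote service that fails on its second call."""
    calls["n"] += 1
    if calls["n"] == 2:
        raise ConnectionError("remote service unavailable")
    return {"locus": locus, "call": calls["n"]}


cache = FallbackCache(flaky_service)
first = cache.get("AT1G01010")    # remote succeeds and is cached
second = cache.get("AT1G01010")   # remote fails, cached copy is served
print(first, second)
```

A production cache would also need expiry and size limits, which are omitted here.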
54. araport.org
Developing an app
• Understand and document the user stories you’re
addressing with your app
• Identify all requisite data sources AND
• Help us prepare them as Data APIs
– This may involve coding
• Understand the data integration or aggregation needs
of your app
– This may involve coding
• Develop the user interface(s) for your app using our
tool kits and suggested practices
– This will involve coding.
– But you will learn tools like jQuery, Bootstrap, & D3 and will
thus be eminently employable!
Editor's Notes
MAIN POINT: Just about any bioinformatics-savvy person can implement this workflow
25 MINUTES
Service CGI. Mediated by Code A. Hosted by ADAMA. Served at API. Available to all consumers.