http://bioteam.net
Web Services for
Bioinformatics
Chris Dwan
The BioTeam
http://bioteam.net
Totally Unscientific Impression
The vast majority of CPU cycles (clusters, SMP
machines, and grids) in the life sciences either sit
idle, or are dominated by a very few power users.
• Because:
– Most users aren’t aware of what they have
– Or, they don’t know how to use it
– Or, they’ve tried to use it, and it’s difficult
– Or, it doesn’t read their Excel data
– Or, they tried to use it last year, and it gave them incorrect
results
Bioinformatics in the XXI Century
Lincoln Stein’s “Bioinformatics Nation”
Convergence
• Web interfaces, currently human-
friendly, will become machine-friendly
• Data formats and interfaces will begin
to standardize
• Heterogeneous platforms,
applications, and systems will begin to
interoperate
• Machines will begin to communicate
with each other in profound and
powerful new ways.
Computing For Science
• Many user models
• Many applications, mostly open source,
some quite proprietary
• Cooperative, collaborative, yet competitive
• Compute and data intensive
• Rapid rate of growth / change
• There is no single solution.
Many skill levels: Physicist -> MD
No single solution
Core Problems
• Distribution
Data and applications are created and controlled by
autonomous groups all over the world
• Biology is difficult and messy:
Large collections of data, many data types and tools
developed in a massively distributed environment.
• Research code is different from business code
Rapid development, flexibility, “interactive” development
Web Services
The World Wide Web is more and more used for application to
application communication. The programmatic interfaces
made available are referred to as Web Services.
• WSDL (advertisement)
– Machine readable
– An “interface contract” defining what
services are available via a particular
server
• SOAP (access)
– Independent of platform, language,
and transport protocol
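To make the SOAP half of this concrete, here is a minimal sketch of what a request message looks like when it is built by hand with Python's standard library. The envelope namespace is the SOAP 1.1 one; the service namespace `urn:example:blast`, the operation name, and the parameter values are placeholders chosen for illustration and do not describe any particular server.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:blast"  # hypothetical service namespace (assumption)


def build_envelope(operation, params):
    """Return a SOAP 1.1 request envelope as a text string."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element carries one child element per named parameter.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_envelope("blastall_simple", {"blastall": "blastn", "query": "seq1"})
print(request_xml)
```

In practice a SOAP toolkit generates this message from the WSDL, so client code rarely touches the XML directly.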
Why Web Services?
• Why not?
– CORBA, RMI, Bytecodes, Relocatable libraries,
The Grid, Opportunistic computing,
metacomputing …
• Selfish benefit to both publishers and users
– Easy publishing (no interface needed)
– Choice of client (command line .. integrated
workflow environments)
– Minimal buy-in
Web Services Adoption?
• Languages
– Perl, C, C++, Objective-C, Ruby, Java,
AppleScript, Python, …
• Open Source Graphical Clients
– Taverna
• Commercial SOAP / WSDL Clients
– Inforsense, Pipeline Pilot, TurboWorx, VIBE, OS
X, Mathematica, Spotfire, …
Bioinformatic Web Services
• EBI: SOAPLab, EMBOSS, Ensembl, …
• KEGG: Pathway
• GO: Gene Ontologies
• BioMOBY: Objects for modeling data
• NCBI: Netblast
• iNquiry: Clustered tools
As more organizations adopt common standards,
those standards become more valuable
The BioTeam
• Consulting company:
– Scientists,
Developers, IT
Professionals
• Expertise:
– Scientific, parallel,
distributed computing
– Infrastructure
– Optimization
BioTeam’s iNquiry
• iNquiry is two things:
– “Instant” cluster deployment kit
• Scheduler, Web Browser, integrated configuration
– Web portal for Bioinformatics
• 170+ applications pre-installed
• HTML interface
• SOAP / Web Services interface, integrated with Cluster tools
• OS X / Apple, HP, Linux, SGI, Orion Multisystems
• 190+ installations worldwide
– 170+ are Apple
– 2 -> 240 nodes
iNquiry (2004)
• All interfaces defined by “PISE” XML
documents
– /usr/local/lib/Pise/5.a/Xml
– Other files created by scripts
[Diagram: HTML, PISE XML, CGI Scripts, PERL Modules, PISE Scripts, Cluster]
iNquiry Interface
blastall.xml
<pise>
<head>
<title>BLASTALL</title>
<version>2.2.1</version>
<description>with gaps</description>
<authors>Altschul, Madden, Schaeffer, Zhang, Miller, Lipman</authors>
<category>NCBI</category>
<reference>Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaeffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25:3389-3402.</reference>
<doclink>http://www.ncbi.nih.gov/Education/BLASTinfo/information3.html</doclink>
</head>
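Because every interface is generated from a PISE XML document like the one above, a script can read the same header to discover what a tool is. The sketch below uses only Python's standard library and an abbreviated copy of the header; the closing `</pise>` tag is added so the fragment parses on its own, and the function name `read_head` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the header shown above. The closing </pise> tag is
# added here only so that the fragment is well-formed by itself.
PISE_XML = """<pise>
<head>
<title>BLASTALL</title>
<version>2.2.1</version>
<description>with gaps</description>
<category>NCBI</category>
</head>
</pise>"""


def read_head(xml_text):
    """Return the child elements of <head> as a {tag: text} dictionary."""
    head = ET.fromstring(xml_text).find("head")
    return {child.tag: (child.text or "").strip() for child in head}


info = read_head(PISE_XML)
print(f"{info['title']} {info['version']} ({info['category']})")
# prints "BLASTALL 2.2.1 (NCBI)"
```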
iNquiry Web Services
• Released, summer 2004
• Actually in use at Novartis, BMS, VBI
• Called from Perl, Java, Taverna, Inforsense,
Pipeline Pilot, VIBE, Apple Automator,
AppleScript, …
[Diagram: HTML, PISE XML, CGI Scripts, PERL Modules, PISE Scripts, Cluster, SOAP Interface, WSDL]
A Vision for Web Services-Based Computing
• Scientific Questions
– Sequence Analysis, Genomic Profiling,
Computational Biology/Chemistry
• Workflow Tools/Scripts
– Pipeline Pilot, Perl
• Web Services
– iNquiry / PISE
• Job Distribution/Management
– LSF / SGE
• Clustered Computing
• Expert Users, System Administrators, Novice Users
What Web Services Don’t Do
• Traditional scheduler tasks:
– Job Control
– Queuing
– Scheduling
– Failure handling
What Web Services Do Not Do
• Semantics
– Service ‘X’ must still be
interpreted and used in
some context.
– No OMG-like object
model imposed by
default!
– In bioinformatics, other
related projects
(BioMOBY, etc) attempt
to deal with semantic
issues.
What Web Services Do
• Standard interface to arbitrary resources
• Allow someone else to write the interface
• Allow someone else to build the infrastructure
Completely split the interface from the service
provision
Divide and conquer
PERL Web Service Client
use SOAP::Lite;   # provides SOAP::Data

# $server (a SOAP client object), $ticket and $query_id are assumed
# to have been set up earlier in the script.
$res = $server->blastall_simple(
SOAP::Data->name("TICKET")->value($ticket),
SOAP::Data->name("BLOCKING")->value(0),
SOAP::Data->name("blastall")->value("blastn"),
SOAP::Data->name("query")->value($query_id),
SOAP::Data->name("protein_db")->value("yeast.nt"),
SOAP::Data->name("nucleotid_db")->value("yeast.nt"),
SOAP::Data->name("tmp_outfile")
->value($query_id.".blastx")
);
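The call above passes a ticket and sets BLOCKING to 0, which suggests a submit-now, collect-later pattern. A rough Python sketch of that pattern follows. `FakeJobService`, its `submit` and `status` methods, and the job id format are invented for illustration and are not the iNquiry API; the service is an in-memory stand-in so the example runs without a server.

```python
import itertools


class FakeJobService:
    """In-memory stand-in for a remote job service (hypothetical, not a real API)."""

    def __init__(self, polls_until_done=3):
        self._ids = itertools.count(1)
        self._remaining = {}
        self._polls_until_done = polls_until_done

    def submit(self, program, query):
        """Start a job and return its id immediately, without waiting."""
        job_id = f"job-{next(self._ids)}"
        self._remaining[job_id] = self._polls_until_done
        return job_id

    def status(self, job_id):
        """Report 'running' for the first few polls, then 'done'."""
        if self._remaining[job_id] > 0:
            self._remaining[job_id] -= 1
            return "running"
        return "done"


def wait_for(service, job_id, max_polls=10):
    """Poll until the job finishes; return the number of polls used."""
    for attempt in range(1, max_polls + 1):
        if service.status(job_id) == "done":
            return attempt
        # A real client would sleep between polls.
    raise TimeoutError(f"{job_id} still running after {max_polls} polls")


service = FakeJobService()
job_id = service.submit("blastn", "seq1")
polls = wait_for(service, job_id)
print(job_id, "finished after", polls, "polls")
```

Because the client holds only a job id, it can disconnect and reattach later instead of keeping one long request open.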
Example Taverna Workflow: Running
Blast
Inforsense Workflow - Microarray Normalization
Pipeline Pilot Web Service Plugin
OS X Tiger - Automator
Re-publication
• Most high level tools
can publish their
protocols as web
services
• All can also call
published web services
• It’s turtles all the way
down.
This can lead to difficulties
Sneak Preview
Excel Web Services Plugin
Stumbling Blocks
• Pass by reference (URL)
– SOAP data bloat
– MIME encode / decode
• System security
– Inadvertent DoS attacks are
easy
• Blocking / Timeouts
– Reattach
• Complex Data Types
• Service Relocation
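The "data bloat" point is easy to quantify. Binary data embedded in an XML message is typically base64 encoded, which turns every 3 bytes into 4 characters, while passing a URL costs only the length of the URL. The sketch below compares the two; the payload is made-up sequence data and the URL is a hypothetical location.

```python
import base64


def inline_size(payload: bytes) -> int:
    """Bytes needed to embed the payload in a message as base64 text."""
    return len(base64.b64encode(payload))


def reference_size(url: str) -> int:
    """Bytes needed to pass only a URL that points at the payload."""
    return len(url.encode("utf-8"))


payload = b"ACGT" * 750_000               # 3,000,000 bytes of made-up sequence data
url = "http://example.org/data/query.fa"  # hypothetical location of the same data

print("inline (base64):", inline_size(payload), "bytes")
print("by reference:   ", reference_size(url), "bytes")
```

The inline form here is 4,000,000 bytes before any XML overhead, and both ends must also spend time encoding and decoding it.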
Plan For Failure
• Miron Livny (U. Wisconsin, Madison)
– Condor project: 20+ years of distributed
computing
– Management (pessimistic) rather than
engineering (optimistic) assumptions.
• Scheduling is complete when the job finishes, not
when it starts.
• Double check all results
• Assume each element will fail.
• Double-schedule the critical path
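Two of these rules translate directly into client code: keep trying until the job has actually finished, and do not accept a result until an independent second run agrees. The Python sketch below shows one way to express that. All names are illustrative, `FlakyJob` is a hypothetical job that fails once, and concurrent double-scheduling is not shown.

```python
def run_with_retries(job, attempts=3):
    """Call job() until it returns without raising, at most `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return job()
        except Exception as error:  # assume any element can fail
            last_error = error
    raise RuntimeError(f"job failed {attempts} times") from last_error


def run_checked(job, attempts=3):
    """Run the job twice and accept the result only if both runs agree."""
    first = run_with_retries(job, attempts)
    second = run_with_retries(job, attempts)
    if first != second:
        raise ValueError("results disagree; do not trust either run")
    return first


class FlakyJob:
    """Hypothetical job that fails on its first call, then succeeds."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == 1:
            raise IOError("node went away")
        return 42


job = FlakyJob()
result = run_checked(job)
print(result, "after", job.calls, "calls")
```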
Users (Research) are the Point
• Maximize user freedom
– Let users help each other:
• shared repository of workflows, codes, etc.
• mailing lists, chat rooms,
– If at all possible, provide source code
– The key problems are social / managerial
• Technical issues are simple by comparison.
• Include all possible resources
– Never try to get in the way of your users
Assume that users know what they’re doing
Take Home
• Biology is difficult and messy
• IT and HPC are difficult and
messy
• Federate, don’t integrate (divide
and conquer)
• Web Services (WSDL and
SOAP) are the standard of
choice.
• If your resources are sitting idle,
there is a problem, and it’s not
the users.
Thank You
• Early adopters (iNquiry web services):
– Nathan Siemers (Bristol-Myers Squibb)
– John Davies, Jeremy Jenkins (Novartis IBR)
– Dustin Machai (VBI)
– Tim Kunau*, Michael Heuer (CCGB, University of Minnesota)
• Collaborators & Partners:
– Tom Oinn (Taverna), Scitegic, Inforsense
• The Bioteam
– Michael Athanas, Chris Dagdigian, Stan Gloss, Bill Van Etten,
Jiesheng Zhang
• Bio-IT World / Life Sciences Expo

 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 

2006 bio it web services

  • 2. http://bioteam.net Totally Unscientific Impression The vast majority of CPU cycles (clusters, SMP machines, and grids) in the life sciences either sit idle, or are dominated by a very few power users. • Because: – Most users aren’t aware of what they have – Or, they don’t know how to use it – Or, they’ve tried to use it, and it’s difficult – Or, it doesn’t read their Excel data – Or, they tried to use it last year, and it gave them incorrect results
  • 4. http://bioteam.net Convergence • Web interfaces, currently human-friendly, will become machine-friendly • Data formats and interfaces will begin to standardize • Heterogeneous platforms, applications, and systems will begin to interoperate • Machines will begin to communicate with each other in profound and powerful new ways.
  • 5. http://bioteam.net Computing For Science • Many user models • Many applications, mostly open source, some quite proprietary • Cooperative, collaborative, yet competitive • Compute and data intensive • Rapid rate of growth / change • There is no single solution. Many skill levels: Physicist -> MD
  • 7. http://bioteam.net Core Problems • Distribution Data and applications are created and controlled by autonomous groups all over the world • Biology is difficult and messy: Large collections of data, many data types and tools developed in a massively distributed environment. • Research code is different from business code Rapid development, flexibility, “interactive” development
  • 8. http://bioteam.net Web Services The World Wide Web is more and more used for application to application communication. The programmatic interfaces made available are referred to as Web Services. •WSDL (advertisement) –Machine readable –An “interface contract” defining what services are available via a particular server •SOAP (access) –Independent of platform, language, and transport protocol
  • 9. http://bioteam.net Why Web Services? • Why not? – CORBA, RMI, Bytecodes, Relocatable libraries, The Grid, Opportunistic computing, metacomputing … • Selfish benefit to both publishers and users – Easy publishing (no interface needed) – Choice of client (command line .. integrated workflow environments) – Minimal buy-in
  • 10. http://bioteam.net Web Services Adoption? • Languages – PERL, C, C++, Objective C, Ruby, Java, Applescript, Python, … • Open Source Graphical Clients – Taverna • Commercial SOAP / WSDL Clients – Inforsense, Pipeline Pilot, TurboWorx, VIBE, OS X, Mathematica, Spotfire, …
  • 11. http://bioteam.net Bioinformatic Web Services • EBI SOAPLab, Emboss, Ensembl, … • KEGG Pathway • GO Gene Ontologies • BioMOBY Objects for modeling data • NCBI Netblast • iNquiry Clustered tools As more organizations adopt common standards, those standards become more valuable
  • 12. http://bioteam.net The BioTeam • Consulting company: – Scientists, Developers, IT Professionals • Expertise: – Scientific, parallel, distributed computing – Infrastructure – Optimization
  • 13. http://bioteam.net BioTeam’s iNquiry • iNquiry is two things: – “Instant” cluster deployment kit • Scheduler, Web Browser, integrated configuration – Web portal for Bioinformatics • 170+ applications pre-installed • HTML interface • SOAP / Web Services interface, integrated with Cluster tools • OS X / Apple, HP, Linux, SGI, Orion Multisystems • 190+ installations worldwide – 170+ are Apple – 2 -> 240 nodes
  • 14. http://bioteam.net iNquiry (2004) • All interfaces defined by “PISE” XML documents – /usr/local/lib/Pise/5.a/Xml – Other files created by scripts [Diagram components: HTML, PISE XML, CGI Scripts, PERL Modules, PISE Scripts, Cluster]
  • 15. http://bioteam.net iNquiry Interface blastall.xml <pise> <head> <title>BLASTALL</title> <version>2.2.1</version> <description>with gaps</description> <authors>Altschul, Madden, Schaeffer, Zhang, Miller, Lipman</authors> <category>NCBI</category> <reference>Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaeffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25:3389-3402.</reference> <doclink>http://www.ncbi.nih.gov/Education/BLASTinfo/information3.html</doclink> </head>
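A PISE description like the one on this slide is ordinary XML, so a client can read the tool metadata with any XML parser. The following is a minimal sketch in Python, assuming an abridged copy of the `<head>` block shown above; the function name `read_pise_head` is made up for illustration and is not part of PISE or iNquiry.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <head> block from the slide, closed so that it parses.
PISE_HEAD = """<pise>
  <head>
    <title>BLASTALL</title>
    <version>2.2.1</version>
    <description>with gaps</description>
    <category>NCBI</category>
  </head>
</pise>"""


def read_pise_head(xml_text):
    """Return the child elements of <head> as a tag -> text dictionary."""
    root = ET.fromstring(xml_text)
    head = root.find("head")
    return {child.tag: (child.text or "").strip() for child in head}


meta = read_pise_head(PISE_HEAD)
print(meta["title"], meta["version"])
```

A portal could use the same metadata to generate both the HTML form and the web service description for a tool.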
  • 16. http://bioteam.net iNquiry Web Services • Released, summer 2004 • Actually in use at Novartis, BMS, VBI • Called from Perl, Java, Taverna, Inforsense, Pipeline Pilot, VIBE, Apple Automator, Applescript, … [Diagram components: HTML, PISE XML, CGI Scripts, PERL Modules, PISE Scripts, Cluster, SOAP Interface, WSDL]
  • 17. http://bioteam.net A Vision for Web Services-Based Computing [Diagram layers] Scientific Questions: Sequence Analysis, Genomic Profiling, Computational Biology/Chemistry • Workflow Tools/Scripts: Pipeline Pilot, Perl • Web Services: iNquiry/Pise • Job Distribution/Management: LSF / SGE • Clustered Computing [User groups] Expert Users, System Administrators, Novice Users
  • 18. http://bioteam.net What Web Services Don’t Do • Traditional scheduler tasks: – Job Control – Queuing – Scheduling – Failure handling
  • 19. http://bioteam.net What Web Services Do Not Do • Semantics – Service ‘X’ must still be interpreted and used in some context. – No OMG-like object model imposed by default! – In bioinformatics, other related projects (BioMOBY, etc) attempt to deal with semantic issues.
  • 20. http://bioteam.net What Web Services Do • Standard interface to arbitrary resources • Allow someone else to write the interface • Allow someone else to build the infrastructure Completely split the interface from the service provision Divide and conquer
  • 21. http://bioteam.net PERL Web Service Client $res = $server->blastall_simple( SOAP::Data->name("TICKET")->value($ticket), SOAP::Data->name("BLOCKING")->value(0), SOAP::Data->name("blastall")->value("blastn"), SOAP::Data->name("query")->value($query_id), SOAP::Data->name("protein_db")->value("yeast.nt"), SOAP::Data->name("nucleotid_db")->value("yeast.nt"), SOAP::Data->name("tmp_outfile")->value($query_id . ".blastx") );
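To show what a call like the one above puts on the wire, here is a rough Python sketch that builds a SOAP 1.1 request envelope using only the standard library. The method and parameter names are taken from the slide. The service namespace `urn:example-inquiry`, the ticket value, and the query identifier are placeholders assumed for illustration, not values from iNquiry.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example-inquiry"  # hypothetical service namespace


def build_request(method, params):
    """Serialize a method call and its named parameters as a SOAP 1.1 envelope."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params:
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


query_id = "query1"  # placeholder identifier
request_xml = build_request(
    "blastall_simple",
    [
        ("TICKET", "example-ticket"),  # placeholder; a real ticket comes from the server
        ("BLOCKING", 0),               # 0 = return immediately instead of waiting
        ("blastall", "blastn"),
        ("query", query_id),
        ("nucleotid_db", "yeast.nt"),
    ],
)
print(request_xml)
```

A SOAP client library generates this envelope from the WSDL, which is why the Perl client never has to handle the XML directly.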
  • 23. http://bioteam.net Inforsense Workflow - Microarray Normalization
  • 26. http://bioteam.net Re-publication • Most high level tools can publish their protocols as web services • All can also call published web services • It’s turtles all the way down.
  • 31. http://bioteam.net Stumbling Blocks • Pass by reference (URL) – SOAP data bloat – MIME encode / decode • System security – Inadvertent DoS attacks are easy • Blocking / Timeouts – Reattach • Complex Data Types • Service Relocation
  • 32. http://bioteam.net Plan For Failure • Miron Livny (U. Wisconsin, Madison) – Condor project: 20+ years of distributed computing – Management (pessimistic) rather than engineering (optimistic) assumptions. • Scheduling is complete when the job finishes, not when it starts. • Double check all results • Assume each element will fail. • Double-schedule the critical path
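The pessimistic assumptions on this slide translate naturally into client code: treat every submission as likely to fail, and do not accept a result until it has been checked. The sketch below illustrates that idea in Python. It is not Condor's mechanism; `run_with_retries`, `flaky_job`, and the `hit_count=` result format are all invented for the example.

```python
def run_with_retries(job, validate, max_attempts=3):
    """Run job() until it returns a result that passes validate(), or give up."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = job()
        except Exception as exc:  # assume each element will fail
            last_error = exc
            continue
        if validate(result):      # double check all results
            return result, attempt
        last_error = ValueError(f"attempt {attempt} returned an invalid result")
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error


# Simulated job that fails twice before succeeding.
calls = {"n": 0}


def flaky_job():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated node failure")
    return "hit_count=12"


result, attempts = run_with_retries(
    flaky_job, validate=lambda r: r.startswith("hit_count=")
)
print(result, attempts)
```

The job only counts as done once a validated result is in hand, which matches the slide's point that scheduling is complete when the job finishes, not when it starts.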
  • 33. http://bioteam.net Users (Research) are the Point • Maximize user freedom – Let users help each other: • shared repository of workflows, codes, etc. • mailing lists, chat rooms – If at all possible, provide source code – The key problems are social / managerial • Technical issues are simple by comparison. • Include all possible resources – Never try to get in the way of your users Assume that users know what they’re doing
  • 34. http://bioteam.net Take Home • Biology is difficult and messy • IT and HPC are difficult and messy • Federate, don’t integrate (divide and conquer) • Web Services (WSDL and SOAP) are the standard of choice. • If your resources are sitting idle, there is a problem, and it’s not the users.
  • 35. http://bioteam.net Thank You • Early adopters (iNquiry web services): – Nathan Siemers (Bristol-Myers Squibb) – John Davies, Jeremy Jenkins (Novartis IBR) – Dustin Machai (VBI) – Tim Kunau*, Michael Heuer (CCGB, University of Minnesota) • Collaborators & Partners: – Tom Oinn (Taverna), SciTegic, Inforsense • The BioTeam – Michael Athanas, Chris Dagdigian, Stan Gloss, Bill Van Etten, Jiesheng Zhang • Bio-IT World / Life Sciences Expo