The document discusses web services for bioinformatics. It notes that most computing resources in life sciences sit idle or are dominated by a few power users due to lack of awareness or difficulty of use. It promotes the use of web services via SOAP and WSDL as a standard way to programmatically access bioinformatics tools over the web. Examples are given of various tools and workflows that can be built using bioinformatics web services. Challenges including security, data types and service relocation are also discussed.
2. http://bioteam.net
Totally Unscientific Impression
The vast majority of CPU cycles (clusters, SMP machines, and grids) in the life sciences either sit idle or are dominated by a very few power users.
• Because:
– Most users aren't aware of what they have
– Or, they don't know how to use it
– Or, they've tried to use it, and it's difficult
– Or, it doesn't read their Excel data
– Or, they tried to use it last year, and it gave them incorrect results
4. Convergence
• Web interfaces, currently human-friendly, will become machine-friendly
• Data formats and interfaces will begin to standardize
• Heterogeneous platforms, applications, and systems will begin to interoperate
• Machines will begin to communicate with each other in profound and powerful new ways.
5. Computing For Science
• Many user models
• Many applications, mostly open source,
some quite proprietary
• Cooperative, collaborative, yet competitive
• Compute and data intensive
• Rapid rate of growth / change
• There is no single solution.
Many skill levels: Physicist -> MD
7. Core Problems
• Distribution: data and applications are created and controlled by autonomous groups all over the world
• Biology is difficult and messy: large collections of data, many data types, and tools developed in a massively distributed environment
• Research code is different from business code: rapid development, flexibility, “interactive” development
8. Web Services
The World Wide Web is increasingly used for application-to-application communication. The programmatic interfaces made available this way are referred to as Web Services.
• WSDL (advertisement)
– Machine readable
– An “interface contract” defining what services are available via a particular server
• SOAP (access)
– Independent of platform, language, and transport protocol
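A SOAP request is just structured XML, which is what makes it platform- and language-independent. As a minimal sketch (in Python, with a made-up `urn:example:inquiry` namespace; the `blastall_simple` method name is taken from the client example later in this deck), building such an envelope looks like:

```python
# Minimal sketch of the SOAP 1.1 message an iNquiry-style client would POST.
# The service namespace URI below is a placeholder assumption, not the real
# service contract; only the envelope structure is standard.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SVC_NS = "urn:example:inquiry"  # hypothetical target namespace

def soap_request(method: str, params: dict) -> bytes:
    """Serialize an rpc-style method call as a SOAP 1.1 envelope."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, xml_declaration=True, encoding="utf-8")

msg = soap_request("blastall_simple", {"blastall": "blastn", "query": "seq1"})
```

Any language that can emit and parse XML over HTTP can produce this message, which is the whole point of the standard.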
9. Why Web Services?
• Why not?
– CORBA, RMI, bytecodes, relocatable libraries, The Grid, opportunistic computing, metacomputing …
• Selfish benefit to both publishers and users
– Easy publishing (no interface needed)
– Choice of client (from the command line to integrated workflow environments)
– Minimal buy-in
11. Bioinformatic Web Services
• EBI: Soaplab, EMBOSS, Ensembl, …
• KEGG: pathway data
• GO: gene ontologies
• BioMOBY: objects for modeling data
• NCBI: NetBlast
• iNquiry: clustered tools
As more organizations adopt common standards, those standards become more valuable.
12. The BioTeam
• Consulting company: scientists, developers, IT professionals
• Expertise:
– Scientific, parallel, distributed computing
– Infrastructure
– Optimization
13. BioTeam’s iNquiry
• iNquiry is two things:
– “Instant” cluster deployment kit
• Scheduler, Web Browser, integrated configuration
– Web portal for Bioinformatics
• 170+ applications pre-installed
• HTML interface
• SOAP / Web Services interface, integrated with Cluster tools
• OS X / Apple, HP, Linux, SGI, Orion Multisystems
• 190+ installations worldwide
– 170+ are Apple
– 2 -> 240 nodes
14. iNquiry (2004)
• All interfaces defined by “PISE” XML documents
– /usr/local/lib/Pise/5.a/Xml
– Other files created by scripts
(Diagram: PISE XML generates the CGI scripts, Perl modules, and PISE scripts behind the HTML interface, which drives the cluster)
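The "interfaces generated from XML documents" idea can be sketched generically. The XML below is a made-up, minimal tool description (not the actual PISE schema) just to show the shape of the generation step:

```python
# Illustrative sketch only: turn a tiny, invented tool description into an
# HTML form, the way iNquiry derives its web interfaces from PISE XML.
# The <tool>/<param> schema here is hypothetical, not real PISE markup.
import xml.etree.ElementTree as ET

TOOL_XML = """
<tool name="blastall">
  <param name="query" label="Query sequence"/>
  <param name="protein_db" label="Protein database"/>
</tool>
"""

def form_from_xml(xml_text: str) -> str:
    """Generate one HTML input per declared parameter."""
    tool = ET.fromstring(xml_text)
    rows = [
        f'<label>{p.get("label")}: <input name="{p.get("name")}"/></label>'
        for p in tool.findall("param")
    ]
    return f'<form action="/run/{tool.get("name")}">' + "".join(rows) + "</form>"

html = form_from_xml(TOOL_XML)
```

The payoff of this design is that the same XML document can drive several front ends (HTML, CGI, SOAP) without re-describing the tool each time.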
16. iNquiry Web Services
• Released, summer 2004
• Actually in use at Novartis, BMS, VBI
• Called from Perl, Java, Taverna, InforSense, Pipeline Pilot, VIBE, Apple Automator, AppleScript, …
(Diagram: the same PISE-generated stack, HTML through cluster, now fronted by a SOAP interface advertised via WSDL)
19. What Web Services Do Not Do
• Semantics
– Service ‘X’ must still be interpreted and used in some context.
– No OMG-like object model imposed by default!
– In bioinformatics, other related projects (BioMOBY, etc.) attempt to deal with semantic issues.
20. What Web Services Do
• Standard interface to arbitrary resources
• Allow someone else to write the interface
• Allow someone else to build the infrastructure
Completely split the interface from the service provision: divide and conquer.
21. PERL Web Service Client
# Asynchronous blastall submission via SOAP::Lite; $server is a client
# already bound to the iNquiry service endpoint.
use SOAP::Lite;

$res = $server->blastall_simple(
    SOAP::Data->name("TICKET")->value($ticket),    # job handle, for reattaching later
    SOAP::Data->name("BLOCKING")->value(0),        # return immediately rather than wait
    SOAP::Data->name("blastall")->value("blastn"),
    SOAP::Data->name("query")->value($query_id),
    SOAP::Data->name("protein_db")->value("yeast.nt"),
    SOAP::Data->name("nucleotid_db")->value("yeast.nt"),
    SOAP::Data->name("tmp_outfile")->value($query_id . ".blastx")
);
31. Stumbling Blocks
• Pass by reference (URL)
– SOAP data bloat
– MIME encode / decode
• System security
– Inadvertent DoS attacks are easy
• Blocking / Timeouts
– Reattach
• Complex Data Types
• Service Relocation
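The "SOAP data bloat" and MIME points are concrete: binary payloads embedded in a SOAP body must be encoded (typically base64), inflating them by roughly a third before any envelope overhead, which is one reason large results are better passed by reference (URL). A quick sketch:

```python
# Demonstrate base64 inflation: embedding binary data in an XML/SOAP body
# requires an encoding such as base64, which grows payloads by ~33%.
import base64

payload = bytes(range(256)) * 1024          # 256 KiB of binary "result" data
encoded = base64.b64encode(payload)

ratio = len(encoded) / len(payload)
print(f"{len(payload)} bytes -> {len(encoded)} bytes ({ratio:.2f}x)")
# prints "262144 bytes -> 349528 bytes (1.33x)"
```

For multi-gigabyte sequence databases, that overhead (plus the decode on the far side) makes a URL reference the only practical choice.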
32. Plan For Failure
• Miron Livny (U. Wisconsin, Madison)
– Condor project: 20+ years of distributed computing
– Management (pessimistic) rather than engineering (optimistic) assumptions:
• Scheduling is complete when the job finishes, not when it starts.
• Double-check all results.
• Assume each element will fail.
• Double-schedule the critical path.
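Those pessimistic assumptions map directly onto client code. As a generic sketch (plain Python, not tied to any particular scheduler), a wrapper that assumes each attempt may fail, retries, and double-checks the result before declaring the job complete might look like:

```python
# Pessimistic job wrapper: assume each attempt can fail, retry a bounded
# number of times, and only declare success once an independent check of
# the result passes. `run` and `check` are caller-supplied stand-ins for
# "submit to the cluster" and "double-check all results".
def run_until_verified(run, check, max_attempts=3):
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = run()
        except Exception as exc:        # the job itself failed; try again
            last_error = exc
            continue
        if check(result):               # scheduling is complete only now
            return result
        last_error = ValueError(f"attempt {attempt}: result failed verification")
    raise RuntimeError(f"no verified result after {max_attempts} attempts") from last_error

# Flaky stand-in for a cluster job: fails twice, then succeeds.
attempts = {"n": 0}
def flaky_job():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise IOError("node lost")
    return "ACGT"

result = run_until_verified(flaky_job, lambda r: r == "ACGT")
```

The same shape extends naturally to double-scheduling: submit the critical path twice and take whichever verified result arrives first.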
33. Users (Research) are the Point
• Maximize user freedom
– Let users help each other:
• Shared repository of workflows, code, etc.
• Mailing lists, chat rooms
– If at all possible, provide source code
– The key problems are social / managerial
• Technical issues are simple by comparison.
• Include all possible resources
– Never try to get in the way of your users
Assume that users know what they’re doing.
34. Take Home
• Biology is difficult and messy
• IT and HPC are difficult and messy
• Federate, don’t integrate (divide and conquer)
• Web Services (WSDL and SOAP) are the standard of choice.
• If your resources are sitting idle, there is a problem, and it’s not the users.
35. Thank You
• Early adopters (iNquiry web services):
– Nathan Siemers (Bristol-Myers Squibb)
– John Davies, Jeremy Jenkins (Novartis IBR)
– Dustin Machai (VBI)
– Tim Kunau*, Michael Heuer (CCGB, University of Minnesota)
• Collaborators & Partners:
– Tom Oinn (Taverna), SciTegic, InforSense
• The BioTeam
– Michael Athanas, Chris Dagdigian, Stan Gloss, Bill Van Etten, Jiesheng Zhang
• Bio-IT World / Life Sciences Expo