The BioAssay Research Database (BARD) aims to enable scientists to use data from the Molecular Libraries Probe Production Centers Network (MLPCN) to generate new hypotheses. BARD provides a platform for public data sharing and analysis through query and visualization tools accessible via a web portal or a desktop client. It integrates data from multiple sources and centers, and aims to improve data annotation and standardization so that experiments are described more meaningfully and are easier to discover. The project involves ongoing community engagement and the development of new analytical tools through its open API and plugin framework.
The BioAssay Research Database
1. The BioAssay Research Database
A Platform to Support the Collection, Management and Analysis of Chemical Biology Data
http://bard.nih.gov
ACS National Meeting, New Orleans
@AskTheBARD
April 7, 2013
2. Direct Contributors
NIH Molecular Libraries – Glenn McFadden, Ajay Pillai
NIH Chemical Genomics Center – Chris Austin (PI), John Braisted, Marc
Ferrer, Rajarshi Guha, Ajit Jadhav, Dac-Trung Nguyen, Tyler Peryea, Noel
Southall, Henrike Veith
Broad Institute – Benjamin Alexander, Jacob Asiedu, Kay Aubrey, Joshua
Bittker, Steve Brudz, Simon Chatwin, Paul Clemons, Vlado Dancik, Siva
Dandapani, Andrea DeSouza, Dan Durkin, David Lahr, Jeri Levine, Judy
McGloughlin, Phil Montgomery, Jose Perez, Stuart Schreiber (PI), Gil
Walzer, Xiaorong Xiang
University of New Mexico – Cristian Bologa, Steve Mathias, Tudor Oprea,
Larry Sklar, Oleg Ursu, Anna Waller, Jeremy Yang
University of Miami – Saminda Abeyruwan, Hande Küküc, Vance
Lemmon, Ahsan Mir, Magdalena Przydzial, Kunie Sakurai, Stephan
Schürer, Uma Vempati, Ubbo Visser
Vanderbilt University – Eric Dawson, Bill Graham, Craig Lindsley, Shaun
Stauffer
Sanford-Burnham Medical Research Institute – “T.C.” Chung, Jena
Diwan, Michael Hedrick, Gavin Magnuson, Siobhan Malany, Ian Pass,
Anthony Pinkerton, Derek Stonich
Scripps Research Institute – Yasel Cruz, Mark Southern
3. BARD: BioAssay Research Database
BARD’s mission is to enable novice and expert scientists to
effectively utilize MLP data to generate new hypotheses
• Unique collaboration amongst NIH and academic centers
with expertise in screening and software development
• Developed as an open-source, industrial-strength platform
to support public translational research.
• Provides opportunity to address existing cheminformatics barriers
o Deploy predictive models
o Foster new methods to interpret chemical biology data
o Enable private data sharing
o Develop and adopt an Assay Data Standard with tools to:
  o Annotate assays to minimum standards and definitions
  o Integrate and extend existing ontologies for meaningful experiment descriptions
  o Enable assay creation, registration and modification
o Provide an easy-to-use portal and an advanced desktop client
4. Engagement & Milestones
• Summer 2011: MLP issues administrative supplement and call for proposals to create the Molecular Libraries Biological Database
• January 2012: Inaugural meeting of MLPCN Stakeholders & NIH MLP PT
• February 2012: Update on progress: data extraction & annotation, test platform selection, GUI design & test, Outreach
• March 2012: BARD Program Kick-off
• April 2012: Outreach strategy & tactic session at UNM w/ subteam
• May – July 2012: Discussions with and reviews of Amgen, Vertex, Novartis, Sanofi assay registration and chem-bio information query systems
• November 2012: Conducted multi-level usability interviews on BARD GUI & function w/ Dir. Computation, Informatics/Lab Mgr, TA Lead, Dir. Chem, Med chem, Db developer, Cmpd curator
• January 2013: BARD Review by Ext. Sci Panel & Public alpha release (CAP, REST API, Web & Desktop clients)
• March 2013: BARD limited beta-release, then transition to enabling science
5. BARD Technology Components
• Define & Register Assays
– Data Dictionary – std terms
– Catalog of Assay Protocols
• High Quality Data & Result Deposition
– Calculations & Results
– Project-experiment association
• Query & Interpret Information
– Intuitive Guided Queries
– Cross Assay & SAR centric views
– Advanced applications
• Enable Hypothesis Generation: Novice to Expert
6. Where Are We Today?
• CAP, Data Dictionary, and Results Deposition Data model created & populated
• CAP UI with View and basic editing
• Warehouse loaded with all PubChem AIDs and results
• Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations
• Dictionary defined as OWL using Protégé
• Annotations for 85% of MLPCN experiments & projects loaded via spreadsheet
• Manual annotation of AIDs ~70% completed by centers
• ~95% of PubChem result types mapped to BARD dictionary
• ~70% of PubChem columns mapped to BARD result types
7. The BARD Data Warehouse
• Running on MySQL with replication
• 0.85 TB of data…
– 151M result rows
– 46M compound rows
• Locally deployed at UNM
• Planning to build better packaging
– VM based deployment
8. Open Source As Far as Possible
http://bard.nih.gov/api
[Architecture diagram: Jersey webapps deployed on an HA application server cluster; caching layer; ETL; database; text search engine; structure search engine]
9. The BARD Public API
• Java, REST-like, read-only, deployed on Glassfish cluster
• Different functionality hosted in different containers
– Maintenance, security
– Stability
– Performance
• Versioned
• Fully documented
[Diagram: API and Plugins, Text Search, Struct Search, Data Warehouse]
10. API Resources
• Extensive list of resources covering many data types
• Each resource supports a variety of sub-resources
– Usually linked to other resources
11. API Level of Detail
• Supports different levels of detail
• Allows clients to trade off detail for speed
• Good for mobile apps
12. API Caching & Storage
• Caching is enabled at resource level
• The API supports ETags
– Every request returns an ETag in the header
– With If-None-Match, supports web caching
• We also abuse ETags to support persistent
references to collections
• An ETag can refer to other ETags recursively
– Allows clients to create and store arbitrarily
complex collections
• Not permanent, not infinite!
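The ETag and If-None-Match behaviour described on this slide can be sketched from the client side. This is a minimal illustration, not BARD client code: the `ETagCache` class, the `(status, etag, body)` transport shape and the `fake_server` stand-in are all assumptions made so the example runs without a network connection.

```python
from typing import Callable, Dict, Optional, Tuple

# A transport returns (status, etag, body); this shape is an assumption for the sketch.
Response = Tuple[int, Optional[str], Optional[bytes]]
Fetch = Callable[[str, Dict[str, str]], Response]


class ETagCache:
    """Client-side cache that revalidates stored entries with If-None-Match."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._store: Dict[str, Tuple[str, bytes]] = {}  # url -> (etag, body)

    def get(self, url: str) -> bytes:
        headers: Dict[str, str] = {}
        cached = self._store.get(url)
        if cached is not None:
            # Ask the server to reply 304 if our copy is still current.
            headers["If-None-Match"] = cached[0]
        status, etag, body = self._fetch(url, headers)
        if status == 304 and cached is not None:
            return cached[1]  # unchanged: reuse the stored body
        if status == 200 and body is not None:
            if etag is not None:
                self._store[url] = (etag, body)
            return body
        raise RuntimeError(f"unexpected status {status} for {url}")


def fake_server(url: str, headers: Dict[str, str]) -> Response:
    """Hypothetical stand-in for a real HTTP call; always serves one tagged body."""
    current_etag = '"v1"'
    if headers.get("If-None-Match") == current_etag:
        return 304, current_etag, None
    return 200, current_etag, b'{"demo": true}'
```

With `cache = ETagCache(fake_server)`, the first `cache.get(url)` stores the body and its ETag; the second sends `If-None-Match` and is answered from the local copy after a 304. Because the slide notes that ETags are not permanent, a real client should treat any other status as a cache miss rather than an error.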
13. Annotating Data
• To best exploit the current data set, and encourage discoverability, we need to better structure the data
– Annotate all assays to a minimum standard
– Integrate and extend existing ontologies to support meaningful experiment descriptions
– Develop processes and tools to enable assay registration
[Diagram: BARD Assay Definition Hierarchy and BARD Dictionary & Term Hierarchy, drawing on the BioAssay Ontology, Gene Ontology, Uniprot, Chemical Ontology, Entrez, Disease Ontology and Unit Ontology]
14. (Pseudo) Linked Data
• Full text search enabled by Solr
– Enables filtering, faceting, auto-suggest
– Key entry point for users
– Type ahead suggestions provide guidance
• By virtue of manual associations of data
types, we enable “linked data”
– Allows searches to indicate what matched the
query and how
– Solr supports sophisticated scoring schemes
• Doesn’t yet take advantage of ontologies
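The filtering and faceting mentioned above map onto standard Solr request parameters (`q`, `fq`, `rows`, `wt`, `facet`, `facet.field`). The sketch below only builds a select URL; the core URL and the field name `detection_method_type` are hypothetical, since the slides do not show BARD's Solr schema or endpoint.

```python
from typing import List, Sequence, Tuple
from urllib.parse import urlencode


def solr_select_url(
    core_url: str,
    text: str,
    facet_fields: Sequence[str] = (),
    filters: Sequence[str] = (),
    rows: int = 10,
) -> str:
    """Build a Solr /select URL with optional filter queries and field facets."""
    params: List[Tuple[str, str]] = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # Each filter query narrows the result set without affecting scoring.
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field entry per field whose value counts should be returned.
        params += [("facet.field", f) for f in facet_fields]
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical core URL and field name, for illustration only.
example_url = solr_select_url(
    "http://localhost:8983/solr/assays",
    "DNA repair",
    facet_fields=["detection_method_type"],
    filters=['detection_method_type:"fluorescence"'],
)
```

Facet counts returned by such a query are what a client would use to populate filter options alongside the search results.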
15. Desktop Client
• Support large datasets
• Merge private & public data
• Examine SAR
16. Web Client
• Google-like searching of: 4,000+ assays, 35M+ compounds, 300+ projects
• Amazon-like Query Cart: save items of interest for further analysis
• Filter on annotations, such as detection method type
17. Community Engagement
• Sustained outreach efforts
– 7 MLPCN sites participating
• Facilitate access, driven by compelling use cases and stakeholder feedback
– Assay definition standard is a collaboration with industrial partners in addition to MLPCN
• Publish APIs for data access, first-adopters
• A ‘BARD App Store’: Enabling new
approaches to data integration, mining
– Promiscuity calculations
– CYP450 prediction
18. Extending BARD with Plugins
• BARD supports deployment of external code
as part of core API
• Plugins can access the data warehouse via
direct calls
– No need to go via REST API
• Plugin resources can accept anything
– Text, JSON, files, links, …
• Plugin responses can be anything
– Plain text, JSON, HTML, SVG, …
20. BARD - SMARTCyp
• Predicts site of metabolism by CYP450 isoforms using 2D structures
• Developed by Patrik Rydberg and co-workers
• Released under LGPL
• BARD plugin exposes two resources
– Summary HTML view
– Data view (JSON)
22. BARD - BADAPPLE
• BioActivity Data Associative
Promiscuity Pattern Learning Engine
• Associations via scaffolds for chemical
space navigation.
Example URI*: description
• <base>/badapple/prom/cid/752424: For compound with specified ID, return scaffold IDs and scores.
• <base>/badapple/prom/cid/752424?expand=true: Additional statistics, scaffold smiles, and inDrug flag.
• <base>/badapple/prom/scafid/233: For scaffold with specified ID, return statistics and smiles.
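The example URIs for this plugin follow a simple pattern, so a client could assemble them with two small helpers. This is a sketch only: the path segments and the `expand=true` parameter come from the slide, while the function names and the base URL used in the example are hypothetical (the slide leaves `<base>` unspecified).

```python
from urllib.parse import urlencode


def badapple_compound_url(base: str, cid: int, expand: bool = False) -> str:
    """URL for the promiscuity resource of one compound, by compound ID."""
    url = f"{base.rstrip('/')}/badapple/prom/cid/{cid}"
    if expand:
        # expand=true requests the additional statistics, scaffold smiles and inDrug flag.
        url += "?" + urlencode({"expand": "true"})
    return url


def badapple_scaffold_url(base: str, scafid: int) -> str:
    """URL for the statistics and smiles of one scaffold, by scaffold ID."""
    return f"{base.rstrip('/')}/badapple/prom/scafid/{scafid}"


# Placeholder base; substitute the real plugin root for <base>.
BASE = "https://example.org/plugins"
compound_url = badapple_compound_url(BASE, 752424, expand=True)
scaffold_url = badapple_scaffold_url(BASE, 233)
```

Keeping the base URL as a parameter means the same helpers work against a public deployment or a local install.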
23. On the Horizon
• Reproducibility
– Be honest with me …
• Private data in the context of public data
– Local installs, molecule hashes
• Mobile
– Compounds as funny looking QR tags
24. Long-Term Path Forward
• BARD is not just a data store – it’s a platform
– Seamlessly interact with users’ preferred tools
– Allows the community to tailor it to their needs
– Serve as a meeting ground for experimental and
computational methods
– Enhance collaboration opportunities
– Consider cloud deployment
• Enhance the ability to translate data from
individual experiments to systems level insight