The document discusses high throughput screening (HTS) at the NIH Center for Translational Therapeutics (NCTT). It outlines the center's capabilities for small molecule and RNAi HTS, including assay formats, detection methods, quantitative high throughput screening (qHTS) to obtain dose-response curves, and associated bioinformatics activities such as automated curve fitting, data integration, and structure-activity relationship analysis. The goal is to enable discoveries through these HTS approaches and computational analyses.
We act as a consolidated CRO company, providing an extensive range of world-class services for drug discovery from our offices in seven locations in Northern Europe and India. We serve over 150 pharma and biotech customers in over 20 countries. We bring our expertise and experience from more than 600 engagements with customers ranging from virtual biotechs to several global top 10 pharmaceutical companies.
Presentation entitled "Hit Identification Strategies for Epigenetic Targets" at X-Gen Epigenetics IV, March 5-7, 2012. The presentation was delivered by Dr Amy Quinn, as a conflict prevented my attendance.
Presented by Dr. Miller at the 40th Annual Symposium "Diagnostic and Clinical Challenges of 20th Century Microbes", held on Nov 18, 2010 in Philadelphia.
This file includes the SLAS2013 presentations of Paul A. Johnston of University of Pittsburgh; Douglas Auld of Novartis Institutes for Biomedical Research; and Lisa Minor of In Vitro Strategies, LLC.
The design of chemical libraries is usually informed by pre-existing characteristics and desired features. On the other hand, assessing the prospective performance of a new library is more difficult. Importantly, a given screening library is often screened in a variety of systems, which can differ in cell lines, readouts, formats and so on. In this study we explore to what extent pre-existing libraries can shed light on the relation between library activity and assay features. Using an ontology such as the BAO, it is possible to construct a hierarchy of annotations associated with an assay. Based on this annotation hierarchy we can then ask how likely molecules associated with a specific annotation are to be identified as active. To allow generalization we consider substructural features, as represented by a structural key fingerprint, rather than whole molecules. We employ a Bayesian framework to quantify the association between a substructural feature and a given assay annotation, using a set of NCGC assays that have been annotated with BAO terms. We discuss our approach to training the Bayesian model and describe benchmarks that characterize model performance relative to the position of the annotation in the BAO hierarchy. Finally we discuss the role of this approach in a library design workflow that includes traditional design features such as chemical space coverage and physicochemical properties but also takes into account screening platform features.
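One minimal way to realize the Bayesian association described above is a Laplacian-corrected enrichment score, a common choice in screening informatics for relating a fingerprint bit to activity under an assay annotation. This is only a sketch: the correction strength, counts, and the 10% base rate are illustrative assumptions, not details from the study.

```java
// Sketch: Laplacian-corrected enrichment of a substructural feature
// (a fingerprint bit) among actives for a given assay annotation.
public class FeatureAnnotationScore {

    /**
     * Corrected estimate of how enriched actives are among molecules
     * bearing the feature, relative to the annotation's base rate of
     * activity. A score near 1 means the feature carries no information.
     *
     * @param active   actives (under this annotation) containing the feature
     * @param total    all molecules (under this annotation) containing the feature
     * @param baseRate overall fraction of actives for the annotation
     */
    public static double correctedScore(int active, int total, double baseRate) {
        double k = 1.0 / baseRate;                        // virtual samples (assumed choice)
        double p = (active + k * baseRate) / (total + k); // corrected P(active | feature)
        return p / baseRate;                              // enrichment over the base rate
    }

    public static void main(String[] args) {
        // Feature seen 40 times under an annotation with a 10% hit rate,
        // active 30 times: strongly enriched.
        System.out.println(correctedScore(30, 40, 0.10));
        // Unseen feature: shrinks to exactly 1.0 (no information).
        System.out.println(correctedScore(0, 0, 0.10));
    }
}
```

The correction pulls sparsely observed features toward a neutral score, so rare substructures do not dominate the ranking.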
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series, part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI?
Test automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communications Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communications Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of the CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI (Vladimir Iglovikov, Ph.D.)
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Enabling Discoveries at High Throughput - Small molecule and RNAi HTS at the NCTT
1. Enabling Discoveries at High Throughput: Small molecule and RNAi HTS at the NCTT
Rajarshi Guha
NIH Center for Translational Therapeutics
May 3, 2011
2. Outline
• Informatics for small molecule & RNAi screening
• HCA & automated decision making
– Pretty pictures can lead to more efficient screens
• Large scale cheminformatics
– We can do it, but do we need to?
3. NIH Chemical Genomics Center
• Founded 2004 as part of NIH Roadmap Molecular Libraries Initiative
– NCGC staffed with 90+ scientists – biologists, chemists, informaticians, engineers
– Post-doc program
• Mission
– MLPCN (screening & chemical synthesis; compound repository; PubChem database; funding for assay, library and technology development)
– Develop new chemical probes for basic research and leads for therapeutic development, particularly for rare/neglected diseases
– New paradigms & applications of HTS for chemical biology / chemical genomics
• All NCGC projects are collaborations with a target or disease expert; currently >200 collaborations with investigators worldwide
7. qHTS: High Throughput Dose Response
• Assay concentration ranges over 4 logs (high: ~100 μM)
• 1536-well plates, inter-plate dilution series
• Assay volumes 2 – 5 μL
• Automated concentration-response data collection, ~1 CRC/sec
• Informatics pipeline: automated curve fitting and classification, 300K samples
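The automated curve fitting mentioned above can be illustrated with a toy four-parameter Hill fit. The NCGC pipeline uses far more robust fitting and curve classification; the coarse grid search, fixed 0-100 response bounds, and parameter ranges here are purely illustrative assumptions.

```java
// Toy sketch of concentration-response (Hill) curve fitting via grid search.
public class HillFit {

    // Response at concentration c for a four-parameter Hill curve.
    public static double hill(double c, double bottom, double top,
                              double ac50, double slope) {
        return bottom + (top - bottom) / (1.0 + Math.pow(ac50 / c, slope));
    }

    // Grid-search the log10 AC50 and Hill slope minimizing squared error,
    // with bottom/top fixed at 0 and 100 (assumed normalized responses).
    public static double fitLogAc50(double[] conc, double[] resp) {
        double bestErr = Double.MAX_VALUE, bestLogAc50 = 0;
        for (double logAc50 = -9; logAc50 <= -4; logAc50 += 0.01) {
            for (double slope = 0.5; slope <= 3.0; slope += 0.25) {
                double err = 0;
                for (int i = 0; i < conc.length; i++) {
                    double d = resp[i]
                            - hill(conc[i], 0, 100, Math.pow(10, logAc50), slope);
                    err += d * d;
                }
                if (err < bestErr) { bestErr = err; bestLogAc50 = logAc50; }
            }
        }
        return bestLogAc50;
    }

    public static void main(String[] args) {
        // Synthetic 7-point titration: 100 μM down in 1:5 steps, true AC50 = 1 μM.
        double[] conc = new double[7], resp = new double[7];
        for (int i = 0; i < 7; i++) {
            conc[i] = 100e-6 / Math.pow(5, i);
            resp[i] = hill(conc[i], 0, 100, 1e-6, 1.0);
        }
        System.out.printf("fitted log AC50 = %.2f%n", fitLogAc50(conc, resp));
    }
}
```

A production fitter would estimate all four parameters with a nonlinear least-squares routine and then classify the curve; the grid search just keeps the sketch self-contained.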
8. Informatics Activities
• High throughput curve fitting
• Data integration, automated cherry picking
• SAR algorithms
– QSAR modeling
– Fragment based analysis
– Activity cliffs
• Tools – standardizer, tautomers, fragment activity browser, kinome browser and more
• RNAi hit selection, OTE analysis
• High content analysis
9. Kinome Navigator
• Browse kinase panel data
• Currently focused on the Abbott dataset
• View
– Fragments
– Target pairs
– Kinome overlay
http://tripod.nih.gov
10. Fragment Browser
• View activities on a fragment-wise basis
• Compare activity distributions by fragment
• Currently based around ChEMBL assays but users can browse their own compounds & activities
http://tripod.nih.gov
11. Structure Activity Landscapes
• Rugged gorges or rolling hills?
– Small structural changes associated with large activity changes represent steep slopes in the landscape
– But traditionally, QSAR assumes gentle slopes
– We can characterize the landscape using SALI
Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535
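The SALI index referenced above is, for a pair of molecules, the activity difference divided by one minus their structural similarity, so similar pairs with very different activities (cliffs) score high. A minimal sketch follows; the bit-set "fingerprints" are a toy stand-in for a real structural key fingerprint.

```java
// Sketch of the SALI (Structure-Activity Landscape Index) for a molecule pair.
import java.util.BitSet;

public class Sali {

    // Tanimoto similarity over bit-set fingerprints (toy illustration).
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone(); and.and(b);
        BitSet or  = (BitSet) a.clone(); or.or(b);
        return or.cardinality() == 0 ? 1.0
                : (double) and.cardinality() / or.cardinality();
    }

    // SALI(i,j) = |A_i - A_j| / (1 - sim(i,j)); activities on a log scale
    // (e.g. pIC50). Identical structures (sim = 1) diverge and must be
    // filtered out by the caller.
    public static double sali(double actI, double actJ, double sim) {
        return Math.abs(actI - actJ) / (1.0 - sim);
    }

    public static void main(String[] args) {
        BitSet fpA = new BitSet(16);
        fpA.set(0); fpA.set(1); fpA.set(2); fpA.set(3);
        BitSet fpB = (BitSet) fpA.clone();
        fpB.set(4); // one extra feature: a small structural change

        double sim = tanimoto(fpA, fpB); // 4 shared bits / 5 total = 0.8
        // Similar structures, 2 log units apart in activity: a steep cliff.
        System.out.printf("sim=%.2f SALI=%.1f%n", sim, sali(7.5, 5.5, sim));
    }
}
```

Ranking all pairs by this value is what exposes the "steep slopes" in the landscape.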
12. What Can We Do With SALIs?
• SALI characterizes cliffs & non-cliffs
• For a given molecular representation, SALIs give us an idea of the smoothness of the SAR landscape
• Models try to encode this landscape
• Use the landscape to guide descriptor or model selection
Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
13. Predicting the Landscape
• Rather than predicting activity directly, we can try to predict the SAR landscape
• Implies that we attempt to directly predict cliffs
– Observations are now pairs of molecules
(Figure: Original pIC50, RMSE = 0.97; SALI, AbsDiff, RMSE = 1.10; SALI, GeoMean, RMSE = 1.04)
Scheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122
14. Data Integration
• It's nice to simplify data, but we can still be faced with a multitude of data types
• We want to explore these data in a linked fashion
• How we explore and what we explore is generally influenced by the task at hand
• At one point, make inferences over all the data
15. Data Integration
(Diagram: linking a user's network to a network of public data)
Content: drugs, compounds, scaffolds, assays, genes, targets, pathways, diseases, clinical trials, documents
Links: manually curated, or derived from algorithms
20. Going Beyond Exploration?
• Simply being able to explore data in an integrated manner is useful as an idea generator
• Can we integrate heterogeneous data types & sources to get a systems level view?
– Current research problem in genomics and systems biology
– Some attempts have been made to merge chemical data with other data types
Young, D.W. et al, Nat. Chem. Biol., 2008, 4, 59-68
21. RNAi Facility Mission
• Perform collaborative genome-wide RNAi screening-based projects with intramural investigators
• Advance the science of RNAi and miRNA screening and informatics via technology development to improve efficiency, reliability, and costs
Range of assays: simple phenotypes (viability, cytotoxicity, oxidative stress, etc); pathway (reporter assays, e.g. luciferase, β-lactamase); complex phenotypes (high-content imaging, cell cycle, translocation, etc)
22. RNAi Effectors
RNAi effectors provide an excellent way to conduct gene-specific loss of function studies.
23. Issues Using RNAi Effectors
• RNAi effectors give a knockdown, not a knockout (70% - 80% is considered good). Therefore, they may not silence enough to give a phenotype even if the target is involved in what you are assaying for.
• RNAi effectors induce off-target effects!
24. Examples of Current Projects
• Protein Quality Control
• Poxvirus
• DNA Re-replication
• Respiratory Viruses
• Base Excision Repair
• Lysosomal Storage Disorders
• DNA Damage – ELG1 stabilization
• Parkinson's – Mitochondrial Quality Control
• Antioxidant Response
• Ewing's Sarcoma
• Hypoxia
• Drug Modifiers, Pancreatic Cancer
• TNFa Response
• Drug Modifiers, TOP1 Clinical Agents
• Interferon Response
• iPS to RPE
• Immunotoxin-Mediated Cell Death
26. RNAi Libraries
• Ambion Human Genome-Wide Library, 21,585 genes, 3 unique siRNAs per gene
• Ambion Mouse Genome-Wide Library, 17,582 genes, 3 unique siRNAs per gene
• Dharmacon Human Genome-Wide siRNA Libraries, 18,236 genes, siRNA pools
• Duet Human and Mouse miRNA Mimic Libraries & Human miRNA Inhibitor Library
• Qiagen Human Druggable Genome Library, >7,000 genes, 4 unique siRNAs per gene
• Kinome Libraries, purchased from a number of vendors
• Smaller libraries (e.g. kinome and miRNA mimics) will enable high-impact screens in systems less amenable to high throughput applications
• Considerations are being made for additional species and shRNA resources
27. Druggable Genome Screening Campaign
• Over 7,000 genes, 4 unique siRNAs per gene (≈36,000 wells)
• 85 genes were selected for follow-up through a variety of threshold-based selection schemes
• 27 genes were validated as confident hits using siRNAs from multiple vendors
• Significant enrichment for core NF-kB components
(Figure: percent reduction in NF-kB signal, Qiagen and Ambion siRNAs, for TNFα Receptor, IKKα, RELA, NEMO; pseudo-colored blue/green ratio normalized to plate median)
28. Druggable Genome Screening Campaign
Significant enrichment for proteins that form the 28S proteasome
(Figure: percent reduction in NF-kB signal for PSM genes, Qiagen and Ambion siRNAs; proteasome schematic with 19S regulator particles (RPN, RPT) and 20S core (α1-7, β1-7), after Murata et al, Nature Reviews Mol. Cell Biol.)
An additional 34 genes remain inconclusive, but are noteworthy hits that require further study. Some of these tie into the core NF-kB pathway.
29. Seed Sequence Analysis
Other instances of the seeds incorporated within siRNAs targeting PSMA3 do not exhibit significant activity, adding to the likelihood of this being an on-target effect.
30. Seed Sequence Analysis
Other instances of the seeds within the active siRNAs targeting SLC24A1 tend to downregulate the NF-kB reporter, adding to the likelihood of this being an off-target effect.
31. RNAi & Small Molecule Screens
• What targets mediate activity of siRNA and compound
– Reuse pre-existing MLI data
– Develop new annotated libraries
• Pathway elucidation, identification of interactions
• Target ID and validation
– Run parallel RNAi screen
• Link RNAi generated pathway perturbations to small molecule activities; could provide insight into polypharmacology
Goal: develop a systems level view of small molecule activity
33. Merging Screening Technologies
• Lead identification
• High throughput screening: single (few) read outs; high-throughput; moderate data volumes
• High content screening: phenotypic profiling; multiple parameters; moderate throughput; very large data volumes
• We'd like to combine the technologies, to obtain rich high-resolution data at high speed
• Is this feasible? What are the trade-offs?
34. Merging Screening Technologies
• A simple solution is to run HTS & HCS as separate primary & secondary screens
• Alternatively – Wells to Cells
– Integrate HTS & HCS in a single screen using a combined platform for robotics & real time automated HTS analytics
– Selective imaging of interesting wells
35. Wells to Cells Workflow
• Sequential qHTS using laser scanning cytometry followed by high-res microscopy
• Unit of work is a plate series
• The same aliquot is analyzed by both techniques
• A message based system
• The key is deciding which wells go through the workflow
36. Wells to Cells Assays
• Cell cycle, cell translocation, DNA rereplication
• All assays run against LOPAC1280
• Consistency between cytometry & microscopy is measured by the R2 between log AC50's
– Cell cycle, 0.94 – 0.96
– Cell translocation, 0.66 – 0.94
– DNA rereplication, still in progress
38. Informatics Platform
• InCell Layout File
• Advanced correction and normalization methods
• Sophisticated curve fitting algorithm
• Good performance, allows parallelization of the entire workflow
39. Why Messaging?
• A messaging architecture allows for significant flexibility
– Persistent, can be kept for process tracking, reporting
– Asynchronous, allows individual components of the workflow to proceed at their own pace
– Modular, new components can be introduced at any time without redesigning the whole workflow
• We employ Oracle AQ, but any message queue can be employed
40. Handling Multiple Platforms
• Current examples employ InCell hardware
• We also use Molecular Devices hardware
• As a result we have two orthogonal image stores / databases
• Need to integrate them
– Support seamless data browsing across multiple screens irrespective of imaging platform used
– Support analytics external to vendor code
41. A Unified Interface
• A client sees a single, simple interface to screening image data: http://host/rest/protocol/plate/well/image
• Transparently extract image data via the MetaXpress database or via custom code
• Currently the interface addresses image serving
• Unified metadata interface in the works
42. Trade-offs & Opportunities
• Automation reduces the ability to handle unforeseen errors
– Dispense errors and other plate problems
– Well selection based on curve classes may need to be modified on the fly
• Well selection does not consider SAR
– Wells are selected independently of each other
– If we could model SAR on the fly (or from validation screens), we'd select multiple wells, to obtain positive and negative results
43. Cloud Computing & Cheminformatics
• Cloud computing is a hot topic
• A number of examples of computational chemistry / cheminformatics on the cloud
– MolPlex, hBar, Numerate, Wingu, Sciligence, Pfizer
• Many examples use the cloud for remote storage, remote (hosted) computations
• But providers such as Amazon allow us to run distributed computing applications on the cloud
44. Map/Reduce
• Map/Reduce is a programming model for efficient distributed computing
• M/R was made "famous" by Google, but the idea has been around for a long time
• It works like a Unix pipeline:
– cat input | grep | sort | uniq -c | cat > output
– Input | Map | Shuffle & Sort | Reduce | Output
• Efficiency from
– Streaming through data, reducing seeks
– Pipelining
Owen O'Malley, http://bit.ly/ecHPvB
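The Input | Map | Shuffle & Sort | Reduce | Output pipeline can be sketched entirely in-memory, with no Hadoop involved, using the classic word-count example. This is only an illustration of the model's shape, not of Hadoop's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the Map/Reduce pipeline using word counting.
public class MiniMapReduce {
    public static Map<String, Integer> wordCount(List<String> lines) {
        // Map: each input line is turned into (word, 1) pairs.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines)
            for (String w : line.split("\\s+"))
                pairs.add(Map.entry(w, 1));

        // Shuffle & Sort: group values by key (a TreeMap keeps keys sorted).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());

        // Reduce: sum the 1's for each key to get the frequency.
        Map<String, Integer> out = new TreeMap<>();
        grouped.forEach((k, vs) -> out.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("a b a", "b a")));
    }
}
```

In a real cluster, the map and reduce stages run on different machines and the shuffle moves data over the network; the per-key logic is exactly what this sketch shows.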
46. Hadoop & Cheminformatics
• Hadoop is an Open Source implementation of the map/reduce paradigm
• Hadoop is a framework for scalable, distributed computing
– Hadoop, HDFS, Hive, PIG
• Importantly, you can play with all this on your laptop and just copy files to the big cluster when you're ready for production
47. Why Hadoop?
• Simple way to make use of large clusters without MPI etc.
• AWS supports Hadoop, so it's easy to scale up to 100's or 1000's of cores
• Great for Java code, but non-Java code can also make use of Hadoop
• M/R can be applied to a lot of problems, but one of the simplest uses is as a "chunker"
48. Cheminformatics in Parallel
• Many cheminformatics problems are data parallel
– Chunk the data and apply the same technique over each chunk
• This makes many problems amenable to M/R
– Substructure / pharmacophore search
– Descriptor calculations, virtual screening
– Model development (?)
• In general, each chunk is processed on a distinct node – so the code itself can be non-parallel
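The chunking pattern above can be sketched without Hadoop: split the input into fixed-size chunks and run the same sequential function over each one. In the sketch below `parallelStream` stands in for distribution across nodes, and the per-molecule "computation" is a deliberately crude, made-up heavy-atom counter (it just counts uppercase letters, so it undercounts aromatic SMILES) — a placeholder for a real descriptor calculation:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of data-parallel chunking: the per-chunk code is plain sequential
// Java; only the dispatch over chunks is parallel.
public class ChunkDemo {
    // Crude stand-in for a real per-molecule computation: counts uppercase
    // characters as "heavy atoms" (misses aromatic lowercase atoms).
    static int heavyAtomCount(String smiles) {
        int n = 0;
        for (char c : smiles.toCharArray())
            if (Character.isUpperCase(c)) n++;
        return n;
    }

    public static int totalHeavyAtoms(List<String> smiles, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < smiles.size(); i += chunkSize)
            chunks.add(smiles.subList(i, Math.min(i + chunkSize, smiles.size())));
        // Each chunk is processed independently, as if on a distinct node.
        return chunks.parallelStream()
                .mapToInt(ch -> ch.stream().mapToInt(ChunkDemo::heavyAtomCount).sum())
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(totalHeavyAtoms(List.of("CCO", "c1ccccc1", "CC(=O)O"), 2));
    }
}
```

Because chunks don't communicate, existing non-parallel toolkit code (e.g. a CDK descriptor calculator) can be dropped in unchanged.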
50. Substructure Searching
• Substructure searching is a trivial extension of atom counting
• If a structure matches, emit (name,1)
• Otherwise (name,0)
• Reducer simply outputs tuples of the form (name,1)

public class SubSearch {
    ...
    public static class MoleculeMapper extends
            Mapper<Object, Text, Text, IntWritable> {
        private Text matches = new Text();
        private String pattern;

        public void setup(Context context) {
            pattern = context.getConfiguration().get("net.rguha.dc.data.pattern");
        }

        public void map(Object key, Text value, Context context) throws
                IOException, InterruptedException {
            try {
                IAtomContainer molecule = sp.parseSmiles(value.toString());
                sqt.setSmarts(pattern);
                boolean matched = sqt.matches(molecule);
                matches.set((String) molecule.getProperty(CDKConstants.TITLE));
                if (matched) context.write(matches, one);
                else context.write(matches, zero);
            } catch (CDKException e) {
                e.printStackTrace();
            }
        }
    }

    public static class SMARTSMatchReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context) throws IOException, InterruptedException {
            for (IntWritable val : values) {
                if (val.compareTo(one) == 0) {
                    result.set(1);
                    context.write(key, result);
                }
            }
        }
    }
}
51. Running on AWS
• All the code was debugged on my laptop with relatively small files
• To test the scalability, I shifted everything to AWS
– Pharmacophore search
– 136K structures, single conformer, 560MB
– Created a single JAR file with CDK & application code
– Uploaded data files to S3
• Total cost of experiments was ~ $10
52. But I Don't Want to Write Programs
• All these examples require us to write full-fledged Java classes
• An easier way is to use Pig & Pig Latin – a platform and query language built on top of Hadoop
• Lets us write SQL-like queries that make use of Hadoop underneath
• Flexible due to user-defined functions (UDF's)
– UDF's encapsulate the cheminformatics
53. Cheminformatics & Pig

A = load 'medium.smi' as (smiles:chararray);
B = filter A by net.rguha.dc.pig.SMATCH(smiles, 'NC(=O)C(=O)N');
store B into 'output.txt';

• Identify molecules in medium.smi that match the SMARTS pattern and dump them to output.txt
• The complexity is now hidden in the UDF
• Many toolkit functions could be wrapped as UDF's, allowing flexible queries with much simpler code
• See http://blog.rguha.net/?p=748 for the code
54. Latency
• Hadoop is suited for batch processing
• Significant network I/O is involved in distributing data to compute nodes
• Not good for
– Random ad hoc processing of small subsets
– Small volume data
– Real-time (low latency) work
• But latency issues can be addressed somewhat by HBase, Hive and other technologies
55. More than Chunking?
• But all the examples so far could have been done via PBS/Condor or any other job scheduler
– (With Hadoop we don't have to worry about explicit chunking of the input data)
• But are there cheminformatics algorithms that can be reworked into the M/R paradigm?
– Predictive modeling?
– Graph algorithms?
56. More than Chunking?
• Both predictive & graph algorithms are increasingly supported in Hadoop
– Mahout for M/L algorithms on massive datasets
– Cloud9 for graph algorithms
• A number of bioinformatics applications make use of M/R at the algorithmic level
• They are all big applications
– Crossbow aligns 3 billion paired/unpaired reads
• Cheminformatics datasets are not very big
57. Summary
• HTS data is an ample playground for interesting analytics; multiple data types make it more fun
• A major challenge in our informatics infrastructure is dealing with proprietary vendor interfaces
• Hadoop and M/R provide great opportunities for handling large data in a flexible manner
• But can cheminformatics really make use of it?
58. Acknowledgments
Informatics · RNAi & Small Molecule
• Ajit Jadhav • Scott Martin • Trung Nguyen • Pinar Tuzmen • Noel Southall • Yu-Chi Chen • Ruili Huang • Carleen Klump • Min Shen • Craig Thomas • Hongmao Sun • Jim Inglese • Xin Hu • Ron Johnson • Tongan Zhao • Sam Michael • Jennifer Wichterman
59.
60. Counting Atoms
• The canonical Hadoop program counts the frequency of words in a text file
– Mapper reads a line, outputs a tuple – (word, 1)
– Reducer will receive tuples, keyed on word
• Summing up the 1's gives us the frequency of a word
• By default, Hadoop works on a line-by-line basis
• For cheminformatics problems, SMILES files satisfy this requirement – one line, one molecule
61. Counting Atoms
• Uses the CDK to parse SMILES
• For each molecule, loop over atoms
– Emit (symbol,1)
• Reducer simply sums the 1's for each symbol

public class HeavyAtomCount {
    static SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws
                IOException, InterruptedException {
            try {
                IAtomContainer molecule = sp.parseSmiles(value.toString());
                for (IAtom atom : molecule.atoms()) {
                    word.set(atom.getSymbol());
                    context.write(word, one);
                }
            } catch (InvalidSmilesException e) {
                // do nothing for now
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable,
            Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
    ....
}
62. Multiline Records
• Lots of cheminformatics applications require 3D – SMILES won't do. Need to support SDF
• We implement a custom RecordReader to process SD files
• We're now ready to tackle pretty much most cheminformatics tasks
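The essential job of that custom RecordReader can be shown standalone: group lines into records, where an SD file record is terminated by a `$$$$` line. This sketch works on a list of lines in memory; the Hadoop RecordReader does the same thing against an HDFS input split (and must also handle splits that start mid-record, which is omitted here).

```java
import java.util.ArrayList;
import java.util.List;

// Splits SD-file lines into per-molecule records on the "$$$$" delimiter.
// The delimiter line itself is dropped from each record.
public class SdfSplitter {
    public static List<String> records(List<String> lines) {
        List<String> recs = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (String line : lines) {
            if (line.trim().equals("$$$$")) {
                recs.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(line).append('\n');
            }
        }
        return recs;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("mol1", "  atom block ...", "$$$$", "mol2", "$$$$");
        System.out.println(records(lines).size());
    }
}
```

Once records rather than lines reach the Mapper, the rest of the pipeline is unchanged from the SMILES examples.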
63. Why Hadoop?
• Java and C++ APIs
– In Java use Objects, while in C++ bytes
• Each task can process data sets larger than RAM
• Automatic re-execution on failure
– In a large cluster, some nodes are always slow or flaky
– Framework re-executes failed tasks
• Locality optimizations
– M/R queries HDFS for locations of input data
– Map tasks are scheduled close to the inputs when possible
Owen O'Malley, http://bit.ly/ecHPvB