This document discusses challenges and opportunities in analyzing large and diverse datasets in life sciences. It notes that while life sciences datasets are large, they are still relatively small compared to other domains. Integrating multiple data types and sources from different studies presents challenges in obtaining a coherent understanding. Large datasets can be useful for statistical modeling and pattern recognition, but may not provide insights into underlying mechanisms. The document also discusses using fragment-based approaches and scaffold analysis to explore structure-activity relationships in large compound collections. Overall, the key point is that while large datasets enable new analyses, traditional hypothesis-driven science is still needed to understand biological systems.
The design of chemical libraries is usually informed by pre-existing characteristics and desired features. On the other hand, assessing the prospective performance of a new library is more difficult. Importantly, a given screening library is often screened in a variety of systems, which can differ in cell lines, readouts, formats and so on. In this study we explore to what extent pre-existing libraries can shed light on the relation between library activity and assay features. Using an ontology such as the BAO, it is possible to construct a hierarchy of annotations associated with an assay. Based on this annotation hierarchy we can then ask how likely molecules associated with a specific annotation are to be identified as active. To allow generalization we consider substructural features, as represented by a structural key fingerprint, rather than whole molecules. We employ a Bayesian framework to quantify the association between a substructural feature and a given assay annotation, using a set of NCGC assays that have been annotated with BAO terms. We discuss our approach to training the Bayesian model and describe benchmarks that characterize model performance relative to the position of the annotation in the BAO hierarchy. Finally, we discuss the role of this approach in a library design workflow that includes traditional design features such as chemical space coverage and physicochemical properties but also takes into account screening platform features.
2. Characteristics
• Large sizes (but this is relative)
– Chemistry datasets are not really that big
• Multi-dimensional
• Multiple sources (and hence, types)
• Challenges
– Handling and processing large datasets
– Integrating multiple data types / sources
– Get a coherent story out of it all
3. How Useful is More Data?
• Alternatively, can we stop doing science and
just do pattern recognition on increasingly
large datasets?
• According to Chris Anderson, yes.
There is now a better way. Petabytes allow us to say:
"Correlation is enough." We can stop looking for models. We
can analyze the data without hypotheses about what it might
show. We can throw the numbers into the biggest computing
clusters the world has ever seen and let statistical algorithms
find patterns where science cannot.
http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
5. Big Data for Some Problems
• Halevy et al discuss the effectiveness of
extremely large datasets
• Their application focuses on machine
translation – see the Google n-gram corpus
• They suggest that such extremely large datasets
are useful because they effectively encompass
all n-grams (phrases) commonly used
• Domain is relatively constrained
Halevy et al, IEEE Intelligent Systems, 2009, 24, 8-12
6. Google Scale in Chemistry?
• What would be the equivalent of an n-gram
corpus in chemistry?
– Fragments
– A more direct analogy can be made by using LINGOs
• It is possible to generate arbitrarily large (virtual)
compound and fragment collections
• But would such a collection span all of
“commonly used” chemistry?
– Depending on the initial compound set, yes
– But we’re also interested in going beyond such a
“commonly used” set
Fink T, Reymond JL, J Chem Inf Model, 2007, 47, 342
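The n-gram analogy can be made concrete with a small sketch. It treats LINGOs as overlapping q-character substrings of a SMILES string, builds a frequency "corpus" from a compound set, and asks what fraction of a new molecule's LINGOs the corpus already contains. The function names (`lingos`, `lingo_corpus`, `coverage`) and the choice q = 4 are illustrative assumptions, and the sketch omits any SMILES normalization that a real implementation would likely apply.

```python
from collections import Counter


def lingos(smiles, q=4):
    """Return all overlapping q-character substrings of a SMILES string."""
    return [smiles[i:i + q] for i in range(len(smiles) - q + 1)]


def lingo_corpus(smiles_list, q=4):
    """Count how often each LINGO occurs across a compound collection."""
    counts = Counter()
    for smiles in smiles_list:
        counts.update(lingos(smiles, q))
    return counts


def coverage(corpus, smiles, q=4):
    """Fraction of a query molecule's LINGOs that already occur in the corpus."""
    query = lingos(smiles, q)
    if not query:
        return 0.0
    return sum(1 for gram in query if gram in corpus) / len(query)


if __name__ == "__main__":
    # Toy SMILES strings, chosen only for illustration.
    corpus = lingo_corpus(["CC(=O)Oc1ccccc1C(=O)O", "CC(=O)Nc1ccc(O)cc1"])
    print(len(corpus), "distinct LINGOs in the toy corpus")
    print(coverage(corpus, "CC(=O)Oc1ccccc1"))
```

A coverage value close to 1 suggests the query sits inside the "commonly used" set spanned by the corpus; low values flag chemistry the corpus does not span.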
7. Fragment Diversity
• Consider a set of bioactives such as the LOPAC
collection, 1280 compounds
• Using exhaustive fragmentation we get
2,460 unique fragments
• On the MLSMR (~400K compounds),
we get 164,583 fragments
[Figure: histogram of fragment frequency; x-axis: log Fragment Frequency, y-axis: Percent of Total]
8. Fragment Diversity
[Figure: two panels, “All fragments” and “Fragments occurring in 5 to 50 molecules”; x-axis: PC 1, y-axis: PC 2]
• Distribution of MLSMR fragments in BCUT space
9. What Do We Do with Fragments?
• Assuming we obtain fragments from a large
enough collection, what do we do?
– Learning from fragments – QSARs, generative
models
– Use fragments as filters, alternative to clustering
– Explore chemotypes and activity
White, D and Wilson, RC, J Chem Inf Model, 2010, ASAP
15. Big Data and Chemistry
• But in the end, the fundamental problem with
big data is the issue of domain applicability
• Traditional models are developed on small
datasets and perform well within the training
domain
• But models trained on very large datasets will
not necessarily perform well, even though the
training domain is now much larger
Helgee et al, J Chem Inf Model, 2010, 50, 677-689
16. Processing Large Datasets
• Most cheminformatics tasks are not
algorithmically parallel
• Rather, they are applied to large numbers of
inputs and hence embarrassingly parallel
– Start up lots of jobs
• Hadoop is a useful technology for those problems
that follow the map/reduce paradigm
– Not aware of cheminformatics methods that work in
this manner
– But can also be used like a job submission system
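The distinction above can be sketched in a few lines: the per-molecule work is independent (the "map" step), and only the final merge of partial results (the "reduce" step) combines them. The example counts how many molecules each fragment occurs in. `toy_fragments` is a hypothetical stand-in for a real fragmenter (it just takes 3-character SMILES substrings), and the worker and chunk-size defaults are arbitrary assumptions; this is not Hadoop, only the same pattern on one machine using the Python standard library.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from functools import reduce


def toy_fragments(smiles, size=3):
    """Hypothetical stand-in for a fragmenter: distinct substrings of a SMILES."""
    return {smiles[i:i + size] for i in range(len(smiles) - size + 1)}


def map_molecule(smiles):
    """Map step: one molecule in, one Counter out (each fragment counted once)."""
    return Counter(toy_fragments(smiles))


def reduce_counts(left, right):
    """Reduce step: merge two partial fragment counts."""
    return left + right


def count_serial(smiles_list):
    """Reference implementation: the same map and reduce steps, run in one process."""
    return reduce(reduce_counts, map(map_molecule, smiles_list), Counter())


def count_parallel(smiles_list, workers=4, chunksize=1000):
    """Run the map step across worker processes, then reduce in the parent."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(map_molecule, smiles_list, chunksize=chunksize)
        return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    molecules = ["CCCC", "CCCO", "CCO"]  # toy input
    print(count_serial(molecules))
```

Because each molecule is processed independently, `count_parallel` returns the same counts as `count_serial`; swapping the process pool for a cluster job scheduler changes only how the map step is dispatched.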
17. Common HTS Analysis Tasks
• Analysis of Activity
– Concentration response across multiple phenotypes, multiple assays
– Assay interference (differentiating activity from artifacts)
– Assay ontology (biological relationships, assay platforms)
– Compound annotations, known ligand-target network, prior art assessment
– Profile data (PubChem, BindingDB, ChEMBL, PDSP, etc, physical properties)
• Identification of Series and Singletons
– Clustering of actives, identification of top scaffolds
– Profiling of series across all assays
– Series and singleton prioritization
• Compound Selection for Followup
– Assessment of structure activity relationships
– Rapid identification of key compounds to confirm, new compounds to test
– Mining of commercially available chemical libraries
How do we better automate such tasks?
19. Data Integration
• It’s nice to simplify data, but we can still be faced
with a multitude of data types
• We want to explore these data in a linked fashion
• How we explore and what we explore is generally
influenced by the task at hand
• At one point, make inferences over all the data
20. Data Integration
[Figure: User’s Network and Network of Public Data]
Content:
- Drugs
- Compounds
- Scaffolds
- Assays
- Genes
- Targets
- Pathways
- Diseases
- Clinical Trials
- Documents
Links:
- Manually curated
- Derived from algorithms
24. Going Beyond Exploration?
• Simply being able to explore data in an
integrated manner is useful as an idea
generator
• Can we integrate heterogeneous data types &
sources to get a systems level view?
– Current research problem in genomics and
systems biology
– Some attempts have been made to merge
chemical data with other data types
Young, D.W. et al, Nat. Chem. Biol., 2008, 4, 59-68
25. RNAi & Compound Screens
What targets mediate activity of siRNA and compound
Pathway elucidation, identification of interactions
• Reuse pre-existing MLI data
• Develop new annotated libraries
Target ID and validation
Link RNAi generated pathway perturbations to small molecule
activities. Could provide insight into polypharmacology
• Run parallel RNAi screen
[Figure: example siRNA sequences CAGCATGAGTACTACAGGCCA, TACGGGAACTACCATAATTTA]
Goal: Develop systems level view of small molecule activity
26. Small Molecule HTS Summary
• 2,899 FDA-approved compounds screened
• 55 compounds retested active
• Which components of the NF-κB pathway do they hit?
– 17 molecules have target/pathway information in GeneGO
– Literature searches list a few more
[Figure: Most Potent Actives; concentration-response curves for Proscillaridin A, Trabectedin and Digoxin; x-axis: log Concentration (uM), y-axis: Activity]
Miller, S.C. et al, Biochem. Pharmacol., 2010, ASAP
27. RNAi HTS Summary
• Qiagen HDG library – 6886 genes, 4 siRNAs
per gene
• A total of 567 genes were knocked
down by 1 or more siRNAs
– We consider >= 2 as a “reliable” hit
– 16 reliable hits
– Added in 66 genes for
follow up via triage procedure
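The hit-calling rule above (a gene is a "reliable" hit when at least 2 of its siRNAs are active) can be sketched as follows. The input layout, a sequence of (gene, siRNA id, active flag) tuples, and the placeholder gene names are assumptions for illustration; how an individual siRNA is scored as active and how the triage procedure works are not described on the slide and are left out.

```python
from collections import defaultdict


def call_hits(sirna_results, min_active=2):
    """Group active siRNAs by gene and apply the >= min_active rule.

    sirna_results: iterable of (gene, sirna_id, is_active) tuples (assumed layout).
    Returns (genes with at least one active siRNA, genes called reliable hits).
    """
    active_by_gene = defaultdict(set)
    for gene, sirna_id, is_active in sirna_results:
        if is_active:
            active_by_gene[gene].add(sirna_id)
    any_hit = sorted(active_by_gene)
    reliable = sorted(
        gene for gene, sirnas in active_by_gene.items() if len(sirnas) >= min_active
    )
    return any_hit, reliable


if __name__ == "__main__":
    # Placeholder gene names and results, not data from the screen.
    results = [
        ("GENE_A", "s1", True), ("GENE_A", "s2", True),
        ("GENE_B", "s1", True), ("GENE_B", "s2", False),
        ("GENE_C", "s1", False),
    ]
    any_hit, reliable = call_hits(results)
    print(any_hit, reliable)
```

Requiring agreement between independent siRNAs for the same gene is a simple guard against off-target effects of any single siRNA.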
28. RNAi & Small Molecule
• Based on reporter assays, the only conclusions
one can draw are the obvious ones
• Limited by 1‐D signal
• Going to high content gives us much richer
data, but more complexity
– Shown to be useful for compounds
– Much more difficult when the phenotypic
parameters come from different systems
29. Summary
• Multiple data types are probably the most
challenging aspect of data driven discovery
• Size issues can be addressed with more
hardware or waiting (a bit) longer
• Integration issues require new approaches
both at the presentation & algorithmic levels