CDD: Vault, CDD: Vision and CDD: Models software for biologists and chemists... (Sean Ekins)
A perspective on 12 years of CDD and developing products and collaborations.
A presentation given at the ACS meeting in San Diego, small business section.
USUGM 2014 - Gregory Landrum (Novartis): What else can you do with the Marku... (ChemAxon)
In a collaboration with ChemAxon we developed a web-based interface for searching, browsing and managing chemical information. The system was designed to capture the information that users had stored in various local files (PDF documents, PowerPoint slides, images, etc.). These bits of information were not centrally available, and when people moved on, the data was lost.
ChemAxon's JChem Cartridge, its Markush extensions and the Document to Database tool enabled us to collect this data, and they serve as a good basis for future developments too. When developing this new interface, we focused on ease of use, maintainability, and flexibility.
Customer Due Diligence - Is your organisation compliant? (rosspemberton69)
Knowing your customer is a fundamental part of meeting the FSA compliance requirements for "customer due diligence". Is your organisation compliant? Learn how this is achieved using smart data to reduce risk, drive customer understanding and identify potential fraud and potential cases of money laundering.
Revisiting the Four Pillars Supporting an Effective BSA/AML Compliance Program (Rachel Hamilton)
ACI's 10th National Forum on Prepaid Card Compliance will bring together an unparalleled faculty of regulatory and enforcement officials, compliance experts from industry leaders, and outside counsel specializing in prepaid card regulatory compliance who will provide you with best practices and targeted guidance.
NICSA Webinar | AML Enhanced Customer Due Diligence - "Beneficial Owner Rule" (NICSA)
The wait is finally over: after years of anticipation we now have the final Customer Due Diligence Rule. This new rule will require financial institutions to enhance their AML programs to further scrutinize entity accounts and their beneficial owners. The panel will detail key requirements and dates while comparing the CDD rule to the EU 3rd Directive.
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E... (Chester Chen)
Topic:
NVIDIA FLARE: Federated Learning Application Runtime Environment for Developing Robust AI Models
Summary:
Federated learning (FL) enables building robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without moving data. We created NVIDIA FLARE as an open-source SDK to make it easier for data scientists to use FL in their research. The SDK allows existing machine learning and deep learning workflows to be adapted for distributed learning across enterprises, and enables platform developers to build a secure, privacy-preserving offering for multiparty collaboration using homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package that lets researchers bring data science workflows implemented in any training library (PyTorch, TensorFlow, or even NumPy) and apply them in real-world FL settings. This talk will introduce the key design principles of NVIDIA FLARE and illustrate use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms.
Speaker: Dr. Holger Roth (NVIDIA)
Holger Roth is a Sr. Applied Research Scientist at NVIDIA focusing on deep learning for medical imaging. He has been working closely with clinicians and academics over the past several years to develop deep-learning-based medical image computing and computer-aided detection models for radiological applications. He is an Associate Editor for IEEE Transactions on Medical Imaging and holds a Ph.D. from University College London, UK. In 2018, he was awarded the MICCAI Young Scientist Publication Impact Award.
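For orientation, here is a minimal sketch of the federated-averaging idea behind the FLARE summary above, written in plain NumPy purely for illustration. It is not the NVIDIA FLARE SDK API; the toy site data, model and weighting scheme are assumptions made for the example.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's local training: a few gradient steps of logistic regression."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # local predictions
        w -= lr * X.T @ (p - y) / len(y)        # gradient step on local data only
    return w

def federated_average(global_w, site_data):
    """Aggregate locally trained weights, weighted by each site's sample count."""
    updates, sizes = [], []
    for X, y in site_data:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    sizes = np.array(sizes, dtype=float)
    return np.average(np.stack(updates), axis=0, weights=sizes / sizes.sum())

# Toy example: two "hospitals" whose raw data never leaves the site.
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(100, 3)), rng.integers(0, 2, 100)) for _ in range(2)]
w = np.zeros(3)
for _round in range(10):                         # federated rounds
    w = federated_average(w, sites)
print("global weights after 10 rounds:", w)
```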
In this session we will explore how Google's Cloud services (CloudML, Vision, Genomics API) can be used to process genomic and phenotypic data and solve problems in healthcare and agriculture.
Building a big-scale data product doesn't rely only on sophisticated modeling. It also requires an agile methodology, an iterative research & development process, a versatile big data stack, and a value-oriented mindset. I'll discuss how we at Dsquares build big-scale AI products that leverage clients' data from different industries to deliver business-critical value to the end customer. I'll cover the process of product discovery, R&D tasks for unsolved problems, and mapping business requirements into big data technical requirements.
Crossing the Analytics Chasm and Getting the Models You Developed Deployed (Robert Grossman)
There are two cultures in data science and analytics: those that develop analytic models and those that deploy analytic models into operational systems. In this talk, we review the life cycle of analytic models and provide an overview of some of the approaches that have been developed for managing analytic models and workflows and for deploying them, including using analytic engines and analytic containers. We give a quick overview of languages for analytic models (PMML) and analytic workflows (PFA). We also describe the emerging discipline of AnalyticOps, which has borrowed some of the techniques of DevOps.
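As a concrete illustration of the model-interchange languages mentioned above, here is a minimal sketch of exporting a scikit-learn model to PMML. It assumes the third-party sklearn2pmml package (which calls a Java-based converter), and the dataset and file name are arbitrary placeholders.

```python
# pip install scikit-learn sklearn2pmml  (the converter also needs a Java runtime)
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)

# Wrap the estimator in a PMMLPipeline so it can be serialized to PMML.
pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier(max_depth=3))])
pipeline.fit(X, y)

# Write the trained model as a PMML document that a scoring engine can load.
sklearn2pmml(pipeline, "iris_tree.pmml")
```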
Docker in Open Science Data Analysis Challenges by Bruce Hoff (Docker, Inc.)
Typically in predictive data analysis challenges, participants are provided a dataset and asked to make predictions. Participants submit, along with their predictions, the scripts/code used to produce them. Challenge administrators validate the winning model by reconstructing and running the source code.
Often data cannot be provided to participants directly, e.g. due to data sensitivity (data may be from living human subjects) or data size (tens of terabytes). Further, predictions must be reproducible from the code provided by participants. Containerization is an excellent solution to these problems: rather than providing the data to the participants, we ask the participants to provide a Dockerized "trainable" model. We run both the training and validation phases of machine learning and guarantee reproducibility 'for free'.
We use the Docker tool suite to spin up and run servers in the cloud to process the queue of submitted containers, each essentially a batch job. This fleet can be scaled to match the level of activity in the challenge. We have used Docker successfully in our 2015 ALS Stratification Challenge and our 2015 Somatic Mutation Calling Tumour Heterogeneity (SMC-HET) Challenge, and are starting an implementation for our 2016 Digital Mammography Challenge.
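A minimal sketch of how a submitted, Dockerized model might be run as a batch job with the data mounted read-only. It uses the Docker SDK for Python; the image name, paths and command are hypothetical illustrations rather than the challenge's actual infrastructure.

```python
# pip install docker
import docker

client = docker.from_env()

# Run a participant's submitted image against data it never gets to download.
logs = client.containers.run(
    image="participant/model:latest",          # hypothetical submission image
    command="python train.py --out /output",   # hypothetical entry point
    volumes={
        "/challenge/data": {"bind": "/data", "mode": "ro"},      # read-only data
        "/challenge/output": {"bind": "/output", "mode": "rw"},  # predictions out
    },
    network_disabled=True,  # prevent the container from exfiltrating data
    remove=True,            # clean up the container when the job finishes
)
print(logs.decode())
```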
Machine Learning is increasingly being used by companies as a disruptor or to provide a USP. This means that Machine Learning models need to cope with being a critical part of solutions, and if those solutions handle PCI-DSS or PII data then the models must be highly secure.
In addition, if a Machine Learning model is part of your USP then you will want to protect it. Also, the EU AI Regulation and UK AI Strategy mean that AI is becoming increasingly regulated. This means you need to be able to prove what model made a prediction and why it made it by providing auditability and explainability.
In this talk we go over these issues and how to address them, including using AWS and implementing development best practices.
Mining software vulns in SCCM / NIST's NVD (Loren Gordon)
Patch management for 3rd-party software can be a significant challenge. The raw data for effective vulnerability management is available in MS' SCCM (software inventory) and NIST's NVD (vulnerability database). However, extracting the relevant information from complex, sometimes undocumented data structures poses significant challenges.
The stage is set with a brief overview of SCCM / NVD data structures as well as a look at a (non-typical but interesting!) production environment. Then we'll take a quick dive into data wrangling / Machine Learning fundamentals applied to this problem: feature extraction, choice of approach, algorithm choice and tuning.
Once the technical challenges are resolved, the path to "Data Nirvana" can still be strewn with significant non-technical hurdles. We will discuss some practical "been there, done that" examples.
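A minimal sketch of the kind of matching step the talk describes: linking software inventory names from SCCM to product names in NVD records by string similarity. It uses only the Python standard library; the example records and the 0.8 threshold are illustrative assumptions, not the speaker's actual pipeline.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lower-case and drop version/architecture tokens so product strings compare cleanly."""
    tokens = []
    for t in name.lower().split():
        if t.strip("()").replace(".", "").replace("x", "").isdigit():
            continue  # skip version numbers and arch markers like (x64)
        tokens.append(t)
    return " ".join(tokens)

def match_inventory_to_nvd(inventory, nvd_products, threshold=0.8):
    """Pair each installed product with NVD product names above a similarity threshold."""
    matches = []
    for installed in inventory:
        for product in nvd_products:
            score = SequenceMatcher(None, normalize(installed), normalize(product)).ratio()
            if score >= threshold:
                matches.append((installed, product, round(score, 2)))
    return matches

# Illustrative records only.
inventory = ["Adobe Acrobat Reader DC 21.005", "7-Zip 19.00 (x64)"]
nvd_products = ["adobe acrobat reader dc", "7-zip", "oracle java se"]
for row in match_inventory_to_nvd(inventory, nvd_products):
    print(row)
```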
Slides from the presentation at IDAMO 2016, Rostock. May 2016.
Most scientific discoveries rely on previous or other findings. A lack of transparency and openness led to what many consider the "reproducibility crisis" in systems biology and systems medicine. The crisis arose from missing standards and inappropriate support of standards in software tools. As a consequence, numerous results in low- and high-profile publications cannot be reproduced.
In my presentation, I summarise key challenges of reproducibility in systems biology and systems medicine, and I demonstrate available solutions to the related problems.
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an... (Boston Data Engineering)
What is Kedro?
Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering best practices and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning.
Kedro 2-minute Intro Video: https://youtu.be/KEdmJ2ADy_M
Kedro Docs: https://kedro.readthedocs.io
Kedro GitHub repo: https://github.com/quantumblacklabs/kedro
Meetup: https://www.meetup.com/f7324858-b804-4ed8-ba45-580c262189f1/events/280986950/
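A minimal sketch of the node/pipeline concepts described above, assuming the kedro package; the function and dataset names here are illustrative and would normally be resolved against a Kedro project's data catalog.

```python
# pip install kedro pandas
import pandas as pd
from kedro.pipeline import Pipeline, node

def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows so downstream nodes see tidy data."""
    return raw.dropna()

def summarize(clean_df: pd.DataFrame) -> pd.DataFrame:
    """Compute per-column means as a toy 'model input' step."""
    return clean_df.mean(numeric_only=True).to_frame("mean")

# Each node declares its inputs/outputs by name; Kedro wires them into a DAG
# and resolves the names against the project's data catalog at run time.
pipeline = Pipeline(
    [
        node(clean, inputs="raw_data", outputs="clean_data", name="clean_node"),
        node(summarize, inputs="clean_data", outputs="summary", name="summarize_node"),
    ]
)
print(pipeline.nodes)
```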
2020.04.07 Automated molecular design and the Bradshaw platform webinar (Pistoia Alliance)
This presentation described how data-driven chemoinformatics methods may automate much of what has historically been done by a medicinal chemist. It explored what is reasonable to expect “AI” approaches might achieve, and what is best left with a human expert. The implications of automation for the human-machine interface were explored and illustrated with examples from Bradshaw, GSK’s experimental automated design environment.
Production Bioinformatics, emphasis on Production (Chris Dwan)
Production bioinformatics at Sema4 can be thought of as data ops - a peer to the lab ops organization. We operate 24/7 to deliver correct and timely results on NGS and other data for thousands of samples per week. This deck introduces the Prod BI organization and systems architecture with a focus on what it takes to run bioinformatics in production rather than for R&D or pure research.
Often information is spread among several data sources, such as hospital databases, lab databases, spreadsheets, etc. Moreover, the complexity of each of these data sources might make it difficult for end-users to access them, and even more so to query all of them at the same time.
A new solution that has been proposed to this problem is ontology-based data access (OBDA). OBDA is a popular paradigm, developed since the mid 2000s, to query various types of data sources using a common vocabulary familiar to the end-users. In a nutshell, OBDA separates the user from the data sources (relational databases, CSV files, etc.) by means of an ontology, which is a common terminology that provides the user with a convenient query vocabulary, hides the structure of the data sources, and can enrich incomplete data with background knowledge. About a dozen OBDA systems have been implemented in both academia and industry.
In this tutorial we will give an overview of OBDA and our system, -ontop-, which is currently being used in the context of the European project Optique. We will discuss how to use -ontop- for data integration, in particular concentrating on:
– How to create an ontology (common vocabulary) for a life science domain.
– How to map available data sources to this ontology.
– How to query the database using the terms in the ontology.
– How to check consistency of the data sources w.r.t. the ontology.
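A minimal sketch of the querying step in the list above, assuming an OBDA system such as -ontop- is already exposing a SPARQL endpoint. The endpoint URL, prefix and class/property names are hypothetical placeholders; the query is sent with the SPARQLWrapper package.

```python
# pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical SPARQL endpoint exposed by an OBDA system.
endpoint = SPARQLWrapper("http://localhost:8080/sparql")

# The query uses ontology terms only; the OBDA mappings rewrite it into SQL
# over the underlying databases and files.
endpoint.setQuery("""
    PREFIX ex: <http://example.org/clinic#>
    SELECT ?patient ?result
    WHERE {
        ?patient a ex:Patient ;
                 ex:hasLabResult ?result .
    }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["patient"]["value"], row["result"]["value"])
```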
Building Data Ecosystems for Accelerated Discovery (adamkraut)
Large federated data ecosystems require diverse teams that can design, build, and integrate a broad range of services to support scientific workflows. Our collaborative team operates at the intersection of science, technology, and data to assess, implement, and teach the key capabilities and capacities modern healthcare and life science needs. Learn the data management techniques, tools, platforms, and frameworks that are proven to be effective at solving complex problems at scale.
Presentation from AAPS PharmSci360 (October 23, 2023) in which I describe highlights of my Springer/AAPS book Winning Grants (https://link.springer.com/book/10.1007/978-3-031-27516-6) - presenting a 'how to' guide on writing small business grants - e.g. NIH STTR and SBIR grants. Written by someone experienced in winning such grants.
Evaluating Multiple Machine Learning Models for Biodegradation and Aquatic To... (Sean Ekins)
The presentation was given at SETAC 2022 on Nov 16 and describes our work on evaluating multiple machine learning models for biodegradation and aquatic toxicity.
We generated many models that are available to license in our MegaTox software. After assessing many algorithms, we found that support vector machine models performed best for both classification and regression.
The authors of this work are Thomas R Lane, Fabio Urbina and Sean Ekins.
The contact is sean@collaborationspharma.com
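For illustration, a minimal sketch of the kind of model evaluation described above, using scikit-learn to cross-validate a support vector classifier on fingerprint-like features. The random data stands in for real molecular descriptors and labels; this is not the MegaTox code itself.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in for fingerprint features (rows = compounds, columns = bits)
# and binary labels (e.g., readily biodegradable or not).
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(200, 1024))
y = rng.integers(0, 2, size=200)

# Five-fold cross-validation of an SVM classifier, the family of models
# reported above to perform best for these endpoints.
svc = SVC(kernel="rbf", C=1.0)
scores = cross_val_score(svc, X, y, cv=5, scoring="roc_auc")
print("mean ROC AUC:", scores.mean().round(3))
```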
A presentation at the Global Genes rare drug development symposium on governm... (Sean Ekins)
This presentation from June 12, 2020 gives a brief overview of my experience of 15 years of applying for government grants to fund small companies. Prior to this I had no experience of applying for such grants. The bottom line for rare disease groups and families is to find a scientist who can do this or assist you. Please also see www.collaborationspharma.com
Leveraging Science Communication and Social Media to Build Your Brand and Ele... (Sean Ekins)
Slides from AAPS Careers session by Maren Katherina Preis, Kyle Bagin, Sean Ekins
Provides some clear steps on how you could use social media to help your career.
Oral presentation given in the MEDI session at the 2017 ACS meeting in DC.
Co-authors: Kimberley M. Zorn, Mary A. Lingerfelt, Jair L. de Siqueira-Neto, Alex M. Clark, Sean Ekins.
Describes drug repurposing and machine learning; for more details see www.collaborationspharma.com
Assay Central: A New Approach to Compiling Big Data and Preparing Machine Lea... (Sean Ekins)
Oral presentation at the 2017 ACS meeting in DC, given by Kimberley Zorn.
Co-authors include Mary A. Lingerfelt, Alex M. Clark, Sean Ekins.
For more details see www.collaborationspharma.com
Five Ways to Use Social Media to Raise Awareness for Your Paper or Research (Sean Ekins)
Presentation given at the AAPS 2016 conference in Denver. Some of the slides are from AAPS, some from Kudos and some from Figshare. One slide is from Tony Williams. All slides used with permission.
This presentation summarizes some early efforts on an open drug discovery collaboration between scientists in Brazil and the US. The amazing virus images were created by John Liebler and can be licensed from him http://www.artofthecell.com/animation/will-the-real-zika-virus-please-stand-up
The homology models were created with SWISS-MODEL by Sean Ekins:
Marco Biasini, Stefan Bienert, Andrew Waterhouse, Konstantin Arnold, Gabriel Studer, Tobias Schmidt, Florian Kiefer, Tiziano Gallo Cassarino, Martino Bertoni, Lorenza Bordoli, Torsten Schwede. (2014). SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Research; (1 July 2014) 42 (W1): W252-W258; doi: 10.1093/nar/gku340.
Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.
Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 37, D387-D392.
Guex, N., Peitsch, M.C., Schwede, T. (2009). Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis, 30(S1), S162-S173.
Ensuring Chemical Structure, Biological Data and Computational Model Quality
A talk given at SLAS 2016, Monday, January 25th, in San Diego.
It covers published work and recent forays with BIA 10-2474.
Pros and cons of social networking for scientists (Sean Ekins)
Over the past 4 years I have been using social networking tools for scientists more and more, inspired by Antony Williams. I realized I am using many tools and that each has pros and cons. Here is my brief summary.
CDD: Vault, CDD: Vision and CDD: Models for Drug Discovery Collaborations (Sean Ekins)
A talk given at SERMACS, 7th Nov 2015, in Memphis; it describes CDD Vault, CDD Vision and CDD Models. It also describes how the software is used in large- and smaller-scale collaborations for drug discovery.
Combining Metabolite-Based Pharmacophores with Bayesian Machine Learning Mode... (Sean Ekins)
Slides from the SERMACS 2015 meeting in Memphis describing a collaborative project with SRI International and Rutgers. The work was published in PLOS ONE: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141076
ANOMALOUS SECONDARY GROWTH IN DICOT ROOTS.pptx (RASHMI M G)
This presentation covers abnormal or anomalous secondary growth in plants. It defines secondary growth as an increase in plant girth due to the vascular cambium or cork cambium. Anomalous secondary growth does not follow the normal pattern of a single vascular cambium producing xylem internally and phloem externally.
ESR spectroscopy in liquid food and beverages.pptx (PRIYANKA PATEL)
With an increasing population, people need to rely on packaged foodstuffs. Packaging of food materials requires the preservation of food. There are various methods for treating food to preserve it, and irradiation is one of them. It is the most common and most harmless method of food preservation, as it does not alter the necessary micronutrients of food materials. Although irradiated food does not harm human health, quality assessment of the food is still required to provide consumers with the necessary information about it. ESR spectroscopy is the most sophisticated way to investigate the quality of food and the free radicals induced during its processing. The ESR spin-trapping technique is useful for detecting highly unstable radicals in food. The antioxidant capability of liquid foods and beverages is mainly assessed by the spin-trapping technique.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige... (University of Maribor)
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Toxic effects of heavy metals: Lead and Arsenic (sanjana502982)
Heavy metals are naturally occurring metallic chemical elements that have relatively high density and are toxic even at low concentrations. All toxic metals are termed heavy metals irrespective of their atomic mass and density, e.g. arsenic, lead, mercury, cadmium, thallium, chromium, etc.
What are greenhouse gases and how many gases affect the Earth? (moosaasad1975)
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how do they influence weather and climate?
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
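As a small illustration of the sampling strategies mentioned above, here is a sketch of uniform random sampling over a space of configuration options (compiler flags, library versions, parameters). The option names and the measurement function are hypothetical stand-ins for a real build-and-measure pipeline.

```python
import itertools
import random

# A toy variability space: each option contributes to the configuration.
options = {
    "compiler": ["gcc", "clang"],
    "optimization": ["-O0", "-O2", "-O3"],
    "lto": [True, False],
    "numpy_version": ["1.24", "1.26"],
}

def sample_configurations(space, n, seed=0):
    """Draw n configurations uniformly at random from the full product space."""
    rng = random.Random(seed)
    full_space = list(itertools.product(*space.values()))
    picks = rng.sample(full_space, k=min(n, len(full_space)))
    return [dict(zip(space.keys(), values)) for values in picks]

def measure(config):
    """Placeholder for building/running the subject system and recording a metric."""
    return hash(tuple(sorted(config.items()))) % 100  # fake runtime in seconds

for config in sample_configurations(options, n=5):
    print(config, "->", measure(config))
```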
Invited talk at the Journées Nationales du GDR GPL 2024.
Nucleophilic Addition of carbonyl compounds.pptx (SSR02)
Nucleophilic addition is the most important reaction of carbonyls, not just aldehydes and ketones but also carboxylic acid derivatives in general.
Carbonyls undergo addition reactions with a large range of nucleophiles.
Comparing the relative basicity of the nucleophile and the product is extremely helpful in determining how reversible the addition reaction is. Reactions with Grignards and hydrides are irreversible. Reactions with weak bases like halides and carboxylates generally don’t happen.
Electronic effects (inductive effects, electron donation) have a large impact on reactivity.
Large groups adjacent to the carbonyl will slow the rate of reaction.
Neutral nucleophiles can also add to carbonyls, although their additions are generally slower and more reversible. Acid catalysis is sometimes employed to increase the rate of addition.
Phenomics-assisted breeding in crop improvement (IshaGoswami9)
The population is increasing and will reach about 9 billion by 2050, and due to climate change it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics of multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data linkable to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Observation of Io's Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io's surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io's trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io's surface using adaptive optics at visible wavelengths.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
1. CASE STUDY #2
SEAN EKINS
COLLABORATIVE DRUG DISCOVERY, 1633 BAYSHORE HIGHWAY, SUITE 342, BURLINGAME, CA 94010, USA
2. MODELS RESIDE IN PAPERS, NOT ACCESSIBLE… THIS IS UNDESIRABLE
How do we share them?
How do we use them?
3. What if we could build Machine Learning Models in the CDD Vault?
We could then use them to score public or private libraries in the Vault.
We can leverage models from other companies or groups to help internal projects.
We can export models to use in other software.
We can develop our own private database of models.
Deliverable: This Case Study walks you through building a model with a dataset in CDD Public and generating predictions in a CDD Vault.
4. Open Extended Connectivity Fingerprints
ECFP_6 and FCFP_6
• Collected, deduplicated, hashed
• Sparse integers
• Invented for Pipeline Pilot: public method, proprietary details
• Often used with Bayesian models: many published papers
• Built a new implementation: open source, Java, CDK
  – stable: fingerprints don't change with each new toolkit release
  – well defined: easy to document precise steps
  – easy to port: already migrated to iOS (Objective-C) for the TB Mobile app
• Provides the core basis feature for the CDD open source model service
Clark et al., J Cheminform 6:38, 2014
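For illustration, a minimal sketch of the same workflow outside the Vault: compute circular fingerprints, fit a Bayesian-style classifier, then score and rank a small library, as the later slides do inside CDD Vault. RDKit Morgan fingerprints and scikit-learn's Bernoulli naive Bayes are used here as stand-ins for the open ECFP_6/FCFP_6 implementation and the CDD Models Bayesian method; the SMILES strings and activity labels are toy data.

```python
# pip install rdkit scikit-learn numpy
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.naive_bayes import BernoulliNB

def fingerprint(smiles, n_bits=2048, radius=3):
    """Circular (Morgan) fingerprint; radius 3 corresponds roughly to ECFP_6."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp)

# Toy training set: a few molecules labeled active (1) / inactive (0).
train_smiles = ["CCO", "CCN", "c1ccccc1O", "c1ccccc1N", "CC(=O)O", "CCCCCC"]
train_labels = [1, 1, 1, 0, 0, 0]

X = np.array([fingerprint(s) for s in train_smiles])
model = BernoulliNB()
model.fit(X, train_labels)

# Score and rank a small "library", as one would with approved drugs in a Vault.
library = {"aspirin": "CC(=O)Oc1ccccc1C(=O)O", "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C"}
scores = {name: model.predict_proba(fingerprint(smi).reshape(1, -1))[0, 1]
          for name, smi in library.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```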
12. Predictions for some approved drugs in Vault with Model – select model from protocol section
Select protocol for model in Explore Data
Or customize your report
13. Predictions for some approved drugs in Vault with Model – output
You can rank molecules by these scores
14. You can create a private database of CDD Models in your CDD Vault
15. You can also export your CDD Model
Search under protocols tab
16. Clark et al., JCIM 55: 1231-1245 (2015)
Exporting models from CDD
17. Clark et al., JCIM 55: 1231-1245 (2015); 9R44TR000942-02
You can import your model in a mobile app like MMDS for private use of the model or sharing with a collaborator
18. Find out more:
Clark AM, Dole K, Coulon-Spektor A, McNutt A, Grass G, Freundlich JS, Reynolds RC and Ekins S, Open Source Bayesian Models: 1. Application to ADME/Tox and Drug Discovery Datasets, J Chem Inf Model, 55(6):1231-45, 2015.
Clark AM and Ekins S, Open Source Bayesian Models: 2. Mining a "Big Dataset" to Create and Validate Models with ChEMBL, J Chem Inf Model, 55(6):1246-60, 2015.
Ekins S, Clark AM and Wright SH, Making transporter models for drug-drug interaction prediction mobile, Drug Metab Dispos, 43:1642-5, 2015.
Clark AM, Dole K and Ekins S, Open Source Bayesian Models: 3. Composite Models for prediction of binned responses, J Chem Inf Model, 56:275-85, 2016.
Perryman AL, Stratton TP, Ekins S and Freundlich JS, Predicting mouse liver microsomal stability with "pruned" machine learning models and public data, Pharm Res, 33:433-49, 2016.
https://www.collaborativedrug.com/pages/contact
Sales: (650) 242-5259
http://info.collaborativedrug.com/vision