This document summarizes code examples and best practices for working with RDKit. It provides idioms for efficiently looping over molecules and atoms, accepting string arguments, representing reactions and bonds, and writing PDB files. It also discusses techniques for proximity calculations and bond ordering like template-based methods and different proximity algorithms.
JCConf 2020 - New Java Features Released in 2020Joseph Kuo
In 2020, Java 14 and 15 are released with many great features, including ZGC, Shenandoah GC, helpful NullPointerExceptions, pattern matching for instanceof, switch expressions, text blocks, records, hidden classes, and sealed classes. They not only improve performance of GC and Java applications, but also introduce new syntax to ease our effort to write more readable and efficient code. Let's take a look at those features!
https://cyberjos.blog/java/seminar/jcconf-2020-new-java-features-released-in-2020/
JCConf 2020 - New Java Features Released in 2020Joseph Kuo
In 2020, Java 14 and 15 are released with many great features, including ZGC, Shenandoah GC, helpful NullPointerExceptions, pattern matching for instanceof, switch expressions, text blocks, records, hidden classes, and sealed classes. They not only improve performance of GC and Java applications, but also introduce new syntax to ease our effort to write more readable and efficient code. Let's take a look at those features!
https://cyberjos.blog/java/seminar/jcconf-2020-new-java-features-released-in-2020/
Максим Карбасникофф, партнер и руководитель департамента торговой недвижимости Cushman & Wakefield, о рынке торговой недвижимости и перспективах его развития в Казахстане.
El famoso empresario, creador de la marca Apple compartió en numerosas ocasiones sus técnicas en cuestión de mercadotecnia. Steve Jobs era también reconocido por sus inspiradores discursos y este gráfico compila algunas de sus estrategias.
A survey of connectivity options available to folks who want to leverage their favorite business collaboration platform interacting with their favorite mobile client. Wverything from "non-app" solutions (stuff already in the smartphone) to C#/CSOM with a few alternatives in between.
Content Marketing: How to Attract Talent using Sponsored UpdatesRebecca Feldman
Content Marketing Guide: Why you should care about Status Updates and how to use them to attract talent
Created by: Sophie Weber (www.linkedin.com/pub/sophie-weber/69/9b1/537/en)
Максим Карбасникофф, партнер и руководитель департамента торговой недвижимости Cushman & Wakefield, о рынке торговой недвижимости и перспективах его развития в Казахстане.
El famoso empresario, creador de la marca Apple compartió en numerosas ocasiones sus técnicas en cuestión de mercadotecnia. Steve Jobs era también reconocido por sus inspiradores discursos y este gráfico compila algunas de sus estrategias.
A survey of connectivity options available to folks who want to leverage their favorite business collaboration platform interacting with their favorite mobile client. Wverything from "non-app" solutions (stuff already in the smartphone) to C#/CSOM with a few alternatives in between.
Content Marketing: How to Attract Talent using Sponsored UpdatesRebecca Feldman
Content Marketing Guide: Why you should care about Status Updates and how to use them to attract talent
Created by: Sophie Weber (www.linkedin.com/pub/sophie-weber/69/9b1/537/en)
Basic intro to running Siesta, a code written to simulate multi atomic material using Density functional Theory (DFT). It covers how to create an input file, simulation command and analysing the output file, i.e. to make sense of the data dumped in .out file.
CINF 170: Regioselectivity: An application of expert systems and ontologies t...NextMove Software
Prediction is much harder than analysis. Consider hurricanes and tornadoes; it's much easier to follow the path of destruction by locating devastated neighborhoods, than to forecast the paths of such weather systems in advance. Likewise for many chemical reactions, such as nitration (by refluxing with nitric acid and sulfuric acid) where the appearance of one or more nitro groups indicates a nitration reaction, but predicting where on a non-trivial organic molecule this functional group appears is a much harder challenge. In this sense, reaction analysis is much simpler than (either forward or retrosynthetic) synthesis planning.
NextMove Software's namerxn is an expert system for classifying reactions (from reaction SMILES, MDL connection tables or ChemDraw sketches) typically assigning each reaction instance to a leaf classification in the Royal Society of Chemistry's RXNO ontology. These tools can be helpful in the analysis of regioselectivity preferences of reactions.
This talk consists of two parts. A technical part describing the recent algorithmic and methodological improvements to the namerxn software, including describing some of the more challenging of the 1000+ reactions it currently identifies. And a scientific part that investigates the regioselective preferences of some of these reactions.
CINF 35: Structure searching for patent information: The need for speedNextMove Software
Chemical databases grow larger every year. Without investing in additional hardware or improved software, the time to search these databases will in turn grow longer annually. With an ever-increasing number of pharmaceutical patents, the amount of chemical data associated with these is growing at a rate with which hardware advances alone cannot keep up.
Using automated mining of U.S. and European patents, we have extracted large collections of structural data in the form of reactions, mixtures, and exemplified compounds. Additional information such as protein targets and diseases are also extracted from each patent and associated with the structural data. We will describe how this data can be queried with natural language phrases and how these phrases are interpreted as structural queries.
Through innovations in substructure and similarity search algorithms, it is possible to search and retrieve hundreds of millions of chemical records in fractions of a second. We will demonstrate how this is achieved on a regular desktop machine using just-in-time and ahead-of-time compilation techniques.
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionNextMove Software
Presented by Roger Sayle at the 11th International Conference on Chemical Structures (ICCS) 2018.
Exponential database growth will always be a technical challenge but evolutionary strategies offer to hold off the inevitable in the short term and revolutionary strategies promise a longer-term solution.
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...NextMove Software
The Cahn-Ingold-Prelog (CIP) priority rules have been the corner stone in written communication of stereo-chemical configuration for more than half a century. The rules rank ligands around a stereocentre allowing an atom order and layout invariant stereo-descriptor to be assigned, for example R (right) or S (left) for tetrahedral atoms. Despite their widespread daily use, many chemists may be surprised to find that beyond trivial cases, different software may assign different labels to the same structure diagram.
There have been several attempts to either replace or amend the CIP rules. This talk will highlight the more challenging aspects of the ranking and present a comparison of software that provide CIP labels and where they disagree. Providing an IUPAC verified free and open source CIP implementation would allow software maintainers and vendors to validate and improve their implementations. Ultimately this would improve the accuracy in exchange of written chemical information for all.
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesNextMove Software
We have previously described the extraction of reactions from US and European patents. This talk will discuss the assembly of over six million extracted reaction details consisting of the connection tables, procedure, quantities, solvents, catalysts and yields into a searchable "read-only" Electronic Lab Notebook.
In addition to reactions details, concepts including diseases, drug targets, and assignees are recognised from the patent documents and normalised to appropriate ontologies. Each normalised term is paired with the reaction details found in the document to allow intuitive cross concept querying (e.g. "GlaxoSmithKline C-C Bond Formation greater than 80% yield Myocardial Infarction"). Reactions are classified and assigned to leafs in the RXNO Ontology. The ontologies are used to provide organisation, faceting, and filtering of results. The reaction classification also provides a precise atom mapping that facilitates structural transformation queries and can improve reaction diagram layout.
Through improvements in substructure search technology we will demonstrate several types of chemical synthesis queries that can be efficiently answered. The combination of high performance chemical searching and additional document terms provides a powerful exploratory and trend analysis tool for chemists.
Building on Sand: Standard InChIs on non-standard molfilesNextMove Software
The molfile serves as a de facto standard for chemical information exchange. It is perhaps the most widely supported format with its core syntax being easy to understand, parse, and generate. Beyond the core syntax, more advanced features such as sgroups and enhanced stereochemistry are rarely supported, often only being partially implemented and used. Additionally, several vendors, toolkits, and service providers have added extended syntax to their molfiles to solve particular corner cases or representation problems. This talk will provide a brief summary of the less widely supported features of the molfile including sgroups and enhanced stereochemistry. Additionally, a survey of how extensions can/have been implemented and which extensions exist "in the wild". Special attention will be paid to capturing of coordination bonds for extending the InChI to handle organometallics.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
2. introduction
• This presentation is intended as a short talktorial
describing real world code examples, recent
features and commonly used RDKit idioms.
• Although many refect personal/NextMove
Software experiences developing with the C++
APIs, most (though not all) observations apply
equally to the Java and Python APIs.
3. looping over molecule atoms
• Poor
for (RWMol::AtomIterator atomIt=mol->beginAtoms();
atomIt != mol->endAtoms(); atomIt++) …
• Better
for (RWMol::AtomIterator atomIt=mol->beginAtoms();
atomIt != mol->endAtoms(); ++atomIt) …
• The C++ post increment operator returns
the original value prior to the increment,
requiring a copy to made.
4. looping over aTOM's bonds
• Documented idiom:
... molPtr is a const ROMol * ...
... atomPtr is a const Atom * ...
ROMol::OEDGE_ITER beg,end;
boost::tie(beg,end) = molPtr->getAtomBonds(atomPtr);
while(beg!=end){
const Bond * bond=(*molPtr)[*beg].get();
... do something with the Bond ...
++beg;
}
5. looping over aTOM's bonds
• Proposed replacement:
... molPtr is a const ROMol * ...
... atomPtr is a const Atom * ...
ROMol::OBOND_ITER_PAIR bondIt;
bondIt = molPtr->getAtomBonds(atomPtr);
while(bondIt.first != bondIt.second){
const Bond * bond=(*molPtr)[*bondIt.first].get();
... do something with the Bond ...
++bondIt;
}
6. Accepting string arguments
• Good practice to accept both C++
constant literals and STL std::strings.
• Poor
void foo(const char *ptr) { foo(std::string(ptr)); }
void foo(const std::string str) { … }
• Better
void foo(const std::string &str) { foo(str.c_str()); }
void foo(const char *ptr) { … }
7. isoelectronic valences
• Supporting PF6
-
in RDKit currently requires
adding 7 to the neutral valences of P.
Instead, P-1
should be isoelectronic with S.
• Poor
valence(atomicNo,charge) ~= valence(atomicNo,0)-charge
• Better
valence(atomicNo,charge) ~= valence(atomicNo-charge,0)
8. representing reactions
• Two possible representations
– A new object containing vectors of reactant
and product (and agent?) molecules.
– Annotation of existing molecule with reaction
roles annotated on each atom.
• Both are representations are equivalent
and one may be converted to the other
and vice versa.
9. common ancestry/base class
• A significant advantage of avoiding new
object types is the common requirement in
cheminformatics of reading reactions,
molecules, queries, peptides, transforms,
and pharmacophores for the same file
format.
• OpenBabel and GGA's Indigo require you
to know the contents of a file/record (mol
vs. rxn) before reading it!?
11. and where this gets used...
• Code/GraphMol/FileParser/MolFileWriter.cpp:
• Function GetMolFileAtomLine contains
• Currently unused local variables
– rxnComponentType
– rxnComponentNumber
• This would allow support for CPSS-style
reactions [i.e. reactions in SD files]
12. inverted protein hierarchy
• Optimize data structures for common ops.
• Poor
for (modelIt mi=mol->getModels(); mi; ++mi)
for (chainIt ci=(*mi)->getChains(); ci; ++ci)
for (resIt ri=(*ci)->getResidues(); ri; ++ri)
for (atomIt ai=(*ri)->getAtoms(); ai; ++ai)
… /* Do something! */
• Better
for (atomIt ai=mol->getAtoms(); ai; ++ai)
… /* Do something! */
13. rdkit pdb file writer features
• By default
– Atoms are written as HETAM records
– Bonds are written as CONECT records
– Bond orders encoded by popular extension
– All atoms are uniquely named
– Molecule title written as COMPND record
– Formal charges are preserved.
– Default residue is “UNL” (not “LIG”).
15. unique atom names
• Counts unique per element type
• C1..C99,CA0..CA9,CB0..CZ9,CAA...CZZ
• This allows for up to 1036 indices.
16. beware openbabel's lig residue
• PDB residue LIG is reserved for 3-pyridin-
4-yl-2,4-dihydro-indeno[1,2-c]pyrazole.
• Use of residue UNL (for UNknown Ligand)
is much preferred.
17. PDB file readers
• Not all connectivity is explicit
• Two options for determining bonds
– Template methods
– Proximity methods
• Two options for determining bond orders
– Template methods
– 3D geometry methods
• See Sayle, “PDB Cruft to Content”, MUG 2001
18. template-based bond orders
• Need only specify the double bonds
• Most (11/20) amino acids only need C=O.
• Arg: C=O,CZ=NH2
• Asn/Asp: C=O,CG=OD1
• Gln/Glu: C=O,CD=OE1
• His: C=O,CG=CD2,ND1=CE1
• Phe/Tyr: C=O,CG=CD1,CD2=CE2,CE1=CZ
• Trp: C=O,CG=CD1,CD2=CE2,CE3=CZ3,CZ2=CH2
19. switching on short strings
// These are macros to allow their use in C++ constants
#define BCNAM(A,B,C) (((A)<<16) | ((B)<<8) | (C))
#define BCATM(A,B,C,D) (((A)<<24) | ((B)<<16) | ((C)<<8) | (D))
switch (rescode) {
case BCNAM('A','L','A'):
case BCNAM('C','Y','S'):
case BCNAM('G','L','Y'):
case BCNAM('I','L','E'):
case BCNAM('L','E','U'):
...
if (atm1==BCATM(' ','C',' ',' ') && atm2==BCATM(' ','O',' ',' '))
return true;
break;
20. proximity bonding
• The ideal distance between two bonded
atoms is the sum of their covalent radii.
• The ideal distance between two non-
bonded atoms is the sum of their van der
Waal's radii.
• Assume bonding between all atoms closer
than the sum of Rcov plus a delta (0.45Å).
• Impose minimum distance threshold (0.4Å)
21. compare the square
• A rate limiting step in many (poorly written)
proximity calculations is the calculation of
square roots.
• Poor
if(sqrt(dx*dx+dy*dy+dz*dz) < radius) ...
• Better
if(dx*dx + dy*dy + dz*dz < radius*radius) …
c.f. http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01683.html