This document discusses developing chemoinformatics models to predict molecular properties from structural descriptors. It summarizes benchmarking results for various algorithms predicting logP values on public and proprietary datasets. On public data, top methods had RMSE below 0.7 and over 60% of predictions within 0.5 log units, while on much larger proprietary datasets some methods failed or had higher errors, showing the challenge of predicting a wide chemical space. Applicability domains are important for ensuring reliable predictions across diverse data.
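The two figures of merit quoted above (RMSE and the share of predictions within 0.5 log units) can be computed as in the following minimal sketch. The function names and the sample logP values are hypothetical and chosen only for illustration; they are not taken from the benchmark.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def fraction_within(y_true, y_pred, tol=0.5):
    """Fraction of predictions whose absolute error is at most `tol` log units."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)

# Hypothetical measured and predicted logP values (illustration only).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.2, 2.9, 3.1, 3.8]

print(f"RMSE = {rmse(measured, predicted):.3f}")
print(f"within 0.5 log units = {fraction_within(measured, predicted):.0%}")
```

Reporting both numbers is useful because a single large outlier can inflate RMSE even when most predictions fall inside the tolerance band.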
Estimation of subpixel land surface temperature using an endmember index tech... (grssieee)
1. The study estimated subpixel land surface temperature (LST) at 90m resolution using an endmember index technique based on ASTER and MODIS products over a heterogeneous area.
2. Land cover was classified into four types - vegetation, bare soil, impervious surface and water. Area ratios and endmember indices were analyzed at 990m and 90m resolutions.
3. A genetic algorithm-self-organizing feature map-artificial neural network model was trained to estimate subpixel LST at 90m based on MODIS LST at 990m and ASTER reflectance data endmember indices. The model performance was evaluated.
The document discusses reproducible bioscience data. It describes Susanna-Assunta Sansone as a principal investigator and team leader at the University of Oxford e-Research Centre who gives a presentation on policies, communities, and standards around reproducible bioscience data. The presentation covers topics like preserving institutional memory, utilizing public data, and addressing reproducibility and reuse of public data through community standards and structured data annotation.
IFIX portfolio composition, November 2013 (claudiusinhos)
The document presents charts and data on the performance of the Real Estate Investment Fund Index (IFIX) compared with the IBOVESPA and IMOB. Over the previous 12 months, the IFIX fell 5.2%, while the IBOVESPA and IMOB had smaller declines. The IFIX portfolio is led by the BTG Pactual Corporate Office Fund, with 13.15% of the total weighting.
Regulations on the state final certification of 9th-grade graduates of MBOU SO... (himbaza)
1. The document discusses the organization and procedures for the state final certification of 9th grade students in Russia after they complete basic general education programs.
2. Key aspects covered include the subjects assessed, accommodations provided for students with disabilities, timelines, committee roles, and approval of the certification process and materials.
3. Results are used to determine whether students have achieved required competency levels and can advance to the next grade.
This document contains an evaluation of instructor Mansour Lotayif from the Winter 2010 season. He received an average grade of 1.62 from 19 students. The evaluation includes grades for Mr. Lotayif in 4 courses: Transnational Management (average grade 1.6), Intercultural Aspects (average grade 1.6), Seminar (average grade 1.59), and Management Bachelor Thesis (average grade 1.96). Student feedback was generally positive, with average grades between 1.4-2 for most evaluation criteria in the courses.
A university student who considered herself left-wing argues with her right-wing father. She criticizes right-wing philosophies as unjust. However, when her father suggests redistributing her good grades to a friend who does not study much, she refuses because she worked hard for her achievements. Her father hugs her and says "WELCOME TO THE RIGHT!!!", indicating that she now understands the values of individual effort and merit in right-wing philosophy.
This 25-year-old patient presents with abdominal pain, vomiting, and diarrhea lasting 2 weeks. Physical examination shows tachycardia, hypotension, and diffuse abdominal pain. Laboratory tests show leukocytosis, thrombocytopenia, and elevated liver enzymes and amylase. The patient subsequently develops a motor deficit in the lower limbs and respiratory difficulty. Studies rule out infectious and rheumatic causes. Electromyography shows polyneuropat...
This document presents contact information for Vicente Zambrano, including his Twitter and Facebook usernames and his email address.
The document contains a discussion around a conference or event called #SPSRIC. It includes several Twitter handles like @eavanesian and @usher that were tweeting about the event. There are also several links shared between the tweets related to resources from the event or organization. The document touches on topics like vision, business cases, change management, documentation, governance, planning, and execution as it relates to the #SPSRIC event.
The document describes the NCI/CADD Chemical Identifier Resolver, a web-based tool that converts between different chemical structure identifiers and representations. It indexes over 150 million structures from public databases and assigns unique identifiers to represent chemical structures and related forms in a standardized way. This allows disambiguation of structures and tracking of chemical space.
Disruptive technologies in social commerce, mobile and customer experience have transformed the retail industry. The mission for all brand marketers now is to stay ahead of the curve and spot trends that will boost sales and provide customers with the most efficient and pleasurable online shopping experience.
Text writing, Nicolas Arturo Vargas (nicolas1629)
This document describes the key elements of good writing, including planning, drafting, revision, and style. It explains the importance of having a clear idea, thematic coherence, and the ability to argue a point. It also details different types of texts, such as informative, narrative, explanatory, and argumentative, along with their characteristics and structures.
HXRefactored - Doesn't Your Mom Deserve Better (Sanjay Khurana)
Stop designing for the messenger-bag-toting hipster; try addressing the 100-million-strong demographic (the 50+) that annually generates over $7.1 trillion in economic activity in the US and $260 billion in consumer packaged goods sales. Out-of-pocket consumer health spending is forecast at $100B over the next 5 years, but poor design and aesthetics are limiting usability and consumer demand. How's that for a challenge!
This document describes a "silent crisis" in current education systems. It argues that education is focusing too much on short-term economic profitability instead of developing creativity and critical thinking. It also discusses the challenges posed by the digital divide and the impact of social networks. Europe is affected by a lack of leadership and of long-term solutions to the economic crisis.
The document discusses the skills needed for collaborative instructional leadership. It identifies skills such as teaching norms of collaboration, conflict resolution, valuing diverse opinions, knowing resources, facilitating learning-focused conversations, giving and receiving feedback, and leading data-driven dialogue. Specific behaviors are also outlined, including engaging in team building, creating trust, supporting the team, emphasizing staff development, providing peer learning opportunities, being a data coach, aligning collaboration structures, and mediating conflicts. Potential stakeholders for collaborative inquiry on improving Chinese language learning at Tianjin International School are identified as the school administration team, admission director, head of the Chinese department, Chinese teachers, and Chinese parent liaisons.
2015 Ultimate Hiring Toolbox For Small & Medium Businesses (Sage HR)
The document provides templates and checklists to help small businesses streamline their hiring process from information collection and writing job descriptions to conducting phone screens, on-site interviews, evaluations, and onboarding. It includes forms for intake meetings, job descriptions, phone screening questions, interview preparation checklists, behavioral interview questions, and candidate evaluation forms. The templates are meant to guide small businesses through each step of the recruiting and hiring process.
The document describes several iPad cart and charging solutions from Dukane, including an iPad case with kickstand, a height-adjustable iPad stand called Dewey, and four different iPad carts and cabinets. Cart 1 holds up to 30 iPads for simultaneous charging and has numbered slots. Cart 2 holds up to 32 iPads for charging and syncing and has LED indicators and extra outlets. Cart 3 holds up to 32 iPads and allows access from both sides. The iPad charging cabinet has shelves to hold 30 iPads and includes outlets and space for laptops.
9 2 business environment and business ideas (bananaapple2)
The document discusses several factors that make up a business's external environment and market. This includes the natural environment surrounding a business location, the demographics of customers and suppliers in the area, competitors, and local government regulations. Transportation networks and infrastructure can also affect business operations by facilitating product transportation and customer accessibility. The local economy and culture influence both entrepreneurs and consumers in terms of financial matters, costs of living, and types of acceptable products. Advances in technology can provide efficiency and productivity improvements.
The document discusses key themes and takeaways from the 2016 Cannes Lions International Festival of Creativity. The main topics covered include:
- The festival focused on the intersection of data, technology, and creativity, with winning work showcasing this balance.
- Gender diversity and portrayal of women in advertising were prominent issues, with campaigns like #WomenNotObjects addressing objectification.
- Brand purpose beyond products was emphasized, with purpose-driven campaigns like REI's #OptOutside performing strongly. Authenticity is important.
- Simplicity remained important amid the focus on new technologies and data, with warnings against overcomplicating campaigns.
This document discusses cheminformatics toolkits and summarizes a presentation about implicit and explicit representations of hydrogens, atom types, aromaticity models, valence models, and performance benchmarks. The author has experience with various cheminformatics software and advocates separating chemistry models from computer science implementations to allow flexibility. Benchmark results on an MDL test set show toolkits have varying interpretations of valence rules.
SCAN is an approach that assigns meaningful labels to chunks of segmented execution traces and relates these segments. It accepts segmented traces as input and labels segments using terms from invoked method signatures. Formal concept analysis is used to group related segments and identify execution phases. The approach was evaluated on traces from JHotDraw and ArgoUML, achieving reasonable precision and recall in labeling segments and discovering relations between them. Future work includes automating phase recognition and further validation.
Sparse feature analysis for detection of clustered microcalcifications in mam... (Wesley De Neve)
This document analyzes the use of sparse feature analysis for detecting clustered microcalcifications in mammogram images. It compares different feature types, combinations of features, and dictionary construction techniques for sparse representation based classification (SRC) of mammogram images. The experimental results show that texture features like Laws' texture features (LAW) are more effective than shape/morphology features. SRC using LAW features alone or combined with local binary patterns (LBP) achieved high performance. Larger dictionaries containing more atoms resulted in higher discriminative power for the SRC-based detection system.
Paper and pencil_cosmological_calculator (Sérgio Sacani)
The document describes a paper-and-pencil cosmological calculator designed for the ΛCDM cosmological model. The calculator contains nomograms (graphs) for quantities like redshift, distance, size, age, and more for different redshift intervals up to z=20. It is based on cosmological parameters from the Planck mission of H0=67.15 km/s/Mpc, ΩΛ=0.683, and Ωm=0.317. To use the calculator, the user finds a known value and reads off other quantities at the same horizontal level.
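The quantities that the nomograms relate can also be computed numerically from the quoted parameters. The sketch below is not the calculator's own method: it assumes a spatially flat universe (ΩΛ + Ωm = 1, as the quoted values imply) and neglects radiation, and the function names and unit-conversion constants are introduced here for illustration.

```python
import math

# Cosmological parameters quoted in the summary above.
H0 = 67.15          # Hubble constant, km/s/Mpc
OMEGA_L = 0.683     # dark-energy density parameter
OMEGA_M = 0.317     # matter density parameter

# Unit conversions (assumptions of this sketch, not from the source).
C_KM_S = 299792.458                   # speed of light, km/s
KM_PER_MPC = 3.0856775814913673e19    # kilometres per megaparsec
SEC_PER_GYR = 3.15576e16              # seconds per gigayear (Julian years)

def e_of_z(z):
    """Dimensionless Hubble rate E(z) = H(z)/H0 for flat LCDM without radiation."""
    return math.sqrt(OMEGA_M * (1 + z) ** 3 + OMEGA_L)

def comoving_distance_mpc(z, n=10000):
    """Line-of-sight comoving distance in Mpc, by midpoint-rule integration of 1/E."""
    dz = z / n
    total = sum(1.0 / e_of_z((i + 0.5) * dz) for i in range(n))
    return (C_KM_S / H0) * total * dz

def age_gyr(z=0.0):
    """Age of the universe at redshift z in Gyr (closed form for flat LCDM, no radiation)."""
    hubble_time_gyr = KM_PER_MPC / H0 / SEC_PER_GYR
    x = math.sqrt(OMEGA_L / OMEGA_M) * (1 + z) ** -1.5
    return hubble_time_gyr * 2.0 / (3.0 * math.sqrt(OMEGA_L)) * math.asinh(x)

for z in (0.5, 1.0, 5.0, 20.0):
    print(f"z={z:5.1f}  D_C={comoving_distance_mpc(z):8.1f} Mpc  age={age_gyr(z):6.3f} Gyr")
```

Neglecting radiation is a reasonable approximation over the redshift range covered (up to z=20), but it would not hold at much earlier epochs.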
Sustainable research progress in many scientific disciplines critically depends on robust specialized databases that integrate and structure all available experimental information in their fields. Over the years, a multitude of chemical formats and approaches have been created to address various aspects of handling chemical information and building databases of chemical knowledge. In addition, inconsistencies in data formatting by individual labs force the technical staff who maintain a centralized data collection resource to invest significant resources in data curation and interpretation. Acquiring data from public sources is inefficient, time-consuming, and limited in scope. The NIH has recently posted its intention to financially support data deposition by investigators through the 'data sharing plan' for each funded proposal. However, this plan also points to a current weakness of centralized data sharing and acquisition: laboratories all use different data collection and formatting approaches. It would be far more efficient and useful to have a standardized data collection and deposition template with standard key terms that each investigator could modify to add new or important data or parameters. These new features could ultimately be adopted into the classification scheme and guide the scope of the expanding database. This approach would be a win-win: it would give structure to the investigator's laboratory, bring consistency to data reporting, and provide a means of transmitting data to the database in parallel with publication, eliminating the acquisition step from the process. In this talk we will outline our experience building the Open Data Science Platform, a federated database system for direct acquisition, curation, and management of research data with integrated machine learning capabilities.
Molecular design: How to and how not to? (Peter Kenny)
This document discusses various topics related to molecular design and quantitative structure-activity relationships (QSAR). It notes some challenges in drug discovery like targeting poorly linked disease targets. It also discusses hypothesis-driven versus prediction-driven molecular design and challenges in predicting toxicity. Various methods for analyzing correlations in structure-activity data are described, including issues like data binning inflating correlations. The document advocates analyzing continuous data as continuous and considering relationships between molecular structures rather than just descriptors. It also discusses limitations of commonly used partitioning systems like octanol/water and highlights alternative approaches.
This document provides guidance on how to estimate the size, effort, cost, and schedule of a software project using a COCOMO II-based estimation toolkit. The key steps include:
1) Estimating the project size in function points and source lines of code based on features and components.
2) Calculating effort based on the size, selected scaling factors, and effort multipliers.
3) Distributing the effort across project phases and calculating costs using selected rate tiers.
4) Generating a project plan and schedule based on resources and potential delays to determine if the estimated duration meets client needs.
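Steps 2 and 3 above can be sketched with the published COCOMO II.2000 post-architecture equations. This is an illustration under stated assumptions, not the toolkit's own implementation: the function names, the example scale-factor and multiplier values, and the monthly rate are hypothetical, and the function-point-to-SLOC conversion in step 1 is not shown.

```python
import math

# COCOMO II.2000 calibration constants.
A, B = 2.94, 0.91   # effort coefficient and base exponent
C, D = 3.67, 0.28   # schedule coefficient and base exponent

def exponent(scale_factors):
    """Size exponent E = B + 0.01 * (sum of the scale factor ratings)."""
    return B + 0.01 * sum(scale_factors)

def effort_pm(ksloc, scale_factors, effort_multipliers):
    """Effort in person-months: PM = A * Size^E * product(EM), Size in KSLOC."""
    return A * ksloc ** exponent(scale_factors) * math.prod(effort_multipliers)

def schedule_months(pm, scale_factors):
    """Development time: TDEV = C * PM^F with F = D + 0.2 * (E - B)."""
    f = D + 0.2 * (exponent(scale_factors) - B)
    return C * pm ** f

def cost(pm, rate_per_person_month):
    """Cost as effort times a single rate (a real plan may use several rate tiers)."""
    return pm * rate_per_person_month

# Hypothetical inputs, for illustration only.
scale_factors = [3.72, 3.04, 4.24, 3.29, 4.68]   # five scale factor ratings
effort_multipliers = [1.0, 1.10, 0.90]            # selected cost-driver ratings
pm = effort_pm(25.0, scale_factors, effort_multipliers)
print(f"effort   = {pm:.1f} person-months")
print(f"schedule = {schedule_months(pm, scale_factors):.1f} months")
print(f"cost     = {cost(pm, 12000):,.0f}")
```

Because the exponent exceeds 1 whenever the scale factors sum to more than 9, effort grows faster than size, which is the diseconomy of scale the model is designed to capture.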
The presentation I gave in the plenary session at the ICCS (Presentation A3). It has been slightly modified for publishing. Please contact me if you have any questions!
Benchmark Calculations of Atomic Data for Modelling Applications (AstroAtom)
This document summarizes benchmark calculations of atomic data for modeling applications. It discusses numerical methods like close-coupling and distorted-wave approaches for calculating atomic collision data. It provides selected results on energy levels, oscillator strengths, and electron-impact excitation cross sections. It also discusses applications to modeling neon discharges and takes a closer look at ionization calculations and examples. The document concludes by discussing the production and assessment of atomic data and outlines challenges in obtaining reliable data from both experiments and calculations.
The TRASGO project aims to develop an innovative cosmic ray detector based on timing RPCs. The detector, called TRASGO, will be able to measure particle timing, tracking, and identification. It will consist of timing RPC planes with 100ps time resolution, a fast tracking algorithm called TimTrack, and a particle identification method called MIDAS. An array of 10-50 TRASGO detectors called MEIGA will be installed to study cosmic rays around the knee and test simulation packages. The MEIGA collaboration has been formed between universities in Spain and Portugal to develop the detectors and carry out the cosmic ray measurements.
Faster, More Effective Flowgraph-based Malware Classification (Silvio Cesare)
Silvio Cesare is a PhD candidate at Deakin University researching malware detection and automated vulnerability discovery. His current work extends his Masters research on fast automated unpacking and classification of malware. He presented this work last year at Ruxcon 2010. His system uses control flow graphs and q-grams of decompiled code as "birthmarks" to detect unknown malware samples that are suspiciously similar to known malware, reducing the need for signatures. He evaluated the system on 10,000 malware samples with only 10 false positives. The system provides improved effectiveness and efficiency over his previous work in 2010.
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel... (IJERD Editor)
This document presents a Grey Theory approach combined with artificial neural networks (ANN) for assessing the state of power transformers using dissolved gas analysis (DGA). Grey Theory is applied to analyze DGA samples based on partial information to standardize interpretation. Key gases from DGA samples are used as input for the Grey model. The Grey model calculates a "target heart degree" to determine transformer state. An ANN model is developed and validated against the Grey model outputs. The ANN shows some success in validating the benchmarks of the proposed Grey model for assessing transformer condition from DGA results.
The document describes Emerald Feng's work on physiologically-based pharmacokinetic (PBPK) modeling. It discusses how PBPK models are used to quantify absorption, distribution, metabolism, and excretion of chemicals for health risk assessments. It notes that PBPK models vary in complexity from 2-compartment to several compartments. Key parameters like organ flows and compartment partitioning are influenced by intrinsic and extrinsic factors. Molecular descriptors derived from chemical properties are important for PBPK modeling and predicting absorption, distribution, metabolism, and excretion. The document also discusses related topics like quantitative structure-activity relationship modeling and converting PBPK models to a mobile application format.
Flexscore: Ensemble-based evaluation for protein Structure models (Purdue University)
Presentation at ISMB 2016 for the paper on Flexscore, a score for evaluating computational protein models that considers flexibility derived from NMR or molecular dynamics simulation. Paper published in Bioinformatics: http://www.ncbi.nlm.nih.gov/pubmed/27307633
by Kihara Lab http://kiharalab.org
The document discusses a tool called ROSE that mines version histories to suggest related changes when a programmer modifies code. ROSE analyzes past transactions from version control systems to determine associations between code changes and uses these to recommend additional locations that may need to be updated. An evaluation shows that ROSE is able to predict the correct changed entities 15% of the time on average across various projects and its top 3 suggestions are correct 64% of the time.
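The idea of mining co-change associations from past transactions can be illustrated with a simplified counting scheme. The sketch below is not ROSE's actual algorithm: it ranks entities by how often they changed together with a given entity, and the function name, the `min_support` parameter, and the sample history are hypothetical.

```python
from collections import Counter

def suggest(transactions, changed, min_support=1):
    """Rank entities that historically changed together with `changed`.

    Returns (entity, support, confidence) tuples, where support is the number
    of transactions containing both entities and confidence is support divided
    by the number of transactions containing `changed`.
    """
    co_changes = Counter()
    n_changed = 0
    for transaction in transactions:
        if changed in transaction:
            n_changed += 1
            co_changes.update(e for e in transaction if e != changed)
    if n_changed == 0:
        return []
    ranked = [(e, c, c / n_changed) for e, c in co_changes.items() if c >= min_support]
    # Highest confidence first; ties broken by support, then by name.
    ranked.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return ranked

# Hypothetical version history: each set is one commit's changed entities.
history = [
    {"a.c", "a.h"},
    {"a.c", "a.h", "b.c"},
    {"a.c", "doc.txt"},
    {"b.c"},
]

for entity, support, confidence in suggest(history, "a.c"):
    print(f"{entity}: support={support}, confidence={confidence:.2f}")
```

Raising `min_support` trades recall for precision, since rarely observed co-changes are dropped from the suggestions.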
The document discusses concepts related to the limit of detection (LOD) in chemical analysis. It defines LOD as the lowest concentration of an analyte that can be reliably detected by an analytical method. The document outlines different definitions of LOD and distinguishes it from method sensitivity. It discusses statistical approaches to estimating LOD using parameters like standard deviation of blank measurements. Factors that can affect LOD determination like number of replicates, matrix effects, and instrument performance are also covered. The relationship between LOD and limit of quantification is explained.
An Efficient Reactive Model for Resource Discovery in DHT-Based Peer-to-Peer ...James Salter
This document summarizes a thesis titled "An Efficient Reactive Model for Resource Discovery in DHT-Based Peer-to-Peer Networks". It introduces ROME, a new architecture that runs on top of Chord to dynamically control the size of the Chord ring based on workload. The ROME node process monitors workload and attempts actions like replacing overloaded nodes, adding nodes, or removing underloaded nodes to maintain an optimally sized ring. Simulations show ROME can reduce lookup costs compared to a standard Chord ring under changing network conditions like node joins and failures.
Integrating R with the CDK: Enhanced Chemical Data MiningRajarshi Guha
The document discusses integrating the statistical programming language R with chemical toolkits CDK and PubChem. It describes how R can be used to perform cheminformatics tasks like loading molecules, generating fingerprints, calculating descriptors, and building QSAR models. The integration allows accessing cheminformatics functionality from R and performing statistical analysis and modeling of chemical data.
1. Developing chemoinformatics models: One can’t embrace the unembraceable
Igor V. Tetko
Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH)
Institute of Bioinformatics & Systems Biology
Kyiv, 10 August 2009, Summer School
2. Representation of Molecules
• Molecules can be defined with calculated properties (logP, quantum-chemical parameters, etc.).
• Molecules can be defined with a set of structural descriptors (topological 2D, 3D, etc.).
• The goal is to correlate descriptors with some properties.
• One of these sets of descriptors could be used to determine the applicability domain of a model.
[Figure: two molecules, each represented by a descriptor vector, (12.3, 4.6, ..., 13.2, 10.1) and (13.7, 4.8, ..., 15.8, 12.0); caption "Distance to model"]
3. “One cannot embrace the unembraceable.” (Kozma Prutkov)
Possible: 10^60 - 10^100 molecules theoretically exist (> 10^80 atoms in the Universe)
Achievable: 10^20 - 10^24 can be synthesized now by companies (weight of the Moon is ca. 10^23 kg)
Available: 2*10^7 molecules are on the market
Measured: 10^2 - 10^4 molecules with data
Problem: To predict these properties for just the molecules on the market, we must extrapolate data from one molecule to 1,000 - 100,000 molecules!
There is a need for methods which can estimate the accuracy of predictions!
4. Models can fail due to chemical diversity of training & test sets (i.e. outside of applicability domain)
[Figure legend: training set data used to develop a model; our model given the training set; new data to be estimated; correct model]
5. Declining R&D productivity in the
pharmaceutical industry
Approved medicine
http://www.frost.com/prod/servlet/market-insight-top.pag?docid=128394740
6. Benchmarking data: logP prediction
Public dataset: N=266 molecules [1]
• N=223 Star set (supported with experimental values from CLOGP v5.0 program)
• N=43 Non-Star set (no experimental logP values in CLOGP v5.0)
Industrial datasets:
• N=95809 (Pfizer Inc.) [2]
• N=882 (Nycomed GmbH) [3]
[1] Provided by A. Avdeef, “Absorption and drug development. Solubility, permeability and charge state”, ed. Hoboken, NJ: Wiley–Interscience, 2003.
[2] logP and logD (pH 7.4) measurements
[3] logP measurements only
Mannhold, R. et al, J. Pharm. Sci., 2009, 98(3), 861-893.
7. Performance of algorithms for the public dataset
AAM = average logP used as predicted value for all molecules (R2=0)
Bootstrap test:
• rank I - similar to “best model”
• rank II - better than AAM
• rank III - similar to AAM
Columns <0.5, 0.5-1 and >1 give the % of molecules within each error range.

                    Star set (N = 223)                 Non-Star set (N = 43)
Method              RMSE  rank  <0.5  0.5-1  >1        RMSE  rank  <0.5  0.5-1  >1
AB/LogP             0.41  I     84    12     4         1.00  I     42    23     35
S+logP              0.45  I     76    22     3         0.87  I     40    35     26
ACD/logP            0.50  I     75    17     7         1.00  I     44    33     23
Consensus log P     0.50  I     74    18     8         0.80  I     47    28     26
CLOGP               0.52  II    74    20     6         0.91  I     47    28     26
VLOGP OPS           0.52  II    64    21     7         1.07  I     33    28     26
ALOGPS              0.53  II    71    23     6         0.82  I     42    30     28
MiLogP              0.57  II    69    22     9         0.86  I     49    30     21
XLOGP               0.62  II    60    30     10        0.89  I     47    23     30
KowWIN              0.64  II    68    21     11        1.05  I     40    30     30
CSlogP              0.65  II    66    22     12        0.93  I     58    19     23
ALOGP (Dragon)      0.69  II    60    25     16        0.92  I     28    40     33
MolLogP             0.69  II    61    25     14        0.93  I     40    35     26
ALOGP98             0.70  II    61    26     13        1.00  I     30    37     33
OsirisP             0.71  II    59    26     16        0.94  I     42    26     33
VLOGP               0.72  II    65    22     14        1.13  I     40    28     33
TLOGP               0.74  II    67    16     13        1.12  I     30    37     30
ABSOLV              0.75  II    53    30     17        1.02  I     49    28     23
QikProp             0.77  II    53    30     17        1.24  II    40    26     35
QuantlogP           0.80  II    47    30     22        1.17  II    35    26     40
SLIPPER-2002        0.80  II    62    22     15        1.16  II    35    23     42
COSMOFrag           0.84  II    48    26     19        1.23  II    26    40     33
XLOGP2              0.87  II    57    22     20        1.16  II    35    23     42
QLOGP               0.96  II    48    26     25        1.42  II    21    26     53
VEGA                1.04  II    47    27     26        1.24  II    28    30     42
CLIP                1.05  II    41    25     30        1.54  III   33    9      49
LSER                1.07  II    44    26     30        1.26  II    35    16     49
MLOGP (Sim+)        1.26  II    38    30     33        1.56  III   26    28     47
NC+NHET             1.35  III   29    26     45        1.71  III   19    16     65
SPARC               1.36  III   45    22     32        1.70  III   28    21     49
MLOGP(Dragon)       1.52  III   39    26     35        2.45  III   23    30     47
LSER UFZ            1.60  III   36    23     41        2.79  III   19    12     67
AAM                 1.62  III   22    24     53        2.10  III   19    28     53
VLOGP-NOPS          1.76  III   1     1      7         1.39  III   7     0      7
HINT                1.80  III   34    22     44        2.72  III   30    5      65
GBLOGP              1.98  III   32    26     42        1.75  III   19    16     65

Data provided by A. Avdeef, Absorption and drug development. Solubility, permeability and charge state, ed. Hoboken, NJ: Wiley–Interscience, 2003.
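The quantities in this table (RMSE, % of molecules per error range, the AAM baseline) and a bootstrap comparison of two methods can be sketched as follows. This is only an illustration: the slide does not give the exact bootstrap protocol used for ranks I-III, so `bootstrap_better_fraction`, its resampling scheme, and all function names are assumptions.

```python
import math
import random

def rmse(pred, obs):
    """Root mean squared error between predicted and observed logP values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def error_bins(pred, obs):
    """Percent of molecules with absolute error <0.5, 0.5-1 and >1 log units."""
    n = len(obs)
    errs = [abs(p - o) for p, o in zip(pred, obs)]
    low = sum(e < 0.5 for e in errs)
    high = sum(e > 1.0 for e in errs)
    mid = n - low - high
    return tuple(100.0 * c / n for c in (low, mid, high))

def aam_predictions(obs):
    """AAM baseline: the average logP is used as the prediction for every molecule."""
    mean = sum(obs) / len(obs)
    return [mean] * len(obs)

def bootstrap_better_fraction(pred_a, pred_b, obs, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples in which method A has a lower RMSE than B.

    Assumption: molecules are resampled with replacement; the original
    ranking procedure may differ in detail.
    """
    rng = random.Random(seed)
    n = len(obs)
    wins = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_obs = [obs[i] for i in idx]
        ra = rmse([pred_a[i] for i in idx], sample_obs)
        rb = rmse([pred_b[i] for i in idx], sample_obs)
        wins += ra < rb
    return wins / n_boot
```

A method whose fraction against AAM stays close to 0.5 would be hard to distinguish from the baseline, which is the intuition behind rank III.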
8. Performance of algorithms for in-house datasets
Benchmarking of logP methods for in-house data of Pfizer & Nycomed
• Large number of methods could not perform better than AAM model
• Best results are calculated using Consensus logP model
• log P = 1.46 + 0.11 (NC - NHET), N=95 809, RMSE=1.04, R2=0.2
• Different MlogP implementations demonstrate very different performances for both sets
N.B! Do we really compare methods or their implementations?
Columns <0.5, 0.5-1 and >1 give the % of molecules in each error range.

                    Pfizer set (N = 95 809)                               Nycomed set (N = 882)
Method              RMSE  Failed[1]  rank  <0.5  0.5-1  >1   RMSE[2]     RMSE  rank  <0.5  0.5-1  >1
Consensus log P     0.95  -          I     48    29     24   0.94        0.58  I     61    32     7
ALOGPS              1.02  -          I     41    30     29   1.01        0.68  I     51    34     15
S+logP              1.02  -          I     44    29     27   1.00        0.69  I     58    27     15
NC+NHET             1.04  -          II    38    30     32   1.04        0.88  III   42    32     26
MLOGP(S+)           1.05  -          II    40    29     31   1.05        1.17  III   32    26     41
XLOGP3              1.07  -          II    43    28     29   1.06        0.65  I     55    34     12
MiLogP              1.10  27         II    41    28     30   1.09        0.67  I     60    26     14
AB/LogP             1.12  24         II    39    29     33   1.11        0.88  III   45    28     27
ALOGP               1.12  -          II    39    29     32   1.12        0.72  II    52    33     15
ALOGP98             1.12  -          II    40    28     32   1.10        0.73  II    52    31     17
OsirisP             1.13  6          II    39    28     33   1.12        0.85  II    43    33     24
AAM                 1.16  -          III   33    29     38   1.16        0.94  III   42    31     27
CLOGP               1.23  -          III   37    28     35   1.21        1.01  III   46    28     22
ACD/logP            1.28  -          III   35    27     38   1.28        0.87  III   46    34     21
CSlogP              1.29  20         III   37    27     36   1.28        1.06  III   38    29     33
COSMOFrag           1.30  1088[3]    III   32    27     40   1.30        1.06  III   29    31     40
QikProp             1.32  103        III   31    26     43   1.32        1.17  III   27    24     49
KowWIN              1.32  16         III   33    26     41   1.31        1.20  III   29    27     44
QLogP               1.33  24         III   34    27     39   1.32        0.80  II    50    33     17
XLOGP2              1.80  -          III   15    17     68   1.80        0.94  III   39    31     29
MLOGP(Dragon)       2.03  -          III   34    24     42   2.03        0.90  III   45    30     25

[1] Number of molecules with calculation failures due to errors or crashes of programs. All methods predicted all molecules for the Nycomed dataset.
[2] RMSE calculated after excluding 769 zwitterionic compounds from the Pfizer dataset.
[3] Most molecules failed by COSMOFrag are zwitterions.
Mannhold, R. et al, J. Pharm. Sci., 2009, 98(3), 861-893.
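The NC+NHET baseline quoted on this slide, log P = 1.46 + 0.11 (NC - NHET), is simple enough to write out directly. The sketch below assumes that NC is the number of carbon atoms and NHET the number of heteroatoms, and it treats every element other than C and H as a heteroatom; that counting rule and the function names are assumptions, not taken from the slide.

```python
def count_nc_nhet(element_counts):
    """Return (NC, NHET) from a dict of element counts, e.g. {"C": 6, "H": 6, "O": 1}.

    Assumption: every element other than C and H counts as a heteroatom.
    """
    nc = element_counts.get("C", 0)
    nhet = sum(n for element, n in element_counts.items() if element not in ("C", "H"))
    return nc, nhet

def nc_nhet_logp(n_carbon, n_hetero):
    """Two-parameter baseline from the slide: log P = 1.46 + 0.11 (NC - NHET)."""
    return 1.46 + 0.11 * (n_carbon - n_hetero)

# Hypothetical input: a molecule with 6 carbons and 1 heteroatom.
nc, nhet = count_nc_nhet({"C": 6, "H": 6, "O": 1})
estimate = nc_nhet_logp(nc, nhet)
```

The point of the slide is that such a formula, which ignores structure entirely, reaches RMSE=1.04 on the Pfizer set and therefore outperforms many dedicated programs there.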
9. Optimization of a drug: we need to find a road in
multidimensional space
15. Nearest neighbors and activity
[Figure: y = exp(-(x1+x2)^2) plotted against x = x1+x2 (top), and the same data points in the (x1, x2) descriptor plane (bottom)]
The nearest neighbors in descriptor space are not the neighbors in property!
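The effect can be reproduced with the function from the slide, y = exp(-(x1+x2)^2). The query and candidate points below are made up for illustration; the sketch only shows that the candidate closest in the (x1, x2) plane need not be the one closest in y.

```python
import math

def activity(x1, x2):
    """Property from the slide: y = exp(-(x1 + x2)^2)."""
    return math.exp(-(x1 + x2) ** 2)

def descriptor_distance(a, b):
    """Euclidean distance in the (x1, x2) descriptor plane."""
    return math.dist(a, b)

def property_distance(a, b):
    """Absolute difference of the property values of two points."""
    return abs(activity(*a) - activity(*b))

def nearest(query, candidates, distance):
    """Candidate with the smallest distance to the query."""
    return min(candidates, key=lambda c: distance(query, c))

# Hypothetical points: the query has x1 + x2 = 0, hence y = 1.
query = (0.0, 0.0)
candidates = [(0.5, 0.5), (3.0, -3.0)]

closest_in_descriptors = nearest(query, candidates, descriptor_distance)
closest_in_property = nearest(query, candidates, property_distance)
```

Here `(0.5, 0.5)` is the nearest point in descriptor space, while `(3.0, -3.0)` lies far away but has exactly the same property value as the query.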
16. Ensemble methods
Hansen, L.K.; Salamon, P. IEEE Trans. Pattern Anal. Mach. Intell., 1990, 12, 993.
Tetko, I. V.; Luik, A. I.; Poda, G. I. J. Med. Chem., 1993, 36, 811.
Tetko, I.V.; Livingstone, D. J.; Luik, A. I. Neural Network Studies. 1. Comparison of
Overfitting and Overtraining. J. Chem. Inf. Comput. Sci. 1995, 35(5), 826.
17. Which of them would have been elected in 1904?
2000: G.W. Bush (Republican) vs. A. Gore (Democrat)
19. Presidential elections in the USA by state
The distribution of votes in states creates a typical fingerprint,
which is invisible when we use just average values
Democrats of 2000 == Republicans of 1904
20. Ensemble Based Distance
[Figure: two molecules, each shown with its descriptor vector and its model vector (net 1, net 2, ..., net 63, net 64)]
• Morphinan-3-ol, 17-methyl-: logP=3.11, descriptor vector (12.3, 4.6, ..., 13.2, 10.1)
• Levallorphan: logP=3.48, descriptor vector (13.7, 4.8, ..., 15.8, 12.0)
CORREL -- correlation between model vectors for any two molecules
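A correlation between two model vectors can be sketched as below. The sketch assumes that a model vector holds the predictions of the individual ensemble networks for one molecule and that the correlation is a squared Pearson coefficient, in line with the R2 notation used later in the deck; the function names are hypothetical.

```python
import math

def pearson_r(u, v):
    """Pearson correlation of two equally long vectors (undefined for constant vectors)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

def correl(model_vec_a, model_vec_b):
    """Similarity of two molecules in model space: squared correlation (assumption)."""
    return pearson_r(model_vec_a, model_vec_b) ** 2

def correl_to_training(model_vec, training_vecs):
    """Largest similarity of a molecule to any training-set molecule."""
    return max(correl(model_vec, t) for t in training_vecs)
```

Two molecules on which all networks of the ensemble deviate in the same pattern get a similarity close to 1, regardless of how far apart they are in the original descriptor space.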
21. Nearest neighbors and activity
[Figure: panels A and B show y plotted against x = x1+x2; panel C shows the (x1, x2) descriptor plane]
The CORREL measure correctly detects the nearest neighbors!
22. Local models: Instant learning of logP for Pt(II) molecules
[Figure: structures of Pt(II) complexes]
Prediction of new classes of compounds can be extremely difficult, as exemplified by the absence of correlation between predicted and experimental values for our ALOGPS program.
Tetko et al, J. Inorg. Biochem, 2008, 102, 1424-37.
23. Local models: Instant learning by knowledge transfer
The use of LIBRARY mode (local correction of the global model) dramatically (5 times!) decreased logP errors.
Tetko et al, J. Inorg. Biochem, 2008, 102, 1424-37.
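One way to picture a local correction of a global model is a nearest-neighbor correction of residuals: the global prediction for a new molecule is shifted by the average error the global model makes on the most similar molecules with known values. The sketch below is an illustration under that assumption and is not the ALOGPS implementation; the similarity measure, the value of `k`, and all names are hypothetical.

```python
import math

def squared_pearson(u, v):
    """Squared Pearson correlation of two model vectors (assumed similarity measure)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov * cov / (var_u * var_v)

def library_corrected(query_vec, query_pred, library, k=2):
    """Shift the global prediction by the mean residual of the k most similar molecules.

    library: list of (model_vector, global_prediction, experimental_value).
    """
    ranked = sorted(library, key=lambda entry: squared_pearson(query_vec, entry[0]),
                    reverse=True)
    neighbors = ranked[:k]
    bias = sum(exp - pred for _, pred, exp in neighbors) / len(neighbors)
    return query_pred + bias

# Hypothetical library: the two entries most similar to the query are
# underpredicted by 1.0, the dissimilar entry is overpredicted by 5.0.
library = [
    ([2.0, 3.0, 4.0, 5.0], 2.0, 3.0),
    ([1.0, 2.0, 3.0, 4.5], 1.5, 2.5),
    ([1.0, 3.0, 2.0, 1.0], 3.0, -2.0),
]
corrected = library_corrected([1.0, 2.0, 3.0, 4.0], 2.0, library, k=2)
```

No retraining of the global model is involved in such a scheme, which is consistent with the "instant learning" wording of the slide.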
24. Performance of algorithms for in-house datasets
Columns <0.5, 0.5-1 and >1 give the % of molecules in each error range.

                    Pfizer set (N = 95 809)                               Nycomed set (N = 882)
Method              RMSE  Failed[1]  rank  <0.5  0.5-1  >1   RMSE[2]     RMSE  rank  <0.5  0.5-1  >1
Consensus log P     0.95  -          I     48    29     24   0.94        0.58  I     61    32     7
ALOGPS              1.02  -          I     41    30     29   1.01        0.68  I     51    34     15
S+logP              1.02  -          I     44    29     27   1.00        0.69  I     58    27     15
NC+NHET             1.04  -          II    38    30     32   1.04        0.88  III   42    32     26
MLOGP(S+)           1.05  -          II    40    29     31   1.05        1.17  III   32    26     41
XLOGP3              1.07  -          II    43    28     29   1.06        0.65  I     55    34     12
MiLogP              1.10  27         II    41    28     30   1.09        0.67  I     60    26     14
AB/LogP             1.12  24         II    39    29     33   1.11        0.88  III   45    28     27
ALOGP               1.12  -          II    39    29     32   1.12        0.72  II    52    33     15
ALOGP98             1.12  -          II    40    28     32   1.10        0.73  II    52    31     17
OsirisP             1.13  6          II    39    28     33   1.12        0.85  II    43    33     24
AAM                 1.16  -          III   33    29     38   1.16        0.94  III   42    31     27
CLOGP               1.23  -          III   37    28     35   1.21        1.01  III   46    28     22
ACD/logP            1.28  -          III   35    27     38   1.28        0.87  III   46    34     21
CSlogP              1.29  20         III   37    27     36   1.28        1.06  III   38    29     33
COSMOFrag           1.30  1088[3]    III   32    27     40   1.30        1.06  III   29    31     40
QikProp             1.32  103        III   31    26     43   1.32        1.17  III   27    24     49
KowWIN              1.32  16         III   33    26     41   1.31        1.20  III   29    27     44
QLogP               1.33  24         III   34    27     39   1.32        0.80  II    50    33     17
XLOGP2              1.80  -          III   15    17     68   1.80        0.94  III   39    31     29
MLOGP(Dragon)       2.03  -          III   34    24     42   2.03        0.90  III   45    30     25

[1] Number of molecules with calculation failures due to errors or crashes of programs. All methods predicted all molecules for the Nycomed dataset.
[2] RMSE calculated after excluding 769 zwitterionic compounds from the Pfizer dataset.
[3] Most molecules failed by COSMOFrag are zwitterions.
Mannhold, R. et al, J. Pharm. Sci., 2009, 98(3), 861-893.
25. Local models: Instant learning of in-house data
(Pfizer Inc.), N=95809
ALOGPS Blind prediction: RMSE=1.02
ALOGPS LIBRARY: RMSE=0.59
in less than 30 minutes of calculations on a notebook!
Tetko, Poda, Ostermann, Mannhold, QSAR Comb. Sci, 2009, in press.
26. REACH and QSAR (Quantitative Structure
Activity Relationship) models
> 140,000 chemicals to be registered … is a lot!
It is expensive to measure all of them ($200,000 per compound) and requires a lot of animal testing.
QSAR models can be used to prioritize compounds:
• Compound is predicted to be toxic
  • Biological testing will be done to prove/disprove the models
• Compound is predicted to be not toxic
  • Tests can be avoided, saving money and animals
  • but ... only if we are confident in the predictions
• FP7 project CADASTER http://www.cadaster.eu aims to develop a strategy for the use of in silico methods in REACH
27. Estimation of toxicity against T. pyriformis
Initial dataset [1,2]: n=983 molecules
• n=644 training set
• n=339 test set 1
Test set 2: n=110 molecules [1,2]
The overall goal is to predict toxicity against T. pyriformis for chemicals directly from their structure and to assess the reliability of these predictions.
[1] Zhu et al, J. Chem. Inf. Model., 2008, 48(4), 766-784.
[2] Schultz et al, QSAR Comb Sci, 2007, 26(2), 238-254.
28. Overview of analyzed distances to models (DMs)
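EUCLID: $EU_m = \frac{1}{k}\sum_{j=1}^{k} d_j$, $\mathrm{EUCLID} = \overline{EU_m}$, where $k$ is the number of nearest neighbors and $m$ is the index of the model.
TANIMOTO: $\mathrm{Tanimoto}(a,b) = \frac{\sum_i x_{a,i} x_{b,i}}{\sum_i x_{a,i} x_{a,i} + \sum_i x_{b,i} x_{b,i} - \sum_i x_{a,i} x_{b,i}}$, where $x_{a,i}$ and $x_{b,i}$ are fragment counts.
LEVERAGE: $\mathrm{LEVERAGE} = x^T (X^T X)^{-1} x$
PLSEU (DModX): Error in approximation (restoration) of the vector of input variables from the latent variables and PLS weights.
STD: $\mathrm{STD} = \sqrt{\frac{1}{N-1}\sum_i (y_i - \bar{y})^2}$, where $y_i$ is the value calculated with model $i$ and $\bar{y}$ is the average value.
CORREL: $\mathrm{CORREL}(a) = \max_j \mathrm{CORREL}(a,j)$, $\mathrm{CORREL}(a,j) = R^2(Y^a_{calc}, Y^j_{calc})$, where $Y^a = (y_1, \ldots, y_N)$ is the vector of predictions for molecule $a$.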
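Three of the distances to models above translate almost directly into code. The sketch below covers EU_m, Tanimoto and STD only; LEVERAGE and PLSEU are left out because they need matrix inversion and a fitted PLS model. It assumes that d_j is the Euclidean distance to the j-th nearest training molecule, and the function names are hypothetical.

```python
import math

def eu_m(query, training, k):
    """EU_m: mean Euclidean distance from the query to its k nearest training molecules."""
    distances = sorted(math.dist(query, t) for t in training)
    return sum(distances[:k]) / k

def tanimoto(xa, xb):
    """Tanimoto similarity of two fragment-count vectors."""
    ab = sum(a * b for a, b in zip(xa, xb))
    aa = sum(a * a for a in xa)
    bb = sum(b * b for b in xb)
    return ab / (aa + bb - ab)

def ensemble_std(predictions):
    """STD: sample standard deviation of the predictions of N ensemble models."""
    n = len(predictions)
    mean = sum(predictions) / n
    return math.sqrt(sum((y - mean) ** 2 for y in predictions) / (n - 1))
```

Note the different orientation of the measures: EU_m and STD grow as a molecule moves away from the model, whereas Tanimoto is a similarity and shrinks.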
29. Mixture of Gaussian Distributions (MGD)
The idea is to find an MGD that maximizes the likelihood (probability) $\prod_i N(0, \sigma^2(e_i))$ of the observed distribution of errors.
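The likelihood criterion can be illustrated with zero-mean Gaussians whose width depends on the group an error falls into. The sketch below assumes that the errors are already sorted by a distance to model and that they are split into consecutive groups of equal size, each with its own maximum-likelihood sigma; the grouping scheme and the names are assumptions, and the published procedure may differ.

```python
import math

def gaussian_loglik(errors, sigma):
    """Log-likelihood of the errors under a zero-mean Gaussian N(0, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - e * e / (2 * sigma ** 2)
               for e in errors)

def ml_sigma(errors):
    """Maximum-likelihood sigma of a zero-mean Gaussian (must be non-zero)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mgd_loglik(errors_sorted_by_dm, n_groups):
    """Log-likelihood when each consecutive group of errors gets its own sigma."""
    size = math.ceil(len(errors_sorted_by_dm) / n_groups)
    total = 0.0
    for start in range(0, len(errors_sorted_by_dm), size):
        group = errors_sorted_by_dm[start:start + size]
        total += gaussian_loglik(group, ml_sigma(group))
    return total

# Hypothetical errors: small for molecules close to the model, large far from it.
errors = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
single = mgd_loglik(errors, 1)
two_groups = mgd_loglik(errors, 2)
```

In this made-up example the two-group description has the higher likelihood, which is the situation in which the distance to model carries information about prediction accuracy.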
30. MGDs for the simulated datasets
A) No significant MGD was found
B) An MGD composed of 3 Gaussian distributions was found
33. Estimations based on training set errors
Tetko et al, J Chem Inf Model, 2008, 48(9):1733-46.
34. Estimations of errors using MGD and 5-fold
cross-validation
Tetko et al, J Chem Inf Model, 2008, 48(9):1733-46.
35. Prediction accuracy for training and two external sets
Estimated experimental
accuracy is about
SE = 0.38
HPV (High Production
Volume): 3182
EINECS (REACH): 48774
Challenge to predict toxicity organized in collaboration with European Neural Network Society
at http://www.cadaster.eu
Tetko et al, J Chem Inf Model, 2008, 48(9):1733-46.
36. Challenges and solutions
[Figure: molecular space with regions labeled "new measurement", "reliable predictions" and "new series to predict"]
Our methodology allows confident navigation in a defined molecular space.
• It can be used to develop targeted (local) models covering specific series.
• It can be used to reliably estimate which compounds can/can’t be reliably predicted.
• It can be used to guide experimental design and to minimize costs of new measurements.
37. Acknowledgements
My group:
• Mr I. Sushko
• Mr S. Novotarskyi
• Mr A.K. Pandey
• Mr R. Körner
• Mr S. Brandmaier
• Dr M. Rupp
Visiting Scientists:
• Dr. V. Kovalishyn
• Dr. V. Prokopenko
• Prof. J. Emmersen
Collaborators:
• Dr. C. Höfer (DMPKore)
• Dr. G. Poda (Pfizer)
• Dr. C. Ostermann (Nycomed)
• Prof. R. Mannhold (Düsseldorf University)
• Prof. A. Tropsha (NC, USA)
• Prof. T. Oprea (New Mexico, USA)
• + many other colleagues & co-authors
Funding:
• FP7 MC ITN ECO
• GO-Bio BMBF http://qspr.eu
• FP7 CADASTER http://www.cadaster.eu
• Germany-Ukraine grant UKR 08/006
• FP6 INTAS VCCLAB http://www.vcclab.org
• DFG TE 380/1-1