
Data Mining and Big Data Analytics in Pharma



  1. Drug Research Software Solutions: Proposal for XXXX
     Accommodator Consultancy Services, Lucknow
     Dr Vibhor Mahendru, Ankur Khanna
  2. Drug Research: Common Challenges
     • EXPENSIVE! Drug research is expensive. A new drug takes around 15 years and $1.2b to go from concept to market.
     • LOW SUCCESS RATE! The success ratio is extremely low, and most candidate molecules are abandoned midway. Two out of three submissions to regulatory authorities result in failure.
     • LOW GROWTH RATE! The typical growth rate has fallen from 13% to 5%, and resources are limited.
     • DUPLICATION OF EFFORT! Companies often end up duplicating research effort because they cannot determine whether similar research is taking place elsewhere.
     • OMICS EMIT UNMANAGEABLE DATA! Newer technologies work at the gene and cell level. The resulting data is voluminous, comes in various formats and piles up at a blistering pace. Drug research struggles to leverage these technologies in a timely, effective and efficient manner.
  3. Our Offerings in IT in Life Sciences
     • TEXT MINING SOLUTIONS
     • DATA WAREHOUSE SOLUTIONS
     • DATA MINING SOLUTIONS
     • DATABASE DEVELOPMENT SOLUTIONS
     • BIG DATA ANALYTICS
     • CANCER SOLUTIONS
  4. TEXT MINING SOLUTIONS
     Philosophy: Enable researchers to find new information in the scientific reports and papers published around the world, absorb that information into their ongoing work, and steer that work by gathering and analyzing trends.
     Areas Covered:
     • Patents
     • Research papers
     • Publications
     • Specialized websites such as PubChem and PubMed, covering millions of articles
     • Social media sites such as Facebook, Twitter, Instagram, blogging forums, etc.
     • Internal collections of documents and information
     Deliverable: A set of programs that run automatically and prepare reports and documents with relevant summarized and detailed information downloaded from the sources above, based on input keywords and events.
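The text-mining deliverable (reports built from downloaded documents, driven by input keywords) can be sketched as a simple keyword-relevance ranker. This is a minimal illustration under stated assumptions, not the consultancy's actual programs. The `Document` record, the `keyword_report` function, the hit-count scoring and the sample documents are all hypothetical. The sketch also assumes documents are already downloaded as plain text and that keywords are single words matched case-insensitively.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical record for one downloaded item (patent, paper, post)."""
    source: str
    title: str
    text: str


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_report(docs: list[Document], keywords: list[str], top_n: int = 3) -> list[tuple[str, str, int]]:
    """Rank documents by how often the input keywords occur in title + text.

    Returns (source, title, hit_count) for documents with at least one hit,
    most relevant first.
    """
    wanted = {k.lower() for k in keywords}
    scored = []
    for doc in docs:
        counts = Counter(tokenize(doc.title + " " + doc.text))
        hits = sum(counts[k] for k in wanted)
        if hits > 0:
            scored.append((doc.source, doc.title, hits))
    # Highest hit count first; ties broken by title so the report is stable.
    scored.sort(key=lambda row: (-row[2], row[1]))
    return scored[:top_n]


# Illustrative, made-up documents.
sample_docs = [
    Document("PubMed", "Kinase inhibitors in oncology",
             "A kinase inhibitor series showed activity against ALK."),
    Document("Patent", "Novel ALK inhibitors",
             "Compounds inhibiting ALK are disclosed. ALK kinase activity was measured."),
    Document("Blog", "Vaccination debate", "Parents discuss measles vaccines."),
]
```

Calling `keyword_report(sample_docs, ["ALK", "kinase"])` ranks the patent first (4 hits) and the PubMed item second (3 hits), and it drops the blog post. A production system would add phrase matching, stemming and summarization on top of this.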
  5. WHY PATENT MINING
     • Patents disclose at least five times as many novel bioactive chemical structures related to drug discovery as journals do.
     • Patents encompass academic as well as commercial global medicinal chemistry output.
     • They cover targets, assays, mechanisms of action, disease descriptions and in-vivo data.
     • About 70% of the data is initially patent-only, and some is never disclosed elsewhere.
     • They include synthetic descriptions and other useful enabling information.
     • They precede journal or meeting reports by about 1.5 to 5 years.
     • They can complement papers (e.g. with a larger SAR matrix).
     • They intersect with papers at the chemistry, target, disease, author and citation levels.
     • IP exploitable for neglected tropical disease research is becoming "open".
  6. PATENT MINING @ NOVARTIS
     (Figure: diagram linking the entities Document, Assay, Result, Compound and Target, illustrated with a protein sequence.)
  7. DATA MINING SOLUTIONS
     Definition: Data mining is an interdisciplinary subfield of computer science that discovers patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics and database systems.
     Philosophy: The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
     Areas Covered:
     • Virtual HTS and HCS data
     • Predictive toxicology
     • Life sciences and health-related issues trending on social media
     • FDA datasets
     • Microbiomes
     • Chemogenomics
     • Predicting and preventing diseases through gene analysis
     • Both big and small molecules
     Deliverable: Converting raw data into actionable information by detecting patterns and trends and applying a number of verified algorithms.
     Benefits: Improves prediction in early-stage drug safety testing. Unlike conventional statistical analysis, data mining can uncover completely unexpected patterns and relationships in large data volumes. These patterns can then be used to extrapolate and predict.
  8. DATA MINING PROCESS & ALGORITHMS
  9. DATA MINING CASE STUDIES
     • @ Roche: Used DM techniques to build models for diagnosing the high-risk group for diabetes. The team analyzed existing sample sets (including Type II diabetes patients and healthy subjects) for two purposes: to identify the factors (age, sex, race, height, weight, BMI value, ADA value) that may cause Type II diabetes, and to predict the probability that subjects develop Type II diabetes within the next 7.5 years. The goal was to enable preventive measures, and DM methods were chosen because traditional statistical methods are not as accurate.
     • @ GSK: Data mining of the human gut microbiota for therapeutic targets. This could lead to a systems-level understanding of the global physiology of the host–microbiota superorganism in health and disease. Such knowledge would provide a platform for identifying and developing new therapeutic strategies for chronic diseases. These strategies could involve microbial as well as human-host targets and improve upon existing probiotics, prebiotics or antibiotics.
     • Text analytics was used to analyze public discussion boards on BabyCenter.com and WhattoExpect.com. The aim was to learn what motivates parents either to go ahead with or to delay vaccinating their children against diseases like measles and mumps.
     • Data mining was used to identify a previously unrecognized drug interaction (pravastatin and paroxetine) that was suggested to raise blood glucose levels manyfold. However, such an exercise is useful only if experts first define the problem statement carefully.
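The Roche case describes predicting the probability that a subject develops Type II diabetes from risk factors. Logistic regression is one common way to produce such a probability. The source does not say which DM technique Roche used, so the following is only a minimal sketch. It uses two of the listed factors (age and BMI), a tiny synthetic data set invented for illustration, and hypothetical helper names (`fit_logistic`, `predict_risk`).

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one subject: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_risk(w: list[float], b: float, x: list[float]) -> float:
    """Predicted probability that a subject belongs to the diabetic class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic, illustrative subjects. Features are scaled as
# [(age - 50) / 10, (BMI - 27) / 5]; label 1 = developed Type II diabetes.
X = [[-2.0, -1.0], [-1.5, -0.8], [-1.0, -0.5], [0.5, 0.8], [1.0, 1.2], [1.5, 1.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
```

Scaling the features keeps gradient descent stable. A real study would also hold out data to check calibration and include categorical factors such as sex and race via indicator variables.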
  10. DATA MINING CASE STUDIES
     • @ Bayer: GI adverse effects of short-term aspirin use. Meta-analysis comparing adverse events (AEs) with those of similar drugs, for marketing and drug improvement.
     • @ Pfizer: Uses mining to determine whether certain AEs are being reported more frequently than expected. Pfizer also applies large-scale semantic-web-based data mining and network methods to uncover previously undiscovered links between chemical compounds, drugs, biological pathways, targets, genes and diseases. By using big data to bring together genomic data, clinical trial data and EMR data, Pfizer was able to develop the precision drug Xalkori. Xalkori is very effective for the roughly 5% of lung cancer patients who have a mutation of the ALK gene. Data mining identified this subgroup, which included patients with healthy lifestyles who nevertheless developed cancer. Pfizer also funded a study using genomic data mining to identify antigens in NTS (non-typhoidal Salmonella) that may serve as targets for vaccine development.
     • @ Johnson & Johnson: Built an open-source data management system called tranSMART. The idea is to combine genomic data sets from internal and external sources using the platform's data standards and processing capabilities, which makes the combined data much easier to mine.
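Pfizer's check for AEs "reported more frequently than expected" is a disproportionality question. A standard pharmacovigilance measure for it is the proportional reporting ratio (PRR), PRR = [a/(a+b)] / [c/(c+d)]. The source does not name Pfizer's method, so this is a swapped-in, minimal sketch. The `Contingency` type, the function names and the thresholds (a PRR of at least 2 with at least 3 cases) are illustrative assumptions, and the counts in the usage note are invented.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    """2x2 counts of spontaneous adverse-event reports."""
    drug_event: int   # a: reports with the drug of interest and the event
    drug_other: int   # b: reports with the drug of interest, other events
    other_event: int  # c: reports with all other drugs and the event
    other_other: int  # d: reports with all other drugs, other events


def prr(t: Contingency) -> float:
    """Proportional reporting ratio: the event's share of the drug's reports
    divided by its share of all other drugs' reports."""
    p_drug = t.drug_event / (t.drug_event + t.drug_other)
    p_other = t.other_event / (t.other_event + t.other_other)
    return p_drug / p_other


def is_signal(t: Contingency, min_prr: float = 2.0, min_cases: int = 3) -> bool:
    """Flag a drug-event pair that is reported disproportionately often.
    The default thresholds are illustrative assumptions, not a regulatory rule."""
    return t.drug_event >= min_cases and prr(t) >= min_prr
```

For example, `prr(Contingency(20, 980, 100, 19900))` is 4.0 up to floating-point rounding. The event makes up 2% of the drug's reports but only 0.5% of other drugs' reports, so `is_signal` returns `True`. The function raises `ZeroDivisionError` when no other drug has reports of the event. A flagged pair is a hypothesis for expert review, not proof of causation.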
  11. DATA MINING CASE STUDIES
     • @ Novartis: In HTS, Novartis used the Ontology-Based Pattern Identification (OPI) algorithm to detect patterns. This identified 1,500 scaffold families with significant structure–HTS activity profile relationships.
     • @ AstraZeneca: Uses data-mining tools to identify plausible preclinical gastrointestinal (GI) effects that may be associated with nausea and could help predict it. A total of 86 marketed drugs were used in this analysis. The main outcome was confirmation that nausogenic and non-nausogenic drugs can be clearly separated based on their preclinical GI observations.
  12. CHEMOGENOMICS DATA MINING
     Chemogenomics is rapidly emerging as a way of discovering new disease therapies and uncovering new uses for existing drugs. Pharmaceutical companies and commercial vendors have set up large structure–activity databases. These databases can be mined for common properties or structural features of ligands that are linked to common features of the receptors they bind. These insights can then be used to compile screening sets rationally or to guide knowledge-based synthesis of chemical libraries, accelerating lead finding. They can also be used to reposition drugs and to find new applications for existing drugs, molecules and compounds.
     Four Canadian government research funding agencies will spend around US$6.7 million to create a cloud computing facility and data mining tools that will enable researchers to access and use data from the International Cancer Genome Consortium.
     • Known GPCR ligands and non-GPCR ligands can be collected and organized, and mining models can be trained on their properties. New compounds can then be classified automatically as ligands or non-ligands.
     • Design and knowledge-based synthesis of chemical libraries targeting the purinergic GPCR subfamily.
     • The resulting chemical scaffolds can then be synthesized.
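The slide proposes training a model on known GPCR ligands and non-ligands and then classifying new compounds. That idea can be sketched as a k-nearest-neighbour classifier over structural fingerprints, scored by Tanimoto similarity. This is one common chemoinformatics choice, not necessarily what the slide has in mind. Fingerprints are assumed to be precomputed and stored as sets of "on" bit positions; a real pipeline would generate them with a cheminformatics toolkit. The function names and toy fingerprints are hypothetical.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity: shared on-bits / on-bits in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_classify(query: frozenset, training: list[tuple[frozenset, str]], k: int = 3) -> str:
    """Label a query compound by majority vote of its k most similar
    training compounds; training holds (fingerprint, label) pairs."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, hypothetical fingerprints for known GPCR ligands and non-ligands.
training_set = [
    (frozenset({1, 2, 3, 4}), "ligand"),
    (frozenset({1, 2, 3, 5}), "ligand"),
    (frozenset({1, 2, 4, 5}), "ligand"),
    (frozenset({10, 11, 12}), "non-ligand"),
    (frozenset({10, 11, 13}), "non-ligand"),
    (frozenset({11, 12, 13}), "non-ligand"),
]
```

For instance, `knn_classify(frozenset({1, 2, 3, 6}), training_set)` returns `"ligand"`, because its three nearest neighbours are the three known ligands. An odd `k` avoids tied votes in this two-class setting.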
  13. DATA WAREHOUSE SOLUTIONS
     Definition: A central repository created by integrating data from disparate sources. It holds past and current data for operational and strategic decision making and for senior management reporting, such as annual comparisons of budget per scientist.
     Goal: To give users appropriate access to a homogenized, comprehensive and consistent view of the organization, supporting forecasting and decision-making processes at the enterprise level.
     Areas Covered:
     • Bioinformatics research
     • Finance
     • HR
     • Marketing
     • Disease management, etc.
     Deliverable: A central repository of useful, actionable data integrated from multiple departments and sources, available to end users for efficient and effective operational and strategic decision making.
     Benefits:
     • Better use of internal resources
     • A shorter critical path for statistical analysis
     • Standard exchange of data with CROs, partners and regulatory agencies
     • Cross-trial analysis and leveraged use of historical data
     • Globalization and knowledge sharing
     • Support for open-source drug development
     • Compliance with regulatory authorities
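The "annual comparison of budget per scientist" in the definition is a typical warehouse query. Data from two departmental systems (here, Finance and HR) is conformed to shared dimension values and joined into one fact table before aggregation. The sketch below does this in memory. The source row layouts, the `DEPT_MAP` conformed dimension and all figures are hypothetical, and a real warehouse would do this in an ETL pipeline and SQL rather than with Python dictionaries.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems. Each spells department
# codes differently, which the integration step must reconcile.
finance_rows = [
    {"dept": "ONC", "year": 2014, "budget_usd": 1_000_000},
    {"dept": "ONC", "year": 2015, "budget_usd": 1_500_000},
    {"dept": "NEURO", "year": 2015, "budget_usd": 900_000},
]
hr_rows = [
    {"department": "oncology", "year": 2014, "scientists": 8},
    {"department": "oncology", "year": 2015, "scientists": 10},
    {"department": "neuroscience", "year": 2015, "scientists": 6},
]

# Conformed dimension: maps each source's department code to one standard name.
DEPT_MAP = {"ONC": "Oncology", "NEURO": "Neuroscience",
            "oncology": "Oncology", "neuroscience": "Neuroscience"}


def build_fact_table(finance, hr):
    """Join both sources on (standard department, year) into one fact table."""
    facts = defaultdict(dict)
    for row in finance:
        facts[(DEPT_MAP[row["dept"]], row["year"])]["budget_usd"] = row["budget_usd"]
    for row in hr:
        facts[(DEPT_MAP[row["department"]], row["year"])]["scientists"] = row["scientists"]
    return dict(facts)


def budget_per_scientist(facts):
    """Annual budget per scientist for each (department, year) with both measures."""
    return {
        key: measures["budget_usd"] / measures["scientists"]
        for key, measures in sorted(facts.items())
        if "budget_usd" in measures and "scientists" in measures
    }
```

With the sample rows, Oncology goes from 125,000 USD per scientist in 2014 to 150,000 in 2015. Keeping the department mapping in one conformed dimension means each new source only needs its own mapping entries.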
  14. DWH DESIGN
  15. DWH @ NOVARTIS
     Prominent DWHs: FDA's Janus, Johnson and Johnson, Pfizer, Novartis' Avalon, GSK and Roche.
     DWH Use Cases:
  16. DWH USE CASES
     Novartis:
     Tell me everything about a given structure.
     • Collect comprehensive data of corporate interest in a single place.
     • Data grouped by chemical structure.
     • Standardized data dictionary to describe the data.
     • Chemical structure conventions are unified.
     • Computed descriptors are available.
     Given a substructure, give me useful calculated descriptors.
     • Assays, physical properties and calculated descriptors are represented uniformly.
     • Supports changing the row model between batch, compound and bioactive.
     Find all compounds in stock with some publicly known activity.
     • Integrate structured in-house data with external data.
     • Set the row model by active substance.
     • Predefined, task-based queries automate this kind of query.
     FDA Janus: Janus creates an integrated data platform for most commercial review, analysis and reporting tools. It reduces the overall cost of information gathering and submission, of the development process, and of reviewing and analyzing information. It provides a common data model, based on the SDTM standard, to represent four classes of clinical data submitted to regulatory agencies: tabulation datasets, patient profiles, listings, etc. It provides central access to standardized data and common data views across collaborative partners. It supports cross-trial analyses for data mining, helps detect clinical trends and address clinical hypotheses, and enables more advanced, robust analysis. This makes it possible to compare data from multiple clinical trials to help improve efficacy and safety. Automated processes and data standards make the review process more efficient and data easier to locate and query. It provides a potentially broader view of data across all clinical trials, with proper security, de-identified patient data and proper data-sharing agreements in place.
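The third Novartis use case (find all compounds in stock with some publicly known activity) combines two of its listed steps. It integrates in-house data with external data, and it switches the row model from batch to active substance. A minimal sketch follows, under stated assumptions. Both sources are taken to share a standardized structure key; an InChIKey would be a typical choice, but placeholder keys are used here. The record layouts, target names and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-house inventory: one row per batch.
in_house_batches = [
    {"batch": "B-001", "structure_key": "KEY-A", "in_stock_mg": 120},
    {"batch": "B-002", "structure_key": "KEY-A", "in_stock_mg": 0},
    {"batch": "B-003", "structure_key": "KEY-B", "in_stock_mg": 45},
    {"batch": "B-004", "structure_key": "KEY-C", "in_stock_mg": 10},
    {"batch": "B-005", "structure_key": "KEY-E", "in_stock_mg": 0},
]
# Hypothetical public activity records keyed by the same structure identifier.
public_activities = [
    {"structure_key": "KEY-A", "target": "TARGET-1"},
    {"structure_key": "KEY-A", "target": "TARGET-2"},
    {"structure_key": "KEY-C", "target": "TARGET-1"},
    {"structure_key": "KEY-D", "target": "TARGET-3"},
    {"structure_key": "KEY-E", "target": "TARGET-2"},
]


def in_stock_with_public_activity(batches, activities):
    """Return {structure_key: sorted known targets} for substances that are
    in stock and have at least one public activity record."""
    # Change the row model from batch to active substance by summing stock.
    stock_mg = defaultdict(int)
    for row in batches:
        stock_mg[row["structure_key"]] += row["in_stock_mg"]
    # Index the external activity data by the same structure key.
    targets = defaultdict(set)
    for row in activities:
        targets[row["structure_key"]].add(row["target"])
    return {
        key: sorted(targets[key])
        for key, mg in sorted(stock_mg.items())
        if mg > 0 and key in targets
    }
```

With the sample data, the result contains KEY-A and KEY-C. KEY-B has no public activity, KEY-E is out of stock, and KEY-D is not held in-house. Summing stock per structure before the join is what "set the row model by active substance" amounts to here.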
  17. ERP v/s DWH
     People often confuse ERP and DWH systems, but they differ as follows:
     ERP                                        | DWH
     Detailed                                   | Summarized
     Facilitates data entry and storage         | Facilitates quick analysis
     Used by operations                         | Used by strategists
     End users need to be trained               | Generalist end users
     No ad hoc reporting                        | Facilitates ad hoc reports
     ERP for biochemical data less available    | Easily integrates and stores biochemical data
  18. DATABASE SOLUTIONS
     Drug discovery analytics has traditionally been performed on relational database management systems (RDBMS). However, as new discoveries bring new kinds of data, an RDBMS is no longer always the optimal choice, and discoveries require newer technologies. Commercial RDBMSs have kept pace by introducing newer features (such as columnstore indexes).
     We design RDBMS databases that consolidate data from disparate sources to facilitate analytics. We also convert existing DBMS systems to leverage newly introduced features. We also undertake performance enhancements, provide additional security and perform other maintenance tasks.
  19. BIG DATA SOLUTIONS
     Definition: A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization. The trend toward larger data sets comes from the additional information derivable from analyzing a single large set of related data, compared with separate smaller sets holding the same total amount of data. Analyzing the single set allows correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."
     Philosophy: To handle the huge volumes of data generated by omics, ordinary computers are networked so that the system is loss-proof (fault tolerant) and the individual processors work in synergy to solve bigger problems. Companies have started offering publicly available cloud storage for big data.
     Areas Covered:
     • Finding the causes of diseases
     • Repositioning of drugs
     • Prescription of more effective drugs and procedures
     Deliverable: We collect information about possible data sources for the related research area and analyze the data for volume, variety and velocity. We then build a small pilot prototype of the big data setup using open-source big data tools in the cloud. Finally, we set up programs to collect and process the data and test the hypothesis.
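The philosophy above, in which many networked machines each process part of the data and their results are combined, is the idea behind MapReduce-style frameworks such as Hadoop. The sketch shows the map and reduce steps for a typical omics task, counting k-mers in sequencing reads. It runs them sequentially in one process; on a cluster the framework would distribute the map calls across nodes. The function names and example reads are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_kmers(read: str, k: int) -> Counter:
    """Map step: count every overlapping k-mer in one sequencing read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial k-mer counts."""
    return left + right


def count_kmers(reads: list[str], k: int = 3) -> Counter:
    """Apply the map step to each read, then fold the partial counts together.
    Reads shorter than k contribute nothing."""
    partials = (map_kmers(read, k) for read in reads)
    return reduce(reduce_counts, partials, Counter())
```

For example, `count_kmers(["ACGTAC", "GTACGT"])` counts ACG, CGT, GTA and TAC twice each. Because merging counts is associative and order-independent, partial results from different machines can be combined in any order, which is what makes the job easy to parallelize.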
  20. BIG DATA SOLUTIONS
     Use Case 1: Researchers found that previously undetected mutations in a single gene (called LMX1B) triggered focal segmental glomerulosclerosis (FSGS), a disease that scars the kidneys' filtering system. This became possible after genome data was collected and compared for healthy and diseased individuals.
     Use Case 2: A big data approach has already predicted the efficacy of drug repurposing for treating colitis (a form of inflammatory bowel disease), small-cell lung cancer and other conditions, according to Scott Saywell, vice president, corporate development, NuMedii.
     Use Case 3: For patients, using big data analytics in drug development means less trial and error when physicians prescribe drugs. This tighter targeting of drugs to disease also results in fewer side effects.
     According to a new draft policy by the Dept. of Biotechnology, Govt. of India, genome-based prescription and treatment will be a top priority in the next few years. The draft policy envisages converting half of the hospitals currently engaged in treating human diseases to predicting and preventing diseases using genomic tools. It also aims to provide all available genetic screening tests to the general public at affordable prices.
     Big data technologies have made genome data processing and analysis possible. The genome (and other omics) data for just one individual, printed on paper, would form a stack taller than an 80-story building.
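Use Case 1 rests on comparing genome data between diseased and healthy individuals. A generic way to do that for a single variant is a case-control allelic test. It tabulates alternate and reference allele counts in each group, then computes an odds ratio and a Pearson chi-square statistic. This is a textbook illustration, not the method of the LMX1B study, and the function name and the counts in the usage note are invented.

```python
def allelic_association(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Return (odds_ratio, chi_square) for a 2x2 allele-count table:
    rows = cases (diseased) / controls (healthy), columns = alt / ref allele."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    # Pearson chi-square: sum of (observed - expected)^2 / expected over 4 cells.
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi_square


# Chi-square critical value for 1 degree of freedom at p = 0.05.
CHI2_CRITICAL_1DF_P05 = 3.841
```

Consider 30 alternate / 70 reference alleles in cases and 10 / 90 in controls. The function returns an odds ratio of about 3.86 and a chi-square of 12.5, above the 3.841 threshold. Genome-wide scans test millions of variants, so in practice a much stricter threshold is applied. Tables with zero cells would need a correction that this sketch omits.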
  21. Cancer Solutions
     • We collaborate with CDRI and ITRI to provide cancer patient data for further research.
     • We carry out research on the National Cancer Data Repository, providing consultancy on cancer drugs and assisting in cancer research, with the goal of personalized cancer solutions.
     • Any other assistance you may need on this subject.
  22. Value that Accommodator Consultancy Would Add
     • We have vast experience in data analysis, text and data mining, and technologies suited to biochemical data, having delivered successful projects throughout the world. We will take the IT and statistics worries away from you so you can concentrate on pure research.
     • We have the skills to work with large volumes of data and with Big Data (Hadoop) source systems.
     • Vast experience in developing, using and configuring different kinds of bioinformatics software.
     • The team includes a chemist, data warehouse and data mining professionals, and a senior cancer surgeon.
     • We firmly believe in providing great value in our service/product offering.
  23. Questions/Comments?
     To keep the material short, only a brief summary has been provided. Please do not hesitate to ask questions or request clarification for further details.
     Our contact details:
     • Ankur Khanna, Director Technical: 945 166 8432
     • Dr Vibhor Mahendru, Director Business Development: 800 536 5132
     THANK YOU
