This document discusses leveraging large molecular datasets and clinical data to transform drug discovery using precision medicine approaches. The author describes how advances in omics technologies have enabled the generation of big molecular data from sources like DNA, RNA, proteins, and metabolites. Disease and drug characteristics can be modeled as functions of these molecular features. As an example, the author details their work using gene expression signatures to characterize hepatocellular carcinoma (HCC) and identify existing drugs that reverse the disease signature, including the drug NEN. Integrating these large molecular datasets holds promise for precision medicine approaches to drug discovery and development.
Review that illustrates the link between bile acids, non-steroidal anti-inflammatory drugs, and aspirin in the small intestine.
Discovery of the activity of liquorice on FXR
Network pharmacology: From BioAssay Response Data to Network (Bin Chen)
Network pharmacology: comparing protein pharmacology networks built with a ligand-based approach (the Similarity Ensemble Approach) against those built from BioAssay response data.
Towards semantic systems chemical biology (Bin Chen)
Introduces a semantic framework for studying systems chemical biology / systems pharmacology, covering three major projects: Chem2Bio2RDF, Chem2Bio2OWL, and SLAP (Semantic Link Association Prediction).
Bibliological data science and drug discovery (Jeremy Yang)
Presented at the 2016 ACS Fall Meeting in Philadelphia, session "Effectively Harnessing the World's Literature to Inform Rational Compound Design", on 8/21/16.
The Uneven Future of Evidence-Based Medicine (Ida Sim)
An Apple ResearchKit study enrolled 22,000 people in five days. A study claims that Twitter can be used to identify depressed patients. A computer program crunches genomic data, the published literature, and electronic health record data to guide cancer treatment. The pace, the data sources, and the methods for generating medical evidence are changing radically. What will — what should — evidence-based medicine look like in a faster, personalized, data-dense tomorrow?
- Presented as the 3rd Annual Cochrane Lecture, October 2015 in Vienna, Austria.
Will data scientists lead the discovery of cancer therapeutics? (Laura Berry)
Presented at the Global Pharma R&D Informatics Congress. To find out more, visit:
www.global-engage.com
The rapidly decreasing cost of molecular measurement technologies enables profiling not only of disease samples but also of the cellular signatures of individual drugs in clinically relevant models. In this presentation, Bin Chen from the University of California, San Francisco proposes a systems approach to identifying drugs that reverse the molecular state of a disease.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
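The automated data-validation idea in point 4 can be made concrete with a minimal rule-based check. This is a hypothetical sketch: the column names, rules, and sample rows are illustrative, not from any specific tool.

```python
# Minimal rule-based data validation: run checks at ingestion time and
# report failing rows so errors are fixed at the source.
# Column names and rules below are illustrative examples.

def validate(rows, rules):
    """Return a list of (row_index, column, message) for failed checks."""
    errors = []
    for i, row in enumerate(rows):
        for column, check, message in rules:
            if not check(row.get(column)):
                errors.append((i, column, message))
    return errors

rules = [
    ("age",   lambda v: isinstance(v, int) and 0 <= v <= 120, "age out of range"),
    ("email", lambda v: isinstance(v, str) and "@" in v,      "malformed email"),
]

rows = [
    {"age": 34,  "email": "a@example.com"},
    {"age": 250, "email": "not-an-email"},   # fails both rules
]

errors = validate(rows, rules)
for idx, col, msg in errors:
    print(f"row {idx}: {col}: {msg}")
```

Hooking a check like this into the ingestion path is what keeps bad records from propagating downstream, which is the point the list above makes about rectifying errors at the source.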
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Covers primitives for graph algorithms like PageRank. Compressed Sparse Row (CSR) is the adjacency-list-based graph representation used throughout.
Multiply with different modes (map)
1. Performance of sequential vs. OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs. bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs. OpenMP-based vector element sum.
2. Performance of memcpy vs. in-place CUDA-based vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
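For context, the CSR layout these experiments run on can be sketched in a few lines. This is a plain-Python illustration of the data structure only; the report's actual code is C++/OpenMP/CUDA.

```python
# Compressed Sparse Row (CSR): a graph stored as two flat arrays.
# offsets[v] .. offsets[v+1] delimit the slice of `targets` holding
# the out-neighbours of vertex v. Illustrative sketch only.

def to_csr(num_vertices, edges):
    """Build CSR arrays from an edge list of (source, target) pairs."""
    degree = [0] * num_vertices
    for u, _ in edges:
        degree[u] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    targets = [0] * len(edges)
    fill = offsets[:-1].copy()          # next free slot per source vertex
    for u, w in edges:
        targets[fill[u]] = w
        fill[u] += 1
    return offsets, targets

def neighbours(offsets, targets, v):
    return targets[offsets[v]:offsets[v + 1]]

offsets, targets = to_csr(4, [(0, 1), (0, 2), (2, 3), (3, 0)])
print(neighbours(offsets, targets, 0))   # [1, 2]
```

The two flat arrays are what make CSR cache- and GPU-friendly compared with pointer-based adjacency lists, which is why the report's map/reduce primitives operate on it.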
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
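As a reference point for the "Monolithic" baseline in the abstract, standard power-iteration PageRank can be sketched as follows. This is a plain-Python sketch, not the report's implementation; the damping factor and tolerance are conventional defaults, and dead ends are handled by redistributing their rank uniformly (one common strategy, not necessarily the report's loop-based one).

```python
# Monolithic PageRank by power iteration: every vertex is processed in
# every iteration, unlike the levelwise (SCC-by-SCC) variant described
# in the abstract. d=0.85 and tol are conventional defaults.

def pagerank(num_vertices, out_edges, d=0.85, tol=1e-9, max_iter=200):
    """out_edges: list of (source, target) pairs. Returns rank per vertex."""
    out_deg = [0] * num_vertices
    in_nbrs = [[] for _ in range(num_vertices)]
    for u, v in out_edges:
        out_deg[u] += 1
        in_nbrs[v].append(u)
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(max_iter):
        # Dead-end vertices redistribute their rank uniformly.
        dangling = sum(rank[v] for v in range(num_vertices) if out_deg[v] == 0)
        base = (1.0 - d) / num_vertices + d * dangling / num_vertices
        new = [base + d * sum(rank[u] / out_deg[u] for u in in_nbrs[v])
               for v in range(num_vertices)]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank

ranks = pagerank(4, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)])
print([round(r, 3) for r in ranks])
```

The levelwise variant gets its advantage by running this same iteration only within one strongly connected component at a time, in topological order, so converged levels never need revisiting.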
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Leveraging molecular and clinical data to transform drug discovery in the era of precision medicine
1. Leveraging molecular and clinical data to transform drug discovery in the era of precision medicine
Bin Chen, PhD
Instructor, Dept. of Pediatrics
Institute for Computational Health Sciences
DDW, 2016
Bin.Chen@ucsf.edu
10. Disease = f(molecular data)
Genome
Transcriptome
Proteome
Metabolome
Figure adapted from The Cancer Genome Atlas Research Network, Nature Genetics 45, 1113–1120 (2013)
Diseases can be characterized using molecular features
12. Outcome = f(Drug, Disease)
Matching disease and drug using their molecular features
transcriptome proteome metabolome
13. Muhammed A. Yıldırım, Nature Biotechnology, 2007
No magic bullet?
14. Drugs that reverse the expression of a spectrum of disease genes may be therapeutic agents for that disease
15. • >0.5 million HCC patients every year worldwide
• Second leading cause of cancer death in the world
Hepatocellular Carcinoma (HCC) is a global health problem
Hashem B. El-Serag, N Engl J Med 2011
22. Expression of 85% of HCC patients can be reversed by NEN
[Figure: patient counts of expression reversed (yes/no) by NEN across the datasets GSE14520_GPL3921, GSE14520_GPL571, GSE54236_GPL6480, and TCGA; plus NEN (niclosamide salt) in vivo results in HCC]
35. Acknowledgement
Andy Godwin
Scott Weir
Yan Ma
Ziyan Pessetto
University of Kansas
Stanford Asian Liver Center
Samuel So
Mei-Sze Chua
Wei Wei
Li Ma
Butte Lab
Atul Butte
Boris Oskotsky
Mary Lyall
Hua Fan-Minogue
Marina Sirota
Hyojung Paik
Dexter Hadley
Dvir Aran
Bin.Chen@ucsf.edu
Menghua Wu
Jane Wei
Editor's Notes
Precision medicine using clinical and epidemiological data?
Talking about big data: how big is it in the biomedical domain? In 2013, the EMBL-EBI hosted 15 petabytes of data, and that quickly increased to 25 petabytes in 2014. What does 25 petabytes of data mean?
The storage of one personal laptop is about 2 TB… the total attendance of all of DDW is about … 25 petabytes is equal to the space of over 12,000 personal laptops in total. You cannot imagine how much information can be stored in these laptops. And it includes over 120,000 datasets. These numbers simply reflect the complexity and growth of the data from one single institute.
This growth of big data is largely due to the increasing use of high-throughput technologies and the decreasing cost of genomic sequencing.
This figure, which you may see quite often, simply shows how the cost per genome has dropped in the last decade: from $100M in 2001 to less than $1K today.
Single cell – tissue – organism
I also want to add that, other than the genome, you can get other types of data at a very reasonable cost: for example, a couple of hundred dollars for running RNA sequencing, proteomics, or metabolomics.
With these technologies and the decent cost, we can quickly generate molecular data of disease, from single cells to organs, from cancer cells to microorganisms, from cell lines to genetically modified mice to individual patients, and from one time point to the longitudinal course of progression. As we started to analyze these data, we finally realized that, dammit, it's really complex. For example, the molecular makeup of tumor cells is quite different in different regions even within the same patient. Because of this complexity, no single laboratory, institute, or consortium is able to produce data fully capturing all the layers of complex disease systems. Integrative analysis of multiple layers of data points from different sources is thus essential to understand disease and discover new drugs. Therefore the data must be open to the public, such that every piece of information can be easily connected.
Now we are fully aware of the importance of open data: if you want to publish your results, you have to deposit your raw data somewhere; if you want to get a grant, you need to clearly state how you will share your data. Because of data-sharing policies, we have seen dramatic growth of public data. One example is GEO, a public repository for different types of molecular data: it now surpasses 1.8 million samples from different labs around the world.
Since many folks today are GI-related, here is the screenshot of liver cancer in TCGA. Tons of clinical, mutation, and mRNA data are available.
I believe public data is not only revolutionizing our understanding of disease but also changing how we do science. In my lab, we are interested in leveraging these data to find new uses, or better uses, of existing drugs.
In our lab, we are very interested in developing computational pipelines to engineer this process: from patient tissue collection, to hypothesis generation, to validation in vitro and in vivo, and back to the patients. There are many small components, such as the selection of appropriate preclinical models. We believe each step in this process can be driven by big data, and that big data from the public plays a critical role in driving it.
We start by asking the fundamental question: what is a disease? Conventionally we defined disease by symptoms and signs. It is now possible to characterize and understand disease at different molecular levels, such as the genome, epigenome, and metabolome. Disease classification is no longer based only on symptoms, but also on molecular features. For example, you may often hear of ER-positive patients and EGFR-mutated lung cancer patients, terms we never heard a decade ago.
Next: how does each drug act? We can now molecularly characterize disease systems upon treatment with drugs, which allows us to understand drug action at different molecular levels. Of course, we cannot test individual drugs on patients, but we can do so in clinically relevant systems such as cell lines or even genetically modified mouse models.
Drug discovery, in my opinion, is just matching disease and drug based on their molecular profiles. If the patient has a high glucose level, we find a drug to reverse the glucose to the normal state. If the patient has HER2 overexpressed, we find a drug to reverse its expression to the normal state. However, in most cases what we get is a list of dysregulated genes instead of one single feature. So our strategy is to reverse the disease's molecular features in a systematic manner. We are now using transcriptome data but will be expanding to other features.
In current discovery, the most common approach is to target one single molecular component selected from these features. We hope to find a magic bullet, but cancers and many other diseases are complicated: it is hard to find one single component that elucidates the cause of the disease, which is instead the consequence of defects across a large network. Looking back at existing drugs retrospectively, most drugs have more than one target, which indicates that targeting one component may not be sufficient to disrupt the complex disease system.
We create a disease gene expression signature by comparing disease samples and healthy samples, for example tumors vs. non-tumors in cancer. Meanwhile, we treat cancer cells with all FDA-approved drugs or drugs in the clinic, which allows us to build gene expression signatures for individual drugs. Then we match the drug signatures to the disease signature. If a drug reverses disease gene expression, we think the drug may have a therapeutic effect. Here we are not looking at a drug's effect on one single gene, but at its effect against a spectrum of genes.
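The matching step described above can be sketched as scoring each drug by the anti-correlation between its expression changes and the disease signature. This is a toy illustration: the gene symbols and fold-change values are made up, and the published pipeline uses a more sophisticated rank-based reversal score rather than a simple Pearson correlation.

```python
# Toy signature-reversal scoring: a drug whose expression changes run
# opposite to the disease signature gets a strongly negative score.
# Gene names and log fold-change values are made up for illustration.

def reversal_score(disease_sig, drug_sig):
    """Pearson correlation between disease and drug log fold-changes
    over shared genes; near -1 means the drug reverses the signature."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    xs = [disease_sig[g] for g in genes]
    ys = [drug_sig[g] for g in genes]
    n = len(genes)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

disease = {"GPC3": 2.1, "AFP": 1.8, "CDKN2A": -1.5, "ALB": -2.0}
drug_a  = {"GPC3": -1.9, "AFP": -1.2, "CDKN2A": 1.1, "ALB": 1.7}  # reverses
drug_b  = {"GPC3": 1.5, "AFP": 0.9, "CDKN2A": -0.8, "ALB": -1.1}  # mimics

print(round(reversal_score(disease, drug_a), 2))  # strongly negative
print(round(reversal_score(disease, drug_b), 2))  # strongly positive
```

Drugs are then ranked by this score, and the most strongly negative candidates are carried forward to in vitro and in vivo validation.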
We took liver cancer as one example
Why are we interested in HCC? HCC is a global health problem. Every year, over 800,000 HCC patients are diagnosed worldwide, the majority of them in Asian and African countries; around 20,000 patients are in the USA. HCC is the second leading cause of cancer death in the world.
http://www.nejm.org/doi/full/10.1056/NEJMra1001683
First, we created the disease gene expression signatures, and validated these signatures using external datasets. As you can see, these signatures can clearly differentiate tumors and non-tumors as well.
1000 samples from
Given that we also had profiles for FDA-approved drugs and many other compounds of biological interest, and that it is cheap to profile the drugs you are interested in, you may ask: can we use this approach to virtually screen drugs?
Here I show the drugs that reverse gene expression of hepatocellular carcinoma (HCC), a common liver cancer. The first column shows disease gene expression: red represents up-regulated, green represents down-regulated. The remaining columns show drug gene expression. Ideally, we want the red to go down and the green to go up in the drug columns. We found a few anthelmintics doing a pretty good job. These drugs are made to kill worms; we think they can probably kill cancer cells too, so we tested a few of them.
This is one drug called NEN, the salt form of niclosamide, which we predicted. This drug inhibited tumor growth significantly in patient-derived mouse models.
After six weeks of treatment, we also harvested and profiled the tumors, and found that disease gene expression was reversed after treatment.
Then we created expression signatures of individual patients and found that the expression of 85% of patients can be reversed by NEN. We are preparing clinical trials in the USA and China right now. Hopefully, we will really be able to translate our discovery into the clinic.
We and many other labs have used a similar idea to find new therapeutics in different diseases. This motivated us to ask: could it just be random chance? So we quantified each drug's potency to reverse disease gene expression and correlated it with drug efficacy. This figure shows their correlation in liver cancer; drug efficacy is a measure of how much dose is needed to kill half of the cancer cells. The strong correlation shows that if a drug reverses disease gene expression, it is more likely to be effective in cancer cells.
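The potency-efficacy check described in that note amounts to a rank correlation between each drug's reversal score and its measured efficacy (e.g. log IC50). A minimal sketch, with made-up scores and IC50 values purely for illustration:

```python
# Sketch of the potency/efficacy check: rank-correlate each drug's
# signature-reversal score with its measured efficacy (log IC50,
# lower = more potent). All values below are made-up illustrations.

def spearman(xs, ys):
    """Spearman rank correlation (no ties assumed in this sketch)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

reversal = [-0.92, -0.75, -0.40, -0.10, 0.30]   # more negative = stronger reversal
log_ic50 = [-1.8, -1.1, -0.3, 0.4, 1.2]          # lower = more potent

rho = spearman(reversal, log_ic50)
print(rho)  # 1.0: in this toy data, stronger reversers are more potent
```

A positive rho here means stronger reversers (more negative scores) have lower IC50, i.e. the reversal score tracks efficacy, which is the relationship the figure in the talk reports.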
We also applied this method to discover drugs for Ewing sarcoma. It turned out over half of them were validated successfully in vitro. This hit rate is incredibly high: 50%, versus the single-digit rates of typical screens.
Moving beyond, we wondered which genes are reversed. We know many of us still want to study individual genes; perhaps the genes reversed can help us understand the disease mechanism, or even find a therapeutic target. We identified one of them, HMMR, that is suppressed by niclosamide and by many other drugs that are effective in HCC. That means if a drug is able to inhibit tumor growth, it must suppress HMMR as well. We hypothesize that this gene may be critical to disease progression.
We showed that it is specifically expressed in HCC.
It turned out that HMMR is expressed in HCC, and knock-down of HMMR inhibited cell invasion in HCC cells, suggesting its potential as a therapeutic target.
To do so, we need to think about it systematically. We should develop computational pipelines to engineer this process: from patient tissue collection, to hypothesis generation, to validation in vitro and in vivo, and back to the patients. This reminds me of a figure, a kind of homework from my college days, where I was actually trained as a chemical engineer focusing on the design of a factory. Although I have pretty much forgotten all the mathematical equations, I will never forget that every step in our design was backed by a mathematical model, ensuring the yield at the end is exactly what we expect. I think this is also true in drug discovery. There are many small components, and the success of finding a precise treatment depends on the success of understanding the small components. We think each step in this process can be driven by big data and rigorous data models.
I'd like to show a few small pieces of work we have published.
First, dose matters. One goal of precision medicine is to treat the patient with the right dose, right? If we dose very high, all drugs would show potency to reverse disease gene expression, but they would also end up highly toxic to other normal cells. Any two structurally similar drugs, if you treat with a high dose, are very likely to share similar expression profiles regardless of the cell line. This indicates that in order to measure reversal potency, we need to be careful about dose and other biological conditions, such as treatment time and cell line.
Models matter. In liver cancer, we found half of the cell lines are not highly correlated to tumors. This suggests we may have trouble using those cell lines to validate hypotheses derived from tumors that are not similar to them.
The reason I mention this is that we proposed combinations from our computational model, which worked pretty well in preclinical models, but physicians have their own judgment: I like this, I like that, I don't like this because it has severe side effects for my patients. So why don't we systematically characterize current clinical trials and see how drugs are combined now? My summer student Menghua developed a method to extract combination trials in oncology and made the first effort to understand those trials.
Last but not least is the infrastructure, which is also a headache in our daily life. Datasets are scattered in different places, with different structures and ontologies. One same drug, sorafenib, can be called Nexavar, nexavaar, or nexavar. How can we integrate them so that we know they refer to the same drug? Many people joke that 80% of a data scientist's time is spent cleaning up the data. We have seen many informatics challenges, but we are more embracing the opportunities we have: to translate these data points to eventually benefit patients.
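The drug-name problem in that last note can be handled with a small normalization step. This is a toy sketch: the synonym table is a tiny illustrative example, whereas real pipelines rely on curated vocabularies such as RxNorm or ChEMBL synonym lists.

```python
# Toy drug-name normalization: case-fold, strip whitespace, and map
# known brand names/misspellings to a canonical compound name.
# The synonym table here is a tiny illustrative example.

SYNONYMS = {
    "nexavar": "sorafenib",
    "nexavaar": "sorafenib",   # misspelling of the kind seen in raw data
}

def normalize(name):
    key = name.strip().lower()
    return SYNONYMS.get(key, key)

raw = ["Sorafenib", "Nexavar", "nexavaar", "  nexavar "]
print({normalize(n) for n in raw})   # {'sorafenib'}
```

Collapsing all four raw strings to one canonical name is exactly the integration step the note asks for, and it is the kind of cleaning that consumes so much of a data scientist's time.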