MT115 Precision Medicine: Integrating genomics to enable better patient outcomes (Dell EMC World)
The emergence of genomics and real-time screening is helping to transform the practice of medicine as we know it today. New technologies present improved ways to tackle health issues, and what was once thought to be “untouchable” due to cost, timing, or resources is now achievable through genetic screening and genome sequencing.
During this session, we will explore:
1. The benefits of incorporating a genomics strategy early in the patient lifeline
2. The Precision Medicine Initiative – how does it help? Does it encourage more people to get genetic screenings?
3. What’s involved in a genetic screening
From Bits to Bedside: Translating Big Data into Precision Medicine and Digital Health (Dexter Hadley)
Lecture Objectives:
1. To use examples from my research to define and introduce the ideals of precision medicine and digital health.
2. To introduce how large-scale, population-wide analysis of data can be used to facilitate these two ideals.
3. To introduce how freely available open data can be used to facilitate these two ideals.
4. To show how mobile technology can be used to facilitate these two ideals.
FDA NGS and Big Data Conference, September 2014 (Warren Kibbe)
Presentation for the FDA NGS and Big Data Conference, September 2014, held on the NIH campus. Covers NCI initiatives, including the Cancer Genomics Data Commons, the NCI Cloud Pilots, and big data issues for cancer.
UCSF Informatics Day 2014 - Keith R. Yamamoto, "Precision Medicine" (CTSI at UCSF)
Keith R. Yamamoto, PhD — Opening Remarks – Precision Medicine
Vice Chancellor for Research
Executive Vice Dean of the School of Medicine
Professor of Cellular and Molecular Pharmacology
UCSF
The reality of moving towards precision medicine (Elia Stupka)
How do we move towards precision medicine? How can we deliver on the promise of big data in health? Who will be the enablers and players? Pharma, Big Tech, or newcomers?
Neil Bennett: Introduction to Action Duchenne & Building a Patient Registry (Joe Ball)
A presentation by Neil Bennett regarding Action Duchenne and building a patient registry, delivered at Sano Genetics' Demystifying Genomics for Patient Registries Event on 11th July 2019.
Jillian Hastings Ward: Genomics England Towards 5 Million Genomes in the UK (Joe Ball)
A presentation by Jillian Hastings Ward regarding Genomics England's progress on sequencing 5 million genomes in the UK, delivered at Sano Genetics' Demystifying Genomics for Patient Registries Event on 11th July 2019.
An overview of the oncology clinical trials network (CTNeT) being implemented throughout Texas.
The non-profit network is the first of its kind, combining the innovative science of Texas cancer centers with the expertise and resources of both academic and community oncologists throughout the state.
To learn more, visit www.ctnet.org
Leveraging Text Classification Strategies for Clinical and Public Health Applications (Karin Verspoor)
Human-generated text is a critical component of recorded clinical data, yet remains an under-utilised resource in clinical informatics applications due to minimal standards for sharing of unstructured data as well as concerns about patient privacy. Where we can access and analyse clinical text, we find that it provides a hugely valuable resource. In this talk, I will describe two projects where we have used text classification as the basis for addressing a clinical objective: (1) a syndromic surveillance project where the task is the monitoring of health and social media data sources for changes that indicate the onset of disease outbreaks, and (2) the analysis of hospital records to enable retrieval of specific disease cases, for monitoring of the hospital case mix as well as for construction of patient cohorts for clinical research studies. I will end by briefly discussing the huge potential for clinical text analysis to support changing the way modern medicine is practised.
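To make the idea of text classification for clinical retrieval concrete, here is a minimal, self-contained sketch of a bag-of-words multinomial Naive Bayes classifier. This is a generic illustration, not the method used in the projects above; the toy "records" and labels are invented.

```python
# Minimal bag-of-words multinomial Naive Bayes classifier, illustrating the
# kind of text classification pipeline described above. All training "records"
# and labels below are invented for illustration.
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Returns per-class word counts, class counts, vocab."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    """Most probable label under multinomial NB with add-one smoothing."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    ("cough fever pneumonia chest infiltrate", "respiratory"),
    ("fever cough influenza outbreak", "respiratory"),
    ("fracture fall wrist xray", "injury"),
    ("fall hip fracture elderly", "injury"),
]
model = train(docs)
print(predict("patient with cough and fever", *model))  # respiratory
print(predict("wrist fracture after a fall", *model))   # injury
```

Real clinical pipelines replace the whitespace tokenizer and toy corpus with clinical tokenization, feature weighting, and far larger labeled datasets, but the core train/score structure is the same.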
Drivers of data sharing in precision oncology, biomedical research, and healthcare: accelerating discovery and innovation, and providing credit for all stakeholders - patients, researchers, care providers, and payers.
Day 2 Big Data panel at the NIH BD2K All Hands 2016 meeting (Warren Kibbe)
Big data in oncology and implications for open data, open science, rapid innovation, data reuse, reproducibility, and data sharing. Covers the Cancer Moonshot, the Precision Medicine Initiative (PMI), the Genomic Data Commons, the NCI Cloud Pilots, the NCI-DOE Pilots, and the Cancer Research Data Ecosystem.
Using real-world evidence to investigate clinical research questions (Karin Verspoor)
Adoption of electronic health records to document extensive clinical information brings with it the opportunity to utilise that information to support clinical research, and ultimately to support clinical decision making. In this talk, I discuss both these opportunities and the challenges that we face when working with real-world clinical data, and introduce some of the strategies that we are adopting to make this data more usable, and to extract more value from it. I specifically discuss the use of natural language processing to transform clinical documentation into structured data for this purpose.
National Cancer Data Ecosystem and Data Sharing (Warren Kibbe)
Grand Rounds at the Siteman Cancer Center at Washington University, highlighting the Genomic Data Commons and the National Cancer Data Ecosystem defined by the Cancer Moonshot Blue Ribbon Panel.
Dr. Dennis Wang discusses possible ways to make ML methods more powerful for discovery and to reduce ambiguity within translational medicine, allowing data-informed decision-making to deliver the next generation of diagnostics and therapeutics to patients more quickly, at lower cost, and at scale.
The talk by Dr. Dennis Wang was followed by a panel discussion with Mr. Albert Wang, M. Eng., Head, IT Business Partner, Translational Research & Technologies, Bristol-Myers Squibb.
Checking in on Healthcare Data Analytics (Cybera Inc.)
Data science and the use of big data in healthcare delivery could revolutionize the field by decreasing costs and vastly improving efficiency and outcomes. There is an abundance of healthcare data in Canada, but it is mostly siloed and difficult to access due to privacy and security challenges.
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy, and direct recordings with electrodes. Each of these measurement modes has its advantages and disadvantages concerning the resolution of the data in space and time, the directness of measurement of the neural activity, and which organisms it can be applied to. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project and lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
Bayesian Additive Regression Trees (BART) has been shown to be an effective framework for modeling nonlinear regression functions, with strong predictive performance in a variety of contexts. The BART prior over a regression function is defined by independent prior distributions on tree structure and leaf or end-node parameters. In observational data settings, Bayesian Causal Forests (BCF) has successfully adapted BART for estimating heterogeneous treatment effects, particularly in cases where standard methods yield biased estimates due to strong confounding.
We introduce BART with Targeted Smoothing, an extension which induces smoothness over a single covariate by replacing independent Gaussian leaf priors with smooth functions. We then introduce a new version of the Bayesian Causal Forest prior, which incorporates targeted smoothing for modeling heterogeneous treatment effects which vary smoothly over a target covariate. We demonstrate the utility of this approach by applying our model to a timely women's health and policy problem: comparing two dosing regimens for an early medical abortion protocol, where the outcome of interest is the probability of a successful early medical abortion procedure at varying gestational ages, conditional on patient covariates. We discuss the benefits of this approach in other women’s health and obstetrics modeling problems where gestational age is a typical covariate.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
The world of health care is full of policy interventions: a state expands eligibility rules for its Medicaid program, a medical society changes its recommendations for screening frequency, a hospital implements a new care coordination program. After a policy change, we often want to know, “Did it work?” This is a causal question; we want to know whether the policy CAUSED outcomes to change. One popular way of estimating causal effects of policy interventions is a difference-in-differences study. In this controlled pre-post design, we measure the change in outcomes of people who are exposed to the new policy, comparing average outcomes before and after the policy is implemented. We contrast that change to the change over the same time period in people who were not exposed to the new policy. The differential change in the treated group’s outcomes, compared to the change in the comparison group’s outcomes, may be interpreted as the causal effect of the policy. To do so, we must assume that the comparison group’s outcome change is a good proxy for the treated group’s (counterfactual) outcome change in the absence of the policy. This conceptual simplicity and wide applicability in policy settings makes difference-in-differences an appealing study design. However, the apparent simplicity belies a thicket of conceptual, causal, and statistical complexity. In this talk, I will introduce the fundamentals of difference-in-differences studies and discuss recent innovations including key assumptions and ways to assess their plausibility, estimation, inference, and robustness checks.
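The "differential change" calculation described above reduces to simple arithmetic on four group means. Below is a minimal sketch; the Medicaid scenario and every number in it are invented for illustration.

```python
# Toy difference-in-differences calculation: the effect estimate is the change
# in the treated group's mean outcome minus the change in the comparison
# group's mean outcome. All numbers below are hypothetical.

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences point estimate from four group means."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Example: a state expands Medicaid eligibility. Outcome: share of adults
# with a usual source of care (invented numbers).
treated_pre, treated_post = 0.60, 0.72   # expansion state, before/after
control_pre, control_post = 0.58, 0.63   # non-expansion state, before/after

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(round(effect, 2))  # 0.07: estimated effect, valid only under parallel trends
```

In practice the estimate comes from a regression with standard errors, not raw means, but the identifying subtraction is exactly this one.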
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component for precision medicine and personalized health care. Specific topics covered in this talk include several recent projects with robust and flexible methods developed for the above research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs. However, ACWL seems more robust against tree-type misspecification than T-RL when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method called ST-RL for evaluating optimal DTRs.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented information about how causal inference is being used to determine the cost/benefit of the two most common surgical treatments for women - hysterectomy and myomectomy.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tool that enables the use of observational data to learn how clinicians balance competing priorities for treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus on estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
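One plausible reading of the partition-the-controls idea is that each control group yields its own DID estimate, and the pair brackets the effect. The sketch below illustrates that reading only; the actual bounds estimator in the talk may differ, and the group definitions and turnout numbers are invented.

```python
# Illustrative sketch of DID bracketing bounds: partition the controls into
# two groups, compute a DID estimate against each, and report the interval.
# This is one simplified reading of the approach; all numbers are hypothetical.

def did(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences point estimate from four group means."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Treated state turnout before/after a voter ID law (toy numbers)
treated = (0.55, 0.57)

# Two control partitions, e.g. faster-trending vs. slower-trending states
control_a = (0.54, 0.58)   # controls whose turnout is rising quickly
control_b = (0.53, 0.54)   # controls whose turnout is rising slowly

est_a = did(*treated, *control_a)
est_b = did(*treated, *control_b)
lower, upper = min(est_a, est_b), max(est_a, est_b)
print((round(lower, 2), round(upper, 2)))  # interval bracketing the effect
```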
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
We discuss a general roadmap for generating causal inference based on observational studies used to generate real-world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for the construction of asymptotically efficient plug-in estimators of a target estimand for realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure that first involves using ensemble machine learning, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates, and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric-model-based estimators) to build a single most powerful ensemble machine learning algorithm. We present the Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second step, TMLE involves maximizing a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data. This second step bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also review recent advances in collaborative TMLE, in which the fit of the treatment and censoring mechanisms is tailored with respect to the performance of the TMLE. We also discuss asymptotically valid bootstrap-based inference. Simulations and data analyses are provided as demonstrations.
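As a concrete anchor for the two-step procedure described above, the canonical TMLE targeting step for the average treatment effect (a standard textbook instance, not necessarily the exact estimand of this talk) can be written as:

```latex
% Canonical TMLE targeting step for the average treatment effect.
% \hat{Q} is the super-learner fit of E[Y \mid A, W]; \hat{g} is the estimated
% treatment mechanism P(A = 1 \mid W); \hat{\varepsilon} is fit by a logistic
% regression of Y on H with offset \operatorname{logit}\hat{Q}.
\operatorname{logit} Q^{*}(A, W)
  = \operatorname{logit} \hat{Q}(A, W) + \hat{\varepsilon}\, H(A, W),
\qquad
H(A, W) = \frac{A}{\hat{g}(W)} - \frac{1 - A}{1 - \hat{g}(W)},

% Plug-in estimate of the average treatment effect from the updated fit:
\hat{\psi} = \frac{1}{n} \sum_{i=1}^{n}
  \left[ Q^{*}(1, W_i) - Q^{*}(0, W_i) \right]
```

The fluctuation along the clever covariate H is what makes the plug-in estimate solve the efficient influence curve equation, which is the source of the confidence intervals and p-values mentioned above.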
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when treatment effect size and treatment effect variation is small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This seminar discussed ways in which to produce professional academic writing, from academic papers to research proposals or technical writing in general.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most notable technologies of recent years. The first is the foundation of artificial intelligence and big data; the second has significantly disrupted the financial industry. Both technologies are data-driven, so there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. In this paper, we review research on combining blockchain and machine learning technologies and demonstrate that they can collaborate efficiently and effectively. We conclude by pointing out some future directions and expect more research on deeper integration of the two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
PMED: APPM Workshop: Data & Analytics in Precision Oncology - Warren Kibbe, March 14, 2019
1. Data and Analytics in Precision Oncology
Warren A. Kibbe, Ph.D.
Professor, Biostatistics & Bioinformatics
Chief Data Officer, Duke Cancer Institute
warren.kibbe@duke.edu
@wakibbe
#PredictiveModeling #ComputationalPhenomics #PrecisionOncology
3. Fundamental Changes
• Data generation is not the bottleneck
• Most data are now ‘digital first’
• Old statistical models assuming variable independence are inadequate – systems and pathways are not independent!
• Project management is critical in scaling population science
Well-defined experiments are still key
4. Changes in Oncology
• Understanding Cancer Biology
• Anatomic vs molecular classification
• Health vs Disease
10. Big Data Scientist Training Enhancement Program (BD-STEP)
Graduates of BD-STEP would:
• have skillsets to perform next-generation patient-centered outcomes research by manipulating and analyzing large-scale, multi-element patient data sets to develop novel disease signatures or unique performance-based clinical benchmarks
• have an understanding of real-time, performance-driven health care delivery in the VA system
Frank Meng, VA; Michelle Berny-Lang, NCI
11. Mining the VA Corporate Data Warehouse
• From 130 clinical sites covering about 9 million current veterans, and 16 million since VistA was put in place in 1990
Work performed by David Winski, PhD
https://www.hsrd.research.va.gov/for_researchers/cyber_seminars/archives/2376-notes.pdf
12. Understanding NSCLC
• What is the impact of new immunotherapies on the outcomes of NSCLC patients in the VA?
• Does mutational tumor burden impact effectiveness?
• Is PD-L1 expression predictive of response to immunotherapies?
13. Mining the VA Corporate Data Warehouse
Transforming the National Department of Veterans Affairs Data Warehouse to the OMOP Common Data Model
Fern FitzHenry; Jesse Brannen; Jason Denton; Jonathan R. Nebeker; Scott L. Duvall; Freneka F. Minter; Jeffrey Scehnet; Brian Sauer; Lucila Ohno-Machado; Michael E. Matheny
14. Cancer Registry Tables (“Raw Onc Tables”)
- A set of two T-SQL tables comprising a “Patient” table and a “Cancer” table
- When a VA patient is diagnosed with cancer, cancer registrars enter a patient record in the Patient table and a cancer record in the Cancer table
- Tables structured along North American Association of Central Cancer Registries (NAACCR) guidelines
- The Patient table contains >100 fields holding patient identifiers, patient demographic data, and patient military service data
- The Cancer table contains >500 fields, including date of diagnosis, diagnosis codes, tumor location, tumor histology, and diagnosis-related medications/procedures
Work performed by David Winski, PhD
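The two-table layout described on this slide can be sketched with SQLite for illustration. The real tables are T-SQL with hundreds of NAACCR-aligned fields; the handful of column names and rows below are hypothetical simplifications.

```python
# Sketch of the two-table "Raw Onc" layout: one Patient record per registered
# patient, one Cancer record per diagnosis, joined on patient_id. SQLite is
# used for portability; column names and data are invented simplifications.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE Patient (
    patient_id     INTEGER PRIMARY KEY,
    birth_year     INTEGER,
    service_branch TEXT)""")
cur.execute("""CREATE TABLE Cancer (
    cancer_id      INTEGER PRIMARY KEY,
    patient_id     INTEGER REFERENCES Patient(patient_id),
    diagnosis_date TEXT,
    icd_code       TEXT,
    histology      TEXT)""")

cur.execute("INSERT INTO Patient VALUES (1, 1948, 'Army')")
cur.execute("INSERT INTO Cancer VALUES (10, 1, '2016-03-01', 'C34.1', 'adenocarcinoma')")
cur.execute("INSERT INTO Cancer VALUES (11, 1, '2018-07-15', 'C61', 'adenocarcinoma')")

# Typical use: join Patient to Cancer to pull a diagnosis-level view.
rows = cur.execute("""
    SELECT p.patient_id, p.service_branch, c.diagnosis_date, c.icd_code
    FROM Patient p JOIN Cancer c ON c.patient_id = p.patient_id
    ORDER BY c.diagnosis_date""").fetchall()
print(rows)
```

The one-to-many Patient/Cancer relationship is the key point: a veteran with two primaries appears once in Patient and twice in Cancer, as in the join output above.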
15. Identify patients receiving immunotherapy
Work performed by David Winski, PhD
16. Building a Tumor-Sequenced Non-Small Cell Lung Cancer (NSCLC) Cohort
1. Begin with all patients in the Precision Oncology Program (i.e., tumor profiled by NGS) with an associated NSCLC diagnosis (n=2057)
2. Filter to the subset of these patients who received chemo or immuno drugs through the VA (n=1457)
3. Filter to those patients whose first date of immunotherapy treatment was prior to April 2018, to allow enough time for survival analysis (n=383)
4. Filter to those patients who had an NSCLC diagnosis corroborated in the Cancer Registry (n=330)
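The four filtering steps above can be sketched as a simple funnel over patient records. The field names and the four toy patients below are invented; the real filters run as queries against the VA Corporate Data Warehouse.

```python
# Sketch of the four-step NSCLC cohort construction as a filtering funnel.
# Field names and records are hypothetical illustrations of the real criteria.
from datetime import date

patients = [
    {"id": 1, "ngs_profiled": True, "nsclc_dx": True, "chemo_immuno_va": True,
     "first_immuno": date(2017, 6, 1), "registry_confirmed": True},
    {"id": 2, "ngs_profiled": True, "nsclc_dx": True, "chemo_immuno_va": True,
     "first_immuno": date(2018, 9, 1), "registry_confirmed": True},   # immuno too late
    {"id": 3, "ngs_profiled": True, "nsclc_dx": True, "chemo_immuno_va": False,
     "first_immuno": None, "registry_confirmed": True},               # no VA drug orders
    {"id": 4, "ngs_profiled": True, "nsclc_dx": True, "chemo_immuno_va": True,
     "first_immuno": date(2017, 1, 1), "registry_confirmed": False},  # not in registry
]

cutoff = date(2018, 4, 1)
step1 = [p for p in patients if p["ngs_profiled"] and p["nsclc_dx"]]   # POP + NSCLC dx
step2 = [p for p in step1 if p["chemo_immuno_va"]]                     # VA chemo/immuno
step3 = [p for p in step2 if p["first_immuno"] and p["first_immuno"] < cutoff]
step4 = [p for p in step3 if p["registry_confirmed"]]                  # registry check

print([len(s) for s in (step1, step2, step3, step4)])  # funnel sizes: [4, 3, 2, 1]
```

Each list comprehension mirrors one numbered filter; in the actual study the corresponding counts were 2057, 1457, 383, and 330.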
17. Lag in Cancer Registry Records
Work performed by David Winski, PhD
18. Lag in Cancer Registry is a Reporting Lag
Work performed by David Winski, PhD
Number of visits vs cancer diagnosis in the ‘Raw Onc’ tables
20. Immunotherapy Drugs of Interest
- Four drugs of interest: Pembrolizumab, Nivolumab, Atezolizumab, and Durvalumab
[Chart: # of Orders at VA]
21. NSCLC cohort counts
- NSCLC POP Dx with tumor profiled by NGS: 2057 patients
- Chemo/immuno drug orders at VA: 1472
- Immuno prior to April 2018: 383
- NSCLC verified in Cancer Registry: 330
22. PD-L1 expression and Nivolumab in NSCLC
• We also examined PD-L1 testing and the impact of high-expressing tumors on outcomes
– Inconclusive, because many patients were treated in the second-line setting, where PD-L1 testing is optional.
23. • Retrospective mining still requires good questions and adequate power
• Even given the size of the VA, the ability to build a well-powered cohort with good data is difficult