This document provides an overview of using various R libraries and SQL queries to connect to and extract data from an OMOP Common Data Model database for the purposes of data visualization and analysis. It discusses connecting to the database using RODBC, querying table names, column names and other metadata, and exploring the data types and primary keys of different tables. It also covers refining the research question, defining the unit of observation, calculating prevalence, and exploring ICD codes for major depressive disorder.
1. Big Data Management (HINF5018.03)
A Lecture for Data Visualization
using Open Database Connectivity
- library(RODBC) & library(jsonlite) & library(ggplot2)
Min-hyung Kim, M.D., M.S.
Fellow in Healthcare Policy and Research
Division of Health Informatics
Joan & Sanford I. Weill Medical College of Cornell University
April 3, 2018
2. [CONTENTS]
• library(RODBC)
• Establishing a connection to a database
• Query table names, column (variable) names, data types, primary keys
• Refine the research question
• Unit of observation (!)
• e.g.) period prevalence (proportion) by gender
• library(ggplot2)
• Grammar of graphics
• Syntax of library(ggplot2)
• library(jsonlite)
3. Acknowledgements
• The CMS 2008-2010 Data Entrepreneurs' Synthetic Public Use File (DE-SynPUF) is provided by
U.S. Centers for Medicare & Medicaid Services.
• https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF.html
• The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) Version 5 is provided by the Observational Health Data Sciences and Informatics (OHDSI) community.
• https://github.com/OHDSI/CommonDataModel
• The 1000 person data set from the CMS 2008-2010 Data Entrepreneurs' Synthetic Public Use File (DE-SynPUF), converted to the OMOP Common Data Model Version 5 format, was provided by LTS Computing LLC.
• http://www.ltscomputingllc.com/downloads/
• The 1000 person CMS_SynPUF_CDMv5 was loaded into MS SQL server in the Weill Cornell
Architecture for Research Computing in Health (ARCH) by Evan Scholle, in the Information
Technologies & Services, Weill Cornell Medicine.
• The 1000 person CMS_SynPUF_CDMv5 transformed into JSON format was provided by Dr. Yiye
Zhang in the Division of Health Informatics, Department of Healthcare Policy and Research,
Weill Medical College of Cornell University.
4. R markdown notebook available
@ Github.com/mkim0710
• R markdown notebook
• https://github.com/mkim0710/tidystat/blob/master/Rmd/MK%20Lecture)%20OMOP_SYNTHETIC%20library(RODBC)%20library(ggplot2)%20library(jsonlite)%20180403.Rmd
• Shortened URL) goo.gl/ssa1d2
• Printed html file
• https://rawgit.com/mkim0710/tidystat/master/Rmd/MK%20Lecture)%20OMOP_SYNTHETIC%20library(RODBC)%20library(ggplot2)%20library(jsonlite)%20180403.nb.html
• Shortened URL) goo.gl/dnKf6z
• Slide file
• https://rawgit.com/mkim0710/tidystat/master/Rmd/MK%20Lecture)%20OMOP_SYNTHETIC%20library(RODBC)%20library(ggplot2)%20library(jsonlite)%20180403.pdf
• Shortened URL) goo.gl/7wZYmo
• Please "Fork" and send a "Pull request" if you have any suggestions on how to improve the code!
6. # An appropriate ODBC driver should be installed on your computer first!
for (packagename in c("tidyverse", "RODBC", "maps", "jsonlite")) {
  if (!require(packagename, character.only = TRUE)) {
    install.packages(packagename)
    library(packagename, character.only = TRUE)  # load the package after a fresh install
  }
}
# package:maps conflicts with some other packages, so load it only when you
# need it, and detach it when you are done with it.
detach("package:maps", unload = TRUE)

# You should be within the WCMC network to connect to the VITS-ARCHSQLP04 server.
# If you are not in the WCMC network, you may have to set up an MS SQL server
# using the 1000 person data set in CDMv5 format provided by LTS Computing LLC.
# (If you set up some other database software, you may have to change the SQL
# code because the syntax may differ slightly.)
# Or, you may just skip the library(RODBC) part and go to the library(jsonlite) part.
Necessary R libraries
8. Why use ODBC connection?
• Your local computer hardware may not be able to load the entire data set you would like to analyze.
• Exporting the data to disk in CSV format and then re-loading it into statistical software can be a time-consuming and error-prone process.
• The computing power of the database server is usually higher than that of your local computer.
• The database server may employ efficient indexing algorithms that outperform your statistical software for certain queries.
9. # An appropriate ODBC driver should be installed on your computer first!
library(RODBC)
channel = odbcDriverConnect("Driver={SQL Server}; server=VITS-ARCHSQLP04; database=OMOP_SYNTHETIC")
Establishing a connection to a database
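Once the channel is open, the metadata queries listed in the contents (table names, column names, data types, primary keys) can be issued through RODBC's helper functions. This is a minimal sketch using the server and database names from this lecture; the `person` table is one of the standard OMOP CDM v5 tables, but adjust names to your own setup.

```r
library(RODBC)
channel = odbcDriverConnect("Driver={SQL Server}; server=VITS-ARCHSQLP04; database=OMOP_SYNTHETIC")

sqlTables(channel, schema = "dbo")                  # table names and metadata
sqlColumns(channel, "person")                       # column (variable) names and data types
sqlPrimaryKeys(channel, "person")                   # primary key(s) of the table
sqlQuery(channel, "select top 10 * from person")    # preview a few rows

odbcClose(channel)                                  # close the connection when done
```

Each helper returns an ordinary data frame, so the results can be inspected or filtered with the usual R tools.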
17. Refine the research question
Unit of observation (!)
• E.g.) Which gender has more healthcare visits with depression diagnosis in our
dataset?
18. Refine the research question
Unit of observation (!)
• E.g.) Which gender has more healthcare visits with depression diagnosis in our
dataset?
• Among the individuals (people) who had one or more healthcare visit(s) with a depression diagnosis in our dataset, which gender is more prevalent?
• unit of observation = person
• population for inference = population of people
• (person-level statistics, individual-level statistics)
• Among the healthcare visits with depression diagnosis in our dataset, which gender
composes the larger proportion of the visits?
• unit of observation = encounter
• population for inference = population of encounters?
• (encounter-level statistics? record-level statistics?)
19. Prevalence
• Point Prevalence
• Point prevalence is the proportion of a population that has the condition at a specific point
in time.
• Period Prevalence
• Period prevalence is the proportion of a population that has the condition at some time
during a given period (e.g., 12 month prevalence), and includes people who already have
the condition at the start of the study period as well as those who acquire it during that
period.
• Lifetime Prevalence
• Lifetime prevalence is the proportion of a population that at some point in their life (up to
the time of assessment) have experienced the condition.
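The three definitions above can be illustrated on a toy data set. A minimal base-R sketch (the people and onset dates below are made up for illustration, and "has the condition" is simplified to "onset on or before the reference date"):

```r
# Hypothetical records: 6 people, with the date (if any) on which the
# condition was first recorded for each of them.
onset <- data.frame(
  person_id  = 1:6,
  onset_date = as.Date(c("2009-03-01", "2009-11-15", NA,
                         "2008-06-01", NA, "2010-05-20"))
)
n <- nrow(onset)

# Point prevalence on 2009-07-01: condition present on that specific day
# (simplistically, onset on or before that day).
point <- sum(!is.na(onset$onset_date) &
             onset$onset_date <= as.Date("2009-07-01")) / n

# 12-month period prevalence for 2009: condition present at some time during
# 2009, including people who already had it at the start of the period.
period <- sum(!is.na(onset$onset_date) &
              onset$onset_date <= as.Date("2009-12-31")) / n

# Lifetime prevalence: condition experienced at any point up to assessment.
lifetime <- mean(!is.na(onset$onset_date))
```

Note how the three proportions differ (2/6, 3/6, and 4/6 here): each later definition includes everyone counted by the earlier one.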
20. International Classification of Diseases (ICD)
• Disease (& related health problem) classification system maintained by the World
Health Organization (WHO), the directing and coordinating authority for health
within the United Nations (UN).
• The Ninth Revision of the International Statistical Classification of Diseases,
Injuries, and Causes of Death (ICD-9): since 1978
• ~ 17,000 codes
• The Tenth revision of the International Statistical Classification of Diseases and
Related Health Problems (ICD-10): since 1992
• ~ 160,000 codes
• In the U.S., ICD-10 codes became effective in October 2015 (prior to that, ICD-9 was still in use)
• ICD-11 in development: initially planned for 2017, but pushed back
21. ICD-9 codes for Major Depressive Disorder
• Non-specific code 296 Episodic mood disorders
• Non-specific code 296.2 Major depressive disorder, single episode
• Specific code 296.20 Major depressive affective disorder, single episode, unspecified
• Specific code 296.21 Major depressive affective disorder, single episode, mild
• Specific code 296.22 Major depressive affective disorder, single episode, moderate
• Specific code 296.23 Major depressive affective disorder, single episode, severe, without mention of psychotic behavior
• Specific code 296.24 Major depressive affective disorder, single episode, severe, specified as with psychotic behavior
• Specific code 296.25 Major depressive affective disorder, single episode, in partial or unspecified remission
• Specific code 296.26 Major depressive affective disorder, single episode, in full remission
• Non-specific code 296.3 Major depressive disorder, recurrent episode
• Specific code 296.30 Major depressive affective disorder, recurrent episode, unspecified
• Specific code 296.31 Major depressive affective disorder, recurrent episode, mild
• Specific code 296.32 Major depressive affective disorder, recurrent episode, moderate
• Specific code 296.33 Major depressive affective disorder, recurrent episode, severe, without mention of psychotic behavior
• Specific code 296.34 Major depressive affective disorder, recurrent episode, severe, specified as with psychotic behavior
• Specific code 296.35 Major depressive affective disorder, recurrent episode, in partial or unspecified remission
• Specific code 296.36 Major depressive affective disorder, recurrent episode, in full remission
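The queries later in the deck select these MDD records with the SQL patterns CONDITION_SOURCE_VALUE like '2962%' or like '2963%' (codes stored as digits only, without the decimal point). The same prefix match can be sketched in R with grepl; the example codes below are made up:

```r
# Hypothetical vector of ICD-9 source codes as stored in the database
# (digits only, no decimal point, as the '2962%' pattern assumes).
codes <- c("29620", "29633", "2962", "30000", "296", "29690")

# SQL "like '2962%' or like '2963%'" corresponds to a prefix match on
# "2962" or "2963", i.e. the regex ^296[23]:
is_mdd <- grepl("^296[23]", codes)
codes[is_mdd]  # "29690" begins with 2969, so it is correctly excluded
```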
35. ggplot2
• ggplot2 is based on the grammar of graphics: the idea that you can build
every graph from the same components: a data set, a coordinate system, and
geoms (visual marks that represent data points).
• To display values, map variables in the data to visual properties of the
geom (aesthetics) like size, color, and x and y locations.
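As a minimal illustration of those components (the data frame here is made up; requires the ggplot2 package):

```r
library(ggplot2)

# A toy data set: counts by year and gender.
d <- data.frame(
  year   = factor(c(2008, 2008, 2009, 2009)),
  gender = c("F", "M", "F", "M"),
  n      = c(40, 30, 55, 35)
)

# data + coordinate system (default Cartesian) + a geom, with variables
# mapped to aesthetics: x, y, and fill.
p <- ggplot(d, aes(x = year, y = n, fill = gender)) +
  geom_col(position = "dodge")
p  # printing the object draws the plot
```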
37. Period Prevalence, by Year, by Gender
Independent variables = group_by(VISIT_START_YEAR, GENDER_SOURCE_VALUE)
Dependent variable = n_distinct(PERSON_ID)
38. How to convert a text data type into a datetime type in MS SQL Server
• https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql
• -- Syntax for CONVERT:
• CONVERT ( data_type [ ( length ) ] , expression [ , style ] )
Selected date/time style values:
Standard                                 Style (yy / yyyy)   Input/Output format
Default for datetime and smalldatetime   0 / 100             mon dd yyyy hh:miAM (or PM)
U.S.                                     1 / 101             mm/dd/yy, mm/dd/yyyy
ANSI                                     2 / 102             yy.mm.dd, yyyy.mm.dd
British/French                           3 / 103             dd/mm/yy, dd/mm/yyyy
German                                   4 / 104             dd.mm.yy, dd.mm.yyyy
Italian                                  5 / 105             dd-mm-yy, dd-mm-yyyy
-                                        6 / 106             dd mon yy, dd mon yyyy
-                                        7 / 107             Mon dd, yy / Mon dd, yyyy
USA                                      10 / 110            mm-dd-yy, mm-dd-yyyy
JAPAN                                    11 / 111            yy/mm/dd, yyyy/mm/dd
ISO                                      12 / 112            yymmdd, yyyymmdd
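The queries below use style 112 because CONDITION_START_DATE is stored as text in yyyymmdd form. The server-side CONVERT(datetime, CONDITION_START_DATE, 112) has a simple R-side analogue if you ever pull the raw strings into R instead (the sample strings below are made up):

```r
# Hypothetical yyyymmdd strings, the format that CONVERT style 112 assumes.
raw <- c("20080601", "20091115")

# Parse to Date, then extract the year and month that the SQL
# year() and month() functions return on the server side.
d      <- as.Date(raw, format = "%Y%m%d")
years  <- as.integer(format(d, "%Y"))
months <- as.integer(format(d, "%m"))
```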
39. Prevalence, by Year, by Gender
Independent variables = group_by(VISIT_START_YEAR, GENDER_SOURCE_VALUE)
Dependent variable = n_distinct(PERSON_ID)
#@ total & MDD & pMDD (PERSON_ID_n_distinct, 1-year period prevalence/proportion) ------
tblTotal = sqlQuery(channel, paste("
select year(convert(datetime, CONDITION_START_DATE, 112)) as CONDITION_START_year
, GENDER_SOURCE_VALUE
, count(*) as nrow
, count(distinct CONDITION_OCCURRENCE_ID) as n_distinct_CONDITION_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID) as n_distinct_VISIT_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.PERSON_ID) as n_distinct_PERSON_ID
from (
CONDITION_OCCURRENCE
left join VISIT_OCCURRENCE
on CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID = VISIT_OCCURRENCE.VISIT_OCCURRENCE_ID
left join PERSON
on VISIT_OCCURRENCE.PERSON_ID = PERSON.PERSON_ID
)
group by year(convert(datetime, CONDITION_START_DATE, 112)), GENDER_SOURCE_VALUE
order by year(convert(datetime, CONDITION_START_DATE, 112)), GENDER_SOURCE_VALUE
;"))
tblMDD = sqlQuery(channel, paste("
select year(convert(datetime, CONDITION_START_DATE, 112)) as CONDITION_START_year
, GENDER_SOURCE_VALUE
, count(*) as nrow
, count(distinct CONDITION_OCCURRENCE_ID) as n_distinct_CONDITION_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID) as n_distinct_VISIT_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.PERSON_ID) as n_distinct_PERSON_ID
from (
CONDITION_OCCURRENCE
left join VISIT_OCCURRENCE
on CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID = VISIT_OCCURRENCE.VISIT_OCCURRENCE_ID
left join PERSON
on VISIT_OCCURRENCE.PERSON_ID = PERSON.PERSON_ID
)
where (CONDITION_SOURCE_VALUE like '2962%'
or CONDITION_SOURCE_VALUE like '2963%'
)
group by year(convert(datetime, CONDITION_START_DATE, 112)), GENDER_SOURCE_VALUE
order by year(convert(datetime, CONDITION_START_DATE, 112)), GENDER_SOURCE_VALUE
;"))
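tblTotal and tblMDD feed a proportion table (tblMDDp) later in the workflow. The division step can be sketched on toy inputs with base R; the column names follow the queries above, but the values are made up:

```r
# Toy stand-ins for the two query results above, one row per
# year-by-gender cell.
tblTotal <- data.frame(
  CONDITION_START_year = c(2008, 2009),
  GENDER_SOURCE_VALUE  = c("F", "F"),
  n_distinct_PERSON_ID = c(400, 500)
)
tblMDD <- data.frame(
  CONDITION_START_year = c(2008, 2009),
  GENDER_SOURCE_VALUE  = c("F", "F"),
  n_distinct_PERSON_ID = c(40, 60)
)

# Assuming both tables are aligned on the same year/gender rows (which the
# matching group by / order by clauses are meant to guarantee), the 1-year
# period prevalence proportion is an element-wise division:
tblMDDp <- tblMDD
tblMDDp$n_distinct_PERSON_ID <-
  tblMDD$n_distinct_PERSON_ID / tblTotal$n_distinct_PERSON_ID
```

In practice rows can fail to align when a cell has MDD=0 (no row in tblMDD); the full_join step shown later in the deck is one way to guard against that.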
41. Time series, Prevalence
Independent variables = group_by(VISIT_START_YEAR, VISIT_START_MONTH,
GENDER_SOURCE_VALUE)
Dependent variable = n_distinct(PERSON_ID)
g = tblMDDp %>% ggplot(aes(x = as.factor(CONDITION_START_year), y
= n_distinct_PERSON_ID, group = GENDER_SOURCE_VALUE, color =
GENDER_SOURCE_VALUE, fill = GENDER_SOURCE_VALUE))
g+geom_area()
g+geom_area(position = "stack")
g+geom_area(position = "fill")
g+geom_col()
g+geom_col(position = "stack")
g+geom_col(position = "dodge")
g+geom_col(position = "fill")
42. Charts: g+geom_col(position = "dodge") and g+geom_area(position = "stack"),
drawn with ggplot(aes(x = as.factor(CONDITION_START_month), y = n_distinct_PERSON_ID,
group = GENDER_SOURCE_VALUE, color = GENDER_SOURCE_VALUE,
fill = GENDER_SOURCE_VALUE)) + facet_grid(CONDITION_START_year~.)
43. Period Prevalence, by Month, by Gender
Independent variables = group_by(VISIT_START_YEAR, VISIT_START_MONTH,
GENDER_SOURCE_VALUE)
Dependent variable = n_distinct(PERSON_ID)
44. Prevalence, by Month, by Gender
Independent variables = group_by(VISIT_START_YEAR, VISIT_START_MONTH,
GENDER_SOURCE_VALUE)
Dependent variable = n_distinct(PERSON_ID)
#@ total & MDD & pMDD (n_distinct_PERSON_ID, 1-month period prevalence/proportion) ------
system.time((
tblTotal = sqlQuery(channel, paste("
select year(convert(datetime, CONDITION_START_DATE, 112)) as CONDITION_START_year
, month(convert(datetime, CONDITION_START_DATE, 112)) as CONDITION_START_month
, GENDER_SOURCE_VALUE
, count(*) as nrow
, count(distinct CONDITION_OCCURRENCE_ID) as n_distinct_CONDITION_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID) as n_distinct_VISIT_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.PERSON_ID) as n_distinct_PERSON_ID
from (
CONDITION_OCCURRENCE
left join VISIT_OCCURRENCE
on CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID = VISIT_OCCURRENCE.VISIT_OCCURRENCE_ID
left join PERSON
on VISIT_OCCURRENCE.PERSON_ID = PERSON.PERSON_ID
)
group by year(convert(datetime, CONDITION_START_DATE, 112)), month(convert(datetime, CONDITION_START_DATE, 112)), GENDER_SOURCE_VALUE
order by year(convert(datetime, CONDITION_START_DATE, 112)), month(convert(datetime, CONDITION_START_DATE, 112)), GENDER_SOURCE_VALUE
;"))
))
system.time((
tblMDD = sqlQuery(channel, paste("
select year(convert(datetime, CONDITION_START_DATE, 112)) as CONDITION_START_year
, month(convert(datetime, CONDITION_START_DATE, 112)) as CONDITION_START_month
, GENDER_SOURCE_VALUE
, count(*) as nrow
, count(distinct CONDITION_OCCURRENCE_ID) as n_distinct_CONDITION_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID) as n_distinct_VISIT_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.PERSON_ID) as n_distinct_PERSON_ID
from (
CONDITION_OCCURRENCE
left join VISIT_OCCURRENCE
on CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID = VISIT_OCCURRENCE.VISIT_OCCURRENCE_ID
left join PERSON
on VISIT_OCCURRENCE.PERSON_ID = PERSON.PERSON_ID
)
where (CONDITION_SOURCE_VALUE like '2962%'
or CONDITION_SOURCE_VALUE like '2963%'
)
group by year(convert(datetime, CONDITION_START_DATE, 112)), month(convert(datetime, CONDITION_START_DATE, 112)), GENDER_SOURCE_VALUE
order by year(convert(datetime, CONDITION_START_DATE, 112)), month(convert(datetime, CONDITION_START_DATE, 112)), GENDER_SOURCE_VALUE
;"))
))
46. Time series, Prevalence
Independent variables = group_by(VISIT_START_YEAR, VISIT_START_MONTH,
GENDER_SOURCE_VALUE)
Dependent variable = n_distinct(PERSON_ID)
g = tblMDDp %>%
ggplot(aes(
x = as.factor(CONDITION_START_month)
, y = n_distinct_PERSON_ID
, group = GENDER_SOURCE_VALUE
, color = GENDER_SOURCE_VALUE
, fill = GENDER_SOURCE_VALUE)) +
facet_grid(CONDITION_START_year~.)
g+geom_step()
g+geom_step()+geom_smooth(method = "loess")
g+geom_step()+geom_smooth(method = "lm")
g+geom_line()
g+geom_line()+geom_smooth(method = "loess")
g+geom_line()+geom_smooth(method = "lm")
g+geom_area()
47. Charts: g+geom_line()+geom_smooth(method = "loess") and g+geom_area(),
drawn with ggplot(aes(x = as.factor(CONDITION_START_month), y = n_distinct_PERSON_ID,
group = GENDER_SOURCE_VALUE, color = GENDER_SOURCE_VALUE,
fill = GENDER_SOURCE_VALUE)) + facet_grid(CONDITION_START_year~.)
48. Period Prevalence, by Year, by State
Independent variables = group_by(VISIT_START_YEAR, STATE)
Dependent variable = n_distinct(PERSON_ID)
49. Prevalence, by Year, by State
Independent variables = group_by(VISIT_START_YEAR, STATE)
Dependent variable = n_distinct(PERSON_ID)
#@ total & MDD & pMDD (n_distinct_PERSON_ID, 1-year period prevalence/proportion, by state) ------
system.time((
tblTotal = sqlQuery(channel, paste("
select year(convert(datetime, CONDITION_START_DATE, 112)) as CONDITION_START_year
, STATE
, count(*) as nrow
, count(distinct CONDITION_OCCURRENCE_ID) as n_distinct_CONDITION_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID) as n_distinct_VISIT_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.PERSON_ID) as n_distinct_PERSON_ID
from (
CONDITION_OCCURRENCE
left join VISIT_OCCURRENCE
on CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID = VISIT_OCCURRENCE.VISIT_OCCURRENCE_ID
left join PERSON
on VISIT_OCCURRENCE.PERSON_ID = PERSON.PERSON_ID
left join LOCATION
on PERSON.LOCATION_ID = LOCATION.LOCATION_ID
)
group by year(convert(datetime, CONDITION_START_DATE, 112)), STATE
order by year(convert(datetime, CONDITION_START_DATE, 112)), STATE
;")) %>% as_tibble()
))
system.time((
tblMDD = sqlQuery(channel, paste("
select year(convert(datetime, CONDITION_START_DATE, 112)) as CONDITION_START_year
, STATE
, count(*) as nrow
, count(distinct CONDITION_OCCURRENCE_ID) as n_distinct_CONDITION_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID) as n_distinct_VISIT_OCCURRENCE_ID
, count(distinct CONDITION_OCCURRENCE.PERSON_ID) as n_distinct_PERSON_ID
from (
CONDITION_OCCURRENCE
left join VISIT_OCCURRENCE
on CONDITION_OCCURRENCE.VISIT_OCCURRENCE_ID = VISIT_OCCURRENCE.VISIT_OCCURRENCE_ID
left join PERSON
on VISIT_OCCURRENCE.PERSON_ID = PERSON.PERSON_ID
left join LOCATION
on PERSON.LOCATION_ID = LOCATION.LOCATION_ID
)
where (CONDITION_SOURCE_VALUE like '2962%'
or CONDITION_SOURCE_VALUE like '2963%'
)
group by year(convert(datetime, CONDITION_START_DATE, 112)), STATE
order by year(convert(datetime, CONDITION_START_DATE, 112)), STATE
;")) %>% as_tibble()
))
50. Prevalence, by Year, by State
Independent variables = group_by(VISIT_START_YEAR, STATE)
Dependent variable = n_distinct(PERSON_ID)
tblTotal
tblMDD = full_join(select(tblTotal, -nrow, -n_distinct_CONDITION_OCCURRENCE_ID, -n_distinct_VISIT_OCCURRENCE_ID, -n_distinct_PERSON_ID), tblMDD)
tblMDD
tblMDDp = tblMDD
tblMDDp[, c("nrow", "n_distinct_CONDITION_OCCURRENCE_ID",
            "n_distinct_VISIT_OCCURRENCE_ID", "n_distinct_PERSON_ID")] =
  tblMDD[, c("nrow", "n_distinct_CONDITION_OCCURRENCE_ID",
             "n_distinct_VISIT_OCCURRENCE_ID", "n_distinct_PERSON_ID")] /
  tblTotal[, c("nrow", "n_distinct_CONDITION_OCCURRENCE_ID",
               "n_distinct_VISIT_OCCURRENCE_ID", "n_distinct_PERSON_ID")]
tblMDDp
tblMDDp %>% full_join(., tibble(state.abb, state.name, state.name.tolower = tolower(state.name)), by = c("STATE" = "state.abb"))
# > tblMDDp %>% full_join(., tibble(state.abb, state.name, state.name.tolower = tolower(state.name)), by = c("STATE" = "state.abb"))
# # A tibble: 156 x 8
# CONDITION_START_year STATE nrow n_distinct_CONDITION_OCCURRENCE_ID n_distinct_VISIT_OCCURRENCE_ID n_distinct_PERSON_ID state.name state.name.tolower
# <int> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
# 1 2008 "" NA NA NA NA NA NA
# 2 2008 AK NA NA NA NA Alaska alaska
# 3 2008 AL NA NA NA NA Alabama alabama
# 4 2008 AR 0.00110 0.00110 0.00412 0.0833 Arkansas arkansas
# 5 2008 AZ 0.00696 0.00696 0.00495 0.125 Arizona arizona
# 6 2008 CA 0.00764 0.00764 0.0124 0.111 California california
# 7 2008 CO 0.00177 0.00177 0.00629 0.111 Colorado colorado
# 8 2008 CT NA NA NA NA Connecticut connecticut
# 9 2008 DC NA NA NA NA NA NA
# 10 2008 DE NA NA NA NA Delaware delaware
# # ... with 146 more rows
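The state.name.tolower column produced above lines up with the region column of maps::map_data("state"), which is presumably why the maps package was installed (and then detached) at the start. A hedged sketch of a prevalence choropleth for one year, using a made-up stand-in for the joined table (requires ggplot2 and maps):

```r
library(ggplot2)
library(maps)  # load only when needed, detach when done, as noted earlier

# Toy stand-in for one year of the joined prevalence table above.
prev <- data.frame(
  state.name.tolower   = c("arkansas", "arizona", "california", "colorado"),
  n_distinct_PERSON_ID = c(0.0833, 0.125, 0.111, 0.111)
)

states <- map_data("state")  # polygon outlines, keyed by `region`

# match() keeps the polygon row order intact; merge() would re-sort the
# rows and scramble the polygon drawing order.
states$prevalence <-
  prev$n_distinct_PERSON_ID[match(states$region, prev$state.name.tolower)]

p <- ggplot(states,
            aes(x = long, y = lat, group = group, fill = prevalence)) +
  geom_polygon(color = "grey70") +
  coord_quickmap()
p  # states with no data are drawn in the NA fill color
```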