Automatic Review in Medicine
By: Vincent Adhi Handara
Text Mining Data Analysis Project in R
INTRODUCTION
The medical literature is enormous. PubMed, a database of medical publications maintained by the U.S. National
Library of Medicine, has indexed over 23 million medical publications. Further, the rate of medical publication has
increased over time, and now there are nearly 1 million new publications in the field each year, or more than one
per minute.
The large size and fast-changing nature of the medical literature have increased the need for reviews, which search
databases like PubMed for papers on a particular topic and then report results from the papers found. While such
reviews are often performed manually, with multiple people reviewing each search result, this is tedious and
time-consuming. In this project, we will see how text analytics can be used to automate the process of information
retrieval.
The dataset consists of 1861 rows and 3 columns. The first and second columns are title and abstract,
respectively, while the third column (variable trial) indicates whether the paper is a clinical trial testing a drug
therapy for cancer. This trial label was obtained by two people reviewing each search result and accessing the
actual paper if necessary, as part of a literature review of clinical trials testing drug therapies for advanced and
metastatic breast cancer.
Example of Clinical Research Paper
Title: Neoadjuvant vinorelbine-capecitabine versus docetaxel-doxorubicin-cyclophosphamide in early nonresponsive breast
cancer: phase III randomized GeparTrio trial.
Abstract: BACKGROUND: Among breast cancer patients, nonresponse to initial neoadjuvant chemotherapy is associated with
unfavorable outcome. We compared the response of nonresponding patients who continued the same treatment with that of
patients who switched to a well-tolerated non-cross-resistant regimen. METHODS: Previously untreated breast cancer
patients received two 3-week cycles of docetaxel at 75 mg/m(2), doxorubicin at 50 mg/m(2), and cyclophosphamide at 500
mg/m(2) per day (TAC). Patients whose tumors did not decrease in size by at least 50% were randomly assigned to four
additional cycles of TAC or to four cycles of vinorelbine at 25 mg/m(2) and capecitabine at 2000 mg/m(2) (NX). The outcome
was sonographic response, defined as a reduction in the product of the two largest perpendicular diameters by at least 50%. A
difference of 10% or less in the sonographic response qualified as noninferiority of the NX treatment. Pathological complete
response was defined as no invasive or in situ residual tumor masses in the breast and lymph nodes. Toxic effects were
assessed. All statistical tests were two-sided. RESULTS: Of 2090 patients enrolled in the GeparTrio study, 622 (29.8%) who did
not respond to two initial cycles of TAC were randomly assigned to an additional four cycles of TAC (n = 321) or to four cycles
of NX (n = 301). Sonographic response rate was 50.5% for the TAC arm and 51.2% for the NX arm. The difference of 0.7% (95%
confidence interval = -7.1% to 8.5%) demonstrated noninferiority of NX (P = .008). Similar numbers of patients in both arms
received breast-conserving surgery (184 [57.3%] in the TAC arm vs 180 [59.8%] in the NX arm) and had a pathological
complete response (5.3% vs 6.0%). Fewer patients in the NX arm than in the TAC arm had hematologic toxic effects, mucositis,
infections, and nail changes, but more had hand-foot syndrome and sensory neuropathy. CONCLUSION: Pathological complete
responses to both regimens were marginal. Among patients who did not respond to the initial neoadjuvant TAC treatment,
similar efficacy but better tolerability was observed by switching to NX than continuing with TAC.
Example of Non-Clinical Research Paper
Title: Long-term endometrial effects in postmenopausal women with early breast cancer participating in the Intergroup
Exemestane Study (IES)--a randomised controlled trial of exemestane versus continued tamoxifen after 2-3 years tamoxifen.
Abstract: BACKGROUND: The antiestrogen tamoxifen may have partial estrogen-like effects on the postmenopausal uterus.
Aromatase inhibitors (AIs) are increasingly used after initial tamoxifen in the adjuvant treatment of postmenopausal early
breast cancer due to their mechanism of action: a potential benefit being a reduction of uterine abnormalities caused by
tamoxifen. PATIENTS AND METHODS: Sonographic uterine effects of the steroidal AI exemestane were studied in 219 women
participating in the Intergroup Exemestane Study: a large trial in postmenopausal women with estrogen receptor-positive (or
unknown) early breast cancer, disease free after 2-3 years of tamoxifen, randomly assigned to continue tamoxifen or switch to
exemestane to complete 5 years adjuvant treatment. The primary end point was the proportion of patients with abnormal (>
or =5 mm) endometrial thickness (ET) on transvaginal ultrasound 24 months after randomisation. RESULTS: The analysis
included 183 patients. Two years after randomisation, the proportion of patients with abnormal ET was significantly lower in
the exemestane compared with tamoxifen arm (36% versus 62%, respectively; P = 0.004). This difference emerged within 6
months of switching treatment (43.5% versus 65.2%, respectively; P = 0.01) and disappeared within 12 months of treatment
completion (30.8% versus 34.7%, respectively; P = 0.67). CONCLUSION: Switching from tamoxifen to exemestane significantly
reverses endometrial thickening associated with continued tamoxifen.
OBJECTIVES:
What are some unique keywords for categorizing clinical and non-clinical research papers?
METHODOLOGIES:
1. Data Import
2. Bag of Corpus and Cleaning
3. Bi-Word Creation
4. Separating into Clinical and Non-Clinical Words
5. Converting into Data Frame
6. Data Visualization
Step 1: Data Import
setwd("C:/Data Science/Datasets")
data <- read.csv("clinical_trial.csv", stringsAsFactors = FALSE)
data$trial <- as.factor(data$trial)
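A quick check after the import can confirm that the file has the shape described in the introduction. The sketch below is only a stand-in: the real clinical_trial.csv is not reproduced here, so it writes a tiny hypothetical CSV (made-up rows) to a temporary file, reads it back with the same calls as Step 1, and checks the column names and class counts.

```r
# Hypothetical three-row stand-in for clinical_trial.csv (made-up rows).
toy <- data.frame(
  title    = c("Trial of drug A", "Survey of symptoms", "Trial of drug B"),
  abstract = c("patients received drug a", "women reported fatigue",
               "patients received drug b"),
  trial    = c(1, 0, 1),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Same import calls as Step 1, pointed at the temporary file.
data <- read.csv(path, stringsAsFactors = FALSE)
data$trial <- as.factor(data$trial)

# Sanity checks: expected columns and no missing abstracts.
stopifnot(identical(names(data), c("title", "abstract", "trial")))
stopifnot(!any(is.na(data$abstract)))
print(dim(data))          # rows and columns
print(table(data$trial))  # papers per class
```

On the real file, dim(data) should report 1861 rows and 3 columns.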
Step 2: Bag of Corpus and Cleaning Corpus
library(tm)

# Collapse all abstracts of each class into one long document
clinical_abstract <- paste(subset(data, trial == 1)$abstract, collapse = " ")
nonclinical_abstract <- paste(subset(data, trial == 0)$abstract, collapse = " ")
# Document 1 is clinical, document 2 is non-clinical
all_abstract <- c(clinical_abstract, nonclinical_abstract)
all_abstract_corpus <- VCorpus(VectorSource(all_abstract))

# Remove extra whitespace, punctuation, case, numbers, stop words and section labels
clean_corpus <- function(corpus) {
  corpus <- tm_map(corpus, content_transformer(stripWhitespace))
  corpus <- tm_map(corpus, content_transformer(removePunctuation))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, content_transformer(removeNumbers))
  corpus <- tm_map(corpus, removeWords, c(stopwords("en"), "purpose", "objective",
                                          "objectives", "aim", "aims", "unlabelled",
                                          "introduction", "context", "goals of work"))
  return(corpus)
}
all_abstract_corp <- clean_corpus(all_abstract_corpus)
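To see what the cleaning does to a single sentence, the same tm transformations can be applied directly to a character string. This is a minimal sketch, assuming the tm package is installed; the sentence is a shortened version of one from the example abstract, and the final stripWhitespace call is an extra step added here only to tidy the printed result.

```r
library(tm)

# A shortened sentence based on the example abstract above.
x <- "Patients received docetaxel at 75 mg/m(2) per day."

# Same order of operations as clean_corpus(), applied to a plain string.
x <- stripWhitespace(x)
x <- removePunctuation(x)             # "mg/m(2)" becomes "mgm2"
x <- tolower(x)
x <- removeNumbers(x)                 # "mgm2" becomes "mgm"
x <- removeWords(x, stopwords("en"))  # drops words such as "at"

# Extra step for display only: collapse the gaps left by removed words.
cleaned <- stripWhitespace(x)
print(cleaned)
```

Dose units lose their punctuation and digits in this process, which is presumably how fragments such as mgm end up among the bi-words reported in the summary.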
Step 3: Bi-Word Creation
library(RWeka)

# Tokenize into bi-words (sequences of exactly two words)
tokenizer <- function(x) {
  NGramTokenizer(x, Weka_control(min = 2, max = 2))
}
# Rows are bi-words; column 1 is the clinical document, column 2 the non-clinical one
all_abstract_matrix <- as.matrix(TermDocumentMatrix(all_abstract_corp,
                                                    control = list(tokenize = tokenizer)))
Step 4: Clinical and Non-clinical Separation
# Bi-words that occur only in clinical abstracts
clinical_words <- subset(all_abstract_matrix, all_abstract_matrix[,1] > 0 &
                         all_abstract_matrix[,2] == 0)
# Bi-words that occur only in non-clinical abstracts
nonclinical_words <- subset(all_abstract_matrix, all_abstract_matrix[,1] == 0 &
                            all_abstract_matrix[,2] > 0)
Step 5: Converting into Dataframe for Clinical and Non-Clinical Words
clinical_words <- subset(all_abstract_matrix, all_abstract_matrix[,1] > 0 &
                         all_abstract_matrix[,2] == 0)
nonclinical_words <- subset(all_abstract_matrix, all_abstract_matrix[,1] == 0 &
                            all_abstract_matrix[,2] > 0)
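Steps 3 to 5 can be traced end to end on a toy example. The sketch below is a base-R stand-in, not the slides' code: the two input strings are hypothetical, the bigrams function replaces the Weka tokenizer so that no Java dependency is needed, and to_df is a hypothetical helper, since the data-frame conversion itself is not shown on the slides.

```r
# Toy stand-ins for the two cleaned, collapsed abstracts (hypothetical text).
docs <- c(clinical    = "patients received docetaxel mgm per day response rate",
          nonclinical = "risk factors breast cancer survivors response rate")

# Base-R stand-in for the bigram tokenizer: pair each word with its successor.
bigrams <- function(text) {
  words <- strsplit(text, "\\s+")[[1]]
  paste(head(words, -1), tail(words, -1))
}

tokens <- lapply(docs, bigrams)
terms  <- sort(unique(unlist(tokens)))

# Term-document matrix: one row per bi-word, one column per document.
tdm <- sapply(tokens, function(tok) as.vector(table(factor(tok, levels = terms))))
rownames(tdm) <- terms

# Step 4 logic: keep bi-words that occur in exactly one of the two columns.
clinical_only    <- tdm[tdm[, "clinical"] > 0 & tdm[, "nonclinical"] == 0, , drop = FALSE]
nonclinical_only <- tdm[tdm[, "clinical"] == 0 & tdm[, "nonclinical"] > 0, , drop = FALSE]

# Step 5 logic (assumed): one data frame per class, sorted by frequency.
to_df <- function(m, column) {
  df <- data.frame(term = rownames(m), freq = m[, column],
                   row.names = NULL, stringsAsFactors = FALSE)
  df[order(-df$freq), ]
}
clinical_df    <- to_df(clinical_only, "clinical")
nonclinical_df <- to_df(nonclinical_only, "nonclinical")
print(clinical_df)
print(nonclinical_df)
```

The bi-word "response rate" appears in both toy documents, so it is excluded from both data frames; only class-exclusive bi-words survive the separation.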
Step 6: Data Visualization (using ggplot2)
The plots show the 10 most frequent bi-words for clinical and for non-clinical research papers.
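The plotting code is not included in the slides. As a minimal sketch of one way to draw such a chart with ggplot2, assuming a data frame with columns term and freq (hypothetical names), the example below uses placeholder terms and counts, not results from the dataset.

```r
library(ggplot2)

# Hypothetical frequency table; terms and counts are placeholders, not results.
bi_words <- data.frame(term = paste("bi-word", 1:15),
                       freq = c(3, 15, 7, 1, 12, 9, 4, 14, 2, 11, 6, 13, 5, 10, 8),
                       stringsAsFactors = FALSE)

# Keep the 10 most frequent rows.
top10 <- head(bi_words[order(-bi_words$freq), ], 10)

# Horizontal bar chart with the most frequent bi-word at the top.
p <- ggplot(top10, aes(x = reorder(term, freq), y = freq)) +
  geom_col() +
  coord_flip() +
  labs(x = "Bi-word", y = "Frequency",
       title = "Top 10 bi-words (placeholder data)")
print(p)
```

The same call would be repeated once for the clinical and once for the non-clinical data frame to obtain the two result plots.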
Results (1/2)
Results (2/2)
SUMMARY
Clinical research papers are mostly dominated by measurement-unit
bi-words such as progression months, mg-day, pcr rate, mgm qw,
ttp-months and toxicities.
Non-clinical research papers are mostly dominated by general medical
terminology (rather than measurement units), such as breast
carcinomas, zoledronic acid, symptom distress, response
chemotherapy, risk factors, bone turnover and cancer survivors.

Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 

Recently uploaded (20)

Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 

My Data Analysis Portfolio (Text Mining)

  • 1. Automatic Review in Medicine By: Vincent Adhi Handara Text Mining Data Analysis Project in R
  • 2. INTRODUCTION: The medical literature is enormous. Pubmed, a database of medical publications maintained by the U.S. National Library of Medicine, has indexed over 23 million medical publications. Further, the rate of medical publication has increased over time, and now there are nearly 1 million new publications in the field each year, or more than one per minute. The large size and fast-changing nature of the medical literature have increased the need for reviews, which search databases like Pubmed for papers on a particular topic and then report results from the papers found. While such reviews are often performed manually, with multiple people reviewing each search result, this is tedious and time-consuming. In this problem, we will see how text analytics can be used to automate the process of information retrieval. The dataset consists of 1861 rows and 3 columns. The first and second columns hold the title and abstract, respectively, while the third column (variable trial) indicates whether the paper is a clinical trial testing a drug therapy for cancer. This trial label was obtained by two people reviewing each search result and accessing the actual paper if necessary, as part of a literature review of clinical trials testing drug therapies for advanced and metastatic breast cancer.
  • 3. Example of Clinical Research Paper Title: Neoadjuvant vinorelbine-capecitabine versus docetaxel-doxorubicin-cyclophosphamide in early nonresponsive breast cancer: phase III randomized GeparTrio trial. Abstract: BACKGROUND: Among breast cancer patients, nonresponse to initial neoadjuvant chemotherapy is associated with unfavorable outcome. We compared the response of nonresponding patients who continued the same treatment with that of patients who switched to a well-tolerated non-cross-resistant regimen. METHODS: Previously untreated breast cancer patients received two 3-week cycles of docetaxel at 75 mg/m(2), doxorubicin at 50 mg/m(2), and cyclophosphamide at 500 mg/m(2) per day (TAC). Patients whose tumors did not decrease in size by at least 50% were randomly assigned to four additional cycles of TAC or to four cycles of vinorelbine at 25 mg/m(2) and capecitabine at 2000 mg/m(2) (NX). The outcome was sonographic response, defined as a reduction in the product of the two largest perpendicular diameters by at least 50%. A difference of 10% or less in the sonographic response qualified as noninferiority of the NX treatment. Pathological complete response was defined as no invasive or in situ residual tumor masses in the breast and lymph nodes. Toxic effects were assessed. All statistical tests were two-sided. RESULTS: Of 2090 patients enrolled in the GeparTrio study, 622 (29.8%) who did not respond to two initial cycles of TAC were randomly assigned to an additional four cycles of TAC (n = 321) or to four cycles of NX (n = 301). Sonographic response rate was 50.5% for the TAC arm and 51.2% for the NX arm. The difference of 0.7% (95% confidence interval = -7.1% to 8.5%) demonstrated noninferiority of NX (P = .008). Similar numbers of patients in both arms received breast-conserving surgery (184 [57.3%] in the TAC arm vs 180 [59.8%] in the NX arm) and had a pathological complete response (5.3% vs 6.0%). 
Fewer patients in the NX arm than in the TAC arm had hematologic toxic effects, mucositis, infections, and nail changes, but more had hand-foot syndrome and sensory neuropathy. CONCLUSION: Pathological complete responses to both regimens were marginal. Among patients who did not respond to the initial neoadjuvant TAC treatment, similar efficacy but better tolerability was observed by switching to NX than continuing with TAC.
  • 4. Example of Non-Clinical Research Paper Title: Long-term endometrial effects in postmenopausal women with early breast cancer participating in the Intergroup Exemestane Study (IES)--a randomised controlled trial of exemestane versus continued tamoxifen after 2-3 years tamoxifen. Abstract: BACKGROUND: The antiestrogen tamoxifen may have partial estrogen-like effects on the postmenopausal uterus. Aromatase inhibitors (AIs) are increasingly used after initial tamoxifen in the adjuvant treatment of postmenopausal early breast cancer due to their mechanism of action: a potential benefit being a reduction of uterine abnormalities caused by tamoxifen. PATIENTS AND METHODS: Sonographic uterine effects of the steroidal AI exemestane were studied in 219 women participating in the Intergroup Exemestane Study: a large trial in postmenopausal women with estrogen receptor-positive (or unknown) early breast cancer, disease free after 2-3 years of tamoxifen, randomly assigned to continue tamoxifen or switch to exemestane to complete 5 years adjuvant treatment. The primary end point was the proportion of patients with abnormal (> or =5 mm) endometrial thickness (ET) on transvaginal ultrasound 24 months after randomisation. RESULTS: The analysis included 183 patients. Two years after randomisation, the proportion of patients with abnormal ET was significantly lower in the exemestane compared with tamoxifen arm (36% versus 62%, respectively; P = 0.004). This difference emerged within 6 months of switching treatment (43.5% versus 65.2%, respectively; P = 0.01) and disappeared within 12 months of treatment completion (30.8% versus 34.7%, respectively; P = 0.67). CONCLUSION: Switching from tamoxifen to exemestane significantly reverses endometrial thickening associated with continued tamoxifen.
  • 5. OBJECTIVES: What are some unique keywords for categorizing clinical and non-clinical research papers? METHODOLOGIES: (1) Data Import; (2) Bag of Corpus and Cleaning; (3) Bi-Word Creation; (4) Separating into Clinical and Non-Clinical Words; (5) Converting into Data Frame; (6) Data Visualization
  • 6. Step 1: Data Import
        library(tm)     # VCorpus, tm_map, stopwords, TermDocumentMatrix
        library(RWeka)  # NGramTokenizer, Weka_control
        setwd("C:/Data Science/Datasets")
        data <- read.csv("clinical_trial.csv", stringsAsFactors = FALSE)
        data$trial <- as.factor(data$trial)
    Step 2: Bag of Corpus and Cleaning Corpus
        # Collapse the abstracts of each class into a single document per class
        clinical_abstract <- paste(subset(data, trial == 1)$abstract, collapse = " ")
        nonclinical_abstract <- paste(subset(data, trial == 0)$abstract, collapse = " ")
        # Document 1 = clinical, document 2 = non-clinical
        all_abstract <- c(clinical_abstract, nonclinical_abstract)
        all_abstract_corpus <- VCorpus(VectorSource(all_abstract))
        clean_corpus <- function(corpus) {
          corpus <- tm_map(corpus, content_transformer(stripWhitespace))
          corpus <- tm_map(corpus, content_transformer(removePunctuation))
          corpus <- tm_map(corpus, content_transformer(tolower))
          corpus <- tm_map(corpus, content_transformer(removeNumbers))
          # Drop English stop words plus common abstract section labels
          corpus <- tm_map(corpus, removeWords, c(stopwords("en"), "purpose", "objective", "objectives", "aim", "aims", "unlabelled", "introduction", "context", "goals of work"))
          return(corpus)
        }
        all_abstract_corp <- clean_corpus(all_abstract_corpus)
    Step 3: Bi-Word Creation
        # Tokenize into two-word sequences (bi-words) instead of single words
        tokenizer <- function(x) { NGramTokenizer(x, Weka_control(min = 2, max = 2)) }
        # Rows are bi-words; column 1 holds clinical counts, column 2 non-clinical counts
        all_abstract_matrix <- as.matrix(TermDocumentMatrix(all_abstract_corp, control = list(tokenize = tokenizer)))
    Step 4: Clinical and Non-clinical Separation
        # Keep only the bi-words that occur in exactly one of the two classes
        clinical_words <- subset(all_abstract_matrix, all_abstract_matrix[,1] > 0 & all_abstract_matrix[,2] == 0)
        nonclinical_words <- subset(all_abstract_matrix, all_abstract_matrix[,1] == 0 & all_abstract_matrix[,2] > 0)
    Step 5: Converting into Dataframe for Clinical and Non-Clinical Words
        clinical_words <- subset(all_abstract_matrix, all_abstract_matrix[,1] > 0 & all_abstract_matrix[,2] == 0)
        nonclinical_words <- subset(all_abstract_matrix, all_abstract_matrix[,1] == 0 & all_abstract_matrix[,2] > 0)
    Step 6: Data Visualization (using ggplot2): the top 10 most frequent bi-words for clinical and for non-clinical research papers
  • 9. SUMMARY: Clinical research papers are dominated by measurement-unit bi-words such as progression months, mg-day, pcr rate, mgm qw, ttp-months, and toxicities. Non-clinical research papers are dominated by general medical terminology rather than measurement units, for example: breast carcinomas, zoledronic acid, symptom distress, response chemotherapy, risk factors, bone turnover, cancer survivors.
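
The slides give no code for the data frame conversion or the plot (Steps 5 and 6), so the following is only a minimal sketch of how those steps might look, not the author's actual implementation. It runs on a small made-up matrix (`toy_matrix`) that stands in for `all_abstract_matrix`; the helper `exclusive_terms` and all counts are hypothetical. The plot is built only if ggplot2 happens to be installed.

```r
# Toy stand-in for all_abstract_matrix: rows are bi-words,
# column 1 = clinical counts, column 2 = non-clinical counts.
# All counts are invented for illustration.
toy_matrix <- matrix(
  c(5, 0,
    3, 0,
    0, 4,
    0, 7,
    2, 2),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("pcr rate", "ttp months", "risk factors", "bone turnover", "breast cancer"),
    c("clinical", "nonclinical")
  )
)

# Hypothetical helper: keep bi-words that occur only in `keep_col`,
# and return the n most frequent ones as a data frame.
exclusive_terms <- function(m, keep_col, drop_col, n = 10) {
  rows <- m[, keep_col] > 0 & m[, drop_col] == 0
  # setNames keeps the bi-word labels even when a single row matches
  freq <- setNames(m[rows, keep_col], rownames(m)[rows])
  freq <- head(sort(freq, decreasing = TRUE), n)
  data.frame(term = names(freq), freq = as.numeric(freq),
             stringsAsFactors = FALSE, row.names = NULL)
}

clinical_df    <- exclusive_terms(toy_matrix, "clinical", "nonclinical")
nonclinical_df <- exclusive_terms(toy_matrix, "nonclinical", "clinical")
print(clinical_df)
print(nonclinical_df)

# Optional bar chart of the clinical-only bi-words (requires ggplot2).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(clinical_df,
                       ggplot2::aes(x = reorder(term, freq), y = freq)) +
    ggplot2::geom_col() +
    ggplot2::coord_flip() +
    ggplot2::labs(x = "Bi-word", y = "Frequency",
                  title = "Bi-words found only in clinical-trial abstracts")
  # print(p) renders the chart
}
```

With the real data, passing `all_abstract_matrix` with column indices 1 and 2 in place of the toy matrix and its column names should give the two top-10 tables that the summary refers to. A bi-word such as "breast cancer", which appears in both classes, is excluded from both tables by construction.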