Using machine learning to improve the user experience in online health care communities

Dr. Anja Pilz
June 25, 2018

Overview
1. Introduction
2. Content Based Recommendations
Latent Dirichlet Allocation
3. User Based Recommendations
Association Rule Learning
4. Ensemble Model
5. Conclusion & Outlook

About DocCheck
Online medical community for health care professionals
• seek information in the medicine wiki Flexikon
• read the bi-weekly newsletter
• share and discuss medical images and videos
• buy medical products and supplies in the online shop
• exchange with peers: seek help or discuss cases

Motivation
Diverse user groups with different intentions and interests
• a student might want to learn anatomical topics in a particular order
• a nephrologist has a different focus of interest than a cardiologist
• a pharmacist might prefer reading pharma-related news
Long term goal
• find most relevant and interesting assets for each group to
enable targeted mailing & feed personalization

DocCheck Recommendation Engine
Provide related content for every asset
• Flexikon articles, pictures, videos,
shop products, and news
Diverse data types
• how is a text/video/picture/shop
product relevant?
Hybrid Model: content & user driven
• thematic relevance from text
• user preference from click journeys
Ensemble of two ML techniques
• Latent Dirichlet Allocation
• Association Rule Learning

Content Based Recommendations
Why?
• Cold start problem: want to propose related content also for new
assets without observed interactions
How?
• Represent textual content of asset in a Bag-of-Words (BoW)
model
• Find relevant assets using some similarity function (clustering)
But!
• Curse of Dimensionality: high-dimensional BoW vectors "all look
the same" at some point
• BoW model can't handle synonymy or polysemy
• vectors for Mammakarzinom and Brustkrebs (both German for breast cancer) have no similarity
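
The synonymy problem can be illustrated with a minimal sketch. The toy documents, the whitespace tokenizer, and the choice of cosine similarity are assumptions made for illustration only; the slides do not say which similarity function was used.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector: lowercase token -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical two-word "documents"
doc_a = bow("Mammakarzinom Therapie")
doc_b = bow("Brustkrebs Behandlung")   # same meaning as doc_a, no shared token
doc_c = bow("Mammakarzinom Diagnose")  # shares one token with doc_a

sim_synonyms = cosine(doc_a, doc_b)
sim_shared = cosine(doc_a, doc_c)
```

Because the two synonymous documents share no surface token, their BoW similarity is zero, while a document that merely reuses one word scores higher.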

Latent Dirichlet Allocation (LDA)
• LDA is a Bayesian probabilistic approach to topic modeling
• allows for low-dimensional, continuous representation of
documents
Generative model
• assumes a fixed number K of underlying (latent) topics in a
document collection
• each document is a mixture of topics, generated by first picking a
distribution over the latent topics
• given this mixture, a topic is chosen for each word position and,
given that topic, the word itself is generated
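
The generative story can be sketched in a few lines. This is an illustration under assumptions, not the trained model: the two topics and their word probabilities are invented, and the topic-word distributions are held fixed here, whereas full LDA also draws them from a Dirichlet prior.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_words, alpha, length, rng):
    """Generate one document following the LDA generative story."""
    theta = sample_dirichlet(alpha, rng)  # the document's topic mixture
    topics = list(topic_words)
    words = []
    for _ in range(length):
        # pick a topic for this word position according to the mixture
        topic = rng.choices(topics, weights=theta)[0]
        vocab = list(topic_words[topic])
        probs = list(topic_words[topic].values())
        # pick a word according to the chosen topic's word distribution
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Hypothetical topics with invented word probabilities
TOPIC_WORDS = {
    "heart": {"herz": 0.5, "ekg": 0.3, "kardiologie": 0.2},
    "eyes": {"auge": 0.5, "netzhaut": 0.3, "linse": 0.2},
}
rng = random.Random(0)
theta, words = generate_document(TOPIC_WORDS, alpha=[0.5, 0.5], length=8, rng=rng)
```

Training runs this story in reverse: given only the words, it estimates the topic mixtures and topic-word distributions that most plausibly produced them.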

Basic Idea of LDA
• you know stuff about 20 topics and want to write some text
• you decide on some of the topics you want to write about (bit of
sports, bit of politics)
• you need words to express yourself that are related to these
topics, e.g. a round object associated with sports
• you pick one, for instance "ball", and write it down

Example: Topics from Flexikon & News
Topics generated by LDA are clusters of words that often co-occur
• drugs: arzneimittel, medikament, präparat, apotheker, tablette, arzt, einnahme, verordnung, compliance
• enzymes: enzym, biochemie, aktivität, hemmung, substrat, reaktion, inhibitor, spaltung, protease
• eyes: auge, augenheilkunde, netzhaut, cornea, hornhaut, linse, glaukom, retina, iris
• heart: herz, kardiologie, herzinsuffizienz, ekg, kardiomyopathie, herzmuskelzelle, herzfrequenz
• vaccine: impfung, impfstoff, masern, immunisierung, vakzine, schutz, röteln, antikörper, polio
• diet: ernährung, fleisch, nahrungsmittel, gemüse, nahrung, diät, obst, zucker, lebensmittel

Example: Renal Failure
• prominent topics: urea excretion, kidneys (and more...)
• urea excretion: urin, kreatinin, clearance, nierenfunktion, gfr, niere, harnstoff, niereninsuffizienz
• kidneys: niere, dialyse, glomerulonephritis, nephrologie, proteinurie, nierenversagen
• inferred topics: topic probability distribution with peaks at the most
prominent topics, e.g. p(urea excretion) = 0.4, p(kidneys) = 0.3, ...

LDA Workflow
Training
• fetch corpus: content of all Flexikon and News articles
• do some preprocessing
• remove stopwords
• keep only "medical terms" (MeSH), Named Entities, nouns, ...
• feed the documents into MALLET & train the model
• run inference on all documents & store individual topic
distributions per asset
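
The preprocessing step might look roughly like the sketch below. The stopword list and the term whitelist are tiny hypothetical stand-ins (a real setup would use a full German stopword list and a MeSH-derived vocabulary), and the regex tokenizer is an assumption.

```python
import re

# Hypothetical miniature resources, for illustration only
STOPWORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "bei"}
MEDICAL_TERMS = {"niere", "dialyse", "kreatinin", "harnstoff"}

def preprocess(text, stopwords=STOPWORDS, keep=None):
    """Lowercase and tokenize, drop stopwords, optionally keep only whitelisted terms."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    tokens = [t for t in tokens if t not in stopwords]
    if keep is not None:
        tokens = [t for t in tokens if t in keep]
    return tokens

sentence = "Die Niere und die Dialyse bei Patienten"
without_whitelist = preprocess(sentence)
with_whitelist = preprocess(sentence, keep=MEDICAL_TERMS)
```

Restricting the vocabulary this way shrinks the input to the topic model and tends to keep topics focused on domain terms.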

Finding Thematically Related Content
New assets
• first apply the trained model to infer and store the topic distribution
Determine relevant links
• fetch stored distribution
• find similar topic distributions using some similarity measure
• e.g. Kullback-Leibler Divergence of topic distributions
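
A minimal sketch of the lookup, assuming topic distributions are stored as plain lists keyed by asset id. The asset ids and the numbers are hypothetical (the first entry reuses the 0.4 / 0.3 peaks from the renal failure example and lumps the rest into a third component). Note that KL divergence is asymmetric and undefined where the second distribution is zero, so a small smoothing constant is added here as an assumption.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def most_similar(query_id, distributions, top_n=3):
    """Rank all other assets by ascending divergence from the query asset."""
    query = distributions[query_id]
    scored = [(kl_divergence(query, dist), asset_id)
              for asset_id, dist in distributions.items()
              if asset_id != query_id]
    return [asset_id for _, asset_id in sorted(scored)[:top_n]]

# Hypothetical stored distributions over three topics
STORED = {
    "renal-failure": [0.4, 0.3, 0.3],
    "dialysis": [0.35, 0.35, 0.3],
    "glaucoma": [0.05, 0.05, 0.9],
}
ranking = most_similar("renal-failure", STORED)
```

A symmetric alternative such as Jensen-Shannon divergence could be swapped in if the ranking should not depend on which asset is the query.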

Basket Analysis
• given the items in a basket, what other items is someone likely
to buy?
DocCheck
• given the clicks in a session, which other links is a user likely to
click?
Motivation
• clicks give direct feedback: a click on a link can be interpreted as
"this is relevant to me"
• no need to find a relatedness measure for pictures and texts

Identify rules in a database using some measure of confidence
• database is the collection of all user journeys
• each rule X ⇒ Y is composed of two itemsets X and Y
• instead of items in a basket, we use the set of clicks in a session,
i.e. {url1, url2, ...}, to learn rules
• confidence: the proportion of sessions containing X that also
contain Y
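
As a worked example, the standard definition conf(X ⇒ Y) = support(X ∪ Y) / support(X) can be computed directly from sessions. The four sessions below are invented for illustration.

```python
def support(itemset, sessions):
    """Fraction of sessions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for s in sessions if itemset <= s) / len(sessions)

def confidence(x, y, sessions):
    """conf(X => Y) = support(X u Y) / support(X)."""
    support_x = support(x, sessions)
    if support_x == 0:
        return 0.0
    return support(set(x) | set(y), sessions) / support_x

# Hypothetical click sessions
SESSIONS = [
    frozenset({"url1", "url2", "url3"}),
    frozenset({"url1", "url2", "url3"}),
    frozenset({"url1", "url2"}),
    frozenset({"url2", "url4"}),
]
conf = confidence({"url1", "url2"}, {"url3"}, SESSIONS)
```

Three sessions contain both url1 and url2, and two of those also contain url3, so the confidence of the rule is 2/3.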

Association Rule Learning Workﬂow
Training
• split user journeys into sessions and form frequent itemsets
• learn association rules from these itemsets
• store learned rules together with their weight, e.g.
X = {url1, url2}, Y = {url3}, conf(X ⇒ Y) = 0.9
Application
• new assets: ¯\_(ツ)_/¯ (no click data yet, so no rules apply)
• known asset
• fetch all rules containing current asset (URL)
• based on the associated confidence, combine their URLs into set
of recommended links
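
The lookup for a known asset could be sketched as follows. The rule store and its tuple layout are hypothetical, and scoring each URL by the highest confidence of any rule it appears in is just one plausible way to combine rules; the slides do not specify the combination.

```python
# Hypothetical rule store: (antecedent X, consequent Y, confidence)
RULES = [
    ({"url1", "url2"}, {"url3"}, 0.9),
    ({"url1"}, {"url4"}, 0.6),
    ({"url5"}, {"url1"}, 0.4),
    ({"url6"}, {"url7"}, 0.8),
]

def recommend(current_url, rules, top_n=5):
    """Collect URLs from every rule that mentions the current asset."""
    scores = {}
    for antecedent, consequent, conf in rules:
        urls = antecedent | consequent
        if current_url not in urls:
            continue
        for url in urls - {current_url}:
            # assumption: keep the best confidence seen for each URL
            scores[url] = max(scores.get(url, 0.0), conf)
    # highest confidence first, ties broken alphabetically
    return sorted(scores, key=lambda u: (-scores[u], u))[:top_n]

links = recommend("url1", RULES)
```

For an asset that appears in no rule, the function returns an empty list, which is exactly the cold-start gap the content based model is meant to fill.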

Ensemble Model
Why?
• avoid cold-start problem: provide high quality recommendations
both for new and known assets
• prioritize "labeled" data from user sessions
How?
• ask both models for a prediction
• combine the result in a weighted way, give user driven model
(AR) some boost
Reinforcement learning: which model returns better predictions?
• track which predictions are being clicked
• evaluate prevalence & update weights
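
One way to picture the weighted combination and the feedback loop is the sketch below. The initial weights, the scores, and the update rule (an exponential moving average towards each model's share of clicked predictions) are all assumptions; the slides only state that weights are updated based on which predictions get clicked.

```python
def combine(lda_scores, ar_scores, weights, top_n=5):
    """Weighted sum of both models' scores per candidate URL."""
    combined = {}
    for model, scores in (("lda", lda_scores), ("ar", ar_scores)):
        for url, score in scores.items():
            combined[url] = combined.get(url, 0.0) + weights[model] * score
    return sorted(combined, key=lambda u: (-combined[u], u))[:top_n]

def update_weights(weights, clicks, learning_rate=0.1):
    """Move each weight towards the model's share of clicked predictions."""
    total = sum(clicks.values())
    if total == 0:
        return dict(weights)
    updated = {
        model: (1 - learning_rate) * w + learning_rate * clicks.get(model, 0) / total
        for model, w in weights.items()
    }
    norm = sum(updated.values())
    return {model: w / norm for model, w in updated.items()}

# Hypothetical starting point: the user driven model (AR) gets a boost
weights = {"lda": 0.4, "ar": 0.6}
ranked = combine({"a": 0.9, "b": 0.5}, {"b": 0.8, "c": 0.7}, weights)
new_weights = update_weights(weights, {"lda": 10, "ar": 30})
```

A small learning rate keeps the weights from swinging on a few clicks, which matters when one model serves far more traffic than the other.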

Conclusion & Outlook
Combine content based and user generated data
• avoid the cold-start problem through the content based model (LDA)
• adjust to user behavior through click journeys (AR)
• requires initial fine-tuning but little maintenance work
Next steps: enhanced retrieval for related pictures and videos
• if image/video has no description or interaction: ¯\_(ツ)_/¯
• use image or video analysis tools (work in progress...)

Thanks!
