Neural Network Model
[Diagram: fully connected network with an input layer, 2 hidden layers, and an output layer; each edge is labeled with a weight]
• Layers are fully connected
• Each edge contains a weight
• Final answer is output neuron with highest value
• Each function/layer fi is a non-linear function, so the network computes f3(f2(f1(x))) from input x
[Diagram labels: width (neurons per layer), depth (number of layers)]
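To make the picture concrete, here is a minimal NumPy sketch of such a stack of fully connected layers (purely illustrative; the layer sizes, random weights, and ReLU activation are assumptions, not values from the slide):

```python
import numpy as np

def relu(x):
    # A common non-linear activation; the slide only requires non-linearity.
    return np.maximum(0.0, x)

def layer(x, W, b):
    # One fully connected layer: weighted sum of all inputs, then non-linearity.
    return relu(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # input layer (4 values)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # hidden layer 1
W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)   # hidden layer 2
W3, b3 = rng.normal(size=(3, 5)), np.zeros(3)   # output layer (3 neurons)

# f3(f2(f1(x))): depth = number of layers, width = neurons per layer
out = layer(layer(layer(x, W1, b1), W2, b2), W3, b3)
print("answer:", int(np.argmax(out)))           # output neuron with highest value
```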
Neural Network Training and Inference
[Diagram: labeled input is fed forward through the input layer, 2 hidden layers, and the output layer; error = label − prediction; backpropagation adjusts the edge weights]
• Supervised learning
• epoch = 1 fwd+bwd pass over all training data
• mini-batch = 1 fwd+bwd pass over a fraction of the training data
• # iterations per epoch = training size / mini-batch size
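For example, with the 30,000 MNIST training images and mini-batch size of 256 used later in this deck, one epoch takes 30,000 / 256 ≈ 118 iterations.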
[Diagram: training phase feeds labeled input forward and backpropagates error to train the NN model (weights); inference phase feeds unseen input forward through the trained model to produce a prediction]
Bad Autoencoder
[Diagram: input feature vector is mapped by the network to the same feature vector]
• Unsupervised learning
• Reconstruction loss = Σ (output − input)²
• Without a bottleneck the network can simply learn identity(x), so nothing is compressed
Autoencoder
[Diagram: encoder (compressor) maps the input feature vector down to a bottleneck coding; decoder (generator) reconstructs the same feature vector from it]
• Unsupervised learning
• Compresses input
• Learn important features
• NLP’s word2vec is a latent space
• ½-hour sit-coms 😉
• How much compression?
• Auto-generate new sit-coms?
[Diagram labels: “bottleneck” / coding / latent space; encoder f(x), decoder “f⁻¹”(x)]
Autoencoder Learns Handwritten Digits
[Figure: MNIST dataset (sample) and the autoencoder]
• 784 neurons in input layer (=28x28 pixels)
• 256 neurons in hidden layer
• 128 neurons in latent space (middle layer)
• 256 neurons in hidden layer
• 10 neurons in output layer (1 for each digit)
• 30,000 MNIST training images
• Batch size = 256 images
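A minimal sketch of this network (assuming a Keras/TensorFlow implementation, which the slides do not specify; the activations and optimizer are assumptions, and the sketch reconstructs the 784-pixel input at the output rather than using the 10-neuron digit layer listed above, since a reconstruction target needs the same width as the input):

```python
import tensorflow as tf

# Encoder 784 -> 256 -> 128 (latent space), decoder 128 -> 256 -> 784.
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),  # hidden layer
    tf.keras.layers.Dense(128, activation="relu"),     # latent space / coding
    tf.keras.layers.Dense(256, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(784, activation="sigmoid"),  # reconstruction of input
])

# Reconstruction loss = mean of (output - input)^2.
autoencoder.compile(optimizer="adam", loss="mse")

# x_train: 30,000 MNIST images scaled to [0, 1] and flattened to 784 pixels.
# One epoch = 30,000 / 256 ≈ 118 mini-batch iterations.
# autoencoder.fit(x_train, x_train, batch_size=256, epochs=20)
```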
Compressor / Dimensionality Reduction
[Diagram: encoder (compressor) maps the input feature vector to the bottleneck coding]
• Save the compressed version (the coding) instead of the full input
[Diagram labels: “bottleneck” / coding / latent space; encoder]
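Continuing the hypothetical Keras sketch above, only the encoder half is needed to produce the compressed codings (the layer indices are assumptions that match that sketch):

```python
import tensorflow as tf

# Reuse the trained 784 -> 256 and 256 -> 128 layers as a stand-alone encoder.
inputs = tf.keras.Input(shape=(784,))
codings_out = autoencoder.layers[1](autoencoder.layers[0](inputs))
encoder = tf.keras.Model(inputs, codings_out)

# codings = encoder.predict(x_train)   # shape (n_samples, 128): save these
```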
Features for Another Analytic
[Diagram: encoder (compressor) maps the input feature vector to the coding, which feeds another analytic]
• Autoencoder features are input to another analytic
• Classification analytic
• Image analytic
• Whatever
[Diagram labels: latent space / code, encoder, g(x), other features]
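A sketch of the idea (the downstream classifier, column widths, and variable names are all assumptions; the slides only say the codings feed "another analytic"):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
codings = rng.normal(size=(100, 128))        # stand-in for encoder output
other_features = rng.normal(size=(100, 5))   # stand-in for extra features
labels = rng.integers(0, 2, size=100)        # stand-in for class labels

# Combine latent-space codings with any other features, then train a
# simple classification analytic g(x) on the result.
X = np.hstack([codings, other_features])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
predictions = clf.predict(X)
```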
Anomaly Detector
[Diagram: encoder (compressor) maps the input feature vector to a coding; decoder (generator) reconstructs the same feature vector]
• If the reconstruction loss is too big, the input can’t be represented by a coding ==> anomaly
[Diagram labels: “bottleneck” / coding / latent space; encoder, decoder]
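A minimal sketch of that rule, continuing the hypothetical autoencoder above (the threshold choice and the train_losses array are assumptions; the slides only say "too big"):

```python
import numpy as np

# Per-example reconstruction loss = sum of (output - input)^2.
reconstructions = autoencoder.predict(x_new)
loss = np.sum((reconstructions - x_new) ** 2, axis=1)

# Flag inputs whose loss is "too big", e.g. above the 99th percentile of
# losses seen on normal training data (train_losses is hypothetical).
threshold = np.percentile(train_losses, 99)
anomalies = loss > threshold
```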
Autoencoder
• An autoencoder has to
• Compress the input to codings
• Reconstruct the output given ONLY the codings
• Small reconstruction loss ==> input space successfully compressed to just the codings
• Expect that decreasing the coding size increases the reconstruction loss
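One way to probe that trade-off is to sweep the coding size and compare losses; a sketch, assuming a hypothetical build_autoencoder helper that follows the earlier Keras sketch with a variable latent width:

```python
# Smaller codings force more compression, so reconstruction loss should rise.
for coding_size in (256, 128, 64, 32, 16):
    ae = build_autoencoder(latent_dim=coding_size)   # hypothetical helper
    hist = ae.fit(x_train, x_train, batch_size=256, epochs=10, verbose=0)
    print(coding_size, hist.history["loss"][-1])
```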
Impact & business opportunity of a global demographic shift
• US – Estimated assets for this demographic: $8.4 to $11.6 trillion
• China – Estimated “silver hair” market to rise to $17 trillion by 2050, amounting to a third of the Chinese economy
• Japan – Estimated 65+ financial assets: $9.1 trillion
• Rising eldercare costs will disrupt economies: social service costs for the elderly will account for 6% of US GDP and 4 to 8% of EU GDP
[Chart: percentage of population 65 years and older, 2017; countries shown: Japan, Italy, Germany, Ireland, China, Australia, Brazil, US, India, Egypt]
•http://www.icis.com/blogs/chemicals-and-the-economy/2015/03/worlds-demographic-dividend-turns-deficit-populations-age/
•https://www.metlife.com/assets/cao/mmi/publications/studies/2010/mmi-inheritance-wealth-transfer-baby-boomers.pdf
•http://blogs.ft.com/ftdata/2014/02/13/guest-post-adapting-to-the-aging-baby-boomers/
•http://www.marketsandmarkets.com/Market-Reports/healthcare-data-analytics-market-905.html
•http://www.bloomberg.com/bw/articles/2014-09-25/chinas-rapidly-aging-population-drives-652-billion-silver-hair-market
•Asian Journal of Gerontology & Geriatrics for Centenarians: According to the National Institute of Population and Social Security Research, Japan had 67,000 centenarians in 2014, but that number is forecast to reach 110,000 in 2020, 253,000 in 2030 and peak at 703,000 in the year 2051.
ADLs (Activities of Daily Living)
• Activities we normally do; they determine the level of care needed
• Bathing and showering
• Personal hygiene and grooming (including brushing/combing/styling hair)
• Dressing
• Toileting (getting to the toilet, cleaning oneself, and getting back up)
• Eating (self-feeding not including cooking or chewing and swallowing)
• Functional mobility, often referred to as "transferring", as measured by the ability to
walk, get in and out of bed, and get into and out of a chair; the broader definition
(moving from one place to another while performing activities) is useful for people
with different physical abilities who are still able to get around independently.
• We expect to see additional ADLs in our data
• Sleeping, Watching TV, …
https://en.wikipedia.org/wiki/Activities_of_daily_living
Avamere – High Density Sensor Deployment
Instrumenting 20 patient rooms in a Skilled Nursing Facility and 5 Independent Living Apartments
Over 1000 sensors deployed
Autoencoder
[Diagram: encoder (compressor) maps the input feature vector to a coding; decoder (generator) reconstructs the same feature vector]
• output = 3 x 30 features
[Diagram labels: “bottleneck” / coding / latent space; encoder f(x), decoder “f⁻¹”(x)]
Input
• 30 sensors
• 1-minute windows
• sensor fire counts
• 3 adjacent time windows
• 3 x 30 features
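A sketch of how such examples might be assembled (array names and the sliding-window layout are assumptions; the slides only specify 30 sensors, 1-minute fire counts, and 3 adjacent windows):

```python
import numpy as np

def make_examples(counts, n_windows=3):
    """counts: (n_minutes, 30) per-minute fire counts for the 30 sensors."""
    examples = []
    for t in range(len(counts) - n_windows + 1):
        # Concatenate 3 adjacent 1-minute windows -> 3 x 30 = 90 features.
        examples.append(counts[t:t + n_windows].reshape(-1))
    return np.asarray(examples)

# Example: 60 minutes of synthetic counts -> 58 examples of 90 features each.
counts = np.random.default_rng(0).poisson(1.0, size=(60, 30))
print(make_examples(counts).shape)   # (58, 90)
```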
Conclusions
• Tuning
• Time window: 1 minute is good (5 min was too long)
• Alpha (# concurrent ADLs)
• Ideal: small alpha (0.1, 0.01, …)
• But Spark ML’s LDA doesn’t allow alpha < 1.0 (see the configuration sketch after this list)
• Iterations: 100 is good (35 was too few)
• Choose #ADLs up front. 6?, 7?, 10? …
• No ADL looks like “dressing” or “grooming”
• Found non-standard “Watch TV” ADL
• Interpretation
• Must manually characterize sensor sets (ADLs)
• How to transfer learning across apartments (different sensors)?
• Encouraging results, but more work is needed
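For reference, a minimal sketch of the corresponding Spark ML LDA configuration (assuming pyspark.ml; k, the column name, and the input DataFrame are illustrative, since the slides leave the number of ADLs open):

```python
from pyspark.ml.clustering import LDA

# maxIter = 100 per the tuning notes above; k = assumed number of ADLs.
# docConcentration is Spark's alpha; with the EM optimizer it must be > 1.0,
# which is why small alphas (0.1, 0.01) were not usable.
lda = LDA(k=7, maxIter=100, optimizer="em",
          docConcentration=[1.1], featuresCol="features")
model = lda.fit(sensor_windows_df)   # hypothetical Spark DataFrame of sensor-count windows
topics = model.describeTopics()      # inspect which sensors define each candidate "ADL"
```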
One Neuron in a Neural Network
• Neuron (perceptron) computes a weighted sum of its inputs, then applies an activation function: a_j = σ( Σ_k w_kj a_k )
• Activation function
• Differentiable (nearly everywhere)
• Sigmoid: σ(x) = exp(x) / (1 + exp(x))
• Soft-max: softmax(x)_k = exp(x_k) / Σ_j exp(x_j)
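A minimal NumPy sketch of these definitions (illustrative only, not code from the slides):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = exp(x) / (1 + exp(x)), equivalently 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # softmax(x)_k = exp(x_k) / sum_j exp(x_j); shift by max(x) for stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

def neuron(a_in, w):
    # a_j = sigma( sum_k w_kj * a_k ): weighted sum of inputs, then activation
    return sigmoid(np.dot(w, a_in))

print(neuron(np.array([0.5, 0.3]), np.array([0.2, -0.4])))
```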