Deep Learning for Retinal Image Analysis

Introduction
● The purpose of this presentation is to provide a light visual literature review of “big data” or deep learning / artificial intelligence solutions to come for ophthalmology and vision sciences.
– The idea is more to introduce topics that you might not have thought of before, without going too deeply into details.
● Some of the background needed to understand this presentation better is covered in my previous presentation →
● The presentation itself is quite dense, and better suited to being read on a tablet/desktop than as a slideshow projected somewhere.
Shallow introduction for Deep Learning Retinal Image Analysis
Published on Aug 20, 2016
https://www.slideshare.net/PetteriTeikariPhD/shallow-introduction-for-deep-learning-retinal-image-analysis

“Old-school” unimodal model
Image classification for retinal pathologies

Ophthalmic IMAGING 2D Fundus → 3D OCT
Examples of color and high-dynamic-range (HDR) disc
photographs of 2 normal controls (a, b and c, d) and 2
glaucoma patients (e, f and g, h). Left column (a, c, e,
and g) color disc photograph and right column (b, d, f,
and h) high-dynamic-range concept disc photograph.
https://doi.org/10.1155/2017/8209270
Linear-scale adaptive optics (AO)-Optical Coherence Tomography (OCT) volume acquired with three different AO focus depths (RNFL, OPL, and
IS/OS) and combined for displaying appearance of retinal layers in AO-OCT images. En face images are projections of subvolumes shown in
the middle, demonstrating the fine-depth sectioning ability of AO-OCT. (Jonnal et al., 2016)
Optical Coherence Tomography (OCT) and its variants, the de facto standard for eye diagnostics
Multispectral imaging going beyond RGB channels and laser-based OCTs (Figure from Annidis)

Ophthalmic IMAGING (A)SLO and multimodal systems
(2015) https://doi.org/10.1364/BOE.6.001407
(2016) https://doi.org/10.1364/BOE.7.001783
https://doi.org/10.1007/s00417-016-3361-7
Fundus autofluorescence, microperimetry and
hyperreflective intraretinal spots (HRS) analysis using OCT

Ophthalmic IMAGING Functional Imaging
http://dx.doi.org/10.1167/iovs.16-21389
Model of the retinal vasculature represented by a binary tree. The
vessels bifurcate in a dichotomous manner except for the
precapillaries, which are the points of origin of four capillaries. Adapted
from Takahashi et al. (2009)
http://dx.doi.org/10.1111/aos.13365
http://dx.doi.org/10.1080/02713683.2016.1217544
KEYWORDS: Hyperspectral retinal camera, primary open-angle glaucoma, retinal oxygen saturation
The average arteriolar (left) and venular (right) OD values at each given (5-nm) imaged wavelength
from 500 to 600 nm for all of the volunteers.
In summary, this article has described a novel hyperspectral prototype for
spectral imaging of the retina that can potentially be used in the future to
acquire retinal vessel blood oxygen saturation values. By considering the
limitations of ocular imaging encountered by other retinal oximetry studies,
namely longer acquisition and exposure times, flash exposure, and limited
wavelength intervals, this new instrument may be promising in acquiring more
refined and faster measurements of nonflash exposure retinal oximetry
measurements in vivo that can potentially be applied to human retinal vascular
disease.

Ophthalmic IMAGING portable imaging
Human Factor and Usability Testing of a
Binocular OCT System - EASE Study
Reena Chopra1, Padraig J. Mulholland1,2, Adam M. Dubis1, Roger S. Anderson1,2, Pearse A. Keane1
1 NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, United Kingdom; 2 Optometry and Vision Science Research Group, School of Biomedical Sciences, Ulster University, Coleraine, Northern Ireland, United Kingdom
Automated quantitative pupillometry using the Binocular OCT
Purpose: A prototype binocular optical coherence
tomography (OCT) device has recently been developed that
performs ‘whole-eye’ OCT imaging in an automated manner
(Envision Diagnostics, Inc. USA). The inclusion of ‘smart
technology’ such as customizable display screens and voice
recognition also permits the quantitative assessment of
visual acuity (VA), visual fields, ocular motility, and
pupillometry (Fig. 1). As this device will primarily be used in
elderly and visually impaired populations, we performed
prospective usability testing of an early prototype with a
view to predicting function in a clinical setting, and to
identify any potential user errors – EASE Study
(ClinicalTrials.gov Identifier: NCT02822612).
ARVO 2017 Annual Meeting Abstracts
Session 516: Advancements in OCT
Ophthalmologica 2017;238:89-99 https://doi.org/10.1159/000475773
http://dx.doi.org/10.15761/NFO.1000102
Fundus Photography in the 21st Century—A Review of Recent Technological Advances
and Their Implications for Worldwide Healthcare
Panwar Nishtha, Huang Philemon, Lee Jiaying, Keane Pearse A., Chuan Tjin Swee, Richhariya Ashutosh, Teoh
Stephen, Lim Tock Han, and Agrawal Rupesh. Telemedicine and e-Health. March 2016, 22(3): 198-208.
https://doi.org/10.1089/tmj.2015.0068
iCam, 3nethra, CenterVue, iOptics EasyScan, Topcon TRC-NW8FPLUS, Zeiss Visucam 200, Kowa Nonmyd7, Canon CR-2, Oculus Imagecam,
iExaminer, PanOptic, Volk Pictor, VersaCam, JedMed Horus Scope, Optomed Smartscope, Kowa Genesis-D, Riester, Ocular Cellscope, PEEK,
dEye

Retinal Layer Segmentation Pathological retina challenging still
https://arxiv.org/abs/1704.02161
Branch Residual U-Network (BRU-net)
https://doi.org/10.1364/BOE.8.003292
https://doi.org/10.1364/BOE.8.001926
Voxeleron Awarded NIH SBIR
Grant for Device-independent
Retinal OCT Image Analysis
Software
February 8, 2017 Daniel Russakoff
Voxeleron will collaborate with Professor Pablo
Villoslada of UCSF/IDIBAPS and Dr. Pearse Keane of
Moorfields Eye Hospital to validate the algorithms
and ensure clinical utility.
in the choriocapillaris is shown. https://www.voxeleron.com/orion/

Vascular segmentation
http://dx.doi.org/10.1136/bmjophth-2016-000032
https://doi.org/10.1007/978-3-319-59876-5_56
https://doi.org/10.1007/s10916-017-0719-2

Other Retinal segmentation & Detection
Christos Bergeles, Adam M. Dubis, Benjamin Davidson,
Melissa Kasilian, Angelos Kalitzeos, Joseph Carroll, Alfredo
Dubra, Michel Michaelides, and Sebastien Ourselin
Biomedical Optics Express Vol. 8, Issue 6, pp. 3081-3094
(2017) https://doi.org/10.1364/BOE.8.003081
(2017) https://doi.org/10.1109/ISBI.2017.7950704
Suman Sedai, Ruwan Tennakoon, Pallab Roy, Khoa Cao and Rahil Garnavi
IBM Research - Australia, Melbourne, VIC, Australia
localization of the fovea, second stage produces an accurate
segmentation of the fovea region.
We present an algorithm that automatically detects cones in
AOSLO split-detection images without supervision. Our
algorithm is among the first that use machine learning to
develop and use a photoreceptor model on-the-fly. Comparing
to Cunefare et al. (2016), specifically, the approach presented
here can tackle both densely and sparsely populated
photoreceptor images as it is independent of the spatial
arrangement of cones. Further, it introduces contrast
enhancement filters, which improve the quality of low signal-to-
noise ratio (SNR) images.

Optic disc and Cup segmentation or detection
Visual comparison of the predicted results and correct
segmentation on RIM-ONE v.3 for the optic disc (a)-(c), (g)-(i)
and cup (d)-(f), (j)-(l). On (d)-(f), (j)-(l) region of the optic disc is
shown as an input image.
https://doi.org/10.1109/TPAMI.2016.2577031
We propose a simple yet
effective method, termed
Deep Descriptor
Transforming (DDT), for
evaluating the correlations of
descriptors and then
obtaining the category-
consistent regions, which can
accurately locate the common
object in a set of unlabeled
images, i.e., unsupervised
object discovery.

IMAGE CLASSIFICATION #1
July–August, 2017 Volume 1, Issue 4, Pages 322–327
Cecilia S. Lee, MD, Doug M. Baughman, BS, Aaron Y. Lee, MD, MSCI
Department of Ophthalmology, University of Washington School of Medicine,
Seattle, Washington. http://dx.doi.org/10.1016/j.oret.2016.12.009
Examples of identification of pathology by the deep learning algorithm. Optical coherence
tomography images showing age-related macular degeneration (AMD) pathology
(A, B, C) are used as input images, and hotspots (D, E, F) are identified using an occlusion
test from the deep learning algorithm. The intensity of the color is determined by the drop
in the probability of being labeled AMD when occluded.
An occlusion test (Zeiler and Fergus, 2016) was performed to identify the
areas contributing most to the neural network's assigning the category of
AMD. A blank 20 × 20-pixel box was systematically moved across every
possible position in the image and the probabilities were recorded. The
highest drop in the probability represents the region of interest that
contributed the highest importance to the deep learning algorithm.
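The occlusion test described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `predict_proba` stands in for the trained network, `toy_predict` is a made-up classifier, and the `stride` parameter is an assumption added for speed (`stride=1` visits every possible position, as in the text).

```python
import numpy as np

def occlusion_map(image, predict_proba, box=20, stride=20, fill=0.0):
    """Slide a blank box over `image` and record the drop in class probability."""
    h, w = image.shape[:2]
    baseline = predict_proba(image)          # probability for the unoccluded image
    rows = (h - box) // stride + 1
    cols = (w - box) // stride + 1
    drops = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + box, x:x + box] = fill    # blank box at this position
            drops[i, j] = baseline - predict_proba(occluded)
    return drops

def toy_predict(img):
    """Hypothetical classifier: 'AMD probability' is the mean of one fixed region."""
    return float(img[40:60, 40:60].mean())

image = np.ones((100, 100))
heat = occlusion_map(image, toy_predict)
hot = np.unravel_index(np.argmax(heat), heat.shape)   # grid cell with the largest drop
```

The largest entry of `heat` marks the region whose occlusion lowers the probability most, which is what the hotspot overlays visualize.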
Varun Gulshan, PhD1; Lily Peng, MD, PhD1; Marc Coram, PhD1; et al
JAMA. 2016;316(22):2402-2410. doi:10.1001/jama.2016.17216
Validation Set Performance for All-Cause Referable Diabetic
Retinopathy in the EyePACS-1 Data Set (9946 Images) Performance of
the algorithm (black curve) and ophthalmologists (colored circles) for all-cause
referable diabetic retinopathy, defined as moderate or worse diabetic
retinopathy, diabetic macular edema, or ungradable image. The black
diamonds highlight the performance of the algorithm at the high-sensitivity
and high-specificity operating points.

IMAGE CLASSIFICATION #2
Stefanos Apostolopoulos, Carlos Ciller, Sandro I. De Zanet,
Sebastian Wolf, Raphael Sznitman
Ahmed ElTanboly, Marwa Ismail, Ahmed Shalaby, Andy Switala, Ayman El-Baz, Shlomit
Schaal, Georgy Gimel’farb, Magdi El-Azab
First published: 17 March 2017
DOI: 10.1002/mp.12071
https://doi.org/10.1146/annurev-bioeng-071516-044442

IMAGE Quality in image classification
Image Restoration: From Sparse and Low-rank Priors to Deep Priors
Learning Deep CNN Denoiser Prior for Image Restoration
Lei Zhang, Wangmeng Zuo
The Hong Kong Polytechnic University, Harbin Institute of Technology
CLEAN
GAUSSIAN NOISE
GAUSSIAN BLUR
Example performance of quality resilient networks on various quality
distortions. This table shows the class prediction for an image under several
different types of distortions (from top to bottom: clean, Gaussian noise and
Gaussian blur). The original VGG16 network (M_clean) fails on distorted images.
Networks fine-tuned on different types of distortions perform well on that
particular distortion, but not on other distortion types (M_noise and M_blur).
Our mixture of experts based model (M_mix) performs well over all distortion
types as well as the original clean image.
State-of-the-art image classification networks like VGG-16 perform poorly on blurred input (left),
when using model weights trained on high-quality sharp image datasets (center). However, while
they often make erroneous predictions in terms of the most likely classes for a blurred image, they
do so with lower confidence—producing distributions that are higher-entropy than those for sharp
images. However, this drop in performance is largely an artifact of being trained without any
blurred examples. We find that fine-tuning the model on a mix of blurred and sharp images
for just a few epochs allows it to perform well on both sharp and blurred inputs (right).
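The "higher-entropy" observation can be made concrete with the Shannon entropy of the softmax output. The logits below are invented for illustration and do not come from VGG-16.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a probability vector."""
    return float(-np.sum(p * np.log(p + eps)))

# Hypothetical logits: a confident prediction on a sharp image,
# and a nearly flat one on a blurred image.
sharp_logits = np.array([8.0, 1.0, 0.5, 0.2])
blurred_logits = np.array([1.2, 1.0, 0.9, 0.8])

h_sharp = entropy(softmax(sharp_logits))
h_blur = entropy(softmax(blurred_logits))
```

A higher `h_blur` than `h_sharp` means the network spreads its probability mass over more classes, which could in principle be used as a cheap image-quality warning.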

IMAGE Restoration enhancement
Deep Bilateral Learning for Real-Time Image Enhancement
MICHAËL GHARBI, MIT CSAIL; JIAWEN CHEN, Google Research; JONATHAN T. BARRON,
Google Research; SAMUEL W. HASINOFF, Google Research; FRÉDO DURAND, MIT
CSAIL / Inria, Université Côte d’Azur, http://dx.doi.org/10.1145/3072959.3073592
Our novel neural network architecture can reproduce sophisticated
image enhancements with inference running in real time at full HD
resolution on mobile devices. It can not only be used to dramatically
accelerate reference implementations, but can also learn subjective
effects from human retouching.
Image Restoration: From Sparse and Low-rank Priors to Deep Priors
Lei Zhang, Wangmeng Zuo
The Hong Kong Polytechnic University, Harbin Institute of Technology
Kai Zhang ; Wangmeng Zuo ; Yunjin Chen ; Deyu Meng ; Lei Zhang
https://doi.org/10.1109/TIP.2017.2662206
An example to show the capacity of our proposed model for three different tasks (denoising, super-resolution, JPEG image deblocking). The input image is composed by noisy images with noise level 15
(upper left) and 25 (lower left), bicubically interpolated low-resolution images with upscaling factor 2 (upper middle) and 3 (lower middle), JPEG images with quality factor 10 (upper right) and 30 (lower right).
Note that the white lines in the input image are just used for distinguishing the six regions, and the residual image is normalized into the range of [0, 1] for visualization. Even though the input image is corrupted with
different distortions in different regions, the restored image looks natural and does not have obvious artifacts.

IMAGE CLASSIFICATION Jointly with image restoration
(a) The whole ground truth image 0051x4 from DIV2K dataset. We show the
comparison of the zoom-in region between: (b) the ground truth; (c) the noisy image
with i.i.d. Gaussian noise of zero mean and σ = 30; (d) the denoised image by BM3D;
the denoising result of our proposed denoising network (e) without the guidance of
high-level vision information; (f) with the guidance of high-level vision information
Our experimental results demonstrate that the proposed architecture
not only yields superior image denoising results preserving fine
details, but also overcomes the performance degradation of different
high-level vision tasks, e.g., image classification and semantic
segmentation, due to image noise or artifacts caused by conventional
denoising approaches such as over-smoothing.
We propose a novel end-to-end differentiable architecture for joint denoising,
deblurring, and classification that makes classification robust to realistic noise and
blur. The proposed architecture dramatically improves the accuracy of a
classification network in low light and other challenging conditions,
outperforming alternative approaches such as retraining the network on noisy and
blurry images and preprocessing raw sensor inputs with conventional denoising
and deblurring algorithms

UNCERTAINTY in image enhancement
In this work, we investigate the value of uncertainty modelling in 3D super-
resolution with convolutional neural networks (CNNs). However, the highly ill-
posed nature of such problems results in inevitable ambiguity in the learning of
networks. We propose to account for intrinsic uncertainty through a per-patch
heteroscedastic noise model and for parameter uncertainty through approximate
Bayesian inference in the form of variational dropout. We demonstrate through
experiments on both healthy and pathological brains the potential utility of such
an uncertainty measure in the risk assessment of the super-resolved images for
subsequent clinical use.
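A per-patch heteroscedastic noise model is commonly trained with a Gaussian negative log-likelihood in which the network predicts both a mean and a (log-)variance. The sketch below assumes that standard form; the exact loss used in the quoted work may differ, and the toy numbers are invented.

```python
import numpy as np

def heteroscedastic_nll(target, mean, log_var):
    """Gaussian negative log-likelihood with predicted variance (constant term dropped)."""
    sq_err = (target - mean) ** 2
    return float(np.mean(0.5 * np.exp(-log_var) * sq_err + 0.5 * log_var))

# Two hypothetical patches: the first is predicted exactly, the second is off by 2.
target = np.array([1.0, 1.0])
mean = np.array([1.0, 3.0])

loss_fixed = heteroscedastic_nll(target, mean, np.zeros(2))                    # variance 1 everywhere
loss_adaptive = heteroscedastic_nll(target, mean, np.array([0.0, np.log(4.0)]))  # variance 4 on the hard patch
```

The loss drops when the model reports a larger variance on the patch it gets wrong, so the predicted variance serves as the intrinsic uncertainty estimate; parameter uncertainty (variational dropout in the text) is not covered here.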
This paper proposes a new implementation of a supervised image quality
enhancement method, referred to as Bayesian image quality transfer (IQT), via
CNNs. This involves two key innovations in CNN-based models: 1) we extend
the subpixel CNNs previously limited to 2D images, to 3D volumes,
outperforming previous models in accuracy and speed on a DTI SR task; 2)
we devise new architectures enabling estimates of different components of
the uncertainty in the SR mapping

Sparsity and Model compressibility
We thoroughly explored the granularity of sparsity with experiments on detailed
accuracy-density relationship. Due to the advantage of index saving, coarse-grained
pruning is able to achieve a higher model compression ratio, which is desirable for mobile
implementation. We also analyzed the hardware implementation advantages and show
that coarse-grained sparsity saves ∼2× output memory access compared with fine-grained
sparsity, and ∼3× compared with dense implementation. Given the advantages
of simplicity and efficiency from a hardware perspective, coarse-grained sparsity enables
more efficient hardware architecture design of deep neural networks.
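The "index saving" argument can be illustrated by pruning the same layer at two granularities. This is a sketch under simple assumptions (magnitude pruning, whole-filter pruning by L1 norm, a random hypothetical layer), not the procedure of the quoted paper.

```python
import numpy as np

def prune_fine(weights, density):
    """Fine-grained: keep the largest-magnitude individual weights (density > 0)."""
    k = int(round(density * weights.size))
    threshold = np.sort(np.abs(weights).ravel())[-k]
    return np.abs(weights) >= threshold

def prune_coarse(weights, density):
    """Coarse-grained: keep whole output filters (axis 0) with the largest L1 norm."""
    n_keep = int(round(density * weights.shape[0]))
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    keep = np.argsort(norms)[-n_keep:]
    mask = np.zeros(weights.shape, dtype=bool)
    mask[keep] = True
    return mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))     # hypothetical conv layer with 8 filters
fine = prune_fine(w, 0.5)
coarse = prune_coarse(w, 0.5)

fine_indices = int(fine.sum())                                   # one index per kept weight
coarse_indices = int(coarse.reshape(8, -1).any(axis=1).sum())    # one index per kept filter
```

Both masks keep the same number of weights, but the coarse mask needs far fewer indices to describe, which is where the compression-ratio advantage comes from.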

Towards multimodal models
Combining structural and functional data

Future of OCT and retinal biomarkers
From Schmidt-Erfurth et al. (2016): “The therapeutic efficacy of VEGF inhibition in combination with the potential of
OCT-based quantitative biomarkers to guide individualized treatment may shift the medical need from CNV treatment
towards other and/or additional treatment modalities. Future therapeutic approaches will likely focus on early and/or
disease-modifying interventions aiming to protect the functional and structural integrity of the morphologic complex
that is primarily affected in AMD, i.e. the choriocapillary - RPE – photoreceptor unit. Obviously, new biomarkers
tailored towards early detection of the specific changes in this functional unit will be required as well as follow-up
features defining the optimal therapeutic goal during extended therapy, i.e. life-long in neovascular AMD. Three novel
additions to the OCT armamentarium are particularly promising in their capability to identify the biomarkers of the
future:”
Polarization-sensitive OCT, OCT angiography, Adaptive optics imaging
“this modality is particularly appropriate to highlight early
features during the pathophysiological development of
neovascular AMD
Findings from studies using adaptive optics implied that
decreased photoreceptor function in early AMD may be
possible, suggesting that eyes with pseudodrusen appearance
may experience decreased retinal (particularly scotopic) function
in AMD independent of CNV or RPE atrophy.”
“...the specific patterns of RPE plasticity
including RPE atrophy, hypertrophy, and
migration can be assessed and quantified.
Moreover, polarization-sensitive OCT allows
precise quantification of RPE-driven disease
at the early stage of drusen”,
“Angiographic OCT with its potential
to capture choriocapillary, RPE, and
neuroretinal features provides novel
types of biomarkers identifying
disease pathophysiology rather than
late consecutive features during
advanced neovascular AMD.”
Schlanitz et al. (2011)
zmpbmt.meduniwien.ac.at
See also Leitgeb et al. (2014)
Zayit-Soudry et al. (2013)

Multimodal models in general in medicine
https://dx.doi.org/10.1097%2FWCO.0000000000000460
Imaging plus X: multimodal models of neurodegenerative disease
Neil P. Oxtoby and Daniel C. Alexander, for the EuroPOND consortium
Old paradigm disease progression models. (a) It shows the hypothetical
model of Jack et al. (2010), which illustrates qualitative sigmoid evolution in
AD of scalar biomarkers such as CSF Aβ level, cognitive test scores and
hippocampal volume or atrophy. The lack of quantitative information prevents
direct diagnostic usage. (b) It shows a traditional longitudinal model of AD
atrophy Scahill et al. (2002) by binning individuals a-priori into ‘mild’,
‘moderate’ and ‘severe’ classes based on cognitive test scores. The model
can potentially match new individuals to the same stages using imaging data,
but must exclude cognitive scores to avoid circularity. AD, Alzheimer's disease.
The temporally continuous self-modelling regression approach of Jedynak et al. (2012).
The model shows the characteristic trajectories of a diverse set of biomarkers against a
common continuous disease stage variable learned from the ADNI and PAQUID (Personnes
Agées Quid) data sets. The model can potentially estimate the disease stage of a new
patient by identifying the position along the trajectory set that best matches their data.
ADNI, Alzheimer's disease neuroimaging initiative.
We have reviewed data-driven model-based analyses of neurodegenerative disease. We have argued the
potential for generative data-driven models to take centre stage in the study and management of
neurodegenerative diseases if we are to generate new avenues for disease understanding in the earliest,
preclinical stages. This is necessitated by the challenges in monitoring any neurological disease over its
full time course, coupled with overlapping phenotypes and lack of a single biomarker that is dynamic
across the full disease time course.
The main focus of development and application to date has been in Alzheimer's disease, but various efforts
including the EuroPOND project are expanding the application to other dementias, multiple-sclerosis, prion
diseases, normal ageing and development, and even non-brain applications. These techniques have the
potential for widespread impact in realising precision medicine across many such domains.

Retina as deep learning network
Photoreceptor layer → Horizontal cells → Bipolar cells → Amacrine cells → Ganglion cell layer
DL Layer 1 → DL Layer 2 → DL Layer 3 → DL Layer 4 → DL Layer 5
LIGHT → BRAIN
With enough data, we can use a densely connected (i.e. every layer is connected to every other layer) feedforward network (or even a recurrent one) without having to constrain the network, as all the modulatory pathways are not well known.
https://arxiv.org/abs/1608.06993; Cited by 29
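A densely connected feedforward stack in the sense above can be sketched as follows: each layer receives the concatenation of the input and all earlier layer outputs. Layer sizes and random weights are arbitrary assumptions; the five layers only mirror the five retinal stages on the slide.

```python
import numpy as np

def dense_forward(x, weights):
    """Forward pass where every layer sees the input plus all earlier layer outputs."""
    features = [x]
    for w in weights:
        inp = np.concatenate(features, axis=-1)       # dense (all-to-all) connectivity
        features.append(np.maximum(inp @ w, 0.0))     # linear map followed by ReLU
    return features

rng = np.random.default_rng(0)
n_in, width, n_layers = 6, 4, 5       # hypothetical sizes
weights = [rng.normal(size=(n_in + i * width, width)) for i in range(n_layers)]
outs = dense_forward(rng.normal(size=(2, n_in)), weights)
```

Because no pathway is removed a priori, the training data decides which cross-layer connections matter, which is the point of not constraining the network.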
Joint training of all layers with layer-wise targets derived from ERG and pupillometry
OPN4
https://arxiv.org/abs/1409.5185; Cited by 292
For example, glaucoma affects ganglion cell function, whereas retinitis pigmentosa affects photoreceptors
DL - Deep learning
OPN4 - Melanopsin (ipRGC)

Retina (and V1) as deep learning network
DOI: 10.13140/RG.2.2.27438.72003 12/2016, Conference: NIPS 2016 Workshop -
Brains and Bits: Neuroscience Meets Machine Learning,
Riccardo Volpi, Istituto Italiano di Tecnologia; Matteo Zanotto; Diego Sona; Vittorio Murino
International Work-Conference on the Interplay Between Natural and Artificial Computation
IWINAC 2017: Natural and Artificial Computation for Biomedicine and Neuroscience pp 464-472
Towards a Deep Learning Model of Retina: Retinal Neural Encoding of
Color Flash Patterns
Antonio Lozano, Javier Garrigós, J. Javier Martínez, J. Manuel Ferrández, Eduardo Fernández
https://doi.org/10.1007/978-3-319-59740-9_46
Visualizing the internal activity of a CNN
in response to a natural scene stimulus.
(A-C) Time series of the CNN activity
(averaged over space) for the first
convolutional layer (8 units, A), the
second convolutional layer (16 units, B),
and the final predicted response for an
example cell (C, cyan trace). The
recorded (true) response is shown below
the model prediction (C, gray trace) for
comparison. (D) Spatial activation of
example CNN filters at a particular time
point. The selected stimulus frame (top,
grayscale) is represented by parallel
pathways encoding spatial information
in the first (purple) and second (green)
convolutional layers (a subset of the
activation maps is shown for brevity). (E)
Autocorrelation of the temporal activity
in (A-C). The correlation in the recorded
firing rates is shown in gray
https://doi.org/10.1101/120956
Furthermore, the composite nonlinear computation performed by retinal
circuitry corresponds to a boolean OR function applied to bipolar cell feature
detectors. Our general computational framework may aid in extracting
principles of nonlinear hierarchical sensory processing across diverse
modalities from limited data.

Retina Model synthesis as Deep learning architecture
Indirect inference on retinal circuit: Hard to record every intermediate step in humans
INPUT
Light
OUTPUT
Pupil size
McDougal and Gamlin 2008
AUXILIARY OUTPUT
functional MRI (fMRI)
Temporal transfer functions for the postreceptoral cone pathways. Spitschan et al. (2016). See also Hung et al. (2016).
The original responses from the achromatic luminance experiments and their
derived PCA waveforms. The results of the component analysis illustrate that the
pupil response can be described quite well as a linear sum of a sustained and a
transient component. - Young et al. (1993)
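The "linear sum of a sustained and a transient component" idea can be sketched as a two-column least-squares fit. The component shapes, time constants, weights, and noise level below are all invented for illustration; in the cited work the components come from PCA of measured pupil responses.

```python
import numpy as np

t = np.linspace(0.0, 5.0, 501)        # seconds after stimulus onset (hypothetical sampling)

# Hypothetical component waveforms, not the published PCA components.
sustained = 1.0 - np.exp(-t / 0.8)                   # rises and stays constricted
transient = (t / 0.3) * np.exp(1.0 - t / 0.3)        # peaks at t = 0.3 s, then decays

def fit_components(response, components):
    """Least-squares weights of each component in the measured response."""
    design = np.column_stack(components)
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

true_w = np.array([0.7, 0.4])
rng = np.random.default_rng(1)
response = (true_w[0] * sustained + true_w[1] * transient
            + rng.normal(scale=0.01, size=t.size))   # synthetic noisy pupil trace

w_hat = fit_components(response, [sustained, transient])
```

The two fitted weights summarize a whole pupil trace, which makes them convenient candidate targets for an output layer of a retina model.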
Maynard et al. (2015)
INTERMEDIATE
OUTPUT
Electroretinography (ERG)
(left) Proposed neural pathways and synaptic mechanisms underlying ipRGC
influence on light adaptation (right) M1 ipRGCs modulate the light-adapted
ERG b-wave via D4 dopamine receptors – Prigge et al. (2016)
Multifocal Electroretinogram (UC Davis)
The relative spectral sensitivities of the five
photoreceptors in human retina, including S-, M-,
L-cones, rods, and ipRGCs (A), LED spectral
distributions (B), and LED chromaticities in 1964
CIE 10° space (C). - Cao et al. (2015)
Deep learning framework for phototransduction studies, and clinical diagnosis decision support systems

Retina Model synthesis Photoreceptor contributions #1: ERG
INPUT
Light
OUTPUT
Pupil size ?
Not done in the study by
Allen et al. (2016)
INTERMEDIATE OUTPUT
Vary the light parameters (intensity, wavelength, modulation) to probe what the 'normal' responses are
either in visual processing/phototransduction in 'basic science' paradigms, or alternatively employ light
parameters that best discriminate between retinal pathologies.
Note! In an optimally constructed model with more parameters (more explicit retinal circuitry), one could infer all possible outcomes
(pathological or not) from the framework. But in practice we are limited to the data available.
For example, if glaucoma is shown to be detected well using PLR, we could extend that dataset by using the same protocol while
simultaneously recording ERG, visual fields, etc., to obtain a more complete model, and then have “good” predictive power with
ERG and visual field alone if PLR is not possible to do.
Rod and cone ERGs over mesopic irradiances. Allen et al. (2016)
Stimulus design and quantification. The output of a three-primary LED light source (peak emission at 354, 460, and 600 nm) was used to generate four spectra, with precise excitation of melanopsin, rod, SWS, and LWS opsins. Allen et al. (2016)
Normalized b-wave amplitudes (G), implicit times (H), and OP amplitudes (I) for light-adapted cone ERGs in Opn1mwR mice for pairs of rod-divergent stimuli (black filled circles are rod/mel-low and gray open circles are rod/mel-high) with stimulus intensity quantified in terms of rod effective photons/cm2/s. - Allen et al. (2016)
We now have the 'pure photoreceptor' response (well,
you know Ray), and if these responses are normal but
PLR abnormal, we could assume that the problem is
downstream giving hints about the given pathology

ERG Methodological background #1
Bingyao Tan; Erik Mason; Benjamin MacLellan; Kostadinka K. Bizheva
IOVS March 2017, Vol.58, 1673-1681. doi:10.1167/iovs.17-21543
Comparison of the changes in the total axial retinal blood flow (RBF) and the ERG b-wave
magnitude resulting from 200-ms single flash and 1-second, 10 Hz, 20% duty cycle flicker stimuli of
the same illumination intensity. (A) Representative ERG traces. The pink and gray shaded areas mark
the duration of the visual stimuli. Original time recordings of the total axial RBF in response to the
single flash and flicker stimuli.
Pedro Monsalve; Giacinto Triolo; Jonathon Toft-Nielsen; Jorge Bohorquez; Amanda D. Henderson;
Rafael Delgado; Edward Miskiel; Ozcan Ozdamar; William J. Feuer; Vittorio Porciatti
Translational Vision Science & Technology May 2017, Vol.6, 5.
doi:10.1167/tvst.6.3.5
A new PERG method with increased dynamic range allows recording of retinal
ganglion cell function in advanced stages of optic nerve disorders. It also
quantifies the response decline during the test, an autoregulatory
adaptation to metabolic challenge that decreases with age and presence of
disease.
Here we describe a new method for steady-state PERG recording in human
based on a visual display unit built with Light-Emitting Diode (LED) technology,
skin electrodes, and optimized signal processing to quantify response
adaptation (dubbed PERGx as a contraction of PERGnext). We show that,
compared to a validated method, the PERGx has a very high signal-to-noise ratio
(SNR); this suggests that meaningful responses can be recorded in advanced
stages of diseases such as nonarteritic ischemic optic neuropathy (NAION).
PERGx temporal dynamics and intrinsic variability in a representative
normal subject. (A) The amplitude of PERGx samples (blue circles, 16
consecutive partial averages of 64 epochs each over 2 minutes) progressively
declined (adapted) with a slope of −0.031 μV/sample (R2 = 0.48), whereas the
PERGx phase (red circles) was stationary. (B) Polar diagram displaying combined
amplitude and phase of PERG samples (open black circles) and noise samples
(open grey triangles). The PERG amplitude (1.65 μV) is represented by the
length of vector connecting the origin of the axes with the cluster centroid. The
PERG phase (63.6°) is represented by the angle Φ between the vector and the
x-axis.
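The polar-diagram quantities in the caption (centroid amplitude, centroid phase, adaptation slope) can be computed from the partial averages as sketched below. The 16 sample values are synthetic, not the published data, and the function names are illustrative.

```python
import numpy as np

def centroid_amplitude_phase(amplitudes_uv, phases_deg):
    """Length and angle of the vector from the origin to the centroid of the samples."""
    z = amplitudes_uv * np.exp(1j * np.deg2rad(phases_deg))   # samples as complex vectors
    centroid = z.mean()
    return float(np.abs(centroid)), float(np.rad2deg(np.angle(centroid)))

def adaptation_slope(amplitudes_uv):
    """Slope (μV per sample) of a straight line fitted to successive partial averages."""
    sample_idx = np.arange(amplitudes_uv.size)
    slope, _intercept = np.polyfit(sample_idx, amplitudes_uv, 1)
    return float(slope)

# Synthetic example: 16 partial averages with a linear decline and stationary phase.
amps = 1.9 - 0.03 * np.arange(16)
phases = np.full(16, 60.0)

amp_c, phase_c = centroid_amplitude_phase(amps, phases)
slope = adaptation_slope(amps)
```

With a stationary phase, the centroid amplitude equals the mean amplitude; when phase scatters, the centroid shrinks towards the origin, which is why the noise cluster sits near zero in such plots.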

ERG Methodological background #2
https://doi.org/10.1007/s10633-017-9593-y
Discrete Wavelet Transform (DWT) analysis applied to the mfERG response from a control (left)
and a patient (right). Topographical representation of the 2F-mfERG M-sequence used here
(MOFOFO), with frames displaced in time in order to better correspond visually to the recorded
response. The original signal from one hexagon of the mfERG (waveform inside box on top) can
be decomposed into many frequency levels, depending on the length of the time series. The first
level (1211 Hz) corresponds to high frequencies (noise), while the highest level (11 Hz)
corresponds to the lowest frequencies. For each frequency level, the vertical lines represent
individual wavelet coefficients. For each level, the variance between these coefficients is
computed and subjected to further analysis as the WVA (wavelet variance). Legend: DC direct
component; IC1 first induced component; IC2 second induced component
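The wavelet variance (WVA) step can be sketched with a plain Haar decomposition. The Haar wavelet, the sampling rate, and the synthetic signals are assumptions for illustration; the caption does not say which wavelet family the study used.

```python
import numpy as np

def haar_dwt_levels(signal):
    """Detail coefficients per level, finest (highest-frequency) level first."""
    approx = np.asarray(signal, dtype=float)
    details = []
    while approx.size >= 2:
        if approx.size % 2:               # drop a trailing sample so pairs line up
            approx = approx[:-1]
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # detail (high-pass) coefficients
        approx = (even + odd) / np.sqrt(2.0)          # approximation passed to next level
    return details

def wavelet_variance(signal):
    """Variance of the wavelet coefficients at each decomposition level."""
    return [float(np.var(d)) for d in haar_dwt_levels(signal)]

fs = 1024.0                               # hypothetical sampling rate in Hz
t = np.arange(256) / fs
slow = np.sin(2 * np.pi * 16.0 * t)       # slow synthetic response
rng = np.random.default_rng(0)
noisy = slow + rng.normal(scale=0.5, size=t.size)

wva_clean = wavelet_variance(slow)
wva_noisy = wavelet_variance(noisy)
```

Added noise shows up mainly in the first (highest-frequency) levels, matching the caption's remark that the first level corresponds to noise.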
The entire process of retinal visual processing involves
the phototransduction cascade with different groups of
cells and circuits from the photoreceptors to the
ganglion cells. Thus, electrical signals produced by
different biological structures contribute to the retinal
response of the mfERG that is recorded from the cornea
[Hood et al. (2002); Luo et al. (2011)]. In the standard mfERG, amplitude
and implicit time are often analyzed [Hood et al. (2012)].
Early glaucoma Dilru C Amarasekera BS, Arthur F Resende MD, Michael Waisbourd MD, Sanjeev Puri MD, Marlene R Moster MD,
Lisa A Hark PhD, L Jay Katz MD, Scott J Fudemberg MD, Anand V Mantravadi MD
First published: 20 July 2017 DOI: 10.1111/ceo.13006
Unreliable test results were excluded.
Abbreviations: ss-PERG=Steady-State Pattern Electroretinogram; SD-tVEP=Short-Duration
transient Visual Evoked Potentials; Lc=Low Contrast; Hc=High Contrast; SNR=Signal-to-Noise Ratio.
Electrophysiological techniques thus play a valuable role in a diagnostic environment dominated by
highly effective tools such as OCT via the addition of an objective functional perspective to the
diagnosis of glaucoma. Although the use of PERG and VEP as a measure of retinal ganglion cell and
visual pathway dysfunction has been established, few studies have measured the potential clinical
utility of the novel rapid testing platform of ss-PERG and SD-tVEP in patients with glaucoma.
ss-PERG was effectively able to discern between glaucomatous and healthy eyes. The diagnostic
ability of ss-PERG was superior to that of SD-tVEP. ss-PERG may thus have a role as a clinically useful
electrophysiological diagnostic tool.

Retina Model synthesis Photoreceptor contributions #2: PLR
INPUT
Light
OUTPUT
Pupil size
INTERMEDIATE OUTPUT
Electroretinography (ERG)?
ERG not done this time
Experimental design. (A, Left) L, M, and S cones and melanopsin-containing
ipRGCs mediate vision at daytime light levels. (Center) Photoreceptor spectral
sensitivities. (Right) Physiological measurements of ipRGCs find excitatory L
and M cone inputs and inhibitory S-cone inputs (12). (B) A digital spectral
integrator produces sinusoidal photoreceptor-directed modulations that pass
through an artificial pupil into the pharmacologically dilated left eye. The
consensual pupil response of the right eye is recorded. (C) Photoreceptor-
directed modulations. Balanced changes in the spectrum of light around a
background spectrum nominally isolate targeted photoreceptors. -
Spitschan et al. (2014)
Group PLR data are well fit by the two-component linear filter model. (A) The mean response across all subjects
(01–16) is shown at 0.05 and 0.5 Hz, for L+M-, melanopsin-, and S-cone-directed modulations. Fit values are
derived from those found for subject 01, with only amplitude parameters adjusted (Table S2). This is because the
average data are available at only two temporal frequencies and do not sufficiently constrain all parameters of the
model. To obtain the average data plotted, amplitudes and phases were averaged separately (i.e., average
amplitude obtained without consideration of phase, average phase obtained without consideration of amplitude).
The model was fit to the data as plotted. (B) Polar-plot representations of the group data with model fit points,
following conventions as in Fig 3. The data are normalized separately for each temporal frequency. Error bars (± 2
SEM across subjects) are smaller than the plot points for the data. -Spitschan et al. (2014)
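The caption's remark that amplitudes and phases were averaged separately can be made concrete with a small sketch. It compares separate averaging with averaging the complex response vectors. The circular mean used for the phase is an assumption (the caption does not say how the phase average was computed), and the function names and numbers are purely illustrative.

```python
import numpy as np

def average_separately(amplitudes, phases_rad):
    """Average amplitude and phase independently across subjects."""
    mean_amp = float(np.mean(amplitudes))
    # Assumption: circular mean of the phases (unit vectors, amplitude ignored).
    unit_vectors = np.exp(1j * np.asarray(phases_rad, dtype=float))
    mean_phase = float(np.angle(np.mean(unit_vectors)))
    return mean_amp, mean_phase

def average_as_vectors(amplitudes, phases_rad):
    """Alternative: average the complex response vectors amplitude * exp(i*phase)."""
    z = np.mean(np.asarray(amplitudes, dtype=float)
                * np.exp(1j * np.asarray(phases_rad, dtype=float)))
    return float(np.abs(z)), float(np.angle(z))

# Two hypothetical subjects with equal amplitude but a 90 degree phase difference
amps = np.array([1.0, 1.0])
phases = np.array([0.0, np.pi / 2])

separate = average_separately(amps, phases)   # amplitude is preserved
vector = average_as_vectors(amps, phases)     # amplitude shrinks due to phase scatter
```

Separate averaging keeps the mean amplitude unaffected by between-subject phase scatter, whereas vector averaging reduces it whenever the subjects' phases disagree.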
Now, as we are feeding in more data, we are in theory learning how the
light parameters should be designed to have the best photoreceptor
response isolation.
And have presentations for corresponding ERG and PLR responses.
It would also help if all the studies were from humans :P

Retina Model synthesis further downstream
INPUT
Light
OUTPUT
Pupil size
INTERMEDIATE OUTPUT
“KNOWN BEHAVIOR”
Auxiliary OUTPUT
dLGN
Build on top of previous models. We “know” how a specific light stimulus is processed by the retina (ERG), and how
this is reflected in pupil behavior (PLR) via the olivary pretectal nucleus (OPN). So, using the same parameters, record the
activity of the LGN, for example, which is nice at least for basic science, not necessarily for pathology screening.
A: LED spectral power densities and in vivo photoreceptor spectral sensitivity
(normalised). The output of blue and yellow LEDs was adjusted to produce
equivalent effects on rods (black line). By contrast, the blue LED, always
appeared brighter for melanopsin (green line). B: Protocol 1. Melanopsin-
isolating steps in dLGN and retina, respectively) presentations of the blue LED
were interleaved with 210 or 180 sec of the (dLGN and retina, respectively)
yellow to produce a ‘step’ visible only to melanopsin. C: Protocol 2. Irradiance
slowly ramped up (0.5 ND per 200 seconds) before remaining at a steady state
for 10 seconds. D: The effective change in photon flux for melanopsin (green)
and rods (black) across a full repeat of Protocol 2. Settings of ND filter at the
point of each melanopsin isolating step are provided above.
- Davis et al. (2015)
INTERMEDIATE OUTPUT #2
Ganglion cell firing rates
Responses to melanopsin-isolating steps and
gradual irradiance ramps in retina.
Responses to melanopsin-steps in the dLGN.

Retina Model synthesis
INPUT
Light
INTERMEDIATE OUTPUT
Electroretinography (ERG)
OUTPUT
Pupil size
Auxiliary OUTPUT
dLGN
So now we know how the retina works in a data-driven deep learning sense (no explicit modelling of the retina in the biological
sense). We can heuristically cheat and define the connections as defined in the literature.
So as we feed in data from studies, the interactions between blocks are “automagically” quantified by adjusting the
convolutional weights in the deep learning model. At some point if we have enough data we could also start to relax the
circuit constraints and hypothesize that there could be recurrent feedback from dLGN to OPN (controlling pupil size), and do
'blind causality analysis' (Nikola probably an expert on that).
We have proposed a novel framework for
causal analysis in time-series which does not
require any assumptions about the statistical
relationships among the variables of the study,
i.e., it is model-free.
Our results show that Twitter data polarity
does indeed have a causal impact on the
stock market prices of the examined
companies. Hence, we believe social media
data could represent a valuable source of
information for understanding the dynamics
of stock market movements
http://www.slideshare.net
http://dx.doi.org/10.1534/genetics.114.165704

Retina Model synthesis Pathologies?
INPUT
Light
INTERMEDIATE OUTPUT
OUTPUT
Pupil size
Auxiliary OUTPUT
dLGN
In case with glaucoma, one would expect that the peripheral retina gets destroyed first
(A) Schematic diagram showing the flash stimulation sequence of
the slow-sequence (slow flickering stimulation, MOOO) multifocal
electroretinogram (mfERG). (B) The first-order kernel of the slow-
sequence mfERG from the central (rings 1 to 2) and peripheral (rings
3 to 6) regions. - Chan et al. (2011)
Overlapping visual field test-region layout and luminance
characteristics of the multifocal pupillographic objective
perimetry stimuli for all protocols. - Carle et al. (2014)
Now we can define normal and pathologies as classes as you would in typical image classification tasks ('dogs',
'cats', etc.), but instead of just using a single image (whether it be fundus or OCT (SD/SS/AO/Angiography)), we can
combine both the image and the behavioral response for better quantification of the retinal pathology.

Retina Model synthesis VISUAL FIELD
Old school psychophysical functional measure that is often found stressful by the patients
https://doi.org/10.1016/j.ophtha.2017.04.021
De Moraes CG, Hood DC, Thenappan A, Girkin CA, Medeiros FA,
Weinreb RN, Zangwill LM, Liebmann JM.
Central visual field damage seen on the 10-2 test is
often missed with the 24-2 strategy in all groups. This
finding has implications for the diagnosis of glaucoma
and classification of severity.
JAMA Ophthalmol. 2017;135(7):783-788.
doi: 10.1001/jamaophthalmol.2017.1659
JAMA Ophthalmol. 2017;135(7):742-747.
doi: 10.1001/jamaophthalmol.2017.1396
A deep-learning based automatic
glaucoma identification
ARVO 2017: 320 Visual Fields, Vision Function, Psychophysics I
Serife Seda S. Kucur, Mathias Abegg, Sebastian Wolf, Raphael Sznitman.
ARTORG Center, University of Bern, Bern, Switzerland; Department of Ophthalmology,
Inselspital Bern, Bern, Switzerland.
The inherent local and global characteristics of visual fields (VFs)
can be exploited in a strong data-driven sense and could provide
better understanding of VFs with regards to glaucoma. Ultimately,
this may help to efficiently automatize the diagnosis process.
Our hypothesis is that alternative representations of raw VFs, in
terms of different spatial scales, could be learned by computers
using machine learning techniques towards an effective
automatized glaucoma identification task. Accordingly, we present
a Convolutional Neural Network (CNN)-based approach for
classification of VFs as being glaucomatous or non-glaucomatous.
Conclusions: These results support the fact that processing VFs
through a CNN generates different representations of data in
terms of its hidden characteristics and patterns that are efficient
to discriminate between glaucomatous and non-glaucomatous
VFs in an automated way. The performance could be further
improved with a different CNN architecture. The trained CNNs
have the potential to be utilized for glaucoma progression
analysis as well.
https://doi.org/10.1016/j.ophtha.2017.01.027
http://dx.doi.org/10.1097/IJG.0000000000000710

Retina Model synthesis beyond retinopathies #1
What to diagnose from the eye, e.g. neurodegenerative diseases such as Alzheimer’s disease
Is the Eye an Extension of the Brain in
Central Nervous System Disease?
Lies De Groef1,2
and Maria Francesca Cordeiro1,3,4
Journal of Ocular Pharmacology and Therapeutics. June 2017,
https://doi.org/10.1089/jop.2016.0180
1 Glaucoma and Retinal Neurodegenerative Disease Research Group, Institute of Ophthalmology, University
College London, London, United Kingdom.
2 Neural Circuit Development and Regeneration Research Group, Department of Biology, University of Leuven,
Leuven, Belgium.
3 Western Eye Hospital, Imperial College Healthcare NHS Trust, London, United Kingdom.
4 ICORG, Department of Surgery and Cancer, Imperial College London, London, United Kingdom.
Compilation of examples to illustrate the concept ‘‘the eye as a window to the brain’’.
Typical ocular diseases, such as uveitis, glaucoma, and AMD, have in common several
pathological mechanisms with CNS diseases, for example, MS and AD. Both in vivo and post
mortem examinations of the eye can therefore be used to study the disease mechanisms
underlying these pathologies in the eye and brain. (1) fluorescein angiography; (2)
intraocular pressure measurement (copyright iCare, TonoLab); (3) optical coherence
tomography scan; (4) confocal scanning laser ophthalmoscopy imaging of curcumin-labeled
protein aggregates; (5) retinal oximetry; (6) ZO-1 tight junction immunostaining on
wholemounted retina; (7) transmission electron microscopy image of trabecular meshwork;
(8) Iba-1 microglia immunostaining on retinal section; (9) Brn3a retinal ganglion cell
immunostaining on wholemounted retina; (10) β-amyloid immunostaining on retinal section;
and (11) concanavalin A vessel labeling on wholemounted retina. AD, Alzheimer’s; AMD,
age-related macular degeneration; MS, multiple sclerosis
Front Aging Neurosci. 2017; 9: 214.
Published online 2017 Jul 6. doi: 10.3389/fnagi.2017.00214
The Role of Microglia in Retinal Neurodegeneration:
Alzheimer's Disease, Parkinson, and Glaucoma
Ana I. Ramirez,1,2 Rosa de Hoz,1,2 Elena Salobrar-Garcia,1,3 Juan J. Salazar,1,2 Blanca Rojas,1,3 Daniel Ajoy,1
Inés López-Cuenca,1 Pilar Rojas,1,4 Alberto Triviño,1,3 and José M. Ramírez1,3,*
Front Neurol. 2017; 8: 162.
Published online 2017 May 4. doi: 10.3389/fneur.2017.00162
Retinal Ganglion Cells and Circadian Rhythms in
Alzheimer’s Disease, Parkinson’s Disease, and
Beyond
Chiara La Morgia,1,2,* Fred N. Ross-Cisneros,3 Alfredo A. Sadun,3,4 and Valerio Carelli1,2
Summary of circadian rhythm
abnormalities in AD, PD, and HD.
AD, Alzheimer’s disease; PD, Parkinson’s disease;
HD, Huntington’s disease; IV, intra-daily variability;
IS, inter-daily stability; RA, relative amplitude; BP,
blood pressure; HR, heart rate.
Schematic representation of the hypothetical events
associated with the neuroinflammation in AD (A),
PD (B), and glaucoma (C). AD, Alzheimer's Disease; PD,
Parkinson's Disease; ILM, inner limitant membrane; NFL,
nerve fiber layer; GCL, ganglion cell layer; IPL, inner
plexiform layer; INL, inner nuclear layer; OPL, outer
plexiform layer; ONL, outer nuclear layer; OLM, outer
limitant membrane; PL, photoreceptor layer; RPE, retinal
pigment epithelium; BM, Bruch membrane; C, choroid;
Aβ, beta-amyloid; pTau, phosphorylated tau.

Health Economics for Medical
Startups | Background

Business Models focus
●
Often technical founders focus too much on the technology, and do not achieve
product-market fit.
– In medical startups, it is often very useful to do proper health economics
calculations to sell your idea to customers and investors.
●
In other words, how much can your solution make healthcare more efficient
economically while improving the quality of care for the patient?
– Another common problem in the long run is reimbursement: in most
countries, patients do not themselves pay the full cost of the healthcare they
receive, and market access is complicated by varying regulations/policies in
each country.
http://startupheretoronto.com
www.smi-online.co.uk

Business Models Innovations on the model
https://hbr.org/2016/10
Healx: A Case Study
Informed by our business model framework, we advised (and Cambridge
Judge Business School’s business accelerator supported) the tech venture
Healx, which focuses on the treatment of patients with rare diseases in the
emerging field of personalized medicine. A big challenge for pharmaceutical
companies in this domain is that rare-disease markets are very small, so
companies usually have to charge astronomical prices. (One drug, Soliris,
used in the treatment of paroxysmal nocturnal hemoglobinuria, costs about
$500,000 per patient-year.)
Enter Healx, with a platform that leverages big data technology and analytics
across multiple databases owned by various organizations within global life
sciences and health care to efficiently match treatments to rare-disease patients.
Its initial business model hit three of our six key features. First, Healx’s value
proposition was about asset sharing (for example, making available clinical-trial
databases that record the effectiveness of most drugs across therapeutic areas
and diseases, including rare ones). Second, the business promised
more personalization by revealing drugs with high potential for treating the rare
diseases covered. Finally, Healx’s model would, in theory, create a collaborative
ecosystem by bringing together big pharma (which has the treatment and trial
data) and health care providers (which have data about effectiveness and
incompatibility reactions and also personal genome descriptions).
https://healx.io/
More recently, Healx has developed a machine-learning algorithm that can use a
patient’s biological information not only to match drugs to disease symptoms but
also to predict exactly which drug will achieve what level of effectiveness for
that particular patient. The latest version of its business model
brings personalization to the maximum possible level and adds agility, because
the treating clinician—armed with the biological data and the algorithm—can make
better treatment decisions directly with the patient and doesn’t have to rely on
fixed rules of thumb about which of the few available off-label drugs to use. In
this way, Healx is able to support decentralized, real-time, accurate decision
making.
This version of the Healx model has even more transformation potential—it
exhibits four of the six features; it has already generated revenue from
customers; and in the long term it could empower patients by giving them much
more information before they consult a medical practitioner. Although it is still
too early to tell whether that potential will be realized, Healx is clearly a venture
to watch. It has earned a number of prizes (including the 2015 Life Science
Business of the Year and the 2016 Graduate Business of the Year in the
Cambridge cluster) and sizable investments from several global funds.

LOSS Function performance quantification
●
In medical studies, the ROC curve and especially Area Under the Curve (AUC) is used as an
easy scalar to describe the performance of the classifier.
TensorFlow allows direct
optimization of ROC
http://dx.doi.org/10.1093/bib/bbr008
http://arxiv.org/abs/1605.06652
Conclusion: The AUC is an unreliable
measure of screening performance because
in practice the standard deviation of a
screening or diagnostic test in affected and
unaffected individuals can differ. The
problem is avoided by not using AUC at all,
and instead specifying detection rates
(DRs) for given false positive rates (FPRs) or
FPRs for given DRs.
http://dx.doi.org/10.1177/0969141313517497
http://tflearn.org/objectives/
Mozer, Michael C. "Optimizing classifier
performance via an approximation to
the Wilcoxon-Mann-Whitney statistic."
(2003). aaai.org/Papers
Front Public Health. 2015; 3: 57.
Published online 2015 Apr 20.
doi: 10.3389/fpubh.2015.00057
PMCID: PMC4403252
Threshold-Free Measures for
Assessing the Performance of
Medical Screening Tests
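The two ways of summarizing a classifier mentioned on this slide can be compared in a short NumPy sketch: the AUC computed as the Wilcoxon-Mann-Whitney statistic, and the detection rate (DR) at a chosen false positive rate (FPR). The function names and the toy scores are assumptions made for illustration. This is not the TensorFlow/TFLearn objective referenced above.

```python
import numpy as np

def auc_wmw(scores_pos, scores_neg):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (affected, unaffected) pairs ranked correctly, ties counted as 0.5."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def detection_rate_at_fpr(scores_pos, scores_neg, fpr):
    """Detection rate (sensitivity) at the strictest threshold that
    allows at most `fpr` of the unaffected to be called positive."""
    neg = np.sort(np.asarray(scores_neg, dtype=float))[::-1]  # descending
    k = int(np.floor(fpr * len(neg)))          # number of false positives allowed
    # Only scores strictly above the (k+1)-th highest negative count as positive.
    threshold = neg[k] if k < len(neg) else -np.inf
    return float(np.mean(np.asarray(scores_pos, dtype=float) > threshold))

# Toy classifier scores (illustrative, not from any study)
affected = [0.9, 0.8, 0.4]
unaffected = [0.7, 0.3, 0.2, 0.1]

auc = auc_wmw(affected, unaffected)
dr_at_0 = detection_rate_at_fpr(affected, unaffected, fpr=0.0)
dr_at_25 = detection_rate_at_fpr(affected, unaffected, fpr=0.25)
```

Reporting DR at one or two clinically relevant FPRs, as the quoted conclusion suggests, keeps the operating point explicit instead of averaging over thresholds that would never be used in screening.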

HEALTH ECONOMICAL Loss function
wikipedia.org
Analogies from churn prediction?
http://dx.doi.org/10.1186/s40165-015-0014-6
“Nevertheless, current state-of-the-art classification algorithms are not well
aligned with commercial goals, in the sense that, the models miss to include
the real financial costs and benefits during the training and evaluation phases.
In the case of churn, evaluating a model based on a traditional measure such
as accuracy or predictive power, does not yield to the best results when
measured by the actual financial cost, ie. investment per subscriber on a
loyalty campaign and the financial impact of failing to detect a real churner
versus wrongly predicting a non-churner as a churner”
What are the economical costs of each block in the contingency table, optimization for medical economics?
- False negatives are the more expensive error, both in economic cost and in reduced quality of life, as those patients will not be diagnosed
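One way to read the question above is to attach a cost to each cell of the contingency table and pick the decision threshold that minimizes the expected cost per screened person. The sketch below does this with NumPy. All cost values, labels, and scores are hypothetical placeholders, not estimates from any health-economic study.

```python
import numpy as np

# Hypothetical costs per contingency-table cell (arbitrary units).
# A false negative is assumed to be much more expensive than a false positive.
COSTS = {"tp": 1.0, "fp": 2.0, "fn": 10.0, "tn": 0.0}

def expected_cost(y_true, scores, threshold, costs=COSTS):
    """Mean cost per screened person when scores >= threshold are called positive."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(scores, dtype=float) >= threshold
    tp = np.sum(pred & y_true)
    fp = np.sum(pred & ~y_true)
    fn = np.sum(~pred & y_true)
    tn = np.sum(~pred & ~y_true)
    total = (tp * costs["tp"] + fp * costs["fp"]
             + fn * costs["fn"] + tn * costs["tn"])
    return float(total) / len(y_true)

def cheapest_threshold(y_true, scores, costs=COSTS):
    """Try every observed score as a threshold and return the cheapest one."""
    candidates = np.unique(scores)
    per_threshold = [expected_cost(y_true, scores, t, costs) for t in candidates]
    best = int(np.argmin(per_threshold))
    return float(candidates[best]), per_threshold[best]

# Toy data: 1 = disease present, scores from some classifier
y = [1, 1, 0, 0, 0, 1]
s = [0.9, 0.4, 0.5, 0.2, 0.1, 0.8]
best_threshold, best_cost = cheapest_threshold(y, s)
```

With a high false-negative cost, the cheapest threshold moves down so that the low-scoring diseased case is still caught, at the price of one extra false positive.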

Health economics models
https://dx.doi.org/10.3310/hta11410
Screening in UK for Glaucoma, NHS Setting
Published: Ann Intern Med. 2013;159(7):484-489
DOI: 10.7326/0003-4819-159-6-201309170-00686
Estimate by Steve Kymes of the duration and number of subjects needed for a
proper health-economic study of a glaucoma screening program. Presented by John
Boland at the “Should we screen for glaucoma?” session at World Glaucoma Congress
2017 in Helsinki, Finland.
Indian J Ophthalmol. 2011 Jan; 59(Suppl1): S24–S30.
doi: 10.4103/0301-4738.73684 PMCID: PMC3038514
Cost-effectiveness of screening for open angle
glaucoma in developed countries
Anja Tuulonen
Clin Ophthalmol. 2017; 11: 337–346.
doi: 10.2147/OPTH.S120398 PMCID: PMC5317344
Cost and detection rate of glaucoma screening
with imaging devices in a primary care center
Alfonso Anton,1,2,3,4 Monica Fallon,3,5 Francesc Cots,2 María A Sebastian,6
Antonio Morilla-Grasa,4 Sergi Mojal,3 and Xavier Castells2

RISK STRATIFICATION & Screening
Target screening for high-risk cases (family history, age, ethnicity, gender)
https://doi.org/10.1016/j.ajo.2017.05.017
(2016) https://doi.org/10.1109/TMI.2016.2608782
We introduce a novel Bayesian nonparametric model that uses
the concept of disease trajectories for disease subtype
identification. We investigate several models with our
algorithm, and show that one with age, pack years (a measure of
cigarette exposure), and smoking status as predictors gives the
best compromise between estimated predictive performance
and model complexity.
The proposed risk score incorporates both the patients’ non-stationary temporal
physiological information and their individual baseline co-variates in order to accurately
describe the patients’ physiological trajectories.
Aaron Zalewski ; William Long ; Alistair E. W. Johnson ; Roger G. Mark ; Li-wei H. Lehman
Date of Conference: 16-19 Feb. 2017, https://doi.org/10.1109/BHI.2017.7897302

RISK factors
For example for Glaucoma
“Overview of ethnicity and race” by M. Roy Wilson (United States)
at Risk Profiling symposium at World Glaucoma Congress 2017, Helsinki, Finland
http://dx.doi.org/10.1001/jamaophthalmol.2015.1478
http://dx.doi.org/10.1126/science.aam7935

“Doctor AI” Systems | Introduction

AI Doctor
Longitudinal analysis → try to diagnose pathologies as early as possible.
Incorporate disease progression measurements and treatment interventions for
optimal personalized treatment.
Feature engineering remains a major bottleneck when creating predictive systems from electronic
medical records. At present, an important missing element is detecting predictive regular clinical motifs
from irregular episodic records. We present Deepr (short for Deep record), a new end-to-end deep
learning system that learns to extract features from medical records and predicts future risk
automatically. Deepr transforms a record into a sequence of discrete elements separated by coded
time gaps and hospital transfers. On top of the sequence is a convolutional neural net that detects and
combines predictive local clinical motifs to stratify the risk. Deepr permits transparent inspection and
visualization of its inner working. We validate Deepr on hospital data to predict unplanned readmission
after discharge. Deepr achieves superior accuracy compared to traditional techniques, detects
meaningful clinical motifs, and uncovers the underlying structure of the disease and intervention
space.
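The record-to-sequence step described in the abstract can be sketched in plain Python: visits are sorted by date, their codes are emitted as tokens, and a coded time-gap token is inserted between consecutive visits. The gap bins, token names, and diagnosis/procedure codes below are assumptions for illustration, and hospital-transfer tokens are left out. This is not the Deepr authors' implementation.

```python
from datetime import date

# Assumed gap bins: (upper bound in months, token). Illustrative values only.
GAP_BINS = [(1, "GAP_0-1m"), (3, "GAP_1-3m"), (6, "GAP_3-6m"), (12, "GAP_6-12m")]
LONG_GAP = "GAP_12m+"

def gap_token(days):
    """Map a gap in days to a discrete time-gap token."""
    months = days / 30.0
    for upper, token in GAP_BINS:
        if months <= upper:
            return token
    return LONG_GAP

def encode_record(visits):
    """visits: list of (date, [codes]). Returns one flat token sequence."""
    tokens = []
    previous = None
    for when, codes in sorted(visits, key=lambda v: v[0]):
        if previous is not None:
            tokens.append(gap_token((when - previous).days))
        tokens.extend(codes)
        previous = when
    return tokens

# Hypothetical patient record with made-up code tokens
record = [
    (date(2016, 1, 10), ["H40.1", "PROC_OCT"]),
    (date(2016, 2, 1), ["H40.1"]),
    (date(2017, 6, 1), ["H35.3", "PROC_INJECTION"]),
]
tokens = encode_record(record)
```

A sequence like this can then be embedded and passed to a 1D convolutional network, in the same way as words in a sentence.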

Condition dynamics Long short-term memory (LSTM)
C memory of LSTM
x diagnoses (features vector)
p procedures, medications
f illness "forgetting" (curing or toxicity)
m planned/unplanned admission flag
h weighted "illness pooling"
i input gate (new information updated to memory)
o output gate (disease state)
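For readers unfamiliar with the gates in the legend above, the following is a minimal NumPy sketch of one step of a standard LSTM cell. It shows only the generic input (i), forget (f), and output (o) gates and the memory C. The way procedures (p) and the admission flag (m) modulate the gates in the model on this slide is not reproduced, and the weights are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    x: input features (D,), h_prev/c_prev: previous state and memory (H,),
    W: weights (4*H, D+H), b: biases (4*H,). Gate order in W: i, f, o, g.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])            # input gate: how much new information enters memory
    f = sigmoid(z[H:2 * H])        # forget gate: how much old memory is kept
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much memory is exposed
    g = np.tanh(z[3 * H:4 * H])    # candidate memory content
    c = f * c_prev + i * g         # updated memory C
    h = o * np.tanh(c)             # updated hidden state
    return h, c

# Run the cell over a short random sequence (placeholder for visit features)
rng = np.random.default_rng(0)
D, H = 5, 3
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(4, D)):
    h, c = lstm_step(x, h, c, W, b)
```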

Condition dynamics always missing data in clinical time series
TREATING MISSING DATA Various options
1. Zero-Imputation Set to zero when missing data
2. FORWARD-FILLING use previous values
3. MISSINGNESS Treat the missing value as a signal, as lack of a value
measured e.g. in an ICU can carry information itself (Lipton et al.
2016)
4. BAYESIAN STATE-SPACE MODELING to fill the missing data (Luttinen et
al. 2016, BayesPy package)
5. GENERATIVE MODELING Train the deep network to generate missing
samples (Im et al. 2016, RNN GAN; see also github:
sequence_gan)
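The first three options in the list above are simple enough to sketch with NumPy. The function names and the example series are illustrative assumptions. Options 4 and 5 need a fitted model and are not shown.

```python
import numpy as np

def zero_impute(x):
    """Option 1: replace missing values (NaN) with zero."""
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x), 0.0, x)

def forward_fill(x):
    """Option 2: carry the last observed value forward.
    Gaps before the first observation stay NaN."""
    x = np.asarray(x, dtype=float)
    idx = np.where(~np.isnan(x), np.arange(len(x)), -1)
    last = np.maximum.accumulate(idx)      # index of the latest observation so far
    filled = x[np.maximum(last, 0)]        # fancy indexing returns a copy
    filled[last < 0] = np.nan
    return filled

def with_missingness_indicator(x):
    """Option 3: keep a binary 'was missing' channel next to the imputed
    value, so a model can use the absence of a measurement as a signal."""
    x = np.asarray(x, dtype=float)
    mask = np.isnan(x).astype(float)
    return np.stack([zero_impute(x), mask], axis=1)

# Hypothetical irregularly measured clinical series (NaN = not measured)
series = np.array([np.nan, 21.0, np.nan, np.nan, 18.0, np.nan])

zeros = zero_impute(series)
filled = forward_fill(series)
features = with_missingness_indicator(series)   # shape (6, 2)
```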

Condition dynamics -based Individualized treatment
●
Schmidt-Erfurth and Waldstein (2016): There is a critical unmet medical need to identify, characterize, and
validate biomarkers that could provide solid guidance for an efficient individualized treatment with regards to
optimal functional outcome and disease management. Such biomarkers would enable the treating physician to
tailor personalized treatment to each patient's individual disease and need, in order to provide adequate disease
control, minimize recurrence and neurosensory damage, and limit the number of invasive and costly
interventions.
Relationship between initial visual acuity, visual acuity
change and final visual acuity during therapy of
neovascular age-related macular degeneration (i.e., the
ceiling effect). The interpolation curves illustrate final
visual acuity levels dependent on baseline visual acuity
in the controlled trials CATT and IVAN as well as in the
real-world UK neovascular AMD database study.
Role of subretinal fluid as a treatment-modifying imaging
biomarker. In patients with subretinal fluid at baseline (blue
graphs), antiangiogenic therapy leads to identical visual
acuity outcomes, regardless of treatment regimen (monthly
versus every 12 weeks dosing). In contrast, patients without
subretinal fluid at baseline (red graphs) demonstrate
unfavourable outcomes if treatment was not administered
on a monthly basis.
Pigment-epithelial detachment as risk factor for vision loss
during individualized dosing. In the VIEW studies, patients
received continuous anti-VEGF therapy during the first 48
weeks. At 52 weeks, a discontinuous, “as-needed” dosing
regimen was introduced. Only in a precisely defined patient
population, i.e. eyes with pigment-epithelial detachments
developing secondary intraretinal cystoid fluid (IRC, red graph),
the reactive dosing regimen led to pronounced vision loss.
Future therapeutic approaches will likely focus on early and/or disease modifying interventions aiming to protect
the functional and structural integrity of the morphologic complex that is primarily affected in AMD, i.e. the
choriocapillary-RPE-photoreceptor unit.
Multimodal innovative imaging technologies, such as PS-OCT, OCT angiography, and adaptive optics allow access
to yet unidentified biomarkers representing the origin of neovascular AMD as well as functionally relevant
therapeutic aims. Improved big-data applicability and reproducibility aided by computerized OCT analysis will likely
allow personalized antiangiogenic therapy with minimal interventions, while providing maximum disease
control, using advanced imaging software and hardware. It is the responsibility of the scientific and clinical community
to follow the open path of advanced imaging in a collaborative and interdisciplinary approach together with
ophthalmologists, biologists, physicists, and computer scientists in an efficient interdisciplinary approach.

Condition dynamics risk factors for glaucoma progression
https://doi.org/10.1016/j.ajo.2017.06.003
To determine the intraocular and systemic risk factor differences
between a cohort of rapid glaucoma disease progressors and non-
rapid disease progressors.
Conclusion: Cardiovascular disease is an important risk factor for
rapid glaucoma disease progression irrespective of IOP control.

Condition dynamics Disease progression #1
Clin Ophthalmol. 2017; 11: 1015–1020. May 23.
doi: 10.2147/OPTH.S116265 PMCID: PMC5449101
Automated retinal imaging and trend
analysis – a tool for health monitoring
Karin Roesch, Tristan Swedish, and Ramesh Raskar
MIT Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
The future of health diagnostics. Current diagnostics are based on a
“snapshot” in time and limited data points. In the future, large datasets
acquired over time through constant monitoring will be analyzed to
establish baselines and trends, enabling preventative interventions.
Knowing when a feature occurred is key. For example, the MA
population is dynamic and changes occur in a matter of months. For
diabetic retinopathy (DR), it has been established that microaneurysms
(MAs) are the earliest lesions visible. Additionally, MA turnover rates
are indicative of early-stage DR as well as the likelihood of DR
progression to macular edema.
Po-Hsiang Chiu, George Hripcsak
Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA
https://doi.org/10.1016/j.jbi.2017.04.009
Learning statistical models of phenotypes using noisy
labeled training data
Vibhu Agarwal, Tanya Podchiyska, Juan M Banda, Veena Goel, Tiffany I Leung, Evan P Minty, Timothy E Sweeney, Elsie Gyang, Nigam H Shah
J Am Med Inform Assoc (2016) 23 (6): 1166-1173.
DOI: https://doi.org/10.1093/jamia/ocw028

Condition dynamics Disease progression #2
Hrvoje Bogunović; Alessio Montuoro; Magdalena Baratsits;
Maria G. Karantonis; Sebastian M. Waldstein; Ferdinand Schlanitz;
Ursula Schmidt-Erfurth
Investigative Ophthalmology & Visual Science June 2017,
Vol.58, BIO141-BIO150. DOI: 10.1167/iovs.17-21789
Observations at baseline and the first follow-up are used for predicting
drusen regression in the future, for example, the following 1-year period.
Examples of drusen thickness maps and the drusen regression prediction within 1-year
period. Last column shows true positives (green), false positives (orange), and false negatives
(blue). Each row represents one example eye.
http://dx.doi.org/10.1001/jamaophthalmol.2016.5111
http://dx.doi.org/10.1002/sim.7300
Application of our approach using linear mixed models to Alzheimer’s Disease Neuroimaging Initiative data with
bootstrapped 95% CI including boxplots of neocortical Aβ burden (standard uptake value ratio (SUVR)) for each
diagnosis group, separately for amyloid–β positive and negative individuals. It takes 24.47 years to progress from an
SUVR of 0.79 to 1.01. This is equivalent to a rate of 0.009 increase in SUVR per year. Similarly, it takes 10.76 years
to progress from an SUVR of 0.73 to 0.79. See the text for further details. HC, healthy control; MCI, mild cognitively
impaired; AD, Alzheimer’s disease
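The rate quoted in the caption follows from dividing the SUVR change by the time taken. The small check below reproduces the roughly 0.009 per year figure and applies the same average-rate calculation to the earlier interval, for which the caption gives no rate. The function name is an illustrative assumption.

```python
def annual_rate(suvr_start, suvr_end, years):
    """Average SUVR increase per year over an interval."""
    return (suvr_end - suvr_start) / years

# 0.79 -> 1.01 over 24.47 years: about 0.009 SUVR per year, as stated in the caption
rate_late = annual_rate(0.79, 1.01, 24.47)

# 0.73 -> 0.79 over 10.76 years: same calculation for the earlier interval
rate_early = annual_rate(0.73, 0.79, 10.76)
```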

Condition dynamics Natural Language processing (NLP)
http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf
http://www.bioscience.ai/schedule

Text analysis for clinical notes #1
http://dx.doi.org/10.3233/978-1-61499-753-5-201
Medical Text Classification using Convolutional Neural
Networks
Mark Hughes, Irene Li, Spyros Kotoulas, Toyotaro Suzumura (Submitted on
22 Apr 2017). https://arxiv.org/abs/1704.06841
We present an approach to automatically classify clinical text at a sentence level. We are
using deep convolutional neural networks to represent complex features. We train the
network on a dataset providing a broad categorization of health information. Through a
detailed evaluation, we demonstrate that our method outperforms several approaches
widely used in natural language processing tasks by about 15%.

Text analysis for clinical notes #2
13 April 2017. https://doi.org/10.1109/BHI.2017.7897302
https://doi.org/10.1016/j.jbi.2017.07.006
We proposed the first models based on recurrent neural networks (more
specifically Long Short-Term Memory - LSTM) for classifying relations from
clinical notes.
We also evaluated the impact of word embedding on the performance of
LSTM models and showed that medical domain word embeddings help
improve the relation classification. These results support the use of LSTM
models for classifying relations between medical concepts, as they show
comparable performance to previously published systems while requiring
no manual feature engineering.
In this work, we explore the use of Hierarchical Dirichlet Processes (HDP)
as a Bayesian nonparametric framework to infer patients' states of health
by combining multiple sources of data. In particular, we employ HDP to
combine clinical time series and text from the nursing progress notes in a
probabilistic topic modeling framework for patient risk stratification.
iDoctor: Personalized and professionalized medical
recommendations based on hybrid matrix factorization
Future Generation Computer Systems
Volume 66, January 2017, Pages 30-35
https://doi.org/10.1016/j.future.2015.12.001

Personalized Medicine | Introduction

Precision / personalized medicine #1
re-work.co/blog
http://dx.doi.org/10.1101/070490
“For the first time, we demonstrate that DLNN trained on a large pharmacogenomic data set can effectively
predict the therapeutic response of specific drugs in specific cancer types, from a large panel of both drugs and
cancer cell lines. These findings serve as a proof of concept for the application of DLNN to predict therapeutic
responsiveness, a milestone in precision medicine.”
http://dx.doi.org/10.1056/NEJMp1500523
http://dx.doi.org/10.3389%2Ffpsyt.2016.00034
http://dx.doi.org/10.1016/j.media.2016.06.024

We introduce an IoT driven architecture and discuss how non-
invasive, affordable, unobtrusive sensing using mobile phones,
wearables and nearables is making physiological and pathological
data collection from human body possible in thus far unimaginable
ways. We also introduce breakthrough technologies in form
of exosomes and 3D organ printing that has the potential to disrupt
the future healthcare landscape.
http://dx.doi.org/10.1007/978-3-319-42141-4_9
https://doi.org/10.1109/TMM.2016.2614225
To facilitate the intensive computation required for interactive analytics, we design an efficient
sparse principal component analysis (SPCA) solver based on a variance reduced stochastic
gradient technique. The benefits of our method are demonstrated by analyzing two different
EHR patient cohorts, a public and a private dataset containing EHRs of 101 767 and 223 076
patients, respectively. Our evaluations show that PHENOTREE can detect clinically meaningful
hierarchical phenotypes.
http://dx.doi.org/10.3390/ijms17091555

Multimorbidity space and dynamic disease progression.
http://dx.doi.org/10.1038/nrg.2016.87
The co-occurrence of diseases can inform the underlying network biology of shared and multifunctional genes and pathways. In addition,
comorbidities help to elucidate the effects of external exposures, such as diet, lifestyle and patient care. With worldwide health transaction data
now often being collected electronically, disease co-occurrences are starting to be quantitatively characterized.
Linking network dynamics to the real-life, non-ideal patient in whom diseases co-occur and interact provides a valuable basis for generating
hypotheses on molecular disease mechanisms, and provides knowledge that can facilitate drug repurposing and the development of targeted
therapeutic strategies.

Glaucoma decision support tools
Old-school methods for multimodal and structural features
Development of machine learning models
for diagnosis of glaucoma
Seong Jae Kim, Kyong Jin Cho, Sejong Oh
Published: May 23, 2017.
https://doi.org/10.1371/journal.pone.0177726
We used 100 cases of data as a test dataset and 399 cases
of data as a training and validation dataset. To develop the
glaucoma prediction model, we considered four machine
learning algorithms: C5.0, random forest (RF), support vector
machine (SVM), and k-nearest neighbor (KNN).
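To make the setup concrete, here is a minimal sketch of one of the four algorithms, k-nearest neighbor, using the same 399/100 split sizes. The data are synthetic random features generated only for illustration (not the study's RNFL, visual field, or fundus measurements), and the NumPy implementation and the choice k=5 are assumptions rather than the authors' configuration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test row by majority vote of its k nearest training rows."""
    # Squared Euclidean distances, shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]                 # (n_test, k) array of 0/1 labels
    return (votes.mean(axis=1) > 0.5).astype(int)

# Synthetic stand-in data: 499 cases, 4 features, class 1 shifted by 1.5
rng = np.random.default_rng(42)
n_cases, n_features = 499, 4
y = rng.integers(0, 2, size=n_cases)
X = rng.normal(size=(n_cases, n_features)) + 1.5 * y[:, None]

# 100 cases held out for testing, 399 used for training
perm = rng.permutation(n_cases)
test_idx, train_idx = perm[:100], perm[100:]

y_pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k=5)
accuracy = float((y_pred == y[test_idx]).mean())
```

With real clinical features on different scales (micrometers, decibels, years), the features would normally be standardized before computing distances.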
Color-fundus and red-free fundus photography (A),
peripapillary RNFL thickness measured by SD-OCT (B),
and automated 30–2 visual field test (C). The presence of
a tigroid fundus and peripapillary atrophy was observed, and
there was a decrease in the RNFL thickness on the
peripapillary RNFL thickness scan. In the visual field test,
the abnormalities were judged to be of no clinical
significance.
Computers in Biology and Medicine
Volume 8, Issue 1, January 1978, Pages 25-40
Glaucoma consultation by computer
Sholom Weiss, Casimir A. Kulikowski, Aran Safir
https://doi.org/10.1016/0010-4825(78)90011-2
Automated detection of glaucoma using structural and
non structural features
SpringerPlus December 2016, 5:1519
Anum A. Salam, Tehmina Khalil, M. Usman Akram, Amina Jameel, Imran Basit
First Online: 09 September 2016
https://doi.org/10.1186/s40064-016-3175-4

Tensor Networks Inspiration from quantum networks #1
Supervised Learning with Quantum-
Inspired Tensor Networks
E. Miles Stoudenmire, David J. Schwab last revised 18 May 2017
Deep Learning and Quantum Entanglement:
Fundamental Connections with Implications to
Network Design
Yoav Levine, David Yakira, Nadav Cohen, Amnon Shashua last revised 10 Apr 2017
Neural networks for computing best
rank-one approximations of tensors
and its applications
Maolin Che, Andrzej Cichocki, Yimin Wei. 22 May 2017
https://doi.org/10.1016/j.neucom.2017.04.058
This paper presents the neural dynamical network
to compute a best rank-one approximation of a
real-valued tensor. We implement the neural
network model by the ordinary differential
equations (ODE), which is a class of continuous-
time recurrent neural network. Finally, we
generalize the proposed neural networks to the
computation of the restricted singular values and
the associated restricted singular vectors of real-
valued tensors. We illustrate and validate
theoretical results via numerical simulations.
Keywords: Neural network, Ordinary differential equations, Lyapunov function, Lyapunov stability theory, Rank-one tensor, Best rank-one
approximation, Z-eigenpair, Symmetric-definite tensor pair, H-eigenpair, The local maximal generalized eigenpair, The local minimal
generalized eigenpair, Generalized tensor eigenpair, Local optimal rank-one approximation, Restricted singular value, Restricted singular
vector
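To make "best rank-one approximation" concrete, here is a minimal Python sketch. It deliberately swaps in the classical higher-order power method (alternating updates of one factor at a time) rather than the ODE-based neural dynamics of the paper, and it only handles 3-way tensors stored as nested lists. The function names and the toy tensor are illustrative assumptions, not taken from the paper.

```python
import math

def normalize(v):
    """Return the unit vector of v together with its Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v], norm

def rank_one_approximation(T, n_iter=50):
    """Higher-order power method: T[i][j][k] is approximated by lam * a[i] * b[j] * c[k].

    Assumes the contractions never vanish (e.g. T is not the zero tensor).
    """
    I, J, K = len(T), len(T[0]), len(T[0][0])
    b, _ = normalize([1.0] * J)
    c, _ = normalize([1.0] * K)
    for _ in range(n_iter):
        # Update each factor by contracting T with the other two factors.
        a, _ = normalize([sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
                          for i in range(I)])
        b, _ = normalize([sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
                          for j in range(J)])
        c, lam = normalize([sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
                            for k in range(K)])
    return lam, a, b, c

# Toy tensor that is exactly rank one: T = 3 * u (x) v (x) w
u, v, w = [0.6, 0.8], [0.6, 0.8], [0.8, 0.6]
T = [[[3.0 * ui * vj * wk for wk in w] for vj in v] for ui in u]
lam, a, b, c = rank_one_approximation(T)
```

On an exactly rank-one tensor the method recovers the scale and the unit factors; on a general tensor it only converges to a local optimum, which is the kind of behaviour the Lyapunov-stability keywords above refer to.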
We theoretically analyze convolutional arithmetic circuit (ConvACs), and empirically validate
our findings on more common ConvNets which involve ReLU activations and max pooling.
Beyond the results described above, the description of a deep convolutional network in well-
defined graph-theoretic tools and the formal connection to quantum entanglement, are two
interdisciplinary bridges that are brought forth by this work.
Neural-network representation of the
many-body ground states.
convolutional neural networks, can constitute the
basis of more advanced NQS and therefore have
the potential for increasing their expressive
power.

Tensor Networks Inspiration from quantum networks #2
Low-Rank Tensor Networks for Dimensionality
Reduction and Large-Scale Optimization
Problems: Perspectives and Challenges PART 1
A. Cichocki, N. Lee, I.V. Oseledets, A.-H. Phan, Q. Zhao, D. Mandic last revised 19 Jul
2017 (this version, v2)
Tensor Networks for Dimensionality Reduction and
Large-scale Optimization: Part 2 Applications and
Future Perspectives
A. Cichocki, N. Lee, I.V. Oseledets, A.-H. Phan, Q. Zhao, D. Mandic Foundations and
Trends® in Machine Learning (2017): Vol. 9: No. 6, pp 431-673.
http://dx.doi.org/10.1561/2200000067
“Tensor decompositions and tensor network algorithms require sophisticated software libraries, which are being rapidly
developed. The TT Toolbox, developed by Oseledets and coworkers, (http://github.com/oseledets/TT-Toolbox) for
MATLAB and (http://github.com/oseledets/ttpy) for PYTHON is currently the most complete software for the TT
(MPS/MPO) and QTT networks. The TT toolbox supports advanced applications, which rely on solving sets of linear
equations (including the AMEn algorithm), symmetric eigenvalue decomposition (EVD), and inverse/pseudoinverse of
huge matrices.”
Keywords: Tensor networks, Function-related tensors, CP decomposition, Tucker models, tensor train (TT) decompositions, matrix product states (MPS), matrix product operators (MPO), basic tensor operations, multiway component analysis, multilinear blind source
separation, tensor completion, linear/multilinear dimensionality reduction, large-scale optimization problems, symmetric eigenvalue decomposition (EVD), PCA/SVD, huge systems of linear equations, pseudo-inverse of very large matrices, Lasso and Canonical
Correlation Analysis (CCA)

Tensor Networks in Healthcare
SCH: INT: Collaborative Research: High-throughput Phenotyping on Electronic Health
Records using Multi-Tensor Factorization
Jimeng Sun, Bradley Malin, Joshua Denny, Joydeep Ghosh, Abel Kho
Funding Source: NSF Smart Connect Health Integrated Grant: Award Number 1418511
http://www.sunlab.org/research/phenotyping/
Techniques
Task 1: Phenotyping Generation: How to turn EHR data into meaningful clinical concepts (Phenotypes)?
Task 2: Phenotyping Refinement: How to incorporate feedback to ensure the generated phenotypes are clinically meaningful?
Task 3: Phenotyping Adaptation: How to port phenotypes from one institution to another?
Applications
App 1: Cohort Construction: Validate that the generated phenotypes recover some existing phenotypes (from PheKB)
App 2: GWAS: Develop genome-wide association studies using the generated phenotypes (as target or control variables)
App 3: Predictive modeling: Use generated phenotypes as features to facilitate predictive modeling
https://arxiv.org/abs/1704.03141

Tensor Networks in Industry
Animashree Anandkumar
Associate Professor (with tenure)
University of California Irvine
I am a faculty at CS department within ICS at University of California Irvine since December 2016. Before that I was
a faculty at EECS department at UCIrvine since August 2010. I am a member of the center for pervasive
communications and computing (CPCC).
I am currently a principal scientist at Amazon Web Services (AWS) and on leave from UCI.
My research focus is in the high-dimensional learning of probabilistic graphical models and latent variable models.
Broadly I am interested in machine learning, high-dimensional statistics, tensor methods, statistical physics,
information theory and signal processing.
https://youtu.be/gEFaLKzrKYc?t=6m52s
https://youtu.be/KmvZu9qJNzg?t=7m15s
https://youtu.be/B4YvhcGaafw?t=5m40s
https://www.oreilly.com/ideas/lets-build-open-source-tensor-
libraries-for-data-science

“Model Refinement” Techniques

UNCERTAINTY ANALYSIS
“Layperson” background
development at internet giants like Google and Facebook.
https://www.wired.com/2016/12/uber-buys-mysterious-startup-make-ai-company/

UNCERTAINTY ANALYSIS
In practice for retinal imaging
https://doi.org/10.1101/084210
Here we propose to estimate the uncertainty of DNNs in medical
diagnosis based on a recent theoretical insight on the link between
dropout networks and approximate Bayesian inference. Using the example
of detecting diabetic retinopathy (DR) from fundus photographs, we
show that uncertainty informed decision referral improves diagnostic
performance. Experiments across different networks, tasks and datasets
showed robust generalization.
Depending on network capacity and task/dataset difficulty, we surpass
85% sensitivity and 80% specificity as recommended by the NHS when
referring 0%-20% of the most uncertain decisions for further inspection.
We analyse causes of uncertainty by relating intuitions from 2D
visualizations to the high-dimensional image space, showing that it is in
particular the difficult decisions that the networks consider uncertain.
bioRxiv preprint first posted online
Oct. 28, 2016
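A minimal sketch of the idea above, under loud assumptions: the "network" is a single hypothetical logistic unit with dropout on its inputs (a stand-in for a real DNN), dropout stays switched on at test time, and the standard deviation over repeated stochastic passes is used as the uncertainty measure. The weights, the toy cases and the 20% referral fraction are made up for illustration.

```python
import math
import random
import statistics

def predict_with_dropout(x, weights, bias, rng, p_drop=0.5):
    """One stochastic forward pass of a logistic unit with dropout on its inputs."""
    keep = 1.0 - p_drop
    z = bias
    for xi, wi in zip(x, weights):
        if rng.random() < keep:           # keep this input with probability `keep`
            z += wi * xi / keep           # inverted-dropout scaling
    return 1.0 / (1.0 + math.exp(-z))     # predicted probability of disease

def mc_dropout(x, weights, bias, rng, n_passes=100):
    """Mean and standard deviation of the prediction over repeated stochastic passes."""
    samples = [predict_with_dropout(x, weights, bias, rng) for _ in range(n_passes)]
    return statistics.fmean(samples), statistics.pstdev(samples)

def refer_most_uncertain(stds, fraction):
    """Indices of the `fraction` most uncertain cases, to be referred to a human grader."""
    n_refer = int(round(fraction * len(stds)))
    ranked = sorted(range(len(stds)), key=lambda i: stds[i], reverse=True)
    return set(ranked[:n_refer])

rng = random.Random(0)
weights, bias = [1.5, -2.0, 0.8, 0.3], -0.1                      # hypothetical parameters
cases = [[rng.gauss(0, 1) for _ in weights] for _ in range(10)]  # toy feature vectors
results = [mc_dropout(x, weights, bias, rng) for x in cases]
referred = refer_most_uncertain([std for _, std in results], fraction=0.2)
```

The automated decision is then kept only for the cases that are not in `referred`, which is where the reported gain in sensitivity and specificity would come from.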

Visualizing disease Clinicians want answers
To mitigate resistance from the clinical community, put effort into explaining the diagnosis
Roth et al. (2015)
Ribeiro et al. (2016)
Baskaran et al. (2012)
Clinical heuristic: glaucoma decision tree
"Clinicians need the data-driven model predictions to align with their domain knowledge"
Dr. Jenna Wiens @ NIPS 2016, “NIPS 2016 Workshop on Machine Learning for Health”
http://www.nipsml4hc.ws/jenna-wiens
Essentially, the causal decision tree now becomes a “hard-to-interpret”
deep learning model. How to communicate this paradigm shift to
clinicians?

Visualization state-of-the-art techniques in General
DOI: 10.1111/cgf.13210
An example of modeling
with visual analytics.
BaobabView
[Van den Elzen and van Wijk (2011)]
uses a
tree-like interactive view to
support a manually
controlled decision tree
construction process
An example of model selection. Squares [Ren et al. (2017)] uses small multiples
composed of grids of different colors and visual textures to display the
distribution of probabilities in classification
© VADER Lab at ASU 2017.
All rights for the techniques and
images belong to their respective
owners.

Visualization high-dimensional visualization #1
Shusen Liu ; Dan Maljovec ; Bei Wang ; Peer-Timo Bremer ; Valerio Pascucci
(2016) https://doi.org/10.1109/TVCG.2016.2640960
Dominik Sacha ; Leishi Zhang ; Michael Sedlmair ; John A. Lee ; Jaakko Peltonen ;
Daniel Weiskopf ; Stephen C. North ; Daniel A. Keim
(2016) https://doi.org/10.1109/TVCG.2016.2598495

Visualization high-dimensional visualization #2
http://dx.doi.org/10.1111/cgf.13237
Dimensionality reduction provides a scalable alternative to create visualizations
(projections) that enable insight into the structure of such datasets. However, applying
dimensionality reduction independently for each dataset in a sequence may introduce
unnecessary variability in the resulting sequence of projections, which makes tracking
the evolution of the data significantly more challenging. We show that this issue
affects t-SNE, a widely used dimensionality reduction technique. In this context, we
propose dynamic t-SNE, an adaptation of t-SNE that introduces a controllable trade-
off between temporal coherence and projection reliability. Our evaluation in two
time-dependent datasets shows that dynamic t-SNE eliminates unnecessary temporal
variability and encourages smooth changes between projections.
https://doi.org/10.2312/eurovisshort.20161164

Visualization “unboxing” ConvNet black box #1
https://arxiv.org/abs/1311.2901; Cited by 2,133 articles
https://doi.org/10.1109/TVCG.2016.2598838
To enable a more intuitive exploration process, we are open-sourcing the Embedding Projector, a
web application for interactive visualization and analysis of high-dimensional data recently
shown as an A.I. Experiment, as part of TensorFlow. We are also releasing a standalone version
at projector.tensorflow.org, where users can visualize their high-dimensional data without the
need to install and run TensorFlow.

Visualization “unboxing” ConvNet black box #2
HILDA’17, Chicago, IL, USA
http://dx.doi.org/10.1145/3077257.3077260
“ACTIVIS has been deployed on Facebook’s machine learning platform. We present case studies with
Facebook researchers and engineers, and usage scenarios of how ACTIVIS may work with different models.”
Minsuk Kahng is with Georgia Tech; Pierre Andrews is with Facebook; Aditya Kalro is with Facebook; Duen Horng (Polo) Chau.
DARVIZ: deep abstract representation, visualization,
and verification of deep learning models
ICSE-NIER '17 Proceedings of the 39th International Conference on Software Engineering:
New Ideas and Emerging Results Track. https://doi.org/10.1109/ICSE-NIER.2017.13
ShapeShop: Towards Understanding Deep Learning
Representations via Interactive Experimentation
CHI EA '17 Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors
in Computing Systems https://doi.org/10.1145/3027063.3053103

Visualization “unboxing” recurrent/Sequence black box #1
Uninterpretable examples. Left: Illustration of an arbitrary set of parameters for
an LSTM trained on the MIT-BIH dataset. Numbers indicate different connections
for the input weight vector (rectangle) and the hidden layer weight matrix (square).
Right: The memory values c for arbitrary units in the LSTM trained on the MIT-
BIH data
LSTM hidden unit outputs compared to wavelet coefficients. The top of each column is the
original sample that was correctly classified using the respective LSTM model. The following
two pairs of rows are the cherry-picked pairs of wavelet coefficients and hidden unit outputs
that are roughly similar. The type of wavelet coefficient and the specific hidden unit are
indicated above each plot. The Daubechies wavelet coefficients are 108 time steps long
(instead of 216) because it makes use of the discrete wavelet transform. The wavelet
coefficients were computed using the PyWavelets package in Python
The sample saliencies for the ECG data using different techniques depicted in each column.
The occlusion width is the number of time steps that are occluded per instance. All the
samples shown have a length of 216 time steps (x-axis) and were correctly classified by the
model. The importance of each input step is shown on a scale of 0 to 1, with 1 being the
most important. The type of ECG signal is indicated on the left with LBBB – left bundle
branch block beat, RBBB – right bundle branch block beat, Paced – paced beat, and V-fib –
ventricular fibrillation.
Class mode
visualizations. The
optimized class
modes for the ECG
data (left) and the
MNIST data (right).
Here the input is
optimized with
respect to each class
in order to find the
most likely input for
each class. The class
for each plot is
indicated on the left
of the image. This
technique did not
yield interpretable
results.

Visualization Medical deep learning models #1
Overall illustration of MDNet. We use a bladder image with its diagnostic report as an example. The
image model generates an image feature to pass to LSTM in the form of a task tuple and a Conv
feature embedding (for the attention model) computed by the AAS module (defined in the method).
LSTM executes prediction tasks according to the specified image feature type
The illustration of class-specific
attention. From top to bottom,
test images, pathologist
annotations, and class attention
maps. Like the pathologist
annotations, the attention maps
are most activated in urothelial
regions, largely ignoring stromal
or background regions. Best
viewed in color.
http://dx.doi.org/10.1016/j.oret.2016.12.009
An occlusion test (Zeiler and Fergus, 2016) was performed to identify the areas
contributing most to the neural network's assigning the category of AMD. A blank 20
× 20-pixel box was systematically moved across every possible position in the image
and the probabilities were recorded. The highest drop in the probability represents
the region of interest that contributed the highest importance to the deep learning
algorithm.
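The occlusion test described above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: `predict` is any callable returning the probability of the class of interest, and the toy "classifier" and 8 × 8 image below are hypothetical stand-ins so that the sketch runs without a trained network.

```python
def occlusion_map(image, predict, box=20, fill=0.0):
    """Slide a blank box over every position and record the drop in predicted probability."""
    height, width = len(image), len(image[0])
    baseline = predict(image)
    drops = []
    for top in range(height - box + 1):
        row = []
        for left in range(width - box + 1):
            occluded = [list(r) for r in image]        # copy so the input stays intact
            for y in range(top, top + box):
                for x in range(left, left + box):
                    occluded[y][x] = fill              # blank out the box
            row.append(baseline - predict(occluded))   # large drop = important region
        drops.append(row)
    return drops

# Toy image: one bright 2 x 2 "lesion" at rows 4-5, columns 5-6.
image = [[0.0] * 8 for _ in range(8)]
for y in (4, 5):
    for x in (5, 6):
        image[y][x] = 1.0

def toy_predict(img):
    """Hypothetical classifier: probability grows with total brightness."""
    return min(1.0, sum(map(sum, img)) / 4.0)

drops = occlusion_map(image, toy_predict, box=2)
best = max((drops[t][l], (t, l))
           for t in range(len(drops)) for l in range(len(drops[0])))[1]
```

Here `best` is the top-left corner of the box position that causes the largest probability drop; with a real network each `predict` call is a full forward pass, so the cost grows with the number of box positions.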

Visualization Medical deep learning models #2
Inspired by Zhou et al. (2016), we present in
this section the idea of generating the
Regression Activation Maps (RAM) of an
input image to localize the discriminative region
towards the regression outcomes. It is known
that the convolutional units of each layer of a
CNN act as visual concept detectors that identify
anything from low-level concepts like textures or
materials to high-level concepts like objects or scenes.
Deeper into the network, the units become
increasingly discriminative. However, the
fully-connected layers make it difficult to
identify the importance of different units for
identifying the output labels (regression values,
in our networks). Instead, using global
average pooling (GAP) and the linear
output unit, we can directly visualize the regions
of interest (ROI) that are most discriminative
for a given regression value. As we use
regression for the purpose of classification,
each single RAM obtained for each single
image explicitly depicts the ROI at a different
clinical level.
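A minimal sketch of why GAP plus a linear output unit makes the map directly readable: the output is a weighted sum of the pooled feature maps, so the same weights applied pixel-wise give an activation map whose spatial mean equals the output (minus the bias). The feature maps, weights and bias below are hypothetical toy values, not from the paper.

```python
def global_average_pool(feature_maps):
    """Average each 2D feature map down to a single number."""
    return [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in feature_maps]

def regression_output(feature_maps, weights, bias=0.0):
    """Linear output unit on top of the pooled features."""
    pooled = global_average_pool(feature_maps)
    return sum(w * g for w, g in zip(weights, pooled)) + bias

def activation_map(feature_maps, weights):
    """Weighted sum of the feature maps at every spatial location."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wk * fmap[y][x] for wk, fmap in zip(weights, feature_maps))
             for x in range(width)] for y in range(height)]

# Two toy 2 x 2 feature maps from the last convolutional layer.
feature_maps = [
    [[0.0, 4.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
weights, bias = [1.0, -0.5], 0.5
pooled = global_average_pool(feature_maps)
output = regression_output(feature_maps, weights, bias)
ram = activation_map(feature_maps, weights)
```

In this toy case the top-right location carries the only positive contribution to the regression value, which is what a heat-map overlay on the fundus image would highlight after upsampling.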
In this work, we provided a deep learning model that
includes regression activation maps layer (RAM). The
RAM layer can provide the robust interpretability of the
proposed detection model by monitoring the
pathogenesis so that the proposed model can be taken
as an assistant for clinicians

Interpretability to EHR Mining and decision making #1
https://youtu.be/co3lTOSgFlA
The source code of RETAIN is publicly available at
https://github.com/mp2893/retain Model Interpretation for Heart Failure Prediction We demonstrate the interpretability of RETAIN by
studying its behavior in the HF prediction task. We choose a HF patient from the test set and calculate the contribution of
the variables (medical codes in this case) for making the binary prediction. Figure 3a is the visualization of the
contributions of the variables in each visit. The patient suffered from skin problems, skin disorder (SD), benign neoplasm
(BN), excision of skin lesion (ESL), for some time before showing symptoms of HF, cardiac dysrhythmia (CD), heart valve
disease (HVD) and coronary atherosclerosis (CA), then being diagnosed with HF at the end. We can see that skin-related
codes from the earlier visits made little contribution to HF prediction as expected. RETAIN properly puts much attention
to the HF-related codes that occurred in recent visits.

Interpretability to EHR Mining and decision making #2
GRAM: Graph-based Attention Model for Healthcare
Representation Learning
Edward Choi, Mohammad Taha Bahadori, Le Song, Walter F. Stewart, Jimeng Sun
last revised 1 Apr 2017 (this version, v3)
“Deep learning methods exhibit promising performance for predictive modeling in healthcare, but
two important challenges remain: - Data insufficiency: Often in healthcare predictive modeling,
the sample size is insufficient for deep learning methods to achieve satisfactory results.
-Interpretation: The representations learned by deep learning methods should align with medical
knowledge. To address these challenges, we propose a GRaph-based Attention Model, GRAM that
supplements electronic health records (EHR) with hierarchical information inherent to medical
ontologies.”
https://jkulas12.github.io/GRAM_Visualization/

Dataset Size How many samples?
The more the better, but there are obvious problems with obtaining huge medical datasets
(A) The number of misclassified images for each body-part class and (B) the total number of
misclassified images for the whole body, as the size of the training set increases.
Classification accuracy as a function of training set size
There is a rule of thumb (#1) stating that one should have
10x as many samples as there are parameters in the
network (for a more formal approach, see VC dimension).
For example, the 110-layer CIFAR-10 variant of ResNet
(He et al. 2015) has around 1.7M parameters, thus requiring
17M images under this rule of thumb; the ILSVRC2015
ImageNet ResNets are far larger still.
https://www.researchgate.net/post/What_is_the_minimum_sample_size_required_
to_train_a_Deep_Learning_model-CNN
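The arithmetic behind the 10x rule of thumb is trivial, but writing it down makes it easy to plug in your own architecture. A small sketch, where the helper names are illustrative and the rule itself is a heuristic from the discussion linked above rather than a guarantee:

```python
def conv2d_parameters(kernel_size, channels_in, channels_out):
    """Weights plus biases of one square 2D convolution layer."""
    return kernel_size * kernel_size * channels_in * channels_out + channels_out

def samples_needed(n_parameters, samples_per_parameter=10):
    """Rule-of-thumb training set size for a network with n_parameters."""
    return n_parameters * samples_per_parameter

one_layer = conv2d_parameters(3, 64, 64)   # a single 3x3 convolution, 64 -> 64 channels
images = samples_needed(1_700_000)         # the 1.7M-parameter example above
```

Here `one_layer` is 36,928 parameters and `images` is 17,000,000, which shows how quickly the heuristic outruns any realistic labeled medical dataset.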

Dataset Size How many samples?
More is always better if you train with higher-capacity models
Since 2012, there have been significant advances in
representation capabilities of the models and
computational capabilities of GPUs. But the size of the
biggest dataset has surprisingly remained constant. What
will happen if we increase the dataset size by 10× or
100×?
Our experiments yield some surprising (and some
expected) findings:
Better Representation Learning Helps! Our first observation is that large-scale
data helps in representation learning as evidenced by improvement in performance
on each and every vision task we study. This suggests that collection of a larger-
scale dataset to study pretraining may greatly benefit the field. Our findings also
suggest a bright future for unsupervised or self-supervised [10, 42]
representation learning approaches. It seems the scale of data can overpower
noise in the label space.
Performance increases linearly with orders of magnitude of training data!
Perhaps the most surprising element of our finding is the relationship between
performance on vision tasks and the amount of training data (log-scale) used for
representation learning. We find that this relationship is still linear! Even with
300M training images, we do not observe any plateauing effect for the tasks
studied.
Capacity is Crucial: We also observe that to fully exploit 300M images, one needs
higher capacity models. For example, in case of ResNet-50 the gain on COCO
object detection is much smaller (1.87%) compared to (3%) when using ResNet-
152.
Training with Long-tail: Our data has quite a long tail and yet the representation
learning seems to work. This long-tail does not seem to adversely affect the
stochastic training of ConvNets (training still converges).
New state of the art results: Finally, our paper presents new state-of-the-art
results on several benchmarks using the models learned from JFT-300M. For
example, a single model (without any bells and whistles) can now achieve 37.4 AP
as compared to 34.3 AP on the COCO detection benchmark.

Dataset Size data augmentation #1
Images from:
ftp://ftp.dca.fee.unicamp.br/pub/docs/vonzuben/ia353_1s15/topico10_IA353_1s2015.pdf |
Wu et al. (2015)
Synthetically increase the number of training samples by distorting them in ways expected in the dataset (random
xy-shifts, left-right flips, added Gaussian noise, blur, etc.) → This has been shown to reduce overfitting.
As noted in the previous slides on image quality, it is useful to train the model with various image quality levels
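A minimal, dependency-free sketch of the distortions listed above (random xy-shift, left-right flip, additive Gaussian noise) on a grayscale image stored as nested lists with values in [0, 1]. The parameter values are arbitrary assumptions; in practice one would use the augmentation utilities of the deep learning framework at hand.

```python
import random

def random_shift(image, max_shift, rng, fill=0.0):
    """Translate the image by a random (dy, dx), padding the exposed border with `fill`."""
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sy, sx = y - dy, x - dx
            if 0 <= sy < height and 0 <= sx < width:
                shifted[y][x] = image[sy][sx]
    return shifted

def flip_left_right(image):
    """Mirror the image horizontally."""
    return [row[::-1] for row in image]

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in image]

def augment(image, rng, max_shift=2, flip_prob=0.5, sigma=0.02):
    """Apply the three distortions in sequence to produce one new training sample."""
    out = random_shift(image, max_shift, rng)
    if rng.random() < flip_prob:
        out = flip_left_right(out)
    return add_gaussian_noise(out, sigma, rng)

rng = random.Random(42)
image = [[(x + y) / 10.0 for x in range(6)] for y in range(5)]   # toy 5 x 6 image
augmented = [augment(image, rng) for _ in range(4)]              # four new samples
```

Whether a given distortion preserves the label depends on the task: a left-right flip of a fundus photograph, for instance, makes a right eye look like a left eye, which matters if laterality is part of what the model should learn.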
Köhler et al. (2013)
The most successful convolutional architectures are developed starting from ImageNet, a large
scale collection of images of object categories downloaded from the Web. This kind of images is
very different from the situated and embodied visual experience of robots deployed in
unconstrained settings. To reduce the gap between these two visual experiences, this paper
proposes a simple yet effective data augmentation layer that zooms on the object of interest
and simulates the object detection outcome of a robot vision system. The layer, that can be used
with any convolutional deep architecture, brings to an increase in object recognition performance
of up to 7%, in experiments performed over three different benchmark databases.

Dataset Size data augmentation #2
Apply domain-specific perturbations
Dataset Augmentation in Feature Space
Terrance DeVries, Graham W. Taylor (Submitted on 17 Feb 2017)
Dreaming More Data: Class-dependent Distributions over
Diffeomorphisms for Learned Data Augmentation
Søren Hauberg, Oren Freifeld, Anders Boesen Lindbo Larsen, John Fisher, Lars
Hansen ; Proceedings of the 19th International Conference on Artificial Intelligence and Statistics,
PMLR 51:342-350, 2016.
http://proceedings.mlr.press/v51/hauberg16.html
Our approach is, however, not limited to MNIST:
●
Image alignment and registration is a routine task in many medical imaging tasks,
such as the analysis of MRI.
●
We make similar observations for time-series data such as acoustic signals. Here
dynamic time warping (DTW) is often used as preprocessing to remove
differences in the temporal speed of individual signals.
●
Mesh alignment is also standard pre-processing step in the analysis of three-
dimensional meshes. As deep models are beginning to appear for three-
dimensional data it would be interesting to combine them with learned
augmentation schemes.
https://doi.org/10.1016/j.neucom.2016.12.025
In this paper, we propose five data augmentation methods dedicated to face images,
including landmark perturbation and four synthesis methods (hairstyles, glasses, poses,
illuminations). The proposed methods effectively enlarge the training dataset, which
alleviates the impacts of misalignment, pose variance, illumination changes and partial
occlusions, as well as the overfitting during training

Dataset Size Generative synthetic data
Augmentation through generative adversarial models (GAN)
the CVPR 2017 awards are out! The two winners are
Densely Connected Convolutional Networks by Facebook and
Improving the Realism of Synthetic Images
https://machinelearning.apple.com/2017/07/07/GAN.html
https://github.com/wayaai/SimGAN
https://github.com/val-iisc/deligan
TextureGAN: Controlling Deep Image Synthesis with
Texture Patches
Wenqi Xian, Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays
(Submitted on 9 Jun 2017)

Dataset Size semi-supervised training #1
Jointly use labeled and unlabeled data
Our empirical results show that using the tangents of the data manifold (as estimated by the
generator of the GAN) to inject invariances in the classifier improves the performance on
semi-supervised learning tasks.
N. Siddharth, Brooks Paige, Jan-Willem Van de Meent, Alban Desmaison, Frank Wood,
Noah D. Goodman, Pushmeet Kohli, Philip H.S. Torr
Here we are interested in learning disentangled representations that encode
distinct aspects of the data into separate variables. We propose to learn such
representations using model architectures that generalize from standard
Variational autoencoders (VAEs) employing a general graphical model structure
in the encoder and decoder. This allows us to train partially-specified models
that make relatively strong assumptions about a subset of interpretable
variables and rely on the flexibility of neural networks to learn representations
for the remaining variables. We further define a general objective for semi-
supervised learning in this model class, which can be approximated using an
importance sampling procedure.

Dataset Size semi-supervised training #2
In this work, we present a semi-supervised learning framework that
uses generated data to boost task performance. Under this
framework, we characterize the properties of various generators
and theoretically prove that a complementary (i.e. bad) generator
improves generalization. Empirically our proposed method improves
the performance of image classification on several benchmark
datasets.
Our proposed method, adversarial dropout, can be viewed from the
dropout and from the adversarial training perspectives. Our
proposed adversarial dropout can be interpreted as dropout masks
whose direction is counter-optimized, adversarially, to the model’s
label assignment. However, it should be noted that adversarial
dropout and traditional adversarial training with additive
perturbation are different because adversarial dropout induces the
sparse structure of neural network while the other do not make
changes on the neural network directly.

Dataset Size Active learning and “smart” labeling #1
When labeling is very time-consuming, active learning can help us in choosing which unlabeled samples to label
Active Learning and Proofreading
for Delineation of Curvilinear
Structures
Mosinska, Agata Justyna; Tarnawski, Jakub; Fua, Pascal
Presented at: MICCAI, Quebec City, Canada, September 10-14, 2017
https://infoscience.epfl.ch/record/229472
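As a generic illustration of "choosing which unlabeled samples to label", here is a sketch of plain uncertainty sampling: query the samples whose predicted class distribution has the highest entropy. This is a textbook baseline, not the specific method of the paper cited above, and the image names and probabilities are made up.

```python
import math

def entropy(probabilities):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)

def select_for_labeling(predictions, budget):
    """Names of the `budget` unlabeled samples the current model is least sure about."""
    ranked = sorted(predictions, key=lambda name: entropy(predictions[name]), reverse=True)
    return ranked[:budget]

# Hypothetical model outputs (healthy vs. diseased) for four unlabeled images.
pool = {
    "image_a": [0.98, 0.02],
    "image_b": [0.55, 0.45],
    "image_c": [0.80, 0.20],
    "image_d": [0.50, 0.50],
}
queried = select_for_labeling(pool, budget=2)
```

With a labeling budget of two, the annotator would be shown `image_d` and `image_b`, the two cases closest to a coin flip, while confidently classified images are left unlabeled for now.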

Dataset Size Transfer learning
Leveraging features learned from bigger non-medical datasets
Our approach fine-tunes a pre-trained convolutional neural network (CNN),
GoogLeNet. The fine-tuned CNN could effectively identify pathologies in
comparison to classical learning. Our algorithm aims to demonstrate that
models trained on non-medical images can be fine-tuned for classifying OCT
images with limited training data.
Biomedical Optics Express Vol. 8, Issue 2, pp. 579-592 (2017)
https://doi.org/10.1364/BOE.8.000579
International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis
International Workshop on Deep Learning in Medical Image Analysis
LABELS 2016, DLMIA 2016: Deep Learning and Data Labeling for Medical Applications pp 188-196
Understanding the Mechanisms of Deep
Transfer Learning for Medical Images
https://doi.org/10.1007/978-3-319-46976-8_20
Hariharan Ravishankar, Prasad Sudhakar, Rahul Venkataramani, Sheshadri Thiruvenkadam, Pavan Annangi, Narayanan
Babu, Vivek Vaidya
Deep Learning and Convolutional Neural Networks for Medical Image Computing Pp 181-193
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)
On the Necessity of Fine-Tuned Convolutional Neural Networks for
Medical Imaging
https://doi.org/10.1007/978-3-319-42999-1_11
Nima Tajbakhsh, Jae Y. Shin, Suryakanth R. Gurudu, R. Todd Hurst, Christopher B. Kendall,
Michael B. Gotway, Jianming Liang
In this paper, we studied the necessity of fine-tuning and the effective level
of knowledge transfer to 4 medical imaging applications. Our experiments
demonstrated medical imaging applications were conducive to transfer
learning and that fine-tuned CNNs were necessary to achieve high
performance particularly with limited training datasets. We also showed that
the desired level of fine-tuning differed from one application to another.
While deeper levels of fine-tuning were suitable for polyp and PE detection,
intermediate fine-tuning worked the best for interface segmentation and
colonoscopy frame classification. Our findings led us to conclude that layer-
wise fine-tuning is a practical way to reach the best performance based on
the amount of available data.

Dataset Quality Beyond
A giant with feet of clay: on the validity of the
data that feed machine learning in medicine
Federico Cabitza, Davide Ciucci, Raffaele Rasoini last revised 26 Jun 2017
We point out how uncertainty is so ingrained in medicine that it
biases also the representation of clinical phenomena, that is the
very input of ML models, thus undermining the clinical
significance of their output. Recognizing this can motivate both
medical doctors, in taking more responsibility in the
development and use of these decision aids, and the
researchers, in pursuing different ways to assess the value of
these systems. In so doing, both designers and users could take
this intrinsic characteristic of medicine more seriously and
consider alternative approaches that do not "sweep uncertainty
under the rug" within an objectivist fiction, which everyone can
come up by believing as true.
5 Garbage in, Gospel out
The question of the quality of medical record and of the data
extracted from there is still understudied [Cabitza and Batini,
2016; Stetson et al. 2012], let alone in
regard to machine learning projects [Feldman et al. 2017]. The
assumption that medical data could support secondary uses
has been challenged since almost 25 years ago, and also
strongly so, e.g., by Reiser 1991, who described several cases
of erroneous, missing and ambiguous data, and by
Burnum (1989), who provocatively wrote that “all medical
record information should be regarded as suspect; much of it is
fiction” (p. 484)
JAMA. Published online July 20, 2017. doi: 10.1001/jama.2017.7797
https://doi.org/10.1177/0272989X12465490
Conclusions: Our exploratory analysis method reveals
unexpected effects. It indicates that, despite the original
study detecting no significant average effect, computer-
aided detection (CAD) helped the less discriminating
readers but hindered the more discriminating readers.
Such differential effects, although subtle, may be clinically
significant and important for improving both computer
algorithms and protocols for their use. They should be
assessed when evaluating CAD and similar warning
systems.
