Outlining the common challenges encountered when structuring clinical and research datasets for deep learning training.
Typically the datasets are so unstructured that they are impossible for any deep learning practitioner to analyze, and the cleaning and data wrangling end up taking most of the time, time that could have been budgeted for properly even before the clinical data acquisition.
One could argue that, especially for medical data, annotated data is the new gold, not just the Big Data scattered all over the place. In practice this translates into efforts to design data labelling pipelines that are as intelligent as possible, for efficient use of expert clinician annotation work.
Alternative download link:
https://www.dropbox.com/s/bbgc21yc86h0t14/Efficient_Ocular_Data_Labelling.pdf?dl=0
Efficient Data Labelling for Ocular Imaging
1. Efficient Data Labelling for Ocular Imaging
Preprocessing, preparing, wrangling, you name it. Training is the easy part, in the end.
Petteri Teikari, PhD
Singapore Eye Research Institute (SERI)
Visual Neurosciences group
http://petteri-teikari.com/
Version "Sun 19 August 2018"
2. Motivation
The bigger problem, however, is that the culture of science in academia right now puts way too much emphasis on flashiness over sustainability and admittedly non-sexy tasks like properly versioning and packaging scientific software, documenting analyses, and producing well-characterized datasets.
neuromantik8086 @ https://news.ycombinator.com/item?id=17744150
3. Motivation
Artificial intelligence in retina
Ursula Schmidt-Erfurth, Amir Sadeghipour, Bianca S. Gerendas, Sebastian M. Waldstein, Hrvoje Bogunović
Christian Doppler Laboratory for Ophthalmic Image Analysis (OPTIMA), Vienna, Austria.
Progress in Retinal and Eye Research (1 August 2018)
https://doi.org/10.1016/j.preteyeres.2018.07.004
Research data is relevant insight which should systematically be shared with the entire academic community in a structured way, pertaining particularly to publicly funded research, to increase the available knowledge (Hahnel, 2015).
Most of the ophthalmic deep learning papers talk about the outcomes of already curated datasets, with little explanation of how much effort it requires for institutions and researchers to get there.
4. Model training is only a small part of the pipeline
NiftyNet: a deep-learning platform for medical imaging https://doi.org/10.1016/j.cmpb.2018.01.025
Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex. Google at NIPS (2015): "Hidden Technical Debt in Machine Learning Systems"
Big Data workflow for biomedical image processing. Only the classification step will be designed with the Hadoop/Spark framework.
Tchagna Kouanou A, Tchiotsop D, Kengne R et al. (2018)
https://doi.org/10.1016/j.imu.2018.05.001
8. Impact on clinical (research) workflow
Many ways to structure the data, and the best option depends on your institution / team. There are some practical issues that are more or less generalizable:
Do you want to "write" the measurements onto the image itself, instead of writing them to metadata?
Annotate the image quality of the OCT.
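The metadata-versus-burned-in question above can be made concrete: keeping measurements in a sidecar file next to the image (here plain JSON; DICOM tags or HDF5 attributes serve the same purpose) leaves the pixel data untouched for training. A minimal sketch, with invented file and field names:

```python
import json
from pathlib import Path

def write_sidecar(image_path: str, measurements: dict) -> Path:
    """Store measurements next to the image instead of burning them
    into the pixels; the image stays clean and machine-readable."""
    sidecar = Path(image_path).with_suffix(".json")
    sidecar.write_text(json.dumps(measurements, indent=2))
    return sidecar

def read_sidecar(image_path: str) -> dict:
    return json.loads(Path(image_path).with_suffix(".json").read_text())

# Hypothetical OCT scan with an image-quality grade and a thickness value.
path = "scan_001.png"
write_sidecar(path, {"quality": "gradable", "central_thickness_um": 254})
print(read_sidecar(path)["quality"])
```

The same pattern scales to any annotation (quality grades, segmentation file references, grader identity) without ever modifying the image bytes.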
9. Annotation in practice
The annotation work is very time-consuming, and expensive if it requires expert clinician time; thus make the annotation software as intelligent as possible to use.
There are tons of closed/open-source tools, but they might not work for your application off-the-shelf.
https://en.wikipedia.org/wiki/List_of_manual_image_annotation_tools
As you can see from the following slides, the annotation pipelines are themselves a focus of active research, and could offer a better "return on investment" for your clinical workflows than model refinement per se.
10. Annotation in practice #2
Deploy the annotation as a web-based front-end, allowing license-free annotation from everywhere (smartphones, tablets, laptops and desktops).
You can then run the back-end on a local computer at your lab (or deploy to the cloud).
13. Scientific Data Files: HDF typically as the baseline solution
Scientific data curation and processing with Apache Tika
Chris Mattmann, Chief Architect, Instrument and Science Data Systems, NASA JPL; Adjunct Associate Professor, USC; Director, Apache Software Foundation
+ Unifying Biological Image Formats with HDF
Optimization of PACS Data Persistency Using Indexed Hierarchical Data
Our approach makes use of the HDF5 hierarchical data storage standard for scientific data and overcomes limitations of hierarchical databases, employing inverted indexing for secondary key management and for efficient and flexible access to data through secondary keys. This approach was implemented and tested using real-world data against a traditional solution employing a relational database, in various store, search, and retrieval experiments performed repeatedly with different sizes of DICOM datasets.
Unifying Biological Image Formats with HDF5
Commun ACM 2009 Oct 1; 52(10): 42–47. doi: 10.1145/1562764.1562781
PMCID: PMC3016045, NIHMSID: NIHMS248614, PMID: 21218176
hdf-forum: Optimising HDF5 data structure
"http://cassandra.apache.org/ From my experience HDF5 is almost as fast as direct disk read, and even *faster* when using fast compression ..."
Ask Hacker News: What DB to use for huge time series? by BWStearns on Sept 25, 2014
Not a database, but HDF5 (http://www.hdfgroup.org) is used for storing all sorts of scientific data, has been around for a while and is very stable. PyTables is built on top of it, and lots of other languages have existing libraries to read/write HDF5 (Matlab, Python, C, C++, R, Java, ...).
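As the quotes above suggest, HDF5's hierarchical groups map naturally onto a patient/eye/scan layout. A minimal sketch with h5py; the path layout, dimensions and attribute names here are illustrative assumptions, not any standard:

```python
import h5py
import numpy as np

# Illustrative layout: /patient_id/eye/scan_id -> image volume,
# with per-scan metadata stored as HDF5 attributes.
with h5py.File("oct_store.h5", "w") as f:
    scan = f.create_dataset(
        "P001/OD/scan_000",
        data=np.zeros((49, 496, 512), dtype=np.uint8),  # B-scans x rows x cols
        compression="gzip",  # fast compression, per the forum quote above
    )
    scan.attrs["acquired"] = "2018-08-19"
    scan.attrs["quality"] = "gradable"

with h5py.File("oct_store.h5", "r") as f:
    vol = f["P001/OD/scan_000"]
    print(vol.shape, vol.attrs["quality"])
```

Datasets can be read partially (one B-scan at a time) without loading the full volume, which is the main practical advantage over a folder of per-slice image files.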
14. Data Files into Databases for some number crunching
8 NoSQL Databases Compared
By Janu Verma - June 17, 2015
https://hub.packtpub.com/8-nosql-databases-compared/
I am a bioinformatician working on data science applications to genomic data. Recently we were discussing the possibility of changing our data storage from HDF5 files to some NoSQL system. HDF5 files are great for storage and retrieval purposes. But now with huge data coming in we need to scale up, and also the hierarchical schema of HDF5 files is not very well suited for all sorts of data we are using.
MongoDB: If you need to associate a more complex structure, such as a document, to a key, then MongoDB is a good option.
Redis: If you need more structures like lists, sets, ordered sets and hashes, then Redis is the best bet. It's very fast and provides useful data structures.
Cassandra: Each key has values as columns, and columns are grouped together into sets called column families. Can handle large amounts of data across many servers (clusters), is fault-tolerant and robust.
Apache Cassandra™ is a leading NoSQL database platform for modern applications. https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks
Cassandra has become a proven choice for both technical and business stakeholders. When compared to other database platforms such as HBase, MongoDB, Redis, MySQL and many others, the linearly scalable Apache Cassandra™ delivers higher performance under heavy workloads.
Netflix decided to run a test designed to validate their tooling and automation scalability as well as the performance characteristics of Cassandra. For a more thorough write-up of the Netflix testing process, including configuration settings and commentary, visit their tech blog post titled Benchmarking Cassandra Scalability on AWS – Over a million writes per second.
15. Working with patient data under strict rules and regulations
DeepMind https://deepmind.com/applied/deepmind-health/data-security/
Enabling rigorous audit of how data is used. All data use is logged, and can be reviewed by our partners, regulators, and our Independent Reviewers. We're also working on an unprecedented, even stronger set of audit tools, called Verifiable Data Audit. The ledger and the entries within it will share some of the properties of blockchain, which is the idea behind Bitcoin and other projects. This will give our partner hospitals an additional real-time and fully proven mechanism to check how we're processing data, showing when a piece of data has been used and for what purpose.
Analysing the suitability of storing Medical Images in NoSQL Databases
Revina Rebecca, I. Elizabeth Shanthi
International Journal of Scientific & Engineering Research, Volume 7, Issue 8, August 2016
Index Terms: MongoDB, Cassandra, Chunked Storage, Cloud Computing, Medical images, NoSQL databases.
The IT industry is going through a paradigm shift, where the entire scenario of storing and retrieval of information is moving towards NoSQL databases. The medical industry is no exception, where the health care records, mainly the medical images, need a better data model for storage and retrieval. The need of the hour is to find a better-suited NoSQL database. This paper aims at studying the different NoSQL databases in the light of medical images.
The result indicates that the time complexity of Cassandra is less when compared with MongoDB for smaller files. But as the file size increases, the time complexity of MongoDB remains comparatively constant, so for larger files MongoDB seems to be the better candidate. In Cassandra the time increases proportionally with the size of the file. We propose that both MongoDB and Cassandra may be suitable to store large medical images, but MongoDB will be the better candidate.
(Saini and Kohli, 2018; Teng et al. 2018; Yang et al. 2018)
17. Challenge both in research and in clinical practice
Deloitte: Interoperability hindered by lack of incentives in private sector
By Leontina Postelnicu, August 10, 2018
https://www.healthcareitnews.com/news/deloitte-interoperability-hindered-lack-incentives-private-sector
Major advancements in technology have led to a significant increase in the number of innovative healthcare solutions that generate, collect, and analyze data, but a new report from the Deloitte Center for Health Solutions warns that the lack of interoperability continues to be a stumbling block in the development of a digitally enabled ecosystem.
But experts noted that interoperability remains "arguably the biggest challenge for medtech" due to the security challenges usually associated with information sharing and a "lack of incentives" to drive interoperability efforts in the private sector.
To realize the full value and potential of the Internet of Medical Things (IoMT), Deloitte also warned that the industry will need to develop new funding, business and operating models, build public trust to ensure that the growing amount of data generated is "protected and responsibly used," and tackle increasing cyberthreats.
What is the Internet of Medical Things?
From pregnancy testing kits to surgical instruments, artificial joints and MRI scanners, the medical technology (medtech) industry designs and manufactures a wide range of products. Technology is allowing these devices to generate, collect, analyse and transmit data, creating the Internet of Medical Things (IoMT): a connected infrastructure of health systems and services.
18. HL7 FHIR
Making Healthcare Data Work Better with Machine Learning
Friday, March 2, 2018. Posted by Patrik Sundberg, Software Engineer, and Eyal Oren, Product Manager, Google Brain Team
The Fast Healthcare Interoperability Resources (FHIR) standard addresses most of these challenges: it has a solid yet extensible data model, is built on established Web standards, and is rapidly becoming the de-facto standard for both individual records and bulk-data access. But to enable large-scale machine learning, we needed a few additions: implementations in various programming languages, an efficient way to serialize large amounts of data to disk, and a representation that allows analyses of large datasets.
Today, we are happy to open source a protocol buffer implementation of the FHIR standard, which addresses these issues. The current version supports Java, and support for C++, Go, and Python will follow soon. Support for profiles will follow shortly as well, plus tools to help convert legacy data into FHIR.
https://www.siliconrepublic.com/enterprise
/google-deepmind-nhs-report
http://healthstandards.com/blog/2013/03/26/hl7-fhir/
https://doi.org/10.1007/s10278-018-0090-y with Apache stack
Scalable and accurate deep learning with electronic health records
by Alvin Rajkomar and Eyal Oren (2018). Cited by 22
Google Inc, Mountain View, CA
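To make the FHIR data model discussed above concrete, here is a minimal sketch of a Patient resource serialized as JSON. The field names follow the public FHIR specification for the Patient resource; the values and the tiny round-trip are invented for illustration:

```python
import json

# A minimal FHIR Patient resource; only a handful of the spec's
# fields are shown, and all values are made up for illustration.
patient = {
    "resourceType": "Patient",
    "id": "example-001",
    "name": [{"family": "Tan", "given": ["Mei Ling"]}],
    "gender": "female",
    "birthDate": "1956-04-12",
}

serialized = json.dumps(patient)
restored = json.loads(serialized)
print(restored["resourceType"], restored["name"][0]["family"])
```

The appeal for ML pipelines is exactly this regularity: every record, regardless of the source EHR, deserializes into the same field structure, which is what Google's protocol buffer implementation formalizes at scale.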
19. Other APIs: Indian Roadmap
Our proposed model calls for a federated architecture that acknowledges current and future health information flow; for example, between providers and patients, wearables and EHRs, consumers and pharmacies, physicians and laboratories, or institutions and payers.
An API-enabled federated health data architecture would function on blockchain principles as an "open, distributed ledger that can record transactions between two parties efficiently and in a verifiable and permanent way".
Consider a personal health record (PHR) that could query all nodes in the network to receive periodic updates: from wearables, diary entries, pharmacists, doctors, hospitals, diagnostic labs, imaging facilities, and payers. It is possible to map out various permissible pathways through which the data can travel automatically, while there may be others through which it cannot pass without the patient's consent.
An authorized physician, even a virtual "teledoc", would be able to call for her patient's entire record, either through pre-authorization, real-time authentication, or waivers in case of emergencies.
Third-party applications that are built off the patient's PHR, for example alerting the patient to vaccine requirements before travel, or triggering reminders based on her medication list, would need the patient's permission to access data from her PHR.
21. "New Data": pervasive data acquisition
A Review of Privacy and Consent Management in Healthcare: A Focus on Emerging Data Sources
Muhammad Rizwan Asghar et al.
1 Nov 2017, https://arxiv.org/abs/1711.00546
The emergence of New Data Sources (NDS) in healthcare is revolutionising traditional electronic health records in terms of data availability, storage, and access. Increasingly, clinicians are using NDS to build a virtual holistic image of a patient's health condition. This research is focused on a review and analysis of the current legislation and privacy rules available for healthcare professionals.
NDS in this project refers to and includes patient-generated health data, consumer device data, wearable health and fitness data, and data from social media.
A Web 2.0 Model for Patient-Centered Health Informatics Applications
M Weitzel, A Smith, S de Deugd, R Yates - Computer, 2010. Cited by 33
"Weitzel et al. analysed the feasibility of using REST, OAuth, and OpenSocial to integrate social data into the EHR in the form of a gadget. The author considers the possibility of integrating a team-based environment as a social network defined using OpenSocial. A defined protocol using REST could use the gadget/dashboard to evaluate the patient condition and next treatment."
Integrating patient consent in e-health access control
K Wuyts, R Scandariato, G Verhenneman - 2011. Cited by 10
"Wuyts et al. proposed a system that uses XACML (eXtensible Access Control Markup Language), which is based on XML, to define access control. The XACML authorisation service is then integrated into a Cross Enterprise Document Sharing (XDS) reference architecture. XACML enables fine-grained access control using policies that are a defined set of rules restricting access."
Flexible and dynamic consent-capturing
MR Asghar, G Russello - Open Problems in Network Security, 2012. Cited by 6
"Asghar and Russello [24] reported a consent policy where a patient is able to communicate their permission to grant access to their EHR. They propose a consent evaluator that evaluates consent policies and returns the consent response to the requester. They define two types of policies: open and complex. Open policies are more general and divided into two types, blacklist and whitelist. Complex policies specify conditional expressions that must be satisfied. In this system, both the data subject an
22. HL7 FHIR: Open, not always open
While the underlying FHIR (aka Fast Healthcare Interoperability Resources) deployed by DeepMind for Streams uses an open API, the contract between the company and the Royal Free Trust funnels connections via DeepMind's own servers, and prohibits connections to other FHIR servers.
Though they point to DeepMind's "stated commitment to interoperability of systems," and "their adoption of the FHIR open API" as positive indications, writing: "This means that there is potential for many other SMEs to become involved, creating a diverse and innovative marketplace which works to the benefit of consumers, innovation and the economy."
DeepMind has suggested it wants to build healthcare AIs that are capable of charging by results. But Streams does not involve any AI. The service is also being provided to NHS Trusts for free, at least for the first five years, raising the question of how exactly the Google-owned company intends to recoup its investment.
https://techcrunch.com/2018/06/15/uk-report-warns-deepmind-health-could-gain-excessive-monopoly-power/
Google DeepMind and healthcare in an age of algorithms
https://doi.org/10.1007/s12553-017-0179-1 | Cited by 30
https://doi.org/10.1007/s12553-018-0228-4
23. What about blockchain in healthcare then?
Geospatial blockchain: promises, challenges, and scenarios in health and healthcare
Maged N. Kamel Boulos et al.
International Journal of Health Geographics 2018, 17:25
https://doi.org/10.1186/s12942-018-0144-x
We expect blockchain technologies to get increasingly powerful and robust, as they become coupled with artificial intelligence (AI) in various real-world healthcare solutions involving AI-mediated data exchange on blockchains.
Three challenges require careful consideration and innovative solutions (both technical and regulatory) to address them:
1) Interoperability, to have blockchains from different providers and services seamlessly talk to each other as appropriate [51].
2) Blockchain security [52]. After all, the whole rationale of using a blockchain is to let people who did not previously know or trust one another share data in a secure, tamperproof way. But the security of even the best-conceived blockchain can fail in some scenarios (e.g., the so-called '51% attacks') [52, 53], calling for adequate pre-emptive mechanisms to be put in place in order to mitigate or prevent blockchain security breaches.
3) Reconciling blockchain's promise of transparency with the European Union's now much stricter privacy rules under GDPR (General Data Protection Regulation), which require personal data to be deletable on demand [54].
Blockchain in Ophthalmology
https://www.youtube.com/watch?v=kb97eJfPjrQ&t=67s
By Robert Chang, MD, Assistant Professor of Ophthalmology at the Stanford University Medical Center.
Presented at the APTOS 2018 conference at Academia, SingHealth, Singapore, July 7-8, 2018
http://2018.asiateleophth.org/
26. Matlab → Python: the de facto language
The pre-NiftyNet implementation used TensorFlow directly for deep learning, and used custom MATLAB code and third-party MATLAB libraries for converting data from medical image formats, pre-/post-processing, and evaluating the inferred segmentations. In addition to Python code implementing the novel aspects of the work (e.g. a new memory-efficient dropout implementation and a new network architecture), additional infrastructure was developed to load data, separate the data for cross-validation, sample training and validation data, resample images for data augmentation, organise model snapshots, log intermediate losses on training and validation sets, coordinate each experiment, and compute inferred segmentations on the test set. The pre-NiftyNet implementation was not conducive to distributing the code or the trained network, and lacked visualizations for monitoring segmentation performance during training.
In contrast, the NiftyNet implementation was entirely Python-based and required implementations of custom network, data augmentation and loss functions specific to the new architecture, including four conceptual blocks to improve code readability. The network was trained using images in their original NIfTI medical image format, and the resulting trained model was publicly deployed in the NiftyNet model zoo. Furthermore, now that the DenseVNet architecture is incorporated into NiftyNet, the network and its conceptual blocks can be used in new segmentation problems with no code development, using the command line interface.
https://doi.org/10.1016/j.cmpb.2018.01.025
https://github.com/NifTK/NiftyNet
28. From something simple like a Flask front-end
Label-Images-Flask
A minimal Flask web application for labelling images into a CSV dataset. Python 3. https://github.com/jrios6/Label-Images-Flask
API for TensorFlow model in Flask
See the blog post covering the main steps. To be hosted on Heroku (files Procfile, requirements.txt and runtime.txt).
https://github.com/guillaumegenthial/api_ner
Deploy your First Deep Learning Neural Network Model using Flask, Keras, TensorFlow in Python
Ashok Tankala, http://tanka.la
https://medium.com/coinmonks/deploy-your-first-deep-learning-neural-network-model-using-flask-keras-tensorflow-in-python-f4bb7309fc49
Creating REST API for TensorFlow models
Vitaly Bezgachev
https://becominghuman.ai/creating-restful-api-to-tensorflow-models-c5c57b692c10
Machine Learning model deployment with TensorFlow Serving. The main advantage of that approach, in my opinion, is performance (thanks to gRPC and Protobufs) and direct use of classes generated from Protobufs instead of manual creation of JSON objects.
https://www.makeartwithpython.com/blog/poor-mans-deep-learning-camera/
We'll write a web server in Python to send images from a Raspberry Pi (a dumb version of Amazon's DeepLens, a smart webcam) to another computer for inference, or image detection.
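The Label-Images-Flask idea above reduces to a small core: append one (image, label) row per annotation to a CSV file that later serves as the training manifest. A framework-agnostic sketch of that core (the function, file and column names are our own invention), which a Flask route handler would simply call:

```python
import csv
from pathlib import Path

LABELS_CSV = Path("labels.csv")  # hypothetical output file

def record_label(image_name: str, label: str, annotator: str) -> None:
    """Append one annotation row; a Flask POST handler would call this."""
    new_file = not LABELS_CSV.exists()
    with LABELS_CSV.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["image", "label", "annotator"])
        writer.writerow([image_name, label, annotator])

record_label("fundus_0001.png", "referable_DR", "grader_A")
record_label("fundus_0002.png", "no_DR", "grader_B")
print(LABELS_CSV.read_text().splitlines()[0])
```

Keeping the annotator column from day one pays off later, for example when you want to measure inter-grader agreement before trusting the labels.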
29. Pretty visualization with D3.js
Predicting balls and strikes using TensorFlow.js
By Nick Kreeger (@nkreeger)
In this post we'll be using TensorFlow.js, D3.js, and the power of the web to visualize the process of training a model to predict balls (blue areas) and strikes (orange areas) from baseball data.
Healthfigures: an open source JavaScript library for health data visualization
Andres Ledesma, Mohammed Al-Musawi, and Hannu Nieminen
BMC Med Inform Decis Mak. 2016; 16: 38. https://dx.doi.org/10.1186%2Fs12911-016-0275-6
Web-based solutions for data visualization provide flexibility, as they can be accessed by any web browser, either from mobile devices or personal computers. The hGraph uses a web approach via HyperText Markup Language (HTML) and Scalable Vector Graphics (SVG). The programming language of the library is JavaScript, and it is built using the Data-Driven Documents (D3.js) library, which provides free access to the Document Object Model (DOM), the substrate that enables the programmer to interface with the graphical representations in a web browser [Bostock et al. 2011; Cited by 1987].
30. What about deploying in R? R Shiny
Development of an interactive open source software application (RadaR) for infection management / antimicrobial stewardship
Christian F. Luz et al.
20 June 2018, http://dx.doi.org/10.1101/347534
We describe a software application (RadaR - Rapid analysis of diagnostic and antimicrobial patterns in R Shiny) for infection management, allowing user-friendly, intuitive and interactive analysis of large datasets without prior in-depth statistical or software knowledge.
Antimicrobial stewardship (AMS) teams are multidisciplinary and act beyond the borders of single specialties. They are usually understaffed, with limited IT support. They therefore need user-friendly and time-saving IT resources, without the need for profound technical expertise once the system is set up.
The entire source code of RadaR is freely accessible on GitHub (https://github.com/ceefuz/radar). The development will continue with more suggestions and improvements coming from its users and the R community. Further comments and suggestions for RadaR can be submitted at https://github.com/ceefuz/radar/issues.
"Quick way to deploy if most of your code is written in R"
31. More "professional" and advanced front-end with .js frameworks?
ReactJS vs Angular5 vs Vue.js — What to choose in 2018?
https://medium.com/@TechMagic/reactjs-vs-angular5-vs-vue-js-what-to-choose-in-2018-b91e028fa91d
Unless you are thinking of a more serious (spin-off, startup) deployment, these might be overkill.
Total.js + Server R Platform = Artificial Intelligence in practice – part I
Client Side (https://angular.io/, https://facebook.github.io/react/): for analysts and people monitoring transactions, alerts and analysis. It will be based on React.js or Angular 2 to present results of data from AI evaluations.
36. Self-supervised labelling: generating the labels automatically
Multi-task self-supervised visual learning
Carl Doersch and Andrew Zisserman (2017)
https://arxiv.org/abs/1708.07860
We investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling, in order to train a single visual representation. First, we provide an apples-to-apples comparison of four different self-supervised tasks using the very deep ResNet-101 architecture. We then combine tasks to jointly train a network.
We also explore lasso regularization to encourage the network to factorize the information in its representation, and methods for "harmonizing" network inputs in order to learn a more unified representation.
Too many self-supervised tasks have been proposed in recent years for us to evaluate every possible combination. Hence, we chose representative self-supervised tasks to reimplement and investigate in combination. We aimed for tasks that were conceptually simple, yet also as diverse as possible. Intuitively, a diverse set of tasks should lead to a diverse set of features, which will therefore be more likely to span the space of features needed for general semantic image understanding.
Relative Position [Doersch et al. 2015]: This task begins by sampling two patches at random from a single image and feeding them both to the network without context. The network's goal is to predict where one patch was relative to the other in the original image.
Colorization [Zhang et al. 2016]: Given a grayscale image (the L channel of the Lab color space), the network must predict the color at every pixel (specifically, the ab components of Lab).
Exemplar [Dosovitskiy et al. 2014]: The original implementation of this task created pseudo-classes, where each class was generated by taking a patch from a single image and augmenting it via translation, rotation, scaling, and color shifts. The network was trained to discriminate between pseudo-classes.
Motion Segmentation [Pathak et al. 2016]: Given a single frame of video, this task asks the network to classify which pixels will move in subsequent frames. The "ground truth" mask of moving pixels is extracted using standard dense tracking algorithms.
https://project.inria.fr/paiss/files/2018/07/zisserman-self-supervised.pdf
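As a concrete illustration of the Relative Position task described above, here is a sketch of the free label-generation step: sample an anchor patch and one of its eight neighbours from an image, and use the neighbour's direction index as the class label. Pure NumPy; the patch size and 8-way layout are assumptions matching the task description, not the paper's exact implementation:

```python
import numpy as np

DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
        (0, 1), (1, -1), (1, 0), (1, 1)]  # 8 neighbour positions

def relative_position_pair(img, patch=32, rng=None):
    """Sample an anchor patch and one of its 8 neighbours; the class
    label is the neighbour's index in DIRS. The labels cost nothing:
    they come from the sampling itself, not from an annotator."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = img.shape[:2]
    # anchor top-left corner, leaving room for a neighbour on every side
    r = int(rng.integers(patch, h - 2 * patch))
    c = int(rng.integers(patch, w - 2 * patch))
    label = int(rng.integers(len(DIRS)))
    dr, dc = DIRS[label]
    anchor = img[r:r + patch, c:c + patch]
    neighbour = img[r + dr * patch:r + (dr + 1) * patch,
                    c + dc * patch:c + (dc + 1) * patch]
    return anchor, neighbour, label

img = np.arange(128 * 128, dtype=np.float32).reshape(128, 128)
a, n, y = relative_position_pair(img)
print(a.shape, n.shape, y)
```

A network trained to predict `y` from the patch pair must learn spatial structure, which is the whole point of the pretext task.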
37. Weakly-supervised labelling: "lazy annotation"
Exploring the Limits of Weakly Supervised Pretraining
Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten | Facebook
Submitted on 2 May 2018
https://arxiv.org/abs/1805.00932
State-of-the-art visual perception models for a wide range of tasks rely on supervised pretraining. ImageNet classification is the de facto pretraining task for these models. Yet, ImageNet is now nearly ten years old and is by modern standards "small". Even so, relatively little is known about the behavior of pretraining with datasets that are multiple orders of magnitude larger. The reasons are obvious: such datasets are difficult to collect and annotate. In this paper, we present a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images. Our experiments demonstrate that training for large-scale hashtag prediction leads to excellent results.
With billions of images, is transfer learning model-capacity bound? The results indicate that with large-scale Instagram hashtag training, transfer-learning performance appears bottlenecked by model capacity.
Our study is part of a larger body of work on training convolutional networks on large, weakly supervised image datasets. Sun et al. [2017, Google] train convolutional networks on the JFT-300M dataset of 300 million weakly supervised images. Our Instagram datasets are an order of magnitude larger than JFT-300M, and collecting them required much less manual annotation work.
Our results suggest that:
1) Whilst increasing the size of the pretraining dataset may be worthwhile, it may be at least as important to select a label space for the source task to match that of the target task.
2) In line with prior work (Sun et al. [2017, Google]; Joulin et al. 2015), we observe that current network architectures are underfitting when trained on billions of images. Capacity may be increased, for instance, by increasing the number of layers and the number of filters per layer of existing architectures, or by mixtures-of-experts [Gross et al. 2017].
3) Our results also underline the importance of increasing the visual variety that we consider in our benchmark tasks. They show that the differences in the quality of visual features become much more pronounced if these features are evaluated on tasks with a larger visual variety.
In closing, we reflect on the remarkable fact that training for hashtag prediction, without the need for additional manual annotation or data cleaning, works at all. We believe our study illustrates the potential of natural or "wild" data compared to the traditional approach of manually designing and annotating datasets.
38. Weakly-supervised labelling: also for outlier detection
Discovery of rare phenotypes in cellular images using weakly supervised deep learning
Sailem et al., ICCV 2017 Workshops
http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w1/Sailem_Discovery_of_Rare_ICCV_2017_paper.pdf
High-throughput microscopy generates a massive amount of images that enables the identification of biological phenotypes resulting from thousands of different genetic or pharmacological perturbations.
The variability in cellular responses often results in weak phenotypes that only manifest in a subpopulation of cells. To overcome the burden of providing object-level annotations, we propose a deep learning approach that can detect the presence or absence of rare cellular phenotypes from weak annotations.
Workflow for weakly supervised rare phenotype detection. The WSCNN network is trained on image-level classes that indicate the absence of a rare or abnormal phenotype (negative class) or the presence of such a phenotype (positive class) in the image. In the shown example the abnormal phenotype is multinucleate cells. Our WSCNN is able to detect and localize the multinucleate cell even though it is surrounded by uninucleate cells.
Segmented saliency maps for the multinucleate cells. (a) The saliency maps (shown in magenta) confirm that a WSCNN trained using image-level annotations can detect and localize the difference between the positive class and the negative class, which is the presence of a few multinucleate cells in the positive class. (b) An image that belongs to the negative class based on the experimental metadata is correctly predicted to have a multinucleate cell.
39. Semi-supervised labelling: the most suitable for medical applications
Have a big dataset of unlabelled (or weakly-supervised) images that are "easy" to come by, but the labelling by a clinician is expensive.
LABELED IMAGES
Diabetic retinopathy, AMD and glaucoma, with all the different stages as well. Takes a lot of annotation time.
495k images in SERI study
128k images in Google study
90k images in Kaggle Dataset
+
UNLABELED IMAGES
Pool a bunch of unlabeled fundus images from your collaborators into the same dataset.
40. Semi-supervised labelling: plenty of different approaches out there
An overview of proxy-label approaches for semi-supervised learning
Sebastian Ruder
http://ruder.io/semi-supervised/
Parts of this post are based on my ACL 2018 paper Strong Baselines for Neural Semi-supervised Learning under Domain Shift, with Barbara Plank.
While learning completely without labelled data is unrealistic at this point, semi-supervised learning enables us to augment our small labelled datasets with large amounts of available unlabelled data. Most of the discussed methods are promising in that they treat the model as a black box and can thus be used with any existing supervised learning model.
Realistic Evaluation of Deep Semi-Supervised Learning (SSL) Algorithms
Avital Oliver, Augustus Odena, Colin Raffel, Ekin D. Cubuk, Ian J. Goodfellow
https://arxiv.org/abs/1804.09170, last revised 23 May 2018
https://github.com/brain-research/realistic-ssl-evaluation
Ourexperimentsprovide strong evidencethatstandardevaluationpractice
for SSL isunrealistic.Whatchangesto evaluationshouldbemadeto better
refectreal-worldapplications?
Our discoveries also hint towards settings where SSL is most likely the right choice for practitioners:
• When there are no high-quality labeled datasets from similar domains to use for fine-tuning.
• When the labeled data is collected by sampling i.i.d. from the pool of the unlabeled data, rather than coming from a (slightly) different distribution.
• When the labeled dataset is large enough to accurately estimate validation accuracy, which is necessary when doing model selection and tuning hyperparameters.
41. Semi-supervised training – examples with Mean Teacher and Virtual Adversarial Training
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
Antti Tarvainen and Harri Valpola. The Curious AI Company and Aalto University.
https://arxiv.org/abs/1703.01780 last revised 16 Apr 2018
https://github.com/CuriousAI/mean-teacher
Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning
Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Shin Ishii
https://arxiv.org/abs/1704.03976 revised 27 Jun 2018
https://github.com/takerum/vat_tf
A sketch of a binary classification task with two labeled examples (large blue dots) and one unlabeled example, demonstrating how the choice of the unlabeled target (black circle) affects the fitted function (gray curve). (a) A model with no regularization is free to fit any function that predicts the labeled training examples well. … (e) An ensemble of models gives an even better expected target. Both Temporal Ensembling and the Mean Teacher method use this approach.
Demonstration of how VAT works on semi-supervised learning. We generated 8 labeled data points (y = 1 and y = 0 are green and purple, respectively), and 1,000 unlabeled data points in 2-D space. The panels in the first row (I) show the prediction on the unlabeled input points at different stages of the algorithm. The panels in the second row (II) are heat maps of the regularization term on the input points. The values of LDS on blue-colored points are relatively high in comparison to the gray-colored points. Note that, at the onset of training, all the data points have similar influence on the classifier. … After 10 updates, the model boundary still appeared over the inputs. As the training progressed, VAT pushed the boundary away from the labeled input data points.
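The mechanics of Mean Teacher fit in a few lines: the teacher's weights are an exponential moving average (EMA) of the student's, and an unsupervised consistency cost ties the two models' predictions together. A minimal NumPy sketch with toy vectors standing in for network weights and outputs (the numbers are illustrative only):

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.99):
    """Mean Teacher: teacher weights are an exponential moving average of
    the student weights, giving a smoother, more stable prediction target."""
    return alpha * teacher_w + (1.0 - alpha) * student_w

def consistency_loss(student_pred, teacher_pred):
    """Mean-squared consistency cost between student and teacher outputs;
    it needs no labels, so it can be computed on the unlabeled pool."""
    return float(np.mean((student_pred - teacher_pred) ** 2))

teacher = np.zeros(3)                 # toy weight vector
student = np.ones(3)
teacher = ema_update(teacher, student, alpha=0.9)   # -> [0.1, 0.1, 0.1]
loss = consistency_loss(np.array([1.0, 0.0]), np.array([0.0, 0.0]))
```

VAT replaces the teacher's prediction with the model's own prediction under a small adversarial input perturbation, but the consistency idea is the same.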
42. Semi-supervised training – recent examples
Manifold Mixup: Encouraging Meaningful On-Manifold Interpolation as a Regularizer
Vikas Verma, Alex Lamb, Christopher Beckham, Aaron Courville, Ioannis Mitliagkas, Yoshua Bengio
https://arxiv.org/abs/1806.05236 last revised 9 Jul 2018
https://github.com/vikasverma1077/manifold_mixup
The goal of our proposed algorithm, Manifold Mixup, is to increase the generality and effectiveness of data augmentation by using a deep network's learned representations as a way of generating novel data points for training. For example, in a dataset of images of dogs and cats, one could consider a non-linear interpolation between a pair of dogs to produce novel examples of a dog that possesses some factors of each of the pair in a novel combination (one such factor combination would be to combine the head from one dog with the body of another dog, see Figure 4). Experimentally, we observe that Manifold Mixup can act as an effective regularizer. In particular, we show performance gains when training with limited labeled data and in semi-supervised learning.
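The interpolation at the heart of (Manifold) Mixup is a convex combination of two samples' representations and labels, with the coefficient drawn from a Beta distribution. A toy NumPy sketch operating on plain arrays rather than an actual hidden layer (Manifold Mixup applies this at a randomly chosen layer of the network):

```python
import numpy as np

rng = np.random.default_rng(0)

def manifold_mixup(h_a, h_b, y_a, y_b, alpha=2.0):
    """Mix two (hidden) representations and their one-hot labels with a
    Beta(alpha, alpha)-distributed coefficient lambda."""
    lam = rng.beta(alpha, alpha)
    h_mix = lam * h_a + (1.0 - lam) * h_b
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return h_mix, y_mix, lam

h_a, h_b = np.zeros(4), np.ones(4)                # toy representations
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # one-hot labels
h_mix, y_mix, lam = manifold_mixup(h_a, h_b, y_a, y_b)
# y_mix is a soft label that still sums to 1
```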
Semi-Supervised Learning via Compact Latent Space Clustering
Konstantinos Kamnitsas, Daniel C. Castro, Loic Le Folgoc, Ian Walker, Ryutaro Tanno, Daniel Rueckert, Ben Glocker, Antonio Criminisi, Aditya Nori
https://arxiv.org/abs/1806.02679 last revised 29 Jul 2018
Our work builds on the cluster assumption, whereby samples forming a structure are likely of the same class (Chapelle et al., 2006), by enforcing a further constraint: all samples of a class should belong to the same structure. Our approach combines the benefits of graph-based regularization with efficient, inductive inference, does not require modifications to a network architecture, and can thus be easily applied to existing networks to enable an effective use of unlabeled data.
43. Semi-supervised labelling – the most suitable for medical applications
Data Distillation: Towards Omni-Supervised Learning
Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He
https://arxiv.org/abs/1712.04440 (12 Dec 2017)
Model Distillation [Hinton et al. 2015] vs. Data Distillation. In data distillation, ensembled predictions from a single model applied to multiple transformations of an unlabeled image are used as automatically annotated data for training a student model.
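The data-distillation recipe – run one model over several transforms of the same unlabeled image, map the predictions back, and average them into an automatic annotation – can be sketched as below. The flip-equivariant `predict` is a made-up stand-in for a trained dense-prediction model:

```python
import numpy as np

def data_distill(predict, image, transforms, inverses):
    """Data distillation: ensemble a single model's predictions over
    multiple transforms of one unlabeled image; the averaged prediction
    becomes the pseudo-annotation for training a student model."""
    preds = [inv(predict(t(image))) for t, inv in zip(transforms, inverses)]
    return np.mean(preds, axis=0)

# Toy setup: identity and horizontal flip as the transform ensemble
predict = lambda img: img * 2.0        # stand-in "model" (pixelwise)
identity = lambda x: x
hflip = lambda x: x[:, ::-1]           # hflip is its own inverse

img = np.array([[1.0, 3.0]])
ens = data_distill(predict, img, [identity, hflip], [identity, hflip])
```

Because the toy model is flip-equivariant, the ensembled prediction agrees with the single-pass one; for a real model the averaging smooths out transform-dependent errors.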
A Robust Deep Attention Network to Noisy Labels in Semi-supervised Biomedical Segmentation
Shaobo Min, Xuejin Chen
https://arxiv.org/abs/1807.11719 (Submitted on 31 July 2018)
Learning-based methods suffer from limited clean annotations, especially for biomedical segmentation. For example, noisy labels confuse the model and limited labels lead to inadequate training, and the two usually occur together. In this paper, we propose a deep attention network (DAN) that is more robust to noisy labels, eliminating the bad gradients caused by noisy labels using attention modules. The intuition is that a discussion between two students may find mistakes taught by the teacher. We further analyse how noisy labels propagate through the network and design three attention modules according to the different disturbances of noisy labels in different layers. Furthermore, a hierarchical distillation is developed to provide more reliable pseudo labels from unlabeled data.
44. Active Learning – what to label first, and use clinician time smartly
Active deep learning reduces annotation burden in automatic cell segmentation
Aritra Chowdhury, Sujoy Biswas, Simone Bianco
http://doi.org/10.1101/211060 (November 1, 2017)
Analysis of high-content high-throughput microscopy cannot be left to manual investigation and needs to resort to efficient computing algorithms for cellular detection, segmentation, and tracking. Annotation is required for building high-quality algorithms. Medical professionals and researchers spend a lot of effort and time annotating cells. This task has proved to be very repetitive and time-consuming. The expert's time is valuable and should be used effectively.
We approach the segmentation task using a classification framework. Each pixel in the image is classified based on whether the patch around it resides on the interior, boundary or exterior of the cell. Uncertainty sampling, a popular active learning framework, is used in conjunction with a CNN to segment the cells in the image.
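Uncertainty sampling itself is only a few lines: score each unlabeled sample by the entropy of the model's predictive distribution and send the top-k most uncertain ones to the clinician. A minimal sketch (the toy probability table is illustrative):

```python
import numpy as np

def uncertainty_sampling(probs, k=2):
    """Rank unlabeled samples by predictive entropy and return the indices
    of the k most uncertain ones -- those are queued for expert annotation."""
    eps = 1e-12                                    # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[::-1][:k]           # highest entropy first

probs = np.array([[0.99, 0.01],    # confident -> skip
                  [0.55, 0.45],    # uncertain
                  [0.50, 0.50]])   # most uncertain -> label first
idx = uncertainty_sampling(probs, k=2)
```

Other acquisition functions (margin sampling, least-confidence) drop in by replacing the entropy score.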
AFT*: Integrating Active Learning and Transfer Learning to Reduce Annotation Efforts
Zongwei Zhou, Jae Y. Shin, Suryakanth R. Gurudu, Michael B. Gotway, Jianming Liang
https://arxiv.org/abs/1802.00912 (revised 7 Feb 2018)
To dramatically reduce annotation cost, this paper presents a novel method to naturally integrate active learning and transfer learning (fine-tuning) into a single framework, called AFT*, which starts directly with a pre-trained CNN to seek "worthy" samples for annotation and gradually enhances the (fine-tuned) CNN via continuous fine-tuning.
Five different Pulmonary Embolisms (PEs) in the standard 3-channel representation, as well as in the 2-channel representation [Tajbakhsh et al. 2015], which was adopted in this work because it achieves greater classification accuracy and accelerates CNN training convergence. The figure is used with permission.
45. In practice: structure and record all human interactions
So that people do not have to do the same manual work ad infinitum – which is the dream of many engineers who hate manual repetitive work :)
It all of course depends on your application and how much effort is needed to record each action. The more intelligent you make the acquisition, the more thankful future generations will be :P
Vascular pixel-by-pixel labeling – does the temporal sequence matter in labeling? You would probably think not?
(Figure: a vessel labelled with four brush strokes, in order – first, second, third, fourth.)
You could anyway train a saliency map ("virtual eye tracking") from this. In some applications it might be useful, and in some not. But if this is easy to gather (which it is), why not also save the order of brush strokes?
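Saving the stroke order costs almost nothing at acquisition time: log each brush stroke as a timestamped event. A minimal sketch of such an annotation log (the field names are made up for illustration; any labelling tool's event hook could feed it):

```python
import json
import time

def record_stroke(log, points, label):
    """Append one brush stroke (ordered point list + label + wall-clock
    timestamp) to the session log; stroke order is preserved for free."""
    log.append({"t": time.time(), "label": label, "points": points})
    return log

log = []
record_stroke(log, [[10, 12], [11, 13]], "vessel")   # first stroke
record_stroke(log, [[40, 41]], "vessel")             # second stroke
serialized = json.dumps(log)   # portable record of the labelling session
```

The serialized log can later be replayed to derive per-pixel masks, stroke order, and timing-based saliency, even for use cases not anticipated at acquisition time.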
46. Hard to anticipate all the use cases of your acquired data
What if the temporal sequence is useful for vascular surgeons? Or not?
For example, you could also record gaze (and get the real sequence of attention) when experts grade diabetic retinopathy images for diagnosis-relevant salient regions.
DeepScope: Nonintrusive Whole Slide Saliency Annotation and Prediction from Pathologists at the Microscope
https://doi.org/10.1007/978-3-319-67834-4_4
https://www.biorxiv.org/content/early/2017/01/22/097246
https://bitbucket.org/aschaumberg/deepscope
Manual labeling is prohibitive, requiring pathologists with decades of training and outstanding clinical service responsibilities. We present the first end-to-end framework to overcome this problem, gathering annotations in a nonintrusive manner during a pathologist's routine clinical work: (i) microscope-specific 3D-printed commodity camera mounts are used to video record the glass-slide-based clinical diagnosis process; (ii) after routine scanning of the whole slide, the video frames are registered to the digital slide; (iii) motion and observation time are estimated to generate a spatial and temporal saliency map of the whole slide.
Towards Intelligent Surgical Microscope: Micro-surgeons' Gaze and Instrument Tracking
https://doi.org/10.1145/3030024.3038269
The first step towards an intelligent surgical microscope is to design an activity-aware microscope. In this paper, we present a novel system that we have built to record both the eye and instrument movements of surgeons while operating with a surgical microscope. We present a case study in micro-neurosurgery to show how the system monitors the surgeon's activities.
Tobii Pro Glasses 2 API
The Tobii Pro Glasses 2 API consists of an HTTP REST interface to control the wearable eye tracker in live setups.
47. Models too powerful for the available annotated data
Revisiting Unreasonable Effectiveness of Data in Deep Learning
Sun C, Shrivastava A, Singh S et al. 2017. Google Research, Carnegie Mellon University
http://arxiv.org/abs/1707.02968
While GPU computation power and model sizes have continued to increase over the last five years, the size of the largest training dataset has surprisingly remained constant. Why is that? What would have happened if we had used our resources to increase dataset size as well? This paper provides a sneak peek into what could be if the dataset sizes are increased dramatically.
"Our sincere hope is that this inspires the vision community to not undervalue the data and develop collective efforts in building larger datasets"
Underfitting for weakly-supervised massive datasets
48. Way too little medical data available
Hopes for the Medical ImageNet to move medical image analysis forward
Medical ImageNet – Radiology Informatics
langlotzlab.stanford.edu/projects/medical-image-net/
Medical ImageNet: a petabyte-scale, cloud-based, multi-institutional, searchable, open repository of diagnostic imaging studies for developing intelligent image analysis.
Curt Langlotz, Stanford – Stanford Medicine Big Data | Precision Health 2017
https://youtu.be/guf8V33pOWQ?t=7m55s
Towards ambient intelligence in AI-assisted healthcare spaces – Dr Fei-Fei Li, Stanford University
https://youtu.be/5RTkhfVIW40?t=36m10s
The Alan Turing Institute, published on Apr 25, 2018
Motivational overview of why medical image analysis needs a volumetric equivalent of the popular ImageNet database, used in benchmarking deep learning architectures and as a basis for transfer learning when not enough data is available for training deep learning from scratch.
https://www.slideshare.net/PetteriTeikariPhD/medical-imagenet
50. "Semi-"self-supervised learning with MRI scans
Self-Supervised Learning for Spinal MRIs
Amir Jamaludin, Timor Kadir, and Andrew Zisserman
https://arxiv.org/abs/1708.00367 (Submitted on 1 Aug 2017)
A significant proportion of patients scanned in a clinical setting have follow-up scans. We show in this work that such longitudinal scans alone can be used as a form of 'free' self-supervision for training a deep network. We demonstrate this self-supervised learning for the case of T2-weighted sagittal lumbar Magnetic Resonance Images (MRIs). A Siamese convolutional neural network (CNN) is trained using two losses: (i) a contrastive loss on whether the scan is of the same person (i.e. longitudinal) or not, together with (ii) a classification loss on predicting the level of vertebral bodies.
We show that the performance of the pre-trained CNN on the supervised classification task is (i) superior to that of a network trained from scratch; and (ii) requires far fewer annotated training samples to reach an equivalent performance to that of the network trained from scratch.
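The first of the two losses, the contrastive loss on scan pairs, has a simple closed form: pull embeddings of the same patient's longitudinal scans together, push different-patient pairs at least a margin apart. A NumPy sketch on toy embedding vectors (not the authors' code):

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_person, margin=1.0):
    """Contrastive loss on a pair of scan embeddings: attract same-person
    (longitudinal) pairs, repel different-person pairs up to `margin`."""
    d = np.linalg.norm(emb_a - emb_b)
    if same_person:
        return 0.5 * d ** 2                       # attract
    return 0.5 * max(0.0, margin - d) ** 2        # repel within the margin

same = contrastive_loss(np.array([0.0, 0.0]), np.array([0.0, 0.0]), True)
diff = contrastive_loss(np.array([0.0, 0.0]), np.array([0.0, 0.0]), False)
# identical embeddings: zero loss if it really is the same person,
# maximal loss (0.5 * margin**2) if the pair is from different people
```

The "same person or not" label comes from the scanner metadata, which is why the supervision is free.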
51. "Semi-"self-supervised learning with MRI scans
Self-Supervised Convolutional Feature Training for Medical Volume Scans
Max Blendowski, Hannes Nickisch, and Mattias P. Heinrich
https://pdfs.semanticscholar.org/4d27/b932a122dc3f1df47b379b75cb19610f5de5.pdf
1st Conference on Medical Imaging with Deep Learning (MIDL 2018)
However, more than in other areas, annotated data is scarce in medical imaging despite the fact that the number of acquired images grows rapidly. We propose to take advantage of these unlabeled images. Using readily available spatial cues in medical volume scans without expert input, we train local image descriptors in an unsupervised manner and show that they can easily be employed in subsequent tasks with very few labeled training images. Our experiments demonstrate that predicting simple positional relationships between pairs of non-overlapping subvolumes in medical images constitutes a sufficient auxiliary pretraining task to provide expressive feature descriptors within their receptive field.
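The pretext task can be sketched in 2-D: cut two non-overlapping patches whose spatial relation is known from the crop geometry, and let that relation be the "free" label (a toy version of the idea; the paper works on 3-D subvolumes and more offset classes):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_offset_pair(image, size=8):
    """Self-supervised pretext sample: two non-overlapping patches whose
    spatial relation (0 = second patch to the right, 1 = below) is the
    label -- no expert input needed, the geometry provides it."""
    offsets = [(0, size), (size, 0)]              # right of / below anchor
    label = int(rng.integers(0, 2))
    dy, dx = offsets[label]
    y = int(rng.integers(0, image.shape[0] - size - dy + 1))
    x = int(rng.integers(0, image.shape[1] - size - dx + 1))
    patch_a = image[y:y + size, x:x + size]
    patch_b = image[y + dy:y + dy + size, x + dx:x + dx + size]
    return patch_a, patch_b, label

img = np.arange(32 * 32, dtype=float).reshape(32, 32)   # toy "scan slice"
a, b, label = sample_offset_pair(img, size=8)
```

A descriptor network trained to predict `label` from `(a, b)` learns spatially discriminative features that transfer to the downstream labeled task.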
53. "Active Hardware" as a non-standard term for smart data acquisition
Most papers coming out on image restoration, image classification and semantic segmentation use already acquired datasets, whereas we could do multiframe capture for denoising (image averaging), super-resolution and improved high dynamic range (HDR). One could simultaneously deblur (deconvolve) the image with adaptive optics correction. This would give us self-supervised targets for image restoration deep learning.
54. Learn the imaging pipeline imperfections
Modeling Camera Effects to Improve Deep Vision for Real and Synthetic Data
Alexandra Carlson, Katherine A. Skinner, Ram Vasudevan, Matthew Johnson-Roberson
Revised 04 June 2018 https://arxiv.org/abs/1803.07721
"Several studies have investigated manipulating images in a visually-realistic manner to improve neural network robustness. Eitel et al. 2015 propose a physically-based data augmentation scheme specifically for depth images. They model realistic depth (Kinect) sensor noise as missing data patterns (i.e., occlusions and missing sensor data). In this study we focus on the more high-dimensional problem of modeling realistic sensor noise in RGB images. Wu et al. 2015 focus on optimizing data partitioning, communication, and hardware design for a novel image classification pipeline. As part of this pipeline, they use data augmentation, including RGB color shift, vignetting, pincushion and barrel distortions, cropping, and rotating. Alhaija et al. 2017 demonstrate that augmenting real data with rendered cars improves results for object detection with Faster R-CNN. Although not the focus of their work, their results show that augmented images that are post-processed with hand-tuned chromatic aberration, color curve shifts, and motion blur effects yield a significant performance boost."
A Software Platform for Manipulating the Camera Imaging Pipeline
Karaimer H.C., Brown M.S. (2016)
https://doi.org/10.1007/978-3-319-46448-0_26
https://karaimer.github.io/camera-pipeline/
55. Computational imaging and deep learning
End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging
Vincent Sitzmann, Steven Diamond, Yifan Peng, Xiong Dun, Stephen Boyd, Wolfgang Heidrich, Felix Heide, Gordon Wetzstein
ACM Transactions on Graphics (TOG), Volume 37, Issue 4, August 2018, Article No. 114
https://doi.org/10.1145/3197517.3201333
https://www.youtube.com/watch?v=iJdsxXOfqvw
We build a fully-differentiable simulation model that maps the true source image to the reconstructed one. The model includes diffractive light propagation, depth- and wavelength-dependent effects, noise and nonlinearities, and the image post-processing. We jointly optimize the optical parameters and the image processing algorithm parameters so as to minimize the deviation between the true and reconstructed image, over a large set of images.
In future work, we would like to explore more sophisticated differentiable reconstruction methods, such as convolutional neural networks. Advanced computational camera designs, for example tailored to higher-level vision tasks, likely require deep algorithmic frameworks. We would also like to explore otherwise inaccessible parts of the camera design spectrum, for example by minimizing the device form factor or overcoming fundamental limits of conventional cameras.
One of the applications of the proposed end-to-end computational camera design paradigm is achromatic extended depth of field. When capturing an image with a regular singlet lens (top left), out-of-focus regions are blurry and chromatic aberrations further degrade the image quality. With our framework, we optimize the profile of a refractive optical element that achieves both depth and chromatic invariance. This element is fabricated using diamond turning (right) or using photolithography. After processing an image recorded with this optical element using a simple Wiener deconvolution, we obtain an all-in-focus image with little chromatic aberration (top center). Point spread functions for both the regular lens and the optimized optical element are shown at the bottom. In this paper, we explore several applications that demonstrate the efficacy of our novel approach to domain-specific computational camera design.
Evaluation of achromatic extended depth of field imaging in simulation. We compare the performance of a Fresnel lens optimized for one of the target wavelengths (left), a cubic phase plate combined with the phase of a focusing lens (second column), a multi-focal lens optimized for all five target depths (third column), a diffractive achromat optimized for all three target wavelengths at 1 m (fourth column), a hybrid diffractive-refractive element (fifth column), optics optimized end-to-end with Wiener deconvolution with a height map parameterization (sixth column), and optics optimized end-to-end with Wiener deconvolution with a Zernike basis representation (right).
56. Multiframe-based image enhancement
Enabling fast and high quality LED photoacoustic imaging: a recurrent neural networks based approach
Emran Mohammad Abu Anas, Haichong K. Zhang, Jin Kang, and Emad Boctor
Biomedical Optics Express Vol. 9, Issue 8, pp. 3852–3866 (2018)
https://doi.org/10.1364/BOE.9.003852
The light emitting diode (LED) is a potential alternative to the laser as a light source; it has the advantages of being an inexpensive, portable and safe light source. However, the key drawback of the LED light source is its limited output power; even a series of LEDs can only generate energy in the range of μJ. As a result, the received photoacoustic (PA) signal of an LED-based system significantly suffers from low signal-to-noise ratio (SNR). To improve the SNR, the current technology is based on acquiring multiple frames of PA signals and subsequently averaging over them to minimize the noise.
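Frame averaging works because i.i.d. noise averages down as 1/sqrt(N) while the signal is unchanged, so a 16-frame average cuts the noise standard deviation roughly fourfold. A quick NumPy check on a toy constant signal:

```python
import numpy as np

rng = np.random.default_rng(0)

signal = np.ones(10_000)                        # "true" PA signal (toy)
frame = lambda: signal + rng.normal(0.0, 1.0, signal.shape)  # one noisy shot

single = frame()
avg16 = np.mean([frame() for _ in range(16)], axis=0)

noise_single = np.std(single - signal)          # ~1.0
noise_avg16 = np.std(avg16 - signal)            # ~1.0 / sqrt(16) = 0.25
```

The RNN approach in the paper aims to reach comparable SNR from far fewer frames than plain averaging needs.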
The published image enhancement techniques are based on stacked denoising auto-encoders, densely connected convolutional nets, or on including a perceptual loss to enhance the spatial structure of images. In addition to computer vision, neural network based techniques have been reported in various PA applications, e.g., image reconstruction in PA tomography and elimination of reflection artifacts in PA images.
Qualitative comparison of our method with the simple averaging and CNN-only techniques for a wire phantom example for three different values of the averaging frame numbers, where the imaging plane consists of a line object. Though the network has been trained using a point spread function, we can observe its robustness on a line target function.
A comparison of our method with the averaging and CNN-only techniques for an in vivo example. The in vivo data consists of proper digital arteries of three fingers of a volunteer. We can notice improvements in our results compared to those of the other two methods in recovering the blood vessels (marked by arrows).
57. "Digital Adaptive Optics" for scattering medium with CNNs #1
Imaging through glass diffusers using densely connected convolutional networks
Shuai Li, Mo Deng, Justin Lee, Ayan Sinha, George Barbastathis
https://arxiv.org/abs/1711.06810 (Submitted on 18 Nov 2017)
Here, we propose for the first time, to our knowledge, a convolutional neural network architecture called "IDiffNet" for the problem of imaging through diffuse media and demonstrate that IDiffNet has superior generalization capability through extensive tests with well-calibrated diffusers.
Our results show that the convolutional architecture is robust to the choice of prior, as demonstrated by the use of multiple training and testing object databases, and capable of achieving higher space-bandwidth product reconstructions than previously reported.
58. "Digital Adaptive Optics" for scattering medium with CNNs #2
Deep speckle correlation: a deep learning approach towards scalable imaging through scattering media
Yunzhe Li, Yujia Xue, Lei Tian
https://arxiv.org/abs/1806.04139 (Submitted on 11 Jun 2018)
Tremendous progress has been made by exploiting the deterministic input-output relation for a static medium. However, this approach is highly susceptible to speckle decorrelations – small perturbations to the scattering medium lead to model errors and severe degradation of the imaging performance. In addition, this is complicated by the large number of phase-sensitive measurements required for characterizing the input-output 'transmission matrix'. Our goal here is to develop a new framework that is highly scalable to both medium perturbations and measurement requirement.
We then show that the CNN is able to generalize over a completely different set of scattering media from the same class, demonstrating its superior adaptability to medium perturbations. In our proof-of-concept experiment, we first train our CNN using speckle patterns captured on diffusers having the same macroscopic parameter (e.g. grits); the trained CNN is then able to make high-quality reconstructions from speckle patterns that were captured from an entirely different set of diffusers of the same grits. Our work paves the way to a highly scalable deep learning approach for imaging through scattering media.
59. "Digital Adaptive Optics" for scattering medium with CNNs #3
Object Classification through Scattering Media with Deep Learning on Time Resolved Measurement
Guy Satat, Matthew Tancik, Otkrist Gupta, Barmak Heshmat, and Ramesh Raskar
https://arxiv.org/abs/1806.04139
https://www.youtube.com/watch?v=GZyN3fWQVu0
(Submitted on 11 Jun 2018)
The CNN is trained with a large synthetic dataset generated with a Monte Carlo (MC) model that contains random realizations of major calibration parameters.
The method is evaluated with a time-resolved camera [a Single Photon Avalanche Diode array, SPAD, PhotonForce PF32, with 32 × 32 pixels and a time resolution of 56 ps].
Multiple experimental results are provided, including pose estimation of a mannequin hidden behind a paper sheet with 23 correct classifications out of 30 tests in three poses (76.6% accuracy on real-world measurements). This approach paves the way towards real-time practical non-line-of-sight (NLOS) imaging applications.
60. Hyperspectral crosstalk reduction for classification
Hyperspectral demosaicking and crosstalk correction using deep learning
K. Dijkstra, J. van de Loosdrecht, L. R. B. Schomaker, M. A. Wiering
Machine Vision and Applications, pp 1–21 (2018)
https://doi.org/10.1007/s00138-018-0965-4
An interesting class of cameras for UAVs are single-camera-one-shot. A standard RGB camera with a Bayer filter [12] is an example of this type of system. Recently these types of imaging systems have been further extended to 3×3, 4×4 and 5×5 mosaics [13] in both visible and near-infrared spectral ranges.
1. How much does hyperspectral demosaicking benefit from spectral and spatial correlations?
2. What are good practices for designing hyperspectral demosaicking neural networks?
3. How well can hyperspectral demosaicking sub-tasks be integrated for end-to-end training?
Not active per se in this case as they optimize color filters, but we could do this in real time with tunable filters (AOTF) or with a monochromator – for example with "pathology priors", filters optimized for glaucoma, AMD and DR screening.
62. Motion Correction in Magnetic Resonance Imaging (MRI)
Automatic detection of motion artifacts on MRI using Deep CNN
Irene Fantini, Leticia Rittner, Clarissa Yasuda, Roberto Lotufo
2018 International Workshop on Pattern Recognition in Neuroimaging (PRNI)
https://doi.org/10.1109/PRNI.2018.8423948
The motion detection method. Four patches are extracted from 13 slices of each MRI acquisition's view. The ANN gives the motion artifact probability combining the results of the four CNN models from each patch and the patch coordinates.
MoCoNet: Motion Correction in 3D MPRAGE images using a Convolutional Neural Network approach
Kamlesh Pawar, Zhaolin Chen, N. Jon Shah, Gary F. Egan
(Submitted on 29 Jul 2018) https://arxiv.org/abs/1807.10831
Convolution motion model: showing that the motion-corrupted image can be modelled as a linear combination of convolutions of transformed fully sampled images (d, g) with convolution kernels (e, h).
63. Motion Correction in Magnetic Resonance Imaging (MRI)
Automated reference-free detection of motion artifacts in magnetic resonance images
Thomas Küstner, Annika Liebgott, Lukas Mauch, Petros Martirosian, Fabian Bamberg, Konstantin Nikolaou, Bin Yang, Fritz Schick, Sergios Gatidis
Magnetic Resonance Materials in Physics, Biology and Medicine, April 2018, Volume 31, Issue 2, pp 243–256
https://doi.org/10.1007/s10334-017-0650-z
Method for motion artifact reduction using a convolutional neural network for dynamic contrast enhanced MRI of the liver
Daiki Tamada, Marie-Luise Kromrey, Hiroshi Onishi, Utaroh Motosugi
(Submitted on 18 Jul 2018) https://arxiv.org/abs/1807.06956
Probability map for artifact occurrence on a per-patch basis, overlaid on MR images of an exemplary volunteer in the head (upper part) and the abdomen (lower part) in different slice locations using a patch size of 40 × 40 × 1 voxels. Respective top rows: images acquired without motion (lying still / breath-hold). Respective bottom rows: images acquired with motion (head motion in left-right direction or breathing). The lack of motion is correctly detected in the back of the head (pivot point of tilting) and around the spine even in the motion-corrupted images (green/white arrows). False-positive motion detection occurred in anatomic areas with small linear structures, e.g., the posterior neck muscles (red/white arrow).
Possible applications of the proposed approach are to provide direct feedback during an MRI examination. This enables increased patient comfort and timely optimization by possible repetition of a measurement with parameter adjustments, in both cases where prospective/retrospective correction techniques are available or not. Furthermore, in retrospective studies this allows an automated identification of motion-corrupted images.
Examples of artifact reduction with MARC in a patient from the validation dataset. The motion artifacts in the images (upper row) were reduced (lower row) by using MARC.
64. Motion Correction in Computed Tomography (CT)
Motion Estimation in Coronary CT Angiography Images using Convolutional Neural Networks
Tanja Elss, Hannes Nickisch, Tobias Wissel, Rolf Bippus, Michael Morlock, Michael Grass
(modified: 09 Jun 2018) MIDL 2018 Conference Submission
https://openreview.net/forum?id=HkBtaBjoz
Please Don't Move – Evaluating Motion Artifact From Peripheral Quantitative Computed Tomography Scans Using Textural Features
Timo Rantalainen, Paola Chivers, Belinda R. Beck, Sam Robertson, Nicolas H. Hart, Sophia Nimphius, Benjamin K. Weeks, Fleur McIntyre, Beth Hands, Aris Siafarikas
Journal of Clinical Densitometry, Volume 21, Issue 2, April–June 2018, Pages 260–268
https://doi.org/10.1016/j.jocd.2017.07.002
Visualization of rotation-invariant local binary patterns (LBPriu) used to capture textural information from a tibial shaft slice with a clearly visible motion artifact. Left: the original image before any processing; right: LBPriu of the image.
An additional limitation was the use of only one human classifier for the ground truth, although this was considered sufficient to explore whether textural analysis could provide a feasible classification approach for motion artifacts.
The predicted motion vectors of nine exemplary cross-sectional patches are visualized as red lines. The corresponding ground truth motion vectors (dashed line) are highlighted in green.
65. Motion Correction in Ultrasound Imaging
3D Freehand Ultrasound Without External Tracking Using Deep Learning
Raphael Prevost, Mehrdad Salehi, Simon Jagoda, Navneet Kumar, Julian Sprung, Alexander Ladikos, Robert Bauer, Oliver Zettinig, Wolfgang Wein
Medical Image Analysis 2018
https://doi.org/10.1016/j.media.2018.06.003
A deep learning network (instead of a speckle decorrelation algorithm) to estimate the motion of the probe between two successive frames, with and without inertial measurement unit (IMU) ground truth.
Stryker NAV3TM Camera: HIGH-QUALITY ground truth
Xsens Mti-3-8A7G6 IMU: "LOW-QUALITY" ground truth
"Collect high-quality ground truth along with a cheaper sensor and 'self-supervise' the correction algorithm with both of them. Then you can develop your hardware with lower cost and algorithmically improve the quality."
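The quoted idea – pair a cheap sensor with a high-quality reference and learn the mapping between them – reduces, in its simplest form, to a regression from one signal to the other. A least-squares sketch where a linear correction stands in for the deep network (the numbers are synthetic, for illustration only):

```python
import numpy as np

def fit_correction(cheap, precise):
    """Fit a linear correction (scale, bias) from the cheap sensor's
    readings to the high-quality reference by ordinary least squares;
    at deployment only the cheap sensor plus the correction is needed."""
    A = np.stack([cheap, np.ones_like(cheap)], axis=1)   # design matrix
    (scale, bias), *_ = np.linalg.lstsq(A, precise, rcond=None)
    return scale, bias

cheap = np.array([0.0, 1.0, 2.0, 3.0])      # e.g. IMU-derived displacement
precise = 1.5 * cheap + 0.2                 # e.g. optical tracking reference
scale, bias = fit_correction(cheap, precise)
corrected = scale * cheap + bias            # matches the reference exactly here
```

With noisy real data and nonlinear sensor errors the linear map becomes a network, but the self-supervision signal is the same paired recording.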
66. Motion Correction in in vitro Microscopy
Deep learning-based detection of motion artifacts in probe-based confocal laser endomicroscopy images
Marc Aubreville, Maike Stoeve, Nicolai Oetter, Miguel Goncalves, Christian Knipfer, Helmut Neumann, Christopher Bohr, Florian Stelzle, Andreas Maier
International Journal of Computer Assisted Radiology and Surgery (2018)
https://doi.org/10.1007/s11548-018-1836-1
Each of the images was manually assessed for motion artifacts by two experts with a background in biomedical engineering, while the second expert was able to see the annotations of the first expert (non-blinded). The annotation results were validated by two medical experts with profound experience in CLE diagnosis. All annotations (bounding boxes of artifacts) were stored in a relational database and used for both training and evaluation. The annotation found motion artifacts to be present in a total of 1749 images.
… we could perform a factor analysis to investigate which hand-crafted features load onto which data-driven features, and even devise new network technologies that fuse traditional hand-crafted features with deep learning techniques [Maier et al. 2017].
67. Motion Correction for intravital microscopy #1
Automated correction of fast motion artifacts for two-photon imaging of awake animals
David S. Greenberg, Jason N. D. Kerr
Journal of Neuroscience Methods, 15 January 2009
https://doi.org/10.1016/j.jneumeth.2008.08.020
Two-Photon Imaging within the Murine Thorax without Respiratory and Cardiac Motion Artifact
Robert G. Presson Jr, Irina Petrache, et al.
The American Journal of Pathology (July 2011)
https://doi.org/10.1016/j.ajpath.2011.03.048
(d) Corrected images with fluorescence values assigned to the positions estimated by the alignment algorithm.
Gated imaging eliminates motion for maximum clarity in three-dimensional TPM reconstructions. A: Comparison between ungated (left) and gated (right) image acquisition of an identical field of view showing FITC-labeled (green) alveolar microvasculature in the intact rat. Reconstructions in the x-z orientation correspond to the indicated slice regions (B–E) from the 3-dimensional images. Nuclei are stained with intravenous Hoechst (blue). Scale bars 25 µm.
69. Motion Correction in Ophthalmology
Prevalences of segmentation errors and motion artifacts in OCT angiography differ among retinal diseases
J. L. Lauermann, A. K. Woetzel, M. Treder, M. Alnawaiseh, C. R. Clemens, N. Eter, Florian Alten
Graefe's Archive for Clinical and Experimental Ophthalmology (07 July 2018)
https://doi.org/10.1007/s00417-018-4053-2
Spectral domain OCT-A device (Optovue AngioVue) showing different degrees of motion artifacts. a Motion artifact score (MAS). b, c Manifestation of different motion artifacts, including strong quilting, partly with incipient expression of black lining (white asterisk), stretching (white arrows), and displacements (white circles) in different parts of the image.
In the future, deep learning software applications might not only be able to distinguish between different retinal diseases but also to detect specific artifacts and to warn the user in case of insufficient image quality. Today, multimodal imaging leads to an overwhelmingly large amount of image information that has to be reviewed by ophthalmologists in daily clinical routine. Thus, software assistance in image grading appears mandatory to manage the growing amount of image data and to avoid useless image data of insufficient quality. In the future, segmentation will move forward through redefinitions of segmentation boundaries and refinements of algorithm strategies in pathologic maculae [de Sisternes et al. 2017]. In conclusion, OCT-A image quality including motion artifacts and segmentation errors must be assessed prior to a detailed qualitative or quantitative analysis to warrant meaningful and reliable results.
71. The same thing might be expressed using different words
Medical Concept Normalization for Online User-Generated Texts. Kathy Lee, Sadid A. Hasan, Oladimeji Farri, Alok Choudhary, Ankit Agrawal
https://doi.org/10.1109/ICHI.2017.59
SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media. Jing Liu, Songzheng Zhao, Gang Wang
Artificial Intelligence in Medicine, Volume 84, January 2018, Pages 34-49
Semi-supervised learning can address this issue by leveraging abundant unlabeled data in social media together with labeled data, to build better classifiers without human intervention (Zhou and Li, 2010).
To verify the effectiveness of SSEL-ADE for Adverse Drug Event (ADE) relation extraction from social media, we built a dataset that was sourced from a well-known online health community, i.e., MedHelp, which empowers more than 12 million patients each month to seek medical answers [Yang et al. 2014].
73. Interactive ECG patient-specific labelling
A Global and Updatable ECG Beat Classification System Based on Recurrent Neural Networks and Active Learning
Guijin Wang, Chenshuang Zhang, Yongpan Liu, Huazhong Yang, Dapeng Fu, Haiqing Wang, Ping Zhang
Information Sciences (Available online 2 July 2018)
https://doi.org/10.1016/j.ins.2018.06.062
Some of these models require interactive labeling for each patient [Kiranyaz et al. 2016; Zhang et al. 2017]. In Kiranyaz et al. 2016, researchers select common training data randomly from a large data pool. The first several minutes of beats from each record are selected and used as the patient-specific training. An automatic labeling process of patient-specific training samples is developed in Ye et al. 2015. Patient-specific samples are added into the training set with the labels generated by an existing model.
Removing different kinds of noises. (a) Baseline wander. (b) Motion artifact. (c) Muscle electricity. The red lines represent signals corrupted by noises, and the blue lines show signals after noise elimination.
Uncertainty in Noise-Driven Steady-State Neuromorphic Network for ECG Data Classification
Amir Zjajo, Johan Mes, Eralp Kolagasioglu, Sumeet Kumar, Rene van Leuken
2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS)
https://doi.org/10.1109/CBMS.2018.00082
The pathophysiological processes underlying the ECG tracing demonstrate significant heart rate and morphological pattern variations, for different or in the same patient at diverse physical/temporal conditions. Within this framework, spiking neural networks (SNN) may be a compelling approach to ECG pattern classification based on the individual characteristics of each patient. In this paper, we study electrophysiological dynamics in the self-organizing map (SOM) SNN [Mes et al. 2017] when the coefficients of the neuronal connectivity matrix are random variables. We examine synchronicity and noise-induced information processing, influence of the uncertainty on the system signal-to-noise ratio, and impact on the clustering accuracy of cardiac arrhythmia.
76. 3D Bounding Box Annotation: the simplest annotation method
Leveraging Pre-Trained 3D Object Detection Models For Fast Ground Truth Generation
Jungwook Lee, Sean Walsh, Ali Harakeh, Steven L. Waslander
https://arxiv.org/abs/1807.06072 (Submitted on 16 Jul 2018)
Reducing both task complexity and the amount of task switching done by annotators is key to reducing the effort and time required to generate 3D bounding box annotations. This paper introduces a novel ground truth generation method that combines human supervision with pretrained neural networks to generate per-instance 3D point cloud segmentation, 3D bounding boxes, and class annotations. The annotators provide object anchor clicks which behave as a seed to generate instance segmentation results in 3D.
Through the use of a center-regression T-Net, the centroid of each object is estimated and finally, a bounding box is fit to the object in a third stage. Since the only interaction required by annotators is to provide the initial object instance clicks, the time taken to generate a bounding box for each object can be reduced to the annotation time, which is up to 30x faster than existing known methods of producing ground truth for 3D object detection.
79. Refining Box Labels with Graph Cut
Pseudo Mask Augmented Object Detection
Zhao et al.
15 Mar 2018 https://arxiv.org/abs/1803.05858
An overview of our pseudo-mask augmented object detection, consisting of the network architecture and graph cut based pseudo mask refinement. For each image, the detection sub-network and instance-level object segmentation sub-network share convolutional layers (i.e. conv1-conv5 for VGG, conv1-conv4 for ResNet). For the segmentation sub-network, position-sensitive score maps are generated by a 1×1 convolutional layer, and these are then passed through position-sensitive pooling to obtain object masks.
Starting from the joint object detection and instance segmentation network, the proposed PAD recursively estimates the pseudo ground-truth object masks from the instance-level object segmentation network training, and then enhances the detection network with a top-down segmentation feedback.
80. Refining Masks with Bilateral Filtering
Synthetic Depth-of-Field with a Single-Camera Mobile Phone
Neal Wadhwa, Rahul Garg, David E. Jacobs, Bryan E. Feldman, Nori Kanazawa, Robert Carroll, Yair Movshovitz-Attias, Jonathan T. Barron, Yael Pritch, Marc Levoy
11 June 2018 https://arxiv.org/abs/1806.04171
The inputs to and steps of our disparity algorithm. Our input data is a color image (a) and two single-channel Dual-pixel (DP) views that sum to the green channel of the input image. For the purposes of visualization, we normalize the DP data by making the two views have the same local mean and standard deviation. We show pixel intensity vs. vertical position for the two views at two locations marked by the green and red lines in the crops (b). We compute noisy matches and a heuristic confidence (c). Errors due to the lens aberration (highlighted with the green arrows in (c)) are corrected with calibration (d). The segmentation mask is used to assign the disparity of the subject's eyes and mouth to the textureless regions on the subject's shirt (e). We use bilateral space techniques to convert noisy disparities and confidences to an edge-aware dense disparity map (f).
81. Refining 3D Masks with Supervoxel graphical models
Supervoxel based method for multi-atlas segmentation of brain MR images
Jie Huo, Jonathan Wu, Jiuwen Cao, Guanghui Wang
NeuroImage, Volume 175, 15 July 2018, Pages 201-214
https://doi.org/10.1016/j.neuroimage.2018.04.001
Three consecutive slices are shown for the supervoxel graph (a), where the blue edges E1 indicate the pairwise potential in the coronal plane while the orange edges E2 are the pairwise potential of two adjacent slices. The dense graph (b) takes one slice as an example, where the bottom layer and top layer illustrate the grid graph and supervoxel layer, respectively. The blue edges indicate the pairwise potential in the grid graph while the orange edges show the high order potential. The nodes are indicated with red dots in both graphs.
83. Interactive Segmentation: Extreme Clicking
Extreme clicking for efficient object annotation. Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari
https://arxiv.org/abs/1708.02750 (Submitted on 9 Aug 2017)
Visualization of input cues and output of GrabCut. First row shows input with annotator's extreme clicks. Second row shows output of edge detector [Dollar and Zitnick 2013]. Third row shows our inputs for GrabCut: the pixels used to create the background appearance model (red), the pixels used to create the object appearance model (bright green), the initial boundary estimate (magenta), and the skeleton pixels which we clamp to have the object label (dark green). Fourth row shows the output of GrabCut when using our new inputs, while the last row shows the output when using only a bounding box.
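A minimal sketch of the extreme-clicking idea above, not the authors' code: four clicks on the leftmost, rightmost, topmost and bottommost object points directly yield a tight bounding box, and everything outside that box can be seeded as certain background for a GrabCut-style refinement. Function names and the string labels are illustrative assumptions.

```python
def box_from_extreme_clicks(left, right, top, bottom):
    """Each click is an (x, y) pixel coordinate on the object outline;
    the tight box follows directly from the four extremes."""
    x_min, x_max = left[0], right[0]
    y_min, y_max = top[1], bottom[1]
    return (x_min, y_min, x_max, y_max)

def seed_masks(box, shape):
    """Label pixels outside the box as certain background and pixels
    inside as unknown, to be resolved by e.g. GrabCut."""
    h, w = shape
    x_min, y_min, x_max, y_max = box
    labels = [['bg'] * w for _ in range(h)]
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            labels[y][x] = 'unknown'
    return labels

clicks = dict(left=(2, 5), right=(9, 4), top=(6, 1), bottom=(5, 8))
box = box_from_extreme_clicks(**clicks)
print(box)  # (2, 1, 9, 8)
```

The extra signal over a plain box, as the paper notes, is that the four clicked pixels themselves are known object points and can be clamped to the foreground model.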
Qualification test. (Left) Qualification test examples of the dog and cat class. (Middle) The figure-ground segmentation masks we use to evaluate annotators' extreme clicks during the training stage. The pixels of the four extreme areas of the mask are marked with colors. (Right) The accepted areas for each extreme click and the click positions as we display them to the annotators as feedback.
85. Interactive Segmentation: Click sampling
Iteratively Trained Interactive Segmentation
Sabarinath Mahadevan et al.
11 May 2018 https://arxiv.org/abs/1805.04398
For the task of object segmentation, manually labeling data is very expensive, and hence interactive methods are needed. Following recent approaches, we develop an interactive object segmentation system which uses user input in the form of clicks as the input to a convolutional network. While previous methods use heuristic click sampling strategies to emulate user clicks during training, we propose a new iterative training strategy. During training, we iteratively add clicks based on the errors of the currently predicted segmentation.
Overview of our method. The input to our network consists of an RGB image concatenated with two click channels representing negative and positive clicks, and also an optional mask channel encoded as a distance transform.
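The click channels described above can be sketched with plain Python: each channel stores, per pixel, the distance to the nearest positive (or negative) click, clipped to a cap. This is an illustrative stand-in for the distance-transform encoding, not the paper's implementation; the grid size, cap value, and brute-force loop are assumptions (a real pipeline would use a fast distance transform).

```python
import math

def click_channel(shape, clicks, cap=20.0):
    """shape: (h, w); clicks: list of (x, y) pixel coordinates.
    Returns an h x w grid of distances to the nearest click, clipped at cap."""
    h, w = shape
    chan = [[cap] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for cx, cy in clicks:
                d = math.hypot(x - cx, y - cy)
                if d < chan[y][x]:
                    chan[y][x] = d
    return chan

pos = click_channel((4, 4), [(1, 1)])
print(pos[1][1])  # 0.0 at the click itself
print(pos[1][3])  # 2.0 two pixels to the right
```

The positive and negative channels are built the same way from their respective click lists and concatenated with the RGB image along the channel axis.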
87. Interactive Segmentation: Verbal interaction
GuideMe: Interacting with Deep Networks
Christian Rupprecht et al.
30 Mar 2018 https://arxiv.org/abs/1803.11544
In this paper, we explore methods to flexibly guide a trained convolutional neural network through user input to improve its performance during inference. We do so by inserting a layer that acts as a spatio-semantic guide into the network. This guide is trained to modify the network's activations, either directly via an energy minimization scheme or indirectly through a recurrent model that translates human language queries to interaction weights. Learning the verbal interaction is fully automatic and does not require manual text annotations.
90. Refining Medical Box Labels with Graph Search
BoxNet: Deep Learning Based Biomedical Image Segmentation Using Boxes Only Annotation. Lin Yang et al. (2 June 2018)
https://arxiv.org/abs/1806.00593
"In this paper, we presented a new weakly supervised DL approach for biomedical image segmentation using boxes only annotation that can achieve nearly the same performance compared to fully supervised DL methods. Our new method provides a more efficient way to annotate training data for biomedical image segmentation applications, and can potentially save considerable manual efforts."
92. Interactive Medical Segmentation: BIFSeg
Interactive Medical Image Segmentation Using Deep Learning With Image-Specific Fine Tuning
Guotai Wang et al. (July 2018) http://doi.org/10.1109/TMI.2018.2791721
The proposed Bounding box and Image-specific Fine-tuning-based Segmentation (BIFSeg). 2D images are shown as examples. During training, each instance is cropped with its bounding box, and the CNN is trained for binary segmentation. In the testing stage, image-specific fine-tuning with optional scribbles and a weighted loss function is used. Note that the object class (e.g. maternal kidneys) for testing may have not been present in the training set.
93. Interactive Medical Segmentation
Learning to Segment Medical Images with Scribble-Supervision Alone. Yigit B. Can et al.
12 July 2018 https://arxiv.org/abs/1807.04668
In order to prevent segmentation errors from early recursions from propagating, we investigate the following strategy to reset labels predicted with insufficient certainty after each E step. We add dropout with probability 0.5 to the 5 innermost blocks of our U-Net architecture during training. In order to estimate the new optimal labeling z∗ we perform 50 forward passes with dropout similar to (Kendall et al. 2015).
Rather than a single output this yields a distribution of logits and softmax outputs for each pixel and label. We then compare the logits distributions of the label with the highest and second highest softmax mean for each pixel using a Welch's t-test.
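The per-pixel test above can be sketched with the stdlib alone: compute Welch's t statistic (two samples, unequal variances) on the logit samples of the two top labels; a small |t| means the two labels are not clearly separated and the pixel's label would be reset. This is an illustrative sketch of the statistic, not the authors' code; the sample values and the decision threshold are made up.

```python
import math

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Logits for one pixel from (hypothetical) dropout forward passes,
# for the labels with highest and second-highest softmax mean:
top1 = [2.1, 2.3, 1.9, 2.2, 2.0]
top2 = [0.4, 0.6, 0.5, 0.3, 0.7]
t = welch_t(top1, top2)
print(t > 4.0)  # True: clearly separated, keep the predicted label
```

In practice the 50 forward passes give 50 samples per label, and `scipy.stats.ttest_ind(..., equal_var=False)` computes the same statistic with a p-value.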
Generation of Seed Areas by Region Growing. For this step we use the random walk-based segmentation method proposed by Grady (2006), which (similar to neural networks) produces a pixel-wise probability map for each label. We assign each pixel its predicted value only if the probability exceeds a threshold τ. Otherwise the pixel label is treated as unknown.
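The thresholding rule is simple enough to state as a one-liner; this sketch uses flat per-pixel lists, -1 as the "unknown" marker, and an illustrative τ of 0.8, all assumptions on my part:

```python
def threshold_labels(prob_map, label_map, tau=0.8):
    """Keep a pixel's predicted label only when its probability
    exceeds tau; otherwise mark it unknown (-1)."""
    return [lab if p > tau else -1
            for p, lab in zip(prob_map, label_map)]

print(threshold_labels([0.95, 0.5, 0.81], [1, 2, 0]))  # [1, -1, 0]
```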
We observe that a) the recursive training regime led to substantial improvements over non-recursive training, and b) the dropout-based uncertainty was responsible for the largest improvements.
94. Interactive Medical Segmentation
A user-guided tool for semi-automated cerebral microbleed detection and volume segmentation: Evaluating vascular injury and data labelling for machine learning
Melanie A. Morrison, Sedyedmehdi Payabvash, Yicheng Chen, Sivakami Avadiappan, Mihir Shah, Xiaowei Zou, Christopher P. Hess, Janine M. Lupo
NeuroImage: Clinical (4 August 2018)
https://doi.org/10.1016/j.nicl.2018.08.002
Screen capture of the user interface. Example of the report automatically generated in the final step of the algorithm.
"Future studies will incorporate a serial tracking feature into our algorithm, that will first align the datasets and then perform automated CMB detection on the most recent serial scan where we expect to see the highest CMB burden, as well as utilize the labelled FP mimics in a CNN to eliminate the need for user-guided false positive removal and create a fully automated tool."
95. Interactive Medical Segmentation: Weakly-supervised 3D Masks
Accurate Weakly Supervised Deep Lesion Segmentation on CT Scans: Self-Paced 3D Mask Generation from RECIST
Jinzheng Cai, Youbao Tang, Le Lu, Adam P. Harrison, Ke Yan, Jing Xiao, Lin Yang, Ronald M. Summers et al.
25 Jan 2018 → 2 July 2018 https://arxiv.org/abs/1801.08614 | https://arxiv.org/abs/1806.09507 | https://arxiv.org/abs/1807.01172
Because manual 3D segmentation is prohibitively time consuming and requires radiological experience, current practices rely on an imprecise surrogate called response evaluation criteria in solid tumors (RECIST). Despite their coarseness, RECIST marks are commonly found in current hospital picture archiving and communication systems (PACS), meaning they can provide a potentially powerful, yet extraordinarily challenging, source of weak supervision for full 3D segmentation. Toward this end, we introduce a convolutional neural network based weakly supervised self-paced segmentation (WSSS) method to
1) generate the initial lesion segmentation on the axial RECIST-slice;
2) learn the data distribution on RECIST-slices;
3) adapt to segment the whole volume slice by slice to finally obtain a volumetric segmentation.
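The slice-by-slice adaptation in step 3 can be sketched as an outward propagation from the RECIST-slice: each neighboring slice is segmented using the previous slice's mask as initialization, first upward through the volume and then downward. This is a toy illustration of the control flow only; `segment_slice` stands in for the trained 2D model and the shrink rule below is purely made up.

```python
def propagate_volume(n_slices, recist_index, segment_slice, init_mask):
    """Propagate a 2D segmentation outward from the RECIST-slice.
    segment_slice(prev_mask) -> mask for the adjacent slice."""
    masks = {recist_index: init_mask}
    for i in range(recist_index + 1, n_slices):      # propagate upward
        masks[i] = segment_slice(masks[i - 1])
    for i in range(recist_index - 1, -1, -1):        # propagate downward
        masks[i] = segment_slice(masks[i + 1])
    return [masks[i] for i in range(n_slices)]

# Stand-in "model": lesion area shrinks by one unit per slice (illustrative).
shrink = lambda m: max(0, m - 1)
print(propagate_volume(5, 2, shrink, 10))  # [8, 9, 10, 9, 8]
```

The self-paced part of WSSS (re-selecting confident slices for retraining) sits on top of this loop and is not shown.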
In addition, we explore how super-resolution images (2~5 times beyond the physical CT imaging), generated from a proposed stacked generative adversarial network, can aid the WSSS performance.
The DeepLesion dataset [Yan et al. 2017] consists of 8 categories of lesions. From left to right and top to bottom, the lesion categories are lung, mediastinum, liver, soft-tissue, abdomen, kidney, pelvis and bone, respectively. For all images, the RECIST-slices are shown with manually delineated boundary in red and bookmarked RECIST diameters in white.
96. Image restoration with segmentation and automatic labelling?
CT Image Enhancement Using Stacked Generative Adversarial Networks and Transfer Learning for Lesion Segmentation Improvement
Youbao Tang, Jinzheng Cai, Le Lu, Adam P. Harrison, Ke Yan, Jing Xiao, Lin Yang, Ronald M. Summers
(Submitted on 18 Jul 2018) https://arxiv.org/abs/1807.07144
Automated lesion segmentation from computed tomography (CT) is an important and challenging task in medical image analysis. While many advancements have been made, there is room for continued improvements.
One hurdle is that CT images can exhibit high noise and low contrast, particularly at lower dosages. To address this, we focus on a preprocessing method for CT images that uses a stacked generative adversarial networks (SGAN) approach. The first GAN reduces the noise in the CT image and the second GAN generates a higher resolution image with enhanced boundaries and high contrast.
To make up for the absence of high quality CT images, we detail how to synthesize a large number of low- and high-quality natural images and use transfer learning with progressively larger amounts of CT images.
Comparison panels: INPUT | BM3D | DnCNN | Single GAN | Our denoising GAN | Our SGAN.
Three examples of CT image enhancement results using different methods on original images.
97. Interactive Medical Segmentation: InterCNN
Iterative Interaction Training for Segmentation Editing Networks. Gustav Bredell et al.
23 July 2018 https://arxiv.org/abs/1807.08555
"Often users want to edit the segmentation to their own needs and will need different tools for this. There have been methods developed to edit segmentations of automatic methods based on the user input, primarily for binary segmentations. Here however, we present a unique training strategy for convolutional neural networks (CNNs) trained on top of an automatic method to enable interactive segmentation editing that is not limited to binary segmentation. By utilizing a robot-user during training, we closely mimic realistic use cases to achieve optimal editing performance. In addition, we show that an increase of the iterative interactions during the training process up to ten improves the segmentation editing performance substantially."
Illustration of interactive segmentation editing networks. (a) Generation of initial prediction with base segmentation and first user input. (b) Interactive improvement loop with the proposed interCNN. Here, we use a CNN for the base segmentation algorithm for demonstration, but other methods can be used. interCNN can be applied iteratively until the segmentation is satisfactory. During training, to make it feasible, the user is replaced by a robot user that places scribbles based on the discrepancy between ground truth and predicted segmentations for the training images.
We used a U-Net architecture for both the autoCNN and interCNN. It has been shown that this architecture produces automatic segmentation results on medical images that are comparable to more complex architectures.
Interactive segmentation editing networks, which we refer to as interCNN, are trained on top of a base segmentation algorithm, specifically to interpret user inputs and make appropriate adjustments to the predictions of the base algorithm. During test time, an interCNN sees the image, initial predictions of the base algorithm and user edits in the form of scribbles, and combines all to create a new segmentation.
Ideally, human users should provide the scribbles during the training; however, this is clearly infeasible, and a robot user is often utilized to provide the scribbles and has been shown to perform well.
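The robot user described above reduces to a simple rule: sample scribble points from the region where prediction and ground truth disagree, and label them with the correct class. A stdlib sketch of that rule (the coordinate convention, sample count, and seeding are assumptions for reproducibility):

```python
import random

def robot_scribbles(pred, gt, n=2, seed=0):
    """pred, gt: 2D lists of integer labels. Returns up to n
    (x, y, correct_label) scribble points sampled from the
    discrepancy region between prediction and ground truth."""
    wrong = [(x, y, gt[y][x])
             for y in range(len(gt))
             for x in range(len(gt[0]))
             if pred[y][x] != gt[y][x]]
    random.Random(seed).shuffle(wrong)
    return wrong[:n]

pred = [[0, 0],
        [1, 1]]
gt   = [[0, 1],
        [1, 1]]
print(robot_scribbles(pred, gt))  # [(1, 0, 1)]
```

During training the returned points would be rasterized into the scribble channels fed to the interCNN, closing the loop without a human in it.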
98. Beyond one's own domain: pool data together
Combining Heterogeneously Labeled Datasets For Training Segmentation Networks
Jana Kemnitz, Christian F. Baumgartner, Wolfgang Wirth, Felix Eckstein, Sebastian K. Eder, Ender Konukoglu
https://arxiv.org/abs/1807.08935
The performance of CNNs strongly depends on the size of the training data, and combining data from different sources is an effective strategy for obtaining larger training datasets. However, this is often challenged by heterogeneous labeling of the datasets. For instance, one of the datasets may be missing labels, or a number of labels may have been combined into a super label. In this work we propose a cost function which allows integration of multiple datasets with heterogeneous label subsets into a joint training. We evaluated the performance of this strategy on thigh MR and cardiac MR datasets in which we artificially merged labels for half of the data.
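The core idea of such a cost function can be sketched in a tensor-free toy form: per-pixel cross-entropy contributes to the loss only when the pixel's true class is actually annotated in the dataset the sample came from. This is my illustrative reading of the masking idea, not the paper's exact formulation (which also handles merged super-labels).

```python
import math

def masked_cross_entropy(probs, target, present_classes):
    """probs: predicted class probabilities for one pixel;
    target: true class index; present_classes: set of class ids
    this dataset annotates. No supervision -> zero loss."""
    if target not in present_classes:
        return 0.0
    return -math.log(probs[target])

# Dataset A annotates classes {0, 1}; class 2 is unlabeled there:
print(round(masked_cross_entropy([0.1, 0.7, 0.2], 1, {0, 1}), 3))  # 0.357
print(masked_cross_entropy([0.1, 0.7, 0.2], 2, {0, 1}))            # 0.0
```

Summed over pixels and batches, this lets samples from datasets with different label subsets share one network and one optimizer.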
100. Fundus Image Labeling for pathologies
Labeling fundus images for classification models
https://www.slideshare.net/PetteriTeikariPhD/labeling-fundus-images-for-classifcation-models
Technical implementation is quite straightforward (e.g. with FastAnnotationSingleObject you could label roughly 750 images in an hour).
The downside is that you need a clinician or a professional grader to interpret the images for you (their time is expensive).
And you would preferably have a consensus of experts labeling, further increasing the cost of labelling.
101. ROI Definition: Subjective, depending on your whole pipeline
You might want to label exudates, lesions, hemorrhages and microaneurysms for clinical interest, but you might just want to annotate the optic disc, macula, pupil or similar ROI for "automatic cropping" with a detection network such as R-CNN, making the subsequent deep learning easier with only the relevant image data as the input to the network.
http://arxiv.org/abs/1706.09634
First column: Input image fed to the Fully Convolutional Network (FCN). Second column: Output of the FCN (p in the text). This can be considered a saliency map of object locations. Third column: Result of thresholding the output of the FCN. This is a binary image. Fourth column: The estimated object locations are marked with a red dot. https://arxiv.org/abs/1806.07564
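The "automatic cropping" step above amounts to cutting the detector's ROI box (plus a safety margin) out of the full fundus image before the downstream network sees it. A stdlib sketch of that crop; the box coordinates and margin value are illustrative, and a real pipeline would operate on image arrays rather than nested lists:

```python
def crop_roi(image, box, margin=2):
    """image: 2D list (grayscale toy); box: (x_min, y_min, x_max, y_max),
    exclusive upper bounds after margin expansion, clipped to the image."""
    h, w = len(image), len(image[0])
    x0 = max(0, box[0] - margin)
    y0 = max(0, box[1] - margin)
    x1 = min(w, box[2] + margin)
    y1 = min(h, box[3] + margin)
    return [row[x0:x1] for row in image[y0:y1]]

img = [[v for v in range(6)] for _ in range(6)]
crop = crop_roi(img, (2, 2, 4, 4), margin=1)
print(len(crop), len(crop[0]))  # 4 4
```

Training the downstream classifier or segmenter on such crops keeps the input resolution on the anatomy of interest instead of background.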
103. Fundus/OCT Image: Image Quality Annotation
Zeiss Stratus OCT Image Quality Assessment
H. Ishikawa, G. Wollstein, M. Aoyama, D. Stein, S. Beaton, J. G. Fujimoto, J. S. Schuman (Submitted on 24 Jul 2004)
https://iovs.arvojournals.org/article.aspx?articleid=2408859
Methods for Quantification of Image Quality in OCT Angiography Images
Carmen J. Yoo, Michael Chen, Mary K. Durbin, Zhongdi Chu, Ruikang K. Wang, Chieh-Li Chen, Jesse J. Jung, Scott Lee
ARVO 2016 Annual Meeting Abstracts
Cirrus AngioPlex OCTA (Carl Zeiss Meditec, Dublin, CA) 3x3 images were acquired from 20 eyes of 20 subjects. Four image quality metrics were calculated: connectivity of angiogram, angiogram contrast, angiogram signal to noise ratio (aSNR) and the number of connected components (NCC). These were correlated to the mean subjective grade. Image quality can be evaluated by trained graders with good repeatability. Overall, there is a poor correlation between qualitative and quantitative assessment, but an objective parameter that correlates well to subjective assessment is the NCC. This may be due to the emphasis of the qualitative criterion on pathology assessment and diagnosis.
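Of the four metrics above, the NCC is the easiest to make concrete: binarize the angiogram and count connected components with a flood fill, the intuition being that a noisy or low-quality scan fragments the vessel network into many small pieces. A stdlib sketch (4-connectivity and the toy mask are assumptions; production code would use e.g. `scipy.ndimage.label`):

```python
def count_components(mask):
    """mask: 2D list of 0/1 pixels; counts 4-connected components of 1s."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    n = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                n += 1                      # new component found
                stack = [(sx, sy)]
                seen[sy][sx] = True
                while stack:                # iterative flood fill
                    x, y = stack.pop()
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < w and 0 <= ny < h \
                           and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((nx, ny))
    return n

vessels = [[1, 1, 0, 0],
           [0, 0, 0, 1],
           [0, 1, 0, 1]]
print(count_components(vessels))  # 3
```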
Impact of eye-tracking technology on OCT-angiography imaging quality in age-related macular degeneration
J. L. Lauermann, M. Treder, P. Heiduschka, C. R. Clemens, N. Eter, F. Alten (2017) https://doi.org/10.1007/s00417-017-3684-z
In patients with AMD, active eye tracking technology offers an improved image quality in OCT-A imaging regarding the presence of motion artifacts, at the expense of higher acquisition time.
Deep Learning for Image Quality Assessment of Fundus Images in Retinopathy of Prematurity
Aaron S. Coyner et al. (ARVO 2018)
https://www.aaroncoyner.io/conferences/arvo2018.pdf
Deep Learning for Automated Quality Assessment of Color Fundus Images in Diabetic Retinopathy Screening
Sajib Kumar Saha, Basura Fernando, Jorge Cuadros, Di Xiao, Yogesan Kanagasingam (Submitted on 7 Mar 2017)
https://arxiv.org/abs/1703.02511
EyeQual: Accurate, Explainable, Retinal Image Quality Assessment
Pedro Costa, Aurelio Campilho, Bryan Hooi, Asim Smailagic, Kris Kitani, Shenghua Liu, Christos Faloutsos, Adrian Galdran
2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA)
https://doi.org/10.1109/ICMLA.2017.0-140
In the future we want to evaluate our method on a larger dataset before deploying it in real screenings. As EyeQual is effectively instantaneous running on a stock GPU, it could be implemented directly on a fundus camera in order to provide immediate feedback to the technician on whether she should take another picture. Also, the heatmaps produced by EyeQual should be visually validated by ophthalmologists.
104. Fundus Image: "Personalized" Image Quality Annotation
User Loss -- A Forced-Choice-Inspired Approach to Train Neural Networks directly by User Interaction. Shahab Zarei, Bernhard Stimpel, Christopher Syben, Andreas Maier
https://arxiv.org/abs/1807.09303 (Submitted on 24 Jul 2018)
In this paper, we investigate whether it is possible to train a neural network directly from user inputs. We consider this approach to be highly relevant for applications in which the point of optimality is not well-defined and user-dependent. Our application is medical image denoising, which is essential in fluoroscopy imaging. In this field every user, i.e. physician, has a different flavor, and image quality needs to be tailored towards each individual.
In the experimental results, we demonstrate that two image experts who prefer different filter characteristics between sharpness and de-noising can be created using our approach. Also, models trained for a specific user perform best on this user's test data. This approach opens the way towards implementation of direct user feedback in deep learning and is applicable for a wide range of applications.
Some users are distracted by noise and prefer strong de-noising while others prefer crisp and sharp images. Another requirement for our user loss is that we want to spend only a few clicks for training. As such we have to deal with the problem of having only few training samples, as we cannot ask our users to click more than 50 to 100 times. In order to still work in the regime of deep learning, we employ a framework coined precision learning that is able to map known operators and algorithms onto deep learning architectures [Maier et al. 2017]. In the literature this approach is known to be able to reduce maximal error bounds of the learning problem and to reduce the number of required training samples [Syben et al. 2018].
Syben et al. 2018b demonstrated the same framework in the context of hybrid MRI/X-ray imaging, where transformation of the parallel-beam MRI projections to fan-beam X-ray projections is required.
105. OCT ROI Detection
Optovue OCT Avanti 3D scan
https://www.optovue.com/oct
Only part of your acquired OCT cube might consist of the actual retina
http://doi.org/10.1364/BOE.3.001182
2D Slice-by-slice Detection
e.g. with Faster R-CNN by Ren et al. (2015)
Cited by 4040 articles
3D Volume Detection
e.g. with Faster R-CNN by Ren et al. (2015)