Human-in-the-loop learning workflows that use deep learning to group and cluster data, plus techniques for accounting for machine learning failures.
Self-driving computers: active learning workflows with human-interpretable vector spaces
2. About Skymind
• Founded: 2014
• Funding: 6.5 million USD
• Investors: Y Combinator, Tencent
• Customers: over 25 large enterprises and governments; DL4J gets over 160,000 downloads a month
• Employees: around 40 (mostly engineers, including PhDs)
3. Production is part of your Training Set
• Edge cases exist in your data
• Imbalanced classes are a problem
• Data and trends can change over time
• The problem's scope can expand due to unforeseen difficulties or new business problems
7. Human in the Loop
• Allow humans to have input
• Use deep learning to create friendly vector spaces to inspect
• Use model probabilities and decision boundaries to control behavior
• Do more thorough data analysis to understand outliers
• Humans help update the models
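The "probabilities and decision boundaries" bullet can be sketched as a confidence-threshold router: the model handles confident predictions automatically and queues the rest for human review. The `route` helper and the 0.9 threshold below are illustrative choices, not from the deck:

```python
import numpy as np

# Auto-accept predictions whose confidence clears a threshold; queue the
# rest for human review (and, once relabeled, for the next training run).
def route(probs, threshold=0.9):
    """probs: (n_samples, n_classes) model probabilities -> (auto, review) index arrays."""
    confidence = probs.max(axis=1)
    auto = np.flatnonzero(confidence >= threshold)
    review = np.flatnonzero(confidence < threshold)
    return auto, review

probs = np.array([[0.97, 0.03],   # confident: handled automatically
                  [0.55, 0.45],   # near the decision boundary: send to a human
                  [0.10, 0.90]])  # confident in class 1
auto, review = route(probs)
```

Lowering the threshold trades human workload for more automated (and riskier) decisions.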
8. Friendly Vector Spaces
• Word embeddings
• Transfer learning feature extractors
• Autoencoder bottlenecks as an embedding space
10. Word Embeddings: A 2-minute primer
• Run an SGD variant over co-occurring pairs of words, minimizing a distance function between the two words
• Apply sparse SGD updates to individual rows (each word is a row of the embedding matrix)
• There are various ways of computing accuracy
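A minimal sketch of those bullets, assuming a skip-gram-style objective with one negative sample per pair; the toy corpus, window size, and hyperparameters are all illustrative:

```python
import numpy as np

# Toy skip-gram-with-negative-sampling sketch. Each word is a row in two
# embedding matrices; SGD pulls co-occurring (center, context) pairs
# together and pushes a randomly sampled "negative" word away.
rng = np.random.default_rng(0)
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
dim, lr = 8, 0.05
W_in = rng.normal(0, 0.1, (len(vocab), dim))    # center-word rows
W_out = rng.normal(0, 0.1, (len(vocab), dim))   # context-word rows

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    for pos, word in enumerate(corpus):
        for off in (-1, 1):                      # context window of 1
            if not 0 <= pos + off < len(corpus):
                continue
            c, o = idx[word], idx[corpus[pos + off]]
            neg = int(rng.integers(len(vocab)))  # one negative sample
            for target, label in ((o, 1.0), (neg, 0.0)):
                v_c, v_t = W_in[c].copy(), W_out[target].copy()
                g = lr * (label - sigmoid(v_c @ v_t))
                W_out[target] += g * v_c         # sparse updates: only two rows touched
                W_in[c] += g * v_t

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(W_in[idx["cat"]], W_in[idx["dog"]])
```

Because "cat" and "dog" share contexts in this corpus, their rows tend to drift together, which is what cosine similarity between the rows measures.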
12. Transfer Learning: A 2-minute primer
• Download a pre-trained neural net architecture (usually a CNN)
• Fine-tune the final layer if doing classification
• Otherwise, use the feature extractor as a compression algorithm for high-dimensional images
• The intuition is similar to the layerwise pretraining of old
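The pattern in miniature, with a frozen random projection standing in for the pre-trained CNN body (data, sizes, and learning rate are illustrative); only the final layer is trained:

```python
import numpy as np

# Transfer-learning pattern: a frozen "feature extractor" feeds a final
# classification layer, and only that final layer is tuned on the new task.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))                 # stand-in for high-dimensional inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # toy binary labels

W_frozen = 0.1 * rng.normal(size=(64, 16))     # "pre-trained" weights: never updated
feats = np.tanh(X @ W_frozen)                  # compressed features

w, b, lr = np.zeros(16), 0.0, 0.1              # only this head is trained
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b))) # sigmoid classifier head
    grad = p - y
    w -= lr * feats.T @ grad / len(y)
    b -= lr * grad.mean()

pred = 1.0 / (1.0 + np.exp(-(feats @ w + b))) > 0.5
acc = float((pred == y).mean())
```

The same `feats` array is what the "compression algorithm" bullet refers to: a 16-D summary of 64-D inputs that downstream algorithms consume.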
13. Autoencoders
[Diagram: join the raw data, transform it, and feed groups into an autoencoder, saving the reconstruction error of the center; input data is shown next to its reconstruction]
14. Auto-encoder learning process
[Diagram: the autoencoder learns to cover more of the vector space over time as reconstruction error goes down]
15. Auto-Encoders: A 2-minute primer
• Minimize the KL divergence (see the previous slide) between the reconstruction and the input
• Learn a low-dimensional bottleneck vector for use in other algorithms or visualizations
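A minimal linear autoencoder sketch of the bottleneck idea, on toy data that truly lies in 2-D; plain squared error is used here in place of the KL divergence mentioned above, and all sizes are illustrative:

```python
import numpy as np

# Squeeze 10-D inputs through a 2-D bottleneck and minimize reconstruction
# error. The bottleneck codes are the low-dimensional vectors handed to
# other algorithms or visualizations.
rng = np.random.default_rng(0)
z_true = rng.normal(size=(500, 2))             # data really lives in 2-D...
X = z_true @ rng.normal(size=(2, 10))          # ...embedded in 10-D

W_enc = rng.normal(0, 0.1, (10, 2))            # encoder into the bottleneck
W_dec = rng.normal(0, 0.1, (2, 10))            # decoder back to input space
lr = 0.01
init_err = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
for _ in range(500):
    code = X @ W_enc                           # bottleneck embedding
    err = code @ W_dec - X                     # reconstruction residual
    W_dec -= lr * code.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

final_err = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

The falling reconstruction error is exactly the quantity the learning-process slide tracks; `code` is the embedding you would cluster or visualize.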
16. Different kinds of auto-encoders
• Variational autoencoders
• GANs (there are thousands; I am not covering them all here): https://github.com/kozistr/Awesome-GANs
17. Commonalities
• Latent vector spaces are automatically learned via SGD
• Low-dimensional vectors are meant to be consumed externally
21. Using K-means
• Tune with a target number of classes
• Use it to see how the neural net groups your data into classes
• Use it as a pseudo-labeling mechanism
• Key: run it on the latent vector space
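K-means on a latent space, sketched with plain NumPy (Lloyd's algorithm). In practice the input would be autoencoder codes or embeddings; here two synthetic 2-D blobs stand in for the latent vectors, and the assignments double as pseudo-labels:

```python
import numpy as np

# Lloyd's algorithm: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
rng = np.random.default_rng(0)
latent = np.vstack([rng.normal(-3, 0.5, (50, 2)),   # blob A
                    rng.normal(+3, 0.5, (50, 2))])  # blob B

k = 2
centers = latent[[0, -1]].copy()   # deterministic init, one seed per blob
for _ in range(20):
    # assignment step: nearest center for every latent vector
    d = np.linalg.norm(latent[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    # update step: recompute each center as its cluster's mean
    centers = np.array([latent[labels == j].mean(axis=0) for j in range(k)])
```

`labels` is the pseudo-labeling mechanism from the slide: cluster membership becomes a provisional class label a human can audit.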
23. Various kinds of KNN indexes
• RP-trees (random projection trees: a neighbor of my neighbor is also likely my neighbor)
• VP-trees (recursively partition points by distance to a vantage point, using the tree to index the vector space)
• KD-trees (split the space axis-by-axis)
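Of the three, the KD-tree is the easiest to sketch. A toy build-and-query in plain Python/NumPy (illustrative, not a production index): split on alternating axes at the median, then descend toward the query, exploring the far side only when the splitting plane is closer than the best match so far.

```python
import numpy as np

def build(points, depth=0):
    """Recursively split points on alternating axes at the median."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def nearest(node, q, best=None):
    """Return (distance, point) of the nearest neighbor of q."""
    if node is None:
        return best
    d = np.linalg.norm(q - node["point"])
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = q[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, q, best)
    if abs(diff) < best[0]:          # the far half-space may still hide a closer point
        best = nearest(far, q, best)
    return best

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))
tree = build(pts)
q = np.array([0.1, -0.2])
dist, p = nearest(tree, q)
```

The pruning test (`abs(diff) < best[0]`) is what makes tree indexes faster than brute force on low-dimensional latent vectors.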
28. Visualization
• UMAP
• Barnes-Hut t-SNE
• LargeVis
• All are dimensionality-reduction algorithms focused on building a coordinate space via similarities in the original vector space
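The shared idea (coordinates from pairwise similarities) can be sketched with classical MDS, the simplest member of this family; UMAP, t-SNE, and LargeVis replace the eigendecomposition below with neighbor graphs and SGD. The toy data genuinely lies on a 2-D plane so the recovery is exact:

```python
import numpy as np

# Classical MDS: convert squared pairwise distances into a Gram matrix,
# then read 2-D coordinates off its top eigenvectors.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))                  # true 2-D structure
X = Z @ rng.normal(size=(2, 16))               # latent vectors in 16-D

D2 = ((X[:, None] - X[None]) ** 2).sum(-1)     # squared pairwise distances
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
B = -0.5 * J @ D2 @ J                          # Gram matrix of centered points

vals, vecs = np.linalg.eigh(B)                 # eigenvalues in ascending order
coords = vecs[:, -2:] * np.sqrt(vals[-2:])     # top-2 components -> 2-D map
```

On real latent spaces the data is only approximately low-dimensional, which is why the neighbor-graph methods above usually give more readable maps.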