Human-in-the-loop learning workflows that use deep learning to group and cluster data, plus techniques for accounting for machine learning failures.
Self-driving computers: active learning workflows with human-interpretable vector spaces
2. About Skymind
• Founded: 2014
• Funding: 6.5 million USD
• Investors: Y Combinator, Tencent
• Customers: over 25 large enterprises and governments; DL4J gets over 160,000 downloads a month
• Employees: around 40 (mostly engineers, including PhDs)
3. Production is part of your Training Set
• Edge cases exist in your data
• Imbalanced classes are a problem
• Data and trends can change over time
• The problem's scope can expand due to unforeseen difficulties or new business problems
7. Human in the Loop
• Allow humans to have input
• Use deep learning to create friendly vector spaces to inspect
• Use model probabilities and decision boundaries to control behavior
• Do more thorough data analysis to understand outliers
• Humans help update the models
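The "probabilities and decision boundaries" bullet can be sketched as a confidence-threshold router: the model handles confident predictions automatically and queues the rest for human review. The `route` helper and the 0.9 threshold below are illustrative choices, not from the deck:

```python
import numpy as np

# Auto-accept predictions whose confidence clears a threshold; queue the
# rest for human review (and, once relabeled, for the next training run).
def route(probs, threshold=0.9):
    """probs: (n_samples, n_classes) model probabilities -> (auto, review) index arrays."""
    confidence = probs.max(axis=1)
    auto = np.flatnonzero(confidence >= threshold)
    review = np.flatnonzero(confidence < threshold)
    return auto, review

probs = np.array([[0.97, 0.03],   # confident: handled automatically
                  [0.55, 0.45],   # near the decision boundary: send to a human
                  [0.10, 0.90]])  # confident in class 1
auto, review = route(probs)
```

Lowering the threshold trades human workload for more automated (and riskier) decisions.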
8. Friendly Vector Spaces
• Word embeddings
• Transfer learning feature extractors
• Autoencoder bottlenecks as an embedding space
10. Word Embeddings: A 2-minute primer
• Run an SGD variant over co-occurring pairs of words, minimizing a distance function between the two words
• Apply sparse SGD updates to individual rows (each word is a row of the embedding matrix)
• There are various ways of computing accuracy
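A minimal sketch of those bullets, assuming a skip-gram-style objective with one negative sample per pair; the toy corpus, window size, and hyperparameters are all illustrative:

```python
import numpy as np

# Toy skip-gram-with-negative-sampling sketch. Each word is a row in two
# embedding matrices; SGD pulls co-occurring (center, context) pairs
# together and pushes a randomly sampled "negative" word away.
rng = np.random.default_rng(0)
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
dim, lr = 8, 0.05
W_in = rng.normal(0, 0.1, (len(vocab), dim))    # center-word rows
W_out = rng.normal(0, 0.1, (len(vocab), dim))   # context-word rows

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    for pos, word in enumerate(corpus):
        for off in (-1, 1):                      # context window of 1
            if not 0 <= pos + off < len(corpus):
                continue
            c, o = idx[word], idx[corpus[pos + off]]
            neg = int(rng.integers(len(vocab)))  # one negative sample
            for target, label in ((o, 1.0), (neg, 0.0)):
                v_c, v_t = W_in[c].copy(), W_out[target].copy()
                g = lr * (label - sigmoid(v_c @ v_t))
                W_out[target] += g * v_c         # sparse updates: only two rows touched
                W_in[c] += g * v_t

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(W_in[idx["cat"]], W_in[idx["dog"]])
```

Because "cat" and "dog" share contexts in this corpus, their rows tend to drift together, which is what cosine similarity between the rows measures.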
12. Transfer Learning: A 2-minute primer
• Download a pre-trained neural net architecture (usually a CNN)
• Fine-tune the final layer if doing classification
• Otherwise, use the feature extractor as a compression algorithm for high-dimensional images
• The intuition is similar to the layerwise pretraining of old
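The pattern in miniature, with a frozen random projection standing in for the pre-trained CNN body (data, sizes, and learning rate are illustrative); only the final layer is trained:

```python
import numpy as np

# Transfer-learning pattern: a frozen "feature extractor" feeds a final
# classification layer, and only that final layer is tuned on the new task.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))                 # stand-in for high-dimensional inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # toy binary labels

W_frozen = 0.1 * rng.normal(size=(64, 16))     # "pre-trained" weights: never updated
feats = np.tanh(X @ W_frozen)                  # compressed features

w, b, lr = np.zeros(16), 0.0, 0.1              # only this head is trained
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b))) # sigmoid classifier head
    grad = p - y
    w -= lr * feats.T @ grad / len(y)
    b -= lr * grad.mean()

pred = 1.0 / (1.0 + np.exp(-(feats @ w + b))) > 0.5
acc = float((pred == y).mean())
```

The same `feats` array is what the "compression algorithm" bullet refers to: a 16-D summary of 64-D inputs that downstream algorithms consume.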
13. Autoencoders
[Diagram: join the raw data, transform it, and feed groups into an autoencoder, saving the reconstruction error of the center; input data is shown next to its reconstruction]
14. Auto-encoder learning process
[Diagram: the autoencoder learns to cover more of the vector space over time as reconstruction error goes down]
15. Auto-Encoders: A 2-minute primer
• Minimize the KL divergence (see the previous slide) between the reconstruction and the input
• Learn a low-dimensional bottleneck vector for use in other algorithms or visualizations
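A minimal linear autoencoder sketch of the bottleneck idea, on toy data that truly lies in 2-D; plain squared error is used here in place of the KL divergence mentioned above, and all sizes are illustrative:

```python
import numpy as np

# Squeeze 10-D inputs through a 2-D bottleneck and minimize reconstruction
# error. The bottleneck codes are the low-dimensional vectors handed to
# other algorithms or visualizations.
rng = np.random.default_rng(0)
z_true = rng.normal(size=(500, 2))             # data really lives in 2-D...
X = z_true @ rng.normal(size=(2, 10))          # ...embedded in 10-D

W_enc = rng.normal(0, 0.1, (10, 2))            # encoder into the bottleneck
W_dec = rng.normal(0, 0.1, (2, 10))            # decoder back to input space
lr = 0.01
init_err = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
for _ in range(500):
    code = X @ W_enc                           # bottleneck embedding
    err = code @ W_dec - X                     # reconstruction residual
    W_dec -= lr * code.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

final_err = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

The falling reconstruction error is exactly the quantity the learning-process slide tracks; `code` is the embedding you would cluster or visualize.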
16. Different kinds of auto-encoders
• Variational autoencoders
• GANs (there are thousands; I am not covering them all here): https://github.com/kozistr/Awesome-GANs
17. Commonalities
• Latent vector spaces are automatically learned via SGD
• Low-dimensional vectors are meant to be consumed externally
21. Using K-means
• Tune with a target number of classes
• Use it to see how the neural net groups your data into classes
• Use it as a pseudo-labeling mechanism
• Key: run it on the latent vector space
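K-means on a latent space, sketched with plain NumPy (Lloyd's algorithm). In practice the input would be autoencoder codes or embeddings; here two synthetic 2-D blobs stand in for the latent vectors, and the assignments double as pseudo-labels:

```python
import numpy as np

# Lloyd's algorithm: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
rng = np.random.default_rng(0)
latent = np.vstack([rng.normal(-3, 0.5, (50, 2)),   # blob A
                    rng.normal(+3, 0.5, (50, 2))])  # blob B

k = 2
centers = latent[[0, -1]].copy()   # deterministic init, one seed per blob
for _ in range(20):
    # assignment step: nearest center for every latent vector
    d = np.linalg.norm(latent[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    # update step: recompute each center as its cluster's mean
    centers = np.array([latent[labels == j].mean(axis=0) for j in range(k)])
```

`labels` is the pseudo-labeling mechanism from the slide: cluster membership becomes a provisional class label a human can audit.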
23. Various kinds of KNN indexes
• RP-trees (random projection trees: a neighbor of my neighbor is also likely my neighbor)
• VP-trees (recursively partition points by distance to a vantage point, using the tree to index the vector space)
• KD-trees (split the space axis-by-axis)
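Of the three, the KD-tree is the easiest to sketch. A toy build-and-query in plain Python/NumPy (illustrative, not a production index): split on alternating axes at the median, then descend toward the query, exploring the far side only when the splitting plane is closer than the best match so far.

```python
import numpy as np

def build(points, depth=0):
    """Recursively split points on alternating axes at the median."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def nearest(node, q, best=None):
    """Return (distance, point) of the nearest neighbor of q."""
    if node is None:
        return best
    d = np.linalg.norm(q - node["point"])
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = q[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, q, best)
    if abs(diff) < best[0]:          # the far half-space may still hide a closer point
        best = nearest(far, q, best)
    return best

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))
tree = build(pts)
q = np.array([0.1, -0.2])
dist, p = nearest(tree, q)
```

The pruning test (`abs(diff) < best[0]`) is what makes tree indexes faster than brute force on low-dimensional latent vectors.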
28. Visualization
• UMAP
• Barnes-Hut t-SNE
• LargeVis
• All are dimensionality-reduction algorithms focused on building a coordinate space via similarities in the original vector space
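The shared idea (coordinates from pairwise similarities) can be sketched with classical MDS, the simplest member of this family; UMAP, t-SNE, and LargeVis replace the eigendecomposition below with neighbor graphs and SGD. The toy data genuinely lies on a 2-D plane so the recovery is exact:

```python
import numpy as np

# Classical MDS: convert squared pairwise distances into a Gram matrix,
# then read 2-D coordinates off its top eigenvectors.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))                  # true 2-D structure
X = Z @ rng.normal(size=(2, 16))               # latent vectors in 16-D

D2 = ((X[:, None] - X[None]) ** 2).sum(-1)     # squared pairwise distances
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
B = -0.5 * J @ D2 @ J                          # Gram matrix of centered points

vals, vecs = np.linalg.eigh(B)                 # eigenvalues in ascending order
coords = vecs[:, -2:] * np.sqrt(vals[-2:])     # top-2 components -> 2-D map
```

On real latent spaces the data is only approximately low-dimensional, which is why the neighbor-graph methods above usually give more readable maps.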