For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/09/an-introduction-to-data-augmentation-techniques-in-ml-frameworks-a-presentation-from-amd/
Rajy Rawther, PMTS Software Architect at AMD, presents the “Introduction to Data Augmentation Techniques in ML Frameworks” tutorial at the May 2021 Embedded Vision Summit.
Data augmentation is a set of techniques that expand the diversity of data available for training machine learning models by generating new data from existing data. This talk introduces different types of data augmentation techniques as well as their uses in various training scenarios.
Rawther explores some built-in augmentation methods in popular ML frameworks like PyTorch and TensorFlow. She also discusses some tips and tricks that are commonly used to randomly select parameters to avoid having the model overfit to a particular dataset.
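As a rough illustration of what such randomized augmentations do, here is a framework-free sketch using NumPy (the specific transforms, parameter ranges, and image sizes are my own choices, not taken from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img, p=0.5):
    """Horizontally flip the image with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def random_crop(img, size):
    """Cut a random size x size window out of the image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def random_brightness(img, max_delta=0.2):
    """Shift all pixel intensities by a random amount in [-max_delta, max_delta]."""
    return np.clip(img + rng.uniform(-max_delta, max_delta), 0.0, 1.0)

img = rng.random((32, 32))          # a fake 32x32 grayscale image in [0, 1]
aug = random_brightness(random_crop(random_flip(img), 28))
print(aug.shape)                    # (28, 28)
```

Frameworks such as torchvision and tf.image package the same ideas as composable, randomly parameterized transforms applied on the fly during training.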
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: one for the YouTube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The YouTube video is shown on the first page of the slide deck; for the slides, skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we put deep learning to the test on real-world data puzzles.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
An introduction to the Bayesian classifier, describing the basic algorithm and applications of Bayesian classification, illustrated with numerical examples.
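As a quick illustration of the basic algorithm, here is a small Gaussian naive Bayes sketch; the toy data and the variance-smoothing constant are my own assumptions, not from the presentation:

```python
import math

def fit(X, y):
    """Estimate per-class feature means/variances plus class priors."""
    stats = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / n + 1e-9  # smoothed variance
                 for col, m in zip(zip(*rows), means)]
        stats[c] = (means, vars_, n / len(y))
    return stats

def predict(stats, x):
    """Pick the class maximizing log P(c) + sum_i log P(x_i | c)."""
    best, best_lp = None, float("-inf")
    for c, (means, vars_, prior) in stats.items():
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, vars_):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Two well-separated 2-D classes.
X = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [5.0, 5.2], [4.8, 5.1], [5.2, 4.9]]
y = [0, 0, 0, 1, 1, 1]
model = fit(X, y)
print(predict(model, [1.0, 1.0]), predict(model, [5.0, 5.0]))  # 0 1
```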
Selecting optimal parameters for machine learning tasks is challenging. Some results are poor not because the data is noisy or the learning algorithm is weak, but because of a poor choice of parameter values. This presentation gives a brief introduction to evolutionary algorithms (EAs) and describes the genetic algorithm (GA), one of the simplest random-based EAs. A step-by-step example is given, along with its implementation in Python 3.5.
---------------------------------
Read more about GA:
Yu, Xinjie, and Mitsuo Gen. Introduction to evolutionary algorithms. Springer Science & Business Media, 2010.
https://www.kdnuggets.com/2018/03/introduction-optimization-with-genetic-algorithm.html
https://www.linkedin.com/pulse/introduction-optimization-genetic-algorithm-ahmed-gad
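As a rough sketch of the kind of step-by-step GA the presentation walks through (the fitness function, truncation selection, arithmetic crossover, and Gaussian mutation below are my own illustrative choices, not the presentation's exact example):

```python
import random

random.seed(42)

def fitness(x):
    # Maximize f(x) = -(x - 3)^2; the optimum is at x = 3.
    return -(x - 3.0) ** 2

def evolve(pop_size=20, generations=60, mutation_rate=0.3):
    pop = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a + b) / 2.0                 # arithmetic crossover
            if random.random() < mutation_rate:
                child += random.gauss(0, 0.5)     # Gaussian mutation
            children.append(child)
        pop = parents + children                  # elitism: parents survive
    return max(pop, key=fitness)

best = evolve()
print(round(best, 2))  # a value close to 3
```

Real GA codebases vary the selection, crossover, and mutation operators; the loop structure above is what stays the same.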
With the widespread adoption of digital and mobile cameras, we can now take digital images anytime, anywhere, and we share those images through a variety of internet services such as social networks and mobile messengers. When we look at a digital image we tend to believe it is genuine, but many digital images are fake and depict scenes that never actually happened. Such fake images are abused in fake news, falsified reports, and elsewhere, causing a range of social problems. To prevent or resolve these problems, many researchers have proposed techniques for detecting image manipulation, but existing detection techniques work only in limited, specific image environments, making them nearly impossible to use in real-world image distribution settings. This talk first introduces the various real images submitted to an image manipulation detection service over roughly two years, then describes a dataset built from 1,120 quantization tables extracted from the submitted JPEG images. It also proposes a network architecture that can distinguish single JPEG from double JPEG compression, and shows how it can be used to detect various manipulations occurring in JPEG images.
Paraphrase detection is an academically challenging NLP problem of detecting whether multiple phrases have the same meaning. In this talk, we’ll go through the existing traditional and deep learning approaches for this task, and see how they apply in practice as a silver-winning solution to the popular Kaggle Quora Question Pairs competition.
Generative Adversarial Networks (GANs) are clearly the next hot topic in deep learning. Yann LeCun has called them "the most interesting idea in the last 10 years in ML" and "the coolest thing since sliced bread." What problem do GANs solve? In machine learning, the two tasks of regression and classification are well understood, but getting a machine to go further and create structured, complex objects (for example, images or sentences) remains a major challenge. With GANs, machines can already draw photorealistic human faces, produce an image matching a piece of descriptive text, and even generate anime-style character portraits (the character portraits on the left were generated by the machine itself). This course aims to introduce GANs, one of the frontier techniques of deep learning.
How do you optimize with the Particle Swarm Optimization technique and xlOptimizer? This brief tutorial will enable you to solve optimization problems by applying the Particle Swarm Optimization method. After a brief introduction to the method, the tutorial shows the steps you need to follow to apply PSO to an optimization problem, even if you do not know any programming. (Some basic knowledge of MS Excel 2010 or later is required.)
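Although the tutorial itself uses xlOptimizer rather than code, the underlying PSO update is short enough to sketch directly; the inertia and acceleration coefficients, swarm size, and test function below are my own illustrative choices:

```python
import random

random.seed(1)

def pso(f, dim=2, n_particles=15, iters=80, w=0.7, c1=1.5, c2=1.5):
    """Minimize f over [-5, 5]^dim with a basic global-best PSO."""
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    gbest = min(pbest, key=f)[:]                     # global best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pos[i]) < f(gbest):
                    gbest = pos[i][:]
    return gbest

sphere = lambda p: sum(x * x for x in p)             # minimum at the origin
best = pso(sphere)
print([round(x, 3) for x in best])                   # near [0, 0]
```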
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto... (Simplilearn)
This presentation on backpropagation and gradient descent covers the basics of how backpropagation and gradient descent play a role in training neural networks, using the example of recognizing handwritten digits with a neural network. After predicting the results, you will see how to train the network using backpropagation to obtain results with high accuracy. Backpropagation is the process of updating the parameters of a network to reduce the error in prediction. You will also learn how to calculate the loss function to measure the model's error. Finally, you will see, with the help of a graph, how to find the minimum of a function using gradient descent. Now, let's get started with backpropagation and gradient descent in neural networks.
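The last idea — following the gradient downhill to a function's minimum — fits in a few lines; the example function, learning rate, and step count are my own choices, not the presentation's:

```python
def grad_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to find a minimum."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# f(x) = (x - 2)^2 has gradient 2(x - 2) and its minimum at x = 2.
x_min = grad_descent(lambda x: 2 * (x - 2), x0=10.0)
print(round(x_min, 4))  # → 2.0
```

Backpropagation is what supplies the gradient in a neural network: it computes the derivative of the loss with respect to every weight, and the same update rule is then applied to each weight.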
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are showing up in smartphone applications, creating efficiencies in the power grid, driving advances in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this TensorFlow course, you'll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks, and interpret the results.
And according to payscale.com, the median salary for engineers with deep learning skills tops $120,000 per year.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
1. Understand the concepts of TensorFlow, its main functions, operations and the execution pipeline
2. Implement deep learning algorithms, understand neural networks and traverse the layers of data abstraction which will empower you to understand data like never before
3. Master and comprehend advanced topics such as convolutional neural networks, recurrent neural networks, training deep networks and high-level interfaces
4. Build deep learning models in TensorFlow and interpret the results
5. Understand the language and fundamental concepts of artificial neural networks
6. Troubleshoot and improve deep learning models
7. Build your own deep learning project
8. Differentiate between machine learning, deep learning, and artificial intelligence
Learn more at https://www.simplilearn.com/deep-learning-course-with-tensorflow-training
https://telecombcn-dl.github.io/dlmm-2017-dcu/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
3. Supervised Learning
Supervised learning is learning from labeled data. It is used heavily in machine learning.
Example: consider an email classifier that separates spam from important messages:
5. Supervised Learning
The example set M is trained with methods such as Support Vector Machines and Decision Trees.
The resulting trained model is then used to filter new emails.
7. Unsupervised Learning
Labeled data is expensive to obtain, and labeling the data may not always be possible.
On the other hand, unlabeled data is much cheaper.
8. Unsupervised Learning
Task: speech analysis
Telephone call recordings
Annotating one hour of data at the phonetic level requires roughly 400 hours of effort
film f ih_n uh_gl_n m
be all bcl b iy iy_tr ao_tr ao l_dl
9. Semi-Supervised Learning
Learning with small labeled datasets and large unlabeled datasets.
Semi-supervised learning algorithms:
Self-Training
Generative Models
S3VMs (Transductive SVMs)
Graph-Based Algorithms
Multiview Algorithms
10. Self-Training Algorithm
Algorithm
X_u: unlabeled data; (X_1, Y_1): labeled data; f: the learner
1. Train f on the dataset (X_1, Y_1)
2. Predict labels for x in X_u
3. Add (x, f(x)) to the labeled data
4. Repeat
11. Self-Training Algorithm
Common variants:
Add only the pairs (x, f(x)) predicted with high confidence
Add all newly labeled pairs (x, f(x))
Add pairs weighted according to some criterion
19. Self-Training: Advantages and Disadvantages
Advantages:
• It is the simplest semi-supervised learning method
• It can be applied on top of any existing classifier
• It can be used effectively in areas such as natural language processing
Disadvantages:
• Mistakes can accumulate until a strong training set has formed
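The self-training loop on slide 10, with the high-confidence variant from slide 11, can be sketched in a few lines. This is a toy illustration: the nearest-centroid base learner, the distance-gap confidence measure, and the example data are my own choices, not the slides':

```python
import math

def centroid_fit(X, y):
    """Class centroids of the labeled points."""
    cents = {}
    for c in set(y):
        pts = [x for x, l in zip(X, y) if l == c]
        cents[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return cents

def predict_conf(cents, x):
    """Predicted class, and confidence = gap between the two nearest centroids."""
    d = sorted((math.dist(x, c), lab) for lab, c in cents.items())
    conf = d[1][0] - d[0][0] if len(d) > 1 else float("inf")
    return d[0][1], conf

def self_train(X_l, y_l, X_u, threshold=1.0, rounds=10):
    X_l, y_l, X_u = list(X_l), list(y_l), list(X_u)
    for _ in range(rounds):
        cents = centroid_fit(X_l, y_l)       # 1. train f on the labeled data
        keep = []
        for x in X_u:                        # 2. predict labels for x in X_u
            lab, conf = predict_conf(cents, x)
            if conf >= threshold:            # 3. add confident (x, f(x)) pairs
                X_l.append(x)
                y_l.append(lab)
            else:
                keep.append(x)
        if len(keep) == len(X_u):
            break                            # nothing new was labeled
        X_u = keep                           # 4. repeat
    return centroid_fit(X_l, y_l)

X_l = [[0.0, 0.0], [4.0, 4.0]]
y_l = [0, 1]
X_u = [[0.5, 0.3], [3.8, 4.1], [0.2, 0.4], [4.2, 3.9]]
cents = self_train(X_l, y_l, X_u)
print(predict_conf(cents, [0.3, 0.3])[0], predict_conf(cents, [4.0, 4.0])[0])  # 0 1
```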
26. Transductive Support Vector Machines
Semi-Supervised SVMs = Transductive SVMs (TSVMs)
Seek the maximum margin over both the labeled and the unlabeled data
27. Transductive Support Vector Machines
TSVM steps:
• Enumerate all possible labelings of the data in X_u
• Apply a standard SVM to each labeling
• Choose the SVM with the widest margin
33. Transductive Support Vector Machines
Advantages:
• Applicable in every setting where an SVM can be applied
• The underlying mathematics is easy to understand
Disadvantages:
• The optimization is hard
• It can get trapped in a wrong solution
34. Graph-Based Methods
When there is plenty of labeled data, a nearest-neighbor algorithm can be used.
When there is a large amount of unlabeled data, those points themselves can be used as a tool toward the solution.
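Graph-based semi-supervised learning is commonly realized as label propagation over an affinity graph: labels spread from labeled nodes to their unlabeled neighbors. Here is a small sketch under that interpretation; the chain graph, clamping scheme, and iteration count are my own illustrative assumptions:

```python
import numpy as np

def label_propagation(W, y, n_iter=100):
    """Iteratively spread labels over a graph.

    W: symmetric affinity matrix; y: initial labels, with -1 = unlabeled.
    Labeled nodes are clamped back to their known label every iteration.
    """
    n, classes = len(y), sorted({c for c in y if c != -1})
    F = np.zeros((n, len(classes)))
    for i, c in enumerate(y):
        if c != -1:
            F[i, classes.index(c)] = 1.0
    P = np.diag(1.0 / W.sum(axis=1)) @ W        # row-normalized transition matrix
    for _ in range(n_iter):
        F = P @ F                               # diffuse labels one step
        for i, c in enumerate(y):               # clamp the labeled nodes
            if c != -1:
                F[i] = 0.0
                F[i, classes.index(c)] = 1.0
    return [classes[j] for j in F.argmax(axis=1)]

# A 4-node chain 0 - 1 - 2 - 3 with only the endpoints labeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = [0, -1, -1, 1]
labels = label_propagation(W, y)
print(labels)  # [0, 0, 1, 1]
```

Each unlabeled node ends up with the label of the nearer clamped endpoint, which is exactly the "use the unlabeled points as a tool" idea from the slide.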
41. Co-Training
Each example, or the set of features describing it, can be split into two subsets (views).
Each view by itself is sufficient to learn the target function.
Two classifiers can learn from the same data.
Example: link text and page content for web page classification
Multiview Algorithms
42. Co-Training Algorithm
Input: labeled dataset L, unlabeled dataset U
Loop:
Train h1 on L (e.g., the link classifier)
Train h2 on L (e.g., the page classifier)
Use h1 to label p positive and n negative examples from U
Use h2 to label p positive and n negative examples from U
Add the most confidently labeled examples to L
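The loop above can be sketched on toy data. This is an illustration of the idea, not Blum and Mitchell's actual setup: the one-dimensional views, the class-mean classifiers, and the confidence scoring below are my own simplifications:

```python
def view_fit(vals, labels):
    """Per-class mean of a single-view feature."""
    return {c: sum(v for v, l in zip(vals, labels) if l == c) / labels.count(c)
            for c in set(labels)}

def view_predict(means, v):
    """Predicted class, and confidence = gap between class-mean distances."""
    d = sorted((abs(v - m), c) for c, m in means.items())
    return d[0][1], d[1][0] - d[0][0]

def co_train(L, y, U, per_round=2, rounds=5):
    L, y, U = list(L), list(y), list(U)
    for _ in range(rounds):
        if not U:
            break
        # Train one classifier per view (h1 on view 0, h2 on view 1).
        models = [view_fit([x[k] for x in L], y) for k in (0, 1)]
        # Each view scores the unlabeled points; keep the most confident labels.
        scored = []
        for x in U:
            for k in (0, 1):
                lab, conf = view_predict(models[k], x[k])
                scored.append((conf, x, lab))
        scored.sort(reverse=True)
        for _, x, lab in scored[:per_round]:
            if x in U:
                U.remove(x)
                L.append(x)
                y.append(lab)
    return [view_fit([x[k] for x in L], y) for k in (0, 1)]

# Both views are informative about the class.
L = [(0.0, 0.1), (5.0, 4.9)]
y = [0, 1]
U = [(0.4, 0.2), (4.8, 5.1), (0.1, 0.5), (5.2, 4.7)]
m1, m2 = co_train(L, y, U)
pred = lambda x: view_predict(m1, x[0])[0]
print(pred((0.2, 0.2)), pred((5.0, 5.0)))  # 0 1
```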
43. Co-Training: Experimental Results
12 labeled web pages (L)
1000 unlabeled web pages (U)
Average error learning from the labeled data alone: 11.1%
Average error with co-training: 5.0%

                      Page-based   Link-based   Combined
Supervised Learning      12.9         12.4        11.1
Co-training               6.2         11.6         5.0
44. References
Olivier Chapelle, Alexander Zien, Bernhard Schölkopf (Eds.) (2006). Semi-Supervised Learning. MIT Press.
Xiaojin Zhu (2005). Semi-Supervised Learning Literature Survey. TR-1530, University of Wisconsin-Madison, Department of Computer Science.
Matthias Seeger (2001). Learning with Labeled and Unlabeled Data. Technical Report, University of Edinburgh.
Editor's Notes
Labeled data has a high cost, is hard to find, and may require special equipment.
You may need to hire someone to do the labeling, and special tests may be needed to verify the quality of the labels.
Unlabeled data is useful in many areas and cheap to obtain, but important information may be lost.
A patch is described by the index of its nearest visual word.
Generative approaches use statistical learning to estimate the probability P(x|y) and thereby determine which class the data belong to.
In this way, suitable models and boundaries are formed on the labeled data.
If the unlabeled data follow the same distribution as the labeled data under the parameter θ, the accuracy of the solution increases; otherwise it decreases.
SVM: in supervised learning, seeks the maximum margin over the labeled data.
TSVM: seeks the maximum margin over both labeled and unlabeled data.
In conclusion, learning can be improved by using a small amount of labeled data together with a large amount of unlabeled data.