Deep semi-supervised learning (DSSL) uses a small amount of labeled data along with a large amount of unlabeled data. It works by first training a model on the labeled data and then using that model to label the unlabeled data, effectively enlarging the training set. This document categorizes and describes various DSSL methods, including generative models such as GANs and VAEs, consistency regularization, graph-based methods using graphs and GNNs, pseudo-labeling via self-training or disagreement, and hybrid methods. Challenges include an incomplete understanding of how DSSL works internally and potential performance issues with imbalanced or unrealistic data distributions.
1. A Survey on Deep Semi-supervised Learning
Based on: https://arxiv.org/pdf/2103.00550.pdf
Original Authors: Xiangli Yang, Zixing Song, Irwin King, Fellow, IEEE, Zenglin Xu, Senior Member, IEEE
Date Published: 23 August 2021
2. Introduction
What is deep semi-supervised learning (DSSL)?
● A combination of supervised and unsupervised learning that uses a small portion of labeled examples and a large amount of unlabeled data.
● The model must learn from this mixture and make predictions on new examples.
● The basic procedure involves using the existing labeled data to label the rest of the unlabeled data, thus effectively increasing the training data.
Why is DSSL needed?
● Most real-world use cases do not have extensive labeled data, and labeling data is challenging, requiring a considerable amount of resources, time, and effort.
3. General idea of SSL
Fig. 1. General idea of semi-supervised learning (source: https://www.enjoyalgorithms.com/)
4. Learning paradigms in SSL
Two dominant learning paradigms in SSL:
● Transductive: Transductive methods do not construct a classifier for the entire input space. The predictions of such systems are restricted to the objects on which they have been trained, so transductive methods have no separate training and test phases.
● Inductive: A classifier that relies on inductive methods can predict any object in the input space. This classifier may be trained using unlabeled data, but its predictions for previously unseen objects are independent of each other once training is complete.
The majority of graph-based methods are transductive, whereas most other types of SSL methods are inductive.
5. SSL Assumptions
SSL cannot be expected to improve performance unless certain assumptions are made about the distribution of the data:
● Self-training assumption: Predictions with high confidence are considered to be accurate.
● Co-training assumption: An instance x has two conditionally independent views, and each view is sufficient for the classification task.
● Generative model assumption: When the number of mixture components, the prior p(y), and the conditional distribution p(x|y) are correct, the data can be assumed to come from the mixture model.
● Cluster assumption: Two points x1 and x2 in the same cluster should belong to the same class.
● Low-density separation: The decision boundary should lie in a low-density region, not in an area of high density.
● Manifold assumption: If two points x1 and x2 are located in a local neighborhood on the low-dimensional manifold, they have similar class labels.
6. Taxonomy of DSSL
Fig. 2. The taxonomy of major deep semi-supervised learning methods, based on loss function and model design.
7. Important DSSL methods
● Generative models
○ Semi-supervised GANs
○ Semi-supervised VAEs
● Consistency regularization methods
● Graph-based methods
○ Autoencoder-based methods
○ GNN-based methods
● Pseudo-labeling methods
○ Disagreement-based models
○ Self-training models
● Hybrid methods
8. Generative models - Semi-supervised GANs
● A GAN is an unsupervised model. It consists of a generative model that is trained on unlabeled data, and a discriminative classifier that determines the quality of the generator.
● GANs are able to learn the distribution of real data from unlabeled samples, which facilitates SSL. There are four main themes in how to use GANs for SSL (a sketch of the first follows this list):
○ re-using the features from the discriminator
○ using GAN-generated samples to regularize a classifier
○ learning an inference model
○ using samples produced by a GAN as additional training data
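As a concrete illustration of re-using the discriminator as a classifier, below is a minimal PyTorch sketch (the function name and the PyTorch choice are ours) of the discriminator loss used by (K+1)-class semi-supervised GANs in the style of Improved GAN; it assumes the discriminator outputs K real-class logits and derives the real/fake probability from them:

```python
import torch
import torch.nn.functional as F

def ssgan_discriminator_loss(logits_lab, y_lab, logits_unl, logits_fake):
    # Supervised term: ordinary K-class cross-entropy on the labeled batch.
    loss_sup = F.cross_entropy(logits_lab, y_lab)

    # With K class logits, log Z(x) = logsumexp(logits) and
    # D(x) = Z(x) / (Z(x) + 1), hence log D(x) = log Z(x) - softplus(log Z(x)).
    log_z_unl = torch.logsumexp(logits_unl, dim=1)
    log_z_fake = torch.logsumexp(logits_fake, dim=1)

    loss_unl = -(log_z_unl - F.softplus(log_z_unl)).mean()  # unlabeled -> "real"
    loss_fake = F.softplus(log_z_fake).mean()               # generated -> "fake"
    return loss_sup + loss_unl + loss_fake
```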
9. Semi-supervised GANs - Examples
● CatGAN: The Categorical Generative Adversarial Network (CatGAN) modifies the GAN objective function to incorporate the mutual information between observed samples and their predicted categorical distributions. The goal is to train a discriminator that distinguishes the samples into K categories by assigning a label y to each x, instead of learning a binary discriminator value function.
● CCGAN: Context-Conditional Generative Adversarial Networks (CCGAN) are proposed as a way of harnessing unlabeled image data via an adversarial loss for tasks such as image in-painting, exploiting the contextual information provided by the surrounding parts of the image. The generator is trained to generate the pixels within a missing piece of the image.
● Improved GAN: Adapts the GAN to semi-supervised classification by extending the discriminator into a (K+1)-class classifier: the K real classes plus one additional class for generated (fake) samples, so that the discriminator doubles as the classifier.
10. Generative models - Semi-supervised VAE
Variational AutoEncoders (VAEs) combine deep autoencoders and generative latent-variable models.
A VAE is trained with two objectives: a reconstruction objective between the inputs and their reconstructed versions, and a variational objective that encourages the latent space to follow a Gaussian distribution.
SSL can benefit from VAEs for three reasons:
● Incorporating unlabeled data is a natural process.
● Using the latent space, it is possible to disentangle representations easily.
● It also allows us to use variational neural methods.
VAEs can be used as semi-supervised learning models in two steps (a sketch of the first step follows this list):
● First, a VAE is trained using both unlabeled and labeled data in order to extract latent representations.
● Second, a VAE is trained in which the latent representation is supplemented by a label vector. The label vector contains the true labels for labeled data points and is used to construct additional latent variables for unlabeled data.
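As a rough sketch of the first training step, assuming a PyTorch encoder whose head returns mu and log_var (all names here are ours):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO: a reconstruction term plus the KL divergence between
    the approximate posterior N(mu, sigma^2) and the Gaussian prior N(0, I)."""
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + torch.randn_like(mu) * (0.5 * log_var).exp()
```

In the second step, the sampled z would be concatenated with the label vector (true one-hot labels for labeled points, inferred labels for unlabeled ones) before being passed to the second model.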
11. Semi-supervised VAE - Examples
● Semi-supervised Sequential Variational Autoencoder (SSVAE): Consists of a Seq2Seq structure and a sequential classifier. In the Seq2Seq structure, the input sequence is first encoded by a recurrent neural network and then decoded by another recurrent neural network conditioned on both the latent variable and the categorical label.
● Infinite VAE: A mixture of an infinite number of autoencoders, capable of scaling with data complexity to better capture its intrinsic structure. The unsupervised generative model is trained using unlabeled data; it is then combined with the available labeled data to train a discriminative model, which is also a mixture of experts.
12. Consistency Regularization
● Consistency regularization is based on the idea that predictions should be only minimally affected by extra perturbations imposed on the input samples.
● Consistency regularization SSL methods typically use a Teacher-Student structure.
● The objective is to train the model to predict consistently on an unlabeled example and its perturbed version (a minimal loss sketch follows this list).
● The model learns as a student and, acting as a teacher, creates its own targets simultaneously. Since the models themselves generate the targets, the targets may be incorrect even though they are then used for learning.
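The core objective can be written in a few lines. This is a hedged PyTorch sketch of our own: augment stands for any stochastic perturbation, and the teacher may simply be the same network as the student:

```python
import torch
import torch.nn.functional as F

def consistency_loss(student, teacher, x_unl, augment):
    """Penalize disagreement between predictions on two stochastic
    perturbations of the same unlabeled batch (Teacher-Student style)."""
    with torch.no_grad():                  # teacher targets carry no gradient
        target = F.softmax(teacher(augment(x_unl)), dim=1)
    pred = F.softmax(student(augment(x_unl)), dim=1)
    return F.mse_loss(pred, target)
```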
13. Consistency Regularization - Examples
● Ladder networks: This network consists of two encoders, a corrupted one and a clean one, and a decoder. The corrupted encoder has Gaussian noise injected at each layer after batch normalization. An input x passed through the clean encoder produces the output y; passed through the corrupted encoder, it produces ỹ. This ỹ is fed into the denoising decoder to reconstruct the clean output y. The training loss is computed as the MSE between the clean activations and the reconstructed activations.
● Π-Model: A simplified ladder network in which the corrupted encoder is removed and the same network is used to obtain predictions for both corrupted and uncorrupted input. Two random augmentations of each sample, for both labeled and unlabeled data, are forward-propagated, and the objective is to produce consistent predictions on the two variants, i.e., to reduce the distance between the two predictions (see the sketch below).
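A minimal Π-Model training step might look as follows; this is a sketch under our own naming, where w_t is the consistency weight, typically ramped up over training:

```python
import torch
import torch.nn.functional as F

def pi_model_step(model, x_lab, y_lab, x_unl, augment, w_t):
    """One Π-Model update: supervised cross-entropy plus a weighted MSE
    consistency term between two stochastically augmented forward passes."""
    loss_sup = F.cross_entropy(model(augment(x_lab)), y_lab)

    x_all = torch.cat([x_lab, x_unl])      # consistency uses all samples
    p1 = F.softmax(model(augment(x_all)), dim=1)
    with torch.no_grad():                  # second pass serves as the target
        p2 = F.softmax(model(augment(x_all)), dim=1)
    return loss_sup + w_t * F.mse_loss(p1, p2)
```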
14. Graph-Based Methods
● The main idea of graph-based semi-supervised learning (GSSL) is to extract a graph from the raw data, where each node represents a training sample and each edge represents a similarity measurement between samples.
● Given labeled and unlabeled samples, the objective is to propagate the labels from the labeled nodes to the unlabeled ones (a label-propagation sketch follows this list).
● GSSL methods are broadly classified into two groups:
○ Autoencoder-based methods
○ GNN-based methods
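The propagation idea can be illustrated with classical label propagation. A minimal NumPy sketch, assuming an affinity matrix W has already been built (e.g., from a k-nearest-neighbor graph):

```python
import numpy as np

def propagate_labels(W, Y, n_iters=100):
    """Iterative label propagation on an affinity graph.

    W: (n, n) symmetric affinity matrix over all samples.
    Y: (n, K) one-hot label matrix with all-zero rows for unlabeled nodes.
    """
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # row-normalize
    F = Y.astype(float).copy()
    labeled = Y.sum(axis=1) > 0
    for _ in range(n_iters):
        F = P @ F                     # each node absorbs its neighbors' labels
        F[labeled] = Y[labeled]       # clamp the known labels
    return F.argmax(axis=1)           # predicted class per node
```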
15. Graph-Based Methods - Examples
● Structural deep network embedding (SDNE) is an autoencoder-based method. The framework consists of an unsupervised part and a supervised part. The first is an autoencoder designed to produce an embedding for each node that can rebuild its neighborhood. The second part utilizes Laplacian Eigenmaps, which penalize the model when related vertices are far apart.
● Basic GNN: In the basic Graph Neural Network (GNN) approach, a classifier is first trained to predict class labels for the labeled nodes; it can then be applied to the unlabeled nodes based on the final hidden state of the GNN-based model. GNNs take advantage of neural message passing, in which messages are exchanged and updated between each pair of nodes by using neural networks (a simplified layer is sketched below).
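A single round of neural message passing can be sketched as follows; this is a simplified PyTorch layer of our own design, using mean aggregation over a dense adjacency matrix:

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """Aggregate (mean) each node's neighbor features and combine them
    with the node's own state through a learned update function."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.update = nn.Linear(2 * in_dim, out_dim)

    def forward(self, h, adj):
        # h: (n, in_dim) node features; adj: (n, n) adjacency matrix.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        messages = (adj @ h) / deg             # mean over neighbors
        return torch.relu(self.update(torch.cat([h, messages], dim=1)))
```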
16. Pseudo-Labeling Methods
Pseudo-labeling works in two steps:
● In the first step, a model is trained on the limited set of labeled data.
● The second step leverages the same model to create pseudo-labels for the unlabeled data and adds the high-confidence pseudo-labels as targets to the existing labeled dataset, creating additional training data (a minimal self-training loop is sketched after this list).
There are two main patterns: one improves the performance of the whole framework based on the disagreement of views or multiple networks; the other is self-training.
● Disagreement-based methods train multiple learners and focus on exploiting the disagreement during training.
● The self-training algorithm leverages the model's own confident predictions to produce the pseudo-labels for unlabeled data.
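A minimal self-training loop, assuming a scikit-learn-style estimator exposing fit and predict_proba (the threshold and round count here are illustrative):

```python
import numpy as np

def self_train(model, X_lab, y_lab, X_unl, threshold=0.95, rounds=5):
    """Fit on labeled data, adopt confident predictions on unlabeled data
    as pseudo-labels, enlarge the labeled set, and refit."""
    for _ in range(rounds):
        model.fit(X_lab, y_lab)
        proba = model.predict_proba(X_unl)
        conf, pseudo = proba.max(axis=1), proba.argmax(axis=1)
        keep = conf >= threshold
        if not keep.any():
            break                              # nothing confident enough left
        X_lab = np.vstack([X_lab, X_unl[keep]])
        y_lab = np.concatenate([y_lab, pseudo[keep]])
        X_unl = X_unl[~keep]
    return model
```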
17. Pseudo-Labeling Methods - Examples
● Pseudo-label: A simple and efficient SSL method that allows the network to be trained simultaneously with labeled and unlabeled data. Initially, the model is trained on labeled data using a cross-entropy loss. The same model is then used to predict an entire batch of unlabeled samples, and the maximum-confidence prediction for each sample is taken as its pseudo-label.
● Noisy Student: A semi-supervised method based on knowledge distillation with equal-or-larger student models. The teacher model is first trained on labeled images and then generates pseudo-labels for the unlabeled examples. Next, a larger student model is trained on the combination of labeled and pseudo-labeled samples; the combined instances are augmented using data augmentation techniques and model noise. Over several iterations of this algorithm, the student model becomes the new teacher, relabels the unlabeled data, and the cycle is repeated.
● SimCLRv2: The SSL version of SimCLR (A Simple Framework for Contrastive Learning of Visual Representations). SimCLRv2 can be summarized in three steps: task-agnostic unsupervised pre-training, supervised fine-tuning on labeled samples, and self-training or distillation with task-specific unlabeled examples. SimCLRv2 learns representations by optimizing a contrastive learning loss, maximizing agreement between differently augmented views of the same sample.
18. Hybrid Methods
Hybrid methods combine ideas from the above-mentioned methods for performance improvement, such as:
● Pseudo-labeling
● Consistency regularization
● Entropy minimization
19. Hybrid Methods - Examples
● MixMatch: Combines consistency regularization and entropy minimization in a unified loss function. It first applies data augmentation to both labeled and unlabeled data. Each unlabeled sample is augmented K times, and the predictions on the different augmentations are averaged. To reduce entropy, the guessed labels are sharpened before the final label is assigned. After this, Mixup regularization is applied to both labeled and unlabeled data.
● FixMatch: Combines consistency regularization and pseudo-labeling in a simplified way (see the sketch below). For each unlabeled image, a weak augmentation and a strong augmentation are applied to obtain two images, and both are passed through the model to get predictions. The method then applies consistency regularization as the cross-entropy between the one-hot pseudo-label of the weakly augmented image and the prediction for the strongly augmented image.
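FixMatch's unlabeled objective is compact enough to sketch directly (PyTorch, with our own naming; weak_aug and strong_aug stand for the two augmentation policies, and tau is the confidence threshold):

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_unl, weak_aug, strong_aug, tau=0.95):
    """Pseudo-label from the weakly augmented view; cross-entropy against
    the strongly augmented view, masked by a confidence threshold."""
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(x_unl)), dim=1)
        conf, pseudo = probs.max(dim=1)
    logits_strong = model(strong_aug(x_unl))
    per_sample = F.cross_entropy(logits_strong, pseudo, reduction="none")
    mask = (conf >= tau).float()           # keep only confident pseudo-labels
    return (per_sample * mask).mean()
```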
20. Challenges
SSL methods, like any other machine learning methods, have their own set of challenges.
One of the major challenges is that it is not well understood how SSL works internally and what role various techniques, such as data augmentation, training methods, and loss functions, play exactly.
The SSL approaches above generally operate at their best only in ideal settings, when the training dataset meets the design assumptions. In reality, the distribution of the dataset is unknown, does not necessarily meet these ideal conditions, and may produce unexpected results.
If the training data are highly imbalanced, the models tend to favor the majority class and, in some cases, ignore the minority class entirely.
The use of unlabeled data may result in worse generalization performance than a model learned only from labeled data.
21. Conclusion
Deep semi-supervised learning has already demonstrated remarkable results in various tasks and has garnered a lot of interest from the research community due to its important practical applications.