The document provides an overview of dimensionality reduction techniques, including PCA, SVD, and LDA. PCA uses linear projections to reduce dimensions while preserving variance in the data. It computes eigenvectors of the covariance matrix. SVD is similar to PCA but works directly with the data matrix rather than the covariance matrix. LDA aims to maximize class separability during dimensionality reduction for classification tasks. It computes within-class and between-class scatter matrices. While PCA maximizes variance, LDA maximizes class discrimination.
2. “Goal - Become a Data Scientist”
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
3. Overview of Dimensionality Reduction Techniques
● Real Data
● PCA
● Eigenvectors
● Covariance Matrix
● Matrix Decomposition
● Covariance Matrix Decomposition
● SVD
● PCA vs SVD
● LDA
● PCA vs LDA
4. Real Data
● Real world data and information therein may be
○ Noisy
■ Some dimensions may not carry any useful information
■ Variation in that dimension is purely due to noise in the observations
○ Redundant
■ One variable may carry the same information as another variable
■ Information covered by a set of variables may overlap
● How to reduce the dimensions?
5. PCA
● Dimensionality reduction technique
● Linear projection of the data onto an orthogonal basis
● Minimizes redundancy while preserving variance in the data
● Smallest reconstruction error among linear projections of the same dimension
● Applications include Image-compression, Data Visualisation
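The "smallest reconstruction error" property can be checked numerically. The sketch below is illustrative only: the data is synthetic, and the names `reconstruction_error`, `eigvals`, and `eigvecs` are assumptions introduced here, not part of the slides. It projects centered data onto the top k eigenvectors of the covariance matrix, maps it back, and compares the error with the sum of the discarded eigenvalues.

```python
import numpy as np

# Synthetic data (assumption): 300 observations, 5 variables with different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5)) * np.array([4.0, 3.0, 1.0, 0.5, 0.1])

mean = X.mean(axis=0)
Xc = X - mean  # center the data

# Eigendecomposition of the covariance matrix, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]


def reconstruction_error(k):
    """Squared reconstruction error (scaled by n - 1) when keeping k components."""
    W = eigvecs[:, :k]                # d x k projection matrix
    X_hat = (Xc @ W) @ W.T + mean     # project down, then map back up
    return np.sum((X - X_hat) ** 2) / (len(X) - 1)


errors = [reconstruction_error(k) for k in range(6)]
for k, err in enumerate(errors):
    # The error equals the total variance in the discarded directions.
    print(k, round(err, 4), round(eigvals[k:].sum(), 4))
```

Each extra component removes exactly one eigenvalue's worth of error, which is why keeping the directions with the largest eigenvalues gives the best reconstruction for a given k.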
6. Eigenvectors
● If A is a square matrix, a nonzero vector v is an eigenvector of A if there is a scalar 𝝀 such that
Av = 𝝀v
● Example:
● Simply put, transforming an eigenvector by A only scales it by 𝝀; it stays along the same line
● Matrix multiplication of A with v is a transformation
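As a small numerical illustration of Av = 𝝀v, the sketch below uses a hypothetical 2×2 symmetric matrix (chosen here for convenience, not taken from the slides) and checks the defining equation for each eigenpair returned by NumPy.

```python
import numpy as np

# Hypothetical matrix chosen for illustration (not from the slides).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is meant for symmetric matrices: it returns eigenvalues in ascending
# order and the matching eigenvectors as the columns of the second result.
eigenvalues, eigenvectors = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # Transforming v by A gives the same vector scaled by lam.
    assert np.allclose(A @ v, lam * v)
    print(f"lambda = {lam:.1f}, A @ v = {np.round(A @ v, 3)}, lambda * v = {np.round(lam * v, 3)}")
```

For this matrix the eigenvalues are 1 and 3; the sign of each eigenvector is arbitrary, so only the line it spans is meaningful.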
7. Covariance Matrix
● Covariance Matrix of X
● Diagonal Terms: Variance
● Off-Diagonal Terms: covariance
● Covariance matrix is always symmetric
● n is the number of observations in X
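The properties listed above can be checked on a small example. This is a sketch under assumptions: the data is synthetic, rows of `X` are observations, and the normalization is by n − 1 (the unbiased estimate that `np.cov` uses by default); the slides may have intended division by n instead.

```python
import numpy as np

# Synthetic data (assumption): n = 100 observations of 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Make the third variable largely redundant with the first.
X[:, 2] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

n = X.shape[0]
X_centered = X - X.mean(axis=0)           # subtract each variable's mean
C = X_centered.T @ X_centered / (n - 1)   # 3 x 3 covariance matrix

print(np.round(C, 3))
print("diagonal (variances):", np.round(np.diag(C), 3))
print("symmetric:", np.allclose(C, C.T))
```

The large off-diagonal entry between the first and third variables reflects the redundancy that dimensionality reduction tries to remove.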
10. PCA
● Compute the eigendecomposition of the covariance matrix
● The first principal component is always the eigenvector with the highest eigenvalue
● The second is the eigenvector with the next highest eigenvalue
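The steps above can be sketched in a few lines of NumPy. The function name `pca_eig` and the synthetic data are assumptions made for illustration; this is one minimal way to follow the recipe, not the presenter's implementation.

```python
import numpy as np


def pca_eig(X, k):
    """PCA via eigendecomposition of the covariance matrix (illustrative sketch)."""
    X_centered = X - X.mean(axis=0)
    C = np.cov(X_centered, rowvar=False)           # covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)  # ascending order for symmetric C
    order = np.argsort(eigenvalues)[::-1]          # re-sort: largest eigenvalue first
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order[:k]]        # top-k eigenvectors as columns
    return X_centered @ components, components, eigenvalues


# Synthetic data (assumption): 4 variables with clearly different variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.1])

Z, W, eigvals = pca_eig(X, k=2)
print("projected shape:", Z.shape)
print("explained variance ratio:", np.round(eigvals[:2] / eigvals.sum(), 3))
```

The variance of each projected column equals the corresponding eigenvalue, which is what makes the eigenvalue ordering a ranking of components by preserved variance.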
11. Linear Discriminant Analysis - LDA
● Dimensionality reduction technique used as a pre-processing step for pattern-classification and machine learning applications.
● Projects data into a lower dimension with good class separability, to avoid over-fitting and reduce computation
● In addition to finding the component axes that maximize the variance of our data, we are interested in the axes that maximize the separation between multiple classes
● Projects a feature space onto a smaller subspace of dimension k (where k≤n−1) while maintaining the class-discriminatory information.
12. PCA vs LDA
● It is not true that LDA is always superior to PCA for classification
● PCA can outperform LDA in classification on smaller datasets
● PCA & SVD can be combined
14. Normality Assumptions of LDA
● It should be mentioned that LDA assumes normally distributed data, features that are statistically independent, and identical covariance matrices for every class.
● However, this applies only to LDA as a classifier; LDA for dimensionality reduction can also work reasonably well if those assumptions are violated.
20. PCA vs SVD
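● The eigenvectors of the covariance matrix C are the same as the right singular vectors of the centered data matrix X
● That is, with X = UΣVᵀ, the eigenvectors of the covariance matrix are the columns of V
● Working directly with X is numerically more accurate, since forming the covariance matrix squares the condition number
● Working directly with X is often faster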
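The relationship between the two decompositions can be verified directly. The sketch below assumes synthetic data, rows as observations, and a covariance matrix normalized by n − 1; under those assumptions the eigenvalues of C equal the squared singular values of the centered X divided by n − 1, and the eigenvectors match the right singular vectors up to sign.

```python
import numpy as np

# Synthetic data (assumption): 50 observations, 3 variables with distinct variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.3])
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix.
C = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest eigenvalue first

# Route 2: SVD of the centered data matrix itself.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of C are the squared singular values scaled by 1 / (n - 1).
print(np.allclose(eigvals, s**2 / (n - 1)))

# Eigenvectors equal the right singular vectors up to a sign flip, so the
# matrix of absolute dot products is the identity.
print(np.allclose(np.abs(eigvecs.T @ Vt.T), np.eye(3), atol=1e-8))
```

Because the SVD route never forms XᵀX, it avoids the loss of precision that comes from squaring small singular values.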
21. LDA
● LDA - Linear Discriminant Analysis
○ Compute the d-dimensional mean vectors for the different classes from the dataset
○ Compute the scatter matrices (between-class and within-class scatter matrices)
○ Compute the eigenvectors and corresponding eigenvalues of S_W⁻¹S_B, where S_W is the within-class and S_B the between-class scatter matrix
○ Sort the eigenvectors by decreasing eigenvalues and choose the k eigenvectors with the largest eigenvalues
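The four steps can be sketched as follows. The function name `lda_fit`, the synthetic three-class data, and the choice to take the eigenvectors of S_W⁻¹S_B (a common formulation, assumed here rather than confirmed by the slides) are all illustrative assumptions.

```python
import numpy as np


def lda_fit(X, y, k):
    """Return the top-k LDA projection directions and their eigenvalues (sketch)."""
    classes = np.unique(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)              # step 1: class mean vector
        centered = X_c - mean_c
        S_W += centered.T @ centered           # step 2: scatter matrices
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    # Step 3: eigenpairs of S_W^{-1} S_B (solve avoids an explicit inverse).
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    eigvals, eigvecs = eigvals.real, eigvecs.real  # drop round-off imaginary parts
    # Step 4: sort by decreasing eigenvalue and keep k directions.
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order], eigvals[order]


# Synthetic data (assumption): 3 classes, 4 features, 30 samples per class.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0, 0.0, 0.0],
                  [4.0, 0.0, 0.0, 0.0],
                  [0.0, 4.0, 0.0, 0.0]])
X = np.vstack([rng.normal(loc=m, size=(30, 4)) for m in means])
y = np.repeat([0, 1, 2], 30)

W, eigvals = lda_fit(X, y, k=2)
Z = X @ W  # project onto the discriminant directions
print("projected shape:", Z.shape)
print("eigenvalues:", np.round(eigvals, 3))
```

With c classes the between-class scatter matrix has rank at most c − 1, so at most c − 1 eigenvalues are nonzero; here that means two useful directions for three classes.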