ICML 2018 Part 2 (Segwang Kim)
1. ICML 2018 (July 11 – July 16)
PART 2
BEST PAPER (adversarial attack)
Graph Embedding / Manifold Alignment
2016-20873 Segwang Kim
2. ① Best Paper
Obfuscated Gradients Give a False Sense of Security:
Circumventing Defenses to Adversarial Examples
② Graph Embedding
③ Manifolds
Overview
3. Obfuscated Gradients Give a False Sense of Security:
Circumventing Defenses to Adversarial Examples
Ref: Synthesizing Robust Adversarial Examples
4. Faulty defenses against adversarial attacks
Takeaways
ICLR 2018 adversarial defense papers
Obfuscated gradients and how to circumvent them:
• Shattered gradients → Backward Pass Differentiable Approximation
• Stochastic gradients → Expectation Over Transformation
• Vanishing/exploding gradients → reparameterization & optimization over a good space
5. Shattered Gradients
The defense induces nonexistent or incorrect gradients:
• Intentionally, non-differentiable operators are used
• Unintentionally, numeric instability is caused
Circumvention: Backward Pass Differentiable Approximation
[Figure: the input passes through a filter g before the NN f; the composition f∘g (the faultily defended NN) yields shattered gradients instead of the true gradients of f]
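The BPDA idea can be sketched in a few lines: the forward pass goes through the non-differentiable filter g, while the backward pass pretends g is the identity. Everything here (the quantizing filter, the linear scorer f) is a toy stand-in for illustration, not the paper's setup:

```python
import numpy as np

def g(x):
    # Toy non-differentiable "defense" filter: hard quantization
    return np.round(x * 8.0) / 8.0

def f(x, w):
    # Toy differentiable classifier score: a linear model
    return float(x @ w)

def grad_f(x, w):
    # Analytic gradient of f with respect to its input (constant for a linear f)
    return w

def bpda_grad(x, w):
    # BPDA: evaluate f's gradient at g(x) and treat g's backward pass as the
    # identity, sidestepping g's nonexistent gradient
    return grad_f(g(x), w)

x = np.array([0.2, 0.7, 0.4])
w = np.array([1.0, -2.0, 0.5])
attack_direction = bpda_grad(x, w)  # usable despite g being non-differentiable
```

The attacker then follows `attack_direction` exactly as in an ordinary gradient-based attack; the shattered gradients never enter the computation.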
6. Stochastic Gradients
Test-time randomness: randomized filters or classifiers
Circumvention: Expectation over Transformation
[Figure: a random transformation t precedes the NN f; the composition f∘t (the faultily defended NN) yields randomized gradients]
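Expectation over Transformation can be sketched as averaging gradients over many draws of the random transform; the jittering transform and the quadratic scorer below are illustrative assumptions, not the defenses from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, w):
    # Toy differentiable score
    return float(x @ w) ** 2

def grad_f(x, w):
    # Gradient of (x.w)^2 with respect to x
    return 2.0 * float(x @ w) * w

def random_transform(x):
    # Toy randomized "defense": jitter the input and zero one random coordinate
    y = x + rng.normal(scale=0.1, size=x.shape)
    y[rng.integers(len(y))] = 0.0
    return y

def eot_grad(x, w, n_samples=256):
    # Estimate grad_x E_t[f(t(x), w)] by Monte Carlo averaging over transforms
    grads = [grad_f(random_transform(x), w) for _ in range(n_samples)]
    return np.mean(grads, axis=0)

x = np.array([0.5, -0.3, 0.8])
w = np.array([1.0, 2.0, -1.0])
g_hat = eot_grad(x, w)
```

Single draws of the gradient are noisy, but their average is a stable attack direction on the expected loss, which is what defeats randomized defenses.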
7. Vanishing/Exploding Gradients
The input goes through a transformation loop prior to being fed to the network; differentiating through the loop optimization makes gradients vanish or explode
Circumvention: Reparameterization & optimization over a good space
[Figure: an optimization-loop transformation t precedes the NN f; the composition f∘t (the faultily defended NN) yields exploding/vanishing gradients]
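Reparameterization can be sketched by attacking through a smooth map x = h(z) instead of differentiating through the defense's inner loop; tanh and the linear loss here are placeholder assumptions for such a map and attack objective:

```python
import numpy as np

def loss(x, w):
    # Toy attack loss to minimize (e.g. the correct-class score)
    return float(x @ w)

def grad_loss(x, w):
    return w

def decode(z):
    # Assumed smooth parameterization x = h(z); its Jacobian is well-behaved,
    # unlike gradients taken through the defense's optimization loop
    return np.tanh(z)

def reparam_attack(z0, w, lr=0.1, steps=50):
    # Descend on z via the chain rule dL/dz = dL/dx * dh/dz
    z = z0.copy()
    for _ in range(steps):
        x = decode(z)
        z -= lr * grad_loss(x, w) * (1.0 - x ** 2)  # (1 - tanh^2) is tanh's derivative
    return decode(z)

w = np.array([1.0, 1.0, 1.0])
x_adv = reparam_attack(np.zeros(3), w)
```

Because every iterate stays in h's range, the attack never touches the loop whose gradients vanish or explode.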
9. Overview: Graph Embedding
• Graph Embedding in Euclidean Space
• Out-of-sample extension of graph adjacency spectral embedding
• Tree Edit Distance Learning via Adaptive Symbol Embeddings
• Spectrally approximating large graphs with smaller graphs
• Improved large-scale graph learning through ridge spectral sparsification
• Low-Rank Riemannian Optimization on Positive Semidefinite Stochastic
Matrices with Applications to Graph Clustering
• Representation Learning on Graphs with Jumping Knowledge Networks
• Graph Embedding in Hyperbolic space
• Representation Tradeoffs for Hyperbolic Embeddings
• Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
• Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic
Geometry
10. Out-of-sample extension of a graph embedding
Recomputing the embedding for the augmented graph is prohibited.
We are allowed to access:
1. the true connectivity between the given vertices and the new vertex
2. similarities between the learned vectors and the new embedding vector
Out-of-sample extension of graph adjacency spectral embedding
[Figure: a new node is added to a graph whose embedding is already fixed; its embedding vector must be inferred]
11. Out-of-sample extension of graph adjacency spectral embedding
Two basic ways for OOS extension:
• Linear Least Squares OOS extension
• Maximum Likelihood OOS extension
Then, how close is the OOS extension to the in-sample embedding?
For random dot product graphs, they are good!
X₁, …, Xₙ ∼ F, where F is a distribution over ℝᵈ
Adjacency spectral embedding: for the adjacency matrix A, select the d eigenvectors with the d largest eigenvalues.
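The linear least-squares OOS extension amounts to a regression of the new vertex's connectivity vector against the fixed in-sample embedding. The generated random dot product graph below is a self-contained toy, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy random dot product graph: P = X X^T, A ~ Bernoulli(P)
n, d = 200, 2
X = rng.uniform(0.2, 0.6, size=(n, d))        # latent positions
A = (rng.random((n, n)) < X @ X.T).astype(float)
A = np.triu(A, 1)
A = A + A.T                                   # symmetric, no self-loops

# Adjacency spectral embedding: d eigenvectors with the d largest eigenvalues
vals, vecs = np.linalg.eigh(A)
idx = np.argsort(vals)[-d:]
Xhat = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# A new vertex arrives; we only see its connectivity vector a
x_new = np.array([0.4, 0.4])
a = (rng.random(n) < X @ x_new).astype(float)

# Linear least-squares OOS extension: regress a on the fixed embedding
w_oos = np.linalg.pinv(Xhat) @ a
```

`w_oos` is the new node's embedding vector, obtained without ever re-running the eigendecomposition on the augmented graph.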
13. Hyperbolic Embedding
Recall: hyperbolic space is GOOD for hierarchical embedding.
By its geometric nature, the hyperbolic model is suitable for tree-like structures:
• Parsimonious (needs relatively low dimension)
• Space becomes compressed going far from the origin
[Figure: embedding a tree; a tree distance of 4 maps to an embedded distance of ≈ 4]
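The "space compresses toward the boundary" point can be seen directly from the Poincaré-ball distance formula; this is a minimal numeric check, not code from any of the papers:

```python
import numpy as np

def poincare_dist(u, v):
    # Poincare-ball distance: arcosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2)))
    duv = np.dot(u - v, u - v)
    denom = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))
    return float(np.arccosh(1.0 + 2.0 * duv / denom))

# Same direction of separation, near the origin vs. near the boundary
d_near = poincare_dist(np.array([0.1, 0.0]), np.array([0.0, 0.1]))
d_far = poincare_dist(np.array([0.9, 0.0]), np.array([0.0, 0.9]))
```

The boundary pair is 9× farther apart in Euclidean terms but much more than 9× farther in hyperbolic distance; this is what lets a low-dimensional hyperbolic ball hold the exponentially growing leaves of a tree.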
14. Representation Tradeoffs for Hyperbolic Embeddings
Analysis 1: therefore, it is powerful for wide trees.
Analysis 2: but not for too-deep trees.
15. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
A cone can capture hierarchy.
But in n dimensions, no n+1 distinct cone regions can simultaneously have unbounded disjoint sub-volumes.
Entailment cones in the Poincaré ball can, satisfying:
• Axial symmetry
• Rotation invariance
• Continuity
• Transitivity of nested angular cones
16. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
Therefore, the objective function is:
[Figure: the previous objective vs. the new cone-based one]
17. Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry
Lorentz Model: mathematically equivalent, but needs fewer inverse operations during optimization
Optimize in the Lorentz model, visualize in the Poincaré ball model
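That workflow can be sketched as follows: distances are computed on the hyperboloid (a single arccosh of a Minkowski inner product, with no divisions by 1−|x|²), and points are mapped to the Poincaré ball only for visualization. The helper names are mine, not the paper's:

```python
import numpy as np

def lift(u):
    # Lift a Euclidean vector onto the hyperboloid <x,x>_L = -1, x0 > 0
    return np.concatenate(([np.sqrt(1.0 + np.dot(u, u))], u))

def lorentz_inner(x, y):
    # Minkowski inner product <x,y>_L = -x0*y0 + x1*y1 + ...
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lorentz_dist(x, y):
    # Geodesic distance on the hyperboloid
    return float(np.arccosh(-lorentz_inner(x, y)))

def to_poincare(x):
    # Diffeomorphism from the hyperboloid to the Poincare ball (for plotting)
    return x[1:] / (x[0] + 1.0)

x = lift(np.array([0.3, 0.4]))
p = to_poincare(x)
```

As a consistency check, the Lorentz distance from the origin agrees with the Poincaré-ball formula 2·artanh(|p|) for the mapped point.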
19. Overview: Manifolds
• GAN
• Geometry Score: A Method For Comparing GANs
• MAGAN: Aligning Biological Manifolds
• ETC
• Stochastic Wasserstein Barycenters
• On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups
20. Geometry Score: A Method For Comparing GANs
A robust comparison criterion between the data and fake distributions:
• Not necessarily for images
• Not for training
Compares the topology of the underlying manifolds of the data and fake distributions:
• number of loops and higher-dimensional holes
Not geometry but topology: easier to approximate
≅ (homeomorphic)
21. Geometry Score: A Method For Comparing GANs
Using simplices built on randomly sampled vertices (landmarks) with varying radius, approximate the topology:
• 0-dimensional hole (connected component)
• 1-dimensional hole (loop)
22. Geometry Score: A Method For Comparing GANs
Observe the birth and death of k-dimensional holes as time (radius) goes by.
At relative time (radius) α, track the k-th Betti number; record the relative time during which the k-th Betti number persisted (= i), computed over the landmarks.
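A minimal stand-in for this persistence machinery: β₀ (the number of connected components) of the Vietoris–Rips complex at a given radius, computed with union-find. The actual Geometry Score uses witness complexes and full persistence; this only illustrates how holes die as the radius grows:

```python
import numpy as np

def betti0(points, radius):
    # Number of connected components when all points within `radius` are joined
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= radius:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0]])
```

Sweeping the radius from 0 upward, the three components of `pts` merge one by one; the times at which β₀ changes are exactly the birth/death events the slide describes.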
23. Geometry Score: A Method For Comparing GANs
Since these relative times sum to one, we can consider them as a distribution.
24. MAGAN: Aligning Biological Manifolds
Same system, different measurements: single-cell RNA sequencing and mass cytometry of the same tissue.
A dual GAN needs supervised paired examples, which is impractical.
What about unsupervised, unpaired data?
Aligning rather than superimposing.
28. On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups
Generalizes (1) rotation-equivariant networks, (2) spherical CNNs, (3) message passing NNs.
A feed-forward neural network 𝒩 is equivariant to the action of a compact group G iff each layer of 𝒩 implements a generalized form of convolution derived from (f ∗ g)(u) = ∫_G f(uv⁻¹) g(v) dμ(v), where f, g : G → ℂ.
Compact group: a topological group with compact topology, e.g. SO(3), SU(3), … (NOT ℝⁿ)
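The convolution-equivariance link can be checked concretely on the simplest compact group, the cyclic group Z_n; the continuous groups on the slide (SO(3), …) need the integral version with irreducible representations, but the algebra is the same:

```python
import numpy as np

def cyclic_conv(f, g):
    # Group convolution on Z_n: (f * g)(u) = sum_v f(v) g(v^{-1} u),
    # where v^{-1} u is (u - v) mod n
    n = len(f)
    return np.array([sum(f[v] * g[(u - v) % n] for v in range(n))
                     for u in range(n)])

def act(f, s):
    # Action of the group element s on a signal over Z_n (cyclic shift)
    return np.roll(f, s)

rng = np.random.default_rng(0)
f_sig = rng.random(8)
g_filt = rng.random(8)
```

Equivariance means convolving a shifted signal gives the shifted convolution, which holds exactly for this construction.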
29. On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups
[Figure: the network as a chain of feature spaces V₀, …, V_L, each acted on by a group representation χ₀, …, χ_L]
Editor's Notes
From now on, I will present part 2 of ICML 2018.
First, I will give a brief summary of one of the best papers: Obfuscated … . Basically, this paper can be broken into two parts: going over existing defenses against adversarial attacks, and suggesting guidelines for future defense assessments.
Second and Third parts are my interests. Graph embedding and Machine learning with manifolds.
Before starting, I want to show a realistic, tangible adversarial 3D object from "Synthesizing Robust Adversarial Examples".
This 3d printed turtle is an adversarial example robust to any conceivable vision distortion like rotation, scaling, tilting.
Authors brought this object to poster session. I bet that once benign and adversarial turtles are shuffled, nobody can tell which one is which.
The reason I show this figure is that adversarial examples are not theoretical, pathological examples designed only to make abstract points.
They exist.
I believe you are already familiar with what adversarial examples are. Adding imperceptible noises to benign images can deceive classifiers.
So, there have been numerous works related to adv attack such as ways of generating, characterizing and defending adv examples.
For instance, nine papers appeared in ICLR 2018 concerning defense against adversarial attacks.
However, this best paper pointed out that most of those defenses degrade gradients of adv examples.
Rather than using correct gradients, these papers “obfuscated” gradients in some senses.
The best paper divides obfuscating techniques into three categories: shattered gradients, stochastic gradients and vanishing or exploding gradients.
And for each of these obfuscating techniques there is a way out, meaning that we can neutralize the given defense with a small modification of the original NN.
The best paper successfully neutralizes 6 of the 9 defenses completely, and one partially.
Let’s look at obfuscated gradients in more details.
Shattered Gradients means,
defense mechanism intentionally uses non-diff operators
or unintentionally cause numeric instability.
Either way, nonexistent or incorrect gradients are obtained, so the proposed defended NN cannot yield correct gradients for adversarial examples.
The defense mechanism typically achieves its goal by passing the image through a filter believed to reduce adversarial noise, and then feeding the filtered image to the NN.
The problem with the defense is that this filter is usually non-differentiable. Therefore, through "Backward Pass Differentiable Approximation", which basically approximates the filter's gradients with a differentiable surrogate, nearly correct gradients can come into play.
There are popular optimization algorithms that have been used in NNs. It has been proven that with a suitable step-size decay, exactly one over root t, SGD converges. Since SGD has shown slow performance, alternatives such as AdaGrad, Adam, RMSProp, and AdaDelta were proposed. Although third type optimization, ….
In particular, this paper is interested in EMA-type optimization, including Adam, RMSProp, and AdaDelta.
With knowledge about the measurement process: what if we only see noisy, incomplete measurements of the original images? If you know the measurement process, you can try this with any well-known simulation by inserting the simulated measurement in the middle of training, such as throwing out 90% of the pixels.