In this paper, we solve a semi-supervised regression problem. Due to the lack of knowledge about the data structure and the presence of random noise, the considered data model is uncertain. We propose a method which combines graph Laplacian regularization and cluster ensemble methodologies. The co-association matrix of the ensemble is calculated on both labeled and unlabeled data; this matrix is used as a similarity matrix in the regularization framework to derive the predicted outputs. We use a low-rank decomposition of the co-association matrix to significantly speed up calculations and reduce memory usage. Two clustering problem examples are presented.
Full version: https://arxiv.org/abs/1901.03919
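The abstract describes the method only at a high level; the following is a minimal illustrative sketch of the core idea, assuming a k-means cluster ensemble and dense matrices (the paper's low-rank speedup is not reproduced here). All function names and parameter choices are hypothetical, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

def coassociation_matrix(X, n_runs=20, k_range=(2, 8), seed=0):
    """Fraction of ensemble runs in which points i and j share a cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    C = np.zeros((n, n))
    for _ in range(n_runs):
        k = int(rng.integers(k_range[0], k_range[1] + 1))
        labels = KMeans(n_clusters=k, n_init=5,
                        random_state=int(rng.integers(1 << 31))).fit_predict(X)
        C += labels[:, None] == labels[None, :]   # 1 if same cluster, else 0
    return C / n_runs

def laplacian_regression(C, y_labeled, labeled_idx, gam=1.0):
    """Minimize sum over labeled i of (f_i - y_i)^2 + gam * f^T L f, L = D - C.
    labeled_idx is an integer index array of the labeled rows."""
    n = C.shape[0]
    L = np.diag(C.sum(axis=1)) - C          # graph Laplacian of the similarity
    M = gam * L
    M[labeled_idx, labeled_idx] += 1.0      # fidelity term on labeled points
    rhs = np.zeros(n)
    rhs[labeled_idx] = y_labeled
    return np.linalg.solve(M, rhs)          # predicted outputs on all points
```

The labeled outputs enter only through the fidelity term, while the co-association matrix plays the role of the similarity matrix in the graph Laplacian penalty, as in the abstract.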
Since the advent of the horseshoe priors for regularization, global-local shrinkage methods have proved to be a fertile ground for the development of Bayesian theory and methodology in machine learning. They have achieved remarkable success in computation, and enjoy strong theoretical support. Much of the existing literature has focused on the linear Gaussian case. The purpose of the current talk is to demonstrate that the horseshoe priors are useful more broadly, by reviewing both methodological and computational developments in complex models that are more relevant to machine learning applications. Specifically, we focus on methodological challenges in horseshoe regularization in nonlinear and non-Gaussian models; multivariate models; and deep neural networks. We also outline the recent computational developments in horseshoe shrinkage for complex models along with a list of available software implementations that allows one to venture out beyond the comfort zone of the canonical linear regression problems.
Medical pathology images are visually evaluated by experts for disease diagnosis, but the connection between image features and the state of the cells in an image is typically unknown. To understand this relationship, we describe a multimodal modeling and inference framework that estimates the shared latent structure of joint gene expression levels and medical image features. The method is built around probabilistic canonical correlation analysis (PCCA), which is jointly fit to image embeddings that are learned using convolutional neural networks and linear embeddings of paired gene expression data. We finally discuss a set of theoretical and empirical challenges in domain adaptation settings arising from genomics data. (Based on work in collaboration with Gregory Gundersen and Barbara E. Engelhardt.)
A Hough Transform Based on a Map-Reduce Algorithm
This paper proposes composing the Map-Reduce algorithm with the Hough Transform to search for particular shape features in Big Data collections of images. We introduce the first formal translation of the Hough Transform into the Map-Reduce pattern; the Hough Transform is applied to one image or to several images in parallel. The method targets Big Data settings, which require Map-Reduce to improve processing time, together with the need to detect objects in noisy pictures with the Hough Transform.
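The paper's formal translation is not reproduced in this abstract; below is a minimal sketch of the general pattern, assuming a line-detecting Hough transform whose map step votes into an accumulator and whose reduce step merges accumulators. All names and parameters are hypothetical. Note that merging per-image accumulators is only one possible reduce semantics; keeping them separate gives per-image detection instead.

```python
import numpy as np
from functools import reduce

THETAS = np.deg2rad(np.arange(0, 180))          # angle grid, 1-degree steps
N_RHO = 200                                     # number of rho bins

def hough_map(edge_img):
    """Map step: one binary edge image -> one (N_RHO x n_theta) vote accumulator."""
    h, w = edge_img.shape
    rho_max = np.hypot(h, w)
    acc = np.zeros((N_RHO, len(THETAS)), dtype=np.int64)
    ys, xs = np.nonzero(edge_img)               # each edge pixel votes
    for x, y in zip(xs, ys):
        rhos = x * np.cos(THETAS) + y * np.sin(THETAS)
        bins = ((rhos + rho_max) / (2 * rho_max) * (N_RHO - 1)).astype(int)
        acc[bins, np.arange(len(THETAS))] += 1
    return acc

def hough_reduce(acc_a, acc_b):
    """Reduce step: merge two accumulators by summing their votes."""
    return acc_a + acc_b

# images: an iterable of equal-size binary edge maps
# total_votes = reduce(hough_reduce, map(hough_map, images))
```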
The (fast) component-by-component construction of lattice point sets and polynomial lattice point sets is a powerful method for obtaining quadrature rules that approximate integrals over the $d$-dimensional unit cube. In this talk, we present modifications of the component-by-component algorithm and of the more recent successive coordinate search algorithm which yield savings in the construction cost for lattice rules and polynomial lattice rules in weighted function spaces. The idea is to reduce the size of the search space for coordinates that are associated with small weights and are therefore of less importance to the overall error than coordinates associated with large weights. We analyze tractability conditions of the resulting quasi-Monte Carlo rules and show some numerical results.
We provide a review of the recent literature on statistical risk bounds for deep neural networks. We also discuss some theoretical results that compare the performance of deep ReLU networks to other methods such as wavelets and spline-type methods. The talk will moreover highlight some open problems and sketch possible new directions.
Università di Verona, Department of Legal Sciences (Dipartimento di Scienze Giuridiche)
Seminar of 25 November 2015 on "Nature and/or Naturalness of Law: Philosophical-Legal Reflections" ("Natura e/o naturalità del diritto. Riflessioni filosofico-giuridiche")
(1) Introduction
(2) The figure and thought of Giovanni Ambrosetti.
(3) The critique of legal naturalism (pars destruens).
(4) The proposal of natural law: a brief critical exposition (pars construens).
(5) Conclusions
GROUP 5
Motivation: organizational applications
Organizational reward systems
Extrinsic and intrinsic rewards
Reward systems in high-performance organizations
Kohn's criticisms of performance-based companies
Georg Rehm. "Globale Standards im Web of Things" ("Global Standards in the Web of Things"). Bitkom Akademie workshop "Die Dinge im Internet-der-Dinge kommen", Cologne, Germany, December 9, 2015.
Default gateway (puerta de enlace predeterminada)
A default gateway is the device or computer that serves as a link between two computer networks; that is, it is the device that connects and directs data traffic between two or more networks.
Presentation of my NSERC-USRA funded summer research project given at the Canadian Undergraduate Mathematics Conference (CUMC) 2014.
Please refer to the project site: http://jessebett.com/Radial-Basis-Function-USRA/
Gaussian processes (GPs) are a ubiquitous ingredient in ML problems such as robot gait optimization, gesture recognition, optimal control, hyperparameter optimization, and optimal data-sampling strategies for drug and new-material development, yet they are not easy to understand. This talk introduces the basic theory of GPs together with Matlab code.
This is the deck for a Hulu internal machine learning workshop; it introduces the background, theory, and applications of the expectation propagation method.
Efficient end-to-end learning for quantizable representations
Speaker: Yeonwoo Jeong (PhD student, Seoul National University)
Date: July 2018
For similar-image search, a neural network is used to learn image embeddings. Prior work speeds up retrieval using the Hamming distance between binary codes, but the entire dataset still has to be scanned and accuracy suffers. This paper instead learns sparse binary codes, producing hash tables that speed up retrieval without losing accuracy. It also shows that the optimal sparse binary codes within a mini-batch can be found by solving a minimum-cost flow problem. The method achieves state-of-the-art retrieval accuracy (precision@k and NMI) on Cifar-100 and ImageNet, with search speedups of 98× and 478×, respectively.
Surrogate models emulate expensive computer simulations. The objective is to approximate a function, $f$, of $d$ variables to a given tolerance, $\varepsilon$, using as few function values as possible, preferably $O(d)$. We explain how tractability theory provides lower bounds on the number of function values required by any possible method. We also propose a method for sampling $f$ and approximating $f$ that achieves this objective, and we describe the kind of underlying structure that $f$ must have for success.
Mathematics (from Greek μάθημα máthēma, “knowledge, study, learning”) is the study of topics such as quantity (numbers), structure, space, and change. There is a range of views among mathematicians and philosophers as to the exact scope and definition of mathematics.
Welcome to WIPAC Monthly, the magazine brought to you by the LinkedIn group Water Industry Process Automation & Control.
In this month's edition, along with the industry news, and to celebrate the 13 years since the group was created, we have articles including:
- A case study of the use of Advanced Process Control at the wastewater treatment works at Lleida in Spain.
- A look back at an article on smart wastewater networks, to see how the industry has measured up in the interim on the adoption of Digital Transformation in the water industry.
6th International Conference on Machine Learning & Applications (CMLA 2024)
CMLA 2024 will provide an excellent international forum for sharing knowledge and results in the theory, methodology, and applications of Machine Learning.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Cosmetic shop management system project report
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's tough to interpret those ingredient lists unless you have a background in chemistry. Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. The system includes various function programs to carry out the tasks mentioned above, and data file handling has been used effectively throughout the program.
The automated cosmetic shop management system deals with the automation of the general workflow and administration processes of the shop. The main processes of the system focus on customer requests: the system is able to search for the most appropriate products and deliver them to the customers. It helps employees quickly identify the cosmetic products that have reached their minimum quantity, keeps track of the expiry date of each cosmetic product, and helps employees find the rack number in which a product is placed. It is also a faster and more efficient way of working.
The Internet of Things (IoT) is a revolutionary concept that connects everyday objects and devices to the internet, enabling them to communicate, collect, and exchange data. Imagine a world where your refrigerator notifies you when you’re running low on groceries, or streetlights adjust their brightness based on traffic patterns – that’s the power of IoT. In essence, IoT transforms ordinary objects into smart, interconnected devices, creating a network of endless possibilities.
Here is a blog on the role of electrical and electronics engineers in IoT. Let's dig in!
For more such content visit: https://nttftrg.com/
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Hierarchical Digital Twin of a Naval Power System
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
1. Iterated geometric harmonics for missing data recovery
Jonathan A. Lindgren, Erin P. J. Pearse, and Zach Zhang
{jlindgre, epearse, zazhang}@calpoly.edu
California Polytechnic State University, San Luis Obispo, CA
Nov. 14, 2015
2–3. Motivation: the missing data problem
Introduction and background
The missing data problem

Missing data is often a problem. Data can be lost
- while recording measurements,
- during storage or transmission,
- due to equipment failure,
- ...

Existing techniques:
- require some records (rows) to be complete, or
- require some characteristics (columns) to be complete, or
- are based on linear regression.
(But data often has highly nonlinear internal structure!)
4–5. Motivation: the missing data problem
Introduction and background
A dataset is a collection of vectors, stored as a matrix

The data is an n × p matrix. Each row is a vector of length p; one row is a record and each column is a parameter or coordinate.
[Schematic: an n × p matrix with n records (rows) and p characteristics (columns); one record per row.]

EXAMPLES
- 36 photos, each of size 112 pixels × 92 pixels:
  {v_k}_{k=1}^{36} ⊆ R^{10,304}. (Each photo is stored as a vector.)
- Results from a psychology experiment, a 50-question exam given to 200 people:
  {v_k}_{k=1}^{200} ⊆ R^{50}.
- 3000 student records (SAT, ACT, GPA, class rank, etc.):
  {v_k}_{k=1}^{3000} ⊆ R^{20}.
6. Motivation: the missing data problem
Introduction and background
Special case of the missing data problem

Suppose all missing data are in one column.
[Schematic: records v_1, ..., v_n with a final column holding entries f_2, ..., f_n; some entries are missing.]
Consider the last column as a function f : {1, 2, ..., n} → R.
7–9. Motivation: the missing data problem
Introduction and background
Out-of-sample extension of an empirical function

Idea: A function f is defined on a subset Γ of the dataset:
f : Γ → Y, where Γ ⊆ R^p is the set where the value of f is known.
Want to extend f to F : X → Y so that F|_Γ(x) = f(x) for x ∈ Γ.
[Figure: f defined on Γ ⊆ X, extended to F on all of X.]

Application: The data is a sample {(x, f(x))}_{x∈Γ}.
Example: X may be a collection of images or documents, with Y = R.
Want to generalize to as-yet-unseen instances in X.
"function extension" ←→ "automated sorting" ⟹ machine learning / manifold learning
10–11. A solution: Geometric harmonics
The network model associated to a dataset
Similarities within data are modeled via nonlinearity

Introduce a nonlinear kernel function k to model the similarity between two vectors:
k(v, u) ≈ 0 when v and u are very different; k(v, u) ≈ 1 when v and u are very similar.

Two possible choices of such a kernel function:
    k(v, u) = exp(−‖v − u‖² / ε)    or    k(v, u) = |Corr(v, u)|^m.
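A minimal sketch of the two kernels shown above, assuming the vectors are NumPy arrays; ε and m are the slide's tunable parameters, and the defaults here are arbitrary.

```python
import numpy as np

def gaussian_kernel(v, u, eps=1.0):
    """k(v, u) = exp(-||v - u||^2 / eps): ~1 for similar vectors, ~0 otherwise."""
    return np.exp(-np.sum((v - u) ** 2) / eps)

def correlation_kernel(v, u, m=2):
    """k(v, u) = |Corr(v, u)|^m, using the Pearson correlation of the two vectors."""
    return np.abs(np.corrcoef(v, u)[0, 1]) ** m
```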
12–15. A solution: Geometric harmonics
The network model associated to a dataset
Convert the dataset into a network

Goal: replace the original dataset in R^{n×p} with a similarity network.
- Network = connected weighted undirected graph.
- Similarity network = the weights represent similarities.
- Vector v_i ⟶ vertex v_i in the network.

[Figure: four vectors v_1, ..., v_4 mapped by k to a weighted graph with edge weights
k(v_1, v_2) = 4, k(v_1, v_3) = 2, k(v_2, v_3) = 3, k(v_3, v_4) = 1.]

The corresponding similarity matrix is

    K = [ 0 4 2 0
          4 0 3 0
          2 3 0 1
          0 0 1 0 ],    K_{i,j} := k(v_i, v_j).

Efficiency gain: the n × p data matrix becomes an n × n adjacency matrix.
Advantageous for high-dimensional datasets: p ≫ n.
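A sketch of the dataset-to-network conversion, assuming the Gaussian kernel above; zeroing the diagonal (no self-loops) matches the four-vertex example. This is an illustration, not the authors' implementation.

```python
import numpy as np

def similarity_matrix(V, eps=1.0):
    """n x p data matrix -> n x n similarity matrix K with K[i, j] = k(v_i, v_j)."""
    sq_dists = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    np.fill_diagonal(K, 0.0)   # no self-loops, as in the 4-vertex example
    return K
```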
16. A solution: Geometric harmonics
The network model associated to a dataset
Geometric harmonics

Coifman and Lafon introduced the machine learning tool "geometric harmonics" in 2005.
Idea: the eigenfunctions of a diffusion operator can be used to perform global analysis of the dataset and of functions on the dataset.
17–19. A solution: Geometric harmonics
The network model associated to a dataset
Geometric harmonics: construction and definition

For the matrix K with K_{u,v} = k(u, v), consider the integral operator f ↦ Kf given by
    (Kf)(u) := Σ_{v∈Γ} K_{u,v} f(v),    u ∈ X.
("Restricted matrix multiplication.")

Diagonalize the restricted matrix [K]_{u,v∈Γ} via
    Σ_{v∈Γ} K_{u,v} ψ_j(v) = λ_j ψ_j(u),    u ∈ Γ.
NOTE: k symmetric ⟹ K symmetric ⟹ {ψ_j} form an ONB.

[Nyström] Reverse this equation to define values off Γ:
    Ψ_j(u) := (1/λ_j) Σ_{v∈Γ} K_{u,v} ψ_j(v),    u ∈ X.
{Ψ_j}_{j=1}^{n} are the geometric harmonics, where n = |Γ|.
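A sketch of this construction in NumPy, assuming the kernel has already been evaluated on Γ × Γ and on X × Γ. The eigenvectors of the symmetric restricted matrix play the role of the ψ_j; eigenvalues near zero are dropped here to keep the division stable, which is a practical simplification rather than part of the definition.

```python
import numpy as np

def geometric_harmonics(K_gamma, K_x_gamma, tol=1e-10):
    """K_gamma: |Γ| x |Γ| kernel matrix on Γ.
    K_x_gamma: |X| x |Γ| kernel matrix pairing every point of X with Γ.
    Returns eigenvalues λ_j, eigenvectors ψ_j on Γ, and extensions Ψ_j on X."""
    lam, psi = np.linalg.eigh(K_gamma)   # ψ_j = psi[:, j]; real ONB since K is symmetric
    keep = np.abs(lam) > tol             # drop near-zero eigenvalues before dividing
    lam, psi = lam[keep], psi[:, keep]
    Psi = K_x_gamma @ psi / lam          # Ψ_j(u) = (1/λ_j) Σ_{v∈Γ} K_{u,v} ψ_j(v)
    return lam, psi, Psi
```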
20–21. A solution: Geometric harmonics
The network model associated to a dataset
Geometric harmonics: the extension algorithm

For f : Γ → Y and n = |Γ|, define
    F(x) = Σ_{j=1}^{n} ⟨f, ψ_j⟩_Γ Ψ_j(x),    x ∈ X.
For x ∈ Γ, Ψ_j(x) = ψ_j(x), so
    F(x) = Σ_{j=1}^{n} ⟨f, ψ_j⟩_Γ Ψ_j(x) = Σ_{j=1}^{n} ⟨f, ψ_j⟩_Γ ψ_j(x) = f(x),
since this is just the decomposition of f in the ONB {ψ_j}_{j=1}^{n}.
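The extension algorithm then follows directly; a self-contained sketch, assuming the Euclidean inner product on Γ plays the role of ⟨·, ·⟩_Γ.

```python
import numpy as np

def extend_function(K_gamma, K_x_gamma, f_gamma, tol=1e-10):
    """Extend f, known on Γ, to F on X via F = Σ_j <f, ψ_j>_Γ Ψ_j."""
    lam, psi = np.linalg.eigh(K_gamma)   # diagonalize the restricted matrix
    keep = np.abs(lam) > tol
    lam, psi = lam[keep], psi[:, keep]
    coeffs = psi.T @ f_gamma             # <f, ψ_j>_Γ: coefficients in the ONB on Γ
    Psi = K_x_gamma @ psi / lam          # Nyström extensions Ψ_j
    return Psi @ coeffs                  # F(x) for every x in X
```

If the rows of K_x_gamma indexed by Γ coincide with K_gamma and no eigenvalues are dropped, F agrees with f on Γ, matching the calculation above.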
22. A solution: Geometric harmonics
The network model associated to a dataset
Geometric harmonics: limitations

Geometric harmonics does not apply to missing data.
Consider f : Γ → R as an extra column with holes:
[Schematic: records v_1, ..., v_n with an appended f-column containing missing entries.]
Geometric harmonics requires the first p columns to be complete.
23–24. A solution: Geometric harmonics
Iterated geometric harmonics
Iterated geometric harmonics: basic idea

Underlying assumption of geometric harmonics:
    Data are samples from a submanifold.
Restated as a continuity assumption:
    If p − 1 entries of u and v are very close, then so is the p-th.

Idea: Consider the j-th column to be a function of the others.
[Schematic: the records v_1, ..., v_n written out as the n × p matrix (a_{ij}), with the j-th column (a_{1j}, ..., a_{nj}) singled out.]
Geometric harmonics can be applied to the j-th column.
25–27. A solution: Geometric harmonics
Iterated geometric harmonics
Iterated geometric harmonics: the iteration scheme

1. Record the locations of the missing values in the dataset.
2. Stochastically impute the missing values,
   drawn from N(µ, σ²) computed columnwise.
3. Iterate through the columns:
   (a) Choose (at random) a column to update.
   (b) "Unlock" the entries of the column to be imputed.
   (c) Use geometric harmonics to update those entries.
       The current column is treated as a function of the others.
       New values are initially computed in terms of poor guesses;
       successive passes improve the guesses.
   (d) Continue until all columns are updated.
4. Repeat the iteration until updates cause negligible change.
   The process typically stabilizes after about 4 cycles.
A sketch of this scheme follows below.
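A minimal sketch of the whole scheme, reusing extend_function from the sketch above together with a Gaussian kernel. The masking convention, the imputation distribution, and the fixed cycle count are simplifications of the method described on the slides, not the authors' code.

```python
import numpy as np

def igh_impute(X, mask, eps=1.0, n_cycles=4, seed=0):
    """Iterated geometric harmonics, sketched. X: float n x p array with NaNs
    at missing entries; mask: boolean n x p, True where the value is missing."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    for j in range(X.shape[1]):                  # step 2: stochastic imputation
        col, m = X[:, j], mask[:, j]
        mu, sd = np.nanmean(col), np.nanstd(col)
        col[m] = rng.normal(mu, sd if sd > 0 else 1.0, m.sum())
    for _ in range(n_cycles):                    # step 4: repeat until stable
        for j in rng.permutation(X.shape[1]):    # step 3(a): random column order
            m = mask[:, j]
            if not m.any():
                continue
            others = np.delete(X, j, axis=1)     # column j as function of the rest
            gamma = ~m                           # rows where column j is "known"
            d2 = np.sum((others[:, None] - others[gamma][None]) ** 2, axis=-1)
            K_x_gamma = np.exp(-d2 / eps)        # kernel from all rows to Γ
            K_gamma = K_x_gamma[gamma]           # restricted kernel on Γ
            F = extend_function(K_gamma, K_x_gamma, X[gamma, j])
            X[m, j] = F[m]                       # step 3(c): update unlocked entries
    return X
```

Note that the "known" values of the other columns may themselves be earlier guesses; successive cycles refine them, which is exactly the point of the iteration.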
28–30. A solution: Geometric harmonics
Iterated geometric harmonics

[Image slides: a damaged image (70% data loss), the IGH restoration, and the original, shown side by side.]
31–33. A solution: Geometric harmonics
Iterated geometric harmonics
Iterated geometric harmonics: applications

Iterated geometric harmonics requires the continuity assumption:
- Probably not well suited to social network analysis, etc.
Iterated geometric harmonics requires multiple similar datapoints/records:
- Video footage is a natural application:
  10–24 images per second, usually very similar.
  Applications for security, military, law enforcement.
Iterated geometric harmonics excels when p ≫ n:
- However, it has demonstrated good performance on low-dimensional time series.
  Example: San Diego weather data (next slide).
34. A solution: Geometric harmonics
Iterated geometric harmonics
San Diego Airport weather data (n = 2000, p = 25)

[Plots: L² error (up to 2500) and standard deviation (8 to 22) versus number of GH iterations (0–5), with one curve per legend value 0.05–0.4, likely the fraction of missing data.]
35–36. A solution: Geometric harmonics
Iterated geometric harmonics
Summary

Iterated Geometric Harmonics (IGH):
- Robust data reconstruction, even at high rates of data loss.
- Well suited to high-dimensional problems, p ≫ n.
- Relies on continuity assumptions on the underlying data.
- Applications to image reconstruction, video footage, etc.
- Patent pending (U.S. Patent Application No. 14/920,556).
Future work: noisy data.
38–39. Future work
From missing data to noisy data
Future work: noisy data

The problem of "noisy data" is more difficult:
before improving the data, bad values need to be located.
Current work: using Markov random fields to detect noise.
Markov random fields: another graph-based tool for data analysis.
40–42. Future work
From missing data to noisy data
Future work: Markov random fields

[Figure: original (noisy) data b_1, ..., b_6 and improved data a_1, ..., a_6 on a 2 × 3 grid; neighboring a_i, a_j are joined by edges with weights w_{ij}, and each a_i is tied to its observation b_i with weight u_i.]

Minimize the energy functional
    E = Σ w_{ij}(a_i − a_j)² + Σ u_i(a_i − b_i)²,
where {b_i} are given, the w_{ij} are tuned by the user (and usually all equal), and the u_i are tuned by the user (and usually all equal).

Setting w_{ij} = u_i = 1 and letting a single parameter λ be tuned by the user gives the simpler form
    E = Σ (a_i − a_j)² + λ Σ (a_i − b_i)².
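The slides minimize E by simulated annealing (next slide), which also handles non-quadratic energies. For this particular E, which is quadratic in a, the minimizer can instead be found by a direct linear solve: setting ∇E = 0 gives (L + λI)a = λb, where L is the graph Laplacian of the edge set. A sketch under that observation, with a hypothetical grid indexing:

```python
import numpy as np

def mrf_denoise(b, edges, lam=1.0):
    """Minimize E = Σ_{(i,j) in edges} (a_i - a_j)^2 + λ Σ_i (a_i - b_i)^2.
    E is quadratic in a, so solve (L + λ I) a = λ b with L the graph Laplacian."""
    n = len(b)
    L = np.zeros((n, n))
    for i, j in edges:             # each undirected edge counted once
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return np.linalg.solve(L + lam * np.eye(n), lam * np.asarray(b, dtype=float))

# Example: a 2 x 3 grid of nodes 0..5 (hypothetical indexing of the figure)
# edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
# a = mrf_denoise(b, edges, lam=0.5)
```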
43. Future work
From missing data to noisy data
Future work: Markov random fields

Markov random fields (MRFs) use simulated annealing to solve
    minimize E given {b_i}.
Output: improved data {a_i}.

Our approach:
1. Apply the MRF to find improved data {a_i}.
2. Compare {a_i} to the original data {b_i}.
3. Label nodes with large values of |a_i − b_i| as missing data.
4. Apply IGH to obtain better improved data.
45–47. Theoretical underpinnings
Reproducing kernel Hilbert spaces
Under the hood: reproducing kernel Hilbert spaces

Suppose X ⊆ R^n and k : X × X → R is
- nonnegative: k(x, y) ≥ 0,
- symmetric: k(x, y) = k(y, x),
- positive semidefinite: for any choice of {x_i}_{i=1}^{m}, K_{i,j} = k(x_i, x_j) defines a positive semidefinite matrix.

[Aronszajn] There is a Hilbert space H of functions on X with
- k_x := k(x, ·) ∈ H, for x ∈ X,
- ⟨k_x, f⟩ = f(x) (the reproducing property).

In the discrete case, H is the closure of {f = Σ_x a_x k_x : a_x scalars}.
48–49. Theoretical underpinnings
Reproducing kernel Hilbert spaces
Under the hood: reproducing kernel Hilbert spaces

For Γ ⊆ X, the operator K : L²(Γ, µ) → H given by
    (Kf)(x) = ∫_Γ k(x, y) f(y) dµ(y),    x ∈ X,
turns out to have adjoint operator K* : H → L²(Γ, µ) given by domain restriction:
    (K*g)(y) = g(y),    y ∈ Γ, g ∈ H.

K*K is self-adjoint, positive, and compact, so its eigenvalues are discrete and non-negative.
Since K* is restriction, the eigenfunctions can be found by diagonalizing k on Γ.
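As a quick numerical sanity check of the positive-semidefiniteness condition, the Gram matrix of a Gaussian kernel should have a non-negative spectrum up to round-off; a minimal sketch, with arbitrary sample points:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))                 # 40 sample points in R^5

# Gram matrix of the Gaussian kernel k(x, y) = exp(-||x - y||^2 / eps)
d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-d2 / 2.0)

eigvals = np.linalg.eigvalsh(K)              # symmetric => real spectrum
print("min eigenvalue:", eigvals.min())      # ≥ 0 up to round-off: k is PSD
```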