The document discusses distances between data and similarity measures in data analysis. It introduces the concept of distance between data as a quantitative measure of how different two data points are, with smaller distances indicating greater similarity. Distances are useful for tasks like clustering data, detecting anomalies, data recognition, and measuring approximation errors. The most common distance measure, Euclidean distance, is explained for vectors of any dimension using the concept of norm from geometry. Caution is advised when calculating distances between data with differing scales.
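A minimal illustration of the two points above, using NumPy (my own sketch, not code from the document): the Euclidean distance as the 2-norm of a difference vector, and per-feature standardization as one common way to handle features with differing scales before computing distances.

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance: the 2-norm of the difference vector."""
    return np.linalg.norm(np.asarray(x) - np.asarray(y))

# Features on very different scales dominate the distance, so it is
# common to standardize each feature (zero mean, unit variance) first.
def standardized_distances(X):
    """Pairwise Euclidean distances after per-feature standardization."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    n = len(Z)
    return np.array([[np.linalg.norm(Z[i] - Z[j]) for j in range(n)]
                     for i in range(n)])
```

Without the standardization step, a feature measured in (say) grams would swamp one measured in kilometers, which is the caution the summary refers to.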
Recursively Summarizing Books with Human Feedback (harmonylab)
Public URL: https://arxiv.org/abs/2109.10862
Source: Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano: Recursively Summarizing Books with Human Feedback, arXiv:2109.10862 (2021).
Summary: Many tasks require a human in the loop to provide a training signal indicating how good or bad the model's behavior is. For tasks where human evaluation is time-consuming or requires expert knowledge, a scalable method for generating effective training signals is needed. This paper targets abstractive summarization of entire books and presents an approach that combines recursive task decomposition with learning from human feedback. While some model-generated summaries match the quality of human-written ones, on average the model's summaries are shown to be significantly worse than human summaries.
Tensor representations in signal processing and machine learning (tutorial talk), Tatsuya Yokota
Tutorial talk in APSIPA-ASC 2020.
Title: Tensor representations in signal processing and machine learning.
Introduction to tensor decomposition (テンソル分解入門)
Basics of tensor decomposition (テンソル分解の基礎)
The document proposes a new fast algorithm for smooth non-negative matrix factorization (NMF) in which the basis vectors are represented by smooth function approximations, allowing faster computation than existing methods. The method is also extended to tensor decomposition models. Experiments on image datasets show that the proposed methods achieve better denoising and source-separation performance than ordinary NMF and tensor decomposition methods while running up to 300 times faster. Future work includes extending the model to incorporate both common smoothness across factors and individual sparseness.
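The slides contain the actual algorithm; as a rough sketch of the general idea only (my own illustration, with Gaussian bumps as an assumed choice of smooth basis functions), one can constrain each NMF basis vector to be a nonnegative combination of fixed smooth functions and run standard multiplicative updates on the combination weights:

```python
import numpy as np

def smooth_nmf(X, rank=2, n_bumps=10, n_iter=200, width=0.1):
    """NMF X ~ W @ H where each column of W is kept smooth by expressing
    it as W = F @ G: F holds fixed Gaussian bumps, G and H are >= 0.
    (Illustrative sketch, not the algorithm from the slides.)"""
    m, n = X.shape
    t = np.linspace(0, 1, m)[:, None]
    centers = np.linspace(0, 1, n_bumps)[None, :]
    F = np.exp(-((t - centers) ** 2) / (2 * width ** 2))  # m x n_bumps, >= 0
    rng = np.random.default_rng(0)
    G = rng.random((n_bumps, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        W = F @ G
        # standard multiplicative update for H
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        # multiplicative update for G; W = F @ G stays nonnegative and smooth
        G *= (F.T @ X @ H.T) / (F.T @ F @ G @ H @ H.T + 1e-9)
    return F @ G, H
```

The speedup in the paper comes from working with the small weight matrix G instead of the full basis vectors; the sketch above only shows the smoothness-by-parameterization idea.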
Linked CP Tensor Decomposition (presented at ICONIP 2012), Tatsuya Yokota
This document proposes a new method called Linked Tensor Decomposition (LTD) to analyze common and individual factors from a group of tensor data. LTD combines the advantages of Individual Tensor Decomposition (ITD), which analyzes individual characteristics, and Simultaneous Tensor Decomposition (STD), which analyzes common factors in a group. LTD represents each tensor as the sum of a common factor and individual factors. An algorithm using Hierarchical Alternating Least Squares is developed to solve the LTD model. Experiments on toy problems and face reconstruction demonstrate LTD can extract both common and individual factors more effectively than ITD or STD alone. Future work will explore Tucker-based LTD and statistical independence in the LTD model.
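As a rough illustration of the "common plus individual" model (a matrix analogue of my own, not the CP/HALS algorithm from the paper), the decomposition can be fit by alternating low-rank updates:

```python
import numpy as np

def truncated(X, r):
    """Best rank-r approximation of X via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def linked_decomposition(Xs, r_common=1, r_indiv=1, n_iter=50):
    """Fit X_i ~ C + I_i: one low-rank factor C shared by all matrices,
    plus a low-rank individual factor I_i per matrix (illustrative sketch)."""
    C = np.zeros_like(Xs[0])
    Is = [np.zeros_like(X) for X in Xs]
    for _ in range(n_iter):
        # best shared low-rank part given the individual parts:
        # rank-r approximation of the mean residual
        C = truncated(np.mean([X - I for X, I in zip(Xs, Is)], axis=0), r_common)
        # best individual low-rank part for each matrix given C
        Is = [truncated(X - C, r_indiv) for X in Xs]
    return C, Is
```

Each alternating step solves its subproblem exactly, so the total squared error is non-increasing; the paper's HALS algorithm applies the same alternating idea to CP factor matrices.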
Introduction to Common Spatial Pattern Filters for EEG Motor Imagery Classification, Tatsuya Yokota
This document introduces common spatial pattern (CSP) filters for EEG motor imagery classification. CSP filters aim to find spatial patterns in EEG data that maximize the difference between two classes. The document outlines several CSP algorithms including standard CSP, common spatially standardized CSP, and spatially constrained CSP. CSP filters extract discriminative features from EEG data that can improve classification accuracy for brain-computer interface applications involving motor imagery tasks.
This document provides an introduction to blind source separation and non-negative matrix factorization. It describes blind source separation as a method to estimate original signals from observed mixed signals. Non-negative matrix factorization is introduced as a constraint-based approach to solving blind source separation using non-negativity. The alternating least squares algorithm is described for solving the non-negative matrix factorization problem. Experiments applying these methods to artificial and real image data are presented and discussed.
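The alternating least squares idea described above can be sketched as follows (a minimal illustration of my own, assuming nonnegativity is enforced by clipping each unconstrained least-squares solution at zero):

```python
import numpy as np

def nmf_als(X, rank, n_iter=100):
    """Alternating least squares for X ~ W @ H with nonnegativity enforced
    by clipping each unconstrained least-squares solution at zero."""
    rng = np.random.default_rng(0)
    m, n = X.shape
    W = rng.random((m, rank))
    for _ in range(n_iter):
        # fix W, solve for H, then clip negatives to zero
        H = np.clip(np.linalg.lstsq(W, X, rcond=None)[0], 0, None)
        # fix H, solve for W the same way
        W = np.clip(np.linalg.lstsq(H.T, X.T, rcond=None)[0].T, 0, None)
    return W, H
```

In blind source separation, the rows of H play the role of the estimated nonnegative source signals and W the mixing weights.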
This document discusses independent component analysis (ICA) for blind source separation. ICA is a method to estimate original signals from observed signals consisting of mixed original signals and noise. It introduces the ICA model and approach, including whitening, maximizing non-Gaussianity using kurtosis and negentropy, and fast ICA algorithms. The document provides examples applying ICA to separate images and discusses approaches to improve ICA, including using differential filtering. ICA is an important technique for blind source separation and independent component estimation from observed signals.
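A minimal FastICA sketch along the lines described (whitening, then a fixed-point update that maximizes non-Gaussianity via a tanh-based negentropy proxy; my own illustration, not code from the document):

```python
import numpy as np

def fast_ica(X, n_iter=200):
    """One-unit FastICA with deflation and tanh nonlinearity.
    X: (components x samples) matrix of mixed signals."""
    X = X - X.mean(axis=1, keepdims=True)
    # whitening: decorrelate the mixtures and scale to unit variance
    d, E = np.linalg.eigh(np.cov(X))
    Z = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X
    n = Z.shape[0]
    W = np.zeros((n, n))
    rng = np.random.default_rng(0)
    for i in range(n):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            # fixed-point update maximizing non-Gaussianity (negentropy proxy)
            g = np.tanh(Z.T @ w)
            w_new = Z @ g / Z.shape[1] - (1 - g ** 2).mean() * w
            # deflation: stay orthogonal to components already found
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w = w_new / np.linalg.norm(w_new)
        W[i] = w
    return W @ Z, W
```

Recovered sources come back in arbitrary order and sign, which is the usual ICA ambiguity; the kurtosis-based variant mentioned in the document swaps the tanh nonlinearity for a cubic one.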