GAN-based statistical speech synthesis (in Japanese)Yuki Saito
Guest presentation at "Applied Gaussian Process and Machine Learning," Graduate School of Information Science and Technology, The University of Tokyo, Japan, 2021.
2017年春季研究発表会の発表資料です.
邦題: 形態素解析も辞書も言語モデルもいらないend-to-end音声認識
英題: End-to-end Japanese ASR without using morphological analyzer, pronunciation dictionary and language model
GAN-based statistical speech synthesis (in Japanese)Yuki Saito
Guest presentation at "Applied Gaussian Process and Machine Learning," Graduate School of Information Science and Technology, The University of Tokyo, Japan, 2021.
2017年春季研究発表会の発表資料です.
邦題: 形態素解析も辞書も言語モデルもいらないend-to-end音声認識
英題: End-to-end Japanese ASR without using morphological analyzer, pronunciation dictionary and language model
ICASSP 2019音声&音響論文読み会(https://connpass.com/event/128527/)での発表資料です。
AASP (Audio and Acoustic Signal Processing) 分野の紹介と、ICASSP 2019での動向を紹介しています。#icassp2019jp
The document describes a real-time DNN voice conversion system with feedback to acquire character traits. It proposes a method to provide real-time feedback of the converted voice to the speaker to encourage speech modification (prosody and emphasis) towards the target speaker's character. Subjective evaluations from the first-person (user) perspective and third-person perspective found that the system improved the reproduction of the target speaker's character, especially for inexperienced users. Providing only pitch feedback was already quite effective.
ICASSP 2019音声&音響論文読み会(https://connpass.com/event/128527/)での発表資料です。
AASP (Audio and Acoustic Signal Processing) 分野の紹介と、ICASSP 2019での動向を紹介しています。#icassp2019jp
The document describes a real-time DNN voice conversion system with feedback to acquire character traits. It proposes a method to provide real-time feedback of the converted voice to the speaker to encourage speech modification (prosody and emphasis) towards the target speaker's character. Subjective evaluations from the first-person (user) perspective and third-person perspective found that the system improved the reproduction of the target speaker's character, especially for inexperienced users. Providing only pitch feedback was already quite effective.
DynamicFusion is a method for reconstructing and tracking non-rigid scenes in real-time by extending KinectFusion. It uses a volumetric truncated signed distance function (TSDF) to integrate depth maps from multiple viewpoints into a global reconstruction. Live depth frames are aligned to a dense surface prediction generated by raycasting the TSDF. This closes the loop between mapping and localization for tracking dynamic, non-rigid scenes.
This document summarizes the organization of UIST 2016, including the chairs and committees. It notes there were 632 attendees, with registration capped at 500. It lists the chairs for various committees like local arrangements, program, doctoral symposium, and student volunteers. The local arrangement chair was Masa Ogata. It discusses the roles and members of the local arrangement, student volunteer, and other committees in organizing the conference.
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-‐rigid Scenes in Real...Ken Sakurada
CVPR2015(Best Paper Award)の論文紹介
"DynamicFusion: Reconstruction and Tracking of Non-‐rigid Scenes in Real-‐Time"
Richard A. Newcombe, Dieter Fox, Steven M. Seitz
内容に関して何かお気づきになりましたら,スライドに記載されているメールアドレスにご連絡頂けると幸いです
CVPR2016 was held in Las Vegas from June 26-July 1. The author attended and reported on trends in papers presented. Deep learning and CNNs were widely used for tasks like object detection, segmentation, pose estimation and re-identification. Fast/Faster R-CNN models were common for object detection. CNNs combined with CRFs were frequent for segmentation. Papers on 3D object detection from RGB-D data and dense 3D correspondence between human bodies using CNNs were highlighted.
1. The document discusses various statistical and neural network-based models for representing words and modeling semantics, including LSI, PLSI, LDA, word2vec, and neural network language models.
2. These models represent words based on their distributional properties and contexts using techniques like matrix factorization, probabilistic modeling, and neural networks to learn vector representations.
3. Recent models like word2vec use neural networks to learn word embeddings that capture linguistic regularities and can be used for tasks like analogy-making and machine translation.
Utilization of social networking services toward accessible academic meetings
From the viewpoint of ``reasonable information accessibility,''
the popularization of real-time social networking services,
such as Twitter and Ustream,
can be regarded as the oppotunity to change academic meetings
into the ``universally designed'' events.
Accessibility should always be considered to some extent
without excessive cost.
Captioning service using automatic speech recognition is also investigated
as a part of the universally designed events using social media.