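class SelfIntro自己紹介:  # 自己紹介 = self-introduction
    def __init__(私):  # 私 ("I") plays the role of self
        私.名 = 'Renyuan Lyu, 呂 仁園'                       # name
        私.職業 = 'University Professor, 大学の先生'         # occupation
        私.研究分野 = 'Speech Recognition, 音声認識'         # research field
        私.職場 = 'Chang Gung Univ (CGU), 長庚大學'          # workplace
        私.国 = 'TAIWAN, 台灣'                               # country
        # 誇り = pride; カラオケさん = "Mr. Karaoke"
        私.誇り = '''
            PyCon JP speaker (2015~2017, 2019),
            カラオケさん'''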
https://youtu.be/O1-9Yv9cB8Q
https://youtu.be/cUewj2kRrbk?t=2434
Lightning Talks at PyCon JP 2016, 2017
Real-time Pitch Detection
and Speech Recognition
in Python
via Pyaudio, Pygame & Vpython
Renyuan Lyu (呂仁園),
Chang Gung University (長庚大學),
TAIWAN (台灣)
@ PyCon JP 2019
The System
• Multilingual Lyric Transcription (Speech Recognition)
• Pitch Detection (Melody Recognition)
https://youtu.be/XF3oGwEsPac
The System
Input: Singing Voice
• Multilingual Lyric Transcription (Speech Recognition)
  Output: Lyrics (歌詞), e.g. “Twinkle Twinkle Little Star”, “きらきらひかる”, “一閃一閃亮晶晶”
• Pitch Detection (Melody Recognition)
  Output: Pitch (musical notes, 音符), e.g. “C C G G A A G –”
Data (Voice) acquisition
• Audio Signal Processing
• samplingRate= 16000 samples/sec,
• bitsPerSample= 16 bits/sample = 2 bytes/sample
• channelNumber= 3 (L, R, humming)
• Frame-wise short-time processing
[Figure: the signal divided into successive frames, Frame01, Frame02, ...]
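Frame-wise processing can be sketched as follows. This is only an illustration: the sampling rate, sample width, and channel count come from the slide, but the frame length (20 ms), the hop (10 ms), and the function name `split_into_frames` are assumptions, since the slide does not state them.

```python
SAMPLING_RATE = 16000      # samples/sec (from the slide)
BYTES_PER_SAMPLE = 2       # 16 bits/sample (from the slide)
CHANNEL_COUNT = 3          # L, R, humming (from the slide)

# Raw data rate of the recording, in bytes per second.
BYTES_PER_SECOND = SAMPLING_RATE * BYTES_PER_SAMPLE * CHANNEL_COUNT

FRAME_SIZE = 320           # assumed: 20 ms at 16 kHz
HOP_SIZE = 160             # assumed: 10 ms hop, so consecutive frames overlap by half


def split_into_frames(samples, frame_size, hop_size):
    """Cut a sequence of samples into fixed-length, possibly overlapping frames."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frames.append(samples[start:start + frame_size])
    return frames


# One second of silence from a single channel, split into frames.
one_second = [0] * SAMPLING_RATE
frames = split_into_frames(one_second, FRAME_SIZE, HOP_SIZE)
print(BYTES_PER_SECOND, len(frames), len(frames[0]))
```

Each later stage (energy, pitch, recognition) can then work on one short frame at a time instead of the whole recording.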
Digital Signal Processing:
Spectrogram
• A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.
• It is computed using the Fast Fourier Transform (FFT).
https://youtu.be/bCRL5yw8fXA
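A minimal sketch of how such a spectrogram could be computed, using only the standard library. The frame size (256), hop size (128), the Hann window, and the function names are assumptions for illustration; the talk does not say which settings or which FFT library were used, and a real-time system would normally rely on an optimized FFT rather than this recursive one.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(samples, frame_size=256, hop_size=128):
    """Return one magnitude spectrum (bins 0 .. frame_size/2) per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]          # Hann window
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        spectrum = fft(frame)
        rows.append([abs(c) for c in spectrum[:frame_size // 2 + 1]])
    return rows


# A 1000 Hz test tone sampled at 16 kHz.
SAMPLING_RATE = 16000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLING_RATE) for t in range(4096)]
spec = spectrogram(tone)

# The strongest bin of the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLING_RATE / 256
print(len(spec), peak_bin, peak_hz)
```

Each row is the spectrum of one frame; stacking the rows over time and mapping magnitude to color gives the rolling picture.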
A Real-time Spectrogram
http://friture.org/
https://youtu.be/1sbtXqZaGXE
• Friture is a program written in Python, designed to analyze audio input in real time.
• It displays audio data as a scope, a spectrum analyzer, or a rolling 2D spectrogram.
• I found this program in 2012~2013, and it convinced me that I could move into the Python world to continue my career.
Using Audacity to get an audio signal
https://youtu.be/o9DF9SVdcVo
The first step in audio signal processing is to record some audio yourself and play with it.
WAVE PCM sound file format (.wav)
• http://soundfile.sapp.org/doc/WaveFormat/
• Compared with text data, audio data is much bigger, and it is usually stored in binary form.
• Being familiar with the data format is crucial for processing it.
“See” the audio signal in the raw format
Extract audio header information
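The slide itself shows only a screenshot, so the following is a sketch rather than the author's code. It writes a tiny WAV file in memory with the standard `wave` module and unpacks the canonical 44-byte PCM header with `struct`. The helper names are hypothetical, and real files may carry extra chunks that this simple layout does not handle.

```python
import io
import struct
import wave


def make_wav_bytes(samples, sampling_rate=16000, channels=1):
    """Build a 16-bit PCM WAV file in memory and return its raw bytes."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(sampling_rate)
        w.writeframes(struct.pack('<%dh' % len(samples), *samples))
    return buf.getvalue()


def parse_wav_header(raw):
    """Unpack the canonical 44-byte header (assumes 'fmt ' is followed directly by 'data')."""
    names = ('riff_id', 'chunk_size', 'wave_id', 'fmt_id', 'fmt_size',
             'audio_format', 'channels', 'sampling_rate', 'byte_rate',
             'block_align', 'bits_per_sample', 'data_id', 'data_size')
    values = struct.unpack('<4sI4s4sIHHIIHH4sI', raw[:44])
    return dict(zip(names, values))


raw = make_wav_bytes([0, 1000, -1000, 0])
print(raw[:16])                 # "see" the first raw bytes of the file
header = parse_wav_header(raw)
for key, value in header.items():
    print(key, value)
```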
Visualize the audio signal as a waveform
• As long as you can visualize the audio signal, you can make sure you are reading it correctly.
• Then you can do further processing with more advanced signal processing algorithms, such as pitch detection and speech recognition.
Human-aided pitch tracking by humming
• Pitch detection on a real music signal is not easy.
• To simplify the task, I use a trick:
• I hum the song while listening to the music, and record the humming in a separate channel.
• I use this “clean” humming voice to detect the pitch.
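Pulling the humming channel out of the recording could look like the sketch below. It assumes the three channels are interleaved sample by sample in the order L, R, humming (so the humming channel has index 2) and stored as 16-bit little-endian integers; the function name is hypothetical.

```python
import struct


def extract_channel(raw_bytes, channel_index, channel_count=3):
    """Return the samples of one channel from interleaved 16-bit little-endian PCM."""
    sample_count = len(raw_bytes) // 2
    samples = struct.unpack('<%dh' % sample_count, raw_bytes[:sample_count * 2])
    return list(samples[channel_index::channel_count])


# Two sampling instants: (L, R, humming) = (1, 2, 3) and then (4, 5, 6).
raw_pcm = struct.pack('<6h', 1, 2, 3, 4, 5, 6)
humming = extract_channel(raw_pcm, channel_index=2)
print(humming)
```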
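Multi-threaded Programming
import threading

# Methods of the system class; the class statement and the four thread
# target methods are not shown on the slide.
def init(self):
    # One thread per task.
    self.錄音線 = threading.Thread(target=self.錄音線程)        # recording thread
    self.能量線 = threading.Thread(target=self.f1_能量)          # energy thread
    self.基頻線 = threading.Thread(target=self.f4_基頻)          # pitch (fundamental frequency) thread
    self.語音辨認線 = threading.Thread(target=self.f6_語音辨認)  # speech recognition thread

def start(self):
    # Start all four threads.
    self.錄音線.start()
    self.能量線.start()
    self.基頻線.start()
    self.語音辨認線.start()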
• For a real-time system, multi-threaded programming is crucial.
• At the very least, an independent thread for data acquisition is necessary.
Audio recording thread
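The recording thread's code is not included in this transcript. As a rough sketch of the idea only, the hypothetical `Recorder` class below runs a background thread that keeps collecting audio chunks while the main thread stays free. The chunk source is simulated with zero bytes; in the real system it would be a blocking read from the sound card (for example through PyAudio).

```python
import threading
import time


class Recorder:
    """Hypothetical recorder: a background thread appends audio chunks to a list."""

    def __init__(self, chunk_samples=160):
        self.chunk_samples = chunk_samples
        self.chunks = []
        self.lock = threading.Lock()
        self.running = False
        self.thread = threading.Thread(target=self._record_loop, daemon=True)

    def _read_chunk(self):
        # Stand-in for a blocking read from the audio device:
        # returns silence, 2 bytes per sample.
        time.sleep(0.001)
        return bytes(self.chunk_samples * 2)

    def _record_loop(self):
        while self.running:
            chunk = self._read_chunk()
            with self.lock:          # protect the list shared with other threads
                self.chunks.append(chunk)

    def start(self):
        self.running = True
        self.thread.start()

    def stop(self):
        self.running = False
        self.thread.join()


recorder = Recorder()
recorder.start()
time.sleep(0.05)                     # the main thread is free to do other work here
recorder.stop()
chunk_count = len(recorder.chunks)
print(chunk_count, 'chunks recorded')
```

Keeping acquisition in its own thread means slow stages such as recognition cannot cause audio to be dropped.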
A circular buffer to store the real-time audio signal
I set up a buffer in RAM to store 16 sec of voice.
Its size is 16 sec × 16000 samples/sec × 2 bytes/sample × 3 channels = 1,536,000 bytes.
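A circular buffer of this kind might be sketched as below. The class and method names are hypothetical, not the author's; the point is the wrap-around on write and the two-piece read when the requested segment straddles the end of the buffer.

```python
class CircularBuffer:
    """Fixed-size byte buffer that overwrites its oldest data when full."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(size)
        self.write_pos = 0        # index where the next byte will be written
        self.total_written = 0    # bytes written since creation

    def write(self, chunk):
        if len(chunk) > self.size:
            raise ValueError('chunk is larger than the buffer')
        first = min(len(chunk), self.size - self.write_pos)
        self.data[self.write_pos:self.write_pos + first] = chunk[:first]
        rest = len(chunk) - first
        if rest:                  # wrap around to the start of the buffer
            self.data[0:rest] = chunk[first:]
        self.write_pos = (self.write_pos + len(chunk)) % self.size
        self.total_written += len(chunk)

    def read_latest(self, n):
        """Return the most recent n bytes (fewer if less has been written)."""
        n = min(n, self.size, self.total_written)
        start = (self.write_pos - n) % self.size
        if start + n <= self.size:
            return bytes(self.data[start:start + n])
        # The segment straddles the end of the buffer: join the two pieces.
        return bytes(self.data[start:]) + bytes(self.data[:start + n - self.size])


# Buffer size from the slide: 16 sec of 3-channel, 16-bit, 16 kHz audio.
BUFFER_BYTES = 16 * 16000 * 2 * 3

# A small demonstration with an 8-byte buffer.
small = CircularBuffer(8)
small.write(b'abcdef')
small.write(b'ghij')              # wraps around and overwrites 'ab'
print(BUFFER_BYTES, small.read_latest(4), small.read_latest(8))
```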
Pitch Detection Algorithm
• Zooming into a speech signal at a scale of 0.01 sec, we can see periodic patterns.
• The duration of one periodic pattern is called the “pitch period”.
• For the A-440 note, the pitch period = 1/440 ≈ 0.0023 sec.
• A traditionally popular pitch detection algorithm is based on the autocorrelation method.
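At a sampling rate of 16000 samples/sec, the A-440 pitch period of about 0.0023 sec corresponds to roughly 36 samples. A basic autocorrelation detector searches for the lag at which a frame best matches a shifted copy of itself. The sketch below is a plain textbook version, not necessarily the variant used in the talk; the search range (80 to 1000 Hz) and the frame length (1024 samples) are assumptions.

```python
import math


def detect_pitch(frame, sampling_rate, min_hz=80.0, max_hz=1000.0):
    """Estimate the pitch of one frame from the peak of its autocorrelation."""
    min_lag = int(sampling_rate / max_hz)     # shortest period considered
    max_lag = int(sampling_rate / min_hz)     # longest period considered
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with itself shifted by `lag` samples.
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # No positive peak found: report 0.0 to mean "no pitch detected".
    return sampling_rate / best_lag if best_lag else 0.0


SAMPLING_RATE = 16000
a440 = [math.sin(2 * math.pi * 440 * t / SAMPLING_RATE) for t in range(1024)]
pitch_hz = detect_pitch(a440, SAMPLING_RATE)
print(round(pitch_hz, 1))
```

Because the lag is a whole number of samples, the estimate is only accurate to within a few Hz at this sampling rate.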
Pitch Detection Thread
Pitch Sampling at slower intervals
Pitch Quantization
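The transcript keeps only the title of this slide. One common way to quantize a detected frequency to a musical note, shown here as an assumption rather than the author's method, is to round to the nearest semitone in equal temperament with A4 = 440 Hz, using the MIDI note number 69 + 12·log2(f / 440).

```python
import math

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def quantize_pitch(frequency_hz):
    """Map a frequency to the nearest equal-tempered note (MIDI number, name)."""
    midi = round(69 + 12 * math.log2(frequency_hz / 440.0))
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return midi, name


print(quantize_pitch(440.0))     # exactly A4
print(quantize_pitch(444.4))     # a slightly sharp estimate still rounds to A4
print(quantize_pitch(261.63))    # middle C
```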
Speech Recognition
• http://shorturl.at/rxLM4
Speech recognition needs a large-scale database to train the system.
Nowadays, deep-learning algorithms play the major role and achieve the best performance.
Speech Recognition in Python
https://pypi.org/project/SpeechRecognition/
Google has a great Speech Recognition API.
This API converts speech (from a microphone) into written text (Python strings).
The ASR Thread
• Get a segment (M frames) of speech, x.
• Transform x into an “AudioData” object and send it to the Google Speech Recognition engine to get the recognition output “text”.
• Getting speech data out of a circular buffer is quite an implementation issue!
import speech_recognition as sr

def 語音辨認(私):                      # 語音辨認 = speech recognition
    辨 = sr.Recognizer()
    while 私.語音辨認中:               # loop while recognition is switched on
        #
        # Get x as "singingVoice" to be 音 (audio).
        # How x and lang are obtained is not shown on the slide.
        #
        音 = sr.AudioData(x, 私.取樣率, 私.樣本寬)   # raw bytes, sampling rate, sample width
        #
        # Do ASR to get the recognition result as 文 (text)
        #
        try:
            if lang in ('ja', 'en', 'zh-TW'):
                文 = 辨.recognize_google(音, language=lang)
                私.文 = '{} ({})'.format(文, lang)
        except Exception:
            # Recognition failed (no speech understood, network error, ...).
            私.文 = 'exceptionOccurs!!'
    return
Lyric Transcription
• Melodic voice (singing) recognition
• Timed text generation
• Needs both speech recognition and segmentation
• Currently, this is done by a human, not yet by machine.
Kara OK
• Pitch Tracking
• Timed Text Displaying
https://youtu.be/F1_Xz1c5AEE
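Timed text display comes down to looking up which lyric line is active at the current playback time. A minimal sketch, assuming the lyrics are stored as a sorted list of (start time in seconds, text) pairs; the data layout, the timings, and the function name are made up for illustration.

```python
import bisect


def current_lyric(timed_lyrics, t):
    """Return the lyric line active at time t (seconds), or '' before the first line."""
    starts = [start for start, _ in timed_lyrics]
    index = bisect.bisect_right(starts, t) - 1
    return timed_lyrics[index][1] if index >= 0 else ''


# Hypothetical timings for the first two lines of the song.
timed_lyrics = [
    (0.0, 'Twinkle twinkle little star'),
    (4.0, 'How I wonder what you are'),
]

print(current_lyric(timed_lyrics, 1.5))
print(current_lyric(timed_lyrics, 4.0))
```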
Final
Demo
https://youtu.be/0cdo6ZnBZc8
ご清聴ありがとうございました。
Thank you for listening.
感謝聆聽。
@ PyCon JP 2019
Renyuan Lyu
From TAIWAN
