Py conjp2019 renyuanlyu_3

class SelfIntro自己紹介:
1
def __init__ (私):
私.名 = 'Renyuan Lyu, 呂仁園'
私.職業 = 'University Professor, 大学の先生'
私.研究分野 = 'Speech Recognition, 音声認識'
私.職場 = 'Chang Gung Univ (CGU), 長庚大學'
私.国 = 'TAIWAN, 台灣’
私.誇り = '''
Pycon JP speaker (2015~2017, 2019 ),
カラオケさん'''

https://youtu.be/O1-9Yv9cB8Q
2
https://youtu.be/cUewj2kRrbk?t=2434
Lightning Talks at PyCon JP 2016, 2017

Real-time Pitch Detection
and Speech Recognition
in Python
via Pyaudio, Pygame & Vpython
Renyuan Lyu (呂仁園),
Chang Gung University (長庚大學),
TAIWAN (台灣)
@ Pycon JP 2019 3

The System
4
Multilingual
Lyric Transcription
(Speech Recognition)
Pitch
Detection
(Melody Recognition)
https://youtu.be/XF3oGwEsPac

The System
Singing
Voice
Multilingual
Lyric Transcription
(Speech Recognition)
Pitch
Detection
(Melody Recognition)
Lyrics (歌詞)
“Twinkle Twinkle Little Star”
“きらきらひかる”
“一閃一閃亮晶晶”
Pitch (musical notes, 音符)
“C C G G A A C –”
5

Data (Voice) acquisition
• Audio Signal Processing
• samplingRate= 16000 samples/sec,
• bitsPerSample= 16 bits/sample = 2 bytes/sample
• channelNumber= 3 (L, R, humming)
• Frame-wise short-time processing
Frame01
Frame02
6

Digital Signal Processing:
Spectrogram
• A spectrogram is
• a visual representation
• of the spectrum
• of frequencies
• of a signal
• as it varies with time.
• using Fast Fourier Transform
• FFT
7
https://youtu.be/bCRL5yw8fXA

A Real-time Spectrogram
http://friture.org/
8
https://youtu.be/1sbtXqZaGXE
• Friture is a program in PYTHON
designed to analyze audio input in
real-time.
• It displays audio data as a scope,
a spectrum analyzer, or with a
rolling 2D spectrogram.
• I found this program in 2012~2013
and was totally convinced that I
can transfer into the PYTHON
world to continue my career.

Using Audacity
to get audio signal
9
https://youtu.be/o9DF9SVdcVo
The first step to do audio signal processing
is to get some audio signal by yourself
and play with it.

WAVE PCM
soundfile format
(.wav)
• http://soundfile.sapp.org
/doc/WaveFormat/
10
• Compared with text data,
audio data is much bigger,
and it is usually stored in
binary form.
• Being familiar with the data
format is crucial to process it.

“See” the audio signal in the raw format
11

Extract audio header information
12

Visualize the audio signal in waveform
• As long as you can visualize the
audio signal, you can make sure
you read them in a correct way,
• and then you can do further
processing via advanced signal
processing algorithms
• like Pitch Detection and Speech
Recognition.
13

Human aided pitch tracking
by Humming
• Pitch Detection for real music
signal is not easy by itself.
• To simplify the task, I use
some TRICK….
• I hum the song and record it in
another channel, while listening
the music.
• I use this “clean” humming
voice to detect the pitch.
14

Multi-Threading Programming
15
def init(self):
self.錄音線= threading.Thread(target= self.錄音線程)
self.能量線= threading.Thread(target= self.f1_能量)
self.基頻線= threading.Thread(target= self.f4_基頻)
self.語音辨認線= threading.Thread(target= self.f6_語音辨認)
def start(self):
self.錄音線.start()
self.能量線.start()
self.基頻線.start()
self.語音辨認線.start()
• For a Realtime system,
the multi-threading
programming is crucial,
• At least, an independent
thread for data
acquisition is necessary.

audio recording “Thread”
16

A circular buffer
to store the real-time
audio signal
17
I set a buffer in RAM to store 16 sec of voice,
It is of size 16*16000*2*3= 1,536,000 bytes

Pitch Detection Algorithm
18
• Zoom a speech signal into scale of .01 sec, We
can visualize there are periodic patterns.
• the duration of a periodic pattern is called
the “pitch period”.
• For the A-440 note, the pitch period =
1/440 = .0023 sec
• A traditionally popular pitch detection
algorithm is based on auto-correlation
method.

Pitch Sampling at slower intervals
20

Speech Recognition
• http://shorturl.at/rxLM4
22

23
Speech Recognition
need Large-scale of Database
to train the system.
Nowadays, Deep-learning
algorithms play the major roles
and achieve the greatest
performance.

Speech Recognition in Python
24
https://pypi.org/project/SpeechRecognition/
Google has a great Speech Recognition API.
This API converts spoken text (microphone)
into written text (Python strings)

the ASR Thread
25
Get a segment (M frames) of speech  x
Transform x into an “AudioData” and then
send it to Google Speech Recognition engine
to get a recognition output “text”.
To get speech data from a circular buffer is
quite an issue for implementation. !!

26
def 語音辨認(私):
辨= sr.Recognizer()
while self.語音辨認中==True:
#
# Get x as "singingVoice" to be 音
#
音= sr.AudioData(x, 私.取樣率, 私.樣本寬)
#
# Do ASR to get recognition Result as 文
#
try:
if lang=='ja':
文= 辨.recognize_google(音, language='ja')
elif lang=='en':
文= 辨.recognize_google(音, language='en')
elif lang= 'zh-TW'
文= 辨.recognize_google(音, language='zh-TW')
else:
私.文= '{} ({})'.format(文, lang)
except:
私.文= 'exceptionOccurs!!'
pass
return

Lyric Transcription
• Melodic voice (singing) recognition
• Timed Text Generation
• Need do Speech recognition and
segmentation
• Currently, it was done by human,
not yet by machine.
27

Kara OK
• Pitch Tracking
• Timed Text Displaying
28
https://youtu.be/F1_Xz1c5AEE

Final
Demo
29
https://youtu.be/0cdo6ZnBZc8

ご清聴ありがとうございました。
Thank you for listening.
感謝聆聽。
@ PyCon Jp 2019
Renyuan Lyu
From TAIWAN
30

Py conjp2019 renyuanlyu_3

Recommended

Recommended

More Related Content

What's hot

What's hot (18)

Similar to Py conjp2019 renyuanlyu_3

Similar to Py conjp2019 renyuanlyu_3 (20)

More from Renyuan Lyu

More from Renyuan Lyu (8)

Recently uploaded

Recently uploaded (20)

Py conjp2019 renyuanlyu_3