Py conjp2019 renyuanlyu_3

Invited speech at PyCon JP 2019 by Renyuan Lyu

Published in: Engineering

  1. 1. Real-time Pitch Detection and Speech Recognition in Python via PyAudio, Pygame & VPython. Renyuan Lyu (呂仁園), Chang Gung University (長庚大學), TAIWAN (台灣) @ PyCon JP 2019
  2. 2. class SelfIntroduction自己紹介:  # 自己紹介 = self-introduction; 私 ("I") plays the role of self
            def __init__(私):
                私.名 = 'Renyuan Lyu, 呂 仁園'  # name
                私.職業 = 'University Teacher, 大学の先生'  # occupation
                私.研究分野 = 'Speech Recognition, 音声認識'  # research field
                私.職場 = 'Chang Gung Univ (CGU), 長庚大學'  # workplace
                私.国 = 'TAIWAN, 台灣'  # country
  3. 3.     def introduce(私):
                私.誇り = '''Becoming an associate prof 20+ years ago.'''  # pride
                私.恥ずかしさ = '''Still being the associate prof after those 20+ years'''  # embarrassment
                私.挑戦 = "Marathon runner/walker, 2019"  # challenge
                私.興味 = '''Being the PyCon JP speaker (2015~2017, 2019), カラオケさん'''  # interests; カラオケさん = "Mr. Karaoke"
  4. 4. Lake Tazawa Marathon (田沢湖マラソン), 2019/09/15: 4,535 runners gave it their all. https://www.sakigake.jp/news/article/20190915AK0028/
  5. 5. Lightning Talks at PyCon JP 2016, 2017: https://youtu.be/O1-9Yv9cB8Q https://youtu.be/cUewj2kRrbk?t=2434
  6. 6. Real-time Pitch Detection and Speech Recognition in Python via PyAudio, Pygame & VPython
  7. 7. The System Overview • Multilingual Lyric Transcription (Speech Recognition) • Pitch Detection (Melody Recognition) https://youtu.be/XF3oGwEsPac
  8. 8. The System Block Diagram • Singing Voice ➔ Multilingual Lyric Transcription (Speech Recognition) ➔ Lyrics (歌詞): “Twinkle Twinkle Little Star” (English), “きらきらひかる” (Japanese), “一閃一閃亮晶晶” (Mandarin) • Singing Voice ➔ Pitch Detection (Melody Recognition) ➔ Pitch (musical notes, 音符): “C C G G A A C –”
  9. 9. Audio Data (Voice) Acquisition • Audio Signal Processing • samplingRate = 16000 samples/sec • bitsPerSample = 16 bits/sample = 2 bytes/sample • channelNumber = 3 (L, R, humming) • Frame-wise short-time processing (Frame01, Frame02, and so on)
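The frame-wise short-time processing named on this slide can be sketched as follows. This is a minimal illustration, not the speaker's code: the 20 ms frame length, the 10 ms hop, and the helper name `split_into_frames` are assumptions, since the slides give only the sampling rate.

```python
import numpy as np

SAMPLING_RATE = 16000   # samples/sec, from the slide
FRAME_LEN = 320         # assumed: 20 ms at 16 kHz (not stated in the slides)
HOP_LEN = 160           # assumed: 10 ms hop, so consecutive frames overlap by half


def split_into_frames(signal, frame_len, hop_len):
    """Return a 2-D array whose rows are successive overlapping frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return np.stack([signal[i * hop_len: i * hop_len + frame_len]
                     for i in range(n_frames)])


# One second of silence on a single channel stands in for recorded audio.
one_second = np.zeros(SAMPLING_RATE, dtype=np.int16)
frames = split_into_frames(one_second, FRAME_LEN, HOP_LEN)
print(frames.shape)
```

Each row of `frames` can then be handed to a per-frame routine such as energy or pitch estimation.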
  10. 10. Digital Signal Processing: Spectrogram • A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. • It is computed using the Fast Fourier Transform (FFT). https://youtu.be/bCRL5yw8fXA
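As a rough sketch of that definition, a magnitude spectrogram can be built by windowing the signal into frames and taking the FFT of each frame. The frame length, hop size, Hann window, and the function name `spectrogram` are illustrative assumptions, not values from the talk.

```python
import numpy as np


def spectrogram(signal, frame_len=512, hop_len=256):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    frames = np.stack([signal[i * hop_len: i * hop_len + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies: frame_len // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1))


rate = 16000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 1000 * t)          # a 1 kHz test tone, one second long
spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())   # strongest frequency bin on average
peak_hz = peak_bin * rate / 512              # bin index converted back to Hz
print(spec.shape, peak_hz)
```

Plotting `spec` transposed, with time on the horizontal axis and frequency on the vertical axis, gives the familiar spectrogram picture.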
  11. 11. A Real-time Spectrogram http://friture.org/ https://youtu.be/1sbtXqZaGXE • Friture is a program written in Python that analyzes audio input in real time. • It displays audio data as a scope, a spectrum analyzer, or a rolling 2D spectrogram. • I found this program in 2012~2013, and it convinced me that I could move into the Python world to continue my career.
  12. 12. Using Audacity to get an audio signal https://youtu.be/o9DF9SVdcVo • The first step in audio signal processing is to record some audio yourself and play with it.
  13. 13. Sound file PCM format (.wav) • http://soundfile.sapp.org/doc/WaveFormat/ • Compared with text data, audio data is much bigger, and it is usually stored in binary form. • Being familiar with the data format is crucial to processing it.
  14. 14. “See” the audio signal in the raw format
  15. 15. Extract audio header information
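The code for this slide is not in the transcript. One possible way to extract the header fields, sketched here with Python's standard `wave` module on a small in-memory file (the helper name `read_wav_header` and the test file are assumptions for illustration):

```python
import io
import wave


def read_wav_header(file_obj):
    """Return the basic PCM header fields of a .wav file or file-like object."""
    with wave.open(file_obj, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bytes_per_sample": wav.getsampwidth(),
            "sampling_rate": wav.getframerate(),
            "n_frames": wav.getnframes(),
        }


# Build a one-second, mono, 16-bit, 16 kHz WAV in memory so the sketch is self-contained.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

magic = buffer.getvalue()[:4]   # the raw file starts with the ASCII tag "RIFF"
buffer.seek(0)
header = read_wav_header(buffer)
print(magic, header)
```

With a recording saved from Audacity, the same function can be called on an opened file instead of the in-memory buffer.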
  16. 16. Visualize the audio signal as a waveform • Once you can visualize the audio signal, you can confirm that you have read it correctly, • and then you can process it further with advanced signal processing algorithms • such as Pitch Detection and Speech Recognition.
  17. 17. Human-aided pitch tracking by Humming • Pitch detection on a real music signal is not easy. • To simplify the task, I use a trick. • I hum the song while listening to the music and record the humming on a separate channel. • I use this “clean” humming voice to detect the pitch.
  18. 18. Multi-Threading Programming
          import threading
          # Methods of the main class (the class statement is not shown on the slide).
          def init(self):
              # One thread per task: recording (錄音), energy (能量), pitch (基頻), speech recognition (語音辨認)
              self.錄音線 = threading.Thread(target=self.錄音線程)
              self.能量線 = threading.Thread(target=self.f1_能量)
              self.基頻線 = threading.Thread(target=self.f4_基頻)
              self.語音辨認線 = threading.Thread(target=self.f6_語音辨認)
          def start(self):
              self.錄音線.start()
              self.能量線.start()
              self.基頻線.start()
              self.語音辨認線.start()
          • For a real-time system, multi-threaded programming is crucial.
          • At the very least, data acquisition needs its own independent thread.
  19. 19. Audio recording “Thread”
  20. 20. A circular buffer to store the real-time audio signal • I keep a buffer in RAM that holds 16 sec of voice. • Its size is 16 sec × 16000 samples/sec × 2 bytes/sample × 3 channels = 1,536,000 bytes.
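A circular buffer of this size could look like the sketch below. The class name `CircularBuffer` and its methods are hypothetical, the layout (one int16 row per sample, one column per channel) is an assumption, and a real multi-threaded system would also need a lock around reads and writes.

```python
import numpy as np


class CircularBuffer:
    """Fixed-size ring of int16 samples; the oldest data is overwritten first."""

    def __init__(self, seconds=16, rate=16000, channels=3):
        self.data = np.zeros((seconds * rate, channels), dtype=np.int16)
        self.write_pos = 0

    def write(self, block):
        """Copy a (n, channels) block in, wrapping at the end (n <= buffer length)."""
        n = len(block)
        idx = (self.write_pos + np.arange(n)) % len(self.data)
        self.data[idx] = block
        self.write_pos = (self.write_pos + n) % len(self.data)

    def read_latest(self, n):
        """Return the n most recently written samples in time order."""
        idx = (self.write_pos - n + np.arange(n)) % len(self.data)
        return self.data[idx]


buf = CircularBuffer()
size_in_bytes = buf.data.nbytes   # 16 sec * 16000 samples/sec * 3 channels * 2 bytes
print(size_in_bytes)
```

The modulo indexing in `read_latest` handles the case where the requested segment straddles the end of the array, which is the awkward part when pulling a segment out for analysis.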
  21. 21. Pitch Detection Algorithm • Zooming into a speech signal at a scale of 0.01 sec reveals periodic patterns. • The duration of one periodic pattern is called the “pitch period”. • For the A-440 note, the pitch period = 1/440 ≈ 0.0023 sec. • A traditionally popular pitch detection algorithm is based on the autocorrelation method.
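The autocorrelation idea can be sketched as follows: the lag at which a frame best matches a shifted copy of itself is taken as the pitch period. This is a bare-bones illustration rather than the speaker's implementation; the function name `detect_pitch` and the 80 to 1000 Hz search range are assumptions.

```python
import numpy as np


def detect_pitch(frame, rate, f_min=80.0, f_max=1000.0):
    """Estimate the pitch in Hz from the strongest autocorrelation peak."""
    frame = frame - frame.mean()
    # Keep the non-negative lags 0 .. N-1 of the full autocorrelation.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(rate / f_max)   # shortest pitch period considered, in samples
    lag_max = int(rate / f_min)   # longest pitch period considered, in samples
    best_lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return rate / best_lag


rate = 16000
t = np.arange(640) / rate               # a 0.04 sec frame
a440 = np.sin(2 * np.pi * 440 * t)      # synthetic A-440 note
pitch = detect_pitch(a440, rate)
print(round(pitch, 1))
```

Because the lag is a whole number of samples, the estimate is only accurate to within one lag step; for A-440 at 16 kHz the true period is about 36.4 samples, so the result lands a few Hz away from 440.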
  22. 22. Pitch Detection Thread
  23. 23. Pitch Sampling at slower intervals
  24. 24. Pitch Quantization
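The slide content for pitch quantization is not in the transcript. A common approach, shown here only as an assumed sketch, rounds the detected frequency to the nearest equal-tempered semitone using the MIDI convention (A4 = 440 Hz = note 69); the function name `quantize_pitch` is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def quantize_pitch(freq_hz):
    """Round a frequency to the nearest semitone and return its note name."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    octave = midi // 12 - 1           # MIDI note 60 is C4
    return NOTE_NAMES[midi % 12] + str(octave)


# Slightly off-pitch inputs still snap to the nearest note.
notes = [quantize_pitch(f) for f in (261.6, 392.0, 444.4)]
print(notes)
```

Dropping the octave digit gives note letters like the "C C G G A A" sequence shown in the system block diagram.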
  25. 25. Speech Recognition • http://shorturl.at/rxLM4
  26. 26. Speech recognition needs a large-scale database to train the system. Nowadays, deep-learning algorithms play the major role and achieve the best performance.
  27. 27. Speech Recognition in Python https://pypi.org/project/SpeechRecognition/ • Google has a great Speech Recognition API. • This API converts speech (from a microphone) into written text (as Python strings).
  28. 28. The ASR Thread • Get a segment (M frames) of speech ➔ x. • Transform x into an “AudioData” object and send it to the Google Speech Recognition engine to get the recognition output “text”. • Getting speech data out of a circular buffer is the tricky part of the implementation.
  29. 29. import speech_recognition as sr
          def 語音辨認(私):
              # 辨: recognizer, 音: audio, 文: recognized text; lang is set elsewhere
              辨 = sr.Recognizer()
              while 私.語音辨認中:
                  # Get x, a segment of the singing voice, to be wrapped as 音
                  # (this step is elided on the slide).
                  音 = sr.AudioData(x, 私.取樣率, 私.樣本寬)  # sampling rate, sample width in bytes
                  # Run ASR to get the recognition result as 文.
                  try:
                      if lang == 'ja':
                          文 = 辨.recognize_google(音, language='ja')
                      elif lang == 'en':
                          文 = 辨.recognize_google(音, language='en')
                      elif lang == 'zh-TW':
                          文 = 辨.recognize_google(音, language='zh-TW')
                      私.文 = '{} ({})'.format(文, lang)
                  except Exception:
                      私.文 = 'exceptionOccurs!!'
  30. 30. Lyric Transcription • Melodic voice (singing) recognition • Timed text generation • Needs speech recognition and segmentation • Currently this is done by a human, not yet by machine.
  31. 31. Kara OK • Pitch Tracking • Timed Text Displaying https://youtu.be/F1_Xz1c5AEE
  32. 32. Final Demo https://youtu.be/0cdo6ZnBZc8
  33. 33. ご清聴ありがとうございました。 Thank you for listening. 感謝聆聽。 @ PyCon JP 2019, Renyuan Lyu (呂仁園) from TAIWAN (台灣)
