20200309 (FSRI) deep-family_v2-br31_rabbit

“Makers & ML”
DEEPFAMILY -TOY
2020.03.09
Jason, Min

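V1 stationary robot
Speaker recognition
Image recognition
External network
Camera
Microphone
Loudspeaker
Light
Nvidia Jetson Nano
Wi-Fi
Face
Ice cream
Greeting
Conversation
Weather
When an event occurs: notify family (Telegram)
Weather information
Fine dust
Greeting and music playback
② Speech synthesis
① Image recognition
③ Speech recognition
④ Speaker recognition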
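
The "notify family (Telegram)" path in the diagram could be sketched as follows. This is only a minimal sketch that assumes the public Telegram Bot API `sendMessage` method; the bot token, chat id, message prefix, and both function names are hypothetical placeholders and do not come from the slides.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.telegram.org"


def build_send_message_request(bot_token, chat_id, text):
    """Build (but do not send) a Bot API sendMessage request."""
    url = f"{API_BASE}/bot{bot_token}/sendMessage"
    payload = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=payload, method="POST")


def notify_family(bot_token, chat_id, event):
    """Send the event text to the family chat and return the decoded JSON reply."""
    request = build_send_message_request(bot_token, chat_id, f"[DeepFamily] {event}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the message format easy to check on the robot without network access.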

Arduino: under 10,000 KRW
TF Lite possible?
Raspberry Pi Zero
Raspberry Pi 3: around 30,000 KRW
+ camera (10,000-30,000 KRW)
+ speaker
+ microphone
Raspberry Pi 4: around 40,000 KRW
Coral Dev Board: $150, 230,000 KRW
Jetson Nano (Nvidia): 128,000 KRW
- 128 CUDA cores
- YOLOv3 FPS : 2
- YOLOv3-tiny FPS : 15
Khadas VIM3 Pro
Asus Tinker Board
LattePanda
Intel Movidius (NCS2): 100,000 KRW
- toolkit : OpenVINO
Coral USB Accelerator
$74.99 (85,533 KRW)
Raspberry Pi Zero & Coral demo program : 15 seconds
Raspberry Pi 3 Model B & Coral demo program : 5 seconds
Still image (MobileNet_SSD_V2)
Video
http://www.devicemart.co.kr/goods/view?no=11869994
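
To make the frame rates quoted for the Jetson Nano easier to compare with the demo timings, the short sketch below converts FPS to per-frame latency. It uses only the numbers on the slide; the helper name is an illustrative assumption.

```python
def latency_ms(fps):
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Frame rates quoted on the slide for the Jetson Nano.
for name, fps in [("YOLOv3", 2), ("YOLOv3-tiny", 15)]:
    print(f"{name}: {fps} FPS -> {latency_ms(fps):.1f} ms per frame")

# Coral demo timings quoted on the slide: Pi Zero 15 s, Pi 3 Model B 5 s.
print(f"Pi 3 vs Pi Zero speed-up: {15 / 5:.1f}x")
```

At 2 FPS a frame takes about half a second, which is why the tiny model is the more realistic choice for interactive use on this board.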

https://www.youtube.com/watch?v=TiOKvOrYNII
Raspberry Pi 3 vs Raspberry Pi 4 Performance with TensorFlow, TF Lite, & Coral USB Accelerator
Raspberry Pi

[TRY 1] Jetson Nano
https://smartstore.naver.com/mhe/products/4256544002
https://opencourse.tistory.com/219?category=324499
https://www.amazon.com/Yahboom-Robotics-Autopilot-Tracking-Recognition/dp/B07ZYJYGZ5/ref=sr_1_6?keywords=NVIDIA+Jetson+Nano+Developer+Kit&qid=1577628163&s=electronics&sr=1-6

https://www.instructables.com/id/Jetson-Nano-Quadruped-Robot-Object-Detection-Tutor/
SpotMicroAI - SpotMini-Clone with S | RobotShop Community

https://github.com/zxzhaixiang/darknet-nnpack
Darknet with NNPACK

yolov3 with tensorRT on NVIDIA Jetson Nano
https://github.com/min-sangshik/tensorrt_demos

yolo_mark.exe C:/yolo3/baskinrobbins_images C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/fingerprint_test.txt C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/classes.names
[6 classes]
0 Dummy
1 Jason
2 ㅌㅌㅌ
3 ㅌㅌㅌ
4 Woo
5 Woong
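
For reference, each marked object ends up as one line in a Darknet-style label file: the class index followed by the box center and size, all normalized by the image dimensions. The sketch below shows that conversion from a pixel box; the function name and the example box are illustrative assumptions, not values from the project.

```python
def to_yolo_label(class_id, box, image_size):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a Darknet label line.

    image_size is (width, height) in pixels; output values are normalized to [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: class 1 (Jason) in a 400x500 image.
print(to_yolo_label(1, (100, 50, 300, 250), (400, 500)))
# -> 1 0.500000 0.300000 0.500000 0.400000
```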
[darknet.data]
classes = 6
train = C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/br31_train.txt
valid = C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/br31_test.txt
names = C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/classes.names
backup = C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/weights/
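
A common mistake with this setup is a `classes` value that no longer matches the names file. The following sketch parses the `key = value` text shown above and checks the count against the class list; the parser and helper names are hypothetical, and the class list is copied from the slide.

```python
DATA_TEXT = """classes = 6
train = C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/br31_train.txt
valid = C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/br31_test.txt
names = C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/classes.names
backup = C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/weights/
"""

CLASS_NAMES = ["Dummy", "Jason", "ㅌㅌㅌ", "ㅌㅌㅌ", "Woo", "Woong"]


def parse_data_file(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def check_class_count(config, class_names):
    """Return True when 'classes' agrees with the number of class names."""
    return int(config["classes"]) == len(class_names)


config = parse_data_file(DATA_TEXT)
print(check_class_count(config, CLASS_NAMES))
```

The same count also has to be reflected in the .cfg file used for training, which this sketch does not inspect.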
TRAIN
WIN10
1. Class marking (649 jpg files, each containing multiple objects)
2. Split images into train/test sets
splitTrainAndTest.py
(2020.02.25)
3. Train on the 649 files (2020.02.25 7:00 AM ~ )
# darknet detector train XXX.data XXX.cfg weight
C:\darknet_win>darknet detector train C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/darknet.data C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/darknet-yolov3.cfg C:/darknet_win/weight/darknet53.conv.74
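
The slides do not show the contents of splitTrainAndTest.py, so the following is only a sketch of what the train/test split step might look like. The 10% test fraction, the fixed seed, and the function names are assumptions.

```python
import random


def split_train_test(image_paths, test_fraction=0.1, seed=0):
    """Shuffle the image paths reproducibly and split them into (train, test)."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)
    n_test = max(1, round(len(paths) * test_fraction))
    return paths[n_test:], paths[:n_test]


def write_list(paths, list_file):
    """Write one image path per line, the layout Darknet expects for train/valid lists."""
    with open(list_file, "w", encoding="utf-8") as handle:
        handle.write("\n".join(paths) + "\n")
```

In use, the two returned lists would be written with `write_list` to the br31_train.txt and br31_test.txt files referenced in darknet.data. With 649 images and a 10% fraction this gives 584 training and 65 test images.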

TEST
WIN10
# Test with the weights from 3,000 training iterations (training took 12 hours)
C:\darknet_win>darknet detector test C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/darknet.data C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/darknet-yolov3.cfg C:/yolo3/YOLOv3-Training-baskinrobbins-Detector/weights/darknet-yolov3_3000.weights c:/yolo3/baskinrobbins_images/20190211_175627.jpg

TEST
WIN10
http://aitimes.org/archives/1896

VISION
Algorithm performance analysis

The model is trained with both PASCAL VOC 2007 and 2012 data. The mAP is measured with the PASCAL VOC 2012 testing set. For SSD, the chart shows results for 300 × 300 and 512 × 512 input images. For YOLO, it has results for 288 × 288, 416 × 416 and 544 × 544 images. Higher-resolution images for the same model have better mAP but are slower to process.
https://medium.com/@jonathan_hui/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359
Object detection: speed and accuracy comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3)
Material written as of Mar 28, 2018

https://arxiv.org/pdf/1611.10012.pdf

[Lessons learned]
Some key findings from the Google Research paper:
R-FCN and SSD models are faster on average but cannot beat Faster R-CNN in accuracy when speed is not a concern.
Faster R-CNN requires at least 100 ms per image.
Using only low-resolution feature maps for detection hurts accuracy badly.
Input image resolution impacts accuracy significantly. Halving the image width and height lowers accuracy by 15.88% on average but also reduces inference time by 27.4% on average.
The choice of feature extractor impacts detection accuracy for Faster R-CNN and R-FCN, but SSD is less reliant on it.
Post-processing, including non-max suppression (which runs only on the CPU), takes up the bulk of the running time for the fastest models, at about 40 ms, which caps speed at 25 FPS.
If mAP is calculated with a single IoU only, use mAP@IoU=0.75.
With an Inception ResNet network as the feature extractor, using stride 8 instead of 16 improves mAP by a factor of 5% but increases running time by a factor of 63%.
Most accurate
The most accurate single model uses Faster R-CNN with Inception ResNet and 300 proposals. It runs at 1 second per image.
The most accurate model is an ensemble model with multi-crop inference. It achieved state-of-the-art detection accuracy on the 2016 COCO challenge. It uses the vector of average precision to select the five most different models.
Fastest
SSD with MobileNet provides the best accuracy tradeoff among the fastest detectors.
SSD is fast but performs worse on small objects compared with the others.
For large objects, SSD can outperform Faster R-CNN and R-FCN in accuracy with lighter and faster extractors.
Good balance between accuracy and speed
Faster R-CNN can match the speed of R-FCN and SSD at 32 mAP if the number of proposals is reduced to 50.
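
Two terms in these findings, IoU and non-max suppression, can be illustrated with a short sketch. This is a plain greedy NMS written for clarity, not the implementation used in the paper; the box format (x_min, y_min, x_max, y_max) and the 0.5 suppression threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap a kept one too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]
```

The first two boxes overlap with an IoU of 81/119 (about 0.68), so the lower-scoring one is suppressed. That same overlap would not count as a match under the stricter mAP@IoU=0.75 criterion. The 25 FPS cap quoted above is simply 1000 ms / 40 ms.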

YOLO : You Only Look Once
SSD : Single Shot Multibox Detector

VISION
2019
Should we move to CenterNet?
https://heartbeat.fritz.ai/a-2019-guide-to-object-detection-9509987954c3
Best accuracy/speed balance : RetinaNet-101-800, CenterNet-DLA, RefineNet
Highest accuracy : TridentNet
https://arxiv.org/pdf/1904.07850v2.pdf

Best accuracy/speed balance : RetinaNet-101-800, CenterNet-DLA
Highest accuracy : TridentNet

2019.04
Objects as Points
Over YOLO
Over RetinaNet
https://github.com/xingyizhou/CenterNet
ResNet : residual network
DLA : Deep Layer Aggregation
[Training]
ResNet-101 and DLA-34 train in 2.5 days on 8 TITAN-V GPUs, while Hourglass-104 requires 5 days.

[Experiments]
Object detection
Intel Core i7-8086K CPU, Titan Xp GPU, PyTorch 0.4.1, CUDA 9.0, and cuDNN 7.1
AP : average precision
Reported as average precision over all IoU thresholds (AP), and AP at IoU thresholds 0.5 (AP50) and 0.75 (AP75).
Hourglass-104 achieves the best accuracy at a relatively good speed.
DLA-34 gives the best speed/accuracy trade-off. It runs at 52 FPS with 37.4% AP.
This is more than twice as fast as YOLOv3 [45] and 4.4% AP more accurate.

CenterNet
Technique from March 2019
https://github.com/xingyizhou/CenterNet

Third-party resources
Keras Implementation: keras-centernet from see-- and keras-CenterNet from xuannianz.
CenterNet + DeepSORT tracking implementation: centerNet-deep-sort from kimyoon-young.
Blogs on training CenterNet on custom datasets (in Chinese): ships from Rhett Chen and faces from linbior.
https://github.com/xingyizhou/TensorRT-CenterNet

VISION
facebookresearch/Detectron2
https://github.com/facebookresearch/detectron2
What's New
It is powered by the PyTorch deep learning framework.
Includes more features such as panoptic segmentation, densepose, Cascade R-CNN, rotated bounding boxes, etc.
Can be used as a library to support different projects on top of it. We'll open source more research projects in this way.
It trains much faster.

Voice Synthesis

Wavenet (DeepMind 2016.9) https://deepmind.com/blog/article/wavenet-generative-model-raw-audio
Tacotron (Google 2017.3 End-To-End Speech Synthesis MOS 4.00)
Deep Voice 2 (Baidu, 2017.5)
VoiceLoop(Facebook AI Research 2017.7 MOS 3.69)
Parallel WaveNet (Google 2017.11 MOS 4.41)
Tacotron2(Google 2017.12, MOS 4.526)
Deep Voice 3 (BAIDU, 2018.02 Neural Voice Cloning with a Few Samples)
WaveGlow(NVIDIA 2018.11, MOS 3.96) https://nv-adlr.github.io/WaveGlow
Mellotron (2019.10)
image description - https://arxiv.org/search/cs?searchtype=author&query=Catanzaro%2C+B
MoneyBrain speech synthesis technology, 2019.5.3
Selvas AI deep-learning-based speech synthesis solution: xVoice, 2018.9
https://heartbeat.fritz.ai/a-2019-guide-to-speech-synthesis-with-deep-learning-630afcafb9dd

[Tacotron implementation code]
• keithito (July 2017)
- The representative Tacotron implementation
- https://github.com/keithito/tacotron
• carpedm20 (October 2017)
- Korean speech generation with a Tacotron model, based on the keithito code
- Extended to the multi-speaker model proposed in Deep Voice 2
- TensorFlow 1.3 → does not work on recent versions
- https://github.com/carpedm20/multi-speaker-tacotron-tensorflow
• Rayhane-mamah (April 2018)
- The representative Tacotron 2 implementation, based on the keithito and r9y9 code
- Also includes a WaveNet implementation
- https://github.com/Rayhane-mamah/Tacotron-2
• hccho2 (December 2018) - https://aifrenz.github.io/present_file/Tacotron-AIFrenz-20190424.pdf
- Korean Tacotron + WaveNet, runs on recent TensorFlow versions
- Faster convergence (speed-up)
- https://github.com/hccho2/Tacotron-Wavenet-Vocoder
- Tacotron 2 (December 2017)
- r9y9 code (wavenet vocoder) released (January 2018)
- WaveNet (September 2016)
- ibab code released (September 2016)
- Tacotron paper published (March 2017)
train step : 106000 (GTX 1080 Ti - 18h)
moon data : 1,125 examples (0.89 hours)
son data : 20,105 examples (19.10 hours)

[Tacotron implementation code] Tacotron2
Rayhane-mamah
r9y9
wavenet-vocoder (includes a local-condition implementation)
* local condition : a mel spectrogram is fed in as the condition. Because the mel spectrogram is shorter than the raw audio, an upsampling step is needed; the upsampling uses conv2d_transpose.
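
The upsampling of the local condition can be pictured with a small transposed-convolution sketch. This is a single-channel, 1-D simplification written in plain Python, not the TensorFlow conv2d_transpose call the repositories use. The stride of 4 and the all-ones kernel are illustrative assumptions; in a real vocoder the stride follows the hop size and the kernel weights are learned.

```python
def conv1d_transpose(frames, kernel, stride):
    """1-D transposed convolution along time.

    Each input frame adds a scaled copy of `kernel` to the output at offset t * stride,
    so the output length is (len(frames) - 1) * stride + len(kernel).
    """
    out_len = (len(frames) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for t, value in enumerate(frames):
        for k, weight in enumerate(kernel):
            out[t * stride + k] += value * weight
    return out


# Three conditioning frames stretched to 12 samples (4 samples per frame).
mel_track = [1.0, 2.0, 3.0]
upsampled = conv1d_transpose(mel_track, kernel=[1.0, 1.0, 1.0, 1.0], stride=4)
print(len(upsampled), upsampled)
```

With a kernel as long as the stride and all weights equal to one, the operation reduces to repeating each frame, which shows how a short mel sequence is aligned with the longer raw-audio sequence.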
Tacotron
keithito
Vocoder : Griffin-Lim
Tacotron (Korean) + Deep Voice (multi-speaker)
carpedm20
Vocoder : Griffin-Lim
Runs only on TensorFlow 1.3
ibab
wavenet
Tacotron + WaveNet vocoder + Korean TTS
hccho2
Also works on TensorFlow 1.8 and later
Needs more than 1000K training steps to produce noise-free results

3D deep learning trends
https://www.youtube.com/watch?v=rqR-z2mNqmM

“If you are not dreaming about the future or working to improve technology right now, you are as good as falling behind.”
Gwynne Shotwell (SpaceX President, COO)

Thank you
(facebook.com/sangshik, mikado22001@yahoo.co.kr)
