SeRanet introduction


  1. SeRanet: super-resolution software through deep learning. https://github.com/corochann/SeRanet
  2. Table of contents: Introduction (Machine learning, Deep learning); SRCNN (Problem, Introduction of previous works: "Image Super-Resolution Using Deep Convolutional Networks", waifu2x); SeRanet (Splice, Fusion, CNN model); Result (Performance); Conclusion
  3. Table of contents: Introduction (Machine learning, Deep learning)
  4. What is machine learning? There are three major categories of machine learning:
     ・Supervised learning: a pile of input data and "correct"/labeled output data is given during training. Goal: train the software to output the correct/labeled value for a given input. Ex. image recognition (input: image data; output: recognition result such as human, cat, or car) and voice recognition (input: human voice data; output: the text the speaker says).
     ・Unsupervised learning: only a large amount of input data is given. Goal: categorize the data based on its statistics (find the structure existing in the data). Ex. categorizing types of cancer; linking users with similar interests in a web application for recommendation.
     ・Reinforcement learning: an agent chooses an "action" inside a given "environment"; the action affects the environment and the agent receives a "reward". Goal: find the actions that maximize the reward the agent can gain. Ex. DeepMind DQN, AlphaGo; a robot learning by itself how to control its own parts.
  5. (Same slide as above, with one point highlighted:) SeRanet uses supervised learning: pairs of input and correct output pictures are given during training.
  6. Deep learning: "Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations." (cited from Wikipedia, "Deep learning") [Figure: a multi-layer network mapping input to output]
  7. Table of contents: SRCNN (Problem, Introduction of previous works: "Image Super-Resolution Using Deep Convolutional Networks", waifu2x)
  8. Super resolution task by machine learning: problem definition. You are given a picture compressed to half size; recover the original picture and output it. Training phase: the goal of this machine learning is to construct a map that converts the compressed picture into the original picture (as closely as possible). [Figure: compressed picture (half size) → map → original picture]
  9. Super resolution task by machine learning: after training. Input: an arbitrary picture → output: a twice-size picture with super resolution. [Figure: picture to be enlarged → map obtained by machine learning → twice-size, high-quality picture]
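As a rough sketch of how such training pairs can be prepared (the 2×2 box-average downscale here is an illustrative choice, not necessarily the shrink method SeRanet actually uses):

```python
import numpy as np

def make_training_pair(original, factor=2):
    """Create a (compressed, original) training pair by box-averaging.

    `original` is an H x W float array whose sides are divisible by `factor`.
    The downscaling method is an assumption for illustration; any shrink
    (bicubic, area average, ...) produces a usable pair.
    """
    h, w = original.shape
    compressed = original.reshape(h // factor, factor,
                                  w // factor, factor).mean(axis=(1, 3))
    return compressed, original

img = np.arange(16, dtype=np.float64).reshape(4, 4)
x, t = make_training_pair(img)
print(x.shape, t.shape)  # (2, 2) (4, 4)
```

During training, the map is then optimized so that its output for `x` becomes as close as possible to `t`.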
  10. Representation of the "map": a deep convolutional neural network (CNN) is used, the current trend for image recognition tasks.
  11. Previous work ① "Image Super-Resolution Using Deep Convolutional Networks", Chao Dong, Chen Change Loy, Kaiming He and Xiaoou Tang, https://arxiv.org/abs/1501.00092. The original paper which suggests "SRCNN"; it reports that superior super-resolution results are obtained using a convolutional neural network. In these slides, this work will be denoted as the "SRCNN paper". http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html
  12. Algorithm summary
     1. Read the picture/image file.
     2. Enlarge the picture to twice the size in advance, as preprocessing (can be generalized to n-times size).
     3. Convert RGB format into YCbCr format and extract the Y channel.
     4. Normalization: convert the value range from 0-255 to 0-1.
     5. Input the Y channel data into the CNN; as output, we obtain Y channel data with normalized values.
     6. Revert the value range to 0-255.
     7. The CbCr channels are enlarged by a conventional method such as bicubic; compose the obtained Y channel and CbCr channels to get the final result.
     ※ Steps 3 and 7 can be skipped when you construct a CNN whose input/output is the RGB channels.
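Steps 3-6 can be sketched as follows (the BT.601 luma weights are an assumption; the CNN itself is omitted and only the surrounding conversions are shown):

```python
import numpy as np

def rgb_to_y(rgb):
    """Step 3: extract the Y (luminance) channel from an H x W x 3 RGB image,
    using the BT.601 weights (the same definition OpenCV's cvtColor uses)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def preprocess(rgb):
    """Steps 3-4: Y extraction, then normalize 0-255 -> 0-1."""
    return rgb_to_y(rgb.astype(np.float64)) / 255.0

def postprocess(y01):
    """Step 6: revert the normalized CNN output to the 0-255 range."""
    return np.clip(y01 * 255.0, 0, 255).astype(np.uint8)

white = np.full((2, 2, 3), 255, dtype=np.uint8)
y = preprocess(white)        # all ones
print(postprocess(y))        # all 255
```

Step 7 then resizes the Cb/Cr planes with a conventional method and recombines them with this Y output.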
  13. Remark on the algorithm ①: deal with only the Y channel as the input/output data of the CNN.
     - Humans are more sensitive to luminance (Y) than to color-difference chrominance (CbCr).
     - Dealing only with the Y channel reduces the complexity of learning the CNN.
     [Figure: YCbCr decomposition into Y, Cb, and Cr channels]
  14. (Same slide as above, with one addition:) Is RGB-channel training difficult? The SRCNN paper surveys the case where the CNN is trained on 3-channel RGB input/output data. The result suggests that the Y-channel CNN performs better than the RGB CNN when the CNN is not so big.
  15. Remark on the algorithm ②: enlarge the picture/image data in advance, before inputting it to the CNN. The SRCNN paper uses the bicubic method, and waifu2x (explained later) uses the nearest-neighbor method, to enlarge the picture before inputting the data to the CNN. [Reason] The input and output picture sizes are almost the same when you implement a convolutional neural network with a machine learning library. (※ Strictly speaking, each convolution layer shrinks the output picture by kernel size - 1 pixels.) [Figure: enlarge picture and extract Y channel → CNN → compose; for the Cb/Cr channels the picture is enlarged without the CNN]
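The shrinkage mentioned in the remark can be computed directly (zero-padding each layer avoids it; the 64-pixel input width below is an arbitrary illustration):

```python
def valid_output_size(size, kernel_sizes):
    """Output side length after stacking unpadded ('valid') convolutions:
    each k x k convolution shrinks the side by k - 1 pixels."""
    for k in kernel_sizes:
        size -= k - 1
    return size

# SRCNN-paper-style kernels (9, 5, 5) applied to a 64-pixel-wide input.
print(valid_output_size(64, [9, 5, 5]))  # 48: the picture loses 16 pixels
```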
  16. Previous work ① CNN model: the SRCNN paper's model, one example (many other parameter sets are tested in the paper). A relatively shallow CNN architecture with a big kernel size.

     CNN                Layer1     Layer2     Layer3
     In channel         1          32         64
     Out channel        32         64         1
     Kernel size        9          5          5
     # of parameters    2628       51264      1664
     # of convolutions  2592×4WH   51200×4WH  1600×4WH

     ※ # of parameters = in channels × out channels × (kernel size)² + out channels
     ※ # of convolutions per pixel = in channels × out channels × (kernel size)²
     Suppose the picture size before enlarging is w × h pixels; then the size after enlarging is 2w × 2h, so convolution is needed for 4wh pixels in total.
     Total # of parameters: 55556. Total # of convolutions: 55392×4WH.
  17. Previous work ② waifu2x: https://github.com/nagadomi/waifu2x. The term "waifu" comes from the Japanese pronunciation of "wife" (Japanese fans apply the term "wife" to their favorite female anime characters). Open-source software, originally published to enlarge art-style images; it also supports photo style now. You can try the application on a demo server: http://waifu2x.udp.jp/
  18. Previous work ② waifu2x: waifu2x is open-source software, which lets other software engineers develop related software; many derivative programs have been published.
     [Related links (in Japanese)]
     ・A list of waifu2x and its derivative software: http://kourindrug.sakura.ne.jp/waifu2x.html
     ・An easy-to-follow explanation of the waifu2x algorithm: https://drive.google.com/file/d/0B22mWPiNr-6-RVVpaGhZa1hJTnM/view
  19. Previous work ② waifu2x CNN model: because the convolution kernel size is kept small at 3, a deeper neural network is constructed. A deep CNN architecture with a small kernel size.

     CNN                Layer1   Layer2    Layer3     Layer4     Layer5     Layer6      Layer7
     In channel         1        32        32         64         64         128         128
     Out channel        32       32        64         64         128        128         1
     Kernel size        3        3         3          3          3          3           3
     # of parameters    320      9248      18496      36928      73856      147584      1153
     # of convolutions  288×4WH  9216×4WH  18432×4WH  36864×4WH  73728×4WH  147456×4WH  1152×4WH

     Total # of parameters: 287585. Total # of convolutions: 287136×4WH.
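The parameter counts in this table can be reproduced with the standard "weights plus biases" formula from the earlier slide:

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of one k x k convolution layer:
    in_ch * out_ch * k**2 weights and out_ch biases."""
    return in_ch * out_ch * k * k + out_ch

# waifu2x-style 7-layer model from the table above (all 3x3 kernels).
channels = [1, 32, 32, 64, 64, 128, 128, 1]
per_layer = [conv_params(i, o, 3) for i, o in zip(channels, channels[1:])]
print(per_layer)       # [320, 9248, 18496, 36928, 73856, 147584, 1153]
print(sum(per_layer))  # 287585, matching the slide's total
```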
  20. What is special about the SRCNN task: the big differences from the image recognition task.
     1. Position sensitivity is required.
     + Image recognition task: translation-invariant behavior is welcome, and max pooling or strides are often utilized.
     + SRCNN task: translation-variant behavior is necessary, since super resolution requires position-dependent output.
     2. The feature map size does not shrink during the CNN processing.
     → As the number of feature maps increases, the amount of calculation increases, so the speed/memory restrictions are severe. Required memory for the CNN ≈ the volume of the rectangles in the CNN model figure. In an image recognition task, the feature maps usually become smaller in the deeper layers of the CNN, so the number of feature maps can be larger when the maps are smaller.
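A back-of-the-envelope estimate for point 2 (the 512×512 input size is an arbitrary illustration; the channel widths are taken from the waifu2x table above):

```python
def feature_map_bytes(h, w, channels_per_layer, bytes_per_value=4):
    """Rough float32 memory for holding every layer's feature maps when,
    as in SRCNN-style models, the H x W size never shrinks."""
    return h * w * sum(channels_per_layer) * bytes_per_value

# Illustrative: a 512x512 image through waifu2x-style channel widths.
mb = feature_map_bytes(512, 512, [32, 32, 64, 64, 128, 128, 1]) / 1e6
print(round(mb, 1))  # ~470.8 MB of activations, before any gradients
```

With pooling, each halving of the map size cuts this cost by 4, which is why recognition networks can afford far more feature maps in deep layers.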
  21. Table of contents: the explanation of the SeRanet project starts from here, with the ideas behind SeRanet. SeRanet (Idea 1: Splice, Idea 2: Fusion, SeRanet CNN model)
  22. SeRanet Idea 1: Splice. The input is a w × h picture (not pre-enlarged), and the output is a 2w × 2h picture → introduce the "Split" and "Splice" concepts. Split: 4 branches of the neural network (NN), each of size w × h, are created. Splice: the 4 branches are merged to obtain one 2w × 2h image. [Figure: input size w × h → Split → LU/RU/LD/RD branches → Splice → output size 2w × 2h]
  23. SeRanet Idea 1: Splice. After the split, the 4 branches of the neural network correspond to the left-up (LU), right-up (RU), left-down (LD), and right-down (RD) pixels of the enlarged picture. At the splice phase, the 4 branches of the CNN are combined/spliced to get the twice-size image. [Figure: each input pixel (i, j) produces the four neighboring output pixels, one from each branch]
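A minimal numpy sketch of the splice step (this interleaving is the same operation known elsewhere as "pixel shuffle" / sub-pixel upsampling):

```python
import numpy as np

def splice(lu, ru, ld, rd):
    """Interleave four w x h branch outputs into one 2w x 2h image:
    LU/RU/LD/RD fill the even/odd rows and columns of the output."""
    h, w = lu.shape
    out = np.empty((2 * h, 2 * w), dtype=lu.dtype)
    out[0::2, 0::2] = lu  # left-up pixels
    out[0::2, 1::2] = ru  # right-up pixels
    out[1::2, 0::2] = ld  # left-down pixels
    out[1::2, 1::2] = rd  # right-down pixels
    return out

branches = [np.full((2, 2), v) for v in (1, 2, 3, 4)]
print(splice(*branches))
# [[1 2 1 2]
#  [3 4 3 4]
#  [1 2 1 2]
#  [3 4 3 4]]
```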
  24. The effect of introducing Splice → flexibility of neural network modelling. [Figure: input w × h → 1st phase → Split → 2nd phase → Splice → 3rd phase → output 2w × 2h]
     1st phase: the image size is w × h (before enlarging), so the amount of calculation is 1/4 of the 3rd phase. → Larger feature maps and kernel sizes are acceptable in this phase.
     2nd phase: a 4-branch CNN with image size w × h. The total calculation is the same as the 3rd phase, but the parameters learned in each branch (LU, RU, LD, RD) can differ. → The model's representation power grows. Another advantage is that memory consumption is smaller than the 3rd phase, due to the image size.
     3rd phase: image size 2w × 2h; the last phase, producing the output. Memory consumption and the amount of calculation are 4 times larger than the 1st phase.
  25. Table of contents: SeRanet (Idea 1: Splice, Idea 2: Fusion, SeRanet CNN model)
  26. Fusion: a method introduced in the colorization paper "Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification", Satoshi Iizuka, Edgar Simo-Serra and Hiroshi Ishikawa, http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/. The research aims to convert a monochrome input image into a colorful output image through supervised CNN learning. https://github.com/satoshiiizuka/siggraph2016_colorization [Figure: monochrome input → colorized output]
  27. Neural network used in the colorization paper. Upper CNN: the main CNN used for colorization. Lower CNN: a CNN trained for image classification. A CNN with a different purpose is thus utilized to help improve the performance of the main CNN. The lower CNN is expected to learn global features, e.g. "the image was taken outdoors". The research reports that colorization accuracy increased by "fusing" the global features into the main CNN. Examples of how global features help colorization (read the paper for details): it reduces mistakes such as using sky color at the top of an image when the picture was taken indoors, or using brown ground color when the picture was taken at sea. http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/
  28. SeRanet Idea 2: Fusion. SeRanet combines/fuses 2 types of CNN in the 1st phase. Purpose: combining different types of non-linear activations to get a wider variety of model representations. The upper CNN uses Leaky ReLU for activation; the lower CNN uses sigmoid. ※ I want to use a convolutional restricted Boltzmann machine with pre-training for the lower CNN in future development.
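A toy sketch of the fusion idea (dense layers stand in for the real convolutional branches; the weight shapes and the Leaky-ReLU slope are placeholder assumptions):

```python
import numpy as np

def leaky_relu(x, slope=0.1):  # slope value is an assumption
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(features, upper_w, lower_w):
    """Run the same input through two branches with different
    non-linearities and concatenate them along the channel axis."""
    upper = leaky_relu(features @ upper_w)  # Leaky-ReLU branch
    lower = sigmoid(features @ lower_w)     # sigmoid branch
    return np.concatenate([upper, lower], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))  # 5 samples, 8 input features
out = fuse(x, rng.standard_normal((8, 4)), rng.standard_normal((8, 4)))
print(out.shape)  # (5, 8): 4 channels from each branch
```

The later layers then see both activation styles at once, which is the "wider variety of model representations" the slide refers to.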
  29. SeRanet CNN model: the CNN model of seranet_v1. [Figure: 1st phase (with Fusion, ×2 streams) → Split → 2nd phase (×4 branches) → Splice → 3rd phase]

     CNN                Layer1   Layer2     Layer3     Layer4     Layer5     Layer6
     In channel         3        64         64         256        512        256
     Out channel        64       64         128        512        256        128
     Kernel size        5        5          5          1          1          3
     # of parameters    4864     102464     204928     131584     131584     295040
     # of convolutions  4800×WH  102400×WH  204800×WH  131072×WH  131072×WH  294912×WH

     CNN                Layer7      Layer8      Layer9      Layer10
     In channel         128         128         128         128
     Out channel        128         128         128         3
     Kernel size        3           3           3           3
     # of parameters    147584      147584      147584      3584
     # of convolutions  147456×4WH  147456×4WH  147456×4WH  3456×4WH

     Total # of parameters: 3303680. Total # of convolutions: 1159150×4WH.
  30. Comparison

     Model               SRCNN paper  waifu2x     SeRanet_v1
     Total parameters    55556        287585      3303680
     Total convolutions  55392×4WH    287136×4WH  1159150×4WH

     Parameters: about 10 times more than waifu2x. Convolutions: about 4 times more. The number of parameters increases more than the number of convolutions (calculations) does; this is because SeRanet has position-dependent parameters (LU, RU, LD, RD). → Question: does the increase in parameters and calculation result in better performance?
  31. Table of contents: Result (Performance). Comparison between various resize methods: Bicubic and Lanczos (conventional resize methods); waifu2x and SeRanet (resize through CNN). Slide forward/backward to compare; it may be difficult to see the difference on SlideShare, so see the link for a comparison: https://github.com/corochann/SeRanet
  32. Result: input picture
  33. Result: Bicubic (OpenCV's resize method is used)
  34. Result: Lanczos (OpenCV's resize method is used)
  35. Result: waifu2x (http://waifu2x.udp.jp/, style: photo, noise reduction: none, upscaling: 2x)
  36. Result: SeRanet
  37. Result: original data (ground truth, for reference)
  38. Result: original data (ground truth, for reference). The difference can be found in the fine details of the 1st picture and the thin stalk in the 4th picture (high-frequency components).
  39. Result: performance comparison between various resize methods (*based on personal impression; comparison by a concrete measurement is not done yet). The conventional resize methods (Bicubic, Lanczos) look almost the same as each other; the results of resizing through a CNN (waifu2x, SeRanet) look different.
  40. Summary: SeRanet
     ・A big CNN is used (9 layers deep, 3303680 total parameters).
     ・RGB 3 channels are used for the CNN input/output instead of only the Y channel.
     ・Split and Splice: the left-up, right-up, left-down, and right-down CNN branches use different parameters.
     ・Fusion: different non-linearities are combined for flexibility of model representation.
     ・Convolutional RBM pretraining.
     The performance is still not mature; it may be improved further to get output closer to the original image.
  41. Finally:
     + The project is open source, on GitHub: https://github.com/corochann/SeRanet
     + Improvement ideas and discussion are welcome.
     + My blog: http://corochann.com/
     * If there is any inappropriate citation, please let me know.
     * SeRanet is a personal project and I may have misunderstood something; please let me know if there is wrong information.
