
My works in github, etc.


Brief description of works uploaded to github, etc.

Published in: Technology

  1. 20161216 Works in github, etc. (except the code for DRL of Montezuma's Revenge; see the other slide deck for that) Takayoshi Iitsuka (Staff Service Engineering, Hitachi Ltd OB)
  2. Analysis of the intermediate layer of VAE (1)  Usually the intermediate layer of a VAE (Variational Auto-Encoder) is visualized as a 2D figure like the following (MNIST example).  But in general the dimension of the intermediate layer is much higher, so a high-dimensional analysis of the structure of the intermediate layer seems important.
  3. Analysis of the intermediate layer of VAE (2)  Experiments showed that only 11 dimensions are active in the intermediate layer even though the dimension of the intermediate layer is 30 (the experiment code does not include any terms for sparseness).
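The slide's count of active dimensions can be reproduced with a simple diagnostic, sketched here under assumptions: `z` stands for the encoder outputs over the data set, and the 0.1 threshold is an arbitrary illustrative choice, not a value from the original experiment.

```python
import numpy as np

def count_active_dims(z, threshold=0.1):
    """Count latent dimensions whose standard deviation across the
    data set exceeds a threshold; the others carry no information."""
    return int((z.std(axis=0) > threshold).sum())

# Toy stand-in for encoder outputs: 30-D codes in which only the
# first 11 dimensions actually vary.
rng = np.random.default_rng(0)
z = np.zeros((1000, 30))
z[:, :11] = rng.normal(size=(1000, 11))
print(count_active_dims(z))  # 11
```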
  4. Analysis of the intermediate layer of VAE (3)  Approximated the distribution of the images of each numeric character by a 30D sphere => 10.7% error.
  5. Analysis of the intermediate layer of VAE (4)  Approximated the distribution by a multivariate normal distribution (a spheroid) => 4.8% error (much better!)
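The sphere-versus-spheroid comparison of the last two slides amounts to nearest-mean classification under Euclidean versus Mahalanobis distance. A minimal sketch on synthetic 2-D data (all names and numbers here are illustrative, not from the original experiment):

```python
import numpy as np

def fit_gaussians(X, y):
    """Fit a mean and covariance (a "spheroid") per class."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        models[c] = (mu, np.linalg.inv(cov))
    return models

def classify(models, x, spherical=False):
    """Nearest class: Euclidean distance for the sphere model,
    Mahalanobis distance for the spheroid model."""
    best, best_d = None, np.inf
    for c, (mu, inv_cov) in models.items():
        d = x - mu
        dist = d @ d if spherical else d @ inv_cov @ d
        if dist < best_d:
            best, best_d = c, dist
    return best

# Class 0 is elongated along x; class 1 is a tight blob at (4, 0).
X = np.array([[-3, 0], [3, 0], [-2, 0.1], [2, -0.1], [0, 0.05], [0, -0.05],
              [3.9, 0], [4.1, 0], [4, 0.1], [4, -0.1], [3.95, 0.05], [4.05, -0.05]],
             dtype=float)
y = np.array([0] * 6 + [1] * 6)
models = fit_gaussians(X, y)

x = np.array([2.5, 0.0])
print(classify(models, x, spherical=True))   # 1: nearest mean in Euclidean distance
print(classify(models, x, spherical=False))  # 0: the elongated class absorbs the point
```

The flip between the two answers is exactly why the spheroid model fits the data better than the sphere model.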
  6. Analysis of the structure of the MNIST data set  Assumption: the simple structure of the intermediate layer of the VAE comes from the simplicity of the structure of the MNIST data set.  Result: assumption CONFIRMED! The 784 (= 28 × 28) dimensional space of the MNIST data set has a rather simple structure that can be approximated by 10 spheroids. This analysis used 50,000 images; the previous analysis used 10,000 images, and repeating it with 50,000 images gives 5.8% error. So the original structures in the 784D space have almost the same compactness as the 30D structure in the intermediate layer of the VAE. This result looks natural because the VAE is unsupervised learning and adds no extra information.
  7. Scripts to fully utilize GCP preemptible VMs  Background: GCP (Google Cloud Platform) preemptible VMs are very cheap (about 1/3 of the cost), but they may stop at any time => some control is mandatory.  Published scripts on github that fully utilize GCP preemptible VMs, for people who try my A3C+OHL code.  Effect: the scripts enable full use of IT resources; they can use 4 vCPU × 8 VMs under the free-trial conditions (2 months, $300).  Setup: 8 GCP preemptible VMs (4 vCPU each, 2-month $300 free trial) plus a 1 vCPU AWS VM (1-year free trial) that periodically watches the VMs and restarts stopped ones (once per minute) and creates a web page summarizing the status of training (once per 5 minutes).
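The watcher loop described above might be sketched as follows. This is an illustrative sketch, not the published scripts: the column layout of `gcloud compute instances list` varies with configuration (the parser below only assumes the name is the first field and the status the last), and zone handling is simplified.

```python
import subprocess
import time

def parse_stopped(listing):
    """Return names of instances whose STATUS column (the last
    field) is TERMINATED, from `gcloud compute instances list`."""
    stopped = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[-1] == "TERMINATED":
            stopped.append(fields[0])
    return stopped

def watch_and_restart(zone, interval=60):
    """Once per `interval` seconds, restart any preempted VM."""
    while True:
        out = subprocess.run(["gcloud", "compute", "instances", "list"],
                             capture_output=True, text=True).stdout
        for name in parse_stopped(out):
            subprocess.run(["gcloud", "compute", "instances", "start",
                            name, "--zone", zone])
        time.sleep(interval)

# Example of the tabular output the parser expects:
sample = """\
NAME      ZONE           MACHINE_TYPE   INTERNAL_IP  STATUS
worker-1  us-central1-a  n1-standard-4  10.128.0.2   RUNNING
worker-2  us-central1-a  n1-standard-4  10.128.0.3   TERMINATED"""
print(parse_stopped(sample))  # ['worker-2']
```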
  8. K-means classification of the MNIST dataset  In the Do2dl research group we read a book on AI that contained an explanation of the k-means classification method, and I said it might be interesting to apply k-means to the MNIST dataset.  Because nobody other than me had time, I wrote the code and uploaded it to github.  Actually, the following pages of the book covered a more sophisticated classification method, the EM algorithm.  I compared both results: when starting with random images, k-means was better (50% correct) than the EM algorithm (less than 50% correct).  When starting with images created from the centers of 20 images of each digit, the EM algorithm became better (71.5%) than k-means (60.5%).
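A plain k-means loop of the kind the slide describes can be sketched in NumPy; the naive first-k initialization and the toy data below are mine, not the published code.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain k-means: alternately assign points to the nearest
    centroid and move each centroid to its cluster mean."""
    centers = X[:k].copy()  # naive init: first k points (k-means++ is better)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():  # guard against empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated 2-D blobs; points alternate between them.
X = np.array([[0.0, 0.0], [10.0, 10.0], [0.2, 0.1],
              [9.8, 10.2], [0.1, 0.3], [10.1, 9.9]])
labels, centers = kmeans(X, 2)
print(labels.tolist())  # [0, 1, 0, 1, 0, 1]
```

On MNIST, each of the k clusters would then be mapped to a digit by majority vote before measuring accuracy.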
  9. Tools for renaming titles in a BD-recorder  Background: my BD-recorder has a web interface for renaming the titles in it, but renaming many titles takes time.  Developed tools that rename titles in the BD-recorder using renaming rules; a renaming rule replaces a string/regexp in a title with another string. 1. Determine the new titles by setting renaming rules (the rules are automatically saved and reused). 2. Automatically rename the titles through the BD-recorder's web interface.
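Such renaming rules might look like ordered (pattern, replacement) pairs applied with regular expressions. A sketch with made-up example rules (the actual rule format of the published tool is not shown in the slide):

```python
import re

# Each rule is (pattern, replacement); patterns are regular
# expressions applied in order to every title.
RULES = [
    (r"\[HD\]\s*", ""),           # drop a quality tag
    (r"#(\d+)", r"Episode \1"),   # "#12" -> "Episode 12"
    (r"\s{2,}", " "),             # squeeze repeated spaces
]

def apply_rules(title, rules=RULES):
    for pattern, repl in rules:
        title = re.sub(pattern, repl, title)
    return title.strip()

print(apply_rules("[HD] My Show  #12"))  # My Show Episode 12
```

Saving `RULES` to a file and reloading it next time gives the "automatically saved and reused" behavior the slide mentions.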
  10. Executable version of the Python Tutorial  After learning Python from the Python Tutorial, I felt it would be convenient to execute the examples in the tutorial directly.  So I extracted the examples from the tutorial as Python scripts and published them on github one week after I started learning Python.  They can be edited and executed directly, and they output the example code together with its execution result (including errors).  After that I learned Jupyter Notebook, so I converted the entire tutorial to Jupyter notebooks and published it on github a week later.
  11. Program template for scraping with NodeJS and Selenium  I developed a tool to download the content of an internet school (to enable offline study), published it on github, and announced it on my twitter.  I received a complaint, so I deleted it immediately.  But I think a general program framework for scraping is useful for many people and is not illegal, so I published a program template for scraping with NodeJS and Selenium on github.
  12. Pico-os for MicroPython  I got an ESP8266 (a very cheap microprocessor with WiFi, i.e. less than $5) on which MicroPython was embedded (the presenter of an introductory Python seminar gave two ESP8266 boards to the audience).  Unfortunately, the method the presenter used to write MicroPython to the EEPROM of the ESP8266 was not complete (I could not write any file to the filesystem on the EEPROM).  So I built an environment for writing to the EEPROM myself and re-wrote MicroPython to the ESP8266.  I also measured the performance of MicroPython on the ESP8266 (roughly 1/1000 of an Intel CPU; memory size is also roughly 1/1000).  To make experiments with MicroPython easy, I wrote a very small interface library for using MicroPython on the ESP8266, named it "Pico-os", and uploaded it to github.  I also wrote about it in my blog and on twitter.
  13. 1000× speedup of re-calculation in a big EXCEL sheet  There was a very big EXCEL sheet containing over 120,000 lines, and re-calculating it took several hours.  They re-calculate the sheet 13 times every month, and it needs human intervention in the middle of the re-calculation.  I investigated the expressions in the cells and found that repeated use of COUNTA and of VLOOKUP in exact-match mode looked like the cause (both functions need O(n) time when they search n lines).  I reduced COUNTA to a single use and switched VLOOKUP to approximate-match mode, which needs only O(log n) time (actually a more complex expression is needed).  With these alterations, the re-calculation time became a few seconds (more than a 1,000× speedup).
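The complexity argument can be illustrated outside EXCEL: exact-match lookup scans the column linearly, while approximate-match lookup on a sorted column is a binary search. A sketch with hypothetical data, where `bisect` plays the role of approximate-match VLOOKUP:

```python
import bisect

n = 120_000
keys = list(range(n))                  # sorted lookup column
table = {k: f"row-{k}" for k in keys}  # hypothetical row data

def vlookup_exact(key):
    """Exact-match VLOOKUP: linear scan, O(n) per call."""
    for k in keys:
        if k == key:
            return table[k]
    return None

def vlookup_approx(key):
    """Approximate-match VLOOKUP on a sorted column: binary search
    for the largest key <= `key`, O(log n) per call."""
    i = bisect.bisect_right(keys, key) - 1
    return table[keys[i]] if i >= 0 else None

print(vlookup_approx(119_999))  # row-119999
```

Over 120,000 rows that is roughly 120,000 comparisons versus about 17 per lookup, which is the source of the observed speedup.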
  14. Thank you for listening.