My works in github, etc.
1. Works in github, etc.
(except the code for DRL of Montezuma's Revenge; see the separate slides for that)
Takayoshi Iitsuka
(Staff Service Engineering, Hitachi Ltd. alumnus)
20161216
2. Analysis of the intermediate layer of VAE (1)
Usually the intermediate layer of a VAE (Variational Autoencoder) is
visualized as a 2D figure like the following (MNIST example).
But generally the dimension of the intermediate layer is much higher,
so a high-dimensional analysis of the structure of the intermediate
layer seems important.
https://github.com/Itsukara/vae-hidden-layer
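A minimal sketch of such a 2D visualization (the arrays below are stand-ins; in the real analysis they would be the VAE encoder's latent codes and the true MNIST labels):

    import numpy as np
    import matplotlib.pyplot as plt

    # Stand-in data: replace z with the VAE encoder's latent codes and
    # labels with the true MNIST digit labels.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(1000, 30))
    labels = rng.integers(0, 10, size=1000)

    # Plot the first two latent dimensions, colored by digit.
    plt.scatter(z[:, 0], z[:, 1], c=labels, cmap="tab10", s=4)
    plt.colorbar(label="digit")
    plt.xlabel("latent dim 0")
    plt.ylabel("latent dim 1")
    plt.show()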
3. Analysis of the intermediate layer of VAE (2)
Experiments showed that only 11 dimensions are active in the
intermediate layer, even though the dimension of the intermediate layer is 30.
(The experiment code does not include terms for sparseness.)
https://github.com/Itsukara/vae-hidden-layer
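One way to detect such inactive dimensions (a sketch, not the published code) is to compute the variance of each latent dimension over the encoded data set; dimensions the encoder does not use collapse to near-zero variance. The threshold below is a heuristic choice:

    import numpy as np

    # z: latent codes from the VAE encoder, shape (N, 30).
    # Stand-in data: 11 dimensions with spread, 19 collapsed ones.
    rng = np.random.default_rng(0)
    z = np.hstack([rng.normal(0, 1.0, (5000, 11)),
                   rng.normal(0, 0.01, (5000, 19))])

    var = z.var(axis=0)
    active = var > 0.1 * var.max()            # heuristic threshold
    print("active dimensions:", active.sum())  # -> 11 for this data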
4. Analysis of the intermediate layer of VAE (3)
Approximated the distribution of each numeric character's images
by a 30D sphere. => 10.7% error
https://github.com/Itsukara/vae-hidden-layer
5. Analysis of the intermediate layer of VAE (4)
Approximated the distribution by a multivariate normal
distribution (spheroid). => 4.8% error (much better!)
https://github.com/Itsukara/vae-hidden-layer
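A sketch of both approximations under one plausible reading of "error" (the fraction of images assigned to the wrong digit's region; the published code may define it differently). A sphere per digit is a class mean plus Euclidean distance; a spheroid is a per-class multivariate normal, i.e. Mahalanobis distance:

    import numpy as np

    def nearest_class_error(z, y, metric="sphere"):
        # z: latent codes, shape (N, D); y: digit labels, shape (N,).
        # "sphere": Euclidean distance to each class mean.
        # "spheroid": Mahalanobis distance with per-class covariance.
        classes = np.unique(y)
        means = np.array([z[y == c].mean(axis=0) for c in classes])
        invs = None
        if metric == "spheroid":
            invs = np.array([np.linalg.inv(np.cov(z[y == c].T))
                             for c in classes])
        wrong = 0
        for x, true in zip(z, y):
            d = means - x                     # offsets to each center
            if metric == "sphere":
                dist = (d * d).sum(axis=1)
            else:
                dist = np.einsum("ci,cij,cj->c", d, invs, d)
            wrong += classes[dist.argmin()] != true
        return wrong / len(y)

    # Usage: nearest_class_error(z, labels, "spheroid")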
6. Analysis of the structure of the MNIST data set
Assumption: the simple structure of the intermediate layer of the VAE
comes from the simplicity of the structure of the MNIST data set.
Result: the assumption was CONFIRMED!
The 784 (= 28 x 28) dimensional space of the MNIST data set has a
rather simple structure that can be approximated by 10 spheroids.
https://github.com/Itsukara/vae-hidden-layer
In this analysis, 50,000 images were used. The previous analyses used
10,000 images; redoing the analysis with 50,000 images gives a 5.8% error.
So the original structures in the 784D space have almost the same
compactness as the 30D structures in the intermediate layer of the VAE.
This result seems natural, because a VAE is unsupervised learning and
adds no additional information.
7. Scripts to fully utilize GCP preemptible VMs
Background: GCP (Google Cloud Platform) preemptible VMs are very
cheap (about 1/3 of the normal price), but they may stop at any time.
=> Some control is mandatory.
The scripts published on github fully utilize GCP preemptible VMs,
for people who try my A3C+OHL code.
Effect: the scripts enable full use of the IT resources.
They can use 4 vCPU x 8 VMs within the free-trial conditions (2 months, $300).
https://github.com/Itsukara/async_deep_reinforce/tree/master/gcp-preemptible-VM-instaces
[Diagram: eight GCP preemptible VMs with 4 vCPUs each (2-month, $300
free trial) plus one 1-vCPU AWS VM (1-year free trial). A watcher
periodically checks the VMs and restarts stopped ones (once per minute),
and creates a web page summarizing the status of training (once per 5
minutes).]
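A minimal sketch of such a watcher loop using the gcloud CLI (the published scripts may work quite differently; the zone below is hypothetical, and all VMs are assumed to be in it):

    import subprocess
    import time

    ZONE = "us-central1-a"   # hypothetical; assumes all VMs share one zone

    while True:
        # List instances that have been stopped (e.g. preempted).
        out = subprocess.run(
            ["gcloud", "compute", "instances", "list",
             "--filter=status=TERMINATED", "--format=value(name)"],
            capture_output=True, text=True).stdout
        for name in out.split():
            # Restart each stopped VM so training can continue.
            subprocess.run(["gcloud", "compute", "instances", "start",
                            name, "--zone", ZONE])
        time.sleep(60)       # check once per minute, as in the diagram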
8. K-means classification of the MNIST dataset
In the Do2dl research group, we read a book on AI that contained an
explanation of the k-means classification method. I said that it might be
interesting to apply the k-means method to the MNIST dataset.
Since nobody other than me had time, I wrote the code for that
and uploaded it to github.
Actually, the following pages of the book contained a chapter on a much
more sophisticated classification method, the EM algorithm.
I compared both results. When starting with random images, k-means
was better (50% correct) than the EM algorithm (less than 50% correct).
When starting with images created from the center of 20 images of each
digit, the EM algorithm became better (71.5%) than k-means (60.5%).
https://github.com/Itsukara/ml4se
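A minimal sketch of the k-means experiment with scikit-learn (the published code in ml4se is organized differently; scoring each cluster by its majority digit is one common choice, not necessarily the one used here):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import fetch_openml

    # Cluster 10,000 raw 784-D MNIST pixel vectors into 10 groups.
    X, y = fetch_openml("mnist_784", version=1,
                        return_X_y=True, as_frame=False)
    y = y.astype(int)
    X, y = X[:10000], y[:10000]
    km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

    # Score: label each cluster by the majority digit among its members.
    correct = 0
    for c in range(10):
        members = y[km.labels_ == c]
        correct += (members == np.bincount(members).argmax()).sum()
    print("fraction correct:", correct / len(y))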
9. Tools for renaming titles in a BD-recorder
Background: my BD-recorder has a web interface to rename the titles
stored in it, but it takes a long time to rename many titles.
I developed tools to rename titles in the BD-recorder using renaming rules.
A renaming rule replaces strings/regexps in titles with another string.
https://github.com/Itsukara/diga-rename
1. Determine new titles by setting renaming rules.
(Renaming rules are automatically saved and reused.)
2. Automatically rename the titles in the BD-recorder
through the BD-recorder's web interface.
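The rule-application step might look like this sketch (the rule format here, regexp/replacement pairs, is hypothetical; the published tool's format may differ):

    import re

    # Hypothetical renaming rules: (regexp pattern, replacement) pairs.
    RULES = [
        (r"\[HD\]\s*", ""),           # drop a "[HD]" prefix
        (r"#(\d+)", r"Episode \1"),   # "#12" -> "Episode 12"
    ]

    def apply_rules(title):
        # Apply every rule in order; later rules see earlier output.
        for pattern, repl in RULES:
            title = re.sub(pattern, repl, title)
        return title

    print(apply_rules("[HD] My Show #12"))   # -> "My Show Episode 12"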
10. Executable version of the Python Tutorial
After I learned Python using the Python Tutorial, I felt that it might be
convenient if I could execute the examples in the tutorial directly.
So, I extracted the examples in the tutorial as Python scripts and
published them on github one week after I started learning Python.
They can be edited and executed directly.
They output the example code and its execution result (including errors).
https://github.com/Itsukara/Python-Tutorial-Scripts
After that, I learned Jupyter Notebook. So, I converted the entire tutorial
to Jupyter notebooks and published them on github one week later.
https://github.com/Itsukara/Python-Tutorial-Ipython
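A sketch of how such a script can print the example code together with its result, including errors (the published scripts are organized differently):

    import traceback

    # Each example is kept as a source string, printed, then executed,
    # so the output shows the code followed by its result or its error.
    EXAMPLES = [
        "print(8 / 5)           # division returns a float",
        "print(undefined_name)  # a deliberate error",
    ]

    for src in EXAMPLES:
        print(">>>", src)
        try:
            exec(src)
        except Exception:
            traceback.print_exc()   # show the error instead of stopping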
11. Program template for scraping with NodeJS and Selenium
I developed a tool to download the content of an internet school
(dotinstall.com) to enable offline study, published it on github, and
announced it on my twitter.
I received a complaint from dotinstall.com, so I deleted the tool immediately.
But I think that a general program framework for scraping is useful
for many people and is not illegal. So, I published a program template
for scraping with NodeJS and Selenium on github.
https://github.com/Itsukara/Selenium-Scraping-Template
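The published template is NodeJS + Selenium; the same skeleton in Python (with a hypothetical start page and a plain anchor-tag selector) looks like this:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # General scraping skeleton: open a page, collect elements of interest.
    driver = webdriver.Firefox()     # needs geckodriver on the PATH
    try:
        driver.get("https://example.com/")   # hypothetical start page
        for link in driver.find_elements(By.CSS_SELECTOR, "a"):
            print(link.get_attribute("href"), "-", link.text)
    finally:
        driver.quit()                # always release the browser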
12. Pico-os for MicroPython
I got an ESP8266 (a very cheap microprocessor with WiFi, i.e. less than
$5) on which MicroPython was embedded (the presenter of an
introductory seminar on Python gave 2 ESP8266s to the audience).
Unfortunately, the method the presenter used to write MicroPython to
the EEPROM of the ESP8266 was not complete (I could not write any
file to the filesystem on the EEPROM).
So, I built an environment to write to the EEPROM by myself and
re-wrote MicroPython to the ESP8266.
I measured the performance of MicroPython on the ESP8266 too.
(Roughly 1/1000 of an Intel CPU. Memory size is also roughly 1/1000.)
To make experiments on MicroPython easy, I wrote a very, very small
interface library to use MicroPython on the ESP8266. I named it
"Pico-os" and uploaded it to github.
I wrote about it on my blog and twitter too.
https://github.com/Itsukara/MicroPython-pos
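For example, once MicroPython is running on the ESP8266, checking the flash filesystem and timing a loop look like this (a sketch using standard MicroPython APIs, not the Pico-os library itself):

    import time

    # Verify the flash filesystem works by writing and reading a file.
    with open("test.txt", "w") as f:
        f.write("hello from ESP8266\n")
    with open("test.txt") as f:
        print(f.read())

    # Rough performance check: time a simple integer loop.
    t0 = time.ticks_ms()
    s = 0
    for i in range(10000):
        s += i
    print("elapsed ms:", time.ticks_diff(time.ticks_ms(), t0))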
13. 1,000x speedup of re-calculation in a big Excel sheet
There was a very big Excel sheet containing over 120,000 lines.
Re-calculation of the sheet took several hours.
They re-calculate the sheet 13 times every month, and it needs
human intervention in the middle of the re-calculation.
I investigated the expressions in the cells and found that repeated use
of COUNTA and of VLOOKUP in exact-match mode looked like the
cause. (Both functions need O(n) time when they search n lines.)
I reduced the use of COUNTA to once and used VLOOKUP in
approximate-match mode (the latter needs only O(log(n)) time; actually
a more complex expression is needed).
With these alterations, the re-calculation time became a few seconds.
(More than a 1,000x speedup.)
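The complexity argument in Python terms (a sketch): exact-match VLOOKUP scans the column linearly like the first function below, while approximate-match VLOOKUP on a sorted column binary-searches like the second; the extra exactness check at the end is presumably the kind of "more complex expression" the slide mentions:

    from bisect import bisect_left

    def lookup_exact(keys, key):
        # Like exact-match VLOOKUP: linear scan, O(n) per lookup.
        for i, k in enumerate(keys):
            if k == key:
                return i
        return None

    def lookup_sorted(keys, key):
        # Like approximate-match VLOOKUP on sorted data: binary
        # search, O(log n), plus a check that the key really matched.
        i = bisect_left(keys, key)
        return i if i < len(keys) and keys[i] == key else None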