
IkaLog: Data Collector for Splatoon and Machine Learning

This is a translated version of the IkaLog tech presentation, originally in Japanese. The translation work is still in progress!

Published in: Technology

  1. 1. This is a translated version of the IkaLog tech presentation, originally in Japanese. Translation work is still in progress!
  2. 2. What is Splatoon? A unique third-person shooter – Paint the battlefield with your ink, and claim turf from the enemy team – Switch between an Inkling and a Squid for different approaches – Several game modes in addition to Turf War, like KOTH and payload. Simple rules, but deep – Various strategies and tactics – Unique weapon classes: Shooter, Charger, Roller, Splatling, and Slosher
  3. 3. Strategy is the key to winning. STAGE RECON • There’s more than one way around each map • Know where the enemy is coming from and have a plan. VARIETY OF WEAPONS • Certain weapons are better suited to some maps and modes than others
  4. 4. You play, IkaLog collects the data HDMI Capture Device IkaLog
  5. 5. COLLECT PLAY IkaLog: Data Collector for Splatoon – Play Splatoon as you would normally – Your gameplay footage is analyzed – The data can then be sent to log files and/or other tools, like stat.ink Game Console Log files IkaLog stat.ink, Speech application, etc. Forward to desired tools PROCESS
  6. 6. Supported Integrations Video Streaming/Recording AmaRecTV Online Database CSV/JSON, Screenshots SNS and Chat IkaLog
  7. 7. stat.ink The online database provided by @fetus_hina –  Submit your battle results and statistics with IkaLog –  Review your past gameplay easily using the website
  8. 8. Review your gameplay on stat.ink SCOREBOARD Review the scoreboard later. Filtering allows for careful analysis of your past gameplay. TIMELINE The graph shows what happened in the game. Displays kills, deaths, special weapons, ranked mode counts/distances and events. GLOBAL STATS Statistics data from all stat.ink users is available. See the trends among users
  9. 9. Timeline Visualization The period when you got splatted and time spent inactive Alive/Dead status of all 8 players Team’s special weapons, Kills, Deaths Your Turf Score
  10. 10. Timeline for Ranked Battles Status of the Splat Zone Splat Zone count the other team earned Splat Zone count your team earned The game-changing moment
  11. 11. # of IkaLog + stat.ink users Source: https://stat.ink/entire/user Avg: 200+ users, 4500+ matches a day Peak: 370 users a day Processed 15,000 matches
  12. 12. Statistical data from all players Kill/Death Heat map KO ratio
  13. 13. https://stat.ink/ .96 Gal Deco users decreased after the game update. Long Blaster Custom has been getting more popular since summer
  14. 14. Source footage + Mask image = Added image. Addition (OR) of the source image and the correct mask results in a white image. Addition (OR) of the source image and a wrong mask results in a non-white image
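The mask check described on this slide can be sketched in a few lines. This is a minimal illustration with NumPy, assuming 8-bit binary images; the function name `matches_mask` and the `threshold` parameter are made up for this example and are not IkaLog's actual API.

```python
import numpy as np

def matches_mask(source, mask, threshold=0.99):
    """Return True if OR-ing the source with the mask gives a (nearly) all-white image."""
    combined = np.bitwise_or(source, mask)      # white (255) wherever either image is white
    white_ratio = np.mean(combined == 255)      # fraction of white pixels
    return bool(white_ratio >= threshold)

# Toy 4x4 "footage": white on the left half, black on the right half.
source = np.zeros((4, 4), dtype=np.uint8)
source[:, :2] = 255

# The correct mask is white exactly where the source is black.
correct_mask = np.zeros((4, 4), dtype=np.uint8)
correct_mask[:, 2:] = 255

# A wrong mask covers the top half only, leaving a black corner after the OR.
wrong_mask = np.zeros((4, 4), dtype=np.uint8)
wrong_mask[:2, :] = 255

print(matches_mask(source, correct_mask))
print(matches_mask(source, wrong_mask))
```

A threshold slightly below 1.0 is an assumption here; it would leave room for capture noise in real footage.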
  15. 15. 18
  16. 16. 20
  17. 17. • There are two font types in the game. We should cover a single number font face – The font is known, so we should be able to identify the characters. • Decided to use ML-based image classification rather than an existing OCR library
  18. 18. • TesseractOCR could not find the characters
  19. 19. 1) Crop the number from the footage and apply some image filters 2) Generate vertical and horizontal histograms to guess each character’s position 3) Resize the characters to an identical size and binarize the image 4) Classify the image using KNN
  20. 20. • Basically the same idea as the recognition of numbers • 30+% accuracy for a single classification. Accuracy is gained by examining many frames – IkaLog examines approx. 10 frames per second – This example shows IkaLog analyzed 49 frames and found “96gal_deco” to be most likely (18 frames, 36%) -> correct. votes={ 'supershot': 6, 'carbon_deco': 1, 'bucketslosher': 1, 'octoshooter_replica': 1, 'splashshield': 1, 'sshooter_collabo': 5, 'hotblaster': 2, 'pablo': 1, 'nzap89': 6, 'sharp_neo': 3, 'hotblaster_custom': 2, '96gal_deco': 18, '52gal': 1, 'hokusai': 1 }
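The four steps on this slide can be sketched as follows. This is a toy illustration, not IkaLog's code: the glyphs are hand-made 5x3 bitmaps rather than the game's font, the input is assumed to be already cropped and binarized (step 1), and all function names are hypothetical.

```python
import numpy as np

def split_columns(binary):
    """Use the vertical projection histogram to find each character's column span."""
    has_ink = binary.sum(axis=0) > 0
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                        # a character begins
        elif not ink and start is not None:
            spans.append((start, x))         # a character ends
            start = None
    if start is not None:
        spans.append((start, len(has_ink)))
    return spans

def featurize(char, size=(8, 8)):
    """Trim empty rows (horizontal histogram), resize to a fixed size, flatten."""
    rows = np.flatnonzero(char.sum(axis=1) > 0)
    char = char[rows[0]:rows[-1] + 1]
    ys = np.arange(size[0]) * char.shape[0] // size[0]   # nearest-neighbour resize
    xs = np.arange(size[1]) * char.shape[1] // size[1]
    return char[np.ix_(ys, xs)].astype(np.float32).ravel()

def extract(binary):
    """Feature vector for every character found in a binary image."""
    return [featurize(binary[:, a:b]) for a, b in split_columns(binary)]

def knn_classify(sample, train_x, train_y, k=1):
    """Majority label among the k nearest training samples (squared Euclidean distance)."""
    dists = np.sum((train_x - sample) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return str(labels[np.argmax(counts)])

# Toy binary glyphs (1 = ink) standing in for the known number font.
ONE = np.array([[0, 1, 0]] * 5, dtype=np.uint8)
SEVEN = np.array([[1, 1, 1]] + [[0, 0, 1]] * 4, dtype=np.uint8)
train_x = np.stack([extract(ONE)[0], extract(SEVEN)[0]])
train_y = np.array(["1", "7"])

# A cropped, binarized "17" with a two-pixel gap between the digits.
line = np.hstack([ONE, np.zeros((5, 2), dtype=np.uint8), SEVEN])
result = "".join(knn_classify(f, train_x, train_y) for f in extract(line))
print(result)
```

Because the font is known, a single reference sample per digit is enough for this toy case; real footage would need several samples per class to absorb capture noise.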
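The frame-voting idea can be reproduced directly from the numbers on this slide. A minimal sketch using the standard library; the vote counts are copied from the slide, while the variable names are chosen only for this example.

```python
from collections import Counter

# Per-class vote counts from the slide: one vote per analyzed frame.
votes = Counter({
    'supershot': 6, 'carbon_deco': 1, 'bucketslosher': 1,
    'octoshooter_replica': 1, 'splashshield': 1, 'sshooter_collabo': 5,
    'hotblaster': 2, 'pablo': 1, 'nzap89': 6, 'sharp_neo': 3,
    'hotblaster_custom': 2, '96gal_deco': 18, '52gal': 1, 'hokusai': 1,
})

total_frames = sum(votes.values())
winner, winner_votes = votes.most_common(1)[0]
share = 100 * winner_votes // total_frames   # truncated percentage, as on the slide

print(f"{winner}: {winner_votes}/{total_frames} frames ({share}%)")
# prints "96gal_deco: 18/49 frames (36%)"
```

Even though no single frame is reliable, the correct class only needs to collect more votes than any other class over the sequence.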
  21. 21. 30
  22. 22. 59 Weapons (Splatoon has 90+ weapons today)
  23. 23. • Overlaps of other equipment • Similar BG & FG color (1) • Similar BG & FG color (2)
  24. 24. Note: This screenshot was taken at a quite early stage of IkaLog development.
  25. 25. Input, Laplacian Filter, Grayscale, Classify using k-Nearest Neighbor. Thanks @itooon. sschooter_collabo. Feature map
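A rough sketch of the pipeline named on this slide (grayscale, Laplacian filter, nearest-neighbor match) is shown below. It is an illustration under assumptions: the icons are toy shapes, the edge map is thresholded to 0/1 (a choice made for this example, not something the slide states), and the function names are hypothetical. Grayscale averaging and the Laplacian are both linear, so their order does not change the result here.

```python
import numpy as np

def edge_features(rgb, threshold=1.0):
    """Grayscale -> 4-neighbour Laplacian -> binary edge map, flattened."""
    gray = rgb.astype(np.float32).mean(axis=2)
    p = np.pad(gray, 1, mode="edge")                     # replicate the borders
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return (np.abs(lap) > threshold).astype(np.float32).ravel()

def make_icon(mask, fg, bg):
    """Paint a boolean shape mask with foreground/background RGB colors."""
    return np.where(mask[..., None], np.array(fg), np.array(bg)).astype(np.uint8)

# Two toy shapes standing in for weapon icons.
square = np.zeros((8, 8), dtype=bool)
square[2:6, 2:6] = True
bar = np.zeros((8, 8), dtype=bool)
bar[3:5, :] = True

# Training samples: white shapes on a black background.
train_x = np.stack([
    edge_features(make_icon(square, (255, 255, 255), (0, 0, 0))),
    edge_features(make_icon(bar, (255, 255, 255), (0, 0, 0))),
])
train_y = ["square", "bar"]

# Query: the same square, but orange on dark blue.
query = edge_features(make_icon(square, (255, 128, 0), (0, 0, 128)))
dists = np.sum((train_x - query) ** 2, axis=1)           # 1-nearest neighbour
predicted = train_y[int(np.argmin(dists))]
print(predicted)
```

The query matches the square despite the different colors because only the edge shape survives the filter, which is also why the approach depends on the input resolution staying fixed.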
  26. 26. 35
  27. 27. Pros – Ignores color – It uses “the shape” for matching Cons – Doesn’t work if the input has a different resolution – Cannot classify similar shapes
  28. 28. 39
  29. 29. Original class (longblaster) Variant version (longblaster_custom) One more variant! (Apr 2016) (longblaster_necro) Conjunction on features between the variants. Needed to update (or find) the feature extraction
  30. 30. longblaster_necro longblaster_custom longblaster IkaLog internal feature values plotted in 2D using PCA. Conjunction on features between the variants. Needed to update (or find) the feature extraction
  31. 31. Another game result scene, with “Gear Ability icons”. The approach results in low accuracy… • Same icon, different sizes in a single frame • Users use different capture resolutions → More robust image classifier needed
  32. 32. • More data is always better for machine learning • Use the crowd to collect the data: 50 games/day per person; 5,000 games/day are possible with 100 people – Field-generated data has noise and outliers • Model of HDMI capture units, HDMI re-transmitters, and software configuration • It’s easier to collect the outliers from the field, then train and test classifiers
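A 2D PCA projection like the one on this slide can be computed with a singular value decomposition. The sketch below uses synthetic random vectors as stand-ins for IkaLog's internal feature values, which are not available here; the cluster centers and dimensions are arbitrary assumptions.

```python
import numpy as np

def pca_2d(features):
    """Project row vectors onto their first two principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Synthetic stand-ins: 10 samples per class, 16 features each.
rng = np.random.default_rng(0)
centers = {"longblaster": 0.0, "longblaster_custom": 0.5, "longblaster_necro": 0.6}
features = np.vstack([
    rng.normal(loc=c, scale=1.0, size=(10, 16)) for c in centers.values()
])

points = pca_2d(features)     # one (x, y) pair per sample, ready for a scatter plot
print(points.shape)
```

If the plotted clusters of two variants overlap, a distance-based classifier working on the same features is likely to confuse them, which is the situation the slide describes.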
  33. 33. Object storage on the cloud Work VM on IaaS 4TB+ data (collected over 12+ months) IkaLog users stat.ink
  34. 34. • The accuracy of machine learning/deep learning is affected by pre-processing • A template matching-based algorithm automatically fixes broken input images: wrong aspect ratio, offset. Reference image, image to classify
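One way template matching can correct an offset input is to search for the position where a reference image fits best and crop there. The sketch below covers only the offset case with a brute-force search; it is an assumption about the general technique, not a description of IkaLog's actual algorithm, and `find_offset` is a hypothetical name.

```python
import numpy as np

def find_offset(frame, template):
    """Return (y, x) of the best match, minimising the sum of squared differences."""
    fh, fw = frame.shape
    th, tw = template.shape
    best_score, best_pos = None, (0, 0)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            window = frame[y:y + th, x:x + tw]
            score = float(np.sum((window - template) ** 2))
            if best_score is None or score < best_score:
                best_score, best_pos = score, (y, x)
    return best_pos

# Synthetic reference image placed at a shifted position inside a larger frame.
rng = np.random.default_rng(0)
template = rng.random((5, 5)) + 0.1          # strictly positive pixel values
frame = np.zeros((12, 12))
frame[3:8, 4:9] = template

offset = find_offset(frame, template)
y, x = offset
fixed = frame[y:y + 5, x:x + 5]              # re-cropped, correctly aligned input
print(offset)
```

Aligning the crop before classification matters because a pixel-input classifier sees a shifted image as a different input.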
  35. 35. Thanks @itoooon
  36. 36. Pros – Better classification accuracy Cons – Data size too large (20MB -> 100MB~400MB); hard to distribute the weights – Breaks the Windows version of IkaLog • 90%+ of users are running IkaLog on the Windows platform • Most deep learning frameworks break Py2EXE (Python script to Windows executable converter) – More computing power needed
  37. 37. – Input: RGB or HSV color values (47*45*3=6,345 units) – Output: probability of each class (91 units, softmax applied) – Connection: always fully connected – Let the computer find the features automatically • In this use case, deep learning will find proper weights automatically • It will ignore background colors automatically – Target performance • Calc. time: less than 350 ms for each multi-class classification (91 classes), < 3 seconds per frame • 99.99+% accuracy against data posted to stat.ink (field data)
  38. 38. 0 1 2 3 .. .. n 0 1 2 3 … 89 90 Input Layer Output Layer Hidden Layer 52gal 52gal_deco 96gal 96gal_deco … sschooter_wasabi wakaba
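For a fully connected network with one hidden layer, the weight count and file size follow directly from the layer sizes. The sketch below uses the input and output sizes given in the slides (6,345 and 91); the hidden layer sizes tried in the loop are hypothetical, since the slides do not state the final value.

```python
def model_size(n_in, n_hidden, n_out, bytes_per_weight):
    """Parameter count and size in MB for input -> hidden -> output, with biases."""
    params = n_in * n_hidden + n_hidden + n_hidden * n_out + n_out
    return params, params * bytes_per_weight / 1e6

N_IN = 47 * 45 * 3    # 6,345 input units (RGB or HSV pixel values)
N_OUT = 91            # one output unit per weapon class

for hidden in (150, 300, 600):           # hypothetical hidden layer sizes
    params, mb32 = model_size(N_IN, hidden, N_OUT, 4)   # 32-bit floats
    _, mb16 = model_size(N_IN, hidden, N_OUT, 2)        # 16-bit floats
    print(f"hidden={hidden}: {params:,} params, {mb32:.1f} MB / {mb16:.1f} MB")
```

Since the input layer dominates the count, the hidden layer size is the main knob for trading file size against accuracy.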
  39. 39. 51
  40. 40. • Tested on Azure ML before implementing the code – Trained classifiers using a small dataset – Input: pixel values (w/ PCA) – Hidden layer: 1 layer, 150 units – Output: probably 91 (Azure ML manages) • Got good results and decided to move forward – 98+% accuracy (with the small dataset) – Enough robustness for any images (e.g. low-res)
  41. 41. 54
  42. 42. • Used Chainer (deep learning framework) to train the model – Supervised training using stat.ink data (0.9M images) – Can benefit from GPU performance • Trained models are converted to Float16 – 32-bit float -> 16-bit float – 50% smaller file size
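The size reduction from the Float16 conversion is easy to check with NumPy. This is a small sketch with random stand-in weights; the matrix shape and value range are assumptions, not the trained model's.

```python
import numpy as np

# Random stand-in for one trained weight matrix (shape chosen arbitrarily).
rng = np.random.default_rng(0)
weights32 = rng.normal(0.0, 0.05, size=(256, 91)).astype(np.float32)

weights16 = weights32.astype(np.float16)     # 32-bit float -> 16-bit float
print(weights32.nbytes, weights16.nbytes)    # the Float16 copy is half the size

# Converting back for inference shows how little precision was lost.
restored = weights16.astype(np.float32)
max_err = float(np.max(np.abs(restored - weights32)))
print(max_err)
```

Storing the weights in Float16 and widening them to Float32 at load time keeps the download small without changing the inference code.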
  43. 43. CPU | GPU | # of jobs | Throughput (images/s) | GPU time/epoch (s) | Relative perf. (virtual)
      Core i7-4790S Typ. 3.2GHz | NA | 1 | 3,928 | 234.5s | 1X
      Core i7-4790S Typ. 3.2GHz | GeForce GTX760 | 1 | 10,371 | 39.2s | 5.9X
      Core i7-4790S Typ. 3.2GHz | GeForce GTX760 | 4 (Cl) | 15,988 | 25.8s (103 / 4) | 9.0X
      Core i7-4790S Typ. 3.2GHz | GeForce GTX 1080 | 4 (Cl) | 30,320 | 6.25s (25.0s / 4) | 37.5X
  44. 44. 57
  45. 45. CPU | GPU | # of jobs | Throughput (images/s) | GPU time/epoch (s) | Relative perf. (virtual)
      Core i7-4790S Typ. 3.2GHz | NA | 1 | 3,928 | 234.5s | 1X
      Core i7-4790S Typ. 3.2GHz | GeForce GTX760 | 1 | 10,371 | 39.2s | 5.9X
      Core i7-4790S Typ. 3.2GHz | GeForce GTX760 | 4 (Cl) | 15,988 | 25.8s (103 / 4) | 9.0X
      Core i7-4790S Typ. 3.2GHz | GeForce GTX 1080 | 4 (Cl) | 30,320 | 6.25s (25.0s / 4) | 37.5X
      2X E5-2630L v3 Typ. 1.80GHz | Tesla P100 | 4 (Cl) | 65,811 | 2.5s (10.3/4) | 93.8X
      Special Thanks to Typical Linux Box
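The relative performance column appears to be the CPU-only epoch time divided by each configuration's epoch time. A short check of that reading, using the epoch times from the table; the dictionary keys are labels made up for this example.

```python
# Seconds per epoch, copied from the table above.
epoch_seconds = {
    "CPU only": 234.5,
    "GTX760, 1 job": 39.2,
    "GTX760, 4 jobs": 25.8,
    "GTX 1080, 4 jobs": 6.25,
    "Tesla P100, 4 jobs": 2.5,
}

baseline = epoch_seconds["CPU only"]
speedups = {name: baseline / secs for name, secs in epoch_seconds.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}X")
```

The computed ratios line up with the table if its values are truncated to one decimal place (for example 5.98 shown as 5.9X), which is an interpretation rather than something the slide states.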
  46. 46. Pre-training (parameter tuning) – Tested various layer configurations (# of units), balancing model file size against accuracy – 4 concurrent jobs – Ran from 03:08am to 6:28pm (15h20m) Training – 550 epochs / 24hrs, including disk I/O and cross-validation test time – Adopted the 617th epoch for the first NN-based classifier. Took approx. 48 GPU hours using a 21GB dataset Special Thanks to
  47. 47.  | KNN (Original) | Complex NN | The new NN
      Accuracy | Low, on certain users | 99.99+% | 99.99+%
      Data size | 20MB | 400MB (AlexNet), 100MB (GoogLeNet) | 14MB (Float32), 7MB (Float16)
      Classification time @ IvyBridge 2GHz | (very fast) | ~300ms | ~20ms
      • Improved accuracy of weapon classification • Faster than complex neural network models • Smaller data size makes distribution easy Special Thanks to
  48. 48. • IkaLog includes its own neural network implementation – Deep learning frameworks break the Windows version – Re-implemented the propagation function • To keep the code simple, the model is kept simple • Compatible with LinearFunction and ReLU in Chainer https://github.com/hasegaw/IkaLog/commit/3238b67749334a3c4254aa6f25c005f83e210895 – A single run takes 20ms @ IvyBridge 2.0GHz, 200ms @ PYNQ-Z1 FPGA board (Cortex-A9 650MHz)
  49. 49. • Apply the neural network-based approach to other classifiers – Gear ability image classifier – Rainmaker touchdown detection – Use an RNN for a “Tide of the battle” evaluation function • Make the dataset smaller – The dataset (21GB) is obviously redundant – Use PCA-based anomaly detection to generate a smaller dataset?
  50. 50. • Several machine-learning approaches are applied in IkaLog – K Nearest Neighbor for real-time classification – Neural network for more accurate classification • Auto feature extraction, training from 0.9M samples • IkaLog and Deep Learning – Most deep learning frameworks break Py2EXE – Easy to re-implement propagation if the network is simple – Can utilize existing DNN frameworks and GPUs for the training process
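Forward propagation through fully connected layers with ReLU needs only a few lines of NumPy, which is why a framework-free re-implementation is feasible. The sketch below is not the code from the linked commit: the weights are random, the hidden size of 64 is hypothetical, and the function names are chosen for this example. The weight layout `(out, in)` follows the convention of Chainer's Linear link.

```python
import numpy as np

def linear(x, W, b):
    """Fully connected layer; W has shape (out, in)."""
    return x @ W.T + b

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()

def predict(x, layers):
    """Forward pass: ReLU after every layer except the last, then softmax."""
    h = x
    for W, b in layers[:-1]:
        h = relu(linear(h, W, b))
    W, b = layers[-1]
    return softmax(linear(h, W, b))

# Random stand-in weights: 6,345 inputs -> 64 hidden units (hypothetical) -> 91 classes.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 47 * 45 * 3, 64, 91
layers = [
    (rng.normal(0, 0.01, (n_hidden, n_in)).astype(np.float32), np.zeros(n_hidden, np.float32)),
    (rng.normal(0, 0.01, (n_out, n_hidden)).astype(np.float32), np.zeros(n_out, np.float32)),
]

pixels = rng.random(n_in).astype(np.float32)   # stand-in for a flattened weapon icon
probs = predict(pixels, layers)
print(int(np.argmax(probs)), float(probs.sum()))
```

In practice the weights would be exported from the trained model and loaded from a file; only inference runs on the user's machine, so no training framework has to be bundled.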
  51. 51. Lapis Lazuli(2000-2014)
  52. 52. © 07strikers