4. ! AWS Startup CTO Night with Amazon CTO
• We had Amazon CTO Werner Vogels
! TechCrunch Tokyo CTO Night powered by AWS
• Startups pitch contest for “JP CTO of the year”
! IVS CTO Night & Day powered by AWS
• 3 days Over 100 CTOs gathering
• w/ Infinity Ventures Summit
2014 #CTONight Series in Japan
5.
6.
7. 1. CTOʼ’s Pitch
2. Questions and Comments from Werner
3. Tech Discussion in a few minutes
4. Give an Award from Werver
8. ! AWS Startup CTO Night with Amazon CTO
Vuzz CTO Kiyota-‐‑‒san won the prize
9. ! TechCrunch Tokyo CTO Night powered by AWS
Contest for JP Startup CTO of the year!
10. ! TechCrunch Tokyo CTO Night powered by AWS
! Pitch Presenters
(Startup CTOs)
! Judges
(Popular Company CTOs)
GREE
Cookpad
BizReach
Hatena
CyberAgent
Amazon
11. ! TechCrunch Tokyo CTO Night powered by AWS
Popular News-‐‑‒Curation Service
12. ! IVS CTO Night & Day powered by AWS
3 days event. Over 100 Startup CTOs gathering!
18. Iʼ’d like to have “AWS Pop-‐‑‒up Loft” in Tokyo.
So Iʼ’m honored to be here J
We never disclose AWS customersʼ’ info without permission.
We got agreements for all use-‐‑‒cases in this slide.
22. Amazon CloudSearch
! CloudSearch in Amazon
amazon smile : Support Local Charities/10s of millions of products
https://smile.amazon.com/
23. Amazon CloudSearch
! CloudSearch in Amazon
goodreads : 30 million members/900 million books/34 million reviews
https://www.goodreads.com/
24. Amazon CloudSearch
! CloudSearch in Japan
schoo WEB-‐‑‒campus: Online Life-‐‑‒Long Learning Platform
https://schoo.jp/
25. Amazon CloudSearch
! CloudSearch in Japan
schoo WEB-‐‑‒campus: You can search “AWS” class
26. Search Engine
! Find documents with keyword from large amount of data
• Incrementally like “grep”? It will take too long..
• Need to build index in advance(Inverted Index)
• TF-‐‑‒IDF scoring
• Multiple Query Parser Support
28. and.. Japanese
(c)相⽥田みつを Mitsuo Aida http://www.mitsuo.co.jp/museum/info/message.html
If we take from one another, there will never be enough
If we share with each other, there will be more than enough
29. Japanese Text Processing
! 形態素解析(Morphological Analysis)
• 彼(名詞-‐‑‒代名詞)/は(助詞-‐‑‒係助詞)/エンジニア(名詞-‐‑‒⼀一般)/だ(助動詞)
• Japanese is NOT separated by white-‐‑‒space. Need to analyze..
• To fine-‐‑‒tune, dictionary maintenance is needed
– きゃりーぱみゅぱみゅ (Kyary Pamyu Pamyu)
important NOUN
should be 1 noun
30. Japanese Text Processing
! Stemming
• 飲んだ → 飲ん(動詞-‐‑‒⾃自⽴立立, baseForm:飲む)/だ(助動詞) → 飲む
• like..
• if the word ends in 'ed', remove the 'ed'
• if the word ends in 'ing', remove the 'ing'
• if the word ends in 'ly', remove the 'lyʼ’
• Yes. Japanese Stemming is enough complicated..
http://www.cjk.org/cjk/joa/joapaper.htm
31. Japanese Text Processing
! Synonym Addition
• e.g. Venice
• 「ベニス」
• 「ベネチア」
• 「ヴェネチア」
• 「ヴェネツィア」
• Alias
• search with pupil => student is hit
• search with student => pupil is NOT hit
• Group
• 1st, first, one => you can search with all keywords in the group
! Stop Word Removing
• これ(this), あれ(that), どれ(which), だれ(who), なに(what),,,
All these 4 words mean “Venice”
http://ja.wikipedia.org/wiki/ヴェネツィア
32. ! Amazon CloudSearch
• Full Managed Cloud-‐‑‒Based Search Service
• Pretty easy to introduce
• 34 languages(include Japanese) support
• Sophisticated Functions
• Highlight
• Suggest(AutoComplete)
• Geo Search
33. Amazon CloudSearch
! Suggestions
/suggest?q=ir&suggester=title_̲sug
"suggest": {"query": "iro", "found": 5,
"suggestions": [
{“suggestion”: “Iron Man”,…"id": "tt0371746"},
{"suggestion": "Iron Man 2”,…"id”:"tt1228705"},
...
• Reading Search
• Japanese language has
Kanji/Hiragana/Katakana ,,
47. CloudSearch use-‐‑‒case: ChatWork
! ChatWork: Business Communication Tool
• Over 40 thousand companies are using
• About 0.5 million users
! comment from ChatWork CTO Yamamoto-‐‑‒san
• “To handle about 5 hundred million documents, we
migrated to Amazon CloudSearch. Thanks to AWS and
A9 team, it took only a few month.”
48. CloudSearch use-‐‑‒case: ChatWork Tanaka-‐‑‒san slide
https://speakerdeck.com/tanakayuki/kai-‐‑‒fa-‐‑‒zhe-‐‑‒karamitacloudsearch
Almost maintenance free
Positive feedback from end-‐‑‒users about Low latency
51. CloudSearch use-‐‑‒case: nanapi
! nanapi is a Life Recipe portal
• About 20 million per user per Month
• Over 0.1 million recipes
! Getting popular these days
52. CloudSearch use-‐‑‒case: nanapi Kagaya-‐‑‒san slide
https://speakerdeck.com/violetyk/cloudsearch-‐‑‒nanapi-‐‑‒use-‐‑‒case
• Default setting works a lot
• Easy to have Japanese search function
• Fully managed by AWS is huge plus
54. CloudSearch use-‐‑‒case: schoo Ito-‐‑‒san slide
http://www.slideshare.net/hiromitsuito71/20141017-‐‑‒cloud-‐‑‒searchschoo
It took only 1 WEEK to introduce. Itʼ’s so easy and nice.
Of course you need to escape XSS stuff
55. ! Japanese Language is not so easy
• Yahoo! Japan Search Engineer Osuka-‐‑‒san slide
Hasegawa-‐‑‒san?
Tanigawa-‐‑‒san??
Need to analyze and only the
user can know the answer
56. Launched Features -‐‑‒ Requests from Japan
! Customizing Japanese Tokenization
• 形態素解析辞書のカスタマイズ
! Indexing Bigrams
• Bi-‐‑‒gramでのインデクシング
59. ! Customizing Japanese Tokenization
• add きゃりーぱみゅぱみゅ to the Tokenization Dictionary
https://www.youtube.com/watch?v=NLy4cvRx7Vc
60. ! Indexing Bigrams
• CJK -‐‑‒ Chinese/Japanese/Korean
• “Multiple Languages” in Analysis Scheme
• Eliminate the missing
• but may need to care the noise
– e.g. search “東京都”(Tokyo) with the word ”京都”(Kyoto)
Tokyo
Kyoto
61. ! Indexing Bigrams
• Define 2 fields. 1 for Tokenize, 1 for Bi-‐‑‒gram
• Utilizing “Source Field”
62. ! Indexing Bigrams
• Controlling Score with ^(caret)
Search with “京都”(Kyoto)
“京都府”(Kyoto) is Higher than “東京都”(Tokyo)
66. ! Indexing
EC2
P1
batch
§ Raw Batch data to S3. Meta data to DynamoDB
§ Document Service returns 200(OK) after putting the data
Document
Service
S3 DynamoDB
200
67. ! Indexing
EC2
P1
§ Indexed Binary data to S3 and update Meta data
§ Each node(partition) gets Indexed Binary data from S3
Document
Service
S3 DynamoDB
Updater
Process
Solr
68. Amazon S3
Region
over 3 different places
Bucket
Durability:
99.999999999%
Data Center
Just put files
Donʼ’t need to care about
infrastructure. Unlimited
Data Center
Data CenterFiles
⇒ Text, Image, Movie,,
Cheap and Pay as you go
1GB $0.03 per month
70. ! Handling Massive Query
u Replication by Auto Scaling
• scale EC2 capacity automatically according to server load
• e.g. CPU70% for 5 min. Then add 2 more EC2 instances
Auto Scaling Group
EC2 EC2
ELB
Auto Scaling
CloudWatch
Monitoring
71. ! Handling Massive Query
u Replication by Auto Scaling
• scale EC2 capacity automatically according to server load
• e.g. CPU70% for 5 min. Then add 2 more EC2 instances
Auto Scaling Group
EC2 EC2
ELB
Auto Scaling
CloudWatch
Monitoring
EC2 EC2
Create EC2 instances
add to ELB
72. ! Handling Massive Query
EC2
P1
Auto Scaling Group
EC2
P2
Auto Scaling Group
EC2
P3
Auto Scaling Group
73. ! Handling Massive Query
EC2
P1
Auto Scaling Group
EC2
P2
Auto Scaling Group
EC2
P3
Auto Scaling Group
EC2
P1
EC2
P2
EC2
P3
EC2
P1
EC2
P2
EC2
P3
74. ! Scaling
§ Depends on the Data Size
§ No downtime but eventually consistent
Small Medium Large
75. ! Scaling
§ After scaling up to largest, Scale out
§ EMR(AWS Hadoop service) jobs to split partition
§ No downtime but eventually consistent
Index
Index P1
Index P2Amazon
EMR
76. ! Modify configuration
§ Re-‐‑‒Indexing with EMR
§ No downtime but eventually consistent
Index A
Amazon
EMR
Index B
77. Inside Amazon CloudSearch
! Utilizing variety of AWS services
• Donʼ’t need to think of back-‐‑‒ground things but please
understand that Amazon CloudSearch is based on
eventual consistency model
• Metrics with CloudWatch (launched in Mar 2015)
• Successful Requests
• Searchable Documents
• IndexUtilization
• Partitions
78. Amazon CloudSearch TIPS
! Initial Massive Data Loading
• Lager Instance and enough Partitions for Multi-‐‑‒threading
http://www.slideshare.net/AmazonWebServices/sdd411-‐‑‒amazon-‐‑‒cloudsearch-‐‑‒deep-‐‑‒dive-‐‑‒and-‐‑‒best-‐‑‒practices-‐‑‒aws-‐‑‒reinvent-‐‑‒2014
83. CloudSearch is expanding in Japan
! Smart/InSight: Enterprise Search
http://smartinsight.jp/en/
CloudDays Tokyo 2014
84. CloudSearch is expanding in Japan
! Pasona Career: Japanese Big Recruiting
Company 『Connected』 – Career Search
https://pasona-‐‑‒connected.jp/
85. Want to Dive Deeper?
! CloudSearch, The Amazon Web Service on top of Solr
• At Lucene/Solr Revolution 2014
https://www.youtube.com/watch?v=RI1x0d-‐‑‒yO8A
86. Want to Dive Deeper?
! Amazon CloudSearch Deep Dive & Best Practices
• At AWS re:Invent 2014
• Pro Tips and Rule of thumb
• Slide: http://goo.gl/pklAzW
• Youtube: http://youtu.be/OeHaj1a66I4
88. AWS Black Belt Tech Webinar
! Every Wed 6PM – 7PM(JST) Online Seminar in Japanese
http://aws.typepad.com/sajp/
w/ Adobe Connect
89. AWS Black Belt Tech Webinar
! Deep dive product-‐‑‒cut seminar by Solution Architect
http://aws.typepad.com/sajp/
Amazon Simple Queue Service (SQS)
! AWS Elastic Beanstalk: Worker Tier
• SQS + Auto Scalingでスケーラブルなバッチ処理基盤
Sqsd
(deamon)
Elastic Beanstalk
Application
http://localhost:80/xxx
EC2 Instance
Auto Scaling group
CloudWatch
Auto Scaling
90. AWS Black Belt Tech Webinar
! Check #awsblackbelt hashtag on Twitter