Let's Make Niconico Douga (ニコニコ動画) Searchable
genta kaneyama
Indexing 2.5 billion comments with Elasticsearch
1.
Let's make the Niconico Douga dataset searchable
@PENGUINANA_
2.
whoami
• @PENGUINANA_ / Genta Kaneyama (兼山元太)
• Engineer at *.cookpad.com/*
• Search infrastructure and service development
3.
JSON all around us
• tweet
• 140 character message
4.
JSON all around us
• tweet
• 140 character message
• user_name
• datetime
• location
• reply or not / contains link or not / retweeted count / reply count ...
5.
JSON all around us
• access log
• ip address
• requested content
• status code
• response time
• referrer
6.
JSON all around us
• event log
• user_id
• event name
• params (hash)
• datetime
• user agent
7.
JSON all around us
• dictionary edit request
• keyword
• operation type
• requester
• status (applied or not)
8.
kibana
• http://demo.kibana.org/
• http://www.elasticsearch.org/blog/kibana-whats-cooking/
9.
kibana@cookpad
• log dashboard for internal API
• explore log
• capacity planning
• performance check
• slowquery
10.
dashboard for each application
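11.
Theme
• If we could search and analyze JSON data flexibly, no matter how large it is, everyday work would get easier
• How can we do that? Is it hard?
12.
Just try it
• Niconico Douga dataset
• Make it searchable and analyzable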
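13.
Dataset
• Official Niconico Douga dataset
• Metadata for 8 million videos
• 2.5 billion comments
• JSON format (compressed: 60 GB, uncompressed: 300 GB)
http://goo.gl/FYtO5T
14.
Dataset
• Official Niconico Douga dataset
• Metadata for 8 million videos
• 2.5 billion comments
• JSON format (compressed: 60 GB, uncompressed: 300 GB)
http://goo.gl/FYtO5T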
15.
http://goo.gl/FYtO5T
16.
http://goo.gl/FYtO5T
17.
Result
• Done in 4 hours with Elasticsearch on AWS
• s3 -> unzip -> Elasticsearch (173k docs/s)
• 550 yen
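The "4 hours" and "173k docs/s" figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes the rate was steady for the whole run and counts only the 2.5 billion comments (the 8 million video documents are negligible by comparison); the function name is illustrative.

```python
# Figures quoted on the slides.
TOTAL_COMMENTS = 2_500_000_000   # 2.5 billion comments in the dataset
DOCS_PER_SECOND = 173_000        # reported indexing rate (173k docs/s)


def indexing_hours(total_docs: int, docs_per_second: float) -> float:
    """Return the wall-clock hours needed to index total_docs at a steady rate."""
    return total_docs / docs_per_second / 3600


hours = indexing_hours(TOTAL_COMMENTS, DOCS_PER_SECOND)
print(f"{hours:.1f} hours")
```

Under those assumptions the estimate lands at roughly four hours, which agrees with the reported duration.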
18.
Demo
• date facet over 2.5 billion comments
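The slides do not show the query behind the demo. As a rough sketch, a date facet over the comment `date` field could be expressed with the `date_histogram` facet available in the Elasticsearch 0.90 series (later versions replaced facets with aggregations). The facet name `comments_over_time`, the monthly interval, and the helper function are assumptions for illustration; the body would be posted to a search endpoint such as `nico2/comment/_search`.

```python
import json


def date_facet_request(field: str = "date", interval: str = "month") -> dict:
    """Build a search body that returns no hits, only a date histogram facet."""
    return {
        "size": 0,                       # skip the hits, only the facet is wanted
        "query": {"match_all": {}},      # facet over every comment
        "facets": {
            "comments_over_time": {      # hypothetical facet name
                "date_histogram": {"field": field, "interval": interval}
            }
        },
    }


body = date_facet_request()
print(json.dumps(body, indent=2))
```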
19.
20.
21.
22.
23.
24.
25.
install
• wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.3.noarch.rpm
• sudo rpm -i elasticsearch-0.90.3.noarch.rpm
26.
install plugins
• sudo bin/plugin
• .. -install elasticsearch/elasticsearch-cloud-aws
• .. -install mobz/elasticsearch-head
• .. -install lukas-vlcek/bigdesk
• .. -install elasticsearch/elasticsearch-analysis-kuromoji
27.
elasticsearch-cloud-aws
• cluster node discovery in AWS
• add config to elasticsearch.yml
  cloud:
    aws:
      access_key: AKI...........
      secret_key: mR.............
  discovery:
    type: ec2
  discovery.ec2.groups: es_test (security_group)
28.
elasticsearch-head
29.
bigdesk
30.
elasticsearch-analysis-kuromoji
• Japanese analyzer
31.
config
• # Set a custom allowed content length:
• http.max_content_length: 1000m
• # Heap Size (defaults to 256m min, 1g max)
• ES_HEAP_SIZE=3g
• # ElasticSearch data directory
• DATA_DIR=/media/ephemeral1/es,/media/ephemeral2/es,/media/ephemeral3/es
32.
make AMI
• elasticsearch machine image
33.
launch ES Instances
• c1.xlarge x 20
• CPU: Xeon 8 cores (2,300 MHz)
• Memory: 7 GB
• Disk: 420 GB x 4
• $0.07/hour (spot instance)
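The instance count and spot price here can be tied back to the 550 yen and 4 hours reported in the results. The sketch below assumes every instance was billed at $0.07/hour for exactly 4 hours and ignores storage and transfer charges; the exchange rate is not stated on the slides, so it is derived from the reported total rather than assumed.

```python
INSTANCES = 20                   # c1.xlarge x 20
SPOT_PRICE_USD_PER_HOUR = 0.07   # spot price per instance, from the slide
HOURS = 4                        # reported import duration
REPORTED_COST_JPY = 550          # reported total cost


def cluster_cost_usd(instances: int, price_per_hour: float, hours: float) -> float:
    """Return the compute cost in USD for a fleet billed by the instance-hour."""
    return instances * price_per_hour * hours


cost_usd = cluster_cost_usd(INSTANCES, SPOT_PRICE_USD_PER_HOUR, HOURS)
implied_rate = REPORTED_COST_JPY / cost_usd  # yen per dollar implied by the slides
print(f"${cost_usd:.2f}, implied rate {implied_rate:.0f} JPY/USD")
```

The compute cost comes to about $5.60, so the reported 550 yen corresponds to an exchange rate near 98 yen per dollar.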
34.
35.
deploy data
• download from s3 to nodes
• use s3cmd (a few minutes with GNU Parallel)
• unzip (60GB -> 300GB)
36.
bulk import
{ "index" : { "_id" : "sm14784868 1", "parent": "sm14784868" } }
{"date":"2011-06-18T20:15:30+09:00","no":1,"vpos":63,"comment":"1","command":"184"}
...
{ "index" : { "_id" : "sm14784868 2", "parent": "sm14784868" } }
{"date":"2011-07-24T02:22:58+09:00","no":2,"vpos":4651,"comment":"2 get","command":"184"}
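The bulk format on this slide alternates an action line with a document line. The slides do not show how these lines were generated, so the following is only a sketch of one way to produce them; the function name is hypothetical, and the `_id` layout (video ID, a space, the comment number) simply mirrors what the slide shows.

```python
import json
from typing import Iterable, Iterator


def bulk_lines(video_id: str, comments: Iterable[dict]) -> Iterator[str]:
    """Yield an action line followed by a source line for each comment."""
    for comment in comments:
        action = {
            "index": {
                "_id": f"{video_id} {comment['no']}",  # ID layout copied from the slide
                "parent": video_id,                    # routes the comment to its video
            }
        }
        yield json.dumps(action, ensure_ascii=False)
        yield json.dumps(comment, ensure_ascii=False)


comments = [
    {"date": "2011-06-18T20:15:30+09:00", "no": 1, "vpos": 63,
     "comment": "1", "command": "184"},
    {"date": "2011-07-24T02:22:58+09:00", "no": 2, "vpos": 4651,
     "comment": "2 get", "command": "184"},
]

# The bulk API expects newline-delimited JSON with a trailing newline.
payload = "\n".join(bulk_lines("sm14784868", comments)) + "\n"
print(payload, end="")
```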
37.
bulk import
• ls request_file* | parallel -j N curl -X POST -s -D - 'http://localhost:9200/nico2/comment/_bulk' -o /dev/null --data-binary @{}
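The slides do not say how the `request_file*` inputs were produced. One plausible approach, sketched here as an assumption, is to split the bulk lines into chunks that always contain whole action/document pairs, so that each file is a valid bulk request on its own and stays under `http.max_content_length`. The function name and the `docs_per_file` parameter are hypothetical.

```python
from typing import Iterable, Iterator, List


def chunk_bulk_lines(lines: Iterable[str], docs_per_file: int) -> Iterator[List[str]]:
    """Group bulk lines into chunks of docs_per_file action/document pairs."""
    if docs_per_file < 1:
        raise ValueError("docs_per_file must be at least 1")
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        # Each document occupies two lines: the action and the source.
        if len(chunk) == docs_per_file * 2:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller, chunk


# Ten lines represent five documents; two documents go into each file.
sample = [f"line{i}" for i in range(10)]
chunks = list(chunk_bulk_lines(sample, docs_per_file=2))
print([len(c) for c in chunks])
```

Each chunk could then be written to its own file and posted independently, which is what lets GNU Parallel run several uploads at once.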
38.
39.
wc -l requests > 4.8 billion
40.
import... import... import...
• all nodes can handle indexing requests
• curl bulk import on each node (x20)
• I/O spread across 3 disks
• takes 4 hours
41.
efficiency
42.
efficiency
"mappings": {
  "video": {
    "properties": {
      "video_id": { "type": "string", "index": "no" },
      "title": { "type": "string", "index": "analyzed" },
      "description": { "type": "string", "index": "analyzed" },
      "thumbnail_url": { "type": "string", "index": "no", "store": "yes" },
      "upload_time": { "type": "date", "format": "YYYY-MM-dd'T'HH:mm:ss'+09:00'" },
      "movie_type": { "type": "string", "index": "not_analyzed" },
      "last_res_body": { "type": "string", "index": "analyzed" },
      "tags": {
        "properties": {
          "tag": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
43.
efficiency
"mappings": {
  "comment": {
    "_parent": { "type": "video" },
    "properties": {
      "date": { "type": "date", "format": "YYYY-MM-dd'T'HH:mm:ss'+09:00'" },
      "no": { "type": "integer" },
      "vpos": { "type": "integer" },
      "comment": { "type": "string" },
      "command": { "type": "string" },
      "video_id": { "type": "string", "index": "not_analyzed" }
    }
  }
44.
efficiency
• curl -X POST 'http://localhost:9200/nico2' -d @mapping.json
45.
shrink
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [
    { "move" : { "index" : "nico2", "shard" : 33, "from_node" : "nodeA", "to_node" : "nodeB" } }
  ]
}'
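The slide shows a single `move` command. If "shrink" means draining nodes before shutting them down (an interpretation, since the slides do not spell it out), several shards would need to move at once. The sketch below builds one reroute body for a list of shards; the helper name and shard 34 are hypothetical, while the field names follow the command on the slide.

```python
import json
from typing import List


def move_commands(index: str, shards: List[int], from_node: str, to_node: str) -> dict:
    """Build a cluster reroute body that moves each listed shard between two nodes."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
            for shard in shards
        ]
    }


# Shard 33 comes from the slide; shard 34 is made up for illustration.
body = move_commands("nico2", [33, 34], "nodeA", "nodeB")
print(json.dumps(body, indent=2))
```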
46.
shrink
curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent": { "indices.recovery.concurrent_streams": 3 }
}'
curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent": { "indices.recovery.max_bytes_per_sec": "1000mb" }
}'
47.
48.
49.
50.
Why Elasticsearch?
• proven, scalable search engine
• highly flexible configuration with sensible defaults
• great API
• growing developer and user base
51.
not covered
• mapping
• query DSL
• search performance
• cluster operation
• healthcheck / cluster statistics
• etc...
52.
questions?