16th elasticsearch study meetup
Using Elasticsearch in SPEEDA
2016/6/27
• / @tau3000
• SPEEDA 

SPEEDA


•
• : Ruby, Java
:

: 2008 4 1
: 170 ( )
B2B
SPEEDA 550
300 / 100
M&A 10
B2C
NewsPicks
NewsPicks
2
300 / 550
M&A
2009 6
600 ( )
DEMO
SPEEDA Elasticsearch
1. Elasticsearch
2.
3. NewsPicks SPEEDA
4.
SPEEDA
SPEEDA
•
•
•
•
SPEEDA
•
•
•
•
•
•
•
•
•
• 



1000 

1000 

motor 

in MySQL
• ID × ID × ID →
• 300 × 2 × 60 ( )
• 6
• 6 ( ) 

7 40
in MySQL
•
• LIKE
•
• 10 × 100 

MySQL+
• 10 

& 5
• ( CTO )
• KVS
• Elasticsearch
• Elasticsearch
•
•
Elasticsearch
in Elasticsearch
• 1 =1 (=300 )
• 1
•
•
• + 6 ( 40 )
• 1 40MB (JSON )
• 11 (= )
• × +
•
•
• 

• 10 24
• 1 CPU16 

128GB SSD RAID
• 30
• 2
DEMO
•
• 



1000 

1000 

"motor" 

SPEEDA Elasticsearch
1. Elasticsearch
2.
3. NewsPicks SPEEDA
4.

 1 

(precision) (recall)




(kuromoji_tokenizer )
ngram

(cjk_bigram filter )
mapping
• bigram,
• unigram
analyzer:
  name_analyzer:
    tokenizer: standard
    filter:
      - standard
      - cjk_bigram # " " " " " " -> ...
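As a rough illustration of what the cjk_bigram filter in the mapping above produces, the Python sketch below approximates its output: a run of CJK characters becomes overlapping two-character tokens, a lone CJK character stays a unigram, and non-CJK words pass through unchanged. The function name and the character ranges are assumptions made for this sketch. Lucene's real filter works on the token stream of the standard tokenizer and covers more cases.

```python
import re

# Rough CJK ranges (hiragana, katakana, common Han ideographs).
# This is an assumption for the sketch, not Lucene's exact character table.
_CJK = r"\u3040-\u30ff\u4e00-\u9fff"
_RUN = re.compile("[" + _CJK + "]+|[A-Za-z0-9]+")
_CJK_CHAR = re.compile("[" + _CJK + "]")


def approx_cjk_bigram(text):
    """Approximate standard tokenizer + cjk_bigram: bigrams for CJK runs,
    a unigram for an isolated CJK character, whole words otherwise."""
    tokens = []
    for run in _RUN.findall(text):
        if _CJK_CHAR.match(run):
            if len(run) == 1:
                tokens.append(run)  # isolated CJK character stays a unigram
            else:
                # overlapping two-character windows over the CJK run
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run)  # non-CJK word passes through unchanged
    return tokens


if __name__ == "__main__":
    print(approx_cjk_bigram("東京都"))
    print(approx_cjk_bigram("NTT"))
```

With this sketch, approx_cjk_bigram("東京都") yields the two tokens 東京 and 京都, while "NTT" stays a single token.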
• phrase_prefix 

• max_expansions 1024 

1
"petro china"
"query": {
  "bool": {
    "must": [ {
      "multi_match": {
        "query": "petro",
        "type": "phrase_prefix",
        "fields": [ "ja...
phrase_prefix
(AP)
(ES)
(ES)
phrase_prefix 

AND
petro china
petro
china
petro
china
NTT
China Petrotech Holdings
China Petr...
max_expansions
• term
"multi_match": {
  "query": " ",
  "type": "phrase_prefix",
  "fields": [ "japanese_name", "english_name" ...
max_expansions
•
• max_expansions>=131
• bigram X 131
• e 1024
• e term eXXX 1024 

1
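The query construction shown for "petro china" can be sketched in Python as follows. It assumes, as the slides appear to show, that the application server splits the input on whitespace, builds one phrase_prefix multi_match clause per word, and combines the clauses under bool/must so that every word has to match. The field names and the max_expansions value of 1024 come from the slides; the function name build_name_query is hypothetical.

```python
import json

# Field names as shown on the slides.
NAME_FIELDS = ["japanese_name", "english_name"]


def build_name_query(user_input, max_expansions=1024):
    """Build an AND (bool/must) of phrase_prefix clauses,
    one clause per whitespace-separated word of the input."""
    clauses = [
        {
            "multi_match": {
                "query": word,
                "type": "phrase_prefix",
                "fields": NAME_FIELDS,
                "max_expansions": max_expansions,
            }
        }
        for word in user_input.split()
    ]
    return {"query": {"bool": {"must": clauses}}}


if __name__ == "__main__":
    # Prints the request body for the "petro china" example.
    print(json.dumps(build_name_query("petro china"), indent=2))
```

Building the body in the application keeps each word as its own prefix match, so "petro china" can match names in which the two words appear in either order.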
SPEEDA Elasticsearch
1. Elasticsearch
2.
3. NewsPicks SPEEDA
4.
NewsPicks
• SPEEDA β 2016/6/20 web
•
DEMO
1. ID ES RDB
2. ID ID 

ES
ID
1.
2.
3.
4.
5.
6.
Industry Index (ES)
Country Table (MySQL)
Account Title Table (MySQL)
1.
2.
3.
4.
5.
6.
ID
Industry Index (ES)
Country Table (MySQL)
Account Title Table (MySQL)
ID
JPN( ) AND IDST014( ) AND AT00102( )
Media Index (ES)
"query": {
  "bool": {
    "must": [
      { "term": { "countries": "JPN" } },
      "t...
1. RDB
2. ES
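The second step of this flow, combining the resolved IDs with AND and searching the media index, can be sketched in Python as below. The field names countries, industries and accountTitles and the example IDs are taken from the slide. The function name build_media_query is hypothetical, and the ID lookups against the industry index and the MySQL tables are assumed to have happened already.

```python
def build_media_query(country_id=None, industry_id=None, account_title_id=None):
    """Build a bool/must query with one term clause per ID that was resolved.
    IDs left as None are simply omitted from the AND."""
    candidates = [
        ("countries", country_id),
        ("industries", industry_id),
        ("accountTitles", account_title_id),
    ]
    # Each clause is wrapped in its own object, which the must array requires.
    clauses = [
        {"term": {field: value}}
        for field, value in candidates
        if value is not None
    ]
    return {"query": {"bool": {"must": clauses}}}


if __name__ == "__main__":
    # JPN AND IDST014 AND AT00102, as in the slide's example.
    print(build_media_query("JPN", "IDST014", "AT00102"))
```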
SPEEDA Elasticsearch
1. Elasticsearch
2.
3. NewsPicks SPEEDA
4.
AP Server
Node 01
Node 02
Node 20
Batch Server
SPEEDA
• AP
•
• 1
40MB
Elasticsearch
(ES 50 300 )
• Full GC
• Full GC
• Full GC
AP Server
Node 01
Node 02
Node 20
Batch Server
AP Server
Master Node 01
Master Node 02
Client Node 01
Client Node 02
Data Node 01
Data Node 02
Data Node 20
Batch Server
• Full GC
• Full GC
Full GC
•
•
• Full GC
•
0.5 7.2% 2.7%
1.0 2.6% 0.9%
1.5 1.0% 0.5%
2.0 0.7% 0.4%
• Elasticsearch
• ES RDB
•
•
• Elasticsearch (1.4→2.3)

inner hits
Using Elasticsearch in SPEEDA, a Platform for Company and Industry Information
Presentation materials from the 16th elasticsearch study meetup.
https://elasticsearch.doorkeeper.jp/events/46539

Published in: Software

  1. 16th elasticsearch study meetup: Using Elasticsearch in SPEEDA, 2016/6/27
  2. • / @tau3000 • SPEEDA SPEEDA • • : Ruby, Java
  3. : : 2008 4 1 : 170 ( )
  4. B2B SPEEDA 550 300 / 100 M&A 10 B2C NewsPicks NewsPicks 2
  5. 300 / 550 M&A 2009 6 600 ( )
  6. DEMO
  7. SPEEDA Elasticsearch 1. Elasticsearch 2. 3. NewsPicks SPEEDA 4.
  8. SPEEDA Elasticsearch 1. Elasticsearch 2. 3. NewsPicks SPEEDA 4.
  9. SPEEDA
  10. SPEEDA • • • •
  11. SPEEDA • • • • • • • •
  12. • • 1000 1000 motor
  13. in MySQL • ID × ID × ID → • 300 × 2 × 60 ( ) • 6 • 6 ( ) 7 40
  14. in MySQL • • LIKE •
  15. • 10 × 100 MySQL+ • 10 & 5
  16. • ( CTO ) • KVS • Elasticsearch • Elasticsearch • •
  17. Elasticsearch
  18. in Elasticsearch • 1 =1 (=300 ) • 1 • • • + 6 ( 40 )
  19. • 1 40MB (JSON ) • 11 (= ) • × + • •
  20. •
  21. • 10 24 • 1 CPU16 128GB SSD RAID • 30 • 2
  22. DEMO
  23. • • 1000 1000 "motor"
  24. SPEEDA Elasticsearch 1. Elasticsearch 2. 3. NewsPicks SPEEDA 4.
  25. 1 (precision) (recall) (kuromoji_tokenizer ) ngram (cjk_bigram filter )
  26. mapping • bigram, • unigram analyzer: name_analyzer: tokenizer: standard filter: - standard - cjk_bigram # " " " " " " -> " " " "
  27. • phrase_prefix • max_expansions 1024 1
  28. "petro china" "query": { "bool": { "must": [ { "multi_match": { "query": "petro", "type": "phrase_prefix", "fields": [ "japanese_name", "english_name" ], "max_expansions": 1024 } }, { "multi_match": { "query": "china", "type": "phrase_prefix", "fields": [ "japanese_name", "english_name" ], "max_expansions": 1024 } } ] } }
  29. phrase_prefix (AP) (ES) (ES) phrase_prefix AND petro china petro china petro china NTT China Petrotech Holdings China Petrochemical Development
  30. max_expansions • term "multi_match": { "query": " ", "type": "phrase_prefix", "fields": [ "japanese_name", "english_name" ], "max_expansions": 1024 } 1. bigram 1024 … 2. bigram …
  31. max_expansions • • max_expansions>=131 • bigram X 131 • e 1024 • e term eXXX 1024 1
  32. SPEEDA Elasticsearch 1. Elasticsearch 2. 3. NewsPicks SPEEDA 4.
  33. NewsPicks • SPEEDA β 2016/6/20 web •
  34. DEMO
  35. 1. ID ES RDB 2. ID ID ES
  36. ID 1. 2. 3. 4. 5. 6. Industry Index (ES) Country Table (MySQL) Account Title Table (MySQL)
  37. 1. 2. 3. 4. 5. 6. ID Industry Index (ES) Country Table (MySQL) Account Title Table (MySQL)
  38. ID 1. 2. 3. 4. 5. 6. Industry Index (ES) Country Table (MySQL) Account Title Table (MySQL)
  39. ID 1. 2. 3. 4. 5. 6. Industry Index (ES) Country Table (MySQL) Account Title Table (MySQL)
  40. ID 1. 2. 3. 4. 5. 6. Industry Index (ES) Country Table (MySQL) Account Title Table (MySQL)
  41. ID 1. 2. 3. 4. 5. 6. Industry Index (ES) Country Table (MySQL) Account Title Table (MySQL)
  42. ID JPN( ) AND IDST014( ) AND AT00102( ) Media Index (ES) "query": { "bool": { "must": [ { "term": { "countries": "JPN" } }, { "term": { "industries": "IDST014" } }, { "term": { "accountTitles": "AT00102" } } ] } }
  43. 1. RDB 2. ES
  44. SPEEDA Elasticsearch 1. Elasticsearch 2. 3. NewsPicks SPEEDA 4.
  45. AP Server Node 01 Node 02 Node 20 Batch Server
  46. SPEEDA • AP • • 1 40MB
  47. Elasticsearch (ES 50 300 ) • Full GC • Full GC • Full GC
  48. AP Server Node 01 Node 02 Node 20 Batch Server
  49. AP Server Master Node 01 Master Node 02 Client Node 01 Client Node 02 Data Node 01 Data Node 02 Data Node 20 Batch Server
  50. • Full GC • Full GC Full GC •
  51. • • Full GC • 0.5 7.2% 2.7% 1.0 2.6% 0.9% 1.5 1.0% 0.5% 2.0 0.7% 0.4%
  52. • Elasticsearch • ES RDB • • • Elasticsearch (1.4→2.3) inner hits
