2. ABOUT ME
• Happy grumpy cat of EyeEm
• Started in ops at a scientific data center
• Now dev
• Developers hate me sometimes
3. EyeEm is the world’s premier community and
marketplace for the photographer inside all of us
6. API STACK
• PHP
• MySQL (~10k commands per second)
• Memcached (~50k commands per second)
• Redis (~3k commands per second)
• S3 (~1k commands per second, 40m photos stored)
• Elasticsearch (~250 commands per second - elasticsearch-php)
• All writes are async
• Metrics everywhere
7. CURRENT CLUSTER SPECS
• 3 x m3.xlarge (4 cores, 15 GiB memory, 2 x 40 GB SSD)
• cloud-aws plugin so the nodes can discover each other.
• OpenJDK 1.6
• Heap set to 60% of memory (9 GiB)
• 4 indexes with 5 shards each, ranging from 1 GB to 15 GB
8. CURRENT PRODUCTION USE-CASES
9. CURRENT PRODUCTION USE-CASES - ALBUM SEARCH
10. CURRENT PRODUCTION USE-CASES - PEOPLE SEARCH
11. CURRENT PRODUCTION USE-CASES - DISCOVER
• City search
• Live nearby
12. CURRENT PRODUCTION USE-CASES - LIVE NEARBY
15. LONG STORY
• MyISAM full-text search
• Album Search on one ElasticSearch node
• People Search added
• Scale-out to 3 instances for Photo Search (+ Live Nearby)
16. ELASTICSEARCH - INTERNALS
• Index
  • What your application sees.
  • A logical namespace inside ElasticSearch.
  • Consists of a fixed number of shards.
  • "To index" means to put your data into ElasticSearch so that it is persisted and available for search.
17. ELASTICSEARCH - INTERNALS
• Inverted Index/Mapping
  • The mapping tells Lucene how to build the inverted index that makes data searchable.
  • e.g. "EyeEm" as an nGram{2,3} gets "indexed" as
    ["Ey","ye","eE","Em","Eye","yeE","eEm"],
    "yeah" would be ["ye","ea","ah","yea","eah"]
18. ELASTICSEARCH - INTERNALS
• Inverted Index/Mapping by example (document 1 = "EyeEm", document 2 = "yeah")
Term  Documents
Ey    1
ye    1,2
eE    1
Em    1
Eye   1
yeE   1
eEm   1
ea    2
ah    2
yea   2
eah   2
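The nGram example above can be reproduced in a few lines. This is only a minimal sketch of the idea, not how Lucene implements it; the function names `ngrams` and `build_inverted_index` are made up for illustration.

```python
from collections import defaultdict


def ngrams(text, min_n=2, max_n=3):
    """Return all character n-grams of `text`, shortest first."""
    return [
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    ]


def build_inverted_index(docs):
    """Map every n-gram to the list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term in ngrams(text):
            if doc_id not in index[term]:
                index[term].append(doc_id)
    return dict(index)


docs = {1: "EyeEm", 2: "yeah"}
index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(term, doc_ids)
```

A search for "ye" then becomes a single dictionary lookup that returns both documents, which is the whole point of the inverted index.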
19. SCHEMA-LESS OR WHAT?
• Yes and no.
20. SCHEMA-LESS OR WHAT?
• Yes: you can put anything that can be formatted as JSON into your index, and you get a readable document back.
21. SCHEMA-LESS OR WHAT?
• No: you have to think first, because changing your mapping is expensive; you have to reindex.
22. ELASTICSEARCH - INTERNALS
• Shard
  • An instance of Lucene
  • Consists of multiple Lucene segments
  • Manages segments (merging, fsync, deletion etc.)
23. ELASTICSEARCH - INTERNALS
segments API
http://example.es:9200/yourindex/_segments
indices: { eyephoto6: { shards: { 0: [
  {
    routing: {
      state: "STARTED",
      primary: true,
      node: "PiVDZW-VRYmeaVOy7afoWQ"
    },
    num_committed_segments: 2,
    num_search_segments: 3,
    segments: {
      _l: {
        generation: 21,
        num_docs: 13,
        deleted_docs: 0,
        size_in_bytes: 30810,
        memory_in_bytes: 589,
        committed: true,
        search: true,
        version: "4.7",
        compound: true
      },
      _m: {
        generation: 22,
        num_docs: 371,
        deleted_docs: 16,
        size_in_bytes: 408548,
        memory_in_bytes: 7365,
        committed: false,
        search: true,
        version: "4.7",
        compound: false
      },
      _n: {
        generation: 23,
        num_docs: 16,
        deleted_docs: 0,
        size_in_bytes: 38514,
        memory_in_bytes: 615,
        committed: false,
        search: true,
        version: "4.7",
        compound: true
      }
    }
  }
],
1: [
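A response like the one above is easy to summarise per shard copy. The following is a rough sketch that works on an already-parsed dictionary with the same field names as the slide; the helper name `summarize_shard_copy` is hypothetical, and it assumes `num_docs` counts only live documents.

```python
def summarize_shard_copy(shard_copy):
    """Aggregate the per-segment numbers of one shard copy."""
    segments = shard_copy["segments"]
    live = sum(s["num_docs"] for s in segments.values())
    deleted = sum(s["deleted_docs"] for s in segments.values())
    total = live + deleted
    return {
        "segments": len(segments),
        "live_docs": live,
        "deleted_docs": deleted,
        # Share of documents that are only marked as deleted.
        "deleted_ratio": deleted / total if total else 0.0,
        "uncommitted": sorted(
            name for name, s in segments.items() if not s["committed"]
        ),
    }


# Values copied from the slide; all other fields are left out.
shard_copy = {
    "segments": {
        "_l": {"num_docs": 13, "deleted_docs": 0, "committed": True},
        "_m": {"num_docs": 371, "deleted_docs": 16, "committed": False},
        "_n": {"num_docs": 16, "deleted_docs": 0, "committed": False},
    }
}
print(summarize_shard_copy(shard_copy))
```

Tracking the deleted ratio over time is one way to see how much work the next merge has waiting for it.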
24. ELASTICSEARCH - INTERNALS
• Segments
  • Managed by ElasticSearch
  • The storage for the inverted index
25. ELASTICSEARCH - INTERNALS
• Basically, ElasticSearch is a Lucene cluster manager and API
26. LESSONS LEARNED - SHARDS/SEGMENTS
• Deleting a document only marks it as deleted; it is not removed immediately.
• Updating a document creates a new one and marks the old one as deleted.
• The actual cleanup happens in the background (during segment merges) and can result in nice performance surprises.
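The three points above can be illustrated with a toy model. This is a deliberately simplified sketch, not Lucene's real data structures; the class `ToyShard` and its methods are invented for this example.

```python
class ToyShard:
    """Toy model of a shard: append-only segments plus delete markers."""

    def __init__(self):
        self.segments = [[]]  # each segment is a list of entries

    def _mark_deleted(self, doc_id):
        for segment in self.segments:
            for entry in segment:
                if entry["id"] == doc_id:
                    entry["deleted"] = True

    def index(self, doc_id, body):
        # An update is "mark the old copy as deleted, append a new copy".
        self._mark_deleted(doc_id)
        self.segments[-1].append(
            {"id": doc_id, "body": body, "deleted": False}
        )

    def delete(self, doc_id):
        # Nothing is removed here; the entry only gets a marker.
        self._mark_deleted(doc_id)

    def flush(self):
        # Start a new segment; older segments are not appended to again.
        self.segments.append([])

    def merge(self):
        # Rewrite everything into one segment and drop deleted entries.
        live = [e for s in self.segments for e in s if not e["deleted"]]
        self.segments = [live]

    def stats(self):
        entries = [e for s in self.segments for e in s]
        deleted = sum(1 for e in entries if e["deleted"])
        return {
            "segments": len(self.segments),
            "live": len(entries) - deleted,
            "deleted": deleted,
        }


shard = ToyShard()
shard.index(1, "coffee")
shard.flush()
shard.index(1, "coffee v2")  # update: old copy stays, marked as deleted
shard.index(2, "beer")
shard.delete(2)
print(shard.stats())
shard.merge()  # the background cleanup
print(shard.stats())
```

The merge step is where the cost hides: it rewrites every live entry, which is why heavy update or delete traffic shows up later as I/O load.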
27. LESSONS LEARNED - SHARDS/SEGMENTS
• Nested documents live in the same Lucene segment as their parent document.
  • This can bloat memory usage a lot.
  • They are treated like every other document.
• If you don't always have to search in them, go for parent-child.
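To make the difference concrete, here is a hedged sketch of the two mapping styles in the 1.x syntax used elsewhere in this deck, written as Python dictionaries. The `comment` type and the field names `title`, `comments`, `author` and `text` are hypothetical and not taken from the EyeEm mapping.

```python
import json

# Nested: comments are stored as hidden documents next to each photo.
nested_mapping = {
    "photo": {
        "properties": {
            "title": {"type": "string"},
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "string"},
                    "text": {"type": "string"},
                },
            },
        }
    }
}

# Parent-child: comments are a separate type that points at its parent type.
parent_child_mapping = {
    "photo": {"properties": {"title": {"type": "string"}}},
    "comment": {
        "_parent": {"type": "photo"},
        "properties": {
            "author": {"type": "string"},
            "text": {"type": "string"},
        },
    },
}

print(json.dumps(parent_child_mapping, indent=2))
```

With parent-child, a comment can be indexed or updated without touching the photo document, at the price of slower joins at query time.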
28. LESSONS LEARNED - ELASTICSEARCH
• Start with more than one instance; it is simple enough to set up.
• Major upgrades are a pain (0.90 -> 1.1)
• PHP client libraries mostly do not handle connection pools properly; use elasticsearch-php
  • 'connectionPoolClass' => '\Elasticsearch\ConnectionPool\StaticConnectionPool'
  • or let an intermediate webserver handle it
29. LESSONS LEARNED - ELASTICSEARCH
• You will index more than once. Promise. Be prepared.
• Rebalancing is smooth, don't worry.
• Have your metrics ready.
• "You can have a good time with ElasticSearch if you don't ignore the complexity and internals of this distributed database."
32. LESSONS LEARNED - INDEX/MAPPING
• Different analysers should go into separate fields
• Score them individually; this makes iterative optimisation possible
• Keep a raw field
• Use dynamic_templates once you have found the holy grail of field analysis.
• Filter first! Querying and scoring are expensive.
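"Filter first" could look roughly like the sketch below, which builds a request body in the 1.x `filtered` syntax that also appears later in this deck. The helper `filtered_search` and the field `city.raw` are assumptions for illustration only.

```python
import json


def filtered_search(text, field, filters):
    """Build a search body that filters on exact values before scoring."""
    return {
        "query": {
            "filtered": {
                # Filters only answer yes/no per document and are not scored.
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {name: value}}
                            for name, value in sorted(filters.items())
                        ]
                    }
                },
                # Only documents that pass the filter get scored.
                "query": {"match": {field: text}},
            }
        }
    }


# "city.raw" is a hypothetical not_analyzed sub-field.
body = filtered_search("coffee", "topics", {"city.raw": "Berlin"})
print(json.dumps(body, indent=2))
```

The term filter runs against the raw (not analysed) sub-field, which is one more reason to keep a raw field around.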
34. LESSONS LEARNED - INDEX/MAPPING
• Different analysers should go into separate fields
GET /eyephoto/_mapping
{
  "eyephoto6": {
    "mappings": {
      "photo": {
        "dynamic_templates": [
          {
            "string": {
              "mapping": {
                "type": "string",
                "index_analyzer": "photo_names",
                "search_analyzer": "photo_standard",
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                  },
                  "split": {
                    "type": "string",
                    "analyzer": "standard"
                  }
                }
              },
              "match": "*",
              "match_mapping_type": "string"
            }
          }
        ]
35. LESSONS LEARNED - INDEX/MAPPING
• Different analysers should go into separate fields
POST /eyephoto/photo/_search
{
  "size": 1,
  "fields": [
    "id"
  ],
  "query": {
    "multi_match": {
      "query": "coff",
      "fields": [
        "topics"
      ]
    }
  },
  "facets": {
    "topic": {
      "terms": {
        "field": "topics.raw",
        "size": 1
      }
    }
  }
}
Response:
{
  "took": 18,
  "timed_out": false,
  "_shards": {
    ##########
  },
  "hits": {
    "total": 125,
    "max_score": 6.44889,
    "hits": [
      {
        #####
        "_id": "167480",
        #####
        }
      }
    ]
  },
  "facets": {
    "topic": {
      "_type": "terms",
      "missing": 0,
      "total": 138,
      "other": 57,
      "terms": [
        {
          "term": "Coffee",
          "count": 81
        }
      ]
    }
  }
}
36. LESSONS LEARNED - INDEX/MAPPING
• Different analysers should go into separate fields
POST /eyephoto/photo/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "lars",
            "operator": "and",
            "fields": [
              "name.raw^3",
              "name.split^2",
              "name"
            ]
          }
        },
        {
          "multi_match": {
            "query": "lars",
            "fields": [
              "name.raw^3",
              "name.split^2",
              "name"
            ]
          }
        }
      ]
    }
  }
}
37. LESSONS LEARNED - INDEX/MAPPING
• Read and write only to index aliases.
Index Name    Index Aliases
eyephoto5     "eyephotoread"
eyephoto6     "eyephotowrite"
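Once a reindex into the new index has finished, the read alias can be moved in a single request to the `_aliases` endpoint. The sketch below only builds the request body; the helper name `alias_switch_actions` is made up, and sending the body (for example with an HTTP POST to `/_aliases`) is left out.

```python
import json


def alias_switch_actions(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` to `new_index`."""
    return {
        "actions": [
            # Both actions are sent in one request, so the alias never
            # points at nothing.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_switch_actions("eyephotoread", "eyephoto5", "eyephoto6")
print(json.dumps(body, indent=2))
```

Because the application only ever knows the alias names, no deployment is needed when the underlying index changes.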
38. LESSONS LEARNED - INDEX/MAPPING
• If you have a string or integer field, you can put an array into it as well.
Term  Documents
Ey    1
ye    1,2
eE    1
Em    1
Eye   1
yeE   1
eEm   1
ea    2
ah    2
yea   2
eah   2
39. LESSONS LEARNED - INDEX/MAPPING
• Use geohash wherever you query on lat/lng.
POST /eyephoto/photo/_search
{
  "query": {
    "function_score": {
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "geohash_cell": {
              "location": {
                "lat": 52.5311,
                "lon": 13.404
              },
              "precision": 4,
              "neighbors": true
            }
          }
        }
      },
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": "52.5311,13.404",
              "scale": "10km"
            }
          }
        },
        {
          "exp": {
            "uploaded": {
              "origin": "now",
              "scale": "2d"
            }
          }
        }
      ]
    }
  }
}
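The geohash_cell filter is cheap because a geohash turns "near this point" into "shares this string prefix". The following is a plain-Python sketch of the standard geohash encoding, shown only to illustrate what precision 4 means for the coordinates on the slide; it is not the code ElasticSearch runs.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=4):
    """Encode a coordinate as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits form one base32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


print(geohash_encode(52.5311, 13.404, 4))  # prints "u33d"
```

Every photo inside that cell has a geohash starting with "u33d", so the filter reduces to a prefix match, and `"neighbors": true` adds the eight surrounding cells to cover points close to a cell border.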
40. LESSONS LEARNED - AGGREGATIONS
• Aggregations give you recursive facets; handle with care.
"aggregations": {
  "user_fullname": {
    "filter": {
      "query": {
        "match": {
          "topics": {
            "query": "lars beer",
            "operator": "or"
          }
        }
      }
    },
    "aggs": {
      "user_fullname": {
        "terms": {
          "field": "user_fullname.raw",
          "size": 3
        },
        "aggs": {
          "topics": {
            "filter": {
              "query": {
                "match": {
                  "topics": {
                    "query": "lars beer",
                    "operator": "or"
                  }
                }
              }
            },
            "aggs": {
              "topics": {
                "terms": {
                  "field": "topics.raw",
                  "size": 3
                }
              }
            }
          },
41. LESSONS LEARNED - AGGREGATIONS
• Aggregations give you recursive facets; handle with care.
"user_fullname": {
  "doc_count": 678,
  "user_fullname": {
    "buckets": [
      {
        "key": "Lars 🍻",
        "doc_count": 678,
        "topics": {
          "doc_count": 5,
          "topics": {
            "buckets": [
              {
                "key": "Beer",
                "doc_count": 1
              },
              {
                "key": "BeerOps",
                "doc_count": 1
              },
              {
                "key": "Birthday beer in the snow",
                "doc_count": 1
              }
            ]
          }
        }
      }
    ]
  }
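On the client side, a nested response like this has to be unwrapped level by level. Below is a small sketch that assumes exactly the nesting shown above (a filter aggregation wrapping a terms aggregation, twice); the function name `top_topics_per_user` and the sample user are hypothetical.

```python
def top_topics_per_user(user_agg):
    """Flatten the filter -> terms -> filter -> terms nesting."""
    result = {}
    for user_bucket in user_agg["user_fullname"]["buckets"]:
        topic_buckets = user_bucket["topics"]["topics"]["buckets"]
        result[user_bucket["key"]] = [
            (bucket["key"], bucket["doc_count"]) for bucket in topic_buckets
        ]
    return result


# Shaped like the "user_fullname" object in the response above.
user_agg = {
    "doc_count": 678,
    "user_fullname": {
        "buckets": [
            {
                "key": "Some User",
                "doc_count": 678,
                "topics": {
                    "doc_count": 5,
                    "topics": {
                        "buckets": [
                            {"key": "Beer", "doc_count": 1},
                            {"key": "BeerOps", "doc_count": 1},
                        ]
                    },
                },
            }
        ]
    },
}
print(top_topics_per_user(user_agg))
```

Each extra level multiplies the number of buckets the cluster has to build, which is the practical meaning of "handle with care".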