EyeEm

Lars Fronius
@LarsFronius
ElasticSearch in production at EyeEm
ABOUT ME
• Happy grumpy cat of EyeEm
• Started in ops at a scientific data center
• Now a developer
• Developers hate me sometimes
EyeEm is the world’s premier community and
marketplace for the photographer inside all of us
API STACK
• PHP
• MySQL (~10k commands per second)
• Memcached (~50k commands per second)
• Redis (~3k commands per second)
• S3 (~1k commands per second, 40M photos stored)
• Elasticsearch (~250 commands per second, via elasticsearch-php)
• All writes are async
• Metrics everywhere
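The slide only says that all writes are asynchronous, not how. A minimal Python sketch of one common pattern, assuming a hypothetical in-process queue drained by a background thread that ships documents in batches; `start_index_worker` and `send_bulk` are illustrative names, with `send_bulk` standing in for whatever actually talks to Elasticsearch (for example a bulk request). A PHP stack would more likely hand jobs to an external queue.

```python
import queue
import threading


def start_index_worker(jobs, send_bulk, batch_size=100):
    """Drain `jobs` in the background, handing batches of up to `batch_size`
    documents to `send_bulk` (a hypothetical callable, e.g. a bulk-API client)."""

    def run():
        while True:
            batch = [jobs.get()]  # block until at least one job is queued
            while len(batch) < batch_size:
                try:
                    batch.append(jobs.get_nowait())  # grab whatever else is waiting
                except queue.Empty:
                    break
            try:
                send_bulk(batch)
            finally:
                for _ in batch:
                    jobs.task_done()  # lets jobs.join() return even if send_bulk fails

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


# Usage: the request path only enqueues; the worker does the slow write.
jobs = queue.Queue()
sent = []
start_index_worker(jobs, sent.extend, batch_size=10)
for photo_id in range(25):
    jobs.put({"id": photo_id})
jobs.join()  # wait until every queued document has been handed to send_bulk
```

Batching trades a little latency for far fewer round trips, and the caller never waits on Elasticsearch.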
CURRENT CLUSTER SPECS
• 3 x m3.xlarge (4 cores, 15 GiB RAM, 2 x 40 GB SSD)
• cloud-aws plugin for node discovery between the instances
• OpenJDK 1.6
• Heap size at 60% of RAM (9 GiB)
• 4 indexes, 5 shards each, ranging from 1 GB to 15 GB
CURRENT PRODUCTION USE-CASES
• Album search
• People search
• Discover: city search and Live Nearby
CURRENT BETA USE-CASES
LONG STORY
• MyISAM full-text search
• Album search on one ElasticSearch node
• People search added
• Scale-out to 3 instances for photo search (+ Live Nearby)
ELASTICSEARCH - INTERNALS
• Index
  • What your application sees.
  • A logical namespace inside ElasticSearch.
  • Consists of a fixed number of shards.
• "To index" means to "put" your data into ElasticSearch to make it available for search and for persistence.
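Why the shard count is fixed: each document is routed to a primary shard by hashing its routing value (the document ID by default) modulo the number of primary shards, so changing that number would send existing documents to different shards. A small Python sketch of that rule; `shard_for` is a hypothetical helper, and `zlib.crc32` is only a stand-in, not the hash function Elasticsearch actually uses.

```python
import zlib


def shard_for(doc_id, num_primary_shards, routing=None):
    """Pick a primary shard: hash(routing) % num_primary_shards.
    crc32 is only a stand-in for Elasticsearch's own hash function."""
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


ids = [f"photo-{i}" for i in range(100)]
with_5 = [shard_for(i, 5) for i in ids]
with_6 = [shard_for(i, 6) for i in ids]
moved = sum(a != b for a, b in zip(with_5, with_6))
print(f"{moved} of {len(ids)} documents would land on a different shard")
```

This is why growing a 5-shard index means creating a new index and reindexing into it.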
ELASTICSEARCH - INTERNALS
• Inverted index/mapping
  • The mapping tells Lucene how to build the inverted index that makes data searchable.
  • e.g. "EyeEm" as an nGram{2,3} gets "indexed" as ["Ey", "ye", "eE", "Em", "Eye", "yeE", "eEm"], and "yeah" as ["ye", "ea", "ah", "yea", "eah"]
ELASTICSEARCH - INTERNALS
• Inverted index/mapping by example (doc 1 = "EyeEm", doc 2 = "yeah")
Term  Documents
Ey    1
ye    1, 2
eE    1
Em    1
Eye   1
yeE   1
eEm   1
ea    2
ah    2
yea   2
eah   2
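The two slides above can be reproduced with a toy inverted index in Python: an nGram{2,3} tokenizer, posting lists from each term to document IDs, and a lookup. It is a sketch of the idea only; Lucene's real tokenizer may emit grams in a different order and its index stores much more (positions, frequencies, norms). `search` is a simplified AND lookup, not Elasticsearch's match semantics.

```python
from collections import defaultdict


def ngrams(text, min_gram=2, max_gram=3):
    """All substrings of length min_gram..max_gram, shortest first (slide order)."""
    return [text[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(text) - n + 1)]


def build_inverted_index(docs):
    """Map every term to the sorted list of document IDs that contain it."""
    index = defaultdict(list)
    for doc_id, text in sorted(docs.items()):
        for term in dict.fromkeys(ngrams(text)):  # de-duplicate, keep order
            index[term].append(doc_id)
    return dict(index)


def search(index, query):
    """Return IDs of documents containing every n-gram of the query."""
    postings = [set(index.get(term, ())) for term in ngrams(query)]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index({1: "EyeEm", 2: "yeah"})
print(index["ye"])           # [1, 2]
print(search(index, "Eye"))  # [1]
```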
SCHEMA-LESS OR WHAT?
• Yes and no.
• Yes: you can put anything that can be formatted as JSON into your index, and you get a readable document back.
• No: you have to think first, because changing your mapping is expensive; it means reindexing.
ELASTICSEARCH - INTERNALS
• Shard
  • An instance of Lucene
  • Consists of multiple Lucene segments
  • Manages segments (merging, fsync, deletion, etc.)
ELASTICSEARCH - INTERNALS
segments API
http://example.es:9200/yourindex/_segments

indices: { eyephoto6: { shards: { 0: [
  {
    routing: {
      state: "STARTED",
      primary: true,
      node: "PiVDZW-VRYmeaVOy7afoWQ"
    },
    num_committed_segments: 2,
    num_search_segments: 3,
    segments: {
      _l: {
        generation: 21,
        num_docs: 13,
        deleted_docs: 0,
        size_in_bytes: 30810,
        memory_in_bytes: 589,
        committed: true,
        search: true,
        version: "4.7",
        compound: true
      },
      _m: {
        generation: 22,
        num_docs: 371,
        deleted_docs: 16,
        size_in_bytes: 408548,
        memory_in_bytes: 7365,
        committed: false,
        search: true,
        version: "4.7",
        compound: false
      },
      _n: {
        generation: 23,
        num_docs: 16,
        deleted_docs: 0,
        size_in_bytes: 38514,
        memory_in_bytes: 615,
        committed: false,
        search: true,
        version: "4.7",
        compound: true
      }
    }
  }
],
1: [
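Because deletes and updates only mark documents as deleted, the deleted_docs counters in this response are worth graphing. A minimal Python sketch that summarises one shard copy; the dict below hand-copies a subset of the fields from the excerpt above (in practice you would parse the JSON returned by the _segments endpoint), and `summarize` is a hypothetical helper.

```python
# Subset of the shard 0 entry from the _segments excerpt above.
shard0 = {
    "segments": {
        "_l": {"num_docs": 13, "deleted_docs": 0, "size_in_bytes": 30810, "committed": True},
        "_m": {"num_docs": 371, "deleted_docs": 16, "size_in_bytes": 408548, "committed": False},
        "_n": {"num_docs": 16, "deleted_docs": 0, "size_in_bytes": 38514, "committed": False},
    },
}


def summarize(shard):
    """Aggregate the per-segment counters of one shard copy."""
    segs = list(shard["segments"].values())
    docs = sum(s["num_docs"] for s in segs)
    deleted = sum(s["deleted_docs"] for s in segs)
    return {
        "segments": len(segs),
        "num_docs": docs,
        "deleted_docs": deleted,
        # share of stored documents that are dead weight until a merge
        "deleted_ratio": deleted / (docs + deleted) if docs + deleted else 0.0,
        "size_in_bytes": sum(s["size_in_bytes"] for s in segs),
        # searchable segments that are not yet part of a Lucene commit
        "uncommitted": sum(not s["committed"] for s in segs),
    }


print(summarize(shard0))
```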
ELASTICSEARCH - INTERNALS
• Segments
  • Managed by ElasticSearch
  • Store the inverted index
ELASTICSEARCH - INTERNALS
• Basically, ElasticSearch is a Lucene cluster manager plus an API.
LESSONS LEARNED - SHARDS / SEGMENTS
• Deleting a document only marks it as deleted; it is not removed immediately.
• Updating a document only creates a new copy and marks the old one as deleted.
• The actual cleanup (segment merging) happens in the background and can result in (not so nice) performance surprises.
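A toy model of those semantics, to make them concrete. `ToyShard` is hypothetical and ignores almost everything Lucene really does (translog, commits, merge policies); it only shows that an update leaves a dead copy behind until a merge rewrites the segments.

```python
class ToyShard:
    """Hypothetical shard model: segment contents are never rewritten,
    only flagged as deleted, until merge() copies the live documents."""

    def __init__(self):
        self.segments = []  # flushed segments: lists of [doc_id, source, live]
        self.buffer = []    # in-memory docs not yet in a segment

    def delete(self, doc_id):
        for segment in self.segments + [self.buffer]:
            for record in segment:
                if record[0] == doc_id and record[2]:
                    record[2] = False  # only a marker; the old copy stays in place

    def index(self, doc_id, source):
        self.delete(doc_id)                         # an update marks the old copy deleted
        self.buffer.append([doc_id, source, True])  # ...and writes a brand-new copy

    def refresh(self):
        if self.buffer:
            self.segments.append(self.buffer)  # buffer becomes a new segment
            self.buffer = []

    def merge(self):
        live = [r for segment in self.segments for r in segment if r[2]]
        self.segments = [live] if live else []  # deleted copies are dropped only here

    def stats(self):
        records = [r for segment in self.segments for r in segment]
        return {
            "num_docs": sum(1 for r in records if r[2]),
            "deleted_docs": sum(1 for r in records if not r[2]),
            "segments": len(self.segments),
        }


shard = ToyShard()
shard.index("a", {"title": "Beer"})
shard.index("b", {"title": "Coffee"})
shard.refresh()
shard.index("a", {"title": "BeerOps"})  # update: old copy of "a" is only marked deleted
shard.refresh()
print(shard.stats())  # {'num_docs': 2, 'deleted_docs': 1, 'segments': 2}
shard.merge()
print(shard.stats())  # {'num_docs': 2, 'deleted_docs': 0, 'segments': 1}
```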
LESSONS LEARNED - SHARDS / SEGMENTS
• Nested documents live in the same Lucene segment as their parent.
• They can bloat memory usage a lot.
• They are treated like every other document.
• If you don't always need to search in them, go for parent-child instead.
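To see why nested documents add up: Elasticsearch indexes every nested object as its own hidden Lucene document, stored in one block together with its parent (Lucene's block join expects the parent as the block's last document). A toy Python sketch of that flattening; `to_lucene_block` and the `_nested_path` marker are hypothetical, not Elasticsearch internals, and the photo is made-up data.

```python
def to_lucene_block(source, nested_fields):
    """Split one JSON document into the Lucene documents of its block:
    one per nested object first, then the root document last."""
    root = {k: v for k, v in source.items() if k not in nested_fields}
    children = [dict(child, _nested_path=field)  # hypothetical marker field
                for field in nested_fields
                for child in source.get(field, [])]
    return children + [root]


photo = {  # made-up example document
    "id": 1,
    "title": "Birthday beer in the snow",
    "comments": [{"text": "cheers"}, {"text": "cold!"}, {"text": "nice"}],
}
block = to_lucene_block(photo, ["comments"])
print(len(block))  # 4: three nested comments + the photo itself
```

With parent-child, the comments would instead be separate top-level documents that can be indexed and updated on their own.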
LESSONS LEARNED - ELASTICSEARCH
• Start with more than one instance right away; it's simple to do.
• Major upgrades are a pain (0.90 -> 1.1).
• Most PHP client libraries do not handle connection pools properly. Either use elasticsearch-php:
  • 'connectionPoolClass' => 'Elasticsearch\ConnectionPool\StaticConnectionPool'
• or let an intermediate web server handle it.
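Roughly, a well-behaved pool spreads requests over all nodes and stops sending to a node that just failed, retrying it later. A minimal Python sketch of such a static round-robin pool with dead-node timeouts; `RoundRobinPool` and the host names are hypothetical, and this is not the elasticsearch-php StaticConnectionPool implementation.

```python
import itertools
import time


class RoundRobinPool:
    """Hypothetical static pool: fixed host list, round-robin selection,
    hosts marked dead are skipped until `dead_timeout` seconds have passed."""

    def __init__(self, hosts, dead_timeout=60.0, clock=time.monotonic):
        self.hosts = list(hosts)
        self.dead_timeout = dead_timeout
        self.clock = clock  # injectable so the behaviour can be tested
        self.dead_until = {}
        self._cycle = itertools.cycle(self.hosts)

    def next_host(self):
        now = self.clock()
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if self.dead_until.get(host, float("-inf")) <= now:
                return host
        raise RuntimeError("no live Elasticsearch hosts")

    def mark_dead(self, host):
        self.dead_until[host] = self.clock() + self.dead_timeout


now = [0.0]  # fake clock for the example
pool = RoundRobinPool(["es1:9200", "es2:9200", "es3:9200"], clock=lambda: now[0])
print(pool.next_host(), pool.next_host())  # es1:9200 es2:9200
pool.mark_dead("es3:9200")                 # e.g. after a connection error
print(pool.next_host())                    # es1:9200 (es3 is skipped)
```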
LESSONS LEARNED - ELASTICSEARCH
• You will index your data more than once. Promise. Be prepared.
• Rebalancing is smooth, don't worry.
• Have your metrics ready.
• "You can have a good time with ElasticSearch, if you don't ignore the complexity and internals of this distributed database."
LESSONS LEARNED - INDEX/MAPPING
• Different analysers should go into separate fields
  • Score them individually, which makes iterative optimisations possible
• Keep a raw field
• Use dynamic_templates once you have found the holy grail of field analysis.
• Filter first! Querying and scoring are expensive.
LESSONS LEARNED - INDEX/MAPPING
GET /eyephoto/_mapping
{
  "eyephoto6": {
    "mappings": {
      "photo": {
        "dynamic_templates": [
          {
            "string": {
              "mapping": {
                "type": "string",
                "index_analyzer": "photo_names",
                "search_analyzer": "photo_standard",
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                  },
                  "split": {
                    "type": "string",
                    "analyzer": "standard"
                  }
                }
              },
              "match": "*",
              "match_mapping_type": "string"
            }
          }
        ]
      }
    }
  }
}
• Different analysers should go into separate fields
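How the dynamic template above plays out for a new field, as a toy Python model: any field whose name matches `match` and whose detected JSON type is a string gets the template's mapping, and with it the `raw` and `split` sub-fields. `map_new_fields`, `detected_type` and `searchable_fields` are hypothetical helpers, and `fnmatch` only approximates Elasticsearch's simple wildcard matching.

```python
import fnmatch

# Mirrors the "string" dynamic template from the mapping above.
STRING_TEMPLATE = {
    "match": "*",
    "match_mapping_type": "string",
    "mapping": {
        "type": "string",
        "index_analyzer": "photo_names",
        "search_analyzer": "photo_standard",
        "fields": {
            "raw": {"type": "string", "index": "not_analyzed"},
            "split": {"type": "string", "analyzer": "standard"},
        },
    },
}


def detected_type(value):
    """Rough JSON type detection, in the spirit of dynamic mapping."""
    if isinstance(value, str):
        return "string"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "object"


def map_new_fields(doc, templates):
    """Apply the first matching template to each previously unseen field."""
    mapping = {}
    for field, value in doc.items():
        for tpl in templates:
            if (fnmatch.fnmatchcase(field, tpl["match"])
                    and detected_type(value) == tpl["match_mapping_type"]):
                mapping[field] = tpl["mapping"]
                break
    return mapping


def searchable_fields(mapping):
    """List each field plus its multi-field variants, e.g. name.raw."""
    names = []
    for field, spec in mapping.items():
        names.append(field)
        names.extend(f"{field}.{sub}" for sub in spec.get("fields", {}))
    return names


mapping = map_new_fields({"name": "Lars", "likes": 3}, [STRING_TEMPLATE])
print(searchable_fields(mapping))  # ['name', 'name.raw', 'name.split']
```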
LESSONS LEARNED - INDEX/MAPPING
• Different analysers should go into separate fields
POST /eyephoto/photo/_search
{
  "size": 1,
  "fields": [
    "id"
  ],
  "query": {
    "multi_match": {
      "query": "coff",
      "fields": [
        "topics"
      ]
    }
  },
  "facets": {
    "topic": {
      "terms": {
        "field": "topics.raw",
        "size": 1
      }
    }
  }
}

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    ##########
  },
  "hits": {
    "total": 125,
    "max_score": 6.44889,
    "hits": [
      {
        #####
        "_id": "167480",
        #####
        }
      }
    ]
  },
  "facets": {
    "topic": {
      "_type": "terms",
      "missing": 0,
      "total": 138,
      "other": 57,
      "terms": [
        {
          "term": "Coffee",
          "count": 81
        }
      ]
    }
  }
}
LESSONS LEARNED - INDEX/MAPPING
POST /eyephoto/photo/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "lars",
            "operator": "and",
            "fields": [
              "name.raw^3",
              "name.split^2",
              "name"
            ]
          }
        },
        {
          "multi_match": {
            "query": "lars",
            "fields": [
              "name.raw^3",
              "name.split^2",
              "name"
            ]
          }
        }
      ]
    }
  }
}
• Different analysers should go into separate fields
LESSONS LEARNED - INDEX/MAPPING
• Read from and write to index aliases only.

Index name   Index alias
eyephoto5    "eyephotoread"
eyephoto6    "eyephotowrite"
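One way to read this table: while eyephoto6 is being filled by a reindex, reads still go to eyephoto5 through eyephotoread and writes go to eyephoto6 through eyephotowrite; once the reindex is done, eyephotoread is moved over. That move is a single request to the `_aliases` endpoint, whose remove and add actions are applied atomically, so readers never see a missing alias. A small Python helper that builds the request body; `swap_alias` is a hypothetical name, while the `actions`/`add`/`remove` format is what the `_aliases` API accepts.

```python
import json


def swap_alias(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` from old_index to new_index
    in one atomic step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = swap_alias("eyephotoread", "eyephoto5", "eyephoto6")
print(json.dumps(body, indent=2))
```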
LESSONS LEARNED - INDEX/MAPPING
• If you have a string or integer field, you can put an array into it as well.
Term  Documents
Ey    1
ye    1, 2
eE    1
Em    1
Eye   1
yeE   1
eEm   1
ea    2
ah    2
yea   2
eah   2
LESSONS LEARNED - INDEX/MAPPING
• Use geohash wherever you query on lat/lng.
POST /eyephoto/photo/_search
{
  "query": {
    "function_score": {
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "geohash_cell": {
              "location": {
                "lat": 52.5311,
                "lon": 13.404
              },
              "precision": 4,
              "neighbors": true
            }
          }
        }
      },
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": "52.5311,13.404",
              "scale": "10km"
            }
          }
        },
        {
          "exp": {
            "uploaded": {
              "origin": "now",
              "scale": "2d"
            }
          }
        }
      ]
    }
  }
}
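The query combines a cheap geohash_cell filter (only documents in the precision-4 cell around the point, plus its neighbouring cells, are scored) with two decay functions. A Python sketch of both ideas: standard geohash encoding, where a shorter hash is a larger cell and a prefix of every longer hash for the same point, and the gauss and exp decay curves as documented for function_score, which with the default decay of 0.5 return 0.5 at exactly `scale` from the origin. The helper names are hypothetical; offset and decay use the documented defaults (0 and 0.5).

```python
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=12):
    """Interleave longitude/latitude bisection bits, 5 bits per base32 char."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 at the origin, `decay` at offset + scale."""
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    d = max(0.0, abs(distance) - offset)
    return math.exp(-d ** 2 / (2 * sigma_sq))


def exp_decay(distance, scale, offset=0.0, decay=0.5):
    """Exponential decay: 1.0 at the origin, `decay` at offset + scale."""
    lam = math.log(decay) / scale
    d = max(0.0, abs(distance) - offset)
    return math.exp(lam * d)


cell = geohash_encode(52.5311, 13.404, precision=4)  # cell used by the filter
# A photo 5 km away, uploaded 1 day ago; the two function results are
# multiplied, which is function_score's default score_mode.
score = gauss_decay(5, scale=10) * exp_decay(1, scale=2)
```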
LESSONS LEARNED - AGGREGATIONS
• Aggregations give you recursive facets; handle them with care.
"aggregations": {
  "user_fullname": {
    "filter": {
      "query": {
        "match": {
          "topics": {
            "query": "lars beer",
            "operator": "or"
          }
        }
      }
    },
    "aggs": {
      "user_fullname": {
        "terms": {
          "field": "user_fullname.raw",
          "size": 3
        },
        "aggs": {
          "topics": {
            "filter": {
              "query": {
                "match": {
                  "topics": {
                    "query": "lars beer",
                    "operator": "or"
                  }
                }
              }
            },
            "aggs": {
              "topics": {
                "terms": {
                  "field": "topics.raw",
                  "size": 3
                }
              }
            }
          },
...
LESSONS LEARNED - AGGREGATIONS
"user_fullname": {
  "doc_count": 678,
  "user_fullname": {
    "buckets": [
      {
        "key": "Lars 🍻 ",
        "doc_count": 678,
        "topics": {
          "doc_count": 5,
          "topics": {
            "buckets": [
              {
                "key": "Beer",
                "doc_count": 1
              },
              {
                "key": "BeerOps",
                "doc_count": 1
              },
              {
                "key": "Birthday beer in the snow",
                "doc_count": 1
              }
            ]
          }
        }
      }
    ]
  }
}
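A toy Python version of the nested aggregation above, to show what "recursive facets" means: filter the documents, bucket them by user, then bucket each user's documents by topic. Each level multiplies the number of buckets (here up to 3 users x 3 topics), which is one reason to handle them with care. The documents are made up, the substring predicate is only a stand-in for the `match` filter, and `top_terms`/`nested_terms` are hypothetical helpers.

```python
from collections import Counter


def top_terms(docs, field, size):
    """Toy `terms` aggregation: the `size` most frequent values of a list field."""
    counts = Counter(v for d in docs for v in dict.fromkeys(d.get(field, [])))
    return counts.most_common(size)


def nested_terms(docs, predicate, outer, inner, size=3):
    """Filter, then terms on `outer`, then terms on `inner` inside each bucket."""
    matching = [d for d in docs if predicate(d)]
    buckets = []
    for key, count in top_terms(matching, outer, size):
        in_bucket = [d for d in matching if key in d.get(outer, [])]
        buckets.append({"key": key, "doc_count": count,
                        inner: top_terms(in_bucket, inner, size)})
    return {"doc_count": len(matching), "buckets": buckets}


docs = [  # made-up example documents
    {"user_fullname": ["Lars"], "topics": ["Beer", "BeerOps"]},
    {"user_fullname": ["Lars"], "topics": ["Beer"]},
    {"user_fullname": ["Anna"], "topics": ["Coffee"]},
    {"user_fullname": ["Ben"], "topics": ["Beer"]},
]


def mentions_lars_or_beer(doc):
    # Stand-in for the "lars beer" match filter (OR): substring test on topics.
    return any(w in t.lower() for t in doc["topics"] for w in ("lars", "beer"))


result = nested_terms(docs, mentions_lars_or_beer, "user_fullname", "topics")
print(result)
```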
OUTLOOK
• 1-liner search
• Public release
• Localisation (snowball / stopwords)
• Keep indexed documents (e.g. albums) updated
NEXT ITERATION (EVALUATING)
• Elasticsearch 1.1
• Oracle Java 1.8 (GC)
• More indexes and even more shards
• Restore API
WE ARE HIRING
