Elasticsearch presentation
Indexing a restaurant directory
● Title
● Description
● Price
● Address
● Type
Creating an index without a mapping
PUT restaurant
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
}
Indexing without a mapping
PUT restaurant/restaurant/1
{
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
The risk of indexing without a mapping
PUT restaurant/restaurant/2
{
"title": "Pizza de l'ormeau",
"description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés",
"price": 10,
"adresse": "1 place de l'ormeau, 31400 TOULOUSE",
"type": "italien"
}
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]",
"caused_by": {
"type": "number_format_exception",
"reason": "For input string: \"Pizza de l'ormeau\""
}
},
"status": 400
}
Inferred mapping
GET /restaurant/_mapping
{
"restaurant": {
"mappings": {
"restaurant": {
"properties": {
"adresse": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "long"
},
"title": {
"type": "long"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
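The failure above follows from the first document fixing the type of `title` as `long`. A minimal Python sketch of that behavior, assuming a much simplified inference rule (the helper names `infer_type` and `index_document` are hypothetical, and real dynamic mapping handles more types and coerces numeric strings):

```python
def infer_type(value):
    """Guess a field type from the first value seen (simplified assumption)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def index_document(mapping, doc):
    """Add unseen fields to the mapping; reject values that contradict it."""
    for field, value in doc.items():
        expected = mapping.setdefault(field, infer_type(value))
        if expected == "long" and not isinstance(value, int):
            raise ValueError(f"failed to parse [{field}]")
    return mapping


mapping = {}
# The first document decides that "title" is numeric.
index_document(mapping, {"title": 42, "price": 42})
try:
    # The second document carries a textual title and is rejected.
    index_document(mapping, {"title": "Pizza de l'ormeau", "price": 10})
    error = None
except ValueError as exc:
    error = str(exc)
```

Declaring the mapping up front avoids depending on whichever document happens to be indexed first.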
Creating a mapping
PUT :url/restaurant
{
"settings": {
"index": {"number_of_shards": 3, "number_of_replicas": 2}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text"},
"description": {"type": "text"},
"price": {"type": "integer"},
"adresse": {"type": "text"},
"type": { "type": "keyword"}
}
}
}
}
Indexing a few restaurants
POST :url/restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
Basic search
GET :url/restaurant/_search
{
"query": {
"match": {
"description": "asiatique"
}
}
}
{
"hits": {
"total": 1,
"max_score": 0.6395861,
"hits": [
{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
Where our mapping falls short
GET :url/restaurant/_search
{
"query": {
"match": {
"description": "asiatiques"
}
}
}
{
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
What is an analyzer?
● Turns a string into tokens
○ E.g. “Le chat est rouge” -> [“le”, “chat”, “est”, “rouge”]
● The tokens are used to build an inverted index
What is an inverted index?
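An inverted index maps each token to the documents that contain it. A minimal Python sketch, assuming a toy analyzer that only lowercases and splits on whitespace (the second document is a hypothetical example added for illustration):

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


docs = {
    1: "Le chat est rouge",
    2: "Le chien est noir",  # hypothetical second document
}
inverted = build_inverted_index(docs)
```

A search then becomes a dictionary lookup on the analyzed query tokens, so a token that was never produced at index time can never match.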
Explanation: the default analyzer
GET /_analyze
{
"analyzer": "standard",
"text": "Un restaurant asiatique très copieux"
}
{
"tokens": [{
"token": "un",
"start_offset": 0, "end_offset": 2,
"type": "<ALPHANUM>", "position": 0
},{
"token": "restaurant",
"start_offset": 3, "end_offset": 13,
"type": "<ALPHANUM>", "position": 1
},{
"token": "asiatique",
"start_offset": 14, "end_offset": 23,
"type": "<ALPHANUM>", "position": 2
},{
"token": "très",
"start_offset": 24, "end_offset": 28,
"type": "<ALPHANUM>", "position": 3
},{
"token": "copieux",
"start_offset": 29, "end_offset": 36,
"type": "<ALPHANUM>", "position": 4
}
]
}
Explanation: the “french” analyzer
GET /_analyze
{
"analyzer": "french",
"text": "Un restaurant asiatique très copieux"
}
{
"tokens": [
{
"token": "restaurant",
"start_offset": 3, "end_offset": 13,
"type": "<ALPHANUM>", "position": 1
},{
"token": "asiat",
"start_offset": 14, "end_offset": 23,
"type": "<ALPHANUM>", "position": 2
},{
"token": "trè",
"start_offset": 24, "end_offset": 28,
"type": "<ALPHANUM>", "position": 3
},{
"token": "copieu",
"start_offset": 29, "end_offset": 36,
"type": "<ALPHANUM>", "position": 4
}
]
}
Anatomy of an analyzer
Elasticsearch splits analysis into three steps:
● Character filtering (e.g. stripping HTML tags)
● Tokenization
● Token filtering:
○ Removing tokens (stop words such as “un”, “le”, “la”)
○ Transforming tokens (lemmatization, stemming...)
○ Adding tokens (synonyms)
Anatomy of the french analyzer
GET /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "elision",
"articles_case": true,
"articles": [
"l", "m", "t", "qu", "n", "s", "j", "d", "c",
"jusqu", "quoiqu", "lorsqu", "puisqu"
]
}, {
"type": "stop", "stopwords": "_french_"
}, {
"type": "stemmer", "language": "french"
}
],
"text": "ce n'est qu'un restaurant asiatique très copieux"
}
● Input: “ce n’est qu’un restaurant asiatique très copieux”
● standard tokenizer: [“ce”, “n’est”, “qu’un”, “restaurant”, “asiatique”, “très”, “copieux”]
● elision: [“ce”, “est”, “un”, “restaurant”, “asiatique”, “très”, “copieux”]
● stopwords: [“restaurant”, “asiatique”, “très”, “copieux”]
● french stemming: [“restaurant”, “asiat”, “trè”, “copieu”]
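The first stages of this chain can be sketched in plain Python. This is an illustration under stated assumptions: `STOPWORDS` is a tiny hand-picked subset of the French list, the tokenizer is a whitespace split, and the stemming stage is left out because a faithful French stemmer does not fit in a few lines.

```python
ELISION_ARTICLES = {"l", "m", "t", "qu", "n", "s", "j", "d", "c",
                    "jusqu", "quoiqu", "lorsqu", "puisqu"}
# Tiny illustrative subset of the French stop word list (assumption).
STOPWORDS = {"ce", "est", "un", "le", "la"}


def tokenize(text):
    """Lowercase and split on whitespace (rough stand-in for the standard tokenizer)."""
    return text.lower().split()


def elision(tokens):
    """Strip a leading elided article such as n' or qu'."""
    out = []
    for token in tokens:
        head, sep, tail = token.partition("'")
        out.append(tail if sep and head in ELISION_ARTICLES else token)
    return out


def remove_stopwords(tokens):
    """Drop tokens that belong to the stop word set."""
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text):
    """Chain the stages in the same order as the request above."""
    return remove_stopwords(elision(tokenize(text)))


tokens = analyze("ce n'est qu'un restaurant asiatique très copieux")
```

Order matters: elision has to run before stop word removal, otherwise “n'est” and “qu'un” would never be recognized as stop words.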
Specifying the analyzer in the mapping
PUT /restaurant
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {"type": "text", "analyzer": "french"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "french"},
"type": { "type": "keyword"}
}
}
}
}
Typo-tolerant search
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "asiatuques"
}
}
}
{
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
One solution: the ngram token filter
GET /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "ngram",
"min_gram": 3,
"max_gram": 7
}
],
"text": "asiatuque"
}
[
"asi",
"asia",
"asiat",
"asiatu",
"asiatuq",
"sia",
"siat",
"siatu",
"siatuq",
"siatuqu",
"iat",
"iatu",
"iatuq",
"iatuqu",
"iatuque",
"atu",
"atuq",
"atuqu",
"atuque",
"tuq",
"tuqu",
"tuque",
"uqu",
"uque",
"que"
]
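The token list above can be reproduced with a few lines of Python. This sketch assumes the filter emits, for each start offset, every substring of length `min_gram` to `max_gram`; the function name `ngrams` is illustrative.

```python
def ngrams(token, min_gram=3, max_gram=7):
    """Return every substring of length min_gram..max_gram, grouped by start offset."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


# The misspelled query and the correctly spelled indexed word share several ngrams,
# which is what lets the typo still match.
typo = set(ngrams("asiatuques"))
indexed = set(ngrams("asiatique"))
shared = sorted(typo & indexed)
```

The two words have no whole token in common, but they overlap on short fragments such as "asi" and "siat", and that overlap is enough for a `match` query to return the document.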
Creating a custom analyzer that uses the ngram filter
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
"analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "ngram_analyzer"},
"description": {"type": "text", "analyzer": "ngram_analyzer"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "ngram_analyzer"},
"type": {"type": "keyword"}
}
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "asiatuques"
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.60128295,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}, {
"_score": 0.46237043,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
Noise introduced by the ngram filter
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "gastronomique"
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.6277555,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},{
"_score": 0.56373334,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
},
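Why does the Asian restaurant come back for "gastronomique"? A short Python sketch, assuming the same 3 to 7 character ngrams as the custom filter, shows the fragments that the query shares with the unrelated word "asiatique":

```python
def ngrams(token, min_gram=3, max_gram=7):
    """Return the set of substrings of length min_gram..max_gram."""
    return {token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)}


query = ngrams("gastronomique")
# Fragments of the query that also occur in an unrelated indexed word.
noise = sorted(query & ngrams("asiatique"))
```

The common suffix "-ique" is enough to produce a hit, which is the noise this slide illustrates.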
Specifying several analyzers for one field
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
"analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {
"type": "text", "analyzer": "french",
"fields": {
"ngram": { "type": "text", "analyzer": "ngram_analyzer"}
}
},
"price": {"type": "integer"},
Using several fields in a search
GET /restaurant/restaurant/_search
{
"query": {
"multi_match": {
"query": "gastronomique",
"fields": [
"description^4",
"description.ngram"
]
}
}
}
{
"hits": {
"hits": [
{
"_score": 2.0649285,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},
{
"_score": 0.56373334,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
},
{
"_index": "restaurant",
To ignore stop words or not to ignore them: that is the question
POST :url/restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
Stop words are not necessarily meaningless
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": "pas cher"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": 42,
"description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux et pas cher",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
Modifying the french analyzer to keep stop words
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision",
"articles_case": true,
"articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
},
"french_stemmer": {"type": "stemmer", "language": "light_french"}
},
"analyzer": {
"custom_french": {
"tokenizer": "standard",
"filter": [
"french_elision",
"lowercase",
"french_stemmer"
]
}
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": "pas cher"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux et pas cher",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
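The difference between the two "pas cher" searches comes down to whether "pas" survives analysis. A simplified Python sketch, assuming a hand-picked stop word subset and ignoring the position gaps a real analyzer keeps:

```python
STOPWORDS = {"pas", "et", "ou", "un", "donc", "tout"}  # illustrative subset

docs = {
    1: "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)",
    3: "Restaurant asiatique très copieux et pas cher",
}


def analyze(text, keep_stopwords):
    """Lowercase, drop parentheses, split, and optionally remove stop words."""
    tokens = text.lower().replace("(", " ").replace(")", " ").split()
    return [t for t in tokens if keep_stopwords or t not in STOPWORDS]


def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens appear consecutively in doc_tokens."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


def search(phrase, keep_stopwords):
    """Return the ids of documents whose description contains the phrase."""
    query = analyze(phrase, keep_stopwords)
    return [doc_id for doc_id, text in docs.items()
            if phrase_match(analyze(text, keep_stopwords), query)]


without_stopwords = search("pas cher", keep_stopwords=False)
with_stopwords = search("pas cher", keep_stopwords=True)
```

Once "pas" is removed the phrase collapses to the single token "cher", so the expensive restaurant matches too; keeping stop words restores the negation.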
Searching with stop words without hurting performance
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": {
"query": "restaurant pas cher",
"cutoff_frequency": 0.01
}
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must": {
"bool": {
"should": [
{"term": {"description": "restaurant"}},
{"term": {"description": "cher"}}]
}
},
"should": [
{"match": {
"description": "pas"
}}
]
}
}
}
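The idea behind `cutoff_frequency` is to split the query terms into rare terms, which drive the match, and common terms, which only adjust the score. A rough Python sketch of that split, with made-up document frequencies and an assumed "strictly greater than the cutoff" rule (the exact comparison used by Elasticsearch is not confirmed here):

```python
def split_by_frequency(terms, doc_freq, total_docs, cutoff=0.01):
    """Separate query terms into rare (low-frequency) and common (high-frequency) groups."""
    low, high = [], []
    for term in terms:
        ratio = doc_freq.get(term, 0) / total_docs
        (high if ratio > cutoff else low).append(term)
    return low, high


# Hypothetical document frequencies for a 10,000-document index.
doc_freq = {"restaurant": 80, "pas": 6500, "cher": 90}
low, high = split_by_frequency(["restaurant", "pas", "cher"], doc_freq, 10_000)
```

With these assumed numbers the rare terms "restaurant" and "cher" select the documents and "pas" only contributes to scoring, which mirrors the shape of the hand-written bool query above.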
Customizing scoring
GET /restaurant/restaurant/_search
{
"query": {
"function_score": {
"query": {
"match": {
"adresse": "toulouse"
}
},
"functions": [{
"filter": { "terms": { "type": ["asiatique", "italien"]}},
"weight": 2
}]
}
}
}
Customizing scoring
GET /restaurant/restaurant/_search
{
"query": {
"function_score": {
"query": {
"match": {
"adresse": "toulouse"
}
},
"script_score": {
"script": {
"lang": "painless",
"inline": "_score * ( 1 + 10/doc['price'].value)"
}
}
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.53484553,
"_source": {
"title": "Pizza de l'ormeau",
"price": 10,
"adresse": "1 place de l'ormeau, 31400 TOULOUSE",
"type": "italien"
}
}, {
"_score": 0.26742277,
"_source": {
"title": 42,
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
}, {
"_score": 0.26742277,
"_source": {
"title": "Chez l'oncle chan",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
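The scores above are consistent with `10/price` being evaluated as an integer division, which is what happens when both operands are integers: only the restaurant priced at 10 gets a boost. A small Python sketch of that arithmetic, assuming all three documents share the same base score for "toulouse" (an assumption inferred from the two identical scores, not stated in the response):

```python
def boosted(score, price, integer_division=True):
    """Mimic `_score * (1 + 10 / price)` with or without integer division."""
    ratio = 10 // price if integer_division else 10 / price
    return score * (1 + ratio)


base = 0.26742277  # score shared by the two unboosted hits above
prices = {"Pizza de l'ormeau": 10, "42": 42, "Chez l'oncle chan": 14}

as_shown = {name: boosted(base, p) for name, p in prices.items()}
with_floats = {name: boosted(base, p, integer_division=False)
               for name, p in prices.items()}
```

If a gradual boost is intended, writing the constant as `10.0` in the script would be one way to force floating-point division; the resulting scores would then differ from those shown on the slide.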
How to index multilingual documents
Three cases:
● Fields mixing several languages (e.g. {"message": "warning | attention | cuidado"})
○ Ngrams
○ Analyze the same field several times, with one analyzer per language
● One field per language:
○ Easy, since a different analyzer can be specified for each language
○ Take care not to end up with a sparse index
● One version of the document per language (preferred)
○ One index per language
○ Above all, do not use one type per language within the same index (it skews term statistics)
Handling synonyms
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision", "articles_case": true,
"articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
},
"french_stemmer": {"type": "stemmer", "language": "light_french"},
"french_synonym": {"type": "synonym", "synonyms": ["sou marin => sandwitch", "formul, menu"]}
},
"analyzer": {
"french_with_synonym": {
"tokenizer": "standard",
"filter": ["french_elision", "lowercase", "french_stemmer", "french_synonym"]
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": { "type": "text", "analyzer": "french", "search_analyzer": "french_with_synonym"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "french"},
"coord": {"type": "geo_point"},
Handling synonyms
GET /restaurant/restaurant/_search
{
"query": {
"match": {"description": "sous-marins"}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Subway",
"description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
"price": 8,
"adresse": "211 route de narbonne, 31520 RAMONVILLE",
"type": "fastfood",
"coord": "43.5577519,1.4625753"
}
}
]
}
}
Geolocated data
PUT /restaurant
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {"type": "text", "analyzer": "french"
},
"price": {"type": "integer"},
"adresse": {"type": "text","analyzer": "french"},
"coord": {"type": "geo_point"},
"type": { "type": "keyword"}
}
}
}
}
Geolocated data
POST restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien", "coord": "43.579225,1.4835248"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "18 rue des cosmonautetes, 31400 TOULOUSE", "type": "asiatique", "coord": "43.5612759,1.4936073"}
{"index": {"_id": 4}}
{"title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165"}
{"index": {"_id": 5}}
{"title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753"}
{"index": {"_id": 6}}
{"title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573"}
Filtering and sorting on geolocated data
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"filter": [
{"term": {"type":"français"}},
{"geo_distance": {
"distance": "1km",
"coord": {"lat": 43.5739329, "lon": 1.4893669}
}}
]
}
},
"sort": [{
"geo_distance": {
"coord": {"lat": 43.5739329, "lon": 1.4893669},
"unit": "km"
}
}]
}
{
"hits": {
"hits": [
{
"_source": {
"title": "bistronomique",
"description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents",
"price": 17,
"adresse": "73 route de revel, 31400 TOULOUSE",
"type": "français",
"coord": "43.57417,1.4905748"
},
"sort": [0.10081529266640063]
},{
"_source": {
"title": "L'évidence",
"description": "restaurant copieux et pas cher, cependant c'est pas bon",
"price": 12,
"adresse": "38 route de revel, 31400 TOULOUSE",
"type": "français",
"coord": "43.5770109,1.4846573"
},
"sort": [0.510960087579506]
},{
"_source": {
"title": "Chez Ingalls",
"description": "Contemporain et rustique, ce restaurant avec cheminée sert
savoyardes et des grillades",
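The `sort` values in the response are distances in kilometres from the query point. They can be approximated with the haversine formula; the sketch below assumes a spherical Earth with a mean radius of about 6371 km and is not necessarily the exact computation Elasticsearch performs.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed value)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


origin = (43.5739329, 1.4893669)      # point used in the query
bistronomique = (43.57417, 1.4905748)
evidence = (43.5770109, 1.4846573)

d1 = haversine_km(*origin, *bistronomique)
d2 = haversine_km(*origin, *evidence)
```

Both restaurants fall inside the 1 km filter, and the nearer one sorts first.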
Explaining the bool query
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must": {"match": {"description": "sandwitch"}},
"should" : [
{"match": {"description": "bon"}},
{"match": {"description": "excellent"}}
],
"must_not": [
{"match_phrase": {
"description": "pas bon"
}}
],
"filter": [
{"range": {"price": {
"lte": "20"
}}}
]
}
}
}
Explaining the bool query
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"should" : [
{"match": {"description": "bon"}},
{"match": {"description": "excellent"}},
{"match": {"description": "service rapide"}}
],
"minimum_should_match": 2
}
}
}
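A bool query made only of `should` clauses, with a minimum number of them required to match, amounts to counting matching clauses. A toy Python sketch, assuming a crude matcher that compares words after stripping a trailing "s" as a stand-in for stemming:

```python
def words(text):
    """Lowercase, split, and strip a trailing 's' (crude stand-in for stemming)."""
    return {w.rstrip("s") for w in text.lower().replace(",", " ").split()}


def matches(description, clause):
    """Toy match: true if any word of the clause appears in the description."""
    return bool(words(description) & words(clause))


def bool_should(description, should, minimum):
    """True when at least `minimum` should clauses match the description."""
    hits = sum(matches(description, clause) for clause in should)
    return hits >= minimum


should = ["bon", "excellent", "service rapide"]
fast_food = "service très rapide, rapport qualité/prix médiocre"
bistro = "Un restaurant bon mais un petit peu cher, les desserts sont excellents"
```

The bistro satisfies two clauses ("bon" and "excellent") and is kept; the fast food only satisfies "service rapide" and is dropped when two matches are required.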
Offering advanced search to your users
GET /restaurant/restaurant/_search
{
"query": {
"simple_query_string": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "-\"pas bon\" +(pizzi~2 | sandwitch)"
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must_not": {
"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"type": "phrase",
"query": "pas bon"
}
},
"should": [
{"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"fuzziness": 2,
"max_expansions": 50,
"query": "pizzi"
}
},
{"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "sandwitch"
}
}
]
}
}
}
Aliases: giving yourself room to maneuver
PUT /restaurant_v1/
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text"},
"lat": {"type": "double"},
"lon": {"type": "double"}
}
}
}
}
POST /_aliases
{
"actions": [
{"add": {"index": "restaurant_v1", "alias": "restaurant_search"}},
{"add": {"index": "restaurant_v1", "alias": "restaurant_write"}}
]
}
Aliases, pipelines and reindexing
PUT /restaurant_v2
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"position": {"type": "geo_point"}
}
}
}
}
PUT /_ingest/pipeline/fixing_position
{
"description": "move lat lon into position parameter",
"processors": [
{"rename": {"field": "lat", "target_field": "position.lat"}},
{"rename": {"field": "lon", "target_field": "position.lon"}}
]
}
POST /_aliases
{
"actions": [
{"remove": {"index": "restaurant_v1", "alias": "restaurant_search"}},
{"remove": {"index": "restaurant_v1", "alias": "restaurant_write"}},
{"add": {"index": "restaurant_v2", "alias": "restaurant_search"}},
{"add": {"index": "restaurant_v2", "alias": "restaurant_write"}}
]
}
POST /_reindex
{
"source": {"index": "restaurant_v1"},
"dest": {"index": "restaurant_v2", "pipeline": "fixing_position"}
}
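The effect of the `fixing_position` pipeline on a single document can be pictured in Python. This is only a sketch of the transformation: the `rename` helper is a hypothetical stand-in for the ingest processor, and the sample document is made up.

```python
def rename(doc, field, target_field):
    """Move doc[field] to a possibly dotted target path, like a rename processor."""
    value = doc.pop(field)
    node = doc
    *parents, leaf = target_field.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return doc


def fixing_position(doc):
    """Apply both renames to a copy so the source document is left untouched."""
    doc = dict(doc)
    rename(doc, "lat", "position.lat")
    rename(doc, "lon", "position.lon")
    return doc


# Hypothetical restaurant_v1 document with separate lat/lon fields.
v1_doc = {"title": "Pizza de l'ormeau", "lat": 43.579225, "lon": 1.4835248}
v2_doc = fixing_position(v1_doc)
```

The resulting `position` object has the lat/lon shape that a `geo_point` field accepts, which is why the rename is enough to migrate the data.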
Analyzing fire department intervention data from 2005 to 2014
PUT /pompier
{
"mappings": {
"interventions": {
"properties": {
"date": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
"type_incident": { "type": "keyword" },
"description_groupe": { "type": "keyword" },
"caserne": { "type": "integer"},
"ville": { "type": "keyword"},
"arrondissement": { "type": "keyword"},
"division": {"type": "integer"},
"position": {"type": "geo_point"},
"nombre_unites": {"type": "integer"}
}
}
}
}
Listing the different incident types
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"type_incident": {
"terms": {"field": "type_incident", "size": 100}
}
}
}
{
"aggregations": {
"type_incident": {
"buckets": [
{"key": "Premier répondant", "doc_count": 437891},
{"key": "Appel de Cie de détection", "doc_count": 76157},
{"key": "Alarme privé ou locale", "doc_count": 60879},
{"key": "Ac.véh./1R/s.v./ext/29B/D", "doc_count": 41734},
{"key": "10-22 sans feu", "doc_count": 29283},
{"key": "Acc. sans victime sfeu - ext.", "doc_count": 27663},
{"key": "Inondation", "doc_count": 26801},
{"key": "Problèmes électriques", "doc_count": 23495},
{"key": "Aliments surchauffés", "doc_count": 23428},
{"key": "Odeur suspecte - gaz", "doc_count": 21158},
{"key": "Déchets en feu", "doc_count": 18007},
{"key": "Ascenseur", "doc_count": 12703},
{"key": "Feu de champ *", "doc_count": 11518},
{"key": "Structure dangereuse", "doc_count": 9958},
{"key": "10-22 avec feu", "doc_count": 9876},
{"key": "Alarme vérification", "doc_count": 8328},
{"key": "Aide à un citoyen", "doc_count": 7722},
{"key": "Fuite ext.:hydrocar. liq. div.", "doc_count": 7351},
{"key": "Ac.véh./1R/s.v./V.R./29B/D", "doc_count": 6232},
{"key": "Feu de véhicule extérieur", "doc_count": 5943},
{"key": "Fausse alerte 10-19", "doc_count": 4680},
{"key": "Acc. sans victime sfeu - v.r", "doc_count": 3494},
{"key": "Assistance serv. muni.", "doc_count": 3431},
{"key": "Avertisseur de CO", "doc_count": 2542},
{"key": "Fuite gaz naturel 10-22", "doc_count": 1928},
{"key": "Matières dangereuses / 10-22", "doc_count": 1905},
{"key": "Feu de bâtiment", "doc_count": 1880},
{"key": "Senteur de feu à l'extérieur", "doc_count": 1566},
{"key": "Surchauffe - véhicule", "doc_count": 1499},
{"key": "Feu / Agravation possible", "doc_count": 1281},
{"key": "Fuite gaz naturel 10-09", "doc_count": 1257},
{"key": "Acc.véh/1rép/vict/ext 29D04", "doc_count": 1015},
{"key": "Acc. véh victime sfeu - (ext.)", "doc_count": 971},
Nested aggregations (sub-aggregations)
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"ville": {
"terms": {"field": "ville"},
"aggs": {
"arrondissement": {
"terms": {"field": "arrondissement"}
}
}
}
}
}
{
"aggregations": {"ville": {"buckets": [
{
"key": "Montréal", "doc_count": 768955,
"arrondissement": {"buckets": [
{"key": "Ville-Marie", "doc_count": 83010},
{"key": "Mercier / Hochelaga-Maisonneuve", "doc_count": 67272},
{"key": "Côte-des-Neiges / Notre-Dame-de-Grâce", "doc_count": 65933},
{"key": "Villeray / St-Michel / Parc Extension", "doc_count": 60951},
{"key": "Rosemont / Petite-Patrie", "doc_count": 59213},
{"key": "Ahuntsic / Cartierville", "doc_count": 57721},
{"key": "Plateau Mont-Royal", "doc_count": 53344},
{"key": "Montréal-Nord", "doc_count": 40757},
{"key": "Sud-Ouest", "doc_count": 39936},
{"key": "Rivière-des-Prairies / Pointe-aux-Trembles", "doc_count": 38139}
]}
}, {
"key": "Dollard-des-Ormeaux", "doc_count": 17961,
"arrondissement": {"buckets": [
{"key": "Indéterminé", "doc_count": 13452},
{"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 4477},
{"key": "Pierrefonds / Senneville", "doc_count": 10},
{"key": "Dorval / Ile Dorval", "doc_count": 8},
{"key": "Pointe-Claire", "doc_count": 8},
{"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 6}
]}
}, {
"key": "Pointe-Claire", "doc_count": 17925,
"arrondissement": {"buckets": [
{"key": "Indéterminé", "doc_count": 13126},
{"key": "Pointe-Claire", "doc_count": 4766},
{"key": "Dorval / Ile Dorval", "doc_count": 12},
{"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 7},
{"key": "Kirkland", "doc_count": 7},
{"key": "Beaconsfield / Baie d'Urfé", "doc_count": 5},
{"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 1},
{"key": "St-Laurent", "doc_count": 1}
Computing averages and sorting aggregations
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"avg_nombre_unites_general": {
"avg": {"field": "nombre_unites"}
},
"type_incident": {
"terms": {
"field": "type_incident",
"size": 5,
"order" : {"avg_nombre_unites": "desc"}
},
"aggs": {
"avg_nombre_unites": {
"avg": {"field": "nombre_unites"}
}
}
}
}
}
{
"aggregations": {
"type_incident": {
"buckets": [
{
"key": "Feu / 5e Alerte", "doc_count": 162,
"avg_nombre_unites": {"value": 70.9074074074074}
}, {
"key": "Feu / 4e Alerte", "doc_count": 100,
"avg_nombre_unites": {"value": 49.36}
}, {
"key": "Troisième alerte/autre que BAT", "doc_count": 1,
"avg_nombre_unites": {"value": 43.0}
}, {
"key": "Feu / 3e Alerte", "doc_count": 173,
"avg_nombre_unites": {"value": 41.445086705202314}
}, {
"key": "Deuxième alerte/autre que BAT", "doc_count": 8,
"avg_nombre_unites": {"value": 37.5}
}
]
},
"avg_nombre_unites_general": {"value": 2.1374461758713728}
}
}
Percentiles
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"unites_percentile": {
"percentiles": {
"field": "nombre_unites",
"percents": [25, 50, 75, 100]
}
}
}
}
{
"aggregations": {
"unites_percentile": {
"values": {
"25.0": 1.0,
"50.0": 1.0,
"75.0": 3.0,
"100.0": 275.0
}
}
}
}
Histogram
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"term": {"type_incident": "Inondation"}
},
"aggs": {
"unites_histogram": {
"histogram": {
"field": "nombre_unites",
"order": {"_key": "asc"},
"interval": 1
},
"aggs": {
"ville": {
"terms": {"field": "ville", "size": 1}
}
}
}
}
}
{
"aggregations": {
"unites_histogram": {
"buckets": [
{
"key": 1.0, "doc_count": 23507,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 19417}]}
},{
"key": 2.0, "doc_count": 1550,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 1229}]}
},{
"key": 3.0, "doc_count": 563,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 404}]}
},{
"key": 4.0, "doc_count": 449,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 334}]}
},{
"key": 5.0, "doc_count": 310,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 253}]}
},{
"key": 6.0, "doc_count": 215,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 173}]}
},{
"key": 7.0, "doc_count": 136,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 112}]}
},{
"key": 8.0, "doc_count": 35,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 30}]}
},{
"key": 9.0, "doc_count": 10,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]}
},{
"key": 10.0, "doc_count": 11,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]}
},{
"key": 11.0, "doc_count": 2,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 2}]}
Significant terms
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"term": {"type_incident": "Inondation"}
},
"aggs": {
"ville": {
"significant_terms": {"field": "ville", "size": 5, "percentage": {}}
}
}
}
{
"aggregations": {
"ville": {
"doc_count": 26801,
"buckets": [
{
"key": "Ile-Bizard",
"score": 0.10029498525073746,
"doc_count": 68, "bg_count": 678
},
{
"key": "Montréal-Nord",
"score": 0.0826544804291675,
"doc_count": 416, "bg_count": 5033
},
{
"key": "Roxboro",
"score": 0.08181818181818182,
"doc_count": 27, "bg_count": 330
},
{
"key": "Côte St-Luc",
"score": 0.07654825526563974,
"doc_count": 487, "bg_count": 6362
},
{
"key": "Saint-Laurent",
"score": 0.07317073170731707,
"doc_count": 465, "bg_count": 6355
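With the `percentage` heuristic, the scores shown above match the ratio of flood interventions in a city (`doc_count`) to all interventions in that city (`bg_count`). A quick Python check on the first three buckets, taking the counts straight from the response:

```python
def percentage_score(doc_count, bg_count):
    """Share of a term's documents that fall in the foreground (filtered) set."""
    return doc_count / bg_count


# (doc_count, bg_count) pairs copied from the response above.
buckets = {
    "Ile-Bizard": (68, 678),
    "Montréal-Nord": (416, 5033),
    "Roxboro": (27, 330),
}
scores = {city: percentage_score(*counts) for city, counts in buckets.items()}
```

Read this way, roughly 10% of all interventions in Ile-Bizard are floods, the highest share among the cities listed.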
Aggregations and geolocated data
GET :url/pompier/interventions/_search
{
"size": 0,
"query": {
"regexp": {"type_incident": "Feu.*"}
},
"aggs": {
"distance_from_here": {
"geo_distance": {
"field": "position",
"unit": "km",
"origin": {
"lat": 45.495902,
"lon": -73.554263
},
"ranges": [
{ "to": 2},
{"from":2, "to": 4},
{"from":4, "to": 6},
{"from": 6, "to": 8},
{"from": 8}]
}
}
}
}
{
"aggregations": {
"distance_from_here": {
"buckets": [
{
"key": "*-2.0",
"from": 0.0,
"to": 2.0,
"doc_count": 80
},
{
"key": "2.0-4.0",
"from": 2.0,
"to": 4.0,
"doc_count": 266
},
{
"key": "4.0-6.0",
"from": 4.0,
"to": 6.0,
"doc_count": 320
},
{
"key": "6.0-8.0",
"from": 6.0,
"to": 8.0,
"doc_count": 326
},
{
"key": "8.0-*",
"from": 8.0,
"doc_count": 1720
}
]
}
}
}
Any questions?
Offering advanced search to your users
GET /restaurant/restaurant/_search
{
"query": {
"simple_query_string": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "\"service rapide\"~2"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Un fastfood très connu",
"description": "service très rapide, rapport qualité/prix médiocre",
"price": 8,
"adresse": "210 route de narbonne, 31520 RAMONVILLE",
"type": "fastfood",
"coord": "43.5536343,1.476165"
}
},{
"_source": {
"title": "Subway",
"description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
"price": 8,
"adresse": "211 route de narbonne, 31520
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": {
"slop": 2,
"query": "service rapide"
}
}
}
}
}
Open Source Software Assurance by LinagoraLINAGORA
 

How to get your Elasticsearch mappings just right - LINAGORA

  • 2. Indexing a restaurant directory
    ● Title
    ● Description
    ● Price
    ● Address
    ● Type
  • 3. Creating an index without a mapping
    PUT restaurant
    {
      "settings": {
        "index": {"number_of_shards": 3, "number_of_replicas": 2}
      }
    }
  • 4. Indexing without a mapping
    PUT restaurant/restaurant/1
    {
      "title": 42,
      "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
      "price": 42,
      "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
      "type": "gastronomie"
    }
  • 5. The risk of indexing without a mapping
    PUT restaurant/restaurant/2
    {
      "title": "Pizza de l'ormeau",
      "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés",
      "price": 10,
      "adresse": "1 place de l'ormeau, 31400 TOULOUSE",
      "type": "italien"
    }
    Response:
    {
      "error": {
        "root_cause": [
          {"type": "mapper_parsing_exception", "reason": "failed to parse [title]"}
        ],
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [title]",
        "caused_by": {
          "type": "number_format_exception",
          "reason": "For input string: \"Pizza de l'ormeau\""
        }
      },
      "status": 400
    }
  • 6. Inferred mapping
    GET /restaurant/_mapping
    {
      "restaurant": {
        "mappings": {
          "restaurant": {
            "properties": {
              "adresse": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
              },
              "description": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
              },
              "price": {"type": "long"},
              "title": {"type": "long"},
              "type": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
              }
            }
          }
        }
      }
    }
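The rejection of the second restaurant follows from the inferred mapping: the first document fixed `title` as a number, so a later string no longer fits. The following is a toy Python model of that behaviour, not Elasticsearch's actual dynamic-mapping code; `infer_type` and `index_document` are hypothetical helpers and the inference rules are deliberately simplified.

```python
def infer_type(value):
    """Guess a field type from the first value seen (simplified toy rules)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def index_document(mapping, doc):
    """Add unseen fields to the mapping, then reject values that do not fit."""
    for field, value in doc.items():
        expected = mapping.setdefault(field, infer_type(value))
        if expected == "long" and not isinstance(value, int):
            raise ValueError(f"failed to parse [{field}]")


mapping = {}
# The first document decides the types: "title" becomes a long.
index_document(mapping, {"title": 42, "price": 42})

try:
    index_document(mapping, {"title": "Pizza de l'ormeau", "price": 10})
    error = None
except ValueError as exc:
    error = str(exc)

print(mapping)
print(error)
```

Declaring the mapping up front, as the next slide does, removes the dependency on whichever document happens to be indexed first.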
  • 7. Creating a mapping
    PUT :url/restaurant
    {
      "settings": {
        "index": {"number_of_shards": 3, "number_of_replicas": 2}
      },
      "mappings": {
        "restaurant": {
          "properties": {
            "title": {"type": "text"},
            "description": {"type": "text"},
            "price": {"type": "integer"},
            "adresse": {"type": "text"},
            "type": {"type": "keyword"}
          }
        }
      }
    }
  • 8. Indexing a few restaurants
    POST :url/restaurant/restaurant/_bulk
    {"index": {"_id": 1}}
    {"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
    {"index": {"_id": 2}}
    {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
    {"index": {"_id": 3}}
    {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
  • 9. Basic search
    GET :url/restaurant/_search
    {
      "query": {
        "match": {"description": "asiatique"}
      }
    }
    Response:
    {
      "hits": {
        "total": 1,
        "max_score": 0.6395861,
        "hits": [
          {
            "_source": {
              "title": "Chez l'oncle chan",
              "description": "Restaurant asiatique très copieux pour un prix contenu",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          }
        ]
      }
    }
  • 10. Where our mapping falls short
    GET :url/restaurant/_search
    {
      "query": {
        "match": {"description": "asiatiques"}
      }
    }
    Response:
    {
      "hits": {"total": 0, "max_score": null, "hits": []}
    }
  • 11. What is an analyzer?
    ● It transforms a character string into tokens
      ○ e.g. "Le chat est rouge" -> ["le", "chat", "est", "rouge"]
    ● The tokens are used to build an inverted index
  • 12. What is an inverted index?
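Since the slide on inverted indexes was a picture, here is a minimal Python sketch of the idea: each token points to the set of documents that contain it. The `tokenize` function is a stand-in assumption (lowercase, split on whitespace), far simpler than a real analyzer, and the two documents are shortened versions of the sample restaurants.

```python
from collections import defaultdict


def tokenize(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


docs = {
    1: "Un restaurant gastronomique",
    2: "Restaurant asiatique très copieux",
}
index = build_inverted_index(docs)

print(sorted(index["restaurant"]))       # documents containing "restaurant"
print(sorted(index["asiatique"]))        # documents containing "asiatique"
print(index.get("asiatiques", set()))    # the plural was never indexed
```

A lookup only succeeds when the query token is exactly one of the indexed tokens, which is why the plural "asiatiques" finds nothing until the analyzer normalizes both sides.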
  • 13. Explanation: the default analyzer
    GET /_analyze
    {
      "analyzer": "standard",
      "text": "Un restaurant asiatique très copieux"
    }
    Response:
    {
      "tokens": [
        {"token": "un", "start_offset": 0, "end_offset": 2, "type": "<ALPHANUM>", "position": 0},
        {"token": "restaurant", "start_offset": 3, "end_offset": 13, "type": "<ALPHANUM>", "position": 1},
        {"token": "asiatique", "start_offset": 14, "end_offset": 23, "type": "<ALPHANUM>", "position": 2},
        {"token": "très", "start_offset": 24, "end_offset": 28, "type": "<ALPHANUM>", "position": 3},
        {"token": "copieux", "start_offset": 29, "end_offset": 36, "type": "<ALPHANUM>", "position": 4}
      ]
    }
  • 14. Explanation: the "french" analyzer
    GET /_analyze
    {
      "analyzer": "french",
      "text": "Un restaurant asiatique très copieux"
    }
    Response:
    {
      "tokens": [
        {"token": "restaurant", "start_offset": 3, "end_offset": 13, "type": "<ALPHANUM>", "position": 1},
        {"token": "asiat", "start_offset": 14, "end_offset": 23, "type": "<ALPHANUM>", "position": 2},
        {"token": "trè", "start_offset": 24, "end_offset": 28, "type": "<ALPHANUM>", "position": 3},
        {"token": "copieu", "start_offset": 29, "end_offset": 36, "type": "<ALPHANUM>", "position": 4}
      ]
    }
  • 15. Anatomy of an analyzer
    Elasticsearch splits analysis into three stages:
    ● Character filtering (e.g. stripping HTML tags)
    ● Splitting into tokens
    ● Token filtering:
      ○ Removing tokens (stopwords such as "un", "le", "la")
      ○ Transforming tokens (lemmatization...)
      ○ Adding tokens (synonyms)
  • 16. Breaking down the french analyzer
    GET /_analyze
    {
      "tokenizer": "standard",
      "filter": [
        {
          "type": "elision",
          "articles_case": true,
          "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
        },
        {"type": "stop", "stopwords": "_french_"},
        {"type": "stemmer", "language": "french"}
      ],
      "text": "ce n'est qu'un restaurant asiatique très copieux"
    }
    Input: "ce n'est qu'un restaurant asiatique très copieux"
    ● standard tokenizer -> ["ce", "n'est", "qu'un", "restaurant", "asiatique", "très", "copieux"]
    ● elision -> ["ce", "est", "un", "restaurant", "asiatique", "très", "copieux"]
    ● stopwords -> ["restaurant", "asiatique", "très", "copieux"]
    ● french stemming -> ["restaurant", "asiat", "trè", "copieu"]
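The three stages (character filter, tokenizer, token filters) can be sketched in plain Python to make the data flow concrete. This is an illustration under stated assumptions, not the Lucene implementation: the regular expressions are simplistic, `STOPWORDS` is a tiny hand-picked subset rather than the real `_french_` list, and stemming is left out because it cannot be reproduced in a few lines.

```python
import re

ELISION_ARTICLES = {"l", "m", "t", "qu", "n", "s", "j", "d", "c",
                    "jusqu", "quoiqu", "lorsqu", "puisqu"}
STOPWORDS = {"ce", "est", "un"}  # illustrative subset only


def strip_html(text):
    """Character filter: replace anything that looks like a tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokenizer: runs of letters, keeping an inner apostrophe (n'est, qu'un)."""
    return re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text)


def elide(token):
    """Token filter: drop a leading elided article such as n' or qu'."""
    for apostrophe in ("'", "’"):
        head, found, tail = token.partition(apostrophe)
        if found and head.lower() in ELISION_ARTICLES:
            return tail
    return token


def analyze(text):
    tokens = tokenize(strip_html(text))
    tokens = [elide(token) for token in tokens]
    return [token for token in tokens if token not in STOPWORDS]


result = analyze("<p>ce n'est qu'un restaurant asiatique très copieux</p>")
print(result)
```

Each stage only sees the output of the previous one, which is why the order of the token filters in an analyzer definition matters.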
  • 17. Specifying the analyzer in the mapping
    {
      "settings": {
        "index": {"number_of_shards": 3, "number_of_replicas": 2}
      },
      "mappings": {
        "restaurant": {
          "properties": {
            "title": {"type": "text", "analyzer": "french"},
            "description": {"type": "text", "analyzer": "french"},
            "price": {"type": "integer"},
            "adresse": {"type": "text", "analyzer": "french"},
            "type": {"type": "keyword"}
          }
        }
      }
    }
  • 18. Typo-tolerant search
    GET /restaurant/restaurant/_search
    {
      "query": {
        "match": {"description": "asiatuques"}
      }
    }
    Response:
    {
      "hits": {"total": 0, "max_score": null, "hits": []}
    }
  • 19. One solution: the ngram token filter
    GET /_analyze
    {
      "tokenizer": "standard",
      "filter": [
        {"type": "ngram", "min_gram": 3, "max_gram": 7}
      ],
      "text": "asiatuque"
    }
    Resulting tokens:
    [
      "asi", "asia", "asiat", "asiatu", "asiatuq",
      "sia", "siat", "siatu", "siatuq", "siatuqu",
      "iat", "iatu", "iatuq", "iatuqu", "iatuque",
      "atu", "atuq", "atuqu", "atuque",
      "tuq", "tuqu", "tuque",
      "uqu", "uque",
      "que"
    ]
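The token list produced by the ngram filter can be reproduced with a few lines of Python, which also shows why a typo still matches: the misspelled and the correct word share many n-grams. This is a sketch of the technique rather than the Lucene filter itself; the function name `ngrams` is an assumption.

```python
def ngrams(token, min_gram=3, max_gram=7):
    """All substrings of length min_gram..max_gram, ordered by start position."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


typo = ngrams("asiatuque")
correct = ngrams("asiatique")
shared = sorted(set(typo) & set(correct))

print(len(typo))   # number of n-grams generated for the misspelled word
print(shared)      # n-grams common to both spellings
```

The same overlap explains the noise discussed on the later slides: unrelated words such as "gastronomique" and "asiatique" also share short n-grams like "que".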
  • 20. Creating a custom analyzer to use the ngram filter
    PUT /restaurant
    {
      "settings": {
        "analysis": {
          "filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
          "analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}}
        }
      },
      "mappings": {
        "restaurant": {
          "properties": {
            "title": {"type": "text", "analyzer": "ngram_analyzer"},
            "description": {"type": "text", "analyzer": "ngram_analyzer"},
            "price": {"type": "integer"},
            "adresse": {"type": "text", "analyzer": "ngram_analyzer"},
            "type": {"type": "keyword"}
          }
        }
      }
    }
  • 21.
    GET /restaurant/restaurant/_search
    {
      "query": {
        "match": {"description": "asiatuques"}
      }
    }
    Response:
    {
      "hits": {
        "hits": [
          {
            "_score": 0.60128295,
            "_source": {
              "title": "Chez l'oncle chan",
              "description": "Restaurant asiatique très copieux pour un prix contenu",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          },
          {
            "_score": 0.46237043,
            "_source": {
              "title": 42,
              "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
              "price": 42,
              "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
              "type": "gastronomie"
  • 22. Noise introduced by ngrams
    GET /restaurant/restaurant/_search
    {
      "query": {
        "match": {"description": "gastronomique"}
      }
    }
    Response:
    {
      "hits": {
        "hits": [
          {
            "_score": 0.6277555,
            "_source": {
              "title": 42,
              "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
              "price": 42,
              "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
              "type": "gastronomie"
            }
          },
          {
            "_score": 0.56373334,
            "_source": {
              "title": "Chez l'oncle chan",
              "description": "Restaurant asiatique très copieux pour un prix contenu",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          },
  • 23. Specifying several analyzers for one field
    PUT /restaurant
    {
      "settings": {
        "analysis": {
          "filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
          "analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}}
        }
      },
      "mappings": {
        "restaurant": {
          "properties": {
            "title": {"type": "text", "analyzer": "french"},
            "description": {
              "type": "text",
              "analyzer": "french",
              "fields": {
                "ngram": {"type": "text", "analyzer": "ngram_analyzer"}
              }
            },
            "price": {"type": "integer"},
  • 24. Searching across several fields
    GET /restaurant/restaurant/_search
    {
      "query": {
        "multi_match": {
          "query": "gastronomique",
          "fields": ["description^4", "description.ngram"]
        }
      }
    }
    Response:
    {
      "hits": {
        "hits": [
          {
            "_score": 2.0649285,
            "_source": {
              "title": 42,
              "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
              "price": 42,
              "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
              "type": "gastronomie"
            }
          },
          {
            "_score": 0.56373334,
            "_source": {
              "title": "Chez l'oncle chan",
              "description": "Restaurant asiatique très copieux pour un prix contenu",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          },
          {
            "_index": "restaurant",
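The gap between the two scores can be understood with a small model of how `multi_match` combines fields. The sketch below assumes the default `best_fields` behaviour with no tie breaker, where a document's score is the highest boosted per-field score; the per-field scores are hypothetical numbers chosen for illustration, not values returned by Elasticsearch.

```python
def best_fields_score(field_scores, boosts):
    """Highest boosted per-field score; fields that did not match count as 0."""
    return max(field_scores.get(field, 0.0) * boost
               for field, boost in boosts.items())


# Boosts taken from the query: "description^4" and "description.ngram".
boosts = {"description": 4.0, "description.ngram": 1.0}

# Hypothetical per-field scores for two documents.
exact_match = best_fields_score(
    {"description": 0.5, "description.ngram": 0.6}, boosts)
ngram_only = best_fields_score(
    {"description.ngram": 0.56}, boosts)

print(exact_match, ngram_only)
```

A document that matches the properly analyzed `description` field is lifted well above one that only matches through n-gram noise, while the n-gram sub-field still catches typos.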
  • 25. To ignore or not to ignore stopwords, that is the question
    POST :url/restaurant/restaurant/_bulk
    {"index": {"_id": 1}}
    {"title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
    {"index": {"_id": 2}}
    {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
    {"index": {"_id": 3}}
    {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
  • 26. Stopwords are not necessarily meaningless
    GET /restaurant/restaurant/_search
    {
      "query": {
        "match_phrase": {"description": "pas cher"}
      }
    }
    Response:
    {
      "hits": {
        "hits": [
          {
            "_source": {
              "title": 42,
              "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)",
              "price": 42,
              "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
              "type": "gastronomie"
            }
          },
          {
            "_source": {
              "title": "Chez l'oncle chan",
              "description": "Restaurant asiatique très copieux et pas cher",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          }
  • 27. Modifying the french analyzer to keep stopwords
    PUT /restaurant
    {
      "settings": {
        "analysis": {
          "filter": {
            "french_elision": {
              "type": "elision",
              "articles_case": true,
              "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
            },
            "french_stemmer": {"type": "stemmer", "language": "light_french"}
          },
          "analyzer": {
            "custom_french": {
              "tokenizer": "standard",
              "filter": ["french_elision", "lowercase", "french_stemmer"]
            }
  • 28.
    GET /restaurant/restaurant/_search
    {
      "query": {
        "match_phrase": {"description": "pas cher"}
      }
    }
    Response:
    {
      "hits": {
        "hits": [
          {
            "_source": {
              "title": "Chez l'oncle chan",
              "description": "Restaurant asiatique très copieux et pas cher",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          }
        ]
      }
    }
  • 29. Searching with stopwords without hurting performance
    GET /restaurant/restaurant/_search
    {
      "query": {
        "match": {
          "description": {
            "query": "restaurant pas cher",
            "cutoff_frequency": 0.01
          }
        }
      }
    }
    Roughly equivalent query, with the frequent term "pas" moved to an optional clause:
    GET /restaurant/restaurant/_search
    {
      "query": {
        "bool": {
          "must": {
            "bool": {
              "should": [
                {"term": {"description": "restaurant"}},
                {"term": {"description": "cher"}}
              ]
            }
          },
          "should": [
            {"match": {"description": "pas"}}
          ]
        }
      }
    }
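The rewrite that `cutoff_frequency` performs can be sketched in Python: terms that appear in more than the cutoff fraction of documents become optional, and the rarer terms decide which documents match. This is a simplified illustration; `split_terms` and `build_query` are hypothetical helpers, and the document frequencies are made-up numbers for a 1000-document index, not statistics from the three sample restaurants.

```python
def split_terms(terms, doc_freq, total_docs, cutoff):
    """Separate terms into low-frequency and high-frequency groups."""
    low, high = [], []
    for term in terms:
        ratio = doc_freq.get(term, 0) / total_docs
        (high if ratio > cutoff else low).append(term)
    return low, high


def build_query(field, low, high):
    """Rare terms select the documents, frequent terms only add to the score."""
    return {
        "bool": {
            "must": {"bool": {"should": [{"term": {field: t}} for t in low]}},
            "should": [{"match": {field: t}} for t in high],
        }
    }


# Hypothetical document frequencies in an index of 1000 documents.
doc_freq = {"restaurant": 8, "pas": 400, "cher": 5}
low, high = split_terms(["restaurant", "pas", "cher"], doc_freq, 1000, 0.01)
query = build_query("description", low, high)

print(low, high)
print(query)
```

Because "pas" is only evaluated on documents already selected by the rare terms, keeping stopwords in the index costs far less at query time.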
  • 30. Customizing the scoring
    GET /restaurant/restaurant/_search
    {
      "query": {
        "function_score": {
          "query": {
            "match": {"adresse": "toulouse"}
          },
          "functions": [{
            "filter": {"terms": {"type": ["asiatique", "italien"]}},
            "weight": 2
          }]
        }
      }
    }
  • 31. Customizing the scoring
    GET /restaurant/restaurant/_search
    {
      "query": {
        "function_score": {
          "query": {
            "match": {"adresse": "toulouse"}
          },
          "script_score": {
            "script": {
              "lang": "painless",
              "inline": "_score * ( 1 + 10/doc['price'].value)"
            }
          }
        }
      }
    }
    Response:
    {
      "hits": {
        "hits": [
          {
            "_score": 0.53484553,
            "_source": {
              "title": "Pizza de l'ormeau",
              "price": 10,
              "adresse": "1 place de l'ormeau, 31400 TOULOUSE",
              "type": "italien"
            }
          },
          {
            "_score": 0.26742277,
            "_source": {
              "title": 42,
              "price": 42,
              "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
              "type": "gastronomie"
            }
          },
          {
            "_score": 0.26742277,
            "_source": {
              "title": "Chez l'oncle chan",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          }
        ]
      }
    }
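The scores in this response are worth a closer look: the restaurants priced 42 and 14 end up with the same score. That is consistent with the division in the script being an integer division, since both operands are integral. The Python sketch below reproduces the arithmetic under that assumption; the base score 0.26742277 is read off the response, and the function names are hypothetical.

```python
def integer_division_score(score, price):
    """Mimics 10 / price when both operands are integers (result truncated)."""
    return score * (1 + 10 // price)


def float_division_score(score, price):
    """What a floating-point literal (10.0 / price) would give instead."""
    return score * (1 + 10.0 / price)


base = 0.26742277  # score shared by the two non-boosted hits in the response
prices = {"Pizza de l'ormeau": 10, "42": 42, "Chez l'oncle chan": 14}

for title, price in prices.items():
    print(title,
          integer_division_score(base, price),
          float_division_score(base, price))
```

With integer division only prices of 10 or less receive any boost, so writing the constant as `10.0` in the script is probably closer to what was intended.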
  • 32. How to index multilingual documents
    Three cases:
    ● One field containing several languages (e.g. {"message": "warning | attention | cuidado"})
      ○ Ngrams
      ○ Analyze the same field several times, with one analyzer per language
    ● One field per language:
      ○ Easy, because a different analyzer can be specified for each language
      ○ Take care not to end up with a sparse index
    ● One version of the document per language (preferred)
      ○ One index per language
      ○ Above all, do not use one type per language inside the same index (it skews the term statistics)
  • 33. Handling synonyms
    PUT /restaurant
    {
      "settings": {
        "analysis": {
          "filter": {
            "french_elision": {
              "type": "elision",
              "articles_case": true,
              "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
            },
            "french_stemmer": {"type": "stemmer", "language": "light_french"},
            "french_synonym": {"type": "synonym", "synonyms": ["sou marin => sandwitch", "formul, menu"]}
          },
          "analyzer": {
            "french_with_synonym": {
              "tokenizer": "standard",
              "filter": ["french_elision", "lowercase", "french_stemmer", "french_synonym"]
            }
          }
        }
      },
      "mappings": {
        "restaurant": {
          "properties": {
            "title": {"type": "text", "analyzer": "french"},
            "description": {"type": "text", "analyzer": "french", "search_analyzer": "french_with_synonym"},
            "price": {"type": "integer"},
            "adresse": {"type": "text", "analyzer": "french"},
            "coord": {"type": "geo_point"},
  • 34. Handling synonyms
    GET /restaurant/restaurant/_search
    {
      "query": {
        "match": {"description": "sous-marins"}
      }
    }
    Response:
    {
      "hits": {
        "hits": [
          {
            "_source": {
              "title": "Subway",
              "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
              "price": 8,
              "adresse": "211 route de narbonne, 31520 RAMONVILLE",
              "type": "fastfood",
              "coord": "43.5577519,1.4625753"
            }
          }
        ]
      }
    }
  • 35. Geolocated data
    PUT /restaurant
    {
      "mappings": {
        "restaurant": {
          "properties": {
            "title": {"type": "text", "analyzer": "french"},
            "description": {"type": "text", "analyzer": "french"},
            "price": {"type": "integer"},
            "adresse": {"type": "text", "analyzer": "french"},
            "coord": {"type": "geo_point"},
            "type": {"type": "keyword"}
          }
        }
      }
    }
  • 36. Geolocated data
    POST restaurant/restaurant/_bulk
    {"index": {"_id": 1}}
    {"title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748"}
    {"index": {"_id": 2}}
    {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien", "coord": "43.579225,1.4835248"}
    {"index": {"_id": 3}}
    {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "18 rue des cosmonautetes, 31400 TOULOUSE", "type": "asiatique", "coord": "43.5612759,1.4936073"}
    {"index": {"_id": 4}}
    {"title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165"}
    {"index": {"_id": 5}}
    {"title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753"}
    {"index": {"_id": 6}}
    {"title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573"}
  • 37. Filtering and sorting on geolocated data GET /restaurant/restaurant/_search { "query": { "bool": { "filter": [ {"term": {"type":"français"}}, {"geo_distance": { "distance": "1km", "coord": {"lat": 43.5739329, "lon": 1.4893669} }} ] } }, "sort": [{ "geo_distance": { "coord": {"lat": 43.5739329, "lon": 1.4893669}, "unit": "km" } }] } { "hits": { "hits": [ { "_source": { "title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748" }, "sort": [0.10081529266640063] },{ "_source": { "title:": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573" }, "sort": [0.510960087579506] },{ "_source": { "title:": "Chez Ingalls", "description": "Contemporain et rustique, ce restaurant avec cheminée sert savoyardes et des grillades",
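The `sort` values in the response are distances in kilometers from the query point. A rough way to sanity-check them is the haversine formula; the sketch below assumes a mean Earth radius of 6371.0088 km, which may differ slightly from the constant Elasticsearch uses internally, so only approximate agreement should be expected.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


origin = (43.5739329, 1.4893669)       # query point from the slide
bistronomique = (43.57417, 1.4905748)  # "coord" of the first hit
evidence = (43.5770109, 1.4846573)     # "coord" of the second hit

d1 = haversine_km(*origin, *bistronomique)
d2 = haversine_km(*origin, *evidence)
print(d1, d2)
```

Both values should land close to the `sort` entries shown in the response (about 0.10 km and 0.51 km).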
  • 38. Explaining the bool query GET /restaurant/restaurant/_search { "query": { "bool": { "must": {"match": {"description": "sandwitch"}}, "should": [ {"match": {"description": "bon"}}, {"match": {"description": "excellent"}} ], "must_not": [ {"match_phrase": { "description": "pas bon" }} ], "filter": [ {"range": {"price": { "lte": "20" }}} ] } } }
  • 39. Explaining the bool query GET /restaurant/restaurant/_search { "query": { "bool": { "should": [ {"match": {"description": "bon"}}, {"match": {"description": "excellent"}}, {"match": {"description": "service rapide"}} ], "minimum_should_match": 2 } } }
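With only `should` clauses, the bool query on this slide keeps a document when at least two of the three clauses match. The sketch below imitates that counting rule with naive lowercase word matching; it is a simplification that ignores the French analyzer (stemming, elision) and scoring, and the second description is a hypothetical example.

```python
def count_should_matches(description, should_queries):
    """Count matching should clauses using naive lowercase word matching."""
    tokens = set(description.lower().replace(",", " ").split())
    # A match query matches as soon as any of its words is present (default OR operator).
    return sum(1 for query in should_queries if any(word in tokens for word in query.split()))


def bool_should_matches(description, should_queries, minimum_should_match):
    """Return True when enough should clauses match the description."""
    return count_should_matches(description, should_queries) >= minimum_should_match


should_queries = ["bon", "excellent", "service rapide"]
only_bon = "restaurant copieux et pas cher, cependant c'est pas bon"
bon_and_service = "service très rapide et bon accueil"  # hypothetical description

print(bool_should_matches(only_bon, should_queries, 2))
print(bool_should_matches(bon_and_service, should_queries, 2))
```

The first description satisfies one clause only and is rejected; the second satisfies two and is kept.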
  • 40. Offering advanced search to your users GET /restaurant/restaurant/_search { "query": { "simple_query_string": { "fields": ["description", "title^2", "adresse", "type"], "query": "-\"pas bon\" +(pizzi~2 | sandwitch)" } } } GET /restaurant/restaurant/_search { "query": { "bool": { "must_not": { "multi_match": { "fields": ["description", "title^2", "adresse", "type"], "type": "phrase", "query": "pas bon" } }, "should": [ {"multi_match": { "fields": ["description", "title^2", "adresse", "type"], "fuzziness": 2, "max_expansions": 50, "query": "pizzi" } }, {"multi_match": { "fields": ["description", "title^2", "adresse", "type"], "query": "sandwitch" } } ] } } }
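A phrase inside a `simple_query_string` query uses double quotes, which must be escaped once the query text is embedded in a JSON body. One way to avoid hand-escaping is to build the body as a data structure and let a JSON serializer do it, as in this sketch (it uses `|`, the `simple_query_string` operator for OR; the variable names are illustrative):

```python
import json

# Raw query text as a user would type it: exclude the phrase "pas bon",
# require either a fuzzy match on "pizzi" or the word "sandwitch".
query_text = '-"pas bon" +(pizzi~2 | sandwitch)'

body = {
    "query": {
        "simple_query_string": {
            "fields": ["description", "title^2", "adresse", "type"],
            "query": query_text,
        }
    }
}

# json.dumps escapes the inner double quotes as \" automatically.
payload = json.dumps(body, ensure_ascii=False)
print(payload)
```

Parsing the payload back yields the original query text unchanged, which confirms the escaping is lossless.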
  • 41. Aliases: giving yourself room to maneuver PUT /restaurant_v1/ { "mappings": { "restaurant": { "properties": { "title": {"type": "text"}, "lat": {"type": "double"}, "lon": {"type": "double"} } } } } POST /_aliases { "actions": [ {"add": {"index": "restaurant_v1", "alias": "restaurant_search"}}, {"add": {"index": "restaurant_v1", "alias": "restaurant_write"}} ] }
  • 42. Aliases, pipelines, and reindexing PUT /restaurant_v2 { "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "position": {"type": "geo_point"} } } } } PUT /_ingest/pipeline/fixing_position { "description": "move lat lon into position parameter", "processors": [ {"rename": {"field": "lat", "target_field": "position.lat"}}, {"rename": {"field": "lon", "target_field": "position.lon"}} ] } POST /_aliases { "actions": [ {"remove": {"index": "restaurant_v1", "alias": "restaurant_search"}}, {"remove": {"index": "restaurant_v1", "alias": "restaurant_write"}}, {"add": {"index": "restaurant_v2", "alias": "restaurant_search"}}, {"add": {"index": "restaurant_v2", "alias": "restaurant_write"}} ] } POST /_reindex { "source": {"index": "restaurant_v1"}, "dest": {"index": "restaurant_v2", "pipeline": "fixing_position"} }
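The `_aliases` request on this slide removes each alias from the old index and adds it to the new one in a single call, so clients never see an alias pointing nowhere. A small sketch of how that body could be generated for any pair of indices (the function name `swap_alias_actions` is hypothetical):

```python
def swap_alias_actions(old_index, new_index, aliases):
    """Build an _aliases body that moves every alias from old_index to new_index."""
    actions = []
    for alias in aliases:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    for alias in aliases:
        actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


body = swap_alias_actions(
    "restaurant_v1", "restaurant_v2", ["restaurant_search", "restaurant_write"]
)
print(body)
```

The generated body has the same shape as the one on the slide: two `remove` actions followed by two `add` actions.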
  • 43. Analyzing fire department intervention data from 2005 to 2014 PUT /pompier { "mappings": { "interventions": { "properties": { "date": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}, "type_incident": {"type": "keyword"}, "description_groupe": {"type": "keyword"}, "caserne": {"type": "integer"}, "ville": {"type": "keyword"}, "arrondissement": {"type": "keyword"}, "division": {"type": "integer"}, "position": {"type": "geo_point"}, "nombre_unites": {"type": "integer"} } } } }
  • 44. Viewing the different incident types GET /pompier/interventions/_search { "size": 0, "aggs": { "type_incident": { "terms": {"field": "type_incident", "size": 100} } } } { "aggregations": { "type_incident": { "buckets": [ {"key": "Premier répondant", "doc_count": 437891}, {"key": "Appel de Cie de détection", "doc_count": 76157}, {"key": "Alarme privé ou locale", "doc_count": 60879}, {"key": "Ac.véh./1R/s.v./ext/29B/D", "doc_count": 41734}, {"key": "10-22 sans feu", "doc_count": 29283}, {"key": "Acc. sans victime sfeu - ext.", "doc_count": 27663}, {"key": "Inondation", "doc_count": 26801}, {"key": "Problèmes électriques", "doc_count": 23495}, {"key": "Aliments surchauffés", "doc_count": 23428}, {"key": "Odeur suspecte - gaz", "doc_count": 21158}, {"key": "Déchets en feu", "doc_count": 18007}, {"key": "Ascenseur", "doc_count": 12703}, {"key": "Feu de champ *", "doc_count": 11518}, {"key": "Structure dangereuse", "doc_count": 9958}, {"key": "10-22 avec feu", "doc_count": 9876}, {"key": "Alarme vérification", "doc_count": 8328}, {"key": "Aide à un citoyen", "doc_count": 7722}, {"key": "Fuite ext.:hydrocar. liq. div.", "doc_count": 7351}, {"key": "Ac.véh./1R/s.v./V.R./29B/D", "doc_count": 6232}, {"key": "Feu de véhicule extérieur", "doc_count": 5943}, {"key": "Fausse alerte 10-19", "doc_count": 4680}, {"key": "Acc. sans victime sfeu - v.r", "doc_count": 3494}, {"key": "Assistance serv. muni.", "doc_count": 3431}, {"key": "Avertisseur de CO", "doc_count": 2542}, {"key": "Fuite gaz naturel 10-22", "doc_count": 1928}, {"key": "Matières dangereuses / 10-22", "doc_count": 1905}, {"key": "Feu de bâtiment", "doc_count": 1880}, {"key": "Senteur de feu à l'extérieur", "doc_count": 1566}, {"key": "Surchauffe - véhicule", "doc_count": 1499}, {"key": "Feu / Agravation possible", "doc_count": 1281}, {"key": "Fuite gaz naturel 10-09", "doc_count": 1257}, {"key": "Acc.véh/1rép/vict/ext 29D04", "doc_count": 1015}, {"key": "Acc. véh victime sfeu - (ext.)", "doc_count": 971},
  • 45. Nested aggregations GET /pompier/interventions/_search { "size": 0, "aggs": { "ville": { "terms": {"field": "ville"}, "aggs": { "arrondissement": { "terms": {"field": "arrondissement"} } } } } } { "aggregations": {"ville": {"buckets": [ { "key": "Montréal", "doc_count": 768955, "arrondissement": {"buckets": [ {"key": "Ville-Marie", "doc_count": 83010}, {"key": "Mercier / Hochelaga-Maisonneuve", "doc_count": 67272}, {"key": "Côte-des-Neiges / Notre-Dame-de-Grâce", "doc_count": 65933}, {"key": "Villeray / St-Michel / Parc Extension", "doc_count": 60951}, {"key": "Rosemont / Petite-Patrie", "doc_count": 59213}, {"key": "Ahuntsic / Cartierville", "doc_count": 57721}, {"key": "Plateau Mont-Royal", "doc_count": 53344}, {"key": "Montréal-Nord", "doc_count": 40757}, {"key": "Sud-Ouest", "doc_count": 39936}, {"key": "Rivière-des-Prairies / Pointe-aux-Trembles", "doc_count": 38139} ]} }, { "key": "Dollard-des-Ormeaux", "doc_count": 17961, "arrondissement": {"buckets": [ {"key": "Indéterminé", "doc_count": 13452}, {"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 4477}, {"key": "Pierrefonds / Senneville", "doc_count": 10}, {"key": "Dorval / Ile Dorval", "doc_count": 8}, {"key": "Pointe-Claire", "doc_count": 8}, {"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 6} ]} }, { "key": "Pointe-Claire", "doc_count": 17925, "arrondissement": {"buckets": [ {"key": "Indéterminé", "doc_count": 13126}, {"key": "Pointe-Claire", "doc_count": 4766}, {"key": "Dorval / Ile Dorval", "doc_count": 12}, {"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 7}, {"key": "Kirkland", "doc_count": 7}, {"key": "Beaconsfield / Baie d'Urfé", "doc_count": 5}, {"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 1}, {"key": "St-Laurent", "doc_count": 1}
  • 46. Computing averages and sorting aggregations GET /pompier/interventions/_search { "size": 0, "aggs": { "avg_nombre_unites_general": { "avg": {"field": "nombre_unites"} }, "type_incident": { "terms": { "field": "type_incident", "size": 5, "order": {"avg_nombre_unites": "desc"} }, "aggs": { "avg_nombre_unites": { "avg": {"field": "nombre_unites"} } } } } } { "aggregations": { "type_incident": { "buckets": [ { "key": "Feu / 5e Alerte", "doc_count": 162, "avg_nombre_unites": {"value": 70.9074074074074} }, { "key": "Feu / 4e Alerte", "doc_count": 100, "avg_nombre_unites": {"value": 49.36} }, { "key": "Troisième alerte/autre que BAT", "doc_count": 1, "avg_nombre_unites": {"value": 43.0} }, { "key": "Feu / 3e Alerte", "doc_count": 173, "avg_nombre_unites": {"value": 41.445086705202314} }, { "key": "Deuxième alerte/autre que BAT", "doc_count": 8, "avg_nombre_unites": {"value": 37.5} } ] }, "avg_nombre_unites_general": {"value": 2.1374461758713728} } }
  • 47. Percentiles GET /pompier/interventions/_search { "size": 0, "aggs": { "unites_percentile": { "percentiles": { "field": "nombre_unites", "percents": [25, 50, 75, 100] } } } } { "aggregations": { "unites_percentile": { "values": { "25.0": 1.0, "50.0": 1.0, "75.0": 3.0, "100.0": 275.0 } } } }
  • 48. Histogram GET /pompier/interventions/_search { "size": 0, "query": { "term": {"type_incident": "Inondation"} }, "aggs": { "unites_histogram": { "histogram": { "field": "nombre_unites", "order": {"_key": "asc"}, "interval": 1 }, "aggs": { "ville": { "terms": {"field": "ville", "size": 1} } } } } } { "aggregations": { "unites_histogram": { "buckets": [ { "key": 1.0, "doc_count": 23507, "ville": {"buckets": [{"key": "Montréal", "doc_count": 19417}]} },{ "key": 2.0, "doc_count": 1550, "ville": {"buckets": [{"key": "Montréal", "doc_count": 1229}]} },{ "key": 3.0, "doc_count": 563, "ville": {"buckets": [{"key": "Montréal", "doc_count": 404}]} },{ "key": 4.0, "doc_count": 449, "ville": {"buckets": [{"key": "Montréal", "doc_count": 334}]} },{ "key": 5.0, "doc_count": 310, "ville": {"buckets": [{"key": "Montréal", "doc_count": 253}]} },{ "key": 6.0, "doc_count": 215, "ville": {"buckets": [{"key": "Montréal", "doc_count": 173}]} },{ "key": 7.0, "doc_count": 136, "ville": {"buckets": [{"key": "Montréal", "doc_count": 112}]} },{ "key": 8.0, "doc_count": 35, "ville": {"buckets": [{"key": "Montréal", "doc_count": 30}]} },{ "key": 9.0, "doc_count": 10, "ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]} },{ "key": 10.0, "doc_count": 11, "ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]} },{ "key": 11.0, "doc_count": 2, "ville": {"buckets": [{"key": "Montréal", "doc_count": 2}]}
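A histogram aggregation assigns each value to a bucket whose key is the value rounded down to a multiple of the interval. The sketch below applies that rounding rule to a handful of hypothetical `nombre_unites` values (the sample list is invented for illustration and is not taken from the dataset):

```python
import math
from collections import Counter


def histogram_key(value, interval, offset=0.0):
    """Round a value down to the start of its histogram bucket."""
    return math.floor((value - offset) / interval) * interval + offset


values = [1, 1, 2, 3, 3, 3, 7]  # hypothetical nombre_unites values
buckets = Counter(histogram_key(v, 1) for v in values)

# Print buckets in ascending key order, like "order": {"_key": "asc"}.
for key, doc_count in sorted(buckets.items()):
    print(key, doc_count)
```

With an interval of 1 on integer data, as on the slide, every distinct value gets its own bucket; a larger interval would group neighboring values together.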
  • 49. Significant terms GET /pompier/interventions/_search { "size": 0, "query": { "term": {"type_incident": "Inondation"} }, "aggs": { "ville": { "significant_terms": {"field": "ville", "size": 5, "percentage": {}} } } } { "aggregations": { "ville": { "doc_count": 26801, "buckets": [ { "key": "Ile-Bizard", "score": 0.10029498525073746, "doc_count": 68, "bg_count": 678 }, { "key": "Montréal-Nord", "score": 0.0826544804291675, "doc_count": 416, "bg_count": 5033 }, { "key": "Roxboro", "score": 0.08181818181818182, "doc_count": 27, "bg_count": 330 }, { "key": "Côte St-Luc", "score": 0.07654825526563974, "doc_count": 487, "bg_count": 6362 }, { "key": "Saint-Laurent", "score": 0.07317073170731707, "doc_count": 465, "bg_count": 6355
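With the `percentage` heuristic, each bucket's score in the response above equals its foreground count (`doc_count`, floods in that city) divided by its background count (`bg_count`, all interventions in that city). A short check against the numbers on the slide:

```python
def percentage_score(doc_count, bg_count):
    """Share of a term's background documents that fall in the foreground set."""
    return doc_count / bg_count


# (key, doc_count, bg_count) copied from the response on the slide.
slide_buckets = [
    ("Ile-Bizard", 68, 678),
    ("Montréal-Nord", 416, 5033),
    ("Roxboro", 27, 330),
    ("Côte St-Luc", 487, 6362),
    ("Saint-Laurent", 465, 6355),
]
scores = {key: percentage_score(fg, bg) for key, fg, bg in slide_buckets}
for key, score in scores.items():
    print(key, score)
```

Read this way, roughly 10% of all interventions in Ile-Bizard were floods, which is why a small city ranks first.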
  • 50. Aggregations and geolocated data GET /pompier/interventions/_search { "size": 0, "query": { "regexp": {"type_incident": "Feu.*"} }, "aggs": { "distance_from_here": { "geo_distance": { "field": "position", "unit": "km", "origin": { "lat": 45.495902, "lon": -73.554263 }, "ranges": [ {"to": 2}, {"from": 2, "to": 4}, {"from": 4, "to": 6}, {"from": 6, "to": 8}, {"from": 8} ] } } } } { "aggregations": { "distance_from_here": { "buckets": [ { "key": "*-2.0", "from": 0.0, "to": 2.0, "doc_count": 80 }, { "key": "2.0-4.0", "from": 2.0, "to": 4.0, "doc_count": 266 }, { "key": "4.0-6.0", "from": 4.0, "to": 6.0, "doc_count": 320 }, { "key": "6.0-8.0", "from": 6.0, "to": 8.0, "doc_count": 326 }, { "key": "8.0-*", "from": 8.0, "doc_count": 1720 } ] } } }
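Each range in the `geo_distance` aggregation includes its `from` bound and excludes its `to` bound, so a document at exactly 2 km falls in the `2.0-4.0` bucket. A minimal sketch of that assignment rule, producing the same bucket keys as the response (the function name `range_key` is hypothetical):

```python
def range_key(distance_km, ranges):
    """Return the bucket key for a distance; `from` is inclusive, `to` is exclusive."""
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        if (low is None or distance_km >= low) and (high is None or distance_km < high):
            low_label = "*" if low is None else str(float(low))
            high_label = "*" if high is None else str(float(high))
            return f"{low_label}-{high_label}"
    return None


# Ranges copied from the request on the slide (unit: km).
ranges = [{"to": 2}, {"from": 2, "to": 4}, {"from": 4, "to": 6}, {"from": 6, "to": 8}, {"from": 8}]

for d in (1.5, 2, 7.99, 9):
    print(d, range_key(d, ranges))
```

Because the ranges are contiguous and half-open, every non-negative distance lands in exactly one bucket.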
  • 51. Any questions?
  • 52. Offering advanced search to your users GET /restaurant/restaurant/_search { "query": { "simple_query_string": { "fields": ["description", "title^2", "adresse", "type"], "query": "\"service rapide\"~2" } } } GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": { "slop": 2, "query": "service rapide" } } } } { "hits": { "hits": [ { "_source": { "title:": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165" } },{ "_source": { "title:": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753" } } ] } }