This is the fourth webinar in the Back to Basics series, which introduces the MongoDB database. This webinar covers support for free-text and geospatial indexes.
3. Back to Basics 2016
Advanced Indexing:
Text and Geospatial Indexes
Rubén Terceño
Senior Solutions Architect, EMEA
ruben@mongodb.com
@rubenTerceno
4. Course Agenda
Date | Time | Webinar
25-May-2016 | 16:00 CEST | Introduction to NoSQL
7-June-2016 | 16:00 CEST | Your first MongoDB application
21-June-2016 | 16:00 CEST | Document-oriented schema design
07-July-2016 | 16:00 CEST | Advanced indexing, text and geospatial indexes
19-July-2016 | 16:00 CEST | Introduction to the Aggregation Framework
28-July-2016 | 16:00 CEST | Deploying to production
5. Summary of what we have covered so far
• Why does NoSQL exist?
• Types of NoSQL databases
• Key features of MongoDB
• Installation and creation of databases and collections
• CRUD operations
• Indexes and explain()
• Dynamic schema design
• Hierarchy and embedded documents
• Polymorphism
9. Creating a Simple Index
db.coll.createIndex( { fieldName : <Direction> } )
• db : Database name
• coll : Collection name
• createIndex : Command
• fieldName : Field name to be indexed
• <Direction> : Ascending : 1, Descending : -1
10. Two Other Kinds of Indexes
• Full Text Index
• Allows searching inside the text of a field or several fields, ordering the
results by relevance.
• Geospatial Index
• Allows geospatial queries
• People around me.
• Countries I’m traversing during my trip.
• Restaurants in a given neighborhood.
• These indexes do not use B-trees
11. Full Text Indexes
• An “inverted index” on all the words inside text fields (only one text index per collection)
{ "comment" : "I think your blog post is very interesting
and informative. I hope you will post more
info like this in the future" }
>> db.posts.createIndex( { "comment" : "text" } )
MongoDB Enterprise > db.posts.find( { $text: { $search : "info" }} )
{ "_id" : ObjectId("…"), "comment" : "I think your blog post is very
interesting and informative. I hope you will post more info like this in
the future" }
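The idea of an "inverted index" can be sketched in a few lines of plain JavaScript. This is only a toy illustration, not how MongoDB implements its text index: the helper names buildInvertedIndex and search are hypothetical, and stemming and stop-word removal are left out.

```javascript
// Toy inverted index: maps each lowercased word to the set of document ids
// that contain it. Illustrative only; a real text index also applies
// stemming and stop-word removal, which this sketch omits.
function buildInvertedIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    // Split the field into ASCII alphanumeric words
    const words = String(doc[field] || "").toLowerCase().match(/[a-z0-9]+/g) || [];
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc._id);
    }
  }
  return index;
}

// Look up a single whole word; returns the matching ids as an array.
function search(index, term) {
  return Array.from(index.get(term.toLowerCase()) || []);
}

// Hypothetical sample data; the first comment is the one from the slide.
const posts = [
  { _id: 1, comment: "I think your blog post is very interesting and informative. I hope you will post more info like this in the future" },
  { _id: 2, comment: "Thanks for sharing" },
];
const index = buildInvertedIndex(posts, "comment");
console.log(search(index, "info")); // ids of posts containing the word "info"
```

Because the lookup goes from word to documents, a search never has to scan the text of every document, which is the point of the inverted structure.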
12. On The Server
2016-07-07T09:48:48.605+0200 I INDEX [conn4] build index on:
indexes.products properties: { v: 1,
key: { _fts: "text", _ftsx: 1 },
name: "longDescription_text_shortDescription_text_name_text",
ns: "indexes.products",
weights: { longDescription: 1,
name: 10,
shortDescription: 3 },
default_language: "english",
language_override: "language",
textIndexVersion: 3 }
14. Using Weights
• We can assign different weights to different fields in the text index
• E.g. I want to favour name over shortDescription in searching
• So I increase the weight for the name field
>> db.products.createIndex( { shortDescription: "text",
longDescription: "text",
name: "text" },
{ weights: { shortDescription: 3,
longDescription: 1,
name: 10 }} )
• Now searches will favour name over shortDescription over longDescription
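To see why a higher weight pushes a match up the ranking, here is a deliberately simplified sketch. It is not MongoDB's scoring formula: toyScore is a hypothetical helper that multiplies whole-word occurrences by the field weight, and the two product documents are made up for the example.

```javascript
// Toy relevance score: for each weighted field, count whole-word occurrences
// of the term and multiply by the field's weight. This is NOT MongoDB's
// scoring formula; it only shows the effect of per-field weights.
function toyScore(doc, term, weights) {
  const needle = term.toLowerCase();
  let score = 0;
  for (const [field, weight] of Object.entries(weights)) {
    const words = String(doc[field] || "").toLowerCase().match(/[a-z0-9]+/g) || [];
    score += weight * words.filter((w) => w === needle).length;
  }
  return score;
}

// Same weights as in the createIndex call on the slide.
const weights = { shortDescription: 3, longDescription: 1, name: 10 };

// Hypothetical documents: the term appears in a different field in each.
const products = [
  { name: "Plain mug", shortDescription: "A humongous mug", longDescription: "Holds a lot of coffee" },
  { name: "Humongous mug", shortDescription: "A mug", longDescription: "Holds a lot of coffee" },
];

// Rank by descending toy score.
const ranked = products
  .map((p) => ({ name: p.name, score: toyScore(p, "humongous", weights) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked);
```

In this sketch the document matching in name ranks above the one matching in shortDescription, mirroring the ordering the weights are meant to produce.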
15. textScore
• To return the most relevant matches first, project the text score and sort by it:
>> db.products.find( { $text : { $search: "humongous" } },
{ score: { $meta : "textScore" }, name: 1, longDescription: 1,
shortDescription: 1 } ).sort( { score: { $meta: "textScore" } } )
16. Other Parameters
• Language: pick the language you want to search in, e.g.
• $language : "spanish"
• Case-sensitive search
• $caseSensitive : true (default false)
• Diacritic-sensitive search (e.g. café is distinguished from cafe)
• $diacriticSensitive : true (default false)
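The options above all live inside the $text operator next to $search. A minimal sketch of how such a filter document could be assembled follows; buildTextQuery and its option names are hypothetical and not part of MongoDB, only the $-prefixed keys are real.

```javascript
// Hypothetical helper (not part of MongoDB): builds the filter document
// for a $text query, adding only the options that were explicitly set.
function buildTextQuery(search, options = {}) {
  const text = { $search: search };
  if (options.language !== undefined) text.$language = options.language;
  if (options.caseSensitive !== undefined) text.$caseSensitive = options.caseSensitive;
  if (options.diacriticSensitive !== undefined) text.$diacriticSensitive = options.diacriticSensitive;
  return { $text: text };
}

// Search in Spanish and treat "café" and "cafe" as different words.
const query = buildTextQuery("café", { language: "spanish", diacriticSensitive: true });
console.log(JSON.stringify(query));
// In the mongo shell the result could be passed to find(),
// for example db.products.find(query)
```

Leaving an option out keeps the server default, which is false for both sensitivity flags.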
17. Geospatial Indexes
• 2d
• Represents a flat surface. A good fit if:
• You have legacy coordinate pairs (MongoDB 2.2 or earlier).
• You do not plan to use geoJSON objects.
• You do not need to account for the Earth's curvature. (Yup, the Earth is not flat)
• 2dsphere
• Represents shapes on the surface of an Earth-like spheroid.
• It should be the default choice for geoData
• Coordinates are (usually) stored in GeoJSON format
• The index is based on a QuadTree representation
• The index is based on WGS 84 standard
18. Coordinates
• Coordinates are represented as longitude, latitude
• Longitude
• Measured from Greenwich meridian (0 degrees)
• For locations east up to +180 degrees
• For locations west we specify as negative up to -180
• Latitude
• Measured from equator north and south (0 to 90 north, 0 to -90 south)
• Coordinates in MongoDB are stored in Longitude/Latitude order
• Coordinates in Google Maps are stored in Latitude/Longitude order
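Since the two orderings are easy to mix up, a small conversion helper can make the swap explicit. This is a sketch under the assumption that input arrives as latitude, longitude (as in Google Maps); toGeoJSONPoint is a hypothetical name, not a MongoDB function.

```javascript
// Hypothetical helper: takes latitude/longitude in "map" order and returns
// a GeoJSON Point, which stores coordinates as [longitude, latitude].
function toGeoJSONPoint(lat, lon) {
  if (!Number.isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError("latitude must be between -90 and 90");
  }
  if (!Number.isFinite(lon) || lon < -180 || lon > 180) {
    throw new RangeError("longitude must be between -180 and 180");
  }
  return { type: "Point", coordinates: [lon, lat] };
}

// lat 43.47, lon -3.81 becomes [ -3.81, 43.47 ]
const point = toGeoJSONPoint(43.47, -3.81);
console.log(JSON.stringify(point));
```

The range checks catch the most common symptom of swapped arguments, a "latitude" beyond 90 degrees, although a swap between two small values would still pass unnoticed.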
19. 2dSphere Versions
• Three versions of the 2dSphere index in MongoDB
• Version 1 : Up to MongoDB 2.4
• Version 2 : From MongoDB 2.6 onwards
• Version 3 : From MongoDB 3.2 onwards
• We will only be talking about Version 3 in this webinar
20. Creating a 2dSphere Index
db.collection.createIndex( { <location field> : "2dsphere" } )
• The location field must hold legacy coordinate pairs or GeoJSON data
22. Testing Geo Queries
• Let's search for wine regions in the world
• Using two collections from my GitHub repo
• https://github.com/terce13/geoData
• Import them into MongoDB
• mongoimport -c wines -d geo wine_regions.json
• mongoimport -c countries -d geo countries.json
26. $geoIntersects to find our country
• Assume we are at lat: 43.47, lon: -3.81
• What country are we in? Use $geoIntersects
db.countries.findOne(
  { geometry: { $geoIntersects: { $geometry: { type: "Point", coordinates: [ -3.81, 43.47 ] } } } },
  { "properties.name": 1 } )
33. Let’s do crazy things
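// Tag each wine region with the name of the country it intersects
var wines = db.wines.find();
while (wines.hasNext()) {
  var wine = wines.next();
  // Find a country whose geometry intersects this wine region
  var country = db.countries.findOne(
    { geometry: { $geoIntersects: { $geometry: wine.geometry } } });
  // Regions that match no country are left untouched
  if (country != null) {
    db.wines.update({ "_id": wine._id },
      { $set: { "properties.country": country.properties.name } });
  }
}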
34. Summary of Operators
• $geoIntersects: Find areas or points that overlap or are
adjacent
• Points or polygons, doesn’t matter.
• $geoWithin: Find areas or points that lie within a specific area
• Use screen limits smartly
• $near: Returns locations in order from nearest to furthest away
• Find closest objects.
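The $near and $geoWithin operators take filter documents shaped much like the $geoIntersects one. The sketch below only builds those documents in plain JavaScript: nearQuery and withinBoxQuery are hypothetical helpers, the sample coordinates and distance are made up, and the field name geometry is assumed to match the wines and countries collections.

```javascript
// Hypothetical helpers (not part of MongoDB) that only build filter documents.

// $near: results ordered from nearest to furthest, optionally limited to
// maxDistanceMeters from the given point (meters apply to GeoJSON points).
function nearQuery(field, lon, lat, maxDistanceMeters) {
  const near = { $geometry: { type: "Point", coordinates: [lon, lat] } };
  if (maxDistanceMeters !== undefined) near.$maxDistance = maxDistanceMeters;
  return { [field]: { $near: near } };
}

// $geoWithin: documents lying entirely inside a box given by its limits
// (for example, the visible map area). The ring is closed by repeating
// the first corner, as GeoJSON polygons require.
function withinBoxQuery(field, west, south, east, north) {
  const ring = [[west, south], [east, south], [east, north], [west, north], [west, south]];
  return { [field]: { $geoWithin: { $geometry: { type: "Polygon", coordinates: [ring] } } } };
}

const nearby = nearQuery("geometry", -3.81, 43.47, 50000);
const onScreen = withinBoxQuery("geometry", -4.5, 43.0, -3.0, 44.0);
console.log(JSON.stringify(nearby));
console.log(JSON.stringify(onScreen));
// In the mongo shell these could be used as db.wines.find(nearby)
// or db.wines.find(onScreen); $near needs a geospatial index on the field.
```

Note that on a 2dsphere index the polygon edges follow the curvature of the Earth, so for large areas the "box" is only an approximation of the screen rectangle.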
35. Summary
• Text indexes enable Google-, Solr- or Elasticsearch-style searches
• They can take the weights of different fields into account
• They can be combined with other queries
• They can return results ordered by relevance
• They can be multilingual and case/accent insensitive
• Geospatial indexes let you work with GeoJSON objects
• They support proximity, inclusion and intersection queries
• They use the most common reference system, WGS 84
• Watch out! Latitude and longitude are in the reverse order from Google Maps.
• They can be combined with other queries
• There is a special index (2d) for flat surfaces (a football pitch, a virtual world, etc.)
36. Next Webinar
Introduction to the Aggregation Framework
• 19 July 2016 – 16:00 CEST, 11:00 ART, 9:00
• Register if you have not done so yet!
• The MongoDB Aggregation Framework gives developers the ability to run advanced analytical processing inside the database.
• It processes data through a Unix-style pipeline and lets developers:
• Reshape, transform and extract data.
• Apply standard analytical functions, from sums and averages to standard deviation.
• Register at: https://www.mongodb.com/webinars
• Please give us your feedback: back-to-basics@mongodb.com
Each item in a B-tree node points to a sub-tree containing elements below its key value. Insertions require a read before a write. Writes that split nodes are expensive.