Back to Basics. Webinar 4: Advanced Indexing, Text and Geospatial Indexes

Post on 10-Jan-2017


MongoDB Europe 2016
Old Billingsgate, London
15th November

Use my code rubenterceno20 for 20% off tickets: mongodb.com/europe

Back to Basics 2016
Advanced Indexing: Text and Geospatial Indexes

Rubén Terceño
Senior Solutions Architect, EMEA
ruben@mongodb.com
@rubenTerceno

Course Agenda

Date          Time         Webinar
25-May-2016   16:00 CEST   Introduction to NoSQL
07-Jun-2016   16:00 CEST   Your First MongoDB Application
21-Jun-2016   16:00 CEST   Document-Oriented Schema Design
07-Jul-2016   16:00 CEST   Advanced Indexing, Text and Geospatial Indexes
19-Jul-2016   16:00 CEST   Introduction to the Aggregation Framework
28-Jul-2016   16:00 CEST   Production Deployment

Summary of What We Have Covered So Far
• Why does NoSQL exist?
• Types of NoSQL databases
• Key features of MongoDB

• Installation and creation of databases and collections
• CRUD operations
• Indexes and explain()

• Dynamic schema design
• Hierarchy and embedded documents
• Polymorphism

Indexing
• An efficient way to look up data by its value
• Avoids table scans

Traditional Databases Use B-trees
• … and so does MongoDB
• Lookups take O(log n) time
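To make the O(log n) claim concrete, here is a minimal sketch that counts comparisons for a full scan versus a binary search. It assumes a sorted array as a stand-in for the index; a real B-tree uses wide nodes rather than a binary split, and the helper names `linearScan` and `binarySearch` are illustrative only.

```javascript
// Full scan: look at every value until the target is found.
function linearScan(values, target) {
  let steps = 0;
  for (const v of values) {
    steps += 1;
    if (v === target) return { found: true, steps };
  }
  return { found: false, steps };
}

// Indexed lookup stand-in: halve the search range at every step.
function binarySearch(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  let steps = 0;
  while (lo <= hi) {
    steps += 1;
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] === target) return { found: true, steps };
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: false, steps };
}

// 1024 sorted keys; look for the last one (worst case for the scan).
const keys = Array.from({ length: 1024 }, (_, i) => i);
const scan = linearScan(keys, 1023);
const indexed = binarySearch(keys, 1023);
console.log("scan steps:", scan.steps, "indexed steps:", indexed.steps);
```

The scan touches every key, while the ordered lookup needs only about log2(1024) + 1 comparisons, which is the gap an index is meant to close.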

Creating a Simple Index

db.coll.createIndex( { fieldName : <Direction> } )

• db: database name
• coll: collection name
• createIndex: command
• fieldName: field name to be indexed
• <Direction>: ascending : 1, descending : -1

Two Other Kinds of Indexes
• Full Text Index
  • Allows searching inside the text of a field or several fields, ordering the results by relevance.
• Geospatial Index
  • Allows geospatial queries
    • People around me.
    • Countries I'm traversing during my trip.
    • Restaurants in a given neighborhood.
• These indexes do not use B-trees

Full Text Indexes
• An "inverted index" on all the words inside text fields (only one text index per collection)

{ "comment" : "I think your blog post is very interesting and informative. I hope you will post more info like this in the future" }

>> db.posts.createIndex( { "comment" : "text" } )

MongoDB Enterprise > db.posts.find( { $text: { $search : "info" } } )
{ "_id" : ObjectId("…"), "comment" : "I think your blog post is very interesting and informative. I hope you will post more info like this in the future" }
MongoDB Enterprise >

On The Server

2016-07-07T09:48:48.605+0200 I INDEX [conn4] build index on: indexes.products properties: { v: 1, key: { _fts: "text", _ftsx: 1 }, name: "longDescription_text_shortDescription_text_name_text", ns: "indexes.products", weights: { longDescription: 1, name: 10, shortDescription: 3 }, default_language: "english", language_override: "language", textIndexVersion: 3 }

More Detailed Example

>> db.posts.insert( { "comment" : "Red yellow orange green" } )
>> db.posts.insert( { "comment" : "Pink purple blue" } )
>> db.posts.insert( { "comment" : "Red Pink" } )

>> db.posts.find( { "$text" : { "$search" : "Red" } } )
{ "_id" : ObjectId("…"), "comment" : "Red yellow orange green" }
{ "_id" : ObjectId("…"), "comment" : "Red Pink" }

>> db.posts.find( { "$text" : { "$search" : "Pink Green" } } )
{ "_id" : ObjectId("…"), "comment" : "Red Pink" }
{ "_id" : ObjectId("…"), "comment" : "Red yellow orange green" }

>> db.posts.find( { "$text" : { "$search" : "red" } } )   // <- case insensitive
{ "_id" : ObjectId("…"), "comment" : "Red yellow orange green" }
{ "_id" : ObjectId("…"), "comment" : "Red Pink" }
>>
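The idea of an inverted index can be sketched in a few lines of plain JavaScript. This is an illustration only, not MongoDB's implementation: it skips stemming and stop words, and the helper names `tokenize`, `buildInvertedIndex` and `search` are hypothetical. It reuses the three comments inserted above.

```javascript
// Split text into lowercase word tokens (no stemming or stop words here).
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Build a map from each term to the set of document positions containing it.
function buildInvertedIndex(docs) {
  const index = new Map();
  docs.forEach((doc, id) => {
    for (const term of tokenize(doc.comment)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  });
  return index;
}

// OR semantics: a document matches if it contains any of the search terms.
function search(index, docs, query) {
  const hits = new Set();
  for (const term of tokenize(query)) {
    for (const id of index.get(term) ?? []) hits.add(id);
  }
  return [...hits].sort((a, b) => a - b).map((id) => docs[id].comment);
}

const posts = [
  { comment: "Red yellow orange green" },
  { comment: "Pink purple blue" },
  { comment: "Red Pink" },
];
const postIndex = buildInvertedIndex(posts);

console.log(search(postIndex, posts, "red"));        // lowercase query, capitalised data
console.log(search(postIndex, posts, "blue green")); // either term is enough to match
```

Because both the indexed words and the query are lowercased before comparison, the lookup is case insensitive, which mirrors the behaviour of the last query above.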

Using Weights
• We can assign different weights to different fields in the text index
• E.g. I want to favour name over shortDescription in searching
• So I increase the weight for the name field

>> db.blog.createIndex( { shortDescription: "text",
                          longDescription: "text",
                          name: "text" },
                        { weights: { shortDescription: 3,
                                     longDescription: 1,
                                     name: 10 } } )

• Now searches will favour name over shortDescription over longDescription

textScore
• To return the most relevant matches first, project the text score with $meta and sort by it:

>> db.products.find( { $text : { $search: "humongous" } },
                     { score: { $meta : "textScore" }, name: 1, longDescription: 1, shortDescription: 1 } )
              .sort( { score: { $meta: "textScore" } } )
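As a rough intuition for how field weights change the ranking, the sketch below multiplies the number of occurrences of a term in each field by that field's weight. This is a deliberate simplification and not MongoDB's actual textScore formula; the sample products and the helper names `countTerm` and `weightedScore` are made up for illustration, while the weights are the ones from the index above.

```javascript
// Same weights as in the createIndex call above.
const weights = { shortDescription: 3, longDescription: 1, name: 10 };

// Count case-insensitive whole-word occurrences of a term in a text.
function countTerm(text, term) {
  const words = (text ?? "").toLowerCase().split(/\W+/).filter(Boolean);
  return words.filter((w) => w === term.toLowerCase()).length;
}

// Simplified score: sum over fields of weight * occurrences.
function weightedScore(doc, term) {
  let total = 0;
  for (const [field, weight] of Object.entries(weights)) {
    total += weight * countTerm(doc[field], term);
  }
  return total;
}

// Hypothetical sample data, not taken from the webinar's collection.
const products = [
  { name: "Plain mug", shortDescription: "A humongous mug", longDescription: "Holds a lot." },
  { name: "Humongous poster", shortDescription: "Wall poster", longDescription: "A large poster." },
];

const ranked = products
  .map((p) => ({ name: p.name, score: weightedScore(p, "humongous") }))
  .filter((p) => p.score > 0)
  .sort((a, b) => b.score - a.score);

console.log(ranked);
```

A match in name outranks a match in shortDescription because of the 10 versus 3 weighting, which is the effect the sort on textScore surfaces.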

Other Parameters
• Language: pick the language you want to search in, e.g.
  • $language : "spanish"
• Support case-sensitive searching
  • $caseSensitive : true (default false)
• Support diacritic-sensitive searching (e.g. café is distinguished from cafe)
  • $diacriticSensitive : true (default false)
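What the two sensitivity flags mean can be sketched with standard Unicode normalisation. This is only an approximation of the idea (the server's own folding rules are more elaborate), and the `fold` and `matches` helpers and their option names are assumptions that merely mirror the parameter names above.

```javascript
// Normalise a word for comparison. By default both case and diacritics
// are folded away, matching the "default false" behaviour described above.
function fold(text, { caseSensitive = false, diacriticSensitive = false } = {}) {
  let out = text;
  if (!diacriticSensitive) {
    // Decompose accented letters, then drop the combining marks.
    out = out.normalize("NFD").replace(/\p{M}/gu, "");
  }
  if (!caseSensitive) out = out.toLowerCase();
  return out;
}

function matches(term, word, options) {
  return fold(term, options) === fold(word, options);
}

console.log(matches("cafe", "Café"));                               // defaults: folded
console.log(matches("cafe", "Café", { diacriticSensitive: true })); // accent now matters
console.log(matches("café", "Café", { caseSensitive: true }));      // case now matters
```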

Geospatial Indexes
• 2d
  • Represents a flat surface. A good fit if:
    • You have legacy coordinate pairs (MongoDB 2.2 or earlier).
    • You do not plan to use GeoJSON objects.
    • You don't need to account for the Earth's curvature. (Yup, the Earth is not flat.)
• 2dsphere
  • Represents a surface on top of a spheroid.
  • It should be the default choice for geo data
  • Coordinates are (usually) stored in GeoJSON format
  • The index is based on a QuadTree representation
  • The index is based on the WGS 84 standard

Coordinates
• Coordinates are represented as longitude, latitude
• Longitude
  • Measured from the Greenwich meridian (0 degrees)
  • For locations east, up to +180 degrees
  • For locations west, we specify negative values down to -180
• Latitude
  • Measured from the equator north and south (0 to 90 north, 0 to -90 south)
• Coordinates in MongoDB are stored in Longitude/Latitude order
• Coordinates in Google Maps are stored in Latitude/Longitude order
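Since the ordering is easy to get wrong, a small helper that takes a latitude/longitude pair (the order most map UIs display) and emits a GeoJSON Point in longitude/latitude order can help. This is a minimal sketch; `toGeoJsonPoint` is a hypothetical name, and the sample position is the one used later in this webinar.

```javascript
// Convert a (latitude, longitude) pair into a GeoJSON Point.
// GeoJSON, and therefore MongoDB, expects [longitude, latitude].
function toGeoJsonPoint(latitude, longitude) {
  if (!(latitude >= -90 && latitude <= 90)) {
    throw new RangeError("latitude must be between -90 and 90, got " + latitude);
  }
  if (!(longitude >= -180 && longitude <= 180)) {
    throw new RangeError("longitude must be between -180 and 180, got " + longitude);
  }
  return { type: "Point", coordinates: [longitude, latitude] };
}

const here = toGeoJsonPoint(43.47, -3.81);
console.log(JSON.stringify(here));
```

The range checks catch some swapped pairs (a "latitude" above 90 is rejected), but not all of them, so the argument order still deserves attention.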

2dSphere Versions
• Three versions of the 2dSphere index in MongoDB
  • Version 1 : Up to MongoDB 2.4
  • Version 2 : From MongoDB 2.6 onwards
  • Version 3 : From MongoDB 3.2 onwards
• We will only be talking about Version 3 in this webinar

Creating a 2dSphere Index

db.collection.createIndex( { <location field> : "2dsphere" } )

• The location field must hold legacy coordinate pairs or GeoJSON data

Example

>> db.wines.createIndex( { geometry: "2dsphere" } )
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

Testing Geo Queries
• Let's search for wine regions in the world
• Using two collections from my GitHub repo
  • https://github.com/terce13/geoData
• Import them into MongoDB
  • mongoimport -c wines -d geo wine_regions.json
  • mongoimport -c countries -d geo countries.json

Country Document (Vatican)

{
    "_id" : ObjectId("577e2ebd1007503076ac8c86"),
    "type" : "Feature",
    "properties" : {
        "featurecla" : "Admin-0 country",
        "sovereignt" : "Vatican",
        "type" : "Sovereign country",
        "admin" : "Vatican",
        "adm0_a3" : "VAT",
        "name" : "Vatican",
        "name_long" : "Vatican",
        "abbrev" : "Vat.",
        "postal" : "V",
        "formal_en" : "State of the Vatican City",
        "name_sort" : "Vatican (Holy Sea)",
        "name_alt" : "Holy Sea",
        "pop_est" : 832,
        "economy" : "2. Developed region: nonG7",
        "income_grp" : "2. High income: nonOECD",
        "continent" : "Europe",
        "region_un" : "Europe",
        "subregion" : "Southern Europe",
        "region_wb" : "Europe & Central Asia"
    },
    "geometry" : {
        "type" : "Polygon",
        "coordinates" : [ [
            [12.439160156250011, 41.898388671875],
            [12.430566406250023, 41.89755859375],
            [12.427539062500017, 41.900732421875],
            [12.430566406250023, 41.90546875],
            [12.438378906250023, 41.906201171875],
            [12.439160156250011, 41.898388671875]
        ] ]
    }
}

Wine region document

MongoDB Enterprise > db.wines.findOne()
{
    "_id" : ObjectId("577e2e7e1007503076ac8769"),
    "properties" : {
        "name" : "AOC Anjou-Villages",
        "description" : null,
        "id" : "a629ojjxl15z"
    },
    "type" : "Feature",
    "geometry" : {
        "type" : "Point",
        "coordinates" : [ -0.618980171610645, 47.2211343496821 ]
    }
}

You can type these coordinates into Google Maps, but remember to reverse the coordinate order.

Add Indexes

MongoDB Enterprise > db.wines.createIndex({ geometry: "2dsphere" })
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
MongoDB Enterprise > db.countries.createIndex({ geometry: "2dsphere" })
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
MongoDB Enterprise >

$geoIntersects to find our country
• Assume we are at lat: 43.47, lon: -3.81
• What country are we in? Use $geoIntersects

db.countries.findOne( { geometry: { $geoIntersects: { $geometry: { type: "Point", coordinates: [ -3.81, 43.47 ] } } } },
                      { "properties.name": 1 } )

Results

{
    "_id" : ObjectId("577e2ebd1007503076ac8be5"),
    "properties" : {
        "name" : "Spain"
    }
}

Wine regions around me
• Use $near (results ordered by distance)
• With a GeoJSON point, $maxDistance is given in meters (250000 is 250 km)

db.wines.find( { geometry: { $near: { $geometry: { type : "Point", coordinates : [ -3.81, 43.47 ] },
                                      $maxDistance: 250000 } } } )

Results (Projected)

{ "properties" : { "name" : "DO Arabako-Txakolina" } }
{ "properties" : { "name" : "DO Chacoli de Vizcaya" } }
{ "properties" : { "name" : "DO Chacoli de Guetaria" } }
{ "properties" : { "name" : "DO Rioja" } }
{ "properties" : { "name" : "DO Navarra" } }
{ "properties" : { "name" : "DO Cigales" } }
{ "properties" : { "name" : "AOC Irouléguy" } }
{ "properties" : { "name" : "DO Ribera de Duero" } }
{ "properties" : { "name" : "DO Rueda" } }
{ "properties" : { "name" : "AOC Béarn-Bellocq" } }
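The semantics of the $near query (keep what lies within a maximum distance, nearest first) can be imitated client-side with the haversine formula. This is a sketch under stated assumptions: a perfectly spherical Earth with a mean radius of 6,371 km, made-up sample points, and the hypothetical helpers `haversineMeters` and `near`. The server answers the query from the 2dsphere index instead of computing every distance.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius; an approximation

// Great-circle distance in meters between two [longitude, latitude] pairs.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Keep points within maxDistance meters of center, nearest first.
function near(points, center, maxDistance) {
  return points
    .map((p) => ({ ...p, distance: haversineMeters(center, p.coordinates) }))
    .filter((p) => p.distance <= maxDistance)
    .sort((a, b) => a.distance - b.distance);
}

// Hypothetical sample points, not taken from the wines collection.
const samplePoints = [
  { name: "C", coordinates: [2.35, 48.85] },
  { name: "B", coordinates: [-2.5, 42.5] },
  { name: "A", coordinates: [-3.0, 43.3] },
];

const nearby = near(samplePoints, [-3.81, 43.47], 250000);
console.log(nearby.map((p) => p.name + " " + Math.round(p.distance / 1000) + " km"));
```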

But screens are not circular

db.wines.find( { geometry: { $geoWithin: { $geometry: { type : "Polygon",
    coordinates : [ [ [-51,-29], [-71,-29], [-71,-33], [-51,-33], [-51,-29] ] ] } } } } )

Results (Projected)

{ "properties" : { "name" : "Pinheiro Machado" } }
{ "properties" : { "name" : "Rio Negro" } }
{ "properties" : { "name" : "Tacuarembó" } }
{ "properties" : { "name" : "Rivera" } }
{ "properties" : { "name" : "Artigas" } }
{ "properties" : { "name" : "Salto" } }
{ "properties" : { "name" : "Paysandú" } }
{ "properties" : { "name" : "Mendoza" } }
{ "properties" : { "name" : "Luján de Cuyo" } }
{ "properties" : { "name" : "Aconcagua" } }
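A map view usually reports its visible area as four edges, so the rectangle for a query like the one above can be generated rather than typed by hand. The sketch below assumes a viewport object with `west`, `south`, `east` and `north` properties; that shape and the `boundsToPolygon` name are illustrative assumptions. The key detail is that a GeoJSON ring must be closed, meaning the first and last positions are identical.

```javascript
// Turn viewport bounds into a closed GeoJSON Polygon ring
// (counter-clockwise, positions in [longitude, latitude] order).
function boundsToPolygon({ west, south, east, north }) {
  return {
    type: "Polygon",
    coordinates: [[
      [west, south],
      [east, south],
      [east, north],
      [west, north],
      [west, south], // repeat the first position to close the ring
    ]],
  };
}

// Same area as the rectangle queried above.
const viewport = boundsToPolygon({ west: -71, south: -33, east: -51, north: -29 });
console.log(JSON.stringify(viewport));
```

The resulting object can be passed as the $geometry value of a $geoWithin query.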

Use geo objects smartly
• Use polygons and/or multipolygons from a collection to query a second one.

var mex = db.countries.findOne( { "properties.name" : "Mexico" } )
db.wines.find( { geometry: { $geoWithin: { $geometry: mex.geometry } } } )

{ "_id" : ObjectId("577e2e7e1007503076ac8ab9"), "properties" : { "name" : "Los Romos", "description" : null, "id" : "a629ojjkguyw" }, "type" : "Feature", "geometry" : { "type" : "Point", "coordinates" : [ -102.304048304437, 22.0992980768825 ] } }
{ "_id" : ObjectId("577e2e7e1007503076ac8a8d"), "properties" : { "name" : "Hermosillo", "description" : null, "id" : "a629ojiw0i7f" }, "type" : "Feature", "geometry" : { "type" : "Point", "coordinates" : [ -111.03600413129, 29.074715739466 ] } }

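Let's do crazy things

// Tag every wine region with the name of the country that contains it
var wines = db.wines.find();
while (wines.hasNext()) {
    var wine = wines.next();
    // Find the country whose geometry intersects this wine region's point
    var country = db.countries.findOne( { geometry : { $geoIntersects : { $geometry : wine.geometry } } } );
    if (country != null) {
        db.wines.update( { "_id" : wine._id },
                         { $set : { "properties.country" : country.properties.name } } );
    }
}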

Summary of Operators
• $geoIntersects: Find areas or points that overlap or are adjacent
  • Points or polygons, doesn't matter.
• $geoWithin: Find areas or points that lie within a specific area
  • Use screen limits smartly
• $near: Returns locations in order from nearest to furthest away
  • Find closest objects.
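For intuition about what "within" means for a point and a polygon, here is the classic ray-casting test. It treats coordinates as a flat plane, so it is only a reasonable approximation for small areas; 2dsphere queries use spherical geometry on the server. The `pointInRing` name is hypothetical, and the ring is the rectangle used in the $geoWithin example.

```javascript
// Ray casting: count how many polygon edges a horizontal ray starting at
// the point crosses. An odd number of crossings means the point is inside.
function pointInRing([x, y], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses =
      (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

const screenRing = [[-51, -29], [-71, -29], [-71, -33], [-51, -33], [-51, -29]];

console.log(pointInRing([-60, -31], screenRing));     // a point inside the rectangle
console.log(pointInRing([-3.81, 43.47], screenRing)); // the position used for $near
```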

Summary
• Text indexes enable Google-, Solr- or Elasticsearch-style searches
  • They can take the weights of different fields into account
  • They can be combined with other query conditions
  • They can return results ordered by relevance
  • They can be multilingual and case/accent insensitive
• Geospatial indexes let you work with GeoJSON objects
  • They support proximity, inclusion and intersection queries
  • They use the most common reference system, WGS84
    • Watch out!!! Latitude and longitude are in the reverse order from Google Maps.
  • They can be combined with other query conditions
  • There is a special index (2d) for flat surfaces (a football pitch, a virtual world, etc.)

Next Webinar: Introduction to the Aggregation Framework
• 19 July 2016, 16:00 CEST, 11:00 ART, 9:00
• Register if you have not done so yet!
• The MongoDB Aggregation Framework gives developers the ability to run advanced analytical processing inside the database.
• It processes data in a Unix-style pipeline and lets developers:
  • Reshape, transform and extract data.
  • Apply standard analytical functions ranging from sums and averages to standard deviation.
• Register at: https://www.mongodb.com/webinars
• Please give us your feedback: back-to-basics@mongodb.com

Questions?
