Using ElasticSearch as a fast, flexible, and scalable solution to search occurrence records and checklists

TDWG 2013 talk on ElasticSearch by Canadensys and GBIF France.

Published in: Technology

Transcript

  • 1. Using ElasticSearch as a fast, flexible, and scalable solution to search occurrence records and checklists
      Christian Gendreau, Canadensys
      Marie-Elise Lecoq, GBIF France
  • 2. Introduction
      "ElasticSearch is an open source, document oriented, distributed search engine, built on top of Apache Lucene." (From the ElasticSearch GitHub page)
  • 3. Setup
      •  Java 6 or higher
      •  Download: # wget …elasticsearch-0.90.5.zip
      •  Unzip
  • 4. Configuration
      •  Name your cluster
      •  Replication and multiple shards are enabled by default
      •  Start: # bin/elasticsearch
  • 5. Add data
      Using the REST API:
      $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
          "user" : "kimchy",
          "post_date" : "2009-11-15T14:12:12",
          "message" : "trying out Elastic Search"
        }'
  • 6. Import data
      Rivers:
      •  Document-based database (MongoDB)
      •  JDBC (relational database)
      •  Data source (Wikipedia, Twitter)
  • 7. Mapping
      •  Schema-less
      •  Customize indexing
      •  Customize querying
  • 8. ElasticSearch at Canadensys
      Database of Vascular Plants of Canada (VASCAN): data.canadensys.net/vascan
  • 9. Our ElasticSearch index
      Index structure for scientific names:
      •  autocompletion: edge_ngram filter
          o  "carex" -> "ca", "car", "care", "carex"
      •  genus first letter: pattern_replace filter
          o  "carex feta" -> "c. feta"
      •  epithet: path_hierarchy tokenizer
          o  "carex feta" -> "feta"
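
The three transformations on slide 9 can be illustrated with a small plain-Java emulation. This is only a sketch of the outputs shown on the slide, not the ElasticSearch analysis chain itself: the class and method names are hypothetical, the minimum gram size of 2 is inferred from the "carex" example, and the epithet helper simply takes the text after the first space rather than reproducing how the path_hierarchy tokenizer is configured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: emulates the outputs listed on the slide in plain Java.
public class NameAnalysisSketch {

    // Emulates an edge n-gram filter: every prefix of the token
    // whose length is between minGram and maxGram (inclusive).
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Emulates a pattern-replace step: the genus (first word) is reduced
    // to its first letter followed by a period.
    static String abbreviateGenus(String name) {
        return name.replaceFirst("^(\\w)\\w*\\s+", "$1. ");
    }

    // Assumption: the epithet is everything after the first space.
    static String epithet(String name) {
        int space = name.indexOf(' ');
        return space < 0 ? "" : name.substring(space + 1);
    }

    public static void main(String[] args) {
        System.out.println(edgeNgrams("carex", 2, 5));     // [ca, car, care, carex]
        System.out.println(abbreviateGenus("carex feta")); // c. feta
        System.out.println(epithet("carex feta"));         // feta
    }
}
```

Indexing all three forms lets a single name match as a typed prefix, as an abbreviated genus, or by epithet alone.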
  • 10. ElasticSearch at GBIF France
      •  Data stored in ElasticSearch are updated whenever MongoDB changes.
      •  The search engine queries ElasticSearch using filters such as taxon, date, place, dataset and geolocation.
      •  Statistics are calculated using facets.
  • 11. ElasticSearch at GBIF France
  • 12. ElasticSearch - Solr
      •  Solr and ElasticSearch both try to solve the same problem, without many differences
      •  Development setup and production deployment (replication / sharding) are easier with ElasticSearch
      •  By default, ElasticSearch is well configured for Lucene, and customization remains easy
  • 13. Facets
      •  Comparable to "GROUP BY" in SQL
      •  Mostly used to calculate statistics
      •  Example: curl -XGET [...] "facets" : { "dataset" : { "terms" : { "field" : "dataset", "order" : "term" …
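
To make the "GROUP BY" analogy on slide 13 concrete, here is a minimal plain-Java sketch of what a terms facet ordered by term computes: one count per distinct value of a field. It does not call ElasticSearch; the class name, method name and dataset values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class: counts documents per field value, like a terms facet.
public class TermsFacetSketch {

    // Returns value -> number of occurrences. A TreeMap keeps the keys
    // sorted, which mirrors the "order" : "term" option in the example.
    static Map<String, Integer> termCounts(List<String> fieldValues) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String value : fieldValues) {
            Integer current = counts.get(value);
            counts.put(value, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One "dataset" value per occurrence record (invented sample data).
        List<String> datasets = Arrays.asList("herbarium-b", "herbarium-a", "herbarium-a");
        System.out.println(termCounts(datasets)); // {herbarium-a=2, herbarium-b=1}
    }
}
```

The SQL equivalent would be a count per dataset with GROUP BY and ORDER BY on the dataset column; the facet returns the same kind of table alongside the search hits.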
  • 14. API and libraries
      REST API
          o  interoperability between different programming languages
          o  HTTP requests
      Java API
          o  more efficient than the REST API because it uses the binary API
          o  built-in marshalling (data formatting on the network)
  • 15. Query - RESTful API
      Example:
      $ curl 'localhost:9200/vascan/_search?pretty=1' -d '{
          "query" : { "match" : { "name" : { "query" : "carex" } } }
        }'
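
Because the REST API is plain HTTP, the request on slide 15 can be sent from any language. The following is a minimal sketch in Java using only the standard library. The class and method names are hypothetical, the hand-built JSON escapes only quotes and backslashes (a real client would likely use a JSON library), and running `main` assumes a node listening on localhost:9200 with a `vascan` index, as in the slide.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

// Hypothetical helper class: sends the slide's match query over HTTP.
public class RestSearchSketch {

    // Builds the same match-query body as the curl example.
    static String matchQueryBody(String field, String text) {
        return "{\"query\":{\"match\":{\"" + escape(field)
                + "\":{\"query\":\"" + escape(text) + "\"}}}}";
    }

    // Minimal escaping for illustration only: backslashes and double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Posts the body to a _search URL and returns the raw JSON response.
    static String search(String searchUrl, String body) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(searchUrl).openConnection();
        conn.setRequestMethod("POST"); // curl -d also sends a POST
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        OutputStream out = conn.getOutputStream();
        out.write(body.getBytes("UTF-8"));
        out.close();
        InputStream in = conn.getInputStream();
        Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A");
        String response = scanner.hasNext() ? scanner.next() : "";
        scanner.close();
        return response;
    }

    public static void main(String[] args) throws Exception {
        String body = matchQueryBody("name", "carex");
        System.out.println(search("http://localhost:9200/vascan/_search?pretty=1", body));
    }
}
```

This goes through the REST layer; the native Java API on the next slide avoids the HTTP and JSON round trip.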
  • 16. Query - Java API
      Code example:
      ...
      SearchRequestBuilder srb = client.prepareSearch(INDEX_NAME)
          .setQuery(QueryBuilders.boolQuery()
              .should(QueryBuilders.matchQuery("vernacular_name", text)))
          .setTypes(VERNACULAR_TYPE);
      ...
  • 17. Pitfalls
      •  Error reporting (index creation, river creation)
      •  Results may be hard to predict when using complex queries
      •  Documentation
      •  Each mapping modification comes with a free reindex of the data
  • 18. Future
      •  Scientific name analyzer
      •  Geospatial component
  • 19. Thank you!
