ElasticSearch lessons learned
Alberto Paro, October 30, 2014
Agenda
Introduction
ElasticSearch
Common Pitfalls
Questions
About me
Alberto Paro, @aparo77
My motto: “always learning”
CTO at Big Data Technologies
Freelance Consulting
International companies (Italy, Switzerland, Austria, USA)
Web Development on Big Data Solutions
NLP/Spark/Lucene/SOLR/ElasticSearch implementations & training
Reactive and Functional Programming (Scala, Akka, Spray.io, Play)
About me
Packt Publishing Book Author and reviewer
ElasticSearch Cookbook (Author, Dec 2013)
ElasticSearch Server (Review, Apr 2014)
ElasticSearch Cookbook – Second Edition (Author, Dec 2014)
Using ElasticSearch since 2010 (version 0.10)
PyES – ElasticSearch Python driver used by CERN, IBM, …
ElasticSearch MongoDB river
Django ElasticSearch Engine
For companies, I developed up to 4 ORMs for ElasticSearch (.Net, Python, Scala) and several plugins
ElasticSearch
Built on Apache Lucene
Started in 2010 by Shay Banon
Open Source – Apache License
A company was formed in 2012: ElasticSearch
Training, support and development
ElasticSearch
Scalable
Distributed, Node Discovery
Automatic sharding
Query distribution
RESTful, HTTP API
With API wrappers for .Net, Ruby, Java, Scala, …
JSON in, JSON out -> JSON Coast-to-Coast
Document Model
Maps JSON to objects
“schemaless” -> field type recognition
Keeps source, keeps ‘version’ number, keeps timestamp, …
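To make "RESTful, HTTP API" and "JSON in, JSON out" concrete, here is a minimal Python sketch that only builds (it does not send) the HTTP request for indexing one document, using the standard library. The `/{index}/{type}/{id}` URL layout follows the ES 1.x document API; the index name `books`, the type `book` and the helper `index_request` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

def index_request(base_url, index, doc_type, doc_id, source):
    """Build (but do not send) a PUT /{index}/{type}/{id} request with a JSON body."""
    url = "{}/{}/{}/{}".format(base_url.rstrip("/"), quote(index),
                               quote(doc_type), quote(str(doc_id)))
    body = json.dumps(source).encode("utf-8")  # JSON in...
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

req = index_request("http://localhost:9200", "books", "book", "9781782166627",
                    {"name": "ElasticSearch Cookbook", "pages": 430})
# Sending it needs a running node: urllib.request.urlopen(req) returns JSON out.
```

The same URL with GET would return the stored `_source` together with its `_version`.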
ElasticSearch
Field types and analyzers
String, numeric, geo, …
Custom types: attachments, IP, IBAN, …
Arrays, subdocuments, nested documents
Integrated Aggregations
Your big data insights
Terms
Min/Max/Avg/Sum
Top hit
Geo Distance
And more
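A rough sketch of the "Terms" aggregation, assuming the ES 1.x `aggs` request syntax: `terms_agg_request` builds the request body, while `local_terms_buckets` is a hypothetical, purely local imitation of the buckets it returns (one `doc_count` per distinct term). The real aggregation works on indexed terms, so an analyzed string field would yield lowercased tokens; this imitation ignores analysis.

```python
from collections import Counter

def terms_agg_request(field, size=10):
    """Search body asking only for a terms aggregation on `field` (no hits)."""
    return {"size": 0,
            "aggs": {"by_" + field: {"terms": {"field": field, "size": size}}}}

def local_terms_buckets(docs, field, size=10):
    """Count, per distinct value of `field`, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(set(values))  # a document counts once per term
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]

docs = [{"tag": ["elasticsearch", "java", "python"]},
        {"tag": ["java", "scala"]},
        {"tag": "java"}]
print(local_terms_buckets(docs, "tag"))
```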
DBMS -> ElasticSearch
DBMS        ElasticSearch   MongoDB
Database    Index           Database
Table       Type            Collection
Field       Field           Field
Record      Document        Document
Users must rethink their models.
DBMS -> ElasticSearch
Data modelling is the same as Entity-Relationship modelling, plus:
Multi values
Embedding
Mutable/Immutable data
Three alternatives to foreign keys:
Term query
Parent/Child
Nested
{
  "book" : {
    "isbn" : "9781782166627",
    "name" : "ElasticSearch Cookbook",
    "author" : {
      "first_name" : "Alberto",
      "last_name" : "Paro"
    },
    "pages" : 430,
    "tag" : ["elasticsearch", "java", "python", "Rest"]
  }
}
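Why nested documents exist can be shown with a small local model, assuming ES's handling of plain inner objects: arrays of objects are flattened into one list of values per dotted path, so the link between a first name and its last name is lost. The helpers `flatten` and `matches_all` and the second author ("John Smith") are hypothetical.

```python
def flatten(doc):
    """Map each dotted field path to the list of all values found under it."""
    flat = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:  # array elements share their parent's path
                walk(item, path)
        else:
            flat.setdefault(path, []).append(value)

    walk(doc, "")
    return flat

def matches_all(flat, criteria):
    """True if every (path, value) pair appears somewhere in the flattened doc."""
    return all(value in flat.get(path, []) for path, value in criteria.items())

book = {"book": {"isbn": "9781782166627",
                 "author": [{"first_name": "Alberto", "last_name": "Paro"},
                            {"first_name": "John", "last_name": "Smith"}]}}
flat = flatten(book)
# "Alberto Smith" is not an author, yet the flattened document matches:
print(matches_all(flat, {"book.author.first_name": "Alberto",
                         "book.author.last_name": "Smith"}))  # True
```

Mapping `author` as `nested` keeps each author as a separate hidden document, so a nested query for that pair would no longer match.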
Common Pitfalls
Schema(less)?
Automatic field type recognition
It can guess the wrong type
Strict about types: only some type changes are allowed on an existing field
Check your datetime formats:
UNIX timestamps (epoch from …), the de facto standard
ISO 8601 -> "yyyy-MM-dd'T'HH:mm:ssZ"
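The sketch below approximates dynamic type recognition to show how it can miss types: JSON integers become `long`, floats `double`, and strings become `string` unless they look like a date, so a number sent as `"430"` ends up as a string. The date regular expression is a simplified assumption, not ES's exact `date_detection` formats. The two helpers produce both datetime representations from the slide: ISO 8601 in UTC and epoch milliseconds, which is how ES stores dates internally.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified stand-in for ES date detection: "yyyy-MM-dd" optionally followed by a time.
_DATE_LIKE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?$")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def guess_field_type(value):
    """Approximate the type ES 1.x would infer from the first value it sees."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "date" if _DATE_LIKE.match(value) else "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):  # arrays take the type of their elements
        return guess_field_type(value[0]) if value else None
    return None

def to_iso8601(dt):
    """Aware datetime -> ISO 8601 string in UTC, e.g. 2014-10-30T00:00:00Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def to_epoch_millis(dt):
    """Aware datetime -> milliseconds since the UNIX epoch (exact integer math)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)

print(guess_field_type(430), guess_field_type("430"))  # long string
```

Sending one consistent format, or declaring the field type in the mapping up front, avoids depending on the first document's guess.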
Common Pitfalls
What’s the best transport protocol?
In the JVM, prefer the native transport
Faster
Extra bonus
HTTP is best behind a load balancer
Thrift is best for performance
Faster than HTTP
Charset “safe”
Common Pitfalls
Never, never expose your ElasticSearch server outside the DMZ
Security problems with scripting
A simple HTTP request can destroy your server
Or simply drain your money on Amazon's cloud
ElasticSearch has a lot of problems with URL security
Vulnerabilities
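One cheap safeguard is to check the address a node binds to (the `network.host` setting in elasticsearch.yml) before deploying. This is a minimal sketch assuming the value is an IP literal; hostnames and ES special values are not handled, and `is_exposed` is a hypothetical helper. It complements a firewall, it does not replace one.

```python
import ipaddress

def is_exposed(bind_ip):
    """True if binding to this IP literal may listen on a public interface."""
    ip = ipaddress.ip_address(bind_ip)
    if ip.is_unspecified:  # 0.0.0.0 or :: means every interface, public ones included
        return True
    return not (ip.is_loopback or ip.is_private)

for host in ["127.0.0.1", "10.0.0.12", "0.0.0.0", "8.8.8.8"]:
    print(host, "EXPOSED" if is_exposed(host) else "internal")
```

The unspecified-address check comes first on purpose: Python's `ipaddress` counts `0.0.0.0` as private, which would otherwise hide the most dangerous setting.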
Common Pitfalls
Very fast indexing
Bulk indexing:
Set up without replicas (replicas = 0, not 1)
Experiment with the bulk size (300, 500, 1000, 5000, 10000 documents)
Performance depends on data complexity
Before indexing (disable refresh):
curl -XPUT localhost:9200/test/_settings -d '{
  "index" : {
    "refresh_interval" : "-1"
  } }'
After indexing (restore refresh):
curl -XPUT localhost:9200/test/_settings -d '{
  "index" : {
    "refresh_interval" : "1s"
  } }'
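A minimal Python sketch of client-side bulk indexing, assuming the ES 1.x `_bulk` format: an action line followed by the document source, newline-delimited, with a trailing newline. `chunked` splits the documents into batches of the bulk size being tested; `bulk_body` and the `test`/`book` names are hypothetical. `BEFORE`/`AFTER` restate the slide's settings as dictionaries (refresh off and zero replicas while loading).

```python
import json
from itertools import islice

# Settings from the slide: refresh off and no replicas while loading,
# refresh back on afterwards (restore your usual number_of_replicas as well).
BEFORE = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
AFTER = {"index": {"refresh_interval": "1s"}}

def chunked(items, size):
    """Yield lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def bulk_body(index, doc_type, docs):
    """Build a newline-delimited _bulk body from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # _bulk requires a final newline

docs = [(str(i), {"n": i}) for i in range(1050)]
for batch in chunked(docs, 500):
    body = bulk_body("test", "book", batch)
    # POST `body` to http://localhost:9200/_bulk here (not done in this sketch)
```

Timing each batch size on a sample of real data is the practical way to pick the value, since the right size depends on document complexity.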
Common Pitfalls
ElasticSearch uses a lot of memory and file-descriptors!
Raise the limits in /etc/security/limits.conf:
elasticsearch soft nofile 32000
elasticsearch hard nofile 32000
elasticsearch - memlock unlimited
Set the ES_HEAP_SIZE environment variable
In the ElasticSearch config file config/elasticsearch.yml:
bootstrap.mlockall: true
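A small preflight check, as a sketch: the standard `resource` module (Unix only) reads the open-files limit of the current process, which reflects limits.conf only for sessions of the same user that runs ElasticSearch. `suggest_heap_gb` encodes a commonly cited rule of thumb, not an official formula: give the heap about half the RAM (Lucene relies on the OS file cache for the rest) and stay below roughly 32 GB so the JVM keeps compressed object pointers. Both helper names are hypothetical.

```python
import resource  # Unix-only standard module

def nofile_ok(minimum=32000):
    """True if this process's soft open-files limit is at least `minimum` (or unlimited)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= minimum

def suggest_heap_gb(total_ram_gb):
    """Rule of thumb: half the RAM, capped below the ~32 GB compressed-pointer limit."""
    return min(total_ram_gb // 2, 31)

if __name__ == "__main__":
    print("nofile ok:", nofile_ok())
    print("ES_HEAP_SIZE={}g".format(suggest_heap_gb(16)))
```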
Common Pitfalls
Wait for the yellow status (cluster health) before using the cluster
Are you using ElasticSearch as your primary datastore?
It can replace both a DBMS and MongoDB,
but it depends on your data
Schedule snapshots with cron
Don’t abuse flush
(Be reactive)
Prefer “update” to re-posting the same object
Use the “version” Luke!
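"Use the version" refers to optimistic concurrency control: a write that carries the version you read is rejected with a conflict (HTTP 409) if someone else changed the document in between. The class below is a hypothetical, simplified in-memory imitation of that check, not the ES client API.

```python
class VersionConflict(Exception):
    """Raised when the expected version does not match the stored one (ES answers 409)."""

class VersionedStore:
    """Simplified in-memory imitation of ES internal versioning."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, version=None):
        """Store `source`; if `version` is given, it must equal the current version."""
        current = self._docs.get(doc_id, (0, None))[0]
        if version is not None and version != current:
            raise VersionConflict(f"{doc_id}: expected version {version}, found {current}")
        self._docs[doc_id] = (current + 1, source)
        return current + 1

store = VersionedStore()
store.index("9781782166627", {"pages": 430})            # creates version 1
version, book = store.get("9781782166627")
store.index("9781782166627", {"pages": 431}, version)   # matches, becomes version 2
try:
    store.index("9781782166627", {"pages": 432}, version)  # stale version 1
except VersionConflict as err:
    print("conflict:", err)
```

On a conflict the usual reaction is to re-read the document, reapply the change and retry.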
Common Pitfalls
If possible, don’t use rivers
Hard to debug
They reduce your server's responsiveness
They can crash your server
They will be removed (2.0?)
(Prefer Spark SchemaRDD)
Use scripts
The easy way to extend ElasticSearch with trivial functionality
Prefer Groovy (or native Java for performance)
Don’t use inline scripts, if possible
Prefer indexed or file scripts with parameters
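A sketch of why parameters matter, assuming the ES 1.x update API body shape (`script`, `params`, `lang`): with parameters the script text stays identical across requests, so it can be compiled once and reused, while concatenating values produces a new script for every call. The `pages` field, the `delta` parameter and the helper names are hypothetical.

```python
# Assumed ES 1.x update API body: {"script": ..., "params": {...}, "lang": ...}
INCREMENT = "ctx._source.pages += delta"

def update_with_params(delta):
    """Same script text every time; only the parameter changes."""
    return {"script": INCREMENT, "params": {"delta": delta}, "lang": "groovy"}

def update_inline(delta):
    """Anti-pattern: the value is baked into the script, so every call is a new script."""
    return {"script": "ctx._source.pages += %d" % delta, "lang": "groovy"}

distinct_param_scripts = {update_with_params(d)["script"] for d in range(100)}
distinct_inline_scripts = {update_inline(d)["script"] for d in range(100)}
print(len(distinct_param_scripts), len(distinct_inline_scripts))  # 1 100
```

Moving `INCREMENT` into a file or indexed script keeps the same parameter-passing pattern and removes script text from requests altogether.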
Common Pitfalls
Use plugins
If it’s not available, write a new one
Always backup before upgrading
Snapshots can save your life!
Bug in 1.3.x
Check your plugins for compatibility
Read the ElasticSearch changelog
Sometimes you MUST upgrade your cluster
Use at least 3 nodes (if possible)
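One reason for three nodes is split-brain protection: in ES 1.x, `discovery.zen.minimum_master_nodes` should be set to a quorum of the master-eligible nodes, (N / 2) + 1. The hypothetical helpers below show that a two-node cluster cannot lose any node without losing its quorum, while a three-node cluster can lose one.

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (N / 2) + 1 with integer division."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1

def tolerated_failures(master_eligible):
    """How many master-eligible nodes can be lost while a quorum remains."""
    return master_eligible - minimum_master_nodes(master_eligible)

for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n), tolerated_failures(n))
```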
Conclusions
ElasticSearch benefits
Easy to set up
Very clever architecture
Drawbacks
Changing the sharding of an already populated index is non-trivial
Pay attention when upgrading
ElasticSearch
Clever architecture, fast, stable, extendable
Does exactly what you need
Thank you
alberto.paro@gmail.com
@aparo77
Questions?
ElasticSearch Meetup 30 - 10 - 2014

  • 3. About me Alberto Paro, @aparo77 My motto: “always learning” CTO at Big Data Technologies Freelance Consulting International companies (Italy, Switzerland, Austria, USA) Web Development on Big Data Solutions NLP/Spark/Lucene/SOLR/ElasticSearch implementations & training Reactive and Functional Programming (Scala, Akka, Spray.io, Play)
  • 4. About me: Packt Publishing book author and reviewer: ElasticSearch Cookbook (author, Dec 2013), ElasticSearch Server (reviewer, Apr 2014), ElasticSearch Cookbook – Second Edition (author, Dec 2014). Using ElasticSearch since 2010 (around version 0.10). PyES, the ElasticSearch Python driver used by CERN, IBM, …; the ElasticSearch MongoDB river; Django ElasticSearch Engine. For companies, I have developed up to 4 ORMs for ElasticSearch (.Net, Python, Scala) and several plugins.
  • 5. ElasticSearch: based on Apache Lucene. Started in 2010 by Shay Banon. Open source (Apache License). A company, ElasticSearch, was formed in 2012 for training, support and development.
  • 6. ElasticSearch: scalable (distributed, node discovery, automatic sharding, query distribution). RESTful HTTP API, with API wrappers for .Net, Ruby, Java, Scala, … JSON in, JSON out: JSON coast-to-coast. Document model: maps JSON to objects; “schemaless” through field type recognition; keeps the source, a ‘version’ number, a timestamp, …
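A minimal sketch of “JSON in, JSON out” over the RESTful HTTP API, using only the Python standard library. It assumes a hypothetical node at http://localhost:9200 and the 1.x-era /index/type/id URL layout; the index and type names in the usage lines are illustrative. The function only builds the request, so nothing is sent until it is passed to urllib.request.urlopen. The index response carries the document's `_version`, which the later “use the version” advice relies on.

```python
import json
import urllib.request

# Hypothetical node address; adjust to your cluster.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document, base_url=ES_URL):
    """Build (but do not send) a PUT /index/type/id request with a JSON body."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/{doc_type}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def read_version(response_body):
    """Extract the `_version` number from an index response (JSON out)."""
    return json.loads(response_body)["_version"]


if __name__ == "__main__":
    # "library" and "book" are illustrative names, not part of the talk.
    req = build_index_request("library", "book", "9781782166627",
                              {"name": "ElasticSearch Cookbook", "pages": 430})
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read()
```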
  • 7. ElasticSearch: field types and analyzers (string, numeric, geo, …; custom types such as attachments, IP, IBAN, …; arrays, subdocuments, nested documents). Integrated aggregations, your big data insights: terms, min/max/avg/sum, top hits, geo distance, and more.
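A sketch of a search body that combines a terms aggregation with min/max/avg metrics per bucket. The field names `tag` and `pages` are borrowed from the book example in the data-modelling slide; the aggregation labels (`by_tag`, `avg_pages`, …) are arbitrary names chosen here. On an analyzed 1.x string field the buckets are the analyzed tokens, so a not_analyzed mapping is usually what you want for tags.

```python
def build_tag_stats_query(tag_field="tag", numeric_field="pages", size=10):
    """Search body: terms buckets on `tag_field`, with avg/min/max of `numeric_field`.

    Top-level "size": 0 returns only aggregation results, no hits.
    """
    return {
        "size": 0,
        "aggs": {
            "by_tag": {
                "terms": {"field": tag_field, "size": size},
                "aggs": {
                    "avg_pages": {"avg": {"field": numeric_field}},
                    "min_pages": {"min": {"field": numeric_field}},
                    "max_pages": {"max": {"field": numeric_field}},
                },
            }
        },
    }


def summarize(response):
    """Map each tag bucket to (doc_count, average pages) from a search response."""
    buckets = response["aggregations"]["by_tag"]["buckets"]
    return {b["key"]: (b["doc_count"], b["avg_pages"]["value"]) for b in buckets}
```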
  • 8. DBMS -> ElasticSearch
      DBMS       ElasticSearch   MongoDB
      Database   Index           Database
      Table      Type            Collection
      Field      Field           Field
      Record     Document        Document
    Users must rethink their models.
  • 9. DBMS -> ElasticSearch: data modelling still follows the entity-relation approach, plus multi-valued fields, embedding, and mutable/immutable data. Three alternatives to foreign keys: term query, parent/child, nested. Example document:
      {
        "book": {
          "isbn": "9781782166627",
          "name": "ElasticSearch Cookbook",
          "author": {"first_name": "Alberto", "last_name": "Paro"},
          "pages": 430,
          "tag": ["elasticsearch", "java", "python", "Rest"]
        }
      }
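Of the three foreign-key alternatives, “nested” can be sketched on the book example. This assumes the 1.x type-based mapping layout (sent as PUT /<index>/_mapping/book, index name of your choice); declaring `author` as nested keeps each author object's fields together, so a query on first and last name must match within the same author object. The exact field options are illustrative, not taken from the talk.

```python
def book_mapping():
    """1.x-style mapping for type "book", with `author` stored as a nested object."""
    return {
        "book": {
            "properties": {
                "isbn": {"type": "string", "index": "not_analyzed"},
                "name": {"type": "string"},
                "pages": {"type": "integer"},
                # Multi-valued field: an array of strings needs no special type.
                "tag": {"type": "string", "index": "not_analyzed"},
                "author": {
                    "type": "nested",
                    "properties": {
                        "first_name": {"type": "string"},
                        "last_name": {"type": "string"},
                    },
                },
            }
        }
    }


def books_by_author(first_name, last_name):
    """Nested query: both conditions must hold for the same author object."""
    return {
        "query": {
            "nested": {
                "path": "author",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"author.first_name": first_name}},
                            {"match": {"author.last_name": last_name}},
                        ]
                    }
                },
            }
        }
    }
```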
  • 10. Common Pitfalls: schema(less)? Automatic field type recognition can pick the wrong type, and ElasticSearch is strict about types: only some types can be upgraded. Check your datetimes: UNIX epoch (from …) versus ISO 8601 (the standard world), e.g. “yyyy-MM-ddTHH:mm:ssZ”.
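A small sketch for the datetime pitfall: convert every timestamp to one explicit, timezone-aware form before indexing, either UNIX epoch milliseconds or an ISO 8601 UTC string. Rejecting naive datetimes is a design choice made here to avoid silent local-time shifts; it is not an ElasticSearch requirement.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _require_aware(dt):
    if dt.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid silent local-time shifts")


def to_epoch_millis(dt):
    """Milliseconds since the UNIX epoch, computed exactly with timedelta arithmetic."""
    _require_aware(dt)
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms):
    """Inverse of to_epoch_millis, returning a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def to_iso8601_utc(dt):
    """ISO 8601 string in UTC with a literal 'Z', e.g. 2014-10-30T18:00:00Z."""
    _require_aware(dt)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```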
  • 11. Common Pitfalls: what’s the best transport protocol? Inside the JVM, prefer the native transport: it is faster, with an extra bonus. HTTP is best behind a load balancer. Thrift is best for performance: faster than HTTP and charset “safe”.
  • 12. Common Pitfalls: never, never expose your ElasticSearch server outside the DMZ. Scripting brings security problems: a simple HTTP request can destroy your server, or simply drain your money on the Amazon cloud. ElasticSearch has many URL-security problems and known vulnerabilities.
  • 13. Common Pitfalls: very fast indexing. For bulk indexing, set the index up without replicas (replicas = 0, not 1) and experiment with the bulk size (300, 500, 1000, 5000, 10000); performance depends on data complexity.
    Before indexing:
      curl -XPUT localhost:9200/test/_settings -d '{ "index" : { "refresh_interval" : "-1" } }'
    After indexing:
      curl -XPUT localhost:9200/test/_settings -d '{ "index" : { "refresh_interval" : "1s" } }'
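The bulk-indexing advice can be sketched as a request-body builder. The bulk API expects newline-delimited JSON: one action line plus one source line per document, with a trailing newline. This assumes the 1.x-era `_type` metadata field; the index and type names are whatever you pass in, and the batch size is the knob the slide suggests tuning. Each batch would be POSTed to /_bulk.

```python
import json


def bulk_body(index, doc_type, docs):
    """NDJSON body for the bulk API; `docs` is an iterable of (doc_id, source_dict)."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def chunked(items, size):
    """Yield lists of at most `size` items, so each bulk request stays bounded."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Bodies for PUT /<index>/_settings around a large import.
BEFORE_IMPORT = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
# Afterwards, restore refreshing (and raise number_of_replicas to the value you need).
AFTER_IMPORT = {"index": {"refresh_interval": "1s"}}
```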
  • 14. Common Pitfalls: ElasticSearch uses a lot of memory and file descriptors! Raise the limits in /etc/security/limits.conf:
      elasticsearch soft nofile 32000
      elasticsearch hard nofile 32000
      elasticsearch - memlock unlimited
    Set ES_HEAP_SIZE, and in the ElasticSearch config file conf/elasticsearch.yml set:
      bootstrap.mlockall: true
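A quick sketch for verifying the open-file limit, using the Unix-only `resource` module. It reports the limits of the Python process itself, so run it as the same user (e.g. `elasticsearch`) and from the same kind of session that starts the node; it does not inspect an already running node. The 32000 threshold is the value from the limits.conf lines on this slide.

```python
import resource  # Unix-only standard library module

RECOMMENDED_NOFILE = 32000  # value from the limits.conf lines on this slide


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def nofile_ok(minimum=RECOMMENDED_NOFILE):
    """True if the soft limit is unlimited or at least `minimum`."""
    soft, _hard = nofile_limits()
    return soft == resource.RLIM_INFINITY or soft >= minimum


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard} ok={nofile_ok()}")
```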
  • 15. Common Pitfalls: wait for the yellow status. Are you using ElasticSearch as your primary datastore? It can replace either a DBMS or MongoDB, but that depends on your data; schedule snapshots with cron. Don’t abuse flush (be reactive). Prefer “update” to reposting the same object. Use the “version”, Luke!
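“Wait for the yellow status” can be sketched with the cluster health endpoint and its `wait_for_status` and `timeout` parameters. The base URL is a hypothetical default, and the `fetch` parameter is an injection point added here so the logic can be exercised without a live cluster. Some versions answer a timed-out wait with an HTTP error status, so the default fetcher also reads the error body.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def health_url(base_url, status="yellow", timeout="30s"):
    """URL for GET /_cluster/health that blocks until `status` or `timeout`."""
    query = urllib.parse.urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url}/_cluster/health?{query}"


def _http_get(url):
    """Fetch a URL, returning the body even for HTTP error statuses."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        return err.read()


def wait_for_yellow(base_url="http://localhost:9200", timeout="30s", fetch=_http_get):
    """True once the cluster is yellow or green; False if the wait timed out."""
    health = json.loads(fetch(health_url(base_url, "yellow", timeout)))
    return not health.get("timed_out", False) and health["status"] in ("yellow", "green")
```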
  • 16. Common Pitfalls: if possible, don’t use rivers. They are hard to debug, reduce your server’s responsiveness, can crash your server, and will be removed (2.0?); prefer Spark SchemaDDL. Use scripts: they are the easy way to extend ElasticSearch with trivial functionality. Prefer Groovy (or native Java for performance). Avoid inline scripts where possible; prefer indexed or file scripts with parameters.
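The “scripts with parameters” advice can be sketched as an update API body (POST /<index>/<type>/<id>/_update) in the 1.x `script` + `params` shape. The script text stays constant and only the parameters change, so values are never concatenated into script source and the same compiled script can be reused. The script itself is a hypothetical example; the exact keys for referencing indexed or file scripts differ between versions, so only the parameterised form is shown.

```python
def scripted_update(script, params):
    """Update API body: a fixed script plus per-call parameters."""
    return {"script": script, "params": dict(params)}


# Hypothetical Groovy script body; `delta` is supplied through params.
INCREMENT_PAGES = "ctx._source.pages += delta"

# Avoid: f"ctx._source.pages += {delta}" builds a different script string per value.
body = scripted_update(INCREMENT_PAGES, {"delta": 4})
```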
  • 17. Common Pitfalls: use plugins; if the one you need isn’t available, write a new one. Always back up before upgrading: snapshots can save your life (bug in 1.3.x)! Check your plugins for compatibility and read the ElasticSearch changelog. Sometimes you MUST upgrade your cluster. Use at least 3 nodes (if possible).
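A sketch of the “back up before upgrading” step with the snapshot API: register a shared-filesystem repository once (PUT /_snapshot/<repo>), then create a timestamped snapshot and wait for it to finish. The repository name and location used below are hypothetical; an `fs` location must be reachable from every node. Snapshot names must be lowercase, hence the `.lower()`.

```python
from datetime import datetime, timezone


def fs_repository_body(location):
    """Body for PUT /_snapshot/<repo>: a shared-filesystem repository."""
    return {"type": "fs", "settings": {"location": location}}


def snapshot_name(prefix="pre-upgrade", now=None):
    """Lowercase, timestamped snapshot name, e.g. pre-upgrade-20141030-180000."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}-{now:%Y%m%d-%H%M%S}".lower()


def snapshot_path(repo, name):
    """Path for a PUT that creates a snapshot and waits until it completes."""
    return f"/_snapshot/{repo}/{name}?wait_for_completion=true"


if __name__ == "__main__":
    # Hypothetical repository name and mount point.
    print(fs_repository_body("/mnt/es-backups"))
    print(snapshot_path("backups", snapshot_name()))
```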
  • 18. Conclusions: ElasticSearch benefits: easy to set up, very clever architecture. Drawbacks: changing the sharding of a full index is non-trivial; pay attention when upgrading ElasticSearch. Clever architecture, fast, stable, extendable: it does exactly what you need.
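Why resharding a full index is non-trivial can be sketched from the routing rule shard = hash(routing) % number_of_primary_shards. Here `zlib.crc32` is a stand-in for ElasticSearch's own hash function; the point is the modulo: changing the primary shard count sends most documents to a different shard, which is why the usual path is reindexing into a new index.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """hash(routing) % number_of_primary_shards, with crc32 as a stand-in hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def moved_fraction(doc_ids, old_shards, new_shards):
    """Fraction of documents whose shard changes when the primary count changes."""
    moved = sum(1 for d in doc_ids if shard_for(d, old_shards) != shard_for(d, new_shards))
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10_000)]
    print(f"5 -> 6 shards moves {moved_fraction(ids, 5, 6):.0%} of documents")
```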

Editor's Notes

  1. Extra bonus: node monitoring