FIELD DATA TYPES
by Bo Andersen - codingexplained.com
OUTLINE
➤ Core data types
➤ String, numeric, date, boolean, binary
➤ Complex data types
➤ Object, array, nested
➤ Geo data types
➤ Geo-point, Geo-shape
➤ Specialized data types
➤ IPv4, completion, token count, attachment
CORE DATA TYPES
STRING
➤ String field types accept string values
➤ Can be sub-divided into full text and keywords
➤ We will take a look at these next
STRING - FULL TEXT
➤ Typically used for text-based relevance searches (e.g. searching for products by name)
➤ Full text fields are analyzed
➤ Data is passed through an analyzer, which converts the string into a list of individual terms before it is indexed
➤ This allows Elasticsearch to search for individual words within a full text field
➤ Full text fields are not used for sorting and are rarely used for aggregations
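To make the analysis step concrete, here is a toy Python sketch. The `analyze` function is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters; Elasticsearch's real analyzers (tokenizers plus token filters) do considerably more. The resulting terms go into a small inverted index, which is what makes single-word lookups possible.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer (assumption): lowercase, then split on anything that is not a-z or 0-9."""
    return [term for term in re.split(r"[^0-9a-z]+", text.lower()) if term]

def build_inverted_index(docs):
    """Map each term to the set of document ids whose analyzed text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

products = {1: "Red Running Shoes", 2: "Blue running jacket", 3: "Red Wool Hat"}
index = build_inverted_index(products)

print(analyze("Red Running Shoes"))  # ['red', 'running', 'shoes']
print(sorted(index["running"]))      # [1, 2]
```

Because "Running" and "running" both become the term `running`, a search for one word finds both products.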
STRING - KEYWORDS
➤ Exact values such as tags, status, e-mail addresses, etc.
➤ Keyword fields are not analyzed
➤ The exact string value is added to the index as a single term
➤ Typically used for filtering
➤ E.g. find all products where status is "On Discount"
➤ Also often used for sorting and aggregations
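By contrast, a keyword field keeps the whole value as a single term. The sketch below uses plain Python and made-up product data (an assumption) to show exact-match filtering and sorting on such a field; note that a partial value such as "Discount" does not match.

```python
products = [
    {"name": "Red Running Shoes", "status": "On Discount"},
    {"name": "Blue running jacket", "status": "In Stock"},
    {"name": "Red Wool Hat", "status": "On Discount"},
]

def keyword_terms(value):
    """A keyword value is not analyzed: the exact string is the one indexed term."""
    return [value]

def filter_by_status(docs, status):
    # Exact, case-sensitive comparison against the single stored term
    return [d["name"] for d in docs if status in keyword_terms(d["status"])]

print(filter_by_status(products, "On Discount"))  # ['Red Running Shoes', 'Red Wool Hat']
print(filter_by_status(products, "Discount"))     # []

# The untouched value also works directly as a sort key
print([d["name"] for d in sorted(products, key=lambda d: d["status"])])
# ['Blue running jacket', 'Red Running Shoes', 'Red Wool Hat']
```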
NUMERIC
➤ Supports the following numeric types
➤ long (signed 64-bit integer)
➤ integer (signed 32-bit integer)
➤ short (signed 16-bit integer)
➤ byte (signed 8-bit integer)
➤ double (double-precision 64-bit floating point)
➤ float (single-precision 32-bit floating point)
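The value ranges follow directly from the bit widths above (signed two's-complement integers). In this sketch, `smallest_integer_type` is a hypothetical helper showing how one might choose the narrowest type for a value. The `struct` round trip shows why `float` is less precise than `double`.

```python
import struct

# Signed two's-complement range for each type: [-2^(bits-1), 2^(bits-1) - 1]
INTEGER_TYPES = {"byte": 8, "short": 16, "integer": 32, "long": 64}
RANGES = {name: (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) for name, bits in INTEGER_TYPES.items()}

def smallest_integer_type(value):
    """Hypothetical helper: return the narrowest integer type whose range contains value."""
    for name in ("byte", "short", "integer", "long"):
        low, high = RANGES[name]
        if low <= value <= high:
            return name
    raise OverflowError(f"{value} does not fit in a signed 64-bit long")

def as_float32(x):
    """Round-trip a Python float (a 64-bit double) through 32-bit single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(RANGES["byte"])              # (-128, 127)
print(smallest_integer_type(300))  # short
print(as_float32(0.1) == 0.1)      # False: 0.1 loses precision in 32 bits
```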
DATE
➤ Dates in Elasticsearch can be one of the following
➤ Strings containing formatted dates
➤ E.g. 2016-01-01 or 2016/01/01 12:00:00
➤ A long number representing milliseconds since the epoch
➤ An integer representing seconds since the epoch
➤ Internally stored as a long number representing milliseconds since the epoch
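As a rough illustration of the internal representation, the sketch below converts each of the three input kinds into milliseconds since the epoch. The function name and its explicit `unit` argument are assumptions made for clarity; the slide does not describe how Elasticsearch tells seconds from milliseconds. Date-only strings are assumed to be UTC.

```python
from datetime import datetime, timezone

def to_epoch_millis(value, unit="millis"):
    """Hypothetical normalizer: return a date as a long number of milliseconds since the epoch."""
    if isinstance(value, str):
        # Date-only string, interpreted as midnight UTC (assumption)
        dt = datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        return int(dt.timestamp() * 1000)
    if unit == "seconds":
        return value * 1000
    return value

# Three representations of the same instant, 2016-01-01T00:00:00Z
print(to_epoch_millis("2016-01-01"))                # 1451606400000
print(to_epoch_millis(1451606400000))               # 1451606400000
print(to_epoch_millis(1451606400, unit="seconds"))  # 1451606400000
```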
DATE - FORMATS
➤ Defaults to strict_date_optional_time||epoch_millis
➤ I.e. dates with an optional time part that conform to strict_date_optional_time, or milliseconds since the epoch
➤ Examples
➤ 2016-01-01 (date only)
➤ 2016-01-01T12:00:00Z (date including time)
➤ 1410020500000 (milliseconds since the epoch)
➤ Multiple formats can be specified by separating them with the || separator
➤ E.g. yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis
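The `||` separator means "try each format in turn until one matches". Below is a minimal Python sketch of that fallback. It assumes a hand-made translation of the example patterns into `strptime` equivalents (Elasticsearch itself uses Java-style date patterns, not `strptime`) and treats every date as UTC.

```python
from datetime import datetime, timezone

# Hand-translated strptime equivalents of the slide's patterns (assumption)
PATTERN_MAP = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "yyyy-MM-dd": "%Y-%m-%d",
}

def parse_date(value, formats="yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"):
    """Try each ||-separated format in order and return epoch milliseconds (UTC)."""
    for fmt in formats.split("||"):
        if fmt == "epoch_millis":
            if value.lstrip("-").isdigit():
                return int(value)
            continue
        try:
            dt = datetime.strptime(value, PATTERN_MAP[fmt])
        except ValueError:
            continue  # this format did not match; fall through to the next one
        return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"{value!r} matches none of: {formats}")

print(parse_date("2016-01-01 12:00:00"))  # 1451649600000
print(parse_date("2016-01-01"))           # 1451606400000
print(parse_date("1410020500000"))        # 1410020500000
```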
BOOLEAN
➤ Boolean fields accept true and false values as in JSON
➤ Can also accept strings and numbers which are interpreted as either true or false
➤ False values
➤ false, "false", "off", "no", "0", "" (empty string), 0, 0.0
➤ True values
➤ Anything that is not false
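The coercion rules listed above fit in a small Python function. This sketches the rules exactly as this slide states them; it is not Elasticsearch code, and later Elasticsearch versions accept a much narrower set of boolean values.

```python
FALSE_STRINGS = {"false", "off", "no", "0", ""}

def coerce_boolean(value):
    """Apply the slide's rules: the listed values are false, anything else is true."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return value
    if isinstance(value, str):
        return value not in FALSE_STRINGS
    if isinstance(value, (int, float)):
        return value != 0        # 0 and 0.0 are false
    return True

print(coerce_boolean("off"))  # False
print(coerce_boolean(0.0))    # False
print(coerce_boolean("yes"))  # True
```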
BINARY
➤ Accepts a binary value as a Base64-encoded string
➤ E.g. aHR0cDovL2NvZGluZ2V4cGxhaW5lZC5jb20=
➤ Not searchable
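Decoding the example value with Python's standard `base64` module shows what it contains. The second half shows the reverse direction, which is what a client would do before sending arbitrary bytes in a JSON document.

```python
import base64

encoded = "aHR0cDovL2NvZGluZ2V4cGxhaW5lZC5jb20="

# Decode the slide's example back to the original bytes
raw = base64.b64decode(encoded)
print(raw.decode("ascii"))  # http://codingexplained.com

# Encode arbitrary bytes (not valid text) so they can travel inside JSON
payload = bytes([0x00, 0xFF, 0x10])
print(base64.b64encode(payload).decode("ascii"))  # AP8Q
```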
COMPLEX DATA TYPES
OBJECT
➤ JSON documents are hierarchical
➤ A document may contain inner objects, which in turn may contain inner objects
➤ In Elasticsearch, documents are indexed as flat lists of key-value pairs, e.g.:
{
  "message": "Some text...",
  "customer.age": 26,
  "customer.address.city": "Copenhagen",
  "customer.address.country": "Denmark"
}
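The flattening can be sketched as a recursive function that joins nested keys with dots. The hierarchical input document here is an assumed reconstruction of what the flat example above looked like before indexing; the output matches that flat list.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted key paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into inner objects
        else:
            flat[path] = value
    return flat

document = {
    "message": "Some text...",
    "customer": {
        "age": 26,
        "address": {"city": "Copenhagen", "country": "Denmark"},
    },
}
print(flatten(document))
# {'message': 'Some text...', 'customer.age': 26,
#  'customer.address.city': 'Copenhagen', 'customer.address.country': 'Denmark'}
```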
ARRAY
➤ Elasticsearch does not have a dedicated array type
➤ Any field can contain zero or more values by default
➤ All values in an array must be of the same data type
➤ When adding a field dynamically, the first value in the array determines the field type
➤ Examples
➤ Array of strings: ["Elasticsearch", "rocks"]
➤ Array of integers: [1, 2]
➤ Array of arrays: [1, [2, 3]] - equivalent to [1, 2, 3]
➤ Array of objects: [{ "name": "Andy", "age": 26 }, { "name": "Brenda", "age": 32 }]
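A simplified Python model of these array rules follows. Nested arrays collapse into one flat list, and the first value's type decides the field type. The helper names are hypothetical, and raising an error on mixed types is a simplification of "all values must be of the same data type".

```python
def flatten_values(value):
    """Nested arrays collapse into one flat list of values: [1, [2, 3]] -> [1, 2, 3]."""
    if not isinstance(value, list):
        return [value]
    flat = []
    for item in value:
        flat.extend(flatten_values(item))
    return flat

def infer_field_type(values):
    """Hypothetical: the first value decides the type, and every other value must share it."""
    values = flatten_values(values)
    if not values:
        return None  # an empty array gives no value to infer a type from
    first_type = type(values[0])
    for v in values[1:]:
        if type(v) is not first_type:
            raise TypeError(f"mixed types: {first_type.__name__} and {type(v).__name__}")
    return first_type.__name__

print(flatten_values([1, [2, 3]]))                   # [1, 2, 3]
print(infer_field_type(["Elasticsearch", "rocks"]))  # str
```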
ARRAY - OBJECTS
➤ Arrays of objects do not work as you would expect
➤ You cannot query each object independently of the other objects in the array
➤ Lucene has no concept of inner objects
➤ Elasticsearch flattens object hierarchies into a list of field names and values, so a document such as
{ "users": [{ "name": "Andy", "age": 26 }, { "name": "Brenda", "age": 32 }] }
is stored similar to this:
{ "users.name": ["Andy", "Brenda"], "users.age": [32, 26] }
➤ The association between "Andy" and 26 is lost
➤ A search for a user named "Andy" who is 32 years old would incorrectly match this document!
➤ If you need to query each object in the array independently, you must use the nested data type
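The false match can be reproduced in plain Python. The sketch below flattens the array the way the slide describes, then compares a condition check on the flattened lists with a check that requires both conditions to hold for the same object. Function names are illustrative only.

```python
def flatten_array_of_objects(field, objects):
    """Mimic object flattening: each inner key becomes one multi-valued field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

def matches_flattened(flat, name, age):
    # Each condition is checked against its own value list, independently
    return name in flat["users.name"] and age in flat["users.age"]

def matches_per_object(objects, name, age):
    # What we actually want: both conditions satisfied by the same object
    return any(o["name"] == name and o["age"] == age for o in objects)

users = [{"name": "Andy", "age": 26}, {"name": "Brenda", "age": 32}]
flat = flatten_array_of_objects("users", users)
print(flat)                                   # {'users.name': ['Andy', 'Brenda'], 'users.age': [26, 32]}
print(matches_flattened(flat, "Andy", 32))    # True  (false positive)
print(matches_per_object(users, "Andy", 32))  # False
```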
NESTED
➤ If you need to index arrays of objects and maintain the independence of each object in the array, you should use the nested data type
➤ Internally, each object in a nested array is indexed as a separate hidden document
➤ Each nested object can be queried independently of the others with a nested query
➤ A nested query is executed against the nested objects as if they were indexed as separate documents (internally, this is actually the case)
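Conceptually, the nested type turns each array element into its own hidden document, and a nested query must match within a single one of them. The sketch models this with plain dicts. The `_parent` key and the function names are hypothetical and do not reflect Elasticsearch's actual internal layout.

```python
def to_hidden_documents(doc_id, field, objects):
    """Model nested indexing: one hidden document per array element, linked to its parent."""
    return [{"_parent": doc_id, **{f"{field}.{k}": v for k, v in obj.items()}} for obj in objects]

def nested_query(hidden_docs, conditions):
    """Return parent ids with at least one hidden document satisfying ALL conditions."""
    return sorted({
        h["_parent"]
        for h in hidden_docs
        if all(h.get(field) == value for field, value in conditions.items())
    })

hidden = to_hidden_documents(1, "users", [{"name": "Andy", "age": 26}, {"name": "Brenda", "age": 32}])
hidden += to_hidden_documents(2, "users", [{"name": "Andy", "age": 32}])

print(nested_query(hidden, {"users.name": "Andy", "users.age": 26}))  # [1]
print(nested_query(hidden, {"users.name": "Andy", "users.age": 32}))  # [2]
```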
GEO DATA TYPES
GEO-POINT
➤ Latitude-longitude pairs
➤ Used for geographical operations on documents (searching, sorting, ...)
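➤ Can be specified in four formats
1. As an object with lat and lon properties
{ "location": { "lat": 33.5206608, "lon": -86.8024900 } }
2. As a string in the form "lat,lon"
{ "location": "33.5206608,-86.8024900" }
3. As a geohash
{ "location": "djfq8y2" }
4. As an array in the form [lon, lat] (note the reversed order)
{ "location": [-86.8024900, 33.5206608] }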
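As a sketch, the four input formats can be normalized to one (lat, lon) pair in Python. The geohash decoder implements the standard geohash scheme (alternating longitude/latitude bits, base-32 alphabet). The example geohash "djfq8y2" is a 7-character encoding of this point, computed for this sketch, so it decodes to the centre of a small cell (about 0.0014° on each side) that contains the point.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def decode_geohash(geohash):
    """Return the (lat, lon) centre of a geohash cell by repeatedly halving the ranges."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        bits = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (bits >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2

def to_lat_lon(value):
    """Normalize the four geo-point input formats to a (lat, lon) tuple."""
    if isinstance(value, dict):                  # 1. {"lat": ..., "lon": ...}
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, str) and "," in value:  # 2. "lat,lon"
        lat, lon = value.split(",")
        return float(lat), float(lon)
    if isinstance(value, str):                   # 3. geohash
        return decode_geohash(value)
    if isinstance(value, (list, tuple)):         # 4. [lon, lat], reversed order
        lon, lat = value
        return float(lat), float(lon)
    raise TypeError(f"unsupported geo-point value: {value!r}")

print(to_lat_lon({"lat": 33.5206608, "lon": -86.8024900}))  # (33.5206608, -86.80249)
print(to_lat_lon("33.5206608,-86.8024900"))                  # (33.5206608, -86.80249)
print(to_lat_lon([-86.8024900, 33.5206608]))                 # (33.5206608, -86.80249)
print(to_lat_lon("djfq8y2"))  # centre of the small geohash cell containing the point
```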
GEO-SHAPE
➤ Geo shapes such as rectangles and polygons
➤ Should be used when either the data being indexed or the queries being executed contain shapes other than just points
➤ LineString
➤ An array of two or more positions (an array of arrays); with exactly two positions, it is a straight line
➤ Polygon
➤ An array of arrays, where each inner array contains points
➤ The first and last points of each inner array must be the same (to close the polygon)
➤ ...
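The two structural rules above can be checked with a few lines of Python. The minimum of four positions per closed ring is an assumption taken from the GeoJSON specification (a closed triangle needs 3 distinct points plus the repeated first point). The coordinates are arbitrary sample values in [lon, lat] order.

```python
def is_valid_linestring(coords):
    """A LineString needs two or more positions."""
    return len(coords) >= 2

def is_closed_polygon(rings):
    """Every ring must start and end at the same point (and have at least 4 positions)."""
    return all(len(ring) >= 4 and ring[0] == ring[-1] for ring in rings)

line = [[0.0, 0.0], [1.0, 1.0]]  # two positions: a straight line
square = [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]]]
open_square = [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0]]]

print(is_valid_linestring(line))       # True
print(is_closed_polygon(square))       # True
print(is_closed_polygon(open_square))  # False: last point differs from the first
```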
SPECIALIZED DATA TYPES
IPV4
➤ Used to map IPv4 addresses
➤ Internally, values are indexed as long values
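An IPv4 address is 32 bits wide, so it fits in a long and keeps its natural ordering. Python's standard `ipaddress` module shows the conversion, and why address ranges become simple numeric comparisons.

```python
import ipaddress

def ip_to_long(address):
    """Convert a dotted IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(address))

print(ip_to_long("192.168.1.1"))  # 3232235777

# Range checks become plain numeric comparisons
low, high = ip_to_long("192.168.0.0"), ip_to_long("192.168.255.255")
print(low <= ip_to_long("192.168.1.1") <= high)  # True
print(low <= ip_to_long("10.0.0.1") <= high)     # False
```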
COMPLETION
➤ The completion suggester is a so-called prefix suggester
➤ It does not do spell correction, but enables basic auto-complete functionality
➤ Useful for giving the user suggestions while they type a search, as on Google
➤ Stores an FST (Finite State Transducer) as part of the index
➤ Allows for very fast loading and execution
➤ You don't have to worry about this - just know when to use this type
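An FST is too involved for a slide-sized example. The sketch below therefore swaps in a much simpler structure, a sorted list searched with `bisect`, purely to show what a prefix suggester returns. It is not how the completion suggester is implemented.

```python
from bisect import bisect_left

class PrefixSuggester:
    """Toy prefix suggester over a sorted list (a stand-in for the FST, not an FST)."""

    def __init__(self, inputs):
        self.terms = sorted(set(inputs))

    def suggest(self, prefix, size=5):
        # Jump to the first term >= prefix, then collect terms while they share the prefix
        start = bisect_left(self.terms, prefix)
        results = []
        for term in self.terms[start:]:
            if not term.startswith(prefix) or len(results) == size:
                break
            results.append(term)
        return results

suggester = PrefixSuggester(["elasticsearch", "elastic", "electron", "kibana", "logstash"])
print(suggester.suggest("ela"))  # ['elastic', 'elasticsearch']
print(suggester.suggest("elx"))  # [] (no spell correction, so a typo finds nothing)
```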
TOKEN COUNT
➤ An integer field which accepts string values
➤ The string values are analyzed, and the number of tokens is indexed
➤ Example
➤ A name property could have a length field of the type token_count
➤ A search query could then find persons whose name contains X tokens (split by whitespace, for instance)
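A plain-Python sketch of the example above follows. The whitespace split is an assumed stand-in for whatever analyzer the token_count field is configured with, and the names are sample data.

```python
def token_count(text):
    """Assumed analyzer: split on whitespace and count the resulting tokens."""
    return len(text.split())

persons = ["John Smith", "Mary Ann Jones", "Cher"]

# The value a token_count "length" field would hold for each name
lengths = {name: token_count(name) for name in persons}
print(lengths)  # {'John Smith': 2, 'Mary Ann Jones': 3, 'Cher': 1}

# "Find persons whose name contains 3 tokens"
print([name for name, n in lengths.items() if n == 3])  # ['Mary Ann Jones']
```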
ATTACHMENT
➤ Lets Elasticsearch index attachments in common formats
➤ E.g. PDF, XLS, PPT, ...
➤ Attachment content is stored as a Base64 encoded string
➤ This functionality is available as a plugin that must be installed
➤ sudo /path/to/elasticsearch/bin/plugin install mapper-attachments
➤ Must be installed on every node of a cluster
➤ Nodes must be restarted after the installation
THANK YOU FOR WATCHING!
