www.impetus.com
Handling Data Corruption in Elasticsearch
This white paper focuses on handling data corruption in Elasticsearch. It describes
how to recover data from the corrupted indices of an Elasticsearch cluster and
re-index that data into a new index.
The paper also introduces the relevant Lucene index terminology.
What is Elasticsearch?
Elasticsearch is an open source, schema-free, RESTful search engine built
on Apache Lucene. It runs as a stand-alone database server that takes in data and
stores it in a format optimized for language-based searches, and it exposes a
JSON-based API for ease of use.
An Elasticsearch cluster can be scaled horizontally by adding new nodes at
runtime as the volume of data grows. It uses zen discovery for internal
coordination between the nodes in a cluster.
Failover and high availability are achieved through replication and a
distributed cluster setup.
Data Replication
Data replication provides high data availability. For example, if the replication
factor is 1, there is one replica of each primary shard. With replication, data
loss is unlikely: if a primary shard fails, a replica of that shard takes over and
keeps the cluster in a stable state, and queries and other operations are served
by that replica. The data therefore remains recoverable.
However, data replication has its own costs, chiefly storage. When users choose
not to replicate because of storage constraints, recovering the data of an index
after a primary shard becomes corrupt is a major challenge.
Data Recovery from Corrupted Index
Data can be recovered from a corrupted index by reading the data files of the
index and re-indexing the documents into a new index. This only works if all
fields are stored in Elasticsearch, because recovery reads the stored field
values from the Lucene files in which Elasticsearch keeps the index.
Each shard in the index may have multiple segments, and a corrupt segment makes
the index unstable. To make the data searchable, the index must be in a stable
state, which can be ensured in two ways:
• Run the optimize operation on the index to merge all segments of a shard into
one. This may cause data loss, since it removes the reference to the
segment whose data is corrupt.
• Recover the data by reading the data files and re-indexing the documents.
Lucene uses many files for an index. The table below highlights the four major
files that can be used to recover the data:
Name          Extension  Brief Description
Fields        .fnm       Stores information about the fields
Field Index   .fdx       Contains pointers to field data
Field Data    .fdt       The stored fields for documents
Segment Info  .si        Stores metadata about a segment
Note: If any of these files is corrupt and the index has no replicas, some data
may be lost.
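As an illustration of how these four files relate to a segment, the sketch below lists the segments in a shard's index directory and reports which of the four file types are absent for each one. It is not part of the recovery code in this paper: the class name `SegmentFileCheck` and the method `missingFiles` are hypothetical, and the sketch assumes the segments are not stored in compound form, where the per-segment files are packed into a single file.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reports, per segment, which recovery-relevant files are absent.
final class SegmentFileCheck {

    // The four extensions listed in the table.
    static final List<String> REQUIRED = Arrays.asList("fnm", "fdx", "fdt", "si");

    // Maps each segment name (for example "_0") to the required extensions that are missing.
    static Map<String, List<String>> missingFiles(File shardIndexDir) {
        Map<String, List<String>> result = new TreeMap<String, List<String>>();
        File[] files = shardIndexDir.listFiles();
        if (files == null) {
            return result; // directory does not exist or cannot be read
        }
        for (File f : files) {
            String name = f.getName();
            // A segment is identified by its .si file.
            if (name.startsWith("_") && name.endsWith(".si")) {
                String segment = name.substring(0, name.length() - ".si".length());
                List<String> missing = new ArrayList<String>();
                for (String ext : REQUIRED) {
                    if (!new File(shardIndexDir, segment + "." + ext).isFile()) {
                        missing.add(ext);
                    }
                }
                result.put(segment, missing);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, List<String>> e : missingFiles(dir).entrySet()) {
            System.out.println(e.getKey() + " missing: " + e.getValue());
        }
    }
}
```

A segment that reports missing files here cannot be read with the approach described in this paper, so a check like this can serve as a quick pre-flight step before attempting recovery.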
There are four steps to recover data from the corrupted index, which are
detailed below:
Identify corrupted shards of index
Before recovering data, identify the shard id of each corrupted shard of the
index. A corrupted shard shows up in the UNASSIGNED state. First make sure that
the whole cluster is running and all the nodes are up, so that a shard is not
unassigned merely because a node is down. The list of unassigned shards is
available in the Elasticsearch cluster state. There are different ways of
getting the cluster state, for example with a curl request:
$ curl -XGET 'http://localhost:9200/_cluster/state'
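The selection of unassigned shards can be sketched as follows. This is a minimal illustration under stated assumptions: the routing entries are assumed to have already been extracted from the cluster state response into simple objects, and `ShardEntry` and `UnassignedShards` are hypothetical names, not Elasticsearch classes. Only primary shards are selected, since this paper deals with indices that have no replicas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for one shard routing entry (not an Elasticsearch class).
final class ShardEntry {
    final String index;
    final int shard;
    final boolean primary;
    final String state; // for example "STARTED" or "UNASSIGNED"

    ShardEntry(String index, int shard, boolean primary, String state) {
        this.index = index;
        this.shard = shard;
        this.primary = primary;
        this.state = state;
    }
}

final class UnassignedShards {

    // Returns the primary shards whose state is UNASSIGNED.
    static List<ShardEntry> find(List<ShardEntry> entries) {
        List<ShardEntry> result = new ArrayList<ShardEntry>();
        for (ShardEntry e : entries) {
            if (e.primary && "UNASSIGNED".equals(e.state)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample entries made up for the illustration.
        List<ShardEntry> entries = Arrays.asList(
                new ShardEntry("logs", 0, true, "STARTED"),
                new ShardEntry("logs", 1, true, "UNASSIGNED"));
        for (ShardEntry e : find(entries)) {
            System.out.println(e.index + " shard " + e.shard + " is unassigned");
        }
    }
}
```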
Identify shard’s index directory
The shard directory can be derived from the Elasticsearch home directory, the
data directory name, and the cluster name. If there is only one node on the
machine, the node directory is nodes/0, and the index name and shard id complete
the path:
String shardDir = new StringBuilder()
        .append(esHome).append("/")
        .append(dataDirectoryName).append("/")
        .append(clusterName).append("/nodes/0/indices/")
        .append(indexName).append("/")
        .append(shardId).append("/index")
        .toString();
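The same path can also be assembled with `java.nio.file.Paths`, which avoids hand-written separators. The sketch below assumes Java 7 or later (newer than the Java 1.6 used in the testing environment); the class name `ShardPaths` and the `nodeOrdinal` parameter are hypothetical and introduced only for this illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that builds the Lucene index directory of one shard.
final class ShardPaths {

    // Builds <esHome>/<dataDir>/<cluster>/nodes/<nodeOrdinal>/indices/<index>/<shard>/index
    static Path shardIndexDir(String esHome, String dataDirName, String clusterName,
                              int nodeOrdinal, String indexName, int shardId) {
        return Paths.get(esHome, dataDirName, clusterName, "nodes",
                String.valueOf(nodeOrdinal), "indices", indexName,
                String.valueOf(shardId), "index");
    }

    public static void main(String[] args) {
        // Example values are placeholders, not defaults of any installation.
        System.out.println(shardIndexDir("/opt/elasticsearch", "data", "mycluster", 0, "logs", 3));
    }
}
```

Passing the node ordinal as a parameter keeps the helper usable on machines that run more than one node, where the directory is not always nodes/0.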
Read data of corrupted shard using .fdt, .fdx files
An index may contain a number of segments. You need to identify each segment
and then read its data. After reading a document from a segment, you can
insert the document into another index.
Sample code that reads data from an index using the .fdt, .fdx, .fnm, and .si
files is given below:
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.RegexFileFilter;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.StoredFieldsReader;
import org.apache.lucene.codecs.lucene42.Lucene42Codec;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DocumentStoredFieldVisitor;
import org.apache.lucene.index.FieldInfos;
import org.apache.lucene.index.IndexFileNames;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.SegmentInfo;
import org.apache.lucene.store.CompoundFileDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;

public void readAndReindexData(String indexName, String indexDir,
        String newIndexName) {
    try {
        Codec codec = new Lucene42Codec();
        File indexDirectory = new File(indexDir);
        Directory dir = FSDirectory.open(indexDirectory);
        List<String> segmentList = new ArrayList<String>();
        /* Identify the segment list by listing files in the shard
           directory. Each segment has a .si file. */
        for (File f : FileUtils.listFiles(indexDirectory,
                new RegexFileFilter("_.*\\.si"), null)) {
            String s = f.getName();
            segmentList.add(s.substring(0, s.indexOf('.')));
        }
        int total = 0;
        // Iterate over each segment of the shard and re-index its documents
        for (String segmentName : segmentList) {
            try {
                IOContext ioContext = new IOContext();
                SegmentInfo segmentInfos = codec.segmentInfoFormat()
                        .getSegmentInfoReader().read(dir, segmentName, ioContext);
                Directory segmentDir;
                if (segmentInfos.getUseCompoundFile()) {
                    // Segment files are packed into a single compound file
                    segmentDir = new CompoundFileDirectory(dir,
                            IndexFileNames.segmentFileName(segmentName, "",
                                    IndexFileNames.COMPOUND_FILE_EXTENSION),
                            ioContext, false);
                } else {
                    segmentDir = dir;
                }
                // Collect fields information
                FieldInfos fieldInfos = codec.fieldInfosFormat()
                        .getFieldInfosReader().read(segmentDir, segmentName, ioContext);
                StoredFieldsReader storedFieldsReader = codec.storedFieldsFormat()
                        .fieldsReader(segmentDir, segmentInfos, fieldInfos, ioContext);
                for (int i = 0; i < segmentInfos.getDocCount(); ++i) {
                    try {
                        DocumentStoredFieldVisitor visitor =
                                new DocumentStoredFieldVisitor();
                        storedFieldsReader.visitDocument(i, visitor);
                        Document doc = visitor.getDocument();
                        // Get list of fields of a document
                        List<IndexableField> list = doc.getFields();
                        Map<String, Object> tempMap = new HashMap<String, Object>();
                        for (IndexableField indexableField : list) {
                            tempMap.put(indexableField.name(),
                                    indexableField.stringValue());
                        }
                        // Re-index the document in new index
                        this.index(tempMap, newIndexName);
                        // Count only documents that were actually re-indexed
                        total++;
                    } catch (Exception e) {
                        System.out.println("Couldn't get document " + i
                                + ", stored fields corruption.");
                    }
                }
            } catch (Exception e) {
                // Report the unreadable segment and continue with the next one
                System.out.println("Couldn't read segment " + segmentName + ": " + e);
            }
        }
        System.out.println(total + " documents recovered.");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Re-index data in new index
When you read a document from the index, it contains the _uid and _source
fields. You can get the mapping type and the document id from the _uid field.
Before indexing the document, remove the _uid and _source fields, because
Elasticsearch adds these two fields itself whenever a document is indexed.
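The handling of the _uid field can be illustrated in isolation. The sketch below assumes the value has the form type#id, as implied by the way the re-indexing code in this paper splits it; the class name `UidParser` is hypothetical. It splits at the first '#' only, so a document id that itself contains '#' is preserved.

```java
// Hypothetical helper that separates the mapping type and document id of a _uid value.
final class UidParser {

    // Splits "type#id" at the first '#'. Returns {type, id}.
    static String[] parse(String uid) {
        int sep = uid.indexOf('#');
        if (sep < 0) {
            throw new IllegalArgumentException("Not a type#id value: " + uid);
        }
        return new String[] { uid.substring(0, sep), uid.substring(sep + 1) };
    }

    public static void main(String[] args) {
        // "tweet#42" is a made-up sample value.
        String[] parts = parse("tweet#42");
        System.out.println("type=" + parts[0] + ", id=" + parts[1]);
    }
}
```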
Sample code to re-index the documents using the same document ids is given
below:
import java.util.Map;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.index.IndexRequest;

// Re-index the document in new index
private void index(Map<String, Object> record, String newIndexName) {
    // _uid has the form <mappingType>#<docId>; split at the first '#' only
    String[] uid = ((String) record.get("_uid")).split("#", 2);
    String mappingType = uid[0];
    String docId = uid[1];
    // Elasticsearch adds these two fields again on indexing
    record.remove("_uid");
    record.remove("_source");
    IndexRequest indexRequest = new IndexRequest(newIndexName, mappingType, docId);
    indexRequest.source(record);
    // client is the Elasticsearch Client instance held by the enclosing class
    BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
    bulkRequestBuilder.add(indexRequest);
    bulkRequestBuilder.execute().actionGet();
}
Testing Environment:
Elasticsearch - 0.90.5
Java - 1.6.45
Operating System - RHEL
Conclusion
As data volumes grow rapidly, the storage cost of replication is a challenge for
organizations. The approach described in this paper addresses that challenge by
helping organizations recover data from a corrupted Elasticsearch index even
when no replica is available, provided that all fields are stored.
About Impetus
Impetus is a Software Solutions and Services Company with deep technical
maturity that brings you thought leadership, proactive innovation, and a
track record of success. Our Services and Solutions portfolio includes
Carrier grade large systems, Big Data, Cloud, Enterprise Mobility, and Test
and Performance Engineering.
Visit www.impetus.com or write to us at inquiry@impetus.com
© 2014 Impetus Technologies, Inc. All rights reserved. Product and company
names mentioned herein may be trademarks of their respective companies.
August 2014

Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

Impetus White Paper- Handling Data Corruption in Elasticsearch

  • 1. www.impetus.com Handling Data Corruption in Elasticsearch. This white paper focuses on handling data corruption in Elasticsearch. It describes how to recover data from corrupted Elasticsearch indices and re-index that data into a new index. The paper also introduces Lucene’s index file terminology.
  • 2. What is Elasticsearch?

Elasticsearch is an open source, schema-free, RESTful search engine built on Apache Lucene. It has a stand-alone database server for data intake and storage in a format optimized for language-based searches, and a JSON-based access API for ease of use.

An Elasticsearch cluster can be scaled horizontally by adding a new node at runtime to handle an increased volume of data as needed. It uses zen-discovery for internal coordination between the nodes in a cluster. Failover and high availability can be achieved by using replication and a distributed cluster setup.

Data Replication

Data replication is used for high data availability. For example, if the replication factor is 1, there will be one replica of each primary shard. With replication, data loss is rare. If the primary shard fails, a replica of that shard is used to keep the cluster in a stable state, and any query or other operation is served by that replica. This is what makes the data recoverable when replication is enabled.

However, data replication has its own limitations, such as storage cost. When users choose not to replicate because of storage constraints, recovering the data of an index after a primary shard becomes corrupt is a major challenge.

Data Recovery from Corrupted Index

Data can be recovered from a corrupted index by reading the data files of the index and re-indexing their contents into a new index. However, to recover the data, all fields must be stored in Elasticsearch, which stores and indexes the data as Lucene files.

Each shard in the index may have multiple segments, and a corrupt segment makes the index unstable. To make the data searchable, the index must be in a stable state, which can be ensured in two ways:

    • Run the optimize operation on the index to merge all segments of a shard into one. This may cause data loss, since it removes the reference to the segment whose data is corrupt.
    • Recover the data by reading the data files and re-indexing their contents.
  • 3. Lucene uses many files for an index. The table below highlights the four major files that can be used to recover the data:

Name           Extension   Brief Description
Fields         .fnm        Stores information about the fields
Field Index    .fdx        Contains pointers to field data
Field Data     .fdt        The stored fields for documents
Segment Info   .si         Stores metadata about a segment

Note: If any of these files are corrupt, data may be lost when the index has zero replicas.

There are four steps to recover data from the corrupted index, which are detailed below:

Identify corrupted shards of index

Before data recovery, it is important to identify the shard id of the corrupted shard of an index. Corrupted shards can be identified by their UNASSIGNED state. However, you need to ensure that the whole cluster is running and all the nodes are up, so that a shard is not unassigned merely because its node is down. You can find the list of unassigned shards in the Elasticsearch cluster state. There are different ways of getting the cluster state, for example using a curl request:

    $ curl -XGET 'http://localhost:9200/_cluster/state'

Identify shard’s index directory

You can identify the shard directory with logic that depends on the Elasticsearch home and the cluster name. If there is only one node on the machine, use the shard id and index name to identify the shard directory:

    String shardDir = new StringBuilder()
            .append(esHome).append("/")
            .append(dataDirectoryName).append("/")
            .append(clusterName).append("/nodes/0/indices/")
            .append(indexName).append("/")
            .append(shardId).append("/index")
            .toString();
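The directory lookup above can also be sketched with `java.nio.file`, which avoids hand-written separators. This is only an illustration under the paper's assumptions: the class name `ShardPaths`, the method name, and the `nodeOrdinal` parameter are hypothetical, and the `nodes/<ordinal>/indices/...` layout is the one used by the paper's own snippet for the Elasticsearch version it tested, so it may differ in other releases.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ShardPaths {

    /**
     * Builds <esHome>/<dataDir>/<cluster>/nodes/<nodeOrdinal>/indices/<index>/<shard>/index.
     * All parameter names are illustrative; the layout is an assumption taken
     * from the paper's StringBuilder snippet (single node per machine => ordinal 0).
     */
    public static Path shardIndexDir(String esHome, String dataDirName, String clusterName,
                                     int nodeOrdinal, String indexName, int shardId) {
        return Paths.get(esHome, dataDirName, clusterName, "nodes",
                String.valueOf(nodeOrdinal), "indices", indexName,
                String.valueOf(shardId), "index");
    }

    public static void main(String[] args) {
        // Example values only; substitute the real install path and names.
        Path dir = shardIndexDir("/opt/elasticsearch", "data", "mycluster", 0, "logs", 2);
        System.out.println(dir);
    }
}
```

Passing the node ordinal explicitly keeps the single-node assumption visible to the caller instead of burying a literal `0` in the path.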
  • 4. Read data of corrupted shard using .fdt, .fdx files

An index may contain a number of segments. You need to identify each segment and then read its data. After reading a document from a segment, you can insert the document into another index. Sample code to read data from an index using the .fdt, .fdx, .fnm, and .si files is given below:

    // Uses Lucene 4.x classes (org.apache.lucene.*) and Apache Commons IO (FileUtils, RegexFileFilter).
    public void readAndReindexData(String indexName, String indexDir, String newIndexName) {
        try {
            Codec codec = new Lucene42Codec();
            File indexDirectory = new File(indexDir);
            Directory dir = FSDirectory.open(indexDirectory);
            List<String> segmentList = new ArrayList<String>();
            // Identify the segments by listing the files in the shard directory.
            // Each segment has one .si file (the dot is escaped so it matches literally).
            for (File f : FileUtils.listFiles(indexDirectory, new RegexFileFilter("_.*\\.si"), null)) {
                String s = f.getName();
                segmentList.add(s.substring(0, s.indexOf('.')));
            }
            int total = 0;
            // Iterate over each segment of the shard and re-index its documents
            for (String segmentName : segmentList) {
                try {
                    IOContext ioContext = new IOContext();
                    SegmentInfo segmentInfos = codec.segmentInfoFormat().getSegmentInfoReader()
                            .read(dir, segmentName, ioContext);
                    Directory segmentDir;
                    if (segmentInfos.getUseCompoundFile()) {
                        // Compound segments keep their files inside a single .cfs file
                        segmentDir = new CompoundFileDirectory(dir,
                                IndexFileNames.segmentFileName(segmentName, "",
                                        IndexFileNames.COMPOUND_FILE_EXTENSION),
                                ioContext, false);
                    } else {
                        segmentDir = dir;
                    }
                    // Collect fields information
                    FieldInfos fieldInfos = codec.fieldInfosFormat().getFieldInfosReader()
                            .read(segmentDir, segmentName, ioContext);
                    StoredFieldsReader storedFieldsReader = codec.storedFieldsFormat()
                            .fieldsReader(segmentDir, segmentInfos, fieldInfos, ioContext);
  • 5.
                    for (int i = 0; i < segmentInfos.getDocCount(); ++i) {
                        try {
                            DocumentStoredFieldVisitor visitor = new DocumentStoredFieldVisitor();
                            storedFieldsReader.visitDocument(i, visitor);
                            Document doc = visitor.getDocument();
                            // Get list of fields of a document
                            List<IndexableField> list = doc.getFields();
                            Map<String, Object> tempMap = new HashMap<String, Object>();
                            for (IndexableField indexableField : list) {
                                tempMap.put(indexableField.name(), indexableField.stringValue());
                            }
                            // Re-index the document in new index
                            this.index(tempMap, newIndexName);
                            // Count only documents that were actually read and re-indexed
                            total++;
                        } catch (Exception e) {
                            System.out.println("Couldn't get document " + i
                                    + ", stored fields corruption.");
                        }
                    }
                } catch (Exception e) {
                    // Report unreadable segments instead of skipping them silently
                    System.out.println("Couldn't read segment " + segmentName + ": " + e.getMessage());
                }
            }
            System.out.println(total + " documents recovered.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

Re-index data in new index

When you read a document from the index, the document contains the _uid and _source fields. You can get the document id from the _uid field. Before indexing the document, you need to remove the _uid and _source fields, because Elasticsearch adds these two fields by default whenever a document is indexed.
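The uid handling described above can be sketched as a small helper. It assumes the `type#id` layout that the paper's re-indexing code relies on; the class and method names (`UidParser`, `parse`) are hypothetical. Unlike a plain `split("#")`, it splits only on the first `#`, so a document id that itself contains `#` is preserved.

```java
public class UidParser {

    /** Holds the two parts of a "type#id" uid value. */
    public static final class Uid {
        public final String type;
        public final String id;

        Uid(String type, String id) {
            this.type = type;
            this.id = id;
        }
    }

    /**
     * Splits a uid on its first '#' only. The "type#id" layout is an
     * assumption based on the paper's sample code, not a documented contract.
     */
    public static Uid parse(String uid) {
        int sep = (uid == null) ? -1 : uid.indexOf('#');
        if (sep < 0) {
            throw new IllegalArgumentException("Not a type#id value: " + uid);
        }
        return new Uid(uid.substring(0, sep), uid.substring(sep + 1));
    }

    public static void main(String[] args) {
        Uid uid = parse("tweet#user#42");
        System.out.println(uid.type + " / " + uid.id); // prints "tweet / user#42"
    }
}
```

Rejecting values without a separator makes a damaged uid surface as an explicit error for that document rather than as an array index failure.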
  • 6. Sample code to re-index the documents using the same document ids is given below:

    // Re-index the document in new index.
    // "client" is the Elasticsearch client instance held by the enclosing class.
    private void index(Map<String, Object> record, String newIndexName) {
        // The _uid value has the form "type#id"
        String[] uidParts = ((String) record.get("_uid")).split("#");
        String mappingType = uidParts[0];
        String docId = uidParts[1];
        // Remove the fields that are added again automatically at index time
        record.remove("_uid");
        record.remove("_source");
        IndexRequest indexRequest = new IndexRequest(newIndexName, mappingType, docId);
        indexRequest.source(record);
        BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
        bulkRequestBuilder.add(indexRequest);
        bulkRequestBuilder.execute().actionGet();
    }

Testing Environment:
    Elasticsearch - 0.90.5
    Java - 1.6.45
    Operating System - RHEL

Conclusion

As data volume increases rapidly, it is a challenge for organizations to replicate data because of storage cost. Reading the Lucene files of a corrupted shard directly and re-indexing the stored documents addresses this challenge and helps organizations recover data from a corrupted Elasticsearch index.

About Impetus

Impetus is a Software Solutions and Services Company with deep technical maturity that brings you thought leadership, proactive innovation, and a track record of success. Our Services and Solutions portfolio includes Carrier grade large systems, Big Data, Cloud, Enterprise Mobility, and Test and Performance Engineering. Visit www.impetus.com or write to us at inquiry@impetus.com

© 2014 Impetus Technologies, Inc. All rights reserved. Product and company names mentioned herein may be trademarks of their respective companies. August 2014