Elastic Search integration with Hadoop 
Saturday, 28 June 2014
Posted by leveragebigdata in Uncategorized
Tags: Elastic Search, Hadoop, Hive, MapReduce
Elastic Search is an open-source distributed search engine based on the Lucene framework, with a REST API. You can download Elastic Search from http://www.elasticsearch.org/overview/elkdownloads/. Unzip the downloaded zip or tar file and then start a single instance (node) of Elastic Search by running the script 'elasticsearch-1.2.1/bin/elasticsearch', as shown below:
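For reference, these steps look roughly like the following in a Linux shell; the tarball URL below is the 1.2.1 download location as it was at the time and is an assumption, so adjust it (and the paths) for your environment:

# Download and unpack Elastic Search 1.2.1 (URL/version assumed)
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.tar.gz
tar -xzf elasticsearch-1.2.1.tar.gz
# Start a single node in the foreground (add -d to run it as a daemon)
elasticsearch-1.2.1/bin/elasticsearch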
Installing a plugin:
We can install plugins for extra features; for example, elasticsearch-head provides a web interface for interacting with the cluster. Use the command 'elasticsearch-1.2.1/bin/plugin -install mobz/elasticsearch-head', as shown below:
The Elastic Search web interface can then be accessed at the URL http://localhost:9200/_plugin/head/.
Creating the index:
(You can skip this step.) In the search domain, an index is analogous to a database in the relational world. By default, 5 shards are created with a replication factor of 1; both can be set at index creation time to suit your requirements. After creation, the replication factor can still be increased, but the number of shards cannot be changed.
curl -XPUT "http://localhost:9200/movies/" -d '{"settings" : {"number_of_shards" ...
Create Elastic Search Index
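The settings JSON in the command above is cut off in the capture. A complete request would look like the sketch below; the shard and replica counts here are illustrative values, not the ones from the original post:

curl -XPUT "http://localhost:9200/movies/" -d '{
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 1
  }
}'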
Loading data to Elastic Search: 
If we put data into Elastic Search without creating the index first, the index is created automatically.
Load data using -XPUT
We need to specify the id (1), as shown below:
curl -XPUT "http://localhost:9200/movies/movie/1" -d '{"title": "Men with Wings", ...
Note: movies -> index, movie -> index type, 1 -> id
Elastic Search -XPUT
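The document body in the command above is truncated. A complete request might look like the following; every field other than the title is an illustrative assumption:

curl -XPUT "http://localhost:9200/movies/movie/1" -d '{
  "title": "Men with Wings",
  "director": "William A. Wellman",
  "year": 1938,
  "genres": ["Adventure", "Drama"]
}'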
Load data using -XPOST
The id will be generated automatically, as shown below:
curl -XPOST "http://localhost:9200/movies/movie" -d '{ "title": "Lawrence of Arabia", ...
Elastic Search -XPOST
Note: _id: U2oQjN5LRQCW8PWBF9vipA is automatically generated. 
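Likewise, a complete -XPOST request could look like the sketch below, with the fields beyond the title assumed for illustration:

curl -XPOST "http://localhost:9200/movies/movie" -d '{
  "title": "Lawrence of Arabia",
  "director": "David Lean",
  "year": 1962,
  "genres": ["Adventure", "Biography", "Drama"]
}'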
The _search endpoint
The indexed documents can be searched using the query below:
curl -XPOST "http://localhost:9200/_search" -d '{ "query": { "query_string": { ...
ES Search Result
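The search request is also truncated. A full query_string query might look like the sketch below; the search term simply echoes the '?q=kill' query used later in the Map Reduce section and is not taken from the original screenshot:

curl -XPOST "http://localhost:9200/_search" -d '{
  "query": {
    "query_string": {
      "query": "kill"
    }
  }
}'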
Integrating with Map Reduce (Hadoop 1.2.1) 
To integrate Elastic Search with Map Reduce, follow the steps below:
Add a dependency to pom.xml: 
<dependency> 
<groupId>org.elasticsearch</groupId> 
<artifactId>elasticsearch-hadoop</artifactId> 
<version>2.0.0</version> 
</dependency> 
or download the elasticsearch-hadoop jar and add it to the classpath.
Elastic Search as source & HDFS as sink: 
In the Map Reduce job, specify the Elastic Search index/index type from which data should be fetched into the HDFS file system, and use 'EsInputFormat' as the input format (this format type is defined in the elasticsearch-hadoop jar). In org.apache.hadoop.conf.Configuration, set the Elastic Search index/index type with the 'es.resource' property and an optional search query with 'es.query', and set the InputFormatClass to 'EsInputFormat', as shown below:
ElasticSourceHadoopSinkJob.java 
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.elasticsearch.hadoop.mr.EsInputFormat;

public class ElasticSourceHadoopSinkJob {

    public static void main(String[] arg) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Elastic Search index/index type to read from
        conf.set("es.resource", "movies/movie");
        // Optional query to restrict the documents that are fetched
        //conf.set("es.query", "?q=kill");

        final Job job = new Job(conf, "Get information from elasticSearch.");
"Get information from elasticSearch."); 
job.setJarByClass(ElasticSourceHadoopSinkJob.class); 
job.setMapperClass(ElasticSourceHadoopSinkMapper.class); 
job.setInputFormatClass(EsInputFormat.class); 
job.setOutputFormatClass(TextOutputFormat.class); 
job.setNumReduceTasks(0); 
job.setMapOutputKeyClass(Text.class); 
job.setMapOutputValueClass(MapWritable.class); 
FileOutputFormat.setOutputPath(job, new Path(arg[0])); 
System.exit(job.waitForCompletion(true) ? 0 : 1); 
} 
} 
ElasticSourceHadoopSinkMapper.java 
import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ElasticSourceHadoopSinkMapper extends Mapper<Object, MapWritable, Text, MapWritable> {

    @Override
    protected void map(Object key, MapWritable value, Context context)
            throws IOException, InterruptedException {
        // The key is the document id; the value is the document itself as a MapWritable
        context.write(new Text(key.toString()), value);
    }
}
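With both classes compiled, the job can be submitted like any other Map Reduce job. The jar name below is a placeholder and assumes the elasticsearch-hadoop jar is packaged inside it (or is otherwise on the task classpath); the single argument is the HDFS output directory read from arg[0]:

hadoop jar es-hadoop-examples.jar ElasticSourceHadoopSinkJob /user/hadoop/moviesFromEs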
HDFS as source & Elastic Search as sink: 
In the Map Reduce job, specify the Elastic Search index/index type into which data from the HDFS file system should be loaded, and use 'EsOutputFormat' as the output format (this format type is defined in the elasticsearch-hadoop jar), as shown below:
ElasticSinkHadoopSourceJob.java
import java.io.IOException; 
import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.MapWritable; 
import org.apache.hadoop.io.NullWritable; 
import org.apache.hadoop.mapreduce.Job; 
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; 
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat; 
import org.elasticsearch.hadoop.mr.EsOutputFormat; 
public class ElasticSinkHadoopSourceJob {

    public static void main(String[] str) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Elastic Search index/index type to write to
        conf.set("es.resource", "movies/movie");

        final Job job = new Job(conf, "Get information from elasticSearch.");
        job.setJarByClass(ElasticSinkHadoopSourceJob.class);
        job.setMapperClass(ElasticSinkHadoopSourceMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(EsOutputFormat.class);
        job.setNumReduceTasks(0);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(MapWritable.class);
        FileInputFormat.setInputPaths(job, new Path("data/ElasticSearchData"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
ElasticSinkHadoopSourceMapper.java 
import java.io.IOException;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ElasticSinkHadoopSourceMapper extends Mapper<LongWritable, Text, NullWritable, MapWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is a comma-separated record: year,title,director,genres
        String[] splitValue = value.toString().split(",");
        MapWritable doc = new MapWritable();
        doc.put(new Text("year"), new IntWritable(Integer.parseInt(splitValue[0])));
        doc.put(new Text("title"), new Text(splitValue[1]));
        doc.put(new Text("director"), new Text(splitValue[2]));
        String genres = splitValue[3];
        if (genres != null) {
            // "$" is a regex metacharacter, so escape it to split on the literal separator
            String[] splitGenres = genres.split("\\$");
            ArrayWritable genresList = new ArrayWritable(splitGenres);
            doc.put(new Text("genres"), genresList);
        }
        context.write(NullWritable.get(), doc);
    }
}
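Similarly, once a comma-separated input file exists at the hard-coded HDFS path data/ElasticSearchData, the job can be submitted as shown below. The jar name is again a placeholder, and the sample record only illustrates the year,title,director,genres layout the mapper expects:

echo '1962,Lawrence of Arabia,David Lean,Adventure$Biography$Drama' > movies.txt
hadoop fs -mkdir data/ElasticSearchData
hadoop fs -put movies.txt data/ElasticSearchData/
hadoop jar es-hadoop-examples.jar ElasticSinkHadoopSourceJob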
Integrate with Hive:
Download the elasticsearch-hadoop jar and include it in the path using hive.aux.jars.path, as shown below:
bin/hive --hiveconf hive.aux.jars.path=<path-of-jar>/elasticsearch-hadoop-2.0.0.jar
or add elasticsearch-hadoop-2.0.0.jar to <hive-home>/lib and <hadoop-home>/lib.
Elastic Search as source & Hive as sink: 
Now, create an external table to load data from Elastic Search, as shown below:
CREATE EXTERNAL TABLE movie (id BIGINT, title STRING, director STRING, year BIGINT, ...
You need to specify the Elastic Search index/index type using 'es.resource', and you can optionally specify a query using 'es.query'.
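The CREATE EXTERNAL TABLE statement above is cut off in the capture. A sketch of the full definition with the elasticsearch-hadoop 2.0 Hive storage handler is shown below; the genres column type and the property values are assumptions based on the rest of the post:

CREATE EXTERNAL TABLE movie (
  id BIGINT,
  title STRING,
  director STRING,
  year BIGINT,
  genres ARRAY<STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'movies/movie');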
Load data from Elastic Search to Hive 
Elastic Search as sink & Hive as source:
Create an internal table in Hive, such as 'movie_internal', and load data into it. Then load the data from the internal table into Elastic Search, as shown below:
Create internal table:
CREATE TABLE movie_internal (title STRING, id BIGINT, director STRING, year BIGINT, ...
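The internal table statement is also truncated. A plausible full version, with a genres column and a row format matching the comma- and '$'-delimited hiveElastic.txt file shown below, could be:

CREATE TABLE movie_internal (
  title STRING,
  id BIGINT,
  director STRING,
  year BIGINT,
  genres ARRAY<STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
STORED AS TEXTFILE;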
Load data to internal table: 
LOAD DATA LOCAL INPATH '<path>/hiveElastic.txt' OVERWRITE INTO TABLE movie_internal;
hiveElastic.txt:
Title1,1,dire1,2003,Action$Crime$Thriller
Title2,2,dire2,2007,Biography$Crime$Drama
Load data from the Hive internal table to Elastic Search:
INSERT OVERWRITE TABLE movie SELECT NULL, m.title, m.director, m.year, m.genres ...
Load data from Hive to Elastic Search
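The INSERT statement above is truncated; completing it under the assumption that the alias m refers to movie_internal gives:

INSERT OVERWRITE TABLE movie
SELECT NULL, m.title, m.director, m.year, m.genres
FROM movie_internal m;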
Verify the inserted data with an Elastic Search query.
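For example, a quick check from the command line returns the first few indexed documents (the hits will depend on what was loaded):

curl -XGET "http://localhost:9200/movies/movie/_search?pretty"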
References: 
1. ElasticSearch 
2. Apache Hadoop 
3. Apache HBase
4. Apache Spark 
5. JBKSoft Technologies 