IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 3, Ver. II (May – Jun. 2015), PP 06-12
www.iosrjournals.org
DOI: 10.9790/0661-17320612 www.iosrjournals.org 6 | Page
Leveraging Map Reduce With Hadoop for Weather Data
Analytics
Riyaz P.A.1, Surekha Mariam Varghese2
1,2 (Dept. of CSE, Mar Athanasius College of Engineering, Kothamangalam, Kerala, India)
Abstract : Collecting, storing and processing huge amounts of climatic data is necessary for accurate
prediction of weather. Meteorological departments use different types of sensors, such as temperature and
humidity sensors, to obtain the values. The number of sensors, together with the volume and velocity of the data
from each sensor, makes data processing time consuming and complex. This work leverages MapReduce with
Hadoop to process this massive amount of data. Hadoop is an open-source framework suitable for large scale
data processing. The MapReduce programming model helps to process large data sets in a parallel, distributed
manner. This project aims to build a data analytical engine for high velocity, huge volume temperature data
from sensors using MapReduce on Hadoop.
Keywords – Bigdata, MapReduce, Hadoop, Weather, Analytics, Temperature
I. Introduction
Big Data has become one of the buzzwords in IT during the last couple of years. Initially it was shaped
by organizations which had to handle fast growth rates of data such as web data, data resulting from scientific or
business simulations, or other data sources. Some of those companies’ business models are fundamentally based
on indexing and using this large amount of data. The pressure to handle the growing amount of data on the web,
for example, led Google to develop the Google File System [1] and MapReduce [2].
Bigdata and the Internet of Things (IoT) are related areas. Sensor data is a kind of Bigdata: it arrives with
high velocity, and a large number of sensors generates a high volume of data. The term “Internet of
Things” encompasses all kinds of gadgets and devices, many of which are designed to perform a single task
well. The main focus with respect to IoT technology, however, is on devices that share information via a
connected network.
India is an emerging country, and many of its cities are becoming smart cities. The different sensors deployed
in a smart city can be used to measure weather parameters. Weather forecast departments have begun to collect
and analyse massive amounts of data such as temperature. They use different sensor values, such as temperature
and humidity, to predict rainfall and other conditions. As the number of sensors increases, the data grows in
both volume and velocity. There is a need for a scalable analytics tool to process this massive amount of data.
The traditional approach to processing the data is very slow. Processing the sensor data with MapReduce in
the Hadoop framework removes the scalability bottleneck. Hadoop is an open-source framework used for Bigdata
analytics. Hadoop’s main processing engine is MapReduce, which is currently one of the most popular big-data
processing frameworks available. MapReduce is a framework for executing highly parallelizable and
distributable algorithms across huge data sets using a large number of commodity computers. Using MapReduce
with Hadoop, the temperature data can be analysed without scalability issues. The speed of processing can
increase considerably when the work is spread across a multi-cluster distributed network.
In this paper a new Temperature Data Analytical Engine is proposed, which leverages MapReduce with the
Hadoop framework to produce results without a scalability bottleneck. This paper is organized as follows:
Section II discusses work related to MapReduce. Section III describes the implementation of the Temperature
Data Analytical Engine. Section IV describes the analytics of NCDC data. Finally, Section V gives the
conclusion of the work.
II. Related Works
2.1 Hadoop
Hadoop is widely used in big data applications in the industry, e.g., spam filtering, network searching,
click-stream analysis, and social recommendation. In addition, considerable academic research is now based on
Hadoop. Some representative cases are given below. As declared in June 2012, Yahoo ran Hadoop on 42,000
servers at four data centers to support its products and services, e.g., searching and spam filtering. At
that time, the biggest Hadoop cluster had 4,000 nodes, but the number of nodes was expected to increase to
10,000 with the release of Hadoop 2.0. In the same month, Facebook announced that their Hadoop cluster could
process 100 PB of data, which grew by 0.5 PB per day as of November 2012. Some well-known agencies that use
Hadoop to conduct distributed computation are listed in [3].
In addition, many companies provide commercial Hadoop offerings and/or support, including
Cloudera, IBM, MapR, EMC, and Oracle. According to Gartner Research, Bigdata analytics was a trending
topic in 2014 [4]. Hadoop is an open-source framework mostly used for Bigdata analytics. MapReduce is the
programming paradigm associated with Hadoop.
Fig.1: Hadoop Environment
Some characteristics of Hadoop are as follows. It is an open-source system developed by Apache
in Java. It is designed to handle very large data sets. It is designed to scale to very large clusters. It is designed to
run on commodity hardware. It offers resilience via data replication. It offers automatic failover in the event of a
crash. It automatically fragments storage over the cluster. It brings processing to the data. It supports large
numbers of files, into the millions. The Hadoop environment is shown in Figure 1.
2.2 Map Reduce
Hadoop uses HDFS for storing data and MapReduce for processing that data. HDFS is Hadoop’s
implementation of a distributed filesystem. It is designed to hold a large amount of data and to provide access to
this data to many clients distributed across a network [5]. MapReduce is an excellent model for distributed
computing, introduced by Google in 2004. Each MapReduce job is composed of a certain number of map and
reduce tasks. The MapReduce model for serving multiple jobs consists of a processor sharing queue for the Map
Tasks and a multi-server queue for the Reduce Tasks [6].
To run a MapReduce job, users should furnish a map function, a reduce function, input data, and an
output data location, as shown in Figure 2. When executed, Hadoop carries out the following steps.
Hadoop breaks the input data into multiple data items by new lines and runs the map function once for
each data item, giving the item as the input for the function. When executed, the map function outputs one or
more key-value pairs. Hadoop collects all the key-value pairs generated from the map function, sorts them by the
key, and groups together the values with the same key.
For each distinct key, Hadoop runs the reduce function once while passing the key and list of values for
that key as input. The reduce function may output one or more key-value pairs, and Hadoop writes them to a file
as the final result. Hadoop allows the user to configure the job, submit it, control its execution, and query the
state. Every job consists of independent tasks, and all the tasks need to have a system slot to run [6]. In Hadoop
all scheduling and allocation decisions are made on a task and node slot level for both the map and reduce
phases [7].
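The steps above can be imitated in a single Python process, without Hadoop, to make the data flow concrete. This is only a minimal sketch: the names `run_mapreduce`, `map_fn` and `reduce_fn`, the "place,temperature" record layout and the sample records are illustrative assumptions and are not part of the Hadoop API.

```python
from collections import defaultdict

def run_mapreduce(lines, map_fn, reduce_fn):
    """Single-process imitation of the map -> sort/group -> reduce flow."""
    # Map: call map_fn once per input line; it yields (key, value) pairs.
    intermediate = []
    for line in lines:
        intermediate.extend(map_fn(line))

    # Shuffle: sort the pairs by key and group the values that share a key.
    groups = defaultdict(list)
    for key, value in sorted(intermediate, key=lambda kv: kv[0]):
        groups[key].append(value)

    # Reduce: call reduce_fn once per distinct key with its list of values.
    output = []
    for key, values in groups.items():
        output.extend(reduce_fn(key, values))
    return output

def map_fn(line):
    # Hypothetical record layout: "place,temperature".
    place, temp = line.split(",")
    yield place, float(temp)

def reduce_fn(place, temps):
    # Emit one pair per place: its maximum temperature.
    yield place, max(temps)

records = ["kochi,31.5", "delhi,38.0", "kochi,29.0", "delhi,41.5"]
result = run_mapreduce(records, map_fn, reduce_fn)
print(result)
```

In a real cluster the map calls and the reduce calls run on different nodes; only the contract between the three phases is the same as in this sketch.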
There are three important scheduling issues in MapReduce: locality, synchronization and
fairness. Locality is defined as the distance between the input data node and the task-assigned node.
Synchronization is the process of transferring the intermediate output of the map processes to the reduce
processes as input, and it is also considered a factor that affects performance. Fairness involves trade-offs
between locality and the dependency between the map and reduce phases. Because of these issues and many
other problems in MapReduce scheduling, scheduling is one of the most critical aspects of MapReduce.
There are many algorithms that address this issue with different techniques and approaches. Some of them focus
on improving data locality, some are implemented to provide synchronized processing, and many
have been designed to minimize the total completion time.
Fig.2: MapReduce Framework
E. Dede et al. [8] proposed MARISSA, which allows for iterative application support, i.e. the ability for an
application to assess its output and schedule further executions; the ability to run a different executable on each
node of the cluster; the ability to run different input datasets on different nodes; and the ability for all or a subset of
the nodes to run duplicates of the same task, allowing the reduce step to decide which result to select.
Thilina Gunarathne et al. [9] introduce AzureMapReduce, a novel MapReduce runtime built using the
Microsoft Azure cloud infrastructure services. The AzureMapReduce architecture successfully leverages the high
latency, eventually consistent, yet highly scalable Azure infrastructure services to provide an efficient, on
demand alternative to traditional MapReduce clusters. They further evaluate the use and performance of
MapReduce frameworks, including AzureMapReduce, in cloud environments for scientific applications, using
sequence assembly and sequence alignment as use cases.
Vinay Sudhakaran et al. [10] note that analyses of surface air temperature and ocean surface temperature
changes are carried out by several groups, including the Goddard Institute for Space Studies (GISS) [11] and the
National Climatic Data Center, based on the data available from a large number of land based weather stations
and ship data, which form an instrumental source of measurement of global climate change.
Uncertainties in the collected data from both land and ocean, with respect to their quality and
uniformity, force analysis of both the land based station data and the combined data to estimate the global
temperature change. Estimating long term global temperature change has significant advantages over restricting
the temperature analysis to regions with dense station coverage, providing a much better ability to identify
phenomena that influence global climate change, such as increasing atmospheric CO2 [12]. This has been
the primary goal of the GISS analysis, and it is an area that could potentially be made more efficient through
the use of MapReduce to improve throughput.
Ekanayake et al. [13] evaluated the Hadoop implementation of MapReduce with High Energy Physics
data analysis. The analyses were conducted on a collection of data files produced by high energy physics
experiments, a workload which is both data and compute intensive. As an outcome of this porting, it was
observed that scientific data analysis that has some form of SPMD (Single Program Multiple Data) algorithm is
more likely to benefit from MapReduce when compared to others. However, the use of iterative algorithms
required by many scientific applications was seen as a limitation of the existing MapReduce implementations.
III. Implementation
The input weather dataset contains values such as temperature, time and place. The input weather dataset
file on the left of Figure 3 is split into chunks of data. The size of these splits is controlled by the input splits
(InputSplit) computed by the FileInputFormat class of the MapReduce job. The number of splits is influenced by
the HDFS block size, and one map task is created for each split data chunk. Each split data chunk is sent to a
Mapper process on a Data Node server, which reads it record by record. In the proposed system, the Map process
creates a series of key-value pairs where the key is a field such as the place and the value is the temperature.
These key-value pairs are then shuffled into lists by key. The shuffled lists are input to Reduce tasks, which
reduce the dataset volume by aggregating the values. The Reduce output is then a simple list of averaged
key-value pairs. The MapReduce framework takes care of all other tasks, like scheduling and resources.
Fig.3: Proposed MapReduce Framework
The proposed MapReduce application is shown in Figure 3. The mapper function extracts the
temperature and associates it with the place as the key. All values for a key are then combined, i.e. reduced, to
form the final result. The average temperature, maximum temperature, minimum temperature etc. can be
calculated in the reduce phase.
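One detail matters when averages are pre-aggregated before the reduce phase: an average of per-mapper averages is generally not the overall average, so partial results have to carry a sum and a count. The sketch below illustrates this under stated assumptions; the function names `partial_stats` and `merge_stats` and the sample readings are hypothetical and are not taken from the proposed system.

```python
def partial_stats(temps):
    """Summarize one mapper's readings for a place as (sum, count, min, max)."""
    return (sum(temps), len(temps), min(temps), max(temps))

def merge_stats(partials):
    """Reduce step: merge partial summaries for one place into final statistics."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return {
        "avg": total / count,
        "min": min(p[2] for p in partials),
        "max": max(p[3] for p in partials),
    }

# Two hypothetical mappers saw readings for the same place.
mapper_a = partial_stats([30.0, 32.0, 34.0])  # three readings, mean 32.0
mapper_b = partial_stats([20.0])              # one reading, mean 20.0

final = merge_stats([mapper_a, mapper_b])
print(final)

# Averaging the two per-mapper means would weight the single reading too heavily.
naive_avg = (32.0 + 20.0) / 2
print(naive_avg)
```

Minimum and maximum can be merged directly, whereas the average needs the (sum, count) pair to stay correct.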
3.1 Driver Operation
The driver is what actually sets up the job, submits it, and waits for completion. The driver is controlled
by a configuration file, which makes it easy to specify things such as the input and output directories. It can
also accept Groovy script based mappers and reducers without recompilation.
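The paper does not give the format of the driver's configuration file. Purely as an illustration of a configuration-driven driver, the sketch below parses an INI-style configuration; the section name, the keys `input_dir`, `output_dir` and `reducers`, and the paths are all hypothetical.

```python
import configparser

# Hypothetical configuration text; a real driver would read this from a file.
CONFIG_TEXT = """
[job]
input_dir = /data/ncdc/2014
output_dir = /results/temperature
reducers = 4
"""

def load_job_config(text):
    """Parse the job settings a driver would need before submitting the job."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    job = parser["job"]
    return {
        "input_dir": job["input_dir"],
        "output_dir": job["output_dir"],
        # Fall back to a single reducer when the key is absent.
        "reducers": job.getint("reducers", fallback=1),
    }

config = load_job_config(CONFIG_TEXT)
print(config)
```

Keeping these values outside the compiled job means the same driver can be pointed at a different input or output directory without rebuilding it.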
3.2 Mapper Operation
The actual mapping routine was a simple filter so that only variables that matched certain criteria
would be sent to the reducer. It was initially assumed that the mapper would act like a distributed search
capability and only pull the <key, value> pairs out of the file that matched the criteria. This turned out not to be
the case, and the mapping step took a surprisingly long amount of time.
The default Hadoop input file format reader opens the files and starts reading through the files for the
<key, value> pairs. Once it finds a <key, value> pair, it reads both the key and all the values to pass that along
to the mapper. In the case of the NCDC data, the values can be quite large and consume a large amount of
resources to read and go through all <key, value> pairs within a single file.
Currently, there is no concept within Hadoop to allow for a “lazy” loading of the values. As they are
accessed, the entire <key, value> must be read. Initially, the mapping operator was used to try to filter out the
<key, value> pairs that did not match the criteria. This did not have a significant impact on performance, since
the mapper is not actually the part of Hadoop that is reading the data. The mapper is just accepting data from the
input file format reader.
Subsequently, the default input file format reader that Hadoop uses to read the sequence
files was modified. This routine basically opens a file and performs a simple loop to read every <key, value>
within the file. The routine was modified to include an accept function in order to filter the keys prior to actually
reading in the values. If the filter matched the desired key, then the values were read into memory and passed to
the mapper. If the filter did not match, then the values were skipped. Some values in the temperature data may be
null; these null values were filtered out in the mapper.
Based on the position of the sensor values, the mapper function assigns the values to <key, value>
pairs. The place id can be used as the key to obtain the entire calculation for a place. Alternatively, a combination
of place and date can be used as the key. The values are the hourly temperature readings.
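The idea of checking the key before reading the value can be sketched as follows. This is not the modified Hadoop reader itself: `filtered_reader`, `make_loader` and the sample records are hypothetical, and a callable stands in for the cost of reading a value from the file.

```python
def filtered_reader(records, accept):
    """Yield (key, value) only for accepted keys, loading values lazily."""
    for key, load_value in records:
        if not accept(key):
            continue  # key rejected: skip without reading the value
        value = load_value()
        if value is None:
            continue  # drop null temperature readings
        yield key, value

loads = []  # records which values were actually read

def make_loader(value):
    """Wrap a value in a callable that notes when it is read."""
    def load():
        loads.append(value)
        return value
    return load

# Keys combine place and date; values are temperatures.
records = [
    (("kochi", "2014-01-01"), make_loader(30.5)),
    (("delhi", "2014-01-01"), make_loader(18.0)),
    (("kochi", "2014-01-02"), make_loader(None)),
    (("kochi", "2014-01-03"), make_loader(31.0)),
]

accepted = list(filtered_reader(records, accept=lambda key: key[0] == "kochi"))
print(accepted)
print(len(loads))
```

Only the three records whose key passes the filter have their value read; the rejected record is never loaded, which is the saving the accept function is meant to provide.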
3.3 Reduce Operation
With the sequencing and mapping complete, the resulting <key, value> pairs that matched the criteria
to be analyzed were forwarded to the reducer. While the actual averaging operation was straightforward
and relatively simple to set up, the reducer turned out to be more complex than expected. Once a <key, value>
object has been created, a comparator is also needed to order the keys. If the data is to be grouped, a group
comparator is also needed. In addition, a partitioner must be created in order to handle the partitioning of the
data into groups of sorted keys.
With all these components in place, Hadoop takes the <key, value> pairs generated by the mappers and
groups and sorts them as specified as shown in figure 4. By default, Hadoop assumes that all values that share a
key will be sent to the same reducer. Hence, a single operation over a very large data set will only employ one
reducer, i.e., one node. By using partitions, sets of keys to group can be created to pass these grouped keys to
different reducers and parallelize the reduction operation. This may result in multiple output files so that an
additional combination step may be needed to handle the consolidation of all results.
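The role of a partitioner can be illustrated with a hash-and-modulo sketch: every key is assigned to exactly one reducer, so all values for that key meet in the same place while different keys can be reduced in parallel. The names `partition` and `route`, the use of CRC32 as the hash, and the sample pairs are assumptions made for this example, not the partitioner used in the paper.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    """Pick a reducer index for a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def route(pairs, num_reducers):
    """Send each (key, value) pair to the bucket of its reducer."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets

pairs = [
    ("kochi", 30.0),
    ("delhi", 40.0),
    ("kochi", 32.0),
    ("mumbai", 33.0),
    ("delhi", 36.0),
]
buckets = route(pairs, 3)

for index in sorted(buckets):
    print(index, buckets[index])
```

Each bucket would be written by its own reducer as a separate output file, which is why a final consolidation step may be needed.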
Fig 4 : Inside Mapreduce
3.4 Map Reduce Process
MapReduce is a framework for processing highly distributable problems across huge data sets using a
large number of computers (nodes). In a “map” operation the head node takes the input, partitions it into smaller
sub-problems, and distributes them to data nodes. A data node may do this again in turn, leading to a multi-level
tree structure. The data node processes the smaller problem, and passes the answer back to a reducer node to
perform the reduction operation. In a “reduce” step, the reducer node then collects the answers to all the sub-
problems and combines them in some way to form the output, the answer to the problem it was originally trying
to solve. The map and reduce functions of Map-Reduce are both defined with respect to data structured in <key,
value> pairs.
The following describes the overall general MapReduce process that is executed for the averaging
operation, maximum operation and minimum operation on the NCDC data:
The NCDC files were processed into Hadoop sequence files on the HDFS Head Node. The files were
read from the local NCDC directory, sequenced, and written back out to a local disk.
The resulting sequence files were then ingested into the Hadoop file system with the default replica
factor of three and, initially, the default block size of 64 MB.
The job containing the actual MapReduce operation was submitted to the Head Node to be run.
Along with the JobTracker, the Head Node schedules and runs the job on the cluster. Hadoop
distributes all the mappers across all data nodes that contain the data to be analyzed.
On each data node, the input format reader opens up each sequence file for reading and passes all the
<key,value> pairs to the mapping function.
The mapper determines if the key matches the criteria of the given query. If so, the mapper saves the
<key, value> pair for delivery back to the reducer. If not, the <key, value> pair is discarded. All keys and values
within a file are read and analyzed by the mapper.
Once the mapper is done, all the <key, value> pairs that match the query are sent back to the reducer.
The reducer then performs the desired averaging operation on the sorted <key, value> pairs to create a final
<key, value> pair result.
This final result is then stored as a sequence file within the HDFS.
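As a worked example of the ingestion step above, the following sketch estimates how many HDFS blocks and block replicas a single file occupies with the stated block size of 64 MB and replication factor of three. The function name `hdfs_footprint` and the 20 GB input size are illustrative assumptions; a dataset stored as many files is split into blocks file by file, so the real block count would differ.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=64, replication=3):
    """Estimate block and replica counts for one file stored in HDFS."""
    # The last block may be only partly filled, hence the ceiling.
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        # Raw storage is the data size times the replication factor.
        "raw_storage_mb": file_size_mb * replication,
    }

footprint = hdfs_footprint(20 * 1024)  # a hypothetical single 20 GB file
print(footprint)

small = hdfs_footprint(100)  # a 100 MB file needs two blocks
print(small)
```

Since one map task is created per split and the number of splits is influenced by the block size, the block count also gives a rough idea of how many map tasks a job over such a file would start.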
IV. Analysing NCDC Data
The National Climatic Data Center (NCDC) provides weather datasets [14]. The Daily Global Weather
Measurements 1929-2009 (NCDC, GSOD) dataset is one of the biggest datasets available for weather forecasting.
Its total size is around 20 GB. It is available on Amazon Web Services [15]. The United States National Climatic
Data Center (NCDC), previously known as the National Weather Records Center (NWRC), in Asheville, North
Carolina, is the world's largest active archive of weather data. The Center has more than 150 years of data on
hand, with 224 gigabytes of new information added each day. NCDC archives 99 percent of all NOAA data,
including over 320 million paper records, 2.5 million microfiche records, and over 1.2 petabytes of digital data
residing in a mass storage environment. NCDC has satellite weather images dating back to 1960.
The proposed system used the NCDC temperature dataset for 2014. The records are stored in HDFS.
They are split and sent to different mappers. Finally, all results go to the reducer. Due to practical limits, the
analysis was executed in Hadoop standalone mode. The MapReduce framework execution is shown in Figure 5.
The results show that adding more systems to the network will speed up the entire data processing. That
is the major advantage of MapReduce with the Hadoop framework.
Fig 5 : MapReduce Framework Execution
V. Conclusion
In traditional systems, processing millions of records is a time consuming process. In the era of the
Internet of Things, meteorological departments use different sensors to obtain values such as temperature and
humidity. Leveraging MapReduce with Hadoop to analyze the sensor data, i.e. the bigdata, is an effective solution.
MapReduce is a framework for executing highly parallelizable and distributable algorithms across huge
data sets using a large number of commodity computers. Using MapReduce with Hadoop, the temperature can be
analysed effectively. The scalability bottleneck is removed by using Hadoop with MapReduce. Adding more
systems to the distributed network gives faster processing of the data.
With the widespread employment of these technologies throughout the commercial industry and the
interest within the open-source communities, the capabilities of MapReduce and Hadoop will continue to grow
and mature. The use of these types of technologies for large scale data analyses also has the potential to greatly
enhance weather forecasting.
References
[1] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. In Proceedings of the Nineteenth ACM
Symposium on Operating Systems Principles, SOSP ’03, pages 29–43, New York, NY, USA, October 2003. ACM. ISBN 1-58113-
757-5. doi: 10.1145/945445.945450.
[2] Jeffrey Dean and Sanjay Ghemawat. MapReduce: A Flexible Data Processing Tool. Communications of the ACM, 53(1):72–77,
January 2010. ISSN 0001-0782. doi: 10.1145/1629175.1629198.
[3] Wiki (2013). Applications and organizations using hadoop. http://wiki.apache.org/hadoop/PoweredBy
[4] Gartner Research Cycle 2014, http://www.gartner.com
[5] K. Morton, M. Balazinska and D. Grossman, “Paratimer: a progress indicator for MapReduce DAGs”, In Proceedings of the 2010
international conference on Management of data, 2010, pp.507–518.
[6] Lu, Wei, et al. “Efficient processing of k nearest neighbor joins using MapReduce”, Proceedings of the VLDB Endowment, Vol. 5,
NO. 10, 2012, pp. 1016-1027.
[7] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters”, in OSDI 2004: Proceedings of 6th Symposium
on Operating System Design and Implementation.
[8] E. Dede, Z. Fadika, J. Hartog, M. Govindaraju, L. Ramakrishnan, D. Gunter, and R. Canon. Marissa: Mapreduce implementation for
streaming science applications. In E-Science (e-Science), 2012 IEEE 8th International Conference on, pages 1 to 8, Oct 2012.
[9] Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox, MapReduce in the Clouds for Science, 2nd IEEE International
Conference on Cloud Computing Technology and Science, DOI 10.1109/CloudCom.2010.107
[10] Vinay Sudhakaran, Neil Chue Hong, Evaluating the suitability of MapReduce for surface temperature analysis codes, DataCloud-
SC '11 Proceedings of the second international workshop on Data intensive computing in the clouds, Pages 3-12, ACM New York,
NY, USA ©2011
[11] Hansen, J., R. Ruedy, J. Glascoe, and M. Sato. “GISS analysis of surface temperature change.” J. Geophys.Res.,104, 1999: 30,997-
31,022.
[12] Hansen, James, and Sergej Lebedeff. “Global Trends of Measured Surface Air Temperature.” J. Geophys. Res., 92,1987: 13,345-
13,372.
[13] Jaliya Ekanayake and Shrideep Pallickara, MapReduce for Data Intensive Scientific Analyses, eScience, ESCIENCE '08
Proceedings of the 2008 Fourth IEEE International Conference on eScience, Pages 277-284, IEEE Computer Society Washington,
DC, USA ©2008
[14] National Climatic Data Centre , http://www.ncdc.noaa.gov
[15] Daily Global Weather Measurements 1929-2009 (NCDC,GSOD) dataset in Amazon web services ,
http://www.amazon.com/datasets/2759

More Related Content

What's hot

Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
The HDF-EOS Tools and Information Center
 
Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
Dawn Wright
 
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
Amazon Web Services
 
A time energy performance analysis of map reduce on heterogeneous systems wit...
A time energy performance analysis of map reduce on heterogeneous systems wit...A time energy performance analysis of map reduce on heterogeneous systems wit...
A time energy performance analysis of map reduce on heterogeneous systems wit...
newmooxx
 
"Machine Learning and Internet of Things, the future of medical prevention", ...
"Machine Learning and Internet of Things, the future of medical prevention", ..."Machine Learning and Internet of Things, the future of medical prevention", ...
"Machine Learning and Internet of Things, the future of medical prevention", ...
Dataconomy Media
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Globus
 
ESIP 2018 - The Case for Archives of Convenience
ESIP 2018 - The Case for Archives of ConvenienceESIP 2018 - The Case for Archives of Convenience
ESIP 2018 - The Case for Archives of Convenience
Dan Pilone
 
Snow cover assessment tool using Python
Snow cover assessment tool using PythonSnow cover assessment tool using Python
Snow cover assessment tool using Python
Prasun Kumar Gupta
 
NJ Wildlife Habitat Finder
NJ Wildlife Habitat FinderNJ Wildlife Habitat Finder
NJ Wildlife Habitat Finder
Dan Ford
 
Multidimensional Scientific Data in ArcGIS
Multidimensional Scientific Data in ArcGISMultidimensional Scientific Data in ArcGIS
Multidimensional Scientific Data in ArcGIS
The HDF-EOS Tools and Information Center
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
Rabindra Nath Nandi
 
MapMap-Reduce recipes in with c#
MapMap-Reduce recipes in with c#MapMap-Reduce recipes in with c#
MapMap-Reduce recipes in with c#Erik Lebel
 
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and SparkFOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
Rob Emanuele
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech Projects
Jody Garnett
 
K venkata reddy
K venkata reddyK venkata reddy
K venkata reddyClimDev15
 
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Rob Emanuele
 
Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?
Rob Emanuele
 
Matlab, Big Data, and HDF Server
Matlab, Big Data, and HDF ServerMatlab, Big Data, and HDF Server
Matlab, Big Data, and HDF Server
The HDF-EOS Tools and Information Center
 
Team3 presentation
Team3 presentationTeam3 presentation
Team3 presentation
Amanda Gilbert
 
Delegating Data Management to the Cloud: A Case Study in a Telecommunications...
Delegating Data Management to the Cloud: A Case Study in a Telecommunications...Delegating Data Management to the Cloud: A Case Study in a Telecommunications...
Delegating Data Management to the Cloud: A Case Study in a Telecommunications...
Giuseppe Procaccianti
 

What's hot (20)

Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
 
Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
 
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS...
 
A time energy performance analysis of map reduce on heterogeneous systems wit...
A time energy performance analysis of map reduce on heterogeneous systems wit...A time energy performance analysis of map reduce on heterogeneous systems wit...
A time energy performance analysis of map reduce on heterogeneous systems wit...
 
"Machine Learning and Internet of Things, the future of medical prevention", ...
"Machine Learning and Internet of Things, the future of medical prevention", ..."Machine Learning and Internet of Things, the future of medical prevention", ...
"Machine Learning and Internet of Things, the future of medical prevention", ...
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics

IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 17, Issue 3, Ver. II (May-Jun. 2015), PP 06-12
www.iosrjournals.org
DOI: 10.9790/0661-17320612

Riyaz P.A.1, Surekha Mariam Varghese2
1,2 (Dept. of CSE, Mar Athanasius College of Engineering, Kothamangalam, Kerala, India)

Abstract: Collecting, storing and processing huge amounts of climatic data is necessary for accurate weather prediction. Meteorological departments use different types of sensors, such as temperature and humidity sensors, to obtain these values. The number of sensors, together with the volume and velocity of the data from each sensor, makes data processing time consuming and complex. This work leverages MapReduce with Hadoop to process this massive amount of data. Hadoop is an open framework suitable for large-scale data processing. The MapReduce programming model helps to process large data sets in a parallel, distributed manner. This project aims to build a data analytics engine for high-velocity, high-volume temperature data from sensors using MapReduce on Hadoop.

Keywords: Bigdata, MapReduce, Hadoop, Weather, Analytics, Temperature

I. Introduction
Big Data has become one of the buzzwords in IT during the last couple of years. Initially it was shaped by organizations that had to handle fast-growing data such as web data, data resulting from scientific or business simulations, and other data sources. Some of those companies’ business models are fundamentally based on indexing and using this large amount of data. The pressure to handle the growing amount of data on the web, for example, led Google to develop the Google File System [1] and MapReduce [2].

Bigdata and the Internet of Things are related areas. Sensor data is a kind of Bigdata: it has high velocity, and the large number of sensors generates a high volume of data.
The term “Internet of Things” encompasses all kinds of gadgets and devices, many of which are designed to perform a single task well. The main focus of IoT technology, however, is on devices that share information via a connected network. India is an emerging country, and most of its cities have now become smart cities. The different sensors employed in a smart city can be used to measure weather parameters. Weather forecasting departments have begun to collect and analyse massive amounts of data such as temperature. They use different sensor values, such as temperature and humidity, to predict rainfall and other conditions. As the number of sensors increases, the data grows to a high volume and arrives at high velocity. A scalable analytics tool is therefore needed to process this massive amount of data. The traditional approach to processing the data is very slow. Processing the sensor data with MapReduce in the Hadoop framework removes the scalability bottleneck.

Hadoop is an open framework used for Bigdata analytics. Hadoop’s main processing engine is MapReduce, which is currently one of the most popular big-data processing frameworks available. MapReduce is a framework for executing highly parallelizable and distributable algorithms across huge data sets using a large number of commodity computers. Using MapReduce with Hadoop, temperature data can be analysed without scalability issues. The speed of data processing can increase rapidly when the work is spread across a multi-cluster distributed network. In this paper a new Temperature Data Analytical Engine is proposed, which leverages MapReduce with the Hadoop framework to produce results without a scalability bottleneck.

This paper is organized as follows. Section II discusses work related to MapReduce. Section III describes the implementation of the Temperature Data Analytical Engine. Section IV describes the analytics of NCDC data. Finally, Section V gives the conclusion of the work.

II. Related Works
2.1 Hadoop
Hadoop is widely used in big data applications in industry, e.g., spam filtering, network searching, click-stream analysis, and social recommendation. In addition, considerable academic research is now based on Hadoop. Some representative cases are given below. As declared in June 2012, Yahoo runs Hadoop on 42,000 servers at four data centers to support its products and services, e.g., searching and spam filtering. At that time, the biggest Hadoop cluster had 4,000 nodes, but the number of nodes was to be increased to 10,000 with the release of Hadoop 2.0. In the same month, Facebook announced that its Hadoop cluster could process 100 PB of data, which was growing by 0.5 PB per day as of November 2012. Some well-known agencies that use Hadoop to conduct distributed computation are listed in [3].
In addition, many companies provide commercial Hadoop execution and/or support, including Cloudera, IBM, MapR, EMC, and Oracle. According to Gartner Research, Bigdata analytics was a trending topic in 2014 [4]. Hadoop is an open framework mostly used for Bigdata analytics. MapReduce is a programming paradigm associated with Hadoop.

Fig. 1: Hadoop Environment

Some of the characteristics of Hadoop are the following. It is an open-source system developed by Apache in Java. It is designed to handle very large data sets. It is designed to scale to very large clusters. It is designed to run on commodity hardware. It offers resilience via data replication. It offers automatic failover in the event of a crash. It automatically fragments storage over the cluster. It brings processing to the data. It supports very large numbers of files, into the millions. The Hadoop environment is shown in Figure 1.

2.2 Map Reduce
Hadoop uses HDFS to store data and MapReduce to process that data. HDFS is Hadoop’s implementation of a distributed filesystem. It is designed to hold a large amount of data and to provide access to this data to many clients distributed across a network [5]. MapReduce is an excellent model for distributed computing, introduced by Google in 2004. Each MapReduce job is composed of a certain number of map and reduce tasks. The MapReduce model for serving multiple jobs consists of a processor-sharing queue for the map tasks and a multi-server queue for the reduce tasks [6].

To run a MapReduce job, users should furnish a map function, a reduce function, input data, and an output data location, as shown in Figure 2. When executed, Hadoop carries out the following steps. Hadoop breaks the input data into multiple data items at line boundaries and runs the map function once for each data item, giving the item as the input to the function.
When executed, the map function outputs one or more key-value pairs. Hadoop collects all the key-value pairs generated by the map function, sorts them by key, and groups together the values with the same key. For each distinct key, Hadoop runs the reduce function once, passing the key and the list of values for that key as input. The reduce function may output one or more key-value pairs, and Hadoop writes them to a file as the final result. Hadoop allows the user to configure the job, submit it, control its execution, and query its state. Every job consists of independent tasks, and every task needs a system slot to run [6]. In Hadoop, all scheduling and allocation decisions are made at the task and node-slot level for both the map and reduce phases [7].

Fig. 2: MapReduce Framework

There are three important scheduling issues in MapReduce: locality, synchronization and fairness. Locality is defined as the distance between the input data node and the task-assigned node. Synchronization, the process of transferring the intermediate output of the map processes to the reduce processes as input, is also considered a factor that affects performance. Fairness involves trade-offs between locality and the dependency between the map and reduce phases. Because of these issues and many other problems in the scheduling of MapReduce, scheduling is one of its most critical aspects. Many algorithms address it with different techniques and approaches. Some focus on improving data locality, some are implemented to provide synchronization processing, and many have been designed to minimize the total completion time.

E. Dede et al. [8] proposed MARISSA, which allows for the following. Iterative application support: the ability for an application to assess its output and schedule further executions. The ability to run a different executable on each node of the cluster. The ability to run different input datasets on different nodes. The ability for all or a subset of the nodes to run duplicates of the same task, allowing the reduce step to decide which result to select.

Thilina Gunarathne et al. [9] introduce AzureMapReduce, a novel MapReduce runtime built using the Microsoft Azure cloud infrastructure services. The AzureMapReduce architecture successfully leverages the high-latency, eventually consistent, yet highly scalable Azure infrastructure services to provide an efficient, on-demand alternative to traditional MapReduce clusters. They further evaluate the use and performance of MapReduce frameworks, including AzureMapReduce, in cloud environments for scientific applications, using sequence assembly and sequence alignment as use cases.
Vinay Sudhakaran et al. [10] note that analyses of surface air temperature and ocean surface temperature changes are carried out by several groups, including the Goddard Institute for Space Studies (GISS) [11] and the National Climatic Data Center, based on the data available from a large number of land-based weather stations and ships, which form an instrumental source of measurement of global climate change. Uncertainties in the data collected from both land and ocean, with respect to quality and uniformity, force analysis of both the land-based station data and the combined data to estimate the global temperature change. Estimating long-term global temperature change has significant advantages over restricting the temperature analysis to regions with dense station coverage, providing a much better ability to identify phenomena that influence global climate change, such as increasing atmospheric CO2 [12]. This has been the primary goal of the GISS analysis, and it is an area that could be made more efficient by using MapReduce to improve throughput.

Ekanayake et al. [13] evaluated the Hadoop implementation of MapReduce with High Energy Physics data analysis. The analyses were conducted on a collection of data files produced by high energy physics experiments, a workload that is both data and compute intensive. As an outcome of this porting, it was observed that scientific data analysis that has some form of SPMD (Single-Program Multiple-Data) algorithm is more likely to benefit from MapReduce than other kinds. However, the iterative algorithms required by many scientific applications were seen as a limitation of the existing MapReduce implementations.

III. Implementation
The input weather dataset contains values such as temperature, time and place. The input weather dataset file on the left of Figure 3 is split into chunks of data.
The size of these splits is determined by the FileInputFormat class of the MapReduce job, which produces the InputSplit objects. The number of splits is influenced by the HDFS block size, and one mapper task is created for each split data chunk. Each split data chunk, that is, “a record,” is sent to a Mapper process on a Data Node server. In the proposed system, the Map process creates a series of key-value pairs where the key is, for instance, the place and the value is the temperature. These key-value pairs are then shuffled into lists by key. The shuffled lists are input to Reduce tasks, which reduce the volume of the dataset by combining the values. The Reduce output is then a simple list of averaged key-value pairs. The MapReduce framework takes care of all other tasks, such as scheduling and resources.

Fig. 3: Proposed MapReduce Framework

The proposed MapReduce application is shown in Figure 3. The mapper function extracts the temperature and associates it with the place as the key. All the values are then combined, i.e. reduced, to form the final result. The average temperature, maximum temperature, minimum temperature and so on can be calculated in the reduce phase.

3.1 Driver Operation
The driver is what actually sets up the job, submits it, and waits for completion. The driver is driven from a configuration file, which makes it easy to specify things like the input and output directories. It can also accept Groovy script based mappers and reducers without recompilation.

3.2 Mapper Operation
The actual mapping routine was a simple filter, so that only variables that matched certain criteria would be sent to the reducer. It was initially assumed that the mapper would act like a distributed search capability and pull only the <key, value> pairs that matched the criteria out of the file. This turned out not to be the case, and the mapping step took a surprisingly long time. The default Hadoop input file format reader opens the files and starts reading through them for <key, value> pairs. Once it finds a <key, value> pair, it reads both the key and all the values and passes them along to the mapper. In the case of the NCDC data, the values can be quite large, so reading and going through all the <key, value> pairs within a single file consumes a large amount of resources. Currently, there is no concept within Hadoop that allows for “lazy” loading of the values. As they are accessed, the entire <key, value> pair must be read.
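As a rough illustration of the map, shuffle and reduce roles described at the start of Section III, the following plain-Python sketch computes the average, maximum and minimum temperature per place. It does not use Hadoop. The record layout, the sample readings and the function names (map_reading, shuffle, reduce_temperatures, run_job) are assumptions made for this example and are not part of the proposed system.

```python
from collections import defaultdict

# Hypothetical records: (place id, hourly temperature in degrees Celsius).
# None marks a missing sensor reading.
READINGS = [
    ("station_a", 21.0),
    ("station_b", 30.5),
    ("station_a", 23.0),
    ("station_b", None),
    ("station_a", 25.0),
    ("station_b", 29.5),
]


def map_reading(record):
    """Emit a (place, temperature) pair, skipping missing readings."""
    place, temperature = record
    if temperature is not None:
        yield place, temperature


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_temperatures(place, temperatures):
    """Collapse all readings for one place into summary statistics."""
    return place, {
        "avg": sum(temperatures) / len(temperatures),
        "max": max(temperatures),
        "min": min(temperatures),
    }


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_reading(record))
    grouped = shuffle(pairs)
    # Keys are visited in sorted order, mirroring the sort before reduce.
    return dict(reduce_temperatures(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    for place, stats in run_job(READINGS).items():
        print(place, stats)
```

Using the place alone as the key gives one summary per place; a (place, date) tuple could be used as the key instead to obtain one summary per place per day.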
Initially, the mapping operator was used to try to filter out the <key, value> pairs that did not match the criteria. This did not have a significant impact on performance, since the mapper is not actually the part of Hadoop that reads the data. The mapper just accepts data from the input file format reader. Subsequently, the default input file format reader that Hadoop uses to read the sequence files was modified. This routine basically opens a file and performs a simple loop to read every <key, value> pair within the file. It was modified to include an accept function that filters the keys before the values are actually read in. If the filter matched the desired key, the values were read into memory and passed to the mapper. If the filter did not match, the values were skipped.

Some values in the temperature data may be null. These null values were filtered out in the mapper. Based on the position of the sensor values, the mapper function assigns the values to the <key, value> pairs. The place id can be used as the key to get the entire calculation for a place. Alternatively, a combination of place and date can be used as the key. The values are the hourly temperatures.

3.3 Reduce Operation
With the sequencing and mapping complete, the resulting <key, value> pairs that matched the criteria to be analyzed were forwarded to the reducer. While the actual averaging operation was straightforward and relatively simple to set up, the reducer turned out to be more complex than expected. Once a <key, value> object has been created, a comparator is also needed to order the keys. If the data is to be grouped, a group comparator is also needed. In addition, a partitioner must be created to handle the partitioning of the data into groups of sorted keys. With all these components in place, Hadoop takes the <key, value> pairs generated by the mappers and groups and sorts them as specified, as shown in Figure 4. By default, Hadoop assumes that all values that share a key will be sent to the same reducer. Hence, a single operation over a very large data set will employ only one reducer, i.e., one node. By using partitions, sets of keys can be grouped and passed to different reducers, which parallelizes the reduction operation. This may result in multiple output files, so an additional combination step may be needed to consolidate all the results.

Fig. 4: Inside MapReduce

3.4 Map Reduce Process
MapReduce is a framework for processing highly distributable problems across huge data sets using a large number of computers (nodes). In a “map” operation, the head node takes the input, partitions it into smaller sub-problems, and distributes them to data nodes. A data node may do this again in turn, leading to a multi-level tree structure. The data node processes the smaller problem and passes the answer back to a reducer node to perform the reduction operation. In a “reduce” step, the reducer node collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the problem it was originally trying to solve. The map and reduce functions of MapReduce are both defined with respect to data structured in <key, value> pairs.
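The effect of partitioning described in Section 3.3 can be sketched in plain Python. The sketch routes each key to one of several reducers with a hash, lets every reducer sort and average its own keys, and then consolidates the per-reducer outputs. The place names, the use of zlib.crc32 as a stable hash, and all function names are assumptions of this example; a real Hadoop job would supply a Partitioner class instead.

```python
import zlib
from collections import defaultdict

# Hypothetical mapper output: (place, temperature) pairs.
MAPPED = [
    ("delhi", 31.0),
    ("kochi", 28.0),
    ("delhi", 33.0),
    ("mumbai", 30.0),
    ("kochi", 26.0),
]


def partition(key, num_reducers):
    """Pick a reducer index for a key.

    zlib.crc32 is only a stand-in for a stable hash in this sketch.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_to_reducers(pairs, num_reducers):
    """Route every (key, value) pair to exactly one reducer's input."""
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        reducer_inputs[partition(key, num_reducers)][key].append(value)
    return reducer_inputs


def reduce_average(reducer_input):
    """Average each key's values, visiting the keys in sorted order."""
    return [
        (key, sum(values) / len(values))
        for key, values in sorted(reducer_input.items())
    ]


def consolidate(part_outputs):
    """Merge the per-reducer outputs into one sorted result list."""
    return sorted(pair for part in part_outputs for pair in part)


if __name__ == "__main__":
    parts = [reduce_average(r) for r in shuffle_to_reducers(MAPPED, 3)]
    for index, part in enumerate(parts):
        print(f"reducer {index}: {part}")
    print("consolidated:", consolidate(parts))
```

Because the reducer index depends only on the key, all values for a place reach the same reducer, so the consolidated result is the same as with a single reducer.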
The following describes the general MapReduce process that is executed for the averaging, maximum and minimum operations on the NCDC data. The NCDC files were processed into Hadoop sequence files on the HDFS Head Node. The files were read from the local NCDC directory, sequenced, and written back out to a local disk. The resulting sequence files were then ingested into the Hadoop file system with the default replication factor of three and, initially, the default block size of 64 MB. The job containing the actual MapReduce operation was submitted to the Head Node to be run. Along with the JobTracker, the Head Node schedules and runs the job on the cluster. Hadoop distributes the mappers across all data nodes that contain the data to be analyzed. On each data node, the input format reader opens each sequence file for reading and passes all the <key, value> pairs to the mapping function.
The mapper determines whether the key matches the criteria of the given query. If so, the mapper saves the <key, value> pair for delivery to the reducer. If not, the <key, value> pair is discarded. All keys and values within a file are read and analyzed by the mapper. Once the mapper is done, all the <key, value> pairs that match the query are sent to the reducer. The reducer then performs the desired averaging operation on the sorted <key, value> pairs to create a final <key, value> pair result. This final result is then stored as a sequence file within the HDFS.

IV. Analysing NCDC Data
The National Climatic Data Center (NCDC) provides weather datasets [14]. The Daily Global Weather Measurements 1929-2009 (NCDC, GSOD) dataset is one of the biggest datasets available for weather forecasting. Its total size is around 20 GB. It is available on Amazon Web Services [15]. The United States National Climatic Data Center (NCDC), previously known as the National Weather Records Center (NWRC), in Asheville, North Carolina, is the world's largest active archive of weather data. The Center has more than 150 years of data on hand, with 224 gigabytes of new information added each day. NCDC archives 99 percent of all NOAA data, including over 320 million paper records, 2.5 million microfiche records, and over 1.2 petabytes of digital data residing in a mass storage environment. NCDC has satellite weather images dating back to 1960.

The proposed system used the NCDC temperature dataset from 2014. The records are stored in HDFS. They are split and sent to different mappers. Finally, all results go to the reducer. Due to practical limits, the analysis was executed in Hadoop standalone mode. The execution of the MapReduce framework is shown in Figure 5. The results show that adding more systems to the network will speed up the entire data processing. That is the major advantage of MapReduce with the Hadoop framework.

Fig. 5: MapReduce Framework Execution
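The flow described above, in which the records are split, handled by different mappers and finally combined by a reducer, might be sketched as follows. Threads stand in for mapper processes purely for illustration, and the mappers return per-place (sum, count) partials so that the reducer can form exact averages. Both choices, along with the sample records and function names, are assumptions of this sketch and not the authors' implementation, and the sketch makes no claim about speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical records: (place id, temperature).
RECORDS = [
    ("s1", 10.0),
    ("s2", 20.0),
    ("s1", 14.0),
    ("s2", 22.0),
    ("s1", 12.0),
]


def split(records, chunk_size):
    """Cut the record list into consecutive chunks of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Return per-place (sum, count) partials for one split."""
    partial = {}
    for place, temperature in chunk:
        total, count = partial.get(place, (0.0, 0))
        partial[place] = (total + temperature, count + 1)
    return partial


def reduce_partials(partials):
    """Merge the partials from all mappers and compute the averages."""
    totals = {}
    for partial in partials:
        for place, (total, count) in partial.items():
            running_total, running_count = totals.get(place, (0.0, 0))
            totals[place] = (running_total + total, running_count + count)
    return {place: total / count for place, (total, count) in totals.items()}


def average_by_place(records, chunk_size=2, workers=2):
    """Map each split in a worker thread, then reduce in the caller."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, split(records, chunk_size)))
    return reduce_partials(partials)


if __name__ == "__main__":
    print(average_by_place(RECORDS))
```

Carrying (sum, count) rather than a per-split average keeps the final average correct even when the splits contain different numbers of readings for a place.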
V. Conclusion
In traditional systems, processing millions of records is a time-consuming process. In the era of the Internet of Things, meteorological departments use different sensors to obtain values such as temperature and humidity. Leveraging MapReduce with Hadoop to analyze this sensor data, i.e. Bigdata, is an effective solution. MapReduce is a framework for executing highly parallelizable and distributable algorithms across huge data sets using a large number of commodity computers. Using MapReduce with Hadoop, temperature data can be analysed effectively. The scalability bottleneck is removed by using Hadoop with MapReduce. Adding more systems to the distributed network gives faster processing of the data. With the widespread employment of these technologies throughout commercial industry and the interest within the open-source communities, the capabilities of MapReduce and Hadoop will continue to grow and mature. The use of these types of technologies for large-scale data analyses has the potential to greatly enhance weather forecasting too.

References
[1] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, SOSP ’03, pages 29–43, New York, NY, USA, October 2003. ACM. ISBN 1-58113-757-5. doi: 10.1145/945445.945450.
[2] Jeffrey Dean and Sanjay Ghemawat. MapReduce: A Flexible Data Processing Tool. Communications of the ACM, 53(1):72–77, January 2010. ISSN 0001-0782. doi: 10.1145/1629175.1629198.
[3] Wiki (2013). Applications and organizations using Hadoop. http://wiki.apache.org/hadoop/PoweredBy
[4] Gartner Research Cycle 2014, http://www.gartner.com
[5] K. Morton, M. Balazinska and D. Grossman, “Paratimer: a progress indicator for MapReduce DAGs”, In Proceedings of the 2010 international conference on Management of data, 2010, pp. 507–518.
[6] Lu, Wei, et al. “Efficient processing of k nearest neighbor joins using MapReduce”, Proceedings of the VLDB Endowment, Vol. 5, No. 10, 2012, pp. 1016-1027.
[7] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters”, in OSDI 2004: Proceedings of 6th Symposium on Operating System Design and Implementation.
[8] E. Dede, Z. Fadika, J. Hartog, M. Govindaraju, L. Ramakrishnan, D. Gunter, and R. Canon. Marissa: MapReduce implementation for streaming science applications. In E-Science (e-Science), 2012 IEEE 8th International Conference on, pages 1 to 8, Oct 2012.
[9] Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox, MapReduce in the Clouds for Science, 2nd IEEE International Conference on Cloud Computing Technology and Science, DOI 10.1109/CloudCom.2010.107
[10] Vinay Sudhakaran, Neil Chue Hong, Evaluating the suitability of MapReduce for surface temperature analysis codes, DataCloud-SC '11 Proceedings of the second international workshop on Data intensive computing in the clouds, Pages 3-12, ACM New York, NY, USA ©2011
[11] Hansen, J., R. Ruedy, J. Glascoe, and M. Sato. “GISS analysis of surface temperature change.” J. Geophys. Res., 104, 1999: 30,997-31,022.
[12] Hansen, James, and Sergej Lebedeff. “Global Trends of Measured Surface Air Temperature.” J. Geophys. Res., 92, 1987: 13,345-13,372.
[13] Jaliya Ekanayake and Shrideep Pallickara, MapReduce for Data Intensive Scientific Analyses, eScience, ESCIENCE '08 Proceedings of the 2008 Fourth IEEE International Conference on eScience, Pages 277-284, IEEE Computer Society Washington, DC, USA ©2008
[14] National Climatic Data Centre, http://www.ncdc.noaa.gov
[15] Daily Global Weather Measurements 1929-2009 (NCDC, GSOD) dataset in Amazon Web Services, http://www.amazon.com/datasets/2759