Processing of Time Series Data with Hadoop
Miroslav Mihaylov
The Data and the Problem
In search of a general approach to granularity in big time series data
High-density, large-volume time series data.
approximately 20 records/s for a single source
individual data set can exceed 100 million records
Identify various features at different timescales.
harmonic oscillations and slope deviations over spans of a few seconds
characterize trends at larger time scales (hourly and daily)
Methods and Tools
Use a sliding window and perform a number of calculations for each window position
Mean, Variance and Slope
Fast Fourier Transform
These need to be run for different window sizes.
Computations use Java class libraries from Apache Commons Math.
FastFourierTransformer
DescriptiveStatistics
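A minimal sketch of the per-window statistics, assuming one window's timestamps and values are available as plain Java arrays. The poster computes these with Apache Commons Math; the hand-written formulas below are a stand-in so that the example has no dependencies. The class name WindowStats is hypothetical, and treating "slope" as the least-squares slope of value against time is an assumption about the poster's definition.

```java
/** Per-window statistics in plain Java (hypothetical helper, not taken from the poster). */
final class WindowStats {
    /** Arithmetic mean of the values in one window. */
    static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x;
        return sum / v.length;
    }

    /** Sample variance (divides by n - 1); needs at least two values. */
    static double variance(double[] v) {
        double m = mean(v);
        double ss = 0.0;
        for (double x : v) ss += (x - m) * (x - m);
        return ss / (v.length - 1);
    }

    /** Least-squares slope of value against time (assumed definition of "slope"). */
    static double slope(double[] t, double[] v) {
        double tm = mean(t);
        double vm = mean(v);
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < t.length; i++) {
            num += (t[i] - tm) * (v[i] - vm);
            den += (t[i] - tm) * (t[i] - tm);
        }
        return num / den;
    }
}
```

Running the same three functions over windows of different lengths gives the per-granularity values the poster describes.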
FFT algorithm prototyping
Locate harmonic oscillation patterns from the evolution of the Fourier peaks
Left: 60-second sample from the data.
Arrows show individual window ranges.
Below: Fourier transforms for windows 1-5.
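The peak search for a single window could be prototyped roughly as follows. This sketch uses a naive O(n^2) discrete Fourier transform rather than an FFT, so that it stays dependency-free; it is not the poster's implementation. The class and method names are hypothetical, and uniformly spaced samples are assumed.

```java
/** Naive DFT peak finder (hypothetical helper; an FFT would replace the inner loops). */
final class SpectrumPeak {
    /** Magnitudes |X_k| / n for k = 0 .. n/2 of a real-valued window, mean removed. */
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int j = 0; j < n; j++) {
                double angle = 2.0 * Math.PI * k * j / n;
                re += (x[j] - mean) * Math.cos(angle);
                im -= (x[j] - mean) * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im) / n;
        }
        return mag;
    }

    /** Index of the largest non-DC bin; needs at least two bins. */
    static int peakBin(double[] mag) {
        int best = 1;
        for (int k = 2; k < mag.length; k++) {
            if (mag[k] > mag[best]) best = k;
        }
        return best;
    }
}
```

Bin k corresponds to a frequency of k * sampleRate / n. At roughly 20 records/s, a 64-sample window would place bin 4 near 1.25 Hz. Following the peak bin and its magnitude from one window to the next gives the "peak evolution" used to spot harmonic oscillations.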
Hadoop
First MapReduce Job
Run a set of filters for each sliding window
A different set of filters for each window size
A separate job for each granularity level
Input data for an individual window:
id time value
. . . . . . . . . . . .
4219122 1370293608.89 1.2844170
4219123 1370293608.94 1.2854440
4219124 1370293609.05 1.2884030
4219125 1370293609.11 1.2837774
4219126 1370293609.17 1.2844120
4219127 1370293609.23 1.2854267
. . . . . . . . . . . .
=>
Output for the window. Key: middle point of the window
key mean slope FFTpeak
1370293609.05 1.2857 2.36e-2 3.8e-4
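The keying step can be illustrated without a Hadoop cluster. The sketch below is a plain-Java stand-in for the map, shuffle and reduce phases, not Hadoop API code. For simplicity it assumes fixed-width, non-overlapping windows, whereas overlapping sliding windows would emit each record under several keys. It computes only the mean. The names WindowJob, windowKey and run are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java stand-in for the first job's map and reduce phases (no Hadoop dependency). */
final class WindowJob {
    /** Map phase: key a record by the midpoint of the fixed-width window containing it. */
    static double windowKey(double time, double widthSeconds) {
        double start = Math.floor(time / widthSeconds) * widthSeconds;
        return start + widthSeconds / 2.0;
    }

    /** Shuffle and reduce phases: group values by key, then emit one mean per window. */
    static Map<Double, Double> run(double[] times, double[] values, double widthSeconds) {
        Map<Double, List<Double>> grouped = new TreeMap<>();
        for (int i = 0; i < times.length; i++) {
            grouped.computeIfAbsent(windowKey(times[i], widthSeconds), k -> new ArrayList<>())
                   .add(values[i]);
        }
        Map<Double, Double> out = new TreeMap<>();
        for (Map.Entry<Double, List<Double>> e : grouped.entrySet()) {
            double sum = 0.0;
            for (double v : e.getValue()) sum += v;
            out.put(e.getKey(), sum / e.getValue().size());
        }
        return out;
    }
}
```

Because the key depends only on the record's own timestamp and the window width, each mapper can assign it independently, and one run per width corresponds to one job per granularity level.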
For smaller window sizes, a second MapReduce job is needed to identify attributes such as
harmonic oscillations: analysis of FFT peak values
discontinuities and abrupt changes: variance and slope analysis
There is no second MapReduce for the largest granularity level
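The poster does not describe the second job's logic, so the following is only one possible reading of it, applied to the first job's per-window output. Windows are flagged as abrupt when their slope or variance exceeds a threshold, and a harmonic oscillation is reported when the same FFT peak bin stays strong over several consecutive windows. All names and thresholds are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical second-pass checks over the per-window output of the first job. */
final class SecondPass {
    /** Indices of windows whose |slope| or variance exceeds the given limits. */
    static List<Integer> flagAbrupt(double[] slopes, double[] variances,
                                    double slopeLimit, double varianceLimit) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < slopes.length; i++) {
            if (Math.abs(slopes[i]) > slopeLimit || variances[i] > varianceLimit) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    /** True if one peak bin repeats, with magnitude >= minMag, in minRun consecutive windows. */
    static boolean hasSustainedPeak(int[] peakBins, double[] peakMags,
                                    double minMag, int minRun) {
        int run = 0;
        for (int i = 0; i < peakBins.length; i++) {
            boolean strong = peakMags[i] >= minMag;
            boolean sameAsPrev = run > 0 && peakBins[i] == peakBins[i - 1];
            if (!strong) run = 0;
            else if (sameAsPrev) run++;
            else run = 1;
            if (run >= minRun) return true;
        }
        return false;
    }
}
```

Suitable thresholds would have to be tuned per data source and per window size.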
Current Status and Future Endeavors
So far, the first general MapReduce task has been implemented. Further development steps are:
Implement and use the second MapReduce job
Consider implementing additional signal processing filters
Interface with a database (input/output currently uses ASCII files)
Further Challenges
Apply a supervised machine learning algorithm for specific feature identification
All of the analysis is for a single data source. Processing two or more correlated data sets would
be a substantially different task
Real-time visualization of the data
About
This work is from a class project for “IDS 594 Big Data Analytics”
Prof. Kunpeng Zhang
mmihay2@uic.edu
