This document provides an overview of time series data mining. It begins with an introduction to time series data and examples of time series similarity search tasks. It then discusses major time series mining tasks like indexing, clustering, classification, prediction and anomaly detection. Distance measures for time series similarity search are explained, including Dynamic Time Warping which allows for nonlinear time alignments. Dimensionality reduction techniques like Fourier analysis and discretization using Symbolic Aggregate Approximation are also summarized. The document is presented as an introduction to key concepts and techniques in time series data mining.
Time series data mining techniques
1. IT'S ABOUT TIME !!
Presented By-
P.SHANMUKHA SREENIVAS
M.MGT 1
2. AN OVERVIEW ON TIME SERIES DATA MINING
OUTLINE
1. Introduction
2. Similarity Search in Time Series Data
3. Feature-based Dimensionality Reduction
4. Discretization
5. Other Time Series Data Mining Tasks
6. Conclusions
3. Introduction
A time series is a collection of observations made sequentially in time.
Examples: financial time series, scientific time series.
[Chart: CNX IT returns, plotted from the raw values 6145.45, 6128.75, 6142.7, 6201.2, 6151.9, 6050.95, 5917.75, 5855.95, 5984, 5993.9, 5934.8, 5920.05, 5950, 5950.7, 5963.8, 6141.15, …, 6471.4, 6511.7, 6563.25, 6558.45, 6492.7, 6546.75]
4. TIME SERIES SIMILARITY SEARCH
Some examples:
- Identifying companies with similar patterns of growth.
- Determining products with similar selling patterns.
- Discovering stocks with similar movement in stock prices.
- Finding out whether a musical score is similar to one of a set of copyrighted scores.
5. Major Time Series Data Mining Tasks
• Indexing
• Clustering
• Classification
• Prediction
• Anomaly Detection
Indexing and clustering make explicit use of a distance measure
The others make implicit use of a distance measure
6. TIME SERIES SIMILARITY SEARCH
DISTANCE MEASURES
Euclidean distance
Dynamic Time Warping
Other distance measures
o Threshold query based similarity search (TQuEST)
o Minkowski Distance
7. Euclidean Distance Metric
Given two time series
Q = q1 … qn
and
C = c1 … cn
their Euclidean distance is defined as:
D(Q, C) = √( Σ_{i=1}^{n} (q_i − c_i)² )
[Figure: Q and C plotted together, with D(Q,C) indicated between them]
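For concreteness, the distance defined above can be computed in a few lines of Python (a minimal sketch; the function name is ours, not from the presentation):

```python
import math

def euclidean_distance(q, c):
    """Euclidean distance between two equal-length time series Q and C."""
    if len(q) != len(c):
        raise ValueError("series must have equal length")
    # sqrt of the sum of squared point-to-point differences
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))

print(euclidean_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 2.0
```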
8. What’s wrong with Euclidean Distance?
Similar sequences may be shifted and have different scales.
Normalize the time series before measuring the distance between them (Goldin and Kanellakis, 1995):
x_i′ = (x_i − μ) / σ
What if a sequence is stretched or compressed along the time axis?
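The normalization formula above (z-normalization) can be sketched in Python; the function name is ours, and pstdev is the population standard deviation σ:

```python
import statistics

def z_normalize(x):
    """Rescale a series to zero mean and unit standard deviation."""
    mu = statistics.fmean(x)       # sample mean μ
    sigma = statistics.pstdev(x)   # population standard deviation σ
    return [(xi - mu) / sigma for xi in x]
```

After this transformation, two series that differ only in offset and scale become directly comparable under the Euclidean distance.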
9. Dynamic Time Warping (Berndt et al.)
Dynamic Time Warping is a technique that finds the optimal alignment between two time series, where one time series may be “warped” non-linearly by stretching or shrinking it along its time axis.
This warping between two time series can be used to determine the similarity between them.
Fixed time axis: sequences are aligned “one to one”.
“Warped” time axis: nonlinear alignments are possible.
10. DYNAMIC TIME WARPING
[BERNDT, CLIFFORD, 1994]
Allows acceleration-deceleration of signals along the time dimension.
Basic idea
X = (x1, x2, …, xN), N ∈ ℕ; Y = (y1, y2, …, yM), M ∈ ℕ
*Data sequences should be sampled at equidistant points in time.
The algorithm starts by building the distance matrix C ∈ ℝ^(N×M) representing all pairwise distances between X and Y:
c(i, j) = ||x_i − y_j||, i ∈ [1 : N], j ∈ [1 : M]
This distance matrix is also called the local cost matrix.
Once the local cost matrix is built, the algorithm finds the alignment path which runs through the low-cost areas (“valleys”) of the accumulated cost matrix.
11. HOW IS DTW CALCULATED?
The cumulative distance γ(i, j) is built up recursively from the local distance d:
γ(i, j) = d(q_i, c_j) + min{ γ(i−1, j−1), γ(i−1, j), γ(i, j−1) }
[Figure: the warping path w through the cost matrix of the series Q and C]
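The recurrence above translates directly into a dynamic-programming table. A minimal Python sketch (names are ours; |q_i − c_j| is used as the local distance d):

```python
import math

def dtw_distance(q, c):
    """DTW distance via the recurrence
    gamma(i,j) = d(q_i, c_j) + min(gamma(i-1,j-1), gamma(i-1,j), gamma(i,j-1))."""
    n, m = len(q), len(c)
    # gamma[i][j] holds the cumulative cost; row 0 and column 0 are
    # infinite sentinels so the boundary condition gamma(1,1) = d(q_1, c_1) holds.
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(q[i - 1] - c[j - 1])  # local cost d(q_i, c_j)
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return gamma[n][m]
```

Note how stretching is handled: [1, 2, 3] and [1, 2, 2, 3] have DTW distance 0, because the repeated 2 is aligned “many to one”.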
12. CONSTRAINTS
Shanmukha Sreenivas P, DoMS
Boundary condition
The starting and ending points of the warping path must be the first and the last points of the aligned sequences, i.e. w1 = (1, 1) and wK = (N, M).
Monotonicity condition
n1 ≤ n2 ≤ … ≤ nK and m1 ≤ m2 ≤ … ≤ mK.
This condition preserves the time-ordering of points.
Step size condition
This criterion keeps the warping path from making long jumps (shifts in time) while aligning sequences,
i.e. we’ll be looking at only these values: w(i−1, j−1), w(i−1, j), w(i, j−1).
13. CONSTRAINT VISUALIZATION
a) Admissible path satisfying constraints
b) Violation of boundary condition
c) Violation of monotonicity
d) Violation of step size
14. STEP SIZE CONDITION
A global constraint constrains the indices of the warping path wk = (i, j)k such that
j − r ≤ i ≤ j + r
where r is a term defining the allowed range of warping for a given point in a sequence.
[Figures: the Sakoe-Chiba band and the Itakura parallelogram, two common choices of global constraint]
18. FORMULATION
Let D(i, j) refer to the dynamic time warping distance between the subsequences
x1, x2, …, xi and y1, y2, …, yj. Then
D(i, j) = | x_i − y_j | + min{ D(i−1, j), D(i−1, j−1), D(i, j−1) }
19. SOLUTION BY DYNAMIC PROGRAMMING
Basic implementation: O(n²), where n is the length of the sequences, since the problem must be solved for each (i, j) pair.
If a warping window is specified, then O(nw): only solve for the (i, j) pairs where | i − j | <= w.
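The O(nw) variant can be sketched by filling only the cells inside a Sakoe-Chiba band of half-width w (an illustrative Python sketch, not code from the presentation):

```python
import math

def dtw_windowed(x, y, w):
    """DTW restricted to a Sakoe-Chiba band: only cells with |i - j| <= w
    are filled, so the cost drops from O(n^2) to O(n*w)."""
    n, m = len(x), len(y)
    w = max(w, abs(n - m))  # band must be wide enough to reach cell (n, m)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # only visit the columns inside the band around the diagonal
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(x[i - 1] - y[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[n][m]
```

With w = 0 and equal-length inputs, the path is forced onto the diagonal and the result degenerates to a point-to-point (Manhattan) distance.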
20. FEATURE-BASED DIMENSIONALITY REDUCTION
• Time series databases are often extremely large. Searching directly on these data would be very complex and inefficient.
• To overcome this problem, we can use transformation methods to reduce the dimensionality of the time series.
• These transformation methods are called dimensionality reduction techniques.
21. Dimensionality Reduction: An Example of a Technique
The graphic shows a time series C with 128 points (n = 128). The raw data used to produce the graphic is also reproduced as a column of numbers (just the first 30 or so points are shown): 0.4995, 0.5264, 0.5523, 0.5761, 0.5973, 0.6153, 0.6301, 0.6420, 0.6515, 0.6596, 0.6672, 0.6751, 0.6843, 0.6954, 0.7086, 0.7240, 0.7412, 0.7595, 0.7780, 0.7956, 0.8115, 0.8247, 0.8345, 0.8407, 0.8431, 0.8423, 0.8387, …
27. DISCRETIZATION
• Discretization of a time series is transforming it into a symbolic string.
• The main benefit of this discretization is that there is an enormous wealth of existing algorithms and data structures that allow the efficient manipulation of symbolic representations.
• Lin, Keogh et al. (2003) proposed a method called Symbolic Aggregate Approximation (SAX), which allows the discretization of original time series into symbolic strings.
28. SYMBOLIC AGGREGATE APPROXIMATION (SAX) [LIN ET AL. 2003]
baabccbc
The first symbolic representation of time series that allows discretization of time series into symbolic strings.
29. HOW DO WE OBTAIN SAX
First convert the time series to PAA representation, then convert the PAA to symbols: baabccbc
[Figure: the series C and its PAA segments, each mapped to a symbol a, b or c]
30. TWO PARAMETER CHOICES
The word size, in this case 8.
The alphabet size (cardinality), in this case 3.
[Figure: the series C with its 8 PAA segments mapped to the word baabccbc over the alphabet {a, b, c}]
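With the two parameters fixed (word size and alphabet size), SAX can be sketched in a few lines of Python: z-normalize, reduce with PAA, then map each segment mean to a letter via Gaussian breakpoints. The ±0.43 breakpoints below are the standard SAX table values for a 3-letter alphabet, rounded; function names are ours, and the series length is assumed divisible by the word size:

```python
import statistics

# Breakpoints splitting N(0,1) into three equiprobable regions (a, b, c).
BREAKPOINTS_3 = [-0.43, 0.43]

def paa(x, segments):
    """Piecewise Aggregate Approximation: the mean of each of `segments`
    equal-width frames (len(x) assumed divisible by segments)."""
    frame = len(x) // segments
    return [statistics.fmean(x[i * frame:(i + 1) * frame])
            for i in range(segments)]

def sax(x, word_size, breakpoints=BREAKPOINTS_3):
    """Discretize a series into a symbolic word: z-normalize, apply PAA,
    then assign each segment mean the letter of the region it falls in."""
    mu, sigma = statistics.fmean(x), statistics.pstdev(x)
    z = [(v - mu) / sigma for v in x]
    word = ""
    for mean in paa(z, word_size):
        region = sum(mean > b for b in breakpoints)  # 0, 1 or 2
        word += "abc"[region]
    return word

print(sax([0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0], 2))  # ac
```

A flat-then-high series maps to "ac"; a steadily rising series of 8 points with word size 4 maps to "aacc".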
31. Structural representations help in understanding time series through data analysis + visualization.
SAX is claimed to be a landmark representation of time series:
Symbolic, and therefore allows use of discrete data structures and their corresponding algorithms for analysis.
Also helps with visualization.
32. THANK YOU
Datasets and code used in this presentation can be found at:
www.cs.ucr.edu/~eamonn/TSDMA/index.html