Entropy scaling search for massive biological data
Gokulakannan Selvam, Raghunathan Rengaswamy *
Department of Chemical Engineering, Indian Institute of Technology Madras
Abstract
Many large datasets contain substantial redundancy that can be
exploited to design faster search tools. This report exploits that
redundancy to design entropy-scaling search tools for well-structured
datasets, using a similarity-search framework that characterizes a
dataset by its entropy and fractal dimension. These search techniques
are shown to have lower time and space complexity. Search time scales
with the metric entropy (the number of covering hyperspheres) when the
fractal dimension is low. Space scales with the sum of the metric
entropy and the information-theoretic entropy. The approach also
accelerates standard tools, with no loss in specificity and little
loss in sensitivity. In the later part, we estimate the weight of each
mutational signature using a genetic algorithm (GA) in order to
cluster the DNA sequences hierarchically.
Methods
The reduction in time and space achieved by entropy-scaling search
comes from the following steps:
• First, every DNA sequence is transformed using the Burrows-Wheeler
  (BW) transform, which forms the basis for further compression.
• The output is passed through the move-to-front transform, which
  lowers the entropy of the symbol stream so that it compresses better.
• The transformed data is then Huffman encoded (for high compression).
• The dissimilarity between every pair of compressed DNA sequences is
  computed (Hamming distance).
• The sequences are clustered hierarchically and appropriate cluster
  centers are estimated.
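The transform and distance steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the poster: the function names, the `$` sentinel, and the `$ACGT` alphabet are assumptions, and the sorted-rotation BW transform is a naive version that only suits short inputs. The Huffman step is reduced to computing code lengths, which is enough to estimate the compressed size.

```python
import heapq
from collections import Counter


def bwt(text, end="$"):
    """Burrows-Wheeler transform via sorted rotations (naive, short inputs only)."""
    s = text + end  # sentinel is assumed absent from text; '$' sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def move_to_front(text, alphabet="$ACGT"):
    """Replace each symbol by its index in a list that moves used symbols to the front."""
    symbols = list(alphabet)
    out = []
    for ch in text:
        idx = symbols.index(ch)
        out.append(idx)
        symbols.insert(0, symbols.pop(idx))
    return out


def huffman_code_lengths(values):
    """Return the Huffman code length in bits for each distinct value."""
    freq = Counter(values)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap items are (weight, unique tie breaker, {value: depth so far}).
    heap = [(w, i, {v: 0}) for i, (v, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {v: depth + 1 for v, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    seq = "ACGTACGTAAAA"
    ranks = move_to_front(bwt(seq))
    lengths = huffman_code_lengths(ranks)
    bits = sum(lengths[v] for v in ranks)
    print(bwt(seq), ranks, bits)
```

Hamming distance is only defined for equal-length inputs, so the sketch raises an error otherwise. How sequences of different compressed length were compared in the original work is not stated on the poster.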
Later, the weights of all six mutational signatures
(C>A, C>G, C>T, T>A, T>C, T>G) are estimated using a genetic
algorithm, which applies simple evolutionary principles to a fitness
function to find the optimum values.
Figure: Entropy scaling framework. A: finding dissimilarities;
B: selecting cluster centers; C: coarse search; D: fine search.
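The four panels of the framework can be illustrated with a small Python sketch. It is written under stated assumptions rather than taken from the poster: `build_clusters` uses a simple greedy covering with cluster radius `r_c` in place of the hierarchical clustering, Hamming distance stands in for the dissimilarity measure, and all function and parameter names are hypothetical.

```python
def hamming(a, b):
    """A: dissimilarity between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_clusters(data, r_c, dist=hamming):
    """B: greedy covering. Each item joins the first center within r_c,
    otherwise it becomes a new center."""
    clusters = {}  # center -> list of members (the center is its own member)
    for item in data:
        for center in clusters:
            if dist(item, center) <= r_c:
                clusters[center].append(item)
                break
        else:
            clusters[item] = [item]
    return clusters


def search(query, clusters, r, r_c, dist=hamming):
    """Return every stored item within distance r of the query."""
    hits = []
    for center, members in clusters.items():
        # C: coarse search. By the triangle inequality, a cluster whose
        # center is farther than r + r_c from the query cannot hold a hit.
        if dist(query, center) > r + r_c:
            continue
        # D: fine search. Check each member of the surviving clusters.
        hits.extend(m for m in members if dist(query, m) <= r)
    return hits


if __name__ == "__main__":
    data = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC", "CCCC"]
    clusters = build_clusters(data, r_c=1)
    print(len(clusters), search("AAAA", clusters, r=1, r_c=1))
```

Because the coarse step only discards clusters that provably contain no match, the result equals a brute-force scan, while the number of distance computations depends on the number of centers and the size of the surviving clusters.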
Flow chart
1. Start
2. Initialize the weights
3. Cluster hierarchically and then estimate the total number of
   mismatches
4. Is the change in the objective function less than the tolerance?
   Yes: stop
   No: continue iterating
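The loop in the flow chart can be sketched as a generic genetic algorithm in Python. This is an illustration only: the poster does not say which GA implementation was used, `toy_objective` is a hypothetical stand-in for "cluster hierarchically and count mismatches", and the selection, crossover, and mutation choices are assumptions. The initial range is also much smaller than the [0, 100000] reported on the poster so that the toy problem converges quickly.

```python
import random


def toy_objective(weights):
    """Hypothetical stand-in for the clustering mismatch count (lower is better)."""
    target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    return sum((w - t) ** 2 for w, t in zip(weights, target))


def run_ga(objective, n_weights=6, pop_size=100, init_range=(0.0, 10.0),
           tol=1e-6, max_generations=500, seed=0):
    rng = random.Random(seed)
    lo, hi = init_range
    # Initialize the weights: one random vector per individual.
    population = [[rng.uniform(lo, hi) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    best_prev = None
    for generation in range(1, max_generations + 1):
        population.sort(key=objective)
        best = objective(population[0])
        # Stop when the objective changed by less than the tolerance.
        if best_prev is not None and abs(best_prev - best) < tol:
            break
        best_prev = best
        parents = population[: pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights)  # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_weights)       # mutate a single weight
            child[i] += rng.gauss(0.0, 0.1 * (hi - lo))
            children.append(child)
        population = parents + children
    return population[0], best, generation


if __name__ == "__main__":
    weights, value, generations = run_ga(toy_objective)
    print(weights, value, generations)
```

Stopping as soon as one generation fails to improve follows the flow chart literally, but it can end the search early. A practical variant would require the change to stay below the tolerance for several consecutive generations.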
Plot 2: Fitness value and average distance vs. generation, for a
population size of 100 and an initial range of [0, 100000]
Results
• For 100x100 randomly generated sequences:
  Total size of original data = 12,288 bytes
  Total size of compressed data = 3,919 bytes
  Compressibility ≈ 3.14
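The compressibility figure is the ratio of original size to compressed size. A short Python check of the arithmetic, using only the two byte counts reported above (the variable names are illustrative):

```python
original_bytes = 12_288
compressed_bytes = 3_919

# Compression ratio: how many times smaller the compressed data is.
ratio = original_bytes / compressed_bytes
# Space saving: fraction of the original size that was removed.
space_saving = 1 - compressed_bytes / original_bytes

print(f"compression ratio ~ {ratio:.2f}")
print(f"space saving ~ {space_saving:.1%}")
```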
[Bar chart: without compression vs. with compression]
For the 10 neight files {file 1, file 2,.., file10}
Observed clusters
Cluster 1: {file 1}
Cluster 2: {file 2}
Cluster 3: {file 3, file 4, file 5, ..., file 9, file 10}
w1        w2        w3        w4       w5         w6         objective function
-144552   2310314   -119340   114300   217846.2   191830.1   -0.03597
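Theory
• The local fractal dimension around a data point can be computed by
  counting the other data points within radii r1 and r2 of that
  point. Given those counts (n1 and n2):
  d = log(n2 / n1) / log(r2 / r1)
• Time complexity is a measure of how long the process takes, while
  space complexity is a measure of how much memory is used during the
  process.
  Order = O(k + |B_D(q, r)| * ((r + 2 r_c) / r)^d)
  where k is the metric entropy (number of cluster centers),
  B_D(q, r) is the set of points of dataset D within distance r of the
  query q, r_c is the cluster radius, and d is the fractal dimension.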
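Both formulas above are easy to evaluate directly. The Python sketch below is illustrative only: the function names are hypothetical, and the second function computes just the ((r + 2 r_c) / r)^d factor of the bound, not the full running time.

```python
import math


def local_fractal_dimension(n1, n2, r1, r2):
    """d = log(n2 / n1) / log(r2 / r1) for neighbor counts n1, n2 at radii r1 < r2."""
    if not (0 < r1 < r2) or n1 <= 0 or n2 <= 0:
        raise ValueError("need 0 < r1 < r2 and positive counts")
    return math.log(n2 / n1) / math.log(r2 / r1)


def count_within(points, center, radius, dist):
    """Number of points whose distance to center is at most radius."""
    return sum(dist(p, center) <= radius for p in points)


def fine_search_factor(r, r_c, d):
    """The ((r + 2 * r_c) / r) ** d factor in the search-time bound."""
    return ((r + 2 * r_c) / r) ** d


if __name__ == "__main__":
    # Doubling the radius quadruples the neighbor count: dimension 2.
    d = local_fractal_dimension(n1=10, n2=40, r1=1.0, r2=2.0)
    print(d, fine_search_factor(r=1.0, r_c=1.0, d=d))
```

The factor grows exponentially in d, which is why a low fractal dimension keeps the fine-search term small and lets the number of cluster centers k dominate the running time.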
Conclusion
In this project we have introduced an entropy-scaling framework for
accelerating approximate search on dynamic omics data.
• This approach bounds both time and space as functions of the
  dataset entropy (metric entropy bounds time, while
  information-theoretic entropy bounds space).
• A low fractal dimension ensures that run time is dominated by
  metric entropy.
• The weights of all 6 mutational signatures depend on many factors,
  such as population size and initial range.
• Better results can be obtained with the GA and by considering more
  features and larger data sets.

More Related Content

What's hot

3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methodsKrish_ver2
 
Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...Alexander Decker
 
Compressive Data Gathering using NACS in Wireless Sensor Network
Compressive Data Gathering using NACS in Wireless Sensor NetworkCompressive Data Gathering using NACS in Wireless Sensor Network
Compressive Data Gathering using NACS in Wireless Sensor NetworkIRJET Journal
 
Optimal Converge cast Methods for Tree- Based WSNs
Optimal Converge cast Methods for Tree- Based WSNsOptimal Converge cast Methods for Tree- Based WSNs
Optimal Converge cast Methods for Tree- Based WSNsIJMER
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 
CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...Raed Aldahdooh
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmEditor IJCATR
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Yan Xu
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewVahid Mirjalili
 
Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)DheerajPachauri
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsVarad Meru
 
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore ProjectsLatest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects1crore projects
 
Current clustering techniques
Current clustering techniquesCurrent clustering techniques
Current clustering techniquesPoonam Kshirsagar
 
An improvement in k mean clustering algorithm using better time and accuracy
An improvement in k mean clustering algorithm using better time and accuracyAn improvement in k mean clustering algorithm using better time and accuracy
An improvement in k mean clustering algorithm using better time and accuracyijpla
 
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTERPERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTERijdms
 
DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering TypesAshwin Shenoy M
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelJenny Liu
 
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...EuroIoTa
 
LIDAR- Light Detection and Ranging.
LIDAR- Light Detection and Ranging.LIDAR- Light Detection and Ranging.
LIDAR- Light Detection and Ranging.Gaurav Agarwal
 

What's hot (20)

3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
 
Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...
 
Compressive Data Gathering using NACS in Wireless Sensor Network
Compressive Data Gathering using NACS in Wireless Sensor NetworkCompressive Data Gathering using NACS in Wireless Sensor Network
Compressive Data Gathering using NACS in Wireless Sensor Network
 
Optimal Converge cast Methods for Tree- Based WSNs
Optimal Converge cast Methods for Tree- Based WSNsOptimal Converge cast Methods for Tree- Based WSNs
Optimal Converge cast Methods for Tree- Based WSNs
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 
50120140505013
5012014050501350120140505013
50120140505013
 
CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
 
Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its Applications
 
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore ProjectsLatest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects
 
Current clustering techniques
Current clustering techniquesCurrent clustering techniques
Current clustering techniques
 
An improvement in k mean clustering algorithm using better time and accuracy
An improvement in k mean clustering algorithm using better time and accuracyAn improvement in k mean clustering algorithm using better time and accuracy
An improvement in k mean clustering algorithm using better time and accuracy
 
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTERPERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
 
DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering Types
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in Parallel
 
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
 
LIDAR- Light Detection and Ranging.
LIDAR- Light Detection and Ranging.LIDAR- Light Detection and Ranging.
LIDAR- Light Detection and Ranging.
 

Viewers also liked

INSTANCE BASED MULTI CRITERIA DECISION MODEL FOR CLOUD SERVICE SELECTION USI...
 INSTANCE BASED MULTI CRITERIA DECISION MODEL FOR CLOUD SERVICE SELECTION USI... INSTANCE BASED MULTI CRITERIA DECISION MODEL FOR CLOUD SERVICE SELECTION USI...
INSTANCE BASED MULTI CRITERIA DECISION MODEL FOR CLOUD SERVICE SELECTION USI...IAEME Publication
 
MCDM Techniques for the Selection of Material Handling Equipment in the Autom...
MCDM Techniques for the Selection of Material Handling Equipment in the Autom...MCDM Techniques for the Selection of Material Handling Equipment in the Autom...
MCDM Techniques for the Selection of Material Handling Equipment in the Autom...IJMER
 
Multi criteria decision making
Multi criteria decision makingMulti criteria decision making
Multi criteria decision makingMohd Syahril Said
 
Multi Criteria Decision Making With PROMETHEE method and software
Multi Criteria Decision Making With PROMETHEE method and softwareMulti Criteria Decision Making With PROMETHEE method and software
Multi Criteria Decision Making With PROMETHEE method and softwareAfrouz Hojati
 
Multiple criteria decision analysis using
Multiple criteria decision analysis usingMultiple criteria decision analysis using
Multiple criteria decision analysis usingIJMIT JOURNAL
 
Multi criteria decision making
Multi criteria decision makingMulti criteria decision making
Multi criteria decision makingKhalid Mdnoh
 
MCDM Introduction 08-01
MCDM Introduction 08-01MCDM Introduction 08-01
MCDM Introduction 08-01rmcnab67
 
ROBOTIC WELDING Presentation to show2
ROBOTIC WELDING Presentation to show2ROBOTIC WELDING Presentation to show2
ROBOTIC WELDING Presentation to show2Prateek Sood
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting PersonalKirsty Hulse
 

Viewers also liked (14)

INSTANCE BASED MULTI CRITERIA DECISION MODEL FOR CLOUD SERVICE SELECTION USI...
 INSTANCE BASED MULTI CRITERIA DECISION MODEL FOR CLOUD SERVICE SELECTION USI... INSTANCE BASED MULTI CRITERIA DECISION MODEL FOR CLOUD SERVICE SELECTION USI...
INSTANCE BASED MULTI CRITERIA DECISION MODEL FOR CLOUD SERVICE SELECTION USI...
 
MCDM Techniques for the Selection of Material Handling Equipment in the Autom...
MCDM Techniques for the Selection of Material Handling Equipment in the Autom...MCDM Techniques for the Selection of Material Handling Equipment in the Autom...
MCDM Techniques for the Selection of Material Handling Equipment in the Autom...
 
Multi criteria decision making
Multi criteria decision makingMulti criteria decision making
Multi criteria decision making
 
Multi Criteria Decision Making With PROMETHEE method and software
Multi Criteria Decision Making With PROMETHEE method and softwareMulti Criteria Decision Making With PROMETHEE method and software
Multi Criteria Decision Making With PROMETHEE method and software
 
Multiple criteria decision analysis using
Multiple criteria decision analysis usingMultiple criteria decision analysis using
Multiple criteria decision analysis using
 
Multicriteria Decision Analysis
Multicriteria Decision AnalysisMulticriteria Decision Analysis
Multicriteria Decision Analysis
 
Multi criteria decision making
Multi criteria decision makingMulti criteria decision making
Multi criteria decision making
 
mcdm method
mcdm methodmcdm method
mcdm method
 
MCDM Introduction 08-01
MCDM Introduction 08-01MCDM Introduction 08-01
MCDM Introduction 08-01
 
VIKOR LOGO & IDENTITIES - Bộ nhận diện Cty CP Tôn VIKOR
VIKOR LOGO & IDENTITIES - Bộ nhận diện Cty CP Tôn VIKORVIKOR LOGO & IDENTITIES - Bộ nhận diện Cty CP Tôn VIKOR
VIKOR LOGO & IDENTITIES - Bộ nhận diện Cty CP Tôn VIKOR
 
ROBOTIC WELDING Presentation to show2
ROBOTIC WELDING Presentation to show2ROBOTIC WELDING Presentation to show2
ROBOTIC WELDING Presentation to show2
 
Robotic welding
Robotic weldingRobotic welding
Robotic welding
 
Welding Robots
Welding RobotsWelding Robots
Welding Robots
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting Personal
 

Similar to Entropy scaling search method

Neural Networks for High Performance Time-Delay Estimation and Acoustic Sourc...
Neural Networks for High Performance Time-Delay Estimation and Acoustic Sourc...Neural Networks for High Performance Time-Delay Estimation and Acoustic Sourc...
Neural Networks for High Performance Time-Delay Estimation and Acoustic Sourc...cscpconf
 
NEURAL NETWORKS FOR HIGH PERFORMANCE TIME-DELAY ESTIMATION AND ACOUSTIC SOURC...
NEURAL NETWORKS FOR HIGH PERFORMANCE TIME-DELAY ESTIMATION AND ACOUSTIC SOURC...NEURAL NETWORKS FOR HIGH PERFORMANCE TIME-DELAY ESTIMATION AND ACOUSTIC SOURC...
NEURAL NETWORKS FOR HIGH PERFORMANCE TIME-DELAY ESTIMATION AND ACOUSTIC SOURC...csandit
 
Algorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsAlgorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsJigisha Aryya
 
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...sherinmm
 
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...sherinmm
 
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discoveryJAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discoveryIEEEGLOBALSOFTTECHNOLOGIES
 
Binarization of Degraded Text documents and Palm Leaf Manuscripts
Binarization of Degraded Text documents and Palm Leaf ManuscriptsBinarization of Degraded Text documents and Palm Leaf Manuscripts
Binarization of Degraded Text documents and Palm Leaf ManuscriptsIRJET Journal
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Alexander Decker
 
Accurate time series classification using shapelets
Accurate time series classification using shapeletsAccurate time series classification using shapelets
Accurate time series classification using shapeletsIJDKP
 
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...IJMER
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
 
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...IJECEIAES
 
A Hybrid Deep Neural Network Model For Time Series Forecasting
A Hybrid Deep Neural Network Model For Time Series ForecastingA Hybrid Deep Neural Network Model For Time Series Forecasting
A Hybrid Deep Neural Network Model For Time Series ForecastingMartha Brown
 
Detection of Outliers in Large Dataset using Distributed Approach
Detection of Outliers in Large Dataset using Distributed ApproachDetection of Outliers in Large Dataset using Distributed Approach
Detection of Outliers in Large Dataset using Distributed ApproachEditor IJMTER
 
Scalable and efficient cluster based framework for
Scalable and efficient cluster based framework forScalable and efficient cluster based framework for
Scalable and efficient cluster based framework foreSAT Publishing House
 
Scalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexingScalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexingeSAT Journals
 
Provably secure and efficient audio compression based on compressive sensing
Provably secure and efficient audio compression based on  compressive sensingProvably secure and efficient audio compression based on  compressive sensing
Provably secure and efficient audio compression based on compressive sensingIJECEIAES
 
Enchancing the Data Collection in Tree based Wireless Sensor Networks
Enchancing the Data Collection in Tree based Wireless Sensor NetworksEnchancing the Data Collection in Tree based Wireless Sensor Networks
Enchancing the Data Collection in Tree based Wireless Sensor Networksijsrd.com
 

Similar to Entropy scaling search method (20)

Neural Networks for High Performance Time-Delay Estimation and Acoustic Sourc...
Neural Networks for High Performance Time-Delay Estimation and Acoustic Sourc...Neural Networks for High Performance Time-Delay Estimation and Acoustic Sourc...
Neural Networks for High Performance Time-Delay Estimation and Acoustic Sourc...
 
NEURAL NETWORKS FOR HIGH PERFORMANCE TIME-DELAY ESTIMATION AND ACOUSTIC SOURC...
NEURAL NETWORKS FOR HIGH PERFORMANCE TIME-DELAY ESTIMATION AND ACOUSTIC SOURC...NEURAL NETWORKS FOR HIGH PERFORMANCE TIME-DELAY ESTIMATION AND ACOUSTIC SOURC...
NEURAL NETWORKS FOR HIGH PERFORMANCE TIME-DELAY ESTIMATION AND ACOUSTIC SOURC...
 
Algorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsAlgorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systems
 
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...
 
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...
 
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discoveryJAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
 
Binarization of Degraded Text documents and Palm Leaf Manuscripts
Binarization of Degraded Text documents and Palm Leaf ManuscriptsBinarization of Degraded Text documents and Palm Leaf Manuscripts
Binarization of Degraded Text documents and Palm Leaf Manuscripts
 
Robust Algorithm for Discrete Tomography with Gray Value Estimation
Robust Algorithm for Discrete Tomography with Gray Value EstimationRobust Algorithm for Discrete Tomography with Gray Value Estimation
Robust Algorithm for Discrete Tomography with Gray Value Estimation
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 
Accurate time series classification using shapelets
Accurate time series classification using shapeletsAccurate time series classification using shapelets
Accurate time series classification using shapelets
 
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...
 
A Hybrid Deep Neural Network Model For Time Series Forecasting
A Hybrid Deep Neural Network Model For Time Series ForecastingA Hybrid Deep Neural Network Model For Time Series Forecasting
A Hybrid Deep Neural Network Model For Time Series Forecasting
 
Ceis 4
Ceis 4Ceis 4
Ceis 4
 
Detection of Outliers in Large Dataset using Distributed Approach
Detection of Outliers in Large Dataset using Distributed ApproachDetection of Outliers in Large Dataset using Distributed Approach
Detection of Outliers in Large Dataset using Distributed Approach
 
Scalable and efficient cluster based framework for
Scalable and efficient cluster based framework forScalable and efficient cluster based framework for
Scalable and efficient cluster based framework for
 
Scalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexingScalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexing
 
Provably secure and efficient audio compression based on compressive sensing
Provably secure and efficient audio compression based on  compressive sensingProvably secure and efficient audio compression based on  compressive sensing
Provably secure and efficient audio compression based on compressive sensing
 
Enchancing the Data Collection in Tree based Wireless Sensor Networks
Enchancing the Data Collection in Tree based Wireless Sensor NetworksEnchancing the Data Collection in Tree based Wireless Sensor Networks
Enchancing the Data Collection in Tree based Wireless Sensor Networks
 

Recently uploaded

CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 

Entropy scaling search method

Entropy scaling search for massive biological data
Gokulakannan Selvam, Raghunathan Rengaswamy *
Department of Chemical Engineering, Indian Institute of Technology Madras

Abstract

Many big datasets exhibit substantial redundancy that can be exploited to design faster search tools. This report exploits that redundancy to design entropy-scaling search tools for well-defined structured datasets, using similarity-search frameworks that characterize a dataset's entropy and fractal dimension. These search techniques are shown to have lower time and space complexity: time scales with the metric entropy (the number of covering hyperspheres) when the fractal dimension is low, and space scales with the sum of the metric entropy and the information-theoretic entropy. The approach also accelerates standard tools with no loss in specificity and little loss in sensitivity. In the later part, we estimate the weight of each mutational signature with a genetic algorithm (GA) in order to cluster the DNA sequences hierarchically.

Methods

The reduction in time and space in entropy-scaling search is achieved by the following steps:
• Each DNA sequence is transformed with the Burrows-Wheeler (BW) transform, which forms the basis for further compression.
• The output is passed through the move-to-front transform, which lowers the entropy of the symbol stream so that it compresses better.
• The transformed data is then encoded with Huffman coding (for high compression).
• The dissimilarity between every pair of compressed DNA sequences is computed (Hamming distance).
• The sequences are clustered hierarchically, and appropriate cluster centers are estimated.

Later, the weights of all six mutational signatures (C>A, C>G, C>T, T>A, T>C, T>G) are estimated with a genetic algorithm, which uses a fitness function and simple evolutionary principles to find the optimum values.

Entropy scaling framework

[Figure: A - finding dissimilarities, B - selecting cluster centers, C - coarse search, D - fine search]

Flow chart

1. Start.
2. Initialize the weights.
3. Cluster hierarchically and then estimate the total number of mismatches.
4. If the change in the objective function is less than the tolerance, stop; otherwise continue iterating.

Results

• For randomly generated 100x100 sequence data:
  Total size of original data = 12,288 bytes
  Total size of compressed data = 3,919 bytes
  Compressibility ~ 3.14
  [Bar chart: data size without compression vs. with compression]
• For the 10 files {file 1, file 2, ..., file 10}, the observed clusters are:
  Cluster 1: {file 1}
  Cluster 2: {file 2}
  Cluster 3: {file 3, file 4, file 5, ..., file 9, file 10}
• Weights estimated by the GA:
  w1 = -144552
  w2 = 2310314
  w3 = -119340
  w4 = 114300
  w5 = 217846.2
  w6 = 191830.1
  Objective function = -0.03597

[Plot 2: fitness value and average distance vs. generations, for population size = 100 and initial range [0, 100000]]

Theory

• The local fractal dimension d around a data point can be computed by counting the other data points that lie within radii r1 and r2 of that point. Given those counts n1 and n2:
  d = log(n2 / n1) / log(r2 / r1)
• Time complexity is a measure of how long the process takes, while space complexity is a measure of how much memory the process uses. The search time is of order
  O(k + |B_D(q, r)| ((r + 2 r_c) / r)^d)
  where k is the metric entropy (the number of covering hyperspheres), B_D(q, r) is the set of data points within radius r of the query q, r_c is the cluster radius, and d is the fractal dimension.

Conclusion

In this project we introduced an entropy-scaling framework for accelerating approximate search on dynamic omics data.
• This approach bounds both time and space as functions of the dataset entropy (metric entropy bounds time, while information-theoretic entropy bounds space).
• A low fractal dimension ensures that run time is dominated by the metric entropy.
• The weights of all six mutational signatures depend on many factors, such as population size and initial range.
• Better results can be obtained with the GA and by considering more features and larger datasets.

References

1. Y.W. Yu, N.M. Daniels, D.C. Danko. Entropy scaling search. Cell Systems.
2. Don A., Y. Zhang, Amar M., Tim Bell. DNA sequence compression using BWT.
3. Paolo Ferragina, Giovanni Manzini. Opportunistic data structures.
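The first steps of the Methods section (Burrows-Wheeler transform, move-to-front, then a Hamming dissimilarity) can be sketched in a few lines of Python. This is only an illustrative sketch, not the poster's implementation: the function names, the "$" end marker, and the "$ACGT" alphabet are assumptions, the rotation-sorting BWT is suitable for short inputs only, and the Huffman stage is left out for brevity.

```python
def bwt(text: str, end: str = "$") -> str:
    """Burrows-Wheeler transform by sorting all rotations (short inputs only)."""
    # Assumption: the end marker does not occur in text and sorts before A, C, G, T.
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def move_to_front(text: str, alphabet: str = "$ACGT") -> list[int]:
    """Replace each symbol by its current position in a self-organizing list."""
    table = list(alphabet)
    out = []
    for ch in text:
        idx = table.index(ch)
        out.append(idx)
        # Move the symbol just seen to the front, so repeats encode as 0.
        table.insert(0, table.pop(idx))
    return out


def hamming(a, b) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    transformed = bwt("GATTACA")
    print(transformed, move_to_front(transformed))
    print(hamming("GATTACA", "GACTATA"))
```

Runs of identical symbols produced by the BW transform become runs of zeros after move-to-front, which is what makes the subsequent Huffman coding effective.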
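The coarse and fine search stages of the framework (steps C and D) can be illustrated as follows. This is a minimal sketch under stated assumptions: a simple greedy cover with cluster radius r_c stands in for the hierarchical clustering used in the poster, Hamming distance is the metric, and the names build_clusters and search are hypothetical. The coarse step keeps only clusters whose center lies within r + r_c of the query, which by the triangle inequality cannot miss any sequence within r.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_clusters(seqs, r_c):
    """Greedy cover: a sequence joins the first center within r_c, else starts a new cluster."""
    centers, members = [], []
    for s in seqs:
        for i, c in enumerate(centers):
            if hamming(s, c) <= r_c:
                members[i].append(s)
                break
        else:
            centers.append(s)
            members.append([s])
    return centers, members


def search(query, centers, members, r, r_c):
    """Return all stored sequences within Hamming distance r of the query."""
    hits = []
    for c, group in zip(centers, members):
        # Coarse search: skip clusters that cannot contain a hit.
        if hamming(query, c) <= r + r_c:
            # Fine search: check every member of a surviving cluster.
            hits.extend(s for s in group if hamming(query, s) <= r)
    return hits


if __name__ == "__main__":
    data = ["AAAA", "AAAT", "TTTT", "TTTA", "CCCC"]
    centers, members = build_clusters(data, r_c=1)
    print(search("AATT", centers, members, r=2, r_c=1))
```

The number of centers plays the role of the metric entropy k: the coarse step costs one distance computation per center, and the fine step only touches the clusters near the query.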
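The two formulas in the Theory section can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, points are assumed to be plain numbers or tuples with a user-supplied distance function, and search_cost returns just the expression inside the O(...) bound, with no hidden constants.

```python
import math


def local_fractal_dimension(points, center, r1, r2, dist):
    """Estimate d = log(n2/n1) / log(r2/r1) around one data point."""
    others = [p for p in points if p != center]
    n1 = sum(dist(center, p) <= r1 for p in others)
    n2 = sum(dist(center, p) <= r2 for p in others)
    if n1 == 0 or n2 == 0:
        raise ValueError("both radii must contain at least one other point")
    return math.log(n2 / n1) / math.log(r2 / r1)


def search_cost(k, ball_size, r, r_c, d):
    """Value of k + |B_D(q, r)| * ((r + 2*r_c) / r)**d from the Theory section."""
    return k + ball_size * ((r + 2 * r_c) / r) ** d


if __name__ == "__main__":
    # Points on a line: the count of neighbors doubles when the radius doubles.
    line = list(range(-100, 101))
    d = local_fractal_dimension(line, 0, 10, 20, lambda a, b: abs(a - b))
    print(d)
    print(search_cost(k=50, ball_size=8, r=2, r_c=1, d=d))
```

With a low d the second term stays small, so the cost is dominated by k, which matches the claim that run time is dominated by the metric entropy.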