As a B.Tech project, I developed a faster search method that reduces both space and time requirements, thereby enhancing the data-mining process across various application domains.
-> Achieved lossless data compression with a high compressibility ratio
-> Performed clustering in the compressed domain for classification and pattern matching
-> Applied the method to the human genome to identify cancer signatures
Abstract
Many big datasets exhibit a lot of redundancy that can be exploited to design faster query-search tools. This report exploits such redundancy to design entropy-scaling search tools on well-defined, structured datasets. Frameworks for similarity search based on characterizing the dataset's entropy and fractal dimension are used. These search techniques are shown to have lower time and space complexity: time scales with the metric entropy (the number of covering hyperspheres) when the fractal dimension is low, and space scales with the sum of the metric entropy and the information-theoretic entropy. The approach also accelerates standard tools with no loss in specificity and little loss in sensitivity. In the later part, we estimate the weight of each mutational signature in order to cluster the DNA sequences hierarchically using a genetic algorithm (GA).
Methods
Entropy scaling search for massive biological data
Gokulakannan Selvam, Raghunathan Rengaswamy *
Department of Chemical Engineering, Indian Institute of Technology Madras
The reduction in time and space in the entropy-scaling search is achieved using the following methods:
1. Every DNA sequence is transformed with the Burrows-Wheeler transform (BWT), which forms the basis for further compression.
2. The BWT output is passed through a move-to-front (MTF) transform, which skews the symbol distribution toward small values and makes the data more compressible.
3. The transformed data is then encoded with a Huffman code (for high compression).
4. Dissimilarities between every pair of compressed DNA sequences are computed (Hamming distance).
5. The sequences are clustered hierarchically and appropriate cluster centers are estimated.
6. The weights of all six mutational signature classes (C>A, C>G, C>T, T>A, T>C, T>G) are then estimated using a genetic algorithm, whose fitness function applies simple evolutionary principles to find the optimum values.
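The BWT -> MTF -> Huffman stages of steps 1-3 can be sketched as follows. This is an illustrative toy, not the project's implementation: the BWT here uses naive sorted rotations, and the Huffman routine only computes code lengths (enough to estimate the compressed size); the sample sequence is made up.

```python
import heapq
from collections import Counter

def bwt(s, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (naive, O(n^2 log n))."""
    s += sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def move_to_front(s):
    """Move-to-front coding: runs of equal symbols become runs of small integers."""
    alphabet = sorted(set(s))
    out = []
    for ch in s:
        i = alphabet.index(ch)
        out.append(i)
        alphabet.insert(0, alphabet.pop(i))  # move the symbol to the front
    return out

def huffman_lengths(symbols):
    """Huffman code length per symbol, built by merging the two lightest nodes."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)  # tie-breaker so dicts are never compared
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**c1, **c2}.items()}
        heapq.heappush(heap, (n1 + n2, tick, merged))
        tick += 1
    return heap[0][2]

seq = "ATGGTGCATCTGACTCCTGAGGAGAAG"  # toy DNA fragment
mtf = move_to_front(bwt(seq))
lengths = huffman_lengths(mtf)
bits = sum(lengths[s] for s in mtf)
print(f"{len(seq) * 8} bits raw -> ~{bits} bits Huffman-coded after BWT+MTF")
```

BWT groups similar contexts so MTF emits mostly small integers, which Huffman then codes with short codewords; this is why the three stages are applied in this order.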
[Figure: Entropy-scaling framework. A: finding dissimilarities, B: selecting cluster centers, C: coarse search, D: fine search]
Flow chart
start
Initialize the weights
Cluster hierarchically and then estimate the
total number of mis-match
If change in objective function is less
than tolerance
stop
Yes
NO
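The loop in the flow chart can be sketched as a minimal GA with the same tolerance-based stopping rule. This is an assumption-laden sketch, not the project's code: the objective below is a hypothetical stand-in for the clustering mismatch count, and the selection, crossover, and mutation operators are generic choices.

```python
import random

def evolve(objective, n_weights=6, pop_size=100, tol=1e-6, max_gen=500):
    # Initialize the weights: pop_size candidates, each a vector of
    # n_weights values drawn from the initial range [0, 100000].
    pop = [[random.uniform(0, 1e5) for _ in range(n_weights)]
           for _ in range(pop_size)]
    best_prev = float("inf")
    for _ in range(max_gen):
        pop.sort(key=objective)            # evaluate and rank
        best = objective(pop[0])
        # Stop when the change in the objective falls below the tolerance.
        if abs(best_prev - best) < tol:
            break
        best_prev = best
        parents = pop[: pop_size // 2]     # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_weights)        # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(n_weights)] += random.gauss(0, 100.0)  # mutation
            children.append(child)
        pop = parents + children
    best_ind = min(pop, key=objective)
    return best_ind, objective(best_ind)

# Hypothetical stand-in for "total number of mismatches" after clustering:
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
objective = lambda w: sum((wi - ti) ** 2 for wi, ti in zip(w, target))
best_w, best_val = evolve(objective)
```

In the project, `objective` would instead re-cluster the sequences for the candidate weights and return the total mismatch count.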
[Plot 2: fitness value and average distance vs. generations, for population size = 100 and initial range [0, 100000]]
Results
For a 100x100 set of randomly generated sequences:
Total size of original data = 12,288 bytes
Total size of compressed data = 3,919 bytes
Compressibility ratio ~ 3.14
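A quick check of the reported ratio from the two sizes above:

```python
# Sizes reported in the Results section.
original_bytes = 12288
compressed_bytes = 3919
ratio = original_bytes / compressed_bytes
print(f"compressibility ratio = {ratio:.2f}")  # prints 3.14
```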
[Plot 1: comparison of results without compression vs. with compression]
For the 10 files {file 1, file 2, .., file 10}, the observed clusters are:
Cluster 1: {file 1}
Cluster 2: {file 2}
Cluster 3: {file 3, file 4, file 5, .., file 9, file 10}
Estimated signature weights and objective value:
w1        w2        w3        w4       w5        w6        objective function
-144552   2310314   -119340   114300   217846.2  191830.1  -0.03597
Theory
The local fractal dimension around a data point can be computed by counting the other data points within radii r1 and r2 of that point; given those counts n1 and n2,
d = log(n2 / n1) / log(r2 / r1)
Time complexity measures how long the process takes, while space complexity measures how much memory is used during the process.
Order = O(k + |B_D(q, r)| * ((r + 2r_c) / r)^d)
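The local fractal dimension formula above can be computed directly by counting points inside the two radii. A minimal sketch, using made-up uniform 2-D data (for which d should come out close to 2):

```python
import math
import random

def local_fractal_dimension(points, center, r1, r2):
    """d = log(n2/n1) / log(r2/r1), where n_i counts points within radius r_i."""
    def within(r):
        return sum(1 for p in points if math.dist(p, center) <= r)
    n1, n2 = within(r1), within(r2)
    return math.log(n2 / n1) / math.log(r2 / r1)

# Points spread uniformly over a 2-D square: counts grow like r^2,
# so the estimated local dimension should be near 2.
random.seed(1)
pts = [(random.random(), random.random()) for _ in range(20000)]
d = local_fractal_dimension(pts, (0.5, 0.5), 0.1, 0.2)
print(round(d, 2))
```

For real sequence data the "points" would be the compressed sequences under the chosen dissimilarity measure, and a low d is what lets the metric entropy dominate the search time.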
References
1. Y. W. Yu, N. M. Daniels, D. C. Danko, "Entropy-scaling search of massive biological data", Cell Systems.
2. D. Adjeroh, Y. Zhang, A. Mukherjee, T. Bell, "DNA sequence compression using the Burrows-Wheeler Transform".
3. P. Ferragina and G. Manzini, "Opportunistic data structures with applications".
Conclusion
In this project we introduced an entropy-scaling framework for accelerating approximate search on dynamic omics data.
This approach bounds both time and space as functions of the dataset's entropy (metric entropy bounds time, while information-theoretic entropy bounds space).
A low fractal dimension ensures that the run time is dominated by the metric entropy.
The estimated weights of all six mutational signatures depend on many factors, such as the GA population size and initial range.
Better results can be obtained by tuning the GA and by considering additional features and larger datasets.