Online data deduplication for in memory big-data analytic systems

Shakas Technologies

2020 – 2021
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, Vellore – 6.
Off: 0416-2247353 Mo: +91 9500218218 / +91 8220150373
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
Online Data Deduplication for In-Memory Big-Data Analytic Systems
Abstract:
Given a set of files that show a certain degree of similarity, we consider the novel problem of
performing data redundancy elimination across a set of distributed worker nodes in a
shared-nothing in-memory big-data analytic system. The redundancy elimination scheme is designed
to be (i) space-efficient: the total space needed to store the files is minimized, and
(ii) access-isolated: data shuffling among servers is also minimized. In this paper, we first
show that finding an access-efficient and space-optimal solution is an NP-hard problem.
We then present file partitioning algorithms that locate access-efficient solutions
incrementally and run in polynomial time. Our experimental verification on multiple data sets
confirms that the proposed file partitioning solution achieves a compression ratio close to
the optimal compression performance achieved by a centralized solution.
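
To make the trade-off in the abstract concrete, the following is a minimal sketch of one possible incremental placement heuristic. It is not the algorithm from the paper, whose details are not given here. It assumes fixed-size chunking, SHA-1 chunk fingerprints, and a greedy rule that sends each arriving file to the worker already holding the most of its chunks. The names `Worker`, `chunk_fingerprints`, `place_files`, and `stored_chunks` are hypothetical and introduced only for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Set

CHUNK_SIZE = 4  # bytes; deliberately tiny for the toy example (assumption)


@dataclass
class Worker:
    """One shared-nothing node: the files it owns and its local chunk store."""
    files: List[str] = field(default_factory=list)
    chunks: Set[str] = field(default_factory=set)


def chunk_fingerprints(data: bytes, size: int = CHUNK_SIZE) -> Set[str]:
    """Split data into fixed-size chunks and return the set of chunk hashes."""
    return {
        hashlib.sha1(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    }


def place_files(files: Dict[str, bytes], num_workers: int) -> List[Worker]:
    """Greedy, incremental placement (illustrative heuristic, not the paper's).

    Each file goes, in arrival order, to the worker whose local chunk store
    overlaps it most. Ties go to the worker storing the fewest chunks.
    A file is stored whole on one worker, so reading it needs no shuffling.
    """
    workers = [Worker() for _ in range(num_workers)]
    for name, data in files.items():
        fps = chunk_fingerprints(data)
        best = max(workers, key=lambda w: (len(fps & w.chunks), -len(w.chunks)))
        best.files.append(name)
        best.chunks |= fps  # only chunks not already present take new space
    return workers


def stored_chunks(workers: List[Worker]) -> int:
    """Total chunks kept across all workers after per-worker deduplication."""
    return sum(len(w.chunks) for w in workers)


# Toy input: two families of similar files, placed on two workers.
demo_files = {
    "a1": b"AAAABBBBCCCC",
    "b1": b"XXXXYYYYZZZZ",
    "a2": b"AAAABBBBDDDD",
    "b2": b"XXXXYYYYWWWW",
}
demo_workers = place_files(demo_files, num_workers=2)
```

In this toy input the similar files end up on the same worker, so the distributed chunk count equals what a single centralized chunk store would hold. In general, chunks duplicated across workers make the distributed layout larger than the centralized one, which is the gap the abstract refers to when comparing against a centralized solution.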


