Application-Level Optimization of Big Data Transfers through Pipelining,
Parallelism and Concurrency
Abstract:
In end-to-end data transfers, several factors affect the data transfer
throughput, such as the network characteristics (e.g., network bandwidth,
round-trip time, background traffic); the end-system characteristics (e.g., NIC
capacity, number of CPU cores and their clock rate, number of disk drives and
their I/O rate); and the dataset characteristics (e.g., average file size,
dataset size, file size distribution). Optimization of big data transfers over
inter-cloud and intra-cloud networks is a challenging task that requires joint
consideration of all of these parameters. The task becomes even more
challenging when transferring datasets with heterogeneous file sizes (i.e., a
mix of large and small files). Previous work in this area focuses only on the
end-system and network characteristics but does not provide models that account
for the dataset characteristics. In this study, we analyze the effects of the
three most important transfer parameters used to enhance data transfer
throughput: pipelining, parallelism, and concurrency. We provide models and
guidelines for setting the best values for these parameters and present two
transfer optimization algorithms that use the developed models. Tests conducted
over high-speed network and cloud testbeds show that our algorithms outperform
popular data transfer tools such as Globus Online and UDT in the majority of
cases.
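
To make the roles of the three parameters concrete, the following is a minimal,
hypothetical sketch of a transfer client, not the paper's implementation or any
real tool's API: all network activity is simulated with time.sleep(), and names
such as send_chunk and transfer_batch are illustrative placeholders.

```python
"""Illustrative sketch of pipelining, parallelism, and concurrency in a
file transfer client. All I/O is simulated; helper names are hypothetical."""
import time
from concurrent.futures import ThreadPoolExecutor

RTT = 0.05          # assumed control-channel round-trip time (seconds)
CONCURRENCY = 4     # cc: files transferred simultaneously
PARALLELISM = 4     # p: data streams per file
PIPELINING = 8      # pp: transfer commands issued back-to-back per channel

def send_chunk(chunk):
    time.sleep(0.01)  # stand-in for pushing bytes on one data stream

def transfer_file(path):
    # Parallelism: stripe one file over PARALLELISM streams so a single
    # large transfer can fill more of the available bandwidth.
    chunks = [f"{path}#part{i}" for i in range(PARALLELISM)]
    with ThreadPoolExecutor(max_workers=PARALLELISM) as streams:
        list(streams.map(send_chunk, chunks))

def transfer_batch(paths):
    # Pipelining: queue up to PIPELINING transfer commands on one channel,
    # paying roughly one RTT for the whole batch instead of one RTT per
    # file -- the case that matters for datasets of many small files.
    time.sleep(RTT)
    for path in paths:
        transfer_file(path)

def transfer_dataset(files):
    # Concurrency: run CONCURRENCY independent channels, each pipelining
    # its own share of the dataset.
    batches = [files[i:i + PIPELINING]
               for i in range(0, len(files), PIPELINING)]
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as channels:
        list(channels.map(transfer_batch, batches))

if __name__ == "__main__":
    transfer_dataset([f"file{i}.dat" for i in range(64)])
```

In this toy model, raising PARALLELISM mainly helps large files, raising
PIPELINING mainly helps many small files, and CONCURRENCY scales both at the
cost of more end-system resources, which is why the best values depend jointly
on the network, end-system, and dataset characteristics described above.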
