Benchmarking Personal Cloud Storage

Spyros Eleftheriadis – itp12401@hua.gr
Harokopio University of Athens
Department of Informatics and Telematics
Postgraduate Program "Informatics and Telematics"
Modern Computer System Architectures

Thursday 16 January 2014
Introduction
• Services like Dropbox, SkyDrive and Google
Drive are becoming pervasive in people’s
routine.
• Such applications are data-intensive and their
increasing usage already produces a significant
share of Internet traffic.
• Very little is known about how providers
implement their services and the implications of
different designs.
Goals
• How do different providers tackle the problem of
synchronizing people’s files?
• Development of a methodology that helps to
understand both system architecture and client
capabilities.

• What are the consequences of each design on
performance?
METHODOLOGY AND
SERVICES
Testbed I
• The testbed is composed of two parts
• (i) a test computer that runs the application-under-test in the desired
operating system
• (ii) the testing application.
• The complete environment can run either in a single machine, or in separate
machines provided that the testing application can intercept traffic from the
test computer.
Testbed II
• The actual testbed is a single Linux server which both controls the experiments
and hosts a virtual machine that runs the test computer (Windows 7
Enterprise).
• The testing application receives as input benchmarking parameters
describing the sequence of operations to be performed. The testing
application acts remotely on the test computer, generating specific
workloads in the form of file batches, which are manipulated using an FTP
client. Files of different types are created or modified at run-time, e.g.,
text files composed of random words from a dictionary, images with
random pixels, or random binary files.
• Generated files are synchronized to the cloud by the application-under-test
and the exchanged traffic is monitored to compute performance
metrics. These include the amount of traffic seen during the experiments,
the time before actual synchronization starts and the time to complete
synchronization.
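The workload generation described above can be sketched as follows. This is a minimal illustration, not the authors' testing application: the word list, file names and function names are hypothetical, and the FTP transfer to the test computer is left out.

```python
import os
import random
import tempfile

# Hypothetical word list standing in for the dictionary mentioned on the slide.
WORDS = ["cloud", "storage", "sync", "chunk", "delta", "bundle", "traffic"]


def make_text_file(path, size_bytes, rng):
    """Write exactly size_bytes of random dictionary words to path."""
    parts = []
    total = 0
    # Keep adding words until the space-joined text is at least size_bytes long.
    while total <= size_bytes:
        word = rng.choice(WORDS)
        parts.append(word)
        total += len(word) + 1  # +1 for the separating space
    data = " ".join(parts).encode("ascii")[:size_bytes]
    with open(path, "wb") as fh:
        fh.write(data)


def make_binary_file(path, size_bytes, rng):
    """Write size_bytes of pseudo-random bytes to path."""
    with open(path, "wb") as fh:
        fh.write(bytes(rng.getrandbits(8) for _ in range(size_bytes)))


def make_batch(directory, count, size_bytes, kind, seed=0):
    """Create a batch of `count` files of the given kind ("text" or "binary")."""
    rng = random.Random(seed)
    writer = make_text_file if kind == "text" else make_binary_file
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"{kind}_{i:03d}.dat")
        writer(path, size_bytes, rng)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Usage example: a batch of ten 100 kB text files in a temporary folder.
    with tempfile.TemporaryDirectory() as folder:
        batch = make_batch(folder, count=10, size_bytes=100_000, kind="text")
        print(len(batch), "files created")
```

Seeding the generator makes each batch reproducible, which helps when the same experiment has to be repeated many times.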
Architecture & Data Centers
To identify how the analyzed services operate, they observe the
DNS names of contacted servers when:

• (i) starting the application
• (ii) immediately after files are manipulated
• (iii) when the application is in idle state
Contacted DNS name > IP address > Hybrid methodology >
Data center location estimation with a precision of about a hundred
kilometers
Checking Capabilities
Personal cloud storage applications can implement several capabilities
to optimize storage usage and to speed up transfers.
These capabilities include the adoption of
• chunking (i.e., splitting content into data units of a maximum size),
• bundling (i.e., the transmission of multiple small files as a single
object),
• deduplication (i.e., avoiding re-transmitting content already available
on servers),
• delta encoding (i.e., transmission of only modified portions of a file)
and
• compression.
For each case, a specific test has been designed to observe if the given
capability is implemented.
Benchmarking Performance
Having established how the services are designed in terms of both data center
locations and system capabilities, we check how such choices influence
synchronization performance and the amount of overhead traffic.
They designed 8 benchmarks varying:
• number of files
• file sizes
• file types
• All files in the sets are created at run-time by our testing application.
• Each experiment is repeated 24 times per service, allowing at least 5 min
between experiments to avoid creating abnormal workloads on the servers.
• The benchmark of a single storage service lasts for about 1 day.
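As a quick consistency check on these figures, the enforced pauses alone give a lower bound on the campaign duration. The sketch below assumes, as one reading of the slide, that each of the 8 benchmarks is repeated 24 times and that every run is followed by the 5-minute pause. The function name is hypothetical.

```python
def minimum_campaign_minutes(benchmarks, repetitions, pause_minutes):
    """Lower bound on campaign duration: only the enforced pauses are counted."""
    return benchmarks * repetitions * pause_minutes


minutes = minimum_campaign_minutes(benchmarks=8, repetitions=24, pause_minutes=5)
hours = minutes / 60
print(f"{minutes} min of pauses, i.e. {hours:.0f} h")  # 960 min, i.e. 16 h
```

Under these assumptions the pauses account for 16 hours. The remaining time up to "about 1 day" would then be taken by the transfers themselves.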
Tested Storage Services
• Dropbox, Google Drive and SkyDrive are selected
because they are among the most popular offers
(according to the volume of search queries containing
names of cloud storage services on Google Trends).
• Wuala is considered because it is a system that offers
encryption at the client-side. (They want to verify the
impact of such privacy layer on synchronization
performance).
• Amazon Cloud Drive is included to compare its
performance to Dropbox, since both services rely on
Amazon Web Services (AWS) data centers.
SYSTEM ARCHITECTURE
Protocols I
• All clients use HTTPS, except for the Dropbox
notification protocol, which relies on plain
HTTP. Interestingly, some Wuala storage
operations also use HTTP, since users’ privacy
has already been secured by local encryption.
• All services but Wuala use separate servers for
control and storage.
Protocols II
Firstly, the applications authenticate the user and check if any content has to be
updated.
• SkyDrive requires about 150 kB in total, 4 times more than the others. This happens
because the application contacts many Microsoft Live servers during login (13 in this
example).
Secondly, once login is completed, the applications keep exchanging data with the
cloud.
• Wuala is polling servers every 5 min on average (equivalent background traffic of about
60 b/s).
• Google Drive follows close, with a lightweight 40 s polling interval (42 b/s).
• Dropbox and SkyDrive use intervals close to 1 min (82 b/s and 32 b/s, respectively).
• Amazon Cloud Drive is polling servers every 15 s, each time opening a new HTTPS
connection. This notification strategy consumes 6 kb/s – i.e., about 65 MB per day. This
information is relevant to users with bandwidth constraints (e.g., in 3G/4G networks)
and to the system: 1 million users would generate approximately 6 Gb/s of
signaling traffic alone! As the results for other providers demonstrate,
such a design is not optimal and could likely
be improved.
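The background traffic figures above can be checked with a few lines of arithmetic. The rates are the ones quoted on this slide. The function names are hypothetical, and decimal units are assumed (1 MB = 10^6 bytes, 1 Gb = 10^9 bits).

```python
# Background signaling rates in bits per second, as quoted on the slide.
RATES_BPS = {
    "Wuala": 60,
    "Google Drive": 42,
    "Dropbox": 82,
    "SkyDrive": 32,
    "Amazon Cloud Drive": 6000,
}


def daily_megabytes(bits_per_second):
    """Traffic accumulated over 24 hours, in megabytes."""
    seconds_per_day = 24 * 60 * 60
    return bits_per_second * seconds_per_day / 8 / 1e6


def aggregate_gbps(bits_per_second, users):
    """Total signaling rate for `users` idle clients, in gigabits per second."""
    return bits_per_second * users / 1e9


for name, rate in RATES_BPS.items():
    print(f"{name}: {daily_megabytes(rate):.2f} MB/day")

# Amazon Cloud Drive: 64.8 MB per day, and 6 Gb/s for one million users.
print(aggregate_gbps(RATES_BPS["Amazon Cloud Drive"], 1_000_000))
```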
Protocols III
Data Centers
• Dropbox uses its own servers (in the San Jose area) for client management,
while storage servers are hosted by Amazon in Northern Virginia.
• Cloud Drive uses three AWS data centers: two are used for both storage
and control (in Ireland and Northern Virginia); a third one is used for storage
only (in Oregon).
• SkyDrive relies on Microsoft’s data centers in the Seattle area (for storage)
and Southern Virginia (for storage and control). We also identified a
destination in Singapore (for control only).
• Wuala data centers are located in Europe: two in the Nuremberg area, one
in Zurich and a fourth in Northern France. None is owned by Wuala.
• Google Drive follows a different approach: TCP connections are terminated
at the closest Google edge node, from where the traffic is routed to the
actual storage/control data center using Google’s private network.
Google Drive Edge Nodes
CLOUD SERVICE
CLIENT CAPABILITIES
Chunking
• Only Amazon Cloud Drive does not perform chunking

• Google Drive uses 8 MB chunks
• Dropbox uses 4 MB chunks
• SkyDrive and Wuala use variable chunk sizes
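Fixed-size chunking as used by Dropbox and Google Drive can be sketched as below. This only illustrates the idea, not either client's implementation. The function name is hypothetical, and 1 MB is assumed to mean 2^20 bytes.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def split_into_chunks(data, chunk_size):
    """Split a byte string into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# A 10 MB file with 4 MB chunks yields chunks of 4 MB, 4 MB and 2 MB.
payload = bytes(10 * MB)
chunks = split_into_chunks(payload, 4 * MB)
print([len(c) // MB for c in chunks])
```

Each chunk can then be uploaded, acknowledged and retried independently, which is what makes chunking useful for large files.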
Bundling
• Only Dropbox implements a file-bundling strategy

• Google Drive and Amazon Cloud Drive open one
separate TCP (and SSL) connection for each file.
• SkyDrive and Wuala submit files sequentially, waiting
for application layer acknowledgments between each
upload.
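To make the idea of bundling concrete, the sketch below packs several small files into one object that could be sent over a single connection. It uses a plain tar archive purely for illustration. The wire format Dropbox actually uses is not described in the slides, and the function names are hypothetical.

```python
import io
import tarfile


def bundle(files):
    """Pack a mapping of file name -> bytes into a single tar archive (bytes)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unbundle(blob):
    """Recover the original name -> bytes mapping from a bundle."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        for member in archive.getmembers():
            files[member.name] = archive.extractfile(member).read()
    return files


small_files = {f"note_{i}.txt": f"content {i}".encode() for i in range(100)}
blob = bundle(small_files)
print(len(small_files), "files sent as 1 object of", len(blob), "bytes")
```

With one object instead of one hundred, the client avoids paying the connection setup or the per-file acknowledgment round trip for every file.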
Client-side Deduplication
• Only Dropbox and Wuala implement deduplication.
• All other services have to upload the same data
even if it is readily available at the storage server.
Interestingly, Dropbox and Wuala can identify
copies of users’ files even after they are deleted
and later restored.
• In the case of Wuala, deduplication is compatible
with local encryption, i.e., two identical files
generate two identical encrypted versions.
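Deduplication can be illustrated with a content-addressed store: the client sends data only when the digest of the content is not yet known. This is a simplified sketch with hypothetical names. In a real service the client would first exchange the digest with the server, and the slides do not say which hash function any provider uses.

```python
import hashlib


class DedupStore:
    """Toy server-side store that keeps one copy per distinct content."""

    def __init__(self):
        self._blobs = {}
        self.uploaded_bytes = 0

    def upload(self, data):
        """Store data unless identical content is already present."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data
            self.uploaded_bytes += len(data)  # only new content costs traffic
        return digest


store = DedupStore()
photo = b"\x89PNG" + bytes(1000)
first = store.upload(photo)
second = store.upload(photo)  # a copy of the same file under another name
print(first == second, store.uploaded_bytes)
```

Because the store is indexed by content rather than by file name, a file that is deleted and later restored still matches its earlier digest as long as the server keeps the blob.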
Delta Encoding
• Delta encoding is a specialized compression
technique that calculates file differences between
two copies, allowing the transmission of only the
modifications between revisions.

• Only Dropbox fully implements delta encoding
• Wuala does not implement delta encoding.
However, deduplication prevents the client from
uploading those chunks not affected by the
change.
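The difference between chunk-level deduplication and true delta encoding can be seen in a small sketch. It is not Wuala's implementation: the function names are hypothetical, and fixed-size chunks are assumed only to keep the example short, whereas Wuala is reported to use variable chunk sizes.

```python
import hashlib


def chunk_digests(data, chunk_size):
    """SHA-256 digest of every fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(old, new, chunk_size):
    """Indexes of chunks in `new` whose content is not present in `old`."""
    known = set(chunk_digests(old, chunk_size))
    return [
        index
        for index, digest in enumerate(chunk_digests(new, chunk_size))
        if digest not in known
    ]


old = b"aaaa" + b"bbbb" + b"cccc" + b"dddd"

# An in-place change touches a single chunk.
edited = old[:8] + b"cXcc" + old[12:]
print(chunks_to_upload(old, edited, chunk_size=4))  # [2]

# Inserting one byte at the front shifts every chunk boundary.
shifted = b"Z" + old
print(chunks_to_upload(old, shifted, chunk_size=4))  # [0, 1, 2, 3, 4]
```

The second case shows why this is weaker than delta encoding: a one-byte insertion forces every chunk to be uploaded again, while a delta would carry only the inserted byte.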
Delta encoding
Compression
• Compression reduces traffic and storage
requirements at the expense of processing time.
• Dropbox and Google Drive compress data before
transmission.
• Google Drive implements smart policies to verify
the file format.
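The trade-off can be illustrated with a small experiment: text compresses well, while random (or already compressed) content does not, so compressing it only costs processing time. The `should_compress` heuristic below is a hypothetical stand-in that tests a sample of the data. It is not Google Drive's policy, which according to the slide checks the file format instead.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


def should_compress(data, threshold=0.9, sample_size=64 * 1024):
    """Hypothetical policy: compress only if a leading sample shrinks enough."""
    if not data:
        return False
    return compression_ratio(data[:sample_size]) < threshold


text = b"cloud storage benchmark " * 1000
noise = os.urandom(100_000)

print(should_compress(text))   # True: repetitive text shrinks a lot
print(should_compress(noise))  # False: random bytes do not shrink
```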
PERFORMANCE
Synchronization Startup
• Dropbox is the fastest service to start
synchronizing single files. Its bundling strategy,
however, slightly delays start up with multiple files.
As we will show next, this strategy pays off in
total upload time.
• Wuala also increases its startup time when
multiple files are submitted.
• SkyDrive is by far
the slowest.
Completion Time
• Google Drive (26.49 Mb/s) and Wuala (33.34 Mb/s)
are the fastest, since each TCP connection is
terminated at data centers near our testbed.
• Dropbox and SkyDrive, on the other hand, are the
services most affected by the distance to their U.S. data centers.
Protocol Overhead
• Protocol overhead is the total storage and control
traffic over the benchmark size.

• Amazon Cloud Drive presents a very high
overhead because of its high number of control
flows opened for every file transfer.
• Dropbox exhibits the highest
overhead among the
remaining services, possibly
owing to the signaling cost
of implementing its
advanced capabilities.
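The overhead metric defined on this slide is a simple ratio, sketched below. The function name is hypothetical and the byte counts in the example are made-up numbers used only to show the calculation. They are not measurements from the study.

```python
def protocol_overhead(storage_bytes, control_bytes, benchmark_bytes):
    """Total storage plus control traffic divided by the benchmark size."""
    if benchmark_bytes <= 0:
        raise ValueError("benchmark_bytes must be positive")
    return (storage_bytes + control_bytes) / benchmark_bytes


# Illustrative values only: 1 MB of files, 1.05 MB of storage traffic
# and 50 kB of control traffic give an overhead of 1.1.
ratio = protocol_overhead(
    storage_bytes=1_050_000,
    control_bytes=50_000,
    benchmark_bytes=1_000_000,
)
print(f"{ratio:.2f}")
```

A value of 1.0 would mean that exactly the benchmark size crossed the network. Compression can push the ratio below 1.0, while extra control flows push it above.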
Conclusions
• Dropbox implements most of the checked capabilities, and its
sophisticated client clearly boosts performance.

• Wuala deploys client side encryption, and this feature does
not seem to affect Wuala synchronization performance.
• These 4 examples confirm the role played by data center
placement in a centralized approach. Taking the perspective
of European users only, network latency is still an important
limitation for U.S.-centric services, such as Dropbox and
SkyDrive. Services deploying data centers near our test
location, such as Wuala and Google Drive, have an advantage.
Thank you!

More Related Content

What's hot

Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Doug O'Flaherty
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindAvere Systems
 
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics
 
Building a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsBuilding a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsAvere Systems
 
IBM Spectrum Scale Secure- Secure Data in Motion and Rest
IBM Spectrum Scale Secure- Secure Data in Motion and RestIBM Spectrum Scale Secure- Secure Data in Motion and Rest
IBM Spectrum Scale Secure- Secure Data in Motion and RestSandeep Patil
 
VTU 6th Sem Elective CSE - Module 4 cloud computing
VTU 6th Sem Elective CSE - Module 4  cloud computingVTU 6th Sem Elective CSE - Module 4  cloud computing
VTU 6th Sem Elective CSE - Module 4 cloud computingSachin Gowda
 
Directory Write Leases in MagFS
Directory Write Leases in MagFSDirectory Write Leases in MagFS
Directory Write Leases in MagFSMaginatics
 
Maginatics Cloud Storage Platform - MCSP 3.0 Technical Highlights
Maginatics Cloud Storage Platform - MCSP 3.0 Technical HighlightsMaginatics Cloud Storage Platform - MCSP 3.0 Technical Highlights
Maginatics Cloud Storage Platform - MCSP 3.0 Technical HighlightsMaginatics
 
Geek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An IntroductionGeek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An IntroductionIDERA Software
 
Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to ProductionVinoth Chandar
 
Case Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of DataCase Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of DataSchubert Zhang
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFSDataWorks Summit
 
Introduction to couchbase
Introduction to couchbaseIntroduction to couchbase
Introduction to couchbaseDipti Borkar
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streamsYoni Farin
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersBlueData, Inc.
 
Gpu computing workshop
Gpu computing workshopGpu computing workshop
Gpu computing workshopdatastack
 

What's hot (20)

Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
 
Project Voldemort
Project VoldemortProject Voldemort
Project Voldemort
 
HDFS Selective Wire Encryption
HDFS Selective Wire EncryptionHDFS Selective Wire Encryption
HDFS Selective Wire Encryption
 
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
 
Building a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsBuilding a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for Analysts
 
IBM Spectrum Scale Secure- Secure Data in Motion and Rest
IBM Spectrum Scale Secure- Secure Data in Motion and RestIBM Spectrum Scale Secure- Secure Data in Motion and Rest
IBM Spectrum Scale Secure- Secure Data in Motion and Rest
 
VTU 6th Sem Elective CSE - Module 4 cloud computing
VTU 6th Sem Elective CSE - Module 4  cloud computingVTU 6th Sem Elective CSE - Module 4  cloud computing
VTU 6th Sem Elective CSE - Module 4 cloud computing
 
Directory Write Leases in MagFS
Directory Write Leases in MagFSDirectory Write Leases in MagFS
Directory Write Leases in MagFS
 
Maginatics Cloud Storage Platform - MCSP 3.0 Technical Highlights
Maginatics Cloud Storage Platform - MCSP 3.0 Technical HighlightsMaginatics Cloud Storage Platform - MCSP 3.0 Technical Highlights
Maginatics Cloud Storage Platform - MCSP 3.0 Technical Highlights
 
Geek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An IntroductionGeek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An Introduction
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
 
Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to Production
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
Case Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of DataCase Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of Data
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFS
 
Introduction to couchbase
Introduction to couchbaseIntroduction to couchbase
Introduction to couchbase
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
Gpu computing workshop
Gpu computing workshopGpu computing workshop
Gpu computing workshop
 

Similar to Benchmarking Personal Cloud Storage

Storage side computing for data management in private and public clouds
Storage side computing for data management in private and public cloudsStorage side computing for data management in private and public clouds
Storage side computing for data management in private and public cloudsNoqaiya Ali
 
Overview of Cloud Computing New.pptx
Overview of Cloud Computing New.pptxOverview of Cloud Computing New.pptx
Overview of Cloud Computing New.pptxvishal choudhary
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - PresentationÉric Dusablon
 
Project COLA: Use Case to create a scalable application in the cloud based on...
Project COLA: Use Case to create a scalable application in the cloud based on...Project COLA: Use Case to create a scalable application in the cloud based on...
Project COLA: Use Case to create a scalable application in the cloud based on...Project COLA
 
satya_cloud presentation
satya_cloud presentationsatya_cloud presentation
satya_cloud presentationSatyendra Jain
 
Cloud computing by NADEEM AHMED
Cloud computing by NADEEM AHMEDCloud computing by NADEEM AHMED
Cloud computing by NADEEM AHMEDNA000000
 
An Introduction to FileCatalyst
An Introduction to FileCatalystAn Introduction to FileCatalyst
An Introduction to FileCatalystFileCatalyst
 
Google cloud computing
Google cloud computingGoogle cloud computing
Google cloud computingBrian Pichman
 
IBM Aspera for High Speed Data Migration to Your AWS Cloud - DEM06-S - Anahei...
IBM Aspera for High Speed Data Migration to Your AWS Cloud - DEM06-S - Anahei...IBM Aspera for High Speed Data Migration to Your AWS Cloud - DEM06-S - Anahei...
IBM Aspera for High Speed Data Migration to Your AWS Cloud - DEM06-S - Anahei...Amazon Web Services
 
Clash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft AzureClash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft AzureMihail Mateev
 
Hybird Cloud - An adoption roadmap
Hybird Cloud - An adoption roadmapHybird Cloud - An adoption roadmap
Hybird Cloud - An adoption roadmapJohn Georgiadis
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Cloudian
 
Dimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution OfferingDimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution OfferingRifaHaryadi
 
Cloud computing and data security
Cloud computing and data securityCloud computing and data security
Cloud computing and data securityMohammed Fazuluddin
 
Questions and answers
Questions and answersQuestions and answers
Questions and answersFileCatalyst
 
Acceleration Technology: Taking Media File Transfers From Days to Minutes
Acceleration Technology: Taking Media File Transfers From Days to MinutesAcceleration Technology: Taking Media File Transfers From Days to Minutes
Acceleration Technology: Taking Media File Transfers From Days to MinutesFileCatalyst
 

Similar to Benchmarking Personal Cloud Storage (20)

Storage side computing for data management in private and public clouds
Storage side computing for data management in private and public cloudsStorage side computing for data management in private and public clouds
Storage side computing for data management in private and public clouds
 
Thesis presentation
Thesis presentationThesis presentation
Thesis presentation
 
Overview of Cloud Computing New.pptx
Overview of Cloud Computing New.pptxOverview of Cloud Computing New.pptx
Overview of Cloud Computing New.pptx
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - Presentation
 
Project COLA: Use Case to create a scalable application in the cloud based on...
Project COLA: Use Case to create a scalable application in the cloud based on...Project COLA: Use Case to create a scalable application in the cloud based on...
Project COLA: Use Case to create a scalable application in the cloud based on...
 
satya_cloud presentation
satya_cloud presentationsatya_cloud presentation
satya_cloud presentation
 
cloud ppt 1.pptx
cloud ppt 1.pptxcloud ppt 1.pptx
cloud ppt 1.pptx
 
Cloud computing by NADEEM AHMED
Cloud computing by NADEEM AHMEDCloud computing by NADEEM AHMED
Cloud computing by NADEEM AHMED
 
An Introduction to FileCatalyst
An Introduction to FileCatalystAn Introduction to FileCatalyst
An Introduction to FileCatalyst
 
Google cloud computing
Google cloud computingGoogle cloud computing
Google cloud computing
 
IBM Aspera for High Speed Data Migration to Your AWS Cloud - DEM06-S - Anahei...
IBM Aspera for High Speed Data Migration to Your AWS Cloud - DEM06-S - Anahei...IBM Aspera for High Speed Data Migration to Your AWS Cloud - DEM06-S - Anahei...
IBM Aspera for High Speed Data Migration to Your AWS Cloud - DEM06-S - Anahei...
 
Clash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft AzureClash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft Azure
 
CloudStorage_M.A.Acar
CloudStorage_M.A.AcarCloudStorage_M.A.Acar
CloudStorage_M.A.Acar
 
Hybird Cloud - An adoption roadmap
Hybird Cloud - An adoption roadmapHybird Cloud - An adoption roadmap
Hybird Cloud - An adoption roadmap
 
Dropbox
DropboxDropbox
Dropbox
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
 
Dimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution OfferingDimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution Offering
 
Cloud computing and data security
Cloud computing and data securityCloud computing and data security
Cloud computing and data security
 
Questions and answers
Questions and answersQuestions and answers
Questions and answers
 
Acceleration Technology: Taking Media File Transfers From Days to Minutes
Acceleration Technology: Taking Media File Transfers From Days to MinutesAcceleration Technology: Taking Media File Transfers From Days to Minutes
Acceleration Technology: Taking Media File Transfers From Days to Minutes
 

Recently uploaded

VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 

Recently uploaded (20)

VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization

Benchmarking Personal Cloud Storage

  • 1. Benchmarking Personal Cloud Storage Spyros Eleftheriadis – itp12401@hua.gr Harokopio University of Athens Department of Informatics and Telematics Postgraduate Program "Informatics and Telematics" Modern Computer System Architectures Thursday 16 January 2014
  • 2. Introduction • Services like Dropbox, SkyDrive and Google Drive are becoming pervasive in people’s routine. • Such applications are data-intensive and their increasing usage already produces a significant share of Internet traffic. • Very little is known about how providers implement their services and the implications of different designs.
  • 3. Goals • How do different providers tackle the problem of synchronizing people’s files? • Development of a methodology that helps to understand both system architecture and client capabilities. • What are the consequences of each design on performance?
  • 5. Testbed I • The testbed is composed of two parts: • (i) a test computer that runs the application-under-test in the desired operating system • (ii) the testing application. • The complete environment can run either on a single machine or on separate machines, provided that the testing application can intercept traffic from the test computer.
  • 6. Testbed II • The actual testbed is a single Linux machine that both controls the experiments and hosts a virtual machine running the test computer (Windows 7 Enterprise). • The testing application receives as input benchmarking parameters describing the sequence of operations to be performed. It acts remotely on the test computer, generating specific workloads in the form of file batches, which are manipulated using an FTP client. Files of different types are created or modified at run-time, e.g., text files composed of random words from a dictionary, images with random pixels, or random binary files. • Generated files are synchronized to the cloud by the application-under-test, and the exchanged traffic is monitored to compute performance metrics. These include the amount of traffic seen during the experiments, the time before actual synchronization starts and the time to complete synchronization.
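The workload generation described in Testbed II can be sketched in Python as follows. This is only a minimal illustration, not the testing application itself: the word list, the function names (`random_text`, `random_binary`, `make_batch`) and the file naming scheme are assumptions, and the FTP transfer to the test computer is left out.

```python
import os
import random
import tempfile

# Hypothetical stand-in for the dictionary mentioned on the slide.
WORDS = ["cloud", "storage", "sync", "file", "chunk", "delta", "bundle"]


def random_text(size_bytes, rng):
    """Return exactly size_bytes of ASCII text built from random words."""
    parts, length = [], 0
    while length < size_bytes:
        word = rng.choice(WORDS) + " "
        parts.append(word)
        length += len(word)
    return "".join(parts)[:size_bytes].encode("ascii")


def random_binary(size_bytes, rng):
    """Return size_bytes of pseudo-random bytes."""
    return bytes(rng.getrandbits(8) for _ in range(size_bytes))


def make_batch(directory, count, size_bytes, kind="text", seed=0):
    """Create `count` files of `size_bytes` each and return their paths."""
    rng = random.Random(seed)
    generate = random_text if kind == "text" else random_binary
    extension = "txt" if kind == "text" else "bin"
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"file_{i:03d}.{extension}")
        with open(path, "wb") as handle:
            handle.write(generate(size_bytes, rng))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        batch = make_batch(folder, count=10, size_bytes=10_000, kind="binary")
        print(len(batch), "files,", os.path.getsize(batch[0]), "bytes each")
```

Seeding the generator keeps a batch reproducible across repetitions, which is a design choice of this sketch rather than something the slides state.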
  • 7. Architecture & Data Centers To identify how the analyzed services operate, the authors observe the DNS names of contacted servers: • (i) when starting the application • (ii) immediately after files are manipulated • (iii) when the application is idle Contacted DNS name > IP address > Hybrid methodology > Data center location estimation with a precision of about a hundred kilometers
  • 8. Checking Capabilities Personal cloud storage applications can implement several capabilities to optimize storage usage and to speed up transfers. These capabilities include the adoption of • chunking (i.e., splitting content into data units of a maximum size), • bundling (i.e., the transmission of multiple small files as a single object), • deduplication (i.e., avoiding re-transmitting content already available on servers), • delta encoding (i.e., transmission of only modified portions of a file) and • compression. For each case, a specific test has been designed to observe whether the given capability is implemented.
  • 9. Benchmarking Performance After establishing how the services are designed in terms of both data center locations and system capabilities, the authors check how such choices influence synchronization performance and the amount of overhead traffic. They designed 8 benchmarks varying: • number of files • file sizes • file types • All files in the sets are created at run-time by the testing application. • Each experiment is repeated 24 times per service, allowing at least 5 min between experiments to avoid creating abnormal workloads on the servers. • The benchmark of a single storage service lasts for about 1 day.
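A quick calculation suggests why one service takes about a day. The sketch below assumes that each of the 8 benchmarks counts as one experiment repeated 24 times, and that the 5 min pause is inserted between consecutive runs; both readings are assumptions, since the slides do not spell out the schedule.

```python
BENCHMARKS = 8        # from the slide
REPETITIONS = 24      # from the slide
MIN_PAUSE_MIN = 5     # minimum pause between experiments, from the slide

# Assumption: every benchmark is repeated 24 times for each service.
runs = BENCHMARKS * REPETITIONS

# Assumption: one pause between each pair of consecutive runs.
pause_hours = (runs - 1) * MIN_PAUSE_MIN / 60

print(f"{runs} runs, at least {pause_hours:.1f} h spent waiting")
```

Under these assumptions the pauses alone take roughly 16 hours, so adding the synchronization time of each run is consistent with a total of about 1 day per service.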
  • 10. Tested Storage Services • Dropbox, Google Drive and SkyDrive are selected because they are among the most popular offers (according to the volume of search queries containing names of cloud storage services on Google Trends). • Wuala is considered because it offers encryption at the client side (the authors want to verify the impact of such a privacy layer on synchronization performance). • Amazon Cloud Drive is included to compare its performance to Dropbox, since both services rely on Amazon Web Services (AWS) data centers.
  • 12. Protocols I • All clients use HTTPS, except for the Dropbox notification protocol, which relies on plain HTTP. Interestingly, some Wuala storage operations also use HTTP, since users’ privacy has already been secured by local encryption. • All services but Wuala use separate servers for control and storage.
  • 13. Protocols II Firstly, the applications authenticate the user and check if any content has to be updated. • SkyDrive requires about 150 kB in total, 4 times more than the others. This happens because the application contacts many Microsoft Live servers during login (13 in this example). Secondly, once login is completed, the applications keep exchanging data with the cloud. • Wuala polls servers every 5 min on average (equivalent background traffic of about 60 b/s). • Google Drive follows closely, with a lightweight 40 s polling interval (42 b/s). • Dropbox and SkyDrive use intervals close to 1 min (82 b/s and 32 b/s, respectively). • Amazon Cloud Drive polls servers every 15 s, each time opening a new HTTPS connection. This notification strategy consumes 6 kb/s, i.e., about 65 MB per day. This information is relevant to users with bandwidth constraints (e.g., in 3G/4G networks) and to the system: 1 million users would generate approximately 6 Gb/s of signaling traffic alone! As the results for the other providers demonstrate, such a design is not optimal and could be improved.
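The figures quoted for Amazon Cloud Drive can be checked with a few lines of Python. The sketch assumes that "b/s" means bits per second and that 1 MB is 10^6 bytes; the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400

# Background signaling rates in bits per second, as reported on the slide.
RATES_BPS = {
    "Wuala": 60,
    "Google Drive": 42,
    "Dropbox": 82,
    "SkyDrive": 32,
    "Amazon Cloud Drive": 6_000,
}


def daily_megabytes(rate_bps):
    """Convert a constant bit rate into megabytes per day (1 MB = 10**6 bytes)."""
    return rate_bps * SECONDS_PER_DAY / 8 / 1e6


def aggregate_gbps(rate_bps, users):
    """Total signaling traffic in Gb/s if `users` clients poll at the same rate."""
    return rate_bps * users / 1e9


if __name__ == "__main__":
    for service, rate in RATES_BPS.items():
        print(f"{service}: {daily_megabytes(rate):.2f} MB/day")
    print(aggregate_gbps(RATES_BPS["Amazon Cloud Drive"], 1_000_000), "Gb/s")
```

With these assumptions, 6 kb/s works out to 64.8 MB per day and 1 million users to 6 Gb/s, matching the rounded values on the slide.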
  • 15. Data Centers • Dropbox uses its own servers (in the San Jose area) for client management, while storage servers are hosted by Amazon in Northern Virginia. • Cloud Drive uses three AWS data centers: two are used for both storage and control (in Ireland and Northern Virginia); a third one is used for storage only (in Oregon). • SkyDrive relies on Microsoft’s data centers in the Seattle area (for storage) and Southern Virginia (for storage and control). A destination in Singapore (for control only) was also identified. • Wuala data centers are located in Europe: two in the Nuremberg area, one in Zurich and a fourth in Northern France. None is owned by Wuala. • Google Drive follows a different approach: TCP connections are terminated at the closest Google edge node, from where the traffic is routed to the actual storage/control data center over Google’s private network.
  • 18. Chunking • Only Amazon Cloud Drive does not perform chunking • Google Drive uses 8 MB chunks • Dropbox uses 4 MB chunks • SkyDrive and Wuala use variable chunk sizes
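Fixed-size chunking, as reported for Dropbox and Google Drive, can be illustrated with a short Python sketch. The function name and the choice of 1 MB = 2^20 bytes are assumptions; the slides give only the nominal chunk sizes.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def split_into_chunks(data, chunk_size):
    """Split `data` into consecutive chunks of at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    payload = bytes(10 * MB)  # a 10 MB file of zero bytes
    for label, size in (("4 MB chunks", 4 * MB), ("8 MB chunks", 8 * MB)):
        sizes = [len(chunk) // MB for chunk in split_into_chunks(payload, size)]
        print(label, sizes)
```

A 10 MB file is split into chunks of 4, 4 and 2 MB in the first case and 8 and 2 MB in the second; only the last chunk may be shorter than the maximum.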
  • 19. Bundling • Only Dropbox implements a file-bundling strategy. • Google Drive and Amazon Cloud Drive open one separate TCP (and SSL) connection for each file. • SkyDrive and Wuala submit files sequentially, waiting for application-layer acknowledgments between uploads.
  • 20. Client-side Deduplication • Only Dropbox and Wuala implement deduplication. • All other services have to upload the same data even if it is readily available at the storage server. Interestingly, Dropbox and Wuala can identify copies of users’ files even after they are deleted and later restored. • In the case of Wuala, deduplication is compatible with local encryption, i.e., two identical files generate two identical encrypted versions.
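Client-side deduplication can be sketched as a hash lookup before upload. The example below is a generic illustration, not the protocol of Dropbox or Wuala: the `DedupStore` class, the `upload` function and the use of SHA-256 are all assumptions.

```python
import hashlib


class DedupStore:
    """Hypothetical storage server that indexes content by its digest."""

    def __init__(self):
        self._blobs = {}
        self.bytes_uploaded = 0

    def has(self, digest):
        return digest in self._blobs

    def put(self, digest, data):
        self._blobs[digest] = data
        self.bytes_uploaded += len(data)


def upload(store, data):
    """Send `data` only if the server does not already hold identical content.

    Returns True when the content was actually transmitted.
    """
    digest = hashlib.sha256(data).hexdigest()
    if store.has(digest):
        return False  # only the digest crossed the network
    store.put(digest, data)
    return True


if __name__ == "__main__":
    server = DedupStore()
    content = b"holiday photo" * 1000
    print(upload(server, content))  # first copy is transmitted
    print(upload(server, content))  # identical copy is skipped
    print(server.bytes_uploaded)
```

Because the index in this sketch is never purged, a file that is deleted locally and later restored would also be recognized, which mirrors the behavior the slide reports for Dropbox and Wuala.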
  • 21. Delta Encoding • Delta encoding is a specialized compression technique that calculates the differences between two copies of a file, allowing the transmission of only the modifications between revisions. • Only Dropbox fully implements delta encoding. • Wuala does not implement delta encoding. However, deduplication prevents the client from uploading those chunks not affected by the change.
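The Wuala remark, that chunk-level deduplication avoids re-sending unchanged chunks, can be illustrated as follows. This is a sketch under stated assumptions: it uses fixed-size chunks and SHA-256 for simplicity, whereas the slides report variable chunk sizes for Wuala and do not name a hash function.

```python
import hashlib


def chunk_digests(data, chunk_size):
    """Return the SHA-256 digest of every fixed-size chunk of `data`."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(old, new, chunk_size):
    """Indices of chunks in `new` whose content is not already known from `old`."""
    known = set(chunk_digests(old, chunk_size))
    return [
        index
        for index, digest in enumerate(chunk_digests(new, chunk_size))
        if digest not in known
    ]


if __name__ == "__main__":
    original = b"AAAA" + b"BBBB" + b"CCCC" + b"DDDD"
    edited = original[:5] + b"X" + original[6:]   # change one byte in chunk 1
    shifted = b"X" + original                     # insert one byte at the front
    print(chunks_to_upload(original, edited, 4))
    print(chunks_to_upload(original, shifted, 4))
```

An in-place change touches a single chunk, while an insertion at the front shifts every fixed-size chunk boundary and forces all chunks to be re-sent. Full delta encoding would transmit only the inserted byte in that second case.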
  • 23. Compression • Compression reduces traffic and storage requirements at the expense of processing time. • Dropbox and Google Drive compress data before transmission. • Google Drive implements smart policies to verify the file format.
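The trade-off can be made concrete with Python's standard `zlib` module. The slides do not say which algorithm or policy the services use, so both zlib and the "send compressed only if smaller" rule in `maybe_compress` are assumptions chosen for illustration.

```python
import os
import zlib


def maybe_compress(data):
    """Return (payload, compressed) and compress only when it saves bytes."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return packed, True
    return data, False


if __name__ == "__main__":
    text = b"cloud storage sync " * 1000   # highly redundant, like dictionary text
    noise = os.urandom(10_000)             # random bytes, like the random binary files
    for name, sample in (("text", text), ("random", noise)):
        payload, compressed = maybe_compress(sample)
        print(name, len(sample), "->", len(payload), "compressed:", compressed)
```

Redundant text shrinks considerably, while random bytes do not, so a client that compresses unconditionally spends processing time on such content for no gain.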
  • 25. Synchronization Startup • Dropbox is the fastest service to start synchronizing single files. Its bundling strategy, however, slightly delays startup with multiple files. As the completion times show, this strategy pays off in total upload time. • Wuala also increases its startup time when multiple files are submitted. • SkyDrive is by far the slowest.
  • 26. Completion Time • Google Drive (26.49 Mb/s) and Wuala (33.34 Mb/s) are the fastest, since each TCP connection is terminated at data centers near the testbed. • Dropbox and SkyDrive, on the other hand, are the most impacted services.
  • 27. Protocol Overhead • Protocol overhead is the total storage and control traffic divided by the benchmark size. • Amazon Cloud Drive presents a very high overhead because of the high number of control flows it opens for every file transfer. • Dropbox exhibits the highest overhead among the remaining services, possibly owing to the signaling cost of implementing its advanced capabilities.
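The overhead metric defined on this slide is a simple ratio. The sketch below implements it directly; the byte counts in the example are made-up values used only to show the calculation, not measurements from the study.

```python
def protocol_overhead(storage_bytes, control_bytes, benchmark_bytes):
    """Total storage and control traffic divided by the benchmark size."""
    if benchmark_bytes <= 0:
        raise ValueError("benchmark_bytes must be positive")
    return (storage_bytes + control_bytes) / benchmark_bytes


if __name__ == "__main__":
    # Hypothetical numbers: a 1 MB benchmark that produced 1.05 MB of
    # storage traffic and 0.15 MB of control traffic.
    ratio = protocol_overhead(1_050_000, 150_000, 1_000_000)
    print(f"overhead ratio: {ratio:.2f}")
```

A ratio of 1.0 would mean that exactly the benchmark bytes were exchanged; values above 1.0 indicate extra protocol traffic, and compression can push the ratio below 1.0.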
  • 28. Conclusions • Dropbox implements most of the checked capabilities, and its sophisticated client clearly boosts performance. • Wuala deploys client-side encryption, and this feature does not seem to affect its synchronization performance. • These 4 examples confirm the role played by data center placement in a centralized approach. Taking the perspective of European users only, network latency is still an important limitation for U.S.-centric services, such as Dropbox and SkyDrive. Services deploying data centers near the test location, such as Wuala and Google Drive, complete synchronization fastest.