Benchmarking Personal Cloud
Storage

Spyros Eleftheriadis – itp12401@hua.gr
Harokopio University of Athens
Department of Informatics and Telematics
Postgraduate Program "Informatics and Telematics"
Modern Computer System Architectures

Thursday 16 January 2014
Introduction
• Services like Dropbox, SkyDrive and Google
Drive are becoming pervasive in people’s
routine.
• Such applications are data-intensive and their
increasing usage already produces a significant
share of Internet traffic.
• Very little is known about how providers
implement their services and the implications of
different designs.
Goals
• How do different providers tackle the problem of
synchronizing people’s files?
• Development of a methodology that helps to
understand both system architecture and client
capabilities.

• What are the consequences of each design on
performance?
METHODOLOGY AND
SERVICES
Testbed I
• The testbed is composed of two parts
• (i) a test computer that runs the application-under-test in the desired
operating system
• (ii) the testing application.
• The complete environment can run either in a single machine, or in separate
machines provided that the testing application can intercept traffic from the
test computer.
Testbed II
• The actual testbed is a single Linux machine, which both controls the experiments
and hosts a virtual machine that acts as the test computer (Windows 7
Enterprise).
• The testing application receives as input benchmarking parameters
describing the sequence of operations to be performed. The testing
application acts remotely on the test computer, generating specific
workloads in the form of file batches, which are manipulated using an FTP
client. Files of different types are created or modified at run-time, e.g.,
text files composed of random words from a dictionary, images with
random pixels, or random binary files.
• Generated files are synchronized to the cloud by the application-under-test and the exchanged traffic is monitored to compute performance
metrics. These include the amount of traffic seen during the experiments,
the time before actual synchronization starts and the time to complete
synchronization.
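The workload generation described above can be sketched as follows. This is only an illustration, not the authors' tool: the function names, the tiny word list standing in for a dictionary, and the seed parameter are all assumptions, and the FTP transfer to the test computer is left out.

```python
import os
import random

# Hypothetical stand-in for a real dictionary file.
WORDS = ["cloud", "storage", "sync", "file", "chunk", "delta", "bundle", "traffic"]

def random_text(size_bytes, rng):
    """Build exactly size_bytes of text from random dictionary words."""
    parts = []
    length = 0
    while length < size_bytes:
        word = rng.choice(WORDS) + " "
        parts.append(word)
        length += len(word)
    return "".join(parts)[:size_bytes].encode("ascii")

def random_binary(size_bytes, rng):
    """Build size_bytes of random (hence poorly compressible) bytes."""
    return bytes(rng.getrandbits(8) for _ in range(size_bytes))

def make_batch(directory, count, size_bytes, kind, seed=0):
    """Create `count` files of `size_bytes` each and return their paths."""
    rng = random.Random(seed)
    maker = random_text if kind == "text" else random_binary
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"{kind}_{i:03d}.dat")
        with open(path, "wb") as fh:
            fh.write(maker(size_bytes, rng))
        paths.append(path)
    return paths
```

Creating the files at run-time from a seeded generator keeps a batch reproducible while still giving every file different content.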
Architecture & Data Centers
To identify how the analyzed services operate, they observe the
DNS names of contacted servers when:

• (i) starting the application
• (ii) immediately after files are manipulated
• (iii) when the application is in idle state
Contacted DNS name > IP address > Hybrid methodology >
Data center location estimate, precise to within about
a hundred kilometers
Checking Capabilities
Personal cloud storage applications can implement several capabilities
to optimize storage usage and to speed up transfers.
These capabilities include the adoption of
• chunking (i.e., splitting content into a maximum size data unit),
• bundling (i.e., the transmission of multiple small files as a single
object),
• deduplication (i.e., avoiding re-transmitting content already available
on servers),
• delta encoding (i.e., transmission of only modified portions of a file)
and
• compression.
For each case, a specific test has been designed to observe if the given
capability is implemented.
Benchmarking Performance
After establishing how the services are designed in terms of both data center
locations and system capabilities, we check how these choices influence
synchronization performance and the amount of overhead traffic.
They designed 8 benchmarks varying:
• number of files
• file sizes
• file types
• All files in the sets are created at run-time by our testing application.
• Each experiment is repeated 24 times per service, allowing at least 5 min
between experiments to avoid placing an abnormal workload on the servers.
• The benchmark of a single storage service lasts for about 1 day.
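A quick back-of-the-envelope check shows why one service takes about a day. The sketch below assumes that each of the 8 benchmarks is what gets repeated 24 times, and it counts only the mandatory pauses; the run time of a single experiment is not stated here, so it is left as a parameter (`run_min`, a hypothetical name).

```python
def min_benchmark_hours(benchmarks=8, repetitions=24, pause_min=5, run_min=0.0):
    """Lower bound on the duration of one service's benchmark, in hours."""
    experiments = benchmarks * repetitions
    return experiments * (pause_min + run_min) / 60

print(min_benchmark_hours())  # 16.0 hours of pauses alone
```

With 192 experiments, the pauses alone account for 16 hours, so a total of about one day is plausible once the experiments themselves are included.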
Tested Storage Services
• Dropbox, Google Drive and SkyDrive are selected
because they are among the most popular offers
(according to the volume of search queries containing
names of cloud storage services on Google Trends).
• Wuala is considered because it is a system that offers
encryption at the client-side. (They want to verify the
impact of such a privacy layer on synchronization
performance.)
• Amazon Cloud Drive is included to compare its
performance to Dropbox, since both services rely on
Amazon Web Services (AWS) data centers.
SYSTEM ARCHITECTURE
Protocols I
• All clients use HTTPS, except for Dropbox's
notification protocol, which relies on plain
HTTP. Interestingly, some Wuala storage
operations also use HTTP, since users’ privacy
has already been secured by local encryption.
• All services but Wuala use separate servers for
control and storage.
Protocols II
Firstly, the applications authenticate the user and check if any content has to be
updated.
• SkyDrive requires about 150 kB in total, 4 times more than others. This happens
because the application contacts many Microsoft Live servers during login (13 in this
example).
Secondly, once login is completed, the applications keep exchanging data with the
cloud.
• Wuala is polling servers every 5 min on average (equivalent background traffic of about
60 b/s).
• Google Drive follows close, with a lightweight 40 s polling interval (42 b/s).
• Dropbox and SkyDrive use intervals close to 1 min (82 b/s and 32 b/s, respectively).
• Amazon Cloud Drive polls servers every 15 s, opening a new HTTPS
connection each time. This notification strategy consumes 6 kb/s, i.e., about 65 MB per day. This
information is relevant to users with bandwidth constraints (e.g., in 3G/4G networks)
and to the system: 1 million users would generate approximately 6 Gb/s of
signaling traffic alone! As the results for the other providers demonstrate,
this design is not optimal and could be improved.
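The figures above can be checked with a few lines of arithmetic. The sketch assumes that "b/s" means bits per second and that a megabyte is 10^6 bytes; the function names are illustrative.

```python
def daily_volume_mb(rate_bits_per_s):
    """Megabytes (10**6 bytes) transferred per day at a constant bit rate."""
    return rate_bits_per_s * 86_400 / 8 / 1e6

def aggregate_gbps(rate_bits_per_s, users):
    """Aggregate rate in Gb/s if every user generates the same background rate."""
    return rate_bits_per_s * users / 1e9

# Background rates quoted above, in bits per second.
rates = {"Wuala": 60, "Google Drive": 42, "Dropbox": 82, "SkyDrive": 32,
         "Amazon Cloud Drive": 6_000}
for name, rate in rates.items():
    print(f"{name}: {daily_volume_mb(rate):.2f} MB/day")
```

At 6 kb/s the daily volume comes to 64.8 MB, consistent with the "about 65 MB" quoted above, while the other services stay below 1 MB per day.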
Protocols III
Data Centers
• Dropbox uses its own servers (in the San Jose area) for client management,
while storage is delegated to Amazon servers in Northern Virginia.
• Cloud Drive uses three AWS data centers: two are used for both storage
and control (in Ireland and Northern Virginia); a third one is used for storage
only (in Oregon).
• SkyDrive relies on Microsoft’s data centers in the Seattle area (for storage)
and Southern Virginia (for storage and control). We also identified a
destination in Singapore (for control only).
• Wuala data centers are located in Europe: two in the Nuremberg area, one
in Zurich and a fourth in Northern France. None is owned by Wuala.
• Google Drive follows a different approach: TCP connections are terminated
at the closest Google edge node, from where the traffic is routed to the
actual storage/control data center over Google's private network.
Google Drive Edge Nodes
CLOUD SERVICE
CLIENT CAPABILITIES
Chunking
• Only Amazon Cloud Drive does not perform chunking

• Google Drive uses 8 MB chunks
• Dropbox uses 4 MB chunks
• SkyDrive and Wuala use variable chunk sizes
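Fixed-size chunking as used by Dropbox and Google Drive can be sketched in a few lines. The function name is illustrative, and reading "4 MB" as 4 × 1024 × 1024 bytes is an assumption.

```python
def split_into_chunks(data, chunk_size=4 * 1024 * 1024):
    """Split bytes into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```

Under this scheme a 10 MB file would travel as three chunks of 4, 4 and 2 MB, and a failed transfer only needs to resend the affected chunk rather than the whole file.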
Bundling
• Only Dropbox implements a file-bundling strategy

• Google Drive and Amazon Cloud Drive open one
separate TCP (and SSL) connection for each file.
• SkyDrive and Wuala submit files sequentially, waiting
for application layer acknowledgments between each
upload.
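The idea behind bundling is that many small files are packed into one object and sent in a single transfer, instead of paying connection setup or acknowledgment delays per file. The sketch below uses an in-memory tar archive purely as a stand-in; the actual bundle format used by Dropbox is not described here.

```python
import io
import tarfile

def bundle(files):
    """Pack a {name: bytes} mapping into one in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()

def unbundle(blob):
    """Recover the {name: bytes} mapping from a bundle."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        return {m.name: archive.extractfile(m).read() for m in archive.getmembers()}
```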
Client-side Deduplication
• Only Dropbox and Wuala implement deduplication.
• All other services have to upload the same data
even if it is readily available at the storage server.
Interestingly, Dropbox and Wuala can identify
copies of users’ files even after they are deleted
and later restored.
• In the case of Wuala, deduplication is compatible
with local encryption, i.e., two identical files
generate two identical encrypted versions.
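One common way to realize deduplication is to identify content by a cryptographic hash and to upload only content whose hash the server does not yet know. The toy model below assumes SHA-256 and an in-memory store; how Dropbox and Wuala actually detect duplicates is not specified here.

```python
import hashlib

class DedupStore:
    """Toy server that stores each distinct content only once."""

    def __init__(self):
        self.blobs = {}          # digest -> content
        self.uploaded_bytes = 0  # bytes that actually crossed the "network"

    def upload(self, content):
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.blobs:      # unknown content: transfer it
            self.blobs[digest] = content
            self.uploaded_bytes += len(content)
        return digest                     # known content: only the digest is needed
```

Because the store keeps content keyed by digest, a file that is deleted on the client and later restored can be recognized without a second upload, as long as the server has not discarded the blob.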
Delta Encoding
• Delta encoding is a specialized compression
technique that calculates the differences between
two copies of a file, allowing the transmission of only the
modifications between revisions.

• Only Dropbox fully implements delta encoding
• Wuala does not implement delta encoding.
However, deduplication prevents the client from
uploading those chunks not affected by the
change.
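The principle of delta encoding can be illustrated with a byte-level sketch built on Python's `difflib`. This is not the algorithm any of the tested services uses; it only shows that a revision can be described as "copy" references to the old version plus the newly inserted bytes.

```python
from difflib import SequenceMatcher

def make_delta(old, new):
    """Describe `new` as copy/insert operations relative to `old`."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # bytes the server already has
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # only these bytes are transmitted
        # "delete" needs no data: the old bytes are simply not copied
    return ops

def apply_delta(old, ops):
    """Rebuild the new revision from the old one and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

When a few bytes are inserted in the middle of a file, the delta carries only those bytes plus two copy references, whereas chunk-level deduplication alone would resend every chunk touched by the change.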
Delta encoding
Compression
• Compression reduces traffic and storage
requirements at the expense of processing time.
• Dropbox and Google Drive compress data before
transmission.
• Google Drive implements smart policies to verify
the file format.
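Whether compression pays off depends on the content, which is why the benchmarks use both text and random binary files. The sketch below uses `zlib` as a stand-in for whatever codec the services apply, with random bytes standing in for content that is already compressed.

```python
import os
import zlib

def compression_ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)

text = b"benchmarking personal cloud storage " * 1000
noise = os.urandom(len(text))   # stands in for already-compressed content

print(f"text:  {compression_ratio(text):.2f}")
print(f"noise: {compression_ratio(noise):.2f}")
```

Repetitive text shrinks to a small fraction of its size, while random bytes do not shrink at all, so a client that checks the file format first can avoid spending processing time on content that will not compress.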
PERFORMANCE
Synchronization Startup
• Dropbox is the fastest service to start
synchronizing single files. Its bundling strategy,
however, slightly delays startup with multiple files.
As we will show next, this strategy pays off in
total upload time.
• Wuala also increases its startup time when
multiple files are submitted.
• SkyDrive is by far
the slowest.
Completion Time
• Google Drive (26.49 Mb/s) and Wuala (33.34 Mb/s)
are the fastest, since each TCP connection is
terminated at data centers near our testbed.
• Dropbox and SkyDrive, on the other hand, are the
most impacted services.
Protocol Overhead
• Protocol overhead is the total storage and control
traffic divided by the benchmark size.

• Amazon Cloud Drive presents a very high
overhead because of its high number of control
flows opened for every file transfer.
• Dropbox exhibits the highest
overhead among the
remaining services, possibly
owing to the signaling cost
of implementing its
advanced capabilities.
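The overhead metric defined above can be written as a one-line ratio. The function name and the example byte counts are hypothetical; they are not measurements from the study.

```python
def protocol_overhead(storage_bytes, control_bytes, benchmark_bytes):
    """Total observed traffic divided by the size of the benchmark files."""
    if benchmark_bytes <= 0:
        raise ValueError("benchmark size must be positive")
    return (storage_bytes + control_bytes) / benchmark_bytes

# Made-up example: 1 MB of files causing 1.05 MB of storage traffic
# and 0.05 MB of control traffic.
print(protocol_overhead(1_050_000, 50_000, 1_000_000))
```

A value of 1.0 would mean that no byte beyond the file content was exchanged; anything above that is protocol and signaling cost, and compression or deduplication can push the ratio below 1.0.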
Conclusions
• Dropbox implements most of the checked capabilities, and its
sophisticated client clearly boosts performance.

• Wuala deploys client side encryption, and this feature does
not seem to affect Wuala synchronization performance.
• These 4 examples confirm the role played by data center
placement in a centralized approach. Taking the perspective
of European users only, network latency is still an important
limitation for U.S.-centric services, such as Dropbox and
SkyDrive. Services deploying data centers near our test
location, such as Wuala and Google Drive, complete synchronization faster.
Thank you!

 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 

Benchmarking Personal Cloud Storage

  • 1. Benchmarking Personal Cloud Storage Spyros Eleftheriadis – itp12401@hua.gr Harokopio University of Athens Department of Informatics and Telematics Postgraduate Program "Informatics and Telematics" Modern Computer System Architectures Thursday 16 January 2014
  • 2. Introduction • Services like Dropbox, SkyDrive and Google Drive are becoming pervasive in people’s routine. • Such applications are data-intensive and their increasing usage already produces a significant share of Internet traffic. • Very little is known about how providers implement their services and the implications of different designs.
  • 3. Goals • How do different providers tackle the problem of synchronizing people’s files? • Development of a methodology that helps to understand both system architecture and client capabilities. • What are the consequences of each design on performance?
  • 5. Testbed I • The testbed is composed of two parts: • (i) a test computer that runs the application-under-test in the desired operating system • (ii) the testing application. • The complete environment can run either on a single machine or on separate machines, provided that the testing application can intercept traffic from the test computer.
  • 6. Testbed II • The actual testbed is a single Linux machine which both controls the experiments and hosts a virtual machine that runs the test computer (Windows 7 Enterprise). • The testing application receives as input benchmarking parameters describing the sequence of operations to be performed. The testing application acts remotely on the test computer, generating specific workloads in the form of file batches, which are manipulated using an FTP client. Files of different types are created or modified at run-time, e.g., text files composed of random words from a dictionary, images with random pixels, or random binary files. • Generated files are synchronized to the cloud by the application-under-test and the exchanged traffic is monitored to compute performance metrics. These include the amount of traffic seen during the experiments, the time before actual synchronization starts and the time to complete synchronization.
  • 7. Architecture & Data Centers To identify how the analyzed services operate, the authors observe the DNS names of contacted servers: • (i) when starting the application • (ii) immediately after files are manipulated • (iii) when the application is in idle state Contacted DNS name > IP address > Hybrid methodology > Data center location estimate with a precision of about a hundred kilometers
  • 8. Checking Capabilities Personal cloud storage applications can implement several capabilities to optimize storage usage and to speed up transfers. These capabilities include the adoption of • chunking (i.e., splitting content into data units of a maximum size), • bundling (i.e., the transmission of multiple small files as a single object), • deduplication (i.e., avoiding re-transmitting content already available on servers), • delta encoding (i.e., transmission of only modified portions of a file) and • compression. For each case, a specific test has been designed to observe whether the given capability is implemented.
  • 9. Benchmarking Performance Once it is known how the services are designed in terms of both data center locations and system capabilities, the authors check how such choices influence synchronization performance and the amount of overhead traffic. They designed 8 benchmarks, varying: • number of files • file sizes • file types • All files in the sets are created at run-time by the testing application. • Each experiment is repeated 24 times per service, allowing at least 5 min between experiments to avoid creating abnormal workloads on the servers. • The benchmark of a single storage service lasts for about 1 day.
  • 10. Tested Storage Services • Dropbox, Google Drive and SkyDrive are selected because they are among the most popular offers (according to the volume of search queries containing names of cloud storage services on Google Trends). • Wuala is considered because it is a system that offers encryption at the client-side. (They want to verify the impact of such a privacy layer on synchronization performance). • Amazon Cloud Drive is included to compare its performance to Dropbox, since both services rely on Amazon Web Services (AWS) data centers.
  • 12. Protocols I • All clients use HTTPS, except the Dropbox notification protocol, which relies on plain HTTP. Interestingly, some Wuala storage operations also use HTTP, since users’ privacy has already been secured by local encryption. • All services but Wuala use separate servers for control and storage.
  • 13. Protocols II Firstly, the applications authenticate the user and check if any content has to be updated. • SkyDrive requires about 150 kB in total, 4 times more than the others. This happens because the application contacts many Microsoft Live servers during login (13 in this example). Secondly, once login is completed, the applications keep exchanging data with the cloud. • Wuala polls servers every 5 min on average (equivalent background traffic of about 60 b/s). • Google Drive follows closely, with a lightweight 40 s polling interval (42 b/s). • Dropbox and SkyDrive use intervals close to 1 min (82 b/s and 32 b/s, respectively). • Amazon Cloud Drive polls servers every 15 s, each time opening a new HTTPS connection. This notification strategy consumes 6 kb/s – i.e., about 65 MB per day. This information is relevant to users with bandwidth constraints (e.g., in 3G/4G networks) and to the system: 1 million users would generate approximately 6 Gb/s of signaling traffic alone! As the results for the other providers demonstrate, such a design is not optimal and can clearly be improved.
  • 15. Data Centers • Dropbox uses its own servers (in the San Jose area) for client management, while storage servers are committed to Amazon in Northern Virginia. • Cloud Drive uses three AWS data centers: two are used for both storage and control (in Ireland and Northern Virginia); a third one is used for storage only (in Oregon). • SkyDrive relies on Microsoft’s data centers in the Seattle area (for storage) and Southern Virginia (for storage and control). We also identified a destination in Singapore (for control only). • Wuala data centers are located in Europe: two in the Nuremberg area, one in Zurich and a fourth in Northern France. None is owned by Wuala. • Google Drive follows a different approach: TCP connections are terminated at the closest Google edge node, from where the traffic is routed to the actual storage/control data center over Google’s private network.
  • 18. Chunking • Only Amazon Cloud Drive does not perform chunking • Google Drive uses 8 MB chunks • Dropbox uses 4 MB chunks • SkyDrive and Wuala use variable chunk sizes
  • 19. Bundling • Only Dropbox implements a file-bundling strategy. • Google Drive and Amazon Cloud Drive open one separate TCP (and SSL) connection for each file. • SkyDrive and Wuala submit files sequentially, waiting for application-layer acknowledgments between uploads.
  • 20. Client-side Deduplication • Only Dropbox and Wuala implement deduplication. • All other services have to upload the same data even if it is readily available at the storage server. Interestingly, Dropbox and Wuala can identify copies of users’ files even after they are deleted and later restored. • In the case of Wuala, deduplication is compatible with local encryption, i.e., two identical files generate two identical encrypted versions.
  • 21. Delta Encoding • Delta encoding is a specialized compression technique that calculates file differences between two copies, allowing the transmission of only the modifications between revisions. • Only Dropbox fully implements delta encoding. • Wuala does not implement delta encoding. However, deduplication prevents the client from uploading those chunks not affected by the change.
  • 23. Compression • Compression reduces traffic and storage requirements at the expense of processing time. • Dropbox and Google Drive compress data before transmission. • Google Drive implements smart policies to verify the file format before deciding whether to compress it.
  • 25. Synchronization Startup • Dropbox is the fastest service to start synchronizing single files. Its bundling strategy, however, slightly delays startup with multiple files. As we will show next, this strategy pays off in total upload time. • Wuala also increases its startup time when multiple files are submitted. • SkyDrive is by far the slowest.
  • 26. Completion Time • Google Drive (26.49 Mb/s) and Wuala (33.34 Mb/s) are the fastest, since each TCP connection is terminated at data centers near our testbed. • Dropbox and SkyDrive, on the other hand, are the services most impacted by the distance to their U.S. data centers.
  • 27. Protocol Overhead • Protocol overhead is the total storage and control traffic divided by the benchmark size. • Amazon Cloud Drive presents a very high overhead because of the high number of control flows it opens for every file transfer. • Dropbox exhibits the highest overhead among the remaining services, possibly owing to the signaling cost of implementing its advanced capabilities.
  • 28. Conclusions • Dropbox implements most of the checked capabilities, and its sophisticated client clearly boosts performance. • Wuala deploys client-side encryption, and this feature does not seem to affect Wuala synchronization performance. • These 4 examples confirm the role played by data center placement in a centralized approach: taking the perspective of European users only, network latency is still an important limitation for U.S.-centric services, such as Dropbox and SkyDrive. Services deploying data centers near our test location, such as Wuala and Google Drive, complete synchronization faster.
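
The signaling figures quoted on slide 13 (Protocols II) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the rate table are hypothetical, the rates themselves are the ones listed on the slide, and it assumes a constant background rate and 1 MB = 10^6 bytes.

```python
# Back-of-the-envelope check of the idle signaling traffic from slide 13.
# Assumptions (not from the slides): constant rate, 1 MB = 10**6 bytes,
# and hypothetical helper names chosen for this illustration.

SECONDS_PER_DAY = 24 * 60 * 60

# Background traffic per idle client in bit/s, as quoted on the slide.
BACKGROUND_RATE_BPS = {
    "Wuala": 60,
    "Google Drive": 42,
    "Dropbox": 82,
    "SkyDrive": 32,
    "Amazon Cloud Drive": 6000,  # 6 kb/s
}


def daily_volume_mb(rate_bits_per_s: float) -> float:
    """Megabytes transferred per day by one client at a constant bit rate."""
    bytes_per_day = rate_bits_per_s * SECONDS_PER_DAY / 8
    return bytes_per_day / 1e6


def aggregate_gbps(rate_bits_per_s: float, users: int) -> float:
    """Total signaling rate in Gb/s when `users` clients idle simultaneously."""
    return rate_bits_per_s * users / 1e9


if __name__ == "__main__":
    for service, rate in BACKGROUND_RATE_BPS.items():
        print(
            f"{service}: {daily_volume_mb(rate):.2f} MB/day per client, "
            f"{aggregate_gbps(rate, 1_000_000):.3f} Gb/s for 1 million users"
        )
```

For Amazon Cloud Drive this gives 64.8 MB per day per client and 6 Gb/s for one million users, which matches the "about 65 MB per day" and "approximately 6 Gb/s" stated on the slide.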