S3
Intro, tips and filling it up with data quickly
AWS User Group Belgium #2 - 2013/11/06

@fdenkens
frederik@skyscrape.rs
http://skyscrape.rs
The Skyscrapers ...
● help companies figure out cloud
● design and build platforms in the cloud
● take care of the complete lifecycle, so you can focus on your business
S3 in a nutshell
What is S3
● object storage architecture
● abstraction from storage (black) magic
● operates at object-level (aka ‘file’, blob, ...)
● Simple API
Benefits
● Scalability
● High-availability
● Low cost
● 99.999999999% durability
● Secure
● Low latency/high-speed
Some advantages for webapps
● Easier to scale apps
● Cleaner apps
● Better ‘mobility’ of your app
● Simpler hosting platforms
● No storage worries
Use cases
● Asset storage and CDN
● Data storage
● Static site
● Backups
● Mobile storage backend
● File distribution
● ...
Buckets
● Collection of objects
● Globally unique id
● a-z A-Z 0-9 .
● Max 100 buckets/user
● No limit on number of objects
Buckets
Best practices on naming
● DNS compatible
● FQDN
○ Allows for vhost
○ watch out for SSL: no dots :-(
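The naming advice above can be turned into a small check. This is a minimal sketch, assuming the usual DNS-compatible rules (3 to 63 characters, lowercase letters, digits, hyphens and dots, not shaped like an IP address). The function name `check_bucket_name` is made up for illustration and is not part of any AWS tool.

```python
import re

# Lowercase labels of letters, digits and hyphens, separated by single dots.
_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
_DNS_NAME = re.compile(rf"^{_LABEL}(?:\.{_LABEL})*$")
_IP_LIKE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def check_bucket_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name looks fine."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if not _DNS_NAME.match(name):
        problems.append("not DNS compatible (lowercase letters, digits, hyphens, dots)")
    if _IP_LIKE.match(name):
        problems.append("must not look like an IP address")
    if "." in name:
        # The wildcard certificate *.s3.amazonaws.com covers a single label only.
        problems.append("contains dots: HTTPS on bucket.s3.amazonaws.com fails certificate checks")
    return problems


print(check_bucket_name("my-assets"))
print(check_bucket_name("assets.example.com"))
```

An FQDN-style name passes the DNS rules but still triggers the dots warning, which is the trade-off the slide points at: vhost-style hosting or clean SSL, not both.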
Objects
● Blob
● Don’t care about file formats
● Metadata can be added (like mimetype)
● Maximum 5 TB/object
Keys
● = Name
● max 1024 chars
● UTF8
Accessing your data
● ARN
○ arn:aws:s3:::bucketname
○ arn:aws:s3:::bucketname/objectpath
● HTTP
○ http://s3.amazonaws.com/bucket/key
○ http://bucket.s3.amazonaws.com/key
○ http://bucket/key (vhost style)
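As an illustration of the addressing forms on this slide, here is a small sketch that builds each of them for a bucket and key. The function name and the dictionary labels (`path_style`, `subdomain_style`, `vhost_style`) are assumptions chosen for this example; only "vhost style" is the slide's own term.

```python
from urllib.parse import quote


def s3_addresses(bucket: str, key: str) -> dict[str, str]:
    """Build the ARN and the three HTTP forms listed on the slide."""
    path = quote(key, safe="/")  # percent-encode the key, keep '/' separators
    return {
        "arn": f"arn:aws:s3:::{bucket}/{key}",
        "path_style": f"http://s3.amazonaws.com/{bucket}/{path}",
        "subdomain_style": f"http://{bucket}.s3.amazonaws.com/{path}",
        # Only resolves if the bucket name is a hostname with a CNAME to S3.
        "vhost_style": f"http://{bucket}/{path}",
    }


for label, address in s3_addresses("assets.example.com", "img/logo 1.png").items():
    print(label, address)
```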
Performance tip

s3.amazonaws.com > s3-eu-west-1.amazonaws.com
Getting data on S3
Getting data on S3
● Through AWS services
● Tools
● Libraries
● Filesystem mapping
● Direct from client (pre-signed URL’s)
CMD line tools
● s3cmd
● s3-multipart
● s3funnel
● ...
Getting data on S3: what matters?
Location
● bandwidth
● latency
Parallelization
● Multiple upload-threads
● Multipart
Limit SSL
● Negotiation overhead
● Encryption overhead on smaller instances
Instance type: I/O matters
IO class     Theoretical speed
Low          100 Mbit (?)
Moderate     250 - 500 Mbit
High         1 Gbit
Very High    10 Gbit
Other things
● Network stack optimisations
● Tool/method of upload
Performance tests
Parameters
● 1 GB blob
● 25 MB parts
● single region
● various IO classes
● 1, 10, 40, 50 threads
● only upload
● s3-multipart tool
● standard OS install
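To make the "1 GB blob, 25 MB parts" parameters concrete, the sketch below splits a blob into part ranges. It assumes binary units (1 GB = 1024 MB), which the slides do not specify. `plan_parts` is a hypothetical helper, not a function of the s3-multipart tool.

```python
def plan_parts(total_size: int, part_size: int) -> list[tuple[int, int, int]]:
    """Return (part_number, offset, length) for each part of the blob."""
    if total_size <= 0 or part_size <= 0:
        raise ValueError("sizes must be positive")
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)  # last part may be shorter
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


MB = 1024 * 1024  # assumption: binary megabytes
parts = plan_parts(1024 * MB, 25 * MB)
print(len(parts), parts[-1][2] // MB)  # prints "41 24"
```

Under that assumption the test blob becomes 40 full parts plus one 24 MB remainder, so there are enough parts to keep 40 upload threads busy at once.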
Demo time
Some numbers
(Megabytes per second)
threads   moderate IO     high IO        very high IO
          avg    max      avg    max     avg    max
1         18     23       21     20      19     19
10        90     112      100    118     153    164
40        86     114      114    119     248    248
50        86     117      119    122     207    242
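The averages in the table can be summarised with a little arithmetic: the best thread count per IO class, the speedup over a single thread, and the share of the theoretical link speed that was reached. This is a sketch with two assumptions: 1 megabyte equals 8 megabits, and the moderate class is taken at the upper bound (500 Mbit) from the "Instance type" slide.

```python
# Average upload speeds in megabytes per second, copied from the table.
AVG_MB_S = {
    "moderate IO": {1: 18, 10: 90, 40: 86, 50: 86},
    "high IO": {1: 21, 10: 100, 40: 114, 50: 119},
    "very high IO": {1: 19, 10: 153, 40: 248, 50: 207},
}
# Theoretical link speeds in Mbit/s from the "Instance type" slide
# (assumption: upper bound used for the moderate class).
LINK_MBIT = {"moderate IO": 500, "high IO": 1000, "very high IO": 10000}


def summarise(io_class: str) -> tuple[int, float, float]:
    """Return (best thread count, speedup over 1 thread, share of link used)."""
    speeds = AVG_MB_S[io_class]
    best = max(speeds, key=speeds.get)
    speedup = speeds[best] / speeds[1]
    used = speeds[best] * 8 / LINK_MBIT[io_class]  # MB/s -> Mbit/s
    return best, round(speedup, 1), round(used, 2)


for io_class in AVG_MB_S:
    print(io_class, summarise(io_class))
```

With these inputs the moderate class comes out above its nominal 500 Mbit, while the very high class reaches only about a fifth of 10 Gbit.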
Conclusions (1)
● Optimisation is certainly possible
● Single stream max 150 Mbit/20 MB/s
● Newer generations are faster, slightly
● Couldn’t get to 10 Gbit
Conclusions (2)
● Instance IO classes = relative concept
● 50 threads seem sweet-spot
● Part size seemed not that important
● Do error control on multi-part
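One possible reading of "error control on multi-part" is to verify each part and retry it on failure. The sketch below assumes a hypothetical `upload(part_number, data)` callable that returns the MD5 hex digest reported by the server for that part. The function name, the retry count and the backoff values are all illustrative and were not part of the tests described here.

```python
import hashlib
import time


def upload_with_retry(upload, part_number: int, data: bytes,
                      attempts: int = 3, base_delay: float = 0.5) -> str:
    """Call upload(part_number, data) until the returned checksum matches."""
    expected = hashlib.md5(data).hexdigest()
    error = "no attempt made"
    for attempt in range(1, attempts + 1):
        try:
            returned = upload(part_number, data)
            if returned == expected:
                return returned
            error = f"checksum mismatch on part {part_number}"
        except OSError as exc:  # network-level failures
            error = str(exc)
        if attempt < attempts:
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"part {part_number} failed after {attempts} attempts: {error}")
```

Retrying a single part is much cheaper than restarting a 1 GB upload, which is one of the practical reasons to use multipart in the first place.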
Some excuses disclaimers
● Not scientific
● No tuning at all
● Bottlenecks
● Library/app used not optimal
Questions?
Thank you.
http://skyscrape.rs
@skyscrapers

Read the blogpost: http://skyscrape.rs/2013/11/15/awsugbe-2-aws-use-cases-and-s3-best-practicesupload-performance/

At the second AWS User Group Belgium, I presented “S3 Intro, tips and filling it up with data quickly”. The first half was a general introduction to S3 and how to use it. The second half focused on how to get your data onto S3 as quickly as possible using standard tools.

After some theory on best practices, we ran some tests and drew conclusions. The tests started at around 18 megabytes per second of data transferred from an EC2 ramdisk to S3. With some simple optimisations we got up to 248 megabytes per second using just standard command line tools.

The two main contributors to this dramatic performance increase were:

- instance type and related IO performance class
- the use of multiple upload threads.

Theoretically a Very High I/O instance should go up to 10 Gbit, or about 1.1 gigabytes per second. Some people on the internet (http://improve.dk/pushing-the-limits-of-amazon-s3-upload-performance/) claim to have reached such speeds. Alex shed some light on how we might be able to reach that goal by taking into account how S3 indexing and partitioning work. These two might help:
http://www.slideshare.net/AmazonWebServices/building-scalable-applications-on-amazon-s3-stg303
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
Unfortunately I haven't had the time to test that out yet. Any takers? :-)

Published in: Technology, Business

S3 intro, tips and filling it up with data aws ug be #2

  1. S3 Intro, tips and filling it up with data quickly. AWS User Group Belgium #2 - 2013/11/06. @fdenkens frederik@skyscrape.rs http://skyscrape.rs
  2. The Skyscrapers ... ● help companies figure out cloud ● design and build platforms in the cloud ● take care of the complete lifecycle, so you can focus on your business
  3. S3 in a nutshell
  4. What is S3 ● object storage architecture ● abstraction from storage (black) magic ● operates at object-level (aka ‘file’, blob, ...) ● Simple API
  5. Benefits ● Scalability ● High-availability ● Low cost ● 99.999999999% durability ● Secure ● Low latency/high-speed
  6. Some advantages for webapps ● Easier to scale apps ● Cleaner apps ● Better ‘mobility’ of your app ● Simpler hosting platforms ● No storage worries
  7. Use cases ● Asset storage and CDN ● Data storage ● Static site ● Backups ● Mobile storage backend ● File distribution ● ...
  8. Buckets ● Collection of objects ● Globally unique id ● a-z A-Z 0-9 . ● Max 100 buckets/user ● No limit on number of objects
  9. Buckets: best practices on naming ● DNS compatible ● FQDN ○ Allows for vhost ○ watch out for SSL: no dots :-(
  10. Objects ● Blob ● Don’t care about file formats ● Metadata can be added (like mimetype) ● Maximum 5 TB/object
  11. Keys ● = Name ● max 1024 chars ● UTF8
  12. Accessing your data ● ARN ○ arn:aws:s3:::bucketname ○ arn:aws:s3:::bucketname/objectpath ● HTTP ○ http://s3.amazonaws.com/bucket/key ○ http://bucket.s3.amazonaws.com/key ○ http://bucket/key (vhost style)
  13. Performance tip: s3.amazonaws.com > s3-eu-west-1.amazonaws.com
  14. Getting data on S3
  15. Getting data on S3 ● Through AWS services ● Tools ● Libraries ● Filesystem mapping ● Direct from client (pre-signed URL’s)
  16. CMD line tools ● s3cmd ● s3-multipart ● s3funnel ● ...
  17. Getting data on S3: what matters?
  18. Location ● bandwidth ● latency
  19. Parallelization ● Multiple upload-threads ● Multipart
  20. Limit SSL ● Negotiation overhead ● Encryption overhead on smaller instances
  21. Instance type: I/O matters. IO class and theoretical speed: Low 100 Mbit (?); Moderate 250 - 500 Mbit; High 1 Gbit; Very High 10 Gbit
  22. Other things ● Network stack optimisations ● Tool/method of upload
  23. Performance tests
  24. Parameters ● 1 GB blob ● 25 MB parts ● single region ● various IO classes ● 1, 10, 40, 50 threads ● only upload ● s3-multipart tool ● standard OS install
  25. Demo time
  26. Some numbers (Megabytes per second, avg/max per IO class). 1 thread: moderate 18/23, high 21/20, very high 19/19. 10 threads: moderate 90/112, high 100/118, very high 153/164. 40 threads: moderate 86/114, high 114/119, very high 248/248. 50 threads: moderate 86/117, high 119/122, very high 207/242.
  27. Conclusions (1) ● Optimisation is certainly possible ● Single stream max 150 Mbit/20 MB/s ● Newer generations are faster, slightly ● Couldn’t get to 10 Gbit
  28. Conclusions (2) ● Instance IO classes = relative concept ● 50 threads seem sweet-spot ● Part size seemed not that important ● Do error control on multi-part
  29. Some excuses disclaimers ● Not scientific ● No tuning at all ● Bottlenecks ● Library/app used not optimal
  30. Questions? Thank you. http://skyscrape.rs @skyscrapers
