Intro to Joyent's Manta Object Storage Service


Published in: Technology

  1. Introduction to Manta. Rod Boothby, VP, 415-819-9253. August 12, 2013
  2. Object Stores are the Future
     [Chart: IDC worldwide server sales in $ millions vs. billions of objects in AWS S3, Oct-06 to Aug-13]
     • The number of objects in Amazon S3 is growing fast
     • Server sales are basically flat
  3. Manta is Joyent’s new Object Storage Service
     [Diagram: Joyent Manta Object Store. Put data into Manta and get data from Manta via a RESTful API]
     An object is uninterpreted data of any size that you read from and write to the store.
  4. Manta is Live and Available Today
  5. A file is an example of an object
     • The code below does the following:
       1. Creates a file called hello.txt that contains the words “Hello, Manta”
       2. Puts the file into Manta
       3. Gets the file back from Manta and outputs its contents
     $ echo "Hello, Manta" > /tmp/hello.txt
     $ mput -f /tmp/hello.txt /$MANTA_USER/stor/hello-foo
     /$MANTA_USER/stor/hello-foo [====================>] 100% 13B
     $ mget /$MANTA_USER/stor/hello-foo
     Hello, Manta
  6. Manta Partners support File Interfaces
     [Diagram: Joyent Manta Object Store with partner file interfaces]
     • Partners offer NAS file interfaces that run in existing data centers but back up to the Manta Object Store
     • The Panzura solution is available today. The other solutions are due to be available by the end of Q4 2013.
  7. Manta adds Big Data to Object Storage
     [Diagram: send the Big Data job into the Joyent Manta Object Store]
     • Only 1 step: analyze or process data using Manta jobs
     • Manta acts like a Platform as a Service (PaaS) for Big Data analytics
     • Manta is the only object storage system that brings compute directly to the data
  8. Big Data is easy on Manta vs complex on AWS
     [Diagram: cloud object store S3. 1 - Download data, 2 - Analyze or process data, 3 - Upload data again]
     • Netflix has open-sourced their Genie management tools for running Hadoop jobs with S3
     • To analyze data in S3, the Netflix system requires coordinating 9 pieces of software: Hadoop, Hive, Pig, Karyon, Servo, Ribbon, Archaius, Eureka, and Genie
     • Big Data analytics on AWS/S3 requires 3 complex steps vs 1 simple step on Manta
  9. S3 + EC2 also requires new Sysadmins
     • Admins are needed because “Genie is not an end-to-end resource management tool - it doesn’t provision or launch clusters, and neither does it scale clusters up and down based on their utilization”
     • End users are the data scientists who want to analyze or process data stored in S3
  10. Big Data Made Simple
      • Single store of record for your data
      • Do analysis without the learning curve of server administration
      • Do big data analysis in any language
      “There is no learning curve to run Manta for us, since it runs on Unix.” Konstantin Gredeskoul, CTO
  11. Manta delivers Value
      • Requests
        • DELETE: free
        • POST, PUT, LIST (“GET DIR”): $0.005/1000 requests
        • GET, OPTIONS, HEAD: $0.004/10000 requests
      • Bandwidth
        • All bandwidth in: $0.000 (free)
        • Bandwidth out after 1st TB: $0.120/GB to $0.050/GB
      • Storage
        Storage Tier          Per Individual Copy   Per 2 Copies (default)
        First 1 TB/month      $0.043 per GB         $0.086 per GB
        Next 49 TB/month      $0.036 per GB         $0.072 per GB
        Next 450 TB/month     $0.032 per GB         $0.064 per GB
        Next 500 TB/month     $0.029 per GB         $0.058 per GB
        Next 4000 TB/month    $0.027 per GB         $0.054 per GB
        Next 5000 TB/month    $0.025 per GB         $0.050 per GB
        Default is 2 copies. When submitting an object to the service, you can specify the number of copies stored, from one (1) to six (6).
      • Compute
        • $0.00004 per GB of DRAM per second
        • If you run 1000 parallel tasks on 1000 objects and they each take a second, then you've used 1000 seconds of time and the cost for this job would be $0.04 (this works out if each task uses 1 GB of DRAM).
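The storage tiers and the compute rate on this slide can be turned into a rough cost estimate. The sketch below is only an illustration of the arithmetic, not an official calculator: the function names are made up for this example, 1 TB is assumed to be 1000 GB, the tier thresholds are assumed to apply to the data size before copies are multiplied in, and each compute task is assumed to use 1 GB of DRAM unless stated otherwise.

```python
GB_PER_TB = 1000  # assumption: the slide does not say whether 1 TB is 1000 or 1024 GB

# (tier size in TB, dollars per GB per copy per month), in the order listed on the slide
STORAGE_TIERS = [
    (1, 0.043),
    (49, 0.036),
    (450, 0.032),
    (500, 0.029),
    (4000, 0.027),
    (5000, 0.025),
]

COMPUTE_RATE = 0.00004  # dollars per GB of DRAM per second, from the slide


def monthly_storage_cost(stored_gb, copies=2):
    """Estimate the monthly storage bill (hypothetical helper, not a Manta API)."""
    if stored_gb < 0:
        raise ValueError("stored_gb must not be negative")
    if not 1 <= copies <= 6:
        raise ValueError("copies must be between 1 and 6")
    remaining = stored_gb
    cost = 0.0
    for tier_tb, price_per_gb in STORAGE_TIERS:
        # Charge as much of the remaining data as fits into this tier.
        in_tier = min(remaining, tier_tb * GB_PER_TB)
        cost += in_tier * price_per_gb
        remaining -= in_tier
        if remaining <= 0:
            break
    if remaining > 0:
        # The slide lists no price beyond the last tier, so do not guess one.
        raise ValueError("amount exceeds the tiers listed on the slide")
    return cost * copies


def compute_cost(tasks, seconds_per_task, dram_gb_per_task=1.0):
    """Estimate the cost of a job (hypothetical helper, not a Manta API)."""
    return tasks * seconds_per_task * dram_gb_per_task * COMPUTE_RATE
```

Under these assumptions, `monthly_storage_cost(500)` comes to about $43 for the default two copies, and `compute_cost(1000, 1)` reproduces the slide's $0.04 figure.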
  12. Technical Appendix
  13. Accessing Manta is Easy
      • Manta REST API
      • Manta CLI & Shell
      • Manta Node.js SDK
      • Manta Python SDK
      • Manta Ruby SDK
      • Manta Java SDK
  14. Technical Description of Manta
      • Multi-datacenter object store
      • Granular datacenter and copy policies
      • No size limits
      • In-kernel (clustered ZFS DMU)
      • More akin to a NetApp MetroCluster
      • S3: JVM on ext3 on Linux
      • Strongly consistent and transactional data semantics
      • Close to UNIX file-system semantics
  15. Analytics Capability: Codename Marlin
      • A facility for running compute jobs directly on Manta storage nodes
      • Complete EC2-like batch compute environment
      • A framework for distributing work to the right physical servers, tracking which pieces are complete, capturing the output, and repeating the whole process to facilitate multi-phase computation on objects at rest
      • Complete Unix environment without any ETL
      • A non-interactive Unix shell environment for doing "work" on Manta objects as local files
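The multi-phase idea described on this slide (run one task per object, capture each output, then feed the outputs to a later phase) can be pictured with a small local simulation. This is a sketch only: it runs in memory on one machine, it does not use Manta or its job API, and every name in it, including the object paths, is hypothetical.

```python
from collections import Counter


def run_phases(objects, map_fn, reduce_fn):
    """Run a map phase over each object, then one reduce phase over the outputs."""
    # Map phase: one task per object. On Manta, each task would run on the
    # storage node that holds the object; here it is just a local function call.
    map_outputs = {name: map_fn(data) for name, data in objects.items()}
    # Reduce phase: a single task consumes every captured map output.
    return reduce_fn(map_outputs.values())


def count_words(data):
    """Map task: count the words in one object's bytes."""
    return Counter(data.decode("utf-8").split())


def merge_counts(partials):
    """Reduce task: merge the per-object word counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Stand-ins for stored objects; the paths are made up for illustration.
objects = {
    "/example/stor/a.txt": b"hello manta",
    "/example/stor/b.txt": b"hello again",
}
totals = run_phases(objects, count_words, merge_counts)
```

The point of the design is that only the small per-object results move between phases, while the objects themselves stay where they are.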
  16. Why Marlin is Revolutionary
      Customers can run queries, create data pipes, and do transformations and map reduce on objects very quickly, without data movement and without the additional cost of spinning up instances.
  17. Big Data Use Case Examples - Part 1
      • Log processing
        • Clickstream analysis, map reduce on logs
      • Image processing
        • Converting formats, generating thumbnails
      • Video processing
        • Transcoding, extracting segments, resizing
      • “Hardcore” data analysis
        • NumPy, SciPy, R, machine learning, data mining
  18. Big Data Use Case Examples - Part 2
      • SQL-like queries over structured data
        • Similar to what Hive provides for Hadoop
      • Data pipelining
        • MySQL, Postgres plus other clients
      • Text processing
        • E-discovery and internal search engines
      • Backup and disaster recovery
        • Encrypt and verify integrity without moving/downloading the data
  19. Key Security & Sharing Example
      • With rich access controls in Manta, it is possible to run compute on other users' data that's been made available to you
        • Without actually having access to it
        • Without having to ship it
        • Without being able to egress the dataset itself
  20. Thank You