Manta
Unleashed
BigDataSg Meetup
2 July 2013
Christopher W. V. Hogue Ph.D.
chogue@blueprint.org
Big Data in 2002 – NBLAST - Computing 361,249,575,000
Protein Sequence Alignments & storing significant hits
http://www.biomedcentral.com/content/pdf/1471-2105-3-13.pdf
Big Data in 2003 – Distributed Computing, Tiered Architecture
for 10 Billion Protein 3D structure samples
Volunteer Computing
Blueprint Data Center
What is Manta?
• Manta is a new operating-system-level component
of Joyent's IaaS platform, released June 26, 2013.
http://www.joyent.com
• Manta is an object store system for big-data that you can
compute on without moving your data
• Manta provides map-reduce capability for executing POSIX
standard, arbitrary compute jobs directly on cloud storage
servers
• Manta allows map-reduce operations
formed by any standard UNIX command or application
in any run-time language
without moving stored data
without Hadoop or Java code
without loading raw data into a database
What Operating System?
• Manta is built on SmartOS, using the illumos kernel, which
is open-source UNIX
• SmartOS is not GNU/Linux
• SmartOS is a very lightweight illumos
distro for cloud hypervisors with KVM and storage
that runs in RAM from PXE/CD/USB boot media
• Derived from Sun Microsystems' OpenSolaris
• Over 10,000 packages supported via pkgsrc system
illumos is the open-source Unix kernel forked from Solaris
[Diagram: Solaris-to-illumos lineage. Recoverable labels:]
• Built on the kernel: Cloud OS, Server OS, Storage OS, Database OS, and more…
• Kernel technologies (CDDL): DTrace, Crossbow, Zones, ZFS, SMF, MDB
• Timeline labels: 1992 (UNIX System V Release 4); 2004-2008 (four years of legal work to open-source Solaris); Jan 2010; Aug 2010 (Oracle closed its Solaris source…)
• Kernel innovations: bugfixes, GCC build, ZFS feature flags, ZFS background delete, ZFS LZ4 compression, KVM Type 1 hypervisor
Manta – What is SmartOS?
• SmartOS is Joyent’s lightweight illumos kernel based operating system
optimized for high-performance cloud computing.
• illumos is an open-source fork of OpenSolaris, supported by Joyent,
Nexenta, OmniTI, DEY systems, Delphix, and other core committers.
• After Oracle bought Sun Microsystems, many Solaris software engineers,
including those who built ZFS, DTrace, and other components, left Oracle
and joined the illumos effort.
• illumos distros that you can experiment with include SmartOS, OmniOS,
OpenIndiana, and NexentaStor.
• Prerequisite for Manta use:
Your code needs to run, and be tested, on (x86) illumos!
• Started in 2004
• IaaS hosting:
– Windows, Linux, FreeBSD KVM images
– LinkedIn, Wanelo, Voxer, Storify, Geeklist, Tripshare … many others
– Singapore’s Reebonz (reebonz.com.sg)
• 4 Primary Data Centers: http://www.joyent.com/products/compute-service/data-centers
• 3rd Party Smart Data Center Licensees
who run Joyent-Powered Clouds, e.g.:
– Telefonica – Spain: http://cloud.telefonica.com/instantservers/
– MiCloud – Taiwan: http://micloud.tw/ch/
– Libero – Italy: http://cloud.libero.it/it/il_nostro_cloud/profilo/
• Class-1 DC Operators
• SSAE 16 Certified
• Multi-layered Physical Security
• Highly-Redundant Power
• Early Warning Fire Suppression
• All Tier-1 ISP Connectivity
• 10gb/40gb Fully-Meshed Network
• Full Peering, Fiber Connectivity
May 20 2013 – Dell drops OpenStack Cloud,
partners with Joyent for high-performance, high-availability
IaaS service provision.
Joyent as an IaaS provider
• Has full development control of the entire operating
system stack
• Is the corporate steward of the
Node.js JavaScript runtime
• Community friendly - provides SmartOS image
downloads, source for free, and support
• You can deploy a private cloud for free with 3rd party
management software “Project Fifo”
SmartOS Storage Implementation
• All SmartOS storage is local,
on ZFS
– Integrated disk/volume management
– Copy-on-write
– Self-healing
– Protection against silent data corruption
– No hardware RAID dependency
– Striping, RAID-Z with no write hole
– No fsck resilvering
– Built-in filesystem compression options
– Compress a subdirectory
– Snapshots
– zfs send / receive
– Integrated SSD IO caching
– Add drives with one command, while in
production
• Manta in the Joyent Datacenter
is built on ZFS
– no SAN, no NAS head nodes
– no tiered layers
– standard commodity Intel servers
– 4 U servers with 73 TiB of user data
– basic SAS HBA technology
– Every object is stored on 2 ZFS pools by
policy default, local to the server on
which it is accessed
– Architecture leads me to speculate that
Manta stands for
“Manta is Not Tiered Architecture”
Manta Features
• A multi-datacenter object store
• Fine-grained replication commands
• No object size limits
• Per-object replication policies
• Filesystem-like namespace including directory
queries
• Up to 1 million files per directory
• Public folders for CDN data delivery
• Read-after-write consistency
• SnapLinks – a file hard-link (ln) and snapshot mashup,
allowing alternate file naming and versioning in
place. Use to mimic data movement.
• REST with JSON API
• Interactive shell access through Node.js driven SDK
and commands
• Compute in place with map-reduce processing with
arbitrary code and scripts without data movement
• GuardTime keyless data signatures and validation
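The SnapLink bullet above can be pictured with a small in-memory model. This is a hypothetical sketch, not Manta's implementation: the `ToyStore` class, its methods, and the `/demo/stor/...` paths are invented for illustration. It assumes only the behaviour described on the slide, namely that a SnapLink is a second name for existing object content and that the linked content stays as it was when the original name is later overwritten.

```python
import itertools


class ToyStore:
    """Hypothetical in-memory model: object names point at immutable blobs."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._blobs = {}   # blob id -> bytes (never modified once written)
        self._names = {}   # object path -> blob id

    def put(self, path, data):
        # Writing always creates a fresh blob; older blobs are left untouched.
        blob_id = next(self._ids)
        self._blobs[blob_id] = bytes(data)
        self._names[path] = blob_id

    def get(self, path):
        return self._blobs[self._names[path]]

    def snaplink(self, src, dst):
        # A link is just another name for the same blob: no bytes are copied.
        self._names[dst] = self._names[src]


store = ToyStore()
store.put("/demo/stor/report.txt", b"version 1")
store.snaplink("/demo/stor/report.txt", "/demo/stor/report.v1.txt")
store.put("/demo/stor/report.txt", b"version 2")  # overwrite the original name
```

After the overwrite, the linked name still returns the first version, which is how a link can serve as cheap versioning or as a stand-in for moving data.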
Manta’s Compute-on-Storage
For a simple grep-style text query
in a big-data collection of server logs:
• On AWS S3
– Move the “big data” into EC2 or Hadoop
– Then orchestrate a method to run the query
– Then clean up the additional big-data instances
• On Manta
– grep in place on the storage servers
– Manta hands back your job output in a new folder
How does Manta work?
End User
• Install the Node.js package that provides
mlogin and the local Manta commands
• The local Node.js environment
includes the Manta interactive
shell and fast I/O for data and
command transfers up to the
Manta Data Center.
• Commands transit via REST
APIs with JSON encoding.
These can be called directly.
Data Center
• Connects to End User
• Distributes and commits data
uploads according to
replication policy (2 by default)
• Fast consistency: data is ready
to use without waiting for
sync
• Jobs are launched in SmartOS
Zone VM images on the server.
• The hashed UID of the Zone
that is launched becomes the
job number/directory for
output data
Manta Commands
Client-Side Utilities
Installed locally as the Joyent Manta Node.js SDK.
Also available to your jobs on the data side.
• mls - Lists directory contents
• mput - Uploads data to an object
• mget - Downloads an object from the service
• mjob - Creates and runs a computational job on the
service
• mfind - Walks a hierarchy to find names of objects
by name, size, or type
• mlogin - Interactive session client
• mln - Makes SnapLinks between objects
• mmkdir - Make directories
• mrm - Remove objects or directories
• mrmdir - Remove empty directories
• msign - Create a signed URL to an object stored in
the service
• muntar - Create a directory hierarchy from a tar file
Data-Side Utilities
Additional commands are available to your jobs in the
data-side compute environment:
• maggr - Performs key-wise aggregation on plain text
files.
• mcat - Emits the named object as an output for the
current task.
• mpipe - Output pipe for the current task.
• msplit - Split the output stream for the current task to
many reducers.
• mtee - Capture stdin and write to both stdout and an
object.
Control interactively via shell-like SDK, OR automate with REST + JSON APIs.
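As a rough illustration of the "automate with REST + JSON" route, the sketch below only builds a JSON job description and makes no network call. The field names (`name`, `phases`, `type`, `exec`) reflect my recollection of the Manta job API and should be treated as assumptions to check against the API documentation. The helper `build_job` is hypothetical.

```python
import json


def build_job(name, map_cmd, reduce_cmd=None):
    """Return a JSON job description with one map phase and an optional reduce phase.

    Field names are assumptions about the job API, not taken from the slides.
    """
    phases = [{"type": "map", "exec": map_cmd}]
    if reduce_cmd is not None:
        phases.append({"type": "reduce", "exec": reduce_cmd})
    return json.dumps({"name": name, "phases": phases}, indent=2)


payload = build_job("grep-logs", "grep -H pattern", "cat")
print(payload)
```

Keeping the payload construction separate from the HTTP call makes it easy to inspect or unit-test a job description before anything is submitted.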
Manta patterns for job creation
• $ mjob create -m 'command-to-map' -r 'command-to-reduce'
• Big Data Map Reduce version of grep:
– (GNU grep -H prints the name of the file matching the pattern, so you know which file matched)
• $ mjob create -m 'grep -H --label=$MANTA_INPUT_OBJECT pattern' -r cat
http://apidocs.joyent.com/manta/job-patterns.html
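To make the data flow of the grep pattern concrete, here is a minimal local emulation in Python. It does not call Manta: the object names and log lines are made up, `map_grep` stands in for `grep -H --label=...` running once per object, and `reduce_cat` stands in for the `cat` reducer that gathers all mapper output.

```python
import re


def map_grep(object_name, text, pattern):
    """Map phase: emit 'object:line' for each matching line, like grep -H --label."""
    regex = re.compile(pattern)
    return [f"{object_name}:{line}" for line in text.splitlines() if regex.search(line)]


def reduce_cat(map_outputs):
    """Reduce phase: concatenate all mapper outputs, like cat."""
    return [line for output in map_outputs for line in output]


# Hypothetical stand-ins for stored log objects.
objects = {
    "/demo/stor/logs/a.log": "GET /index 200\nGET /missing 404\n",
    "/demo/stor/logs/b.log": "POST /login 200\nGET /gone 404\n",
}

hits = reduce_cat(map_grep(name, text, r"\b404\b") for name, text in objects.items())
```

Each mapper sees exactly one object, which is why the object name has to be attached to every hit before the reducer merges them.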
Manta Documentation – Total Word Count in text file
collection with map-reduce of wc + awk 1-liner
Interactive
REST + JSON API
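The screenshots for this slide are not preserved here. As a rough local illustration of the wc-then-sum pattern the title describes, the Python sketch below counts words per object in a map step and adds the counts in a reduce step. The function names and sample texts are hypothetical, and the functions only approximate what `wc -w` and an awk summing one-liner would do.

```python
def map_wc(text):
    """Map phase: count whitespace-separated words in one object, like wc -w."""
    return len(text.split())


def reduce_sum(counts):
    """Reduce phase: add the per-object counts, like an awk sum one-liner."""
    return sum(counts)


# Hypothetical stand-ins for a collection of text objects.
texts = ["to be or not to be", "that is the question", ""]
total = reduce_sum(map_wc(t) for t in texts)
print(total)
```

The reducer only ever sees one small number per object, so the bulk of the data never has to leave the place where it is stored.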
Manta Documentation –
Image conversion with ImageMagick “convert”
What software can I run on Manta?
Thousands of ready to use UNIX packages
on the VM image:
• Python
• Perl
• R
• Node.js
• Java
• ImageMagick
• ffmpeg
• OpenSSL
• SQLite
• MySQL client
• Postgres client
Or run custom software that is
not on the VM image:
• These are called Assets
• Can be interpretable code or SmartOS
compatible binaries
• Upload a SmartOS compatible package
(e.g. tarball as tgz or a script file) on
Manta
• Use a job script that unpacks the custom
asset inside the Manta VM, and executes
it.
• Use standard Unix approaches for error
logging, output, pipes and tees.
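The asset workflow above (package, upload, unpack inside the job, execute) can be rehearsed locally. The sketch below is an assumption-laden stand-in: it builds a small .tgz in memory instead of uploading one, unpacks it into a temporary directory instead of a Manta zone, and runs the contained script with the local Python interpreter. All names (`make_asset`, `run_asset`, `count_lines.py`) are hypothetical.

```python
import io
import subprocess
import sys
import tarfile
import tempfile
from pathlib import Path


def make_asset(script_name, source):
    """Build an in-memory .tgz holding one script (stands in for an uploaded asset)."""
    data = source.encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=script_name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def run_asset(tgz_bytes, script_name, stdin_text):
    """Unpack the script into a scratch directory and run it on stdin_text."""
    with tempfile.TemporaryDirectory() as workdir:
        with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
            script_source = tar.extractfile(script_name).read()
        script_path = Path(workdir) / script_name
        script_path.write_bytes(script_source)
        result = subprocess.run(
            [sys.executable, str(script_path)],
            input=stdin_text, capture_output=True, text=True, check=True,
        )
    return result.stdout


asset = make_asset("count_lines.py", "import sys\nprint(len(sys.stdin.readlines()))\n")
output = run_asset(asset, "count_lines.py", "a\nb\nc\n")
```

Testing the unpack-and-run step this way catches packaging mistakes early, although it does not replace testing the code on illumos itself.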
Use Cases
• Democratization of BIG DATA
– No longer in the hands of a few
• Mass market self-logging devices
– Transportation/Automotive
– E-health monitoring systems
– Sensor networks
• Scientific paper PDF collections
– Federate collections
– Allow large scale text mining
• Genomic Sequence Analysis
– Store Raw Data
– Move compute pipeline to data
– Meta-pipelines in parallel for computing
over old data with new knowledge
• Running a checksum over your data
to assure its integrity
• Log processing: clickstream analysis,
MapReduce on logs
• Text processing including search
• Image processing: converting
formats, generating thumbnails,
resizing
• Video processing: transcoding,
extracting segments, resizing
• Data Analysis, Mining and Graphing
with NumPy, SciPy and R
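The checksum use case is a map-only job: one digest per object, no reducer needed. A minimal local sketch follows. It assumes SHA-256 as the digest (the slides do not name one) and uses made-up object names and contents. The `verify` helper is hypothetical.

```python
import hashlib


def map_checksum(object_name, data):
    """Map phase: one 'digest  name' line per object, in the style of checksum tools."""
    return f"{hashlib.sha256(data).hexdigest()}  {object_name}"


def verify(manifest_line, data):
    """Recompute the digest and compare it with the one recorded earlier."""
    recorded = manifest_line.split()[0]
    return hashlib.sha256(data).hexdigest() == recorded


# Hypothetical stand-ins for stored raw-data objects.
objects = {
    "/demo/stor/raw/run1.dat": b"ACGT" * 4,
    "/demo/stor/raw/run2.dat": b"TTGA",
}
manifest = [map_checksum(name, data) for name, data in objects.items()]
```

Storing the manifest next to the data lets a later job rerun the same map step and flag any object whose digest has changed.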
Manta Pricing http://www.joyent.com/products/manta/pricing
Manta compute charges
are by the second:
$0.00004 per GB of DRAM per second
If you run 1000 parallel tasks in 32 GB
DRAM instances on 1000 objects and
they each take a second, then you've
used 1000 × 32 GB × 1 s = 32,000 GB-seconds,
and the cost for this job would be
32,000 × $0.00004 = $1.28.
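The compute-cost arithmetic can be checked with a few lines of Python. This is a sketch using only the rate quoted on the slide. The function name and parameters are hypothetical, and `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

RATE_PER_GB_SECOND = Decimal("0.00004")  # compute rate quoted on the slide


def compute_cost(tasks, dram_gb, seconds_per_task):
    """Return (GB-seconds used, cost in USD) for a job of identical tasks."""
    gb_seconds = Decimal(tasks) * Decimal(dram_gb) * Decimal(seconds_per_task)
    return gb_seconds, gb_seconds * RATE_PER_GB_SECOND


gb_seconds, cost = compute_cost(tasks=1000, dram_gb=32, seconds_per_task=1)
print(gb_seconds, cost)
```

Because the charge scales with DRAM size as well as time, the same job run in smaller instances would cost proportionally less, provided the tasks still fit in memory.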
Storage charges are slightly less than
Amazon S3
Bandwidth IN is free
Bandwidth OUT has tiered charges.
Request Type                   Price per unit of requests
Delete                         Free
POST, PUT, LIST (“GET DIR”)    $0.005 per 1,000 requests
GET, OPTIONS, HEAD             $0.004 per 10,000 requests

Storage Tier       Default (2 copies)    Price per GB (per individual copy)
First 1 TB/mo      $0.086                $0.043
Next 49 TB/mo      $0.072                $0.036
Next 450 TB/mo     $0.064                $0.032
Next 500 TB/mo     $0.058                $0.029
Next 4000 TB/mo    $0.054                $0.027
Next 5000 TB/mo    $0.050                $0.025
Default is 2 copies. When submitting an object to the service,
you can specify the number of copies stored, from one (1) to six
(6).
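A tiered price list like the one above is usually applied marginally, with each slice of usage charged at its own tier's rate. The sketch below works under that assumption for the default 2-copy prices. It further assumes that the prices are per GB per month and that 1 TB means 1000 GB, neither of which the slide states explicitly. The function name is hypothetical.

```python
from decimal import Decimal

GB_PER_TB = 1000  # assumption: the slide does not say whether 1 TB is 1000 or 1024 GB

# (tier size in TB, price per GB for the default 2 copies), as listed on the slide
TIERS = [
    (1, Decimal("0.086")),
    (49, Decimal("0.072")),
    (450, Decimal("0.064")),
    (500, Decimal("0.058")),
    (4000, Decimal("0.054")),
    (5000, Decimal("0.050")),
]


def monthly_storage_cost(stored_gb):
    """Charge each slice of usage at its own tier's rate (marginal pricing)."""
    remaining = Decimal(stored_gb)
    cost = Decimal(0)
    for tier_tb, price_per_gb in TIERS:
        in_tier = min(remaining, Decimal(tier_tb * GB_PER_TB))
        cost += in_tier * price_per_gb
        remaining -= in_tier
        if remaining <= 0:
            return cost
    raise ValueError("usage exceeds the tiers listed on the slide")


print(monthly_storage_cost(500))   # 500 GB, all within the first tier
print(monthly_storage_cost(2000))  # 2 TB: 1 TB at the first rate, 1 TB at the second
```

If the provider instead bills the whole volume at the rate of the highest tier reached, the loop would need to be replaced by a single lookup, so the marginal interpretation should be confirmed against the pricing page.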
Deploy a Fast, Scalable, Free, Open Source
Private IaaS Cloud Today.
• SmartOS
http://smartos.org/
• Project FiFO
http://project-fifo.net
My PXE boot 2-node
desktop IaaS Cloud setup
Fifo Web Console managing SmartOS
KVM Type 1 (bare metal) Hypervisor
Manta Unleashed BigDataSG talk 2 July 2013

