Data-Centric Hyperconvergence with
OpenStack Swift & Storlets
Mail: eran@itsonlyme.name
IRC: eranrom
Twitter: @EranRom
Takeaways
1. Storlets can be used to do REAL stuff (and quite easily)
2. Storlets enable cost-efficient, data-centric services
End-to-End Deep Learning on Unstructured Data
Training Set: ~100 tagged pictures of Trump, Obama, Merkel and Bibi
Test Set: Videos of Trump, Obama, Merkel and Bibi
Step 1: Data Preparation
Extract face storlet: Training Set -> Extracted Training Set
1. Identify face location
2. Crop
3. Resize
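As a rough illustration of how the data preparation step could be packaged as a Python storlet, here is a minimal sketch. It assumes the Storlets Python interface as I understand it (a class constructed with a logger and called with `in_files`, `out_files` and `params`). The names `ExtractFaceStorlet` and `extract_face`, and the `FakeInput`/`FakeOutput` helpers, are hypothetical and not taken from the demo code. The face detection, crop and resize logic is left as a placeholder.

```python
import io


def extract_face(image_bytes):
    """Placeholder for the real work: identify the face, crop, resize.

    A real implementation would decode the image and call an image
    library here. This stand-in returns the bytes unchanged.
    """
    return image_bytes


class ExtractFaceStorlet(object):
    """Hypothetical storlet that wraps extract_face()."""

    def __init__(self, logger):
        self.logger = logger

    def __call__(self, in_files, out_files, params):
        # Carry the object's metadata (e.g. the name tag) to the output
        # before any data is written.
        out_files[0].set_metadata(in_files[0].get_metadata())
        chunks = []
        while True:
            buf = in_files[0].read(4096)
            if not buf:
                break
            chunks.append(buf)
        out_files[0].write(extract_face(b"".join(chunks)))
        in_files[0].close()
        out_files[0].close()


class FakeInput(object):
    """Local stand-in for the input file object (assumption)."""

    def __init__(self, data, metadata):
        self._buf = io.BytesIO(data)
        self._metadata = metadata

    def get_metadata(self):
        return self._metadata

    def read(self, size=-1):
        return self._buf.read(size)

    def close(self):
        pass


class FakeOutput(object):
    """Local stand-in for the output file object (assumption)."""

    def __init__(self):
        self.metadata = None
        self.data = b""

    def set_metadata(self, metadata):
        self.metadata = metadata

    def write(self, data):
        self.data += data

    def close(self):
        pass


# Exercise the storlet locally, without Swift.
src = FakeInput(b"fake-jpeg-bytes", {"name": "merkel"})
dst = FakeOutput()
ExtractFaceStorlet(logger=None)([src], [dst], {})
```

The fake file objects only exist so the class can be run on a laptop. Inside Swift, the storlet engine would supply the real input and output objects.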
Step 2: Supervised Learning
Train model storlet: Extracted Training Set -> model
Labels: trump, obama, merkel, bibi
Step 3: Model Testing
Recognize face storlet: Test set (Video) -> recognized name (e.g. bibi)
End-to-End Deep Learning on Unstructured Data
Training Set -> Extract face storlet (X100) -> Extracted Training Set -> Train model storlet -> model
Test set (Video) + model -> Recognize face storlet
Demo Setup: S2AIO with Jupyter Notebook
Swift and Storlets all in one
Local Scripts & S3 Vs. S2AIO
S2AIO: Swift and Storlets all in one
S3: S3 Client with OpenCV and SKLearn
Dedicated M4.2XLarge (8 CPUs, 32GB RAM)
S2AIO on EC2 Vs. EC2/S3
Dedicated M4.2XLarge (8 VCPUs, 32GB RAM, High Network Performance)
[Bar chart: seconds (0 to 70) for Extract, Train and Recognize; series: EC2 Swift & Storlets, EC2 & S3]
But the point is…
Sources:
Ethernet: http://www.ethernetalliance.org/roadmap/
Infiniband: http://www.infinibandta.org/content/pages.php?pg=technology_overview
Storage Vs. Networking Growth
[Chart: growth factor by year, 2010 to 2018-2020, for SSD, HDD, Ethernet and Infiniband]
SSD: 1, 1.5, 2, 2.5, 3, 4, 7.5, 30, 50
HDD: 1, 1.5, 5, 6, 8
Ethernet: 1, 10, 10, 20
Infiniband: 1.00, 1.79, 3.57, 3.57
Storage Vs. Networking Growth
16 Disks and 4 Network Ports Servers
[Bar chart: growth factor] SSD: 800.00, HDD: 128.00, Ethernet: 80.00, Infiniband: 14.29
Thank You!
All Demo Code: https://github.com/eranr/e2emlstorlets
My Blog: http://itsonlyme.name/blog

The Case for Data Centric Hyperconvergence

Editor's Notes

  1. Storlets are about co-locating storage and compute: instead of bringing the data to the compute, bring the compute, which is much smaller, to the data. The stork is the Storlets project mascot.
  2. More specifically, storlets make it possible to co-locate Dockerized computations inside OpenStack Swift in a serverless fashion.
  3. Swift is a massively scalable storage system with a simple API to store and retrieve data blobs. It takes care of data redundancy via, e.g., replication across failure domains.
  4. We use Docker to run the compute near the data in a secured and isolated manner.
  5. By serverless we mean that an end user can upload the program to run to Swift, just like any other data blob, and we take care of the rest.
  6. This is what I refer to as data-centric hyperconvergence. As with traditional hyperconvergence, the idea is to have a storage, compute and networking solution that can scale horizontally. Traditional hyperconvergence, though, is focused on general-purpose virtual environments and often goes hand in hand with high-end flash arrays. It is marketed as a solution for big data analytics over semi-structured data. Here we are focusing on unstructured data, which is the majority of the data. Hyperconvergence and data-centric hyperconvergence are complementary technologies: one can think of the data-centric part as 'transforming' the unstructured data into semi-structured data that can be consumed with traditional big data machinery. As such, I think that data-centric hyperconvergence should also have a data management component in the mix, e.g. metadata search.
  7. The graph shows the growth factor of a single SSD/HDD and of a single networking port. In Ethernet we see growth from 10Gb in 2010 to 100Gb in 2014, and today we are starting to see 200Gb. Infiniband started at 56Gb in 2011, was, like Ethernet, at 100Gb in 2014, and is now at 200Gb. HDDs have not grown as fast, with the X8 factor due to helium-filled HDDs. In SSDs, however, we see really big growth, with Seagate announcing a 60TB drive last year and Toshiba a 100TB drive to come out this year. Now, consider that a typical storage server has many more disks than network ports…
  8. Considering a server with 16 disks and 4 network ports, we see a much bigger difference in the growth factor.
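
The arithmetic behind notes 7 and 8 can be sketched as follows. The port speeds come from note 7, and the SSD and HDD per-device factors (50 and 8) are the final data labels on the per-device growth slide. Treating the per-server figure as the per-device growth factor multiplied by the number of devices is my assumption. It is consistent with the 800 / 128 / 80 / 14.29 values on the 16-disk, 4-port slide, but the deck does not state the formula. All variable names are illustrative.

```python
# Port speeds in Gb, as quoted in note 7 (first value is the baseline).
ethernet_gb = {"2010": 10, "2014": 100, "today": 200}
infiniband_gb = {"2011": 56, "2014": 100, "today": 200}


def growth_factor(series):
    """Ratio of the latest value to the baseline value."""
    values = list(series.values())
    return values[-1] / values[0]


per_device = {
    "SSD": 50.0,  # final data label on the per-device slide
    "HDD": 8.0,   # the "X8" factor mentioned in note 7
    "Ethernet": growth_factor(ethernet_gb),      # 200 / 10
    "Infiniband": growth_factor(infiniband_gb),  # 200 / 56
}

# Server layout from note 8: 16 disks and 4 network ports.
devices_per_server = {"SSD": 16, "HDD": 16, "Ethernet": 4, "Infiniband": 4}

# Assumption: per-server factor = per-device factor x device count.
per_server = {
    name: per_device[name] * devices_per_server[name] for name in per_device
}

for name, factor in per_server.items():
    print(f"{name}: {factor:.2f}")
```

Under these assumptions, the gap between storage growth and network growth widens once the per-device factors are scaled by how many disks and ports a server holds, which is the point note 8 makes.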