SlideShare a Scribd company logo
© 2014 IBM Corporation
Advanced Data Retrieval and Analytics
with Apache Spark and Openstack Swift
Gil Vernik
IBM Research - Haifa
© 2014 IBM Corporation
Topics Covered in This Talk
§ Openstack Swift
§ Apache Spark
§ Basic integration between Spark and Swift
§ Advanced integration between Spark and Swift by
utilizing the Storlets technology.
© 2014 IBM Corporation
Digital Universe
More than 1.8 zettabytes
(1.8 trillion gigabytes)
Grows rapidly
80% owned by enterprises
75% generated by individuals
According IDC iView "Extracting Value from Chaos,"
© 2014 IBM Corporation
Map-Reduce, Databases, etc..
Data needs to be replicated, Time, Cost, etc..
© 2014 IBM Corporation
Can we do it better?
© 2014 IBM Corporation
Openstack Swift
§ A massively scalable object store
§ Known to work with thousands of
servers, stores petabytes of data.
§ Exposes REST API
§ Features:
– Storage polices
– Erasure codes
– Data replication
– ….
PUTProxy Nodes
Storage Nodes
© 2014 IBM Corporation
Apache Spark
§ Apache Spark™ is a fast and general engine for
large-scale data processing
– Up to 100x faster than Hadoop Map
Reduce in-memory, 10x faster on disk
§ Combines SQL, streaming, and complex analytics
§ Can read existing Hadoop data
§ Most active project in Apache today
© 2014 IBM Corporation
Swift enablement for data retrieval in Spark
§ Apache Spark implements Hadoop interfaces and can use
HDFS or Amazon S3 as a data source.
Swift
Network
§ IBM research enabled Spark to access data stored in
Openstack Swift.
© 2014 IBM Corporation
What do we analyze?
Swift
Network
Stored Data Input to Analytics
Images EXIF metadata
PDF Hidden metadata
LOGs Only ‘ERROR’ records
…. ….
© 2014 IBM Corporation
Yes! We can do it better.
© 2014 IBM Corporation
Storlets: Flexibly extend for Swift
Advanced Data processing inside Swift
§ Storlets is a way to ‘extend’
cloud computational capabilities
§ Storlet is compiled code,
deployed to Swift and when
triggered is executed by Storlet
Engine directly on storage
nodes.
§ Storlet engine - responsible to
execute every storlet in a secure
environment
§ Storlet is a standard Java code
© 2014 IBM Corporation
Storlets extend an object store by
moving computation to the data –
filtering, transforming, analyzing –
instead of bringing the data to the
computation
© 2014 IBM Corporation
Swift Storlets: How do they benefit Spark?
Swift Storlet
Network
Objects
Filter
Data processing+
© 2014 IBM Corporation
Storlets Enable Extending the Functionality of Spark
Example: analyzing EXIF metadata from photos
§ Object store is a
natural repository for
photos
§ Photos contain rich
capture metadata
§ Analyzing this
metadata for a set of
photos can show how
the camera is used
© 2014 IBM Corporation
Example: Analyzing EXIF metadata
Storlets can extract metadata, returning as JSON
(rather than of processing the binary data directly by Spark)
10MB 1KB
© 2014 IBM Corporation
Example: Analyzing EXIF metadata.
•  Spark accesses images via storlet
•  No change to Spark, only changes the URI
•  JSON file returned by storlet defines schema
•  SQL from Spark processes metadata
© 2014 IBM Corporation
Example: Analyzing EXIF metadata.
© 2014 IBM Corporation
Summary
§ Openstack Swift is the most popular open source
object store
§ Apache Spark is the next big thing in data analytics
§ Spark and Swift can be integrated
§ Storlets in Swift provide clear benefits for analytics
use cases.
Thank you!
More information
Gil Vernik, IBM Research -Haifa
gilv@il.ibm.com

More Related Content

What's hot

Juniper Network Automation for KrDAG
Juniper Network Automation for KrDAGJuniper Network Automation for KrDAG
Juniper Network Automation for KrDAG
KwonSun Bae
 
OpenStack Networking and Automation
OpenStack Networking and AutomationOpenStack Networking and Automation
OpenStack Networking and Automation
Adam Johnson
 
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
Cloud Native Day Tel Aviv
 
Open stack journey from folsom to grizzly
Open stack journey from folsom to grizzlyOpen stack journey from folsom to grizzly
Open stack journey from folsom to grizzlyopenstackindia
 
Nova net-or-neutron-atlanta2014.pptx
Nova net-or-neutron-atlanta2014.pptxNova net-or-neutron-atlanta2014.pptx
Nova net-or-neutron-atlanta2014.pptx
Somik Behera
 
OpenContrail Cloudwatt Feedback
OpenContrail Cloudwatt FeedbackOpenContrail Cloudwatt Feedback
OpenContrail Cloudwatt Feedback
ethuleau
 
Can the Open vSwitch (OVS) bottleneck be resolved? - Erez Cohen - OpenStack D...
Can the Open vSwitch (OVS) bottleneck be resolved? - Erez Cohen - OpenStack D...Can the Open vSwitch (OVS) bottleneck be resolved? - Erez Cohen - OpenStack D...
Can the Open vSwitch (OVS) bottleneck be resolved? - Erez Cohen - OpenStack D...
Cloud Native Day Tel Aviv
 
L4-L7 services for SDN and NVF by Youcef Laribi
L4-L7 services for SDN and NVF by Youcef LaribiL4-L7 services for SDN and NVF by Youcef Laribi
L4-L7 services for SDN and NVF by Youcef Laribi
buildacloud
 
Open stack networking_101_update_2014
Open stack networking_101_update_2014Open stack networking_101_update_2014
Open stack networking_101_update_2014
yfauser
 
OpenStack Neutron behind the Scenes
OpenStack Neutron behind the ScenesOpenStack Neutron behind the Scenes
OpenStack Neutron behind the Scenes
Anil Bidari ( CEO , Cloud Enabled)
 
Open stack korea_uni2u_pdf
Open stack korea_uni2u_pdfOpen stack korea_uni2u_pdf
Open stack korea_uni2u_pdf
Yongyoon Shin
 
OpenDaylight Netvirt and Neutron - Mike Kolesnik, Josh Hershberg - OpenStack ...
OpenDaylight Netvirt and Neutron - Mike Kolesnik, Josh Hershberg - OpenStack ...OpenDaylight Netvirt and Neutron - Mike Kolesnik, Josh Hershberg - OpenStack ...
OpenDaylight Netvirt and Neutron - Mike Kolesnik, Josh Hershberg - OpenStack ...
Cloud Native Day Tel Aviv
 
Contrail Basics
Contrail BasicsContrail Basics
Contrail Basics
Kimberly Macias
 
BRKDCT-2445 Agile OpenStack Networking with Cisco Solutions - Cisco Live! US ...
BRKDCT-2445 Agile OpenStack Networking with Cisco Solutions - Cisco Live! US ...BRKDCT-2445 Agile OpenStack Networking with Cisco Solutions - Cisco Live! US ...
BRKDCT-2445 Agile OpenStack Networking with Cisco Solutions - Cisco Live! US ...
Rohit Agarwalla
 
Nova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute modelsNova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute modelsopenstackindia
 
OpenStack Israel Meetup - Project Kuryr: Bringing Container Networking to Neu...
OpenStack Israel Meetup - Project Kuryr: Bringing Container Networking to Neu...OpenStack Israel Meetup - Project Kuryr: Bringing Container Networking to Neu...
OpenStack Israel Meetup - Project Kuryr: Bringing Container Networking to Neu...
Cloud Native Day Tel Aviv
 
Agile Networking with OpenStack
Agile Networking with OpenStack Agile Networking with OpenStack
Agile Networking with OpenStack
openstackcisco
 
Osdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauserOsdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauser
yfauser
 
Network and Service Virtualization tutorial at ONUG Spring 2015
Network and Service Virtualization tutorial at ONUG Spring 2015Network and Service Virtualization tutorial at ONUG Spring 2015
Network and Service Virtualization tutorial at ONUG Spring 2015
SDN Hub
 
Cloud Networking - Leaving the Physical Behind - Omer Anson - OpenStack Day I...
Cloud Networking - Leaving the Physical Behind - Omer Anson - OpenStack Day I...Cloud Networking - Leaving the Physical Behind - Omer Anson - OpenStack Day I...
Cloud Networking - Leaving the Physical Behind - Omer Anson - OpenStack Day I...
Cloud Native Day Tel Aviv
 

What's hot (20)

Juniper Network Automation for KrDAG
Juniper Network Automation for KrDAGJuniper Network Automation for KrDAG
Juniper Network Automation for KrDAG
 
OpenStack Networking and Automation
OpenStack Networking and AutomationOpenStack Networking and Automation
OpenStack Networking and Automation
 
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
 
Open stack journey from folsom to grizzly
Open stack journey from folsom to grizzlyOpen stack journey from folsom to grizzly
Open stack journey from folsom to grizzly
 
Nova net-or-neutron-atlanta2014.pptx
Nova net-or-neutron-atlanta2014.pptxNova net-or-neutron-atlanta2014.pptx
Nova net-or-neutron-atlanta2014.pptx
 
OpenContrail Cloudwatt Feedback
OpenContrail Cloudwatt FeedbackOpenContrail Cloudwatt Feedback
OpenContrail Cloudwatt Feedback
 
Can the Open vSwitch (OVS) bottleneck be resolved? - Erez Cohen - OpenStack D...
Can the Open vSwitch (OVS) bottleneck be resolved? - Erez Cohen - OpenStack D...Can the Open vSwitch (OVS) bottleneck be resolved? - Erez Cohen - OpenStack D...
Can the Open vSwitch (OVS) bottleneck be resolved? - Erez Cohen - OpenStack D...
 
L4-L7 services for SDN and NVF by Youcef Laribi
L4-L7 services for SDN and NVF by Youcef LaribiL4-L7 services for SDN and NVF by Youcef Laribi
L4-L7 services for SDN and NVF by Youcef Laribi
 
Open stack networking_101_update_2014
Open stack networking_101_update_2014Open stack networking_101_update_2014
Open stack networking_101_update_2014
 
OpenStack Neutron behind the Scenes
OpenStack Neutron behind the ScenesOpenStack Neutron behind the Scenes
OpenStack Neutron behind the Scenes
 
Open stack korea_uni2u_pdf
Open stack korea_uni2u_pdfOpen stack korea_uni2u_pdf
Open stack korea_uni2u_pdf
 
OpenDaylight Netvirt and Neutron - Mike Kolesnik, Josh Hershberg - OpenStack ...
OpenDaylight Netvirt and Neutron - Mike Kolesnik, Josh Hershberg - OpenStack ...OpenDaylight Netvirt and Neutron - Mike Kolesnik, Josh Hershberg - OpenStack ...
OpenDaylight Netvirt and Neutron - Mike Kolesnik, Josh Hershberg - OpenStack ...
 
Contrail Basics
Contrail BasicsContrail Basics
Contrail Basics
 
BRKDCT-2445 Agile OpenStack Networking with Cisco Solutions - Cisco Live! US ...
BRKDCT-2445 Agile OpenStack Networking with Cisco Solutions - Cisco Live! US ...BRKDCT-2445 Agile OpenStack Networking with Cisco Solutions - Cisco Live! US ...
BRKDCT-2445 Agile OpenStack Networking with Cisco Solutions - Cisco Live! US ...
 
Nova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute modelsNova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute models
 
OpenStack Israel Meetup - Project Kuryr: Bringing Container Networking to Neu...
OpenStack Israel Meetup - Project Kuryr: Bringing Container Networking to Neu...OpenStack Israel Meetup - Project Kuryr: Bringing Container Networking to Neu...
OpenStack Israel Meetup - Project Kuryr: Bringing Container Networking to Neu...
 
Agile Networking with OpenStack
Agile Networking with OpenStack Agile Networking with OpenStack
Agile Networking with OpenStack
 
Osdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauserOsdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauser
 
Network and Service Virtualization tutorial at ONUG Spring 2015
Network and Service Virtualization tutorial at ONUG Spring 2015Network and Service Virtualization tutorial at ONUG Spring 2015
Network and Service Virtualization tutorial at ONUG Spring 2015
 
Cloud Networking - Leaving the Physical Behind - Omer Anson - OpenStack Day I...
Cloud Networking - Leaving the Physical Behind - Omer Anson - OpenStack Day I...Cloud Networking - Leaving the Physical Behind - Omer Anson - OpenStack Day I...
Cloud Networking - Leaving the Physical Behind - Omer Anson - OpenStack Day I...
 

Viewers also liked

Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5
Haoyuan Li
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
Spark Summit
 
Linux Filesystems, RAID, and more
Linux Filesystems, RAID, and moreLinux Filesystems, RAID, and more
Linux Filesystems, RAID, and more
Mark Wong
 
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Spark Summit
 
The Hot Rod Protocol in Infinispan
The Hot Rod Protocol in InfinispanThe Hot Rod Protocol in Infinispan
The Hot Rod Protocol in Infinispan
Galder Zamarreño
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Ceph Community
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
fnothaft
 
ELC-E 2010: The Right Approach to Minimal Boot Times
ELC-E 2010: The Right Approach to Minimal Boot TimesELC-E 2010: The Right Approach to Minimal Boot Times
ELC-E 2010: The Right Approach to Minimal Boot Times
andrewmurraympc
 
Velox: Models in Action
Velox: Models in ActionVelox: Models in Action
Velox: Models in Action
Dan Crankshaw
 
Naïveté vs. Experience
Naïveté vs. ExperienceNaïveté vs. Experience
Naïveté vs. Experience
Mike Fogus
 
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at ScaleSparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scale
jeykottalam
 
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS StackSampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stack
jeykottalam
 
Lab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using MininetLab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using Mininet
Zubair Nabi
 
A Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and ConcurrencyA Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and Concurrency
David Beazley (Dabeaz LLC)
 
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark Summit
 
Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache Hadoop
Hortonworks
 
In Search of the Perfect Global Interpreter Lock
In Search of the Perfect Global Interpreter LockIn Search of the Perfect Global Interpreter Lock
In Search of the Perfect Global Interpreter Lock
David Beazley (Dabeaz LLC)
 
Python in Action (Part 2)
Python in Action (Part 2)Python in Action (Part 2)
Python in Action (Part 2)
David Beazley (Dabeaz LLC)
 

Viewers also liked (20)

Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
 
Open Stack Cheat Sheet V1
Open Stack Cheat Sheet V1Open Stack Cheat Sheet V1
Open Stack Cheat Sheet V1
 
Linux Filesystems, RAID, and more
Linux Filesystems, RAID, and moreLinux Filesystems, RAID, and more
Linux Filesystems, RAID, and more
 
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
 
The Hot Rod Protocol in Infinispan
The Hot Rod Protocol in InfinispanThe Hot Rod Protocol in Infinispan
The Hot Rod Protocol in Infinispan
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
ELC-E 2010: The Right Approach to Minimal Boot Times
ELC-E 2010: The Right Approach to Minimal Boot TimesELC-E 2010: The Right Approach to Minimal Boot Times
ELC-E 2010: The Right Approach to Minimal Boot Times
 
Velox: Models in Action
Velox: Models in ActionVelox: Models in Action
Velox: Models in Action
 
Naïveté vs. Experience
Naïveté vs. ExperienceNaïveté vs. Experience
Naïveté vs. Experience
 
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at ScaleSparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scale
 
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS StackSampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stack
 
Lab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using MininetLab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using Mininet
 
OpenStack Cheat Sheet V2
OpenStack Cheat Sheet V2OpenStack Cheat Sheet V2
OpenStack Cheat Sheet V2
 
A Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and ConcurrencyA Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and Concurrency
 
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
 
Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache Hadoop
 
In Search of the Perfect Global Interpreter Lock
In Search of the Perfect Global Interpreter LockIn Search of the Perfect Global Interpreter Lock
In Search of the Perfect Global Interpreter Lock
 
Python in Action (Part 2)
Python in Action (Part 2)Python in Action (Part 2)
Python in Action (Part 2)
 

Similar to Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift

AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
Chris Purrington
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AI
Torsten Steinbach
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
Torsten Steinbach
 
A Brave new object store world
A Brave new object store worldA Brave new object store world
A Brave new object store world
Effi Ofer
 
Se training storage grid webscale technical overview
Se training   storage grid webscale technical overviewSe training   storage grid webscale technical overview
Se training storage grid webscale technical overview
solarisyougood
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
Sohil Jain
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
Sohil Jain
 
NetApp Se training storage grid webscale technical overview
NetApp Se training   storage grid webscale technical overviewNetApp Se training   storage grid webscale technical overview
NetApp Se training storage grid webscale technical overview
solarisyougood
 
IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep Dive
Torsten Steinbach
 
Serverless data lake architecture
Serverless data lake architectureServerless data lake architecture
Serverless data lake architecture
Maik Wiesmüller
 
S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5
Tony Pearson
 
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
NebulaInc
 
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
ScalrCMP
 
Oracle Cloud Infrastructure Data Science 概要資料(20200406)
Oracle Cloud Infrastructure Data Science 概要資料(20200406)Oracle Cloud Infrastructure Data Science 概要資料(20200406)
Oracle Cloud Infrastructure Data Science 概要資料(20200406)
オラクルエンジニア通信
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
DataStax Academy
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeong
Yousun Jeong
 
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
sparktc
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using Spark
Itai Yaffe
 

Similar to Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift (20)

AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AI
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
A Brave new object store world
A Brave new object store worldA Brave new object store world
A Brave new object store world
 
Se training storage grid webscale technical overview
Se training   storage grid webscale technical overviewSe training   storage grid webscale technical overview
Se training storage grid webscale technical overview
 
IDE.pptx
IDE.pptxIDE.pptx
IDE.pptx
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
NetApp Se training storage grid webscale technical overview
NetApp Se training   storage grid webscale technical overviewNetApp Se training   storage grid webscale technical overview
NetApp Se training storage grid webscale technical overview
 
IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep Dive
 
Serverless data lake architecture
Serverless data lake architectureServerless data lake architecture
Serverless data lake architecture
 
S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5
 
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
 
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
 
Oracle Cloud Infrastructure Data Science 概要資料(20200406)
Oracle Cloud Infrastructure Data Science 概要資料(20200406)Oracle Cloud Infrastructure Data Science 概要資料(20200406)
Oracle Cloud Infrastructure Data Science 概要資料(20200406)
 
Big data cloud architecture
Big data cloud architectureBig data cloud architecture
Big data cloud architecture
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeong
 
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using Spark
 

More from Daniel Krook

Commit to the Cause, Push for Change: Contributing to Call for Code Open Sour...
Commit to the Cause, Push for Change: Contributing to Call for Code Open Sour...Commit to the Cause, Push for Change: Contributing to Call for Code Open Sour...
Commit to the Cause, Push for Change: Contributing to Call for Code Open Sour...
Daniel Krook
 
Engaging Open Source Developers to Develop Tech for Good through Code and Res...
Engaging Open Source Developers to Develop Tech for Good through Code and Res...Engaging Open Source Developers to Develop Tech for Good through Code and Res...
Engaging Open Source Developers to Develop Tech for Good through Code and Res...
Daniel Krook
 
COVID-19 and Climate Change Action Through Open Source Technology
COVID-19 and Climate Change Action Through Open Source TechnologyCOVID-19 and Climate Change Action Through Open Source Technology
COVID-19 and Climate Change Action Through Open Source Technology
Daniel Krook
 
Serverless APIs with Apache OpenWhisk
Serverless APIs with Apache OpenWhiskServerless APIs with Apache OpenWhisk
Serverless APIs with Apache OpenWhisk
Daniel Krook
 
Workshop: Develop Serverless Applications with IBM Cloud Functions
Workshop: Develop Serverless Applications with IBM Cloud FunctionsWorkshop: Develop Serverless Applications with IBM Cloud Functions
Workshop: Develop Serverless Applications with IBM Cloud Functions
Daniel Krook
 
Event specifications, state of the serverless landscape, and other news from ...
Event specifications, state of the serverless landscape, and other news from ...Event specifications, state of the serverless landscape, and other news from ...
Event specifications, state of the serverless landscape, and other news from ...
Daniel Krook
 
Serverless Architectures in Banking: OpenWhisk on IBM Bluemix at Santander
Serverless Architectures in Banking: OpenWhisk on IBM Bluemix at SantanderServerless Architectures in Banking: OpenWhisk on IBM Bluemix at Santander
Serverless Architectures in Banking: OpenWhisk on IBM Bluemix at Santander
Daniel Krook
 
The CNCF on Serverless
The CNCF on ServerlessThe CNCF on Serverless
The CNCF on Serverless
Daniel Krook
 
Building serverless applications with Apache OpenWhisk and IBM Cloud Functions
Building serverless applications with Apache OpenWhisk and IBM Cloud FunctionsBuilding serverless applications with Apache OpenWhisk and IBM Cloud Functions
Building serverless applications with Apache OpenWhisk and IBM Cloud Functions
Daniel Krook
 
Building serverless applications with Apache OpenWhisk
Building serverless applications with Apache OpenWhiskBuilding serverless applications with Apache OpenWhisk
Building serverless applications with Apache OpenWhisk
Daniel Krook
 
Containers vs serverless - Navigating application deployment options
Containers vs serverless - Navigating application deployment optionsContainers vs serverless - Navigating application deployment options
Containers vs serverless - Navigating application deployment options
Daniel Krook
 
Serverless architectures built on an open source platform
Serverless architectures built on an open source platformServerless architectures built on an open source platform
Serverless architectures built on an open source platform
Daniel Krook
 
Build a cloud native app with OpenWhisk
Build a cloud native app with OpenWhiskBuild a cloud native app with OpenWhisk
Build a cloud native app with OpenWhisk
Daniel Krook
 
Cloud Native Architectures with an Open Source, Event Driven, Serverless Plat...
Cloud Native Architectures with an Open Source, Event Driven, Serverless Plat...Cloud Native Architectures with an Open Source, Event Driven, Serverless Plat...
Cloud Native Architectures with an Open Source, Event Driven, Serverless Plat...
Daniel Krook
 
Open Container Technologies and OpenStack - Sorting Through Kubernetes, the O...
Open Container Technologies and OpenStack - Sorting Through Kubernetes, the O...Open Container Technologies and OpenStack - Sorting Through Kubernetes, the O...
Open Container Technologies and OpenStack - Sorting Through Kubernetes, the O...
Daniel Krook
 
Serverless apps with OpenWhisk
Serverless apps with OpenWhiskServerless apps with OpenWhisk
Serverless apps with OpenWhisk
Daniel Krook
 
OpenWhisk - A platform for cloud native, serverless, event driven apps
OpenWhisk - A platform for cloud native, serverless, event driven appsOpenWhisk - A platform for cloud native, serverless, event driven apps
OpenWhisk - A platform for cloud native, serverless, event driven apps
Daniel Krook
 
Containers, OCI, CNCF, Magnum, Kuryr, and You!
Containers, OCI, CNCF, Magnum, Kuryr, and You!Containers, OCI, CNCF, Magnum, Kuryr, and You!
Containers, OCI, CNCF, Magnum, Kuryr, and You!
Daniel Krook
 
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayerTaking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
Daniel Krook
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
Daniel Krook
 

More from Daniel Krook (20)

Commit to the Cause, Push for Change: Contributing to Call for Code Open Sour...
Commit to the Cause, Push for Change: Contributing to Call for Code Open Sour...Commit to the Cause, Push for Change: Contributing to Call for Code Open Sour...
Commit to the Cause, Push for Change: Contributing to Call for Code Open Sour...
 
Engaging Open Source Developers to Develop Tech for Good through Code and Res...
Engaging Open Source Developers to Develop Tech for Good through Code and Res...Engaging Open Source Developers to Develop Tech for Good through Code and Res...
Engaging Open Source Developers to Develop Tech for Good through Code and Res...
 
COVID-19 and Climate Change Action Through Open Source Technology
COVID-19 and Climate Change Action Through Open Source TechnologyCOVID-19 and Climate Change Action Through Open Source Technology
COVID-19 and Climate Change Action Through Open Source Technology
 
Serverless APIs with Apache OpenWhisk
Serverless APIs with Apache OpenWhiskServerless APIs with Apache OpenWhisk
Serverless APIs with Apache OpenWhisk
 
Workshop: Develop Serverless Applications with IBM Cloud Functions
Workshop: Develop Serverless Applications with IBM Cloud FunctionsWorkshop: Develop Serverless Applications with IBM Cloud Functions
Workshop: Develop Serverless Applications with IBM Cloud Functions
 
Event specifications, state of the serverless landscape, and other news from ...
Event specifications, state of the serverless landscape, and other news from ...Event specifications, state of the serverless landscape, and other news from ...
Event specifications, state of the serverless landscape, and other news from ...
 
Serverless Architectures in Banking: OpenWhisk on IBM Bluemix at Santander
Serverless Architectures in Banking: OpenWhisk on IBM Bluemix at SantanderServerless Architectures in Banking: OpenWhisk on IBM Bluemix at Santander
Serverless Architectures in Banking: OpenWhisk on IBM Bluemix at Santander
 
The CNCF on Serverless
The CNCF on ServerlessThe CNCF on Serverless
The CNCF on Serverless
 
Building serverless applications with Apache OpenWhisk and IBM Cloud Functions
Building serverless applications with Apache OpenWhisk and IBM Cloud FunctionsBuilding serverless applications with Apache OpenWhisk and IBM Cloud Functions
Building serverless applications with Apache OpenWhisk and IBM Cloud Functions
 
Building serverless applications with Apache OpenWhisk
Building serverless applications with Apache OpenWhiskBuilding serverless applications with Apache OpenWhisk
Building serverless applications with Apache OpenWhisk
 
Containers vs serverless - Navigating application deployment options
Containers vs serverless - Navigating application deployment optionsContainers vs serverless - Navigating application deployment options
Containers vs serverless - Navigating application deployment options
 
Serverless architectures built on an open source platform
Serverless architectures built on an open source platformServerless architectures built on an open source platform
Serverless architectures built on an open source platform
 
Build a cloud native app with OpenWhisk
Build a cloud native app with OpenWhiskBuild a cloud native app with OpenWhisk
Build a cloud native app with OpenWhisk
 
Cloud Native Architectures with an Open Source, Event Driven, Serverless Plat...
Cloud Native Architectures with an Open Source, Event Driven, Serverless Plat...Cloud Native Architectures with an Open Source, Event Driven, Serverless Plat...
Cloud Native Architectures with an Open Source, Event Driven, Serverless Plat...
 
Open Container Technologies and OpenStack - Sorting Through Kubernetes, the O...
Open Container Technologies and OpenStack - Sorting Through Kubernetes, the O...Open Container Technologies and OpenStack - Sorting Through Kubernetes, the O...
Open Container Technologies and OpenStack - Sorting Through Kubernetes, the O...
 
Serverless apps with OpenWhisk
Serverless apps with OpenWhiskServerless apps with OpenWhisk
Serverless apps with OpenWhisk
 
OpenWhisk - A platform for cloud native, serverless, event driven apps
OpenWhisk - A platform for cloud native, serverless, event driven appsOpenWhisk - A platform for cloud native, serverless, event driven apps
OpenWhisk - A platform for cloud native, serverless, event driven apps
 
Containers, OCI, CNCF, Magnum, Kuryr, and You!
Containers, OCI, CNCF, Magnum, Kuryr, and You!Containers, OCI, CNCF, Magnum, Kuryr, and You!
Containers, OCI, CNCF, Magnum, Kuryr, and You!
 
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayerTaking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
 

Recently uploaded

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 

Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift

  • 1. © 2014 IBM Corporation Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift Gil Vernik IBM Research - Haifa
  • 2. © 2014 IBM Corporation Topics Covered in This Talk § Openstack Swift § Apache Spark § Basic integration between Spark and Swift § Advanced integration between Spark and Swift by utilizing the Storlets technology.
  • 3. © 2014 IBM Corporation Digital Universe More than 1.8 zettabytes (1.8 trillion gigabytes) Grows rapidly 80% owned by enterprises 75% generated by individuals According IDC iView "Extracting Value from Chaos,"
  • 4. © 2014 IBM Corporation Map-Reduce, Databases, etc.. Data needs to be replicated, Time, Cost, etc..
  • 5. © 2014 IBM Corporation Can we do it better?
  • 6. © 2014 IBM Corporation Openstack Swift § A massively scalable object store § Known to work with thousands of servers, stores petabytes of data. § Exposes REST API § Features: – Storage polices – Erasure codes – Data replication – …. PUTProxy Nodes Storage Nodes
  • 7. © 2014 IBM Corporation Apache Spark § Apache Spark™ is a fast and general engine for large-scale data processing – Up to 100x faster than Hadoop Map Reduce in-memory, 10x faster on disk § Combines SQL, streaming, and complex analytics § Can read existing Hadoop data § Most active project in Apache today
  • 8. © 2014 IBM Corporation Swift enablement for data retrieval in Spark § Apache Spark implements Hadoop interfaces and can use HDFS or Amazon S3 as a data source. Swift Network § IBM research enabled Spark to access data stored in Openstack Swift.
  • 9. © 2014 IBM Corporation What do we analyze? Swift Network Stored Data Input to Analytics Images EXIF metadata PDF Hidden metadata LOGs Only ‘ERROR’ records …. ….
  • 10. © 2014 IBM Corporation Yes! We can do it better.
  • 11. © 2014 IBM Corporation Storlets: Flexibly extend for Swift Advanced Data processing inside Swift § Storlets is a way to ‘extend’ cloud computational capabilities § Storlet is compiled code, deployed to Swift and when triggered is executed by Storlet Engine directly on storage nodes. § Storlet engine - responsible to execute every storlet in a secure environment § Storlet is a standard Java code
  • 12. © 2014 IBM Corporation Storlets extend an object store by moving computation to the data – filtering, transforming, analyzing – instead of bringing the data to the computation
  • 13. © 2014 IBM Corporation Swift Storlets: How do they benefit Spark? Swift Storlet Network Objects Filter Data processing+
  • 14. © 2014 IBM Corporation Storlets Enable Extending the Functionality of Spark Example: analyzing EXIF metadata from photos § Object store is a natural repository for photos § Photos contain rich capture metadata § Analyzing this metadata for a set of photos can show how the camera is used
  • 15. © 2014 IBM Corporation Example: Analyzing EXIF metadata Storlets can extract metadata, returning as JSON (rather than of processing the binary data directly by Spark) 10MB 1KB
  • 16. © 2014 IBM Corporation Example: Analyzing EXIF metadata. •  Spark accesses images via storlet •  No change to Spark, only changes the URI •  JSON file returned by storlet defines schema •  SQL from Spark processes metadata
  • 17. © 2014 IBM Corporation Example: Analyzing EXIF metadata.
  • 18. © 2014 IBM Corporation Summary § Openstack Swift is the most popular open source object store § Apache Spark is the next big thing in data analytics § Spark and Swift can be integrated § Storlets in Swift provide clear benefits for analytics use cases. Thank you! More information Gil Vernik, IBM Research -Haifa gilv@il.ibm.com