SlideShare a Scribd company logo
1 of 17
Download to read offline
M a r c h 2 0 1 7
Building Super Fast Cloud-Native Platforms
Yaron Haviv, CTO, iguazio
@yaronhaviv
1
iguazio © 2017
2
Building A High-Performance Cloud-Native Data Platform
Fast Data and API Gateways REST API and Web UI
Data Processing and Caching
Elastic Platform Services
Redundant 100GbE Fabric
Stateless Load-Balancing
(for Incoming Requests)
Multi Site
External
Notifications
DBMQ Monitor… RegistryLog Identity
LDAP/SSOObject Storage
(Archive)
Fast Native Access
“Server-Less”
iguazio © 2017
3
Redefining the Stack, Delivering Magnitudes-Faster Performance
8
iguazio © 2017
4
High-Performance Requires Careful Hardware Integration
V3IO
Cap’n Proto
+ Accelio*
(OS Bypass)
2 x 100GbE
Fabric
K/V
Services
VN
Services
8-24 x NVMe
DirectSSL & Overlay
Offload
Shared Mem
Lock-Free Qs
Web API Nodes Data Nodes
Web APIs
S3, Kinesis, DynamoDB...
NV Mem
Serve 800K** Web Req/Sec Serve 2M Ops/Sec and 100Gbps
0.1ms Latency @ 99% (percentile)
1 proc
/core1 proc
/tenant
1 vNIC/
tenant
** Tested with: https://github.com/v3io/http_blaster
* Accelio: https://github.com/v3io/accelio
iguazio © 2017
5
Challenges with Containers and Kubernetes
• Kubernetes Limitations/ Challenges
– Only one IF per POD, usually going through a slow overlay layer
– No native support for HW NICs (SR-IOV)
– Hard to expose low level drivers/ libraries to container
– Hard to use shared memory IPC/ files between PODs
– Docker and Kubernetes are different (security, net, volume, shmem, ..)
• Solution
– Custom network (CNI) and volume drivers
– Use granular privileges
– Many trials and errors 
iguazio © 2017
6
Container Networking 101
Container Networking Options
• SR-IOV allows native hardware access
• Multus enable multiple IFs per POD: https://github.com/ Intel-Corp/multus-cni
• More details on my blog https:// thenewstack.io/ hackers-guide-kubernetes-networking/
iguazio © 2017
7
Stateless Apps with Fastest Unified Data Access
• Fastest messaging/ DB/obj access
– Like Go channels across processes
– Lock-free, async, parallel
– Zero-copy end to end
– + Native Spark DataFrame API
• Fast container initialization
– No TCP/ IP connections init
– No memory alloc/register
• Share TCP/ RDMA connections
• NO Kernel drivers/changes !
V3IOd
(DB, stream, ..)
Fuse++
(file mounts)
Fuse++ -
Applications Services
Other Apps
Orchestrated via
FlexVolume Driver
Serverless
SDK/DFSDK
Fast Data IPC
Cap’n Proto + Lock-free
shmem queues (dev/shm)
Modified fuse lib to run async, 15x faster (~100K IOPs/thread)
See: https://github.com/v3io/libfuse
50/100 GbE
Accelerate Performance Using Shared Memory
100% Stateless Apps
iguazio © 2017
8
Today: Server-Less is Cool, But Inefficient
• 1st generation is slow to init and a resource drain
– https://medium.com/ @ferdingler/aws-lambda-no-thank-you-9c586990e67d
• Complex and unsecured data bindings (performed in the init part of the function)
• TCP/ IP or DB connections may need to re-establish on every invocation
• Slow, limited concurrency, runs one task at a time per container
• Events structure has no common schema
(see: http://docs.aws.amazon.com/ lambda/ latest/dg/eventsources.html)
Source, my blog: https://medium.com/@yaronhaviv/serverless-background-challenges-and-future-d0928df71758
iguazio © 2017
9
Building Server-Less on Steroids
API
Gateways
API GW
(ingress)
Controller
Function
instance
Function
Instance
(N workers)
invoker
Async
calls
• Objects
• Files
• Tables
• Streams
1 update for N
invocations to save IO
1 hop, low latency msg
Create
Del
Scale
Data Changes,
Streaming
* Iguazio’s server-less framework will be open sourced later this year
User defined micro-batch size and
parallelism (N Go routines per function)
Credit based
rate limiter
Workers State
and Stats
Control Data
Distributed
Log Stream
Cut overhead, add parallelism and concurrency, without violating isolation
F PODs
NGINX PODs
iguazio © 2017
10
Example: Simple HTTP Function
Built-in log stream
Data binding and credentials in
function context
(simple, fast, reusable and secure)
See Data class APIs in backup slide,
allow integrating with various sources
iguazio © 2017
11
High-Speed “Server-Less” Data Processing, Everything is a Stream
Select name, mtime, f1, f2, …
Where condition1, 2, 3, ..
StatefulSet*
Transactional (exactly once)
update of object metadata
e.g. object last backup/scan time
Control Data
Task stats,
state, and logs
Scheduled or Continuous Metadata Queries
Example Tasks:
• incremental backup modifies
objects/records to Amazon S3
• Scan/convert content
• Data/Metadata stream is sharded
consistently to N stateless workers
• Using StatefulSets to maintain consistent
shard index id (derived from POD name)
• Tables
• Streams
• Objects
• Files
Stream/Push matching
Metadata & Data
iguazio © 2017
12
Example: Scanning for Sensitive Text Files on Upload/ Update
Init part, e.g. define RegEx filters
Data/metadata PUSH
to reduce latency
1-N records per call
Async and micro-batch updates
of object/record/file attributes
Function can use
distributed task counters
Update data/attrs in one ATOMIC transaction
iguazio © 2017
13
Image Example, Leveraging a Unified Data Model
Access the objects via file
semantics, mount and share
handled automatically
Thumbnail stored as a small
blub record on the same
object for quick UI access
Use standard file access
iguazio © 2017
14
Connecting The Dots: Continuous Analytics Example
iguazio © 2017
15
Kubernetes Helps us Simplify & Accelerate Analytics at the Edge
Sources
Event Driven Code
Analytics FrameworksServer-Less Processing
18
Queries &
Functions
Security Data
Lifecycle
Unified
Data
File APIObject APIStream APINoSQL API
iguazio Data Platform Cloud Aggregation
Insights
100%
Stateless Apps
Backup
iguazio © 2017
17
Generic Data Services Binding API
Service Major APIs
Object
e.g. S3, Minio,
v3io
ListBucket(prefix string) (ListBucketResp, error)
Get(path string, ranges ...Range) ([]byte, error)
Put(path string, body []byte ...) ([]byte, error)
Del(path string) (error)
NoSQL
e.g. DynamoDB,
Cassandra, v3io
GetItem(path, attrs string) (GetItemResp, error)
GetItems(path, attrs, filter, marker string, ...) (GetItemsResp, error)
PutItem(path string, list map[string]interface{}, condition string) ([]byte, error)
UpdateItem(path string, updatestr string, condition string) ([]byte, error)
DelItem(path string, condition string) (error)
Stream
e.g. Kinesis,
Kafka, v3io
GetRecords(path, offset string, maxrec int) (GetRecordsResp, error)
PutRecords(path string, records []StreamRecord) ([]byte, error)
Seek(path string, seek string) (string, error)
File Open(name string, flag int, perm FileMode) (*File, error)

More Related Content

What's hot

Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkVinoth Chandar
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsBen Laird
 
"How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics."How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics.Vladimir Pavkin
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUsiguazio
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Аліна Шепшелей
 
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...PROIDEA
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToSATOSHI TAGOMORI
 
Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Treasure Data, Inc.
 
Analyzing MySQL Logs with ClickHouse, by Peter Zaitsev
Analyzing MySQL Logs with ClickHouse, by Peter ZaitsevAnalyzing MySQL Logs with ClickHouse, by Peter Zaitsev
Analyzing MySQL Logs with ClickHouse, by Peter ZaitsevAltinity Ltd
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKShu-Jeng Hsieh
 
DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and...
 DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and... DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and...
DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and...PROIDEA
 
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...Databricks
 
Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesYousun Jeong
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra PerfectSATOSHI TAGOMORI
 
Big data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands OnBig data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands Onhkbhadraa
 

What's hot (20)

Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on Spark
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.js
 
"How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics."How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics.
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
tdtechtalk20160330johan
tdtechtalk20160330johantdtechtalk20160330johan
tdtechtalk20160330johan
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
 
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
 
Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -
 
Analyzing MySQL Logs with ClickHouse, by Peter Zaitsev
Analyzing MySQL Logs with ClickHouse, by Peter ZaitsevAnalyzing MySQL Logs with ClickHouse, by Peter Zaitsev
Analyzing MySQL Logs with ClickHouse, by Peter Zaitsev
 
Handling not so big data
Handling not so big dataHandling not so big data
Handling not so big data
 
Docker Monitoring Webinar
Docker Monitoring  WebinarDocker Monitoring  Webinar
Docker Monitoring Webinar
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
 
DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and...
 DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and... DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and...
DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and...
 
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
 
Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on Kubernetes
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra Perfect
 
Big data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands OnBig data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands On
 

Similar to Building Super Fast Cloud-Native Data Platforms - Yaron Haviv, KubeCon 2017 EU

Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...Amazon Web Services
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAlluxio, Inc.
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...Cisco DevNet
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Alluxio, Inc.
 
Se training storage grid webscale technical overview
Se training   storage grid webscale technical overviewSe training   storage grid webscale technical overview
Se training storage grid webscale technical overviewsolarisyougood
 
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Cloud Native Day Tel Aviv
 
From limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyFrom limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyAlluxio, Inc.
 
Data Orchestration Platform for the Cloud
Data Orchestration Platform for the CloudData Orchestration Platform for the Cloud
Data Orchestration Platform for the CloudAlluxio, Inc.
 
Slides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data LakesSlides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data LakesDATAVERSITY
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
Stateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
Stateful Interaction In Serverless Architecture With Redis: Pyounguk ChoStateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
Stateful Interaction In Serverless Architecture With Redis: Pyounguk ChoRedis Labs
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data PlatformShu-Jeng Hsieh
 
EQR Reporting: Rails + Amazon EC2
EQR Reporting:  Rails + Amazon EC2EQR Reporting:  Rails + Amazon EC2
EQR Reporting: Rails + Amazon EC2jeperkins4
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Alluxio, Inc.
 

Similar to Building Super Fast Cloud-Native Data Platforms - Yaron Haviv, KubeCon 2017 EU (20)

Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
HDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the CloudHDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the Cloud
 
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
 
India Webinar
India WebinarIndia Webinar
India Webinar
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Se training storage grid webscale technical overview
Se training   storage grid webscale technical overviewSe training   storage grid webscale technical overview
Se training storage grid webscale technical overview
 
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
 
From limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyFrom limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiency
 
Data Orchestration Platform for the Cloud
Data Orchestration Platform for the CloudData Orchestration Platform for the Cloud
Data Orchestration Platform for the Cloud
 
Slides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data LakesSlides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data Lakes
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
Stateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
Stateful Interaction In Serverless Architecture With Redis: Pyounguk ChoStateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
Stateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
 
EQR Reporting: Rails + Amazon EC2
EQR Reporting:  Rails + Amazon EC2EQR Reporting:  Rails + Amazon EC2
EQR Reporting: Rails + Amazon EC2
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 

Recently uploaded

RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证
在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证
在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证nhjeo1gg
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 

Recently uploaded (20)

RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证
在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证
在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 

Building Super Fast Cloud-Native Data Platforms - Yaron Haviv, KubeCon 2017 EU

  • 1. M a r c h 2 0 1 7 Building Super Fast Cloud-Native Platforms Yaron Haviv, CTO, iguazio @yaronhaviv 1
  • 2. iguazio © 2017 2 Building A High-Performance Cloud-Native Data Platform Fast Data and API Gateways REST API and Web UI Data Processing and Caching Elastic Platform Services Redundant 100GbE Fabric Stateless Load-Balancing (for Incoming Requests) Multi Site External Notifications DBMQ Monitor… RegistryLog Identity LDAP/SSOObject Storage (Archive) Fast Native Access “Server-Less”
  • 3. iguazio © 2017 3 Redefining the Stack, Delivering Magnitudes-Faster Performance 8
  • 4. iguazio © 2017 4 High-Performance Requires Careful Hardware Integration V3IO Cap’n Proto + Accelio* (OS Bypass) 2 x 100GbE Fabric K/V Services VN Services 8-24 x NVMe DirectSSL & Overlay Offload Shared Mem Lock-Free Qs Web API Nodes Data Nodes Web APIs S3, Kinesis, DynamoDB... NV Mem Serve 800K** Web Req/Sec Serve 2M Ops/Sec and 100Gbps 0.1ms Latency @ 99% (percentile) 1 proc /core1 proc /tenant 1 vNIC/ tenant ** Tested with: https://github.com/v3io/http_blaster * Accelio: https://github.com/v3io/accelio
  • 5. iguazio © 2017 5 Challenges with Containers and Kubernetes • Kubernetes Limitations/ Challenges – Only one IF per POD, usually going through a slow overlay layer – No native support for HW NICs (SR-IOV) – Hard to expose low level drivers/ libraries to container – Hard to use shared memory IPC/ files between PODs – Docker and Kubernetes are different (security, net, volume, shmem, ..) • Solution – Custom network (CNI) and volume drivers – Use granular privileges – Many trials and errors 
  • 6. iguazio © 2017 6 Container Networking 101 Container Networking Options • SR-IOV allows native hardware access • Multus enable multiple IFs per POD: https://github.com/ Intel-Corp/multus-cni • More details on my blog https:// thenewstack.io/ hackers-guide-kubernetes-networking/
  • 7. iguazio © 2017 7 Stateless Apps with Fastest Unified Data Access • Fastest messaging/ DB/obj access – Like Go channels across processes – Lock-free, async, parallel – Zero-copy end to end – + Native Spark DataFrame API • Fast container initialization – No TCP/ IP connections init – No memory alloc/register • Share TCP/ RDMA connections • NO Kernel drivers/changes ! V3IOd (DB, stream, ..) Fuse++ (file mounts) Fuse++ - Applications Services Other Apps Orchestrated via FlexVolume Driver Serverless SDK/DFSDK Fast Data IPC Cap’n Proto + Lock-free shmem queues (dev/shm) Modified fuse lib to run async, 15x faster (~100K IOPs/thread) See: https://github.com/v3io/libfuse 50/100 GbE Accelerate Performance Using Shared Memory 100% Stateless Apps
  • 8. iguazio © 2017 8 Today: Server-Less is Cool, But Inefficient • 1st generation is slow to init and a resource drain – https://medium.com/ @ferdingler/aws-lambda-no-thank-you-9c586990e67d • Complex and unsecured data bindings (performed in the init part of the function) • TCP/ IP or DB connections may need to re-establish on every invocation • Slow, limited concurrency, runs one task at a time per container • Events structure has no common schema (see: http://docs.aws.amazon.com/ lambda/ latest/dg/eventsources.html) Source, my blog: https://medium.com/@yaronhaviv/serverless-background-challenges-and-future-d0928df71758
  • 9. iguazio © 2017 9 Building Server-Less on Steroids API Gateways API GW (ingress) Controller Function instance Function Instance (N workers) invoker Async calls • Objects • Files • Tables • Streams 1 update for N invocations to save IO 1 hop, low latency msg Create Del Scale Data Changes, Streaming * Iguazio’s server-less framework will be open sourced later this year User defined micro-batch size and parallelism (N Go routines per function) Credit based rate limiter Workers State and Stats Control Data Distributed Log Stream Cut overhead, add parallelism and concurrency, without violating isolation F PODs NGINX PODs
  • 10. iguazio © 2017 10 Example: Simple HTTP Function Built-in log stream Data binding and credentials in function context (simple, fast, reusable and secure) See Data class APIs in backup slide, allow integrating with various sources
  • 11. iguazio © 2017 11 High-Speed “Server-Less” Data Processing, Everything is a Stream Select name, mtime, f1, f2, … Where condition1, 2, 3, .. StatefulSet* Transactional (exactly once) update of object metadata e.g. object last backup/scan time Control Data Task stats, state, and logs Scheduled or Continuous Metadata Queries Example Tasks: • incremental backup modifies objects/records to Amazon S3 • Scan/convert content • Data/Metadata stream is sharded consistently to N stateless workers • Using StatefulSets to maintain consistent shard index id (derived from POD name) • Tables • Streams • Objects • Files Stream/Push matching Metadata & Data
  • 12. iguazio © 2017 12 Example: Scanning for Sensitive Text Files on Upload/ Update Init part, e.g. define RegEx filters Data/metadata PUSH to reduce latency 1-N records per call Async and micro-batch updates of object/record/file attributes Function can use distributed task counters Update data/attrs in one ATOMIC transaction
  • 13. iguazio © 2017 13 Image Example, Leveraging a Unified Data Model Access the objects via file semantics, mount and share handled automatically Thumbnail stored as a small blub record on the same object for quick UI access Use standard file access
  • 14. iguazio © 2017 14 Connecting The Dots: Continuous Analytics Example
  • 15. iguazio © 2017 15 Kubernetes Helps us Simplify & Accelerate Analytics at the Edge Sources Event Driven Code Analytics FrameworksServer-Less Processing 18 Queries & Functions Security Data Lifecycle Unified Data File APIObject APIStream APINoSQL API iguazio Data Platform Cloud Aggregation Insights 100% Stateless Apps
  • 17. iguazio © 2017 17 Generic Data Services Binding API Service Major APIs Object e.g. S3, Minio, v3io ListBucket(prefix string) (ListBucketResp, error) Get(path string, ranges ...Range) ([]byte, error) Put(path string, body []byte ...) ([]byte, error) Del(path string) (error) NoSQL e.g. DynamoDB, Cassandra, v3io GetItem(path, attrs string) (GetItemResp, error) GetItems(path, attrs, filter, marker string, ...) (GetItemsResp, error) PutItem(path string, list map[string]interface{}, condition string) ([]byte, error) UpdateItem(path string, updatestr string, condition string) ([]byte, error) DelItem(path string, condition string) (error) Stream e.g. Kinesis, Kafka, v3io GetRecords(path, offset string, maxrec int) (GetRecordsResp, error) PutRecords(path string, records []StreamRecord) ([]byte, error) Seek(path string, seek string) (string, error) File Open(name string, flag int, perm FileMode) (*File, error)