A Journey to Building an Autonomous Streaming Data Platform - Scaling to Trillion Events Monthly at Nvidia
#UnifiedAnalytics #SparkAISummit
Satish Dandu, Data Science & Engg. Manager, Nvidia
Rohit Kulkarni, Data Architect, Nvidia
WHICH ONE IS FASTER?
Drinking a hot cup of coffee
OR
Setting up an end-to-end Big Data Pipeline
Source: http://matzav.com/very-hot-drinks-could-cause-cancer/
Picture of Data Pipeline, Modified. Adapted from: Building a Data Pipeline from Scratch by Alan Marazzi. Retrieved from: https://medium.com/the-data-experience/building-a-data-pipeline-from-scratch-32b712cfb1db
Coffee Challenge
We Surveyed: v1.0 vs. v2.0
Platform Engineers: What do you spend your time doing?
- [100%] Developing Platform Tools
VS
- [50%] Developing Platform Tools
- [20%] Writing ETL Jobs
- [30%] Managing Prod Pipelines
Platform Users: Is it faster to
1. Drink a cup of hot coffee, or
2. Build an end-to-end Pipeline?
- [100%] Drinking a cup of coffee
VS
- [30%] Drinking Coffee!
- [70%] Building end-to-end Pipeline
Image source: https://www.pinterest.com/pin/16015933047709449
AGENDA
Who we are?
Data Platform as-a-service
Architecture
Our Journey from 1.0 -> 2.0
Lessons Learned?
3 S’s
Self Service
Scalability
Security
Demo
Who we are?
DATA PLATFORM AS-A-SERVICE
Data Platform supports several Nvidia products as tenants:
CLOUD GAMING
NVIDIA GPU CLOUD
AI SMART CITIES
SELF DRIVING SIM
Data Platform Stats
JOURNEY FROM V1 TO V2
10K Foot Overview - Upstream & Downstream [v1]
Data Sources, Gateway, Data Processing
ARCHITECTURE v1.0
Big Data Platform-as-a-Service (dPaaS) & AI Inferencing-as-a-Service
Top 5 Details
Scalability
• Trillions of events processed
• Kratos platform handles ~15B+ events/day
New Tenants
• 1000% increase in data workloads YoY
End-End Latency SLA
• Telemetry/Structured data: 30 secs
• Unstructured logs: 5 mins
High Availability
• Platform hosted on AWS
• Distributed on multiple AZs for HA
Self Service
• Kibana, DP-Explorer, BI
V1 PIPELINES
*Images created from makeameme.com
V1 LEARNINGS
SELF SERVICE
• Automate data shippers
• Automate building data pipelines / ETLs
• Create Zero-Engineering Dashboards
• Platform & Applications were tightly coupled
SCALE
• Meet growing data demands at 1000% YoY
• Data growth from 1B -> 25B/day and projected to grow to 50B+/day
• Log Parsing scaling issues
SECURITY
• Automate “Who can access What kind of data”
• Custom Token Management
• Natively integrate tokens with LDAP for compliance reasons
BILLING $
• Reduce Data ingestion costs
• Tenant level billing
10K Foot Overview - v2.0
ARCHITECTURE v2.0
Top N Details
Scalability
• Trillion events per month
• Platform handles ~25B+ events/day (avg)
Platform
• Multi-Tenancy
• Auto-Scaling clusters
• Spot pricing
• Decoupled Infra & Application
Self Service
• NV Data Bots
• NV Data Apps
End-End Latency
• 30 seconds
High Availability
• Platform hosted on AWS
Enhanced Security
• Native integration with Nvidia LDAP using tokens
• Transactional deletes with Spark Delta
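To put the daily volume above in per-second terms, a quick back-of-the-envelope conversion can help. This is only a sketch: it takes the ~25B events/day average from the slide as a flat rate and assumes a 30-day month, so real sustained and peak rates will differ.

```python
# Back-of-the-envelope conversion of the slide's average daily volume.
EVENTS_PER_DAY = 25_000_000_000   # ~25B events/day (average, from the slide)
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds
DAYS_PER_MONTH = 30               # assumption for illustration only

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
events_per_month = EVENTS_PER_DAY * DAYS_PER_MONTH

print(f"average events/sec: {events_per_second:,.0f}")
print(f"events per 30-day month: {events_per_month:,}")
```

At a flat 25B/day this works out to roughly 0.75 trillion events in a 30-day month, so the trillion-per-month figure presumably reflects days that run above that average.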
Data Platform v2.0
➢ Self Service
○ End-End Pipeline = NV Data Bots + NV Data Apps
○ Zero-Engineering Dashboards
○ Big Data & AI Inferencing pipelines
➢ Scalability
○ Lessons learned
○ Migrations
➢ Security
○ Lessons learned
Source: Three Pillars, Stock Photo Modified. Retrieved
from https://www.123rf.com/clipart-vector/pillars_pillar.html
1. SELF SERVICE
NV Data Bots & NV Data Apps
NV Data Bots
➢ NV Data Bots are lightweight data shipper agents that push data automatically to the data platform (Upstream)
○ Config Service: Upload Config and Deploy. Easy to use Configs.
○ Single Click Deployment: Integration with CodeDeploy to Deploy with a Single Click.
○ Auto-Schema Management: Schema managed behind-the-scenes with Kafka Schema Registry.
○ Enhanced Security: Advanced security features to control who can send data.
○ Bot Monitoring: Out-of-Box Monitoring for Bots - Track Lag and Throughput.
Icons: Image #4, Image #5, Image #6, Image #7, Image #8 (see References)
NV DATA BOTS
➢ NV Data Bot = Elastic Beats data agent + YAML config
➢ Sample YAML config
[Diagram: Deployment Model - NV Data Bot (Beats, Config), Data Platform]
NV Data Market
MetricBot, FileBot, HeartBot, PacketBot
Profiles: Build a Profile & Deploy with a Single Click!
Step 1. Create a Config
Config Service:
- FileBot: config_a
- FileBot: config_b
- MetricBot: config_c
- HeartBot: config_x
- PacketBot: config_y
Step 2. Build a Profile
Service Profile:
- MetricBot: config_c
- FileBot: config_a
Step 3. Deploy Profile
Single Click Deploy with CodeDeploy
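The three steps above (create configs, build a profile, deploy the profile) can be sketched as a small in-memory model. Everything here is hypothetical: `ConfigService`, `BotConfig`, `deployment_plan`, the profile name, and the host names are illustrative, not the platform's actual API, and the deploy step only builds a plan rather than calling CodeDeploy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BotConfig:
    bot: str   # bot type, e.g. "FileBot"
    name: str  # config name, e.g. "config_a"


@dataclass
class ConfigService:
    """Hypothetical in-memory stand-in for the Config Service."""
    configs: dict = field(default_factory=dict)

    def create_config(self, bot: str, name: str) -> BotConfig:
        # Step 1: register a config under its name.
        config = BotConfig(bot=bot, name=name)
        self.configs[name] = config
        return config

    def build_profile(self, profile_name: str, config_names: list) -> dict:
        # Step 2: a profile is a named selection of existing configs.
        missing = [n for n in config_names if n not in self.configs]
        if missing:
            raise KeyError(f"unknown configs: {missing}")
        return {
            "profile": profile_name,
            "bots": [self.configs[n] for n in config_names],
        }


def deployment_plan(profile: dict, hosts: list) -> list:
    # Step 3: expand the profile into one (host, bot, config) entry per host.
    # A real deployment would hand this to CodeDeploy; this only builds the plan.
    return [(host, cfg.bot, cfg.name) for host in hosts for cfg in profile["bots"]]


service = ConfigService()
for bot, name in [("FileBot", "config_a"), ("FileBot", "config_b"),
                  ("MetricBot", "config_c"), ("HeartBot", "config_x"),
                  ("PacketBot", "config_y")]:
    service.create_config(bot, name)

profile = service.build_profile("service_profile", ["config_c", "config_a"])
plan = deployment_plan(profile, hosts=["host-1", "host-2"])
print(plan)
```

Keeping configs and profiles separate means one config (say `config_a`) can be reused across several profiles without being re-uploaded.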
NV Data Apps
➢ NV Data Apps automate streaming & batch processing pipelines (Downstream)
➢ NV Data Apps are available through NV Data Market and can be deployed to Spark clusters using Jenkins
○ Config Service: Upload Config and Deploy. No need to know any Spark.
○ Single Click Deployment: Easily Deploy your Apps to Auto-scaling Spark Clusters with a single click.
○ Auto-Schema Management: Schema managed behind-the-scenes with Kafka Schema Registry.
○ Zero-Engineering Dashboards: Users don’t need to build dashboards from scratch. Increases Productivity.
○ App Analytics: Out-of-Box Monitoring for Apps - Track Lag and Throughput.
Icons: Image #4, Image #5, Image #6, Image #9, Image #8 (see References)
NV Data Apps
There are 3 Categories of Apps
Basic App
Pass Through Apps, or Out-of-Box Apps: Apache, Haproxy, etc.
Sample Config.yml:
    app: apache
    alias: basic_demo
    output: elasticsearch
Intermediate App
Custom Apps, User Provides SQL Query or Parsing Logic, enables Stream-to-Stream Joins
Sample Config.yml:
    app: custom
    alias: intermediate_demo
    output: datawarehouse
    table: demo.f_demo
    sql: select … join by ...
Advanced App
Fully Supported PySpark Apps
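As a rough illustration of how a config service might tell the first two app categories apart, here is a minimal sketch that classifies a parsed config (represented as a plain dict). The required keys and the rule that a custom app must carry a `sql` entry are assumptions inferred from the two sample configs, not a documented schema; Advanced (PySpark) apps are not modelled.

```python
# Keys that both sample configs share (assumed to be required).
REQUIRED_KEYS = {"app", "alias", "output"}
# Out-of-box apps named on the slide; the real catalogue is likely larger.
OUT_OF_BOX_APPS = {"apache", "haproxy"}


def classify_app(config: dict) -> str:
    """Return "basic" or "intermediate" for a parsed app config."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if config["app"] in OUT_OF_BOX_APPS:
        return "basic"
    if config["app"] == "custom":
        # Assumption: a custom app must supply its own SQL query.
        if "sql" not in config:
            raise ValueError("custom apps need a 'sql' entry")
        return "intermediate"
    raise ValueError(f"unknown app: {config['app']!r}")


basic = {"app": "apache", "alias": "basic_demo", "output": "elasticsearch"}
intermediate = {
    "app": "custom",
    "alias": "intermediate_demo",
    "output": "datawarehouse",
    "table": "demo.f_demo",
    "sql": "select ...",  # query left elided, as on the slide
}

print(classify_app(basic))
print(classify_app(intermediate))
```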
NV Data Apps - Learnings
Default NV Spark Data Apps
○ BASIC (Pass-Thru) APPS
○ Structured Telemetry NV Spark Data App
AUTO SCHEMA MANAGEMENT
○ NV Data Apps are mainly powered by Spark
○ Spark Streaming requires a data schema
○ Used standard Elastic Beats templates
○ Load/read the Elastic index template & create JSON
○ Read the JSON from Spark & infer the schema
○ Automated this process
Challenges
○ The default Beats schema has nested objects, and it’s a challenge to infer the schema directly
Image sources:
https://www.centosblog.com/configure-apache-https-reverse-proxy-centos-linux/
https://dasunhegoda.com/apache-load-balacing-haproxy/659/
https://mesosphere.com/product/
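The nested-object challenge can be illustrated with a small sketch that walks the `properties` tree of an index-template mapping and flattens it into dotted field names. This is not the platform's implementation: the sample template below is a hypothetical, heavily trimmed Beats-style mapping, and a real pipeline would still need to translate the Elasticsearch types into Spark types.

```python
def flatten_properties(properties: dict, prefix: str = "") -> dict:
    """Flatten nested mapping properties into {dotted.field: es_type}."""
    flat = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Nested object: recurse and extend the dotted path.
            flat.update(flatten_properties(spec["properties"], prefix=path + "."))
        else:
            # Leaf field: keep its declared type (objects default to "object").
            flat[path] = spec.get("type", "object")
    return flat


# Hypothetical, trimmed-down index template for illustration only.
template = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "host": {"properties": {"name": {"type": "keyword"}}},
            "system": {
                "properties": {
                    "cpu": {"properties": {"total": {"properties": {
                        "pct": {"type": "float"}}}}}
                }
            },
        }
    }
}

fields = flatten_properties(template["mappings"]["properties"])
for field_name, es_type in fields.items():
    print(field_name, es_type)
```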
NV DATA APPS for AI Inferencing
Advanced Apps: AI Inferencing NV Data Apps
➢ 2 Step process
○ Pre-Processing
○ Inferencing
➢ Pre-Processing (Blue color)
○ NV Data Spark App for Pre-Aggregation
➢ Inferencing (Orange color)
○ All Deep learning models are deployed in containers
○ NV Inferencing App reads from Kafka, performs inferencing and produces output to Kafka
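The read, infer, produce loop of the inferencing step can be sketched as follows. To keep the example self-contained, in-memory queues stand in for the Kafka input and output topics and a trivial threshold function stands in for a containerized deep learning model; the record field `avg_latency_ms` and all names are hypothetical.

```python
from queue import Empty, Queue


def threshold_model(record: dict) -> str:
    # Stand-in for a deployed model: flags high pre-aggregated latency.
    return "anomaly" if record["avg_latency_ms"] > 100 else "normal"


def run_inference_app(source: Queue, sink: Queue, model) -> int:
    """Drain the source, score each record, and emit results to the sink."""
    processed = 0
    while True:
        try:
            record = source.get_nowait()   # read (Kafka consumer stand-in)
        except Empty:
            break
        result = {"input": record, "prediction": model(record)}
        sink.put(result)                   # produce (Kafka producer stand-in)
        processed += 1
    return processed


# Records as they might look after the pre-aggregation step.
source, sink = Queue(), Queue()
for latency in (42, 250, 99):
    source.put({"avg_latency_ms": latency})

count = run_inference_app(source, sink, threshold_model)
predictions = [sink.get_nowait()["prediction"] for _ in range(count)]
print(count, predictions)
```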
Demo Steps:
1. Deploy Bots with Install Only Option
2. Deploy Bots to a DataCenter with Single Click (CodeDeploy)
3. Deploy Basic Apps and Show Dashboards
4. Run Ad-hoc SQL Queries
5. Deploy Intermediate App
Figure 1: Meme of girl running to get coffee. Retrieved from: www.thememegenerator.net
Demo Pipeline:
FileBot Config: file_config.yml
MetricBot Config: metric_config.yml
Profile Config: demo_profile.yml
Apache App Config: apache_app.yml
System App Config: system_app.yml
[Diagram labels: logs_a, metrics_a]
NV Data Bots Monitoring
Out-of-Box Monitoring available for all Bots
NV Data App Analytics
Out-of-Box Monitoring available for all Apps
V2 PIPELINES
*Images created from quickmeme.com
SCALABILITY
Scalability
BIG 3 Migrations for scaling to next 10x data growth
1. S3/SQS
   Scaling problem: Source Connectors Hitting Scaling Limits
   Solution: S3/SQS Spark Streaming AutoScaling
2. Elasticsearch
   Scaling problem: Sink Connectors Hitting Scaling Limits
   Solution: Elasticsearch Spark Streaming - Autoscaling + Custom Connector with XPack Security
3. S3 Parquet
   Scaling problem: Sink Connectors Hitting Scaling Limits
   Solution: Spark Streaming to Real-time Databricks Delta Tables, Autoscaling
Logstash Connectors
1. Scaling problem: Not Self-serve, Hard JSON syntax to write Log Parsing
   Solution: Powerful Regex Log Parsing with Spark SQL
2. Scaling problem: Hitting Scaling Limits on Log Parsing (2k with regex parsing/node, 10k with pass-thru). Hard to auto-scale.
   Solution: Autoscaling Spark Clusters can easily handle 10x the Volume (200k/sec/node)
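To give a feel for what regex-based log parsing involves, here is a minimal sketch in plain Python. It is a stand-in, not the Spark SQL implementation the slide refers to: the pattern targets the Apache Common Log Format, and the function and field names are illustrative. In Spark SQL the same idea would be expressed with regex extraction functions over a streaming DataFrame.

```python
import re

# Apache Common Log Format: host ident user [time] "request" status size
APACHE_COMMON = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str):
    """Parse one access-log line into a dict, or None if it does not match."""
    match = APACHE_COMMON.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    # Apache writes "-" when no bytes were sent.
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
parsed = parse_line(sample)
print(parsed)
```

Returning `None` for unparseable lines lets the caller route them to a dead-letter output instead of failing the whole batch.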
Scalability
Highlight: Presto to Delta Migration
Presto Pain Points:
- Ad-hoc Queries are very slow - Many Small Parquet Files Problem
- Manual Compaction Process (Heavy Maintenance)
- End User Queries get Affected during Compaction Job
- Easy to Corrupt Data
- Complex Partition & View Management
[Diagram labels: Parquet Sink Connector, Live Table, Historic Data, Hourly Batch Compaction Job, Query Delta View]
Delta to the Rescue:
- Auto-Optimize (Zero-Maintenance)
- Ad-hoc Queries are very fast - Compacted Tables and SSD Caching on i3 Nodes
- End User Queries don’t get Affected
- Out-of-Box Schema Management - Can’t corrupt data
- Auto Partition Management
[Diagram labels: Auto-optimized Live Table, Fast Ad-hoc Queries on Live + Historic Data]
SECURITY
Security: Common Questions
- Which user is running bad queries on the data?! Let’s find him. Can we block him?!
- You want access to the data? OK, you have to wait till NEXT SPRINT!
- Who is accessing what data?
- Can we integrate SQL tokens with LDAP?
- SQL tools auth?
Enhanced Security
Token Management Framework
Each Tenant has 3 Groups:
1. Group Admin
○ Full Read/Write Access
○ Ability to Manage Users
2. General
○ Read Access to Data
3. GDPR
○ Read Access to GDPR Data
○ Names, Emails, etc.
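The three-group model above can be expressed as a small permission check. This is a sketch under stated assumptions, not the Token Management Framework itself: it assumes a user may belong to several groups, that Group Admin's full read access includes GDPR data, and that GDPR membership alone grants only GDPR data. The enum and function names are hypothetical.

```python
from enum import Enum


class Group(Enum):
    GROUP_ADMIN = "group_admin"
    GENERAL = "general"
    GDPR = "gdpr"


def can_read(groups: set, is_gdpr_data: bool) -> bool:
    # Assumption: admins have full read access, including GDPR data.
    if Group.GROUP_ADMIN in groups:
        return True
    if is_gdpr_data:
        return Group.GDPR in groups
    return Group.GENERAL in groups


def can_write(groups: set) -> bool:
    # Only Group Admin has write access on the slide.
    return Group.GROUP_ADMIN in groups


def can_manage_users(groups: set) -> bool:
    return Group.GROUP_ADMIN in groups


analyst = {Group.GENERAL}
privacy_analyst = {Group.GENERAL, Group.GDPR}
admin = {Group.GROUP_ADMIN}

print(can_read(analyst, is_gdpr_data=True))          # general only: no GDPR data
print(can_read(privacy_analyst, is_gdpr_data=True))  # GDPR member: allowed
print(can_write(admin), can_manage_users(analyst))
```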
Summary - Custom Frameworks
NV Data Bots
- Config Service
- Bot Manager
- Single Click Deployment w/ CodeDeploy
- Bot Monitoring
NV Data Apps
- Config Service
- Schema Management
- Zero-Engineering Dashboards
- App Analytics
- Custom Elasticsearch Connector w/ XPack Integration
Security
- Token Management Framework
- LDAP Group Integration
- SQL Audit Dashboard
TEAM
Jarod Maupin
Niranjan Natarja
Ed Clune
Rohit Kulkarni
Ian Jones
Amit Hora
Stefan Le
Chinmay Chandak
Narendra Sanikommu
Sathish Gandham
Sathish Matti
Yuxing Wei
Satish Dandu
Special Thanks to Nvidia Creative team
References
1) https://github.com/konpa/devicon/issues/140
2) https://depositphotos.com/167487860/stock-illustration-data-warehouse-icon-logo-design.html
1) Config Service Logo: https://www.shutterstock.com/image-vector/abstract-tuning-tools-configuration-symbols-lines-1068990533
2) Rocket Logo: https://www.imgrumweb.com/post/BwAiuM0FQ46
3) Schema Logo: https://www.shutterstock.com/image-vector/wire-frame-globe-sphere-connected-lines-326479382
4) Security Logo: https://killarneyeconomicconference.com/cyber-security-transatlantic-policy-forum/
5) Monitoring Logo: https://medium.com/security-token-offering/insight-into-asset-tokenization-a-cryptocurrency-trend-2e9a2a9506a8
6) Data Apps Dashboards Logo: https://www.shutterstock.com/image-vector/intelligent-technology-vector-interface-presentation-network-1194209902
7) Beats Logo : https://www.elastic.co/products/beats
8) Dwight Screaming: https://airfreshener.club/quotes/yelling-office-dwight.html
9) Security Dog: http://tapety-na-plochu.webz.cz/index.php?str=1
Q&A