Software-Defined Servers
A Game Changer for Data Scientists
Gary Smerdon
CEO
TidalScale, Inc.
The Problem: In-Memory Computing is Key

Operation               Processing Latency   In Human Terms
L1-L3 Cache             1-13 ns              $1
DRAM                    50 ns                $50
NVMe Flash              150 μs               $150,000
Flash Array             1 ms                 $1,500,000
Hard Drive              5 ms                 $7,500,000
TCP packet retransmit   2 s                  $2,000,000,000

Variety & Volume: Data growing at 62% CAGR
[Chart: data volumes over time, rising from GB through TB, PB and EB to ZB]

Velocity: Data value declines with age
[Chart: business value versus age of data (seconds)]
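The "in human terms" figures above can be reproduced with simple arithmetic. The sketch below assumes a scale of $1 per nanosecond of latency, which is what the DRAM figure (50 ns, $50) implies; the function and constant names are illustrative and not from the talk.

```python
# Assumed scale, inferred from the DRAM figure: 1 nanosecond of latency = $1.
DOLLARS_PER_NS = 1

# Nanoseconds in each latency unit used on the slide ("us" stands for μs).
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def latency_in_dollars(value, unit):
    """Convert a latency into the slide's 'human terms' dollar figure."""
    return value * NS_PER_UNIT[unit] * DOLLARS_PER_NS


rows = [
    ("DRAM", 50, "ns"),
    ("NVMe Flash", 150, "us"),
    ("TCP packet retransmit", 2, "s"),
]
for name, value, unit in rows:
    print(f"{name:<22} {value} {unit:<2} ${latency_in_dollars(value, unit):,}")
```

The slide's flash array and hard drive figures are 1.5 times what this strict conversion produces, so those two rows are left out of the sketch rather than explained.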
Current Approaches
• Algorithm
  - Lower prediction accuracy
• Sub Sample
  - Lower prediction accuracy
• Shard (Shard 1, Shard 2, Shard 3, Shard 4)
  - Lower prediction accuracy
  - Time & Money
Seeing All the Data Uncovers Relationships
• Single Huge Server: Easy to discover relationships
• Sharded: Hard to discover relationships
[Diagram labels: Department A through Department G; Product A through Product D]
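A small sketch of why sharding can hide relationships: the same analysis is run once over the whole data set and once shard by shard. The sales records, the choice to shard by department, and the function name are all hypothetical, made up for illustration rather than taken from the talk.

```python
from collections import defaultdict

# Hypothetical sales records as (department, product) pairs.
sales = [
    ("Department A", "Product A"),
    ("Department B", "Product B"),
    ("Department C", "Product A"),
    ("Department D", "Product C"),
    ("Department E", "Product A"),
]


def products_spanning_departments(records):
    """Return the products that appear in more than one department."""
    departments = defaultdict(set)
    for department, product in records:
        departments[product].add(department)
    return {p for p, depts in departments.items() if len(depts) > 1}


# All data in one place: the cross-department relationship is visible.
whole = products_spanning_departments(sales)

# Sharded by department and analysed one shard at a time:
# each shard contains a single department, so nothing spans departments.
shards = defaultdict(list)
for record in sales:
    shards[record[0]].append(record)

per_shard = set()
for shard in shards.values():
    per_shard |= products_spanning_departments(shard)

print(whole)      # {'Product A'}
print(per_shard)  # set()
```

Recovering the relationship from shards would need an extra cross-shard merge step, which is the kind of added time and cost the previous slide attributes to sharding.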
Recent Data is More Predictive
Mortgage data from 2005-2007 would have predicted the 2008 mortgage crisis, but analysts used data only through 2004.
[Chart: Five-year Modeled Default Frequency Rate by Deal Vintage Year. X-axis: Year (2002-2007). Y-axis: Five-year Cumulative Default Frequency Rate (0%-35%). Series: Actual Default Frequency Rate; Model with data through 2004; Model with data through 2007]
Model Accuracy Needs RAM: Decision Trees
• Decision tree model error rates decline with data size & tree depth
• Learning time decreases with tree depth
• More data & greater tree depth consume more RAM
[Chart: Prediction Error Rate by Data Set Size & Decision Tree Depth, one series per number of observations]
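To make the link between tree depth and RAM concrete, the sketch below computes the maximum number of nodes in a binary decision tree of a given depth, which is 2^(depth+1) - 1 when the root is at depth 0. The 64 bytes per node is an assumed figure for illustration only; it is not from the talk or from any particular library.

```python
def max_nodes(depth):
    """Upper bound on nodes in a binary tree of the given depth (root is depth 0)."""
    return 2 ** (depth + 1) - 1


# Assumed per-node footprint, for illustration only.
BYTES_PER_NODE = 64

for depth in (5, 10, 20, 30):
    nodes = max_nodes(depth)
    gib = nodes * BYTES_PER_NODE / 2**30
    print(f"depth {depth:>2}: up to {nodes:,} nodes, about {gib:,.3f} GiB")
```

This is a ceiling rather than a prediction: a trained tree is rarely full, and the training data itself also has to fit in memory alongside the model.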
What If Servers were Software-Defined?
• In-memory performance at scale
• As many cores as needed
• Self optimizing
• Everything just works
• Uses standard hardware
Software-Defined Servers
Traditional Virtualization
Multiple virtual machines share a single physical server
[Diagram: Virtual vs. Physical. Three virtual machines, each running an application and operating system that are 100%, bit-for-bit unmodified, on one physical server]
TidalScale: Software-Defined Servers
Single virtual machine spans multiple physical servers
[Diagram: one TidalScale virtual machine running an application and operating system that are 100%, bit-for-bit unmodified, on top of multiple HyperKernel instances]
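The pooling idea can be sketched as plain arithmetic: one guest sees the sum of the resources of the physical servers beneath it. The sketch does not model how HyperKernel works. The `Node` type and `guest_view` function are hypothetical, the 16 cores per node is an assumption, and the 5 x 128 GB configuration is borrowed from the R benchmark later in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One physical server contributing resources (hypothetical model)."""
    cores: int
    ram_gb: int


def guest_view(nodes):
    """Total cores and RAM a single guest would see across all nodes."""
    return sum(n.cores for n in nodes), sum(n.ram_gb for n in nodes)


# Five 128 GB nodes; the core count per node is assumed for illustration.
nodes = [Node(cores=16, ram_gb=128) for _ in range(5)]
print(guest_view(nodes))  # (80, 640)
```

The totals are an upper bound, since they ignore any memory the virtualization layer reserves for itself, which the slides do not quantify.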
Seamless Scalability
Flexible: scales up or down quickly
[Diagram: TidalScale Software-Defined Server running an application and operating system on top of multiple HyperKernel instances]
Machine Learning Driven Self-Optimization
Uses patented machine learning to transparently align resources
[Diagram: TidalScale Software-Defined Server running an application and operating system on top of multiple HyperKernel instances]
100% Compatible
If it runs, it runs on a TidalScale Software-Defined Server
[Diagram: applications, containers, operating systems and a virtual machine on a TidalScale Software-Defined Server, on top of multiple HyperKernel instances]
Use Case: Retail Analytics on TidalScale
Performance Comparisons (TPC-H “Powertest” in Minutes)
[Chart: minutes to process (0-100) versus workload size in GB (0-1,000). Series and data labels as extracted: Amazon EC2: 69.1, TEST FAILS; TidalScale Software-Defined Server: 22.0, 33.7]
Benchmark: Open Source R on TidalScale
• Version: Revolution R Open 8.0.3 with pryr, dplyr, mgcv, rpart, randomForest, FNN, Matrix, doParallel & foreach
• Data: CMS Public Use Dataset
• In-memory footprints: 32GB-680GB
• Operations timed:
  • Load
  • Join
  • GAM linear regression
  • GLM linear regression
  • Decision Tree
  • Random Forest (fixed seed)
  • K Nearest Neighbors
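A benchmark that times a list of named operations can be structured as a small harness like the one below. This is a generic Python sketch, not the R code from the TidalScale benchmark repository; the function name and the two placeholder workloads are assumptions standing in for the real load, join and model-fitting steps.

```python
import time


def time_operations(operations):
    """Run each named callable once and return {name: elapsed seconds}."""
    timings = {}
    for name, operation in operations:
        start = time.perf_counter()
        operation()
        timings[name] = time.perf_counter() - start
    return timings


# Placeholder workloads; a real run would load, join and model the data set.
operations = [
    ("load", lambda: list(range(100_000))),
    ("join", lambda: {k: k * 2 for k in range(100_000)}),
]

timings = time_operations(operations)
for name, seconds in timings.items():
    print(f"{name:<6} {seconds:.4f} s")
print(f"total  {sum(timings.values()):.4f} s")
```

Timing each step separately, and summing for a total, makes it possible to see which operation dominates as the in-memory footprint grows.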
Open Source R Performance Comparisons
[Chart: total execution time in minutes (0-700,000) versus workload size in GB (0-600). Series and data labels as extracted: Bare Metal Server (128GB): 158 days; TidalScale Software-Defined Server (5 x 128GB nodes): 1,325, 3,787]
https://github.com/TidalScale/R_benchmark_test
Tomorrow’s Servers Today: A Game-Changer
“Software-defined Servers make it easy to run memory-intensive applications like data mining, machine learning and simulation.”
Marc Jones, Director & Distinguished Engineer, IBM
“This is the way all servers will be built in the future.”
Gordon Bell, industry legend & 1st outside investor in TidalScale

ODSC West TidalScale Keynote Slides

  • 1. Software-Defined Servers A Game Changer for Data Scientists Gary Smerdon CEO TidalScale, Inc.
  • 2. NVMe Flash 150 μs $150,000 Flash Array 1 ms $1,500,000 Hard Drive 5 ms $7,500,000 TCP packet retransmit 2 s $2,000,000,000 The Problem In-Memory Computing is Key Operation Processing Latency In Human Terms L1-L3 Cache 1-13 ns $1 DRAM 50 ns $50 Variety & Volume: Data growing at 62% CAGR GB TB PB EB ZB DataVolumes Time Velocity: Data value declines with age BusinessValue Age of Data (seconds)
  • 3. Current Approaches - Lower prediction accuracy Algorithm - Lower prediction accuracy Sub Sample Shard 4 Shard 3 Shard 2 Shard 1 - Lower prediction accuracy - Time & Money
  • 4. Single Huge Server Easy to discover relationships Hard to discover relationships Sharded Department A Department B Department C Department D Department E Department F Department G Product A Product B Product C Product D Seeing All the Data Uncovers Relationships
  • 5. Recent Data is more Predictive 2005-2007 mortgage data would have predicted the 2008 mortgage crisis…but analysts used data only from 2004 Five-year Modeled Default Frequency Rate by Deal Vintage Year Year Five-yearCumulativeDefaultFrequencyRate 2002 2003 2004 2005 2006 2007 0% 5% 10% 15% 20% 25% 30% 35% Actual Default Frequency Rate Model with data through 2004 Model with data through 2007
  • 6. Model Accuracy Needs RAM: Decision Trees • Decision Trees model error rates decline with data size & tree depth • Learning time decreases with tree depth • More data & greater tree depth consumes more RAM Number of Observations: Prediction Error Rate by Data Set Size & Decision Tree Depth
  • 7. What If Servers were Software-Defined? • In-memory performance at scale • As many cores as needed • Self optimizing • Everything just works • Uses standard hardware Software-Defined Servers
  • 8. Traditional Virtualization. Multiple virtual machines share a single physical server. [Diagram: Virtual vs. Physical; three Virtual Machines, each running an Application and Operating System, 100%, bit-for-bit unmodified]
  • 9. TidalScale: Software-Defined Servers. Single virtual machine spans multiple physical servers. [Diagram: one TidalScale Virtual Machine running an Application and Operating System, 100%, bit-for-bit unmodified, across multiple HyperKernels]
  • 10. Seamless Scalability. Flexible – Scales Up or Down Quickly. [Diagram: Application and Operating System on a TidalScale Software-Defined Server across multiple HyperKernels]
  • 11. Machine Learning Driven Self-Optimization. Uses patented machine learning to transparently align resources. [Diagram: Application and Operating System on a TidalScale Software-Defined Server across multiple HyperKernels]
  • 12. 100% Compatible. If it runs, it runs on a TidalScale Software-Defined Server. [Diagram: Applications, Containers, Operating Systems, Virtual Machine on a TidalScale Software-Defined Server across multiple HyperKernels]
  • 13. Use Case: Retail Analytics on TidalScale. Performance Comparisons (TPC-H “Powertest” in Minutes). [Chart: x-axis: Workload Size in GB (0-1,000); y-axis: Minutes to Process (0-100); series: Amazon EC2, TidalScale Software-Defined Server; data labels: 22.0, 33.7, 69.1, TEST FAILS]
  • 14. Benchmark: Open Source R on TidalScale • Version: Revolution R Open 8.0.3 with pryr, dplyr, mgcv, rpart, randomForest, FNN, Matrix, doparallel & foreach • Data: CMS Public Use Dataset • In-memory footprints: 32GB-680GB • Operations timed: Load, Join, GAM linear regression, GLM linear regression, Decision Tree, Random Forest (fixed seed), K Nearest Neighbors • https://github.com/TidalScale/R_benchmark_test [Chart: Open Source R Performance Comparisons; x-axis: Workload Size in GB (100-600); y-axis: Total Execution Time (Minutes) (0-700,000); series: Bare Metal Server (128GB), TidalScale Software-Defined Server (5 x 128GB nodes); data labels: 158 days, 1,325, 3,787]
  • 15. Tomorrow’s Servers Today: A Game-Changer “Software-defined Servers make it easy to run memory-intensive applications like data mining, machine learning and simulation.” Marc Jones, Director & Distinguished Engineer, IBM
  • 16. “This is the way all servers will be built in the future.” Gordon Bell Industry legend & 1st outside investor in TidalScale
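
The latency column on slide 2 is easier to appreciate as ratios against DRAM. The following is a minimal Python sketch, not part of the original deck; the latencies are the ones listed on the slide, while the function name and the choice to use the upper end of the 1-13 ns cache range are assumptions made for illustration.

```python
# Latencies as listed on slide 2, converted to nanoseconds.
# The slide gives a range for cache (1-13 ns); the upper bound is used here (assumption).
LATENCY_NS = {
    "L1-L3 Cache": 13,
    "DRAM": 50,
    "NVMe Flash": 150_000,                   # 150 microseconds
    "Flash Array": 1_000_000,                # 1 ms
    "Hard Drive": 5_000_000,                 # 5 ms
    "TCP packet retransmit": 2_000_000_000,  # 2 s
}


def slowdown_vs_dram(latency_ns):
    """Return each operation's latency as a multiple of DRAM latency."""
    dram = latency_ns["DRAM"]
    return {name: ns / dram for name, ns in latency_ns.items()}


if __name__ == "__main__":
    for name, ratio in slowdown_vs_dram(LATENCY_NS).items():
        print(f"{name:<24} {ratio:>16,.2f}x DRAM")
```

Read this way, the step from DRAM to the fastest storage tier on the slide is a factor of thousands, which is the gap the talk calls the Memory Cliff.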

Editor's Notes

  1. Slide 1: Software-Defined Servers: A Game Changer for Data Scientists | Hi I'm Gary Smerdon, CEO of TidalScale. It's my pleasure to be here today. "Game Changer" is a powerful word conjuring the likes of the iPhone and the Internet. But Software-Defined Servers are a game changer for Data Scientists and I'm here today to share why.
  2. Everyone here is intimately familiar with the problem. Data *Volume* and *Variety* are growing explosively at 62% a year. Simultaneously, the *Velocity* of data change is increasing rapidly such that the business value of that data decreases exponentially by its age in seconds. Getting a handle on this requires that computing be done *in-memory* because the alternative, paging from storage, is so incredibly slow. In dollar terms it's the difference between buying your weekly groceries for $50 versus for $150,000. Such outsize costs would have a big negative impact on your annual food budget! We call this latency difference the *Memory Cliff* and it has a similar negative impact on application performance.
  3. You are all familiar with the solutions available for managing large data problems today. 1. Take a sample of the data, 2. Use a clever algorithm that makes some compressed representation of the data space, or 3. Shard the data set to fit in the memory of multiple small servers. All of these approaches have the disadvantages of lowering prediction accuracy, requiring time and effort and generally just getting in the way of doing data science. It's like having to do the plumbing when you just want to cook dinner.
  4. There are many sides to the data analysis picture. One thing we know is that having complete data is often critical to data analysis. A famous example of this is the Challenger disaster, where the management chain filtered the data before it reached upper management. The filtered view seemed to indicate that the odds of a problem occurring were low. The complete data set shows absolutely otherwise.
  5. More data allows you to see relationships not otherwise visible. A customer in the retail space struggles today with seeing relationships between products in different store departments because the data has been siloed across multiple physical machines. If they could have just a single large system it would be trivial to explore and discover these relationships.
  6. Timely data absolutely influences prediction accuracy. A famous example is the underestimation of mortgage default rates leading up to the 2008 financial crisis. The model was fine but the mortgage industry had been so stagnant for so long that the 2005-2007 predictions were made off 2004 data and thus missed the increasing failure rates. As you can see here, the same model updated with the real 2005-2007 data accurately predicts the actual failure rates seen.
  7. More data with finer granularity absolutely improves prediction accuracy. The more detail you can assign to a particular outcome, the more reliably you can predict. Larger data sets take more RAM, of course, but did you know that finer granularity features also require RAM? For a given data set, adding tree levels to a Random Forest algorithm explodes its RAM requirements.
  8. What if the servers you used for data science were themselves software-defined? You could simply do your data science cookery without having to worry about the plumbing! To continue the thought experiment, imagine these servers could be assembled from ordinary servers, tuned themselves automagically, delivered in-memory performance at any scale and didn't require any change to software applications or operating systems. If such a software-defined server existed, it would deliver a missing piece in today's data centers where virtually _every_ other component _is_ software-defined. Today I am happy to share with you that such a technology _does_ exist: *TidalScale*. But to explain it I have to back up a bit and explain Traditional Virtualization.
  9. Traditional virtualization effectively divvies up a server into multiple virtual servers. The key fact is that the virtual servers have _no idea_ they aren't actually running on hardware. The applications and operating systems are unmodified.
  10. TidalScale uses virtualization but does it _across_ multiple physical servers, effectively allowing you to treat physical servers like Lego building blocks.
  11. Want a bigger software-defined server? Add more hardware systems underneath.
  12. Under the covers, TidalScale uses machine learning to automatically co-locate each compute entity with the resources it needs. And just like ordinary virtualization it does all of this transparently - the OS and applications have no idea they aren't running on hardware.
  13. TidalScale can run virtually everything unchanged, and we deliver in-memory performance, for example...
  14. That retail analytics customer that I mentioned earlier: We were able to demonstrate scalable in-memory performance running the TPC-H Power test on their analytics database. Compare that to AWS EC2 where the same workload dies when it hits the memory cliff.
  15. To document our scalability we wrote and published a benchmark on Open Source R. It measures the time to execute load, join and then the 5 most commonly used data science algorithms on CMS insurance claim data. We ran this benchmark on a bare metal 128GB server and then on a TidalScale TidalPod composed of five 128GB servers. As you can see on this chart, the Software-defined Server's performance scales linearly.
  16. So how would it change the way you tackle data science problems to have the ability to deploy the hardware systems of tomorrow today? An example of the kind of systems we can create is the 15TB 400 core TidalPod we deployed at SoftLayer this past June using 20 standard servers. Marc Jones captured it best: “Software-defined Servers make it easy to run memory-intensive applications like data mining, machine learning and simulation.”
  17. TidalScale Software-Defined Servers are a game changer! They let you explore your data more easily, improve your results and allow you to spend more time doing data science instead of data plumbing, all while lowering your operational TCO and increasing IT flexibility. This is why one of our early investors said: "This is the way all servers will be built in the future."
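
Note 7's point that deeper trees need more RAM can be made concrete with a back-of-the-envelope sketch. The Python below is an illustration only, not the benchmark code from the talk: it assumes binary trees, counts the largest possible number of nodes at a given depth, and uses hypothetical values for bytes per node and number of trees. Real trees usually stop splitting earlier, so this is an upper bound.

```python
def max_nodes(depth):
    """Largest possible node count for a binary tree with `depth` levels of splits.

    A tree of depth 0 is a single root node; each extra level can at most
    double the number of leaves, giving 2**(depth + 1) - 1 nodes in total.
    """
    return 2 ** (depth + 1) - 1


def forest_bytes_upper_bound(depth, n_trees, bytes_per_node):
    """Upper bound on memory for a forest of fully grown binary trees.

    `n_trees` and `bytes_per_node` are hypothetical inputs chosen by the caller;
    actual per-node cost depends on the library and its data structures.
    """
    return max_nodes(depth) * n_trees * bytes_per_node


if __name__ == "__main__":
    # Hypothetical forest: 500 trees, 64 bytes per node.
    for depth in (5, 10, 15, 20):
        size_gb = forest_bytes_upper_bound(depth, n_trees=500, bytes_per_node=64) / 1e9
        print(f"depth {depth:>2}: up to {max_nodes(depth):>9,} nodes per tree, "
              f"about {size_gb:,.3f} GB for the forest")
```

Because the bound roughly doubles with every added level, a few extra levels of depth can matter more for memory than a large increase in the number of trees.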