Data Storage for AI
What should you consider, how do you build it, and what difference can IBM's infrastructure make?
Speaker: Christofer Jensen, Storage Technical Specialist, IBM
Presented at Watson Kista Summit 2018
In this video from the Stanford HPC Conference, Liran Zvibel from Weka.IO presents: Making Machine Learning Compute Bound Again.
"GPUs are getting faster on a yearly cycle. Networking was able to catch up and support linear scaling of models that fit in memory. Traditional storage has not caught up to the condensed performance needed by GPU-filled servers. The number of concurrent clients and the sheer amount of data required to effectively scale modern deep learning models keeps growing.
We are going to present WekaIO, the lowest-latency, highest-throughput file system solution that scales to hundreds of PB in a single namespace, supporting the most challenging deep learning projects that run today. We will present real-life benchmarks comparing WekaIO performance to a local SSD file system, showing that we are the only coherent shared storage that is even faster than the current caching solutions, while allowing customers to linearly scale performance by adding more GPU servers. We will also walk through the complete ML project lifecycle, from collecting data, cleaning, tagging, exploring, training, and validating to finally archiving, and show how customers can use cloud bursting to leverage public cloud infrastructure for improved economics."
Learn more: https://weka.io
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
HORTONWORKS DATA PLATFORM AND IBM SYSTEMS – A COMPLETE SOLUTION FOR COGNITIVE BUSINESS
SynerScope has been helping European organizations across industries unlock competitive business value from data for almost a decade. Now, by leveraging state-of-the-art access control and audit mechanisms from Hortonworks combined with the latest generation high-performance computing and storage solutions from IBM, SynerScope can connect and correlate enterprise data at a scale not previously possible. SynerScope will demonstrate end-to-end analytics workflows including deep-learning based automation using new integrated solutions from Hortonworks and IBM.
Backup and Archive Doesn't Have to be Complicated and Expensive – Spectra Logic
Spectra and CommVault® have joined forces to significantly lower the cost of managing and storing data virtually forever. Organizations worldwide face continuing challenges in managing and protecting their data, particularly with the prevailing desire to retain data for longer, often indefinite periods of time. Combining CommVault Simpana software, which allows organizations to protect, manage and access information regardless of where the data resides, with Spectra's simply affordable nTier Verde disk and T-Series tape library platforms, provides customers with very affordable storage and effective, easy to manage data protection, backup and archive.
In this deck, Gilad Shainer from the HPC Advisory Council kicks off the 2018 Stanford HPC Conference.
Watch the video: https://youtu.be/2ya9Ougcz9c
Learn more:
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Lacking the technology to directly leverage Hadoop, some companies are forgoing its full benefits, opting to treat Hadoop as just another data source for their legacy BI tools. But storage is only one benefit of Hadoop; this approach ignores its linear scalability and data flexibility across all data types. Using Hadoop natively for both storage and computation in an analytic capacity has already led to dramatic increases in business benefits. Hadoop analytics has identified over $2B in potential fraud at one of the world's largest credit card companies. Sears has reduced reporting times from 12 weeks with traditional BI to 3 days. A major internet security company increased customer conversion by 60% and revenue by $20 million. Meaningful returns are spread across Fortune 100 enterprises and fast-growing startups, with the common thread being self-service big data analytics leveraging Hadoop's native capabilities. In this talk, we'll highlight the core value proposition of building analytics natively on Hadoop, share real-world use cases that resulted in dramatic ROI, and reveal the next major step in visual big data analytics.
Learn about how to reduce public cloud storage costs on the AWS and Azure marketplaces with SoftNAS Senior Director of Product Marketing, John Bedrick.
Introduction to Big Data Hadoop Training Online by www.itjobzone.biz – ITJobZone.biz
Want to learn Hadoop online? This presentation gives you an introduction to Big Data Hadoop training online by expert trainers at ITJobZone.biz. Start your Hadoop online training with this presentation.
Threat Detection and Response at Scale with Dominique Brezinski – Databricks
Security monitoring and threat response has diverse processing demands on large volumes of log and telemetry data. Processing requirements span from low-latency stream processing to interactive queries over months of data. To make things more challenging, we must keep the data accessible for a retention window measured in years. Having tackled this problem before in a massive-scale environment using Apache Spark, when it came time to do it again, there were a few things I knew worked and a few wrongs I wanted to right.
We approached Databricks with a set of challenges to collaborate on: provide a stable and optimized platform for Unified Analytics that allows our team to focus on value delivery using streaming, SQL, graph, and ML; leverage decoupled storage and compute while delivering high performance over a broad set of workloads; use S3 notifications instead of list operations; remove Hive Metastore from the write path; and approach indexed response times for our more common search cases, without hard-to-scale index maintenance, over our entire retention window. This is about the fruit of that collaboration.
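The point about using S3 notifications instead of list operations can be made concrete with a minimal sketch. This is not the team's actual code; the bucket, key, and handler below are hypothetical, but the event payload follows the standard S3 event notification format, where each new object is pushed to the consumer so the pipeline never has to scan the bucket to discover fresh data:

```python
def keys_from_s3_event(event: dict) -> list:
    """Extract newly created object keys from an S3 event notification.

    With notifications, each new object arrives as an event record,
    avoiding expensive recursive LIST calls over a large bucket.
    """
    keys = []
    for record in event.get("Records", []):
        # Only react to object-creation events (Put, CompleteMultipartUpload, ...).
        if record.get("eventName", "").startswith("ObjectCreated"):
            keys.append(record["s3"]["object"]["key"])
    return keys

# A hypothetical notification, shaped like the standard S3 event format.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {"bucket": {"name": "telemetry-logs"},
                   "object": {"key": "dt=2018-06-01/host42.json.gz"}},
        }
    ]
}

print(keys_from_s3_event(sample_event))  # ['dt=2018-06-01/host42.json.gz']
```

In a real deployment, these events would typically be delivered via SQS or SNS and the extracted keys handed to the streaming job.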
Decoupling Compute and Storage for Data Workloads – Alluxio, Inc.
This was presented by Carlos Quieroz, Head of Data Platform at Development Bank of Singapore, at the Data Transformation in Financial Services meetup in Singapore jointly hosted by Accenture, Talend, BigDataSG Hadoop, and Alluxio.
Introducing Big Data concepts and Hadoop to those who wish to begin their journey into the future of Information Technology. It is certain that data is going to play a major role in the days to come, from our daily lives to the biggest ventures we might undertake. Hence, knowing about Big Data and the technologies for working with it is going to be essential for IT professionals.
We will start with some simple presentations and then build upon them. For a more intense, focused introduction and training on Big Data and related technologies, visit our website or write to us.
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads – Alluxio, Inc.
Alluxio Tech Talk
Aug 7, 2019
Speaker:
Dipti Borkar, Alluxio
Alluxio 2.0 is the most ambitious platform upgrade since the inception of Alluxio with greatly expanded capabilities to empower users to run analytics and AI workloads on private, public or hybrid cloud infrastructures leveraging valuable data wherever it might be stored.
This release, now available for download, includes many advancements that will allow users to push the limits of their data-workloads in the cloud.
In this tech talk, we will introduce the key new features and enhancements such as:
- Support for hyper-scale data workloads with tiered metadata storage, distributed cluster services, and adaptive replication for increased data locality
- Machine learning and deep learning workloads on any storage with the improved POSIX API
- Better storage abstraction with support for HDFS clusters across different versions & active sync with Hadoop
Running cost effective big data workloads with Azure Synapse and Azure Data L... – Michael Rys
The presentation discusses how to migrate expensive open-source big data workloads to Azure and leverage the latest compute and storage innovations within Azure Synapse with Azure Data Lake Storage to develop powerful and cost-effective analytics solutions. It shows how you can bring your .NET expertise to bear with .NET for Apache Spark, and how the shared metadata experience in Synapse makes it easy to create a table in Spark and query it from T-SQL.
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More – Alluxio, Inc.
Alluxio - Data Orchestration for Analytics and AI in the Cloud
Oct 8, 2019
Speakers:
Haoyuan Li & Bin Fan, Alluxio
Visit https://www.alluxio.io/events/ for more Alluxio events.
Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea... – Spark Summit
Everybody agrees that IoT is changing the world… and creating new challenges for software developers, architects, and DevOps. How can we build efficient and highly scalable distributed applications using open-source technologies? What are the characteristics of data generated by IoT devices, and how do they differ from traditional enterprise or Big Data problems? Which architectural patterns are beneficial for IoT use cases, and why do some trusted methods eventually turn out to be "anti-patterns"? This talk will show how to combine best-of-breed open-source technologies such as Apache Spark, Riak, and Mesos to build scalable IoT pipelines that ingest, store, and analyze huge amounts of data while keeping operational complexity and costs under control. We will discuss the pros and cons of using relational, NoSQL, and object storage products for storing and archiving IoT data, then cover best practices for using Spark with the Riak NoSQL database. We will describe how Apache Spark's advanced modules (Spark SQL, Spark Streaming, and MLlib) can solve problems common to IoT apps while using Riak for fast and scalable persistence. At the end, we will explain why Structured Spark Streaming is a godsend for IoT data and make a case for time-series databases deserving a separate category in the NoSQL classification.
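The windowed-aggregation pattern at the heart of such IoT pipelines can be sketched in plain Python rather than Spark itself; the sensor names, timestamps, and 60-second tumbling window below are illustrative assumptions, not details from the talk:

```python
from collections import defaultdict

def tumbling_window_avg(readings, window_sec):
    """Group (timestamp, sensor_id, value) readings into fixed-size
    tumbling windows and average each sensor's values per window.

    This mirrors what a streaming engine's event-time window
    aggregation does, in miniature.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, sensor) -> [sum, count]
    for ts, sensor, value in readings:
        window_start = ts - (ts % window_sec)  # align to window boundary
        acc = sums[(window_start, sensor)]
        acc[0] += value
        acc[1] += 1
    return {key: s / n for key, (s, n) in sorted(sums.items())}

readings = [
    (0, "temp-1", 20.0), (30, "temp-1", 22.0),  # fall in window [0, 60)
    (65, "temp-1", 30.0),                        # falls in window [60, 120)
]
print(tumbling_window_avg(readings, 60))
# {(0, 'temp-1'): 21.0, (60, 'temp-1'): 30.0}
```

In a real pipeline, the same grouping would run continuously over an unbounded stream, with the aggregates persisted to a store such as Riak.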
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (... – DataWorks Summit
Apache Metron (Incubating) is a streaming cybersecurity application built on Apache Storm and Hadoop. One of its core missions is to bring advanced analytics through machine learning and data science to users. Because of the relative immaturity of data science platform infrastructure integrated into Hadoop and oriented to streaming analytics applications, we have been forced to create the requisite platform components out of necessity, utilizing many pieces of the Hadoop ecosystem.
In this talk, we will cover the Metron analytics architecture and how it utilizes a custom data science model deployment and autodiscovery service that is tightly integrated with Hadoop via YARN and ZooKeeper. We will discuss how we interact with the deployed models via a custom domain-specific language that can query models as data streams past. We will also discuss the full-stack data science tooling that has been created to enable data science at scale in an advanced streaming analytics application.
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet... – ArabNet ME
A new foundation for the Modern Information Architecture.
Speaker: Amr Awadallah, CTO & Cofounder, Cloudera
Our legacy information architecture is not able to cope with the realities of today's business: it cannot scale to meet our SLAs due to the separation of storage and compute, cannot economically store the volumes and types of data we currently confront, cannot provide the agility necessary for innovation, and, most importantly, cannot provide a full 360-degree view of our customers, products, and business. In this talk, Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at the core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business.
Data analytics, Spark, Hadoop, and AI have become fundamental tools for driving digital transformation. A critical challenge is moving from isolated experiments to an organizational or enterprise production infrastructure. In this talk, we break apart the modern data analytics workflow to focus on the data challenges across different phases of the analytics and AI life cycle. By presenting a unified approach to data storage for AI and analytics, organizations can reduce costs, modernize their data strategy, and build a sustainable enterprise data lake. By anticipating how Hadoop, Spark, TensorFlow, Caffe, and traditional analytics like SAS and HPC can share data, IT departments and data science practitioners can not only co-exist but speed time to insight. We will present the tangible benefits of a reference architecture using real-world installations that span proprietary and open-source frameworks. Using intelligent software-defined shared storage, users are able to eliminate silos, reduce multiple data copies, and improve time to insight.
Speakers: Pallavi Galgali, Offering Manager, IBM, and Douglas O'Flaherty, Portfolio Product Manager, IBM
Modernizing Upstream Workflows with AWS Storage – John Mallory – Amazon Web Services
Modernizing Upstream Workflows with AWS Storage
Accelerating seismic data retrieval, getting better data protection and reliability, and providing a common AWS data platform for compute and graphic intensive processing, simulation and visualization workloads.
Modernizing and transforming exploration and production workflows with AWS Storage services
Capturing and processing streaming sensor data from remote oil rigs with Snowball Edge
Providing a Data Lake foundation for a next generation Digital Oilfield IoT analytics platform with Amazon S3
Speaker: John Mallory - AWS Storage Business Development Manager
One of the toughest decisions when selecting a new storage system is deciding between an all-flash array and a hybrid array. Do you go with the predictable high performance of the all-flash architecture or the attractive price per GB of the mixed flash-and-hard-disk architecture? What if you could have both? In this webinar, learn how to get predictable performance from your hybrid arrays so they perform like an all-flash array.
HPE Hadoop Solutions - From use cases to proposal – DataWorks Summit
Hadoop now does much more than storage and Map/Reduce, and it keeps improving and innovating, bringing near-real-time, interactive, and cost-efficient features to Big Data.
Join us to hear about solutions based on Hadoop: how they respond to specific customer needs, which components from the Hadoop ecosystem they use, and which HPE Reference Architectures they are built on.
Hadoop solutions such as ETL offloading, predictive analytics, ad hoc query, complex event processing, stream processing, search, machine learning, deep learning, and more.
Based on software components such as Spark, Hive, HBase, Kafka, Storm, Flume, Impala, and Elasticsearch.
Speaker
John Osborn, SA, Hewlett Packard Enterprise
Is Software Defined Storage (SDS) getting hijacked? It seems every vendor, old and new, is claiming that their storage is “software defined”. The original intent was to create software-only solutions that could be deployed on the customer’s choice of servers. But that original intent has evolved, and now hardware vendors are providing what they claim to be software defined storage solutions too. In addition, SDS is being combined with an embedded compute function to create hyper-converged solutions as well.
In this webinar we will discuss the differences in these approaches and you will learn what the four key deliverables of a SDS solution should be so you can decide which makes the most sense for your organization.
If you're like most of the world, you're in an aggressive race to implement machine learning applications and on a path toward deep learning. If you can deliver better service at a lower cost, you will be a winner in 2030. But infrastructure is a key challenge to getting there. What does the technology infrastructure look like over the next decade as you move from petabytes to exabytes? How are you budgeting for colossal data growth over the next decade? How do your data scientists share data today, and will that scale for 5-10 years? Do you have the appropriate security, governance, backup, and archiving processes in place? This session will address these issues and discuss strategies for customers as they ramp up their AI journey with a long-term view.
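A back-of-the-envelope capacity projection makes the budgeting question concrete. The starting capacity and growth rate below are illustrative assumptions, not figures from the session:

```python
def capacity_projection(current_tb, annual_growth, years):
    """Project storage capacity needs under compound annual growth.

    Returns a list of projected capacities (in TB) for year 0..years.
    """
    return [round(current_tb * (1 + annual_growth) ** y, 1)
            for y in range(years + 1)]

# Hypothetical: 500 TB today, growing 60% per year, over five years.
print(capacity_projection(500, 0.60, 5))
# [500.0, 800.0, 1280.0, 2048.0, 3276.8, 5242.9]
```

Even a modest-sounding 60% annual growth rate implies roughly a tenfold increase in five years, which is why archiving and tiering strategies matter as much as raw capacity planning.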
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris... – DATAVERSITY
Thirty years is a long time for a technology foundation to remain as dominant as relational databases have. Are their replacements here?
In this webinar, we look at this foundational technology for modern Data Management and show how it evolved to meet the workloads of today, as well as when other platforms make sense for enterprise data.
Semiconductor design companies, electronic design automation (EDA) vendors, and foundries remain competitive by innovating and reducing time to market. AWS is deeply invested in semiconductor use cases, including EDA, emulation, and smart manufacturing, including data lake and IoT/AI. We care about this because Amazon depends on faster semiconductor innovation from our suppliers and in our own silicon teams. We have a wide breadth of services that will directly benefit the entire industry. In this session, learn how to achieve the maximum possible performance and throughput from design and engineering workloads running on AWS. We demonstrate specific optimization techniques and share architectures to accelerate batch and interactive workloads on AWS. We also demonstrate how to extend and migrate on-premises, high performance compute workloads with AWS, and use a combination of On-Demand Instances, Reserved Instances, and Spot Instances to minimize costs. Learn how semiconductor customers address security as they move to the cloud as they discuss the AWS capabilities and controls available to secure sensitive design IP and offer strategies for data classification, management, and transfer to third parties.
Antoine Genereux takes us on a detailed overview of the Database solutions available on the AWS Cloud, addressing the needs and requirements of customers at all levels. He also discusses Business Intelligence and Analytics solutions.
Medical imaging for active data and archive
Digital simulations in pharmaceutical, automotive, aerospace
Rich content records in insurance, construction, realty
Video capture for security, process management, education
Content distribution in Media & Entertainment
Rich text E-mail, Web 2.0 and Social Networking
Analytics in Financial services
Grow smarter project kista watson summit 2018_tommy auoja-1IBM Sverige
Avicii på Tele2 arena, Drake på Globen och AIK - Luleå på Hovet bäddar för en trång lördagseftermiddag i Globenområdet... (SVT Nyheter, 1 mars 2014) ...och problemen kvarstår än idag
Talare: Tommy Auoja, Kundansvarig för Offentlig Sektor, Kontaktperson i EU projektet GrowSmarter, IBM
Presentation från Watson Kista Summit 2018
Bemanningsplanering axfood och houston finalIBM Sverige
Automatiserad budgetering – låt matematiken göra grovgörat för att säkerställa en optimerad bemanning
Talare: Niklas Westerholm, Axfood & Robert Moberg, Chief Analyst, Houston Analytics
Presentation från Watson Kista Summit 2018
File share and sync (bara) är så 2017!
Att dela filer bekvämt och säkert var bara början. Box har gått vidare till att integrera delade filer i applikationer och processflöden, och revolutionera både internt och externt arbete. Hur kan det revolutionera för dig?
Talare: Jan Hygstedt, Director Nordic, Box
Presentation från Watson Kista Summit 2018
Watson kista summit 2018 en bättre arbetsdag för de många människornaIBM Sverige
Först tvingades vi anpassa oss efter datorerna. Sedan använde vi dem för att samarbeta med varandra. Nu är det dags för datorerna att förstå oss. Vad innebär det för vår arbetsvardag?
Talare och moderator: Peter Bjellerup, Executive Consultant - Social Business, Collaboration & Knowledge Sharing, IBM
Presentation från Watson Kista Summit 2018
Iwcs and cisco watson kista summit 2018 v2IBM Sverige
Samarbeta både över tid och i realtid
Cisco Spark och IBM Connections – tillsammans! Kombinera ledaren för konversationer i realtid – text, video, individuellt och i team med branschledaren sedan sju år för internt samarbete, transparens och nätverk.
Talare: Bo Holtemann, Solution Specialist, IBM Collaboration Solutions
Presentation från Watson Kista Summit 2018
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Welocme to ViralQR, your best QR code generator.ViralQR
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Ever since its inception, we have successfully served many clients by offering QR codes in their marketing, service delivery, and collection of feedback across various industries. Our platform has been recognized for its ease of use and amazing features, which helped a business to make QR codes.
Our Services
At ViralQR, here is a comprehensive suite of services that caters to your very needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, there is a 14-day free offer to ViralQR, which is an exceptional opportunity for new users to take a feel of this platform. One can easily subscribe from there and experience the full dynamic of using QR codes. The subscription plans are not only meant for business; they are priced very flexibly so that literally every business could afford to benefit from our service.
Why choose us?
ViralQR will provide services for marketing, advertising, catering, retail, and the like. The QR codes can be posted on fliers, packaging, merchandise, and banners, as well as to substitute for cash and cards in a restaurant or coffee shop. With QR codes integrated into your business, improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools in light of having a view of the core values of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
So, thank you for choosing ViralQR; we have an offer of nothing but the best in terms of QR code services to meet business diversity!
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
3. What is AI?
• Machine learning
• Deep learning
• Artificial intelligence
[Slide diagram contrasting three approaches: "Set framework" (hand-code the features: four legs, narrow eyes, sharp teeth, tail, etc.), "Tell the system" (show it examples), "Take action"]
4. Three ways IT uses data … today
• Procedural (if…then) – "one truth"
• Statistical (big data) – "qualified guess"
• Artificial intelligence – "learning systems"
5. … and in 10 years
[Slide diagram repeating the three categories – procedural (if…then), statistical (big data), AI]
6. Current examples
• Procedural: business as usual, classic/legacy IT – structured processing of plausible, credible data
• Statistical: shopping, profiling, fraud detection … – accumulation of data; not 100% precise is OK (e.g. recommendations)
• AI: autonomous driving, image classification, chatbots, gaming … – training data of true and false examples, plus independent test data
7. Why this will happen
(amount of data used increases from procedural to statistical to AI)
• Procedural: manual modeling, legacy systems, structured models
• Statistical: accumulation of examples, data generation – "just store the data"
• AI: automatic modeling, new-generation programmers, automatic consumption – "set the system free"
8. How is data stored?
• Procedural: if…then…else – archive for auditing (structured data)
• Statistical: store all data for parallel processing, GB/s (unstructured data)
• Machine learning: train sample data [1], then offer for data trade [2] (unstructured + structured data)
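The contrast between the procedural and machine-learning rows above can be sketched in a few lines of Python. This is a toy illustration, not from the slides: a hand-coded if…then rule versus a threshold learned from labeled true/false examples and checked against independent test data.

```python
# Toy contrast: a procedural rule vs. a learned rule.
# Task: flag transactions above some amount as "large".

def procedural_is_large(amount):
    # Procedural: the threshold is hand-coded ("one truth").
    return amount > 1000

def learn_threshold(training_data):
    # ML-style: learn the threshold from labeled (amount, is_large)
    # examples by taking the midpoint between the largest negative
    # and the smallest positive example.
    positives = [a for a, label in training_data if label]
    negatives = [a for a, label in training_data if not label]
    return (max(negatives) + min(positives)) / 2

train = [(200, False), (800, False), (1200, True), (5000, True)]
test = [(900, False), (1500, True)]  # independent test data

threshold = learn_threshold(train)

def learned_is_large(amount):
    return amount > threshold

accuracy = sum(learned_is_large(a) == label for a, label in test) / len(test)
print(threshold, accuracy)  # 1000.0 1.0
```

The point of the sketch: the procedural rule never changes unless a programmer edits it, while the learned rule moves when the training examples do – which is why the two approaches place such different demands on data storage.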
9. What is important for statistical analysis (big data)
(Image: Business over Broadway)
• Collected data is analyzed in parallel – GB/s
• The number of analyses per second is important
• Data must be close to the CPU
• Transaction latency is irrelevant
• Data consistency is irrelevant
10. What is important for machine learning
• Sample data is trained and then archived
• Short training = many training cycles, high quality
• The better the data, the better the result
• High throughput at one point in the life cycle [1]
• As low a maintenance cost as possible afterwards [2]
11. Storage requirements summary
Primary:
• High throughput for analysis and training
• Scalable due to high data growth
• Low-cost long-term storage
Secondary:
• Automated archiving
• Data resiliency
• Availability
How does IBM solve this?
13. The automotive industry generates large amounts of data
Sources: sensors, video, CAN, FlexRay, radar, LiDAR, etc.
Data must be synchronously captured, stored, modified and executed.
14. Dev/test is challenging
• Test drives: 50 TB / day / car
• R&D lab: tagging
• R&D labs: developing & testing
• > 5 PB per car model (project)
• > 200 h of processing per 1 h of driving
16. Major IT challenges
1. How to implement & operate an efficient storage, workflow and management system – "the foundation"
2. How to distribute data globally within an enterprise
3. How to preserve digital data for decades
4. How to analyze the data – esp. sensor and video data analytics
5. How to do efficient IT workload and resource scheduling
6. How to embed analytics/data management into the R&D environment
17. Summary – solution elements for ADAS/AD
[Architecture diagram, roughly four layers:]
• Test & lab management: AREMA (agents, engine, clients; interfaces: SOAP, REST, OSLC), with linkages to development – manages & controls the video & testing workflow
• Test execution: Elektrobit ADTF and other testing tools, HiL station(s), MiL/SiL, other HPC environments
• Intelligence: IBM Video Analytics (IBM Research) – automatic video tagging/labelling; job management, media portal
• The foundation (orchestration, storage & distribution, archive): IBM Spectrum Scale (Spectrum Scale client OS), IBM Spectrum Protect, IBM Spectrum Archive with LTFS tape library, IBM Cloud Object Storage
20. Recap
Primary:
• High throughput for analysis and training (GB/s)
• Scalable due to high data growth
• Low-cost long-term storage
Secondary:
• Automated archiving
• Data resiliency
• Availability
Plus: flexible, commodity components, built-in intelligence, data integrity checks, multiple sites.
21. First thing to consider: storage virtualisation
[Diagram: clients, users and applications, compute, and big-data analytics reach storage systems A–D through a virtualisation layer over the SAN/LAN]
• Availability
• Reliability
• Performance
• Ease of use
• Automation
• Consolidation
• Hardware agnostic
• Utilisation
• "Built-in AI"
22. IBM Spectrum Storage Family
Runs on FlashSystem and any storage; in private, public or hybrid cloud. Workload management from Platform Computing: Spectrum LSF, Spectrum Symphony, Spectrum Conductor.
• Analytics-driven data management to reduce costs by up to 50 percent
• Optimized data protection to reduce backup costs by up to 53 percent
• Fast data retention that reduces TCO for active archive data by up to 90 percent
• Virtualization of mixed environments stores up to 5x more data
• Enterprise storage for cloud, deployed in minutes instead of months
• High-performance, highly scalable storage for unstructured data
• Web-scale secure object storage
• Data where and when you need it
• Copy data management for modern IT
23. Spectrum Scale topology
[Topology diagram: sites A, B and C under one global namespace, on/off premise]
• IBM Spectrum Scale provides a global namespace with automated, encrypted data placement and data migration (GB/s)
• Access protocols: POSIX, NFS, SMB/CIFS, HDFS, iSCSI, plus an OpenStack controller (Cinder, Swift, Glance, Manila)
• Storage tiers: flash, disk, tape, storage-rich servers
• Transparent cloud tiering and cloud data sharing for users and applications
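The automated data placement and migration named on this slide is typically driven by Spectrum Scale's SQL-like ILM policy engine (rules applied with `mmapplypolicy`). A minimal sketch follows; the pool names ('flash', 'disk') and the threshold and age values are illustrative assumptions, not taken from the slides:

```
/* Place new files on the flash pool by default. */
RULE 'default-placement' SET POOL 'flash'

/* When the flash pool passes 80% full, migrate files not
   accessed for 30 days down to disk until usage drops to 60%. */
RULE 'tier-down' MIGRATE FROM POOL 'flash'
  THRESHOLD(80,60) TO POOL 'disk'
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
```

This is how the "low-cost long-term storage" requirement from the summary can coexist with GB/s throughput: hot training data stays on flash while the policy engine moves cold data down the tiers without application involvement.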
24. Spectrum Scale deployment options
• Software only: a software license that can be deployed on standard hardware
• Solution bundles: pre-packaged with IBM Spectrum Scale software, Spectrum Scale RAID, I/O servers, drives, support & subscription
• Off-premises: deploy Spectrum Scale in IBM SoftLayer (whitepaper); High Performance Computing offerings with Spectrum Scale
30. IBM Spectrum Storage Family
(repeat of slide 22)
31. File storage vs. object storage
File storage:
• Stores hundreds of millions of files
• File system hierarchy; can be complex to scale
• Best for file-based workflows
• I/O performance, low-latency access
• Structured to be understood by humans; the file system maintains metadata
Object storage:
• Stores hundreds of billions of objects
• One storage pool, object IDs; scales uniformly
• Low TCO
• High-latency access
• Structured to be understood by applications; the application maintains metadata
32. What is object storage?
[Diagram: an application talks to the store over an S3-style interface]
1. Put: send data, receive an object ID
2. Get: retrieve the data by its object ID
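The put/get flow above can be sketched as a toy in-memory object store (plain Python, no real S3 involved, purely illustrative): `put` stores a blob under a generated ID, and `get` retrieves it by that ID alone – there is no directory hierarchy or path to traverse, which is why such stores scale uniformly.

```python
import hashlib

class ToyObjectStore:
    """Minimal flat-namespace object store: one pool, object IDs only."""

    def __init__(self):
        self._pool = {}

    def put(self, data: bytes) -> str:
        # Content-addressed ID; many object stores derive IDs similarly
        # internally, though real S3 uses caller-chosen keys.
        object_id = hashlib.sha256(data).hexdigest()
        self._pool[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._pool[object_id]

store = ToyObjectStore()
oid = store.put(b"sensor frame 0001")
print(oid[:8], store.get(oid) == b"sensor frame 0001")  # prints the ID prefix and True
```

Note the division of labour the slide 31 comparison describes: the store keeps only blob-and-ID, so any richer metadata (file name, timestamps, sensor type) must be maintained by the application itself.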