© Copyright 2015 EMC Corporation. All rights reserved.
EMC Solutions are Powered by
Intel® Xeon® Processor Technology
ISILON ROADMAP 2015
STEFAN RADTKE
CTO, EMEA
EMERGING TECHNOLOGY DIVISION
IN-PLACE ANALYTICS WITH
UNIFIED DATA ACCESS
DR. STEFAN RADTKE
ANALYTICS TODAY IS ABOUT BIG DATA
Big Data needs Big storage
[Bar chart, scale 0% to 50%. Categories: Application…, Big Data/Business…, Security/Risk…, Business Process…, Cloud Computing. Values: 41%, 38%, 35%, 32%, 28%.]
BIG DATA IS SPREAD OUT
Traditional data:
• Structured Data
• Unstructured Data
• Dark Data
Emerging data:
• Social Networks, UGC
• Public records
• Location Data
• Internet of Things
Big Data is Growing Fast
Big Data is Siloed
Silos, complexity, data explosion, security, etc.
Hadoop can add to the data problems
Introducing the Data Lake Foundations
Store, manage, protect and analyze traditional and emerging data
Infrastructure choices: virtual, physical, converged
Data Lake Foundations: Storage, Compute, Applications
EMC DATA LAKE FOUNDATION
EMC Cloud Storage (ECS)
• Type: ECS Appliance, scale-out object
• Protocols: Swift, S3, MS Azure, Block, HDFS
• Use case: Scale-out block and object access; data-in-place analytics with Hadoop
• Capacity: Multi-datacenter, exabyte scale
EMC Isilon
• Type: Isilon, scale-out file + object
• Protocols: Swift, NFS, SMB, HDFS
• Use case: Access data through NAS and object; enterprise storage features; data-in-place analytics with Hadoop
• Capacity: Single instance, 50 PB
THE OLD WAY: BRINGING DATA TO COMPUTE
Complex Architecture
• Many special-purpose systems
• Moving data around
• No complete views
Cost of Analytics
• Existing systems strained
• No agility
• BI backlog
Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data
Visibility
• Leaving data behind
• Risk and compliance
• High cost of storage
[Diagram: data sources (ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources) feed separate systems (Data Archives, EDWs, Marts, Search Servers, Document Stores, Storage)]
TIME-TO-RESULTS
Data Copy Analysis
[Diagram: Existing Primary Storage, Data Center Network, Hadoop on a Stick]
Have you ever copied 100 TB from primary storage to a Hadoop system?
How long does it take to copy 100 TB from one place to another over a 10 Gb link? >24 hours
In-Place Analysis
[Diagram: Existing Primary Storage, Data Center Network, Hadoop Compute Nodes]
Reading relevant data for analysis
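The copy-time question on this slide can be checked with simple arithmetic. The sketch below assumes decimal terabytes and that "10 Gb" means a 10 Gbit/s link; the `efficiency` parameter and its 0.8 value are illustrative assumptions, not figures from the slide.

```python
def copy_hours(terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `terabytes` (decimal TB) over a `link_gbps` link.

    `efficiency` is an assumed fraction of line rate that is actually usable.
    """
    bits = terabytes * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600


line_rate = copy_hours(100, 10)                  # ideal, no protocol overhead
realistic = copy_hours(100, 10, efficiency=0.8)  # assumed 80% usable bandwidth
print(f"{line_rate:.1f} h at line rate, {realistic:.1f} h at 80% efficiency")
```

Even at full line rate the copy takes roughly 22 hours, so any realistic overhead pushes it past the 24 hours quoted on the slide.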
THE NEW WAY: BRINGING COMPUTE TO DATA
1. Active archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage
2. Data management, transformations
• One source of data for all analytics
• Persisted state of transformed data
• Significantly faster & cheaper
3. Self-service exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests
4. Multi-workload analytic platform
• Bring applications to data
• Combine different workloads on common data (e.g. SQL + Search)
• True BI agility
[Diagram: data sources (ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources) and existing systems (EDWs, Marts, Storage, Search Servers, Documents, Archives), annotated with callouts 1 to 4]
EMC ISILON SCALE OUT ARCHITECTURE
Web
Apps
Cloud
Hadoop
Archive
Linux/Unix
Mac/iOS
Windows
Simple is Smart
• Isilon has integrated the “RAID”, volume manager and file system layers into OneFS
• The resulting hardware architecture, based on Intel processors, is therefore much simpler
• Uses internal storage
• Data protection with FEC across nodes
ISILON HW-TOPOLOGY
[Diagram labels: Clients, LAN, GB/10GB Ethernet, Isilon Node (SAS), Infiniband]
SCALE-OUT ARCHITECTURE
Isilon Single Volume Simplicity
[Animation: nodes go from EMPTY and FULL to BALANCED; upgrade takes 60 sec]
• Directories and files are striped across nodes in a single volume
• Capacity is added dynamically by adding node(s)
• New capacity is available about 60 seconds after a node is added
• Rebalancing happens in the background
• Data protection with FEC across nodes
• Leverages Intel technology
Data Layout
File Striping
• Data is striped across the nodes
– Not across disks
– FEC not RAID
• Data breakdown:
– FS block size 8 KB
– 16 blocks per stripe unit
– 128 KB stripe width per drive
[Diagram: a client file write is split into data stripe units and a parity stripe unit]
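The data breakdown on this slide is plain arithmetic: 16 blocks of 8 KB give a 128 KB stripe unit. The sketch below works that out and counts the data stripe units a file needs. The round-robin `node_for_unit` placement is a hypothetical simplification for illustration only; the slide does not describe how OneFS actually assigns stripe units to nodes.

```python
import math

FS_BLOCK_KB = 8              # from the slide: file system block size
BLOCKS_PER_STRIPE_UNIT = 16  # from the slide

# 8 KB x 16 blocks = 128 KB per stripe unit
STRIPE_UNIT_KB = FS_BLOCK_KB * BLOCKS_PER_STRIPE_UNIT


def stripe_units_needed(file_size_kb: int) -> int:
    """Number of data stripe units needed for a file (parity not counted)."""
    return math.ceil(file_size_kb / STRIPE_UNIT_KB)


def node_for_unit(unit_index: int, data_nodes: int) -> int:
    """Hypothetical round-robin placement of a stripe unit on a node."""
    return unit_index % data_nodes


print(STRIPE_UNIT_KB)             # stripe unit size in KB
print(stripe_units_needed(1024))  # data stripe units for a 1 MB file
```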
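ISI_HDFS_D
• Multi-threaded daemon runs on all nodes
– Services both NameNode (NN) and DataNode (DN) protocols
– Translates HDFS RPCs to POSIX system calls
– Stateless; the underlying file system handles coherency
[Diagram: within a OneFS node, a request is handled by an isi_hdfs_d thread, which issues a syscall through the VFS to OneFS and returns the response]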
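The idea of translating HDFS RPCs into POSIX calls can be pictured with a toy dispatcher. This is only an illustration and not how isi_hdfs_d is implemented: the operation names are borrowed from the HDFS client protocol, while `HANDLERS` and `handle_rpc` are hypothetical names and the local temporary directory stands in for the cluster file system.

```python
import os
import tempfile

# Toy mapping from a few HDFS client operation names to POSIX-style calls.
HANDLERS = {
    "mkdirs": lambda path: os.makedirs(path, exist_ok=True),
    "getListing": lambda path: sorted(os.listdir(path)),
    "getFileInfo": lambda path: os.stat(path).st_size,
    "delete": lambda path: os.remove(path),
}


def handle_rpc(op: str, path: str):
    """Stateless dispatch: no metadata is cached, every call hits the file system."""
    return HANDLERS[op](path)


root = tempfile.mkdtemp()
data_dir = os.path.join(root, "data")
handle_rpc("mkdirs", data_dir)
with open(os.path.join(data_dir, "part-0"), "wb") as f:
    f.write(b"x" * 10)

print(handle_rpc("getListing", data_dir))
print(handle_rpc("getFileInfo", os.path.join(data_dir, "part-0")))
```

Because the handler keeps no state of its own, any node running such a daemon could answer any request, which is the property the slide attributes to the underlying file system handling coherency.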
HDFS IMPLEMENTED LIKE A NAS PROTOCOL
OneFS runs a daemon that speaks NameNode and DataNode natively
[Diagram: a DFSClient on a Hadoop node talks to the OneFS clustered file system, where every OneFS node acts as both NameNode and DataNode]
1) Request(“/file”)
2) Response (block locations)
3) GetBlock(block)
HDFS: STANDARD HADOOP CLUSTER
[Diagram: sources (Decision Support Databases, Web Click data, OLAP, EDW) send data over HTTP, CIFS, FTP and NFS to Landing Zone Servers; the Hadoop cluster has a name node and nodes that each hold data (file, copy2, copy3; 3X) and run compute (Map/Reduce)]
Step 1: Data is copied into the Landing Zone
Step 2: Data is copied into the Cluster (3 times)
Step 3: Hadoop Jobs are run
HDFS: SHARED DATA LAKE
[Diagram: sources (Decision Support, Web Click data, OLAP, EDW) keep their data on the shared data lake; a Hadoop compute cluster runs Map/Reduce against it]
Step 1: Jobs are run
AND WHAT ABOUT PERFORMANCE?
[Diagram: Compute Nodes, Isilon Nodes]
TRADITIONAL COMPUTE WITH LOCAL DAS
[Diagram labels: Core Ethernet Switch, Rack Ethernet Switch, Compute, Shuffle+HDFS, SATA, 1 Gbps, 1+ Gbps]
• The ratio of compute and disk space/performance is fixed.
• Non-local HDFS I/O (20-90% of HDFS I/O) will go through Ethernet.
• Local disk usage is shared between shuffle I/O (60% of all I/O during terasort) and HDFS I/O.
DAS I/O PIPELINE
Hadoop DAS Server, total MB/sec per stage:
• Disk drives (each drive averages 30-50 MB/sec, read or write, with a typical Hadoop workload): 600
• SAS HBA, 8 SAS ports × 6 Gbps: 6,000
• PCI 2.0 x8 bus: 4,000
• CPU/Bus/Memory, CPU/memory bandwidth: 12,000
• PCI 2.0 x8 bus: 4,000
• Dual 10 Gbps Ethernet NIC, 10 Gbps × 2 directions × 2 ports: 5,000
• 48-port 10 Gbps switch, non-blocking: 120,000
Where is the bottleneck?
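The slide's question can be answered by taking the minimum over the stage throughputs. The sketch below uses the slide's MB/sec figures and recomputes the two link-derived ones from their Gbps values (1 Gbps taken as 125 MB/sec). One assumption to note: the slide shows the 600 MB/sec figure without a label, and it is treated here as the aggregate of the disk drives.

```python
def gbps_to_mb_per_sec(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


# Throughput per pipeline stage in MB/sec, as listed on the slide.
stages = {
    "disk drives": 600,  # assumed to be the drive aggregate (unlabeled on the slide)
    "SAS HBA": gbps_to_mb_per_sec(8 * 6),                 # 8 ports x 6 Gbps
    "PCI 2.0 x8 bus": 4000,
    "CPU/memory": 12000,
    "dual 10 Gbps NIC": gbps_to_mb_per_sec(10 * 2 * 2),   # 2 directions x 2 ports
    "48-port switch": 120000,
}

# The slowest stage bounds the throughput of the whole pipeline.
bottleneck = min(stages, key=stages.get)
print(bottleneck, stages[bottleneck])
```

Under these figures the disks, not the buses or the network, limit a DAS server.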
IMPROVING EFFICIENCY WITH SHARED STORAGE
[Diagram labels: Core Ethernet Switch, Rack Ethernet Switch, Isilon InfiniBand Switch (IB), Compute, Shuffle, SATA, Isilon HDFS, 10 Gbps, 1+ Gbps (used only for MR copy phase)]
• The number of compute and Isilon nodes using Intel technology can be adjusted independently to achieve the optimal ratio of compute and I/O bandwidth.
• HDFS I/O ALWAYS comes through a rack-local Isilon node, which collects data blocks from all other Isilon nodes across the InfiniBand fabric (see animation).
• Shuffle I/O (75% of all I/O during terasort) remains on local storage. This can be flash for optimal performance.
ISILON I/O PIPELINE
Hadoop Compute Server, total MB/sec per stage:
• Disk drives (each drive averages 30-50 MB/sec, read or write, with a typical Hadoop workload): 400
• SAS HBA, 8 SAS ports × 6 Gbps: 6,000
• PCI 2.0 x8 bus: 4,000
• CPU/Bus/Memory, Intel CPU/memory bandwidth: 12,000
• PCI 2.0 x8 bus: 4,000
• Dual 10 Gbps Ethernet NIC, 10 Gbps × 2 directions × 2 ports: 5,000
• 48-port 10 Gbps switch, non-blocking: 120,000
Isilon Node, total MB/sec:
• Isilon X400 maximum throughput (per node): 740
• 10 Gbps × 2 directions × 2 ports (per node): 5,000
The network is NOT a bottleneck!
HADOOP JOB CYCLE COMPARISON: TIME TO RESULTS (THROUGHPUT)
1 TB Terasort Test
Traditional Hadoop+DAS (55 MB/s node throughput)
• Phase durations: 17m:32s, 30m:18s, 20m:50s
• Compute: 30m 18s
• Time to results: 68m 40s
Isilon Enabled Hadoop, vHadoop+Isilon (85 MB/s node throughput)
• Data handling not required on Isilon
• Phase duration: 16m:00s
• Compute: 16m 00s
• Time to results: 16m 00s
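The time-to-results figures can be cross-checked by adding up the phase durations shown for each setup. The sketch below assumes the three durations 17m:32s, 30m:18s and 20m:50s all belong to the Hadoop+DAS job cycle, which is what the 68m 40s total suggests; the helper name `fmt` is illustrative.

```python
from datetime import timedelta

# Phase durations shown for the Hadoop+DAS job cycle (assumed grouping).
das_phases = [
    timedelta(minutes=17, seconds=32),
    timedelta(minutes=30, seconds=18),  # matches the "Compute" figure
    timedelta(minutes=20, seconds=50),
]
das_total = sum(das_phases, timedelta())
isilon_total = timedelta(minutes=16)  # compute only, no data handling


def fmt(duration: timedelta) -> str:
    """Format a duration the way the slide does, e.g. '68m 40s'."""
    minutes, seconds = divmod(int(duration.total_seconds()), 60)
    return f"{minutes}m {seconds:02d}s"


print(fmt(das_total))
print(f"{das_total / isilon_total:.1f}x longer time to results on DAS")
```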
PHASE 1: INGEST
IDC PERFORMANCE VALIDATION
[Bar chart: runtime in seconds (axis 0 to 450) for NFS Write 10 GB and NFS Read 10 GB, Isilon vs. DAS cluster; callouts: 4.2x faster, 36x faster]
Isilon: 4x X410 nodes + 7 compute nodes
DAS: 7x (compute+DAS nodes)
PHASE 2: COMPUTE RUNNING ON INTEL XEON PROCESSORS
IDC PERFORMANCE VALIDATION
[Bar chart: runtime in seconds (axis 0 to 3000) for TeraGen, TeraSort and TeraValidate, Isilon vs. DAS cluster; callouts: 2.6x faster, 1.5x faster, 1.5x faster]
Isilon: 4x X410 nodes + 7 compute nodes
DAS: 7x (compute+DAS nodes)
ESG PERFORMANCE VALIDATION
CAPACITY UTILIZATION,
DATA TIERING,
PROTECTION
NODE TYPES
SCALE-OUT NAS PRODUCT FAMILY POWERED BY INTEL
• S-Series: Purpose-built for IOPS-intensive, random-access, file-based applications
• X-Series: Flexible solution for highly concurrent and sequential throughput applications
• NL-Series: Ideal for cost-effective, large-capacity storage
• HD-Series: Ideal for cost-effective, large-capacity storage
[Diagram axes: Capacity, Performance]
POLICY BASED DATA PLACEMENT
[Diagram: policy by data age. First example: <30 days on S210, >30 days on NL400. Second example: 30 days-2 years on NL400, >2 years on HD400]
• Multiple archive tiers
• Policy-based heuristic tiering
• Eliminate data migration
• Cost-optimized data placement
• Transparent to applications and users
• Optimize storage resources while matching storage resources with data requirements
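An age-based placement policy of the kind shown on this slide can be sketched as a simple lookup. The thresholds and node types below are read off the slide's diagram and combined into one three-tier policy, which is an assumption; `tier_for_age` is a hypothetical helper and does not reflect how such policies are actually configured in OneFS.

```python
def tier_for_age(age_days: int) -> str:
    """Pick a node type from data age, using the slide's example thresholds."""
    if age_days < 30:
        return "S210"    # recent data on the performance tier
    if age_days <= 2 * 365:
        return "NL400"   # 30 days to 2 years on the capacity tier
    return "HD400"       # older than 2 years on the deep archive tier


for age in (7, 90, 1000):
    print(age, tier_for_age(age))
```

Since placement follows the policy rather than the path, applications keep using the same file names while the data moves between tiers.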
TIERING TO EXTERNAL OBJECT STORE
SMART TIERING TO CLOUD
Key Features / Benefits
• Stub to cloud of choice
• Extending workflow of SmartPools to CloudPools
• Optional ability to send encrypted data to the cloud
• Compression for efficient transport
• Simple management from a familiar interface
• Seamless placement and availability of data per policy
• Enable offsite DP & archive
• Transparent integration with offsite stores
• One accessible namespace
[Diagram: clients (SMB | NFS | RAN) access OneFS; SmartPool -> CloudPool tiers data to a service provider, public cloud or private cloud]
COMPARE: UTILIZATION AND CHOICE
Hadoop DAS: Hadoop storage keeps 3x copies (triple replication)
[Diagram: data stripes of 128 kB, D1 to D12, stored with triple replication; # copies]
Isilon shared storage: protection groups
[Diagram: data units D1 to D12 with parity units P0 and P1, in three protection groups across Node1 to Node6]
Protection layout (overhead) by number of nodes and protection level:
Nodes  +1          +2:1          +2          +3:1         +3          +4
3      2+1 (33%)   4+2 (33%)     3x          --           --          --
4      3+1 (25%)   6+2 (25%)     2+2 (50%)   9+3 (25%)    4x          --
5      4+1 (20%)   8+2 (20%)     3+2 (40%)   12+3 (20%)   4x          5x
6      5+1 (17%)   10+2 (17%)    4+2 (33%)   15+3 (17%)   3+3 (50%)   5x
7      6+1 (14%)   12+2 (14%)    5+2 (29%)   15+3 (17%)   4+3 (43%)   5x
8      7+1 (13%)   14+2 (12.5%)  6+2 (25%)   15+3 (17%)   5+3 (38%)   4+4 (50%)
9      8+1 (11%)   16+2 (11%)    7+2 (22%)   15+3 (17%)   6+3 (33%)   5+4 (44%)
10     9+1 (10%)   16+2 (11%)    8+2 (20%)   15+3 (17%)   7+3 (30%)   6+4 (40%)
12     11+1 (8%)   16+2 (11%)    10+2 (17%)  15+3 (17%)   9+3 (25%)   8+4 (33%)
14     13+1 (7%)   16+2 (11%)    12+2 (14%)  15+3 (17%)   11+3 (21%)  10+4 (29%)
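The percentages in the protection table appear to be the parity share of each protection group, that is parity units divided by data plus parity units (for example 2+1 gives 33%). The sketch below reproduces a few entries and contrasts them with triple replication, where two extra copies per data unit cost 200% extra capacity relative to the data. The function names are illustrative.

```python
def parity_share(data_units: int, parity_units: int) -> float:
    """Fraction of raw capacity used for protection in an N+M group."""
    return parity_units / (data_units + parity_units)


def extra_capacity_vs_data(data_units: int, extra_units: int) -> float:
    """Protection capacity expressed relative to the data itself."""
    return extra_units / data_units


print(f"{parity_share(2, 1):.0%}")             # 2+1 layout
print(f"{parity_share(14, 2):.1%}")            # 14+2 layout
print(f"{parity_share(16, 2):.0%}")            # 16+2 layout
print(f"{extra_capacity_vs_data(1, 2):.0%}")   # triple replication: 2 extra copies
```

The wider the protection group, the smaller the parity share, which is why overhead drops as nodes are added, while replication overhead stays constant regardless of cluster size.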
INDUSTRY SOLUTION: MIRROR COPY
• Mirrored copy in a backup site
• Benefit: Achieves local reconstruction on hardware failure
• Shortcoming: Storage overhead -> 2.66x
[Diagram: Primary, Secondary]
INDUSTRY SOLUTION: DISTRIBUTED ERASURE CODING
• Distributing fragments across sites
• Benefit: Achieves low storage overhead ~1.6x
• Shortcoming: Disk/node failure requires fragments to be fetched over the WAN.
[Diagram: Site 1, Site 2, Site 3, Site 4]
ECS HYBRID ERASURE CODING
• Data is written into chunks in 3 copies
• Once a chunk fills to 128 MB, erasure coding starts
• Once erasure coding is completed and the data is protected, the 3 copies are deleted
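The chunk lifecycle described on this slide can be modeled as a small state machine. This is a toy sketch under stated assumptions: the `Chunk` class and its fields are hypothetical, sizes are tracked in whole megabytes, and the erasure-coding step is a placeholder because the slide does not detail the coding scheme.

```python
CHUNK_LIMIT_MB = 128  # from the slide: erasure coding starts at 128 MB


class Chunk:
    """Toy model: mirrored while filling, erasure coded once full."""

    def __init__(self) -> None:
        self.size_mb = 0
        self.replicas = 3           # data is written as 3 copies
        self.erasure_coded = False

    def append(self, mb: int) -> int:
        """Append up to the remaining space and return the MB accepted."""
        accepted = min(mb, CHUNK_LIMIT_MB - self.size_mb)
        self.size_mb += accepted
        if self.size_mb == CHUNK_LIMIT_MB and not self.erasure_coded:
            self._seal()
        return accepted

    def _seal(self) -> None:
        # Placeholder for the real erasure-coding step (not shown on the slide).
        self.erasure_coded = True
        self.replicas = 0           # the 3 copies are deleted once data is protected


chunk = Chunk()
chunk.append(100)
print(chunk.replicas, chunk.erasure_coded)   # still mirrored while filling
chunk.append(50)                             # only 28 MB fit into this chunk
print(chunk.replicas, chunk.erasure_coded)   # sealed and erasure coded
```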
VIPR MODEL: BEST OF BOTH WORLDS
• Achieves low storage overhead ~1.8x
• Local hardware failure recovery requires no WAN traffic.
• Handles local hardware and full data center failures
– Disk, node, rack and data center are failure domains
[Diagram: Site 1, Site 2, Site 3, Site 4]
DATA LAKE FOUNDATIONS WITH ECS
Efficient Global Scale Big Data Analytics
• Shared storage
• Built-in high availability
• Metered chargebacks
• Multi-tenant
[Diagram: traditional data (Structured Data, Unstructured Data, Dark Data) and emerging data (Social Networks, UGC, Public records, Location Data, Internet of Things); site label: L.A.]
MULTIPLE TENANTS
• Each “Authentication Zone” is associated with
– A Network Zone
– List of Auth source associations (trusted or untrusted)
– A root folder in OneFS
[Diagram labels: IP Pool 1, IP Pool 2, IP Pool 3; Access Zone1, Access Zone2, Access Zone3; /ifs/deptA, /ifs/deptB, /ifs/deptC; AD 1; AD 2 or LDAP]
MULTIPLE TENANTS WITH ECS
[Diagram labels: Authorization Provider (AD/LDAP); System Admin; Namespace Admin; Create Namespace; Add Namespace & System Admins; Assigns object user to a namespace; Namespace 1, Namespace 2 and Namespace 3, each with Object Users and Object Buckets]
SUPPORT FOR MULTIPLE ANALYTICS APPLICATIONS
[Diagram: clients using NFS, SMB, HTTP, FTP, HDFS 1.x and HDFS 2.x access the same cluster, whose nodes act as name node and datanode, while several Map/Reduce compute clusters run against it]
SUMMARY: COMPARING SIDE BY SIDE
Feature                                  DAS             Isilon
Simultaneous Hadoop Distributions        No              Yes
File/Object Level Access Control Lists   Yes             Yes
User Controlled Snapshot                 No              Yes
De-duplication                           No              Yes
WORM (SEC 17a-4)                         No              Yes
POSIX Compliance                         No              Yes
Independent Scaling (Storage/Compute)    No              Yes
Hadoop Distribution Portability          No              Yes
SQL Support (HAWQ, Impala, TEZ)          Yes             Yes
Encryption                               Yes (note 2)    SED
Data Tiering                             Yes (note 3)    Yes
Hadoop Distribution Support              1               All
Disaster Recovery                        Full File Copy  Snap, Replicate
Data-Set Management                      Ingested        In-place
Protection overhead                      200%            ~20%
NameNode Redundancy (HA)                 Active/Passive  N-to-N
Ability to edit files/objects            No              Yes
NFS v3, v4                               No              Yes
SMB 1, 2                                 No              Yes
HTTP                                     No              Yes
FTP                                      No              Yes
Object (Proprietary)                     No              Yes
HDFS v1                                  No              Yes
HDFS v2                                  Yes (note 1)    Yes
Simultaneous Multi-Protocol              No              Yes
Notes:
1. Only one version at a time
2. Software-only (~30% impact on performance)
3. Available only in HDFS 2.5+
EMC BUSINESS DATA LAKE
• Data and Analytics Catalog
• Data Governor, Platform Manager
• Data Manager, Ingest Manager, Analytics Manager
PIVOTAL BIG DATA SUITE
• Open Analytics Toolbox, Advanced Analytics, Applications at Scale, Data Processing
• Greenplum Database, HAWQ, Spring XD, Pivotal HD, Spark, Redis, RabbitMQ, GemFire
• BDS on Pivotal Cloud Foundry
• Hadoop
VMWARE VCLOUD SUITE
EMC DATA LAKE FOUNDATION: ISILON + ECS
VCE VBLOCK | XTREMIO | DATA DOMAIN
FOR MORE INFORMATION, SEE US AT THE BOOTH
THANK YOU!
QUESTIONS?
Dr. Stefan Radtke
CTO, EMEA
EMC Emerging Technology Division
Phone: +49-176-34434460
E-Mail: Stefan.Radtke@emc.com
Linkedin: http://de.linkedin.com/in/drstefanradtke
Blog: http://stefanradtke.blogspot.com
In-Place analytics with Unified Data Access

More Related Content

What's hot

IoT Story: From Edge to HDP
IoT Story: From Edge to HDPIoT Story: From Edge to HDP
IoT Story: From Edge to HDPDataWorks Summit
 
Trends in Data Protection with DCIG
Trends in Data Protection with DCIGTrends in Data Protection with DCIG
Trends in Data Protection with DCIGGina Tragos
 
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiDataWorks Summit
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionDataWorks Summit
 
HPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY
 
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY
 
Brochure : The EMC Big Data Solution
Brochure : The EMC Big Data Solution Brochure : The EMC Big Data Solution
Brochure : The EMC Big Data Solution EMC
 
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPCHPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPCHPC DAY
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPCHPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPCHPC DAY
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit MumbaiAnand Haridass
 
DataCore Software - The one and only Storage Hypervisor
DataCore Software - The one and only Storage HypervisorDataCore Software - The one and only Storage Hypervisor
DataCore Software - The one and only Storage HypervisorASBIS SK
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
 
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopDeep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopHortonworks
 
The Need for Speed: Parallel I/O and the New Tick-Tock in Computing
The Need for Speed: Parallel I/O and the New Tick-Tock in ComputingThe Need for Speed: Parallel I/O and the New Tick-Tock in Computing
The Need for Speed: Parallel I/O and the New Tick-Tock in ComputingDataCore Software
 
Tendencias Storage
Tendencias StorageTendencias Storage
Tendencias StorageFran Navarro
 
The Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricThe Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricDataWorks Summit
 

What's hot (20)

EMC Unified Analytics Platform. Gintaras Pelenis
EMC Unified Analytics Platform. Gintaras PelenisEMC Unified Analytics Platform. Gintaras Pelenis
EMC Unified Analytics Platform. Gintaras Pelenis
 
IoT Story: From Edge to HDP
IoT Story: From Edge to HDPIoT Story: From Edge to HDP
IoT Story: From Edge to HDP
 
Trends in Data Protection with DCIG
Trends in Data Protection with DCIGTrends in Data Protection with DCIG
Trends in Data Protection with DCIG
 
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
 
HPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big Data
 
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
 
Brochure : The EMC Big Data Solution
Brochure : The EMC Big Data Solution Brochure : The EMC Big Data Solution
Brochure : The EMC Big Data Solution
 
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPCHPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPCHPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai
 
DataCore Software - The one and only Storage Hypervisor
DataCore Software - The one and only Storage HypervisorDataCore Software - The one and only Storage Hypervisor
DataCore Software - The one and only Storage Hypervisor
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopDeep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
 
The Need for Speed: Parallel I/O and the New Tick-Tock in Computing
The Need for Speed: Parallel I/O and the New Tick-Tock in ComputingThe Need for Speed: Parallel I/O and the New Tick-Tock in Computing
The Need for Speed: Parallel I/O and the New Tick-Tock in Computing
 
DataCore At VMworld 2016
DataCore At VMworld 2016DataCore At VMworld 2016
DataCore At VMworld 2016
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
 
Tendencias Storage
Tendencias StorageTendencias Storage
Tendencias Storage
 
The Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricThe Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data Centric
 

Similar to In-Place analytics with Unified Data Access

Disaggregated Hadoop Stacks
Disaggregated Hadoop StacksDisaggregated Hadoop Stacks
Disaggregated Hadoop StacksDataWorks Summit
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonDataWorks Summit/Hadoop Summit
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data InsightsDataWorks Summit
 
7. emc isilon hdfs enterprise storage for hadoop
7. emc isilon hdfs   enterprise storage for hadoop7. emc isilon hdfs   enterprise storage for hadoop
7. emc isilon hdfs enterprise storage for hadoopTaldor Group
 
Webinar: The Bifurcation of the Flash Market
Webinar: The Bifurcation of the Flash MarketWebinar: The Bifurcation of the Flash Market
Webinar: The Bifurcation of the Flash MarketStorage Switzerland
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lakeEMC
 
Hadoop Analytics on Isilon Deep Dive
Hadoop Analytics on Isilon Deep DiveHadoop Analytics on Isilon Deep Dive
Hadoop Analytics on Isilon Deep DiveClaudioFahey1
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015Daniela Zuppini
 
Ibm symp14 referent_marcus alexander mac dougall_ibm x6 und flex system
Ibm symp14 referent_marcus alexander mac dougall_ibm x6 und flex systemIbm symp14 referent_marcus alexander mac dougall_ibm x6 und flex system
Ibm symp14 referent_marcus alexander mac dougall_ibm x6 und flex systemIBM Switzerland
 
Eng systems oracle_overview
Eng systems oracle_overviewEng systems oracle_overview
Eng systems oracle_overviewFran Navarro
 
Soyez Big Data ready avec Isilon
Soyez Big Data ready avec IsilonSoyez Big Data ready avec Isilon
Soyez Big Data ready avec IsilonRSD
 
HPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataHPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataLviv Startup Club
 
Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Lviv Startup Club
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deckKeithETD_CTO
 
The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)Cloudera, Inc.
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsRed_Hat_Storage
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsColleen Corrice
 

In-Place analytics with Unified Data Access

  • 1. IN-PLACE ANALYTICS WITH UNIFIED DATA ACCESS (Isilon Roadmap 2015). Dr. Stefan Radtke, CTO, EMEA, Emerging Technology Division. © Copyright 2015 EMC Corporation. All rights reserved. EMC Solutions are Powered by Intel® Xeon® Processor Technology.
  • 2. ANALYTICS TODAY IS ABOUT BIG DATA. Big Data needs Big storage. [Bar chart, axis 0% to 50%; bar values in slide order: 41%, 38%, 35%, 32%, 28%; category labels in slide order, truncated in the source: Application…, Big Data/Business…, Security/Risk…, Business Process…, Cloud Computing]
  • 3. BIG DATA IS SPREAD OUT
      – Big Data is growing fast; Big Data is siloed
      – Traditional: structured data, unstructured data, dark data
      – Emerging: social networks, UGC, public records, location data, Internet of Things
      – Silos, complexity, data explosion, security, etc.
      – Hadoop can add to the data problems
      – Introducing the Data Lake Foundations: store, manage, protect and analyze traditional and emerging data
      – Infrastructure choices: virtual, physical, converged
      – Data Lake Foundations layers: Storage, Compute, Applications
  • 4. EMC DATA LAKE FOUNDATION: EMC Cloud Storage (ECS) compared with EMC Isilon
      – Type. ECS: ECS-Appliance, scale-out object. Isilon: scale-out file + object
      – Protocols. ECS: Swift, S3, MS Azure, Block, HDFS. Isilon: Swift, NFS, SMB, HDFS
      – Use case. ECS: scale-out block and object access; data-in-place analytics with Hadoop. Isilon: access data through NAS and object; enterprise storage features; data-in-place analytics with Hadoop
      – Capacity. ECS: multi-datacenter, exabyte scale. Isilon: single instance, 50 PB
  • 5. THE OLD WAY: BRINGING DATA TO COMPUTE
      – Complex architecture: many special-purpose systems; moving data around; no complete views
      – Cost of analytics: existing systems strained; no agility; BI backlog
      – Time to data: up-front modeling; transforms slow; transforms lose data
      – Visibility: leaving data behind; risk and compliance; high cost of storage
      [Diagram labels: ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources; Data Archives; EDWs; Marts; Search; Servers; Document Stores; Storage]
  • 6. TIME-TO-RESULTS: data copy analysis versus in-place analysis
      – Hadoop on a Stick: have you ever copied 100 TB from primary storage to a Hadoop system?
      – How long does it take to copy 100 TB from one place to another over a 10 Gb link? More than 24 hours
      – In-place analysis: Hadoop compute nodes read only the relevant data from existing primary storage over the data center network
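The ">24 hours" figure for copying 100 TB over a 10 Gb link can be checked with simple arithmetic. The sketch below assumes decimal terabytes and treats link efficiency as a free parameter; the 80% value is an illustrative assumption, not a number from the slides.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb (decimal terabytes) over a link_gbps link."""
    bits = size_tb * 1e12 * 8                   # terabytes -> bits
    usable_bps = link_gbps * 1e9 * efficiency   # usable bits per second
    return bits / usable_bps / 3600

# At full line rate the copy already takes most of a day.
print(round(transfer_hours(100, 10), 1))
# With an assumed 80% usable bandwidth it exceeds 24 hours.
print(round(transfer_hours(100, 10, 0.8), 1))
```

Even at theoretical line rate the copy takes about 22 hours, so any realistic protocol or contention overhead pushes it past a day.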
  • 7. THE NEW WAY: BRINGING COMPUTE TO DATA
      – 1. Active archive: full-fidelity original data; indefinite time, any source; lowest-cost storage
      – 2. Data management, transformations: one source of data for all analytics; persisted state of transformed data; significantly faster and cheaper
      – 3. Self-service exploratory BI: simple search + BI tools; "schema on read" agility; reduce BI user backlog requests
      – 4. Multi-workload analytic platform: bring applications to data; combine different workloads on common data (e.g. SQL + Search); true BI agility
      [Diagram labels: ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources; EDWs; Marts; Storage; Search; Servers; Documents; Archives]
  • 8. EMC ISILON SCALE OUT ARCHITECTURE. Clients and workloads: Web Apps, Cloud, Hadoop, Archive, Linux/Unix, Mac/iOS, Windows
  • 9. ISILON HW-TOPOLOGY: Simple is Smart
      – Isilon has integrated the "RAID", volume manager and filesystem layers into OneFS
      – The resulting hardware architecture, based on Intel processors, is therefore much simpler
      – Uses internal storage
      – Data protection with FEC across nodes
      [Diagram: clients connect over the LAN (1 Gb/10 Gb Ethernet) to Isilon nodes with internal SAS drives; the nodes are interconnected over InfiniBand]
  • 10. SCALE-OUT ARCHITECTURE: Isilon Single Volume Simplicity
      – Directories and files are striped across nodes in a single volume
      – Capacity is added dynamically by adding node(s)
      – New capacity is available within about 60 seconds
      – Re-balancing happens in the background
      – Data protection with FEC across nodes
      – Leverages Intel technology
      [Animation: new nodes join empty, existing nodes are full, then all nodes become balanced; "60 sec upgrade"]
  • 11. Data Layout: File Striping
      – Data is striped across the nodes, not across disks; FEC, not RAID
      – Data breakdown: FS block size 8 KB; 16 blocks per stripe unit; 128 KB stripe width per drive
      [Diagram: a client file write is split into data stripe units plus a parity stripe unit]
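The data breakdown on this slide is consistent: 16 blocks of 8 KB give the 128 KB stripe unit. A minimal sketch of that arithmetic follows; the helper name `stripe_units` is hypothetical, and parity units are left out because the slide does not state a protection level.

```python
import math

BLOCK_SIZE = 8 * 1024            # FS block size from the slide: 8 KB
BLOCKS_PER_STRIPE_UNIT = 16      # from the slide
STRIPE_UNIT = BLOCK_SIZE * BLOCKS_PER_STRIPE_UNIT  # 128 KB

def stripe_units(file_bytes: int) -> int:
    """Number of data stripe units needed to hold a file (parity not counted)."""
    return math.ceil(file_bytes / STRIPE_UNIT)

print(STRIPE_UNIT // 1024)           # stripe unit size in KB
print(stripe_units(1024 * 1024))     # data stripe units for a 1 MiB file
```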
  • 12. ISI_HDFS_D
      – Multi-threaded daemon runs on all nodes
      – Services both NameNode (NN) and DataNode (DN) protocols
      – Translates HDFS RPCs to POSIX system calls
      – Stateless; the underlying FS handles coherency
      [Diagram: on each OneFS node a request is handled by an isi_hdfs_d thread, passes through the VFS as a OneFS syscall, and the response is returned]
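To make "translates HDFS RPCs to POSIX system calls" concrete, here is a toy dispatcher. It is only an illustration of the idea: the method names follow Hadoop's client protocol (`mkdirs`, `getFileInfo`, `getListing`, `rename`, `delete`), but the mapping to `os` calls, the `handle_rpc` function and the temporary directory standing in for the file system are all assumptions, not the isi_hdfs_d implementation.

```python
import os
import tempfile

def handle_rpc(root, method, path, **args):
    """Map one HDFS-style metadata RPC onto a POSIX call (illustrative only).

    The handler keeps no state between calls; everything it needs arrives
    with the request, and the file system underneath holds the truth.
    """
    full = os.path.join(root, path.lstrip("/"))
    if method == "mkdirs":
        os.makedirs(full, exist_ok=True)          # mkdir(2)
        return True
    if method == "getFileInfo":
        st = os.stat(full)                        # stat(2)
        return {"length": st.st_size, "isdir": os.path.isdir(full)}
    if method == "getListing":
        return sorted(os.listdir(full))           # readdir(3)
    if method == "rename":
        os.rename(full, os.path.join(root, args["dst"].lstrip("/")))  # rename(2)
        return True
    if method == "delete":
        os.remove(full)                           # unlink(2)
        return True
    raise ValueError(f"unsupported RPC: {method}")

with tempfile.TemporaryDirectory() as root:
    handle_rpc(root, "mkdirs", "/data/logs")
    with open(os.path.join(root, "data", "logs", "a.txt"), "w") as f:
        f.write("hello")
    info = handle_rpc(root, "getFileInfo", "/data/logs/a.txt")
    listing = handle_rpc(root, "getListing", "/data/logs")
    print(info, listing)
```

Because each call is self-contained, any node could serve any request, which is the property the slide attributes to the stateless daemon.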
  • 13. HDFS IMPLEMENTED LIKE A NAS PROTOCOL
      – OneFS runs a daemon that speaks NameNode and DataNode natively
      – Every OneFS node in the clustered file system acts as both NameNode and DataNode
      – Flow from the DFSClient on a Hadoop node: 1) Request("/file"); 2) Response (block locations); 3) GetBlock(block)
  • 14. HDFS: STANDARD HADOOP CLUSTER
      – Sources: Decision Support, Databases, Web Click data, OLAP, EDW; ingested over HTTP, CIFS, FTP, NFS into Landing Zone Servers
      – Step 1: Data is copied into the Landing Zone
      – Step 2: Data is copied into the Cluster (3 times)
      – Step 3: Hadoop Jobs are run
      [Diagram: name node plus combined compute/data nodes running Map/Reduce; each file is stored as three copies (3X)]
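The three steps imply a fixed amount of data movement before any job runs. A minimal sketch of that volume, assuming the landing-zone copy and each replica are full copies of the dataset (the function name and the 100 TB example are illustrative):

```python
def staged_volume_tb(dataset_tb: float, replicas: int = 3, landing_zone: bool = True) -> float:
    """Terabytes written before the first job can run."""
    landing = dataset_tb if landing_zone else 0.0   # Step 1: copy into the landing zone
    cluster = dataset_tb * replicas                 # Step 2: copy into the cluster, 3 times
    return landing + cluster

# Standard cluster: one landing-zone copy plus three replicas.
print(staged_volume_tb(100))
# In-place analysis: nothing is copied before jobs start.
print(staged_volume_tb(100, replicas=0, landing_zone=False))
```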
  • 15. HDFS: SHARED DATA LAKE
      – Sources: Decision Support, Web Click data, OLAP, EDW
      – Step 1: Jobs are run
      [Diagram: a Hadoop compute cluster running Map/Reduce reads directly from the shared data lake]
  • 16. AND WHAT ABOUT PERFORMANCE? [Diagram: Compute Nodes, Isilon Nodes]
  • 17. TRADITIONAL COMPUTE WITH LOCAL DAS
      – The ratio of compute and disk space/performance is fixed
      – Non-local HDFS I/O (20-90% of HDFS I/O) will go through Ethernet
      – Local disk usage is shared between shuffle I/O (60% of all I/O during terasort) and HDFS I/O
      [Diagram: compute nodes with local SATA disks for shuffle + HDFS, attached at 1 Gbps to rack Ethernet switches, which uplink at 1+ Gbps to a core Ethernet switch]
  • 18. DAS I/O PIPELINE (Hadoop DAS server): Where is the bottleneck?
      – Each drive averages 30-50 MB/sec (read or write) with a typical Hadoop workload
      – Total MB/sec by stage:
      – SAS HBA, 8 SAS ports × 6 Gbps: 6000
      – Unlabeled figure on the slide: 600
      – PCI 2.0 x8 bus: 4000 (shown twice)
      – Dual 10 Gbps Ethernet NIC, 10 Gbps × 2 directions × 2 ports: 5000
      – CPU/memory bandwidth: 12,000
      – Non-blocking 48-port 10 Gbps switch: 120,000
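The slide's question can be answered by taking the minimum over the stage capacities. In the sketch below the stage names are illustrative, and attributing the unlabeled 600 MB/sec figure to the drives in aggregate is an assumption based on its position next to the per-drive number.

```python
# Stage capacities in MB/sec, as listed on the slide.
# "drives_aggregate" is an assumption: the slide's 600 figure carries no label.
stages = {
    "sas_hba": 6000,
    "drives_aggregate": 600,
    "pci_bus": 4000,
    "nic_dual_10g": 5000,
    "cpu_memory": 12_000,
    "switch_48_port": 120_000,
}

def bottleneck(stage_mb_per_sec):
    """Return (name, capacity) of the slowest stage in the pipeline."""
    name = min(stage_mb_per_sec, key=stage_mb_per_sec.get)
    return name, stage_mb_per_sec[name]

slowest = bottleneck(stages)
print(slowest)
```

Under that assumption the disks, not the bus or the network, limit a DAS server's throughput.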
  • 19. IMPROVING EFFICIENCY WITH SHARED STORAGE. The number of compute and Isilon nodes using Intel technology can be adjusted independently to achieve the optimal ratio of compute and I/O bandwidth. HDFS I/O ALWAYS comes through a rack-local Isilon node, which collects data blocks from all other Isilon nodes across the InfiniBand fabric. Shuffle I/O (75% of all I/O during terasort) remains on local storage; this can be flash for optimal performance. [Diagram: compute nodes with local SATA shuffle storage and 10 Gbps links, rack Ethernet switches, Isilon HDFS nodes joined by an Isilon InfiniBand (IB) switch, and a core Ethernet switch whose 1+ Gbps links are used only for the MR copy phase.]
  • 20. ISILON I/O PIPELINE. Hadoop Compute Server, total MB/sec per stage: drives 400 (each drive averages 30-50 MB/sec, read or write, with a typical Hadoop workload); SAS HBA (8 SAS ports × 6 Gbps) 6000; PCI 2.0 x8 bus 4000 (listed twice); dual 10 Gbps Ethernet NIC (10 Gbps × 2 directions × 2 ports) 5000; Intel CPU/memory bandwidth 12,000; non-blocking 48-port 10 Gbps switch 120,000. Isilon Node: Isilon X400 maximum throughput (per node) 740; 10 Gbps × 2 directions × 2 ports (per node) 5000. The network is NOT a bottleneck!
  • 21. HADOOP JOB CYCLE COMPARISON: TIME TO RESULTS (THROUGHPUT). 1 TB Terasort test. Traditional Hadoop+DAS (55 MB/s node throughput): compute 30m 18s, time to results 68m 40s; the chart shows phases of 17m 32s, 30m 18s, and 20m 50s, which add up to that total. Isilon-enabled vHadoop+Isilon (85 MB/s node throughput): compute 16m 00s, time to results 16m 00s. Data handling is not required on Isilon.
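The time-to-results figures can be checked with a few lines of arithmetic. This is a sketch only: the phase durations are the ones printed on the slide, while the helper names and the derived speedup ratio are illustrative additions that the slide itself does not state.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair into total seconds."""
    return minutes * 60 + seconds


def fmt(total_seconds):
    """Format seconds the way the slide prints durations, e.g. '68m 40s'."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds:02d}s"


# Phase durations as shown on the slide for the 1 TB Terasort test.
DAS_PHASES = [to_seconds(17, 32), to_seconds(30, 18), to_seconds(20, 50)]
ISILON_PHASES = [to_seconds(16, 0)]  # compute only; no data handling phases

das_total = sum(DAS_PHASES)
isilon_total = sum(ISILON_PHASES)
speedup = das_total / isilon_total  # derived here, not quoted on the slide

if __name__ == "__main__":
    print("Hadoop+DAS time to results:", fmt(das_total))
    print("vHadoop+Isilon time to results:", fmt(isilon_total))
    print(f"Ratio: {speedup:.2f}x")
```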
  • 22. PHASE 1: INGEST (IDC PERFORMANCE VALIDATION). [Chart: runtime in seconds for NFS Write 10 GB and NFS Read 10 GB, Isilon vs. DAS cluster; callouts: 4.2x faster, 36x faster.] Isilon: 4x X410 nodes + 7 compute nodes. DAS: 7x (compute+DAS nodes).
  • 23. PHASE II: COMPUTE RUNNING ON INTEL XEON PROCESSORS (IDC PERFORMANCE VALIDATION). [Chart: runtime in seconds for TeraGen, TeraSort, and TeraValidate, Isilon vs. DAS cluster; callouts: 2.6x faster, 1.5x faster, 1.5x faster.] Isilon: 4x X410 nodes + 7 compute nodes. DAS: 7x (compute+DAS nodes).
  • 24. ESG PERFORMANCE VALIDATION
  • 25. CAPACITY UTILIZATION, DATA TIERING, PROTECTION
  • 26. NODE TYPES: SCALE-OUT NAS PRODUCT FAMILY POWERED BY INTEL. S-Series: purpose-built for IOPS-intensive, random access, file-based applications. X-Series: flexible solution for high concurrent and sequential throughput applications. NL-Series: ideal for cost-effective, large-capacity storage. HD-Series: ideal for cost-effective, large-capacity storage. [Diagram axes: Capacity, Performance]
  • 27. POLICY BASED DATA PLACEMENT. Example policies: data younger than 30 days on S210, older than 30 days on NL400; data aged 30 days to 2 years on NL400, older than 2 years on HD400. Multiple archive tiers; policy-based heuristic tiering; eliminates data migration; cost-optimized data placement; transparent to applications and users; optimizes storage resources while matching them with data requirements.
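The age thresholds on this slide can be read as a lookup from file age to node type. The sketch below is a hypothetical illustration in Python, not the SmartPools policy syntax or API: it assumes the two example policies combine into one three-tier policy, and the handling of the exact boundary values (30 days, 2 years taken as 730 days) is also an assumption.

```python
from datetime import timedelta

# (exclusive upper age bound, node type), youngest tier first.
# Assumption: boundaries and the merged three-tier layout are illustrative.
TIERS = [
    (timedelta(days=30), "S210"),
    (timedelta(days=2 * 365), "NL400"),
]
ARCHIVE_TIER = "HD400"  # everything older than the last bound


def tier_for_age(age):
    """Return the node type a file of the given age would be placed on."""
    for upper_bound, node_type in TIERS:
        if age < upper_bound:
            return node_type
    return ARCHIVE_TIER


if __name__ == "__main__":
    for days in (10, 100, 1000):
        print(days, "days ->", tier_for_age(timedelta(days=days)))
```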
  • 28. TIERING TO EXTERNAL OBJECT STORE: SMART TIERING TO CLOUD. Key features and benefits: stub to cloud of choice; extends the SmartPools workflow to CloudPools; optional ability to send encrypted data to the cloud; compression for efficient transport; simple management from a familiar interface; seamless placement and availability of data per policy; enables offsite DP and archive; transparent integration with offsite stores; one accessible namespace. [Diagram: clients (SMB | NFS | RAN) access OneFS; SmartPool -> CloudPool tiers data to a service provider, public cloud, or private cloud.]
  • 29. COMPARE: UTILIZATION AND CHOICE. Isilon shared storage vs. Hadoop DAS storage (3x copies). Isilon protection layout and overhead by number of nodes and protection level:
      Nodes  +1           +2:1           +2           +3:1          +3           +4
      3      2+1 (33%)    4+2 (33%)      3x           --            --           --
      4      3+1 (25%)    6+2 (25%)      2+2 (50%)    9+3 (25%)     4x           --
      5      4+1 (20%)    8+2 (20%)      3+2 (40%)    12+3 (20%)    4x           5x
      6      5+1 (17%)    10+2 (17%)     4+2 (33%)    15+3 (17%)    3+3 (50%)    5x
      7      6+1 (14%)    12+2 (14%)     5+2 (29%)    15+3 (17%)    4+3 (43%)    5x
      8      7+1 (13%)    14+2 (12.5%)   6+2 (25%)    15+3 (17%)    5+3 (38%)    4+4 (50%)
      9      8+1 (11%)    16+2 (11%)     7+2 (22%)    15+3 (17%)    6+3 (33%)    5+4 (44%)
      10     9+1 (10%)    16+2 (11%)     8+2 (20%)    15+3 (17%)    7+3 (30%)    6+4 (40%)
      12     11+1 (8%)    16+2 (11%)     10+2 (17%)   15+3 (17%)    9+3 (25%)    8+4 (33%)
      14     13+1 (7%)    16+2 (11%)     12+2 (14%)   15+3 (17%)    11+3 (21%)   10+4 (29%)
    [Diagrams: Isilon protection groups striping data (D1-D12) and parity (P0, P1) across Node1-Node6 in data stripes of 128 kB; Hadoop DAS triple replication of the same blocks.]
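The percentages in the table are the parity share of each stripe, that is M / (N + M) for an N+M layout. A short sketch that reproduces a few cells and applies the same measure to triple replication; the function names are illustrative and not taken from the deck.

```python
def erasure_overhead(data_units, parity_units):
    """Fraction of raw capacity spent on parity in an N+M protection layout."""
    return parity_units / (data_units + parity_units)


def replication_overhead(copies):
    """Fraction of raw capacity spent on redundant copies under replication."""
    return (copies - 1) / copies


if __name__ == "__main__":
    for n, m in [(2, 1), (14, 2), (15, 3), (5, 4)]:
        print(f"{n}+{m}: {erasure_overhead(n, m):.1%}")
    # By the same measure, 3x replication spends two thirds of raw capacity.
    print(f"3x replication: {replication_overhead(3):.1%}")
```

By this measure, triple replication always spends about 67% of raw capacity on protection, while the N+M layouts in the table range from 50% down to 7%.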
  • 30. INDUSTRY SOLUTION: MIRROR COPY. Mirrored copy in a backup site. Benefit: achieves local reconstruction on hardware failure. Shortcoming: storage overhead of 2.66x. [Diagram: Primary, Secondary]
  • 31. INDUSTRY SOLUTION: DISTRIBUTED ERASURE CODING. Fragments are distributed across sites. Benefit: achieves low storage overhead of ~1.6x. Shortcoming: a disk or node failure requires fragments to be fetched over the WAN. [Diagram: Site 1, Site 2, Site 3, Site 4]
  • 32. ECS HYBRID ERASURE CODING. Data is written into chunks as 3 copies. Once a chunk fills to 128 MB, erasure coding starts. Once erasure coding is complete and the data is protected, the 3 copies are deleted.
  • 33. VIPR MODEL: BEST OF BOTH WORLDS. Achieves low storage overhead of ~1.8x. Local hardware failure recovery requires no WAN traffic. Handles local hardware and full data center failures: disk, node, rack, and data center are failure domains. [Diagram: Site 1, Site 2, Site 3, Site 4]
  • 34. DATA LAKE FOUNDATIONS WITH ECS. Efficient; global scale; Big Data analytics; shared storage; built-in high availability; metered chargebacks; multi-tenant. [Diagram labels: L.A.; Emerging data (Social Networks, UGC, Public records, Location Data, Internet Of Things); Traditional data (Unstructured Data, Dark Data, Structured Data)]
  • 35. MULTIPLE TENANTS. Each “Authentication Zone” is associated with: a network zone; a list of auth source associations (trusted or untrusted); a root folder in OneFS. [Diagram: IP Pool 1, IP Pool 2, IP Pool 3; Access Zone1, Access Zone2, Access Zone3; root folders /ifs/deptA, /ifs/deptB, /ifs/deptC; auth sources AD 1 and AD 2 or LDAP]
  • 36. MULTIPLE TENANTS WITH ECS. The System Admin creates namespaces, adds Namespace and System Admins, and assigns object users to a namespace. The Namespace Admin manages object users. Users are authenticated through an authorization provider (AD/LDAP). [Diagram: Namespace 1, Namespace 2, Namespace 3, each with its own object users and object buckets]
  • 37. SUPPORT FOR MULTIPLE ANALYTICS APPLICATIONS. Protocols: SMB, NFS, HTTP, FTP, HDFS 1.x ..., HDFS 2.x ... [Diagram: MapReduce compute nodes and NFS/SMB clients access the same nodes, each of which acts as name node and datanode and returns node replies.]
  • 38. SUMMARY: COMPARING SIDE BY SIDE
      Feature                                  DAS               Isilon
      Simultaneous Hadoop Distributions        r                 a
      File/Object Level Access Control Lists   a                 a
      User Controlled Snapshot                 R                 a
      De-duplication                           R                 a
      WORM (SEC 17a-4)                         r                 a
      POSIX Compliance                         r                 a
      Independent Scaling (Storage/Compute)    r                 a
      Hadoop Distribution Portability          r                 a
      SQL Support (HAWQ, Impala, TEZ)          a                 a
      Encryption                               a (2)             SED
      Data Tiering                             a (3)             a
      Hadoop Distribution Support              1                 All
      Disaster Recovery                        Full File Copy    Snap Replicate
      Data-Set Management                      Ingested          In-place
      Protection overhead                      200%              ~20%
      NameNode Redundancy (HA)                 Active/Passive    N-to-N
      Ability to edit files/objects            r                 a
      NFS v3, v4                               r                 a
      SMB 1, 2                                 r                 a
      HTTP                                     r                 a
      FTP                                      r                 a
      Object (Proprietary)                     r                 a
      HDFS v1                                  r                 a
      HDFS v2                                  a (1)             a
      Simultaneous Multi-Protocol              r                 a
    Notes: (1) Only one version at a time. (2) Software-only (~30% impact to performance). (3) Available only in HDFS 2.5+.
  • 39. EMC DATA LAKE FOUNDATION: ISILON + ECS. EMC BUSINESS DATA LAKE stack: PIVOTAL BIG DATA SUITE; VMWARE VCLOUD SUITE; VCE VBLOCK | XTREMIO | DATA DOMAIN. Layers: OPEN ANALYTICS TOOLBOX; DATA AND ANALYTICS CATALOG; ADVANCED ANALYTICS; APPLICATIONS AT SCALE; DATA PROCESSING; HADOOP. Components: GREENPLUM DATABASE, HAWQ, SPRING XD, PIVOTAL HD, SPARK, REDIS, RABBITMQ, GEMFIRE, BDS ON PIVOTAL CLOUD FOUNDRY. Managers: PLATFORM MANAGER, DATA GOVERNOR, DATA MANAGER, INGEST MANAGER, ANALYTICS MANAGER.
  • 40. FOR MORE INFORMATION, SEE US AT THE BOOTH. THANK YOU!
  • 41. QUESTIONS? Dr. Stefan Radtke, CTO, EMEA, EMC Emerging Technology Division. Phone: +49-176-34434460. E-Mail: Stefan.Radtke@emc.com. LinkedIn: http://de.linkedin.com/in/drstefanradtke. Blog: http://stefanradtke.blogspot.com