Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster & Amazon Web Services

Big Data Goes Airborne
Jorge A. Lopez
Director, Product Marketing, Syncsort
Chris Keyser
Partner Solution Architect, Amazon Web Services

Agenda
1. The Cloud as a Data Platform
2. Addressing Data Processing Challenges with Ironcluster & AWS
3. DEMO
4. Closing Comments + Q&A

Why Are Customers Adopting Cloud and AWS?
• Cost savings through economies of scale
• Trade capital expense for variable expense
• No need to guess capacity
• Agility, speed to market & flexibility
• Global in minutes
• Security and compliance

AWS Global Infrastructure
10 Regions
26 Availability Zones
51 Edge Locations

The Good News Is that Cloud Isn’t an ‘All or Nothing’ Choice
Diagram: on-premises resources in corporate data centers and cloud resources, joined through integration.

Integrating Your On-Premises, AWS and SaaS Infrastructure
Diagram: your on-premises data center and your cloud data center, connected via AWS Direct Connect.
• Your On-Premises Data Center (corporate data centers): applications on premises, Active Directory, network configuration
• Your Cloud Data Center: applications on AWS, users & access rules (IAM), your private network (VPC), data warehouse/BI, managed databases
• Shared workloads: app migration/archiving, hybrid data warehouse/BI

AWS Provides Broad and Deep Services
• Global infrastructure: Regions, Availability Zones, Content Delivery POPs
• Storage: S3, EBS, Glacier, Storage Gateway, Import/Export
• Compute: EC2, Auto Scaling, Elastic Load Balancer
• Databases: RDS (MySQL, PostgreSQL, Oracle, SQL Server), DynamoDB, ElastiCache
• Networking: VPC, Direct Connect, Route 53
• Analytics: EMR, Redshift, Data Pipeline, Kinesis
• Application services: SQS, SNS, SWF, SES, CloudSearch, AppStream, CloudFront, WorkSpaces
• Management & administration: IAM, CloudWatch, CloudTrail, Cloud HSM, Management Console, APIs and SDKs, Command Line Interface
• Containers & deployment: Elastic Beanstalk (Java, Node.js, Python, Ruby, PHP and .NET), OpsWorks, CloudFormation
• Ecosystem: Technology Partners, Consulting Partners, AWS Marketplace
• Support, Certification, Training, Professional Services

Amazon EC2 - Broad Selection of Compute Instance Families
• General purpose (M3): m3.2xlarge, 8 vCPU, 30 GB RAM, 160 GB SSD
• Compute optimized (C3): c3.8xlarge, 32 vCPU, 60 GB RAM, 720 GB SSD
• Memory optimized (R3): r3.8xlarge, 32 vCPU, 244 GB RAM, 720 GB SSD
• Storage and IO optimized (I2): i2.8xlarge, 32 vCPU, 244 GB RAM, 6.4 TB SSD
• Storage and IO optimized (HS1): hs1.8xlarge, 16 vCPU, 117 GB RAM, 48 TB HDD
• GPU enabled (G2): g2.2xlarge, 8 vCPU, 15 GB RAM, 1536 CUDA cores, 4 GB video RAM

AWS as a Data Platform
• Instance storage: EC2, EBS
• SQL stores: Redshift, RDS
• Hadoop: EMR
• NoSQL: DynamoDB
• Stream: Kinesis
• Search: CloudSearch
• Storage services: S3, CloudFront, Glacier
Data
• Volume: gigabytes, terabytes, petabytes
• Velocity: millisecond, second, minute, hour, day
• Variety: structured, unstructured, text, binary

Amazon EMR - Hadoop Tuned for AWS
Diagram: an EMR cluster made of a master instance group, a core instance group (running HDFS) and a task instance group, exchanging data with Amazon S3, Amazon Redshift and Amazon DynamoDB.
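The master/core/task split maps directly onto how an EMR cluster is described at launch. The sketch below is an illustration, not part of Ironcluster: a hypothetical helper, emr_instance_groups, that builds the InstanceGroups list in the shape accepted by the EMR RunJobFlow API (for example through boto3's run_job_flow). It only builds dictionaries and never calls AWS; the instance type and counts are placeholders. Because core nodes hold HDFS data, the sketch assumes at least one core node; task nodes add compute only.

```python
def emr_instance_groups(instance_type, core_count, task_count=0, market="ON_DEMAND"):
    """Hypothetical helper: master/core/task group definitions for one EMR cluster.

    The dict keys follow the InstanceGroups shape used by EMR's RunJobFlow API;
    this function only builds the data structure and does not contact AWS.
    """
    if core_count < 1:
        # Assumption of this sketch: we want an HDFS-backed cluster.
        raise ValueError("an HDFS-backed cluster needs at least one core node")
    groups = [
        # Master: coordinates the cluster and distributes work; one node here.
        {"Name": "Master", "InstanceRole": "MASTER", "Market": market,
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core: runs tasks AND stores HDFS blocks, so shrinking it risks data.
        {"Name": "Core", "InstanceRole": "CORE", "Market": market,
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task: runs tasks only (no HDFS), so it can be resized for burst capacity.
        groups.append({"Name": "Task", "InstanceRole": "TASK", "Market": market,
                       "InstanceType": instance_type, "InstanceCount": task_count})
    return groups
```

For example, emr_instance_groups("m3.xlarge", core_count=4, task_count=2) yields one master, four core and two task nodes. Keeping task nodes in their own group is what lets a job add or drop compute without touching the HDFS-bearing core group.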

Amazon Redshift - Petabyte Scale Data Warehouse
Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Backup and restore via S3
– Parallel load from S3, EMR, or DynamoDB
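Hardware optimized for data processing
– DW1: 2TB – 1.6PB Magnetic
– DW2: 160GB – 256TB SSD
Diagram: JDBC/ODBC clients connect to the leader node; compute nodes are linked by 10 GigE (HPC) networking; ingestion, backup and restore run between the compute nodes and S3.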
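The "parallel load from S3" point is easiest to see as a recipe: split the input into several files under one S3 prefix, then issue a single COPY that names the prefix, so the compute nodes' slices load files in parallel. Amazon's Redshift loading guidance recommends making the number of files a multiple of the number of slices in the cluster. The sketch below assumes you already know the slice count; the helper names, bucket and IAM role ARN are hypothetical placeholders, and the actual upload and SQL execution are left out.

```python
def split_for_parallel_load(rows, slices, files_per_slice=1):
    """Round-robin rows into slices * files_per_slice roughly equal parts."""
    if slices < 1 or files_per_slice < 1:
        raise ValueError("slices and files_per_slice must be positive")
    n_files = slices * files_per_slice  # a multiple of the slice count
    parts = [[] for _ in range(n_files)]
    for i, row in enumerate(rows):
        parts[i % n_files].append(row)
    return parts


def part_keys(s3_prefix, n_files):
    """Object keys sharing one prefix, so a single COPY picks them all up."""
    return [f"{s3_prefix}part_{i:04d}" for i in range(n_files)]


def copy_statement(table, s3_prefix, iam_role_arn, delimiter="|"):
    """Redshift COPY that loads every S3 object whose key starts with s3_prefix."""
    return (f"COPY {table} FROM '{s3_prefix}' "
            f"IAM_ROLE '{iam_role_arn}' DELIMITER '{delimiter}';")
```

Round-robin keeps the files close to equal in size, which matters because a load finishes only when its slowest slice does.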

The Data Processing Challenge

Innovative Cloud Solutions
Ironcluster ETL, Amazon EC2 Edition
COLLECT, PROCESS & DISTRIBUTE DATA AT DISRUPTIVE SCALE & COST
• Blazingly FAST, infinitely SCALABLE
• EASY-to-use graphical user interface
• Self-tuning engine for SMART data integration
• The capacity you need, when YOU need it
• Instantly provision with single-click access
Ironcluster Hadoop ETL for Amazon EMR
Now FREE in the AWS Marketplace!
The only pure-play ETL app available on the AWS Marketplace

Ironcluster – Enterprise-grade ETL in 3 Easy Steps
1. Go to AWS Marketplace & select your Ironcluster instance
2. Spin up Ironcluster & start developing
3. Done? Spin down Ironcluster

Got Big Data? – Enter Hadoop with Ironcluster Hadoop ETL
Get your Hadoop cluster:
1. Procure
2. Set up
3. Configure
4. Deploy
Now… how do I get productive quickly?
• Many use cases (where do I start?)
• Disparate tools (or BYOL)
• Lots of manual coding
• Expensive, hard-to-find skills
Outcomes: High Costs + Slow Results

Vs.
Now… get right to work!
Fully productive in days + no-brainer cost

Syncsort Ironcluster: Hadoop ETL for Amazon EMR
Blazingly Fast, Easy to Use Hadoop ETL on Amazon EMR
• Develop MapReduce ETL jobs graphically
• Create sophisticated data flows in no time with a library of Use Case Accelerators
• Avoid the coding nightmare without compromising on performance
• Develop once, reuse many times
• Leverage all your data, including Amazon Redshift & S3 sources/targets
• Scale infinitely with a disruptively low, “no brainer” price
It’s FREE!
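Ironcluster's pitch is that MapReduce ETL jobs are built graphically rather than hand-coded. For readers new to the model, the following minimal, in-process sketch shows what such a job does underneath: map each record to key/value pairs, sort and group by key (the role of Hadoop's shuffle), then reduce each group. It is not Ironcluster's engine or Hadoop's API; the function names and the "region,amount" input format are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records, mapper):
    """Apply the mapper to every record; each call may emit many (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs):
    """Sort by key and group, mimicking the sort/shuffle between map and reduce."""
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))


def reduce_phase(groups, reducer):
    """Call the reducer once per key with all values collected for that key."""
    return {key: reducer(key, [value for _, value in pairs]) for key, pairs in groups}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


# Example ETL step (hypothetical input): total sales amount per region.
def sales_mapper(line):
    region, amount = line.split(",")
    yield region.strip(), float(amount)


def sum_reducer(_key, values):
    return sum(values)
```

Calling run_job(["east,10", "west,5", "east,2.5"], sales_mapper, sum_reducer) aggregates the three records into one total per region. The sort in shuffle_phase is the step that, on a real cluster, dominates the cost of moving data between mappers and reducers.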

It’s All About Discovering New Insights
An End-to-End Approach to Data Processing & Visualization
Create data extracts in seconds with just a click in Ironcluster!
Vast Variety of Data Sources
• Access your data from virtually any source, including social, Redshift, S3, XML, and more
Process w/ Ironcluster in AWS
• Fastest, lightweight run-time ETL engine
• Deploy with or without Hadoop
• Comprehensive library of transformations
Ironcluster Tableau Connector: TDEs (Tableau Data Extracts) at blazing speed
• Directly create TDE files or objects to load Tableau
• Cut latency
• No prerequisite software to install
Visualize w/ Tableau
• Combined power of Hadoop & AWS
• Faster queries
• All enterprise data
• Advanced analytics

Lower Your Cost & Optimize Cloud Computing on Any AWS Platform
• Redshift: Transform data, then load to Redshift for reporting and advanced analytics
• S3: Stream log data from S3, aggregate for insight into web user behavior, stream back to S3
• RDS: Translate data from MySQL, Oracle, Microsoft SQL Server, or PostgreSQL
• DynamoDB: Join large data volumes & load to DynamoDB for mobile, gaming and ad apps
Chart: up to 75%* reduction in processing time and cost, with higher throughput, speed & efficiency
*Users of the new Ironcluster ETL for EC2 can experience up to a 75% reduction in processing time and total cost of ownership when compared to legacy ETL approaches and tools. Based on Syncsort benchmarking and POCs.
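The S3 use case (read web logs, aggregate user behavior, write results back) can be sketched locally. This example assumes Apache/NCSA Common Log Format input, treats the client IP as a stand-in for a user, counts successful (2xx) requests and distinct pages per client, and emits CSV. Reading the log objects from S3 and writing the CSV back are deliberately left out (a real job would use an AWS SDK for that), and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [timestamp] "METHOD path PROTOCOL" status size
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')


def aggregate_by_client(lines):
    """Return {client_ip: (successful_hits, distinct_pages)} for 2xx requests."""
    hits = defaultdict(int)
    pages = defaultdict(set)
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed lines instead of failing the whole job
        ip, _method, path, status, _size = m.groups()
        if status.startswith("2"):
            hits[ip] += 1
            pages[ip].add(path)
    return {ip: (hits[ip], len(pages[ip])) for ip in hits}


def to_csv(summary):
    """Render the summary as CSV text, sorted by client, ready to write back out."""
    rows = ["client,hits,distinct_pages"]
    rows += [f"{ip},{h},{p}" for ip, (h, p) in sorted(summary.items())]
    return "\n".join(rows)
```

Skipping unparseable lines, rather than raising, reflects a common choice for log pipelines where a few corrupt records should not block the aggregate.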

The Possibilities Are Endless
• Sort & aggregate massive data volumes generated by mobile devices to improve customer satisfaction
• Develop & run complex market risk models on big datasets with Ironcluster in Amazon EMR
• Leverage Use Case Accelerators to quickly deploy click-stream and web log analysis applications in AWS
• Pre-process petabytes of sensor data and research new algorithms to support quality assurance

Visit Us @ The Amazon Web Services Marketplace
Try Ironcluster ETL FREE for 30 Days!
www.syncsort.com/IronclusterEC2
Got Big Data? Get Ironcluster Hadoop ETL for Amazon EMR FREE!
www.syncsort.com/IronclusterEMR
Watch this Webcast On-Demand - Including a Product Demonstration!
http://bit.ly/1zYh9er
