Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster & Amazon Web Services

Learn about the only solution to instantly provision a full-featured ETL environment running on AWS for less than your Sunday newspaper!


  • Make agility clearer on this slide… add something here on security/compliance as well.

  • Our data center footprint is global, spanning 5 continents with highly redundant clusters of data centers in each region. Our footprint is expanding continuously as we increase capacity, redundancy and add locations to meet the needs of our customers around the world.
  • TODO: add Infa and data movement to this slide. Put apps on the enterprise side, and add a layer for Mercator / SFDC as another block in this diagram.
  • A big reason people are moving to the cloud so quickly is the breadth of services, features, and geographic reach that AWS offers.

    If you want to build new businesses from scratch or move some or all of your workloads to the cloud, you need a broad array of services and features to make this happen without having to piece it together.
  • Today, we’re extending these instance families further.

    HS1 instance family: will double the number of vCPU threads and increase storage throughput from 2.6 to 3.6 gigabits per second.
    R3 instance family: R3 instances feature an 8:1 memory-to-CPU ratio, with up to 244 GB of RAM, fast SSD-based local storage, and enhanced networking.
    R3 instances replace the M2 and CR1 instances, focusing on memory-optimized use cases.
    R3 offers more instance sizes, up to 244 GiB of RAM, with around 27% faster memory than M2 based on STREAM performance.
  • Start an EMR cluster using the console or the CLI tools.
    A master instance group is created that controls the cluster.
    A core instance group is created for the life of the cluster.
    Core instances run the DataNode and TaskTracker daemons.
    Optional task instances can be added or removed to perform work (for example, as Spot Instances).
    S3 can be used as the underlying ‘file system’ for input/output data.
    The master node coordinates the distribution of work and manages cluster state.
    Core and task instances read from and write to S3.


  • As we’ve seen, AWS allows you to instantly provision a great platform to manage and process large amounts of data, with and without Hadoop. However, this is only part of the story. Without the right tools, collecting, processing and distributing data for valuable analytics requires either manual coding or writing hundreds of lines of SQL, and in the case of Hadoop, even Java, Pig, HiveQL, and more.
  • That’s why we developed Ironcluster: these are the first and only pure-play ETL solutions available on the AWS Marketplace, so you can instantly deploy a full-featured ETL environment to collect, process and distribute data in the cloud.

    Ironcluster ETL, Amazon EC2 Edition allows you to instantly provision a full-featured ETL environment running on Amazon Elastic Compute Cloud (Amazon EC2). Ironcluster ETL takes away the complexity of data integration, delivering a much more agile ETL environment with the capacity you need, when you need it. No hardware to procure, no software licenses to buy.

    Ironcluster Hadoop ETL runs natively within your Amazon EMR cluster, allowing you to leverage the massive scalability and performance of Hadoop in the cloud.
  • Both Ironcluster ETL and Ironcluster Hadoop ETL are available on the AWS Marketplace; this means…

    Let me tell you a bit about each…

    Complete Customer Quote from Greg Sokol, Data Warehouse Architect, ModCloth, an early Ironcluster user.
    “We needed an easy to install and upgrade, high-performance, lightweight ETL product that works well in the cloud with Amazon Web Services,”…“Ironcluster ETL has served as a great product given our requirements and priorities, helping us take full advantage of the cost and efficiency benefits we achieve with cloud computing as part of our data management architecture.”
  • Then Hadoop


    First roadblock: how do you stand up your Hadoop cluster?
    Solution: now you have it!
    Second roadblock: now what?
  • A bit more detail about Hadoop

    The first and only ETL tool for Amazon EMR
    GUI
    Use Case Accelerators
    Price point
    FREE VERSION
    Fully integrated Hadoop ETL: smarter architecture, no code generation


    Faster time to deployment
    And lower costs
    We’re part of the AWS Marketplace
    You don’t have to buy a license; we’re integrated into the AWS Marketplace for Amazon EMR





    World-class support:
    Free online support for the free version
    Personal support for the paid version
  • In the end, it’s all about the insights you can get from your data, and we know people love data discovery and visualization tools.

    The good news is that you can use Syncsort DMX-h with the leading BI tool of your choice, but I specifically wanted to mention Tableau, since they are one of our strategic partners and we just released a fully integrated connector that allows you to create a Tableau data extract file directly from our interface.

    You simply select Tableau as the target and it generates the TDE file. There is no need to install any additional software, since we include the Tableau API.

  • Now, from the business perspective, there are benefits too…
  • So when you think about Amazon…
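The EMR cluster layout described in the speaker notes above has three parts: a master group that controls the cluster, a core group that lives for the life of the cluster, and optional task instances that can be added or removed, for example on Spot. That layout can be written down as an instance-group configuration. The sketch below only builds the `InstanceGroups` list in the shape that boto3's `EMR.Client.run_job_flow` accepts. The helper name `build_instance_groups`, the default `m3.xlarge` type, and the counts are hypothetical choices. Nothing is sent to AWS.

```python
from typing import Any, Dict, List


def build_instance_groups(
    instance_type: str = "m3.xlarge",  # hypothetical default; any EMR-supported type works
    core_count: int = 2,
    task_count: int = 0,
    task_on_spot: bool = True,
) -> List[Dict[str, Any]]:
    """Return an InstanceGroups list for boto3's EMR run_job_flow call.

    One master group coordinates the cluster, one core group (HDFS DataNodes)
    lives for the life of the cluster, and an optional task group adds
    compute-only capacity, on the Spot market by default.
    """
    if core_count < 1:
        # This sketch assumes at least one core node so the cluster has HDFS.
        raise ValueError("this sketch assumes at least one core instance")
    groups: List[Dict[str, Any]] = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        # Task nodes hold no HDFS data, so they can be added or removed freely.
        groups.append(
            {
                "Name": "Task",
                "InstanceRole": "TASK",
                "Market": "SPOT" if task_on_spot else "ON_DEMAND",
                "InstanceType": instance_type,
                "InstanceCount": task_count,
            }
        )
    return groups


if __name__ == "__main__":
    for group in build_instance_groups(core_count=3, task_count=4):
        print(group["InstanceRole"], group["Market"], group["InstanceCount"])
```

To launch a cluster, this list would be passed as `Instances={"InstanceGroups": groups, ...}` to `boto3.client("emr").run_job_flow(...)`. That call also needs other settings, such as a release label, IAM roles, and AWS credentials, so it is deliberately left out here.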
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster & Amazon Web Services

    1. Big Data Goes Airborne
    2. Big Data Goes Airborne
       Jorge A. Lopez, Director Product Marketing, Syncsort
       Chris Keyser, Partner Solution Architect, Amazon Web Services
    3. Agenda
       1. The Cloud as a Data Platform
       2. Addressing Data Processing Challenges with Ironcluster & AWS
       3. DEMO
       4. Closing Comments + Q&A
    4. Why Are Customers Adopting Cloud and AWS?
       1. Cost savings through economies of scale
       2. Trade capital expense for variable expense
       3. Agility, speed to market & flexibility
       4. Global in minutes
       5. Don’t have to guess on capacity
       6. Security and compliance
    5. AWS Global Infrastructure: 10 Regions, 26 Availability Zones, 51 Edge Locations
    6. The Good News Is that Cloud Isn’t an ‘All or Nothing’ Choice
       Diagram: On-Premises Resources (Corporate Data Centers), Integration, Cloud Resources
    7. Integrating Your On-Premises, AWS and SaaS Infrastructure
       Your on-premises data center (corporate data centers): applications on premise, Active Directory, network configuration
       Connection: AWS Direct Connect
       Your cloud data center: your private network (VPC), users & access rules (IAM), applications on AWS, data warehouse/BI, managed databases
       Scenarios: app migration/archiving, hybrid data warehouse/BI
    8. AWS Provides Broad and Deep Services
       Global infrastructure: Regions, Availability Zones, Content Delivery POPs
       Storage: S3, EBS, Glacier, Storage Gateway, Import/Export
       Compute: EC2, Auto Scaling, Elastic Load Balancer
       Databases: DynamoDB, ElastiCache, RDS (MySQL, PostgreSQL, Oracle, SQL Server)
       Networking: VPC, Direct Connect, Route 53
       Analytics: EMR, Redshift, Data Pipeline, Kinesis
       Application Services: SQS, SNS, SWF, SES, CloudSearch, AppStream, CloudFront
       WorkSpaces
       Management & Administration: IAM, CloudWatch, CloudTrail, Management Console, APIs and SDKs, Command Line Interface, CloudHSM
       Containers & Deployment: Elastic Beanstalk (Java, Node.js, Python, Ruby, PHP and .NET), OpsWorks, CloudFormation
       Ecosystem: Technology Partners, Consulting Partners, AWS Marketplace
       Support, Certification, Training, Professional Services
    9. Amazon EC2 - Broad Selection of Compute Instance Families
       G2 (GPU enabled), g2.2xlarge: 8 vCPU, 15 GB RAM, 1536 CUDA cores, 4 GB video RAM
       M3 (general purpose), m3.2xlarge: 8 vCPU, 30 GB RAM, 160 GB SSD
       R3 (memory optimized), r3.8xlarge: 32 vCPU, 244 GB RAM, 720 GB SSD
       C3 (compute optimized), c3.8xlarge: 32 vCPU, 60 GB RAM, 720 GB SSD
       I2 (storage and IO optimized), i2.8xlarge: 32 vCPU, 244 GB RAM, 6.4 TB SSD
       HS1 (storage and IO optimized), hs1.8xlarge: 16 vCPU, 117 GB RAM, 48 TB HDD
    10. AWS as a Data Platform
        Instance storage: EC2, EBS
        SQL stores: Redshift, RDS
        Hadoop: EMR
        NoSQL: DynamoDB
        Stream: Kinesis
        Search: CloudSearch
        Storage services: S3, CloudFront, Glacier
        Data dimensions: variety (structured, unstructured, text, binary); volume (gigabytes, terabytes, petabytes); velocity (millisecond, second, minute, hour, day)
    11. Amazon EMR - Hadoop Tuned for AWS
        Diagram: master instance group, core instance group (HDFS), task instance group; data exchanged with Amazon S3, Amazon Redshift, and Amazon DynamoDB
    12. Amazon Redshift - Petabyte Scale Data Warehouse
        Leader node: SQL endpoint; stores metadata; coordinates query execution
        Compute nodes: local, columnar storage; execute queries in parallel; backup and restore via S3; parallel load from S3, EMR, or DynamoDB
        Hardware optimized for data processing: DW1, 2 TB to 1.6 PB magnetic; DW2, 160 GB to 256 TB SSD
        10 GigE (HPC) network; JDBC/ODBC access; ingestion, backup, and restore
    13. The Data Processing Challenge
    14. Innovative Cloud Solutions
        Ironcluster ETL, Amazon EC2 Edition
        COLLECT, PROCESS & DISTRIBUTE DATA AT DISRUPTIVE SCALE & COST
        – Blazingly FAST, infinitely SCALABLE
        – EASY to use graphical user interface
        – Self-tuning engine for SMART data integration
        – The capacity you need, when YOU need it
        – Instantly provision with single-click access
        Ironcluster Hadoop ETL for Amazon EMR: now FREE in the AWS Marketplace!
        Only pure-play ETL app available on the AWS Marketplace
    15. Ironcluster – Enterprise-grade ETL in 3 Easy Steps
        1. Go to AWS Marketplace & select your Ironcluster instance
        2. Spin up Ironcluster & start developing
        3. Done? Spin down Ironcluster
    16. Got Big Data? – Enter Hadoop with Ironcluster Hadoop ETL
        Get your Hadoop cluster: 1. Procure, 2. Setup, 3. Configure, 4. Deploy
        Now… how do I get productive quickly?
        1. Many use cases (Where do I start?)
        2. Disparate tools (or BYOL)
        3. Lots of manual coding
        4. Expensive, hard-to-find skills
        Outcomes: High Costs + Slow Results
    17. Got Big Data? – Enter Hadoop with Ironcluster Hadoop ETL
        Get your Hadoop cluster: 1. Procure, 2. Setup, 3. Configure, 4. Deploy
        Now… how do I get productive quickly?
        1. Many use cases (Where do I start?)
        2. Disparate tools (or BYOL)
        3. Lots of manual coding
        4. Expensive, hard-to-find skills
        Outcomes: High Costs + Slow Results
        Vs. now… get right to work! Fully Productive in Days + No Brainer Cost
    18. Syncsort Ironcluster: Hadoop ETL for Amazon EMR
        Blazingly Fast, Easy to Use Hadoop ETL on Amazon EMR
        – Develop MapReduce ETL jobs graphically
        – Create sophisticated data flows in no time, with a library of Use Case Accelerators
        – Avoid the coding nightmare without compromising on performance
        – Develop once, reuse many times
        – Leverage all your data, including Amazon Redshift & S3 sources/targets
        – Scale infinitely with a disruptively low, “no brainer” price
        It’s FREE!!
    19. It’s All About Discovering New Insights
        An End-to-End Approach to Data Processing & Visualization
        Create data extracts in seconds with just a click in Ironcluster!
        Vast variety of data sources: access your data from virtually any source, including Social, Redshift, S3, XML, and more
        Process w/ Ironcluster in AWS: fastest & lightweight run-time ETL engine; deploy with or without Hadoop; comprehensive library of transformations
        TDEs at blazing speed (Ironcluster Tableau Connector): directly create TDE files or objects to load Tableau; cut latency; no prerequisite software to install
        Visualize w/ Tableau: combined power of Hadoop & AWS; faster queries; all enterprise data; advanced analytics
    20. Lower Your Cost & Optimize Cloud Computing on Any AWS Platform
        Redshift: Transform data, then load to Redshift for reporting and advanced analytics
        S3: Stream log data from S3, aggregate for insight into web user behavior, stream back to S3
        RDS: Translate data from MySQL, Oracle, Microsoft SQL Server, or PostgreSQL
        DynamoDB: Join large data volumes & load to DynamoDB for mobile, gaming and ad apps
        Throughput, speed & efficiency: up to 75% less processing time and cost*
        *Users of the new Ironcluster ETL for EC2 can experience up to a 75% reduction in processing time and total cost of ownership when compared to legacy ETL approaches and tools. Based on Syncsort benchmarking and POCs.
    21. The Possibilities Are Endless
        – Sort & aggregate massive data volumes generated by mobile devices to improve customer satisfaction
        – Develop & run complex market risk models on big datasets with Ironcluster in Amazon EMR
        – Leverage Use Case Accelerators to quickly deploy click-stream and web log analysis applications in AWS
        – Pre-process petabytes of data from sensors and research new algorithms to support quality assurance
    22. Visit Us @ The Amazon Web Services Marketplace
        Try Ironcluster ETL FREE for 30 Days! www.syncsort.com/IronclusterEC2
        Got Big Data? Get Ironcluster Hadoop ETL for Amazon EMR FREE! www.syncsort.com/IronclusterEMR
        Watch this Webcast On-Demand - Including a Product Demonstration! http://bit.ly/1zYh9er
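The memory-to-CPU ratios behind slide 9, and the "8:1" figure the speaker notes quote for R3, can be checked with simple arithmetic. The sketch below uses only the vCPU and RAM numbers printed on the slide. The names `INSTANCE_SPECS` and `gb_per_vcpu` are illustrative.

```python
# vCPU count and RAM (GB) as printed on slide 9.
INSTANCE_SPECS = {
    "c3.8xlarge": (32, 60),
    "i2.8xlarge": (32, 244),
    "hs1.8xlarge": (16, 117),
    "g2.2xlarge": (8, 15),
    "r3.8xlarge": (32, 244),
    "m3.2xlarge": (8, 30),
}


def gb_per_vcpu(instance: str) -> float:
    """Return RAM per vCPU, in GB, for one of the instance types above."""
    vcpus, ram_gb = INSTANCE_SPECS[instance]
    return ram_gb / vcpus


if __name__ == "__main__":
    # List instance types from most to least memory per vCPU.
    for name in sorted(INSTANCE_SPECS, key=gb_per_vcpu, reverse=True):
        print(f"{name:12s} {gb_per_vcpu(name):5.2f} GB/vCPU")
```

For r3.8xlarge this gives 244 / 32 = 7.625 GB per vCPU, which is the roughly 8:1 ratio the notes describe. The compute-optimized c3.8xlarge comes to 60 / 32 = 1.875 GB per vCPU.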

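Slides 20 and 21 describe two related jobs: streaming log data from S3 and aggregating it to understand web user behavior, and sorting and aggregating large data volumes. The plain-Python sketch below shows only the shape of that sort-and-aggregate step on a handful of records. It is not how Ironcluster implements it. The tab-separated record layout (timestamp, user ID, URL) and the function name `page_views_per_user` are assumptions. Reading the objects from S3 and writing results back are left out.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def page_views_per_user(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count requests per user and return them sorted by count, descending.

    Each line is assumed (hypothetically) to be tab-separated:
    timestamp, user_id, url. Malformed lines are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed records instead of failing the whole job
        _timestamp, user_id, _url = fields
        counts[user_id] += 1
    # Sort by count descending, then by user_id for a deterministic order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = [
        "2014-06-01T10:00:00\tu42\t/home",
        "2014-06-01T10:00:05\tu42\t/cart",
        "2014-06-01T10:01:00\tu7\t/home",
        "bad line",
    ]
    print(page_views_per_user(sample))  # [('u42', 2), ('u7', 1)]
```

On real data volumes, the same count-then-sort logic would run distributed, for example as a MapReduce job on EMR, rather than in a single process.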