Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightning Talk

Lightning talk from the Hadoop Summit 2013 in Amsterdam covering how Syncsort is helping make Hadoop ready for prime time. It covers the pluggable sort contribution and its impact on sort, join, aggregation, merge, and filter in Hadoop, as well as Syncsort's ability to move mainframe data to Hadoop: Big Iron to Big Data.

  • Organizations typically struggle with data processing at all stages of the Big Data Continuum

Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightning Talk: Presentation Transcript

  • Making Hadoop Ready for Prime Time. Hadoop Summit Amsterdam, March 2013. Steve Totman, Director of Strategy, Syncsort. March 20th 2013. Photo credit: Aaron Sikkink, http://www.flickr.com/people/housequakecom/
  • Syncsort Confidential and Proprietary - do not copy or distribute 3
  • The Big Data Continuum: Integrating Big Data… Smarter
    [Diagram residue. Stages, on a scale running from "Min" to "Max": Awakening, Advancing, Plateauing, Dynamic, Evolved.]
    Labels recovered from the diagram: hand-coding nightmare; hand-coding: SQL, JCL; basic ETL tools; SQL migration; traditional BI; standardization; heavy platforms; hitting arch limits + exponential costs; early Hadoop adoption; prototyping & experimentation; demand for MF data; growing MIPS; long development cycles; unsustainable costs; high-performance ETL; ETL & rehosting optimization; Hadoop connectivity & sort gaps; Hadoop sort & connectivity; efficiency, ETL & skills gaps; Hadoop ETL; Big Data is the new standard for both MF & open systems data.
    Products named: DMExpress, MFX.
  • Mandatory sort steps in MapReduce processing
  • Smart Contributions to Improve Hadoop
    Native Sort: not modular; limited capabilities; difficult to fine-tune & configure (requires coding & compilation).
    Hadoop Contribution: modular; extensible; configurable through use of external sorters on MapReduce nodes.
    JIRA: Description
    4807: Allow MapOutputBuffer to be pluggable
    4808: Allow Reduce-side merge to be pluggable
    4809: Make classes required for 2454 public
    4812: Create reduce input merger plug-in
    4842: Shuffle race can hang reducer
    2461: HDFS file name globbing in libhdfs
    4482: Backport of 2454 to MapReduce 1 & 1.2
    First included in a Hadoop distribution, CDH4.2, on February 26th …and more!!
  • Benefits to the Community: match, compression, merge, rank, lookup, join, aggregation, CDC. [Chart: TeraSort Benchmark, elapsed time (min) versus file size (GB).]
  • Data Access: Mainframes Today. 50% Run
  • Syncsort. A Bridge to Scalable, Cost-effective Big Data
    Connect: HDFS connectivity; mainframe; Teradata; files; RDBMS, appliances.
    Pre-process: sort, join; aggregate; compress; partition.
    Facilitate: graphical UI; no manual coding; no tuning.
    Optimize: up to 6x faster load; up to 2x faster sort; faster MapReduce jobs; less storage.
    Over 40 Years Solving Big Data Challenges with Fast. Efficient. Simple. Cost Effective DI Technology
  • Hourly Load into comScore’s Hadoop Cluster. Syncsort’s DMExpress saves comScore over 4TB of data per day! That’s 1460TB a year, about 1.42 petabytes. [Chart: input data in bytes and output data in bytes for each hour, 1 to 24.] © comScore, Inc. Proprietary.
  • comScore’s Daily Trend of Event Volume. [Chart: # of panel records and # of census records per day; series: Beacon Records, Panel Records.] Please attend Mike Brown’s session, “Analyzing 1.4 Trillion Events with Hadoop”, tomorrow.
  • (No elephants were harmed during the creation of this talk, but some are now a lot faster & meaner.) Please visit our booth to register for a free evaluation.
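
The transcript mentions mandatory sort steps in MapReduce processing and a contribution that makes the sort pluggable. The following is a minimal, single-process Python sketch of that idea, not Hadoop's API and not Syncsort's implementation. All names (`map_phase`, `reduce_phase`, `run_job`, `default_sorter`) are hypothetical and chosen only for illustration. It shows that the reduce step depends on the map output being sorted by key, and that the sorter can be swapped without touching the map or reduce code.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

KeyValue = Tuple[str, int]
# A "sorter" is any function that returns the pairs with equal keys adjacent.
Sorter = Callable[[List[KeyValue]], List[KeyValue]]


def default_sorter(pairs: List[KeyValue]) -> List[KeyValue]:
    """Stand-in for a built-in sort: order pairs by key."""
    return sorted(pairs, key=itemgetter(0))


def map_phase(lines: Iterable[str]) -> Iterator[KeyValue]:
    """Emit (word, 1) for every word, as in a word-count mapper."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def reduce_phase(sorted_pairs: Iterable[KeyValue]) -> Iterator[KeyValue]:
    """Sum the values per key; groupby only merges adjacent equal keys,
    which is why the sort step between map and reduce is mandatory."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (key, sum(value for _, value in group))


def run_job(lines: Iterable[str], sorter: Sorter = default_sorter) -> List[KeyValue]:
    """Run map, then the (pluggable) sort, then reduce."""
    mapped = list(map_phase(lines))
    shuffled = sorter(mapped)  # the pluggable step
    return list(reduce_phase(shuffled))


def descending_sorter(pairs: List[KeyValue]) -> List[KeyValue]:
    """Hypothetical replacement sorter: same grouping, reverse key order."""
    return sorted(pairs, key=itemgetter(0), reverse=True)


if __name__ == "__main__":
    data = ["big data big iron", "Big Data"]
    print(run_job(data))
    print(run_job(data, sorter=descending_sorter))
```

Passing a different `sorter` changes only how the intermediate pairs are ordered. The mapper and reducer stay the same, which is the property the slide describes as modular and extensible.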