Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight

Demand for quicker access to multiple integrated sources of data continues to rise. Immediate access to data stored in a variety of systems, such as mainframes, data warehouses, and data marts, so that it can be mined visually for business intelligence is the competitive differentiator enterprises need to win in today’s economy.

Stop playing the waiting game and learn about a new end-to-end solution for combining, analyzing, and visualizing data from practically any source in your enterprise environment.

Leading organizations are already taking advantage of this architectural innovation to gain modern insights while reducing costs and propelling their businesses ahead of the competition.

Are you tired of waiting? Don't let your architecture hold you back. Access this webinar and hear from a team of industry experts on how you can Break the Barriers to Big Data Insight.

Speaker Notes
  • Make agility clearer on this slide… add something here about security/compliance as well.
  • Our data center footprint is global, spanning 5 continents with highly redundant clusters of data centers in each region. Our footprint is expanding continuously as we increase capacity and redundancy and add locations to meet the needs of our customers around the world.
  • TODO: add Infa and data movement into this slide. Put apps on the enterprise side, and add a layer for Mercator / SFDC as another block on this diagram.
  • A big reason people are moving so fast to the cloud is the breadth of services, features, and geographic reach AWS has. If you want to build new businesses from scratch or move some or all workloads to the cloud, you need a broad array of services and features to make this happen without having to piecemeal it.
  • Today, we’re extending these instance families further. The HS1 instance family will double the number of vCPU threads and increase storage throughput performance from 2.6 to 3.6 gigabits per second. In the R3 instance family, R3 instances feature an 8:1 memory-to-CPU ratio, with up to 244GB of RAM, fast SSD-based local storage, and enhanced networking. R3 instances replace the M2 and CR1 instances, focusing on memory-optimized use cases. R3 offers more instance sizes, up to 244GiB of RAM, with around 27% faster memory than M2 based on STREAM performance.
  • Start an EMR cluster using the console or CLI tools. A master instance group is created that controls the cluster, and a core instance group is created for the life of the cluster; core instances run the DataNode and TaskTracker daemons. Optional task instances (Spot) can be added or removed to perform work. S3 can be used as the underlying ‘file system’ for input/output data. The master node coordinates distribution of work and manages cluster state, while core and task instances read from and write to S3.
  • As we’ve seen, AWS allows you to instantly provision a great platform to manage and process large amounts of data, with and without Hadoop. However, this is just part of the story. Without the right tools, collecting, processing, and distributing data for valuable analytics requires either manual coding or writing hundreds of lines of SQL and, in the case of Hadoop, even Java, Pig, HiveQL, and more.
  • That’s why we developed Ironcluster: these are the first and only pure-play ETL solutions available on the AWS Marketplace, so you can instantly deploy a full-featured ETL environment to collect, process, and distribute data in the cloud. Ironcluster ETL, Amazon EC2 Edition, allows you to instantly provision a full-featured ETL environment running on Amazon Elastic Compute Cloud (Amazon EC2). Ironcluster ETL takes away the complexity of data integration, delivering a much more agile ETL environment with the capacity you need, when you need it. No hardware to procure, no software licenses to buy. Ironcluster Hadoop ETL runs natively within your Amazon EMR cluster, allowing you to leverage the massive scalability and performance of Hadoop in the cloud.
  • Both Ironcluster ETL and Ironcluster Hadoop ETL are available on the AWS Marketplace; this means… Let me tell you a bit about each… Complete customer quote from Greg Sokol, Data Warehouse Architect at ModCloth, an early Ironcluster user: “We needed an easy to install and upgrade, high-performance, lightweight ETL product that works well in the cloud with Amazon Web Services,”…“Ironcluster ETL has served as a great product given our requirements and priorities, helping us take full advantage of the cost and efficiency benefits we achieve with cloud computing as part of our data management architecture.”
  • Then Hadoop. First roadblock: how do you stand up your Hadoop cluster? Solution: now you have it! Second: now what?
  • A bit more detail about Hadoop: the first and only ETL tool for Amazon EMR; GUI; use case accelerators; price point; FREE VERSION; fully integrated Hadoop ETL with a smarter architecture (no code generation); faster time to deployment and lower costs. We’re part of the AWS Marketplace: you don’t have to buy your license, since we’re integrated into the AWS Marketplace for Amazon EMR. [AWS Marketplace and Partner Network logos] World-class support: free online support for the free version, personal support for the paid version.
  • In the end it’s all about the insights you can get from your data, and we know people love data discovery and visualization tools. The good news is you can use Syncsort DMX-h with the leading BI tool of your choice, but I specifically wanted to mention Tableau, since they are one of our strategic partners and we just happened to release a fully integrated connector that allows you to create a Tableau data extract file directly from our interface. You simply select Tableau as the target and it will generate the TDE file; there is no need to install any additional software, since we include the Tableau API.
  • Now, from the business perspective, there are benefits too…
  • So when you think about Amazon…
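The EMR note above describes three instance groups: a single master, core nodes that live as long as the cluster and run DataNode and TaskTracker, and optional task nodes, often on Spot. A minimal Python sketch, assuming the dictionary shape of the `InstanceGroups` parameter of the Amazon EMR RunJobFlow API; the helper `emr_instance_groups`, the instance type, counts, and bid price are hypothetical placeholders, not recommendations:

```python
from typing import Dict, List, Union

# Hypothetical helper; every value below is a placeholder for illustration.
InstanceGroup = Dict[str, Union[str, int]]


def emr_instance_groups(core_count: int, task_count: int,
                        task_bid: str,
                        instance_type: str = "m3.xlarge") -> List[InstanceGroup]:
    """Build master/core/task groups in the RunJobFlow InstanceGroups shape."""
    if core_count < 1:
        raise ValueError("this sketch expects at least one core node")
    groups: List[InstanceGroup] = [
        {
            # Single master: coordinates work distribution and cluster state.
            "Name": "Master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            # Core nodes run DataNode + TaskTracker for the life of the
            # cluster, so they stay On-Demand to protect HDFS blocks.
            "Name": "Core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        groups.append({
            # Optional task nodes add compute only, so Spot capacity fits here.
            "Name": "Task (Spot)",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            "BidPrice": task_bid,
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        })
    return groups


if __name__ == "__main__":
    groups = emr_instance_groups(core_count=4, task_count=8, task_bid="0.10")
    for group in groups:
        print(group["InstanceRole"], group["Market"], group["InstanceCount"])
    # Assumption: with boto3 and AWS credentials, `groups` could be passed as
    # Instances={"InstanceGroups": groups, ...} to boto3.client("emr").run_job_flow,
    # alongside the other arguments that call requires.
```

Keeping Spot capacity in the task group follows from the note: task nodes do not run DataNode, so losing them when the Spot price moves costs compute capacity but not HDFS data.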

Presentation Transcript

  • 1 © 2014 Cloudera, Syncsort, Tableau Inc. All rights reserved.
  • 2 Agenda: Data Warehouse Vision & Reality; What is legacy data & why an Enterprise Data Hub; Offloading legacy data and workloads to Hadoop; Transform all types of data into self-service analytics; Live Demonstration; Customer case study; Q&A
  • 3 What is this?
  • 4 The Data Warehouse Vision, 1998: Data integration & ETL tools would enable a single, consistent version of the truth. [Diagram: real-time, mainframe, Oracle, ERP, file, and XML sources feed ETL into a data warehouse and data marts.]
  • 5 Data Warehouse Reality, 2014: Data integration & ETL tools would enable a single, consistent version of the truth. [Diagram: the same sources and ETL flows into data marts, now annotated with dormant data, staging / ELT, new reports, SLAs, new columns, and complete history.]
  • 6 The Data Warehouse Vision vs. Reality. Vision: fresher data, longer data history, faster analytics, more data sources, lower costs. Reality: longer ELT batch windows, shorter data retention, slower queries, weeks/months just to add new data fields, growing costs.
  • 7 Mainframes | A Critical Source of Big Data: top 25 world banks; 9 of the world’s top insurers; 23 of the top 25 US retailers; 71% of the Fortune 500; 30 billion business transactions per day.
  • 8 Suits & Hoodies: Working Together. Integration gaps: connectivity; data conversion (EBCDIC vs. ASCII). Expertise gaps: COBOL appeared in 1959, Hadoop in 2005; mainframe & Hadoop skills shortage. Security gaps: hosts mission-critical sensitive data; very difficult to install new software on the mainframe. Cost gaps: mainframe data is (expensive) Big Data; even FTP costs CPU cycles (MIPS). Suits & Hoodies idea: Merv Adrian, Gartner Research.
  • 9 Expanding Data Requires a New Approach: in the 1980s, bring data to compute; now, bring compute to data. Process-centric businesses use structured data mainly, internal data only, and “important” data only. Information-centric businesses use all data: multi-structured, internal & external data of all types. [Diagram: relative size & complexity of data vs. compute, then and now.]
  • 10 From Apache Hadoop to an enterprise data hub: open source, scalable, flexible, cost-effective ✔; managed ✖; open architecture ✖; secure and governed ✖. Stack: HDFS (storage for any type of data; unified, elastic, resilient, secure filesystem) and MapReduce (batch processing). Core Apache Hadoop is great, but… 1) Hard to use and manage. 2) Only supports batch processing. 3) Not comprehensively secure.
  • 11 From Apache Hadoop to an enterprise data hub: open source, scalable, flexible, cost-effective ✔; managed ✔; open architecture ✖; secure and governed ✖. Stack: HDFS (filesystem), MapReduce (batch processing), and Cloudera Manager (system management).
  • 12 From Apache Hadoop to an enterprise data hub: open source, scalable, flexible, cost-effective ✔; managed ✔; open architecture ✔; secure and governed ✖. Stack: HDFS (filesystem), HBase (online NoSQL), MapReduce (batch processing), Impala (analytic SQL), Solr (search engine), Spark (machine learning), Spark Streaming (stream processing), 3rd-party apps, YARN (workload management), and Cloudera Manager (system management).
  • 13 From Apache Hadoop to an enterprise data hub: open source, scalable, flexible, cost-effective ✔; managed ✔; open architecture ✔; secure and governed ✔. Stack: HDFS, HBase, MapReduce, Impala, Solr, Spark, Spark Streaming, 3rd-party apps, YARN, Cloudera Manager (system management), Cloudera Navigator (data management), and Sentry.
  • 14 From Apache Hadoop to an enterprise data hub: Cloudera’s Enterprise Data Hub is open source, scalable, flexible, cost-effective, managed, open architecture, and secure and governed (all ✔), combining HDFS, HBase, MapReduce, Impala, Solr, Spark, Spark Streaming, 3rd-party apps, YARN, Cloudera Manager, Cloudera Navigator, and Sentry.
  • 15 Cloudera: Your Trusted Advisor for Big Data. Advance from strategy to ROI with best practices and peak performance: partners; proactive & predictive support; professional services; training.
  • 16
  • 17 The Impact of ELT & Dormant Data on the EDW: ELT drives up to 80% of database capacity; dormant (rarely used) data wastes premium storage; ETL/ELT processes on dormant data waste premium CPU cycles. [Diagram: hot, warm, and cold data, with data transformations (ELT) of unused data.]
  • 18
  • 19 Where to Start? How do you identify dormant data? What workloads will deliver the biggest impact? How will you access & move all your data? Can you secure the new environment? How do you optimize it? How do you manage it? How do you make it business-class? What tools do you need? How will you leverage all your data, including mainframes?
  • 20 Offload Legacy Data & Workloads to the Enterprise Data Hub. One framework: blazing performance, iron-clad security, disruptive economics. Phase I, Identify: identify data & workloads most suitable for offload; focus on those that will deliver maximum savings & performance. Phase II, Offload: access and move virtually any data (e.g., mainframe) to the Enterprise Data Hub with one tool; easily replicate existing staging workloads in Hadoop using a graphical user interface; deploy on premises and in the cloud. Phase III, Optimize & Secure: optimize the new environment; manage & secure all your data with business-class tools; deliver self-service reporting.
  • 21
  • 22 The Problem: Volume of Data. Businesses are struggling to unlock exploding data.
  • 24 The Problem: Old-School Software. Traditional technologies are complicated, inflexible, and slow-moving.
  • 25 The Tableau Revolution: fast and easy analytics for everyone.
  • 26 Flexible: transform all types of data into self-service analytics.
  • 27 For Everyone: ease of use leads to adoption across all departments and use cases.
  • 28 Live Demo
  • 29 Case Study: Optimize EDW, Leading Financial Org. Mainframe offload (74-page COBOL copybook). [Chart: elapsed time in minutes, HiveQL 217 min vs. Syncsort DMX-h 9 min.] Development effort: Syncsort DMX-h 4 hrs; manual coding: weeks! Benefits: cut development time from weeks to hours; reduced complexity from 47 HiveQL scripts to 4 DMX-h graphical jobs; easily validate COBOL copybooks and find errors; mainframe data available to the business for analytics; staging & ELT moved out of the RDBMS, so queries run faster.
  • 30 Final Thoughts: Rusty Sears, Vice President of Enterprise Data Services and Big Data at Regions Financial Corporation
  • 31 Questions?
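Slides 8 and 29 name two concrete conversion chores in mainframe offload: EBCDIC text has to be converted for ASCII/Unicode systems, and numeric fields defined in a COBOL copybook are frequently stored as packed decimal (COMP-3). A minimal Python sketch of both, assuming a hypothetical two-field copybook layout and EBCDIC code page 037 (Python's `cp037` codec); the helper names are hypothetical, and this illustrates the data format only, not how Syncsort DMX-h or Hadoop performs the conversion:

```python
from decimal import Decimal

# Hypothetical copybook layout, for illustration only:
#   05 CUST-NAME  PIC X(10).            -> 10 bytes of EBCDIC text
#   05 BALANCE    PIC S9(5)V99 COMP-3.  -> 4 bytes of packed decimal
NAME_LEN = 10
BALANCE_LEN = 4
BALANCE_SCALE = 2  # V99: two implied decimal places


def decode_ebcdic(raw: bytes) -> str:
    """Decode EBCDIC (code page 037) text and trim the trailing space padding."""
    return raw.decode("cp037").rstrip()


def decode_comp3(raw: bytes, scale: int) -> Decimal:
    """Unpack a COMP-3 field: two BCD digits per byte, sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(raw[-1] >> 4)
    sign = raw[-1] & 0x0F
    # 0xC = positive, 0xD = negative, 0xF = unsigned; digits must be 0-9.
    if any(d > 9 for d in digits) or sign not in (0xC, 0xD, 0xF):
        raise ValueError(f"invalid packed decimal: {raw.hex()}")
    value = int("".join(map(str, digits)))
    if sign == 0xD:
        value = -value
    # Apply the implied decimal point from the PIC clause.
    return Decimal(value).scaleb(-scale)


def decode_record(record: bytes) -> dict:
    """Split one fixed-width record using the hypothetical layout above."""
    name_end = NAME_LEN
    balance_end = name_end + BALANCE_LEN
    return {
        "name": decode_ebcdic(record[:name_end]),
        "balance": decode_comp3(record[name_end:balance_end], BALANCE_SCALE),
    }


if __name__ == "__main__":
    # "SMITH" padded to 10 characters, then -12345.67 packed as 12 34 56 7D.
    sample = "SMITH     ".encode("cp037") + bytes([0x12, 0x34, 0x56, 0x7D])
    print(decode_record(sample))
    # prints {'name': 'SMITH', 'balance': Decimal('-12345.67')}
```

The layout constants stand in for what a real copybook would supply; a different PIC clause changes the field lengths and the implied scale, which is why the copybook has to be read correctly before any byte of the record can be decoded.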