Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
 

Rethink data management and learn how to break down barriers to Big Data insight with Cloudera's enterprise data hub (EDH), Syncsort offload solutions, and Tableau Software visualization and analytics.

  • STEVE: So what is this? If you are thinking it's a 3.5-inch floppy disk that stored 1.44 MB of your data, you were born before 1998. In 1998 the iMac was launched; it was the first home computer not to have one of these as standard, just a CD drive. To anyone born after then, this is the save button in most applications, and you have no idea why it's the save button and certainly would not call it a floppy. So over Christmas my mum was sitting with my 4-year-old nephew using his iPad, and there was clearly some sort of confusion: I saw the two of them sitting there trying to figure out which slot on the iPad mum could insert a floppy disk with her Christmas pudding recipe into. I can tell you that getting data from a floppy disk onto an iPad is not fun at all, and my mum is not sure this whole computer thing is really working out for her son or grandchild, because we were largely useless. What's funny is that today's equivalent of a floppy disk is, I guess, a memory stick. It can store a lot more data, but if I personally want to get a large file from one machine to another, like a Mac, I use Dropbox or Box, and it happens instantly and is constantly kept in sync. Technology evolution has completely changed our approach to solving a problem, and that's an important theme.
  • Steve + Paul: Back when I started my career in data warehousing in the '90s, this is what the business was promised. An enterprise data warehouse would bring together data from every different source system across an organization to create a single trusted source of information. Data would be extracted, transformed, and loaded into the warehouse using ETL tools. These would be used instead of hand-coding SQL, COBOL, or other scripts because they would provide a graphical user interface that allowed anyone, even a graduate who had just joined your team, to develop flows with no rocket scientists required; scalability to handle the growing data volumes; metadata to enable re-use, sharing, and governance; and transparent connectivity to the different sources and targets, including the mainframe. ETL would then be used to move data from the EDW to marts and deliver it to reporting tools.
  • Steve + Paul: This is the reality of most data warehouses today. A spaghetti-like architecture has evolved because the market-leading ETL tools couldn't cope with the data volumes on core operations like sort, join, merge, and aggregation, so that workload was pushed into the only place that could handle it: the databases, with their optimizers. But that meant ELT, with hand-coded or generated SQL that became impossible to maintain. A customer told me they called this the onion effect, because their staging had become layers of SQL that nobody wanted to touch, so they just added another layer on top. But if you ever really had to take the onion apart, it would make everyone cry. TDWI estimates it takes upwards of 8 weeks to add a column to a table, and in my experience that's low; most times you have to wait a couple of months before they even get to your request and start making the change, because of the backlog. Today the average cost of an integration project runs between $250K and $1M, according to Gartner.
  • So there's a massive disconnect between the original vision of the warehouse and the reality. It's important to note that business users are getting great information from warehouses, but they still want fresher data, longer history, faster analytics, and more sources, all at a lower cost. Meanwhile they are seeing longer batch windows: many companies have people sitting around drinking coffee in the morning until the warehouse is available, and they see only a small subset of a customer's lifecycle.
  • So the first thing we all need to recognize is that mainframes today play a very important role in many organizations. The top telcos, retailers, insurers, healthcare, and financial organizations of the world still rely on mainframes for their most critical applications. When talking to these organizations, it's not unusual to hear that up to 80% of their corporate data originates on the mainframe. Now, that is some serious Big Data, and organizations cannot afford to neglect it. But can you afford to analyze it? Well, mainframes today cost an average of $16M a year for the typical $10B organization! That's why many of these organizations are now looking at Hadoop and making mainframes a core piece of their Big Data strategy. Just imagine for a second the kind of insights you could get by combining detailed transactional data from mainframes with clickstream data, web logs, and sentiment analysis…
  • Today we're in the middle of a shift in how businesses use information. In the past, you'd define a set of business processes, build applications around each of them, and then go about gathering, conforming, and merging the necessary data sets to support those applications. From an infrastructure perspective, you'd be bringing the data over to the compute, often in relational databases. But you'd be leaving quite a lot on the table. The modern realities of business demand a new approach. Today companies need, more than ever, to become information-driven, but given the amount and diversity of information available, and the rate of change in business, it's simply unsustainable to keep moving around and transforming huge volumes of data.
  • The foundational platform that's addressing this wide range of problems today is Apache Hadoop, an open source platform for scalable, fault-tolerant data storage and processing that runs on a cluster of industry-standard servers. But Hadoop, in the beginning, wasn't capable of solving these problems. Originally, Hadoop was just a scalable distributed system for storing and processing large amounts of data. You could bring workloads to an effectively limitless amount and variety of data, provided the only kind of work you wanted to do was batch processing by writing Java code, and provided you liked hiring highly-skilled computer scientists to operate it.
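  • The batch model described above can be sketched in miniature. The following is a toy, single-process illustration of the map / shuffle / reduce pattern that Hadoop distributes across a cluster; it is not Hadoop's actual Java API, just the shape of the computation:

```python
# Toy illustration of the MapReduce batch model: map each record to
# (key, value) pairs, shuffle (group) by key, then reduce each group.
# Hadoop runs this same pattern in parallel across a cluster of machines.
from collections import defaultdict

def map_phase(record):
    """Emit (word, 1) for every word in a record."""
    for word in record.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for a key into a single result."""
    return key, sum(values)

records = ["Big Data", "big data insight", "data hub"]
pairs = [p for r in records for p in map_phase(r)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 3, 'insight': 1, 'hub': 1}
```

  In early Hadoop, each of these phases had to be written as Java classes and submitted as a batch job, which is exactly the skills barrier the notes describe.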
  • Cloudera solved the latter problem with Cloudera Manager, the leading system management application for Apache Hadoop. Customers love Cloudera Manager because it makes the complex simple. Hadoop is more than a dozen services running across many machines, with limitless configuration permutations. With Cloudera Manager, customers can centrally manage and monitor their clusters from a single tool, which provides automated installation and configuration of the cluster. Cloudera Manager is really our many years of Hadoop experience realized in software, and it helps you get up and running quickly.
  • Our customers liked the scalability, flexibility, and economic properties of the platform but, for example, didn't like that they had to move data out to other MPP analytic databases just to run fast SQL queries, so we built Impala, the world's first open source MPP analytic SQL query engine expressly designed for Hadoop. With Impala, you now have a viable open source alternative to proprietary MPP analytic databases, one that also delivers the core scalability, flexibility, and economic benefits of Hadoop. Over the past year we've continued to add to the platform, with Search, and with Spark for interactive, iterative analytics and stream processing. You also get HBase, the online key-value store, to enable real-time applications on the platform. With this range of diverse ways to access your data in Hadoop, far beyond just Java and MapReduce, you can now bring your existing tools and skill sets to the platform. What's even more exciting is that we've recently made it possible for our partners and other 3rd parties to deploy, manage, and monitor their apps in the platform, again leveraging your existing investments while letting you access an even greater breadth and depth of data, all in one place.
  • Of course, none of this would matter if the platform weren't reliable, secure, and manageable. * Hadoop today is highly available and Cloudera provides extensions for automated backup and disaster recovery. * Hadoop has had perimeter security for some time but there was a significant gap in the area of fine-grained role-based access controls, the kind you'd expect from a DBMS. That's why, together with the community, we built and contributed the Apache Sentry project which delivers this security for Hive and Impala today, and why we developed Cloudera Navigator to support metadata management, including things like rights auditing, data lineage, and data discovery native to Hadoop. * And all this in addition to the industry-leading system management and customer support you expect from Cloudera.
  • So you can see a lot has happened in just a few short years. Ultimately what you have here is an enterprise data hub, which has four necessary attributes: * It's secure and compliant. In addition to perimeter security and encryption, an EDH offers fine-grained (row- and column-level) role-based access controls over data, just like your data warehouse. * It's governed. You need to understand what data is in your EDH and how it's used, so an EDH must offer data discovery, data auditing, and data lineage. * It's unified and manageable. You need to be able to trust that your data is safe, so an EDH must provide not only native high availability, fault tolerance, and self-healing storage, but also automated replication and disaster recovery. It must also provide advanced system management to enable distributed multi-tenant performance. * And it's open. As an EDH makes it possible to cost-effectively retain data for decades, you need to ensure that the foundational infrastructure is based on open source software and an open platform for 3rd parties. Open source ensures that you are not locked in to any particular vendor's license agreement; nobody can hold your data or applications hostage. An open platform ensures that you're not locked into a particular vendor's stack and that you have a choice of what tools to use with the EDH; for example, over 200 ISV products, in particular Syncsort and Tableau, work with Cloudera today. With an enterprise data hub, our customers are able to store, and drive real business impact from, more data than they'd ever thought possible.
  • And beyond just the technology, Cloudera provides everything you need to be successful with Hadoop in the enterprise, including training, professional services, the backing of the industry's only predictive and proactive global support team, and partnership with the experts who actually build Hadoop. So where do you begin? An enterprise data hub offers the utmost flexibility to start small while thinking big. Many organizations start by using an EDH for storage or active archiving, or to accelerate ETL by offloading that processing from their data warehouse or mainframe environment. Others use an EDH to enable rapid exploration of new and interesting data sets that don't fit well into relational systems. The best part of an EDH is that regardless of where you start, the flexibility of the platform allows you to evolve it over time and move from one use case to another, so in the end you have transformed your data management infrastructure to enable your enterprise to become information-driven. You can get started for free today by visiting cloudera.com.
  • So this is the "Before" BI architecture: data sources feeding into a staging layer that has ETL and ELT, but that ELT is using up valuable database resources while delivering data out to BI tools. Business users experience the long wait, with an average of 8 weeks to add a single column.
  • ELT consumes capacity: slow response times, with up to 80% of capacity used for ELT, leaving fewer resources and less storage available for end-user reports. Only the freshest data is stored "on-line": historical data is archived (as low as 3 months), granularity is lost, and data ends up hot / warm / cold / dead. Lack of agility: 6 months on average to add a new data source or column and generate a new report, with the best resources spent on SQL tuning, not new SQL creation. Constant upgrades: data volume growth absorbs all resources just to keep existing analysis running and perform upgrades, leaving exploration of data as a wish-list item.
  • Data warehousing as a practice has no linkage to a particular technology.
  • Tableau's mission is to help people see and understand their data. We have had this mission for over 10 years, and we remain completely committed to helping business users discover new insights.
  • The volume of data is a challenge that faces all customers today. Too much data, too many people needing it. We can see from this chart produced by IDC that the growth of data is going to continue to skyrocket in the coming years.
  • Next is the issue of the diversity of data. It’s tough when there are so many sources.
  • Finally, even if you have your big data under control and know how it belongs together, you’re dealing with old school software – hard to use, heavy, complex.
  • That's what sparked "The Tableau Revolution", a new type of business intelligence platform, one built from the ground up by people focused on making data easier to make sense of. We started by making it intuitive. We wanted you to be able to mash up any type of data: slice it, filter it, scan it, select it, parse it. We wanted it fast. And more than anything we wanted you to leverage the data from its source. This meant you'd no longer need silos, an army of engineers, high priests, lots of time, software customizations, or stale reports.
  • We made it flexible. First, we give you the option to connect to any kind of data, whether that is in spreadsheets and files, databases and cubes, or a data warehouse. We also give you the option to connect to your data live or to pull it into memory. If you have data that updates a lot, you'll always want the freshest data: use a live connection. Or maybe your company has invested a ton of dollars in a fast, state-of-the-art database; you'll want to leverage that. You can choose either, or even toggle between the two, switching between live connections and extracts as you go. Tableau is flexible and allows you to work with any data in the way that makes sense for your environment.
  • We made it for everyone. We made it easy, so that anyone would want to adopt it.

Presentation Transcript

  • 1 ©2014 Cloudera, Syncsort, Tableau Inc. All rights reserved.
  • 2 Agenda: Data Warehouse Vision & Reality • What is legacy data & why an Enterprise Data Hub • Offloading legacy data and workloads to Hadoop • Transform all types of data into self-service analytics • Live Demonstration • Customer case study • Q&A
  • 3 What is this?
  • 4 The Data Warehouse Vision, 1998: data integration & ETL tools would enable a single, consistent version of the truth. (Real-Time, Mainframe, Oracle, ERP, File, XML sources feeding, via ETL, a Data Warehouse and Data Marts.)
  • 5 Data Warehouse Reality, 2014: the same sources and ETL, plus dormant data, staging/ELT layers, and pressure from new reports, SLAs, new columns, and complete history.
  • 6 The Data Warehouse Vision vs. Reality. Vision: fresher data, longer history, faster analytics, more data sources, lower costs. Reality: longer ELT batch windows, shorter data retention, slower queries, weeks/months just to add new data fields, growing costs.
  • 7 Mainframes | A Critical Source of Big Data: the top 25 world banks, 9 of the world's top insurers, 23 of the top 25 US retailers, 71% of the Fortune 500, 30 billion business transactions per day.
  • 8 Suits & Hoodies, Working Together. Expertise gaps: COBOL appeared in 1959, Hadoop in 2005; mainframe & Hadoop skills shortage. Security gaps: hosts mission-critical sensitive data; very difficult to install new software on the mainframe. Cost gaps: mainframe data is (expensive) Big Data; even FTP costs CPU cycles (MIPS). Integration gaps: connectivity; data conversion (EBCDIC vs ASCII). (Suits & Hoodies idea: Merv Adrian, Gartner Research.)
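  • The EBCDIC-vs-ASCII conversion gap the slide mentions is easy to see in miniature. A minimal sketch in Python, assuming the cp037 code page (one common US/Canada EBCDIC variant; real mainframe feeds may use other code pages and packed-decimal fields that need far more handling, which is part of why dedicated tooling exists):

```python
# Minimal sketch: decoding mainframe EBCDIC bytes into text with Python's
# built-in codecs. cp037 is one common EBCDIC code page; the right code page
# depends on the mainframe's locale, so treat this as illustrative only.

EBCDIC_CODEC = "cp037"  # assumption: US/Canada EBCDIC

def ebcdic_to_text(raw: bytes) -> str:
    """Decode a run of EBCDIC bytes into a Python string."""
    return raw.decode(EBCDIC_CODEC)

# "HELLO" in cp037 EBCDIC is b'\xc8\xc5\xd3\xd3\xd6' -- none of these bytes
# mean anything in ASCII, which is why naive transfers produce garbage.
sample = "HELLO".encode(EBCDIC_CODEC)
print(sample)                  # b'\xc8\xc5\xd3\xd3\xd6'
print(ebcdic_to_text(sample))  # HELLO
```

  Character conversion is only one of the integration gaps; record layouts, packed fields, and transfer costs (MIPS) are the others the slide lists.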
  • 9 Expanding Data Requires a New Approach. 1980s: bring data to compute; process-centric businesses use mainly structured data, internal data only, "important" data only. Now: bring compute to data; information-centric businesses use all data: multi-structured, internal & external data of all types.
  • 10 From Apache Hadoop to an enterprise data hub. Core Apache Hadoop (MapReduce batch processing on HDFS, a unified, elastic, resilient, secure filesystem storing any type of data) is open source, scalable, flexible, and cost-effective, but: 1) hard to use and manage, 2) only supports batch processing, 3) not comprehensively secure.
  • 11 From Apache Hadoop to an enterprise data hub: adding Cloudera Manager for system management makes the platform managed.
  • 12 From Apache Hadoop to an enterprise data hub: adding analytic SQL (Impala), a search engine (Solr), machine learning and stream processing (Spark and Spark Streaming), online NoSQL (HBase), 3rd-party apps, and workload management (YARN) makes it an open architecture.
  • 13 From Apache Hadoop to an enterprise data hub: adding Sentry and Cloudera Navigator for data management makes it secure and governed.
  • 14 From Apache Hadoop to an enterprise data hub: together these form Cloudera's enterprise data hub.
  • 15 Cloudera: Your Trusted Advisor for Big Data. Partners, proactive & predictive support, professional services, training. Advance from strategy to ROI with best practices and peak performance.
  • 16
  • 17 The Impact of ELT & Dormant Data on the EDW: ELT drives up to 80% of database capacity; dormant (rarely used) data wastes premium storage; ETL/ELT processes on dormant data waste premium CPU cycles. (Hot / warm / cold data; transformations (ELT) of unused data.)
  • 18
  • 19 Where to Start? How do you identify dormant data? What workloads will deliver the biggest impact? How will you access & move all your data? Can you secure the new environment? How do you optimize it? How do you manage it? How do you make it business-class? What tools do you need? How will you leverage all your data, including mainframes?
  • 20 Offload Legacy Data & Workloads to the Enterprise Data Hub. One framework: blazing performance, iron-clad security, disruptive economics. Phase I, Identify: identify the data & workloads most suitable for offload, focusing on those that will deliver maximum savings & performance. Phase II, Offload: access and move virtually any data (e.g. mainframe) to the enterprise data hub with one tool; easily replicate existing staging workloads in Hadoop using a graphical user interface; deploy on premises or in the cloud. Phase III, Optimize & Secure: optimize the new environment; manage & secure all your data with business-class tools; deliver self-service reporting.
  • 21
  • 22 The Problem: Volume of Data. Businesses are struggling to unlock exploding data.
  • 23 The Problem: Diverse Data. Businesses and their people are struggling to unlock diverse data.
  • 24 The Problem: Old School Software. Traditional technologies are complicated, inflexible, and slow moving.
  • 25 The Tableau Revolution: fast and easy analytics for everyone.
  • 26 Flexible: transform all types of data into self-service analytics.
  • 27 For Everyone: ease of use leads to adoption across all departments and use cases.
  • 28 LIVE DEMO
  • 29 Case Study: Optimize EDW, Leading Financial Org. Mainframe offload (74-page COBOL copybook): elapsed time 217 min with HiveQL vs. 9 min with Syncsort DMX-h; development effort 4 hrs with DMX-h vs. weeks of manual coding. Benefits: cut development time from weeks to hours; reduced complexity (47 HiveQL scripts down to 4 DMX-h graphical jobs); easily validate COBOL copybooks and find errors; mainframe data available to the business for analytics; staging & ELT moved out of the RDBMS, so queries run faster.
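  • To give a feel for what copybook-driven parsing involves, here is a minimal sketch of slicing a fixed-width mainframe record into named fields. The three-field layout below is hypothetical, invented purely for illustration; real copybooks add packed decimals (COMP-3), REDEFINES, and OCCURS clauses across dozens of pages, which is why hand-coding this is measured in weeks:

```python
# Minimal sketch of copybook-style parsing: a fixed-width record is sliced
# into named fields by offset and length. The layout is a hypothetical
# stand-in for what a parsed COBOL copybook would describe.

# (field name, offset, length) -- illustrative only.
LAYOUT = [
    ("account_id", 0, 8),
    ("branch", 8, 4),
    ("balance_cents", 12, 10),
]

def parse_record(line: str) -> dict:
    """Slice one fixed-width record into a dict of named fields."""
    rec = {name: line[off:off + ln].strip() for name, off, ln in LAYOUT}
    rec["balance_cents"] = int(rec["balance_cents"])  # zoned numeric field
    return rec

row = parse_record("00012345NYC10000001999")
print(row)  # {'account_id': '00012345', 'branch': 'NYC1', 'balance_cents': 1999}
```

  A real offload tool generates this kind of parsing from the copybook itself, along with code-page conversion and validation, rather than having developers maintain it by hand.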
  • 30 Final Thoughts: Rusty Sears, Vice President of Enterprise Data Services and Big Data at Regions Financial Corporation.
  • 31 QUESTIONS?