Use Big Data Technologies to Modernize Your Enterprise Data Warehouse


This EMC perspective provides an overview of the EMC Data Warehouse Modernization offering. It describes four tactics that can be implemented quickly, using an organization's existing skill sets, and rapidly show a return on investment.

Published in: Technology, Business

  1. USE BIG DATA TECHNOLOGIES TO MODERNIZE YOUR ENTERPRISE DATA WAREHOUSE

     EMC PERSPECTIVE

     Most organizations’ enterprise data warehouses were built with online transaction processing (OLTP)-centric technologies and architectures that are 15-20 years old. Over the years, more data has been bolted on to these systems, and the query load being driven by both traditional and mobile business intelligence products has increased exponentially, resulting in brittle, over-burdened, costly data warehouses that can take hours to return results. They don’t meet the growing data appetite of the business, and don’t answer the questions needed to run the business at the required levels of granularity, or at the necessary speed. Yet too much has been invested in them to simply throw them out.

     Big Data market dynamics have resulted in the creation of new technologies, products, and approaches that can be used to modernize these stodgy, inflexible data warehouses, and make them more responsive to the business, without throwing out what is already in place. This paper describes four tactics that can be implemented quickly, using an organization’s existing skill sets, and that can rapidly show a return on investment.
  2. TACTIC #1: ACCELERATE YOUR DATA WAREHOUSE WITH MPP-BASED ARCHITECTURES

     Massively Parallel Processing (MPP)-based databases provide a cost-effective, scale-out data warehouse environment that allows organizations to leverage Moore’s Law¹ on performance-to-cost ratio improvements in x86 processors. MPP databases provide a non-intrusive analytical platform/data warehouse for data discovery and exploratory work on massive amounts of data. Built on inexpensive commodity clusters, MPP databases can extend, complement, or replace parts of your existing data warehouse, managing massive volumes of detailed data, while providing agile query, reporting, dashboards, and analytics (see Figure 1).

     MPP databases, while offering many of the same benefits as your existing data warehouse, also provide the following advantages:
     • Extreme scalability on general purpose systems
     • Automatic parallelization
     • Ability to load and query like any other database
     • Scanning and processing of all nodes in parallel
     • Extreme scalability and optimized I/O
     • Linear scalability to easily add nodes and storage
     • Improved query and loading performance

     BENEFITS
     Leverage more detailed, more robust dimensional data
     • Seasonality to forecast retail sales and energy consumption
     • Localization to pinpoint lending or fraud exposure
     • Hyper-dimensionality for digital media attribution or health care treatment analysis

     Figure 1: MPP Data Warehouse Architectures Scale Easily to Speed Results and Process More Data

     ¹ Moore’s law is the observation that over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years. The result is the doubling of computing power at the same cost every 18 to 24 months.
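The "automatic parallelization" and "scanning and processing of all nodes in parallel" advantages can be illustrated with a toy scatter-gather aggregation. This is a minimal sketch under stated assumptions, not how any particular MPP product is implemented: the segment count, the function names (`distribute`, `segment_scan`, `parallel_sum_by_region`), and the sample sales rows are all hypothetical, and threads stand in for separate nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of nodes in the cluster


def distribute(rows, num_segments):
    """Spread rows across segments by order id, as a load step might."""
    segments = [[] for _ in range(num_segments)]
    for order_id, region, amount in rows:
        segments[order_id % num_segments].append((order_id, region, amount))
    return segments


def segment_scan(segment_rows):
    """Each segment scans only its local rows and returns a partial aggregate."""
    partial = defaultdict(float)
    for _order_id, region, amount in segment_rows:
        partial[region] += amount
    return dict(partial)


def parallel_sum_by_region(rows, num_segments=NUM_SEGMENTS):
    """Scatter the scan to all segments at once, then merge the partial results."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(segment_scan, segments))
    totals = defaultdict(float)
    for partial in partials:
        for region, subtotal in partial.items():
            totals[region] += subtotal
    return dict(totals)


# Hypothetical detail rows: (order_id, region, amount)
sales = [
    (1, "east", 10.0),
    (2, "west", 5.0),
    (3, "east", 2.5),
    (4, "north", 7.0),
    (5, "west", 1.0),
]
totals = parallel_sum_by_region(sales)
print(totals)
```

Because each segment works only on its own slice, adding segments shrinks the slice each one has to scan, which is the intuition behind the linear scalability claim in the list above.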
  3. An MPP data warehouse will enable more granular data for query, reporting, and dashboard drill-down and drill-across exploration. Analysis can be performed on detailed data instead of data aggregates. On the analytics side, once a model has been developed and business insights have been gleaned from these data sets, simply migrate the model and/or the insights into the existing data warehouse for integration into the current business intelligence environment. Alternatively, analytic modeling can also be done on the MPP platform, making it part of the production process.

     TACTIC #2: STOP MOVING DATA TO THE ANALYTICS; BRING THE ANALYTICS TO THE DATA

     One of the most dramatic developments in Big Data is the advent of in-database analytics. In-database analytics addresses one of the biggest shortcomings in performing advanced analytics: the requirement to move large amounts of data around. That has caused many organizations and data scientists to settle for working with aggregate tables because the data transfer issue is so debilitating to the analytic exploration and discovery process. In-database analytics reverses the process by moving the analytic algorithms to where the data is stored, accelerating the development and deployment of modeling.

     Elimination of data movement results in substantial benefits:
     • Moving a few terabytes can take hours. With in-database analytics, it drops to zero.
     • Because the movement of data is the most time-consuming activity in logical processing time, reducing data movement reduces the processing time by 1/N, where N is the number of processing units. Processing time for 1 TB can be reduced by a factor of 16 with only a five-processor system, going from 193 minutes to 12 minutes (see Figure 2).

     Figure 2: In-database Analytics Dramatically Speeds Processing Time

     BENEFITS
     Leverage low-latency (high-velocity) data access
     • Drive realtime customer acquisition, predictive maintenance, or network optimization decisions
     • Update analytic models on-demand based upon current market or local weather conditions
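The difference between moving data to the analytics and moving the analytics to the data can be sketched with Python's built-in `sqlite3` module standing in for the warehouse. This is only an illustration of the idea, not a benchmark and not EMC's implementation: the `readings` table, its synthetic values, and both function names are hypothetical. Each function fits a straight line and reports how many rows had to leave the database to do so.

```python
import sqlite3

# In-memory database standing in for the warehouse (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(float(i), 2.0 * i + 1.0) for i in range(1000)],
)


def least_squares_slope(n, sx, sy, sxy, sxx):
    """Slope of the least-squares line from its sufficient statistics."""
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)


def slope_by_moving_data(conn):
    """Pull every row to the client, then compute the statistics there."""
    rows = conn.execute("SELECT x, y FROM readings").fetchall()
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return least_squares_slope(n, sx, sy, sxy, sxx), len(rows)


def slope_in_database(conn):
    """Let the database compute the statistics; only one summary row leaves it."""
    n, sx, sy, sxy, sxx = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * y), SUM(x * x) FROM readings"
    ).fetchone()
    return least_squares_slope(n, sx, sy, sxy, sxx), 1


moved_slope, moved_rows = slope_by_moving_data(conn)
indb_slope, indb_rows = slope_in_database(conn)
print(moved_slope, moved_rows)
print(indb_slope, indb_rows)
```

Both approaches produce the same model, but the second transfers a single summary row instead of the full table. As a side check on the figures quoted above, 193 / 12 is roughly 16.1, which matches the stated factor of 16.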
  4. TACTIC #3: USE ALL OF YOUR DATA WITH A NEXT GENERATION OPERATIONAL DATA STORE

     The Hadoop Distributed File System (HDFS) provides a powerful yet inexpensive option for modernizing Operational Data Store (ODS) and Data Staging areas. HDFS is a cost-effective, large storage system with an intrinsic computing and analytical capability (MapReduce). Built on commodity clusters, HDFS simplifies the acquisition and storage of diverse data sources, whether structured, semi-structured (e.g., web logs and sensor feeds), or unstructured (e.g., social media, image, video, and audio).

     Once in the Hadoop file system, MapReduce and commercial Hadoop-based tools are available to prepare the data for loading into an existing data warehouse. The ability to “define schema on query” versus “define schema on load” simplifies amassing data from a variety of sources, even if you are not sure when and how you might use that data later (see Figure 3).

     The result is a single platform for feeding both your data warehouse and analytics environment. This inexpensive, scale-out solution can be used to store ALL of your data.

     Figure 3: Use Hadoop as an Operational Data Store and Analyze ALL of the Data

     BENEFITS
     Manage a wide variety of structured and unstructured data sources
     • Integrate unstructured claims descriptions to reduce fraudulent claims
     • Leverage mobile data to create realtime promotions
     • Leverage sensor readings to optimize yield and pricing

     TACTIC #4: LEVERAGE UNSTRUCTURED DATA TO ADD NEW METRICS TO AN ENTERPRISE DATA WAREHOUSE

     An easy way to start building experience with Hadoop and MapReduce is to use these technologies to create new metrics from an unstructured data source that can be fed into the enterprise data warehouse. This will provide the ability to leverage data such as social, mobile, consumer comments, email, doctors’ notes, or claims descriptions to identify new metrics that are better predictors of performance.

     Most organizations’ existing data warehouses are treasure troves of key performance indicators and metrics used to monitor business performance. Use Hadoop and MapReduce to parse through unstructured data to identify new business performance metrics that can be integrated into the existing data warehouse (see Figure 4).

     BENEFITS
     Leverage new metrics, dimensions, and dimensional attributes gleaned from unstructured data sources
     • Leverage customers’ interests, passions, associations, and affiliations to improve micro-segmentation
     • Add sensor-generated performance data into your manufacturing, supply chain, or product predictive maintenance models
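The "define schema on query" idea from Tactic #3 and the metric extraction described in Tactic #4 can be sketched together in plain Python. This is a minimal in-process imitation of the map, shuffle, and reduce phases, not Hadoop code: the raw record layout, the `FRAUD_TERMS` keyword list, and the resulting per-customer metric are all hypothetical, chosen only to mirror the claims-description example in the text.

```python
from collections import defaultdict

# Raw claim notes stored exactly as received; no schema is enforced at load time.
# The "customer_id|free text" layout is a hypothetical example.
RAW_LINES = [
    "C001|Rear bumper damaged in parking lot, no witnesses",
    "C002|Water damage after storm; receipts attached",
    "C001|Second claim this month, cash settlement requested, no witnesses",
    "malformed line without a delimiter",
]

# Hypothetical phrases an analyst might treat as fraud indicators.
FRAUD_TERMS = ("no witnesses", "cash settlement")


def map_claim(line):
    """Map phase: apply the schema at read time and emit (customer_id, hits)."""
    customer_id, sep, note = line.partition("|")
    if not sep:
        return  # the line does not fit the schema chosen for this query; skip it
    text = note.lower()
    yield customer_id, sum(text.count(term) for term in FRAUD_TERMS)


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_hits(key, values):
    """Reduce phase: collapse each customer's values into one metric."""
    return key, sum(values)


def run_job(lines):
    mapped = (pair for line in lines for pair in map_claim(line))
    return dict(reduce_hits(key, values) for key, values in shuffle(mapped).items())


metrics = run_job(RAW_LINES)
print(metrics)
```

Each resulting (customer, count) pair is the kind of small, structured row that could then be loaded into the enterprise data warehouse as a new metric, while the raw text stays in the staging store for later questions.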
  5. Figure 4: Parse Unstructured Data Using Hadoop/MapReduce and Incorporate Results into the Enterprise Data Warehouse

     Once these new metrics are in the enterprise data warehouse, they can be used to enhance existing business intelligence queries, reports, dashboards, and analyses (see Figure 5).

     Figure 5: Integrate Social Media Metrics into the Existing BI Environment

     Note: Implementing this tactic places companies in a good position as Hadoop continues its assimilation into the relational database market. Being able to create metrics and process data on Hadoop, leveraging tools like HBase and Hive that are evolving quickly, and having BI tools connect directly to HDFS, may make data warehouse professionals question why they need to move data to a relational database at all.

     MODERNIZE YOUR DATA WAREHOUSE TODAY

     In the world of revolutionary, game-changing Big Data developments, data warehouse modernization may sound like an evolutionary development. However, it is something that can be executed today, with existing data warehouse skills, and represents a simple first step toward gleaning immediate business value and organizational agility from Big Data technologies. Why are you waiting?
  6. EMC CONSULTING

     As part of EMC® Corporation, the world’s leading developer and provider of information infrastructure technology and solutions, EMC Consulting provides strategic guidance and technology expertise to help organizations exploit information to its maximum potential. With worldwide expertise across organizations’ businesses, applications, and infrastructures, as well as deep industry understanding, EMC Consulting guides and delivers revolutionary thinking to help clients realize their ambitions in an information economy. EMC Consulting drives execution for its clients, including more than half of the Global Fortune 500 companies, to transform information into actionable strategies and tangible business results.

     CONTACT US

     For more information, contact your local EMC Consulting representative.

     EMC2, EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. © Copyright 2012 EMC Corporation. All rights reserved. Published in the USA. 08/12 EMC Perspective H10915

     EMC believes the information in this document is accurate as of its publication date. The information is subject to change without notice.