
Hadoop Business Cases


Published in: Sports, Technology

  1. 1. Hadoop Business Cases
     Dell | Hadoop White Paper
     By Joey Jablonski
     Dell | Hadoop White Paper Series
  2. 2. Dell | Hadoop White Paper Series: Hadoop Business Cases

     Table of Contents
       Hadoop brings new capabilities
       Management of the data fire hose
       Hadoop enters the enterprise
         Analytics
         Risk modeling
       Hadoop ecosystem
       Hadoop futures
       About the author
       Special thanks
       About Dell Next Generation Computing Solutions
       References
       To learn more

     This White Paper is for informational purposes only, and may contain typographical errors and technical inaccuracies. The content is provided as is, without express or implied warranties of any kind.

     © 2011 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell. Dell, the Dell logo, and the Dell badge, and PowerEdge are trademarks of Dell Inc.
  3. 3. Hadoop brings new capabilities

     Growing data volumes and interconnected systems create a need for a tool capable of building the next generation of analytics and data management solutions. Hadoop provides a framework for your company to analyze and manage growing volumes of data while storing data longer than previously possible at a competitive price point. By extending the life of data and not discarding it, you can enable staff to review historic data in new ways and analyze it as new methods emerge.

     The Hadoop taxonomy is outlined in Figure 1, showing the components common to all Hadoop environments. These components are part of the core Apache Hadoop project. The Hadoop architecture is highly pluggable, allowing any component to be replaced with one optimized for a specific workload, while allowing a large variety of data presentation layers to utilize the data stored in Hadoop. The vertical bars on the right designate components not included as part of a default Hadoop distribution; these components are commonly provided by IT providers to enhance their Hadoop offerings.

     Figure 1. Core components of a Hadoop deployment

     In addition to the core Hadoop components shown in Figure 1, a variety of projects have developed as part of the Hadoop ecosystem to provide specific solutions for using data within Hadoop in common ways. Many projects have evolved for storing and processing specific types of data within Hadoop, allowing many industries to create specific solutions built on a common storage and compute engine within Hadoop.
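
As a rough illustration of what "a common storage and compute engine" can mean in practice, the sketch below imitates, in a single Python process, the map-then-reduce style of computation that Hadoop's MapReduce framework distributes across many machines. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, run_job) and the word-count task are assumptions chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in order over an in-memory list of lines."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling run_job on a list of text lines returns a dictionary of word counts. In a real deployment the map and reduce steps would run on many servers against data held in the cluster, which is the part this single-process sketch leaves out.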
  4. 4. Figure 2. The core Hadoop ecosystem with additional tools for data presentation

     Components of the Hadoop ecosystem can be built on one or more of the three primary Hadoop use cases:

     Compute: Hadoop is commonly used as a distributed compute platform for analyzing or processing large amounts of data. The Hadoop ecosystem provides APIs necessary to distribute and track workloads as they are run on large numbers of distributed machines.

     Storage: One primary component of the Hadoop ecosystem is the Hadoop Distributed File System (HDFS). The HDFS allows users to have a single addressable namespace, spread across many hundreds or thousands of servers, creating a single large file system. Hadoop manages the replication of the data within this file system to ensure hardware failures do not lead to data loss. Many organizations will use this scalable file system as a place to store large amounts of data that is then accessed by jobs run within Hadoop or by external systems.

     Database: The Hadoop ecosystem contains components that allow the data within the HDFS to be presented in a SQL interface. This allows the use of standard functions, including INSERT, SELECT, and UPDATE of data within the Hadoop environment, with minimal code changes to existing applications. These components allow developers to quickly access the data stored within a Hadoop environment with tools they are experienced at using.

     Hadoop provides a consistent, scalable base of tools within an organization for storing, managing, and analyzing data, without being tied to any specific department or framework. Hadoop enables your organization to use a single set of data
  5. 5. for all departments’ reporting, analysis, and research needs. This single source enables better quality results and eliminates the cost and complexity of managing multiple islands of data.

     Business is changing quickly; the goals of any individual tool today may not be the same tomorrow. The same goes for organizations and their areas of focus within a large corporation. Deciding what data to discard from the floods of data most companies are experiencing is a difficult challenge. Hadoop enables your company to store more data, with less overhead than ever before. This enables your staff to ask questions of that data later and to analyze it in ways that have not been thought of today.

     Management of the data fire hose

     The evolving community around “big data” (the industry term for environments containing large volumes of related but unstructured data) finds new ways for analyzing and managing growing volumes of data. We are also exploring the creation of new ways for making sense of otherwise large piles of previously misunderstood data sets.

     On any given day, most companies do not know what questions to ask of certain data. When this has occurred in the past, companies would purge that data because of the cost of storing data with indeterminate value. Today, companies exploit tools like Hadoop for storing that data for much longer periods of time, often until staff find new ways to understand how the data can be used and what questions can be asked of it.

     Today, data is as valuable as any software written by a company or any product it designs. The data is the component that drives next-generation products and enables maximum revenue attainment from existing products. Hadoop provides a low-barrier-to-entry solution for storing the additional data being created by today’s companies.

     Figure 3. Questions bring out the value of data. (Figure labels: Data, Questions, Decisions.)

     Hadoop enters the enterprise

     Hadoop is rarely initially deployed as a company-wide data analytics solution; more often, Hadoop is deployed by a single department or organization that sees it as a solution to certain challenges. Hadoop inevitably is then used by more and more departments, becoming a more critical piece of the corporation’s storage and analytics solutions.

     Hadoop deployments commonly start with a smaller deployment within a virtual environment; this could be virtual machines hosted on premise or in a public cloud environment. This method enables your IT staff to learn about managing Hadoop and enables your developers to begin testing ideas they have about uses of Hadoop. This use of virtual infrastructure will usually stop as soon as real workloads are tested, and this usually signifies a move to physical hardware dedicated to Hadoop. This change is primarily driven by data volumes and performance needs. At a certain inflection point, moving data to a public cloud becomes too time-consuming, so companies look to internally hosted and managed Hadoop solutions.

     It is important to understand the evolution of Hadoop in your environment to ensure that you adequately plan each stage of the evolution. Hadoop can rapidly become a large, complex component of your information technology (IT) department. By understanding how Hadoop commonly evolves, you can better manage that evolution in your environment and ensure Hadoop meets your company’s needs, without causing an undue operations burden.
  6. 6. Analytics

     Analytics are becoming a more critical component in all business environments. Analytics are being used to provide near real-time reporting on the state of a business, allowing leaders to make rapid decisions to correct the course of an organization or to capitalize on the needs of the market. The emerging market of tools for analytics allows companies to manipulate the raw data they get from a variety of sources and make intelligent decisions about the state of the business.

     Many marketing and sales-focused organizations are now using Hadoop as the core of their analytics programs. Hadoop is used to store a central copy of customer data and product usage information, allowing those developing pricing models and sales models to refine the data in new ways. These analytics allow the analysts to look for new relationships that were not previously visible in traditional, separate relational database-driven data warehouse environments.

     Another example of using analytics to minimize operational expenses is in IT. By leveraging the hyperscale compute and storage capabilities of Hadoop, your IT personnel can optimize system reporting, analyze system performance versus operational expenses, detect potential causes of system failures, and minimize system downtime. Your CIO and IT managers can analyze the optimal operational models, determine operational inflection points, and plan the next budget cycle.

     Risk modeling

     Many financial services firms are beginning to use Hadoop for risk modeling. Hadoop provides a base for storing and processing large amounts of data, enabling firms to focus on algorithm development and optimization. Hadoop enables companies to avoid the difficulty of massively parallel programming, while exploiting the capabilities provided by commodity hardware and software.

     By using Hadoop to enable your company’s risk modeling projects, data from many different sources can be pulled into a single location and modeled by a single set of algorithms. Traditionally, a large company required risk modeling to occur at business unit or departmental levels. This modeling was commonly done in different ways by the different financial analyst teams. Hadoop enables a single, companywide team to model a company’s exposure to risk and understand what dynamics are at play against that risk position.

     Figure 4. Hadoop enables a single, companywide team to model exposure to risk.

     Hadoop ecosystem

     The Hadoop ecosystem is a rapidly growing and evolving set of tools for Hadoop operations and tools specific to verticals and uses for Hadoop. The Hadoop ecosystem contains many tools specific to operational use cases and the manipulation of specific types of data. This large ecosystem makes Hadoop a strong platform for companies as they evaluate and grow their analytics or business intelligence environments. Some of the most common tools within the Hadoop ecosystem for supporting scale-out environments include Flume, Sqoop, and Zookeeper.

     Flume is a commonly used tool within the Hadoop ecosystem for handling streaming data. Flume provides a framework for agents on one or many servers to collect events and store them in a single HDFS namespace. Flume also provides the necessary frameworks for developing work streams for processing those events, reporting on them, and taking action on them.

     Sqoop is a component within the Hadoop ecosystem for enabling connectivity between Hadoop environments and traditional SQL environments, including relational databases and data warehouses.
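
The kind of transfer Sqoop automates can be pictured with a minimal sketch, assuming a relational table exported to delimited text of the sort commonly stored in HDFS. The sketch uses only the Python standard library and is not Sqoop itself; the in-memory SQLite database, the orders table, and the export_table and demo helpers are hypothetical stand-ins.

```python
import csv
import io
import sqlite3


def export_table(conn, table, columns):
    """Return the selected columns of one table as CSV text with a header row."""
    # Identifiers cannot be bound as SQL parameters, so this sketch trusts its caller.
    query = "SELECT {cols} FROM {table} ORDER BY {first}".format(
        cols=", ".join(columns), table=table, first=columns[0]
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(conn.execute(query))
    return buffer.getvalue()


def demo():
    """Build a tiny hypothetical 'orders' table in memory and export it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    try:
        return export_table(conn, "orders", ["id", "amount"])
    finally:
        conn.close()
```

The reverse direction, loading delimited files back into a warehouse table, follows the same shape. A real tool also has to handle parallelism, data types, and failures, none of which this sketch attempts.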
     Sqoop enables automated processes to be developed for moving data between Hadoop and data warehouses, enabling data warehouses to have access to large amounts of data traditionally stored in other environments or not available at all to business intelligence developers and analysts.
  7. 7. Zookeeper is a component commonly used by applications that exploit data stored in the HDFS. Zookeeper provides a framework for managing distributed applications and the locks between them for consistent data access, providing naming services, and providing synchronization between separate servers and processes that are part of a single, larger application.

     Hadoop futures

     Most organizations have used specialized teams for business intelligence development and exploitation of a company’s data. Hadoop enables that functionality to be pushed to a larger group of staff within the organization. Hadoop provides a single unified interface and data store for many staff across all departments to use when analyzing company statistics and developing new methods for success in a market.

     Hadoop empowers all your employees to think of new ways to improve the bottom line and allows them access to the necessary information to test their theories, develop strategies, and report on changes in the business.

     Hadoop provides the base software and associated ecosystem to manage growing amounts of data. Hadoop enables your company to store more data than ever before and provide it to a larger portion of the staff for analysis both today and tomorrow. Hadoop can be used to enable near real-time decision making by your company leadership and allow your staff to test new ideas and analyze data in new ways.

     About the author

     Joey Jablonski is a principal solution architect with Dell’s Data Center Solutions team. Joey works to define and implement Dell’s solutions for Big Data, including solutions based on Apache Hadoop. Joey has spent more than 10 years working in high performance computing, with an emphasis on interconnects, including InfiniBand, and parallel file systems. Joey has led technical solution design and implementation at Sun Microsystems and Hewlett-Packard, as well as consulted for customers, including Sandia National Laboratories, BP, ExxonMobil, E*Trade, Juelich Supercomputing Centre, and Clumeq.

     Special thanks

     The author extends special thanks to:
       • Rob Hirschfeld, Principal Cloud Solutions Architect, Dell
       • Aurelian Dumitru, Principal Cloud Solutions Architect, Dell
       • John Igoe, Executive Director, Next Generation Computing Solutions, Dell

     About Dell Next Generation Computing Solutions

     When cloud computing is the core of your business and its efficiency and vitality underpin your success, the Dell Next Generation Computing Solutions are Dell’s response to your unique needs. We understand your challenges, from compute and power density to global scaling and environmental impact. Dell has the knowledge and expertise to tune your company’s “factory” for maximum performance and efficiency.

     Dell’s Next Generation Computing Solutions provide operational models backed by unique product solutions to meet the needs of companies at all stages of their lifecycle. Solutions are designed to meet the needs of small startups while allowing scalability as your company grows.

     Deployment and support are tailored to your unique operational requirements. Dell’s Cloud Computing Solutions can help you minimize the tangible operating costs that have hyper-scale impact on your business results.
  8. 8. References
       Big Data
       Cloud

     To learn more

     To learn more about Dell cloud solutions, contact your Dell representative or visit:

     ©2011 Dell Inc. All rights reserved. Trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Specifications are correct at date of publication but are subject to availability or change without notice at any time. Dell and its affiliates cannot be responsible for errors or omissions in typography or photography. Dell’s Terms and Conditions of Sales and Service apply and are available on request. Dell service offerings do not affect consumer’s statutory rights. Dell, the DELL logo, and the DELL badge, PowerConnect, and PowerVault are trademarks of Dell Inc.