KB Ramesh - TB2957 - Real-time, big data analytics
 

HP Expert: KB Ramesh, presentation deck from HP Discover 2012 Las Vegas, “Real-time, big data analytics.”



  • How many in the room are executing on data analytics? How many of you are reaping benefits from data analytics to make intelligent decisions?
  • Classically, there are three major levels of management and decision making within an organization: operational, tactical, and strategic (see Figure 1). While these levels feed one another, they are essentially distinct. Operational data deals with day-to-day operations, tactical data with medium-term decisions, and strategic data with long-term decisions. Decision making changes as one goes from level to level. At the operational level, decisions are structured, meaning they are based on rules. (A credit card charge may not exceed the customer's credit limit.) At the tactical level, decisions are semi-structured. (Did we meet our branch quota for new loans this week?) Strategic decisions are unstructured. (Should a bank lower its minimum balances to retain more customers and acquire new ones?)
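The structured, rule-based nature of an operational decision can be sketched in a few lines of code. This is a hypothetical illustration of the credit-limit rule mentioned above, not any real card network's authorization logic:

```python
def authorize_charge(charge, balance, credit_limit):
    """Structured operational decision: approve a charge only if it
    keeps the customer within their credit limit (a fixed rule)."""
    return balance + charge <= credit_limit

# Hypothetical figures: $500 balance, $1,000 limit
print(authorize_charge(200, 500, 1000))  # True: stays within the limit
print(authorize_charge(600, 500, 1000))  # False: would exceed the limit
```

Tactical and strategic decisions, by contrast, cannot be reduced to a single fixed rule like this, which is why they need semi-structured reporting and open-ended analytics respectively.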
  • Big data analytics is an area of rapidly growing diversity. It is emergent and multifaceted, and less well understood by the IT generalist. Development of big data analytics processes has historically been driven by the web, but the rapid growth of applications is now taking place in all major vertical industry segments and represents a growth opportunity to vendors that is worth the hype. Trying to define big data analytics is therefore probably not helpful. What is helpful is identifying the characteristics common to the technologies now identified with it:
    − The perception that traditional data warehousing processes are too slow and limited in scalability
    − The ability to converge data from multiple data sources, both structured and unstructured
    − The realization that time to information is critical to extract value from data sources that include mobile devices, RFID, the web, and a growing list of automated sensory technologies
  • Animations to go away. How many of you are reaping benefits from data analytics to make intelligent decisions? How many are at this level (slide 9)? (Intermediary.) How many of you are at this level (slide 10)? (Advanced.) Now how many are doing this (slide 10) with all available data (no sampling)? We need something for people who are already advanced. They learn: what other tools are available for advanced analytics, the methodology and thought process, credibility, and HP capabilities and solutions to meet their advanced analytics requirements.
  • Analytics and big data: Business intelligence is scaling out beyond its traditional boundaries to "every corner of the enterprise," from point-of-sale terminals to HR to, of course, IT. The role of data warehousing for IT, or "big data," is emerging as a core focus for both vendors and IT adopters seeking more effective ways to apply mature data warehousing techniques to the business of IT. One of the more interesting emerging areas is social data analytics, both for IT and beyond it, as businesses seek to apply techniques such as sentiment analysis, geo-location, behavioral, social graph, and rich media social data to better understand everything from customer likes and dislikes and more effective risk management to leveraging social media within IT as a foundation for problem resolution and requirements definition.
    Advanced threat intelligence: As targeted threats continue to flourish and increase in sophistication, the requirements for better information gathering and data-driven security are self-evident. These requirements go far beyond looking at isolated denial-of-service or virus issues to broader situational analysis. This is another application of big data, but one which may also run into privacy issues as advanced threat intelligence expands its reach.
    Advanced performance-to-business management analytics: Parallel but fundamentally distinct advances in analytics as applied to service, application, and infrastructure performance management are also becoming significant game changers in 2012. With new solutions from both platform and smaller suite providers automating insights into cross-domain performance interdependencies, across what can be hundreds of different sources (or many hundreds of thousands, depending on how it's measured), the chances for IT to break through the insoluble areas of triage are more promising than ever.
    Given the many advances in this area (and multiple analyst predictions in this space), it is worth noting a few distinct directions. User Experience Management (UEM) has come into its own, and cloud has helped it along as an ultimate point of IT governance; along with application performance insights, UEM may also explore business process and business behavior impacts, as well as shed light on how customers actually use IT services, perhaps the biggest single gap in running IT as a business. Executive dashboards will thrive atop these advancing trends, and some will also have roots in data warehousing. Application discovery and dependency mapping, and the modeling it can deliver, connect with analytic "melds": capacity planning, performance, and business impact are all beginning to intersect across domains with both real-time and historical/trending values. Network: applications and services all come together over the network, and network management will continue to drive forward with "application-aware" solutions offering more powerful capabilities for leveraging application flows for performance, capacity, and even governance and compliance requirements. Along with this, EMA predicts the rise of next-generation network management platforms, optimized to support virtualized infrastructures, more rapid deployment, and the consolidation of roles that EMA has documented with the advent of cloud computing. Predictive analytics in support of automation: while automation deserves its own heading, the relation between predictive analytics and automation technologies, from Workload Automation (WLA) to IT process automation (or run book), will continue to transform the automation landscape.
    Another, and not unrelated, transformative factor will continue to be service modeling from the CMDB/CMS, as modeled interdependencies and the policies around them begin to advance in defining automation routines and associating them with larger processes.
  • Big data storage is related in that it also aims to address the vast amounts of unstructured data fueling data growth at the enterprise level. But the technologies underpinning big data storage, such as scale-out NAS and object-based storage, have existed for a number of years and are relatively well understood. At a very simplistic level, big data storage is nothing more than storage that handles a lot of data for applications that generate huge volumes of unstructured data. This includes high-definition video streaming, oil and gas exploration, genomics: the usual suspects. A marketing executive at a large storage vendor that has yet to make a statement and product introduction told me his company was considering "Huge Data" as a moniker for its big data storage entry.
  • Scale horizontally (scale out): To scale horizontally (or scale out) means to add more nodes to a system, such as adding a new computer to a distributed software application. An example might be scaling out from one web server system to three. As computer prices drop and performance continues to increase, low-cost "commodity" systems can be used for high-performance computing applications, such as seismic analysis and biotechnology workloads, that in the past could only be handled by supercomputers. Hundreds of small computers may be configured in a cluster to obtain aggregate computing power that often exceeds that of single traditional RISC-processor-based scientific computers. This model has been further fueled by the availability of high-performance interconnects such as Myrinet and InfiniBand, and has also led to demand for features such as remote maintenance and batch processing management previously not available for "commodity" systems. The scale-out model has created an increased demand for shared data storage with very high I/O performance, especially where processing of large amounts of data is required, as in seismic analysis. This has fueled the development of new storage technologies such as object storage devices. Scale-out solutions for database servers generally move toward a shared-nothing architecture, going down the path blazed by Google of sharding.
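The sharding idea behind shared-nothing scale-out can be sketched with a simple hash-based router. This is a minimal illustration, not any particular database's partitioning scheme; the keys and node count are hypothetical:

```python
import hashlib

def shard_for(key, num_nodes):
    """Route a record to one of num_nodes shards by hashing its key.
    A stable hash (md5 here) means the same key always lands on the
    same node, so any node can be queried without coordination."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

# Scaling out from one node to three only changes num_nodes;
# each key still maps deterministically to exactly one shard.
assignments = {k: shard_for(k, 3) for k in ["user:1", "user:2", "user:3"]}
```

Note that naive modulo hashing reshuffles most keys whenever the node count changes; production systems typically use consistent hashing or range partitioning to limit that data movement.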
  • Next-generation data warehousing: The three leading, until recently independent, next-generation data warehouse vendors (Vertica, Greenplum, and Aster Data) are upending the traditional enterprise data warehouse market with massively parallel, columnar analytic databases that deliver lightning-fast data loading and near real-time query capabilities. The latest iteration of the Vertica Analytic Platform, Vertica 5.0, for example, includes new elasticity capabilities to easily expand or contract deployments and a slew of new in-database analytic functions. Aster Data has pioneered a novel SQL-MapReduce framework, combining the best of both data processing approaches, while Greenplum's unique collaborative analytic platform, Chorus, provides a social environment for data scientists to experiment with big data. All three vendors experienced significant revenue growth over the last two to three years, with Vertica leading the way with an estimated $84 million in revenue in 2011, followed by Aster Data with $52 million and Greenplum with $40 million.
  • HP Converged Infrastructure uses a common modular architecture, resulting in a simpler, more automated, and integrated infrastructure that truly accelerates the business. Other solutions in the market loosely integrate systems and solutions, which results in continued silos and wasted resources and puts you at a competitive disadvantage. (Gen8 sources: press release and brochure. CI source: http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA3-3333ENW) We have more than 180,000 channel partners worldwide, including major and emerging software and hardware vendors and system integrators. Through our AllianceONE program, we work closely with these partners to deliver integrated solutions based on open standards. Each offering is tightly integrated and pre-tested, and brings together all the key hardware, software, and services components. HP provides infrastructure roadmap, infrastructure design and build, as well as advisory consulting.
  • Determining the proper workload sizing and Hadoop configuration requires experience. Applying generic workload sizing ignores your anticipated workload growth requirements, and most guidelines ignore platform features and scalability levers. How do you balance and scale the resources as you grow and evolve? What are the planning requirements to host the cluster in your data center? What may appear affordable at tens of nodes may not at hundreds of nodes. Best practice: collaborate with your chosen vendor to properly size and configure based upon your anticipated needs; leverage vendor reference architectures, appliances, and experience; utilize vendor guidelines, your anticipated needs and current projections, and pilot-application lessons learned; size and configure with scalability, cost-performance, and evolution needs in mind.
  • 1 - Hadoop is a framework, not a solution: For many reasons, people expect Hadoop to answer big data analytics questions right out of the box. For simple queries, this works. For harder analytics problems, Hadoop quickly falls flat and requires you to develop Map/Reduce code directly. For that reason, Hadoop is more like a J2EE programming environment than a business analytics solution.
    2 - Hive and Pig are good, but do not overcome architectural limitations: Both Hive and Pig are well-thought-out tools that enable the lay engineer to quickly become productive with Hadoop. They translate analytics queries expressed in common SQL or text into Java Map/Reduce jobs that can be deployed in a Hadoop environment. However, there are limitations in Hadoop's Map/Reduce framework that prohibit efficient operation, especially when you require inter-node communication (as is the case with sorts and joins).
    3 - Deployment is easy, fast, and free, but very costly to maintain and develop: Hadoop is very popular because within an hour an engineer can download, install, and issue a simple query. It is also an open source project, so there are no software costs, which makes it a very attractive alternative to Oracle and Teradata. The true costs of Hadoop become obvious in the maintenance and development phase. Since Hadoop is mostly a development framework, Hadoop-proficient engineers are required to develop an application as well as optimize it to execute efficiently on a Hadoop cluster. Again, it is possible, but very hard to do.
    4 - Great for data pipelining and summarization, horrible for ad hoc analysis: Hadoop is great at analyzing large amounts of data and summarizing or "data pipelining" to transform raw data into something more useful for another application (like search or text mining); that is what it is built for. However, if you don't know the analytics question you want to ask, or if you want to explore the data for patterns, Hadoop becomes unmanageable very quickly. Hadoop is very flexible at answering many types of questions, as long as you spend the cycles to program and execute MapReduce code.
    5 - Performance is great, except when it's not: If you need speed and are required to analyze large quantities of data, Hadoop allows you to parallelize your computation across thousands of nodes. The potential is definitely there. But not all analytics jobs can easily be parallelized, especially when user interaction drives the analytics. So unless the Hadoop application is designed and optimized for the question you want to ask, performance can quickly become very slow, as each MapReduce job has to wait until the previous jobs are completed. A Hadoop pipeline is only as fast as its slowest MapReduce job.
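The Map/Reduce model that Hadoop implements can be illustrated with a minimal in-process word count. This is a toy sketch in plain Python, not Hadoop itself; the explicit grouping step stands in for Hadoop's shuffle phase between Map and Reduce:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line
    for word in line.lower().split():
        yield (word, 1)

def reduce_phase(word, counts):
    # Reduce: sum all the counts emitted for one word
    return (word, sum(counts))

def mapreduce_wordcount(lines):
    # Shuffle: group mapped pairs by key, as Hadoop does between phases
    groups = defaultdict(list)
    for line in lines:
        for word, one in map_phase(line):
            groups[word].append(one)
    return dict(reduce_phase(w, c) for w, c in groups.items())

counts = mapreduce_wordcount(["big data", "big analytics"])
print(counts)  # {'big': 2, 'data': 1, 'analytics': 1}
```

On a real cluster, the map and reduce functions run in parallel across nodes and the shuffle moves data over the network, which is exactly where the sort/join bottlenecks mentioned above arise.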
  • Characteristics of MapReduce:
    − Data volumes: can handle petabytes (or possibly scale up to greater orders of magnitude)
    − Performance and scalability: automatic parallelization allows linear scaling, even with greater numbers of nodes; the phase switch from Map to Reduce is a potential communication bottleneck; when the application is not collocated with the data, the channel for loading data into the application becomes a potential bottleneck; incrementally adding nodes is easy
    − Data integration: supports structured, unstructured, and streaming data; potentially high communication cost at the transition between Map and Reduce phases
    − Fault tolerance: the MapReduce model is designed to withstand failure without restarting the process, with the exception of the name node; MapReduce often involves larger clusters of 50 nodes or more
    Characteristics of in-database analytics:
    − Data volumes: can handle terabytes and can scale to petabytes
    − Performance and scalability: designed for rapid access for analytic purposes (queries, reports, OLAP); the shared-nothing approach provides eminent scalability; direct operation on compressed columnar data improves performance; compression decreases the amount of data to be paged in and out of memory, and consequently disk I/O
    − Data integration: supports structured data; supports real-time analytics; less amenable to integration with unstructured data
    − Fault tolerance: generally assumes infrequent failures; small and medium-size clusters are less likely to experience failures
  • The big data market is on the verge of a rapid growth spurt that will see it top the $50 billion mark worldwide within the next five years. As of early 2012, the big data market stands at just over $5 billion based on related software, hardware, and services revenue. Increased interest in and awareness of the power of big data and related analytic capabilities to gain competitive advantage and improve operational efficiencies, coupled with developments in the technologies and services that make big data a practical reality, will result in a super-charged CAGR of 58% between now and 2017. As explained in our big data manifesto, big data is the new definitive source of competitive advantage across all industries. For those organizations that understand and embrace the new reality of big data, the possibilities for new innovation, improved agility, and increased profitability are nearly endless. See http://wikibon.org/wiki/v/Big_Data_Market_Size_and_Vendor_Revenues
  • Time-series analysis: For example, in the financial services industry, quantitative analysts can develop MapReduce applications that use time-series data in the analytical DBMS to look for profitable trading patterns.
    Continuous aggregation: The aggregations resulting from a MapReduce application are managed within a high-performance database for analysis or even operational purposes. This enables analysts to drill down at different levels of aggregation.
    ETL: It is often said that the bulk of the work of instituting a data warehouse involves data extraction, integration, and consolidation. A large part of that effort involves extraction, transformation, and loading (ETL) of data into the warehouse.
    Real-time embedded analytics: From enhancing operational activities to complex event processing, combining the results of analytics with continuous applications can add value to the bottom line.
    Large-scale graph and network analysis: Social network environments demonstrate the utility of managing connectivity.
    Key requirements: data volumes (the analysis platform must be able to absorb and handle ever larger volumes); performance (scale out infrastructure in proportion to the computational, network bandwidth, and storage resources); data integration (combining both structured and unstructured data); fault tolerance (it is desirable to enable recovery from a failure without having to restart the entire process); heterogeneity (resource allocation and usage when scaling with homogeneous or heterogeneous systems); knowledge delivery (support the computational needs to deliver and present actionable results); latency (the time from when data is recorded to when questions are answered is critical).
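The continuous-aggregation pattern above can be sketched as a small roll-up: raw time-stamped ticks are grouped into per-symbol, per-hour averages that an analyst could then drill into. The tick data, symbol, and bucket granularity are all hypothetical:

```python
from collections import defaultdict
from datetime import datetime

def aggregate_by_hour(ticks):
    """Roll raw (timestamp, symbol, price) ticks up into per-symbol,
    per-hour average prices, a simple continuous-aggregation step."""
    buckets = defaultdict(list)
    for ts, symbol, price in ticks:
        # Truncate the timestamp to the containing hour
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0)
        buckets[(symbol, hour)].append(price)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

# Hypothetical ticks for a single trading morning
ticks = [
    ("2012-06-05T10:05:00", "HPQ", 21.0),
    ("2012-06-05T10:40:00", "HPQ", 23.0),
    ("2012-06-05T11:10:00", "HPQ", 22.0),
]
hourly = aggregate_by_hour(ticks)
```

In the architecture described above, a MapReduce job would compute these buckets at scale and load the results into the analytical database for drill-down.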
  • Machine data or "data exhaust" analysis is one of the fastest growing segments of big data, generated by websites, applications, servers, networks, mobile devices, and other sources. The goal is to aggregate, parse, and visualize this data (log files, scripts, messages, alerts, changes, IT configurations, tickets, user profiles, etc.) to spot trends and act. By monitoring and analyzing data from customer clickstreams, transactions, log files, network activity, call records, and more, a new breed of startups is racing to convert "invisible" machine data into useful performance insights. The label for this type of analytics is operational or application performance intelligence. Typical applications include:
    − Web log file analysis (who is visiting my website?)
    − Sentiment analysis (what are customers saying about me?)
    − Recommendation engines (what are my customers/visitors likely to buy?)
    − Ad targeting (which ads will appeal to a specific viewer?)
    − Risk modeling (what is the default risk of my credit card holders?)
    − Customer churn analysis (why are my customers leaving?)
    − Web crawling (traditional web search)
    − Predictive analytics (what predictions can I make based on my data?)
    − Ad infinitum…
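The web-log-analysis use case can be sketched as a tiny parser that tallies HTTP status codes to spot error trends. The log lines below are hypothetical samples in the common log format; a real pipeline would stream lines in from many servers:

```python
import re

# Fields: client IP, ident, user, [timestamp], "request", status, bytes
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<req>[^"]*)" (?P<status>\d{3}) \S+'
)

def status_counts(lines):
    """Web log analysis sketch: tally HTTP status codes per log batch."""
    counts = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m:  # skip malformed lines rather than failing the batch
            status = m.group("status")
            counts[status] = counts.get(status, 0) + 1
    return counts

logs = [
    '10.0.0.1 - - [05/Jun/2012:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [05/Jun/2012:10:00:01 +0000] "GET /missing HTTP/1.1" 404 128',
]
print(status_counts(logs))  # {'200': 1, '404': 1}
```

At machine-data scale this same parse-and-count shape is what gets distributed across a cluster, with visualization layered on the aggregated results.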
  • What is the big data workshop? HP Big Data Strategy Workshop: HP offers guidance right from the start. We work with you to address all your big data challenges: volume, variety, velocity, and value of data. In this three-day workshop, we work with you to discover your current data sources, including business and technical requirements, and help you architect your business intelligence (BI) platform, beginning with guiding sound decision making around new technology. Our subject-matter experts take a holistic approach with key stakeholders involved in your BI and storage infrastructure initiative, helping you understand big data benefits and challenges, and how to address your challenges with available technologies and solutions.
    What problems does it solve?
    − Sorting out how to harness big data, a rich repository of information that comes with variety, velocity, and volume challenges. Traditional tools won't mine that information, leaving customers poor in information but awash in data.
    − Guidance on organizing and protecting data assets by efficiently storing huge amounts of data while also making that data secure and accessible.
    − Grappling with the impact of rapid growth in structured and unstructured data, and the evolution of big data analytics projects impacting other storage areas such as data management, backup and recovery, data security, and compliance.
    What are the benefits?
    − Understand the big data landscape and its challenges, benefits, and critical success factors
    − Define or refine your big data strategy to include your unique requirements
    − Discover and uncover the hidden potential of unstructured data
    − Set your overall big data strategy to create a roadmap of recommendations and initiatives
    − Integrate structured and unstructured data in enterprise search systems and data collections
    − Focus on how and when certain elements of Hadoop can be used to process data volumes
    − Improve your ability to make intelligent decisions through advanced exploratory analytics
    − Leverage use cases to determine when and how big data needs to be protected, archived, and secured

KB Ramesh - TB2957 - Real-time, big data analytics: Presentation Transcript

  • TB2957 Big Data Analytics. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  • Big data analytics: transforming business intelligence with real-time analytics. KB Ramesh, Director, WW Storage Consulting. June 2012.
  • Agenda: advanced analytics building blocks
    1. Big data: introduction
    2. Big data analytics: a whole new approach
    3. Big data: challenges in harnessing all the data
    4. Using next-gen analytics architecture
    5. Big data analytics: new applications and business models
    6. HP solution
    7. HP follow-on services
  • Big data: from threat to opportunity
  • Some facts about big data: from problem to opportunity
    − Big data is NOT a problem but an opportunity
    − Big data isn't just big: it also means diverse data types, size in exabytes, and streaming data in real time
    − Big data analytics is the application of advanced analytic techniques to very big data sets, such as sentiment analysis, geo-location, behavioral, social graph, and rich media social data
    − Value = better understanding of customer likes and dislikes, more effective risk management, and leveraging social media within IT as a foundation for problem resolution and requirements definition
  • Multi-structured data: what is it anyway, and how can it be used to benefit the organization?
    − It's often a mix of structured, semi-structured, and unstructured data, plus gradations among these
    − Unstructured data works behind the scenes and is subsequently converted to structured data
    − Value is in identifying patterns to make intelligent decisions
    − Value is in influencing decisions if we can see the behavior patterns
    (Slide diagram: levels of reporting and analysis, from canned and parameterized reports at the operational level, through OLAP slice-and-dice and pure data extraction/ad hoc analysis at the tactical level, to data mining and neural networks in the strategic realm of analytical modeling, spanning structured to unstructured data.)
  • Background: evolution of advanced analytics. Advanced analytics is required in addition to traditional processing.
    − Traditional DW/BI: can be fully automated; rigor is required; restricted on types of data; transaction management (OLTP); data volumes of gigabytes to terabytes
    − Evolving current state of DW/BI (traditional DW/BI plus Hadoop batch processing of unstructured data and in-database advanced analytics): latency, compression, and speed; requires human intervention; coverage is important rather than rigor; amount of data can be terabytes to petabytes; improves system performance by scale-out; statistical data creation, retrieval, and data mining
    − Future converged big data infrastructure (converged information applications, in-database analytics, IDOL 10, advanced analytics with NLP and artificial intelligence): new understanding of all multi-structured data; real-time advanced analytics; superior speed with low latency; process information in-memory, in-time, in-place
  • Big data analytics: the need for a new approach, taking unstructured data into account.
    Challenges (traditional approach → new approach):
    − Scalability: no → yes
    − Ingest high volumes of data (all available data): no → yes
    − Sampling of data: yes → no
    − Variety of data (structured, semi-structured, unstructured): no → yes
    − Simultaneous data and query processing: no → yes
    − Faster access to all relevant information: no → yes
    − Analyze data at high rates (GB/sec): no → yes
    − Accuracy in analytical models: no → yes
    The questions that are answered, in increasing degree of intelligence and competitive advantage:
    − What happened? (standard reports)
    − How many, how often, where? (ad hoc reports)
    − Do you have an opportunity or a problem? (query, drilldown)
    − What actions are needed? (alerts)
    − Why is this happening? (statistical analysis)
    − What if these trends continue? (forecasting)
    − What will happen next? (predictive analysis)
    − What's the best that can happen? (optimization)
  • Big data analytics: the need for a whole new approach
  • Challenges in harnessing ALL the data. In advanced analytics, data must pass through various stages from unstructured to structured before the end user can reap benefits:
    − Creation: storing the data; how to optimize and compress it at the creation stage; elasticity; data backup and recovery strategies
    − Ingestion: transformations and integrations play a major role; new tools and techniques to process the data; standardization
    − Analysis: the data may have hidden trends and traits that are immensely useful; statistical data mining, machine learning, and NLP; enterprise search; sentiment analysis
    − Visualization: new modes of data delivery; visualization for various channels, such as graphical vs. tabular; in-memory support; dashboards
  • Creation: storage and management
  • Creation: big data storage considerations
    − Organizations need to reduce the amount of data stored and exploit new storage technologies that improve performance and utilization
    − Three important directions: reducing data storage requirements using data compression and new physical storage structures such as columnar storage; improving input/output (I/O) performance using solid-state drives (SSDs); increasing storage utilization by using tiered storage, with data stored on different types of devices based on usage
    − Supporting techniques include replication and snapshots, archiving, and storage tiering and hybrid storage with SSD, SAS, and SATA
  • Ingesting, analyzing, and visualizing: consuming, processing, and publishing the data
  • Ingesting unstructured data. The challenge is simplified by a solution that:
    1. Is built on a scale-out architecture
    2. Can handle petabytes of data and more
    3. Can handle data from numerous sources, such as social media, audio, and video
    4. Can process the data in batch and/or real time
    5. Can provide faster access to relevant information
    6. Can improve the accuracy of analytical models
    7. Has low latency
  • Using next-gen analytics architecture with the Hadoop platform
    − Next-generation BI architecture is more analytical and highly scalable
    − Gives power users greater options to access and mix corporate data
    − Brings unstructured and semi-structured data fully into the mix using Hadoop and non-relational databases
    (Slide diagram: operational systems with structured data feed a data warehouse with subject areas, an operational data store, and in-database analytics via extract/transform/load in batch or near real time; a Hadoop cluster ingests machine data, semi-structured data, unstructured data, and external data; statistical analytics tools (R and CEP), alerts, reports, and dashboards serve operational, ad hoc, and power users. "Adapted with permission from Wayne Eckerson, Founder, BI Leadership Forum, www.bileadership.com.")
  • What is in the Hadoop platform? The role of Hadoop in big data analytics:
    − Able to handle enormous volumes and variety of data at greater velocity
    − More likely than traditional data management systems to be used to: identify patterns; archive data; parse logs; transform data; perform types of analytics that couldn't previously be done on large volumes of data by capturing all source data (pre-process); keep more historical data (post-process)
  • Hadoop platform landscape
    Hadoop ecosystem map
  • Why Hadoop on HP Converged Infrastructure?
    An open, modular, resilient, high-performance, extreme scale-out architecture
    World's most self-sufficient servers:
    • 150 design innovations and over 900 patents for HP ProLiant Gen8 servers
    • 6x performance increase and up to 93% less downtime for updates
    • 66% faster time to problem resolution
    World's best track record:
    • HP manages more than 3 million square feet of data center space
    • Some of the largest Hadoop clusters in the world run on HP
    • Proven success with HP Insight CMU, Vertica and Autonomy
    World's best IT consulting experts:
    • Worldwide Center of Excellence for Hadoop in collaboration with HP Labs
    • Global Solution Center for proofs of concept
    • Workload analysis & characterization expertise
    • Consulting for roadmap, sizing & configuration, and implementation
    World's strongest partner ecosystem:
    • AllianceONE – 180,000 channel partners worldwide
    • Development and marketing agreements with SAP and Microsoft on converged systems
    • Partnerships with the top 3 Hadoop distribution vendors
  • Hadoop challenges & best practices
    Sizing and storage configuration for your workload and scalability requires collaboration:
    • Hadoop deployments using SAN or NAS need to be evaluated on a case-by-case basis; SAN or NAS can perform well in certain scenarios, but not always.
    • When Hadoop is deployed on SAN or NAS devices, network communication overhead can cause performance bottlenecks, especially on larger clusters.
    • Hadoop deployments with built-in HA (HDFS replication) demand three times the storage normally required; it is good practice to account for this when planning storage.
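The "three times the storage" point above can be folded into a back-of-the-envelope sizing helper. Replication factor 3 is the HDFS default behind that point; the 25% allowance for intermediate/temporary data is an illustrative planning assumption of this sketch, not an HP or Hadoop rule:

```python
def raw_storage_tb(dataset_tb, replication=3, temp_overhead=0.25):
    """Rough raw-capacity estimate for an HDFS cluster.

    replication=3 mirrors the HDFS default; temp_overhead is an
    assumed allowance for shuffle/intermediate data, tune to taste.
    """
    return dataset_tb * replication * (1 + temp_overhead)
```

Under these assumptions, 100 TB of source data calls for 375 TB of raw disk — which is why the slide flags storage planning as a common stumbling block.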
  • Limitations of Hadoop
    • Hadoop is a framework, not a solution
    • Hive and Pig are good, but do not overcome architectural limitations
    • Deployment is easy, fast and free, but very costly to maintain and develop
    • Great for data pipelining and summarization; horrible for ad hoc analysis
    • Performance is great, except when it's not
    Source: Joe Brighton's blog, http://www.quantivo.com/blog/top-5-reasons-not-use-hadoop-analytics
  • Analyzing and visualizing
    Real-time, contextual understanding of structured and unstructured data
    Specific use cases:
    • Optimizing advertising campaigns
    • Identifying and addressing patterns
    • Uncovering trends and issues that impact business performance
    • Maximizing the influence of user-generated content
    • Analyzing interactions and transactions
    • Addressing marketing challenges:
      − Profiling
      − Clustering
      − Sentiment analysis
      − Conceptual search
  • Where latency and compression matter
    In-database analytics – a platform for structured data
    Integrates data analytics into data warehousing functionality, enhancing data warehouse performance with:
    • Parallel computing
    • Shared-nothing architectures
    • Data compression
    • Columnar database architecture
    Accelerates data analysis:
    • Relevant for applications requiring high throughput
    • Eliminates the overhead of moving large data sets from the enterprise data warehouse to a separate analytic software application
    Vertica Analytics Platform – monetizing big data: make smarter decisions in real time, predict trends & patterns with accuracy, deliver greater insight with the right context, improve competitive differentiation, drive faster innovation, optimize operations
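A minimal sketch of why the columnar architecture and data compression bullets go together: a column store keeps each column contiguous, so low-cardinality (or sorted) columns form long runs that run-length encoding collapses. Pure illustration — this is not Vertica's actual encoder:

```python
def run_length_encode(column):
    """Collapse consecutive repeats into (value, count) runs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs

# A row store interleaves fields, breaking up repeats; stored as its
# own column, this hypothetical status field shrinks from 6 cells to
# 2 runs, and a scan can skip the columns a query never touches.
status_column = ["open"] * 4 + ["closed"] * 2
encoded = run_length_encode(status_column)
```

Here `encoded` is `[("open", 4), ("closed", 2)]`, which is the latency-and-compression win the slide title refers to: less data read per query, and queries can often run over the compressed form directly.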
  • In-database analytics best practices
    1. Scale the enterprise data warehouse (EDW) through parallelism
    2. Accelerate the EDW with appliances
    3. Optimize batch performance by distributing storage
    4. Retune and rebalance workloads (auto-tuning)
    5. Scale out through shared-nothing, massively parallel processing (MPP)
    6. Push query processing to grid-enabled intelligent storage layers
    7. Apply efficient compression in the storage layer
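Item 5, shared-nothing MPP, can be illustrated by hash-partitioning rows across independent shards that each aggregate locally before a coordinator combines the partial results. A toy sketch — the shard count and field names are invented:

```python
import zlib

NUM_SHARDS = 4  # invented shard count for the sketch

def shard_for(key):
    """Stable hash-partition: a given key always lands on one shard."""
    return zlib.crc32(key.encode()) % NUM_SHARDS

def scatter(rows, key_field):
    """Distribute rows so each shard owns a disjoint slice of the data
    (shared nothing: no shard reads another shard's rows or disks)."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for row in rows:
        shards[shard_for(row[key_field])].append(row)
    return shards

def parallel_count(shards):
    # Each shard computes its local aggregate independently; the
    # coordinator only combines the small partial results, never
    # the raw rows — the core of items 5 and 6 above.
    return sum(len(shard) for shard in shards)
```

Pushing the per-shard work down to where the data lives is the same idea as item 6's "push query processing to the storage layer".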
  • Vertica Analytic Platform
    Extract value from data at speed and scale. Key features:
    • Real-time query & loading
    • Advanced in-database analytics
    • Columnar storage & execution
    • Aggressive data compression
    • Scale-out MPP architecture
    • High availability
    • Native BI, ETL, & Hadoop/MapReduce integration
  • Recap
    In-database analytics vs. the Hadoop platform
  • To Hadoop or not to Hadoop?
    Given the advantages and limitations of both architectures:
    • Hadoop Common is a batch-oriented platform, so it is seldom used for rich media analytics
    • Hadoop is a platform, so each of the tools that works with Hadoop Common needs to be evaluated, designed and developed
    • Hadoop is open source, but you need to invest time to develop solutions that can answer business questions
    • Real-time analytics with Hadoop MapReduce is difficult, though not impossible
    • Whether Hadoop can perform advanced analytics quickly is questionable
    • In-database analytics cannot handle unstructured data and needs to be integrated with a Hadoop architecture
    • There is no "one size fits all" solution
    Hybrid architectures are needed to get the best of both worlds
  • Complement traditional BI with advanced analytics
    Combine unstructured data and structured data
    HDFS & Map/Reduce process:
    • Time series analysis and continuous aggregation
    • Real-time embedded analytics
    • Faster access to data with low latency
    • Large-scale graph & network analysis – social network environments demonstrate the utility of managing connectivity
    In-database analytical process (advanced analytics):
    • Column-oriented approach
    • Eliminates the need for multiple indexes, views and aggregations
    • Integrates data analytics into data warehousing functionality
    • Eliminates the overhead of moving large data sets from the enterprise data warehouse to a separate analytic software application
    • Provides significant performance benefits
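The "time series analysis and continuous aggregation" bullet above can be sketched as a sliding-window aggregate maintained incrementally as events stream in. A minimal stand-in, not any product's API:

```python
from collections import deque

class WindowedAverage:
    """Continuous aggregation: a running mean over the last `size`
    events, updated in O(1) per event rather than rescanning history —
    the property that makes real-time embedded analytics feasible."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def push(self, value):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]   # evict the oldest value's share
        self.window.append(value)          # deque drops the oldest itself
        self.total += value
        return self.total / len(self.window)
```

For example, with `WindowedAverage(3)`, pushing 1, 2, 3, 4 leaves the window holding [2, 3, 4], so the last push returns 3.0.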
  • Big data analytics
    New applications and business models: healthcare outcomes analysis, pricing optimization, life science research, fraud detection, legal discovery, activity analysis, web application optimization, industry process optimization, traffic flow optimization, weather forecasting, social network analysis, infrastructure optimization, customer behavior analysis
    Real-life examples:
    • E-commerce company: monitors server & application health and performance by gaining real-time visibility into tens of TBs of unstructured, time-sensitive machine data, online bookings, deal and coupon use
      − Avoid website outages
      − Optimize web application monitoring
    • Wireless carrier: loads 10 TB of CDR data into their system every day
      − Makes the data accessible to BI tools to enable the creation of dashboards for executives to analyze customer behavior
  • HP solution
    Integrating all the pieces
  • Human information (semi- and unstructured)
    Big data solutions from HP:
    • Online storage (structured data) – block storage, file storage, online storage/tiering, snapshot/mirroring
    • Human information (semi- & unstructured) – HP large-scale configuration; X9000 and 3PAR Utility Storage; IBRIX file system
    • Servers – HP ProLiant DL-3xx Gen8 class of servers: increased performance, energy efficient, optimized for Hadoop implementations
    • Data warehouse and analysis – HP Vertica Analytics: 50–1000 times the query speed of a conventional SQL database
    • Search and advanced analytics – HP Autonomy: supports 1000+ content repositories and search across 400 file formats
    • HP Technology Services Consulting – experience and results
    Benefits: high throughput, real-time processing, low latency, faster access to relevant data
  • HP Big Data Strategy Workshop
    Helping you make big data work for your organization
    Problems solved:
    • Impacts of rapid data growth (volume, variety, velocity, value)
    • Impact of advanced analytics and exploratory analytics on the business
    • Significance of backup and recovery, data security and compliance for the business
    • Harnessing data as a rich repository of information
    Offering – a 3-day workshop covering:
    • Enterprise search – focuses on many enterprise search systems
    • Implementation and integration of Hadoop distributions
    • Advanced analytics/exploratory analytics with Hadoop, Vertica and Autonomy
    • Big data protection – securing, archiving and protecting data
    Benefits:
    • Understand the big data landscape, its challenges, benefits and critical success factors
    • Define a strategy, create a roadmap
    • Assess how and when to use Hadoop
    • Integrate structured and unstructured data collections with use cases
    • Determine when and how big data needs to be protected, archived and secured
  • HP Roadmap Service for Hadoop™
    Plan for success
    Problems it solves:
    • Builds a strategy to head in the right direction and avoids fixing false starts
    • Creates a shared vision
    • Builds understanding of the sources & sensitivity of data
    • Identifies organizational inhibitors
    • Addresses risk & mitigation
    • Develops a roadmap for successful planning, deployment and support of a Hadoop platform (traditional, private cloud, managed cloud or public cloud)
    Offering:
    • Effective planning and implementation of a Hadoop strategy & deployment
    • Methodical approach to roadmap building
    • Executable roadmap with recommended investments, timeline and risk mitigations
    Benefits:
    • Reduces the time, cost and risk of successfully deploying Hadoop
    • Leverages proven success managing extremely large HPC and Hadoop clusters
    • Creates synergies with HP Vertica's analytic database and HP Autonomy's meaning-based computing platform
  • TS Consulting big data follow-on service offerings
    Project management phases: initiate → plan → develop → manage → operate/improve
    Activities: analyze/explore → architect & validate → detailed design → implement/develop → archive/protect
    Service offerings:
    • Big Data Discovery Workshop
    • Big data explore / design / architect
    • Data profiling / data tiering
    • Data archiving
    • Big Data Integration Service
    • Big Data IT Assurance Service
    • Big Data Analytics Implementation Service
    • Big data monitoring, maintenance and operations support of big data software
  • Products in this solution: IDOL + Vertica + Hadoop
    The ideal platform for social graphing and analytics: executive dashboards, OEM, explore and mobile access over structured, semi-structured, human, social and extreme data via connectors
  • Q&A
  • Find out more
    Attend these sessions:
    • TB3011: Designing a storage cloud – 6/6, 4:00 PM
    • TB2957: Big Data Analytics – 06/05/2012, 2:45 PM
    • BB3053: New HP Data Migration Service – Tuesday 06/05/2012, 11.15 PM
    Visit these demos:
    • KM: HP storage services transformational journey – Converged Infrastructure Pavilion / Management
    • KL: HP Storage Efficiency Analysis – Converged Infrastructure Pavilion
    After the event:
    • Contact your sales rep
    Your feedback is important to us. Please take a few minutes to complete the session survey.
  • Thank you