Cloudera Sessions - HP Keynote - Build Your Enterprise-Ready Foundation for Big Data

Big Data has moved from buzzword to reality. Building a strong foundation for Big Data not only makes good business sense, it's a business imperative. Learn about key findings and advice from experts on what to expect along the way and how to get started building your enterprise-ready foundation for Big Data.


  • Hello, my name is _______. I run the _______ group/business at Hewlett Packard. In the next 30 minutes I'd like to talk to you about how you can get started today on building a solid foundation for big data in your organization, some of the key challenges you might face along the way, and a list of findings from our experience helping customers build Hadoop clusters and Big Data ecosystems from scratch.
  • To put things in context, I want to start with a quote from a VP at a prominent financial services organization in the US. When asked where his organization was on the road to Hadoop adoption, he pointed out that his IT team had bought into the Hadoop vision, but setting up, maintaining, and optimizing Hadoop clusters was complex, and skills were scarce. Put another way, the time and cost of Hadoop operations were outweighing the benefits of deploying Hadoop in the short term. We've heard similar sentiments from CIOs and CMOs across industry verticals, geographies, and enterprise sizes. There is a good chance that many of you share similar concerns.
  • 2012 was an important year on the Hadoop roadmap. We saw a number of new technologies added to the ecosystem, technologies that extend Hadoop and make it more accessible to developers with traditional programming skills such as SQL. Cloudera, HP, and a number of other IT vendors made significant investments in Hadoop and formed strong partnerships to enable a seamless experience for customers such as yourselves. However, many enterprises continue to struggle to move from pilot to production successfully. Much of that can be attributed to a failure to plan in advance: they rushed into Hadoop without thinking about the broader consequences across their IT and business units. This chart lists the key considerations to keep in mind as you start building the foundation for not just Hadoop, but Big Data as a whole.
    Cost: As in the case of the VP on the previous slide, skills are an issue, and despite the rapidly growing ecosystem, they will continue to be an issue until supply meets demand. In addition, you need to think about how Hadoop and other Big Data technologies complement and fit into the business processes within your organization. That is key to maximizing the return on your investment in Big Data technologies.
    Complexity: There are multiple levels of complexity when it comes to successfully implementing and scaling your Hadoop environment. You need to consider your big data solution architecture in the context of all the touch points across your enterprise, not just one business unit or data silo. The sweet spot in mining big data for business insights is to encompass multiple, disparate data sources and data types, and to deliver actionable, timely insights to end users. You need to think from day 1 about how Hadoop integrates with the rest of your BI ecosystem from a business perspective, and into your data center from an IT perspective.
    Speed: Time to value and time to insight have become important key performance indicators for many line-of-business leaders and IT managers alike. In today's highly competitive and fast-moving environment, it's important to be able to deploy new technology, ramp up skills, and transition to live production as soon as possible. In addition, you may need to process data in real time or near real time.
    Risk: As with any new technology, there is risk associated with production deployment and operational continuity. From a governance perspective, you need to consider the security and compliance ramifications of Hadoop and other relatively nascent, non-traditional big data technologies. Given how new the entire Hadoop ecosystem is, it's important to partner with the right software and hardware vendors for after-the-fact support and consulting to hand-hold you through the initial phase that many enterprises find extremely challenging.
  • As we saw in the previous chart, there are multiple layers of planning and decision making as you embark on your journey to build the big data hub of tomorrow. On the next chart I will share some of our key findings from the many years of success HP has had building such complete, end-to-end systems for customers across verticals and geographies. But before that, I want to quickly cover an important consideration that sometimes gets ignored when we talk about building big data systems.
    As we all know, data science is taking the world by storm, so to speak. Universities are adding new courses in this area, companies are recruiting PhDs in statistics by the dozen, and some are even creating new roles such as Chief Data Officer and Data Steward. It just goes to show how critical this field has become to corporate strategy. However, it's important to understand that data scientists are best at the analytics layer that sits on top of the underlying technology. Their time is best spent creating, testing, and deploying statistical models on data sets, not optimizing the infrastructure that supports the analytics. HP and Cloudera have invested heavily in Hadoop software and infrastructure to ensure that your data scientists spend the majority of their time on the work you are paying them to do. We have proven success in this area, and you are best served by leveraging our expertise.
    In addition, given the scale-out nature of Hadoop, there are bound to be multiple moving parts in your cluster: servers, processors, storage systems, network switches, and many other components that all need to be on their best behavior to provide you with 99.999% (five-nines) uptime; the worked figure below shows how small that downtime budget really is. You can get there by building your cluster on best-of-breed components on an open platform that allows you to configure, optimize, and grow your clusters with the granularity you desire. Remember that one size does not fit all. There are very expensive Hadoop systems out there that are over-engineered and have not been benchmarked against specific workloads. With Hadoop, you need to first understand whether your workloads are I/O-bound, CPU-bound, memory-bound, or network-bound, and accordingly tune the corresponding resources within your cluster for optimal performance and scale.
    Disaster recovery, high availability, and performance need to be central to your planning as you set out building your Hadoop clusters. Understand that cluster growth happens. There is a good chance that what starts out as a small pilot project will soon grow into an enterprise-wide deployment. Luckily, Hadoop has built-in capabilities to accommodate this growth by scaling out, enabling a smooth transition as your needs evolve. As long as you incrementally gauge your growing requirements and add resources to your clusters so they stay balanced, you will do just fine.
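A quick worked figure for context (my arithmetic, not from the deck): five-nines availability leaves a remarkably small annual downtime budget, which is why component-level balance and monitoring matter so much.

```latex
% Downtime budget implied by 99.999% ("five-nines") availability
(1 - 0.99999) \times 365.25 \times 24 \times 60\ \text{min}
  \approx 5.26\ \text{minutes of downtime per year}
```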
  • If I had to distill our findings from the many years of success we've had with large scale-out projects and big data deployments, I would shortlist them down to the following five bullets.
    The first one we touched upon in the previous slide. We have seen a strong customer desire for workload characterization and benchmarking data when it comes to big data. We characterize our customer use cases by creating representative workloads: for instance, Hive analysis of structured data for log processing, Shortest Path and PageRank for graph processing, and K-Means clustering for machine learning systems. We then translate the characteristics of those workloads into balanced reference architectures. For instance, if your workloads are primarily I/O-bound, you need to think about disk controller optimizations for I/O throughput (SAS vs. SATA disks) and the total number of disks, to avoid I/O contention degrading performance. Another example: if you had network-bound workloads, you'd need to identify network requirements and how they are affected by HDFS capacity and write-intensive applications. Starting with a deep understanding of your workloads makes the rest of your planning that much easier (a minimal sketch of this first triage step follows below).
    Next, understand that while you may be a single line of business or department embarking on a big data project, there is a good chance your project will grow into an enterprise-wide venture in a few short years, or even earlier. Just as the volume, variety, and velocity of data within and into your enterprise are growing, so are the systems that ingest, process, and archive that data.
    The third bullet is an important one as you plan your information ecosystem. It's easy to get locked into one vendor's stack, only to realize that your options down the road become fewer and fewer as many components of that stack do not play nicely with components from competitors' stacks. As you evaluate new big data technologies, ask your vendors tough and important questions about how well their solutions integrate into the rest of your existing and planned BI ecosystem. This could save you much heartache (and heartburn!) down the road.
    As I alluded to earlier, let experts stick to what they do best. HP and Cloudera are experts in data management and Hadoop software, respectively. HP has developed much expertise over the years by helping customers such as yourselves successfully deploy technologies to tame the big data beast and exploit all the opportunities it presents. We want you to leverage the foundation, both in terms of our expertise and all the good work you have put into building your BI systems until this point. If someone tells you that you need to start building your big data ecosystem from scratch, don't trust them. The truth is that big data needs to complement your existing BI systems. You have put in much work, and had much success, in building processes, systems, and user interfaces for BI. It's important to leverage that work and incorporate any additional insights big data can offer, rather than throwing away all that good work and reinventing the wheel.
    Finally, and perhaps most importantly, think holistically when it comes to Hadoop and big data. The next slide offers some insight into what we mean by that.
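To make the workload-characterization step concrete, here is a minimal sketch of the kind of first-pass triage described above. It is illustrative only: the metric names, the sampling source, and the 80% saturation threshold are my assumptions, not an HP or Cloudera tool.

```python
# Illustrative first-pass triage: classify a Hadoop workload's dominant
# bottleneck from cluster-wide average utilization samples (e.g., gathered
# with sar/iostat). Names and the 0.80 threshold are assumptions.

def dominant_bottleneck(utilization: dict, threshold: float = 0.80) -> str:
    """utilization maps a resource name to its average busy fraction in [0, 1]."""
    resource, busy = max(utilization.items(), key=lambda kv: kv[1])
    if busy < threshold:
        return "balanced: no resource saturated; scale out by adding nodes"
    return f"{resource}-bound: size {resource} first (e.g., SAS vs. SATA and controller throughput for disk)"

# Example: an I/O-heavy sort workload sampled across the cluster
print(dominant_bottleneck({"disk": 0.93, "cpu": 0.41, "memory": 0.58, "network": 0.36}))
```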
  • I want to elaborate on what I meant by thinking holistically about big data. On the left of the chart, you see a number of data sources that are rapidly populating the information ecosystem within your enterprise. In addition to transactional databases, analytic warehouses, and LoB data sets (such as HRMS, CRM, and SCM data), you may have machine-generated or human-generated data streaming in at a fast pace. In the middle is the bridge between your data sources and the end users who consume the insights from your data. That bridge is your big data infrastructure. At the bottom layer is the system and technology platform that enables your infrastructure to perform and scale at the speed you desire in order to meet your SLAs. On top of that is the big data management assurance layer that provisions, monitors, and optimizes your platform for maximum uptime, serviceability, security, and compliance as defined by your over-arching governance model. And finally, you have the information insight processing layer where analytic models can be run to mine data for insights. This could be Hadoop, Vertica, or Autonomy, to give some examples of technologies that can provide real-time or meaning-based insights to end users across the enterprise in the form of dashboards, reports, or visualizations.
  • HP has proven, quantifiable expertise designing and building large scale-out computing projects for customers over many years. We are heavily invested in big data from a software, hardware, and services perspective, and we bring all that experience to your site as you look to build or add to your big data ecosystem.
    Tabulations based on http://www.alexa.com/topsites and http://www.top500.org/ estimates.
  • Hadoop, by definition, is a distributed data processing system designed to take advantage of very large clusters of very inexpensive machines. A carefully designed, preassembled, pretested, and instrumented system is the best way to deliver these complex environments. HP offers a unique HP Cluster Manager that makes Hadoop clusters the easiest to deploy and scale. You can deploy 800 nodes in minutes, not months, with the push of a button. Not only can you speed the time to business insight by deploying quickly, you also remove complexity with push-button deployment and the ability to scale to 4,000 nodes.
    This is the only system with 3D real-time and historical analytics of bottlenecks in workloads, letting you optimize performance across the entire cluster. It allows you to see metrics and health at the component level as well as at the total system level. It is also the only system that manages with awareness from the node to the rack level.
    With the world's most self-sufficient servers, HP ProLiant Gen8, as the foundation, you get the fastest performance, so you can process and analyze your Big Data and make faster decisions that impact your business.
    Additional notes for speaker:
    10 TB TeraSort benchmark: HP performs at 120 GB/min (or 6.7 GB/min/node)
    100 GB TeraSort benchmark: HP performs at 6.21 GB/min/node
    (A back-of-envelope reading of those figures follows below.)
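Reading the quoted TeraSort figures back (derived here for illustration; the node count is inferred from the two rates, not stated in the deck):

```latex
% Back-of-envelope figures implied by the 10 TB TeraSort numbers above
\text{nodes} \approx \frac{120\ \text{GB/min}}{6.7\ \text{GB/min/node}} \approx 18,
\qquad
\text{elapsed time} \approx \frac{10{,}000\ \text{GB}}{120\ \text{GB/min}} \approx 83\ \text{min}
```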
  • You might have noticed that there is a ProLiant SL4500 server on exhibit at the demo station, the first server purpose-built for Big Data. We asked our engineers to define what it takes to better tackle the issues faced by customers who are trying to monetize Big Data and grow their business. The answer was to extend the vision of converged infrastructure in a way that hasn't been done before: maximum performance, productivity, and cost effectiveness in the ultra-dense form factor these workloads require. Built on HP Converged Infrastructure, the new server offers a highly efficient design that consumes up to 50 percent less space, 61 percent less power, and 31 percent lower cost, while using 63 percent fewer cables. Today's "siloed" architectures (cabling servers to JBODs) are all that customers have to choose from when building out infrastructure for high-growth workloads such as Hadoop, new object stores like OpenStack Swift, and the enterprise standard for messaging, Microsoft Exchange. With the advent of this Big Data software and the promise it brings, many organizations have tried to deploy existing architectures not designed to handle the specific needs of these workloads. As a result, the outcomes from these early deployments have been suboptimal from a performance and cost perspective.
    Grow your Hadoop clusters: We recognized that in the world of big data, one size does not fit all. With that in mind, the modular design of the HP ProLiant SL4500 server series offers varied compute and storage configurations that let you optimize your infrastructure for specific Big Data applications (i.e., data analytics, object stores, and Exchange) with a single, cost-effective architecture.
    Improve performance, reliability, and manageability: This server has all the embedded intelligence and automation capabilities you would expect from an HP ProLiant Gen8 server: eliminate downtime and safeguard valuable data with automated data protection and HP Predictive Spare Activation, which moves data to an alternate device before failures occur; maximize server productivity with HP Active Health; automate firmware updates with HP Smart Update; and lower data center power costs, improving compute per watt by up to 70 percent over previous generations with HP Intelligent Infrastructure.
    Streamlined management: This is not simply a bunch of drives and a server in a box. In addition to delivering up to 2.16 petabytes (PB) within an industry-standard 42U rack, we also ensured performance and resiliency with HP Smart Array technology, delivering industry-leading performance with nearly seven times faster input/output operations per second (IOPS) than existing architectures. Equally important, with the smart analytics of HP SmartCache, the system optimizes storage traffic to ensure the lowest latency response and up-front investment.
    Additional notes for speaker:
    Extreme capacity: Up to 60 hot-plug LFF drives; Smart Drive technology; ready for new drive capacities
    Scalable throughput: Smart Array experience optimizing performance, reliability, and manageability; high-speed interconnects including InfiniBand options; high-speed signaling
    Flexible compute: Up to 3 compute nodes; 4-8 cores, Intel E5-2400 or AMD 4200
    Converged design: 9 chassis in a 42U rack; pooled power, shared cooling; reduced cabling
    (A quick consistency check on the rack capacity figure follows below.)
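The capacity figures in these notes are self-consistent if you assume 4 TB drives (an inference from the stated totals, not a number given in the deck):

```latex
% 9 chassis per 42U rack x 60 LFF drives per chassis x 4 TB per drive
9 \times 60 \times 4\ \text{TB} = 2{,}160\ \text{TB} = 2.16\ \text{PB per rack}
```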
  • HP Insight CMU offers the ability to clone the disk contents of one node to other cluster nodes. This feature reduces the complexity of scaling and helps ensure a consistent configuration of compute nodes within the cluster. A cluster is often split into logical groups, which may be owned by different parts of the IT organization. HP Insight CMU can manage these logical groups by associating one or more disk images with each group. Once one node in a group has been installed and configured, HP Insight CMU builds a compressed image (called a "clone image") of this master disk. The clone image is then ready to be propagated to other members of the group using the HP Insight CMU cloning mechanism.
    To provide a scalable, onsite cluster view for a very large cluster (more than 256 nodes), the HP Insight CMU monitoring GUI provides a multi-petal view of the metrics. In one petal, the values of several nodes are aggregated; for each metric, the aggregation function can be chosen (sum, mean, max, or min).
    Optimized image propagation: after receiving the image, a node asks the image server if there are any successors waiting for upload. If there are, it starts to transfer the image to a group member while the image server uploads to a third (see Figure 15). Propagation of the image is limited to within each network entity, which optimizes the use of network resources when cloning the cluster. (A toy model of this doubling behavior follows below.)
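The propagation scheme just described means every freshly cloned node immediately becomes a sender, so the number of nodes holding the image roughly doubles with each transfer round. The sketch below is my toy model of that scaling, not actual Insight CMU code, and it ignores per-transfer time differences and network-entity boundaries.

```python
# Toy model of pipelined clone-image propagation: the image server starts
# with the image; each node that finishes receiving immediately serves a
# waiting peer, so senders roughly double every round. Illustration only.

def propagation_rounds(nodes_to_clone: int) -> int:
    """Rounds needed to clone all nodes when senders double each round."""
    cloned, rounds = 0, 0
    while cloned < nodes_to_clone:
        senders = 1 + cloned       # image server plus every cloned node
        cloned += senders          # each sender clones one peer this round
        rounds += 1
    return rounds

for n in (16, 256, 800, 4000):
    print(f"{n:>5} nodes cloned in {propagation_rounds(n)} rounds")
```

Under this model, even a 4,000-node cluster needs only about a dozen transfer rounds, which is consistent with the "deploy 800 nodes in minutes" claim earlier in the talk.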
  • We've covered a lot of ground in the last 25 minutes or so. I want to summarize my talk by outlining some of the key considerations to keep in mind as you get started on your path to an enterprise-ready big data foundation. Hadoop is one viable option for building your storage and processing layer for big data. It has gained rapidly in adoption in its few short years of existence, and its ecosystem is growing quickly. With Cloudera and HP, you have two trusted advisors who have worked together to provide you a choice of deployment models through reference architectures and the AppSystem for Apache Hadoop. In addition, HP brings years of consulting and support experience, combined with the proven success of systems management tools such as Insight CMU, along with investment in analytics software such as the Vertica database for real-time analytics and the Autonomy IDOL platform for meaning-based computing. Together, this set of solutions, services, and software eases the acquisition, deployment, and management of a fully functioning big data system, while reducing risk and total cost of ownership.
  • To get started today, please register for our structured transformation experience workshops, led by HP subject-matter experts, to help you build a common vision, lay out scope and roadmaps for your big data projects, and get started on the right track.
    Objectives:
    - Identify IT strategy to provide value from Big Data
    - Determine functionalities for integration architecture and standards
    - Take a pan-IT approach that spans security, management, operations, and standards
    - Define a unique roadmap and actionable steps to ensure success
    Benefits:
    - Unified vision for Big Data scope and initiative
    - Understand Big Data initiative, implications, and challenges
    - Provide IT leadership for Big Data initiative with line of business
    Methodology:
    - Agree upon project objectives
    - Review project scope
    - Reach a mutual level of understanding
    - Establish primary stakeholder requirements
    - Define business challenges and opportunities
    - Identify risk, inhibitors, and mitigation strategies
    - Refine requirements
    - Confirm key business drivers
    - Confirm risks, inhibitors, and mitigation strategies
    - Design recommended plan and infrastructure
    - Summarize interview and workshop findings
    - Develop roadmap and deliver executive presentation
  • To learn more about Hadoop, HP solutions for Hadoop, and our partnership with Cloudera, please visit the following web sites.
  • That’s all the time we have today. But before we close, do you have any final questions?

Presentation Transcript

  • Build your enterprise-ready foundation for Big Data
    © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Apache and Hadoop are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
  • "We want to use Hadoop, but we are spending way too much time on Hadoop operations." — Vice President, Financial Services Organization
  • Hadoop in 2013 — an inflection point
    Pilot to production takes planning
    Cost: Skills; Process management
    Complexity: Solution architecture; Integration
    Speed: Deployment; Real-time consumption
    Risk: Security and compliance; Support
  • Let your data scientists do their job
    Leave the platform engineering to us
    - Leverage existing expertise
    - Build on best-of-breed components
    - Remember: One size does not fit all
    - Plan for high availability and performance from day 1
    - Build today — plan for tomorrow
  • The inside track to Big Data
    Lessons learned from many years of proven success in the field
    1. Understand your workloads
    2. Plan for growth
    3. Embrace openness
    4. Leverage the foundation
    5. Think holistically
  • Think Holistically About Big Data
    [Diagram: Data Sources (structured: databases, warehouses, ERP, CRM, log files; unstructured: machine data, social media, customer calls, emails) feed the Big Data Infrastructure, layered as Big Data Processing over Big Data Management (protection, compliance) over the System and Technology Platform, delivering dashboards, reports, and visualization to end users]
  • We know Big Data (even before it was called "Big Data")
    - 8 of the world's 10 most trafficked websites
    - 4 of the world's 5 largest search engines
    - 3 most popular social media properties in the U.S.
    - 7 of the world's 10 largest cloud service providers
    - 8 years running as the #1 vendor in High Performance Computing (HPC)
    - 11 years of customer deployment with HP Insight Cluster Management Utility in Top 500 HPC clusters
    - Contribution to the Apache Hadoop Foundation
    - Innovations to support the Hadoop ecosystem from HP Labs
    *According to alexa.com/topsites and top500.org
  • Enterprise-ready Big Data platform: HP AppSystem for Apache Hadoop
    - Pre-integrated, pre-tested, pre-engineered: we've done all the hard work for you
    - Out of the box: not in months, but hours or days
    - Super fast: loading, sorting, and analysis
    - Easy scaling: 800 nodes in 30 minutes with CMU
    Push-button deployment | 800 nodes in minutes, not months | 10 TB at 120 GB/min
  • The ultimate Big Data server
    The right mix of compute and storage, in less space, at a lower cost
    - Grow your Hadoop clusters: extreme capacity, with SmartDrive
    - Improve performance, reliability, and manageability: smoother scaling and transition to production workloads
    - Streamline management: pooled resources and reduced cabling
    (Visit the demo)
  • Hadoop scale-out made easy: HP Insight Cluster Management Utility
    - Provision, monitor, and control: thousands of nodes instantly
    - Push-button roll out: provisioning via cloning for seamless scaling
    - Rest easy: battle-tested at Top 500 sites for over a decade
  • Building your Enterprise-Ready Big Data Foundation
    Solutions and services to address your biggest pain points:
    - Ease of acquisition
    - Rapid deployment
    - Simplified management
    - Risk-free scalability
    - Lower cost
    [Diagram: Analytics and Insights; Storage and Processing; Choice of Deployment; System Management; Consulting and Support]
  • Getting started: HP Big Data Transformation Experience Workshops
    Objectives:
    - Identify Big Data business value that can be achieved
    - Create changes in how to manage and use information
    - Build an executive-level roadmap
  • Resources
    hp.cloudera.com
    hp.com/go/hadoop
    hp.com/go/bigdata
  • Thank you