Getting Started with Big Data: Planning Guide


Getting started with big data initiatives is easier with this practical guide for IT managers who want to implement the Apache Hadoop framework.

  1. Getting Started with Big Data: How to Move Forward with Apache Hadoop* Software
      INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY
  2. Five Things to Know
      1. Big data is a disruptive force that can drive competitive advantage.
      2. Apache Hadoop* software is an emerging technology for big data analytics.
      3. There are two approaches to implementing big data projects.
      4. Intel® technologies and software support big data.
      5. Optimize and tune your big data environment for best performance.
  3. Big Data Volume, Variety, and Velocity
      Volume: Data sets that are orders of magnitude larger than you have handled before
      • The digital universe of data could reach 8 zettabytes by 2015.[1]
      • That equals the data held by 18 million U.S. Libraries of Congress.[2]
      Variety: More diverse data types, including:
      • Structured (transactions, customer information)
      • Semistructured and unstructured (web logs, e-mails, documents, images, video)
      Velocity: Data arriving faster than ever before
      • Real-time streaming data
      [1] Gens, Frank. "IDC Predictions 2012: Competing for 2020." IDC (December 2011).
      [2] "Big Data Infographic and Gartner 2012 Top 10 Strategic Tech Trends." Business Analytics 3.0 (blog) (November 11, 2011).
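Dividing one cited volume estimate by the other shows the scale the comparison implies. This is a back-of-the-envelope check, assuming decimal units (1 ZB = 10^21 bytes, 1 TB = 10^12 bytes); the result is only what the two cited figures imply together, not an independent measurement of the Library of Congress:

```latex
\frac{8\ \text{ZB}}{18 \times 10^{6}\ \text{Libraries}}
  = \frac{8 \times 10^{21}\ \text{bytes}}{1.8 \times 10^{7}}
  \approx 4.44 \times 10^{14}\ \text{bytes}
  \approx 444\ \text{TB per Library of Congress}
```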
  4. Getting Bigger: Billions of Connected Devices and Internet Users
      By 2016, 19 billion connected devices, including 3.4 billion Internet users and machine-to-machine connections, will contribute to the flood of big data.
      Source: Savitz, Eric. "Cisco Predicts the Rise of the Zettabyte Era." Forbes (May 30, 2012).
  5. The Reason for All the Buzz: Big Data Drives Competitive Advantage
      The real value of big data is in the insights it produces when analyzed:
      • Finding patterns
      • Deriving meaning
      • Making decisions
      • Responding to the world with intelligence
  6. The Apache Hadoop* Framework: An Emerging Approach to Big Data Analytics
      Open-source software that provides a simple programming model for distributed processing of large data sets
      • Provides massively scalable storage and data processing (not a database) built on clusters of computers
      • Supplements your existing systems by handling data that is typically a problem for them:
        - Too large
        - Unstructured
        - Mix of types
        - Real-time streaming
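The "simple programming model" in Hadoop is MapReduce: a map function emits key-value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. The following is a minimal single-process sketch in Python of the classic word-count job, using only the standard library. The function names are hypothetical, and this is not the Hadoop Java API, which distributes the same three phases across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = ["big data needs big clusters", "Hadoop processes big data"]
    print(word_count(docs))
```

In a real Hadoop job, map tasks run in parallel on the nodes that hold each input split, and the framework moves the grouped pairs across the network before the reducers run.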
  7. Apache Hadoop* Breakthroughs: Advantages over Traditional Systems
      It handles all kinds of data.
      • No need to develop specific schemas.
      It scales quickly and affordably.
      • Add more servers and storage as you need them.
      It reveals new insight.
      • Find hidden relationships that were difficult, or even impossible, to find in the past.
      It reduces costs.
      • Open-source software that runs on standard servers.
      • Lower cost per terabyte for storage and processing.
      It delivers higher availability.
      • Fault tolerant; designed to recover from hardware, software, and system failures.
      It lowers organizational risk.
      • Apache Hadoop* innovations continue through an active and diverse global community.
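The point that there is "no need to develop specific schemas" is often called schema-on-read: raw records are stored as they arrive, and each analysis decides how to interpret them. The following is a minimal Python sketch. It assumes a hypothetical mix of JSON events and comma-separated log lines. The field names and the `read_with_schema` helper are illustrative only and are not part of Hadoop.

```python
import json
from typing import Optional

# Hypothetical raw records, stored exactly as they arrived; no schema was declared up front.
RAW_RECORDS = [
    '{"user": "alice", "action": "login"}',
    "bob,purchase,19.99",
    "not a recognizable record",
]


def read_with_schema(raw: str) -> Optional[dict]:
    """Apply a schema at read time: normalize one raw line to {user, action}.

    Lines that fit neither assumed format return None and are skipped by the
    caller, instead of being rejected when the data is loaded.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        obj = None
    if isinstance(obj, dict):
        # JSON record: keep it only if it carries the fields this analysis needs.
        if "user" in obj and "action" in obj:
            return {"user": obj["user"], "action": obj["action"]}
        return None
    # Otherwise, assume a comma-separated log line: user,action[,extra fields...]
    parts = raw.split(",")
    if len(parts) >= 2:
        return {"user": parts[0].strip(), "action": parts[1].strip()}
    return None


if __name__ == "__main__":
    parsed = [rec for rec in map(read_with_schema, RAW_RECORDS) if rec is not None]
    print(parsed)
```

The trade-off is that errors surface at query time rather than at load time, so each job must decide how to handle records it cannot interpret.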
  8. Two Approaches to Apache Hadoop*: What's Right for Your Organization?
      1. Apache Hadoop* software-only deployments
         • Free Apache Hadoop open-source software
         • Vendor distributions that prepackage Hadoop* software with value-added enhancements and services
      2. Hadoop software integrated with traditional databases
         • Extend existing data warehousing and analytics platforms to include Hadoop software
  9. Apache Hadoop* Deployment: Put the Right Infrastructure in Place
      • Clusters of standard servers
      • 10 Gigabit Ethernet networking
      • Intelligent storage
      • Apache Hadoop* software
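Sizing the cluster of standard servers depends partly on HDFS block replication. By default, HDFS stores each block three times (`dfs.replication` = 3), so raw disk capacity must exceed the data volume by that factor, plus headroom. The following is a rough sizing sketch in Python. The 25% headroom for intermediate and temporary data and the per-node disk figure in the example are assumptions for illustration, not Intel or Apache recommendations.

```python
import math


def nodes_needed(data_tb: float, disk_per_node_tb: float,
                 replication: int = 3, headroom: float = 0.25) -> int:
    """Estimate how many data nodes are needed to hold `data_tb` terabytes.

    replication: copies of each block (HDFS defaults dfs.replication to 3).
    headroom: extra fraction of raw capacity reserved for intermediate and
              temporary data (an assumed planning margin, not a Hadoop setting).
    """
    if data_tb < 0 or disk_per_node_tb <= 0 or replication < 1 or headroom < 0:
        raise ValueError("invalid sizing input")
    raw_tb = data_tb * replication * (1 + headroom)
    return math.ceil(raw_tb / disk_per_node_tb)


if __name__ == "__main__":
    # Hypothetical example: 100 TB of data on servers with 24 TB of disk each.
    print(nodes_needed(100, 24))
```

Disk capacity is only one constraint. CPU, memory, and the 10 Gigabit network also limit how much work each node can do, so treat the result as a lower bound.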
  10. Intel® Technologies for Big Data: Get Maximum Performance
      • Server clusters: Intel® Xeon® processor E5 family
      • Networking: Intel® Ethernet 10 Gigabit Converged Network Adapters
      • Storage: Intel® Solid-State Drives
      • Software: Intel® Distribution for Apache Hadoop* software (Intel Distribution)[1]
      [1] Currently available in China, Taiwan, and the United States.
  11. Intel® Distribution for Apache Hadoop* Software: Enterprise-Ready for a Variety of Use Cases[1]
      Supports a wide range of analytics
      • Enhances Apache Hive* and Apache HBase* software
      Introduces graph analytics capabilities with Intel® GraphBuilder software
      • Provides a Java library for constructing graphs that help visualize data relationships
      Optimizes open-source Apache Hadoop* components
      • Takes advantage of Intel® Xeon® processor capabilities
      Adds Hadoop* security, scalability, and management enhancements
      • Tightly integrated into the platform
      Support and services from Intel and its partners
      Find out more about the Intel Distribution
      [1] Currently available in China, Taiwan, and the United States.
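The idea behind GraphBuilder is to turn flat records into a graph of relationships. That idea can be shown without its API. The following is a minimal Python sketch under assumed inputs: hypothetical (customer, product) purchase records, in which two products are linked when a customer bought both. This is not Intel GraphBuilder, which the slide describes as a Java library. The function and data names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase records: (customer, product).
PURCHASES = [
    ("alice", "laptop"), ("alice", "mouse"),
    ("bob", "laptop"), ("bob", "monitor"),
    ("carol", "mouse"),
]


def build_product_graph(records: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Connect two products with an edge weighted by how many customers bought both."""
    products_by_customer: dict[str, set[str]] = defaultdict(set)
    for customer, product in records:
        products_by_customer[customer].add(product)
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for products in products_by_customer.values():
        # Sorting gives each undirected edge a single canonical (a, b) key.
        for a, b in combinations(sorted(products), 2):
            edges[(a, b)] += 1
    return dict(edges)


if __name__ == "__main__":
    for (a, b), weight in sorted(build_product_graph(PURCHASES).items()):
        print(f"{a} -- {b}: {weight}")
```

The resulting weighted edge list is the kind of output a graph visualization or graph analytics tool can consume directly.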
  12. Apache Hadoop* Optimization: Practical Trade-offs for Hardware, Software, and System Settings
      Fine-tune your solution for best performance:
      • Maximize productivity
      • Limit energy consumption
      • Maximize resource utilization
      • Reduce operating costs
      • Lower your total cost of ownership
  13. Benchmark Performance: Intel's HiBench Suite
      • Comprehensive set of benchmark tests for Apache Hadoop* software
      • Represents important Hadoop* workloads and analytics with a mix of hardware usage characteristics
      • Available as open-source software under Apache License 2.0 at
  14. Get Started: Five Steps for IT Managers
      1. Work with your business users to articulate the big opportunities.
      2. Do your research to get up to speed on the technology.
      3. Develop use case(s) for your project.
      4. Identify gaps between current- and future-state capabilities.
      5. Develop a test environment for a production version.
  15. Big Data Planning Guide: Everything You Need to Get Started
      • Read the full planning guide at
      • Learn more about the Intel® Distribution for Apache Hadoop* software at
  16. Legal
      This presentation is for informational purposes only. THIS DOCUMENT IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION, OR SAMPLE. Intel disclaims all liability, including liability for infringement of any property rights, relating to use of this information. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted herein. Copyright © 2013 Intel Corporation. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.