Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Series: Gartner


You will learn how to understand key challenges when deploying a Hadoop cluster in production, manage the entire Hadoop lifecycle using a single management console, and deliver integrated management of the entire cluster to maximize IT and business agility.

Published in: Technology, Education


Speaker notes
  • The 21st-century CIO will effectively have three major information management issues coming together in an almost coordinated, simultaneous "perfect storm." "Big data" is a first taste of the extreme information challenges that will become increasingly difficult to address: data volumes so large that, even as storage technology and network infrastructures increase in capacity, the data simply grows at a faster rate. In addition to volume, information assets will exhibit irregular rates of change, governance rules will become increasingly fluid, new asset types will continue to emerge, and the desire of analysts at all levels to use that data will increase exponentially. In combination, these begin to define the extreme information management environment, and any one of its 12 dimensions can overwhelm your existing systems, let alone two or more in combination. The combination of consumerization and mobility has only begun to explore the many different types of information assets that will be introduced, as well as the many information "create" or "write" situations that will occur. This is only one obvious example of the proliferation of information sources that highlights the importance of understanding how information assets are linked together and how inherently reliable they are. Comprehensive, complete and deliberate architectures are the only hope for creating standards or guidance to deal with the massive "news to noise" ratios that are pending and the wide diversity of information assets and the resulting use cases. Action Item: Determine whether the organization will develop in-house architectural expertise to lead the enterprise strategy through hiring and training, or will follow a strategy of outsourcing for this expertise.
  • By far, Hadoop is the technology getting the most attention in the market for analytic use cases involving big data at rest. Its early usage in Web 2.0 companies was largely designed for batch processing of large volumes of data, data-mining style, so it is not surprising that this should be so. And while there is more to Hadoop than this, client inquiries fielded by the information management team reinforce our position that it is the most typical catalyst for Hadoop consideration.
  • While its acquisition costs are lower, using open-source Apache Hadoop entails bigger risks than traditional commercial database environments: it is less mature, it is fragmented, and Apache lacks a commercial support organization. Other open-source offerings have a similar profile, but typical Hadoop implementations consist of more independent "moving parts" and version numbers than most. Enterprises can construct and maintain a Hadoop solution themselves if they have the time, resources and expertise to do so. Preferably, they will choose a commercial Apache Hadoop distribution whose component projects are pre-integrated and may be backed by the vendor's support. More mature distributions provide scripts to install all the pieces, often with a graphical installer that lets the user choose the pieces appropriate for their specific needs, hardware, network setup and so on. Numerous vendors offer Hadoop distributions with pre-integrated components, but the vendors have varying levels of credibility and experience, and offer different combinations of projects at different release stages, so data management leaders can easily go wrong. Moreover, commercial distributions include different projects along with the core Hadoop projects, and no commercial distribution includes or supports all available projects. Distributions also ship varying release levels of the included projects and update them at different rates. Thus, data management leaders run the risk of choosing a Hadoop solution that does not meet enterprise needs.
  • Choose one of two options, depending on your needs and circumstances: build a custom stack, or use a distribution from a major Apache Hadoop provider. In the latter case, subscribing to support is recommended.
    ■ Choose your approach (custom or distribution) based on your team's skills, whether this is a tactical experiment or a strategic initiative, and on how well available distributions fit your use case.
    ■ Balance the enterprise's longer-term needs with immediate pressures to deliver, and consider projects that may be needed for future initiatives.
    ■ Evaluate the distribution vendor as a whole if you plan to run Apache Hadoop for the long term. Look at financial viability, support capabilities, partnerships and future technology plans. Above all, talk to reference customers; treat distributions as you would development tools or workbenches.
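  • The batch-processing model the notes above describe (data-mining-style processing of large volumes of data at rest) boils down to Hadoop's map, shuffle and reduce phases over key-value pairs. A minimal local sketch of that model, counting words as the canonical example; this simulates the phases in plain Python and is not the Hadoop API itself:

```python
# Local simulation of the MapReduce phases Hadoop runs at scale.
from collections import defaultdict

def map_phase(records):
    """Mapper: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: aggregate (here, sum) the values for each key."""
    return {word: sum(counts) for word, counts in grouped.items()}

if __name__ == "__main__":
    lines = ["big data needs big infrastructure", "data at rest"]
    counts = reduce_phase(shuffle(map_phase(lines)))
    print(counts["big"], counts["data"])  # 2 2
```

    In a real cluster the mappers and reducers run in parallel on many nodes and the shuffle moves data over the network; the logic per phase, however, is exactly this shape.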
  • Transcript

    • 1. MODERNIZING YOUR IT INFRASTRUCTURE WITH HADOOP. Merv Adrian, VP Research, Gartner; Charles Zedlewski, VP Product, Cloudera.
    • 2. "Big Data" Crystallizes Extreme Information Management Challenges. "Big Data" refers to high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making. They require extreme information management: the concept that your current information infrastructure must be intentionally managed along 12 complementary dimensions (perishability, fidelity, validation, linking, classification, contracts, technology, pervasive use, velocity, volume, variety, complexity) to meet the challenges of the 21st-century Information Age.
    • 3. Where Does Big Data Come From? Enterprise sources: email, transactions, "dark data," and partner, employee, customer and supplier contracts. Observations: sensors, network. Public sources: weather, population, economic. Commercial sources: credit. Social media: sentiment. Correlations and patterns from disparate, linked data sources yield the greatest insights and transformative opportunities.
    • 4. The Big Data Challenge: Putting Together the Pieces Quickly and Efficiently. The pieces: infrastructure, leadership, analytics, investment, organization and architecture, alongside inertia, risks and skills. Through 2015: 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage, and business analytics needs will drive 70% of investments in the expansion and modernization of information infrastructure.
    • 5. Interest in "Big Data" Is Rising Rapidly — Though "Hadoop" Remains Steady. [Chart: client searches for "Big Data" and "Hadoop" on gartner.com over the 12 months ending June 12.]
    • 6. Hadoop's a Good Idea, but Confusion Has Slowed Commercial Adoption. IT organizations are confused, but want to make a decision. The standards steward, the Apache Software Foundation, distributes some, but not all, of the "projects" included in Hadoop distributions. Many distributions exist, and they differ in their components and their "openness." Choosing the right distribution should be driven by business analytics needs.
    • 7. Maturity Is Growing and Will Spur Adoption as Marketing Ramps Up. A wave of new releases, beginning with Apache 1.0 and 2.0 (alpha), has crystallized the platform. Current versions are beginning to address early problems with availability, security and performance. Distribution vendors add the features and support enterprises require. A rapidly evolving ecosystem includes differentiated distributions, data integration, business intelligence and services vendors.
    • 8. Recommendations. Don't surrender to hype-ocracy: this is early, maturing technology. Deploy when you have a clear use case, not "to play with the technology." Use a commercially supported distribution when you move to production; until then, experiment "in the cloud" or on existing hardware with "free" downloads. Choose the distribution based on business need, and leverage specific, supported projects. Skills will be key: define "data science" needs and hire and train for them. Manage your big data, don't abdicate: practice extreme information management.
    • 9. CLOUDERA: THE STANDARD FOR APACHE HADOOP IN THE ENTERPRISE. Charles Zedlewski, VP Product.
    • 10. Six pillars: (1) High availability: there's no downtime; your data is always available for decisions. (2) Granular security: process and control sensitive data with confidence. (3) Robust management: achieve optimal performance via centralized administration. (4) Scalable and extensible: adapts to your workload and grows with the business. (5) Certified and compatible: extend and leverage existing infrastructure investments. (6) Global support and services: achieve SLAs and adhere to existing IT policies.
    • 11. Use cases by vertical, paired as advanced analytics / data processing. Web: social network analysis / clickstream sessionization. Media: content optimization / engagement. Telco: network analytics / mediation. Retail: loyalty & promotions analysis / data factory. Financial: fraud analysis / trade reconciliation. Federal: entity analysis / SIGINT. Bioinformatics: sequencing analysis / genome mapping.
    • 12. CDH4, Cloudera's Distribution Including Apache Hadoop (CDH): storage, computation, access and integration; a Big Data storage, processing and analytics platform based on Apache Hadoop, 100% open source. Cloudera Enterprise 4.0 adds Cloudera Manager (deployment, configuration, monitoring, diagnostics and reporting), an end-to-end management application for the deployment and operation of CDH, and Production Support (issue escalation, resolution processes, knowledge base, optimization): a dedicated team of experts on call to help you meet your Service Level Agreements (SLAs). Cloudera University equips the Big Data workforce, with 12,000+ trained. The Partner Ecosystem spans 250+ partners across hardware, software, platforms and services. Professional Services covers use-case discovery, pilots, and process and team development.
    • 13. All the industry leaders integrate with CDH, across BI/analytics, data integration, database, OS/cloud/systems management and hardware.
    • 14. Cloudera Manager, the end-to-end management application for Apache Hadoop: (1) Deploy: install, configure and start your cluster in 3 simple steps. (2) Configure & manage: ensure optimal settings for all hosts and services. (3) Monitor, diagnose & report: find and fix problems quickly; view current and historical activity and resource usage.
    • 15. Required skills: Linux admin or DBA background; Java knowledge; networking knowledge. Responsibilities: keep important workloads within SLA; install, configure and upgrade Hadoop; monitor system health and performance; plan for the future.
    • 16. CLOUDERA MANAGER DEMO
    • 17. Q&A
    • 18. Thank you! Register now for the remaining 'Power of Hadoop' webinars: Realizing the Promise of Big Data with Hadoop (Forrester and Cloudera, Thursday, July 26, 11 AM PST); What the Hadoop: Why Your Business Can't Afford to Ignore the Power of Hadoop (GigaOM and Cloudera, Wednesday, August 29, 10 AM PST); The Business Advantage of Hadoop: Lessons from the Field (451 Research and Cloudera, Wednesday, September 26, 10 AM PST).
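    • Appendix: slide 11 names web clickstream sessionization among the data-processing use cases. A minimal local sketch of what sessionization means: split each user's clicks into sessions whenever the gap between consecutive clicks exceeds an inactivity timeout. The 30-minute timeout and the data shapes here are illustrative assumptions, not values from the presentation, and this is not Cloudera's implementation:

```python
# Sessionization sketch: group clicks per user, then cut a new session
# wherever the gap between consecutive clicks exceeds the timeout.
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # assumed 30-minute inactivity timeout, in seconds

def sessionize(clicks):
    """clicks: iterable of (user_id, epoch_seconds) tuples.
    Returns {user_id: [[timestamps of session 1], [session 2], ...]}."""
    by_user = defaultdict(list)
    for user, ts in clicks:
        by_user[user].append(ts)
    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for prev, curr in zip(stamps, stamps[1:]):
            if curr - prev > SESSION_TIMEOUT:
                user_sessions.append([curr])   # gap too long: new session
            else:
                user_sessions[-1].append(curr)
        sessions[user] = user_sessions
    return sessions

if __name__ == "__main__":
    clicks = [("u1", 0), ("u1", 600), ("u1", 4000), ("u2", 100)]
    # u1's 3400 s gap exceeds the 1800 s timeout, so u1 gets 2 sessions
    print(len(sessionize(clicks)["u1"]))  # 2
```

    At Hadoop scale the same grouping-by-user is what the shuffle phase provides, with the per-user session-splitting logic running in the reducers.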