Hitachi Data Systems Hadoop Solution

Hitachi Data Systems Hadoop Solution. Customers are seeing exponential growth of unstructured data from sources ranging from their social media websites to operational systems. Their enterprise data warehouses are not designed to handle such high volumes and varieties of data. Hadoop, a software platform that scales to process massive volumes of unstructured and semi-structured data by distributing the workload across clusters of servers, gives customers a new option to tackle data growth and deploy big data analysis to better understand their business. Hitachi Data Systems is launching its latest Hadoop reference architecture, which is pre-tested with the Cloudera Hadoop distribution to provide a faster time to market for customers deploying Hadoop applications. HDS, Cloudera and Hitachi Consulting will present together and explain how to get you there. Attend this WebTech and learn how to: Solve big-data problems with Hadoop. Deploy Hadoop in your data warehouse environment to better manage your unstructured and structured data. Implement Hadoop using the HDS Hadoop reference architecture. For more information on the Hitachi Data Systems Hadoop Solution, please read our blog: http://blogs.hds.com/hdsblog/2012/07/a-series-on-hadoop-architecture.html

Published in: Technology, Business

Transcript

  • 1. HITACHI DATA SYSTEMS HADOOP SOLUTION JUNE 12, 2012
  • 2. HITACHI DATA SYSTEMS HADOOP SOLUTION: WEBTECH EDUCATIONAL SERIES. Customers are seeing exponential growth of unstructured data from sources ranging from their social media websites to operational systems. Their enterprise data warehouses are not designed to handle such high volumes and varieties of data. Hadoop, a software platform that scales to process massive volumes of unstructured and semi-structured data by distributing the workload across clusters of servers, gives customers a new option to tackle data growth and deploy big data analysis to better understand their business. Hitachi Data Systems is launching its latest Hadoop reference architecture, which is pre-tested with the Cloudera Hadoop distribution to provide a faster time to market for customers deploying Hadoop applications. HDS, Cloudera and Hitachi Consulting will present together and explain how to get you there. Attend this WebTech and learn how to • Solve big-data problems with Hadoop. • Deploy Hadoop in your data warehouse environment to better manage your unstructured and structured data. • Implement Hadoop using the HDS Hadoop reference architecture.
  • 3. PRESENTERS • Shankar Radhakrishnan, Solutions Manager, Hitachi Data Systems • Sai Saiprabhu, Director, Specialized Services, Hitachi Consulting • Art Vancil, Big Data Senior Manager, Hitachi Consulting • Daniel Templeton, Partner Manager, Cloudera
  • 4. ASK BIGGER QUESTIONS. DANIEL TEMPLETON, PROGRAM MANAGER AT CLOUDERA
  • 5. Enterprise Data Evolution [Chart axis: amount of data] • Data collection & reporting • Process data faster • Store data more cost-effectively • Simplify infrastructure • Combine data from across the business • Ask new questions immediately • Enable new real-time applications. Stages: IMPROVE OPERATIONAL EFFICIENCY, CREATE COMPETITIVE ADVANTAGE
  • 6. Data Has Changed in the Last 30 Years [Chart: data growth, 1980 to 2012] Drivers: end-user applications, the internet, mobile devices, sophisticated machines. Structured data: 10%. Unstructured data: 90%.
  • 7. Data Management Strategies Have Stayed the Same • Raw data on SAN, NAS and tape • Data moved from storage to compute • Relational models with predesigned schemas
  • 8. Too Much Data, Too Many Sources • Can’t ingest fast enough
  • 9. Too Much Data, Too Many Sources • Can’t ingest fast enough • Costs too much to store
  • 10. Too Much Data, Too Many Sources • Can’t ingest fast enough • Costs too much to store • Exists in different places
  • 11. Too Much Data, Too Many Sources • Can’t ingest fast enough • Costs too much to store • Exists in different places • Archived data is lost
  • 12. Can’t Use It The Way You Want To • Analysis and processing take too long
  • 13. Can’t Use It The Way You Want To • Analysis and processing take too long • Data exists in silos
  • 14. Can’t Use It The Way You Want To • Analysis and processing take too long • Data exists in silos • Can’t ask new questions
  • 15. Can’t Use It The Way You Want To • Analysis and processing take too long • Data exists in silos • Can’t ask new questions • Can’t analyze unstructured data
  • 16. Transform The Way You Think About Data. Cloudera
  • 17. The Cloudera Approach: Meet enterprise demands with a new way to think about data.
    THE OLD WAY: Multiple platforms for multiple workloads. COMPLEX, FRAGMENTED, COSTLY • Data silos by department or LOB • Lots of data stored in expensive specialized systems • Analysts pull select data into EDW • No one has a complete view
    THE CLOUDERA WAY: Single data platform to support BI, reporting & app serving. SIMPLIFIED, UNIFIED, EFFICIENT • Bulk of data stored on scalable low cost platform • Perform end-to-end workflows • Specialized systems reserved for specialized workloads • Provides data access across departments or LOB
  • 18. Hadoop Complements the Data Warehouse [Diagram: OLTP and enterprise applications feed both platforms. Data warehouse: query (high $/byte), business intelligence, operational BI. Cloudera: store, query, transform, ETL, math, load, archive; used for archival data, exploration, analytics.]
  • 19. Cloudera Enterprise: The Platform for Big Data. A Revolutionary Solution Built on Apache Hadoop • BRINGS STORAGE & COMPUTE TOGETHER • WORKS WITH EVERY TYPE OF DATA • CHANGES THE ECONOMICS OF DATA MANAGEMENT. Workflow: INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE. Components: CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT
  • 20. CDH4: Big Data Storage, Processing & Analytics Based on Apache Hadoop. 1. Store: Land structured and unstructured data in a scalable, cost-effective repository. 2. Process & Analyze: Transform data in parallel and query at the speed of thought. 3. Integrate: Interoperate with existing platforms, systems and applications.
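The "transform data in parallel" step on the slide above can be illustrated with a toy map/shuffle/reduce word count. This is only a sketch of the general programming model that Hadoop popularized, not Hadoop or CDH code: the function names are hypothetical, and a local thread pool stands in for the cluster nodes that would each process one input split.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key across all map outputs."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits, workers=4):
    # Assumption: each split is mapped independently, so the map calls can
    # run concurrently. Threads here stand in for worker nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    splits = [["big data big"], ["Data warehouse"]]
    print(word_count(splits))
```

Because each split is mapped independently, adding workers (or, on a real cluster, nodes) increases throughput without changing the logic, which is the scale-out property the deck keeps returning to.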
  • 21. Cloudera Manager: End-to-End Administration for CDH. 1. Deploy: Install, configure & start your cluster in 3 simple steps. 2. Configure & Optimize: Ensure optimal settings for all hosts & services. 3. Monitor, Diagnose & Report: Find & fix problems quickly, view current & historical activity & resource usage.
  • 22. Cloudera Navigator: Data Management Layer for Cloudera Enterprise. 1. Audit & Access Control (AVAILABLE NOW): Ensuring appropriate permissions and reporting on data access for compliance. 2. Exploration & Lineage (COMING SOON): Finding out what data is available, what it looks like and where it came from. 3. Lifecycle Management (COMING SOON): Migration of data based on policies.
  • 23. Cloudera Support: Our Team of Experts on Call to Help You Meet Your SLAs. 1. Extend Your Team: Get a dedicated team at your disposal to help you solve problems quickly. 2. Leverage the Experts: Take advantage of our expertise to make sure your cluster operates at its best. 3. Influence Roadmaps: Get advocacy with the open source community to build the features and functionality you need.
  • 24. Cloudera Enterprise: The Best Hadoop-Based Platform
    CDH4 • The only solution with real time query (Impala) • The only solution with HDFS high availability • The most widely deployed & proven • The broadest ecosystem of certified partners • 100% open source & built for the enterprise
    Cloudera Manager • Management for the complete Hadoop system • The most mature & functionally advanced • The easiest to use w/built-in intelligence • Integration w/enterprise monitoring tools
    Cloudera Navigator • The only data management tool for Hadoop • Cloudera Navigator 1.0: Data audit & access control
    Cloudera Support • Dedicated team with a global presence • Contributors and committers for every part of CDH • Tens of thousands of nodes under management across industries
  • 25. A Complete Solution. Workflow: INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE. Components: CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT. CLOUDERA UNIVERSITY: DEVELOPER TRAINING, ADMINISTRATOR TRAINING, DATA SCIENCE TRAINING, CERTIFICATION PROGRAMS
  • 26. CHOOSING THE RIGHT INFRASTRUCTURE FOR HADOOP. SHANKAR RADHAKRISHNAN, SOLUTIONS PRODUCT MANAGER – ORACLE, SAP HANA AND BIG DATA SOLUTIONS. © Hitachi Data Systems Corporation 2013. All Rights Reserved.
  • 27. HADOOP APPLICATION EXAMPLE: GENOME ANALYSIS. National Institute of Genomics – Japan • Challenge: Accelerate the speed of analysis for genome data from next-generation sequencers • 4 PB of data • Solution: 115-node Hadoop cluster using Hitachi Compute Rack servers; reliable and scalable solution
  • 28. PROACTIVE MAINTENANCE AT HITACHI SERVER DIVISION. Data sources: user inquiry, hardware auditing log, call center log, maintenance report, CRM customer data, sales/financial data, distribution/stock data, location information, server log, operation history, BOM data, production data of business system. Challenge: • Proactive hardware maintenance from logs, call center data, and product information • Leverage historical data for future product development. Solution: Hadoop + SAP HANA + SAP Visual Intelligence
  • 29. INFRASTRUCTURE REQUIREMENTS FOR HADOOP. Drivers: DATA GROWTH, COST, COMPLEXITY • Cost-effective for low-fidelity data • Increase efficiency and utilization of resources and meet required service levels • Hardware less prone to failures • Easy to manage • Scale out to handle petabytes of unstructured and semi-structured data • Keep data closer to CPU
  • 30. HADOOP IN THE ENTERPRISE: ARCHITECTURE. One Platform for All Data, All Applications. [Diagram: sources (business apps, RDB, outside services such as connecting to Facebook for CRM, and other big data sources such as email, audio and documents) pass through a data connector into the data warehouse, Hadoop and real-time computing (streaming), which feed a business intelligence dashboard for CxOs, data scientists, and business users / customers. Hadoop and real-time computing are marked as the Hitachi strength and focus.]
  • 31. INTRODUCING HITACHI REFERENCE ARCHITECTURE FOR HADOOP: ENTERPRISE-READY INFRASTRUCTURE FOR HADOOP • Pretested and validated for interoperability, performance, and scalability • Flexible: customize to fit application • Pre-validated using Cloudera, leading Hadoop distribution (certification in progress) • Complementary to existing Hitachi platforms for block, file, and object • Seamless management integration with other Hitachi solutions. [Diagram: name node + job tracker, secondary name node, and data nodes (HDFS data node + task tracker) connected by a LAN and a management LAN]
  • 32. REFERENCE ARCHITECTURE: HARDWARE COMPONENTS
    Qty | Form factor | Component | Description
    1 | 1U | Management node | Hitachi server CR 210H: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 5 x 3.5-inch 3TB NL-SAS 7200 RPM
    1 | 2U | HDFS master name node (name node, job tracker) | Hitachi server CR 220S: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 12 x 3.5-inch 3TB NL-SAS 7200 RPM
    1 | 2U | Secondary name node | Hitachi server CR 220S: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 12 x 3.5-inch 3TB NL-SAS 7200 RPM
    As needed | 2U | Data nodes (data node, task tracker) | Hitachi server CR 220S: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 12 x 3.5-inch 3TB NL-SAS 7200 RPM
    2 | 1U or 2U | Ethernet switches (10 GbE network) | Cisco Nexus 5548 (48 x GigE / 10GigE) or Brocade VDX 6720-60 (40 x GigE / 10GigE, form factor = 2U)
    [Rack diagram: 42U rack with Switch-1 and Switch-2 (1U each), a 1U server, and 2U CR 220S servers with internal HDDs]
    Why Compute Rack Servers? • High density (2U), high processing power (2 CPU sockets), large data storage (12 HDD) • Redundant power supplies • Eco-friendly power saving capabilities
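Since data nodes are listed "as needed", a rough sizing calculation follows from the per-node disk configuration in the table (12 x 3TB). The sketch below is a back-of-the-envelope estimate, not an HDS sizing tool: the replication factor of 3 is the HDFS default rather than a figure from the deck, and `reserved_fraction` (space set aside for the OS, logs and temporary job data) is a hypothetical parameter.

```python
import math

DISKS_PER_DATA_NODE = 12  # from the hardware table
DISK_SIZE_TB = 3          # from the hardware table
HDFS_REPLICATION = 3      # assumption: HDFS default replication factor


def raw_tb_per_node(disks=DISKS_PER_DATA_NODE, disk_tb=DISK_SIZE_TB):
    """Raw disk capacity of one data node, in TB."""
    return disks * disk_tb


def usable_tb_per_node(replication=HDFS_REPLICATION, reserved_fraction=0.0):
    """Unique data one node can hold once every block is replicated."""
    return raw_tb_per_node() * (1 - reserved_fraction) / replication


def data_nodes_needed(dataset_tb, replication=HDFS_REPLICATION,
                      reserved_fraction=0.0):
    """Smallest number of data nodes that can store dataset_tb of data."""
    per_node = usable_tb_per_node(replication, reserved_fraction)
    return math.ceil(dataset_tb / per_node)


if __name__ == "__main__":
    print(raw_tb_per_node())       # raw TB per data node
    print(usable_tb_per_node())    # TB of unique data per node
    print(data_nodes_needed(120))  # nodes for a 120 TB dataset
```

Under these assumptions each data node contributes 36 TB raw, or 12 TB of unique data at three-way replication. Real deployments would also leave headroom for growth and intermediate job output.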
  • 33. REFERENCE ARCHITECTURE: SOFTWARE COMPONENTS. Tested Software
    Component | Version | Description
    Operating system | 6.3 | Red Hat or CentOS 64-bit Linux distribution
    Hadoop distribution | CDH4 | Cloudera Hadoop distribution
    Hadoop management | 4.0.1 | Cloudera Manager
    Management framework | n/a | Hitachi Compute Systems Manager
    [Diagram: name node + job tracker, HA name node, and data nodes (HDFS data node + task tracker) connected by a LAN and a management LAN]
    Reference Architecture White Paper Targeted for June 2013
  • 34. WHY HITACHI FOR HADOOP INFRASTRUCTURE • Enterprise-ready (RAS) for Hadoop: less worry about hardware failure, more focus on business value • Seamless management integration with Hitachi solutions: lower opex • Competitive pricing with commodity hardware: lower capex • One platform solution for all your data volumes, velocity and types: lower TCO, faster ROI for your big data initiatives
  • 35. HITACHI CONSULTING. SAI SAIPRABHU, DIRECTOR, SPECIALIZED SERVICES. ART VANCIL, BIG DATA SENIOR MANAGER
  • 36. HITACHI CONSULTING. As the global consulting company of Hitachi, Ltd., Hitachi Consulting brings business visions to life through in-depth industry expertise combined with innovative technology solutions and services. From articulating strategy through deploying and maintaining applications, Hitachi Consulting helps clients quickly realize measurable business value and achieve sustainable ROI. The Hitachi Consulting client base includes 35 percent of the Fortune 100 and 25 percent of the Fortune Global 100, along with many mid-market leaders. With offices in North America, Europe, the Middle East, and Asia, the company employs more than 5,000 professionals, with delivery centers in India and China for global delivery scale.
  • 37. WHAT DO WE SEE WITH OUR CLIENTS? • Business Objectives Refinement • Technology Adoption Without Disruption • Data Science Practice Adoption • Business Intelligence Jump Start With Big Data Technologies • Emerging Businesses • Business Intelligence Practice Adoption
  • 38. DO YOU NEED AN EXECUTIVE SPONSOR? • The Internet has driven most businesses, across almost every industry, to demand better information much faster than ever before. • Examples: Retailers can influence the next shopping visit based on analytics; Amazon can tailor a shopping visit on a variety of dimensions (personalization, price incentives, product combinations, etc.). How will similar dynamics impact your company? Perhaps your company has not yet started using Hadoop for big data initiatives. Or perhaps you are stuck in "discovery mode", trying to find that golden-nugget big idea from big data. If your company is like mine, you will not be given permission to simply play with Hadoop for months on end. In most companies, your time spent on a project needs to be backed by someone with a budget who wants to get something done. Let's look at successful methods to secure your big data executive sponsorship.
  • 39. HOW DO I GET STARTED? Award-winning luck #1: Your executive brings to you the justification for big data. Award-winning luck #2: Your subject matter expert and your data scientist pore over the data until they find the “golden nugget” of justification. If you have no budget for big data, then perhaps you are waiting for a stroke of luck? Stop waiting, and begin now to collaborate with your business consultant to discover the data value and the “essence” of your big data business opportunity.
  • 40. THE NITTY-GRITTY DETAILS. Hitachi helps you to choose your big data solution by targeting the message to your sponsor’s role and asking the BIG QUESTIONS
    CEO/CSO • Predict the Future
    COO • Optimize the Business Process
    CMO • Nurture the Customer Relationship
    CFO/CTO • Deliver Faster and Cheaper
  • 41. FOR EXAMPLE. A high-end disk storage manufacturer collects daily performance data from its customers’ storage devices, but cannot effectively analyze it BECAUSE OF THE VOLUME. The big questions to ask: If we stored the data in Hadoop, then • Could we detect operational patterns that predict device failure worldwide? • Could we anticipate the failure AND suggest a replacement without downtime? • Could we sell the data analysis back to the customer for a fee? • Could we reduce the support effort by delivering proactive notifications? • How much revenue would we gain/costs would we eliminate?
  • 42. SOLUTION SELECTION FRAMEWORK. The solution discovery and evaluation process is a top-down survey of organizational leadership followed by a prioritization and ranking, based upon business value and organizational priorities. [Funnel diagram: All Possible Solutions and Purposes, narrowed to Feasible Solutions, narrowed to Prioritized Big Data Solution Selection]
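The funnel on this slide (all candidates, then feasible ones, then a ranked shortlist) can be sketched in a few lines. Everything here is illustrative: the `Candidate` fields, the 1 to 5 scoring scale, and the 60/40 weighting between business value and organizational priority are assumptions made for the example, not part of the Hitachi Consulting framework.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    feasible: bool        # outcome of the feasibility screen
    business_value: int   # hypothetical 1-5 score from the leadership survey
    org_priority: int     # hypothetical 1-5 score from the leadership survey


def prioritize(candidates, value_weight=0.6, priority_weight=0.4):
    """Drop infeasible candidates, then rank the rest by weighted score."""
    feasible = [c for c in candidates if c.feasible]
    return sorted(
        feasible,
        key=lambda c: (value_weight * c.business_value
                       + priority_weight * c.org_priority),
        reverse=True,
    )


if __name__ == "__main__":
    survey = [
        Candidate("Predict device failure", True, 5, 4),
        Candidate("Sell analysis to customers", True, 4, 2),
        Candidate("Real-time pricing", False, 5, 5),
    ]
    for rank, candidate in enumerate(prioritize(survey), start=1):
        print(rank, candidate.name)
```

Keeping the feasibility screen separate from the scoring mirrors the two narrowing stages of the funnel: an infeasible idea never reaches the ranking, however highly it scores.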
  • 43. SPONSOR CONVERSATIONS: ESTABLISHED BUSINESS INTELLIGENCE ENVIRONMENT • Specific use cases that address chosen pain points to be tackled using big data capabilities • Measures that show how the use cases alleviate current pain points • External expertise needed to augment your big data jump start • Action plan to implement prioritized use cases and evaluate larger adoption of big data capabilities • Executive sponsor buy-in • Executive sponsor oversight • Funding
  • 44. LEVERAGE BIG DATA CAPABILITIES. Extends Existing Data Management Environment • Extend Historical Transactions Availability • Extend Data Staging, Volume Processing and Complex Data Processing • Extend Complex Data Processing. Introduces New Analytic Capabilities • Ability to Process Large Volumes • Flexibility and Complexity Management • Leverage Emerging Capabilities
  • 45. BIG DATA TECHNOLOGIES: ADOPTION STRATEGY. Protect Existing Investments That Are Already in the Right Place. Introduce Big Data Technologies to Enable New and Evolving Business Needs. [Diagram: existing transactional sources and social media sources flow (batch or stream; stream and organize) into structured data management and existing data management, a current (limited) augmentation to structured data management, and a big data appliance. These support existing analytic capabilities, sporadic analytic capabilities, and big volume, high velocity and unstructured data analyses, leading to enterprise analytics. Numbered steps: protect investments as needed; introduce new capabilities; introduce, consolidate and expand new capabilities; streamline as the environment matures; expand as demand grows.]
  • 46. SPONSOR CONVERSATIONS: EMERGING BUSINESS INTELLIGENCE ENVIRONMENT • Business intelligence competencies needed to attain and sustain competitive edge • Measures that help monitor business operations alignment with business strategies • External expertise needed to augment your big data and business intelligence jump start • Action plan to implement and evaluate larger adoption of big data business intelligence capabilities • Executive sponsor buy-in • Executive sponsor oversight • Funding
  • 47. NEXT STEPS • Hitachi Unified Compute Platform for Business Analytics web page • http://www.hds.com/products/hitachi-unified-compute-platform/business-analytics.html • Contact your HDS sales rep for more information
  • 48. QUESTIONS AND DISCUSSION
  • 49. UPCOMING WEBTECHS • Take SAP HANA From Proof of Value Through Production Deployment, June 20, 9 a.m. PT, noon ET • A Cloud You Can Trust: Improve Datacenter Efficiency and Agility, June 26, 9 a.m. PT, noon ET. Check www.hds.com/webtech for • Links to the recording, the presentation, and Q&A (available next week) • Schedule and registration for upcoming WebTech sessions
  • 50. THANK YOU