Making Hadoop Ready for the Enterprise



Open source Apache Hadoop is a great framework for distributed processing of large data sets. But there’s a difference between “playing” with big data and solving real problems. The reality is that Hadoop alone is not enough. In fact, almost every organization that plans to run Hadoop in production quickly discovers that it lacks features required for enterprise use. And few organizations have Hadoop specialists on hand to navigate the complexity and build reliable, robust applications. As a result, many Hadoop projects never make it to production, as executives say, “we just don’t have the skills.” In this session, we will discuss these enterprise capabilities and why they’re important: analytics, visualization, security, enterprise integration, developer/admin tools, and more. Additionally, we will share several real-world examples of clients who have found it necessary to use an enterprise-grade Hadoop platform to tackle some of the most interesting and challenging business problems.

  • University of Ontario Institute of Technology (UOIT case study). Fifteen million babies are born prematurely every year. Of those, over 1 million die, often in the first month of life. Many of these babies are in ICUs, connected to numerous monitors that measure key statistics such as heart rate and temperature. Until recently, these measurements were only sampled and aggregated into two or three readings to indicate the health of the baby. IBM collaborated with UOIT to develop a solution that processes 1,000 pieces of information per second, identifies patterns, correlates them with doctors’ notes and family history, and applies predictive analytics. This has allowed us to spot the onset of an infection 24 hours in advance. Same data, but saved lives.
    University of Ontario Institute of Technology: to better detect subtle warning signs of complications, clinicians need greater insight into the moment-by-moment condition of neonatal infants in an ICU. According to a global project led by the WHO, fifteen million babies, one in ten births, are born prematurely every year. Of those, over 1 million die, often in the first 30 days of life, a terrible tragedy. Yet many of these babies are in NICUs, connected to monitors that measure key statistics such as heart rate, skin temperature, and respiration. These measurements add up to about 90 million per patient per day, yet most of this data is only sampled periodically and written into the patient record, not used for its predictive value. IBM and UOIT developed a first-of-its-kind analytics solution that uses stream computing to capture and analyze real-time data from medical monitors, alerting hospital staff to potential health problems before patients manifest clinical signs of infection or other issues. Early warning gives caregivers the ability to deal proactively with potential complications, such as detecting infections in premature infants up to 24 hours before they exhibit symptoms. The solution monitors 120 children, analyzing 120,000 messages per second, billions of messages per day. Trials are expanding beyond Canada to include hospitals in the US, China and Australia.
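The notes describe stream computing that raises an alert before clinical signs appear. The basic mechanism behind that kind of alert is a sliding window over a stream of readings, sketched below under loudly stated assumptions: the alert rule (mean heart rate above a fixed threshold) and the names `windowed_alerts`, `window_size`, and `threshold` are hypothetical, and this is neither the UOIT clinical model nor the InfoSphere Streams API.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def windowed_alerts(
    readings: Iterable[float], window_size: int, threshold: float
) -> Iterator[Tuple[int, float]]:
    """Yield (index, window_mean) whenever the mean of the most recent
    `window_size` readings exceeds `threshold`.

    Hypothetical alert rule for illustration only; not a clinical model.
    """
    window: Deque[float] = deque()
    total = 0.0
    for i, value in enumerate(readings):
        window.append(value)
        total += value
        # Evict the oldest reading so the window never exceeds window_size.
        if len(window) > window_size:
            total -= window.popleft()
        # Only evaluate the rule once the window is full.
        if len(window) == window_size:
            window_mean = total / window_size
            if window_mean > threshold:
                yield i, window_mean


# Simulated heart-rate stream (beats per minute) for one infant.
heart_rates = [150, 152, 149, 171, 175, 180, 178]
for index, avg in windowed_alerts(heart_rates, window_size=3, threshold=170):
    print(f"alert at reading {index}: 3-reading mean {avg:.1f}")
```

Keeping a running total makes each update constant-time regardless of window size, which matters at tens of thousands of messages per second; a production system would also keep a separate window per patient.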

    1. Making Hadoop Ready for the Enterprise
       Hadoop Summit, June 27, 2013
       Anjul Bhambhri, Vice-President, IBM Big Data Development
       © 2013 IBM Corporation
    2. Big Data is the next Natural Resource
       Harvesting any resource requires mining, refining and delivering.
       “We have for the first time an economy based on a key resource (Information) that is not only renewable, but self-generating. Running out of it is not a problem, but drowning in it is.” (John Naisbitt)
       • Cost-efficiently processing the growing Volume: 300x growth from 2005 to 2020, reaching 40 ZB (Source: IDC)
       • Responding to the increasing Velocity: 19 billion RFID sensors and counting (Source: RFID Forecasts)
       • Collectively analyzing the broadening Variety: 80% of the world’s data is unstructured (Source: IBM Market Information)
       • Establishing the Veracity of big data sources: 1 in 3 business leaders don’t trust the information they use to make decisions (Source: IBM, BAO for the Intelligent Enterprise)
    3. What if you could detect neonatal infections sooner?
       Solution: Big Data enabled doctors at the University of Ontario Institute of Technology to apply neonatal infant monitoring to predict infection in the ICU 24 hours in advance.
       • 120 children monitored: 120K messages per second, billions of messages per day
       • 24-hour-earlier detection of infections
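The slide’s figures can be cross-checked with simple arithmetic, assuming (as a simplification) that messages are spread evenly across the 120 monitored children and that the rate is sustained around the clock. Under those assumptions the calculation reproduces the notes’ 1,000 pieces of information per second per child and lands near their figure of about 90 million measurements per patient per day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def throughput(total_msgs_per_sec: int, patients: int) -> tuple:
    """Return (per-patient msgs/sec, per-patient msgs/day, total msgs/day),
    assuming an even split across patients and a constant rate."""
    per_patient_per_sec = total_msgs_per_sec / patients
    per_patient_per_day = per_patient_per_sec * SECONDS_PER_DAY
    total_per_day = total_msgs_per_sec * SECONDS_PER_DAY
    return per_patient_per_sec, per_patient_per_day, total_per_day


per_sec, per_day, total = throughput(120_000, 120)
print(per_sec)  # 1000.0
print(per_day)  # 86400000.0 (about 86 million per patient per day)
print(total)    # 10368000000 (about 10.4 billion per day)
```

The total of roughly 10.4 billion messages per day is consistent with the “billions of messages per day” quoted in the notes.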
    4. Constant Contact: Transforming Marketing Campaign Effectiveness with IBM Big Data
       • Analyze 35 billion annual emails to guide customers on the best dates and times to send emails for maximum response
       Benefits
       • 40 times improvement in analysis performance
       • 15-25% performance increase in customer email campaigns
       • Analysis time reduced from hours to seconds
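The core of the Constant Contact analysis, finding the send time with the best response, reduces to a group-by and a ratio. The single-process sketch below assumes a hypothetical input of `(sent_at, responded)` pairs and buckets them by weekday and hour; at 35 billion emails a year the same aggregation would run as a distributed job, and the slide does not describe the real system’s features.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, Tuple


def best_send_slot(
    events: Iterable[Tuple[datetime, bool]]
) -> Tuple[Tuple[int, int], float]:
    """Return ((weekday, hour), response_rate) for the slot with the highest
    response rate. weekday follows datetime.weekday(): Monday == 0."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for sent_at, did_respond in events:
        slot = (sent_at.weekday(), sent_at.hour)
        sent[slot] += 1
        if did_respond:
            responded[slot] += 1
    rates = {slot: responded[slot] / sent[slot] for slot in sent}
    best = max(rates, key=rates.get)
    return best, rates[best]


events = [
    (datetime(2013, 6, 25, 9), True),    # Tuesday 09:00
    (datetime(2013, 6, 25, 9), False),   # Tuesday 09:00
    (datetime(2013, 6, 27, 14), True),   # Thursday 14:00
    (datetime(2013, 6, 27, 14), True),   # Thursday 14:00
]
print(best_send_slot(events))  # ((3, 14), 1.0)
```

A real analysis would also require a minimum number of sends per slot, so that a slot with one lucky response cannot win.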
    5. Automobile and Manufacturing: Quality Control and Customer Satisfaction
       • Inflexibility and scalability limitations of existing IT solutions have been an inhibitor to competitive advantage. A new solution is needed to improve quality and operational efficiency.
       • Inventory control of parts
       • Manufacturing equipment and assembly line data
       • Warranty and services data from dealers
       • Telemetry data from vehicles
       • Next generation of Enterprise Data Warehouse
    6. New Opportunities with Big Data & Analytics: Big Data and Technology Platform
       • Transactional & Application Data, Machine Data, Enterprise Content, Social Data
    7. New Opportunities with Big Data & Analytics: Roles and Analytics
       • Data Scientist, Business Analyst, User
    8. New Opportunities with Big Data & Analytics: New Outcomes
       • Enrich info base, improve customer interaction, reduce risk, gain efficiency and scale, optimize and monetize
    9. Emerging Pattern of Big Data Implementation
       • Ingestion and Real-time Analytic Zone: ingest; filter, transform; correlate, classify; extract, annotate; data sinks
       • Landing and Analytics Sandbox Zone: documents in a variety of formats; Hive/HBase; column stores; indexes, facets; MapReduce; analytics repository, workbench
       • Warehousing Zone: enterprise warehouse, data marts, query engines, cubes
       • Analytics and Reporting Zone: descriptive and predictive models, widgets, discovery and visualizer, search
       • Metadata and Governance Zone
       • Connectors
    10. The 5 Key Use Cases
        • Big Data Exploration: find, visualize, and understand all big data to improve decision making
        • Enhanced 360° View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
        • Operations Analysis: analyze a variety of machine data for improved business results
        • Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency
        • Security/Intelligence Extension: lower risk, detect fraud and monitor cyber security in real time
    11. IBM Big Data Platform and Application Framework (Cloud | Mobile | Security)
        • Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
        • Visualization & Discovery: gather, extract and explore data using best-of-breed visualization
        • Applications & Development; Systems Management
        • Accelerators: speed time to value with analytic and application accelerators
        • Hadoop System: cost-effectively analyze petabytes of structured and unstructured information
        • Stream Computing: analyze streaming data and large data bursts for real-time insights
        • Data Warehouse: deliver deep insight with advanced in-database analytics and operational analytics
        • Contextual Discovery: index and federated discovery for contextual collaborative insights
        • Information Integration & Governance: govern data quality and manage information lifecycle
    12. Enterprise Capabilities on Hadoop: IBM-certified Apache Hadoop
        • Enterprise capabilities: Administration & Security, Workload Optimization, Connectors, Advanced Engines, Visualization & Exploration, Development Tools, on top of open source components
        Key platform requirements
        • Built-in analytics
        • Enterprise-grade capabilities
        • Integrated with enterprise software
        • Ease of installation and management
        • Reference hardware configurations
        • World-class support
        • Full open source compatibility
        Business benefits
        • Quicker time-to-value
        • Reduced operational risk
        • Enhanced business knowledge with a flexible analytical platform
        • Leverages and complements existing software investments
    13. Big Data needs SQL
        • Most existing applications in the enterprise use SQL
        • SQL bridges the chasm between existing apps and Big Data
        • SQL access to all data stored in Hadoop
          - via JDBC/ODBC
          - using rich standard SQL
          - intelligently leveraging MapReduce parallelism OR direct access for low latency
        Architecture: Application → SQL language → JDBC/ODBC driver → JDBC/ODBC server → Big SQL Engine → Hadoop data sources (Hive tables, HBase tables, CSV files)
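The slide’s point is that an application written against standard SQL and a standard driver interface need not change when the data moves to Hadoop. Python’s DB-API plays a role analogous to JDBC/ODBC, so the sketch below uses it, with an in-memory `sqlite3` database standing in for a Big SQL connection. The table, columns, and data are hypothetical, and no Big SQL driver or connection string is shown because the slide does not give one.

```python
import sqlite3
from typing import List, Tuple


def top_customers(conn, limit: int) -> List[Tuple[str, float]]:
    """Run a standard SQL aggregate through a DB-API connection.

    Uses only cursor(), execute() and fetchall(); note that the '?'
    placeholder style is sqlite3's and differs between drivers.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT customer, SUM(amount) AS total "
        "FROM orders GROUP BY customer ORDER BY total DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a Big SQL connection
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 120.0), ("globex", 80.0), ("acme", 30.0)],
)
conn.commit()
print(top_customers(conn, 1))  # [('acme', 150.0)]
```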
    14. Text Analytics: Getting measurable insights
        • Most of the world’s data is in unstructured or semi-structured text; over 80% of stored information is unstructured*
        • Social media is rife with discussions about products and services
        • Company internal information is locked in blobs and description fields, and is sometimes even discarded
        • How do you get a metrics-based understanding of facts from unstructured text? Structural analysis, mining and visualization
        Examples
        • Healthcare Analytics: e-medical records, hospital reports
        • Public Sector: case files, police records, emergency calls
        • Automotive Quality Insight: tech notes, call logs, online media
        • Insurance Fraud: insurance claims
        • Social Media for Marketing: Twitter, Facebook, blogs, forums
    15. How Text Analytics Works: World Cup 2010 Highlights
        Input text: “In the 2010 Football World Cup, one team distinguished itself, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas, made the save. Winger Andres Iniesta scored for Spain for the win.”
        Extracted (country, position, player):
        • Netherlands, Striker, Arjen Robben
        • Spain, Keeper, Iker Casillas
        • Spain, Winger, Andres Iniesta
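A toy version of the extraction on the World Cup slide can be written with regular expressions and small dictionaries. This is a hypothetical sketch, not IBM’s text analytics engine: it assumes tiny hand-written lists of countries and positions, takes the first capitalized two-word name after each position, and assigns the nearest country mention in the same sentence.

```python
import re
from typing import List, Tuple

# Hypothetical, hand-written dictionaries for this one example.
COUNTRIES = ["Netherlands", "Spain"]
POSITIONS = ["striker", "keeper", "winger"]

COUNTRY_RE = re.compile(r"\b(?:" + "|".join(COUNTRIES) + r")\b")
POSITION_RE = re.compile(r"\b(?:" + "|".join(POSITIONS) + r")\b", re.IGNORECASE)
NAME_RE = re.compile(r"[A-Z][a-z]+ [A-Z][a-z]+")  # e.g. "Arjen Robben"


def extract_players(text: str) -> List[Tuple[str, str, str]]:
    """Return (country, position, player) triples found in `text`."""
    results = []
    for sentence in re.split(r"(?<=\.)\s+", text):
        countries = list(COUNTRY_RE.finditer(sentence))
        for pos in POSITION_RE.finditer(sentence):
            # First two-word capitalized name after the position word.
            name = NAME_RE.search(sentence, pos.end())
            if name is None or not countries:
                continue
            # Nearest country mention in the sentence, measured in characters.
            country = min(countries, key=lambda c: abs(c.start() - pos.start()))
            results.append(
                (country.group(0), pos.group(0).capitalize(), name.group(0))
            )
    return results


text = (
    "In the 2010 Football World Cup, one team distinguished itself, losing to the "
    "eventual champions 1-0 in the Final. Early in the second half, Netherlands' "
    "striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker "
    "Casillas, made the save. Winger Andres Iniesta scored for Spain for the win."
)
for triple in extract_players(text):
    print(triple)
```

Heuristics this brittle break on the next article, which is the motivation for the declarative rules, dictionaries, and linguistic parsers described on the following slide.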
    16. Text Analytics Language and Runtime
        • AQL: a declarative, SQL-like language, with discovery tools for AQL development
        • Development environment (offline): AQL extractors are developed and compiled
        • Text Analytics Runtime: high throughput and a small memory footprint; runs AQL extractors over input documents to produce extracted objects
        • Cost-based optimization (the dominant cost is CPU): the optimizer chooses among alternative plans that order the dictionary selects (Role Dict, Company Dict) and joins differently
        • Building blocks: general-purpose linguistic parsers and dictionaries
        Example AQL extractor:
            create view Employment as
            select R.jobType as jobType, as companyName
            from Company C, Role R
            where Follows(R.jobType,, 0, 20)
              and ContainsDict('EmpAssociation.dict', RightContext(R.jobType, 10));
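To make the semantics of the AQL view above concrete, here is a hypothetical Python rendering of the same logic. It assumes that `Follows(a, b, min, max)` holds when span `b` begins between `min` and `max` characters after span `a` ends, that `RightContext(span, n)` is the `n` characters after a span, and that the elided second argument of `Follows` is the company span. The dictionaries and sample sentence are invented; the nested loop is a naive join, whereas the slide’s point is that the AQL optimizer picks a cheaper plan.

```python
import re
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    begin: int
    end: int
    text: str


def dict_matches(doc: str, entries: List[str]) -> List[Span]:
    """Whole-word matches of any dictionary entry in doc."""
    spans = []
    for entry in entries:
        for m in re.finditer(r"\b" + re.escape(entry) + r"\b", doc):
            spans.append(Span(m.start(), m.end(), m.group(0)))
    return spans


def follows(first: Span, second: Span, min_chars: int, max_chars: int) -> bool:
    """True if `second` starts min_chars..max_chars characters after `first` ends."""
    return min_chars <= second.begin - first.end <= max_chars


def right_context(doc: str, span: Span, n: int) -> str:
    """The n characters immediately after the span."""
    return doc[span.end:span.end + n]


def contains_dict(entries: List[str], text: str) -> bool:
    """True if text contains any dictionary entry as a whole word."""
    return any(re.search(r"\b" + re.escape(e) + r"\b", text) for e in entries)


def employment(doc: str, roles: List[str], companies: List[str],
               associations: List[str]) -> List[Tuple[str, str]]:
    """Naive nested-loop join of role and company spans (hypothetical)."""
    return [
        (r.text, c.text)
        for r in dict_matches(doc, roles)
        for c in dict_matches(doc, companies)
        if follows(r, c, 0, 20)
        and contains_dict(associations, right_context(doc, r, 10))
    ]


print(employment("Jane Doe was named CEO of Acme Corp in 2012.",
                 roles=["CEO"], companies=["Acme Corp"],
                 associations=["of", "at"]))
# [('CEO', 'Acme Corp')]
```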
    17. Enterprise Data Tools: Business User, Data Scientist, Business Analyst, Developer, Administrator
    18. Security and compliance in Big Data environments
        Big Data platform (structured, unstructured, and streaming data), Hadoop cluster, clients
        Questions to answer
        • Who is running specific big data requests?
        • What MapReduce jobs are they running?
        • Are these jobs part of an authorized program list accessing the data?
        • Is there an exceptional number of file permission exceptions?
        TAPs for Hadoop
        • Collect and stream audit data to the Collector
        • Provide visibility into HDFS, MapReduce, RPC, Oozie, HBase, etc.
        Collector
        • Securely stores audit data collected by the TAPs
        • Provides analytics, reporting and compliance workflow automation
    19. Data Archiving and Masking on Hadoop
        Data masking (example: JASON MICHAELS before masking becomes ROBERT SMITH after masking)
        • Mask in-database, or extract and mask in Hadoop
        • Mask confidential data to avoid data breaches and meet privacy compliance
        • Protect confidential data while preserving analytics
        • Support compliance with privacy regulations
        Data archiving (database to Hadoop)
        • Archive & purge, load, and compress into archive files that preserve complete business objects, data integrity, schema and metadata, and retention policies
        • Query-able, auditable, restorable data
        • Cost-effective query-able archiving
        • Manage and apply retention policies for compliance
        • Enable business users to query hot, warm and cold data
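Protecting confidential data while preserving analytics usually implies consistent masking: the same real value always maps to the same fake value, so joins and counts still work across masked tables. A minimal sketch, assuming a keyed HMAC to pick substitute names from small hypothetical lists; this is not the algorithm used by IBM’s masking tools, and the key shown is a placeholder that would need to be kept secret.

```python
import hashlib
import hmac

# Hypothetical substitution lists; real masking tools use far larger libraries.
FIRST_NAMES = ["ROBERT", "MARIA", "DAVID", "LINDA"]
LAST_NAMES = ["SMITH", "GARCIA", "JONES", "CHEN"]


def mask_name(full_name: str, key: bytes) -> str:
    """Deterministically replace a name with a realistic-looking substitute.

    The same (name, key) pair always yields the same output, which keeps
    joins and group-bys consistent across masked tables.
    """
    digest = hmac.new(key, full_name.encode("utf-8"), hashlib.sha256).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


key = b"placeholder-secret"  # placeholder only; store real keys securely
print(mask_name("JASON MICHAELS", key))
```

Because different people can map to the same substitute, this preserves consistency but not uniqueness; where masked values serve as keys, a lookup-table or format-preserving approach would be needed instead.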
    20. Introducing PureData for Hadoop: BigInsights Appliance
        Simplified Experience
        • Designed for easy and quick deployment
        • Built-in tools designed for users to derive value quickly
        • Easy connectivity to common data warehouse systems
        Built-in Expertise
        • Enables ‘what-if’ analysis and advanced analytics
        • Supports structured, semi-structured, and unstructured data
        • Built-in text processing engine and library of annotators to analyze large volumes of text-based information
        • Data can be used in its native format, eliminating the need to pre-define and map structures
        Integration by Design
        • InfoSphere BigInsights software, cluster management, and IBM System x® servers
        • Automatic parallelization and resource optimization to scale economically
        • Enterprise-class security and platform management
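Using data in its native format without pre-defining structures is often called schema-on-read: records are stored as they arrive and only interpreted when queried. A minimal sketch, assuming newline-delimited JSON records with hypothetical vehicle fields; the slide does not show BigInsights’ own mechanisms for this.

```python
import json
from typing import Callable, Iterable, Iterator

# Raw records stored as-is; note that they do not share one fixed schema.
raw_lines = [
    '{"vin": "A1", "dealer": "north", "warranty_claims": 2}',
    '{"vin": "B2", "telemetry": {"engine_temp": 104}}',
    '{"vin": "C3", "dealer": "south", "warranty_claims": 5}',
]


def query(lines: Iterable[str], field: str,
          predicate: Callable[[object], bool]) -> Iterator[dict]:
    """Parse each record only at query time and keep those whose `field`
    exists and satisfies `predicate`. Records lacking the field are skipped."""
    for line in lines:
        record = json.loads(line)
        if field in record and predicate(record[field]):
            yield record


for rec in query(raw_lines, "warranty_claims", lambda n: n > 3):
    print(rec["vin"])  # C3
```

The trade-off is that structure errors surface at query time rather than at load time, so queries must tolerate missing or unexpected fields.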
    21. From Getting Started to Enterprise Deployment: InfoSphere BigInsights Brings Hadoop to the Enterprise
        • Basic Edition (free download): Apache Hadoop, integrated install, web-based management console, Jaql
        • Quick Start Edition* (free download, non-production): IBM Hadoop Core, BigSheets, Text Analytics, Big SQL, development tools
        • Enterprise Edition (enterprise-class breadth of capabilities, sold by number of terabytes managed): Quick Start features PLUS accelerators, enterprise integration, production support, and production-ready features such as workload optimization/query support, connectors, and management tools
        • PureData for Hadoop: appliance simplicity for the enterprise
        * Pre-announced
    22. Streams: Real-Time Analytics
    23. InfoSphere Data Explorer: delivering insights at the point of impact
        • Create a unified view of ALL information for real-time monitoring
        • Identify areas of information risk and ensure data compliance
        • Analyze customer data to unlock true customer value
        • Increase productivity and leverage past work, increasing speed to market
        • Improve customer service and reduce call times
        Data access & integration
        • Index structured and unstructured data in place
        • Support existing security
        • Federate to external sources
        • Leverage MDM, governance, and taxonomies
        Discovery & navigation
        • Clustering and categorization
        • Contextual intelligence
        • Easy-to-deploy applications
        • All at the scale required for today’s big data challenges
        Providing unified, real-time access to and fusion of big data unlocks greater insight and ROI.
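Indexing structured and unstructured data in place means building a search index that points back to where each record lives instead of copying the records into a new store. A minimal inverted-index sketch, assuming hypothetical document IDs of the form `source:id` and a simple AND query; Data Explorer’s actual indexing, security, and federation features are far richer and are not modeled here.

```python
import re
from collections import defaultdict
from typing import Dict, Set

TOKEN_RE = re.compile(r"[a-z0-9]+")


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the IDs of documents containing it.
    Only IDs are stored; the documents stay in their source systems."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in TOKEN_RE.findall(text.lower()):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """AND query: IDs of documents that contain every query term."""
    terms = TOKEN_RE.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "crm:42": "Customer complained about late delivery",
    "email:7": "Delivery confirmed for customer order",
    "wiki:3": "Hadoop cluster setup notes",
}
index = build_index(docs)
print(sorted(search(index, "customer delivery")))  # ['crm:42', 'email:7']
```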
    24. Organizations are Building Big Data Applications on Data Explorer
        Data Explorer App Builder applications draw on:
        • Warehouse: structured enterprise data
        • BigInsights: data at rest
        • Data Explorer: semi-structured and unstructured enterprise data
        • Streams: data in motion
    25. Get Started on Your Big Data Journey Today
        • Get Educated: IBM Big Data:
        • Get Your Hands on Big Data: Download Quick Start ibm.coQuickStart
    26. THINK