Tom Norton, Torsten Herkel – BB1691 - Infrastructure positioning to underpin a big data strategy
HP experts Tom Norton and Torsten Herkel: presentation deck from HP Discover 2012 Frankfurt, “Infrastructure positioning to underpin a big data strategy.”

  • “How should my IT architecture change to take advantage of new Big Data technologies? How should I prepare, and what should I prioritize?” These are typical questions IT managers ask us when they are considering a Big Data initiative. To help answer them, HP has developed a Transformation Workshop that frames the conversation and determines key initiative points and action items. The basics of a transformation start with a description of the “As Is” and “To Be” models.
    How is the IT infrastructure designed today when it comes to data, and what are the critical aspects that need consideration? To evaluate the current state (“As Is”) of a data-related IT infrastructure effectively, certain critical aspects need to be assessed. They can typically be described as follows:
    - Information is managed in efficient silos, separated by type of data and by process.
    - Typical transaction data is generated by business applications such as SAP that use a highly structured RDBMS to store information and operate the business.
    - Business information collected from different application systems and different structured data sources is currently stored in a central data warehouse used for business intelligence and analytical purposes.
    - Some silos of information are collected and stored, while others are completely or partially deleted.
    - Some silos have information that resides in the cloud.
    - Each information silo has its own governance, operation and security policies, tools and processes in place.
    The consequences of this architecture include the following:
    - Most of the data is not collected.
    - Potentially relevant data resides in different silos and is difficult to access.
    - Operation, governance and security are narrowly focused and optimized for each silo.
    - Current data store technologies are limited when it comes to dealing with large volumes of data.
    - Unstructured data is typically not managed.
    We can represent the current “As Is” status using the following model. This can be considered a high-level IT architecture model, as it refers to the data infrastructure organization inside the IT infrastructure.
  • Big Data Characteristics: Big Data means many things to many people. However, most would agree that Big Data is a collection of data sets so large and complex that they become difficult to process and store conventionally. So many challenges in capturing, curating and storing the data are encountered that an entire industry has been born just to provide scalable solutions to search, process, analyze and manipulate it. Here at HP we believe that Big Data has four characteristics:
    - Velocity (Speed & Direction) – Velocity deals with the speed at which data is created, refreshed and transmitted. The speed at which data is produced, as well as the speed at which it must be processed to meet demand, is problematic for most organizations in terms of available computing power. Real-time or in-time processing requirements drive velocity.
    - Volume (Data Scale) – Transaction volumes of Big Data generally far exceed the capability of normal IT enterprises. Data storage requirements that grow exponentially, combined with the need to analyze massive amounts of data quickly, overwhelm many enterprise platforms.
    - Variety (Data Format) – Big Data introduces several types of data formats captured from social media sources, sensor data, databases and other sources. Although the data is generally unstructured, Big Data can also include structured data. Handling these disparate data formats (voice, video, data, pictures, etc.) uniformly requires highly specialized analytical applications and high-volume processing platforms.
    - Voracity (Consumption & Processing) – For Big Data to be relevant, it must be continually refreshed and keep up with the consumption demands of users. Once organizations begin to see the value of Big Data, their appetite for more data types and volume increases. This demand must be met by a scalable and extensible solution.
    The graphic presented here represents the four V’s of Big Data and the manner in which they interact. Determining the level of interaction will help an organization define the most appropriate environment in which to process its Big Data. Four sample use cases are displayed at the axis points of the graph. For example, if an organization has a relatively small volume of Big Data and a limited amount of variety, but a very high demand for in-time processing, it would be best served by a Big Data platform that is predominantly designed for speed. This graphic represents a model that HP uses to help organizations size their Big Data processing platforms; the selection of the V’s creates the Big Data use case.
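
To make the four-V sizing idea concrete, here is a minimal Python sketch of how a workshop team might score a workload and pick the characteristic the platform should be sized for. The 1-to-5 scales and the platform-emphasis labels are illustrative assumptions, not HP's actual sizing tool.

```python
# Minimal sketch of a four-V sizing assessment.
# The 1-5 scales and the platform-emphasis labels are illustrative assumptions,
# not HP's sizing model.
from dataclasses import dataclass

@dataclass
class FourVProfile:
    volume: int     # 1 = gigabytes, 5 = petabytes and beyond
    velocity: int   # 1 = batch refresh, 5 = real-time streaming
    variety: int    # 1 = structured only, 5 = mixed voice/video/text/sensor
    voracity: int   # 1 = occasional reports, 5 = continuous consumption

def suggest_platform_emphasis(p: FourVProfile) -> str:
    """Return the dominant characteristic, i.e. what the platform should be sized for."""
    scores = {
        "speed (velocity-driven, in-time processing)": p.velocity,
        "scale (volume-driven, distributed storage)": p.volume,
        "format handling (variety-driven, multi-structured analytics)": p.variety,
        "consumption (voracity-driven, elastic serving capacity)": p.voracity,
    }
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # Example from the notes: small volume, limited variety, very high in-time demand.
    profile = FourVProfile(volume=2, velocity=5, variety=2, voracity=3)
    print(suggest_platform_emphasis(profile))  # -> speed (velocity-driven, ...)
```
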
  • How can we deal with the different “V’s” of Big Data (Volume, Velocity, Variety and Voracity), and what are the possible impacts on the “As Is” IT architecture? To help answer these questions, HP has created a new “To Be” model with input from many sources, such as Hortonworks’ Shaun Connolly’s article “Big Data Refinery Fuels Next-Generation Data Architecture.” We have further elaborated the concept of the Big Data Refinery system, since this is the key element in an IT transformation for Big Data.
    The future IT infrastructure for Big Data requires:
    - A Big Data Refinery system that can provide key Big Data services: handle the capture, storage and aggregation of data; elaborate on requests, queries and other transactions on this data; provide information interaction and linkage with other processes; and provide a development environment to create, discover and test new analytics.
    - A well-defined interlock system to link the Refinery with the transaction and current business intelligence platforms.
    - An integrated approach to governance, protection and management of data.
    1. The Refinery system
    The core element of this infrastructure transformation is a system that will enable IT to be a Big Data service provider to the business. The primary services we have identified for a Refinery system are: Capture, Store, Analyze, Develop and Search. Having a specific Refinery system allows us to choose the best platform to store, aggregate and transform multi-structured data without interfering with the existing business transaction and interaction systems. Defining everything as a service enables IT to choose where best to provision each service – on premises or in the cloud – and to provide the best approach. Each service of the Refinery has a technology impact and/or a technology decision. Here is where packages like Hadoop, MapReduce, Pig, Hive and Vertica, or solutions like Autonomy, can provide the technology platforms to build the Refinery system. The platform decision will depend on the business case and on the data sources and data types to be managed.
    2. The Interlock system
    To unlock the value of the Big Data Refinery system, we need a clear interlock with runtime models and data for ongoing refinement and analysis, as well as linkage with the business intelligence part of the system. Seamless transaction and integration between these systems will unlock value. When choosing the platform for the Refinery system, a key consideration is therefore the integration capabilities and impact of the platform. An interlock platform can be composed of several different connectors between the Refinery, transaction and business intelligence platforms.
    3. Integrated governance, protection and management
    The Refinery system requires a new, integrated data lifecycle management approach that will become a critical component of the Big Data solution. All aspects need to be considered, beginning with data creation, continuing through data storage, and terminating with data destruction. Big Data lifecycle management is essential to the success of any Big Data initiative.
    Key takeaways
    Today, IT infrastructures that support data architecture need to evolve. They will need to accommodate new systems that support the services we identify as the “Big Data Refinery.” These must be capable of storing, aggregating and transforming multi-structured raw data sources into usable formats that help fuel new insights for the business. The connection of this system with actual business transactions and interactions, as well as business intelligence, will allow the generation of business value. A new Big Data lifecycle management approach will be critical to ensure long-term success. Enterprise IT departments need to understand the analytics requirements for Big Data and plan for the transformation of today’s infrastructure in order to provide Big Data services to the business.
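
As a small illustration of the Refinery's Capture and Analyze services on one of the platforms named above (Hadoop), here is a sketch of a Hadoop Streaming job in Python that counts raw records per source. The tab-separated record layout and the script name are hypothetical; the point is only that commodity MapReduce tooling can do a first aggregation pass over multi-structured raw data.

```python
#!/usr/bin/env python3
# refinery_count.py - minimal Hadoop Streaming mapper/reducer sketch, intended to be
# run with the standard hadoop-streaming jar, e.g.
#   -mapper "refinery_count.py map" -reducer "refinery_count.py reduce"
# The tab-separated input layout (timestamp, source, payload) is a hypothetical example.
import sys
from itertools import groupby

def map_phase():
    """Emit 'source<TAB>1' for every parseable raw record (the Capture/Store step)."""
    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            print(f"{parts[1]}\t1")

def reduce_phase():
    """Sum counts per source (a first, crude Analyze step of the refinery).
    Hadoop Streaming delivers the mapper output sorted by key, so groupby works."""
    def key(line):
        return line.split("\t", 1)[0]
    for source, lines in groupby(sys.stdin, key=key):
        total = sum(int(l.rstrip("\n").split("\t")[1]) for l in lines)
        print(f"{source}\t{total}")

if __name__ == "__main__":
    (map_phase if sys.argv[1:] == ["map"] else reduce_phase)()
```
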
  • Functional Architecture: For any Big Data implementation to be successful, it must be based on a sound architecture. The architecture is used to construct the Big Data platform according to proper practices, ensuring that it is built to the correct requirements. The example presented here represents only the Functional Architecture View as defined by HP’s Global Methods Information Technology Solution Architecture (ITSA) model. It is just one of the four views of the HP ITSA model: Business View, Functional View, Technical View and Implementation View. The focus of this discussion is the Functional View. Together, the four views enable the solution team to understand the needs of all stakeholders, form a consensus among them, and describe and direct the solution to a successful completion.
    The Functional View addresses the “what” of a Big Data deployment. Essentially it answers the question “if I make this investment in Big Data, what will it provide me?” The primary stakeholders in the Functional View include the solution users, business process designers and information modelers. Specifically:
    - What will the completed solution do?
    - How will it be used?
    - What information will it provide?
    - To whom will the information be provided?
    - What services should the solution provide?
    - What qualities should the solution have?
    The Functional View is independent of technologies, products or implementation.
    The following describes each attribute of the Functional Architecture View:
    - Users & Organizations – Which users or organizations will use the information, and how they will use it, define the Voracity characteristic of the Big Data solution.
    - Information Structure – How the information is structured, its format and its context define the Variety characteristic of the Big Data solution.
    - Information Flows – Understanding where the data originates, the feeds that transmit it, the type of data flows and their speed (Mbps) defines the Velocity characteristic of the Big Data solution. Velocity is also influenced by the external sources from which the information originates; minimally these include cloud, content markets and third-party service providers.
    - Related Systems – The systems required to process, store, transmit and analyze the data define the Volume characteristic of the Big Data solution. In many senses, Volume is a product of Voracity and Velocity.
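
A minimal sketch of how the four Functional View attributes could be recorded during an assessment and mapped to the “V” each one determines. The field names and example entries are assumptions for illustration, not ITSA notation.

```python
# Illustrative capture of Functional View attributes and the Big Data "V" each determines.
# Field names and example values are assumptions for this sketch, not the ITSA method.
from dataclasses import dataclass, field

@dataclass
class FunctionalView:
    users_and_organizations: list[str] = field(default_factory=list)  # -> Voracity
    information_structure: list[str] = field(default_factory=list)    # -> Variety
    information_flows: list[str] = field(default_factory=list)        # -> Velocity
    related_systems: list[str] = field(default_factory=list)          # -> Volume

    def determined_characteristics(self) -> dict[str, list[str]]:
        return {
            "Voracity": self.users_and_organizations,
            "Variety": self.information_structure,
            "Velocity": self.information_flows,
            "Volume": self.related_systems,
        }

view = FunctionalView(
    users_and_organizations=["marketing analysts (daily dashboards)"],
    information_structure=["clickstream JSON", "call-center audio"],
    information_flows=["web logs, ~50 Mbps, near-real-time"],
    related_systems=["enterprise data warehouse", "Hadoop cluster"],
)
print(view.determined_characteristics())
```
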
  • Understand Data Sources: Creating the appropriate Big Data platform begins with understanding the data sources that will be processed. This is critical, as the source is where Volume, Velocity and Variety are specified and/or determined. The primary sources of Big Data include:
    - Social network profiles (Facebook, LinkedIn, Yahoo, Google, etc.)
    - Social influencers (blogs, Twitter, Facebook “likes,” Yelp-style catalog and review sites, etc.)
    - Activity-generated data (computer and mobile device log files, including web site tracking information, application logs and sensor data)
    - Software as a Service (SaaS) and cloud applications (Salesforce.com, etc.)
    - Public sources (Microsoft Azure Marketplace, SEC/EDGAR, Wikipedia, IMDb, etc.)
    - Hadoop MapReduce application results (next-generation technology architectures for handling and parallel parsing of data from logs, web posts, etc.)
    - Data warehouse appliances (Teradata, IBM Netezza, EMC Greenplum, etc.)
    - Columnar/NoSQL data sources (MongoDB, Cassandra, Infobright, etc.)
    - Network and in-stream monitoring technologies (packet evaluation and distributed query processing-like applications, as well as email parsers)
    - Legacy documents (archives of statements, insurance forms, medical records, customer correspondence, etc.)
    This graphic depicts examples of Volume, Velocity and Variety.
  • Big Data Classification: The classification of Big Data is critical to calculating Volume. The same criteria that apply to conventional data apply equally to Big Data. The fact that its Volume, Variety and Voracity increase does not eliminate the need to protect the data according to its classification schema. Within the Big Data world it becomes important to devise practices where data is classified upon creation and marked accordingly with metadata and protective markings. The following outlines some central concerns regarding Big Data classification:
    - Class: A classification label is required so that data with that classification is treated in a specific manner according to its associated data protection and handling policy. Data classification should have stringent requirements.
    - Retention Period: Retaining data for an extended period of time is the single largest driver of expanding data storage. Organizations will be dealing with huge amounts of data that could significantly affect their data retention requirements, and this needs to be addressed.
    - Recovery Time Objective (RTO): Big Data may or may not need to be recovered, depending on its source and use. Making this decision incorrectly will have expensive impacts on the DRP budget. One must determine the time within which Big Data must be recovered, e.g., 30 minutes, 1 hour, 1 day, etc.
    - Recovery Point Objective (RPO): Generally Big Data is continually refreshed; however, portions of it may need to be retained for historical reasons and subsequently have a smaller window within which historical data can be lost. The RPO is the point in time to which the data must be restored.
    - Forensic Window: The speed at which Big Data moves, and the speed at which log files are overwritten, make forensic investigations difficult at best. It is therefore important to understand which types of Big Data or Big Data logs could be used in a forensic investigation. The answer to this question will shape how long data is retained.
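
The classification schema shown in the accompanying table (Vital/Critical, Sensitive, Non-critical) can be encoded directly as policy data. The sketch below does that with the table's retention, RTO, RPO and forensic-window values; the dataclass and lookup helper themselves are illustrative.

```python
# Classification schema from the "Big Data Classification" slide, encoded as policy data.
# The dataclass and lookup helper are illustrative; the retention/RTO/RPO/forensic-window
# values are the ones shown in the slide table.
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class ClassificationPolicy:
    retention_period: timedelta
    rto: timedelta               # Recovery Time Objective
    rpo: timedelta               # Recovery Point Objective
    forensic_window: timedelta

CLASSIFICATION_SCHEMA = {
    "vital/critical": ClassificationPolicy(
        retention_period=timedelta(days=7 * 365),
        rto=timedelta(minutes=30),
        rpo=timedelta(minutes=10),
        forensic_window=timedelta(days=6 * 30),
    ),
    "sensitive": ClassificationPolicy(
        retention_period=timedelta(days=5 * 365),
        rto=timedelta(minutes=90),
        rpo=timedelta(hours=1),
        forensic_window=timedelta(days=3 * 30),
    ),
    "non-critical": ClassificationPolicy(
        retention_period=timedelta(days=6 * 30),
        rto=timedelta(hours=48),
        rpo=timedelta(hours=48),
        forensic_window=timedelta(days=30),
    ),
}

def policy_for(data_class: str) -> ClassificationPolicy:
    return CLASSIFICATION_SCHEMA[data_class.lower()]

print(policy_for("Sensitive").rto)   # 1:30:00
```
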
  • All data has a lifecycle, a series of stages that it goes through. We need to understand each of these stages and how data transitions from one stage to another. This representation, from HP experience, is aligned with CSA guidance on big data protection lifecycle management (DPLM).
    The typical workflow for big data is shown in blue: we capture the data from many different sources, including social media, business transactions, documents, the web and other places. We then move the data across the Internet or a private network to some form of storage. The data is retained for some period of time, depending on how it is categorized and what our retention policies require. Eventually we may archive the data to slower, lower-cost storage, or we may destroy the data because it has no further value.
    During all of these stages we have to ensure that the data is appropriately protected. We need a backup strategy that enables us to restore data when required; we need to protect the data to ensure that it has the right levels of confidentiality, integrity and availability; and we need a governance framework to ensure that we make the right decisions about how data should be managed, and that these decisions result in the right organizational behaviour.
    Data capture is the first stage of our big data management lifecycle. Data capture is not just a matter of accepting whatever data we are offered. You need to understand all the different sources of data that will be captured, as different controls may be needed for each data source. You may need to check the integrity of the data as it is captured, verifying its format, accuracy and completeness. At this stage you may reject data that does not pass these checks, or it may be subject to additional steps to manage the integrity issues. Duplicate data should be identified to reduce wasteful storage, but it may still be important to record information about the duplicate, as each instance may have significance. Other security checks might involve verifying that the data originates from the correct place, that it has an appropriate digital signature, or that it has a valid timestamp.
    To plan the correct resources for data capture, you also need to understand the four “V’s” and provide appropriate operation, governance and security to accommodate all of their requirements:
    - Variety of data formats to be supported. Each of these may require different management approaches. There will probably be different ways that you capture video, audio, images, text or transaction records, for example, and you will need to define governance, operational controls and security controls that are appropriate for each.
    - Velocity of data. How fast will the data arrive, where will it arrive from, do you have the correct infrastructure in place to support the planned velocity, and what will happen if there are unexpected events that affect the data velocity? How could this affect the confidentiality, integrity or availability of the data?
    - Volume of data. How much data will you collect over time, resulting in what storage requirement, and what does this mean for operation, governance and security?
    - Voracity. How will you use the data, what analysis and processing will be required and will this scale appropriately, and what protection is needed to ensure that this is done in a secure way?
    We will discuss each of these lifecycle stages in more detail in the next few slides.
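
A minimal sketch of the capture-stage checks described above: format and completeness validation, duplicate detection, and a timestamp sanity check. The JSON record layout and the 24-hour skew limit are assumptions for illustration.

```python
# Illustrative capture-stage checks: format validation, duplicate detection and a
# timestamp sanity check. The JSON record layout and the 24-hour limit are assumptions.
import hashlib
import json
from datetime import datetime, timedelta, timezone

seen_hashes: set[str] = set()        # duplicate detection (each instance still noted)
REQUIRED_FIELDS = {"source", "timestamp", "payload"}

def capture(raw: str) -> dict | None:
    """Return the parsed record if it passes the capture checks, else None (rejected)."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None                                    # fails format check
    if not REQUIRED_FIELDS <= record.keys():
        return None                                    # incomplete record
    ts = datetime.fromisoformat(record["timestamp"])
    if abs(datetime.now(timezone.utc) - ts) > timedelta(hours=24):
        return None                                    # implausible timestamp
    digest = hashlib.sha256(raw.encode()).hexdigest()
    if digest in seen_hashes:
        print(f"duplicate of {digest[:12]} recorded")  # note the duplicate, skip storage
        return None
    seen_hashes.add(digest)
    return record

now = datetime.now(timezone.utc).isoformat()
print(capture(json.dumps({"source": "web", "timestamp": now, "payload": "click"})))
```
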
  • Data backup can be used to allow recovery from many types of incidents that would otherwise result in loss of integrity or availability of data. Before you can devise a backup strategy you need to understand the recovery requirements in terms of two parameters:
    - Recovery Time Objective (RTO): how much time can elapse between the start of an incident and the data being available again.
    - Recovery Point Objective (RPO): how much data can be missing after the recovery is complete. This is usually expressed as a time; for example, it may be acceptable to lose all data that was captured in the 24 hours before the incident.
    When you have understood the RTO and RPO for each type of data, you can define a backup and restoration policy that enables you to meet these requirements. The options for backing up each type of data also have to take into account the four V’s (Velocity, Voracity, Variety, Volume), as these affect the frequency and size of backup that may be required. The options for backing up data typically include:
    - Storage backup – data that is copied to disks so that it can be recovered very quickly.
    - Tape backup – typically significantly cheaper than storage backup, but taking longer to recover.
    - Electronic vaulting – data that is copied offsite via a network connection, rather than by transporting tapes.
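
A sketch of how RTO and RPO might drive the choice of backup medium and backup frequency. The hour thresholds are illustrative assumptions, not HP guidance.

```python
# Illustrative mapping from recovery requirements to a backup medium.
# The hour thresholds are assumptions for this sketch, not HP policy.
from datetime import timedelta

def choose_backup_medium(rto: timedelta, rpo: timedelta) -> str:
    if rto <= timedelta(hours=1):
        return "storage backup (disk copy, fastest restore)"
    if rpo <= timedelta(hours=4):
        return "electronic vaulting (offsite copy over the network)"
    return "tape backup (cheapest, slowest to recover)"

def backup_frequency(rpo: timedelta) -> timedelta:
    """Back up at least as often as the acceptable data-loss window."""
    return rpo

print(choose_backup_medium(rto=timedelta(minutes=30), rpo=timedelta(minutes=10)))
print(backup_frequency(timedelta(hours=24)))
```
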
  • Some data has a very short lifetime, but some data needs to be preserved forever. The length of time that you need to retain data depends on the type of data and on the compliance requirements that apply to you. This means that you must:
    - Understand all the compliance requirements that apply to your business, in every country that may have jurisdiction over your data. These requirements vary from industry to industry and differ greatly between countries. This slide shows some examples of compliance sources, but there are many others. Regulations such as HIPAA (Health Insurance Portability and Accountability Act) apply only to a specific industry, while others such as Sarbanes-Oxley apply to all organizations that operate within a particular geography.
    - Categorize your data based on your understanding of the various compliance requirements. Data with similar requirements can be grouped together for common treatment. Often you will be able to use this same categorization to identify RTO and RPO requirements as well as retention requirements.
    - Document a retention policy that ensures each type of data is handled correctly.
    - Create operational controls to ensure that data is moved from online storage to an appropriate retention medium based on the policy.
    - Manage the retained data to ensure that it remains available for the required duration. Depending on the exact requirements this may, for example, require retained data to be copied to new media at regular intervals.
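
A sketch of the operational control described above: deciding, from a data set's category and age, whether it stays on online storage, moves to a retention medium, or becomes eligible for destruction. The category names and time windows are hypothetical.

```python
# Sketch of a retention control: decide where a data set should live based on its age
# and category. Category names and the online/retention windows are illustrative.
from datetime import datetime, timedelta, timezone

RETENTION_POLICY = {
    # category: (time kept on online storage, total retention before destruction)
    "financial-transactions": (timedelta(days=90), timedelta(days=7 * 365)),
    "web-clickstream":        (timedelta(days=30), timedelta(days=365)),
    "sensor-telemetry":       (timedelta(days=7),  timedelta(days=180)),
}

def placement(category: str, created: datetime, now: datetime | None = None) -> str:
    now = now or datetime.now(timezone.utc)
    online_window, total_retention = RETENTION_POLICY[category]
    age = now - created
    if age > total_retention:
        return "eligible for destruction"
    if age > online_window:
        return "move to retention/archive medium"
    return "keep on online storage"

print(placement("web-clickstream", datetime(2012, 1, 15, tzinfo=timezone.utc)))
```
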
  • Archive media is typically slower, but cheaper, than online storage. Data can be moved to archive storage to reduce the volume of data stored on fast, expensive media, while retaining the ability to access the data if needed. There are many different reasons why data may need to be archived, for example:
    - To support requirements for infrequent ad-hoc reports.
    - To provide historical data that can be used for comparison with current data.
    It is not always necessary to archive the full set of data. If you understand how the data will be used, it is often possible to pre-analyze the data and store a summarized or reduced set that is sufficient for the requirements.
    It is important to have a policy that governs archival of data, and this archival policy needs to be consistent with the policies that define the approach to data backup, retention and destruction. The policy will define which data is to be archived, when this will happen, and what type of storage will be used. This policy must be based on classification of the data.
    Data that has been moved to archive media still needs to be protected, to ensure that appropriate levels of confidentiality, integrity and availability are provided. The protection may be different from that provided for the live data, but it must be based on an understanding of the risks and governance requirements.
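
A sketch of the "pre-analyze before archiving" idea: raw events are reduced to daily per-source counts so that only a summary goes to the slower, cheaper archive. The event layout is a hypothetical example.

```python
# Sketch of "pre-analyze before archiving": reduce raw events to daily per-source counts
# so only a summary is written to the archive. The event layout is hypothetical.
from collections import Counter
from datetime import date, datetime

def summarize_for_archive(events: list[dict]) -> dict[tuple[date, str], int]:
    """Collapse raw events into (day, source) -> count, discarding per-event payloads."""
    summary: Counter = Counter()
    for ev in events:
        summary[(ev["timestamp"].date(), ev["source"])] += 1
    return dict(summary)

if __name__ == "__main__":
    raw = [
        {"timestamp": datetime(2012, 12, 4, 9, 0),  "source": "web",    "payload": "..."},
        {"timestamp": datetime(2012, 12, 4, 9, 5),  "source": "web",    "payload": "..."},
        {"timestamp": datetime(2012, 12, 4, 10, 0), "source": "sensor", "payload": "..."},
    ]
    print(summarize_for_archive(raw))
    # {(date(2012, 12, 4), 'web'): 2, (date(2012, 12, 4), 'sensor'): 1}
```
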
  • Data should be destroyed when it is no longer needed, based on your data classification and archival policy. It is not sufficient to simply delete the data; this could lead to a significant breach of confidentiality, depending on what happens to the media where the data was previously stored. The method used to destroy the data should depend on the sensitivity of the data and the medium on which it is stored.
    - Purging involves deleting the data and then typically overwriting the newly freed space on the storage medium to prevent the deleted data from being read. Data that was encrypted before it was stored can sometimes be purged by deleting the encryption key, so that the data is no longer recoverable. Purging can be used to destroy some of the data on a storage medium that is still in use, unlike the other methods described here, which destroy all data on the medium.
    - HD shredding physically destroys the hard drive, reducing it to metal fragments.
    - HD wiping is less destructive than shredding; it involves overwriting the entire hard drive, often with multiple passes using different data patterns.
    - Degaussing can be used to erase data from disks or tapes. It involves passing the media through a powerful magnetic field to randomize the magnetic domains that store the data.
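
The "delete the encryption key" form of purging mentioned above (often called crypto-erasure) can be sketched with the third-party Python cryptography package. The in-memory key store and record IDs are purely illustrative; a real deployment would keep keys in an HSM or key-management service.

```python
# Sketch of crypto-erasure: data is encrypted before storage, and "purging" it means
# destroying the key so the remaining ciphertext is unrecoverable. Uses the third-party
# `cryptography` package (pip install cryptography); the plain-dict key store is purely
# illustrative.
from cryptography.fernet import Fernet

key_store: dict[str, bytes] = {}
data_store: dict[str, bytes] = {}

def store_encrypted(record_id: str, plaintext: bytes) -> None:
    key = Fernet.generate_key()
    key_store[record_id] = key
    data_store[record_id] = Fernet(key).encrypt(plaintext)

def purge(record_id: str) -> None:
    """Crypto-erase: drop the key; the ciphertext that remains cannot be decrypted."""
    del key_store[record_id]

store_encrypted("customer-42", b"sensitive correspondence")
purge("customer-42")
print("customer-42" in data_store, "customer-42" in key_store)  # True False
```
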
  • Governance defines strategy and oversees its execution to ensure that the organization’s goals are achieved. Governance should not be confused with IT management, which is responsible for planning, building and running the technology in order to deliver the services and meet the governance objectives. Governance typically involves three activities:
    - Deciding what needs to be done. For big data governance this includes making the high-level decisions that result in big data protection standards and quality metrics that are appropriate for the organization and for the types of data that will be managed.
    - Informing people of their responsibilities. This involves publishing policies for all areas that contribute to the management of big data, and ensuring that there is clarity on who will make which decisions and how these will be communicated.
    - Monitoring to ensure that the organization is acting on governance decisions. This involves monitoring processes and quality metrics and taking action when these are not as expected.
    Good governance of big data will ensure that everything needed to protect big data is in place and is managed properly.
  • You almost certainly already have many things in place to provide the appropriate levels of protection for your existing data, systems and services. When you move to a big data solution you need to review each of the areas shown here and make updates to ensure that they are still appropriate.
    - The Computer Security Incident Response Team (CSIRT) needs to understand big data and how it should respond to security incidents in this new environment.
    - Data privacy requirements may be very different. Even if your big data solution is based on existing data sources, the act of aggregating this data into a single repository where connections can be discovered may have significant data privacy implications.
    - The security technology that you use for your existing data may not be appropriate for the Velocity, Voracity, Variety and Volume of your big data. New tools and technology may be required for many different aspects of information security.
    - eDiscovery requirements may call for specialized tools and processes to enable you to interrogate your big data and respond correctly to litigation. eDiscovery will also need to be supported by your policies for retention, archival and destruction of big data.
    - Security controls include not just technical controls such as encryption, but also administrative controls such as your security policy and physical controls such as controlled offsite storage of tape media. All of these controls will need to be reviewed and updated to address the unique threats that apply to big data.
  • Data at rest must be protected to ensure that confidentiality is maintained, that the data is not altered – either intentionally or by accident or error – and that the data is available when it is needed. How and where you store data has a very significant impact on what is needed to protect that data. In a big data solution you are likely to be using many different storage techniques to ensure that data is available when and where it is needed, with the right level of confidentiality, integrity and availability. When data is replicated, it is essential that all confidentiality controls are applied to every copy of the data. De-duplicating data may cause a reduction in availability, and if you use clustering to distribute data across multiple sites then you must apply appropriate protection at each site.
    Big data at rest may be protected using encryption, but this has implications for performance and so may only be suitable for some types of data. The decision about how to protect each type of data will be based on similar architectural grounds to the decision about where to store it. You need to understand the value of the data and the potential cost if the confidentiality of the data is breached, or if the data is altered without authorization or lost. This is where you need governance, which will set policy and ensure that the controls you implement are appropriate for the needs of the business.
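
One small piece of data-at-rest protection, the integrity side, can be sketched with a content digest recorded at write time and verified at read time. The file and manifest names are hypothetical; confidentiality (encryption) and availability (replication) need separate controls.

```python
# Sketch of an integrity control for data at rest: record a SHA-256 digest when an
# object is written and verify it when the object is read back. The file layout is
# illustrative; encryption and replication are handled elsewhere.
import hashlib
import json
from pathlib import Path

MANIFEST = Path("at_rest_manifest.json")

def write_object(path: Path, data: bytes) -> None:
    path.write_bytes(data)
    manifest = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    manifest[str(path)] = hashlib.sha256(data).hexdigest()
    MANIFEST.write_text(json.dumps(manifest))

def read_object(path: Path) -> bytes:
    data = path.read_bytes()
    expected = json.loads(MANIFEST.read_text())[str(path)]
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"integrity check failed for {path}")
    return data

write_object(Path("sensor_batch_001.bin"), b"raw sensor readings")
print(read_object(Path("sensor_batch_001.bin")))
```
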
  • What issues are we seeing in terms of organizational impact? Organizations need structures for managing and supporting information capabilities such as data quality, advanced analytics and business intelligence (BI).
    - The most important issue is to create a clear understanding of how to generate value from new data sources and emerging technology. Projects should be based on clear business requirements and use cases.
    - The business units should be prepared to use the new analytic capability and embed it into their processes to generate actual business outcomes. The requirement to adapt business unit processes is another important issue.
    - There are technical challenges that require changes, upgrades or the addition of new technology to the current business intelligence infrastructure. These may be significant projects, and the issue is to break the transition down into manageable tactical projects that provide short-term value and contribute to the strategic direction. HP’s General Method for Business Intelligence implementations includes the concept of a MasterPlan, which addresses these issues.
    - The use of Big Data may result in changed pricing models, product features, self-service options or new ways of interaction. Managing this kind of change is also an important issue.
    (From the IM&A Big Data point of view, slightly modified.)
  • The sourcing of the service catalog for big data will be driven by:
    - the location of the data, and
    - the capability to process the data where it is located.
    If the data to be processed is 100% external AND there is capability to process that data externally, then IT may be asked to choose a cloud service provider to process the data. In some cases, for example Microsoft’s datamart and Microsoft Azure, a single company may have the data required and may be able to serve up Hadoop nodes to process it. Access to the data, the processing and the results will need to be set up as well; other items such as SLAs are naturally part of the discussion.
    If the data to be processed and analyzed is both external and internal, then the negotiation mentioned for the cloud case will take place, but IT will also need to determine whether and how data will flow in and out of the cloud and the IT service. Similarly, the service offered by IT will need to be finalized along with the infrastructure to support it. In short, internal platform and infrastructure decisions will be needed.
    Internal-only sourcing will require IT to locate the appropriate data, determine what service it will offer (IaaS, PaaS or SaaS), provide connectors to the source data, move data to a new platform if necessary, manage the service and potentially determine a chargeback method for the service. From an internal or private offering standpoint, the service to be offered is key.
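
The sourcing decision described above can be sketched as a small function. The enum names and returned action lists paraphrase the notes and the corresponding slide; they are not an HP tool.

```python
# Sketch of the service-catalog sourcing decision (external / hybrid / internal).
# The enum names and action strings paraphrase the notes; they are not an HP tool.
from enum import Enum

class DataLocation(Enum):
    EXTERNAL = "external"
    HYBRID = "internal and external"
    INTERNAL = "internal"

def sourcing_actions(location: DataLocation, can_process_externally: bool) -> list[str]:
    if location is DataLocation.EXTERNAL and can_process_externally:
        return [
            "choose a cloud service provider",
            "negotiate pricing, service levels and SLAs",
            "set up access to data, processing and results",
        ]
    if location is DataLocation.HYBRID:
        return [
            "negotiate cloud provisions as above",
            "determine how data flows in and out of the cloud and the IT service",
            "decide the IT service model (IaaS/PaaS/SaaS) and supporting platform",
        ]
    return [
        "locate the appropriate internal data and provide connectors",
        "decide the IT service model (IaaS/PaaS/SaaS)",
        "choose platform and infrastructure; define chargeback and governance",
    ]

print(sourcing_actions(DataLocation.HYBRID, can_process_externally=True))
```
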
  • This slide gives a slightly different view and a bit more insight into the platforms and the technologies that pair up with them. (TOR = top-of-rack switch; MPP = massively parallel processing.) As per the Autonomy FAQs, it is not recommended to run IDOL Server on a SAN device: firstly, SAN devices are not as reliable, and secondly, network communication to the SAN device can significantly affect performance. We recommend using SAN devices only for purposes such as storing IDOL Server backup files. The order of preference is local/DAS, then NAS, then SAN.
  • Incremental based on your maturity; fast deployment and integration; scale to meet performance expectations; secured by design with policies, architectures and services; integrating key partner solutions.
    HP addresses Hadoop implementation pain points with new solutions we are bringing to market in 2012. We are partnering with three of the best Hadoop distributions in the world, adding our tools for monitoring and managing Hadoop systems, together with services that ensure our customers have the smoothest possible implementation of Hadoop and achieve the performance, scale and security they desire.
    - Deploy in days, not months – choice of solutions (reference architectures and appliance), consulting services, partnerships with leading Hadoop distribution vendors.
    - Scale to thousands of nodes with the push of a button – Insight CMU.
    - Manage with a single pane of glass – Insight CMU.
    - Optimize with real-time and 3-D historical views of compute resources – Insight CMU.
    - Perform end-to-end analytics – Vertica.

Tom Norton, Torsten Herkel – BB1691 - Infrastructure positioning to underpin a big data strategy Presentation Transcript

  • 1. Tom Norton, TC WW Big Data Strategy Lead; Torsten Herkel, TC EMEA Storage Service Line Manager. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  • 2. Infrastructure positioning to underpin a big data strategy. Tom Norton and Torsten Herkel, December 2012.
  • 3. Agenda: 1. Big Data – what are the key IT infrastructure transformations? 2. Big Data and its impact on storage. 3. Where to start? 4. The HP approach supporting your Big Data initiative.
  • 4. Big Data: what are the key transformations in IT infrastructure?
  • 5. Today’s siloed data infrastructure (diagram). Separate silos – business transactions and interactions (CRM, ERM, SCM, FMS, HRM), messaging data, document management, multi-media, sensor data and social media data – each with its own operation/governance/security, feed classic ETL processing into an enterprise data warehouse for business intelligence and analytics (dashboards, reports, visualization). Content splits into external content, central internal managed content, central internal unmanaged content and discarded content.
  • 6. Transformation variables (diagram): Variety – data in many forms (structured, unstructured, text, multimedia); Velocity – data creation and transport (streaming data, milliseconds to seconds to respond); Voracity – data consumption (ingestion and processing); Volume – data quantity (scale from terabytes to petabytes to zettabytes).
  • 7. Big Data integrated model (Big Data architecture framework diagram): a Big Data Refinery (Capture, Store, Search, Develop and Analyze services, under operation, support, governance and protection) sits between business transactions and interactions (web, mobile, CRM, ERM, SCM, FMS, HRM, document management, multi-media, sensor and social media data) and business intelligence (enterprise data warehouse; analytical dashboards, reports, visualization), sharing and refining data to create value.
  • 8. Business cases drive functionalities (examples):
    - Police department – Big question: What causes and deters crime? Functional architecture solution: acquire the functional requirements to drive the development of a comprehensive Reference Architecture (RA) used to build out their Big Data initiative.
    - On-line retailer – Big question: Why do big box stores cause our customer churn? Solution: create a conceptual model to socialize the need and the estimated investment required to build out a Big Data solution; the process validates data velocity and volume assumptions.
    - Financial institution – Big question: Why do customers close their accounts? Solution: define disparate data sources that could yield insight from customer behaviors that portend an account closure; data sources are used to estimate variety and voracity requirements.
  • 9. Big Data functionality impacts (diagram): Users & Organizations map to Voracity and to value-related services (Search, Develop, Analytics, Security); Information Structure maps to Variety and to related infrastructure systems (servers, storage, network, management); Information Flows map to Velocity and to related services (Capture, Store, Integrate, Governance); Related Systems map to Volume.
  • 10. Big Data and its impact on storage.
  • 11. Understand data sources – example (table: data, why we need it, where it is, who owns it): structured financial transaction data in SQL (SAP transaction load); structured manufacturing test data in SQL (test cycles); semi-structured customer feedback from blogs and support calls, including photos, videos and streamed downloads.
  • 12. Big Data classification (table):
    - Vital/Critical – retention period 7 yrs; RTO 30 min; RPO <10 min; forensic window 6 mos.
    - Sensitive – retention period 5 yrs; RTO 90 min; RPO <1 hr; forensic window 3 mos.
    - Non-critical – retention period 6 mos; RTO 48 hrs; RPO <48 hrs; forensic window 1 mo.
  • 13. Big Data management lifecycle (diagram): Capture, Movement, Storage, Retention, Archival and Destruction stages, with Backup & Restoration, Protection and Governance applied across them (legend: workflow, protection, records management).
  • 14. Big Data backup and restoration (diagram): a backup and restoration policy maps backup mediums (storage, tape, electronic vaulting) and technologies (HP Data Protector, HP StoreAll, HP StoreOnce) to restoration targets and times, covering incidents such as fire, flood, storm, virus and attack.
  • 15. Big Data archival (diagram): moving Big Data that is no longer actively used to long-term retention, driven by data classification, the archival policy and the archival calendar, with extreme data reduction onto a scalable archive storage medium.
  • 16. Big Data destruction (diagram): removing Big Data and making it unrecoverable, per the data classification and archival policy – purging, HD wiping, HD shredding and degaussing, applied to internal data stored on hard disk or tape.
  • 17. Big Data storage (diagram): the placing of information at rest, from capture through movement to core Big Data storage. Storage requirements include high-speed ingestion, high-speed query, de-duplication, clustered design and hyperscale storage; examples shown include HP Data Protector, StoreOnce on the server, and integrated storage solutions.
  • 18. Where to start.
  • 19. Deriving an IT Big Data architecture (diagram): identify the business case and analytics, identify data sources, and assess your capabilities and current infrastructure to arrive at the IT Big Data architecture.
  • 20. Determining service catalog sourcing (diagram):
    - Data is external and can be processed via a cloud service (Cloud): choose a provider, negotiate pricing/service, set up access.
    - Data is internal and external (Hybrid): same as cloud, plus determine the flow of data, the IT services (IaaS, PaaS, SaaS) and the platform and infrastructure, and determine integration with internal data sources.
    - Data is internal (Internal): determine IT services (IaaS, PaaS, SaaS), platforms and infrastructure, integration with internal processing, and ownership/management governance.
    Across all options: determine all services, storage, development and data-access methods, and determine data protection and governance.
  • 21. Infrastructure impact examples (table of platform characteristics supporting the Big Data Refinery: data type, MPP, real-time analytics, storage type, server type, data movement/network impact, OS, high availability):
    - Vertica – structured data; MPP; local/DAS storage; commodity servers (HP Moonshot, BL/DL class); 10GbE top-of-rack networking (HP 5920AF) with 4x1GbE NICs (HP 331FLR); Linux; built-in redundancy.
    - Autonomy – structured and unstructured data; NAS or DAS for the index (IBRIX, 3PAR); enterprise DL or BL class servers; 4x1GbE NICs (HP 331FLR); built-in mirroring.
    - Hadoop – structured and unstructured data; local storage (HP Moonshot); commodity servers; 10GbE top-of-rack (HP 5920AF) with 4x1GbE NICs (HP 331FLR); Linux with HP CMU (Windows coming); built-in redundancy.
    - EDW/SQL – structured data; SAN storage (3PAR, P4000); enterprise DL or BL class servers; OS- and application-level availability.
  • 22. HP approach supporting Big Data initiatives.
  • 23. HP offers the shortest route to success – solutions that combine best-in-class products and services: consulting services (advisory, architecture, compliance and performance), leading Hadoop distributions, seamless analytics, and leading storage and network solutions in a factory-integrated solution; a complete offering providing value to your business and securing your company’s intellectual capital.
  • 24. HP Big Data consulting services – generating business value from data: architecture strategy (making IT departments relevant to the business by gaining value out of Big Data); architecture for Big Data volume, variety, velocity and voracity covering process and consumption, access and storage, and platform and infrastructure; system performance tuning and analysis; governance and protection; consolidation – providing infrastructure services for collection, consolidation, protection and consumption of data, and compliance.
  • 25. HP Big Data service offerings – reducing implementation risk, planning Big Data IT infrastructure transformation, ensuring business value and accelerating time to results: enterprise-wide meaning-based search with Autonomy (design and implementation services; data protection, compliance and security); unstructured data analysis with Hadoop (HP roadmap service for Hadoop, HP enterprise design and reference architecture, HP implementation for Hadoop (Cloudera), optimization services); structured database and data warehouse with Vertica (Vertica design and implementation services, HP EDW / fast-track DW infrastructure implementation); infrastructure to support Big Data integration and protection (server, network and storage services; secured, optimized, available; backup and recovery; storage and network optimization; continuity and impact analysis service).
  • 26. HP Big Data Infrastructure Transformation Workshop – the service to create your IT strategy.
    Objectives: identify an IT strategy to provide Big Data services and value; determine Big Data functionalities to determine architecture and standards; define a cross-IT approach that spans security, management, operations and standards; provide IT leadership for the Big Data initiative; define your unique roadmap and actionable steps to ensure success.
    Benefits: unify the internal team around scope and initiative; set a common vision and leadership; a unified transformation model as reference for your Big Data transformation; cross-functional linkages, with different IT functions sharing a common model and integration; understand the Big Data transformation initiative and its implications; understand how your organization can successfully meet the unprecedented challenges of Big Data with the business; understand how and where HP could contribute to your success.
  • 27. Key takeaways – your Big Data initiative:
    - Common vision and leadership: are all your teams aligned? Are you leading the transformation execution? Principles and requirements lead.
    - Be prepared – is your infrastructure ready? Is today’s infrastructure ready? Do my teams have the right skills?
    - Scope and boundaries: your needs define your unique journey; know the limits, challenges, key success factors and solutions; stress what you have – what is best about storage today?
    - HP can help you at any step of your Big Data roadmap journey.
  • 28. Thank you.