Big data tim


Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Big Data has existed for a while and is far more than just massive volumes of data from novel sources. As customers note it is about how you manage, analyze and use and combine the data. SAP makes Big Data Real.what big data is not –e.g. SOH, BWoH.What would make them big data –e.g. sales forecasting on soh/bwoh; mashing up corporate data with social media data or with call center notes.
  • What I say: HANA is tremendously important to SAP’s vision. We are no longer the “ERP company”, as you may think. Following the acquisitions of Business Objects, Sybase and recently SuccessFactors, SAP now leads several important enterprise business categories. Furthermore, by investing in in-house innovation, we have now assembled a vertically integrated business data management stack, all the way from data management appliances to applications to on-demand application services, providing increased customer value. And HANA is at the heart of this strategy!
  • SAP big data platform: Open StrategyWe are planning on SAP formally reselling a HANA+ Hadoop bundle on SAP’s price list. vendors HortonWorks, Intel, Mappr, cloudera, Hadoop distribution are in every account and we work with them allStrategic announcement coming
  • SAP offers the applications and analytic tools that help you to infuse Big Data insights directly into your business processes, and equipping your employees, partners and customers with access to data in order to uncover and monetize insights
  • Big Data is a big opportunity for most companies and is something that they must embrace to competitive. It has existed for a while and is far more than just massive volumes of data from novel sources. As customers note it is about how you manage, analyze and use and combine the data. SAP makes Big Data Real.
  • Hadoop runs on the Hadoop Distributed File System (HDFS), a distributed file system that scales out on commodity servers. Since Hadoop is file-based, developers don’t need to create a data model to store or process data, which makes Hadoop ideal for managing semi-structured Web data, which comes in many shapes and sizes. Because it is “schema-less,” Hadoop can be used to store and process any kind of data, including structured transactional data and unstructured audio and video data. However, the biggest advantage of Hadoop is that is open source, which means that the up-front costs of implementing a system to process large volumes of data are lower than for commercial systems. However, Hadoop does require companies to purchase and manage dozens, if not hundreds, of servers and train developers and administrators to use this new technology.Apache Hadoop enables applications to work with thousands of independent computers (nodes) which are collectively referred to as a cluster (if all nodes use the same hardware) or a grid (if the nodes use different hardware). The main components used in Hadoop to run a job include: Client: submits the MapReduce job.Jobtracker: coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.Task trackers: Run the tasks that the job has been split into. Tasktrackers are Java applications whose main class is TaskTracker.Hadoop distributed file system (HDSF)HDFS is file system that sits on top of native file systemDifferent blocks of a file stored in different nodesName node keeps tracks of which blocks are make up a file and where those blocks are locatedAuto rebalances, auto replicationsUniform namespaceMapReduceHadoopMapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. Computational processing can occur on data stored either in a file system (unstructured) or in a database (structured).Map: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-leve tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.Reduce: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.MapReduce allows for distributed processing of the map and reduction operations. Since each mapping operation is independent of the others, all maps can be performed in parallel – though in practice it is limited by the number of independent data sources and/or the number of CPUs near each source. Similarly, a set of 'reducers' can perform the reduction phase - provided all outputs of the map operation that share the same key are presented to the same reducer at the same time. While this process can often appear inefficient compared to algorithms that are more sequential, MapReduce can be applied to significantly larger datasets than "commodity" servers can handle – a large server farm can use MapReduce to sort a petabyte of data in only a few hours. The parallelism also offers some possibility of recovering from partial failure of servers or storage during the operation: if one mapper or reducer fails, the work can be rescheduled – assuming the input data is still available.
  • SAP HANAIn-memory platformStore billions of recordsAnalyze in real-timeBuilt-in predictive, text, and spatial algorithmsHADOOPDistributed disk platformStore infinite amounts of unstructured data Search in batchNon-relational data storeSpecialized skills to implement and codeMany add-on libraries and packages
  • Big data tim

    1. 1. Today we measure available data in zettabytes IN 2011, THE AMOUNT OF DATA SURPASSED 1.8 90% OF THE DATA IN THE WORLD TODAY has been created in the last two years alone ZETTABYTES COMBINED GDP OF: 1.8 ZETTABYTES = 57.5 BILLION 32 GB iPads **IDC Digital Universe Study Extracting Value from Chaos © 2013 SAP AG. All rights reserved. = $34.4 • • • • = TRILLION US • France Japan • UK China • Italy Germany 1 Confidential 1
    2. 2. Where is this data? Types and Volumes of Data … Traditional content types, Including unstructured data, …have grown dramatically are growing by up to 80% per year CRM Systems M2M data Transactions Sales Order Mobile ERP Systems Instant Messages Transactions Planning Email Things Sales Order Things Demand Legacy EDW © 2013 SAP AG. All rights reserved. 2013 SAP AG. All rights reserved. © Planning Legacy ERP Structured data grew by Inventory more than 40% per year Mobile Customer 2 2
    3. 3. What can’t we see? WHAT CRITICAL “NEW SIGNALS” MIGHT WE BE MISSING? Is it in our ERP Systems? Our M2M data? Social? © 2013 SAP AG. All rights reserved. Confidential 3
    4. 4. Big Data - Definition “Big Data” refers to the problems of capturing, storing, managing, and analyzing massive amounts of various types of data Big Data Challenge: turn raw data into insights that drive business value and manage in a cost effective manner; Most commonly this refers to terabytes or petabytes of data, stored in multiple formats, from different internal and external sources, with strict demands for speed and complexity of analysis © 2013 SAP AG. All rights reserved. 4
    5. 5. The SAP you need to know System of Engagement “Newer SAP” SAP Cloud Maintenance & Operations 24/7, SLA’s, DR & HA, Elasticity mobile System of Record Business Suite (ERP) Business Analytics “Foundational SAP” Data Logistics/Quality ETL In Memory Database Platform In Memory / Columnar/ MPP/ Federation © 2013 SAP AG. All rights reserved. Confidential 5
    6. 6. In Memory Database Platform Digging Deeper In Memory / Columnar/ MPP/ Federation SAP Business Suite Text Core PLM OLAP SRM OLTP SCM ERP Apps CRM Custom Predictive BI HANA SAP BW HTTP Native Apps Geospatial Models Engines Logical memory HOT disk WARM cached Bulk/Streaming/Real-time User Interface & Applications COLD Physical Table(s) Virtual Tables Ingest Engines Federation Data Logistics (Data Services , SLT, CEP) COLD 100101 011010 100101 © 2013 SAP AG. All rights reserved. Other DB Other ERP Other Data … Confidential 6
    7. 7. Open Hadoop Strategy © 2013 SAP AG. All rights reserved. Confidential 7
    8. 8. Accelerated BI with SAP BusinessObjects and SAP HANA One unified and complete BI Suite addressing the full spectrum of BI on SAP HANA Discovery and Analysis Dashboards and Apps Reporting Discover. Predict. Create. Build Engaging Experiences Share Information  Discover areas to optimize your business  Deliver engaging information to users where they need it  Securely distribute information across your organization  Adapt data to business needs  Track key performance indicators and summary data  Give users the ability to ask and answer their own questions  Tell your story with beautiful visualizations  Build custom experiences so users get what they need quickly  Build printable reports for operational efficiency © 2013 SAP AG. All rights reserved. Confidential 8
    9. 9. Data Logistics SAP Business Suite Trigger Based, Real Time SAP LT Replication Server SAP BusinessObjects tools DB Connection SQL ETL, Batch SAP BW Other query tools BICS SQL MDX HANA Studio ODBC SAP BOBJ Data Services Log Based Non SAP Data Sources SAP In-Memory Database ECDA/ODBC Sybase Replication Server In Memory Models Column Store Event Streams M2M SAP Event Stream Processor * ODBC SAP HANA Data Sources © 2013 SAP AG. All rights reserved. * SAP HANA Roadmap ** SAP ERP & BW Extractors Confidential 9
    10. 10. SAP Big Data Apps • Customer Engagement Intelligence • Predictive Analytics RDS See overview © 2013 SAP AG. All rights reserved. Confidential 10
    11. 11. Delivering On Your Business Imperatives Data Science Services Forecasting Sales and Demand  Forecast demand and managing inventory levels in perishable CPG  Model variant cannibalization and impact on manufacturer forecasts  Utility load demand forecasting Check and Compliance  Deliver faster response time and higher throughput of compliance checks to enable competitive advantage  Tackling public fraud waste and abuse by analyzing records for tax discovery Optimization Performance and Insights  Optimize transport and logistics recover from unforeseen disruptions  Maximizing guest / customer experience  Optimize depth and timing of retail markdowns to boost sales  Assess the impact of promotions, and improve profitability  Grow deposits not excessive interest costs  Directional insight on growing revenues and basket sizes Contact “DL BigDataSalesSupport” for more information about SAP Data Science Services © 2013 SAP AG. All rights reserved. Confidential 11
    12. 12. HANA + Hadoop
    13. 13. What is Hadoop  Open source project inspired by Google/Yahoo  Used at Yahoo, Facebook, eBay, LinkedIn, startups, Fortune 500 enterprises to store and process Petabytes of data on thousands of servers  Hadoop components – Cluster of commodity servers – Distributed storage layer (Hadoop Distributed File System, or HDFS) – Distributed processing infrastructure (MapReduce programming model) Cluster of Commodity Servers Hadoop    NameNode 10s to 1000s DataNode(s) © 2013 SAP AG. All rights reserved. Hadoop Software Architecture Hadoop Computation Engines Hive HBase Mahout Pig Sqoop … Map-Reduce Data storage (Hadoop Distributed File system) Confidential 13
    14. 14. Apache Hadoop Software framework for distributed data processing  Hadoop Distributed File System (HDFS) – reliable data storage on commodity hardware HDFS Name Node (stores metadata) Data Node Data Node  HIVE -- data warehousing solution on top of Hadoop with direct access to HDFS and Hbase (stores actual data in blocks) replication (stores actual data in blocks) client  MapReduce – programing model for parallel data processing and query execution © 2013 SAP AG. All rights reserved. HDFS Input MapReduce process HDFS output Confidential 14
    15. 15. Why Hadoop? Pros  Free software  Cheap hardware - commodity servers  Scalable to thousands of nodes and petabytes of data  Highly fault-tolerant storage and processing  Flexible – write Java MapReduce programs to do any kind of processing; any data- no fixed schema needed  Open source libraries & tools Cons  Specialized skillset to administer and develop – Hadoop is not free!  Require more development (programming MapReduce & other NoSQL tools) than relational technologies (SQL, stored procedure)  HIVE/PIG/Impala not as performant nor as mature as relational tech  Batch-oriented jobs, not real-time  Less mature in enterprise readiness – security, ETL, management, monitoring, etc © 2013 SAP AG. All rights reserved. Confidential 15
    16. 16. SAP HANA + Hadoop Provides Real-Time on BIG DATA Combine INSTANT Results with INFINITE Storage HADOOP 8 SAP HANA 1.0sec Infinite storage Instant Results • Modern in-memory platform • Distributed disk platform • Transact/analyze in real-time • Store infinite amounts of unstructured data • Native predictive, text, and spatial algorithms • No-SQL access © 2013 SAP AG. All rights reserved. Confidential 16