2012 10 bigdata_overview

Usage Rights

© All Rights Reserved

  • http://a964.g.akamaitech.net/7/964/714/ee880cbf1a3897/www.forrester.com/imagesV2/uplmisc/Big_Data_Webinar__final.pdf
  • Hadoop, you may want to either access that data from Oracle Database by issuing SQL against HDFS files or move the data into Oracle tables. Let's start with the latter: moving the data into Oracle tables. Oracle Loader for Hadoop (OLH) is a high-performance loader for fast movement of data from any Hadoop cluster into Oracle Database tables. Like all other parts of the Big Data Connectors, it is available on any Hadoop cluster based on Apache Hadoop, in addition to the Big Data Appliance.
    If you want to take the results and perform additional analysis using advanced BI and data warehousing technologies, or incorporate them in other applications, OLH is both fast and light on the database server's processing load. It runs as a MapReduce job and uses the Hadoop servers’ processing resources to sample, sort and pre-partition the data based on the target database metadata. It can automatically take input from delimited text files (CSV) or Hive tables, or you can write your own input format. OLH can either load the results directly into the database, using the parallel direct path load interface or JDBC, or create Oracle-formatted Data Pump files. OLH has built-in load balancing across the reducer nodes, which prevents performance from degrading due to unbalanced loads.
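The note above says OLH samples the data and pre-partitions it so that reducers carry balanced loads. The sketch below illustrates that general idea (sampling-based range partitioning) on made-up skewed keys. It is not OLH's actual code: the function names, the key distribution, and the choice of four reducers are all assumptions for illustration.

```python
import bisect
import random
from collections import Counter


def split_points(sample, num_reducers):
    """Pick num_reducers - 1 boundary keys at the quantiles of a sample."""
    ordered = sorted(sample)
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]


def assign_reducer(key, boundaries):
    """Route a key to the reducer whose key range contains it."""
    return bisect.bisect_right(boundaries, key)


random.seed(0)
NUM_REDUCERS = 4  # hypothetical cluster size

# Hypothetical skewed keys: most records cluster at low values.
keys = [int(random.expovariate(1 / 100)) for _ in range(10_000)]

# Sample a small fraction of the keys and derive range boundaries from it.
sample = random.sample(keys, 500)
boundaries = split_points(sample, NUM_REDUCERS)
sampled_load = Counter(assign_reducer(k, boundaries) for k in keys)

# Baseline for comparison: equal-width key ranges that ignore the skew.
width = (max(keys) + 1) / NUM_REDUCERS
naive_load = Counter(int(k // width) for k in keys)

print("records per reducer, sampled boundaries:", sorted(sampled_load.items()))
print("records per reducer, equal-width ranges:", sorted(naive_load.items()))
```

With skewed keys, the equal-width baseline piles most records onto one reducer, while boundaries taken from a sample spread them far more evenly.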
  • Oracle Direct Connector for HDFS makes it possible to access data in HDFS on the Hadoop cluster from Oracle using SQL. It provides a virtual table view of the HDFS files and allows parallel query access to the data through the standard Oracle Database external table mechanism. If you are using BDA and Exadata, the connection runs over the InfiniBand network fabric, so database access to HDFS, in the very scientific words of the development manager, “flies”. If you need to import the data in HDFS into Oracle, the Direct Connector does not require a file copy or Linux FUSE. Instead it uses the native Oracle Loader interface.
  • If you already use Oracle Data Integrator (or are familiar with this kind of tool and want to use ODI), it can simplify the MapReduce process. As long as you can describe the transformation that you need to perform on the data, ODI can generate the MapReduce code for you and run that process. It can even invoke Oracle Loader for Hadoop at the end of the cycle. So if you are not an expert in Java, parallel algorithms and the Hadoop framework, there is still a way to use it all to organize your code.
    Note: ODI generates SQL code, which is passed to Hive (a component of many Hadoop distributions), which in turn generates the actual Java MapReduce code. You need Big Data Connectors, specifically the ODI Application Adapter for Hadoop, to make all this work.
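To make the note above concrete, here is a minimal sketch of the kind of map, shuffle, and reduce steps that a simple SQL aggregation turns into. The query, the sample rows, and the function names are hypothetical. Real Hive-generated jobs are Java MapReduce programs and look nothing like this toy, but the three phases are the same.

```python
from collections import defaultdict

# Hypothetical query the transformation might be described as:
#   SELECT region, SUM(amount) FROM sales GROUP BY region
sales = [
    ("EMEA", 120.0),
    ("APAC", 75.5),
    ("EMEA", 30.0),
    ("AMER", 210.0),
    ("APAC", 24.5),
]


def map_phase(record):
    """Emit one (key, value) pair per input row: the GROUP BY column and the measure."""
    region, amount = record
    yield region, amount


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Apply the aggregate (SUM) to all values collected for one key."""
    return key, sum(values)


mapped = [pair for record in sales for pair in map_phase(record)]
totals = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(totals)
```

The point of tools such as ODI and Hive is that the user writes only the declarative description, and the three functions above are generated rather than hand-coded.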
  • Our view of the BI landscape is that there are fundamentally two dominant types of problems.
    On one hand there are questions where we can define up front both the process and the data required to answer them. What are sales forecasts by region? What is my performance relative to expectation?
    On the other hand are questions where either the process or the data cannot be defined ahead of time; these questions are open-ended by nature. What customers should I target? Why are my sales going down? It's also interesting to point out that these questions are far more transient than the other type, and this follows from their open-ended nature. Each question leads to new questions.
    The interaction model for the former is more like “looking it up”; it’s a report or dashboard. On the other hand, when you don't know exactly what you need or how to ask for it, the necessary interaction model is exploration and discovery: a dialog with the data.
    It also follows that, as a matter of practice, some data is modeled and other data is not. We take modeled to mean that there is a single, overarching semantic model. Of course, modeling costs time and money, so we generally only make the investment in cases where the expected return is large enough to justify the effort.
    The cost of storing un-modeled data has continued to drop but, importantly, with the popularization of Hadoop, the promise of deriving value from un-modeled data is rising rapidly. The result is an explosion in the capture of un-modeled data.
    Through this view of the BI landscape we can see how Traditional Business Intelligence and Data Discovery fit in. Traditional Business Intelligence is purpose-built and very strong for known questions and modeled data. Friction arises when organizations attempt to use these products for new and unpredictable questions, which require similarly new and unpredictable data models.
    In the other space is the emerging market category of data discovery, where the goal is to give everyday business users fast answers to new questions so they can make better, more informed business decisions. Data discovery tools follow several key market trends.
    First, the growth in data volume, diversity, and complexity. Not much to say here that hasn't already been said. Organizations today are beginning to understand the value inherent in this information and are looking for tools that can unlock that value to give them competitive advantage. And more and more users need to access and understand this information.
    Second, the consumerization of business software. When IT is unable to deliver, business users are increasingly willing to go outside of IT to meet their own needs. With users empowered to choose their own tools, and with standards set in the consumer world, expectations for amazing user experiences have never been higher.
  • How do we do it? Endeca Information Discovery provides a full-featured platform for creating discovery applications that provide access to all kinds of information. Drilling into the architecture, we accomplish this with three tiers.
  • Notes: This slide is a logical representation of the scope of a Big Data solution. It provides the basis for describing data flows in each stage of the Big Data process in the following slides. The scope of a Big Data solution includes taking actions and decisions on the results of analysis, hence integration with Applications. Real-time event detection can be part of a Big Data solution. This is an important point to draw out because IBM claims its Streams capability is a USP; see the book Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data.

2012 10 bigdata_overview Presentation Transcript

  • Big Data
    Jean-Pierre Dijcks
  • Agenda
    • Big Data
    • Strategy
    • Technology
    • Use Cases
  • Big Data
    Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
  • Big Data
    React to an Event
    Pro-Actively Change Outcomes
    “Technology presents the opportunity to transform business”* Mark Hurd, President, Oracle
    * Oracle Profit Magazine, Volume 17, Number 1
  • Big Data’s Key Ingredient
    “Improvement merely lets you hit the numbers. Creativity is what transforms.”* Ron Johnson, CEO, JCPenney
    • Big Data transforms our business: 5%
    • Big Data improves our business: 20%
    • What is Big Data?: 75%
    * Fortune Magazine, Vol. 165, No. 4
  • Big Data Extends the Breadth and Speed of Data
    Big Data: Decisions based on all your data
    • Video and Images
    • Documents
    • Social Data
    • Machine-Generated Data
    Information Architectures Today: Decisions based on database data
    • Transactions
  • Big Data Extends the Depth of Analytics
    [Diagram labels: Query and Reporting, Statistics, Data Mining, Graph Analytics, Spatial Analytics, Text Analytics]
  • Big Data Defined
    Big Data: Techniques and Technologies that Enable Enterprises to Effectively and Economically Analyze All of their Data
  • Strategy
  • Strategic Transformations
    [Diagram labels: Reporting, Analytics, Autonomous Actions, Rear-view Mirror, Transactional Data, All Data]
  • Oracle’s Big Data solution
    Stages: Acquire, Organize & Discover, Analyze, Decide
    [Diagram labels: Oracle Big Data Appliance, Oracle Big Data Connectors, Endeca Information Discovery, Oracle Exadata, Oracle Exalytics, Oracle CEP, Real-Time Decisions, InfiniBand]
  • Oracle Big Data Strategy
    [Diagram labels: BI Tools, Data Discovery Tools, Semantic, Text, Graph, Spatial, CEP, RTD, Data Management, Advanced Analytics, Management, Infrastructure, Build, Acquire, Adopt, Engineer]
  • Technology
  • Big Data Appliance
    Hardware:
    • 288 CPU cores with 1152 GB RAM
    • 648 TB of raw disk storage
    • 40 Gb/s InfiniBand
    Integrated Software:
    • Oracle Linux
    • Oracle Java VM
    • Cloudera Distribution of Apache Hadoop (CDH)
    • Cloudera Manager
    • Open-source distribution of R
    • NoSQL Database Community Edition
    All integrated software (except NoSQL DB CE) is supported as part of Premier Support for Systems and Premier Support for Operating Systems
  • Oracle Big Data Appliance
    • File System Mount: FUSE-DFS
    • UI Framework: HUE
    • SDK: HUE SDK
    • Workflow: APACHE OOZIE
    • Scheduling: APACHE OOZIE
    • Metadata: APACHE HIVE
    • Languages / Compilers: APACHE PIG, APACHE HIVE, APACHE MAHOUT
    • Data Integration: APACHE FLUME, APACHE SQOOP
    • Fast Read/Write Access: APACHE HBASE
    • HDFS, MAPREDUCE
    • Coordination: APACHE ZOOKEEPER
  • Why Cloudera?
    • Includes Open Source Apache Hadoop
      – Fast evolution in critical features
      – Proven at very large scale
    • Managed Distribution
      – Components certified to work together in regular updates
      – Cloudera Manager provides Management GUI
    • Most popular distribution in the market
  • Oracle and Cloudera
    • All Cloudera software pre-installed and pre-configured on BDA
      – Engineered with Cloudera
    • All Cloudera assets included
      – Single Oracle Product SKU for HW & SW
      – Single Oracle Support SKU for HW & SW (life of the machine)
    • Oracle is the single point of contact for the solution
  • Price comparison
    Oracle Big Data Appliance
                                      Year 1      Year 2     Year 3     Total
      BDA Cost                        $450,000
      Support Cost                    $54,000     $54,000    $54,000
      On-site Installation            $14,150
      Total                           $518,150    $54,000    $54,000    $626,150
    “Build-Your-Own” – HP hardware and Cloudera
                                      Year 1      Year 2     Year 3     Total
      Servers and switches            $428,220
      Support Cost                    $136,233    $72,000    $72,000
      Installation & configuration    not included
      Total                           $564,453    $72,000    $72,000    $708,453
    Full details at https://blogs.oracle.com/datawarehousing/entry/price_comparison_for_big_data
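The totals on the price comparison slide can be re-derived from the individual line items. The short sketch below does that arithmetic. The dictionary layout and function name are illustrative, and treating the build-your-own installation cost as zero is an assumption that mirrors the slide's "not included".

```python
# Figures copied from the price comparison slide (USD).
bda = {
    "purchase": 450_000,                   # BDA cost, year 1
    "support": [54_000, 54_000, 54_000],   # support cost, years 1-3
    "installation": 14_150,                # on-site installation, year 1
}
build_your_own = {
    "purchase": 428_220,                   # servers and switches, year 1
    "support": [136_233, 72_000, 72_000],  # support cost, years 1-3
    "installation": 0,                     # ASSUMPTION: slide says "not included"
}


def yearly_totals(costs):
    """Return per-year totals; one-time costs are charged to year 1."""
    totals = list(costs["support"])
    totals[0] += costs["purchase"] + costs["installation"]
    return totals


bda_years = yearly_totals(bda)
byo_years = yearly_totals(build_your_own)
difference = sum(byo_years) - sum(bda_years)

print("BDA:", bda_years, "total", sum(bda_years))
print("Build-your-own:", byo_years, "total", sum(byo_years))
print("Three-year difference:", difference)
```

The computed yearly and three-year totals match the slide, and the gap between the two options over three years comes to $82,303 before any installation cost is added to the build-your-own side.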
  • Oracle NoSQL Database
    A distributed, scalable key-value database
    • Simple Data Model
      – Key-value pair with major+sub-key paradigm
      – Read/insert/update/delete operations
    • Scalability
      – Dynamic data partitioning and distribution
      – Optimized data access via intelligent driver
    • High availability
      – One or more replicas
      – Disaster recovery through location of replicas
      – Resilient to partition master failures
      – No single point of failure
    • Transparent load balancing
      – Reads from master or replicas
      – Driver is network topology & latency aware
    [Diagram labels: Application and NoSQLDB Driver (two instances), Storage Nodes in Data Center A and Data Center B]
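The major+sub-key data model on the slide above can be pictured with a toy in-memory store. This is a sketch only, not the Oracle NoSQL Database API: the class, its method names, and the rule of hashing only the major key to choose a partition are assumptions made for illustration.

```python
import hashlib


class ToyKeyValueStore:
    """In-memory stand-in for a partitioned key-value store (hypothetical)."""

    def __init__(self, num_partitions=3):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, major):
        # ASSUMPTION: only the major key decides placement, so all
        # sub-keys of one major key live in the same partition.
        digest = hashlib.sha256(major.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.partitions)
        return self.partitions[index]

    def put(self, major, sub, value):
        """Insert or update the value stored under (major, sub)."""
        self._partition_for(major)[(major, sub)] = value

    def get(self, major, sub):
        """Read one value, or None if the key is absent."""
        return self._partition_for(major).get((major, sub))

    def delete(self, major, sub):
        """Remove one key; return True if it existed."""
        return self._partition_for(major).pop((major, sub), None) is not None

    def multi_get(self, major):
        """Read every sub-key of a major key from its single partition."""
        partition = self._partition_for(major)
        return {s: v for (m, s), v in partition.items() if m == major}


store = ToyKeyValueStore()
store.put("user/jane", "profile", {"age": 32})
store.put("user/jane", "coupons", ["20% off"])
store.put("user/john", "profile", {"age": 41})
print(store.multi_get("user/jane"))
```

Keeping all sub-keys of a major key together is what lets a lookup such as `multi_get` touch a single partition instead of scanning the whole store.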
  • Big Data Connectors
    Optimized integration of Hadoop with Oracle Database and Oracle Exadata
    • Oracle Loader for Hadoop
    • Oracle Direct Connector for Hadoop Distributed File System (HDFS)
    • Oracle Data Integrator Application Adapter for Hadoop
    • Oracle R Connector for Hadoop
    • Does not require Big Data Appliance – can be licensed for Hadoop running on non-Oracle hardware
  • Oracle Loader for Hadoop
    Use The Cluster
    • Last stage in MapReduce workflow
    • Partitioned and non-partitioned tables
    • Online and offline loads
    [Diagram labels: MAP, SHUFFLE/SORT, REDUCE, ORACLE LOADER FOR HADOOP]
  • Oracle Direct Connector for HDFS
    Direct Access from Oracle Database
    • SQL access to HDFS
    • External table view
    • Data query or import
    [Diagram labels: Oracle Database, SQL Query, External Table, DCH, InfiniBand, HDFS, Client]
  • Oracle Data Integrator
    Simplifying MapReduce
    • Automatically generates MapReduce code
    • Manages the process
    • Loads into Data Warehouse
    [Diagram labels: Oracle Data Integrator, Oracle Loader for Hadoop]
  • What is Data Discovery?
    Quickly explore all relevant data
    • Simplified: Advanced search; Faceted navigation; Analytics
    • Structured; Semi-structured; Unstructured; Messy data; Beyond the data warehouse
    • Relationships undefined or unknown
    • No pre-defined model required
    • Rapid, iterative change
  • Business Intelligence and Data Discovery
    Complementary Solutions, Integrated Business Processes
    • Questions: Known & Clearly Defined Questions (Who, What, When?) versus Uncertain or Open-Ended Questions (Why, How, What Else?)
    • Data: Modeled Data (Conforms to a Single Model) versus Un-modeled Data (Diverse and Changing Models)
    • Business Intelligence: Proven Answers to Known Questions
    • Data Discovery: Fast Answers to New Questions
    • Insights yield mature models and KPIs
    • New questions require new data, exploration
  • Oracle Endeca Information Discovery
    A platform for data discovery applications across the enterprise
    Endeca Information Discovery (EID) helps organizations quickly explore all relevant data
    • Combine structured & unstructured data from disparate systems
    • Rapidly assemble easy to use analysis applications
    • Automatically organize information for search, discovery & analysis
  • Big Data: Why Deeper Analytics?
    Communications: Enhanced churn prediction with social network analytics
    • Consider each customer’s value as part of their social network
    • Focus retention campaigns on high-value social networks
    • Identify new prospective high-value customers
    • Target promotions for upselling and cross-selling to key social network influencers
    • Identify rotational churners and exclude from retention offers
    Insurance: Automated deep analytics for fraud and abuse in insurance claims processing
    • Enhance fraud analytics by considering text data (assessor’s reports, police reports, witness interviews) in addition to transaction data
    • Investigate claims that have the highest expected risk (based on likelihood of fraud and claim size)
    • Focus scarce investigative resources and create feedback loop for automated analysis
    Retail: Identify and respond to shifts in behavior
    • Combine past and most recent point-of-sale data with customer information
    • Track and monitor shifts in individual customer behaviors and household purchases
    • Anticipate new up-sell and cross-sell opportunities
    © 2012 Oracle Corporation
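The insurance example above ranks claims by expected risk, "based on likelihood of fraud and claim size". One simple reading, assumed here and not stated on the slide, is expected risk = fraud probability × claim size. The claims, probabilities, and field names below are invented for illustration.

```python
# Hypothetical claims; fraud_probability would come from a scoring model.
claims = [
    {"id": "C1", "fraud_probability": 0.02, "claim_size": 50_000},
    {"id": "C2", "fraud_probability": 0.40, "claim_size": 4_000},
    {"id": "C3", "fraud_probability": 0.10, "claim_size": 30_000},
]


def expected_risk(claim):
    """ASSUMPTION: expected risk is likelihood of fraud times claim size."""
    return claim["fraud_probability"] * claim["claim_size"]


# Investigate the claims with the highest expected risk first.
ranked = sorted(claims, key=expected_risk, reverse=True)
for claim in ranked:
    print(claim["id"], round(expected_risk(claim), 2))
```

Under this reading, a moderately suspicious large claim (C3) outranks a highly suspicious small one (C2), which is how the ranking focuses scarce investigative resources.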
  • Deeper Analytics: Oracle Advanced Analytics
    • Oracle Advanced Analytics extends Oracle Database into a comprehensive analytical platform
      – Predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations
    • Scalable and parallel: analyze huge volumes of data
    • Tightly integrated with SQL: share results of analytics throughout enterprise
    • Built for data analysts
  • Oracle Advanced Analytics: Data Mining
    • 12 cutting-edge machine-learning algorithms
      – Parallel model creation
      – Data transformation and preparation for data mining
      – Scalable model creation
      – Efficient scoring for large volumes
      – Data Miner GUI to build and evaluate data mining models
    • Data Mining can provide valuable results:
      – Predict customer behavior (Classification)
      – Predict or estimate a value (Regression)
      – Segment a population (Clustering)
      – Identify factors more associated with a business problem (Attribute Importance)
      – Find profiles of targeted people or items (Decision Trees)
      – Determine important relationships and “market baskets” within the population (Associations)
      – Find fraudulent or “rare events” (Anomaly Detection)
  • Oracle Advanced Analytics: Oracle R Enterprise
    • Oracle R Enterprise brings R’s statistical functionality closer to the Oracle Database
      1. Eliminate R’s memory constraint by enabling R to work directly & transparently on database objects
         – Allows R to run on very large data sets
      2. Architected for Enterprise production infrastructure
         – Automatically exploits database parallelism without requiring parallel R programming
         – Build and immediately deploy
      3. Oracle R leverages the latest R algorithms and packages
         – R is an embedded component of the DBMS server
  • Use Cases
  • Big Data Architecture Pattern
    [Diagram labels]
    • Stages: Capture, Acquire, Organize, Store, Analyze, Integrate into Applications
    • Sources: Operational Systems, Front End, Back End, Data Handlers, Real-time & Batch Feeds, Real-time Event Detection
    • Organize algorithms: Filter, Index, ETL, Classify, Correlate
    • Data: Low value density data, High value data
    • Stores: HDFS, NoSQL, Relational, Semantic/Spatial
    • Analyze: Text Analytics, Statistics, Data Mining, Graph Analytics, Spatial Analytics
  • Big Data Examples
    Insurance: Individualize auto-insurance policies based on newly captured vehicle telemetry data
    Insurer gains insight into customer’s driving habits, delivering:
    • More accurate assessments of risks
    • Individualized pricing based on actual individual customer driving habits
    • Guide and motivate individual customers to improve their driving habits
    Travel: Optimize buying experience through web log and social media data analysis
    Travel site gains insight into customer preferences and desires
    • Up-selling products by correlating current sales with (subsequent) browsing behavior
    • Increase browse-to-buy conversions via customized offers and packages
    • Deliver personalized travel recommendations based on social media data
    Games: Collect gaming data to optimize spend within and across games
    Games company gains insight into likes, dislikes and relationships of its users
    • Enhance games to drive customer spend within games
    • Recommend other content based on analysis of player connections and similar “likes”
    • Create special offers or packages based on browsing and (non-)buying behavior
  • Big Data Use Case: Smart Mall
    • Customer enters mall area, based on Cell Phone location data
    • Customer Profile: Jane Doe, 32, Married, 2 kids (2 & 4 yrs), Uses our coupons
    • Send Coupon: 20% off item when used in the next 15 minutes
    • Point of Sale Capture:
      – Coupon used
      – 3 items bought (up 1)
      – Increased spend (up $10)
  • Big Data Technology Pattern
    [Diagram labels: Identify User, Deliver Coupon, Collection & Decision Points, Filter, Enrich, Oracle RTD, CEP, Big Data Appliance, Big Data Connectors, Models, Scores, Analyze, Map Reduce, Streaming, Social Feeds]