
AIOUG Big Data and Hadoop

AIOUG Vizag Chapter June18th Big data & Hadoop Intro

Published in: Technology


  1. 1. © Copyright 2016. Apps Associates LLC. 1 Big Data Overview & Hadoop for DBA’s Satyendra Pasalapudi Associate Practice Director Apps Associates LLC
  2. 2. About Me: Satyendra Kumar Pasalapudi, Associate Practice Director – Infrastructure/Cloud Practice at Apps Associates; Co-Founder & President of All India Oracle Users Group (AIOUG); @pasalapudi
  3. 3. www.ora-search.com
  4. 4. History of Data Management Systems
      • Timeline axis: 1940-50, 1950-60, 1960-70, 1970-80, 1980-90, 1990-2000, 2000-2010
      • Pre-computer technologies: printing press, Dewey decimal system, punched cards
      • Storage and access: magnetic tape “flat” (sequential) files, magnetic disk, Indexed Sequential Access Method (ISAM)
      • Models: hierarchical model, network model, relational model defined
      • Systems: IMS, IDMS, ADABAS, System R, Oracle V2, Ingres, dBase, DB2, Informix, Sybase, SQL Server, Access, Postgres, MySQL, Cassandra, Hadoop, Vertica, Riak, HBase, Dynamo, MongoDB, Redis, VoltDB, Hana, Neo4J, Aerospike
  5. 5. @dvantages of Cloud
  6. 6. Generational Change for Enterprise (IT)
      • Cloud supports mission-critical workloads
        – 87% of enterprises use cloud for mission-critical applications
      • Cloud use in the enterprise continues to grow
        – Half of the enterprises say they will use cloud for at least 75% of their workloads by 2018
      • No one cloud fits all
        – More than half (53%) of enterprises use two to four cloud providers
      Source: Verizon 2016 State of the Market: Enterprise Cloud report
  7. 7. Cloud – Probable to Inevitable
      • GE undergoing most important transformation in 140 year history
        – 9000 Applications to AWS & to 4000 Applications
        – 300 ERPs (two years back) to more manageable
        – 34 Data Centers to 4 Data Centers
      • By 2020 - US$15b of Software Revenue
      • Changes
        – People - Reduce Outsourcing
        – Technology - Build Approach for things that matter
        – 20% of Applications in Cloud as of today
        – 70% of Applications by 2020 in Cloud
      • Service Management; Network Perimeter; Risk Based Security Controls; Self Service and Automation; Financial Transparency
      Source: AWS 2015 Keynote – Oct 6 2015; OOW Keynote with Mark Hurd, Oct 26 2015
  8. 8. What is Cloud
  9. 9. The Role of Data is Changing
  10. 10. Until now, the questions you asked drove the data model. The new model is to collect as much data as possible: the “Data-First Philosophy”.
  11. 11. Data is the new raw material for any business, on par with capital, people, labor
  12. 12. Characteristics of Big Data
  13. 13. Big Data Challenge: cost-effectively manage and analyze all available data in its native form (unstructured, structured, streaming). Sources: ERP, CRM, RFID, website, network switches, social media, billing
  14. 14. Hybrid Cloud Framework: HR, FIN, SCOM, SALES, PROCUREMENT, PLANNING, DW / BI
  15. 15. Big Data Ecosystem
  16. 16. Not Easy to Get Analytic Value at Fast Enough Pace
      • Tool Complexity
        – Early Hadoop tools only for experts
        – Existing BI tools not designed for Hadoop
        – Emerging solutions lack broad capabilities
      • Data Uncertainty
        – Not familiar and overwhelming
        – Potential value not obvious
        – Requires significant manipulation
      • 80% of effort is typically spent on evaluating and preparing data
      • Overly dependent on scarce and highly skilled resources
      Source: Oracle
  17. 17. Key Challenges in Managing Big Data (Informatica Study, May 2013); addressed by Oracle Big Data Discovery
  18. 18. Sample of Big Data Use Cases Today
      • MEDIA / ENTERTAINMENT: viewers / advertising effectiveness; cross sell
      • COMMUNICATIONS: location-based advertising
      • EDUCATION & RESEARCH: experiment sensor analysis
      • RETAIL / CPG: sentiment analysis; hot products; optimized marketing
      • HEALTH CARE: patient sensors, monitoring, EHRs; quality of care
      • LIFE SCIENCES: clinical trials; genomics
      • HIGH TECHNOLOGY / INDUSTRIAL MFG.: mfg quality; warranty analysis
      • OIL & GAS: drilling exploration sensor analysis
      • FINANCIAL SERVICES: risk & portfolio analysis; new products
      • AUTOMOTIVE: auto sensors reporting location, problems
      • GAMES: adjust to player behavior; in-game ads
      • LAW ENFORCEMENT & DEFENSE: threat analysis - social media monitoring, photo analysis
      • TRAVEL & TRANSPORTATION: sensor analysis for optimal traffic flows; customer sentiment
      • UTILITIES: smart meter analysis for network capacity
      • ON-LINE SERVICES / SOCIAL MEDIA: people & career matching; web-site optimization
      What is the main difference in this data? Volume, Velocity, Variety. These characteristics challenge your existing architecture.
  19. 19. Big Data Verticals
      • Media/Advertising: Targeted Advertising; Image and Video Processing
      • Oil & Gas: Seismic Analysis
      • Retail: Recommend; Transactions Analysis
      • Life Sciences: Genome Analysis
      • Financial Services: Monte Carlo Simulations; Risk Analysis
      • Security: Anti-virus; Fraud Detection; Image Recognition
      • Social Network/Gaming: User Demographics; Usage analysis; In-game metrics
  20. 20. Sample Enterprise Big Data Architecture
      • Operational RDBMS (Oracle, SQL Server, …)
      • Data Warehouse RDBMS (Oracle, Teradata, …)
      • In-memory Analytics (HANA, Exalytics, …)
      • In-memory processing (Spark)
      • Hadoop
      • Web DBMS (MySQL, Mongo, Cassandra)
      • ERP & in-house CRM
      • Analytic/BI software (SAS, Tableau)
      • Web Server
  21. 21. Enterprise Data Hub / Data Lake / Data Reservoir
  22. 22. We Need Tools Built Specifically for Big Data
  23. 23. Hadoop and its Ecosystem
      • Scale out Easily
      • Parallel Computing
      • Commodity Hardware
      • Solves some Problems
      • Complex to Run
      • Special Skills to Maintain
      (Diagram label: Cassandra)
  24. 24. ETL for Unstructured Data
  25. 25. ETL for Structured Data
  26. 26. Hadoop Design Principles
      • System shall manage and heal itself
        – Automatically and transparently route around failure
        – Speculatively execute redundant tasks if certain nodes are detected to be slow
      • Performance shall scale linearly
        – Proportional change in capacity with resource change
      • Compute should move to data
        – Lower latency, lower bandwidth
      • Simple core, modular and extensible
  27. 27. Hadoop History
      • Dec 2004 – Google MapReduce paper published (the GFS paper appeared in 2003)
      • July 2005 – Nutch uses MapReduce
      • Feb 2006 – Starts as a Lucene subproject
      • Apr 2007 – Yahoo! on 1000-node cluster
      • Jan 2008 – An Apache Top Level Project
      • Jul 2008 – A 4000 node test cluster
      • May 2009 – Hadoop sorts Petabyte in 17 hours
  28. 28. Google Software Architecture (circa 2005). Components shown: Google Applications, Map Reduce, BigTable, Google File System (GFS)
  29. 29. [Figure: MapReduce flow from Start through many parallel Map tasks to Reduce tasks]
  30. 30. Hadoop Ecosystem
      • Data Access: Sqoop, Flume
      • Client Access: Hue, Hive (SQL), Pig (PL/SQL)
      • Data Mining: Mahout
      • Orchestration: Oozie
      • Chukwa (Monitoring)
      • ZooKeeper (Coordination)
      • MapReduce (Job Scheduling/Execution System), with Streaming/Pipes APIs
      • HBase (key-value store)
      • HDFS (Hadoop Distributed File System)
      • Java Virtual Machine
      • OS – Redhat, Suse, Ubuntu, Windows
      • Commodity Hardware; Networking
  31. 31. Hadoop – Simplified View
      • MPP (Massively Parallel) hardware running database-like software
      • “Data” is stored in parts, across multiple worker nodes
      • “Work” operates in parallel, on the different parts of the table
      • Diagram: a Controller and Worker Nodes
  32. 32. HDFS Architecture
  33. 33. HDFS Architecture [Figure: a Client sends metadata ops to the Namenode, which holds the metadata (name, replicas, ...; e.g. /home/foo/data, 6, ...) and sends block ops to the Datanodes; Clients read from and write to Datanodes; blocks are replicated between Datanodes on Rack1 and Rack2]
  34. 34. HDFS – Highly Available [Figure: a Head Node maps MYFILE.TXT to block1, block2, block3; the blocks are stored on Data 1, Data 2, Data 3, Data 4]
  35. 35. Namenode and Datanodes
      • Master/slave architecture
      • An HDFS cluster consists of a single Namenode, a master server that manages the file system namespace and regulates access to files by clients.
      • There are a number of DataNodes, usually one per node in a cluster.
      • The DataNodes manage storage attached to the nodes that they run on.
      • HDFS exposes a file system namespace and allows user data to be stored in files.
      • A file is split into one or more blocks, and the set of blocks is stored in DataNodes.
      • DataNodes serve read and write requests, and perform block creation, deletion, and replication upon instruction from the Namenode.
  36. 36. Hadoop 1 – Job & Task Trackers
      • Master Node: The majority of Hadoop deployments consist of several master node instances. Having more than one master node helps eliminate the risk of a single point of failure.
      • NameNode: These processes are charged with storing a directory tree of all files in the Hadoop Distributed File System (HDFS). They also keep track of where the file data is kept within the cluster. Client applications contact NameNodes when they need to locate a file, or add, copy, or delete a file.
      • DataNodes: The DataNode stores data in HDFS and is responsible for replicating data across the cluster. DataNodes interact with client applications once the NameNode has supplied the DataNode's address.
      • Worker Node: Unlike master nodes, whose numbers we can count on one hand, a representative Hadoop deployment consists of dozens or hundreds of worker nodes, which provide enough processing power to analyze a few hundred terabytes all the way up to one petabyte. Each worker node includes a DataNode as well as a Task Tracker.
  37. 37. MapReduce
      • Job Tracker / MapReduce Workload Management Layer: This process is assigned to interact with client applications. It is responsible for distributing MapReduce tasks to particular nodes within a cluster. This engine coordinates all aspects of Hadoop, such as scheduling and launching jobs.
      • Task Tracker: This is a process in the cluster that is capable of receiving tasks (including Map, Reduce, and Shuffle) from a Job Tracker.
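The Map, Shuffle, and Reduce task types named above can be illustrated with a single-process word count. This is only a sketch of the programming model, not Hadoop's API: the function names are hypothetical, and a real job would run many map and reduce tasks in parallel on Task Trackers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each call to `map_phase` sees only its own line and each call to `reduce_phase` sees only its own key, both phases can be spread across worker nodes without coordination between tasks.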
  38. 38. Data Replication (similar to that of ASM)
      • HDFS is designed to store very large files across machines in a large cluster.
      • Each file is a sequence of blocks.
      • All blocks in the file except the last are of the same size.
      • Blocks are replicated for fault tolerance.
      • Block size and replicas are configurable per file.
      • The Namenode receives a Heartbeat and a BlockReport from each DataNode in the cluster.
      • A BlockReport lists all the blocks on a DataNode.
  39. 39. Replica Placement & Rack Aware
      • The placement of the replicas is critical to HDFS reliability and performance.
      • Optimizing replica placement distinguishes HDFS from other distributed file systems.
      • Rack-aware replica placement:
        – Goal: improve reliability, availability and network bandwidth utilization
        – Many racks; communication between racks goes through switches.
        – Network bandwidth between machines on the same rack is greater than between machines in different racks.
        – The Namenode determines the rack id for each DataNode.
      • Replicas are typically placed on unique racks
        – Simple but non-optimal
        – Writes are expensive
        – Replication factor is 3
        – Replicas are placed: one on a node in a local rack, one on a different node in the local rack, and one on a node in a different rack.
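A minimal sketch of the placement rule exactly as this slide states it (two replicas on the local rack, one on a different rack). The function name, the `topology` dictionary, and the choice of the writer's own node for the first replica are assumptions for illustration; this is not the HDFS block placement implementation.

```python
import random


def place_replicas(topology, writer_node, rng=random):
    """Pick three nodes for one block.

    topology maps a rack id to the list of node names on that rack.
    Assumes the local rack has at least two nodes and that at least
    one other rack exists.
    """
    local_rack = next(rack for rack, nodes in topology.items()
                      if writer_node in nodes)
    # Replica 1: a node in the local rack (here, the writer itself).
    first = writer_node
    # Replica 2: a different node in the same local rack.
    second = rng.choice([n for n in topology[local_rack] if n != first])
    # Replica 3: any node on a different rack, to survive a rack failure.
    remote_rack = rng.choice([r for r in topology if r != local_rack])
    third = rng.choice(topology[remote_rack])
    return [first, second, third]
```

Keeping two replicas on one rack limits cross-rack write traffic to a single copy, while the third replica protects against the loss of a whole rack.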
  40. 40. Replica Selection
      • Replica selection for READ operation: HDFS tries to minimize the bandwidth consumption and latency.
      • If there is a replica on the reader node, then that is preferred.
      • An HDFS cluster may span multiple data centers: a replica in the local data center is preferred over a remote one.
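The read preference above amounts to choosing the closest replica. The sketch below assumes each location is a `(data_center, rack, node)` tuple and uses illustrative distance weights; both are assumptions for this example rather than values taken from the slides.

```python
def distance(reader, replica):
    """Return a rough network distance between two (dc, rack, node) tuples."""
    if reader == replica:
        return 0          # replica is on the reader's own node
    if reader[:2] == replica[:2]:
        return 2          # same rack, different node
    if reader[0] == replica[0]:
        return 4          # same data center, different rack
    return 6              # different data center


def pick_replica(reader, replicas):
    """Choose the replica with the smallest distance to the reader."""
    return min(replicas, key=lambda replica: distance(reader, replica))
```

For a reader at `("dc1", "r1", "n1")`, a replica on another rack of `dc1` is chosen over one in `dc2`, and a replica on `n1` itself beats both.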
  41. 41. Hadoop Components
      • Hadoop is bundled with two independent components
        – HDFS (Hadoop Distributed File System): designed for scaling in terms of storage and IO bandwidth
        – MR framework (MapReduce): designed for scaling in terms of performance
  42. 42. Understanding file structure
      • Example: a 1 GB file
      • The file is split into blocks
      • Each block is typically 64 MB
      • Each block is stored as two files: one holding the data and a second holding metadata and the checksum
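The block arithmetic for the 1 GB example can be worked through as follows. The 64 MB block size comes from the slide and the replication factor of 3 from the replica placement slide; the function names are hypothetical.

```python
BLOCK_SIZE_MB = 64  # typical block size quoted on the slide; configurable per file


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of each block a file of this size occupies."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # Only the last block may be smaller than the block size.
        sizes.append(remainder)
    return sizes


def raw_storage_mb(file_size_mb, replication=3):
    """Total cluster storage consumed once every block is replicated."""
    return file_size_mb * replication
```

A 1 GB (1024 MB) file therefore occupies 16 blocks of 64 MB and, at replication factor 3, 3072 MB of raw storage across the cluster.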
  43. 43. Hadoop Processes
      • Processes running on Hadoop
        – NameNode
        – DataNode
        – Secondary NameNode
        – Task Tracker
        – Job Tracker
  44. 44. NameNode
      • Single point of contact
      • HDFS master
      • Holds meta information
        – List of files and directories
        – Location of blocks
      • Single node per cluster
        – Cluster can have thousands of DataNodes and tens of thousands of HDFS clients.
  45. 45. DataNode
      • Can execute multiple tasks concurrently
      • Holds actual data blocks, checksum and generation stamp
      • If a block is half full, it needs only half of the space of a full block
      • At start-up, connects to the NameNode and performs a handshake
      • No binding to IP address or port; uses a Storage ID (e.g. Storage ID: XYZ001)
      • Sends heartbeat to NameNode
  46. 46. Communication
      • Heartbeat: each DataNode (Storage ID: XYZ001, XYZ002, XYZ003) tells the NameNode “I AM ALIVE” every 3 seconds, reporting
        – Total storage capacity
        – Fraction of storage in use
        – Number of data transfers currently in progress
      • Reply: the NameNode instructs the DataNode to
        – Replicate a block to another node
        – Remove a local block replica
        – Send an immediate block report
        – Shut down the node
      • No heartbeat for 10 minutes: the NameNode treats the DataNode as out of service
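The heartbeat bookkeeping can be sketched as below, using the 3-second interval and 10-minute timeout from the slide. The class and method names are hypothetical, and time is passed in explicitly as seconds so the example stays deterministic.

```python
HEARTBEAT_INTERVAL_S = 3      # DataNodes report every 3 seconds
DEAD_AFTER_S = 10 * 60        # no heartbeat for 10 minutes


class HeartbeatMonitor:
    """Tracks the last heartbeat time per DataNode Storage ID."""

    def __init__(self):
        self.last_seen = {}

    def heartbeat(self, storage_id, now):
        """Record that a DataNode reported in at time `now` (seconds)."""
        self.last_seen[storage_id] = now

    def dead_nodes(self, now):
        """Return Storage IDs silent for longer than the timeout."""
        return sorted(storage_id
                      for storage_id, seen in self.last_seen.items()
                      if now - seen > DEAD_AFTER_S)
```

A NameNode-like component would use the result of `dead_nodes` to schedule new replicas for the blocks that the silent nodes held.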
  47. 47.
  48. 48. Coordination in a distributed system
      • Coordination: an act that multiple nodes must perform together.
      • Examples:
        – Group membership
        – Locking
        – Publisher/Subscriber
        – Leader Election
        – Synchronization
      • Getting node coordination correct is very hard!
  49. 49. “ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers.” (Introducing ZooKeeper, ZooKeeper Wiki) ZooKeeper is much more than a distributed lock server!
  50. 50. What is ZooKeeper?
      • An open source, high-performance coordination service for distributed applications.
      • Exposes common services in a simple interface, so developers don't have to write them from scratch:
        – naming
        – configuration management
        – locks & synchronization
        – group services
      • Build your own on it for specific needs.
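Leader election, one of the coordination examples listed above, is commonly built on sequentially numbered entries under a shared parent, where the lowest number leads. The sketch below imitates that idea with an in-memory class; it is not the ZooKeeper client API, and `ElectionNode` and its methods are hypothetical names.

```python
import itertools


class ElectionNode:
    """In-memory stand-in for a parent node with sequential children."""

    def __init__(self):
        self._sequence = itertools.count()
        self.members = {}  # sequence number -> client name

    def join(self, client):
        """Register a client and return its sequence number."""
        seq = next(self._sequence)
        self.members[seq] = client
        return seq

    def leave(self, seq):
        """Remove a client, as happens when its session ends."""
        self.members.pop(seq, None)

    def leader(self):
        """The member with the lowest sequence number leads."""
        if not self.members:
            return None
        return self.members[min(self.members)]
```

When the current leader leaves, the member with the next-lowest sequence number takes over, so every participant reaches the same answer without talking to the others directly.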
  51. 51. HDFS Distributions
  52. 52. Real Time BI
      • Speed, agility, and intelligence are competitive advantages that nearly all organizations seek.
      • Existing traditional reporting systems provide information after 24 – 36 hours.
      • To support operational users and influence what should happen next, the data should be available in real time to know what is happening now.
  53. 53. Hadoop 2.0
  54. 54. Hadoop w/ MapReduce vs. Hadoop 2 & YARN based Architecture [Figure]
      • Timeline markers: 2006, 2009, October 23, 2013
      • Hadoop w/ MapReduce: MapReduce on HDFS (Hadoop Distributed File System) across nodes 1…N; largely batch processing; silo’d clusters; largely batch system; difficult to integrate
      • MR-279: YARN
      • Hadoop 2 & YARN: YARN as the Data Operating System on HDFS across nodes 1…N; batch, interactive, and real-time workloads; enabled the Modern Data Architecture
  55. 55. Hadoop 2.0 Multi Use Data Platform: Batch, Interactive, Realtime, Online, Streaming, …
      • Redundant, Reliable Storage (HDFS)
      • Efficient Cluster Resource Management & Shared Services (YARN)
      • Workloads on HADOOP 2: Standard Query Processing (Hive), Batch (MapReduce), Online Data Processing, Interactive (Tez), Real Time Stream Processing, Others
  56. 56. Hadoop 2.0 with YARN
  57. 57. Resource Manager/Node Manager Components
  58. 58. Problems with this approach in Hadoop 1.0
      • It limits scalability: the JobTracker runs on a single machine and does several tasks: 1) resource management, 2) job and task scheduling, and 3) monitoring. Although many machines (DataNodes) are available, they are not used for this work.
      • Availability issue: in Hadoop 1.0, the JobTracker is a single point of failure. If the JobTracker fails, all jobs must restart.
      • Distinct map slots and reduce slots
      • Limitation in running non-MapReduce applications
  59. 59. Yarn Architecture
      • Resource Manager: arbitrates division of resources among all the applications in the system. The Resource Manager has a pluggable scheduler component, which is responsible for allocating resources to the various running applications.
      • Node Manager: the per-machine slave, running on slave nodes, which is responsible for launching the applications’ containers, monitoring their resource usage (CPU, memory, disk, network), and reporting the same to the Resource Manager.
      • Application Master: negotiates appropriate resource containers from the Scheduler, tracks their status, and monitors progress.
      • Container: the unit of allocation incorporating resource elements such as memory, CPU, disk, and network, used to execute a specific task of the application (similar to map/reduce slots in MRv1).
  60. 60. Yarn - Execution Sequence
      1) A client program submits the application
      2) ResourceManager allocates a specified container to start the ApplicationMaster
      3) ApplicationMaster, on boot-up, registers with ResourceManager
      4) ApplicationMaster negotiates with ResourceManager for appropriate resource containers
      5) On successful container allocations, ApplicationMaster contacts NodeManager to launch the container
      6) Application code is executed within the container, and the execution status is reported back to the ApplicationMaster
      7) During execution, the client communicates directly with ApplicationMaster or ResourceManager to get status, progress updates etc.
      8) Once the application is complete, ApplicationMaster unregisters with ResourceManager and shuts down, allowing its own container to be repurposed
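The allocate-and-release cycle behind steps 2, 4, and 8 can be sketched with a toy resource manager. The first-fit policy, the memory-only resource model, and all names here are assumptions for illustration; YARN's pluggable schedulers are considerably more involved.

```python
class ResourceManager:
    """Toy resource arbiter that tracks free memory per node."""

    def __init__(self, node_memory_mb):
        self.free = dict(node_memory_mb)  # node name -> free MB

    def allocate(self, memory_mb):
        """Return a (node, memory_mb) container, or None if nothing fits."""
        for node, free_mb in self.free.items():
            if free_mb >= memory_mb:
                # First fit: take the first node with enough free memory.
                self.free[node] -= memory_mb
                return (node, memory_mb)
        return None

    def release(self, container):
        """Give a finished container's memory back to its node."""
        node, memory_mb = container
        self.free[node] += memory_mb
```

In terms of the sequence above, the first `allocate` call would hold the ApplicationMaster, later calls would hold its task containers, and `release` corresponds to a container being handed back for reuse when the application finishes.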
  61. 61. Operational vs. Analytical Databases
  62. 62. A New Technology
  63. 63. No Means Yes!
  64. 64. Use Cases
  65. 65. Brewer's CAP Theorem
  66. 66. Brewer's CAP Theorem
  67. 67. NoSQL Technology Spectrum
  68. 68. BigTable Data Model
      Flat table:
        Name  Site             Counter
        Dick  Ebay             507,018
        Dick  Google           690,414
        Jane  Google           716,426
        Dick  Facebook         723,649
        Jane  Facebook         643,261
        Jane  ILoveLarry.com   856,767
        Dick  MadBillFans.com  675,230
      Normalized tables:
        NameId  Name
        1       Dick
        2       Jane

        SiteId  SiteName
        1       Ebay
        2       Google
        3       Facebook
        4       ILoveLarry.com
        5       MadBillFans.com

        NameId  SiteId  Counter
        1       1       507,018
        1       2       690,414
        2       2       716,426
        1       3       723,649
        2       3       643,261
        2       4       856,767
        1       5       675,230
      BigTable rows (one row per name, one column per site):
        Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
        1   Dick  507,018  690,414  723,649   . . .            675,230

        Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
        2   Jane  716,426  643,261   . . .            856,767
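The move from the flat table to BigTable-style rows can be sketched as a pivot in which each name becomes one row and each site becomes a column that exists only where there is a value. The data is taken from the slide; `to_wide_rows` is a hypothetical helper and nested dictionaries stand in for a real column store.

```python
from collections import defaultdict

# (name, site, counter) rows from the flat table on the slide.
FLAT_ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot flat rows into one sparse, wide row per name."""
    wide = defaultdict(dict)
    for name, site, counter in rows:
        # Each site becomes a column present only in rows that use it.
        wide[name][site] = counter
    return dict(wide)
```

Jane's row has no Ebay column at all, which is the sparse-column property that distinguishes this model from a relational table with fixed columns.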
  69. 69. Document databases
      • Structured documents – XML and JSON (JavaScript Object Notation) become more prevalent within applications
      • Web programmers start storing these in BLOBs in MySQL
      • Emergence of XML and JSON databases
  70. 70. Graph Database: Neo4J, InfiniteGraph, FlockDB; Document, JSON based: MongoDB, CouchDB, RethinkDB; Document, XML based: MarkLogic, BerkeleyDB XML; Key Value: MemcacheDB, Oracle NoSQL, Dynamo, Voldemort, DynamoDB, Riak; Table Based: BigTable, Cassandra, HBase, HyperTable, Accumulo
  71. 71. Multiple Data Stores
      • Relational: Run the Business
        – Scale-out and scale-up
        – Collect any data
        – SQL
        – Transactional and analytic applications for the enterprise
        – Secure and highly available
      • Hadoop: Change the Business
        – Scale-out, low cost store
        – Collect any data
        – Map-reduce, SQL
        – Analytic applications
      • NoSQL: Scale the Business
        – Scale-out, low cost store
        – Collect key-value data
        – Find data by key
        – Web applications
  72. 72. Data Analytics Challenge: Separate silos of information to analyze
  73. 73. Data Analytics Challenge: Separate data access interfaces
  74. 74. SQL on Hadoop is Obvious: Stinger
  75. 75. Data Analytics Challenge: No comprehensive SQL interface across Oracle, Hadoop and NoSQL
  76. 76. Oracle Big Data Management System: Rich, comprehensive SQL access to all enterprise data (diagram label: NoSQL)
  77. 77. What Does Unified Query Mean for You? After Data Science ??? Anyone Before PhD
  78. 78. What Does Unified Query Mean for You? After Application Development Before
  79. 79. A New Hadoop Processing Engine
      • Processing Layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
      • Resource Management (YARN)
      • Storage Layer: Filesystem (HDFS), NoSQL Databases (Oracle NoSQL DB, HBase)
  80. 80. Big Data SQL
      • Query: SELECT w.sess_id, c.name FROM web_logs w, customers c WHERE w.source_country = 'Brazil' AND w.cust_id = c.customer_id;
      • Relevant SQL runs on BDA nodes
      • 10s of gigabytes of data
      • Only columns and rows needed to answer the query are returned
      • [Figure: Hadoop Cluster (blocks B, B, B), Big Data SQL, Oracle Database with CUSTOMERS and WEB_LOGS]
  81. 81. SQL Push Down in Big Data SQL (same query and diagram as the previous slide)
      • Hadoop Scans on Unstructured Data
      • WHERE Clause Evaluation
      • Column Projection
      • Bloom Filters for Better Join Performance
      • JSON Parsing, Data Mining Model Evaluation
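WHERE clause evaluation and column projection at the storage side can be illustrated with the slide's query. This is a plain Python sketch of the idea, not Oracle Big Data SQL: the function names and the sample rows in the usage note are made up for illustration, and Bloom filters are left out.

```python
def scan_with_pushdown(rows, predicate, columns):
    """Storage-side scan: filter rows first, then keep only needed columns."""
    return [{column: row[column] for column in columns}
            for row in rows if predicate(row)]


def brazil_sessions(web_logs, customers):
    """Mirror the slide's query: sessions from Brazil joined to customer names.

    web_logs is a list of dicts; customers maps customer_id -> name.
    """
    slim_rows = scan_with_pushdown(
        web_logs,
        predicate=lambda row: row["source_country"] == "Brazil",
        columns=["sess_id", "cust_id"],
    )
    # Only the reduced rows reach the join, as on the slide.
    return [(row["sess_id"], customers[row["cust_id"]])
            for row in slim_rows if row["cust_id"] in customers]
```

With two log rows, one from Brazil and one from elsewhere, only the Brazilian row's `sess_id` and `cust_id` leave the scan, so the join side never sees the wide original rows.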
  82. 82. Oracle Big Data SQL: Query All Data without Application Change or Data Conversion
  83. 83. High Level Architecture: INGEST, PROCESS, VISUALIZE, ANALYZE, STORE
  84. 84. Fast Pace Innovation, Dec 18th 2015: http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
  85. 85. BDD Value Proposition. Note: company logos and images are for illustration purposes only. Not a real use case for the company.
  86. 86. Oracle BDD - Technical Innovation on Hadoop
      • Hadoop Cluster (BDA or Commodity Hardware): BDD node, name node, data nodes
      • Oracle Big Data Discovery Workloads
        – Data Processing, Workflow & Monitoring. Profiling: catalog entry creation, data type & language detection, schema configuration. Sampling: dgraph (index) file creation. Transforms: >100 functions. Enrichments: location (geo), text (cleanup, sentiment, entity, key-phrase, whitelist tagging)
        – Self-Service Provisioning & Data Transfer. Personal Data: upload CSV and XLS to HDFS
        – In-Memory Discovery Indexes. DGraph: Search, Guided Navigation, Analytics
        – Studio. Web UI: Find, Explore, Transform, Discover, Share
      • Hadoop 2.x: Filesystem (HDFS), Workload Mgmt (YARN), Metadata (HCatalog)
      • Other Hadoop Workloads: MapReduce, Spark, Hive, Pig, Oracle Big Data SQL (BDA only)
  87. 87. Sample Enterprise Big Data Architecture
      • Operational RDBMS (Oracle, SQL Server, …)
      • Data Warehouse RDBMS (Oracle, Teradata, …)
      • In-memory Analytics (HANA, Exalytics, …)
      • In-memory processing (Spark)
      • Hadoop
      • Web DBMS (MySQL, Mongo, Cassandra)
      • ERP & in-house CRM
      • Analytic/BI software (SAS, Tableau)
      • Web Server
  88. 88. How to transition into a Cloud Consultant: Cloud Consultant = Core Skills 50% + Automation 10% + Cloud Knowledge 20% + Tools & Integration 20%
  89. 89.
  90. 90. Thank You! Satyendra.pasalapudi@appsassociates.com, @pasalapudi, https://community.oracle.com/groups/aioug-social-group
  91. 91. www.ora-search.com
