© Copyright 2013. Apps Associates LLC. 
DBA to Data Scientist with Oracle Big Data 
August 23rd, 2014
Satyendra Kumar Pasalapudi 
Associate Practice Director – IMS @ Apps Associates 
Co-Founder & Vice President of AIOUG 
@pasalapudi
Agenda 
•What is Big Data 
•Big Data Growth 
•4 Phases of Big Data 
•NoSQL Databases 
•Hadoop Basics 
•Big Data Appliance 
•Skills Required to Move from DBA to Data Scientist
Big Data Growth
Big Data Challenge 
Cost-effectively manage and analyze all available data in its native form: unstructured, structured, streaming 
Sources: 
•ERP 
•CRM 
•RFID 
•Website 
•Network Switches 
•Social Media 
•Billing 
90% of the world’s data has been created in the last two years (Source: IBM)
3 Macro Trends Driving Disruption
Gen X Stats
Big Data – High Data Variety & Velocity
How Did Big Data Evolve? 
•More people interacting with data 
–Smartphones 
–Internet 
•Greater volumes of data being generated (machine-to-machine generation) 
–Sensors 
–General Packet Radio Service (GPRS)
What Is Big Data? 
Big data is defined as voluminous unstructured data from many different sources, such as: 
•Social networks 
•Banking and financial services 
•E-commerce services 
•Web-centric services 
•Internet search indexes 
•Scientific searches 
•Document searches 
•Medical records 
•Weblogs
Big Data 
•Extremely large datasets that are hard to deal with using Relational Databases 
–Storage/Cost 
–Search/Performance 
–Analytics and Visualization 
•Need for parallel processing on hundreds of machines 
–ETL cannot complete within a reasonable time 
–Once a daily load takes longer than 24 hours, processing never catches up
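The "never catch up" point is simple arithmetic, and a small sketch may make it concrete. The function names, the 30-hour run time, and the assumption of perfectly linear scale-out are all illustrative and do not come from the slides.

```python
import math


def backlog_hours(run_hours, days, interval_hours=24.0):
    """Hours of unprocessed work after `days` daily batches run back to back."""
    # Every day adds `run_hours` of work, but only `interval_hours` of
    # wall-clock time is available before the next batch arrives.
    overrun_per_day = max(0.0, run_hours - interval_hours)
    return overrun_per_day * days


def machines_needed(run_hours, target_hours):
    """Machines needed to finish within `target_hours`.

    Assumes the job splits evenly across machines (an idealization).
    """
    return math.ceil(run_hours / target_hours)


# A hypothetical 30-hour nightly ETL falls 6 hours further behind each day.
print(backlog_hours(30, 7))      # 42.0
print(machines_needed(30, 4))    # 8
```

Under these assumptions, a job that fits inside the 24-hour window builds no backlog, which is why the slide treats 24 hours as the breaking point.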
Characteristics of Big Data 
Volume 
Variety 
Velocity 
Value 
Social Networks 
RSS Feeds 
Micro Blogs
Big Data Eco System
Big Data/Hadoop Use Cases 
•Financial services: Discover fraud patterns based on multiple years' worth of credit card transactions, on a time scale that does not allow new patterns to accumulate significant losses. Measure transaction processing latency across many business processes by processing and correlating system log data. 
•Internet retailer: Discover fraud patterns in Internet retailing by mining Web click logs. Assess risk by product type and session/Internet Protocol (IP) address activity. 
•Retailers: Perform sentiment analysis by analyzing social media data. 
•Drug discovery: Perform large-scale text analytics on publicly available information sources. 
•Healthcare: Analyze medical insurance claims data for financial analysis, fraud detection, and preferred patient treatment plans. Analyze patient electronic health records to evaluate patient care regimes and drug safety. 
•Mobile telecom: Discover mobile phone churn patterns based on analysis of call detail records (CDRs) and correlation with activity in subscribers’ networks of callers. 
•IT technical support: Perform large-scale text analytics on help desk support data and publicly available support forums to correlate system failures with known problems. 
•Scientific research: Analyze scientific data to extract features (e.g., identify celestial objects from telescope imagery). 
•Internet travel: Improve product ranking (e.g., of hotels) by analyzing multiple years' worth of Web click logs. 
The Four Phases of Data Conversion 
1. Acquire 
2. Organize 
3. Analyze 
4. Decide
Database Market Disruption 
$30B Database Market Being Disrupted
Operational vs. Analytical Databases
Growth is the New Reality 
Instagram gained nearly 1 million users overnight when it expanded to Android
Draw Something Viral Growth
How Do You Take This Growth?
Scaling Out RDBMS
RDBMS are Not Enough?
NoSQL Technology Scales Out
A New Technology
Use Cases
Relational vs. Document Data Model 
JSON (JavaScript Object Notation) is a text-based open standard designed for human-readable data interchange. It is derived from the JavaScript language and represents simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, it is language-independent, with parsers available for many languages.
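To make the relational versus document contrast concrete, here is a minimal sketch using Python's standard `json` module. The customer and order records, and the `to_document` helper, are hypothetical and not taken from the slides.

```python
import json

# Relational shape: a customer row plus separate order rows,
# linked by customer_id (hypothetical sample data).
customers = [{"customer_id": 1, "name": "Asha"}]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "keyboard"},
    {"order_id": 11, "customer_id": 1, "item": "monitor"},
]


def to_document(customer, all_orders):
    """Fold a customer row and its order rows into one nested document."""
    return {
        "name": customer["name"],
        "orders": [
            {"order_id": o["order_id"], "item": o["item"]}
            for o in all_orders
            if o["customer_id"] == customer["customer_id"]
        ],
    }


doc = to_document(customers[0], orders)
text = json.dumps(doc, sort_keys=True)  # serialize the document to JSON text
restored = json.loads(text)             # parse the text back into Python objects
print(text)
```

The document form keeps a customer and its orders together in one record, so reading it needs no join; the trade-off is that the same order data is harder to query across customers.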
Brewer's CAP Theorem
NoSQL Technology Spectrum
Operational vs. Analytical Databases
Hadoop Design Principles 
•System shall manage and heal itself 
–Automatically and transparently route around failure 
–Speculatively execute redundant tasks if certain nodes are detected to be slow 
•Performance shall scale linearly 
–Proportional change in capacity with resource change 
•Compute should move to data 
–Lower latency, lower bandwidth 
•Simple core, modular and extensible
Hadoop Intro 
•At Google, MapReduce operations run on a special file system called the Google File System (GFS), which is highly optimized for this purpose. 
•GFS is not open source. 
•Doug Cutting and others built an open-source implementation based on Google's published GFS design, with later backing from Yahoo!, and called it the Hadoop Distributed File System (HDFS). 
•The software framework that supports HDFS, MapReduce, and other related entities is called the Hadoop project, or simply Hadoop. 
•The Nutch and Lucene projects were started with “search” as the application in mind.
•HDFS and MapReduce were found to have applications beyond search. 
•HDFS and MapReduce were moved out of Nutch into a sub-project of Lucene and later promoted to a top-level Apache project, Hadoop.
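The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using word count as the example; it is not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(word_count(["big data", "Big Data big"]))  # {'big': 3, 'data': 2}
```

In a real cluster the map calls would run on the nodes that hold each block of input and the shuffle would move data over the network; here everything runs in one process purely to show the data flow.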
Hadoop History 
•Oct 2003 – Google GFS paper published 
•Dec 2004 – Google MapReduce paper published 
•July 2005 – Nutch uses MapReduce 
•Feb 2006 – Starts as a Lucene subproject 
•Apr 2007 – Yahoo! on 1000-node cluster 
•Jan 2008 – An Apache Top Level Project 
•Jul 2008 – A 4000 node test cluster 
•May 2009 – Hadoop sorts Petabyte in 17 hours
What Is Hadoop Used For, and Where? 
Search 
•Yahoo, Amazon, Zvents 
Log Processing 
•Facebook, Yahoo, ContextWeb, Joost, Last.fm 
Recommendation Systems 
•Facebook 
Data Warehouse 
•Facebook, AOL 
Video and Image Analysis 
•New York Times, Eyealike
Organizations using Hadoop include: Amazon.com, Ancestry.com, Akamai, American Airlines, AOL, Apple, AVG, eBay, Electronic Arts, Hortonworks, Federal Reserve Board of Governors, Foursquare, Fox Interactive Media, Google, Hewlett-Packard, IBM, ImageShack, ISI, InMobi, Intuit, Joost, Last.fm, LinkedIn, Microsoft, NetApp, Netflix, Ooyala, Riot Games, Spotify, Qualtrics, The New York Times, SAP AG, SAS Institute, StumbleUpon, Twitter, Yodlee
Hadoop Ecosystem 
HDFS (Hadoop Distributed File System) 
HBase (key-value store) 
MapReduce (Job Scheduling/Execution System) 
Data Access 
Sqoop 
Flume 
Client Access 
Hue 
Hive (SQL) 
Pig (PL/SQL) 
ZooKeeper (Coordination) 
(Streaming/Pipes APIs) 
Chukwa (Monitoring) 
Data Mining 
Mahout 
OS – Red Hat, SUSE, Ubuntu, Windows 
Commodity Hardware 
Java Virtual Machine 
Networking 
Orchestration 
Oozie
Pig 
•Pig Latin scripts are compiled into a series of MapReduce jobs 
–Easier to program 
–Optimization opportunities 
•Example (Grunt shell): 
grunt> A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float); 
grunt> B = FOREACH A GENERATE name;
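For readers more at home in a general-purpose language, the two Pig statements above amount to "load tab-delimited records with a schema, then project one column". The sketch below mimics that in plain Python; the `load` and `foreach_generate` helpers and the sample rows are hypothetical, and tab is assumed as the delimiter because it is PigStorage's default.

```python
import csv
import io

# Column names and Python casts mirroring (name:chararray, age:int, gpa:float).
SCHEMA = (("name", str), ("age", int), ("gpa", float))


def load(text, schema=SCHEMA, delimiter="\t"):
    """Parse delimited text into a list of typed records (like LOAD ... AS)."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        {col: cast(val) for (col, cast), val in zip(schema, row)}
        for row in rows
    ]


def foreach_generate(relation, *columns):
    """Project the named columns from every record (like FOREACH ... GENERATE)."""
    return [tuple(rec[c] for c in columns) for rec in relation]


# Hypothetical contents of the 'student' file.
sample = "John\t18\t4.0\nMary\t19\t3.8\n"
A = load(sample)
B = foreach_generate(A, "name")
print(B)  # [('John',), ('Mary',)]
```

The difference in practice is that Pig would compile the same two steps into MapReduce jobs and run them across the cluster rather than in a single process.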
Hive 
Managing and querying structured data 
•MapReduce for execution 
•SQL-like syntax 
•Extensible with types, functions, scripts 
•Metadata stored in an RDBMS (e.g., MySQL) 
•Joins, Group By, Nesting 
•Optimizer for the number of MapReduce jobs required 
hive> SELECT a.foo FROM invites a WHERE a.ds='<DATE>';
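Because the Hive query is ordinary SQL in shape, its behavior can be tried locally. The sketch below uses Python's built-in `sqlite3` as a stand-in for Hive, which is a swapped-in engine rather than anything the slides use; the `bar` column, the sample rows, and the dates are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table named `invites`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invites (foo INTEGER, bar TEXT, ds TEXT)")
conn.executemany(
    "INSERT INTO invites VALUES (?, ?, ?)",
    [(1, "a", "2008-08-15"), (2, "b", "2008-08-15"), (3, "c", "2008-08-08")],
)


def select_foo(connection, ds):
    """Same shape as the slide's query: SELECT a.foo ... WHERE a.ds = <DATE>."""
    cur = connection.execute(
        "SELECT a.foo FROM invites a WHERE a.ds = ? ORDER BY a.foo", (ds,)
    )
    return [row[0] for row in cur]


result = select_foo(conn, "2008-08-15")
print(result)  # [1, 2]
```

SQLite scans one local table here, whereas Hive would turn the statement into MapReduce work; if `ds` is a partition column, as is common for date columns in Hive, the filter would also limit which partitions are read.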
Sqoop 
•Supports incremental loads of a single table or a free-form SQL query, as well as saved jobs that can be run multiple times to import updates made to a database since the last import 
•Imports can also be used to populate tables in Hive or HBase 
•Exports can be used to put data from Hadoop into a relational database
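The incremental-load idea can be sketched as a high-water mark: remember the largest key already imported and fetch only rows beyond it on the next run. The code below illustrates that logic with `sqlite3` as the source database and a Python list as the target; it is not Sqoop's implementation, and the `sales` table and `incremental_import` function are hypothetical.

```python
import sqlite3


def incremental_import(source, target, last_value):
    """Copy rows whose id exceeds `last_value`; return the new high-water mark."""
    rows = source.execute(
        "SELECT id, amount FROM sales WHERE id > ? ORDER BY id", (last_value,)
    ).fetchall()
    target.extend(rows)
    # If nothing new arrived, the mark stays where it was.
    return rows[-1][0] if rows else last_value


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

archive = []
mark = incremental_import(source, archive, last_value=0)     # first run: ids 1 and 2
source.execute("INSERT INTO sales VALUES (3, 30.0)")
mark = incremental_import(source, archive, last_value=mark)  # second run: only id 3
print(mark, archive)
```

A saved job in this picture is simply the query plus the stored mark, which is what lets it be rerun safely without importing the same rows twice.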
HDFS Architecture
Architecture Overview
HDFS Distributions
Hadoop 2.0
Oracle Big Data Appliance: Introduction 
•Oracle Big Data Appliance is an engineered system containing both hardware and software components. Oracle Big Data Appliance delivers: 
‒A complete and optimized solution for big data 
‒Single-vendor support for both hardware and software 
‒An easy-to-deploy solution 
‒Tight integration with Oracle Database
Oracle Big Data Appliance: Where Does It Stand? 
Big Data Appliance 
Data Variety 
Unstructured 
Schema-less 
Schema 
Information 
Acquire 
Organize 
Analyze
Oracle Big Data: Software Components 
Oracle Big Data Appliance 
Oracle NoSQL Database 
Oracle Big Data Connectors 
Open Source R Distribution 
Cloudera Manager & Cloudera’s Distribution Including Apache Hadoop 
Oracle Linux 5.6 and Java HotSpot VM
Oracle Big Data with Oracle Exadata
Oracle Big Data Solution 
Oracle BI Foundation Suite 
Oracle Real-Time Decisions 
Endeca Information Discovery 
Decide 
Oracle Advanced Analytics 
Oracle Database 
Oracle Spatial & Graph 
Acquire – Organize – Analyze 
Oracle Big Data Connectors 
Oracle Data Integrator 
Stream 
Oracle Event Processing 
Apache Flume 
Oracle GoldenGate 
Oracle NoSQL Database 
Cloudera Hadoop 
Oracle R Distribution
Mapping the Phases with Software 
Acquire Phase 
–Hadoop Distributed File System 
–Oracle NoSQL Database 
Organize Phase 
–Hadoop Software Framework 
–Oracle Data Integrator 
Analyze Phase 
–R Statistical Programming Environment 
–Oracle Data Warehouse
Analyze Phase 
Analyze 
Database 
+ 
Oracle R Enterprise 
Statistical Functions 
Data Mining Algorithms 
Query Capabilities
Oracle Big Data SQL 
•New Technology Bridges Oracle, Hadoop, and NoSQL Data Stores 
Using Oracle Big Data SQL, organizations can: 
•Combine data from Oracle Database, Hadoop and NoSQL in a single SQL query 
•Query and analyze data in Hadoop and NoSQL 
•Integrate big data analysis into existing applications and architectures 
•Extend security and access policies from Oracle Database to data in Hadoop and NoSQL 
•Maximize query performance on all data using Smart Scan 
•One Fast SQL Query for All Your Data
Data Science 
Source: http://wikibon.org/blog/role-of-the-data-scientist/
Data Scientist 
Source: http://wikibon.org/blog/role-of-the-data-scientist/
DBA to Data Scientist 
Hadoop, HDFS, MapReduce, NoSQL Database, Hive, Pig, OR all of the above with Big Data Appliance
Big Data Eco System
On Premise Hadoop as RDBMS “active archive” 
SALES 2013 
Oracle Database 
Structured Data Analytics from Apps 
SALES 2012 
SALES 2011 
SALES 2010 
SALES 2011 
SALES 2010 
“Hive” provides an SQL-like query layer over Hadoop and MapReduce 
Unstructured + Structured Data Analytics from Apps 
Hadoop for Structured Archive and Unstructured data
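One way to picture the "active archive" pattern is as a routing decision: recent partitions stay in the database and older ones are served from Hadoop through Hive. The sketch below assumes a split in which 2012 and 2013 are "hot" and 2010 and 2011 are archived; that cutoff, and the `route` and `plan` helpers, are illustrative assumptions rather than details stated on the slide.

```python
# Assumed placement of the SALES partitions (hypothetical cutoff).
HOT_YEARS = {2013, 2012}      # kept in Oracle Database
ARCHIVE_YEARS = {2011, 2010}  # archived in Hadoop, queried through Hive


def route(year):
    """Return which store should answer a query for the given sales year."""
    if year in HOT_YEARS:
        return "oracle"
    if year in ARCHIVE_YEARS:
        return "hive"
    raise KeyError(f"no SALES partition known for year {year}")


def plan(years):
    """Split a multi-year request into per-store lists of years."""
    out = {"oracle": [], "hive": []}
    for year in sorted(years):
        out[route(year)].append(year)
    return out


print(plan([2010, 2013, 2011]))  # {'oracle': [2013], 'hive': [2010, 2011]}
```

The archive is "active" because the older years remain queryable with SQL-like syntax instead of being moved to offline backup.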
AWS EMR as RDBMS “active archive” 
SALES 2013 
Oracle Database 
Structured Data Analytics from Apps 
SALES 2012 
SALES 2011 
SALES 2010 
SALES 2011 
SALES 2010 
“Hive” provides an SQL-like query layer over Amazon EMR 
Unstructured + Structured Data Analytics from Apps 
AWS EMR for Structured Archive and Unstructured data 
Amazon Elastic MapReduce (Amazon EMR)
Thank You! 
@pasalapudi

 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Recently uploaded (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

DBA to Data Scientist - Satyendra

  • 1. © Copyright 2013. Apps Associates LLC. 1 DBA to Data Scientist with Oracle Big Data August 23rd, 2014
  • 2. © Copyright 2013. Apps Associates LLC. 2 Satyendra Kumar Pasalapudi Associate Practice Director – IMS @ Apps Associates Co Founder & Vice President of AIOUG @pasalapudi
  • 3. © Copyright 2014. Apps Associates LLC. 4 Agenda •What is Big Data •Big Data Growth •4 Phases of Big Data •NoSQL Databases •Hadoop Basics •Big Data Appliance •Skills Required for DBA Scientist
  • 4. © Copyright 2014. Apps Associates LLC. 5 Big Data Growth
  • 5. © Copyright 2014. Apps Associates LLC. 6 Cost effectively manage and analyze all available data in its native form unstructured, structured, streaming ERP CRM RFID Website Network Switches Social Media Billing Big data Challenge
  • 6. 90% Of the world’s data has been created in the last two years Source: IBM
  • 7. © Copyright 2014. Apps Associates LLC. 8 3 Macro Trends Driving Disruption
  • 8. © Copyright 2014. Apps Associates LLC. 9 Gen X Stats
  • 9. © Copyright 2014. Apps Associates LLC. 10 Big Data – High Data Variety & Velocity
  • 10. © Copyright 2014. Apps Associates LLC. 11 How Did Big Data Evolve? •More people interacting with data • Smartphones • Internet •Greater volumes of data being generated (machine-to-machine generation) •Sensors •General Packet Radio Services (GPRS)
  • 11. © Copyright 2014. Apps Associates LLC. 12 What Is Big Data? Big data is defined as voluminous unstructured data from many different sources, such as: •Social networks •Banking and financial services •E-commerce services •Web-centric services •Internet search indexes •Scientific searches •Document searches •Medical records •Weblogs
  • 12. © Copyright 2014. Apps Associates LLC. 13 Big Data •Extremely large datasets that are hard to deal with using Relational Databases –Storage/Cost –Search/Performance –Analytics and Visualization •Need for parallel processing on hundreds of machines –ETL cannot complete within a reasonable time –Beyond 24hrs – never catch up
  • 13. © Copyright 2014. Apps Associates LLC. 14 Characteristics of Big Data Volume Variety Velocity Value Social Networks RSS Feeds Micro Blogs
  • 14. © Copyright 2014. Apps Associates LLC. 15 Characteristics of Big Data
  • 15. © Copyright 2014. Apps Associates LLC. 16 Big Data Eco System
  • 16. Big Data/Hadoop Use Cases •Financial services: Discover fraud patterns based on multiple years' worth of credit card transactions, on a time scale that does not allow new patterns to accumulate significant losses. Measure transaction processing latency across many business processes by processing and correlating system log data. •Internet retailer: Discover fraud patterns in Internet retailing by mining Web click logs. Assess risk by product type and session/Internet Protocol (IP) address activity. •Retailers: Perform sentiment analysis by analyzing social media data. •Drug discovery: Perform large-scale text analytics on publicly available information sources. •Healthcare: Analyze medical insurance claims data for financial analysis, fraud detection, and preferred patient treatment plans. Analyze patient electronic health records for evaluation of patient care regimes and drug safety. •Mobile telecom: Discover mobile phone churn patterns based on analysis of call detail records (CDRs) and correlation with activity in subscribers’ networks of callers. •IT technical support: Perform large-scale text analytics on help desk support data and publicly available support forums to correlate system failures with known problems. •Scientific research: Analyze scientific data to extract features (e.g., identify celestial objects from telescope imagery). •Internet travel: Improve product ranking (e.g., of hotels) by analysis of multiple years' worth of Web click logs.
  • 17. © Copyright 2014. Apps Associates LLC. 18 The Four Phases of Data Conversion 1 Acquire 2 Organize 3 Analyze 4 Decide
  • 18. © Copyright 2014. Apps Associates LLC. 19 Database Market Disruption $30B Database Market Being Disrupted
  • 19. © Copyright 2014. Apps Associates LLC. 20 Operational vs. Analytical Databases
  • 20. © Copyright 2014. Apps Associates LLC. 21 Growth is the New Reality Instagram gained nearly 1 million users overnight when they expanded to Android
  • 21. © Copyright 2014. Apps Associates LLC. 22 Draw Something Viral Growth
  • 22. © Copyright 2014. Apps Associates LLC. 23 How Do You Take This Growth?
  • 23. © Copyright 2014. Apps Associates LLC. 24 Scaling Out RDBMS
  • 24. © Copyright 2014. Apps Associates LLC. 25 RDBMS are Not Enough?
  • 25. © Copyright 2014. Apps Associates LLC. 26 NoSQL Technology Scales Out
  • 26. © Copyright 2014. Apps Associates LLC. 27 A New Technology
  • 27. © Copyright 2014. Apps Associates LLC. 28 Use Cases
  • 28. © Copyright 2014. Apps Associates LLC. 29 Relational vs. Document Data Model JSON, or JavaScript Object Notation, is a text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, it is language-independent, with parsers available for many languages.
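The relational versus document contrast on this slide can be pictured with a minimal sketch. The customer and order rows below are hypothetical sample data, and `to_document` is an illustrative helper, not part of any product shown in the deck. It folds rows from two flat tables into one nested JSON document using Python's standard `json` module.

```python
import json

# Hypothetical relational rows: one customer, with orders held in a separate table.
customers = [{"customer_id": 1, "name": "Asha"}]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "disk", "qty": 2},
    {"order_id": 11, "customer_id": 1, "item": "cable", "qty": 5},
]

def to_document(customer, all_orders):
    """Fold a customer's order rows into one nested document."""
    return {
        "customer_id": customer["customer_id"],
        "name": customer["name"],
        "orders": [
            {"order_id": o["order_id"], "item": o["item"], "qty": o["qty"]}
            for o in all_orders
            if o["customer_id"] == customer["customer_id"]
        ],
    }

doc = to_document(customers[0], orders)
text = json.dumps(doc, indent=2)   # human-readable text form
round_trip = json.loads(text)      # any JSON parser can read it back
print(text)
```

In the relational form the orders are reached through a join on `customer_id`; in the document form they travel with the customer as a nested array.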
  • 29. © Copyright 2014. Apps Associates LLC. 30 Brewer's CAP Theorem
  • 30. © Copyright 2014. Apps Associates LLC. 31 Brewer's CAP Theorem
  • 31. © Copyright 2014. Apps Associates LLC. 32 NoSQL Technology Spectrum
  • 32. © Copyright 2014. Apps Associates LLC. 33 Operational vs. Analytical Databases
  • 33. © Copyright 2014. Apps Associates LLC. 34 Hadoop Design Principles •System shall manage and heal itself –Automatically and transparently route around failure –Speculatively execute redundant tasks if certain nodes are detected to be slow •Performance shall scale linearly –Proportional change in capacity with resource change •Compute should move to data –Lower latency, lower bandwidth •Simple core, modular and extensible
  • 34. © Copyright 2014. Apps Associates LLC. 35 Hadoop Intro •At Google, MapReduce operations run on a special file system called Google File System (GFS) that is highly optimized for this purpose. •GFS is not open source. •Doug Cutting and others built an open-source implementation from Google's published GFS design (work later backed by Yahoo!), which became the Hadoop Distributed File System (HDFS). •The software framework that supports HDFS, MapReduce and other related entities is called the Hadoop project, or simply Hadoop. •Projects Nutch and Lucene were started with “search” as the application in mind.
  • 35. © Copyright 2014. Apps Associates LLC. 36 Hadoop Intro •The Hadoop Distributed File System and MapReduce were found to have applications beyond search. •HDFS and MapReduce were moved out of Nutch into a sub-project of Lucene and later promoted to a top-level Apache project, Hadoop.
  • 36. © Copyright 2014. Apps Associates LLC. 37 Hadoop History •Dec 2004 – Google MapReduce paper published •July 2005 – Nutch uses MapReduce •Feb 2006 – Starts as a Lucene subproject •Apr 2007 – Yahoo! on 1000-node cluster •Jan 2008 – An Apache Top Level Project •Jul 2008 – A 4000 node test cluster •May 2009 – Hadoop sorts a petabyte in 17 hours
  • 37. © Copyright 2014. Apps Associates LLC. 38 What & Where is Hadoop Used For? •Search: Yahoo, Amazon, Zvents •Log Processing: Facebook, Yahoo, ContextWeb, Joost, Last.fm •Recommendation Systems: Facebook •Data Warehouse: Facebook, AOL •Video and Image Analysis: New York Times, Eyealike
  • 38. © Copyright 2014. Apps Associates LLC. 39 What & Where is Hadoop Used For? Amazon.com, Ancestry.com, Akamai, American Airlines, AOL, Apple, AVG, eBay, Electronic Arts, Hortonworks, Federal Reserve Board of Governors, Foursquare, Fox Interactive Media, Google, Hewlett-Packard, IBM, ImageShack, ISI, InMobi, Intuit, Joost, Last.fm, LinkedIn, Microsoft, NetApp, Netflix, Ooyala, Riot Games, Spotify, Qualtrics, The New York Times, SAP AG, SAS Institute, StumbleUpon, Twitter, Yodlee
  • 39. © Copyright 2014. Apps Associates LLC. 40 Hadoop Ecosystem •HDFS (Hadoop Distributed File System) •HBase (key-value store) •MapReduce (Job Scheduling/Execution System) •Data Access: Sqoop, Flume •Client Access: Hue, Hive (SQL), Pig (PL/SQL) •ZooKeeper (Coordination) •(Streaming/Pipes APIs) •Chukwa (Monitoring) •Data Mining: Mahout •Orchestration: Oozie •OS: Redhat, Suse, Ubuntu, Windows •Commodity Hardware •Java Virtual Machine •Networking
  • 40. © Copyright 2014. Apps Associates LLC. 43 PIG •Compiled into a series of MapReduce jobs –Easier to program –Optimization opportunities •grunt> A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float); •grunt> B = FOREACH A GENERATE name;
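As a rough picture of what "compiled into a series of MapReduce jobs" means, the sketch below mimics the two grunt statements in plain Python and adds a toy, single-process map/shuffle/reduce helper. The sample records, the `load` and `map_reduce` helpers, and the per-age count are illustrative assumptions; this is not how Pig itself is implemented.

```python
from collections import defaultdict

# Hypothetical contents of the 'student' file; PigStorage() splits on tabs by default.
RAW = "alice\t20\t3.8\nbob\t22\t3.1\ncara\t20\t3.5\n"

def load(raw):
    """Rough analogue of LOAD ... AS (name:chararray, age:int, gpa:float)."""
    for line in raw.splitlines():
        name, age, gpa = line.split("\t")
        yield {"name": name, "age": int(age), "gpa": float(gpa)}

def map_reduce(records, mapper, reducer):
    """Toy single-process map -> shuffle (group by key) -> reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

# B = FOREACH A GENERATE name;  is a projection, which needs only a map step.
names = [rec["name"] for rec in load(RAW)]

# A grouped aggregate (not on the slide) needs the reduce step too: students per age.
per_age = map_reduce(
    load(RAW),
    mapper=lambda rec: [(rec["age"], 1)],
    reducer=lambda key, values: sum(values),
)
print(names)
print(per_age)
```

The projection touches each record independently, which is why it parallelizes trivially; the grouped count is the kind of statement that forces a shuffle between map and reduce.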
  • 41. © Copyright 2014. Apps Associates LLC. 44 Hive Managing and querying structured data •MapReduce for execution •SQL-like syntax •Extensible with types, functions, scripts •Metadata stored in an RDBMS (MySQL) •Joins, Group By, Nesting •Optimizer for the number of MapReduce jobs required hive> SELECT a.foo FROM invites a WHERE a.ds='<DATE>';
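The query on this slide filters on `ds`. The sketch below assumes, as an illustration only, that `ds` is a date partition column, so the filter can be answered by reading a single partition. The `invites` data and the `select_foo` helper are hypothetical and stand in for what Hive would do at much larger scale.

```python
# Hypothetical 'invites' table stored as one list of rows per ds partition.
invites = {
    "2014-08-22": [{"foo": 1, "bar": "a"}, {"foo": 2, "bar": "b"}],
    "2014-08-23": [{"foo": 3, "bar": "c"}],
}

def select_foo(table, ds):
    """SELECT a.foo FROM invites a WHERE a.ds = <ds>, reading one partition only."""
    return [row["foo"] for row in table.get(ds, [])]

result = select_foo(invites, "2014-08-23")
print(result)
```

Under that assumption, rows for other dates are never scanned, which is the main reason date-style partition columns are common in Hive tables.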
  • 42. © Copyright 2014. Apps Associates LLC. 45 Sqoop •It supports incremental loads of a single table or a free form SQL query as well as saved jobs which can be run multiple times to import updates made to a database since the last import •Imports can also be used to populate tables in Hive or HBase •Exports can be used to put data from Hadoop into a relational database
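The incremental-load idea on this slide can be sketched as a high-water-mark loop: remember the largest value of a check column seen so far and import only rows above it on the next run. Sqoop exposes this through its `--incremental`, `--check-column` and `--last-value` options; the function and sample rows below are a hypothetical in-memory illustration, not Sqoop's implementation.

```python
def incremental_import(source_rows, target_rows, last_value, check_column="id"):
    """Append rows whose check column exceeds last_value; return the new high-water mark."""
    new_rows = [r for r in source_rows if r[check_column] > last_value]
    target_rows.extend(new_rows)
    if new_rows:
        last_value = max(r[check_column] for r in new_rows)
    return last_value

source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
target = []
mark = incremental_import(source, target, last_value=0)  # first run imports both rows
source.append({"id": 3, "amount": 30})                    # the database receives a new row
mark = incremental_import(source, target, mark)           # second run imports only id 3
print(mark, [r["id"] for r in target])
```

A saved job plays the role of the `mark` variable here: it stores the last imported value so that each rerun picks up where the previous one stopped.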
  • 43. © Copyright 2014. Apps Associates LLC. 47 HDFS Architecture
  • 44. © Copyright 2014. Apps Associates LLC. 50 Architecture Overview
  • 45. © Copyright 2014. Apps Associates LLC. 51 HDFS Distributions
  • 46. © Copyright 2014. Apps Associates LLC. 52 Hadoop 2.0
  • 47. © Copyright 2014. Apps Associates LLC. 53 Oracle Big Data Appliance: Introduction •Oracle Big Data Appliance is an engineered system containing both hardware and software components. Oracle Big Data Appliance delivers: ‒A complete and optimized solution for big data ‒Single-vendor support for both hardware and software ‒An easy-to-deploy solution ‒Tight integration with Oracle Database
  • 48. © Copyright 2014. Apps Associates LLC. 54 Oracle Big Data Appliance: Where It Stands? Big Data Appliance Data Variety Unstructured Schema-less Schema Information Acquire Organize Analyze
  • 49. © Copyright 2014. Apps Associates LLC. 55 Oracle Big Data: Software Components Oracle Big Data Appliance Oracle NoSQL Database Oracle Big Data Connectors Open Source R Distribution Cloudera Manager & Cloudera’s Distribution Including Apache Hadoop Oracle Linux 5.6 and Java Hotspot VM
  • 50. © Copyright 2014. Apps Associates LLC. 56 Oracle Big Data with Oracle Exadata
  • 51. © Copyright 2014. Apps Associates LLC. 57 Oracle Big Data Solution Oracle BI Foundation Suite Oracle Real-Time Decisions Endeca Information Discovery Decide Oracle Advanced Analytics Oracle Database Oracle Spatial & Graph Acquire – Organize – Analyze Oracle Big Data Connectors Oracle Data Integrator Stream Oracle Event Processing Apache Flume Oracle GoldenGate Oracle NoSQL Database Cloudera Hadoop Oracle R Distribution
  • 52. © Copyright 2014. Apps Associates LLC. 58 Mapping the Phases with Software Acquire Phase –Hadoop Distributed File System –Oracle NoSQL Database Organize Phase –Hadoop Software Framework –Oracle Data Integrator Analyze Phase –R Statistical Programming Environment –Oracle Data Warehouse
  • 53. © Copyright 2014. Apps Associates LLC. 61 Analyze Phase Analyze Database + Oracle R Enterprise Statistical Functions Data Mining Algorithms Query Capabilities
  • 54. © Copyright 2014. Apps Associates LLC. 63 Oracle Big Data: Software Components Oracle Big Data Appliance Oracle NoSQL Database Oracle Big Data Connectors Open Source R Distribution Cloudera Manager & Cloudera’s Distribution Including Apache Hadoop Oracle Linux 5.6 and Java Hotspot VM
  • 55. © Copyright 2014. Apps Associates LLC. 64 Oracle Big Data SQL •New technology bridges Oracle, Hadoop, and NoSQL data stores. Using Oracle Big Data SQL, organizations can: •Combine data from Oracle Database, Hadoop and NoSQL in a single SQL query •Query and analyze data in Hadoop and NoSQL •Integrate big data analysis into existing applications and architectures •Extend security and access policies from Oracle Database to data in Hadoop and NoSQL •Maximize query performance on all data using Smart Scan •One Fast SQL Query for All Your Data
  • 56. © Copyright 2014. Apps Associates LLC. 65 Data Science Source: http://wikibon.org/blog/role-of-the-data-scientist/
  • 57. © Copyright 2014. Apps Associates LLC. 66 Data Scientist Source: http://wikibon.org/blog/role-of-the-data-scientist/
  • 58. © Copyright 2014. Apps Associates LLC. 67 DBA to Data Scientist Hadoop HDFS MapReduce NoSQL Database Hive Pig OR All the above with Big Data Appliance
  • 59. © Copyright 2014. Apps Associates LLC. 68 Big Data Eco System
  • 60. © Copyright 2013. Apps Associates LLC. 69 On Premise Hadoop as RDBMS “active archive” SALES 2013 Oracle Database Structured Data Analytics from Apps SALES 2012 SALES 2011 SALES 2010 SALES 2011 SALES 2010 “Hive” provides an SQL-like query layer over Hadoop and MapReduce Unstructured + Structured Data Analytics from Apps Hadoop for Structured Archive and Unstructured data
  • 61. © Copyright 2013. Apps Associates LLC. 70 AWS EMR as RDBMS “active archive” SALES 2013 Oracle Database Structured Data Analytics from Apps SALES 2012 SALES 2011 SALES 2010 SALES 2011 SALES 2010 “Hive” provides an SQL-like query layer over Amazon EMR Unstructured + Structured Data Analytics from Apps AWS EMR for Structured Archive and Unstructured data Amazon Elastic MapReduce (Amazon EMR)