1. Oracle Big Data Cloud Service
Presented by: Mandeep Kaur Sandhu
Senior Oracle DBA (University of Auckland)
Download these slides from: mandysandhu.com
2. Goals
• Introduction to Big Data
• Oracle Big Data deployment models
• Oracle Big Data Cloud Service
• Core principles
• Access and admin tasks
• Data management tools
• Event Hub
• Conclusion
3. What is Big Data?
• Characterised by the three Vs: Volume, Velocity, Variety
• Big data is a term that describes large or complex datasets
• Traditional data processing systems fail to analyse this data
• Big data analysis identifies the value in the data
4. What is Hadoop?
An open-source software platform for distributed storage and processing – highly scalable, reliable and available.
Hadoop:
• Distributed file system (HDFS)
• Framework for processing
• Designed to run on small or large machines for parallel processing
• Allows resource growth (scale out)
• Avoids vendor lock-in
6. MapReduce
A programming model for processing large data sets:
• Map – takes a set of data and converts it into another set of data (intermediate key/value pairs)
• Reduce – takes the output of Map as input and combines it into a smaller set of results
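The two phases can be sketched as a word count in plain Python. This is an illustrative stand-in for the model, not the Hadoop API; the function names are hypothetical.

```python
# Minimal word-count sketch of the MapReduce model (not the Hadoop API).
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    # Map: convert each input line into intermediate (key, value) pairs
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a smaller result
    return key, sum(values)

def mapreduce(lines):
    pairs = [kv for line in lines for kv in map_phase(line)]
    pairs.sort(key=itemgetter(0))  # the "shuffle/sort by key" step
    return dict(reduce_phase(k, [v for _, v in grp])
                for k, grp in groupby(pairs, key=itemgetter(0)))

print(mapreduce(["big data big", "data lake"]))  # -> {'big': 2, 'data': 2, 'lake': 1}
```

In real Hadoop the map and reduce tasks run in parallel across the cluster and the sort/shuffle is distributed; the data flow is the same.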
7. Oracle Big Data Deployment Models
• Oracle Big Data Cloud at Customer (BDCC) – the cloud service model delivered in your data centre, behind your firewall
• Oracle Big Data Appliance X6 – an on-premises engineered system designed to deliver predictable Hadoop infrastructure
• Oracle Big Data Cloud Service (BDCS) – Oracle public cloud infrastructure with cluster nodes and data sources
8. BDCS – Core Principles
Operational efficiency
• Out-of-the-box installation
• Automated cluster management
• Cloudera Manager
Security
• Data is encrypted – at rest and in motion
• Authorisation and authentication
• Network firewall
Versatility
• Cloudera distribution – Apache Hadoop Enterprise Data Hub
• Install and operate third-party software
9. BDCS – Features
Highly efficient cluster management
• Fault tolerant – HA Hadoop infrastructure
• Fully tested Hadoop upgrades
Cluster nodes – a cluster is a collection of nodes:
• Permanent nodes
• Edge nodes
• Compute nodes
10. Permanent Nodes
• Master or data node
• Lasts for the lifetime of the cluster
• Each node has:
• 32 OCPUs
• 256 GB RAM
• 48 TB storage
• Full Cloudera distribution – licence and support
11. Edge Nodes
• Empty nodes – OS and disk only
• Hadoop client configs
• Interface between the Hadoop cluster and the outside network
• A type of permanent node
Note: no DataNode role
12. Compute Nodes
• CPU and memory
• No disks
• Temporary nodes
• An existing cluster is needed before compute nodes can be added
• A cluster can be extended by up to 15 compute nodes
• No HDFS data
13. BDCS – Included Software
• Oracle Linux 6 and Oracle Java – JDK 8
• Cloudera Enterprise (Data Hub Edition)
• CDH 5.x with support for YARN and MR2
• Cloudera Impala
• HBase
• Cloudera Search
• Apache Spark
• Oracle R Distribution
• Oracle Big Data Spatial and Graph
14. BDCS – Additional Component
Oracle Big Data SQL Cloud Service
• Unified SQL access
• Dedicated instances
[Diagram: Big Data SQL linking the Cloudera cluster and Oracle Database in Oracle Cloud]
15. Oracle BDCS – Service Instance
• Log in to Oracle Cloud
• Choose Oracle Big Data Cloud Service
• Starter Pack 1 – 3 nodes
• Additional nodes – added later
• Big Data SQL node
16. Oracle BDCS – Service Cluster
• Go to the Oracle Big Data service instance
• Create a service cluster
• Provide tags and an instance name
17. Oracle BDCS – Service Cluster
• Select the Big Data Appliance system – service instance
• SSH keys
21. Administrative Tasks
• Add nodes in one-node increments – up to 60 nodes in total
• Four permanent Hadoop nodes – allows an additional edge node
• Extend or shrink the service
22. Hue – Group/User and File Upload
• Open the Cloudera console – Hue
• Same account details as Cloudera Manager
• Add a group
• Add a user
• Upload a file
23. Big Data Manager Console
• GUI-based console
• Login username – bigdatamgr
• Explore jobs and stored data
• Usage and health of the cluster
• YARN jobs
24. Oracle Big Data Manager – Notebook
• Zeppelin Notebooks – interactive analysis using R and Python
25. Data Management – odcp
• Command-line utility for copying large files
• Takes the input and splits it into chunks
• Uses Spark to provide parallel transfer
Examples:
odcp hdfs:///user/mandy/bigdata01.csv hdfs:///user/mandy/bigdata01.csv_copy
odcp hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy
odcp hdfs:///user/mandy/bigdata01.csv s3://aserver/bigdata01.csv_copy
odcp s3://user/mandy/bigdata01.csv s3://mandy01/bigdata01.csv_copy
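The split-into-chunks-and-transfer-in-parallel idea can be sketched in plain Python. This is a local stand-in only, assuming ordinary files; the real odcp runs Spark executors against HDFS/Swift/S3 with far larger chunk sizes.

```python
# Sketch of chunked parallel copy, the idea behind odcp (local files only).
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # bytes per chunk; illustrative only, real chunks are much larger

def copy_chunk(src, dst, offset, length):
    # Each worker copies one independent byte range
    with open(src, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    with open(dst, "r+b") as f:
        f.seek(offset)
        f.write(data)

def parallel_copy(src, dst):
    size = os.path.getsize(src)
    with open(dst, "wb") as f:   # pre-allocate the destination file
        f.truncate(size)
    with ThreadPoolExecutor(max_workers=4) as pool:
        for off in range(0, size, CHUNK):
            pool.submit(copy_chunk, src, dst, off, min(CHUNK, size - off))

with open("bigdata01.csv", "wb") as f:
    f.write(b"id,value\n1,42\n")
parallel_copy("bigdata01.csv", "bigdata01.csv_copy")
```

Because the chunks are disjoint byte ranges, the workers never overlap, which is what makes the transfer safely parallel.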
26. Data Management – odiff
• Oracle distributed diff – compares large data sets
• Compatible with the Cloudera distribution
• Minimum block size to compare – 5 MB
• Maximum – 2 GB
Examples:
/usr/bin/odiff hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy
/usr/bin/odiff -V hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy
/usr/bin/odiff -d hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy
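The block-size setting above hints at how such tools work: hash fixed-size blocks of each file and compare the hashes, so only block digests (not whole files) need to meet. A toy local sketch of that idea, not the actual odiff implementation:

```python
# Toy sketch of block-wise dataset comparison (the idea behind tools like
# odiff): hash fixed-size blocks and report the first block that differs.
import hashlib

BLOCK = 8  # bytes; illustrative only, odiff's block size is 5 MB - 2 GB

def block_hashes(path):
    # Yield a digest per fixed-size block of the file
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK)
            if not block:
                return
            yield hashlib.sha256(block).hexdigest()

def first_diff_block(a, b):
    # Compare digests pairwise; return index of first mismatch, else None
    for i, (ha, hb) in enumerate(zip(block_hashes(a), block_hashes(b))):
        if ha != hb:
            return i
    return None  # all compared blocks matched

with open("a.csv", "wb") as f:
    f.write(b"same-part-CHANGED")
with open("b.csv", "wb") as f:
    f.write(b"same-part-ORIGINAL")
print(first_diff_block("a.csv", "b.csv"))  # -> 1 (second block differs)
```

In a distributed setting each cluster node can hash its own blocks independently, which is why this scales to very large data sets.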
27. Data Management – bda-oss-admin
• Manages data and resources
• Can set environment variables
• Configures the cluster with a storage provider
Examples:
bda-oss-admin --cm-username admin --cm-password abce1234
bda-oss-admin restart_cluster
#!/bin/bash
export CM_ADMIN="my_CM_admin_username"
28. Data Management – bdm-cli
• Big Data command-line interface to copy data and manage copy jobs
• Duplicates the odcp commands
bdm-cli copy
bdm-cli create_job
29. Data Ingest Options
• Direct ingest from the customer data centre into Oracle BDCS via SCP (SSH protocol)
• Common ingests using Flume or ETL work
• Connectivity via VPN and FastConnect
30. Apache Kafka
• Open-source stream processing
• Real-time streaming
• High-throughput and low-latency platform
Common use cases:
• Stream processing – IoT, anomaly detection
• Data integration – data lakes, HDFS, object storage
• Log aggregation – click streams, server logs
• Messaging – traditional apps, microservices
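Kafka's core abstraction is an append-only topic log that each consumer group reads at its own committed offset, which is what lets one stream feed analytics, log aggregation, and messaging at once. A toy in-memory sketch of that abstraction (not the Kafka API, and ignoring partitions and replication):

```python
# Toy in-memory sketch of Kafka's append-only log with per-group offsets.
class TopicLog:
    def __init__(self):
        self.records = []   # append-only log of events
        self.offsets = {}   # consumer group -> next offset to read

    def produce(self, record):
        self.records.append(record)

    def consume(self, group, max_records=10):
        # Each group reads from its own committed position
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)  # commit new offset
        return batch

log = TopicLog()
for event in ["click:home", "click:cart", "server:500"]:
    log.produce(event)

print(log.consume("analytics", 2))  # -> ['click:home', 'click:cart']
print(log.consume("analytics", 2))  # -> ['server:500']
print(log.consume("audit"))         # independent group starts from offset 0
```

Because records are never removed on read, any number of independent consumers can replay the same stream, which underpins all the use cases listed above.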
31. Oracle Event Hub Cloud Service
• Fully managed streaming data platform
• Provides the world's most popular message broker (Kafka)
• Flexible – available in fully managed and dedicated deployment options
• Elastic – scales horizontally and vertically
• Access
• REST API access
• SSH access to the Kafka cluster
32. Conclusion
• Start your big data journey now
• Build and populate a data lake
• Help the business solve problems by using data
• Register for an Oracle Cloud free trial:
https://cloud.oracle.com/tryit
33. Thank you for your time!
Follow and subscribe:
Blog mandysandhu.com
Twitter @mandysandhu14
LinkedIn kaurmandeep88