Big Data – Integrated Course
SETUP – Hadoop, Spark, Kafka and NoSQL Environment
• Install and configure VirtualBox
• Load and configure RHEL-based Virtual Machine
• Install/Configure VM with basic software
• User setup and Database account creation
• Configure SSH and check port availability
Hadoop - HDFS (Hadoop Distributed File System)
• Hadoop Distributed File System, Background, GFS (Google File System)
• HDFS config files – core-site.xml, hdfs-site.xml, mapred-site.xml
• Data Replication – Static and Dynamic configuration
• Data Storage – Block Size details
• HDFS - DFS shell commands
• HDFS - Admin commands and data recovery
Hadoop - MapReduce Framework
• MapReduce Introduction
• Writing MapReduce Programs
• Mapper and Reducer details
• Running MR jobs
• Configure custom Map and Reduce jobs
Hadoop - Apache Hive
• Hive Installation and Metastore setup
• Hive Shell commands
• HiveQL Basics
• Loading data in Hive Local and MapReduce (MR) modes
• Working with Tables, Databases, etc.
• Hands-on Exercises and Assignments
Spark - Spark Installation and Introduction
• Apache Spark Installation (version 2.x)
• Spark shell and PySpark shell setup
• Spark Executors and Executor cores setup
• Spark logging configuration
• Writing UDFs (user-defined functions)
Spark - Scala Installation and Introduction
• Scala Installation (version 2.x)
• Scala setup for Spark environment
• Scala-based Spark exercise
Spark - Resilient Distributed Datasets (RDD)
• Working with RDDs in Spark
• Creating RDDs from scratch
• Creating RDDs from preexisting data
• Accumulators and Broadcast variables
• RDD – Transformation commands
• RDD – Action commands
• RDD complex exercises
Spark - Spark SQL and DataFrames
• Spark SQL and the SQLContext
• Creating DataFrames from raw datasets
• Transforming and Querying DataFrames
• Using CSV files and mapping schemas
• Using case structures and user-defined data types
Spark - Spark MLlib (Machine Learning)
• Basic Principles of Machine Learning
• Supervised and Unsupervised Learning
• Set up Machine Learning for Spark
• Transformations and Correlation Algorithms
• Exercises for Regression and Correlation
Kafka - Apache Kafka
• Introduction to Apache Kafka
• Identifying the major Kafka components
• Determining what data is appropriate for use with Kafka
• Developing with Kafka producers, consumers, and brokers
Kafka - Installation and Labs
• Kafka Features and terminology
• High-level Kafka architecture
• Kafka Installation on Linux/Windows
• Install ZooKeeper for Kafka
• Install Kafka Server
Kafka - Consumer, Producer and Topics
• Writing Kafka Consumers (Labs)
• Create Kafka Messages
• Create Kafka Topics
• Message structure and topic configuration
• Write a Kafka Producer
• Configure Producer and Kafka Server
• Kafka Multi-Broker Configuration
NoSQL - Introduction and Details
• NoSQL databases introduction
• Types of NoSQL databases – MongoDB, Cassandra, CouchDB
• Use cases for NoSQL databases
• Document DB types
• Comparison with RDBMS
NoSQL - MongoDB
• MongoDB installation on a Linux/Windows machine
• Mongo Daemon threads
• Mongo Shell configuration
• Mongo collection creation
• Mongo data load in collections
NoSQL - Mongo Query Language
• MongoDB query language
• Mongo create(), update() and delete() queries
• Mongo find() query
Study Materials and Labs
1) A complete Virtual Machine is shared with students. It has Java, Oracle DB, Mozilla
Firefox and other components pre-installed.
2) The VM can be used even after the training is done. Please note that it is NOT a
remote-lab environment. You will be able to keep the VM and all labs even after the
training is completed.