SlideShare a Scribd company logo
1 of 3
Big DATA – Integrated course
SETUP – Hadoop, Spark, Kafka and NoSQL Environment
• Install and configure Virtual Box
• Load and configure RHEL based Virtual Machine
• Install/Configure VM with basic software’s
• User setup and Database account creation
• Configure SSH and checks ports availability
Hadoop - HDFS (Hadoop - Distributed File System)
• Hadoop Distributed file system, Background, GFS
• HDFS config files – core,hdfs, mapred site xmls
• Data Replication – Static and Dynamic configuration
• Data Storage – Block Size details
• HDFS - DFS shell commands
• HDFS -Admin commands and data recovery
Hadoop - MapReduce Framework
• MapReduce Introduction
• Writing MapReduce Programs
• Mappers and Reducers details
• Running MR jobs
• Configure custom Map and Reduce jobs.
Hadoop - Apache HIVE
• Hive Installation and Meta store setup
• Hive Shell commands
• Hive QL Basics
• Hive Local and MR mode data load
• Working with Tables, Databases etc.
• Hands on Exercises and Assignments
Spark - Spark Installation and Introduction
• Apache Spark Installation (version 2.x)
• Spark shell and Pyspark shell setup.
• Spark Executor cores and Executors setup
• Spark configurations for logs .
• Writing UDF (user defined functions)
Spark- Scala Installation and Introduction
• Scala Installation (version 2.x)
• Scala setup for Spark environment
• Scala based Spark exercise
Spark - Resilient Distributed Datasets (RDD)
• Working with RDDs in Spark
• Creating RDDs from scratch
• Creating RDD from preexisting data
• Accumulators and Broadcast variables
• RDD – Transformations commands
• RDD – Actions commands
• RDD complex exercises
Spark – Spark SQL and Data Frames
• Spark SQL and the SQL Context
• Creating DataFrames from raw datasets
• Transforming and Querying DataFrames
• Using csv files and mapping schema
• Using case structures and user defined data types
Spark - Spark Mlib (Machine Learning)
• Basic Principles of Machine Learning
• Supervised and Unsupervised Learnings
• Setup Machine Learning for Spark
• Transformations, Correlation Algorithm.
• Exercise for Regression , Correlation.
Kafka- Apache Kafka
• Introduction to Apache Kafka
• Identifying the major Kafka components
• Determining what data is appropriate for use with Kafka
• Developing with Kafka producers, consumers, and brokers
Kafka- Installation and Labs
• Kafka Features and terminologies
• High level Kafka architecture
• Kafka Installation in Linux/Windows.
• Install Kafka Zookeeper
• Install Kafka Server
Kafka- Consumer, Producer and Topics
• Writing Kafka Consumer Labs
• Create Kafka Messages
• Create Kafka Topics
• Message structure and topic configuration
• Write Kafka Producer
• Configure Producer and Kafka Server
• Kafka Multi Broker Configuration
NoSQL- Introduction and Details
• NoSQL databases introduction
• Types of NoSQL databases – MongoDB, Cassandra, Couch DB
• Use cases for NoSQL databases
• Document DB types
• Comparison with RDBMS
NoSQL- MongoDB
• MongoDB installation on Linux/windows box
• Mongo Demon threads
• Mongo Shell configuration
• Mongo collection creation
• Mongo data load in collections
NoSQL – Mongo Query Language
• MongoDB query language
• Mongo create() , update() and delete() query
• Mongo find() query
Study Materials and Labs
1) Complete Virtual Machine is shared with students. It has Java , Oracle DB , Mozilla
Firefox and other components pre-installed
2) The VM can be used even after the training is DONE. Please note it’s NOT a remote
lab type environment. You will be able to keep the VM and all labs even after the
training is completed

More Related Content

What's hot

Hadoop spark online demo
Hadoop spark online demoHadoop spark online demo
Hadoop spark online demoTripti Jha
 
Scaling MySQL using Fabric
Scaling MySQL using FabricScaling MySQL using Fabric
Scaling MySQL using FabricKarthik .P.R
 
MongoDB and Amazon Web Services: Storage Options for MongoDB Deployments
MongoDB and Amazon Web Services: Storage Options for MongoDB DeploymentsMongoDB and Amazon Web Services: Storage Options for MongoDB Deployments
MongoDB and Amazon Web Services: Storage Options for MongoDB DeploymentsMongoDB
 
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceApplication Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceWSO2
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon
 
Demystfying nosql databases
Demystfying nosql databasesDemystfying nosql databases
Demystfying nosql databasesMike King
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersNiko Neugebauer
 
Running MongoDB on AWS
Running MongoDB on AWSRunning MongoDB on AWS
Running MongoDB on AWSMongoDB
 
Ceph as storage for CloudStack
Ceph as storage for CloudStack Ceph as storage for CloudStack
Ceph as storage for CloudStack Ceph Community
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster Cloudera, Inc.
 
Introduction to Apache HBase
Introduction to Apache HBaseIntroduction to Apache HBase
Introduction to Apache HBaseGokuldas Pillai
 
Configuring workload-based storage and topologies
Configuring workload-based storage and topologiesConfiguring workload-based storage and topologies
Configuring workload-based storage and topologiesMariaDB plc
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMariaDB plc
 
2016 jan-pugs-meetup-v9.5-features
2016 jan-pugs-meetup-v9.5-features2016 jan-pugs-meetup-v9.5-features
2016 jan-pugs-meetup-v9.5-featuresSameer Kumar
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHBaseCon
 
20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_dbhyeongchae lee
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Josh Carlisle
 
Beyond Postgres: Interesting Projects, Tools and forks
Beyond Postgres: Interesting Projects, Tools and forksBeyond Postgres: Interesting Projects, Tools and forks
Beyond Postgres: Interesting Projects, Tools and forksSameer Kumar
 
MongoDB and AWS: Integrations
MongoDB and AWS: IntegrationsMongoDB and AWS: Integrations
MongoDB and AWS: IntegrationsMongoDB
 

What's hot (19)

Hadoop spark online demo
Hadoop spark online demoHadoop spark online demo
Hadoop spark online demo
 
Scaling MySQL using Fabric
Scaling MySQL using FabricScaling MySQL using Fabric
Scaling MySQL using Fabric
 
MongoDB and Amazon Web Services: Storage Options for MongoDB Deployments
MongoDB and Amazon Web Services: Storage Options for MongoDB DeploymentsMongoDB and Amazon Web Services: Storage Options for MongoDB Deployments
MongoDB and Amazon Web Services: Storage Options for MongoDB Deployments
 
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceApplication Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a Service
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
 
Demystfying nosql databases
Demystfying nosql databasesDemystfying nosql databases
Demystfying nosql databases
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
Running MongoDB on AWS
Running MongoDB on AWSRunning MongoDB on AWS
Running MongoDB on AWS
 
Ceph as storage for CloudStack
Ceph as storage for CloudStack Ceph as storage for CloudStack
Ceph as storage for CloudStack
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
 
Introduction to Apache HBase
Introduction to Apache HBaseIntroduction to Apache HBase
Introduction to Apache HBase
 
Configuring workload-based storage and topologies
Configuring workload-based storage and topologiesConfiguring workload-based storage and topologies
Configuring workload-based storage and topologies
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
2016 jan-pugs-meetup-v9.5-features
2016 jan-pugs-meetup-v9.5-features2016 jan-pugs-meetup-v9.5-features
2016 jan-pugs-meetup-v9.5-features
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
 
20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018
 
Beyond Postgres: Interesting Projects, Tools and forks
Beyond Postgres: Interesting Projects, Tools and forksBeyond Postgres: Interesting Projects, Tools and forks
Beyond Postgres: Interesting Projects, Tools and forks
 
MongoDB and AWS: Integrations
MongoDB and AWS: IntegrationsMongoDB and AWS: Integrations
MongoDB and AWS: Integrations
 

Similar to Big data_hadoop_spark_kafka_nosql_training

Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkJames Chen
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark FundamentalsZahra Eskandari
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackDataWorks Summit/Hadoop Summit
 
Big data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto softwareBig data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto softwareAdvanto Software
 
Innovation with Connection, The new HPCC Systems Plugins and Modules
Innovation with Connection, The new HPCC Systems Plugins and ModulesInnovation with Connection, The new HPCC Systems Plugins and Modules
Innovation with Connection, The new HPCC Systems Plugins and ModulesHPCC Systems
 
Speeding Up The Snail
Speeding Up The SnailSpeeding Up The Snail
Speeding Up The SnailMarcus Deglos
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
Hadoop Training in Hyderabad
Hadoop Training in HyderabadHadoop Training in Hyderabad
Hadoop Training in HyderabadRajitha D
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloudDmitry Tolpeko
 
Hadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On AzureHadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On AzureErik Krogen
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014cdmaxime
 
Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructuredatastack
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Data Con LA
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with HadoopCloudera, Inc.
 
Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoMapR Technologies
 

Similar to Big data_hadoop_spark_kafka_nosql_training (20)

Manoj CV
Manoj CVManoj CV
Manoj CV
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stack
 
Big data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto softwareBig data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto software
 
Innovation with Connection, The new HPCC Systems Plugins and Modules
Innovation with Connection, The new HPCC Systems Plugins and ModulesInnovation with Connection, The new HPCC Systems Plugins and Modules
Innovation with Connection, The new HPCC Systems Plugins and Modules
 
Speeding Up The Snail
Speeding Up The SnailSpeeding Up The Snail
Speeding Up The Snail
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Hadoop Training in Hyderabad
Hadoop Training in HyderabadHadoop Training in Hyderabad
Hadoop Training in Hyderabad
 
Hadoop Training in Hyderabad
Hadoop Training in HyderabadHadoop Training in Hyderabad
Hadoop Training in Hyderabad
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
 
Apache spark
Apache sparkApache spark
Apache spark
 
Hadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On AzureHadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On Azure
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 
Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructure
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with Hadoop
 
Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of Twingo
 

More from Kamal A

All python data_analyst_r_course
All python data_analyst_r_courseAll python data_analyst_r_course
All python data_analyst_r_courseKamal A
 
Project using kafka spark mongo db project
Project using kafka spark mongo db projectProject using kafka spark mongo db project
Project using kafka spark mongo db projectKamal A
 
Bigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domainBigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domainKamal A
 
Payment Gateway Live hadoop project
Payment Gateway Live hadoop projectPayment Gateway Live hadoop project
Payment Gateway Live hadoop projectKamal A
 
Practical Hadoop Big Data Training Course by Certified Architect
Practical Hadoop Big Data Training Course by Certified ArchitectPractical Hadoop Big Data Training Course by Certified Architect
Practical Hadoop Big Data Training Course by Certified ArchitectKamal A
 
Hadoop online training course
Hadoop online  training courseHadoop online  training course
Hadoop online training courseKamal A
 

More from Kamal A (6)

All python data_analyst_r_course
All python data_analyst_r_courseAll python data_analyst_r_course
All python data_analyst_r_course
 
Project using kafka spark mongo db project
Project using kafka spark mongo db projectProject using kafka spark mongo db project
Project using kafka spark mongo db project
 
Bigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domainBigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domain
 
Payment Gateway Live hadoop project
Payment Gateway Live hadoop projectPayment Gateway Live hadoop project
Payment Gateway Live hadoop project
 
Practical Hadoop Big Data Training Course by Certified Architect
Practical Hadoop Big Data Training Course by Certified ArchitectPractical Hadoop Big Data Training Course by Certified Architect
Practical Hadoop Big Data Training Course by Certified Architect
 
Hadoop online training course
Hadoop online  training courseHadoop online  training course
Hadoop online training course
 

Recently uploaded

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 

Recently uploaded (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 

Big data_hadoop_spark_kafka_nosql_training

  • 1. Big DATA – Integrated course SETUP – Hadoop, Spark, Kafka and NoSQL Environment • Install and configure Virtual Box • Load and configure RHEL based Virtual Machine • Install/Configure VM with basic software’s • User setup and Database account creation • Configure SSH and checks ports availability Hadoop - HDFS (Hadoop - Distributed File System) • Hadoop Distributed file system, Background, GFS • HDFS config files – core,hdfs, mapred site xmls • Data Replication – Static and Dynamic configuration • Data Storage – Block Size details • HDFS - DFS shell commands • HDFS -Admin commands and data recovery Hadoop - MapReduce Framework • MapReduce Introduction • Writing MapReduce Programs • Mappers and Reducers details • Running MR jobs • Configure custom Map and Reduce jobs. Hadoop - Apache HIVE • Hive Installation and Meta store setup • Hive Shell commands • Hive QL Basics • Hive Local and MR mode data load • Working with Tables, Databases etc. • Hands on Exercises and Assignments Spark - Spark Installation and Introduction • Apache Spark Installation (version 2.x) • Spark shell and Pyspark shell setup. • Spark Executor cores and Executors setup • Spark configurations for logs . • Writing UDF (user defined functions)
  • 2. Spark- Scala Installation and Introduction • Scala Installation (version 2.x) • Scala setup for Spark environment • Scala based Spark exercise Spark - Resilient Distributed Datasets (RDD) • Working with RDDs in Spark • Creating RDDs from scratch • Creating RDD from preexisting data • Accumulators and Broadcast variables • RDD – Transformations commands • RDD – Actions commands • RDD complex exercises Spark – Spark SQL and Data Frames • Spark SQL and the SQL Context • Creating DataFrames from raw datasets • Transforming and Querying DataFrames • Using csv files and mapping schema • Using case structures and user defined data types Spark - Spark Mlib (Machine Learning) • Basic Principles of Machine Learning • Supervised and Unsupervised Learnings • Setup Machine Learning for Spark • Transformations, Correlation Algorithm. • Exercise for Regression , Correlation. Kafka- Apache Kafka • Introduction to Apache Kafka • Identifying the major Kafka components • Determining what data is appropriate for use with Kafka • Developing with Kafka producers, consumers, and brokers Kafka- Installation and Labs • Kafka Features and terminologies • High level Kafka architecture • Kafka Installation in Linux/Windows. • Install Kafka Zookeeper • Install Kafka Server
  • 3. Kafka- Consumer, Producer and Topics • Writing Kafka Consumer Labs • Create Kafka Messages • Create Kafka Topics • Message structure and topic configuration • Write Kafka Producer • Configure Producer and Kafka Server • Kafka Multi Broker Configuration NoSQL- Introduction and Details • NoSQL databases introduction • Types of NoSQL databases – MongoDB, Cassandra, Couch DB • Use cases for NoSQL databases • Document DB types • Comparison with RDBMS NoSQL- MongoDB • MongoDB installation on Linux/windows box • Mongo Demon threads • Mongo Shell configuration • Mongo collection creation • Mongo data load in collections NoSQL – Mongo Query Language • MongoDB query language • Mongo create() , update() and delete() query • Mongo find() query Study Materials and Labs 1) Complete Virtual Machine is shared with students. It has Java , Oracle DB , Mozilla Firefox and other components pre-installed 2) The VM can be used even after the training is DONE. Please note it’s NOT a remote lab type environment. You will be able to keep the VM and all labs even after the training is completed