TRAINING SHEET
CLOUDERA DATA ANALYST TRAINING
Take your knowledge to the next level
Cloudera University’s four-day data analyst training course teaches you to apply traditional
data analytics and business intelligence skills to big data tools such as Apache Impala, Apache
Hive, and Apache Pig. The course covers the tools data professionals need to access, manipulate,
transform, and analyze complex datasets using SQL and familiar scripting languages.
Learn a modern toolset
Students will learn and work with modern tools, including:
- Apache Impala, which enables fast, interactive analysis of data stored in Apache Hadoop
through a native SQL environment.
- Apache Hive, which provides HiveQL, a SQL-like query language that makes data accessible
to analysts, database administrators, and others without Java programming expertise.
- Apache Pig, which provides Pig Latin, a high-level data flow language for processing data
on the Hadoop cluster.
Get hands-on experience
Through instructor-led discussion and interactive, hands-on exercises, participants will
navigate the Hadoop ecosystem, learning how to:
- Acquire, store, and analyze data using features in Pig, Hive, and Impala
- Perform fundamental ETL (extract, transform, and load) tasks with Hadoop tools
- Use Pig, Hive, and Impala to improve productivity for typical analysis tasks
- Join diverse datasets to gain valuable business insight
- Perform interactive, complex queries on datasets
What to expect
This course is designed for data analysts, business intelligence specialists, developers, system
architects, and database administrators. Prior knowledge of Apache Hadoop is not required.
- Knowledge of SQL is assumed
- Basic familiarity with the Linux command line is expected
- Knowledge of a scripting language (such as Bash, Perl, Python, or Ruby) is helpful but not essential
Get certified
Upon completion of the course, attendees are encouraged to continue their study and
register for the CCA Data Analyst exam. Certification is a strong differentiator: it helps
establish you as a leader in the field and gives employers and customers tangible
evidence of your skills and expertise.
“Cloudera has not only prepared
us for success today, but has also
trained us to face and prevail
over our big data challenges in
the future by using Hadoop.”
Persado
Cloudera, Inc. 395 Page Mill Road Palo Alto, CA 94306 USA cloudera.com
© 2018 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks
of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies.
Information is subject to change without notice. Cloudera_Data_Analyst_Training_Sheet_106
Course Details:
Introduction
Apache Hadoop Fundamentals
- The Motivation for Hadoop
- Hadoop Overview
- Data Storage: HDFS
- Distributed Data Processing: YARN, MapReduce, and Spark
- Data Processing and Analysis: Pig, Hive, and Impala
- Database Integration: Sqoop
- Other Hadoop Data Tools
- Exercise Scenarios
Introduction to Apache Pig
- What Is Pig?
- Pig’s Features
- Pig Use Cases
- Interacting with Pig
Basic Data Analysis with Apache Pig
- Pig Latin Syntax
- Loading Data
- Simple Data Types
- Field Definitions
- Data Output
- Viewing the Schema
- Filtering and Sorting Data
- Commonly Used Functions
Processing Complex Data with Apache Pig
- Storage Formats
- Complex/Nested Data Types
- Grouping
- Built-In Functions for Complex Data
- Iterating Grouped Data
Multi-Dataset Operations with Apache Pig
- Techniques for Combining Datasets
- Joining Datasets in Pig
- Set Operations
- Splitting Datasets
Apache Pig Troubleshooting and Optimization
- Troubleshooting Pig
- Logging
- Using Hadoop’s Web UI
- Data Sampling and Debugging
- Performance Overview
- Understanding the Execution Plan
- Tips for Improving the Performance of Pig Jobs
Introduction to Apache Hive and Impala
- What Is Hive?
- What Is Impala?
- Why Use Hive and Impala?
- Schema and Data Storage
- Comparing Hive and Impala to Traditional Databases
- Use Cases
Querying with Apache Hive and Impala
- Databases and Tables
- Basic Hive and Impala Query Language Syntax
- Data Types
- Using Hue to Execute Queries
- Using Beeline (Hive’s Shell)
- Using the Impala Shell
Apache Hive and Impala Data Management
- Data Storage
- Creating Databases and Tables
- Loading Data
- Altering Databases and Tables
- Simplifying Queries with Views
- Storing Query Results
Data Storage and Performance
- Partitioning Tables
- Loading Data into Partitioned Tables
- When to Use Partitioning
- Choosing a File Format
- Using Avro and Parquet File Formats
Relational Data Analysis with Apache Hive and Impala
- Joining Datasets
- Common Built-In Functions
- Aggregation and Windowing
Complex Data with Apache Hive and Impala
- Complex Data with Hive
- Complex Data with Impala
Analyzing Text with Apache Hive and Impala
- Using Regular Expressions with Hive and Impala
- Processing Text Data with SerDes in Hive
- Sentiment Analysis and n-grams in Hive
Apache Hive Optimization
- Understanding Query Performance
- Bucketing
- Indexing Data
- Hive on Spark
Apache Impala Optimization
- How Impala Executes Queries
- Improving Impala Performance
Extending Apache Hive and Impala
- Custom SerDes and File Formats in Hive
- Data Transformation with Custom Scripts in Hive
- User-Defined Functions
- Parameterized Queries
Choosing the Best Tool for the Job
- Comparing Pig, Hive, Impala, and Relational Databases
- Which to Choose?
Conclusion
201712


Cloudera data-analyst-training

  • 1. TRAINING SHEET CLOUDERA DATA ANALYST TRAINING Take your knowledge to the next level Cloudera University’s four-day data analyst training course will teach you to apply traditional data analytics and business intelligence skills to big data tools like Apache Impala, Apache Hive, and Apache Pig. Cloudera presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages. Learn a modern toolset Students will have the chance to learn and work with modern tools, such as: _ Apache Impala enables instant interactive analysis of the data stored in Apache Hadoop via a native SQL environment. _ Apache Hive provides a SQL-like query language with HiveQL that makes data accessible to analysts, database administrators, and others without Java programming expertise. _ Apache Pig applies the fundamentals of familiar scripting languages to the Hadoop cluster. Get hands-on experience Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning how to: _ Acquire, store, and analyze data using features in Pig, Hive, and Impala _ Perform fundamental ETL (extract, transform, and load) tasks with Hadoop tools _ Use Pig, Hive, and Impala to improve productivity for typical analysis tasks _ Join diverse datasets to gain valuable business insight _ Perform interactive, complex queries on datasets What to expect This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Prior knowledge of Apache Hadoop is not required. _ Knowledge of SQL is assumed _ Basic familiarity with the Linux command line is expected _ Knowledge of a scripting language (such as Bash scripting, Perl, Python, or Ruby) is helpful but not essential. Get certified Upon completion of the course, attendees are encouraged to continue their study and register for the CCA Data Analyst exam. 
Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.
“Cloudera has not only prepared us for success today, but has also trained us to face and prevail over our big data challenges in the future by using Hadoop.”
Persado
Cloudera, Inc.
395 Page Mill Road
Palo Alto, CA 94306 USA
cloudera.com
© 2018 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.
Cloudera_Data_Analyst_Training_Sheet_106
Course Details:
Introduction
Apache Hadoop Fundamentals
_	The Motivation for Hadoop
_	Hadoop Overview
_	Data Storage: HDFS
_	Distributed Data Processing: YARN, MapReduce, and Spark
_	Data Processing and Analysis: Pig, Hive, and Impala
_	Database Integration: Sqoop
_	Other Hadoop Data Tools
_	Exercise Scenarios
Introduction to Apache Pig
_	What is Pig?
_	Pig’s Features
_	Pig Use Cases
_	Interacting with Pig
Basic Data Analysis with Apache Pig
_	Pig Latin Syntax
_	Loading Data
_	Simple Data Types
_	Field Definitions
_	Data Output
_	Viewing the Schema
_	Filtering and Sorting Data
_	Commonly Used Functions
Processing Complex Data with Apache Pig
_	Storage Formats
_	Complex/Nested Data Types
_	Grouping
_	Built-In Functions for Complex Data
_	Iterating Grouped Data
Multi-Dataset Operations with Apache Pig
_	Techniques for Combining Datasets
_	Joining Datasets in Pig
_	Set Operations
_	Splitting Datasets
Apache Pig Troubleshooting and Optimization
_	Troubleshooting Pig
_	Logging
_	Using Hadoop’s Web UI
_	Data Sampling and Debugging
_	Performance Overview
_	Understanding the Execution Plan
_	Tips for Improving the Performance of Pig Jobs
Introduction to Apache Hive and Impala
_	What is Hive?
_	What is Impala?
_	Why Use Hive and Impala?
_	Schema and Data Storage
_	Comparing Hive and Impala to Traditional Databases
_	Use Cases
Querying with Apache Hive and Impala
_	Databases and Tables
_	Basic Hive and Impala Query Language Syntax
_	Data Types
_	Using Hue to Execute Queries
_	Using Beeline (Hive’s Shell)
_	Using the Impala Shell
Apache Hive and Impala Data Management
_	Data Storage
_	Creating Databases and Tables
_	Loading Data
_	Altering Databases and Tables
_	Simplifying Queries with Views
_	Storing Query Results
Data Storage and Performance
_	Partitioning Tables
_	Loading Data into Partitioned Tables
_	When to Use Partitioning
_	Choosing a File Format
_	Using Avro and Parquet File Formats
Relational Data Analysis with Apache Hive and Impala
_	Joining Datasets
_	Common Built-In Functions
_	Aggregation and Windowing
Complex Data with Apache Hive and Impala
_	Complex Data with Hive
_	Complex Data with Impala
Analyzing Text with Apache Hive and Impala
_	Using Regular Expressions with Hive and Impala
_	Processing Text Data with SerDes in Hive
_	Sentiment Analysis and n-grams in Hive
Apache Hive Optimization
_	Understanding Query Performance
_	Bucketing
_	Indexing Data
_	Hive on Spark
Apache Impala Optimization
_	How Impala Executes Queries
_	Improving Impala Performance
Extending Apache Hive and Impala
_	Custom SerDes and File Formats in Hive
_	Data Transformation with Custom Scripts in Hive
_	User-Defined Functions
_	Parameterized Queries
Choosing the Best Tool for the Job
_	Comparing Pig, Hive, Impala, and Relational Databases
_	Which to Choose?
Conclusion
201712