© Cloudera, Inc. All rights reserved.
Road to Cloudera certification
The demand for skills is high and Hadoop is the future. Customers
cannot afford to move slowly in staffing their Big Data projects.
Customers are building plans to ensure projects are staffed with
skilled employees, and supported by a qualified services provider.
[Chart: Job Trends from Indeed.com]
[Chart: "What are you most concerned about when it comes to your readiness for big data and Hadoop?" Cloudera MDP webinar poll results, July 2016]
Why Cloudera training?
Aligned to best practices and the pace of change
1 Broadest range of courses
Learning paths for Developer, Admin, Analyst
2 Most experienced instructors
More than 50,000 trained since 2009
3 Leader in certification
Over 12,000 accredited Cloudera professionals
4 Trusted source for training
100,000+ people have attended online courses
5 State of the art curriculum
Courses updated as Hadoop evolves
6 Widest geographic coverage
Most classes offered: 50 cities worldwide plus online
7 Most relevant platform & community
CDH deployed more than all other distributions combined
8 Depth of training material
Hands-on labs and VMs support live instruction
9 Ongoing learning
Video tutorials and e-learning complement training
10 Commitment to big data education
University partnerships to teach Hadoop in colleges
What is available from Cloudera University?
• Private training: Courses delivered at a location of the customer's choice to an internal audience
• Public training: Courses regularly scheduled around the globe. The schedule is available on the web
• Virtual training: Live training accessed via the internet; available for public and private courses
• OnDemand training: Pre-recorded lectures with the same content and exercises as the live training options
• Certification: Rigorously developed and meaningful bodies of knowledge
Suggested Cloudera University curricula
Developers
• Python/Scala Training
• Developer for Spark and Hadoop
• CCA: Spark and Hadoop Developer
• Spark ML & Kafka modules
• Topic-specific training (Search, HBase)
• Hands-on practice
• CCP: Data Engineer
Administrators
• Cloudera Administration training
• CCA: Administrator
Data Analysts/Data Scientists
• Data Analyst: Using Hive, Pig & Impala
• CCA: Data Analyst
• Cloudera Data Science
Let’s get certified!
Certification Tiers
 CCA (Cloudera Certified Associate)
 Data Analyst, Admin and Spark & Hadoop Developer
 Basic exam, but it's a complex subject area
 Maps to curriculum
 CCP (Cloudera Certified Professional)
 Data Engineer
 Combination of Developer, Analyst and Big Data services
 Mastery level, beyond the introductory course
 Real world experience
Exam format: CCA and CCP certification
 Not multiple choice
 Hands-on, practical exams similar to student exercises
 Home-based, no testing centres
 Proctored through ExamsLocal.com
 Webcam and desktop recorded and monitored
 No papers / phone / drinks on desk / no talking
 AWS Cloud-based cluster
 Guacamole remote desktop in web browser
 No Internet search during exam – only local documentation
Sample CCA question
 Instructions
 Connect to the MySQL database on the cluster using Sqoop and import all of the
data from the customer table into HDFS. The result must be in comma-delimited
text format and placed in the HDFS directory /user/cert/solution3
 Data Description
 A MySQL instance is running on the gateway node. In that instance, you will find
a table that contains twenty-five million (25,000,000) rows of customer data.
MySQL database information:
Installation: On the cluster node gateway
Table name: customer
Username: cloudera
Password: cloudera
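A minimal sketch of how the Sqoop command for this question might be assembled, written in Python purely to keep the pieces explicit. The database name `exam_db` is hypothetical (the question does not state one), and using `gateway` as the hostname is an assumption based on "Installation: On the cluster node gateway". Sqoop text imports are comma-delimited by default; the delimiter flag is included only to make the requirement visible.

```python
def build_sqoop_import(host, database, table, username, password, target_dir):
    """Return a Sqoop 1 import command as a list of arguments."""
    return [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{host}/{database}",
        "--username", username,
        "--password", password,
        "--table", table,
        # --target-dir is the exact HDFS directory the files are written to
        "--target-dir", target_dir,
        # Comma is already the default field delimiter for text output
        "--fields-terminated-by", ",",
    ]


# "exam_db" is a placeholder: the sample question does not name the database.
cmd = build_sqoop_import(
    host="gateway",
    database="exam_db",
    table="customer",
    username="cloudera",
    password="cloudera",
    target_dir="/user/cert/solution3",
)
print(" ".join(cmd))
```

In the exam the printed command would be typed directly into the shell; the function form is only there so each requirement in the question maps to one visible argument.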
Sample CCP Data Engineer question #1
Instructions
 Dualcore Inc. is a leading electronics retailer. All of their customer data is in a
relational database. Your task is to ingest all this data into their Hadoop
cluster in the proper file format and compression for their needs.
 Dualcore has a number of requirements for this data. It must be stored in a
binary file format. They will keep this data for a minimum of ten years, so
select a format that supports access from multiple programming languages
and backward compatibility if the schema ever changes. They also require
that the data be stored in a compressed format. The data is queried
regularly, so choose a compression codec that is fastest for compression and
decompression and included with CDH.
Data Description ...
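One plausible reading of these requirements, offered as an assumption rather than an official answer: Avro is a binary format with bindings for several languages and support for schema evolution, and Snappy is a fast codec that ships with CDH. The sketch below only shows which Sqoop 1 flags would express that choice; the base command is a placeholder because the data description is elided in the slide.

```python
SNAPPY_CODEC = "org.apache.hadoop.io.compress.SnappyCodec"


def with_avro_snappy(base_command):
    """Append flags selecting Avro data files compressed with Snappy."""
    return list(base_command) + [
        "--as-avrodatafile",             # binary, schema-carrying file format
        "--compress",                    # enable output compression
        "--compression-codec", SNAPPY_CODEC,
    ]


# Placeholder base command: connection and table details are not given in the slide.
base = ["sqoop", "import"]
print(" ".join(with_avro_snappy(base)))
```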
Sample CCP Data Engineer question #2
Instructions
LoudAcre Mobile is a mobile phone service provider that is moving a portion of their
customer analytics workload to Hadoop. Before they can use their customer data,
they want you to clean it and make it consistent.
Errors were found while looking at the customer records. Unfortunately, different input
methods wrote date fields in different formats. Your task is to standardize these
date fields into a consistent format.
Data Description ...
1943233 Chrisopher Rodrigez Jan 11, 1980
8989022 John Birchall 6/7/1967
2933321 Thomas Stewart 08/22/54
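The cleaning logic can be sketched in plain Python. In the exam the same idea would be expressed in Hive, Impala or Spark. The sketch assumes the three formats shown in the sample rows are the only ones, that slash-separated dates are month/day/year, that ISO `YYYY-MM-DD` is an acceptable target format, and that a two-digit year landing after a cutoff (the hypothetical `latest_year` parameter) belongs to the previous century. Month-name parsing with `%b` also assumes an English locale.

```python
from datetime import datetime

# Candidate input formats, tried in order (assumed from the three sample rows)
FORMATS = ("%b %d, %Y", "%m/%d/%Y", "%m/%d/%y")


def normalize_date(text, latest_year=2016):
    """Return the date as YYYY-MM-DD, or raise ValueError if no format matches."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next candidate format
        # Python maps two-digit years 00-68 to 2000-2068; pull future years back
        if parsed.year > latest_year:
            parsed = parsed.replace(year=parsed.year - 100)
        return parsed.strftime("%Y-%m-%d")
    raise ValueError(f"unrecognised date format: {text!r}")


for raw in ("Jan 11, 1980", "6/7/1967", "08/22/54"):
    print(raw, "->", normalize_date(raw))
```

The two-digit year is the gotcha here: without the cutoff check, `08/22/54` would be read as a date in 2054.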
How to Study for CCA and CCP certification
 Set aside 2 to 3 days of dedicated study time for certification
 These certification tests are not easy
 Review the certification webpage study points
 Study using only the documentation linked from the certification page (the exam is open book)
 No Google, Cloudera training material or favourite tutorials
 Practice with the CDH and Spark software versions found in the test
 Be familiar with Hive, the Impala shell, the basic Linux shell and the Hue UI
Practice all of the study points
 Stop only when practice has made you confident that you know the topic
 Ensure you know the syntax and have experienced the gotchas
 Read all the documentation concerned with the study topic
 Know the documented examples so you have a go-to source for copy/paste
 Know where to look up parameters, config and API docs
 Be able to adapt to different scenarios or link topics together
 Questions have multiple parts and dependencies
Taking the exam
 CCA Data Analyst and Developer: 2 hours, 9 questions, about 13 mins per question
 CCA Admin: 2 hours, 10 questions, 12 mins per question
 CCP Engineer: 4 hours, 7 questions, about 34 mins per question
 Some questions take 5 mins; others take 20+ or even 45+ mins
 Questions are weighted in value and can have multiple parts
 Risk of running out of time, which means:
 Can’t complete the easy questions to pass
 Can’t check your answers to fix any problems to pass
 Stop any question after 20 mins and come back at the end
 Skip any question that looks too hard after a quick skim read and come back
 Finished? Always double check your answers
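The per-question figures above are simply the exam duration divided by the number of questions, rounded down. A small sketch of that arithmetic, using only the durations and question counts from the slide:

```python
# (total minutes, number of questions) as stated on the slide
EXAMS = {
    "CCA Data Analyst / Developer": (120, 9),
    "CCA Admin": (120, 10),
    "CCP Data Engineer": (240, 7),
}


def minutes_per_question(total_minutes, questions):
    """Average whole minutes available per question."""
    return total_minutes // questions


for name, (total, count) in EXAMS.items():
    print(f"{name}: {minutes_per_question(total, count)} mins per question")
```

These are averages only. Since individual questions vary widely in length, the 20-minute cut-off matters more than the average.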
Common certification exam problems
 Review the certification FAQ for common problems and for questions marked as wrong
 https://www.cloudera.com/more/training/certification/faq.html
 Remote desktop or network too slow!
 Take the exam at off-peak times. Use the command-line shell, not the Hue GUI.
 Unfamiliar with the question’s topic: time is wasted reading docs during the exam. Study!
 Don’t use localhost; use the correct gateway/master/worker hostname instead
 Rushing and stress cause mistakes:
 Did you misinterpret what the question asked?
 Are directory, file, property and column names spelled correctly?
 Is the output data format 100% correct? Check that column order, data types and null values are what was asked. Don’t assume.
 Did you notice any errors in the logs or console when running? Scroll back and check!
Tips for studying CCA Admin
 Know Cloudera Manager UI and how to search properties
 Breadcrumbs, instances, safety valve advanced settings
 Don’t forget to apply settings and restart services, and don’t break the cluster!
 Practice topics not in the admin course but in the exam:
 Sentry setup, Load balancer, Log redaction and Encrypted zones
 Practice all the hdfs dfs and dfsadmin commands
 Practice setting up services and service instances
 Practice troubleshooting and fixing common problem applications
 Know your way around the different log files
Tips for studying Data Analyst certification
 Study how to use regex to manipulate strings well
 SQL subqueries in a FROM clause need a table alias (a temp table name); don’t forget it
 Understand Sqoop warehouse dir and target dir relationship
 Practice Sqoop help to quickly view and use parameters
 Practice window analytic functions - not easy to do
 Practice type conversions for Hive and Impala
 Practice how to create partitioned/bucketed tables – lots of syntax
 Copy and paste directly from the question to quickly create the table
 Practice using the command line: beeline and impala-shell
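The Sqoop directory relationship mentioned above can be summarised in a few lines. This is a sketch of the documented Sqoop 1 behaviour as I understand it: `--target-dir` names the exact output directory, `--warehouse-dir` names a parent under which a directory named after the table is created, and the two options cannot be combined. The helper function is hypothetical and exists only to illustrate the rule.

```python
import posixpath


def sqoop_output_dir(table, target_dir=None, warehouse_dir=None):
    """Return the HDFS directory a Sqoop table import would write to."""
    if target_dir and warehouse_dir:
        raise ValueError("--target-dir and --warehouse-dir are mutually exclusive")
    if target_dir:
        return target_dir  # files land directly in this directory
    if warehouse_dir:
        return posixpath.join(warehouse_dir, table)  # parent dir + table name
    return table  # default: relative to the user's HDFS home directory


print(sqoop_output_dir("customer", target_dir="/user/cert/solution3"))
print(sqoop_output_dir("customer", warehouse_dir="/user/cert"))
```

When a question names an exact output directory, `--target-dir` is usually the safer choice, because `--warehouse-dir` adds an extra path component.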
Tips for studying CCA Spark and Hadoop
 No need to be an expert in Scala or Python coding.
 Only testing Spark knowledge.
 Practice Sqoop, the hdfs dfs command line and your SQL
 Certification has not yet been updated to Spark 2.0 (uses 1.6)
 New students may not be familiar with Spark 1.6. The differences are minor.
 Read and practice using the Spark documentation
 Start the 1.6 Spark shell with pyspark and spark-shell, not spark2-shell or pyspark2
Tips for studying CCP Data Engineer
 Study non-core topics found outside the training course material
 Ignore what is not Cloudera supported
 Oozie features in one third of the test!
 See the gethue.com website for short Oozie UI tutorials
 How to get Oozie to run on your small default cluster:
 Adjust container memory so you can run multiple containers
 Increase the NodeManager max container size to 7 GB
 Limit container memory max size to 3 GB and 1 CPU
 Result on a dual-core, 8 GB, 3x worker node cluster: 6 containers.
 Currently Spark 1.6 not Spark 2.0 (will be updated in the future)
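The "6 containers" figure follows from the memory and CPU limits above. A minimal sketch of the arithmetic, assuming each container takes the full 3 GB and 1 core, and that both memory and cores limit how many containers fit on a node:

```python
def containers_per_node(node_memory_gb, node_cores, container_memory_gb, container_cores):
    """Containers that fit on one worker, limited by memory and by CPU."""
    by_memory = node_memory_gb // container_memory_gb
    by_cpu = node_cores // container_cores
    return min(by_memory, by_cpu)


WORKER_NODES = 3

# 7 GB offered to YARN per node, dual-core workers, 3 GB / 1 CPU per container
per_node = containers_per_node(7, 2, 3, 1)
total = per_node * WORKER_NODES
print(f"{per_node} containers per node, {total} in total")
```

An Oozie launcher job occupies a container of its own while the action it starts needs at least one more, which is presumably why room for several containers matters on a small cluster.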
Qualify for free certification
 Take part in a Data Analyst, Developer or Administrator Public class to
receive a free certification exam in the given discipline
 Valid till the end of April
Thank you
