© Cloudera, Inc. All rights reserved.
Road to Cloudera certification
The demand for skills is high and Hadoop is the future. Customers
cannot afford to move slowly in staffing their Big Data projects.
Customers are building plans to ensure projects are staffed with
skilled employees, and supported by a qualified services provider.
Job Trends from Indeed.com
What are you most concerned about when it comes to your readiness for big data and Hadoop?
Cloudera MDP webinar poll results, July 2016
Why Cloudera training?
Aligned to best practices and the pace of change
1 Broadest range of courses
Learning paths for Developer, Admin, Analyst
2 Most experienced instructors
More than 50,000 trained since 2009
3 Leader in certification
Over 12,000 accredited Cloudera professionals
4 Trusted source for training
100,000+ people have attended online courses
5 State of the art curriculum
Courses updated as Hadoop evolves
6 Widest geographic coverage
Most classes offered: 50 cities worldwide plus online
7 Most relevant platform & community
CDH deployed more than all other distributions combined
8 Depth of training material
Hands-on labs and VMs support live instruction
9 Ongoing learning
Video tutorials and e-learning complement training
10 Commitment to big data education
University partnerships to teach Hadoop in colleges
What is available from Cloudera University?
• Private training: Course delivered at location of customer choice to internal audience
• Public training: Courses regularly scheduled around the globe. Schedule available on web
• Virtual training: Live training accessed via the internet; available for public and private courses
• OnDemand training: Pre-recorded lecture with identical content/exercises as live training options
• Certification: Rigorously developed and meaningful bodies of knowledge
Suggested Cloudera University curricula
Developers
• Python/Scala Training
• Developer for Spark and Hadoop
• CCA: Spark and Hadoop Developer
• Spark ML & Kafka modules
• Topic specific training (Search, HBase)
• Hands on practice
• CCP: Data Engineer
Administrators
• Cloudera Administration training
• CCA: Administrator
Data Analysts/Data Scientists
• Data Analyst: Using Hive, Pig & Impala
• CCA: Data Analyst
• Cloudera Data Science
Let’s get certified!
Certification Tiers
 CCA (Cloudera Certified Associate)
 Data Analyst, Admin and Spark & Hadoop Developer
 Basic exam – but it’s a complex subject area
 Maps to curriculum
 CCP (Cloudera Certified Professional)
 Data Engineer
 Combination of Developer, Analyst and Big Data services
 Mastery level – beyond the introduction course
 Real world experience
Exam format CCA and CCP certification
 Not multiple choice
 Hands on, practical exams similar to student exercises
 Home based, no testing centres
 Proctored through ExamsLocal.com
 Webcam and desktop recorded and monitored
 No papers / phone / drinks on desk / no talking
 AWS Cloud-based cluster
 Guacamole remote desktop in web browser
 No Internet search during exam – only local documentation
Sample CCA question
 Instructions
 Connect to the MySQL database on the cluster using Sqoop and import all of the
data from the customer table into HDFS. The result must be comma delimited
text format and put into hdfs dir /user/cert/solution3
 Data Description
 A MySQL instance is running on the gateway node. In that instance, you will find
a table that contains twenty-five million (25,000,000) rows of customer data.
MySQL database information:
Installation: On the cluster node gateway
Table name: customer
Username: cloudera
Password: cloudera
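One possible way to approach this sample question is sketched below in Python, which only assembles the Sqoop command line rather than running it. The flags used (`--connect`, `--username`, `--password`, `--table`, `--as-textfile`, `--fields-terminated-by`, `--target-dir`) are standard Sqoop import arguments. The hostname `gateway` is an assumption based on the data description, and the database name `example_db` is hypothetical because the question does not state it.

```python
import shlex


def build_sqoop_import(db_name):
    """Return the argument list for a comma-delimited text import of `customer`."""
    return [
        "sqoop", "import",
        # Assumption: the MySQL instance is reachable under the hostname "gateway".
        "--connect", f"jdbc:mysql://gateway/{db_name}",
        "--username", "cloudera",
        "--password", "cloudera",
        "--table", "customer",
        # Plain text output with a comma between fields.
        "--as-textfile",
        "--fields-terminated-by", ",",
        "--target-dir", "/user/cert/solution3",
    ]


# "example_db" is a hypothetical database name; the question does not give one.
args = build_sqoop_import("example_db")
command_line = " ".join(shlex.quote(a) for a in args)
print(command_line)
```

In the exam itself the command would simply be typed into the shell; building it as a list here makes each requirement in the question easy to tick off against a flag.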
Sample CCP Data Engineer question #1
Instructions
 Dualcore Inc. is a leading electronics retailer. All of their customer data is in a
relational database. Your task is to ingest all this data into their Hadoop
cluster in the proper file format and compression for their needs.
 Dualcore has a number of requirements for this data. It must be stored in a
binary file format. They will keep this data for a minimum of ten years, so
select a format that supports access from multiple programming languages
and backward compatibility if the schema ever changes. They also require
that the data be stored in a compressed format. The data is queried
regularly, so choose a compression codec that is fastest for compression and
decompression and included with CDH.
Data Description ...
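The slide does not give the answer, so the following is only one plausible reading of the requirements: Avro is a binary format with bindings for several languages and support for schema evolution, and Snappy is a fast codec that ships with CDH. The Python sketch below assembles a matching Sqoop command. The function name and its parameters are hypothetical, the example JDBC URL, table and target directory are placeholders, and the short codec name `snappy` is assumed to be accepted by the installed Sqoop version (the full class name `org.apache.hadoop.io.compress.SnappyCodec` is the alternative).

```python
def build_avro_snappy_import(jdbc_url, table, target_dir):
    """Return a Sqoop import argument list that writes Snappy-compressed Avro files."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        # Binary, language-neutral format with schema evolution support.
        "--as-avrodatafile",
        # Enable compression and pick the codec (assumed short name).
        "--compress",
        "--compression-codec", "snappy",
        "--target-dir", target_dir,
    ]


# All three argument values below are placeholders, not taken from the exam.
avro_args = build_avro_snappy_import(
    "jdbc:mysql://gateway/example_db", "customers", "/user/cert/example_out"
)
print(" ".join(avro_args))
```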
Sample CCP Data Engineer question #2
Instructions
LoudAcre Mobile is a mobile phone service provider that is moving a portion of their
customer analytics workload to Hadoop. Before they can use their customer data,
they want you to clean it and make it consistent.
Errors were found while looking at the customer records. Unfortunately, different input
methods wrote date fields in different formats. Your task is to standardize these
date fields into a consistent format.
Data Description ...
1943233 Chrisopher Rodrigez Jan 11, 1980
8989022 John Birchall 6/7/1967
2933321 Thomas Stewart 08/22/54
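A minimal sketch of the date clean-up, written in plain Python rather than in any particular exam tool. It covers only the three formats visible in the sample rows and rests on several assumptions: slash dates are month/day/year (consistent with 08/22/54), the target format is ISO `YYYY-MM-DD`, and two-digit years that would land in the future belong to the previous century. The `latest` cut-off date is a hypothetical parameter.

```python
from datetime import date, datetime

# Formats seen in the sample rows, tried in order.
FORMATS = ("%b %d, %Y", "%m/%d/%Y", "%m/%d/%y")


def standardize_date(raw, latest=date(2017, 1, 1)):
    """Return `raw` as an ISO date string, or raise ValueError if no format fits."""
    text = raw.strip()
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt).date()
        except ValueError:
            continue
        # strptime maps two-digit years 00-68 to 2000-2068; assume customer
        # dates cannot be later than `latest` and shift them back a century.
        if parsed > latest:
            parsed = parsed.replace(year=parsed.year - 100)
        return parsed.isoformat()
    raise ValueError(f"unrecognized date format: {raw!r}")


for sample in ("Jan 11, 1980", "6/7/1967", "08/22/54"):
    print(sample, "->", standardize_date(sample))
```

Raising on an unknown format, instead of passing the value through, makes any fourth format in the real data show up immediately.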
How to Study for CCA and CCP certification
 Set aside 2 to 3 days of dedicated study time for certification
 These certification tests are not easy
 Review the certification webpage study points
 Only study using the certification open book linked documentation
 No Google, Cloudera Training material, favourite tutorial
 Practice with the CDH and Spark software versions found in the test
 Be familiar with Hive, the Impala shell, the basic Linux shell and the Hue UI
Practice all of the study points
 Stop when confident you know the topic by practising it
 Ensure you know the syntax and have experienced the gotchas
 Read all the documentation concerned with the study topic
 Know the documented examples so you have a go-to source for copy/paste
 Know where to look up parameters, config and API docs
 Be able to adapt to different scenarios or link topics together
 Questions have multiple parts and dependencies
Taking the exam
 CCA Data Analyst and Developer: 2 hours, 9 questions - 13 mins per question
 CCA Admin: 2 hours, 10 questions - 12 mins per question
 CCP Data Engineer: 4 hours, 7 questions - 34 mins per question
 Some questions are done in 5 mins; others take 20+ or even 45+ mins
 Questions are weighted in value and can have multiple parts
 Risk of running out of time, which means:
 Can’t complete the easy questions to pass
 Can’t check your answers to fix any problems to pass
 Stop any question after 20 mins and come back at the end
 Skip any question that looks too hard after a quick skim read and come back
 Finished? Always double check your answers
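The per-question averages above are simply the exam length divided by the number of questions. A small Python sketch of that arithmetic follows; the dictionary and function names are illustrative only.

```python
# Exam length in minutes and number of questions, as listed on the slide.
EXAMS = {
    "CCA Data Analyst / Developer": (120, 9),
    "CCA Admin": (120, 10),
    "CCP Data Engineer": (240, 7),
}


def minutes_per_question(total_minutes, questions):
    """Average time available per question, in minutes."""
    return total_minutes / questions


for name, (minutes, questions) in EXAMS.items():
    average = minutes_per_question(minutes, questions)
    print(f"{name}: {average:.1f} min per question")
```

Because these are averages and questions vary widely in length, the 20-minute cut-off suggested above is what keeps a single hard question from consuming the budget of two or three easy ones.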
Common certification exam problems
 Review the certification FAQ for common problems and for questions with a marked-wrong status
 https://www.cloudera.com/more/training/certification/faq.html
 Remote desktop or network too slow!
 Take the exam at off-peak times. Use the command-line shell, not the Hue GUI.
 Unfamiliar with the question’s topic: time is wasted reading docs during the exam. Study!
 Don’t use localhost; use the correct gateway/master/worker hostname instead
 Rushing and stress cause mistakes:
 Misinterpreted what the question asked.
 Are directory/file/property/column names spelled correctly?
 Is the output data format 100% correct? Check that column order, data types and null values are what was asked. Don’t assume.
 Notice any errors in the logs or console when running? Scroll back and check!
Tips for studying CCA Admin
 Know Cloudera Manager UI and how to search properties
 Breadcrumbs, instances, safety valve advanced settings
 Don’t forget to apply settings or restart the service, and don’t break the cluster!
 Practice topics not in the admin course but in the exam:
 Sentry setup, Load balancer, Log redaction and Encrypted zones
 Practice all the hdfs dfs and dfsadmin commands
 Practice setting up services and service instances
 Practice troubleshooting and fixing common problem applications
 Know your way around the different log files
Tips for studying Data Analyst certification
 Study how to use regex to manipulate strings well
 SQL subqueries need a temporary table name (alias); don’t forget it
 Understand Sqoop warehouse dir and target dir relationship
 Practice Sqoop help to quickly view and use parameters
 Practice window analytic functions - not easy to do
 Practice type conversions for Hive and Impala
 Practice how to create partitioned/bucketed tables – lots of syntax
 Copy and paste directly from the question to quickly create the table
 Practice using the command line: beeline and impala shell
Tips for studying CCA Spark and Hadoop
 No need to be an expert in Scala or Python coding.
 Only testing Spark knowledge.
 Practice Sqoop, the hdfs dfs command line and your SQL
 Certification has not yet been updated to Spark 2.0 (uses 1.6)
 New students may not be familiar with Spark 1.6. Minor differences.
 Read and practice using the Spark documentation
 Start the Spark 1.6 shell with pyspark or spark-shell, not spark2-shell or pyspark2
Tips for studying CCP Data Engineer
 Study non-core topics found outside the training course material
 Ignore what is not Cloudera supported
 Oozie features in one third of the test!
 See the gethue.com website for short Oozie UI tutorials
 How to get Oozie to run on your small default cluster:
 Adjust container memory so you can run multiple containers
 Increase NodeManager max container size to 7 GB
 Limit container memory max size to 3 GB and 1 CPU
 Result on a cluster of three dual-core, 8 GB worker nodes: 6 containers.
 Currently Spark 1.6 not Spark 2.0 (will be updated in the future)
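The container count in the Oozie sizing tip can be checked with a short calculation. This Python sketch assumes every container is requested at the maximum size (3 GB, 1 vcore) and that each worker advertises 7 GB and 2 vcores to YARN; real containers may be smaller, in which case more would fit. The 7 GB and 3 GB settings most likely correspond to `yarn.nodemanager.resource.memory-mb` and `yarn.scheduler.maximum-allocation-mb`, but that mapping is an assumption, and the function name is illustrative.

```python
def containers_per_node(node_memory_gb, node_vcores,
                        container_memory_gb, container_vcores):
    """Max-size containers that fit on one worker, limited by memory and CPU."""
    by_memory = node_memory_gb // container_memory_gb
    by_cpu = node_vcores // container_vcores
    return min(by_memory, by_cpu)


WORKER_NODES = 3

# 7 GB and 2 vcores per worker; 3 GB and 1 vcore per container.
per_node = containers_per_node(7, 2, 3, 1)
total = per_node * WORKER_NODES
print(f"{per_node} containers per node, {total} in the cluster")
```

Both limits give two containers per node here, which matches the six containers quoted for three workers.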
Qualify for free certification
 Take part in a Data Analyst, Developer or Administrator Public class to
receive a free certification exam in the given discipline
 Valid till the end of April
Thank you

More Related Content

What's hot

Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Cloudera, Inc.
 

What's hot (20)

Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache Kudu
 
Extreme Sports & Beyond: Exploring a new frontier in data with GoPro
Extreme Sports & Beyond: Exploring a new frontier in data with GoProExtreme Sports & Beyond: Exploring a new frontier in data with GoPro
Extreme Sports & Beyond: Exploring a new frontier in data with GoPro
 
Cloudera Showcase: SQL-on-Hadoop
Cloudera Showcase: SQL-on-HadoopCloudera Showcase: SQL-on-Hadoop
Cloudera Showcase: SQL-on-Hadoop
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BT
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
 
How to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesHow to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issues
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and Governed
 
Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Solr consistency and recovery internals
Solr consistency and recovery internalsSolr consistency and recovery internals
Solr consistency and recovery internals
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
 

Similar to Road to Cloudera certification

Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...
DataWorks Summit
 
Hadoop applicationarchitectures
Hadoop applicationarchitecturesHadoop applicationarchitectures
Hadoop applicationarchitectures
Doug Chang
 

Similar to Road to Cloudera certification (20)

Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera cluster
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Data Engineering Course Syllabus - WeCloudData
Data Engineering Course Syllabus - WeCloudDataData Engineering Course Syllabus - WeCloudData
Data Engineering Course Syllabus - WeCloudData
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
 
Best Practices For Workflow
Best Practices For WorkflowBest Practices For Workflow
Best Practices For Workflow
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
DevOps and Decoys How to Build a Successful Microsoft DevOps Including the Data
DevOps and Decoys  How to Build a Successful Microsoft DevOps Including the DataDevOps and Decoys  How to Build a Successful Microsoft DevOps Including the Data
DevOps and Decoys How to Build a Successful Microsoft DevOps Including the Data
 
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science WorkbenchNOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
 
Introduction to Cloudera Search Training
Introduction to Cloudera Search TrainingIntroduction to Cloudera Search Training
Introduction to Cloudera Search Training
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Hadoop applicationarchitectures
Hadoop applicationarchitecturesHadoop applicationarchitectures
Hadoop applicationarchitectures
 
Databricks Partner Enablement Guide.pdf
Databricks Partner Enablement Guide.pdfDatabricks Partner Enablement Guide.pdf
Databricks Partner Enablement Guide.pdf
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best Practices
 
Hadoop and Mapreduce Certification
Hadoop and Mapreduce CertificationHadoop and Mapreduce Certification
Hadoop and Mapreduce Certification
 
Cloudera data-analyst-training
Cloudera data-analyst-trainingCloudera data-analyst-training
Cloudera data-analyst-training
 
Aws certified: the journey with tips n tricks
Aws certified: the journey with tips n tricksAws certified: the journey with tips n tricks
Aws certified: the journey with tips n tricks
 
HadoopIntroduction.pptx
HadoopIntroduction.pptxHadoopIntroduction.pptx
HadoopIntroduction.pptx
 

More from Cloudera, Inc.

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Memorandum Of Association Constitution of Company.ppt
Memorandum Of Association Constitution of Company.pptMemorandum Of Association Constitution of Company.ppt
Memorandum Of Association Constitution of Company.ppt
seri bangash
 
NewBase 24 May 2024 Energy News issue - 1727 by Khaled Al Awadi_compresse...
NewBase   24 May  2024  Energy News issue - 1727 by Khaled Al Awadi_compresse...NewBase   24 May  2024  Energy News issue - 1727 by Khaled Al Awadi_compresse...
NewBase 24 May 2024 Energy News issue - 1727 by Khaled Al Awadi_compresse...
Khaled Al Awadi
 
FINAL PRESENTATION.pptx12143241324134134
FINAL PRESENTATION.pptx12143241324134134FINAL PRESENTATION.pptx12143241324134134
FINAL PRESENTATION.pptx12143241324134134
LR1709MUSIC
 

Recently uploaded (20)

Memorandum Of Association Constitution of Company.ppt
Memorandum Of Association Constitution of Company.pptMemorandum Of Association Constitution of Company.ppt
Memorandum Of Association Constitution of Company.ppt
 
USA classified ads posting – best classified sites in usa.pdf
USA classified ads posting – best classified sites in usa.pdfUSA classified ads posting – best classified sites in usa.pdf
USA classified ads posting – best classified sites in usa.pdf
 
Cracking the Change Management Code Main New.pptx
Cracking the Change Management Code Main New.pptxCracking the Change Management Code Main New.pptx
Cracking the Change Management Code Main New.pptx
 
sales plan presentation by mckinsey alum
sales plan presentation by mckinsey alumsales plan presentation by mckinsey alum
sales plan presentation by mckinsey alum
 
What are the main advantages of using HR recruiter services.pdf
What are the main advantages of using HR recruiter services.pdfWhat are the main advantages of using HR recruiter services.pdf
What are the main advantages of using HR recruiter services.pdf
 
NewBase 24 May 2024 Energy News issue - 1727 by Khaled Al Awadi_compresse...
NewBase   24 May  2024  Energy News issue - 1727 by Khaled Al Awadi_compresse...NewBase   24 May  2024  Energy News issue - 1727 by Khaled Al Awadi_compresse...
NewBase 24 May 2024 Energy News issue - 1727 by Khaled Al Awadi_compresse...
 
Byrd & Chen’s Canadian Tax Principles 2023-2024 Edition 1st edition Volumes I...
Byrd & Chen’s Canadian Tax Principles 2023-2024 Edition 1st edition Volumes I...Byrd & Chen’s Canadian Tax Principles 2023-2024 Edition 1st edition Volumes I...
Byrd & Chen’s Canadian Tax Principles 2023-2024 Edition 1st edition Volumes I...
 
Easy Way to Download and Set Up Gen TDS Software on Your Computer
Easy Way to Download and Set Up Gen TDS Software on Your ComputerEasy Way to Download and Set Up Gen TDS Software on Your Computer
Easy Way to Download and Set Up Gen TDS Software on Your Computer
 
IPTV Subscription UK: Your Guide to Choosing the Best Service
IPTV Subscription UK: Your Guide to Choosing the Best ServiceIPTV Subscription UK: Your Guide to Choosing the Best Service
IPTV Subscription UK: Your Guide to Choosing the Best Service
 
Improving profitability for small business
Improving profitability for small businessImproving profitability for small business
Improving profitability for small business
 
April 2024 Nostalgia Products Newsletter
April 2024 Nostalgia Products NewsletterApril 2024 Nostalgia Products Newsletter
April 2024 Nostalgia Products Newsletter
 
HR and Employment law update: May 2024.
HR and Employment law update:  May 2024.HR and Employment law update:  May 2024.
HR and Employment law update: May 2024.
 
Business Valuation Principles for Entrepreneurs
Business Valuation Principles for EntrepreneursBusiness Valuation Principles for Entrepreneurs
Business Valuation Principles for Entrepreneurs
 
Transforming Max Life Insurance with PMaps Job-Fit Assessments- Case Study
Transforming Max Life Insurance with PMaps Job-Fit Assessments- Case StudyTransforming Max Life Insurance with PMaps Job-Fit Assessments- Case Study
Transforming Max Life Insurance with PMaps Job-Fit Assessments- Case Study
 
Luxury Artificial Plants Dubai | Plants in KSA, UAE | Shajara
Luxury Artificial Plants Dubai | Plants in KSA, UAE | ShajaraLuxury Artificial Plants Dubai | Plants in KSA, UAE | Shajara
Luxury Artificial Plants Dubai | Plants in KSA, UAE | Shajara
 
Equinox Gold Corporate Deck May 24th 2024
Equinox Gold Corporate Deck May 24th 2024Equinox Gold Corporate Deck May 24th 2024
Equinox Gold Corporate Deck May 24th 2024
 
12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf
12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf
12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf
 
RMD24 | Debunking the non-endemic revenue myth Marvin Vacquier Droop | First ...
RMD24 | Debunking the non-endemic revenue myth Marvin Vacquier Droop | First ...RMD24 | Debunking the non-endemic revenue myth Marvin Vacquier Droop | First ...
RMD24 | Debunking the non-endemic revenue myth Marvin Vacquier Droop | First ...
 
RMD24 | Retail media: hoe zet je dit in als je geen AH of Unilever bent? Heid...
RMD24 | Retail media: hoe zet je dit in als je geen AH of Unilever bent? Heid...RMD24 | Retail media: hoe zet je dit in als je geen AH of Unilever bent? Heid...
RMD24 | Retail media: hoe zet je dit in als je geen AH of Unilever bent? Heid...
 
FINAL PRESENTATION.pptx12143241324134134
FINAL PRESENTATION.pptx12143241324134134FINAL PRESENTATION.pptx12143241324134134
FINAL PRESENTATION.pptx12143241324134134
 

Road to Cloudera certification

  • 1. © Cloudera, Inc. All rights reserved. Road to Cloudera certification
  • 2. © Cloudera, Inc. All rights reserved. The demand for skills is high and Hadoop is the future. Customers cannot afford to move slowly in staffing their Big Data projects. Customers are building plans to ensure projects are staffed with skilled employees, and supported by a qualified services provider. Job Trends from Indeed.com What are you most concerned about when it comes to your readiness for big data and Hadoop? Cloudera MDP webinar poll results, July 2016
  • 3. © Cloudera, Inc. All rights reserved. Why Cloudera training? Aligned to best practices and the pace of change 1 Broadest range of courses Learning paths for Developer, Admin, Analyst 2 Most experienced instructors More than 50,000 trained since 2009 6 Widest geographic coverage Most classes offered: 50 cities worldwide plus online 7 Most relevant platform & community CDH deployed more than all other distributions combined 3 Leader in certification Over 12,000 accredited Cloudera professionals Trusted source for training 100,000+ people have attended online courses4 8 Depth of training material Hands-on labs and VMs support live instruction 9 Ongoing learning Video tutorials and e-learning complement training State of the art curriculum Courses updated as Hadoop evolves5 10Commitment to big data education University partnerships to teach Hadoop in colleges
  • 4. © Cloudera, Inc. All rights reserved. What is available from Cloudera University? • Private training: Course delivered at location of customer choice to internal audience • Public training: Courses regularly scheduled around the globe. Schedule available on web • Virtual training: Live training accessed via the internet; available for public and private courses • OnDemand training: Pre-recorded lecture with identical content/exercises as live training options • Certification: Rigorously developed and meaningful bodies of knowledge OnDemand Virtual live classroom Private onsitePublic live classroom
  • 5. © Cloudera, Inc. All rights reserved. Suggested Cloudera University curricula Developers • Python/Scala Training • Developer for Spark and Hadoop • CCA: Spark and Hadoop Developer • Spark ML & Kafka modules • Topic specific training (Search, HBase) • Hands on practice • CCP: Data Engineer Administrators • Cloudera Administration training • CCA: Administrator Data Analysts/Data Scientists • Data Analyst: Using Hive, Pig & Impala • CCA: Data Analyst • Cloudera Data Science
  • 6. © Cloudera, Inc. All rights reserved. Let’s get certified!
  • 7. © Cloudera, Inc. All rights reserved. Certification Tiers  CCA (Cloudera Certified Associate)  Data Analyst, Admin and Spark & Hadoop Developer  Basic exam – but its a complex subject area  Maps to curriculum  CCP (Cloudera Certified Professional)  Data Engineer  Combination of Developer, Analyst and Big Data services  Mastery level – beyond the introduction course  Real world experience
  • 8. Exam format: CCA and CCP certification
      • Not multiple choice
      • Hands-on, practical exams similar to student exercises
      • Home based, no testing centres
      • Proctored through ExamsLocal.com
      • Webcam and desktop recorded and monitored
      • No papers, phone or drinks on the desk, and no talking
      • AWS cloud-based cluster
      • Guacamole remote desktop in a web browser
      • No Internet search during the exam; only local documentation
  • 9. Sample CCA question
      Instructions
        • Connect to the MySQL database on the cluster using Sqoop and import all of the data from the customer table into HDFS. The result must be in comma-delimited text format and placed in the HDFS directory /user/cert/solution3
      Data Description
        • A MySQL instance is running on the gateway node. In that instance, you will find a table that contains twenty-five million (25,000,000) rows of customer data.
        • MySQL database information:
            Installation: on the cluster node gateway
            Table name: customer
            Username: cloudera
            Password: cloudera
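One possible shape of an answer to the sample question on slide 9, sketched as a small Python helper that assembles a Sqoop 1 import command. The host, table, credentials and target directory come from the slide; the database name is not given there, so `example_db` is a hypothetical placeholder, and the helper name `build_sqoop_import` is made up for illustration. Treat it as a sketch, not the graded solution.

```python
import shlex
from typing import List


def build_sqoop_import(db_name: str) -> List[str]:
    """Assemble a Sqoop import command for the customer table.

    db_name is an assumption: the slide does not name the MySQL database.
    """
    return [
        "sqoop", "import",
        # MySQL runs on the cluster node called "gateway" (from the slide).
        "--connect", f"jdbc:mysql://gateway/{db_name}",
        "--username", "cloudera",
        "--password", "cloudera",
        "--table", "customer",
        # Output directory required by the question.
        "--target-dir", "/user/cert/solution3",
        # Plain text output with comma as the field delimiter.
        "--as-textfile",
        "--fields-terminated-by", ",",
    ]


# "example_db" is a hypothetical database name used only for this sketch.
command = build_sqoop_import("example_db")
print(shlex.join(command))
```

Printing the command rather than running it keeps the sketch usable without a cluster; on a real gateway node the printed line could be pasted into the shell.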
  • 10. Sample CCP Data Engineer question #1
      Instructions
        • Dualcore Inc. is a leading electronics retailer. All of their customer data is in a relational database. Your task is to ingest all this data into their Hadoop cluster in the proper file format and compression for their needs.
        • Dualcore has a number of requirements for this data. It must be stored in a binary file format. They will keep this data for a minimum of ten years, so select a format that supports access from multiple programming languages and backward compatibility if the schema ever changes. They also require that the data be stored in a compressed format. The data is queried regularly, so choose a compression codec that is fastest for compression and decompression and included with CDH.
      Data Description
        ...
  • 11. Sample CCP Data Engineer question #2
      Instructions
        • LoudAcre Mobile is a mobile phone service provider that is moving a portion of their customer analytics workload to Hadoop. Before they can use their customer data, they want you to clean it and make it consistent.
        • Errors were found while looking at the customer records. Unfortunately, different input methods wrote date fields in different formats. Your task is to standardize these date fields into a consistent format.
      Data Description
        ...
        1943233 Chrisopher Rodrigez Jan 11, 1980
        8989022 John Birchall 6/7/1967
        2933321 Thomas Stewart 08/22/54
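The date clean-up asked for on slide 11 can be illustrated outside Hadoop with plain Python. This is a minimal sketch under stated assumptions: the slide does not say which target format is wanted, so ISO `YYYY-MM-DD` is assumed; slash dates are assumed to be month-first (as `08/22/54` suggests); and two-digit years that would land in the future are assumed to belong to the previous century, since these look like birth dates. The function name `normalize_date` is hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Formats seen in the three sample rows that carry a four-digit year.
FOUR_DIGIT_YEAR_FORMATS = ("%b %d, %Y", "%m/%d/%Y")
# Format of the remaining sample row, which has a two-digit year.
TWO_DIGIT_YEAR_FORMAT = "%m/%d/%y"


def normalize_date(raw: str, today: Optional[date] = None) -> str:
    """Return the date as YYYY-MM-DD (assumed target format)."""
    today = today or date.today()
    text = raw.strip()

    # Try the unambiguous four-digit-year formats first.
    for fmt in FOUR_DIGIT_YEAR_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue

    # Fall back to a two-digit year; raises ValueError on unknown formats.
    parsed = datetime.strptime(text, TWO_DIGIT_YEAR_FORMAT).date()
    # Assumption: a date in the future must belong to the previous century.
    if parsed > today:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed.isoformat()


samples = ["Jan 11, 1980", "6/7/1967", "08/22/54"]
normalized = [normalize_date(s, today=date(2017, 1, 1)) for s in samples]
print(normalized)
```

In the exam the same logic would more likely be written in Hive, Impala, Pig or Spark; the point of the sketch is the order of attempts and the explicit century rule, both of which are easy to get wrong under time pressure.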
  • 12. How to Study for CCA and CCP certification
      • Set aside 2 to 3 days of dedicated study time for certification
      • These certification tests are not easy
      • Review the study points on the certification webpage
      • Study only with the open-book documentation linked for the certification
      • No Google, Cloudera training material or favourite tutorials
      • Practice with the CDH and Spark software versions found in the test
      • Be familiar with Hive, the Impala shell, the basic Linux shell and the Hue UI
  • 13. Practice all of the study points
      • Stop when practising has made you confident that you know the topic
      • Ensure you know the syntax and have experienced the gotchas
      • Read all the documentation concerned with the study topic
      • Know the documented examples; they are your go-to for copy and paste
      • Know where to look up parameters, config and API docs
      • Be able to adapt to different scenarios or link topics together
      • Questions have multiple parts and dependencies
  • 14. Taking the exam
      • CCA Data Analyst and Developer: 2 hours, 9 questions (about 13 minutes per question)
      • CCA Admin: 2 hours, 10 questions (12 minutes per question)
      • CCP Data Engineer: 4 hours, 7 questions (about 34 minutes per question)
      • Some questions take 5 minutes; others take 20+ or even 45+ minutes
      • Questions are weighted in value and can have multiple parts
      • Running out of time is a risk, because it means you:
          • Can't complete the easy questions you need to pass
          • Can't check your answers and fix any problems
      • Stop any question after 20 minutes and come back to it at the end
      • Skip any question that looks too hard after a quick skim, and come back to it later
      • Finished? Always double-check your answers
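The per-question budgets quoted on slide 14 are plain integer division of exam length by question count. A short sketch that reproduces them, using only the durations and question counts from the slide (the dictionary keys and function name are illustrative):

```python
# (total minutes, number of questions) for each exam, as given on slide 14.
EXAMS = {
    "CCA Data Analyst / Developer": (120, 9),
    "CCA Admin": (120, 10),
    "CCP Data Engineer": (240, 7),
}


def minutes_per_question(total_minutes: int, questions: int) -> int:
    """Average whole minutes available per question."""
    return total_minutes // questions


budgets = {name: minutes_per_question(*spec) for name, spec in EXAMS.items()}

for name, minutes in budgets.items():
    print(f"{name}: about {minutes} minutes per question")
```

These are averages only; because questions are weighted and vary widely in effort, the 20-minute cut-off on the slide matters more than the average.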
  • 15. Common certification exam problems
      • Review the certification FAQ for common problems and for questions marked wrong
          • https://www.cloudera.com/more/training/certification/faq.html
      • Remote desktop or network too slow!
          • Take the exam at off-peak times. Use the command-line shell, not the Hue GUI.
      • Unfamiliar with the question's topic: time is wasted reading docs during the exam. Study!
      • Don't use localhost; use the correct gateway/master/worker hostname instead
      • Rushing and stress cause mistakes:
          • Did you misinterpret what the question asked?
          • Are directory, file, property and column names spelled correctly?
          • Is the output data format 100% correct? Check that column order, data types and null values are what was asked for. Don't assume.
          • Did you notice any errors in the logs or console when running? Scroll back and check!
  • 16. Tips for studying CCA Admin
      • Know the Cloudera Manager UI and how to search properties
      • Breadcrumbs, instances, safety valve advanced settings
      • Don't forget to apply settings or restart the service, and don't break the cluster!
      • Practice topics that are in the exam but not in the admin course:
          • Sentry setup, load balancer, log redaction and encrypted zones
      • Practice all the hdfs dfs and hdfs dfsadmin commands
      • Practice setting up services and service instances
      • Practice troubleshooting and fixing common application problems
      • Know your way around the different log files
  • 17. Tips for studying Data Analyst certification
      • Study how to use regex to manipulate strings well
      • SQL subqueries need a temporary table name (an alias); don't forget it
      • Understand the relationship between the Sqoop warehouse dir and target dir
      • Practice using Sqoop help to quickly view and use parameters
      • Practice window (analytic) functions; they are not easy to do
      • Practice type conversions for Hive and Impala
      • Practice how to create partitioned/bucketed tables; there is a lot of syntax
      • Copy and paste directly from the question to quickly create the table
      • Practice using the command line: beeline and impala-shell
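Two of the tips on slide 17, the subquery alias and window functions, can be practised together in one query. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in so it runs anywhere; it is not Hive or Impala, the `orders` table and its rows are invented for illustration, and window functions need SQLite 3.25 or newer. The query shape (rank within a partition inside a named subquery, then filter on the rank) is the part that carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, invented for this sketch.
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 30), ("ann", 50), ("bob", 20), ("bob", 70), ("bob", 10)],
)

# Largest order per customer: rank rows inside each customer partition,
# then keep rank 1. The subquery is given the alias "ranked".
query = """
SELECT customer, amount
FROM (
    SELECT customer,
           amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
    FROM orders
) ranked
WHERE rnk = 1
ORDER BY customer
"""

top_orders = conn.execute(query).fetchall()
print(top_orders)
conn.close()
```

SQLite would accept the subquery without the `ranked` alias, which is exactly why the habit is worth drilling on the engine used in the exam rather than on a stand-in.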
  • 18. Tips for studying CCA Spark and Hadoop
      • No need to be an expert in Scala or Python coding
      • Only Spark knowledge is tested
      • Practice Sqoop, the hdfs dfs command line and your SQL
      • The certification has not yet been updated to Spark 2.0 (it uses 1.6)
      • New students may not be familiar with Spark 1.6; the differences are minor
      • Read and practice using the Spark documentation
      • Start the Spark 1.6 shells with pyspark and spark-shell, not pyspark2 or spark2-shell
  • 19. Tips for studying CCP Data Engineer
      • Study non-core topics found outside the training course material
      • Ignore what is not supported by Cloudera
      • Oozie features in one third of the test!
      • See the gethue.com website for short Oozie UI tutorials
      • How to get Oozie to run on your small default cluster:
          • Adjust container memory so you can run multiple containers
          • Increase the NodeManager max container size to 7 GB
          • Limit the container max size to 3 GB of memory and 1 CPU
          • Result on a cluster of three dual-core, 8 GB worker nodes: 6 containers
      • The exam currently uses Spark 1.6, not Spark 2.0 (it will be updated in the future)
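The "6 containers" figure on slide 19 follows from the stated limits. A minimal sketch of the arithmetic, assuming every container is sized at the 3 GB / 1 CPU maximum and that each dual-core worker offers 2 virtual cores to YARN (the slide implies this but does not state it); the function name is illustrative:

```python
def containers_per_node(node_memory_gb: int, node_vcores: int,
                        container_memory_gb: int, container_vcores: int) -> int:
    """Containers that fit on one worker, limited by memory and by CPU."""
    by_memory = node_memory_gb // container_memory_gb
    by_cpu = node_vcores // container_vcores
    return min(by_memory, by_cpu)


WORKER_NODES = 3

# 7 GB available to containers per node, 2 cores; each container 3 GB, 1 core.
per_node = containers_per_node(7, 2, 3, 1)
total_containers = per_node * WORKER_NODES

print(f"{per_node} containers per node, {total_containers} in the cluster")
```

Memory and CPU both cap each node at two containers here, so raising only one of the two limits would not increase the total.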
  • 20. Qualify for free certification
      • Take part in a Data Analyst, Developer or Administrator public class to receive a free certification exam in the given discipline
      • Valid until the end of April
  • 21. Thank you