
1. How do I Register and Schedule my Cloudera exam?
2. Where do I take Cloudera certification exams?
3. What if I lose internet connectivity during the exam?
4. Can I take the exam at a test center?
5. How do I schedule an exam?
6. How do I reschedule or cancel an Exam Reservation?
7. What is your exam cancellation policy?
8. How can I retrieve my forgotten password?
9. What happens if I don't show up for my exam?
10. What do I need on the day of my exam?
11. How do I launch my exam?
12. What may I have on my desk during the exam?
13. Does the exam proctor have access to my computer or its contents?
14. What is Cloudera's retake policy?
15. Does my certification expire?
16. Are there prerequisites? Do I need to take training to take a certification test?
17. I passed, but I'd like to take the test again to improve my score. Can I do that?
18. Can I review my test or specific test questions and answers?
19. What is the confidentiality agreement I must agree to in order to test?
20. Fraudulent Activity Policy

How do I Register and Schedule my Cloudera exam?


Follow the link on each exam page to the registration form. Once you complete your registration
on university.cloudera.com, you will receive an email with instructions asking you to create an
account at examslocal.com using the same email address you used to register with Cloudera.
Once you create an account and log in on examslocal.com, navigate to "Schedule an Exam", and
then enter "Cloudera" in the "Search Here" field. Select the exam you want to schedule and
follow the instructions to schedule your exam.
Where do I take Cloudera certification exams?
Anywhere. All you need is a computer, a webcam, Chrome or Chromium browser, and an
internet connection. For a full set of requirements,
visit https://www.examslocal.com/ScheduleExam/Home/CompatibilityCheck
What if I lose internet connectivity during the exam?
It is the sole responsibility of the test taker to maintain connectivity throughout the exam session.
If connectivity is lost, for any reason, it is the responsibility of the test taker to reconnect and
finish the exam within the scheduled time slot. No refunds or retakes will be given. Unfinished
or abandoned exam sessions will be scored as a fail.
Can I take the exam at a test center?
Cloudera no longer offers exams in test centers or approves the delivery of our exams in test
centers.
Steps to schedule your exam
1. Create an account at www.examslocal.com. You MUST use the exact same email you
used to register on university.cloudera.com.
2. Select the exam you purchased from the drop-down list (type Cloudera to find our
exams).
3. Choose a date and time you would like to take your exam. You must schedule a minimum
of 24 hours in advance.
4. Select a time slot for your exam
5. Pass the compatibility tool and install the screen sharing Chrome Extension
How do I reschedule an Exam Reservation?
If you need to reschedule your exam, please sign in at https://www.examslocal.com, click on
"My Exams", click on your scheduled exam and use the reschedule option. Email Innovative
Exams at examsupport@examslocal.com, or call +1-888-504-9178, +1-312-612-1049 for
additional support.
What is your exam cancellation policy?
If you wish to reschedule your exam, you must contact Innovative Exams at least 24 hours prior
to your scheduled appointment. Rescheduling less than 24 hours prior to your appointment
results in a forfeiture of your exam fees. All exams are non-refundable and non-transferable. All
exam purchases are valid for one year from date of purchase.

How can I retrieve my forgotten password?


To retrieve a forgotten password, please
visit: https://www.examslocal.com/Account/LostPassword
What happens if I don't show up for my exam?
You are marked as a no-show for the exam and you forfeit any fees you paid for the exam.
What do I need on the day of my exam?

One form of government-issued photo identification (e.g., driver's license, passport). Any
international passport or government-issued form of identification must contain Western
(English) characters. You will be required to provide a means of photo identification
before the exam can be launched. If acceptable proof of identification is not provided to
the proctor prior to the exam, you will be refused entry to the exam. You must also
consent to having your photo taken. The ID will be used for identity verification only and
will not be stored. The proctor cannot release the exam to you until identification has
been successfully verified and you have agreed to the terms and conditions of the exam.
No refund or rescheduling is provided when an exam cannot be started due to failure to
provide proper identification.

You must log in to take the exam on a computer that meets the minimum requirements
listed in the compatibility
check: https://www.examslocal.com/ScheduleExam/Home/CompatibilityCheck

How do I launch my exam?


To start your exam, log in at https://www.examslocal.com, click "My Exams", and follow the
instructions after selecting the exam that you want to start.
What may I have at my desk during the exam?
For CCA exams and CCAH, you may not drink, eat, or have anything on your desk. Your desk
must be free of all materials. You may not use headphones or leave your desk or the exam
session for any reason. You may not sit in front of a bright light (be backlit). Your face must be
clearly visible to the proctor at all times. You must be alone.
Does the exam proctor have access to my computer or its contents?
No. Innovative Exams does not install any software on your computer. The only access the
Innovative Exams proctor has to your computer is the webcam and desktop sharing facilitated by
your web browser. Please note that Innovative Exams provides a virtual lockdown browser
system that utilizes secure communications and encryption using the temporary Chrome
extension. Upon the completion of the exam, the proctor's "view-only access" is automatically
removed.
What is Cloudera's retake policy?
Candidates who fail an exam must wait a period of thirty calendar days, beginning the day after
the failed attempt, before they may retake the same exam. You may take the exam as many times
as you want until you pass; however, you must pay for each attempt. Cloudera offers no
discounts for retake exams. Retakes are not allowed after the successful completion of a test.
Does my certification expire?
CCA certifications are valid for two years. CCP certifications are valid for three years.
CCDH, CCAH, and CCSHB certifications align to a specific CDH release and remain valid for
that version. Once that CDH version retires or the certification or exam retires, your certification
retires.
Are there prerequisites? Do I need to take training to take a certification test?
There are no prerequisites. Anyone can take a Cloudera Certification Test at any time.
I passed, but I'd like to take the test again to improve my score. Can I do that?
Retakes are not allowed after the successful completion of a test. A test result found to be in
violation of the retake policy will not be processed, which will result in no credit awarded for the
test taken. Repeat violators will be banned from participation in the Cloudera Certification
Program.
Can I review my test or specific test questions and answers?
Cloudera certification tests adhere to the industry standard for high-stakes certification tests,
which includes the protection of all test content. As a certifying body, we go to great lengths to
protect the integrity of the items in our item pool. Cloudera does not provide exam items in any
other format than a proctored environment.
What is the confidentiality agreement I must agree to in order to test?
All content, specifically questions, answers, and exhibits of the certification exams are the proprietary and
confidential property of Cloudera. They may not be copied, reproduced, modified, published,
uploaded, posted, transmitted, shared, or distributed in any way without the express written
authorization of Cloudera. Candidates who sit for Cloudera exams must agree they have read and
will abide by the terms and conditions of the Cloudera Certifications and Confidentiality
Agreement before beginning the certification exam. The agreement applies to all exams.
Agreeing and adhering to this agreement is required to be officially certified and to maintain
valid certification. Candidates must first accept the terms and conditions of the Cloudera
Certification and Confidentiality Agreement prior to testing. Failure to accept the terms of this
Agreement will result in a terminated exam and forfeiture of the entire exam fee.
If Cloudera determines, in its sole discretion, that a candidate has shared any content of an exam
and is in violation of the Cloudera Certifications and Confidentiality Agreement, it reserves the
right to take action up to and including, but not limited to, decertification of an individual and a
permanent ban of the individual from Cloudera Certification programs, revocation of all previous
Cloudera Certifications, notification to the candidate's employer, and notification to law
enforcement agencies. Candidates found in violation of the Cloudera Certifications and
Confidentiality Agreement forfeit all fees previously paid to Cloudera or to Cloudera's
authorized vendors and may be required to pay additional fees for services rendered.

Fraudulent Activity Policy


Cloudera reserves the right to take action against any individual involved in fraudulent activities,
including, but not limited to, fraudulent use of vouchers or promotional codes, reselling exam
discounts and vouchers, cheating on an exam (including, but not limited to, creating, using, or
distributing test dumps), alteration of score reports, alteration of completion certificates,
violation of exam retake policies, or other activities deemed fraudulent by Cloudera.
If Cloudera determines, in its sole discretion, that fraudulent activity has taken place, it reserves
the right to take action up to and including, but not limited to, decertification of an individual
either temporarily until remediation occurs or as a permanent ban from Cloudera Certification
programs, revocation of all previous Cloudera Certifications, notification to a candidate's
employer, and notification to law enforcement agencies. Candidates found committing fraudulent
activities forfeit all fees previously paid to Cloudera or to Cloudera's authorized vendors and
may be required to pay additional fees for services rendered.

Helpful Tips:

The username for the primary account is cloudera, and the password for that account is
also cloudera.

The cloudera user has permission to run the sudo command, so separate root account
credentials are not needed.

To open a terminal window, right-click on the desktop (not in the browser) and select
Open in Terminal or click on the Applications menu at the top of the desktop and select
System Tools > Terminal.

To open a file editor, click on the Applications menu at the top of the desktop and select
either Accessories > gedit Text Editor for a simple text editor or Programming > Geany
for a simple IDE environment.

Many commands use the $STREAMING environment variable rather than long paths. The
variable represents the path to the streaming jar file, which is usually located at
/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-*.jar.
In the VM, the $STREAMING environment variable has been automatically set for you.
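
As a rough illustration of the kind of script the streaming jar can run, here is a minimal word-count mapper and reducer in Python. The word-count task, the function names, and the file name wc.py are assumptions made for this sketch; they are not part of the VM. A typical invocation would be along the lines of: hadoop jar $STREAMING -input <in> -output <out> -mapper "wc.py map" -reducer "wc.py reduce" -file wc.py

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word.lower()


def reduce_lines(sorted_lines):
    """Sum the counts for each key.

    Input records must already be sorted by key, which is what the
    streaming framework provides to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (key, sum(int(value) for _, value in group))


# Hypothetical convention: run as "wc.py map" or "wc.py reduce".
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Keeping the map and reduce logic in plain generator functions makes it possible to check them locally on a few lines before submitting a job.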

Working with the data


The sample data files are located in the cloudera user's home directory: /home/cloudera/data.
The data for this solution kit is provided in the form of a 7MB compressed archive that expands
into 200MB of JSON log data spread across 20 files. (The original challenge used two 1.6GB
archives, one for each of the Cloudera Movies server nodes, that expanded into 17GB of JSON
log data spread across 68 files. This lab uses a smaller data set to reduce the time required to run
the models.)
There is more information about the data in the project description below.

Exam Sections and Blueprint


1. HDFS (17%)

Describe the function of HDFS daemons

Describe the normal operation of an Apache Hadoop cluster, both in data
storage and in data processing

Identify current features of computing systems that motivate a system like
Apache Hadoop

Classify major goals of HDFS Design

Given a scenario, identify appropriate use case for HDFS Federation

Identify components and daemons of an HDFS HA-Quorum cluster

Analyze the role of HDFS security (Kerberos)

Determine the best data serialization choice for a given scenario

Describe file read and write paths

Identify the commands to manipulate files in the Hadoop File System Shell

2. YARN and MapReduce version 2 (MRv2) (17%)

Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects
cluster settings

Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN
daemons

Understand basic design strategy for MapReduce v2 (MRv2)

Determine how YARN handles resource allocations

Identify the workflow of a MapReduce job running on YARN

Determine which files you must change and how in order to migrate a cluster
from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running
on YARN

3. Hadoop Cluster Planning (16%)

Principal points to consider in choosing the hardware and operating systems
to host an Apache Hadoop cluster

Analyze the choices in selecting an OS

Understand kernel tuning and disk swapping

Given a scenario and workload pattern, identify a hardware configuration
appropriate to the scenario

Given a scenario, determine the ecosystem components your cluster needs
to run in order to fulfill the SLA

Cluster sizing: given a scenario and frequency of execution, identify the
specifics for the workload, including CPU, memory, storage, disk I/O

Disk Sizing and Configuration, including JBOD versus RAID, SANs,
virtualization, and disk sizing requirements in a cluster

Network Topologies: understand network usage in Hadoop (for both HDFS and
MapReduce) and propose or identify key network design components for a
given scenario

4. Hadoop Cluster Installation and Administration (25%)

Given a scenario, identify how the cluster will handle disk and machine
failures

Analyze a logging configuration and logging configuration file format

Understand the basics of Hadoop metrics and cluster health monitoring

Identify the function and purpose of available tools for cluster monitoring

Be able to install all the ecosystem components in CDH 5, including (but not
limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and
Pig

Identify the function and purpose of available tools for managing the Apache
Hadoop file system

5. Resource Management (10%)

Understand the overall design goals of each of the Hadoop schedulers

Given a scenario, determine how the FIFO Scheduler allocates cluster
resources

Given a scenario, determine how the Fair Scheduler allocates cluster
resources under YARN

Given a scenario, determine how the Capacity Scheduler allocates cluster
resources

6. Monitoring and Logging (15%)

Understand the functions and features of Hadoop's metric collection abilities

Analyze the NameNode and JobTracker Web UIs

Understand how to monitor cluster daemons

Identify and monitor CPU usage on master nodes

Describe how to monitor swap and memory allocation on all nodes

Identify how to view and manage Hadoop's log files

Interpret a log file

Disclaimer: These exam preparation pages are intended to provide information about the
objectives covered by each exam, related resources, and recommended reading and courses. The
material contained within these pages is not intended to guarantee a passing score on any exam.
Cloudera recommends that a candidate thoroughly understand the objectives for each exam and
utilize the resources and training courses recommended on these pages to gain a thorough
understanding of the domain of knowledge related to the role the exam evaluates.

Why Get Certified?

Prove your skills where it matters. CCA exams are performance-based;
your CCA Spark and Hadoop Developer exam requires you to write code in
Scala and Python and run it on a cluster. You prove your skills where it
matters most.

Available Anytime, Anywhere: Forget taking a day off work to travel to a
test center. CCA exams are available globally, from any computer at any
time.

Promote Your Achievement: Every CCA receives a logo for business cards,
resumes, and online profiles.

Verify Your Achievement: Every CCA certification comes with a license that
allows current and potential employers to validate your CCA status.

Current: The big data space evolves rapidly, no more so than in the Apache
Spark and Hadoop developer space. We update our CCA exams regularly to
reflect the skills and tools relevant for today and beyond. And because
change is the only constant in open-source environments, Cloudera requires
all CCA credentials holders to stay current with two-year mandatory re-testing
in order to maintain current status and privileges.

CCA Spark and Hadoop Developer Exam (CCA175) Details

Number of Questions: 10-12 performance-based (hands-on) tasks on CDH5
cluster. See below for full cluster configuration

Time Limit: 120 minutes

Passing Score: 70%

Language: English, Japanese (forthcoming)

Price: USD $295

Exam Question Format

Each CCA question requires you to solve a particular scenario. In some cases, a tool such as
Impala or Hive may be used. In other cases, coding is required. In order to speed up development
time of Spark questions, a template is often provided that contains a skeleton of the solution,
asking the candidate to fill in the missing lines with functional code. This template is written in
either Scala or Python.
You are not required to use the template and may solve the scenario using a language you prefer.
Be aware, however, that coding every problem from scratch may take more time than is allocated
for the exam.
Evaluation, Score Reporting, and Certificate

Your exam is graded immediately upon submission and you are e-mailed a score report the same
day as your exam. Your score report displays the problem number for each problem you
attempted and a grade on that problem. If you fail a problem, the score report includes the
criteria you failed (e.g., Records contain incorrect data or Incorrect file format). We do not
report more information in order to protect the exam content. Read more about reviewing exam
content on the FAQ.
If you pass the exam, you receive a second e-mail within a few days of your exam with your
digital certificate as a PDF, your license number, a LinkedIn profile update, and a link to
download your CCA logos for use in your personal business collateral and social media profiles.
Audience and Prerequisites

There are no prerequisites required to take any Cloudera certification exam. The CCA Spark and
Hadoop Developer exam (CCA175) follows the same objectives as Cloudera Developer Training
for Spark and Hadoop and the training course is an excellent preparation for the exam.

Register for CCA175


Required Skills
Data Ingest

The skills to transfer data between external systems and your cluster. This includes the following:

Import data from a MySQL database into HDFS using Sqoop

Export data to a MySQL database from HDFS using Sqoop

Change the delimiter and file format of data during import using Sqoop

Ingest real-time and near-real time (NRT) streaming data into HDFS using
Flume

Load data into and out of HDFS using the Hadoop File System (FS) commands
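
To make the Sqoop items above more concrete, the following sketch builds the argument list for a "sqoop import" call, including the delimiter and file-format choices. The helper name, the JDBC URL, the table name, and the HDFS path are all hypothetical; only the Sqoop options themselves (--connect, --table, --target-dir, --fields-terminated-by, --as-textfile, --as-avrodatafile, --as-sequencefile) are standard Sqoop 1 flags.

```python
def sqoop_import_args(jdbc_url, table, target_dir, delimiter="\t", file_format="text"):
    """Build the argv list for a Sqoop import (illustrative helper, not a Sqoop API)."""
    format_flags = {
        "text": "--as-textfile",
        "avro": "--as-avrodatafile",
        "sequence": "--as-sequencefile",
    }
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        format_flags[file_format],
    ]
    # A field delimiter only makes sense for delimited text output.
    if file_format == "text":
        args += ["--fields-terminated-by", delimiter]
    return args


# Example with made-up connection details; credentials are omitted here.
example = sqoop_import_args(
    "jdbc:mysql://dbhost/retail", "orders", "/user/cloudera/orders", delimiter="|"
)
```

On a machine with Sqoop installed, a list like this could be handed to subprocess.call after adding credentials; the point of the sketch is only to show which options control delimiter and format.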

Transform, Stage, Store

Convert a set of data values in a given format stored in HDFS into new data values and/or a new
data format and write them into HDFS. This includes writing Spark applications in both Scala
and Python (see note above on exam question format for more information on using either Scala
or Python):

Load data from HDFS and store results back to HDFS using Spark

Join disparate datasets together using Spark

Calculate aggregate statistics (e.g., average or sum) using Spark

Filter data into a smaller dataset using Spark

Write a query that produces ranked or sorted data using Spark
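
The Spark skills listed above (load, join, aggregate, filter, sort, store) can be sketched in one short PySpark job using the RDD API. The input layouts ("order_id,customer_id,amount" and "customer_id,name"), the paths, and all function names are assumptions made for illustration; they do not describe any actual exam data set. The function expects an existing SparkContext to be passed in as sc.

```python
def parse_order(line):
    """'order_id,customer_id,amount' -> (customer_id, amount)."""
    _, customer_id, amount = line.split(",")
    return customer_id, float(amount)


def parse_customer(line):
    """'customer_id,name' -> (customer_id, name)."""
    customer_id, name = line.split(",", 1)
    return customer_id, name


def top_customers(sc, orders_path, customers_path, output_path, min_total=0.0):
    """Write 'name<TAB>total' lines to HDFS, largest total first."""
    orders = sc.textFile(orders_path).map(parse_order)            # load
    customers = sc.textFile(customers_path).map(parse_customer)   # load
    totals = orders.reduceByKey(lambda a, b: a + b)                # aggregate (sum)
    joined = totals.join(customers)                                # (id, (total, name))
    ranked = (joined
              .filter(lambda kv: kv[1][0] >= min_total)            # filter
              .map(lambda kv: (kv[1][1], kv[1][0]))                # (name, total)
              .sortBy(lambda pair: pair[1], ascending=False))      # rank
    ranked.map(lambda pair: "%s\t%.2f" % pair).saveAsTextFile(output_path)  # store
```

Summing with reduceByKey before the join keeps the joined data set small, which is usually preferable to joining first and aggregating afterwards.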

Data Analysis

Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and
Impala.

Read and/or create a table in the Hive metastore in a given schema

Extract an Avro schema from a set of datafiles using avro-tools

Create a table in the Hive metastore using the Avro file format and an
external schema file

Improve query performance by creating partitioned tables in the Hive
metastore

Evolve an Avro schema by changing JSON files
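
Since an Avro schema is a JSON document, evolving it can be done by editing that JSON. The sketch below appends a nullable field with a null default, which is the usual way to add a field so that files written with the old schema remain readable. The helper name and the example record are hypothetical; a schema file to start from could be obtained with avro-tools getschema.

```python
import json


def add_optional_field(schema_json, name, avro_type="string"):
    """Return a new schema (as JSON text) with a nullable field appended.

    The field is a union with "null" listed first and a null default, so
    readers using the new schema can fill it in for old records.
    """
    schema = json.loads(schema_json)
    if any(field["name"] == name for field in schema["fields"]):
        raise ValueError("field already exists: %s" % name)
    schema["fields"].append(
        {"name": name, "type": ["null", avro_type], "default": None}
    )
    return json.dumps(schema, indent=2)


# Made-up starting schema for illustration.
old_schema = '{"type": "record", "name": "Order", "fields": [{"name": "id", "type": "int"}]}'
new_schema = add_optional_field(old_schema, "coupon")
```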

Exam delivery and cluster information

CCA175 is a remote-proctored exam available anywhere, anytime. See the FAQ for more
information and system requirements.
CCA175 is a hands-on, practical exam using Cloudera technologies. Each user is given their own
CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka,
Flume, Kite, Hue, Oozie, DataFu, and many others (see a full list). In addition, the cluster
comes with Python (2.6, 2.7, and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive
Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.
Documentation Available online during the exam

Python 2.7 Documentation

Python 3.4 Documentation

Scala Documentation

Cloudera Product Documentation

Hadoop - Apache Hadoop 2.5.0-cdh5.3.2

Apache Hive

Sqoop Documentation (v1.4.5-cdh5.3.2)

Spark Overview - Spark 1.2.1 Documentation

Apache Crunch - Apache Crunch

Apache Pig

Kite: A Data API for Hadoop

Apache Avro 1.7.7 Documentation

Apache Parquet

Cloudera HUE

Apache Oozie

Apache Sqoop documentation

Apache Flume 1.5.0 documentation

DataFu 1.1.0

JDK 7 API Docs

Only the documentation, links, and resources listed above are accessible during the exam. All
other websites, including Google and other search functionality, are disabled. You may not use notes or other
exam aids.

Course Overview                                  Role                Days

Developer Training For Spark and Hadoop          Developer           4 days
Designing and Developing Big Data Applications   Developer           4 days
Data Science at Scale using Spark and Hadoop     Developer, Analyst  3 days
Search Training                                  Developer           3 days
HBase Training                                   Developer           3 days
Spark Training                                   Developer           3 days
MapReduce for Developers                         Developer           4 days
Cloudera Administrator Training                  Administrator       4 days
Data Analyst Training                            Analyst             4 days

Required Exams

DS700 - Descriptive and Inferential Statistics on Big Data

DS701 - Advanced Analytical Techniques on Big Data

DS702 - Machine Learning at Scale

Each exam may be taken in any order. All three exams must be passed within 365 days of each
other. Candidates who fail an exam must wait a period of thirty calendar days, beginning the day
after the failed attempt, before they may retake the same exam. Candidates must pay for each
exam attempt.
Each passed exam is verifiable in your exam transcript and history.

Exam Format

Each exam is a single challenge scenario. You are provided access to the scenario, the data sets,
and the cluster. You are given eight (8) hours to complete the challenge. See below for more
information on the cluster.
Required Skills
Common Skills (all exams)

Extract relevant features from a large dataset that may contain bad records,
partial records, errors, or other forms of noise

Extract features from data stored in a wide range of possible formats,
including JSON, XML, raw text logs, industry-specific encodings, and graph
link data
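
As a small illustration of feature extraction from noisy input, the following sketch reads JSON log lines, keeps the records that contain the fields of interest, and counts the lines it has to discard. The one-object-per-line layout and the field names "user" and "type" are assumptions made for this example, not a description of the exam data.

```python
import json


def extract_features(lines, required=("user", "type")):
    """Return (features, bad_count) for an iterable of JSON text lines.

    A line is counted as bad if it is not valid JSON, is not a JSON
    object, or lacks one of the required keys.
    """
    features, bad_count = [], 0
    for line in lines:
        try:
            record = json.loads(line)
            features.append(tuple(record[key] for key in required))
        except (ValueError, KeyError, TypeError):
            bad_count += 1
    return features, bad_count


sample = [
    '{"user": "u1", "type": "Play"}',   # complete record
    '{"user": "u2"',                    # truncated (partial) record
    '{"type": "Stop"}',                 # missing the "user" field
    '[1, 2]',                           # valid JSON, but not an object
]
good, bad = extract_features(sample)
```

Counting rejected lines rather than silently dropping them makes it easier to notice when a parsing rule is discarding more data than expected.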

DS700 - Descriptive and Inferential Statistics on Big Data

Use statistical tests to determine confidence for a hypothesis

Calculate common summary statistics, such as mean, variance, and counts

Fit a distribution to a dataset and use that distribution to predict event
likelihoods

Perform complex statistical calculations on a large dataset
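
For summary statistics on data too large to hold in memory, a single-pass method is one reasonable approach. The sketch below uses Welford's online algorithm to compute count, mean, and sample variance from any iterable; the function name is illustrative, and this is one possible technique rather than a method prescribed by the exam.

```python
def summarize(values):
    """Return (count, mean, sample_variance) in one pass over values.

    Uses Welford's update, which avoids storing the data and is more
    numerically stable than summing squares directly.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else float("nan")
    return count, mean, variance


count, mean, variance = summarize([2, 4, 4, 4, 5, 5, 7, 9])
```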

DS701 - Advanced Analytical Techniques on Big Data

Build a model that contains relevant features from a large dataset

Define relevant data groupings, including number, size, and characteristics

Assign data records from a large dataset into a defined set of data groupings

Evaluate goodness of fit for a given set of data groupings and a dataset

Apply advanced analytical techniques, such as network graph analysis or
outlier detection

DS702 - Machine Learning at Scale

Build a model that contains relevant features from a large dataset

Predict labels for an unlabeled dataset using a labeled dataset for reference

Select a classification algorithm that is appropriate for the given dataset

Tune algorithm metaparameters to maximize algorithm performance

Use validation techniques to determine the successfulness of a given
algorithm for the given dataset
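
One common validation technique is k-fold cross-validation. The sketch below only generates the train/test index splits, leaving the choice of model and metric open; the function name is hypothetical, and it assumes the records have already been shuffled, since the folds are contiguous.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one; every index appears in exactly
    one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(5, 2))
```

Each model would be trained on the train indices and scored on the test indices, with the k scores averaged to estimate how well the algorithm generalizes.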

How to Prepare

Q. What technologies/languages do I need to know? A. You'll be provided with a cluster running
Hadoop technologies, plus standard tools like Python and R. Among these standard
technologies, it's your choice what to use to solve the problem.
Q. How difficult are the problems? A. Think of a scaled-down Kaggle problem that's intended
to be solved in hours, not days of effort. If you can solve a Kaggle problem in a weekend, you're
in good shape. You may also take a look at a sample past exam and the solution in our free
solution kit.
Q. What should I study to prepare?
A. Coursera's intro "machine learning" course is a good level of preparation, but here are several
more links of interest.

General Data Science

The Open Source Data Science Masters Curriculum

Theory

Machine Learning

Data Science at Scale Using Spark and Hadoop

Coursera specialization

Introductory

Machine Learning (Coursera)

Advanced

Statistics

Probabilistic Graphical Models (Coursera)

Learning from Data (Caltech through EdX)

Statistics: Making Sense of Data (Coursera) or Statistics One (Coursera)

Open Intro Statistics

Linear Algebra

https://www.coursera.org/course/matrix

Engineering

Tools

Spark

Cloudera Developer Training for Spark and Hadoop

Data Science at Scale Using Spark and Hadoop

R Programming (Coursera)

Exam Delivery and Cluster Information

All CCP: Data Scientist exams are remote-proctored and available anywhere, anytime. See
the FAQ for more information and system requirements.
Exams are hands-on, practical exams using data science tools on Cloudera technologies. Each
user is given their own 7-node, high-performance CDH5 (currently 5.3.2) cluster pre-loaded with
Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many
others (See a full list). In addition the cluster also comes with Python (2.6 and 3.4), Perl 5.10,
Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime,
Eclipse, NetBeans, scikit-learn, octave, NumPy, SciPy, Anaconda, R, plyr, dplyrimpaladb,
SparkML, vowpal wabbit, clouderML, oryx, impyla, CoreNLP, The Stanford Parser: A statistical
parser, Stanford Log-linear Part-Of-Speech Tagger, Stanford Named Entity Recognizer (NER),
Stanford Word Segmenter, opennlp, H2O, java-ml, RapidMiner, caffe, Weka, NLTK, matplotlib,
ggplot, d3py, SparkingPandas, randomforest, R: ggplot2, Sparkling water.
Currently, the cluster is open to the internet and there are no restrictions on tools you can install
or websites or resources you may use.
