Posted on 2016-3-23 14:42:17
Cloudera Certified Administrator for Apache Hadoop (CCA-500)
Number of Questions: 60
Time Limit: 90 minutes
Passing Score: 70%
Language: English, Japanese
Exam Sections and Blueprint
1. HDFS (17%)

  • Describe the function of HDFS daemons
  • Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing
  • Identify current features of computing systems that motivate a system like Apache Hadoop
  • Classify major goals of HDFS Design
  • Given a scenario, identify appropriate use case for HDFS Federation
  • Identify components and daemons of an HDFS HA-Quorum cluster
  • Analyze the role of HDFS security (Kerberos)
  • Determine the best data serialization choice for a given scenario
  • Describe file read and write paths
  • Identify the commands to manipulate files in the Hadoop File System Shell (see the sample commands after this list)
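For the File System Shell bullet, a few representative commands worth practicing; the /user/alice paths and file names below are placeholders, not part of the blueprint:

```
hadoop fs -ls /                                    # list the HDFS root
hadoop fs -mkdir -p /user/alice/input              # create a directory tree
hadoop fs -put localfile.txt /user/alice/input/    # upload a local file
hadoop fs -cat /user/alice/input/localfile.txt     # print a file's contents
hadoop fs -get /user/alice/input/localfile.txt .   # download back to local disk
hadoop fs -rm -r /user/alice/input                 # remove recursively
```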
2. YARN and MapReduce version 2 (MRv2) (17%)

  • Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings
  • Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons
  • Understand basic design strategy for MapReduce v2 (MRv2)
  • Determine how YARN handles resource allocations
  • Identify the workflow of a MapReduce job running on YARN
  • Determine which files you must change, and how, in order to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running on YARN (a configuration sketch follows this list)
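As a rough sketch of what the migration bullet refers to: switching a cluster to MRv2 typically means setting the framework name in mapred-site.xml and the shuffle auxiliary service in yarn-site.xml. The properties below are the standard ones, but treat this as illustrative rather than a complete migration:

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN instead of MRv1 -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: enable the MapReduce shuffle service on NodeManagers -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```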
3. Hadoop Cluster Planning (16%)

  • Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster
  • Analyze the choices in selecting an OS
  • Understand kernel tuning and disk swapping (see the sysctl sketch after this list)
  • Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario
  • Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA
  • Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, and disk I/O
  • Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster
  • Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
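On the kernel-tuning and swapping bullet, one commonly cited example is lowering vm.swappiness so the OS is reluctant to swap out JVM heap pages. The value 1 below reflects common practice for Hadoop worker nodes, but check your distribution's guidance:

```
sysctl vm.swappiness                                    # inspect the current value
sudo sysctl -w vm.swappiness=1                          # apply immediately
echo 'vm.swappiness=1' | sudo tee -a /etc/sysctl.conf   # persist across reboots
```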
4. Hadoop Cluster Installation and Administration (25%)

  • Given a scenario, identify how the cluster will handle disk and machine failures
  • Analyze a logging configuration and logging configuration file format (a log4j sketch follows this list)
  • Understand the basics of Hadoop metrics and cluster health monitoring
  • Identify the function and purpose of available tools for cluster monitoring
  • Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig
  • Identify the function and purpose of available tools for managing the Apache Hadoop file system
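For the logging-configuration bullet: Hadoop daemons log through log4j, and the exam expects you to be able to read a log4j.properties file. A minimal fragment in the standard shape; the appender name and limits mirror common defaults, but treat the exact values as illustrative:

```
# Root logger: INFO and above, routed to a rolling file appender
log4j.rootLogger=INFO,RFA

# Rolling file appender: size-capped files with a fixed number of backups
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```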
5. Resource Management (10%)

  • Understand the overall design goals of each of Hadoop's schedulers
  • Given a scenario, determine how the FIFO Scheduler allocates cluster resources
  • Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN (an allocations-file sketch follows this list)
  • Given a scenario, determine how the Capacity Scheduler allocates cluster resources
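To make the Fair Scheduler bullet concrete: resource sharing under YARN is driven by an allocations file (commonly fair-scheduler.xml). The queue names, weights, and limits below are made up for illustration:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Two queues sharing the cluster 2:1 when both have demand -->
  <queue name="production">
    <weight>2.0</weight>
    <minResources>10000 mb,10 vcores</minResources>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
    <maxRunningApps>5</maxRunningApps>
  </queue>
</allocations>
```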
6. Monitoring and Logging (15%)

  • Understand the functions and features of Hadoop’s metric collection abilities
  • Analyze the NameNode and JobTracker Web UIs
  • Understand how to monitor cluster daemons
  • Identify and monitor CPU usage on master nodes
  • Describe how to monitor swap and memory allocation on all nodes
  • Identify how to view and manage Hadoop’s log files (see the commands after this list)
  • Interpret log files
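For the last two bullets, two concrete habits worth practicing; the application ID, log path, and HOSTNAME below are placeholders (the path shown is the CDH-style default):

```
# Retrieve the aggregated logs of a finished YARN application
yarn logs -applicationId application_1234567890123_0001

# Tail a daemon's log directly on the node where it runs
tail -f /var/log/hadoop-hdfs/hadoop-hdfs-namenode-HOSTNAME.log
```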
--------------------------------------------------------
CCA Spark and Hadoop Developer Exam (CCA175)
Number of Questions: 10–12 performance-based (hands-on) tasks on a CDH 5 cluster. See below for full cluster configuration
Time Limit: 120 minutes
Passing Score: 70%
Language: English, Japanese (forthcoming)
Required Skills
Data Ingest
The skills to transfer data between external systems and your cluster. This includes the following:

  • Import data from a MySQL database into HDFS using Sqoop
  • Export data to a MySQL database from HDFS using Sqoop
  • Change the delimiter and file format of data during import using Sqoop (see the sample commands after this list)
  • Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
  • Load data into and out of HDFS using the Hadoop File System (FS) commands
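A hedged sketch of the Sqoop tasks above; the host, database, credentials, table names, and HDFS paths are all placeholders:

```
# Import a MySQL table into HDFS as tab-delimited text
sqoop import \
  --connect jdbc:mysql://dbhost/retail_db \
  --username dbuser --password-file /user/alice/.dbpass \
  --table orders \
  --fields-terminated-by '\t' \
  --target-dir /user/alice/orders

# Same import, written as Avro data files instead of text
sqoop import \
  --connect jdbc:mysql://dbhost/retail_db \
  --username dbuser --password-file /user/alice/.dbpass \
  --table orders \
  --as-avrodatafile \
  --target-dir /user/alice/orders_avro

# Export HDFS data back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost/retail_db \
  --username dbuser --password-file /user/alice/.dbpass \
  --table order_summaries \
  --export-dir /user/alice/summaries
```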
Transform, Stage, Store
Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS. This includes writing Spark applications in both Scala and Python (a Python sketch follows the list):

  • Load data from HDFS and store results back to HDFS using Spark
  • Join disparate datasets together using Spark
  • Calculate aggregate statistics (e.g., average or sum) using Spark
  • Filter data into a smaller dataset using Spark
  • Write a query that produces ranked or sorted data using Spark
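A minimal PySpark sketch touching each skill above. The HDFS paths, column positions, and the $100 threshold are assumptions for illustration; the RDD API is used because CCA175-era clusters ran Spark 1.x:

```python
from pyspark import SparkContext

sc = SparkContext(appName="cca175-practice")

# Load two tab-delimited datasets from HDFS (paths are placeholders)
orders = sc.textFile("/user/alice/orders").map(lambda l: l.split("\t"))
customers = sc.textFile("/user/alice/customers").map(lambda l: l.split("\t"))

# Join on customer id: key both datasets by that field
orders_by_cust = orders.map(lambda o: (o[1], float(o[2])))   # (cust_id, amount)
cust_names = customers.map(lambda c: (c[0], c[1]))           # (cust_id, name)
joined = orders_by_cust.join(cust_names)                     # (cust_id, (amount, name))

# Filter to large orders, then aggregate: total spend per customer name
totals = (joined.filter(lambda kv: kv[1][0] > 100.0)
                .map(lambda kv: (kv[1][1], kv[1][0]))
                .reduceByKey(lambda a, b: a + b))

# Rank customers by total spend, descending, and store back to HDFS
ranked = totals.sortBy(lambda kv: kv[1], ascending=False)
ranked.map(lambda kv: "%s\t%.2f" % kv).saveAsTextFile("/user/alice/top_customers")

sc.stop()
```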
Data Analysis
Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and Impala.

  • Read and/or create a table in the Hive metastore in a given schema
  • Extract an Avro schema from a set of data files using avro-tools
  • Create a table in the Hive metastore using the Avro file format and an external schema file (see the sketch after this list)
  • Improve query performance by creating partitioned tables in the Hive metastore
  • Evolve an Avro schema by changing JSON files
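A sketch of the Avro workflow this list describes. The file names, paths, and table name are placeholders; the SerDe and container format classes are the standard ones shipped with CDH 5 Hive:

```
# Copy one Avro data file locally and extract its embedded schema
hadoop fs -get /user/alice/orders_avro/part-m-00000.avro .
avro-tools getschema part-m-00000.avro > orders.avsc
hadoop fs -put orders.avsc /user/alice/schemas/
```

The extracted schema file can then back an external Hive table:

```
CREATE EXTERNAL TABLE orders_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/alice/orders_avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/alice/schemas/orders.avsc');
```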
If you have any questions about the above, you can ask via QQ 1438118790.
