Main Topics

CCB-400 is designed to test a candidate's fluency with the concepts and skills in the following areas:

Core HBase Concepts
Recognize the fundamental characteristics of Apache HBase and its role in a big data ecosystem.
Identify differences between Apache HBase and a traditional RDBMS.
Describe the relationship between Apache HBase and HDFS.
Given a scenario, identify application characteristics that make the scenario an appropriate application for Apache HBase.

Data Model
Describe how an Apache HBase table is physically stored on disk.
Identify the differences between a Column Family and a Column Qualifier.
Given a data loading scenario, identify how Apache HBase will version the rows.
Describe how Apache HBase cells store data.
Detail what happens to data when it is deleted.

Architecture
Identify the major components of an Apache HBase cluster.
Recognize how regions work and their benefits under various scenarios.
Describe how a client finds a row in an HBase table.
Understand the function and purpose of minor and major compactions.
Given a region server crash scenario, describe how Apache HBase fails over to another region server.
Describe RegionServer splits.

Schema Design
Describe the factors to be considered when creating Column Families.
Given an access pattern, define the row keys for optimal read performance.
Given an access pattern, define the row keys for locality.

API
Describe the functions and purpose of the HBaseAdmin class.
Given a table and rowkey, use the get() operation to return specific versions of that row.
Describe the behavior of the checkAndPut() method.

Administration
Recognize how to create, describe, and access data in tables from the shell.
Describe how to bulk load data into Apache HBase.
Recognize the benefits of managed region splits.

Sample Questions

Question 1
You want to store clickstream data in HBase.
Your data consists of the following: the source id, the name of the cluster, the URL of the click, and the timestamp for each click. Which rowkey would you use if you wanted to retrieve the source ids with a scan, sorted with the most recent first?
A. <(Long)timestamp>
B.
C.
D.

Question 2
Your application needs to retrieve 200 to 300 non-sequential rows from a table with one billion rows. You know the rowkey of each of the rows you need to retrieve. Which does your application need to implement?
A. Scan without range
B. Scan with start and stop row
C. HTable.get(Get get)
D. HTable.get(List<Get> gets)

Question 3
You perform a check and put operation from within an HBase application using the following:
table.checkAndPut(Bytes.toBytes("rowkey"), Bytes.toBytes("colfam"), Bytes.toBytes("qualifier"), Bytes.toBytes("barvalue"), newrow);
Which describes this check and put operation?
A. Check if rowkey/colfam/qualifier exists and the cell value "barvalue" is equal to newrow. Then return true.
B. Check if rowkey/colfam/qualifier exists and the cell value "barvalue" is NOT equal to newrow. Then return true.
C. Check if rowkey/colfam/qualifier exists and has the cell value "barvalue". If so, put the values in newrow and return false.
D. Check if rowkey/colfam/qualifier exists and has the cell value "barvalue". If so, put the values in newrow and return true.

Question 4
What is the advantage of using the bulk load API over doing individual Puts for bulk insert operations?
A. Writes bypass the HLog/MemStore, reducing load on the RegionServer.
B. Users doing bulk writes may disable writing to the WAL, which results in possible data loss.
C. HFiles created by the bulk load API are guaranteed to be co-located with the RegionServer hosting the region.
D. HFiles written out via the bulk load API are more space efficient than those written out of RegionServers.

Question 5
You have a WebLog table in HBase. The row keys are the IP addresses. You want to retrieve all entries that have an IP address of 75.67.12.146. The shell command you would use is:
A. get 'WebLog', '75.67.12.146'
B. scan 'WebLog', '75.67.12.146'
C. get 'WebLog', {FILTER => '75.67.12.146'}
D. scan 'WebLog', {COLFAM => 'IP', FILTER => '75.67.12.146'}
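
Question 1 depends on the fact that HBase sorts row keys lexicographically as unsigned bytes. One common way to get a most-recent-first scan is to store a reversed timestamp (Long.MAX_VALUE minus the click time) after a fixed-length prefix. The sketch below uses only the Java standard library to illustrate that ordering. The class name, the helper methods, the "src1" source id, and the key layout itself are assumptions made for illustration; they are not taken from the exam or from the HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: illustrates reversed-timestamp row keys.
final class ReverseTimestampKey {

    // Builds <sourceId><Long.MAX_VALUE - timestamp> as raw bytes.
    // Assumes a fixed-length source id and a non-negative timestamp.
    static byte[] rowKey(String sourceId, long timestampMillis) {
        byte[] prefix = sourceId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Unsigned lexicographic byte comparison, the same ordering HBase applies to row keys.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Recovers the original timestamp from a key whose prefix is prefixLength bytes long.
    static long timestampOf(byte[] key, int prefixLength) {
        return Long.MAX_VALUE - ByteBuffer.wrap(key, prefixLength, Long.BYTES).getLong();
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (long ts : new long[] {1000L, 3000L, 2000L}) {
            keys.add(rowKey("src1", ts));
        }
        keys.sort(ReverseTimestampKey::compare);
        // Prints 3000, 2000, 1000: the newest click sorts first.
        for (byte[] key : keys) {
            System.out.println(timestampOf(key, 4));
        }
    }
}
```

Because the reversed value shrinks as the timestamp grows, a plain forward scan over keys built this way returns the newest entries for a given prefix first, with no client-side sorting.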
Answers
Question 1: B
Question 2: D
Question 3: D
Question 4: A
Question 5: A
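
The check-and-put behavior tested in Question 3 is a compare-then-write contract: the write is applied only if the checked cell currently holds the expected value, and the call returns whether the write happened. The following is a minimal in-memory model of that contract, written against the Java standard library only. It is not the HBase client API: the class CheckAndPutModel, its string cell keys, and the simplification that the write goes to the same cell that was checked are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of check-and-put semantics; not an HBase class.
final class CheckAndPutModel {

    // Each cell is addressed by a single string such as "rowkey/colfam/qualifier".
    private final Map<String, String> cells = new ConcurrentHashMap<>();

    void put(String cellKey, String value) {
        cells.put(cellKey, value);
    }

    String get(String cellKey) {
        return cells.get(cellKey);
    }

    // Writes newValue only if the cell currently equals expected, atomically.
    // A null expected value means "write only if the cell does not exist".
    // Returns true if the write was applied, false otherwise.
    boolean checkAndPut(String cellKey, String expected, String newValue) {
        if (expected == null) {
            return cells.putIfAbsent(cellKey, newValue) == null;
        }
        return cells.replace(cellKey, expected, newValue);
    }

    public static void main(String[] args) {
        CheckAndPutModel table = new CheckAndPutModel();
        String cell = "rowkey/colfam/qualifier";
        table.put(cell, "barvalue");

        // The check matches, so the new value is written.
        System.out.println(table.checkAndPut(cell, "barvalue", "newvalue")); // true
        // The cell no longer holds "barvalue", so nothing is written.
        System.out.println(table.checkAndPut(cell, "barvalue", "other"));    // false
        System.out.println(table.get(cell));                                 // newvalue
    }
}
```

In the real API the Put passed to checkAndPut() may write to other columns of the same row; the model collapses that to a single cell to keep the compare-then-write step visible.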
Disclaimer: These exam preparation pages are intended to provide information about the objectives covered by each exam, related resources, and recommended reading and courses. The material contained within these pages is not intended to guarantee a passing score on any exam. Cloudera recommends that a candidate thoroughly understand the objectives for each exam and utilize the resources and training courses recommended on these pages to gain a thorough understanding of the domain of knowledge related to the role the exam evaluates.