
What does the relational data model consist of?

Tables, rows, columns and values

Imagine that in your application you need to associate various data with users, e.g. their
preferences, behavioural information and so on. You want to persist user profiles on
disk. Given the choice between CSV and JSON, which format would you choose?

JSON, obviously
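A short sketch of why JSON wins here (the profile fields are hypothetical): user profiles are nested and variable-shaped, which JSON represents natively while CSV cannot without flattening.

```python
import json

# A hypothetical user profile with nested, variable-shape fields
profile = {
    "user_id": 42,
    "preferences": {"theme": "dark", "language": "en"},
    "behaviour": {"logins": [3, 7, 11], "last_page": "/settings"},
}

# JSON round-trips the nested structure losslessly
restored = json.loads(json.dumps(profile))
assert restored == profile

# A CSV row is a flat list of strings, so the nested dicts and lists
# above would have to be flattened or re-encoded by hand
```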

Why are columnar file formats used in data warehousing? Mark all the correct answers

Columnar stores occupy less disk space due to compression.

Columnar stores allow more efficient slicing of data (both horizontal and vertical).
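The compression claim can be illustrated with a small sketch (the table contents are made up): storing each column contiguously groups similar values together, which generic compressors exploit.

```python
import zlib

# Hypothetical table: 1000 rows of (user_id, country, status)
rows = [(i, "DE" if i % 2 else "FR", "active") for i in range(1000)]

# Row-oriented layout: columns interleaved within each record
row_bytes = "\n".join(",".join(map(str, r)) for r in rows).encode()

# Column-oriented layout: each column's values stored contiguously
col_bytes = b"\n".join(
    ",".join(map(str, col)).encode() for col in zip(*rows)
)

# Values within one column are similar, so they compress better together
print(len(zlib.compress(col_bytes)), len(zlib.compress(row_bytes)))
```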

Compared to text formats, why is the SequenceFile format faster? Mark all the correct
statements.

Scalar values are serialized/deserialized with a simple copy.

Serialized data occupies less disk space thus saving I/O time.

Simplified grammar (no need to track paired quotes, brackets, and so on) leads to streamlined,
branch-free code.
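Both points can be sketched with Python's `struct` module standing in for SequenceFile-style binary serialization (the values are made up; note that binary beats text on size only when the values are wide enough):

```python
import struct

# Hypothetical scalar column: 1000 large int32 values
values = list(range(10**9, 10**9 + 1000))

# Binary: fixed-width values, deserialized by a plain memory copy
binary = struct.pack("<1000i", *values)
decoded = list(struct.unpack("<1000i", binary))
assert decoded == values

# Text: each value is formatted to digits and must later be re-parsed,
# scanning for delimiters (conditional, branchy code)
text = ",".join(map(str, values))
assert [int(t) for t in text.split(",")] == values

print(len(binary), len(text))  # 4000 vs 10999 bytes
```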

True or False? Optimizing the computation itself (reducing the computation time) will not help
to reduce the completion time of an I/O-bound process.

False

True or False? Switching from compressed to uncompressed data for a CPU-bound process may
increase the completion time despite the saved CPU time.

True

What fact is more relevant to the horizontal scaling of filesystems than to vertical
scaling?

Usage of commodity hardware

The 'modify file' operation is not allowed in distributed filesystems (GFS, HDFS). What was NOT a reason
for this design decision?

Increasing reliability and accessibility

How can you achieve uniform data distribution across the servers in a DFS?

By splitting files into blocks
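A sketch of the idea (not HDFS code; the file sizes and server count are hypothetical): fixed-size blocks let the system spread load evenly even when file sizes are very skewed.

```python
BLOCK = 64 * 2**20  # 64 MB block size (HDFS-style)

def num_blocks(size):
    # number of fixed-size blocks a file of `size` bytes occupies
    return (size + BLOCK - 1) // BLOCK

# hypothetical files with very skewed sizes: 10 GB, 100 MB, 5 MB
files = [10 * 2**30, 100 * 2**20, 5 * 2**20]
total_blocks = sum(num_blocks(s) for s in files)

# round-robin block placement over 4 datanodes
per_server = [0] * 4
for i in range(total_blocks):
    per_server[i % 4] += 1

print(per_server)  # block counts differ by at most 1
```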

Question 10

What does a metadata DB contain?

Locations of the file blocks

File creation time

File permissions

Select the correct statement about HDFS:

A client requires access to all the servers to read files

If you have a very important file, what is the best way to protect it in HDFS?

Both ways are allowed and implemented in HDFS

Question 13

You were told that two servers in HDFS were down: a Datanode and the Namenode. Your reaction:

Restore Namenode first


Question 14

What does the block size in HDFS NOT depend on?

The block size of the local Datanode's filesystem

Input:

Capacity = 1 PB

Block size = 64 MB

Metadata per block = 300 B

Replication factor = 3

Metadata size = Capacity / (Block size × Replication factor) × Metadata per block

= 1 000 000 000 MB / (64 MB × 3) × 0.0003 MB

= 1562.5 MB = 1.5625 GB ≈ 1.6 GB
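The calculation above, written out in code (integer arithmetic, units in bytes):

```python
# Namenode metadata estimate for a 1 PB cluster
capacity = 10**15          # 1 PB
block_size = 64 * 10**6    # 64 MB
replication = 3
meta_per_block = 300       # 300 B of metadata per stored block

# capacity is split into replicated blocks; each costs meta_per_block bytes
metadata_bytes = capacity * meta_per_block // (block_size * replication)
print(metadata_bytes / 10**9)  # 1.5625 (GB)
```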

Speed = 60 MB/s

Seek time = 5 ms

1 s = 1000 ms, so with a 1-second transfer the 5 ms seek is 1000 / 5 = 200 times smaller than the read time (only 0.5% overhead).

For an average reading speed of 60 MB/s, the minimum block size should be 60 MB/s × 1 s = 60 MB, i.e. ~64 MB.
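The same block-size estimate as a small sketch (the 200× target ratio is the assumption carried over from the numbers above):

```python
# Pick a block size large enough that transfer time dominates seek time
speed_mb_s = 60      # average read speed, MB/s
seek_ms = 5          # disk seek time, ms

# with a 1-second transfer, the seek is 1000 / 5 = 200 times smaller
ratio = 1000 // seek_ms          # 200
block_mb = speed_mb_s * 1        # 60 MB read in that 1 second
print(ratio, block_mb)  # 200 60
```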

touch test.txt

mkdir /user/acachuan/assignment1

put test.txt /user/acachuan/assignment1

ls -h /user/acachuan/assignment1

chmod ugo-rx /user/acachuan/assignment1/test.txt

head -10 /user/acachuan/assignment1/test.txt

mv test.txt test2.txt

hdfs dfs -df -h /data/wiki/en_articles_part/articles-part

hdfs dfs -du -h /data/wiki/en_articles_part/articles-part

Local FS

! ls /home/jovyan

# Create test.txt in local home dir

! touch ~/test.txt

with open('/home/jovyan/test.txt', 'w') as myFile:
    for i in range(21):
        myFile.write('%d\n' % i)

! ls /home/jovyan

! cat /home/jovyan/test.txt

Hadoop FS

! hdfs dfs -ls /user/jovyan/

Found 2 items
-rw-r--r-- 1 jovyan supergroup 239 2017-11-28 21:41 /user/jovyan/README.md

# Create assignment1 dir in hdfs

! hdfs dfs -mkdir /user/jovyan/assignment1

! hdfs dfs -ls /user/jovyan/

Found 2 items

-rw-r--r-- 1 jovyan supergroup 239 2017-11-28 21:41 /user/jovyan/README.md

drwxr-xr-x - jovyan supergroup 0 2018-05-02 00:04 /user/jovyan/assignment1

# Put test.txt into assignment1

! hdfs dfs -put ~/test.txt /user/jovyan/assignment1

! hdfs dfs -ls /user/jovyan/assignment1

Found 1 items

-rw-r--r-- 1 jovyan supergroup 53 2018-05-02 00:09 /user/jovyan/assignment1/test.txt

# output the size and the owner of the file

! hdfs dfs -du /user/jovyan/assignment1/test.txt

! hdfs dfs -ls /user/jovyan/assignment1

53 /user/jovyan/assignment1/test.txt

Found 1 items

-rw-r--r-- 1 jovyan supergroup 53 2018-05-02 00:09 /user/jovyan/assignment1/test.txt

# revoke ‘read’ permission for ‘other users’

! hdfs dfs -chmod o-r /user/jovyan/assignment1/test.txt


! hdfs dfs -ls /user/jovyan/assignment1

Found 1 items

-rw-r----- 1 jovyan supergroup 53 2018-05-02 00:09 /user/jovyan/assignment1/test.txt

# read the first 10 lines of the file

! hdfs dfs -cat /user/jovyan/assignment1/test.txt | head

# rename ‘test.txt’ to ‘test2.txt’

! hdfs dfs -mv /user/jovyan/assignment1/test.txt /user/jovyan/assignment1/test2.txt

! hdfs dfs -ls /user/jovyan/assignment1

Found 1 items

-rw-r----- 1 jovyan supergroup 53 2018-05-02 00:09 /user/jovyan/assignment1/test2.txt

Check the commands; they should look like these:


$ hdfs dfs -mkdir assignment1

$ hdfs dfs -put test.txt assignment1/

$ hdfs dfs -ls assignment1/test.txt or hdfs dfs -stat "%b %u" assignment1/test.txt

$ hdfs dfs -chmod o-r assignment1/test.txt

$ hdfs dfs -cat assignment1/test.txt | head -10

$ hdfs dfs -mv assignment1/test.txt assignment1/test2.txt

$ hdfs fsck /data/wiki/en_articles_part/articles-part -files -blocks -locations

$ hdfs fsck -blockId blk_1073971670

hdfs dfs -text /datalake/demo/* | head -c 80
