
What does the relational data model consist of?

Tables, rows, columns and values

Imagine that in your application you need to associate various data with users, e.g. their
preferences, behavioural information and so on. You want to persist user profiles on
disk. Given the choice between CSV and JSON, which format would you choose?

JSON, obviously
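A short sketch of why JSON wins here (the profile fields are hypothetical): user profiles are nested and variable-shaped, which JSON represents natively while CSV cannot without flattening.

```python
import json

# A hypothetical user profile with nested, variable-shape fields
profile = {
    "user_id": 42,
    "preferences": {"theme": "dark", "language": "en"},
    "behaviour": {"logins": [3, 7, 11], "last_page": "/settings"},
}

# JSON round-trips the nested structure losslessly
restored = json.loads(json.dumps(profile))
assert restored == profile

# A CSV row is a flat list of strings, so the nested dicts and lists
# above would have to be flattened or re-encoded by hand
```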

Why are columnar file formats used in data warehousing? Mark all the correct answers

Columnar stores occupy less disk space due to compression.

Columnar stores allow more efficient slicing of data (both horizontal and vertical).
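The compression claim can be illustrated with a small sketch (the table contents are made up): storing each column contiguously groups similar values together, which generic compressors exploit.

```python
import zlib

# Hypothetical table: 1000 rows of (user_id, country, status)
rows = [(i, "DE" if i % 2 else "FR", "active") for i in range(1000)]

# Row-oriented layout: columns interleaved within each record
row_bytes = "\n".join(",".join(map(str, r)) for r in rows).encode()

# Column-oriented layout: each column's values stored contiguously
col_bytes = b"\n".join(
    ",".join(map(str, col)).encode() for col in zip(*rows)
)

# Values within one column are similar, so they compress better together
print(len(zlib.compress(col_bytes)), len(zlib.compress(row_bytes)))
```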

Compared to text formats, why is the SequenceFile format faster? Mark all the correct
statements.

Scalar values are serialized/deserialized with a simple copy.

Serialized data occupies less disk space thus saving I/O time.

Simplified grammar (no need to track paired quotes, brackets, and so on) leads to streamlined,
branch-free code.
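Both points can be sketched with Python's `struct` module standing in for SequenceFile-style binary serialization (the values are made up; note that binary beats text on size only when the values are wide enough):

```python
import struct

# Hypothetical scalar column: 1000 large int32 values
values = list(range(10**9, 10**9 + 1000))

# Binary: fixed-width values, deserialized by a plain memory copy
binary = struct.pack("<1000i", *values)
decoded = list(struct.unpack("<1000i", binary))
assert decoded == values

# Text: each value is formatted to digits and must later be re-parsed,
# scanning for delimiters (conditional, branchy code)
text = ",".join(map(str, values))
assert [int(t) for t in text.split(",")] == values

print(len(binary), len(text))  # 4000 vs 10999 bytes
```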

True or False? Optimizing the computation itself (reducing the computation time) will not help
to reduce the completion time of an I/O-bound process.

False

True or False? Switching from compressed to uncompressed data for a CPU-bound process may
increase the completion time despite the saved CPU time.

True

What fact is more relevant to the horizontal scaling of filesystems than to vertical
scaling?

Usage of commodity hardware

The 'modify file' operation is not allowed in distributed filesystems (GFS, HDFS). What was NOT a reason
for this design decision?

Increasing reliability and accessibility

How can you achieve uniform data distribution across the servers in a DFS?

By splitting files into blocks
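A sketch of the idea (not HDFS code; the file sizes and server count are hypothetical): fixed-size blocks let the system spread load evenly even when file sizes are very skewed.

```python
BLOCK = 64 * 2**20  # 64 MB block size (HDFS-style)

def num_blocks(size):
    # number of fixed-size blocks a file of `size` bytes occupies
    return (size + BLOCK - 1) // BLOCK

# hypothetical files with very skewed sizes: 10 GB, 100 MB, 5 MB
files = [10 * 2**30, 100 * 2**20, 5 * 2**20]
total_blocks = sum(num_blocks(s) for s in files)

# round-robin block placement over 4 datanodes
per_server = [0] * 4
for i in range(total_blocks):
    per_server[i % 4] += 1

print(per_server)  # block counts differ by at most 1
```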

Question 10

What does a metadata DB contain?

Locations of the file blocks

File creation time

File permissions

Select the correct statement about HDFS:

A client requires access to all the servers to read files

If you have a very important file, what is the best way to protect it in HDFS?

Both ways are allowed and implemented in HDFS

Question 13

You were told that two servers in HDFS were down: a Datanode and the Namenode. Your reaction:

Restore Namenode first


Question 14

What does the block size in HDFS NOT depend on?

The block size of the local Datanode's filesystem

Input:

Capacity = 1 PB

Block size = 64 MB

Metadata per block = 300 B

Replication factor = 3

Metadata size = Capacity / (Block size × Replication factor) × Metadata per block

= 1 000 000 000 MB / (64 MB × 3) × 0.0003 MB

= 1562.5 MB = 1.5625 GB ≈ 1.6 GB
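The calculation above, written out in code (integer arithmetic, units in bytes):

```python
# Namenode metadata estimate for a 1 PB cluster
capacity = 10**15          # 1 PB
block_size = 64 * 10**6    # 64 MB
replication = 3
meta_per_block = 300       # 300 B of metadata per stored block

# capacity is split into replicated blocks; each costs meta_per_block bytes
metadata_bytes = capacity * meta_per_block // (block_size * replication)
print(metadata_bytes / 10**9)  # 1.5625 (GB)
```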

Speed = 60 MB/s

Seek time = 5 ms

1 s = 1000 ms, so with a 1-second transfer the 5 ms seek is 1000 / 5 = 200 times smaller than the read time (only 0.5% overhead).

For an average reading speed of 60 MB/s, the minimum block size should be 60 MB/s × 1 s = 60 MB, i.e. ~64 MB.
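The same block-size estimate as a small sketch (the 200× target ratio is the assumption carried over from the numbers above):

```python
# Pick a block size large enough that transfer time dominates seek time
speed_mb_s = 60      # average read speed, MB/s
seek_ms = 5          # disk seek time, ms

# with a 1-second transfer, the seek is 1000 / 5 = 200 times smaller
ratio = 1000 // seek_ms          # 200
block_mb = speed_mb_s * 1        # 60 MB read in that 1 second
print(ratio, block_mb)  # 200 60
```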

touch test.txt

mkdir /user/acachuan/assignment1

put test.txt /user/acachuan/assignment1

ls -h /user/acachuan/assignment1

chmod ugo-rx /user/acachuan/assignment1/test.txt

head -10 /user/acachuan/assignment1/test.txt

mv test.txt test2.txt

hdfs dfs -df -h /data/wiki/en_articles_part/articles-part

hdfs dfs -du -h /data/wiki/en_articles_part/articles-part

Local FS

! ls /home/jovyan

# Create test.txt in local home dir

! touch ~/test.txt

with open('/home/jovyan/test.txt', 'w') as myFile:
    for i in range(21):
        myFile.write('%d\n' % i)

! ls /home/jovyan

! cat /home/jovyan/test.txt

Hadoop FS

! hdfs dfs -ls /user/jovyan/

Found 2 items
-rw-r--r-- 1 jovyan supergroup 239 2017-11-28 21:41 /user/jovyan/README.md

# Create assignment1 dir in hdfs

! hdfs dfs -mkdir /user/jovyan/assignment1

! hdfs dfs -ls /user/jovyan/

Found 2 items

-rw-r--r-- 1 jovyan supergroup 239 2017-11-28 21:41 /user/jovyan/README.md

drwxr-xr-x - jovyan supergroup 0 2018-05-02 00:04 /user/jovyan/assignment1

# Put test.txt into assignment1

! hdfs dfs -put ~/test.txt /user/jovyan/assignment1

! hdfs dfs -ls /user/jovyan/assignment1

Found 1 items

-rw-r--r-- 1 jovyan supergroup 53 2018-05-02 00:09 /user/jovyan/assignment1/test.txt

# output the size and the owner of the file

! hdfs dfs -du /user/jovyan/assignment1/test.txt

! hdfs dfs -ls /user/jovyan/assignment1

53 /user/jovyan/assignment1/test.txt

Found 1 items

-rw-r--r-- 1 jovyan supergroup 53 2018-05-02 00:09 /user/jovyan/assignment1/test.txt

# revoke ‘read’ permission for ‘other users’

! hdfs dfs -chmod o-r /user/jovyan/assignment1/test.txt


! hdfs dfs -ls /user/jovyan/assignment1

Found 1 items

-rw-r----- 1 jovyan supergroup 53 2018-05-02 00:09 /user/jovyan/assignment1/test.txt

# read the first 10 lines of the file

! hdfs dfs -cat /user/jovyan/assignment1/test.txt | head

# rename ‘test.txt’ to ‘test2.txt’

! hdfs dfs -mv /user/jovyan/assignment1/test.txt /user/jovyan/assignment1/test2.txt

! hdfs dfs -ls /user/jovyan/assignment1

Found 1 items

-rw-r----- 1 jovyan supergroup 53 2018-05-02 00:09 /user/jovyan/assignment1/test2.txt

Check the commands; they should look like these:


$ hdfs dfs -mkdir assignment1

$ hdfs dfs -put test.txt assignment1/

$ hdfs dfs -ls assignment1/test.txt or hdfs dfs -stat "%b %u" assignment1/test.txt

$ hdfs dfs -chmod o-r assignment1/test.txt

$ hdfs dfs -cat assignment1/test.txt | head -10

$ hdfs dfs -mv assignment1/test.txt assignment1/test2.txt

$ hdfs fsck /data/wiki/en_articles_part/articles-part -files -blocks -locations

$ hdfs fsck -blockId blk_1073971670

hdfs dfs -text /datalake/demo/* | head -c 80
