Imagine that your application needs to associate various data with users, e.g. their
preferences, behavioural information and so on, and you want to persist user profiles on
disk. Given the choice between CSV and JSON, which format would you choose?
JSON, obviously
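A minimal sketch of why JSON suits this case: a profile is nested and its fields vary per user, which a flat CSV row cannot express without inventing extra conventions. The field names below are hypothetical and only serve the illustration.

```python
import json

# Hypothetical user profile; the field names are illustrative assumptions.
profile = {
    "user_id": 42,
    "preferences": {"language": "en", "notifications": ["email", "push"]},
    "behaviour": {"last_login": "2017-11-28", "visited_pages": 17},
}

# One self-describing record per line: nesting and lists survive the round trip.
line = json.dumps(profile)
restored = json.loads(line)
```

Storing one JSON document per line keeps the file splittable by line while still allowing each profile to carry a different set of fields.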
Why are columnar file formats used in data warehousing? Mark all the correct answers.
Columnar stores allow more efficient slicing of data (both horizontal and vertical).
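The slicing argument can be sketched with plain Python structures. This is only a toy model of the two layouts, not how any real columnar format is implemented on disk, and the sample records are made up.

```python
# Row-oriented layout: every record keeps all of its fields together.
rows = [
    {"user": "a", "country": "PE", "clicks": 3},
    {"user": "b", "country": "RO", "clicks": 5},
    {"user": "c", "country": "PE", "clicks": 1},
]

# Column-oriented layout: the values of each column are stored contiguously.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Vertical slice: an aggregate touches only the one column it needs.
total_clicks = sum(columns["clicks"])

# Horizontal slice: a contiguous range of row positions, taken per column.
first_two = {name: values[:2] for name, values in columns.items()}
```

In the row layout the same aggregate would have to read every field of every record just to reach `clicks`.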
Compared to text formats, why is the SequenceFile format faster? Mark all the correct
statements.
Serialized data occupies less disk space thus saving I/O time.
Simplified grammar (not tracking paired quotes, brackets, and so on) leads to streamlined,
unconditional code.
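Both statements can be illustrated with a small comparison of a text encoding and a fixed-width binary encoding. This is not the actual SequenceFile layout, only a sketch of the general idea; the sample values (13-digit integers resembling millisecond timestamps) are an assumption.

```python
import struct

# Assumed sample data: 1000 large integers, e.g. millisecond timestamps.
values = [1500000000000 + i for i in range(1000)]

# Text encoding: one decimal number per line (13 digits + newline = 14 bytes each).
text = "".join("%d\n" % v for v in values).encode("ascii")

# Binary encoding: each value as a fixed-width little-endian 64-bit integer.
binary = struct.pack("<%dq" % len(values), *values)

# Decoding the binary form needs no scanning for delimiters or quotes:
# the record boundaries are known from the fixed width alone.
decoded = list(struct.unpack("<%dq" % len(values), binary))
```

Here the text form takes 14000 bytes and the binary form 8000 bytes. For small numbers text can be the more compact of the two, so the saving depends on the data.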
True or False? Optimizing the computation itself (optimizing the computation time) will not help
to reduce the completion time of an I/O-bound process.
False
True or False? Switching from compressed to uncompressed data for a CPU-bound process may
increase the completion time despite the saved CPU time.
True
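A back-of-the-envelope model of this trade-off, assuming the process first reads its input and then computes. All sizes, speeds, and CPU times below are hypothetical numbers chosen for illustration.

```python
def completion_time(size_mb, disk_mb_per_s, cpu_s):
    """Sequential model: read the whole input, then compute on it."""
    return size_mb / disk_mb_per_s + cpu_s

# Compressed input: small read (10 s), but 30 s of CPU including decompression.
compressed = completion_time(size_mb=1000, disk_mb_per_s=100, cpu_s=30)

# Uncompressed input: 10 s of CPU saved, but four times as much data to read (40 s).
uncompressed = completion_time(size_mb=4000, disk_mb_per_s=100, cpu_s=20)
```

With these numbers the compressed run takes 40 s and the uncompressed run 60 s: the process stops being CPU-bound and becomes I/O-bound, so the saved CPU time is outweighed by the extra reading.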
Which fact is more relevant to the horizontal scaling of filesystems than to vertical
scaling?
Modifying files is not allowed in distributed file systems (GFS, HDFS). What was NOT a reason
for this restriction?
Question 10
File permissions
If you have a very important file, what is the best way to protect it in HDFS?
Question 13
You are told that two servers in HDFS are down, a Datanode and the Namenode. What is your reaction?
Input
Capacity = 1 PB
Metadata = 300 B
Rep. factor = 3
≈ 1.6 GB
Speed = 60 MB/s
1 s = 1000 ms
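The ≈ 1.6 GB figure is consistent with a Namenode memory estimate of the following form. The notes do not state a block size, so the 64 MB used here is an assumption, as is reading the 300 B as metadata per block; under those assumptions the numbers line up.

```python
capacity_bytes = 10**15           # 1 PB of raw disk capacity (from the notes)
replication = 3                   # replication factor (from the notes)
metadata_per_block = 300          # bytes; ASSUMED to be Namenode RAM per block
block_size_bytes = 64 * 10**6     # ASSUMPTION: 64 MB blocks, not stated in the notes

# Each logical block occupies block_size * replication bytes of raw capacity.
blocks = capacity_bytes / (block_size_bytes * replication)

# Namenode RAM needed to keep the metadata for all blocks, in GB.
namenode_ram_gb = blocks * metadata_per_block / 10**9
```

This gives about 1.56 GB, which rounds to the ≈ 1.6 GB recorded above. With 128 MB blocks the same formula would give roughly half of that.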
touch test.txt
mkdir /user/acachuan/assignment1
ls -h /user/acachuan/assignment1
mv test.txt test2.txt
Local FS
! ls /home/jovyan
! touch ~/test.txt
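# Write the integers 0..20 to the file, one per line
with open('/home/jovyan/test.txt', 'w') as myFile:
    for i in range(21):
        myFile.write('%d\n' % i)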
! ls /home/jovyan
! cat /home/jovyan/test.txt
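As a sanity check on the loop above, the expected size of the file can be computed without touching the disk. The sketch assumes the file holds exactly the integers 0 to 20, one per line, in ASCII.

```python
# Rebuild the expected file content in memory: "0\n1\n...20\n".
content = ''.join('%d\n' % i for i in range(21))

# Ten one-digit numbers (2 bytes each with the newline) plus
# eleven two-digit numbers (3 bytes each) give the total size.
size_bytes = len(content.encode('ascii'))
```

The result is 53 bytes, which agrees with the 53 reported for /user/jovyan/assignment1/test.txt in the Hadoop FS output, presumably the same file after upload.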
Hadoop FS
Found 2 items
-rw-r--r-- 1 jovyan supergroup 239 2017-11-28 21:41 /user/jovyan/README.md
Found 2 items
Found 1 items
53 /user/jovyan/assignment1/test.txt
Found 1 items
$ hdfs dfs -ls assignment1/test.txt
or
$ hdfs dfs -stat "%b %u" assignment1/test.txt