15IT423E Data Science and Big Data Analytics

Assignment – 3
Due Date: 26th April 2019
Max Marks: 35

Note: Copied assignments will be given zero marks.

Q1. Consider a disk with a sector size of 1MB; a sector is the smallest unit of data transfer to the disk. If
the disk transfer rate is 128MB/s, how long does it take to transfer a 256MB file that is stored
sequentially? If it is not stored sequentially (i.e. it is randomly split up across the disk), what is the worst-case
time to transfer the file? Assume that the seek time and rotational delay of the disk total 10ms.
(5 marks)
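A minimal Python sketch of the Q1 arithmetic, assuming the 10ms seek plus rotational delay is paid once for the sequentially stored file, and once per 1MB sector in the worst random case (every sector sits somewhere else on the disk). The function names are illustrative only.

```python
import math


def sequential_ms(file_mb, rate_mb_s, seek_ms):
    # One seek/rotational delay, then a single contiguous transfer.
    return seek_ms + file_mb / rate_mb_s * 1000


def worst_case_random_ms(file_mb, sector_mb, rate_mb_s, seek_ms):
    # Worst case: every sector is stored elsewhere, so each one pays a full seek.
    sectors = math.ceil(file_mb / sector_mb)
    per_sector_ms = seek_ms + sector_mb / rate_mb_s * 1000
    return sectors * per_sector_ms


if __name__ == "__main__":
    print(sequential_ms(256, 128, 10))            # 2010.0
    print(worst_case_random_ms(256, 1, 128, 10))  # 4560.0
```

Under these assumptions the sequential read takes about 2.01s, while the worst-case random layout takes about 4.56s, because 256 extra seeks add 2.56s on top of the same 2s of raw transfer.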

Q2. Two processes A and B access the same disk (a normal HDD) simultaneously. The disk head
alternates between transferring 100 blocks for A and 100 blocks for B (the disk block size is 512 bytes). If the disk seek time is
around 10ms and the transfer rate is 100MB/s, what is the effective transfer speed seen by A and by B?
(5 marks)
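One way to model Q2, sketched in Python under stated assumptions: 1MB = 10^6 bytes, every switch between A and B costs one full 10ms seek, and writes take as long as reads. The function name and parameters are hypothetical.

```python
def effective_rates_mb_s(blocks_per_turn=100, block_bytes=512,
                         seek_ms=10.0, rate_mb_s=100.0, n_procs=2):
    # Bytes moved for one process before the head switches to the other.
    bytes_per_turn = blocks_per_turn * block_bytes
    # Time to transfer those bytes at the raw rate (assumes 1MB = 1e6 bytes).
    transfer_ms = bytes_per_turn / (rate_mb_s * 1e6) * 1000
    # Each turn pays one seek plus its transfer; a full cycle serves every process once.
    cycle_ms = n_procs * (seek_ms + transfer_ms)
    per_process = (bytes_per_turn / 1e6) / (cycle_ms / 1000)
    return per_process, n_procs * per_process


if __name__ == "__main__":
    each, combined = effective_rates_mb_s()
    print(f"each process: {each:.2f} MB/s, disk total: {combined:.2f} MB/s")
```

With these numbers each process sees roughly 2.4MB/s (about 4.9MB/s for the disk as a whole), far below the raw 100MB/s, because the 10ms seek dwarfs the roughly 0.5ms spent transferring each batch of 100 blocks.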

Q3. Consider a cluster whose disks have a read speed of 128MB/s and a seek time of 10ms. What
should the HDFS block size be so that the seek time is less than 1% of the total time needed to read a block
from the disk?
(5 marks)
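A sketch of the Q3 bound, assuming "total read time" means seek time plus transfer time. Writing s for the seek time, r for the read speed, B for the block size and f for the allowed fraction, the condition s ≤ f(s + B/r) rearranges to B ≥ r·s·(1 - f)/f. The helper names below are hypothetical.

```python
import math


def min_block_mb(seek_ms=10.0, rate_mb_s=128.0, max_seek_fraction=0.01):
    # seek <= f * (seek + B / rate)  rearranges to  B >= rate * seek * (1 - f) / f
    seek_s = seek_ms / 1000
    f = max_seek_fraction
    return rate_mb_s * seek_s * (1 - f) / f


def next_power_of_two(x):
    # Block sizes are conventionally powers of two, so round the bound up.
    return 2 ** math.ceil(math.log2(x))


if __name__ == "__main__":
    bound = min_block_mb()
    print(f"minimum block size: {bound:.2f} MB -> {next_power_of_two(bound)} MB")
```

The bound comes out at about 126.7MB, which rounds up to 128MB. If the 1% is instead measured against the transfer time alone, the bound is r·s/f, exactly 128MB.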

Q4. Assume a 400-node cluster with 16TB of disk space per node, a block size of 128MB and a
replication factor of 3. If, on average, 256 bytes of namenode main memory are required for each
block in HDFS, what should the minimum size of the namenode memory be for the cluster (assuming only a
single primary namenode in the cluster)?
(5 marks)
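A sketch of one reading of Q4, assuming binary units (1TB = 2^40 bytes, 1MB = 2^20 bytes), completely full disks, and that the 256 bytes are charged once per logical block, so the raw block count is divided by the replication factor. The function name is hypothetical.

```python
import math


def namenode_memory_bytes(nodes=400, disk_tb_per_node=16, block_mb=128,
                          replication=3, bytes_per_block=256):
    raw_bytes = nodes * disk_tb_per_node * 2**40
    block_bytes = block_mb * 2**20
    # Number of block replicas that fit on all disks combined.
    replicas = raw_bytes // block_bytes
    # Each logical block is stored `replication` times.
    logical_blocks = math.ceil(replicas / replication)
    return logical_blocks * bytes_per_block


if __name__ == "__main__":
    mem = namenode_memory_bytes()
    print(f"{mem} bytes = {mem / 2**30:.2f} GiB")
```

This gives 52,428,800 replicas, about 17.5 million logical blocks, and roughly 4.17GiB of namenode memory. If the 256 bytes were instead charged per replica, passing replication=1 gives 12.5GiB.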

Q5. Illustrate how MapReduce can be used to count the occurrences of each word in the text:
“Give a man a fish and you feed him for a day; teach a man to fish and he will eat forever”
Assume 3 mappers and 3 reducers. The words are partitioned among the reducers by first letter into 3 ranges: a-d, e-l and m-z.
(5 marks)
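A single-process Python simulation of the Q5 data flow, as a sketch only: it assumes the text is lowercased with punctuation dropped and that the 3 input splits are contiguous runs of roughly equal numbers of words. Map emits (word, 1), a range partitioner routes each word by its first letter, the shuffle groups values by key, and each reducer sums its groups.

```python
import re
from collections import defaultdict

TEXT = ("Give a man a fish and you feed him for a day; "
        "teach a man to fish and he will eat forever")


def split_input(text, n_mappers=3):
    # Assumption: lowercase, keep only letters, split into contiguous chunks.
    words = re.findall(r"[a-z]+", text.lower())
    size = -(-len(words) // n_mappers)  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def mapper(words):
    # Map phase: emit (word, 1) for every word in this split.
    return [(w, 1) for w in words]


def partition(word):
    # Range partitioner: a-d -> reducer 0, e-l -> reducer 1, m-z -> reducer 2.
    if word[0] <= "d":
        return 0
    if word[0] <= "l":
        return 1
    return 2


def run(text):
    # Shuffle: route each pair to its reducer, grouping values by key.
    reducer_inputs = [defaultdict(list) for _ in range(3)]
    for split in split_input(text):
        for word, one in mapper(split):
            reducer_inputs[partition(word)][word].append(one)
    # Reduce: each reducer sums the counts for its keys, in sorted key order.
    return [{w: sum(v) for w, v in sorted(groups.items())}
            for groups in reducer_inputs]


if __name__ == "__main__":
    for i, output in enumerate(run(TEXT)):
        print(f"reducer {i}: {output}")
```

Reducer 0 ends up with a, and and day; reducer 1 with eat, feed, fish, for, forever, give, he and him; reducer 2 with man, teach, to, will and you, for 22 words in total.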

Q6. Consider a Hadoop system with 25 jobs, each job requiring 15 map tasks and 5 reduce tasks.
Compare classic MapReduce with MapReduce 2 (YARN) in terms of the number of entities that the
JobTracker needs to communicate with in classic MapReduce, versus the number of
entities that the Resource Manager and the Application Master need to communicate with in
YARN (compute the approximate number in each case). For simplicity, assume that each
node can run at most 5 tasks, for both classic MapReduce and YARN.
(10 marks)
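A rough counting sketch for Q6, using one possible model among several: all 500 tasks run at once, tasks are packed 5 per node, the JobTracker talks to every TaskTracker and tracks every task itself, the Resource Manager talks to every NodeManager and to one Application Master per job, and each Application Master talks to its own tasks and the NodeManagers hosting them. The container occupied by the Application Master itself is ignored, and the function names are hypothetical.

```python
import math


def classic_counts(jobs=25, maps=15, reduces=5, tasks_per_node=5):
    tasks = jobs * (maps + reduces)
    nodes = math.ceil(tasks / tasks_per_node)
    # The single JobTracker heartbeats with every TaskTracker and
    # schedules and monitors every task of every job itself.
    return {"tasktrackers": nodes, "tasks": tasks, "total": nodes + tasks}


def yarn_counts(jobs=25, maps=15, reduces=5, tasks_per_node=5):
    tasks_per_job = maps + reduces
    nodes = math.ceil(jobs * tasks_per_job / tasks_per_node)
    nodes_per_job = math.ceil(tasks_per_job / tasks_per_node)
    return {
        # Resource Manager: one NodeManager per node plus one Application Master per job.
        "rm_total": nodes + jobs,
        # Each Application Master: its own tasks plus the NodeManagers running them.
        "am_total": tasks_per_job + nodes_per_job,
    }


if __name__ == "__main__":
    print("classic:", classic_counts())
    print("yarn:", yarn_counts())
```

Under this model the JobTracker deals with about 600 entities (100 TaskTrackers and 500 tasks), whereas the Resource Manager deals with about 125 (100 NodeManagers and 25 Application Masters) and each Application Master with about 24 (20 tasks on 4 NodeManagers), which illustrates how YARN spreads per-task bookkeeping across the Application Masters.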
