Big Data Analysis (old syllabus)

IIT Gandhinagar
Department of Computer Science & Engineering

Prerequisites:
Knowledge of basic analytical algorithms and graph theory; programming experience in any object-oriented language.

UNIT I - INTRODUCTION TO BIG DATA (3 hours)
Introduction, distributed file systems, Big Data and its importance, the five Vs, drivers for Big Data, Big Data analytics,
Big Data applications, algorithms using MapReduce, and applications such as matrix-vector multiplication with MapReduce.
UNIT II - HADOOP ARCHITECTURE (3 hours)
Apache Hadoop, moving data into and out of Hadoop, MapReduce, data serialization, Hadoop architecture, Hadoop
storage: HDFS, common Hadoop shell commands, anatomy of a file write and read, NameNode, Secondary NameNode
and DataNode, the Hadoop MapReduce paradigm, Map and Reduce tasks, JobTracker and TaskTrackers, Hadoop
configuration, HDFS administration, monitoring and maintenance, Hive architecture and installation, comparison
with traditional databases, HiveQL.
UNIT III - MACHINE LEARNING (8 hours)
Soft computing: neural networks, fuzzy logic systems and support vector machines; basic mathematics of soft
computing; learning and statistical approaches to regression and classification; support vector machines;
single-layer networks: the perceptron, the adaptive linear neuron (Adaline) and the least mean square (LMS)
algorithm; multilayer perceptron: basics of deep learning.
UNIT IV - PROCESSING & STORING STREAMING DATA (9 hours)
Distributed stream data processing: coordination, partitions and merges, transactions. Duplicate detection using
Bloom filters, Apache Spark Streaming examples, choosing a storage system, NoSQL storage systems, visualizing
data, mobile streaming apps.
UNIT V - PARALLEL PROGRAMMING (9 hours)
Parallel programming with the Message Passing Interface (MPI): compiling and running MPI programs, implementation
of MPI for clusters, dynamic process management, fault tolerance, remote memory access (RMA), performance
measurement. Parallel Virtual Machine (PVM): overview, setup, console details, extended PVM.
UNIT VI - CLOUD COMPUTING (9 hours)
Cloud-enabling technologies, characteristics of cloud computing, benefits of cloud computing, cloud service
models, cloud deployment models, cloud computing infrastructure, cloud challenges. Understanding IaaS: improving
performance through load balancing, server types within IaaS solutions, utilizing cloud-based NAS devices,
understanding cloud-based data storage, cloud-based backup devices.
UNIT VII - BIG DATA PRIVACY, ETHICS AND SECURITY (9 hours)
Privacy: re-identification of anonymous people. Hadoop Kerberos security implementation and configuration,
integrating Hadoop with enterprise security systems, securing sensitive data in Hadoop, security information and
event management (SIEM) systems, setting up audit logging in a Hadoop cluster.
Books and References:
1. Chris Eaton, Dirk Deroos et al., “Understanding Big Data”, McGraw Hill, 2012.
2. Tom White, “Hadoop: The Definitive Guide”, O’Reilly, 2012.
3. Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, 3rd ed., 2010.
4. Ben Spivey and Joey Echeverria, “Hadoop Security: Protecting Your Big Data Platform”, O’Reilly Media, 2015.
5. Anthony T. Velte, Toby J. Velte and Robert Elsenpeter, “Cloud Computing: A Practical Approach”, Tata McGraw Hill Edition, Fourth Reprint, 2010.
6. Kris Jamsa, “Cloud Computing: SaaS, PaaS, IaaS, Virtualization, Business Models, Mobile, Security and More”, Jones & Bartlett Learning Company LLC, 2013.
7. Rajkumar Buyya, “High Performance Cluster Computing: Programming and Applications”, Vol. 2, Prentice Hall PTR, NJ, USA, 1999.

S-ar putea să vă placă și