
1. What is Big Data?

Any data that cannot be stored in a traditional RDBMS is termed Big Data. As we know,
most of the data that we use today has been generated in the past 20 years, and this data is
mostly unstructured or semi-structured in nature. More than the volume of the data, it is the
nature of the data that determines whether it is considered Big Data or not.


2. What do the four Vs of Big Data denote?


IBM has a nice, simple explanation for the four critical features of big data:
a) Volume - scale of data
b) Velocity - analysis of streaming data
c) Variety - different forms of data
d) Veracity - uncertainty of data


3. How does big data analysis help businesses increase their revenue? Give an
example.
Big data analysis is helping businesses differentiate themselves. For example, Walmart - the
world's largest retailer in 2014 in terms of revenue - is using big data analytics to increase its
sales through better predictive analytics, providing customized recommendations and
launching new products based on customer preferences and needs. Walmart observed a
significant 10% to 15% increase in online sales, worth $1 billion in incremental revenue. There
are many more companies like Facebook, Twitter, LinkedIn, Pandora, JPMorgan Chase,
Bank of America, etc. using big data analytics to boost their revenue.


4. Name some companies that use Hadoop.


Yahoo (one of the biggest users and a contributor of more than 80% of the Hadoop code)
Facebook
Netflix
Amazon
Adobe

eBay
Hulu
Spotify
Rubikloud
Twitter

5. Differentiate between structured and unstructured data.
Data that can be stored in traditional database systems in the form of rows and columns,
for example online purchase transactions, is referred to as structured data. Data that can
be stored only partially in traditional database systems, for example data in XML
records, is referred to as semi-structured data. Unorganized and raw data that cannot
be categorized as semi-structured or structured data is referred to as unstructured data.
Facebook updates, tweets on Twitter, reviews, web logs, etc. are all examples of
unstructured data.

6. On what concepts does the Hadoop framework work?


The Hadoop framework works on the following two core components-

1) HDFS - Hadoop Distributed File System is the Java-based file system for scalable and
reliable storage of large datasets. Data in HDFS is stored in the form of blocks and it
operates on a master-slave architecture.

2) Hadoop MapReduce - This is the Java-based programming paradigm of the Hadoop framework
that provides scalability across various Hadoop clusters. MapReduce distributes the
workload into various tasks that can run in parallel. A Hadoop job performs two separate
tasks: a map job and a reduce job. The map job breaks down the data sets into key-value
pairs or tuples. The reduce job then takes the output of the map job and combines the data
tuples into a smaller set of tuples. The reduce job is always performed after the map job.
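To make the map and reduce jobs concrete, here is a minimal, hedged word count sketch using the standard org.apache.hadoop.mapreduce API; the class name is hypothetical and the input/output HDFS paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map job: break each input line into (word, 1) key-value pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce job: combine the tuples for each word into a smaller set of (word, total) tuples.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}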


7) What are the main components of a Hadoop Application?


Hadoop applications draw on a wide range of technologies that provide great advantages in
solving complex business problems.

Core components of a Hadoop application are-

1) Hadoop Common

2) HDFS

3) Hadoop MapReduce

4) YARN

Data Access Components are - Pig and Hive

Data Storage Component is - HBase

Data Integration Components are - Apache Flume, Sqoop, Chukwa

Data Management and Monitoring Components are - Ambari, Oozie and Zookeeper.

Data Serialization Components are - Thrift and Avro

Data Intelligence Components are - Apache Mahout and Drill.

8. What is Hadoop streaming?


The Hadoop distribution has a generic application programming interface for writing map and
reduce jobs in any desired programming language like Python, Perl, Ruby, etc. This is
referred to as Hadoop Streaming. Users can create and run jobs with any kind of shell
script or executable as the mapper or reducer.

9. What is the best hardware configuration to run Hadoop?


A good configuration for executing Hadoop jobs is dual-core machines or dual processors
with 4 GB or 8 GB of RAM that use ECC memory. Hadoop benefits greatly from ECC
memory, even though it is not low-end. ECC memory is recommended for running Hadoop
because most Hadoop users have experienced various checksum errors when using non-ECC
memory. However, the hardware configuration also depends on the workflow
requirements and can change accordingly.

10. What are the most commonly defined input formats in Hadoop?
The most common input formats defined in Hadoop are:

Text Input Format - This is the default input format in Hadoop; files are broken
down into lines and each line becomes a record.

Key Value Input Format - This input format is used for plain text files in which
each line is split into a key and a value.

Sequence File Input Format - This input format is used for reading files in
Hadoop's binary SequenceFile format (see the driver sketch below).
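As a rough illustration (not from the original text), the sketch below shows how one of these input formats might be selected on a Job using the org.apache.hadoop.mapreduce API; the job name is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class InputFormatChoice {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "input format demo"); // hypothetical job name

    // Default: TextInputFormat - each line becomes a (byte offset, line text) record.
    job.setInputFormatClass(TextInputFormat.class);

    // Alternative: each line is split into a (key, value) pair on a separator (tab by default).
    // job.setInputFormatClass(KeyValueTextInputFormat.class);

    // Alternative: reads key-value records from Hadoop's binary SequenceFile format.
    // job.setInputFormatClass(SequenceFileInputFormat.class);
  }
}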
We have further categorized Big Data Interview Questions for Freshers and Experienced-

Hadoop Interview Questions and Answers for Freshers - Q.Nos- 1,2,4,5,6,7,8,9

Hadoop Interview Questions and Answers for Experienced - Q.Nos- 3,8,9,10


Hadoop HDFS Interview Questions and Answers
1. What is a block and block scanner in HDFS?
Block - The minimum amount of data that can be read or written is generally referred to as a
block in HDFS. The default block size in HDFS is 64 MB in Hadoop 1.x (128 MB in Hadoop 2.x).

Block Scanner - The block scanner tracks the list of blocks present on a DataNode and verifies
them to find any kind of checksum errors. Block scanners use a throttling mechanism to
conserve disk bandwidth on the DataNode.

2. Explain the difference between NameNode, Backup Node and
Checkpoint NameNode.
NameNode: The NameNode is at the heart of the HDFS file system. It manages the
metadata, i.e. the data of the files is not stored on the NameNode; rather, it holds the
directory tree of all the files present in the HDFS file system on a Hadoop cluster. The
NameNode uses two files for the namespace-

fsimage file - it keeps track of the latest checkpoint of the namespace.

edits file - it is a log of changes that have been made to the namespace since the last checkpoint.

Checkpoint Node:
The Checkpoint Node keeps track of the latest checkpoint in a directory that has the same structure
as that of the NameNode's directory. The Checkpoint Node creates checkpoints for the namespace
at regular intervals by downloading the edits and fsimage files from the NameNode and
merging them locally. The new image is then uploaded back to the active NameNode.

Backup Node:
The Backup Node also provides checkpointing functionality like that of the Checkpoint Node, but it
also maintains an up-to-date in-memory copy of the file system namespace that is in sync
with the active NameNode.

3. What is commodity hardware?


Commodity hardware refers to inexpensive systems that do not have high availability or high
quality. Commodity hardware includes RAM because there are specific services that need
to be executed in RAM. Hadoop can run on any commodity hardware and does not
require supercomputers or high-end hardware configuration to execute jobs.

4. What is the port number for NameNode, Task Tracker and Job
Tracker?
NameNode 50070

Job Tracker 50030

Task Tracker 50060

5. Explain about the process of inter cluster data copying.


HDFS provides a distributed data copying facility through DistCP from source to
destination. When data is copied across Hadoop clusters, this is referred to as inter-cluster
data copying. DistCP requires both the source and the destination to have a compatible or
identical version of Hadoop.

6. How can you overwrite the replication factors in HDFS?


The replication factor in HDFS can be modified or overwritten in 2 ways-

1) Using the Hadoop FS shell, the replication factor can be changed on a per-file basis using the
command below-

$ hadoop fs -setrep -w 2 /my/test_file (test_file is the filename whose replication factor will
be set to 2)

2) Using the Hadoop FS shell, the replication factor of all files under a given directory can be
modified using the command below-

$ hadoop fs -setrep -w 5 /my/test_dir (test_dir is the name of the directory and all the files
in this directory will have a replication factor set to 5)

7. Explain the difference between NAS and HDFS.

NAS runs on a single machine and thus there is no data
redundancy, whereas HDFS runs on a cluster of machines and thus there is
data redundancy because of the replication protocol.

NAS stores data on dedicated hardware, whereas in HDFS all the data
blocks are distributed across the local drives of the machines.

In NAS, data is stored independently of the computation and hence Hadoop
MapReduce cannot be used for processing, whereas HDFS works with Hadoop
MapReduce as the computation is moved to the data.
8. Explain what happens if during the PUT operation, HDFS block is
assigned a replication factor 1 instead of the default value 3.
The replication factor is a property of HDFS that can be set for the entire cluster to
control the number of times the blocks are replicated, ensuring high data availability.
For every block that is stored in HDFS, the cluster will have n-1 duplicated blocks. So, if the
replication factor during the PUT operation is set to 1 instead of the default value 3, then there
will be a single copy of the data. Under these circumstances, if the DataNode holding that
copy crashes, the data would be lost.

9. What is the process to change files at arbitrary locations in HDFS?
HDFS does not support modifications at arbitrary offsets in a file or multiple writers; files
are written by a single writer in append-only format, i.e. writes to a file in HDFS are always
made at the end of the file.

10. Explain about the indexing process in HDFS.


Indexing process in HDFS depends on the block size. HDFS stores the last part of the data
that further points to the address where the next part of data chunk is stored.

11. What is rack awareness and on what basis is data stored in a rack?
All the data nodes put together form a storage area, i.e. the physical location of the data
nodes is referred to as a rack in HDFS. The rack information, i.e. the rack id of each data node,
is acquired by the NameNode. The process of selecting closer data nodes based on the
rack information is known as rack awareness.

The contents of the file are divided into data blocks as soon as the client is ready to
load the file into the Hadoop cluster. After consulting with the NameNode, the client allocates 3
data nodes for each data block. For each data block, two copies exist in one rack and
the third copy is present in another rack. This is generally referred to as the Replica
Placement Policy.

We have further categorized Hadoop HDFS Interview Questions for Freshers and
Experienced-

Hadoop Interview Questions and Answers for Freshers - Q.Nos- 2,3,7,9,10,11

Hadoop Interview Questions and Answers for Experienced - Q.Nos- 1,2,4,5,6,7,8
Hadoop MapReduce Interview Questions and Answers
1. Explain the usage of Context Object.
The Context object is used to help the mapper interact with the rest of the Hadoop system. The
Context object can be used for updating counters, reporting progress and providing any
application-level status updates. It also holds the configuration details for the job and
the interfaces that help it generate output.
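A hedged sketch of a mapper using the Context object is shown below; the counter enum, configuration property and key layout are hypothetical and only illustrate the typical calls (counters, status, configuration, output).

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LogLineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

  // Hypothetical counter used only for illustration.
  enum LineCounters { EMPTY_LINES }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    if (line.isEmpty()) {
      // Update a counter through the Context object.
      context.getCounter(LineCounters.EMPTY_LINES).increment(1);
      return;
    }
    // Report an application-level status update through the Context object.
    context.setStatus("processing offset " + key.get());

    // Read job configuration details through the Context object (hypothetical property).
    String separator = context.getConfiguration().get("loglines.separator", "\t");

    // Emit output through the Context object.
    context.write(new Text(line.split(separator)[0]), new LongWritable(1L));
  }
}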

2. What are the core methods of a Reducer?


The 3 core methods of a reducer are

1) setup() - This method of the reducer is used for configuring various parameters like the
input data size, distributed cache, heap size, etc.

Function definition - protected void setup(Context context)

2) reduce() - This is the heart of the reducer; it is called once per key with the associated
list of values.

Function definition - protected void reduce(Key key, Iterable<Value> values, Context context)

3) cleanup() - This method is called only once at the end of the reduce task, for clearing
temporary files and releasing resources.

Function definition - protected void cleanup(Context context)
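Here is a hedged sketch showing all three methods on a concrete reducer; the threshold configuration property is hypothetical.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ThresholdSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  private int threshold; // read once in setup()

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Called once before any reduce() call: read configuration, open side resources, etc.
    threshold = context.getConfiguration().getInt("sum.threshold", 0); // hypothetical property
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // Called once per key with the iterable of values associated with that key.
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    if (sum >= threshold) {
      context.write(key, new IntWritable(sum));
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // Called once at the end of the reduce task: release resources, clean up temporary files, etc.
  }
}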

3. Explain about the partitioning, shuffle and sort phase


Shuffle Phase - Once the first map tasks are completed, the nodes continue to perform
several other map tasks and also exchange the intermediate outputs with the reducers as
required. This process of moving the intermediate outputs of map tasks to the reducers is
referred to as shuffling.
Sort Phase - Hadoop MapReduce automatically sorts the set of intermediate keys on a
single node before they are given as input to the reducer.
Partitioning Phase - The process that determines which intermediate key-value pairs will
be received by each reducer instance is referred to as partitioning. The destination partition
is the same for any given key irrespective of the mapper instance that generated it.
4. How to write a custom partitioner for a Hadoop MapReduce job?
Steps to write a custom partitioner for a Hadoop MapReduce job (see the sketch below)-

A new class must be created that extends the pre-defined Partitioner class.

The getPartition method of the Partitioner class must be overridden.

The custom partitioner can be added to the job as a config file in the
wrapper which runs Hadoop MapReduce, or it can be added
to the job by using the setPartitionerClass method of the Job class.
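A hedged sketch of these steps is shown below; the partitioning rule (first letter of the key) is purely hypothetical.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {

  // Route keys starting with a-m to partition 0 and all other keys to partition 1.
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    String k = key.toString();
    if (k.isEmpty()) {
      return 0;
    }
    char first = Character.toLowerCase(k.charAt(0));
    int partition = (first >= 'a' && first <= 'm') ? 0 : 1;
    return partition % numPartitions; // stay within the configured number of reducers
  }
}

// In the driver (wrapper) that runs the MapReduce job:
// job.setPartitionerClass(AlphabetPartitioner.class);
// job.setNumReduceTasks(2);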
5. What is the relationship between Job and Task in Hadoop?
A single job can be broken down into one or many tasks in Hadoop.

6. Is it important for Hadoop MapReduce jobs to be written in Java?

It is not necessary to write Hadoop MapReduce jobs in java but users can write MapReduce
jobs in any desired programming language like Ruby, Perl, Python, R, Awk, etc. through the
Hadoop Streaming API.

7. What is the process of changing the split size if there is limited


storage space on Commodity Hardware?
If there is limited storage space on commodity hardware, the split size can be changed by
implementing the Custom Splitter. The call to Custom Splitter can be made from the main
method.

8. What are the primary phases of a Reducer?


The 3 primary phases of a reducer are

1)Shuffle

2)Sort

3)Reduce

9. What is a TaskInstance?
The actual Hadoop MapReduce tasks that run on each slave node are referred to as task
instances. Every task instance has its own JVM process; by default, a new JVM process
is spawned for every task instance.

10. Can reducers communicate with each other?


Reducers always run in isolation and they can never communicate with each other as per
the Hadoop MapReduce programming paradigm.

We have further categorized Hadoop MapReduce Interview Questions for Freshers and
Experienced-

Hadoop Interview Questions and Answers for Freshers - Q.Nos- 2,5,6

Hadoop Interview Questions and Answers for Experienced - Q.Nos- 1,3,4,7,8,9,10


Hadoop HBase Interview Questions and Answers

1. When should you use HBase and what are the key components of
HBase?
HBase should be used when the big data application has-

1) a variable schema

2) data stored in the form of collections

3) a need for key-based access to data while retrieving.

Key components of HBase are

Region - this component contains the in-memory data store (MemStore) and the HFile.

Region Server - this monitors the Regions.

HBase Master - it is responsible for monitoring the region servers.

Zookeeper - it takes care of the coordination between the HBase Master component and the
client.

Catalog Tables - the two important catalog tables are ROOT and META. The ROOT table tracks
where the META table is, and the META table stores all the regions in the system.

2. What are the different operational commands in HBase at record
level and table level?
Record-level operational commands in HBase are - put, get, increment, scan and delete.

Table-level operational commands in HBase are - describe, list, drop, disable and scan.

3. What is Row Key?


Every row in an HBase table has a unique identifier known as the RowKey. It is used for
grouping cells logically and it ensures that all cells with the same RowKey are co-located on the
same server. The RowKey is internally represented as a byte array.

4. Explain the difference between RDBMS data model and HBase data
model.
RDBMS is a schema-based database whereas HBase has a schema-less data model.

RDBMS does not have support for in-built partitioning whereas in HBase there is automated
partitioning.

RDBMS stores normalized data whereas HBase stores de-normalized data.

5. Explain about the different catalog tables in HBase?


The two important catalog tables in HBase are ROOT and META. The ROOT table tracks where
the META table is, and the META table stores all the regions in the system.
6. What is column families? What happens if you alter the block size of
ColumnFamily on an already populated database?
The logical division of data is represented through a key known as a column family. Column
families constitute the basic unit of physical storage, on which compression features can be
applied. In an already populated database, when the block size of a column family is altered,
the old data will remain within the old block size whereas the new data that comes in will
take the new block size. When compaction takes place, the old data will take the new block
size so that the existing data is read correctly.

7. Explain the difference between HBase and Hive.


HBase and Hive are completely different Hadoop-based technologies - Hive is a data
warehouse infrastructure on top of Hadoop, whereas HBase is a NoSQL key-value store that
runs on top of Hadoop. Hive helps SQL-savvy people run MapReduce jobs, whereas
HBase supports 4 primary operations - put, get, scan and delete. HBase is ideal for real-time
querying of big data, whereas Hive is an ideal choice for analytical querying of data collected
over a period of time.

8. Explain the process of row deletion in HBase.


On issuing a delete command in HBase through the HBase client, data is not actually
deleted from the cells but rather the cells are made invisible by setting a tombstone marker.
The deleted cells are removed at regular intervals during compaction.
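For illustration, here is a hedged sketch of issuing a delete through the HBase Java client (HBase 1.x+ API); the table, column family, qualifier and row key names are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RowDeleteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("customer"))) { // hypothetical table
      // The delete only writes tombstone markers; the cells physically disappear
      // from the store files during a later major compaction.
      Delete delete = new Delete(Bytes.toBytes("row-0001"));            // hypothetical row key
      delete.addColumns(Bytes.toBytes("cf"), Bytes.toBytes("email"));   // hypothetical column
      table.delete(delete);
    }
  }
}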

9. What are the different types of tombstone markers in HBase for
deletion?

There are 3 different types of tombstone markers in HBase for deletion-

1) Family Delete Marker - this marker marks all the columns of a column family.

2) Version Delete Marker - this marker marks a single version of a column.

3) Column Delete Marker - this marker marks all the versions of a column.

10. Explain about HLog and WAL in HBase.


All edits in the HStore are stored in the HLog. Every region server has one HLog. The HLog
contains entries for the edits of all regions served by a particular region server. WAL
stands for Write Ahead Log; all the HLog edits are written to it immediately. In the case of
deferred log flush, WAL edits remain in memory until the flush period.

We have further categorized Hadoop HBase Interview Questions for Freshers and
Experienced-

Hadoop Interview Questions and Answers for Freshers - Q.Nos- 1,2,4,5,7

Hadoop Interview Questions and Answers for Experienced - Q.Nos- 2,3,6,8,9,10


Hadoop Sqoop Interview Questions and Answers
1. Explain about some important Sqoop commands other than import
and export.
Create Job (--create)
Here we are creating a job with the name myjob, which can import the table data from an
RDBMS table to HDFS. The following command is used to create a job that imports data
from the employee table in the db database to HDFS.

$ sqoop job --create myjob \

-- import \

--connect jdbc:mysql://localhost/db \

--username root \

--table employee -m 1

Verify Job (--list)


The --list argument is used to verify the saved jobs. The following command is used to verify the
list of saved Sqoop jobs.

$ sqoop job --list

Inspect Job (--show)


The --show argument is used to inspect or verify a particular job and its details. The following
command and its sample output are used to verify a job called myjob.

$ sqoop job --show myjob

Execute Job (--exec)


The --exec option is used to execute a saved job. The following command is used to execute a
saved job called myjob.

$ sqoop job --exec myjob

2. How Sqoop can be used in a Java program?


The Sqoop JAR should be included in the classpath of the Java code. After this, the
Sqoop.runTool() method must be invoked. The necessary parameters should be passed to
Sqoop programmatically, just as they would be on the command line.
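A hedged sketch of this is shown below; the connection string, credentials and table are hypothetical, and it assumes the org.apache.sqoop.Sqoop class from the Sqoop JAR is on the classpath.

import org.apache.sqoop.Sqoop;

public class SqoopImportFromJava {
  public static void main(String[] args) {
    // Build the same arguments that would be passed on the command line.
    String[] sqoopArgs = new String[] {
        "import",
        "--connect", "jdbc:mysql://localhost/db", // hypothetical connection string
        "--username", "root",                     // hypothetical credentials
        "--table", "employee",                    // hypothetical table
        "-m", "1"
    };
    int exitCode = Sqoop.runTool(sqoopArgs); // run the Sqoop import tool programmatically
    System.exit(exitCode);
  }
}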

3. What is the process to perform an incremental data load in Sqoop?


The process of performing an incremental data load in Sqoop is to synchronize the modified or
updated data (often referred to as delta data) from the RDBMS to Hadoop. The delta data can be
loaded through the incremental load command in Sqoop.

Incremental load can be performed by using the Sqoop import command or by loading the data
into Hive without overwriting it. The different attributes that need to be specified during
an incremental load in Sqoop are-

1) Mode (--incremental) - the mode defines how Sqoop will determine what the new rows are.
The mode can have the value append or lastmodified.

2) Col (--check-column) - this attribute specifies the column that should be examined to find
out the rows to be imported.

3) Value (--last-value) - this denotes the maximum value of the check column from the
previous import operation.

4. Is it possible to do an incremental import using Sqoop?


Yes, Sqoop supports two types of incremental imports-

1)Append

2)Last Modified

append should be used in the import command to insert only new rows, while lastmodified
should be used to capture both inserted and updated rows.

5. What is the standard location or path for Hadoop Sqoop scripts?


/usr/bin/Hadoop Sqoop

6. How can you check all the tables present in a single database using
Sqoop?
The command to check the list of all tables present in a single database using Sqoop is as
follows-

$ sqoop list-tables --connect jdbc:mysql://localhost/user


7. How are large objects handled in Sqoop?
Sqoop provides the capability to store large-sized data in a single field based on the type
of data. Sqoop supports the ability to store-

1) CLOBs - Character Large Objects

2) BLOBs - Binary Large Objects

Large objects in Sqoop are handled by importing them into a file referred to as a
LobFile, i.e. a Large Object File. The LobFile has the ability to store records of huge size; thus
each record in the LobFile is a large object.

8. Can free form SQL queries be used with Sqoop import command? If
yes, then how can they be used?
Sqoop allows us to use free-form SQL queries with the import command. The import
command should be used with the -e/--query option to execute free-form SQL queries.
When using the -e/--query option with the import command, the --target-dir value must
be specified.

9. Differentiate between Sqoop and distCP.


DistCP utility can be used to transfer data between clusters whereas Sqoop can be used to
transfer data only between Hadoop and RDBMS.

10. What are the limitations of importing RDBMS tables into Hcatalog
directly?
There is an option to import RDBMS tables into HCatalog directly by making use of the
--hcatalog-database option with --hcatalog-table, but the limitation is that several
arguments such as --as-avrodatafile, --direct, --as-sequencefile, --target-dir and --export-dir are not
supported.

We have further categorized Hadoop Sqoop Interview Questions for Freshers and
Experienced-

Hadoop Interview Questions and Answers for Freshers - Q.Nos- 4,5,6,9

Hadoop Interview Questions and Answers for Experienced - Q.Nos- 1,2,3,6,7,8,10


Hadoop Flume Interview Questions and Answers
1) Explain about the core components of Flume.
The core components of Flume are

Event - the single log entry or unit of data that is transported.

Source - this is the component through which data enters Flume workflows.

Sink - it is responsible for transporting data to the desired destination.

Channel - it is the duct between the Source and the Sink.

Agent - any JVM that runs Flume.

Client - the component that transmits events to the source that operates with the agent.

2) Does Flume provide 100% reliability to the data flow?


Yes, Apache Flume provides end to end reliability because of its transactional approach in
data flow.

Hadoop Flume Interview Questions and Answers for Freshers - Q.Nos- 1,2


Hadoop Zookeeper Interview Questions and Answers
1) Can Apache Kafka be used without Zookeeper?
It is not possible to use Apache Kafka without Zookeeper, because if Zookeeper is down
Kafka cannot serve client requests.

2) Name a few companies that use Zookeeper.


Yahoo, Solr, Helprace, Neo4j, Rackspace

Hadoop ZooKeeper Interview Questions and Answers for Freshers - Q.Nos- 1,2
Hadoop Pig Interview Questions and Answers
1) What do you mean by a bag in Pig?
A collection of tuples is referred to as a bag in Apache Pig.
2) Does Pig support multi-line commands?
Yes
3) What are the different modes of execution in Apache Pig?
Apache Pig runs in 2 modes - one is the Pig (Local Mode) Command Mode and the other is the Hadoop
MapReduce (Java) Command Mode. Local Mode requires access to only a single machine
where all files are installed and executed on the local host, whereas MapReduce mode requires
access to the Hadoop cluster.
We have further categorized Hadoop Pig Interview Questions for Freshers and Experienced-

Hadoop Interview Questions and Answers for Freshers - Q.Nos-1,2

Hadoop Interview Questions and Answers for Experienced - Q.Nos- 3


Hadoop Hive Interview Questions and Answers
1) What is a Hive Metastore?
The Hive Metastore is a central repository that stores metadata in an external database.

2) Are multiline comments supported in Hive?


No

3) What is ObjectInspector functionality?


ObjectInspector is used to analyze the structure of individual columns and the internal
structure of the row objects. ObjectInspector in Hive provides access to complex objects
which can be stored in multiple formats.

Hadoop Hive Interview Questions and Answers for Freshers - Q.Nos- 1,2,3


Hadoop YARN Interview Questions and Answers
1)What are the stable versions of Hadoop?
Release 2.7.1 (stable)

Release 2.4.1

Release 1.2.1 (stable)

2) What is Apache Hadoop YARN?


YARN is a powerful and efficient feature rolled out as a part of Hadoop 2.0. YARN is a large-scale
distributed system for running big data applications.

3) Is YARN a replacement of Hadoop MapReduce?


YARN is not a replacement of Hadoop MapReduce but a more powerful and efficient technology that
supports MapReduce and is also referred to as Hadoop 2.0 or MapReduce 2.

We have further categorized Hadoop YARN Interview Questions for Freshers and
Experienced-

Hadoop Interview Questions and Answers for Freshers - Q.Nos- 2,3

Hadoop Interview Questions and Answers for Experienced - Q.Nos- 1

Hadoop Interview Questions - Answers Needed
Hadoop Flume Interview Questions
1) How can Flume be used with HBase?

2) Explain about the different channel types in Flume. Which channel type is faster?

3) Which is the reliable channel in Flume to ensure that there is no data loss?

4) Explain about the replication and multiplexing selectors in Flume.

5) Differentiate between FileSink and FileRollSink.

6) How can a multi-hop agent be set up in Flume?

7) Does Apache Flume provide support for third party plug-ins?

8) Is it possible to leverage real time analysis on the big data collected by Flume directly? If
yes, then explain how.

Hadoop Zookeeper Interview Questions


1)What is the role of Zookeeper in HBase architecture?

2)Explain about Zookeeper in Kafka

3)Explain how Zookeeper works.

4)List some examples of Zookeeper use cases.

5)How to use Apache Zookeeper command line interface?

6)What are the different types of ZNodes?

7)What are watches?

8)What problems can be addressed by using Zookeeper?

Interview Questions on Hadoop Pig


1)Explain the need for MapReduce while programming in Apache Pig.

2)Explain about co-group in Pig.

3)Explain about the BloomMapFile.

4)Differentiate between Hadoop MapReduce and Pig

5)What is the usage of foreach operation in Pig scripts?

6)Explain about the different complex data types in Pig.

7)What does Flatten do in Pig?

Interview Questions on Hadoop Hive


1)Explain about the different types of join in Hive.

2)How can you configure remote metastore mode in Hive?

3)Explain about the SMB Join in Hive.

4)Is it possible to change the default location of Managed Tables in Hive, if so how?

5)How does data transfer happen from Hive to HDFS?

6)How can you connect an application, if you run Hive as a server?

7)What does the overwrite keyword denote in Hive load statement?

8)What is SerDe in Hive? How can you write your own custom SerDe?

9)In case of embedded Hive, can the same metastore be used by multiple users?

Hadoop YARN Interview Questions


1)What are the additional benefits YARN brings in to Hadoop?

2)How can native libraries be included in YARN jobs?

3)Explain the differences between Hadoop 1.x and Hadoop 2.x

Or

4)Explain the difference between MapReduce1 and MapReduce 2/YARN

5)What are the modules that constitute the Apache Hadoop 2.0 framework?

6)What are the core changes in Hadoop 2.0?

7)How is the distance between two nodes defined in Hadoop?

8)Differentiate between NFS, Hadoop NameNode and JournalNode.

We hope that these Hadoop Interview Questions and Answers have pre-charged you for
your next Hadoop interview. Get the ball rolling and answer the unanswered questions in
the comments below. Please do! It's all part of our shared mission to ease Hadoop interviews
for all prospective Hadoopers. We invite you to get involved.

50 interview questions:
Hadoop Developer Interview Questions
1) Explain how Hadoop is different from other parallel computing solutions.

2) What are the modes Hadoop can run in?

3) What will a Hadoop job do if developers try to run it with an output directory that is already
present?

4) How can you debug your Hadoop code?

5) Did you ever build a production process in Hadoop? If yes, what was the process when
your Hadoop job failed due to any reason? (Open Ended Question)

6) Give some examples of companies that are using Hadoop architecture extensively.

Hadoop Admin Interview Questions


7) If you want to analyze 100TB of data, what is the best architecture for that?

8) Explain about the functioning of Master Slave architecture in Hadoop?

9) What is distributed cache and what are its benefits?

10) What are the points to consider when moving from an Oracle database to Hadoop
clusters? How would you decide the correct size and number of nodes in a Hadoop cluster?

11) How do you benchmark your Hadoop Cluster with Hadoop tools?

Hadoop Interview Questions on HDFS


12) Explain the major difference between an HDFS block and an InputSplit.

13) Does HDFS make block boundaries between records?

14) What is streaming access?

15) What do you mean by Heartbeat in HDFS?

16) If there are 10 HDFS blocks to be copied from one machine to another, but the
other machine can copy only 7.5 blocks, is there a possibility for the blocks to be broken
down during the time of replication?

17) What is Speculative execution in Hadoop?

18) What is WebDAV in Hadoop?

19) What is fault tolerance in HDFS?

20) How are HDFS blocks replicated?

21) Which command is used to do a file system check in HDFS?

22) Explain about the different types of writes in HDFS.

Hadoop MapReduce Interview Questions


23) What is a NameNode and what is a DataNode?

24) What is Shuffling in MapReduce?

25) Why would a Hadoop developer develop a MapReduce job with the reduce step disabled?

26) What is the functionality of Task Tracker and Job Tracker in Hadoop? How many
instances of a Task Tracker and Job Tracker can be run on a single Hadoop Cluster?

27) How does NameNode tackle DataNode failures?

28) What is InputFormat in Hadoop?

29) What is the purpose of RecordReader in Hadoop?

30) What is InputSplit in MapReduce?

31)In Hadoop, if custom partitioner is not defined then, how is data partitioned before it is
sent to the reducer?

32) What is replication factor in Hadoop and what is default replication factor level Hadoop
comes with?

33) What is SequenceFile in Hadoop and Explain its importance?

34) If you are the user of a MapReduce framework, then what are the configuration
parameters you need to specify?
35) Explain about the different parameters of the mapper and reducer functions.

36) How can you set an arbitrary number of mappers and reducers for a Hadoop job?

37) How many Daemon processes run on a Hadoop System?

38) What happens if the number of reducers is 0?

39) What is meant by Map-side and Reduce-side join in Hadoop?

40) How can the NameNode be restarted?

41) Hadoop attains parallelism by distributing the tasks across various nodes; it is possible for
some slow nodes to rate-limit the rest of the program and slow it down.
What method does Hadoop provide to combat this?

42) What is the significance of conf.setMapper class?

43) What are combiners and when are these used in a MapReduce job?

44) How does a DataNode know the location of the NameNode in Hadoop cluster?

45) How can you check whether the NameNode is working or not?

Pig Interview Questions


46) When doing a join in Hadoop, you notice that one reducer is running for a very long time.
How will you address this problem in Pig?

47) Are there any problems which can only be solved by MapReduce and cannot be solved
by Pig? In which kinds of scenarios will MapReduce jobs be more useful than Pig?

48) Give an example scenario on the usage of counters.

Hive Interview Questions


49) Explain the difference between ORDER BY and SORT BY in Hive?

50) Differentiate between HiveQL and SQL.

Hadoop MapReduce vs. Apache Spark - Who Wins the Battle?
12 Nov 2014
There are various workloads in the world of big data for which Apache Hadoop is not quite the
perfect choice - iterative data processing, interactive queries and ad hoc queries. Every
Hadoop user is aware of the fact that the Hadoop MapReduce framework is meant mainly for
batch processing, and thus using Hadoop MapReduce for machine learning processes, ad-hoc
data exploration and other similar processes is not apt.

Most Big Data vendors are making efforts to find an ideal solution to this
challenging problem, and that has paved the way for the advent of a very popular
alternative named Apache Spark. Spark makes development a much more pleasurable
activity and has a better-performing execution engine than MapReduce, while using the
same storage engine, Hadoop HDFS, for executing huge data sets.
Apache Spark has gained great hype in the past few months and is now being regarded as
the most active project of Hadoop Ecosystem.
Before we get into further discussion on what empowers Apache Spark over Hadoop
MapReduce let us have a brief understanding of what actually Apache Spark is and then
move on to understanding the differences between the two.

Introduction to the User Friendly Face of Hadoop -Apache Spark


Spark is a fast cluster computing system developed through the contributions of about 250
developers from 50 companies at UC Berkeley's AMPLab, to make data analytics
faster and easier to write and run.

Apache Spark is open source and available for free download, making it a user-friendly
face of the distributed big data programming framework. Spark follows a general
execution model that supports in-memory computing and optimization of arbitrary operator
graphs, so that querying data becomes much faster when compared to disk-based
engines like MapReduce.

Apache Spark has a well designed application programming interface that consists of
various parallel collections with methods such as groupByKey, Map and Reduce so that you
get a feel as though you are programming locally. With Apache Spark you can write
collection oriented algorithms using the functional programming language Scala.
Why Apache Spark was developed?
Hadoop MapReduce, which was envisioned at Google and successfully implemented in
Apache Hadoop, is an extremely famous and widely used execution engine. You will find
several applications that are on familiar terms with how to decompose their work into a
sequence of MapReduce jobs. All these real-time applications will have to continue their
operation without any change.

However the users have been consistently complaining about the high latency problem with
Hadoop MapReduce stating that the batch mode response for all these real time applications
is highly painful when it comes to processing and analyzing data.

This paved the way for Spark, a successor system that is more powerful and
flexible than Hadoop MapReduce. Despite the fact that it might not be possible for all
future or existing applications to completely abandon Hadoop MapReduce,
there is scope for most future applications to make use of a general-purpose
execution engine such as Spark that comes with many more innovative features, to
accomplish much more than is possible with Hadoop MapReduce.

Apache Spark vs Hadoop-What makes Spark superior over Hadoop?


Apache Spark is an open source standalone project that was developed to function
together with HDFS. Apache Spark by now has a huge community of vocal
contributors and users, for the reason that programming with Spark using Scala is much
easier and it is much faster than the Hadoop MapReduce framework both on disk and in-memory.

Thus, Spark is just the apt choice for future big data applications that
possibly would require lower latency queries, iterative computation and real-time processing
on similar data.
Hadoop Spark has lots of advantages over Hadoop MapReduce framework in terms of a
wide range of computing workloads it can deal with and the speed at which it executes the
batch processing jobs.

Difference between MapReduce and Spark

Src: www.tapad.com
i) Hadoop vs Spark Performance
Spark has been said to execute batch processing jobs about 10 to 100 times
faster than the Hadoop MapReduce framework, merely by cutting down on the
number of reads and writes to disk.
In the case of MapReduce, there are map and reduce tasks, after which there is
a synchronization barrier and the data needs to be persisted to disk. This feature of the
MapReduce framework was developed with the intent that in case of failure the jobs can be
recovered, but the drawback is that it does not leverage the memory of the Hadoop
cluster to the maximum.

Nevertheless, with Spark the concept of RDDs (Resilient Distributed Datasets) lets
you keep data in memory and persist it to disk only if required, and it
does not have any kind of synchronization barriers that could slow down the
process. Thus the general execution engine of Spark is much faster than Hadoop
MapReduce when memory is used.

ii) Hadoop MapReduce vs Spark-Easy Management


It is now easy for organizations to simplify their infrastructure used for data processing, as
with Spark it is possible to perform streaming, batch processing and machine
learning all in the same cluster.

Most real-time applications use Hadoop MapReduce for generating reports that help in
finding answers to historical queries, and then deploy a different system that will
deal with stream processing so as to get the key metrics in real time. Thus organizations
have to manage and maintain separate systems and then develop applications for both
computational models.

However, with Spark all these complexities can be eliminated, as it is possible to
implement both stream and batch processing on the same system, which simplifies the
development, deployment and maintenance of the application. With Spark it is possible to
control different kinds of workloads, so if there is an interaction between various workloads in
the same process it is easier to manage and secure such workloads, which is a
limitation with MapReduce.

iii) Spark vs MapReduce - Real Time Method to Process Streams


In case of Hadoop MapReduce you just get to process a batch of stored data but with
Hadoop Spark it is as well possible to modify the data in real time through Spark Streaming.

With Spark Streaming it is possible to pass data through various software functions for
instance performing data analytics as and when it is collected.

Developers can now also make use of Apache Spark for graph processing, which maps
the relationships in data amongst various entities such as people and objects. Organizations
can also make use of Apache Spark with predefined machine learning code libraries so that
machine learning can be performed on the data that is stored in various Hadoop clusters.

iv) Spark vs MapReduce -Caching


Spark ensures lower-latency computations by caching partial results across the memory
of its distributed workers, unlike MapReduce which is completely disk-oriented. Spark is
slowly turning out to be a huge productivity boost in comparison to writing complex Hadoop
MapReduce pipelines.

v) Spark vs MapReduce- Ease of Use


Writing Spark code is always more compact than writing Hadoop MapReduce code. Here is a Spark
vs MapReduce example - the images below show the word count program code in Spark and
Hadoop MapReduce. If we look at the images, it is clearly evident that the Hadoop MapReduce
code is more verbose and lengthy.

Spark MapReduce Example- Wordcount Program in Spark

Spark MapReduce Example- Wordcount Program in Hadoop MapReduce

Spark MapReduce Comparison -The Bottomline

Hadoop MapReduce is meant for data that does not fit in the memory
whereas Apache Spark has a better performance for the data that fits in the
memory, particularly on dedicated clusters.

Hadoop MapReduce can be an economical option because of Hadoop-as-a-Service
offerings (HaaS) and the availability of more personnel. According to
benchmarks, Apache Spark is more cost effective, but staffing would be more
expensive in the case of Spark.

Apache Spark and Hadoop MapReduce both are failure tolerant, but
comparatively Hadoop MapReduce is more failure tolerant than Spark.

Spark and Hadoop MapReduce both have similar compatibility in terms of
data types and data sources.

Programming in Apache Spark is easier as it has an interactive mode,
whereas Hadoop MapReduce requires core Java programming skills; however,
there are several utilities that make programming in Hadoop MapReduce easier.
Will Apache Spark Eliminate Hadoop MapReduce?
Hadoop MapReduce is being condemned by most users as a logjam in Hadoop
clusters, for the reason that MapReduce executes all jobs in batch mode, which implies
that analyzing data in real time is not possible. With the advent of Spark, which has
proven to be a great alternative to Hadoop MapReduce, the biggest question on
the minds of data scientists is Hadoop vs. Spark - who wins the battle?
Apache Spark executes jobs in micro-batches that are very short, say approximately 5
seconds or less. Apache Spark has over time been successful in providing
more stability when compared to the real-time, stream-oriented Hadoop frameworks.

Nevertheless, every coin has two faces, and Spark too comes with some
drawbacks, such as the inability to cope when the intermediate data is greater than the
memory size of the node, problems in case of node failure, and, most important of all,
the cost factor.

Spark makes use of journaling (also known as recomputation) for providing
resiliency in case there is a node failure, so we can conclude that the
recovery behavior in case of node failure is similar to that of Hadoop
MapReduce, except for the fact that the recovery process would be much faster.

Spark also has a spill-to-disk feature: if there is insufficient RAM on a particular node
for storing the data partitions, it provides graceful degradation to disk-based data
handling. When it comes to cost, with street RAM prices being 5 USD per GB, we can have
about 1 TB of RAM for 5K USD, making memory a very minor fraction of the
overall node cost.

One great advantage that Hadoop MapReduce has over Apache Spark is
that if the data size is greater than memory, Apache
Spark will not be able to leverage its cache and it is very likely to be far
slower than the batch processing of MapReduce.
Confused - Hadoop vs. Spark: Which One to Choose?
If the question that is leaving you confused is Hadoop MapReduce or Apache Spark, or
rather, disk-based computing or RAM-based computing, then the answer to
this question is straightforward: it all depends, and the variables on which this decision
depends keep changing dynamically with time.
Nevertheless, the current trends are in favor of in-memory techniques like Apache
Spark, as the industry seems to be rendering positive feedback for it. So to conclude,
we can state that the choice of Hadoop MapReduce vs. Apache Spark depends on the
use case, and we cannot make an autonomous choice.

Hadoop 2.0 (YARN) Framework - The Gateway to Easier Programming for Hadoop Users
25 Nov 2014
Evolution of Hadoop 2.0 (YARN) - Swiss Army Knife of Big Data
With the rapid pace of evolution of Big Data, its processing frameworks also seem to be
evolving in full swing. Since the introduction of Hadoop in 2005 to support cluster-distributed
processing of large-scale data workloads through the MapReduce processing
engine, Hadoop has undergone a great refurbishment over time. The result of this is a better
and more advanced Hadoop framework that does not merely support MapReduce but renders
support to various other distributed processing models as well.

In this piece of writing we provide the users an insight on the novel Hadoop 2.0
(YARN) and help them understand the need to switch from Hadoop 1.0 to Hadoop 2.0.

The huge data giants on the web such as Google, Yahoo and Facebook who had adopted
Apache Hadoop had to depend on the partnership of Hadoop HDFS with the resource
management environment and MapReduce programming. These technologies collectively
enabled the users to manage processes and store huge amounts of semi-structured,
structured or unstructured data within Hadoop clusters. Nevertheless, there were certain
intrinsic drawbacks with the Hadoop MapReduce pairing. For instance, Google and other users
of Apache Hadoop had various issues with Hadoop 1.0, such as not having the ability to
keep up with the flood of information that they were collecting online, due to the batch
processing arrangement of MapReduce.

Introduction to Hadoop YARN (Hadoop 2.0)

Hadoop 2.0, popularly known as YARN (Yet Another Resource Negotiator), is the latest
technology introduced in Oct 2013 that is being used widely nowadays for processing and
managing distributed big data.

Hadoop YARN is an advancement over Hadoop 1.0, released to provide performance
enhancements which will benefit all the technologies connected with the Hadoop Ecosystem,
along with the Hive data warehouse and the Hadoop database (HBase). Hadoop YARN
comes along with the Hadoop 2.x distributions that are shipped by Hadoop distributors.
YARN performs job scheduling and resource management duties without requiring users
to use Hadoop MapReduce on Hadoop systems.

Hadoop YARN has a modified architecture unlike the intrinsic characteristics of Hadoop 1.0
so that the systems can scale up to new levels and responsibilities can be clearly assigned
to the various components in Hadoop HDFS.

Need to Switch from Hadoop 1.0 to Hadoop 2.0 (YARN)


The first version of Hadoop had both advantages and disadvantages. Hadoop
MapReduce is an established standard for big data processing systems in the modern era, but
the Hadoop MapReduce architecture does have some drawbacks which generally come into
play when dealing with huge clusters.

Limitations of Hadoop 1.0


Issue of Availability: The Hadoop 1.0 architecture had only a single point of failure,
i.e. the Job Tracker, so if the Job Tracker fails then all the jobs will have to restart.
Issue of Scalability: The Job Tracker runs on a single machine performing various
tasks such as monitoring, job scheduling, task scheduling and resource management. In
spite of the presence of several machines (Data Nodes), they were not being utilized in an
efficient manner, thereby limiting the scalability of the system.
Cascading Failure Issue: In the case of Hadoop MapReduce, when the number of nodes is
greater than 4000 in a cluster, some kind of instability is observed. The most common kind
of failure that was observed is the cascading failure, which in turn could cause the overall
cluster to deteriorate as attempts to re-replicate data overload nodes via network
flooding.
Multi-Tenancy Issue: The major issue with Hadoop MapReduce that paved the way for
the advent of Hadoop YARN was multi-tenancy. With the increase in the size of clusters in
Hadoop systems, the clusters can be employed for a wide range of models.
Hadoop MapReduce dedicates the nodes of the cluster to MapReduce, so that they
cannot easily be repurposed for other big data workloads and applications. Nevertheless, with Big
Data and Hadoop ruling the data processing applications for cloud deployments, the
number of nodes in a cluster is likely to increase, and this issue is addressed with a switch
from 1.x to 2.x.
This is not the end of the limitations of Hadoop MapReduce. Apart from the
above-mentioned issues, there were several other concerns raised by Hadoop
programmers with version 1.0, such as inefficient utilization of resources, constraints
on running non-MapReduce applications, running ad-hoc queries,
carrying out real-time analysis, and limitations in running the message-passing approach.


Understanding the Differences between the Components of Hadoop 1.0 and Hadoop 2.0

Hadoop 1.0, or the so-called MRv1, mainly consists of 3 important components, namely:

1) Resource Management: This is an infrastructure component that takes care of
monitoring the nodes, allocating the resources and scheduling various jobs.
2) Application Programming Interface (API): This component is for the users to
program various MapReduce applications.
3) Framework: This component is for all the runtime services such as shuffling, sorting
and executing the map and reduce processes.

The major difference with Hadoop 2.0 is that, in this next generation of Hadoop the cluster
resource management capabilities are moved into YARN.
YARN
YARN has taken over the cluster management responsibilities from MapReduce, so
that now MapReduce just takes care of data processing and the other responsibilities are
taken care of by YARN.

Hadoop 2.0 (YARN) and Its Components

In Hadoop 2.0, the work of the Job Tracker is handled in YARN by 3 important components

1. Resource Manager Component: This component is considered the negotiator
of all the resources in the cluster. The Resource Manager is further divided into an
Application Manager that will manage all the user jobs in the cluster and a pluggable
scheduler. This is a long-running YARN service designed for receiving and running the
applications on the Hadoop cluster. In Hadoop 2.0, a MapReduce job is considered
an application.
2. Node Manager Component: This is the per-node worker agent
of YARN. The NM keeps track of all the users' jobs and their workflow on any particular
given node and reports resource usage to the Resource Manager.
3. Application Master Component (aka User Job Life Cycle Manager): This is the
component where the job actually resides. The Application Master component is
responsible for managing each and every MapReduce job and is concluded once the job
completes processing.
A Gist on Hadoop 2.0 Components
RM-Resource Manager

1.It is the global resource scheduler

2.It runs on the Master Node of the Cluster

3.It is responsible for negotiating the resources of the system amongst the
competing applications.

4.It keeps a track on the heartbeats from the Node Manager

NM-Node Manager

1.Node Manager communicates with the resource manager.

2.It runs on the Slave Nodes of the Cluster


AM-Application Master

1.There is one AM per application, which is application specific or framework specific.

2.The AM runs in containers that are created by the resource manager on request.
Migration from Hadoop 1.0 to Hadoop 2.0
With the advent of YARN framework as a part of the Hadoop 2.0 platform, there are several
applications and tools available now for Hadoop programmers that will help them make the
best out of big data which they never thought of.

YARN has been capable of providing organizations something that is far beyond MapReduce,
by separating the cluster resource management function completely from the data
processing function. With comparatively less overloaded, sophisticated programming
protocols and being cost effective, companies would prefer to migrate their
applications from Hadoop 1.0 to Hadoop 2.0. An edge that YARN provides to Hadoop
users is that it is backward compatible (i.e. one can easily run an existing MapReduce job
on Hadoop 2.0 without making any modifications), thus compelling companies to migrate
from Hadoop 1.0 to Hadoop 2.0 without even giving it a second thought.
Despite the fact that most of the Hadoop applications have migrated from Hadoop 1.0 to
Hadoop 2.0 there are migrations that are still in progress and companies are consistently
striving hard to accomplish this long needed upgrade for their applications.

With Hadoop YARN, it is now easy for Hadoop developers to build applications directly
with Hadoop, without having to bolt them on through outside third-party vendor tools,
which was the case with Hadoop 1.0. This is another important reason why companies will
establish Hadoop 2.0 as a platform for creating applications and manipulating data more
effectively and efficiently.
YARN is the elephant-sized change that Hadoop 2.0 has brought in, but undoubtedly there
are lots of challenges involved as companies migrate from Hadoop 1.0 to Hadoop 2.0;
however, the basic changes to the MapReduce framework will give Hadoop greater usability
in the upcoming big data scenarios. With Hadoop 2.0 being more isolated and scalable than the
earlier version, it is anticipated that soon there will be several novel tools that will get the
most out of the new features in YARN (Hadoop 2.0).

5 Healthcare Applications of Hadoop and Big Data
16 Mar 2015
The New York based research and consulting firm, Institute for Health Technology
Transformation, estimates that in 2011, the US healthcare industry generated 150 billion
gigabytes (150 exabytes) of data. This data was mostly generated by various regulatory
requirements, record keeping, compliance and patient care. Since then, there has been an
exponential increase in data, which has led to an expenditure of $1.2 trillion towards
healthcare data solutions in the healthcare industry. McKinsey projects that the use of Big
Data in healthcare can reduce healthcare data management expenses by $300 billion to $500 billion.

Big Data in healthcare originates from large electronic health datasets; these datasets
are very difficult to manage with conventional hardware and software. The use of legacy
data management methods and tools also makes it impossible to usefully leverage all this
data. Big Data in healthcare is an overpowering concept not just because of the volume of
data but also due to the different data types and the pace at which healthcare data
needs to be managed. The sum total of data related to patients and their
well-being constitutes the Big Data problem in the healthcare industry. Big Data analytics
has actually become a growing and crucial topic in healthcare informatics as well.
Healthcare informatics also contributes to the development of Big Data analytics technology
by posing novel challenges in terms of data knowledge representation, database design,
data querying and clinical decision support.

Despite the fact that most of the data in the healthcare sector is stored in printed form, the
recent trend is moving towards rapid digitization of this data. Big Data in the healthcare industry
promises to support a diverse range of healthcare data management functions such as
population health management, clinical decision support and disease surveillance. The
healthcare industry is still in the early stages of getting its feet wet in the large-scale
integration and analysis of big data.

With 80% of the healthcare data being unstructured, it is a challenge for the healthcare
industry to make sense of all this data and leverage it effectively for Clinical operations,
Medical research, and Treatment courses.

The volume of Big data in healthcare is anticipated to grow over the coming years and the
healthcare industry is anticipated to grow with changing healthcare reimbursement models
thus posing critical challenges to the healthcare environment. Even though, profit is not the
sole motivator, it is extremely important for the big data healthcare companies to make use
of the best in class techniques and tools that can leverage Big Data in healthcare effectively.
Else these big data healthcare companies might have to skate on thin ice when it comes to
generating profitable revenue.

Need for Hadoop in Healthcare Data Solutions

Charles Boicey, an Information Solutions Architect at UCI, says that Hadoop is the only technology that allows healthcare to store data in its native form: "If Hadoop didn't exist we would still have to make decisions about what can come into our data warehouse or the electronic medical record (and what cannot). Now we can bring everything into Hadoop, regardless of data format or speed of ingest. If I find a new data source, I can start storing it the day that I learn about it. We leave no data behind."
By the end of 2016, the number of health records, covering millions of people, is likely to grow into the tens of billions. Thus, the computing technology and infrastructure must be able to render a cost-efficient implementation of:

Unconstrained parallel data processing.

Storage for billions and trillions of unstructured data sets.

Fault tolerance along with high availability of the system.


Hadoop technology is successful in meeting the above challenges faced by the healthcare industry, as the MapReduce engine and HDFS have the capability to process thousands of terabytes of data. Hadoop makes use of cheap commodity hardware, making it a pocket-friendly investment for the healthcare industry.
Click here to know more about our IBM Certified Hadoop Developer course
Here are 5 healthcare data solutions of Big Data and Hadoop

1. Hadoop technology in Cancer Treatments and Genomics


Deepak Singh, the principal product manager at Amazon Web Services, said, "We've definitely seen an uptake in adopting Hadoop in the life sciences community, mostly targeting next-generation sequencing and simple read mapping, because what developers discovered was that a number of bioinformatics problems transferred very well to Hadoop, especially at scale."

Image Credit: mobilehealthglobal.com


Industry reports indicate that about 3 billion base pairs constitute the human DNA, and such large amounts of data must be organized effectively if we are to fight cancer. One of the biggest reasons cancer has not been cured yet is that it mutates in different patterns and reacts in different ways depending on the genetic makeup of an individual. Hence, oncology researchers have concluded that in order to treat cancer, patients need personalized treatment based on the type of cancer and the individual patient's genetic makeup. Leveraging Hadoop technology offers great support for parallelization and helps in mapping the 3 billion DNA base pairs using MapReduce programs.
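To make the MapReduce idea concrete, here is a minimal sketch of a genomics-style job that counts fixed-length DNA substrings (k-mers) across sequencing reads stored in HDFS. It is an illustrative example only, not the pipeline of any particular research group; the k-mer length, input format (one read per line) and file paths are assumptions.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KmerCount {

    public static class KmerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final int K = 8;                        // k-mer length (assumption)
        private static final IntWritable ONE = new IntWritable(1);
        private final Text kmer = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String read = value.toString().trim().toUpperCase();
            for (int i = 0; i + K <= read.length(); i++) {
                kmer.set(read.substring(i, i + K));
                context.write(kmer, ONE);                      // emit (k-mer, 1) for every window
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));          // total occurrences of this k-mer
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "kmer-count");
        job.setJarByClass(KmerCount.class);
        job.setMapperClass(KmerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The point of the sketch is the shape of the computation: the map step slices every read in parallel across the cluster, and the reduce step aggregates the counts, which is exactly the divide-and-combine pattern that makes genome-scale data tractable on commodity hardware.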
Ketan Paranjape, the global director of health and life sciences at Intel, talks about his efforts to build on those investments as he discusses the current state and future directions in healthcare analytics. The goal of using Hadoop in healthcare, Paranjape says, is to collect and analyze data that can do everything from assessing public health trends across a region of millions of people to pinpointing treatment options for one cancer patient.

David Cameron, Prime Minister of the UK, announced government funding of £300m in August 2014 for a four-year project that aims to map 100,000 human genomes by the end of 2017, in collaboration with the American biotechnology firm Illumina and Genomics England. The main goal of this project is to use big data in healthcare to develop personalized medication for cancer patients.

2. Hadoop technology in Monitoring Patient Vitals


There are several hospitals across the world that use Hadoop to help the hospital staff work
efficiently with Big Data. Without Hadoop, most patient care systems could not even imagine
working with unstructured data for analysis.

Image Credit: slideshare.net


Children's Healthcare of Atlanta treats over 6,200 children in its ICU units. On average, the duration of stay in the pediatric ICU varies from a month to a year. Children's Healthcare of Atlanta used bedside sensors that help them continuously track patient vital signs such as blood pressure, heartbeat and respiratory rate. These sensors produce large chunks of data, which legacy systems could not store for more than 3 days of analysis. The main goal of Children's Healthcare of Atlanta was to store and analyze these vital signs and, if there was any change in pattern, to generate an alert to a team of doctors and assistants. All this was successfully achieved using Hadoop ecosystem components - Hive, Flume, Sqoop, Spark, and Impala.
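As a rough illustration of what the processing side of such alerting can look like, here is a minimal Spark (Java) sketch that reads sensor readings from HDFS and filters out readings that fall outside normal ranges. The file path, column names and thresholds are all assumptions made for the example, not details of the hospital's actual system.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class VitalsAlertJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("vitals-alerts").getOrCreate();

        // Sensor readings landed in HDFS; path and column names are assumptions.
        Dataset<Row> vitals = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/icu/vitals/*.csv");

        // Flag readings outside normal ranges so a care team can be notified downstream.
        Dataset<Row> alerts = vitals.filter(
                col("heart_rate").gt(160)
                        .or(col("systolic_bp").lt(80))
                        .or(col("respiratory_rate").gt(40)));

        alerts.write().mode("overwrite").parquet("hdfs:///data/icu/alerts");
        spark.stop();
    }
}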

3. Hadoop technology in the Hospital Network


A Cleveland Clinic spinoff company known as Explorys is making use of Big Data in healthcare to provide the best clinical support, reduce the cost of care measurement and manage the population of at-risk patients. Explorys has reportedly built the largest database in the healthcare industry, with over a hundred billion data points, all thanks to Hadoop. Explorys uses Hadoop technology to help its medical experts analyze data bombardments in real time from diverse sources such as financial data, payroll data and electronic health records.

The analytics tool developed by Explorys is used for data mining, helping clinicians determine the deviations among patients and the effects treatments have on their health. These insights help medical practitioners and healthcare providers find the best treatment plans for a set of patient populations or for an individual patient.

4. Hadoop technology in Healthcare Intelligence


The healthcare insurance business operates by collating the associated costs (the risk) and dividing them equally by the number of members in the risk group. In such circumstances, the data and the outcomes are always dynamic and changing. Using Hadoop technology in healthcare intelligence applications helps hospitals, payers and healthcare agencies increase their competitive advantage by devising smart business solutions.
For instance, let's assume that a healthcare insurance company is interested in finding the age in a particular region below which individuals are not victims of certain diseases. This data will help the insurer compute the cost of an insurance policy. To arrive at the desired age, insurance companies have to process huge data sets to extract meaningful information such as medicines, diseases, symptoms, opinions, geographic region details, etc. In this scenario, using Hadoop's Pig, Hive and MapReduce is the best solution for processing such large datasets.
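For illustration, a minimal MapReduce sketch of the aggregation described above might look as follows, assuming a hypothetical CSV input where each line is "disease,age,...". It simply emits the youngest reported age per disease; a real insurer's pipeline would of course be far richer.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MinAgePerDisease {

    public static class AgeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            if (fields.length >= 2) {
                try {
                    // Emit (disease, age) for every claim record; skip malformed lines.
                    ctx.write(new Text(fields[0].trim()),
                            new IntWritable(Integer.parseInt(fields[1].trim())));
                } catch (NumberFormatException ignored) {
                    // non-numeric age field, ignore the record
                }
            }
        }
    }

    public static class MinAgeReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text disease, Iterable<IntWritable> ages, Context ctx)
                throws IOException, InterruptedException {
            int min = Integer.MAX_VALUE;
            for (IntWritable age : ages) min = Math.min(min, age.get());
            ctx.write(disease, new IntWritable(min));   // youngest reported age per disease
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "min-age-per-disease");
        job.setJarByClass(MinAgePerDisease.class);
        job.setMapperClass(AgeMapper.class);
        job.setReducerClass(MinAgeReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}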
5. Hadoop technology in Fraud Prevention and Detection
At least 10% of healthcare insurance payments are attributed to fraudulent claims. Worldwide, this is estimated to be a multi-billion dollar problem. Fraudulent claims are not a novel problem, but the complexity of insurance fraud seems to be increasing exponentially, making it difficult for healthcare insurance companies to deal with.

Image Credit: ibmbigdatahub.com


Big Data Analytics helps healthcare insurance companies find different ways to identify and prevent fraud at an early stage. Using Hadoop technology, insurance companies have been successful in developing predictive models to identify fraudsters by making use of real-time and historical data on medical claims, weather, wages, voice recordings, demographics, cost of attorneys and call center notes. Hadoop's capability to store large unstructured data sets in NoSQL databases, combined with MapReduce for analyzing that data, helps in the detection of fraud patterns.
The upswing of big data in the healthcare industry is due to the falling cost of storage. As recently as 5 years ago, the cost of a scalable relational database with a permanent software license was $100,000 per TB, along with an additional cost of $20,000 per year for support and maintenance. Now, with the advent of Hadoop in Big Data Analytics, it is possible to store, manage and analyze the same amount of data with a yearly subscription of just $1,200. The increasing adoption of Hadoop technology in healthcare will eliminate the concept of one-size-fits-all medicines and treatments in the healthcare industry. The coming years will see the healthcare industry provide personalized patient medications at controlled costs.

Did you like our top 5 healthcare data solutions of Big Data? If you work in the healthcare
industry or have an idea of any other healthcare data solutions that help big data healthcare
companies harness the power of Hadoop, please leave a comment below!

NoSQL vs SQL - 4 Reasons Why NoSQL is better for Big Data applications
19 Mar 2015
In the early days of the web, 1,000 users was a major load on an application, and 10,000 users were considered an extreme scenario.

As per web statistics reports in 2014, about 3 billion people are connected to the World Wide Web, and the amount of time internet users spend on the web is somewhere close to 35 billion hours per month, a figure that keeps increasing.

With the availability of several mobile and web applications, it is now pretty common to have billions of users who generate a lot of unstructured data. There is a need for a database technology that can render 24/7 support to store, process and analyze this data.

Can the conventional SQL scale up to these requirements?

"It's important that you're not just going with a traditional database because that's what everyone else is using," said Evaldo de Oliveira, Business Development Director at FairCom. "Pay attention to what's going on in the NoSQL world because there are some problems that SQL cannot handle."
Relational Databases - The fundamental concept behind databases such as MySQL, Oracle Express Edition, and MS-SQL that use SQL is that they are all Relational Database Management Systems that make use of relations (generally referred to as tables) for storing data.
In a relational database, the data is correlated through common characteristics present in the dataset, and the resulting structure is referred to as the schema of the RDBMS.

Limitations of SQL vs NoSQL:

Relational Database Management Systems that use SQL are schema-oriented, i.e. the structure of the data should be known in advance to ensure that the data adheres to the schema. Examples of predefined-schema applications that use SQL include payroll management systems, order processing and flight reservations.

It is not possible for SQL to process unpredictable and unstructured information. Big Data applications demand an occurrence-oriented database that is highly flexible and operates on a schema-less data model.

SQL databases are vertically scalable - they can only be scaled by enhancing the horsepower of the underlying hardware, which makes processing large batches of data a costly deal. IT enterprises need to increase the RAM, SSD, CPU, etc. on a single server in order to manage the increasing load on the RDBMS.

With an increasing database size or number of users, Relational Database Management Systems using SQL suffer from serious performance bottlenecks, making real-time processing of unstructured data very hard.

With Relational Database Management Systems, built-in clustering is difficult because of the ACID properties of transactions.


NoSQL is a database technology driven by Cloud Computing, the Web, Big Data and big users.

NoSQL now leads the way for popular internet companies such as LinkedIn, Google, Amazon, and Facebook to overcome the drawbacks of the 40-year-old RDBMS.

Image Credit: cloudave.com


A NoSQL database, also known as "Not Only SQL", is an alternative to SQL databases that does not require any kind of fixed table schema, unlike SQL.

NoSQL generally scales horizontally and avoids major join operations on the data. NoSQL databases can be referred to as structured storage, of which relational databases are a subset.

NoSQL covers a multitude of databases, each having a different data storage model. The most popular types are Graph, Key-Value, Columnar and Document.

Click here to know more about our IBM Certified NoSQL course
NoSQL vs SQL - 4 Key Differences:
1. Nature of Data and Its Storage - Tables vs. Collections
The foremost criterion for choosing a database is the nature of the data that your enterprise is planning to control and leverage. If the enterprise plans to pull data similar to an accounting Excel spreadsheet, i.e. basic tabular structured data, then the relational model of the database would suffice to fulfill your business requirements, but current trends demand the storage and processing of unstructured and unpredictable information.

On the contrary, molecular modeling, geospatial or engineering parts data is so complex that the data model created for this kind of data becomes highly complicated due to several levels of nesting. Though several attempts were made to model this kind of data with a 2D (row-column) database, it did not fit.

Image Credit: couchbase.com


To overcome this drawback, NoSQL databases were considered as an alternative. NoSQL databases ease the representation of multi-level hierarchies and nesting using the JSON (JavaScript Object Notation) format.
In this world of dynamic schemas, where changes pour in every hour, it is not possible to adhere to the "get it right the first time" strategy that worked with the outmoded static schema.
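As a small illustration of that nested, schema-less style, the sketch below uses the MongoDB Java driver to store one order as a single document with an embedded customer and item list. The database, collection and field names are invented for the example, and it assumes a MongoDB instance reachable on localhost.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.util.Arrays;

public class NestedDocumentExample {
    public static void main(String[] args) {
        // Connection string is an assumption for the example.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders =
                    client.getDatabase("shopDemo").getCollection("orders");

            // The whole order lives in one nested JSON-style document: no joins, no fixed schema.
            Document order = new Document("orderId", 1001)
                    .append("customer", new Document("name", "A. Kumar").append("city", "Bangalore"))
                    .append("items", Arrays.asList(
                            new Document("sku", "SHAMPOO-1").append("qty", 2),
                            new Document("sku", "CONDITIONER-1").append("qty", 1)));
            orders.insertOne(order);
        }
    }
}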

Web-centric businesses like Amazon, eBay, etc. were in need of a database like NoSQL that could best match the changing data model and render greater levels of flexibility in operations.

2. Speed - Normalization vs. Storage Cost

An RDBMS requires a high degree of normalization, i.e. data needs to be broken down into several small logical tables to avoid data redundancy and duplication. Normalization helps manage data in an efficient way, but the complexity of spanning the several related tables involved in normalization hampers the performance of data processing in relational databases using SQL.

On the other hand, in NoSQL databases such as Couchbase, Cassandra, and MongoDB, data is stored in the form of flat collections where data is duplicated repeatedly; a single piece of data is hardly ever partitioned off, but rather stored as an entity. Hence, read and write operations on a single entity become easier and faster. NoSQL databases can also store and process data in real time - something that SQL is not capable of doing.

3. Horizontal Scalability vs. Vertical Scalability

The most beneficial aspect of NoSQL databases like HBase for Hadoop, 10gen's MongoDB and Couchbase is the ease of scalability to handle huge volumes of data.
For instance, if you operate an eCommerce website similar to Amazon and you happen to be an overnight success, you will have tons of customers visiting your website.

Under such circumstances, if you are using a relational database, i.e. SQL, you will have to meticulously replicate and repartition the database in order to fulfill the increasing demand from customers.
"Most people who choose NoSQL as their primary data storage are trying to solve two main problems: scalability and simplifying the development process," said Danil Zburivsky, solutions architect at Pythian.

Image Credit: couchbase.com


The manner in which NoSQL and SQL databases scale up to meet business requirements affects the performance bottleneck of the application.

Generally, with an increase in demand, relational databases tend to scale up vertically, which means they add extra horsepower to the system to enable faster operations on the same dataset. On the contrary, NoSQL databases like HBase, Couchbase and MongoDB scale horizontally with the addition of extra nodes (commodity database servers) to the resource pool, so that the load can be distributed easily.

4. NoSQL vs SQL / CAP vs. ACID


Relational databases using SQL have been legends in the database landscape for maintaining integrity through the ACID properties of transactions (Atomicity, Consistency, Isolation, and Durability), and most storage vendors rely on these properties. The main motive is to support isolated, indivisible transactions whose changes are permanent, leaving the data in a consistent state.

NoSQL databases work on the concept of CAP priorities: at any given time you can choose only 2 of the 3 priorities of the CAP theorem (Consistency, Availability, Partition tolerance), as it is highly difficult to attain all three in a changing distributed-node system. One can describe NoSQL databases as BASE, the opposite of ACID, meaning:

BA = Basically Available - availability is guaranteed, in the sense that the system always responds.

S = Soft State - the state of the system can change at any time even without executing a query, because node updates take place every now and then to fulfill the ever-changing requirements.

E = Eventually Consistent - NoSQL database systems will become consistent in the long run.

Image Credit: smist08.wordpress.com/


Why should you choose a NoSQL Database like HBase, Couchbase or Cassandra over
RDBMS?

1) Applications and databases need to work with Big Data.

2) Big Data needs a flexible data model with a better database architecture.

3) To process Big Data, these databases need continuous application availability with modern transaction support.
NoSQL in Big Data Applications

1. HBase, a popular NoSQL database for Hadoop, is used extensively by Facebook for its messaging infrastructure.

2. HBase is used by Twitter for generating, storing, logging, and monitoring data around people search.

3. HBase is used by the discovery engine StumbleUpon for data analytics and storage.

4. MongoDB is another NoSQL database, used by CERN, the European nuclear research organization, for collecting data from the Large Hadron Collider.

5. LinkedIn, Orbitz, and Concur use the Couchbase NoSQL database for various data processing and monitoring tasks.
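For a feel of how an application talks to HBase, here is a minimal sketch using the standard HBase Java client to write and read one message-like row. The table name "messages", the column family "m" and the row-key layout are assumptions for the example and would need to exist (or be created) in a real cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MessageStoreExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("messages"))) {

            // Row key: userId plus a timestamp component (a common pattern, assumed here).
            byte[] rowKey = Bytes.toBytes("user42#9999999999");

            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("body"), Bytes.toBytes("Hello from HBase"));
            table.put(put);

            Result result = table.get(new Get(rowKey));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("m"), Bytes.toBytes("body"))));
        }
    }
}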
The Database Landscape is flooded with increased data velocity, growing data variety, and
exploding data volumes and only NoSQL databases like HBase, Cassandra, Couchbase can
keep up with these requirements of Big Data applications.

What is Hadoop 2.0 High Availability?


23 Mar 2015
In one of our previous articles we discussed the Hadoop 2.0 YARN framework and how the responsibility of managing the Hadoop cluster is shifting from MapReduce towards YARN. Here we will highlight the high availability cluster feature of Hadoop 2.0, which eliminates the single point of failure (SPOF) in the Hadoop cluster by setting up a standby NameNode.
The early adopters of Hadoop 1.0 - Google, Facebook and Yahoo - had to depend on the combination of the resource management environment, HDFS and the MapReduce programming model. The partnership among these technologies added value to the processing, managing and storage of semi-structured, structured and unstructured data in the Hadoop clusters of these data giants.

However, the limitations of the Hadoop-MapReduce pairing paved the way for Hadoop 2.0. For instance, Yahoo reported that Hadoop 1.x was not able to keep pace with the flood of information they were collecting online because of MapReduce's batch processing format, and the NameNode's SPOF had always been a troublesome issue in case of failures.

Hadoop 2.0 - An Overview

Hadoop 2.0 boasts improved scalability and availability of the system via a set of bundled features that represent a generational shift in the Hadoop architecture with the introduction of YARN.
Hadoop 2.0 also introduces the solution to the much-awaited High Availability problem.

Hadoop 2.0 introduced YARN, which has the ability to process terabytes and petabytes of data present in HDFS with the use of various non-MapReduce applications such as Giraph and MPI.

Hadoop 2.0 divides the responsibilities of the overloaded JobTracker into 2 different components, i.e. the per-application ApplicationMaster and the global ResourceManager.

Hadoop 2.0 improves the horizontal scalability of the NameNode through HDFS Federation and eliminates the single point of failure problem with NameNode High Availability.
Hadoop NameNode High Availability problem:
The Hadoop 1.0 NameNode has a single point of failure (SPOF) problem, which means that if the NameNode fails, the Hadoop cluster becomes unavailable. Nevertheless, this is anticipated to be a rare occurrence, as deployments make use of business-critical hardware with RAS features (Reliability, Availability and Serviceability) for all the NameNode servers. If a NameNode failure does occur, it requires manual intervention by the Hadoop administrators to recover the NameNode with the help of a secondary NameNode.

The NameNode SPOF problem limits the overall availability of the Hadoop cluster in the following ways:

Any planned maintenance activity, such as hardware or software upgrades on the NameNode, results in downtime for the whole Hadoop cluster.

If any unplanned event triggers a machine crash, the Hadoop cluster is unavailable until the Hadoop administrator restarts the NameNode.

Hadoop 2.0 High Availability Features


Hadoop 2.0 overcomes this SPOF shortcoming by providing support for multiple NameNodes. It introduces the Hadoop 2.0 High Availability feature, which brings an extra NameNode (a passive Standby NameNode) into the Hadoop architecture, configured for automatic failover.

The main motive of the Hadoop 2.0 High Availability project is to render availability to big data applications 24/7 by deploying 2 Hadoop NameNodes - one in active configuration and the other as a Standby NameNode in passive configuration.

Earlier there was one Hadoop NameNode for maintaining the tree hierarchy of the HDFS files and tracking the data storage in the cluster. Hadoop 2.0 High Availability allows users to configure Hadoop clusters with redundant NameNodes, eliminating the possibility of a SPOF in a given Hadoop cluster. The Hadoop configuration capability also allows users to build clusters horizontally with several NameNodes that operate autonomously over a common data storage pool, thereby offering better computing scalability when compared to Hadoop 1.0.

With Hadoop 2.0, Hadoop architecture is now configured in a manner that it supports
automated failover with complete stack resiliency and a hot Standby NameNode.
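To make the failover arrangement a little more concrete, the sketch below shows, in Java, the kind of client-side HDFS HA settings involved: a logical nameservice backed by two NameNodes, plus a failover proxy provider so the client transparently talks to whichever NameNode is currently active. The nameservice name and host names are placeholders; in practice these properties normally live in hdfs-site.xml rather than application code.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Logical name for the HA pair; clients address "hacluster" instead of a single host.
        conf.set("dfs.nameservices", "hacluster");
        conf.set("dfs.ha.namenodes.hacluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.hacluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.hacluster.nn2", "namenode2.example.com:8020");

        // Failover proxy lets the client retry transparently against whichever NameNode is active.
        conf.set("dfs.client.failover.proxy.provider.hacluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        FileSystem fs = FileSystem.get(URI.create("hdfs://hacluster"), conf);
        System.out.println("Root entries: " + fs.listStatus(new Path("/")).length);
    }
}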

Image Credit :blog.cloudera.com


From the above diagram, it is evident that both the active and passive (Standby) NameNodes have up-to-date metadata, which ensures seamless failover for large Hadoop clusters, meaning there would not be any downtime for your Hadoop cluster and it will be available all the time.

Hadoop 2.0 is equipped to identify any failures in the NameNode host and processes, so that it can automatically switch to the passive NameNode, i.e. the Standby Node, to ensure high availability of the HDFS services to the Big Data applications. With the advent of Hadoop 2.0 HA it is time for Hadoop administrators to take a breather, as this process does not require manual intervention.
With HDP 2.0 High Availability, the complete Hadoop stack - HBase, Pig, Hive, MapReduce, Oozie - is equipped to tackle the NameNode failure problem without losing job progress or any related data. Thus, any critical long-running jobs that are scheduled to be completed at a specific time will not be affected by a NameNode failure.

Click here to know more about our IBM Certified Hadoop Developer course
Hadoop Users' Expectations from Hadoop 2.0 High Availability
When Hadoop users were interviewed about the high availability requirements for the Hadoop 2.0 architecture, some of the most common requirements they came up with were:

No Data Loss on Failure / No Job Failure / No Downtime - Hadoop users stated that Hadoop 2.0 High Availability should ensure that there is no impact on the applications due to any individual software or hardware failure.

Withstand Multiple Failures - Hadoop users stated that with Hadoop 2.0 High Availability the Hadoop cluster must be able to withstand more than one failure simultaneously. Preferably, the Hadoop configuration should allow the administrator to configure the degree of tolerance, or let the user make a choice at the resource level on how many failures can be tolerated by the cluster.

Self-Recovery from a Failure - Hadoop users stated that with Hadoop 2.0 High Availability, the Hadoop cluster must heal automatically (self-healing), without any manual intervention, to restore itself to a highly available state after a failure, on the pre-assumption that sufficient physical resources are already available.

Ease of Installation - According to Hadoop users, setting up High Availability should be a trifling activity that does not require the Hadoop administrator to install any other open source or commercial third-party software.

No Demand for Additional Hardware - Hadoop users say that the Hadoop 2.0 High Availability feature should not demand that users deploy, maintain or purchase additional hardware. 100% commodity hardware must be used to achieve high availability, i.e. there should not be any further dependencies on non-commodity hardware such as load balancers.
Hadoop 2.0 High Availability Feature - AvatarNode Implementation at Facebook to solve the availability problem
Facebook is a fast-growing, data-heavy organization with close to 500 million active users who share more than 30 billion pieces of content on the web in the form of blog posts, photos, news stories, links, comments, etc. People spend approximately 700 billion minutes on Facebook per month, and this data is said to double semi-annually. How does Facebook render high availability to such a huge database of users? Facebook uses the Hadoop 2.0 High Availability feature to ensure that 100 petabytes of data is online 24/7, with the use of the special AvatarNode.

Ashish Thusoo, Engineering Manager at Facebook, stated at the Hadoop Summit that Facebook uses Hadoop 2.0 and Hive extensively to process large data sets: "This infrastructure is used for a variety of different jobs - including ad hoc analysis, reporting, index generation and many others. We have one of the largest clusters with a total storage disk capacity of more than 20PB and with more than 23000 cores. We also use Hadoop and Scribe for log collection, bringing in more than 50TB of raw data per day. Hadoop has helped us scale with these tremendous data volumes."
In the HDFS architecture, all file system metadata requests pass through a single server known as the NameNode, and the file system sends and receives data through a group of DataNodes. DataNodes in the HDFS architecture are redundant, so at any given point of time the file system can handle the failure of a DataNode; however, if the NameNode is down, the overall functionality of HDFS is at stake and any applications connected to it cease to operate.

Andrew Ryan, a Hadoop professional at Facebook, mentioned at one of the Hadoop Summits that it is necessary for a large organization like Facebook to understand the degree and level of the NameNode as a Single Point of Failure, so that they can build a solution that overcomes the shortcomings of the NameNode as a SPOF.
AvatarNode is Born to Address the NameNode Problem

Image Credit: slideshare.net

The limitation of a SPOF in the HDFS architecture was overcome with the birth of AvatarNode at Facebook. Wondering why it has such an unusual name?

Dhruba Borthakur, a famed HDFS developer at Facebook, named it after the James Cameron movie Avatar, released in 2009, around the time AvatarNode was created.

AvatarNode has been contributed by Facebook as open source software to the Hadoop community, to offer a highly available NameNode with hot failover and failback. AvatarNode is a two-node cluster that provides a highly available NameNode to Big Data applications with a manual failover.

AvatarNode is now the heartthrob inside Facebook, as it is a huge win over the NameNode SPOF problem. AvatarNode runs heavy production workloads and contributes to the improved administration and reliability of Hadoop clusters at Facebook.

Hadoop 2.2 is now supported on Windows, which is attracting interest from organizations that are dedicated to Microsoft platforms. There is no doubt that there will be growing pains as organizations migrate to the latest release of Hadoop; however, the fundamental changes to the MapReduce framework will add value for Hadoop in Big Data set-ups. Hadoop 2.0 is a herald of the growing technology and a revitalized approach to building and implementing Big Data applications. Various tools are anticipated that will make the most of Hadoop 2.0 High Availability, the new HDFS architecture and the features in YARN.

How is Hadoop Transforming the Telecommunication industry?
25 Mar 2015
The past decade has seen an exponential growth in the telecommunication industry. The
cost of communication has steadily decreased and thanks to the innovation in the electronics
industry, mobile phones have become affordable and feature rich.
Now one no longer expects a phone to just make and receive calls and text messages.
Smartphones have become an invaluable part of our lifestyle. The developing countries
such as India and China have been driving the mobile phone market and are no longer
considered as electronic dump zones by the major players of the mobile phone
manufacturing industry.
How big is the telecommunication industry, really?
Technology and business are coming closer than ever before. In the modern age, no industry touches as many technology-related business sectors as telecom.

Image Credit : teletech.com


Telecommunication industry includes the traditional local and long distance telephone calls,
text messaging, wireless communication, optical fiber based high speed broadband
communication, television streaming and other modes of satellite based communication.
Telephone service, cable television, internet services and wireless are all being provided as
an integrated telecom solution and at present, the International Telecommunications Union
(ITU) estimates approximately 6.9 billion wireless service subscriptions worldwide as of mid
2014.
The Big Shift in Telecom Trends
The revenue model of the telecom sector has seen a marked shift from the traditional voice- and messaging-driven model to a data-driven model.
This churn in the industry has given rise to an enormous amount of data, on a scale never seen before. Around the globe, hundreds of thousands of phone calls take place simultaneously, which need to be accurately tracked by the second, and the report has to be made available to customers in the form of an itemized bill.
Big Data has made its presence felt across industries and the telecom sector is no different. Big data telecom needs robust, scalable and accurate data analysis software that is capable of tracking and analyzing such large volumes of communication in real time.
The need for a scalable and robust big data telecom solution

As is the case in most other industries, Apache Hadoop has come to the rescue of the telecom sector as well, providing real-time monitoring and Big Data solutions for telecom data analytics.
Big telecom companies have a number of verticals such as Marketing, Product, Sales,
Human Resources, Information Technology, Research and Development, etc. that are in
constant need of information. Using Hadoop, the existing databases can be suitably mined
and information can be extracted which would eventually be fed to the respective verticals
for decision making.
For example, the telecom data consumption trend of the past few years can be extrapolated to determine the expected bandwidth usage by consumers in the future, and appropriate products can be pitched to them. The technology division can also use such projections to scale up its network infrastructure well in advance of demand.
Click here to know more about our IBM Certified Hadoop Developer course
Modern data architecture using Hadoop

Image Credit : www.hortonworks.com


Big data telecom gives you the unique ability to segregate data and computation, which is a big improvement over the traditional tools.
Hadoop data analysis engines perform the calculations and processing at the database end and transmit only the final result, which allows big data analysis of telecommunications data in an efficient, fast and secure manner. It prevents unnecessary clogging of bandwidth, which can then be used for other, more important network-oriented operations.
Hortonworks rates Hadoop as one of the best-placed tools for telecom data analytics. It states that of the present solutions capable of handling telecom data effectively, Hadoop is the best suited for delivering a modern data architecture. It allows telcos to store new types of data, retain it for longer periods, and join different data sets together to derive new information from the resultant combination, which can be valuable for business users.
Hortonworks lists how modern data architectures using Hadoop can provide Big Data solutions to the telecommunication industry in its attempt to gain competitive advantage:
Handling Call Data Records (CDRs)
One of the biggest challenges the telecommunication industry faces is having an infrastructure in place for analyzing CDRs, which can be ably addressed using Hadoop.
Telecommunications companies carry out a lot of forensics on their data to monitor quality of service. This involves using Hadoop to perform dropped call analysis, monitoring and reporting of poor sound quality, root cause analysis and pattern recognition. Considering that millions of records flow into big data telecom databases every second, there is a need for real-time, accurate analysis, and Hadoop provides exactly that with the help of Apache Flume (capable of ingesting millions of CDRs per second into Hadoop) and Apache Storm (capable of processing data in real time and identifying irregular, potentially troublesome patterns). Combined, these help improve overall customer satisfaction levels.
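As a rough sketch of the Storm side of such a pipeline, the Java topology below wires a spout of CDR tuples to a bolt that flags dropped calls. The spout emits synthetic records purely to keep the example self-contained (a real deployment would read from a Flume or Kafka channel), and the field names and the Storm 2.x API usage are assumptions, not a description of any operator's actual system.

import java.util.Map;
import java.util.Random;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class CdrMonitoringTopology {

    // Stand-in spout: emits synthetic CDR tuples; a real spout would read from Flume or Kafka.
    public static class SyntheticCdrSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final Random random = new Random();

        @Override
        public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            String code = random.nextInt(10) == 0 ? "DROPPED" : "NORMAL";
            collector.emit(new Values("cell-" + random.nextInt(100), code));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("cell_id", "termination_code"));
        }
    }

    // Bolt that flags dropped calls; real logic would aggregate per cell and raise alerts.
    public static class DroppedCallBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple cdr, BasicOutputCollector collector) {
            if ("DROPPED".equals(cdr.getStringByField("termination_code"))) {
                System.out.println("Dropped call on " + cdr.getStringByField("cell_id"));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Terminal bolt: nothing emitted downstream.
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("cdr-spout", new SyntheticCdrSpout(), 2);
        builder.setBolt("dropped-calls", new DroppedCallBolt(), 4).shuffleGrouping("cdr-spout");

        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("cdr-monitoring", new Config(), builder.createTopology());
            Thread.sleep(60_000);   // let the local topology run for a minute
        }
    }
}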
Proactive Servicing of Telecom Data Equipment
Big telecom companies stay ahead of the market and invest in huge telecom data infrastructure well in advance, in order to gain competitive advantage and be ready to serve customers as soon as there is demand for their service. This calls for regular performance monitoring of telecom equipment such as conductors, signal boosters, antennas, etc.
Using Hadoop enables telecom companies to analyze the big data produced by telecommunications systems through performance indicators (voltage and current levels, uptime, downtime, efficiency, etc.). Using Hadoop makes this real-time analysis easy to perform and store.
Pitching new products
Product innovation is one of the key factors for any big telecom company to attain and maintain its competitive advantage. It is important to analyze the usage history and forecast the next generation of products that customers are likely to expect, and to be ready with them as soon as the demand arises. This requires complex analysis of terabytes of data, sorted according to customer demographics, geography, profession and a number of other factors.
Using Apache Hadoop has made such telecom data analytics possible in a secure, reliable and efficient manner.
Network Analytics
Today's telecom customer has a number of service providers to choose from, and switching costs have come down drastically, which means the telecom companies, especially the telecom giants, need to keep a careful watch on the performance of their networks. Network bottlenecks have to be identified and resolved within a matter of minutes for a company to retain its customer base and attract new customers.
Using Hadoop gives them the ability to dig through petabytes of data and extract meaningful information in a matter of seconds.
Companies such as Telefonica, China Mobile and Verizon crunch their big data through
Hadoop to grow and maintain their services. "The ability to identify, gather, store and share
large amounts of data is one of the true benefits of cloud computing," said Verizon CMO
John Harrobin. "Providing those capabilities on top of our high-performing, secure and
scalable architecture strengthens the Verizon Cloud offering for our clients."
So, if you are connected to the telecom sector in any capacity, learning Hadoop will always be a huge advantage, especially in the current scenario dominated by the boom of telecom data analytics in the telecommunication industry.

Big Data Analytics - The New Player in ICC World Cup Cricket 2015
27 Mar 2015
Big Data in World Cup Cricket 2015 - Criclytics
With the ICC World Cup Cricket 2015 final round the corner, the battle is on for the ICC World Cup 2015. The big final is between Australia and New Zealand.

Image Credit : worldcupnext.com

India and South Africa, despite their best efforts, were knocked out of the running in the semis. South Africa have been branded as "chokers", as they tend to miss out in the most important matches, even though they are a top team in the ICC Cricket Rankings.

So how do we know this? How do experts sit and analyze who the top teams are? Which batsman has the better batting average? Which bowler has the best economy rate?

All of the above questions are answered with the help of Big Data Analytics in cricket, or, more appropriately termed, Criclytics.

Cricket is a game in which 10 full member countries and 96 associate countries participate, and the ICC Cricket World Cup is the most awaited event for these participants. Cricket has been played as a professional sport for the past 160 years, and cricket data is generated every day of the year. With ball-by-ball information on 531,253 cricket players across close to 540,290 cricket matches at 11,960 cricket grounds around the world, this Big Data is extremely voluminous. There are tons of permutations and combinations of cricket data that can be used to predict, quite accurately, the next ball that might possibly be bowled.

Can this data be accrued and analyzed to make accurate predictions for a game that is known to be unpredictable?

The answer is straightforward. Using Hadoop and other big data technologies, cricket data can be analyzed with the finest precision.
Scope of Big Data in Cricket Predictions for World Cup 2015
Regardless of the kind of sport, the more insight you have and the more you know, the better are your chances of enjoying its success - whether you are a broadcaster, a player or a die-hard fan.

It is not difficult to see how much big data analysis is used in post-match reviews and in planning the game strategy for the next match. For instance, with real-time data analysis, experts were able to predict that the teams going forward to the semifinals were going to be India, Australia, South Africa and New Zealand. This prediction was based on each team's past performance in the year. Experts had also hinted that since India had not won any of their matches against Australia in the past year, it was not likely that they would win in the semis. But we had hoped that the unpredictability of cricket would come through. Unfortunately for India, Data won again.

Image Credit:indianexpress.com

Big Data and Hadoop technology have a presence in various industries, and they are now on the verge of targeting sports. Big Data in sports can make a significant difference in preventing injuries, scoring touchdowns and signing contracts, from coaches to players.
Stephen Benjamin, IT Director at Cadence Design Systems Private Limited, says, "To an extent data analytics can help in creating a level playing field as this insightful information is available to all, but the players still require ability and talent to execute their plans on the ground."
Big Data Analytics providers have come up with progressively more refined methods for monitoring and capturing the exponentially growing volumes of cricket data. Wearable computers, CCTV cameras, and sensors keep track of each and every aspect of cricket players' performance, training levels, calorie intake, interaction with fans and much more, in the pursuit of 'improved performance on the pitch'.
Big Data in sports falls into 2 main categories of analysis. In the first category, analytics is done to provide fun statistics that entertain audiences and viewers; the second is performance-focused analytics that helps teams plan better or improve a player's performance.
The former kind of predictions are exposed to the viewers, such as the weather conditions, swing, average speed of the bowler and many more. Some of these cricket predictions are just meant to enliven the viewers, for example: 'Whenever Virat Kohli scores a century, India wins' or 'New Zealand have never played a World Cup final before and Australia is chasing the Cricket World Champion title for the 5th time.'

How Big is the Cricket Data in real time?


We see that a single match in World Cup Cricket 2015 generates chunks of data in the form of batting averages or bowling figures.

If we consider the data related to a batsman, it will include the number of balls faced, number of sixes scored, number of fours scored, overall runs scored, strike rate and so on. Similarly, if we account for the data related to a bowler, it will include the number of wickets taken, the bowling average, the number of runs conceded to opposing batsmen, the number of balls bowled and so on.
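As a quick illustration of the arithmetic behind two of these figures, the small Java sketch below computes a batting strike rate (runs per 100 balls faced) and a bowling economy rate (runs conceded per 6-ball over). The numbers used are illustrative only, not real match data.

public class CricketStats {

    // Strike rate: runs scored per 100 balls faced.
    static double strikeRate(int runs, int ballsFaced) {
        return 100.0 * runs / ballsFaced;
    }

    // Economy rate: runs conceded per (6-ball) over bowled.
    static double economyRate(int runsConceded, int ballsBowled) {
        return runsConceded / (ballsBowled / 6.0);
    }

    public static void main(String[] args) {
        System.out.printf("Strike rate: %.2f%n", strikeRate(77, 58));    // 77 runs off 58 balls
        System.out.printf("Economy rate: %.2f%n", economyRate(45, 60));  // 45 runs off 10 overs
    }
}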

And this is just the statistical data - what about the video data, which shows how a batsman responded to a particular ball, how a ball swung in the preliminary phase of the match, and so on? Thus, with the help of Hadoop and other big data technologies, there is a massive opportunity to analyze these statistics and make the best cricket predictions for World Cup Cricket 2015, which will help in taking precise decisions on and off the cricket pitch.
Amitava Ghosh, ex-CTO at TaxiForSure, says, "While raw talent can never be replaced, data science and statistical analysis is going to help in putting together a sharper strategy for the game, just like it helps in business."
Click here to know more about our IBM Certified Hadoop Developer course
How is cricket data used for analytics in World Cup Cricket 2015?
The main goal of the ICC is to provide real-time, interesting, storytelling statistics to the viewers through Big Data Analytics. The ICC is using 40 years of historical World Cup data to give out the best cricket predictions and enhance the experience of the viewers at World Cup Cricket 2015.

Predictive analytics is the secret to achieving this goal of ICC as it foretells specific
outcomes for events which take place in a match by looking at the previous trends in data
and considering various other variables.

Tools that use Big Data Analytics for Cricket Predictions:


1) Insights by ESPNCricInfo
ESPNCricInfo developed a tool called Insights at the dawn of World Cup Cricket 2015 through the amalgamation of big data analytics with cricket. The Insights tool makes use of 2 decades of historical cricket data points and statistics, which are plotted at the Data and Analytics Hub in Bangalore, with the intent of creating a set of products to engage fans in this multi-screen era.

Image Credit :ESPNCricInfo.com

With the Insights tool, Big Data Analytics can now tell you the odds of the popular batsman Virat Kohli playing in a particular zone against a left-arm bowler. Isn't that amazing?

The big data functions plot in approximately 25 different variables for every ball bowled in a match during World Cup Cricket 2015, which are passed through an extensive analytics process complemented by historical data, with the help of the local cricketing intelligence of the ESPNCricInfo team.

Mr. Ramesh Kumar, the Head of ESPN Digital Media India and ESPNCricInfo, says, "Unlike in cricket, big data analytics have become well-defined tools in other sports such as baseball and basketball. We map every cricket ball in multiple dimensions, be it in any time zone. Various coaches and players have been around us to get this data. So, this tool will be available for free only up to a certain layer."
2) ScoreWithData - IBM Game Changer
Just 7 hours before the 1st quarter final of ICC World Cup Cricket 2015 began, ScoreWithData predicted that the South African cricketer Imran Tahir was going to be ranked as the Power Bowler. This prediction through Big Data Analytics came true, as the South African team won the knockout match against Sri Lanka thanks to Tahir's outstanding performance, making it an unforgettable experience for all the cricket lovers who take pleasure in watching World Cup cricket.

Image Credit :07.ibm.com/

This best cricket prediction is owed to IBM and its analytics innovation ScoreWithData, which consists of Twitteratti Speaks, the IBM Social Sentiment Index and the Wisden Impact Index, and which provides insights to cricket fans. IBM uses social media, in particular Twitter, to spread the word about its creative and interesting predictions.

Twitteratti Speaks uses a data analytics engine and social media data to identify and keep track of all the highlight moments of every cricket match. The highlights of a match are identified from the prevailing reactions of cricket fans by scanning and searching through billions of Twitter feeds, and finally the results are projected visually.

The IBM Social Sentiment Index uses real-time data streams to generate the best cricket predictions on the teams - which has a greater probability of winning the match and who are the most talked-about players.

The Wisden Impact Index provides proportional and correlated predictions by leveraging historical and current cricket data as the cricket match progresses.
Big Data in sports has a vital role to play in future decision making, by recording experiences and predicting behaviors based on cricket data. If you have come across any great cricket predictions made using Big Data Analytics for ICC World Cup Cricket 2015, do share them with us in the comments section below.

What is Salesforce and 5 Reasons why it is the Best CRM
31 Mar 2015
CRM software solutions were offered in an on-premise delivery model 15 years ago. Companies installed the software on their servers and maintained it in-house. However, with on-premise CRM, IT companies had to take on technical responsibilities that include system design, infrastructure and server management.

This paved the way for a novel kind of CRM - the cloud-based CRM. In a cloud-based CRM, applications are hosted by the vendor and organizations can gain access to their data through the web without having to worry about any technical aspects of managing it.

Cloud CRM has become popular over time, upsetting the on-premise CRM model, because there is no software licensing fee involved and the organization does not need dedicated IT staff or infrastructure. This has in turn reduced the cost of monthly services and eased the set-up process.

What is Salesforce CRM?


Organizations were in need of a cloud CRM to deal with all sorts of customer concerns, from marketing and sales to customer service, to streamline the enterprise and help in saving valuable resources and time. Salesforce was founded in 1999 by former Oracle executive Marc Benioff, along with Parker Harris, Dave Moellenhoff, and Frank Dominguez, with a vision to reinvent the CRM model in the cloud, and Salesforce now defines the new era of cloud computing.
What does Salesforce do?
Salesforce is cloud based CRM software, developed to make organizations function
efficiently and profitably by reducing the cost of managing hardware infrastructure.
Salesforce offers a wide range of features in all the functional areas of a company:

Salesforce in the Marketing Team: The Salesforce professional edition helps the marketing team of a company create and track various marketing campaigns to measure the success rate, and automatically provides leads to the sales team of the company.

Salesforce in the Customer Support Team: Salesforce also keeps track of various customer issues and tracks them to resolution based on various escalation rules, such as the importance of the client and the elapsed time. This improves customer satisfaction levels, as issues do not fall through the cracks and are directly escalated to the next level.

Salesforce in Management: With visual dashboards and extensive reporting features, Salesforce provides the management of a company with visibility into what is happening in different teams.

Salesforce in Training: Salesforce has very robust training and support features that are above industry standards. Salesforce users can easily find answers to their questions from the extensive online help manual and the video walkthrough facility.

Salesforce in Application Integration: Salesforce can be integrated with other systems to extend its functionality through the Salesforce business app store - the AppExchange.
Salesforce CRM was named the World's Most Innovative Company by Forbes for 4 continuous years (2011-2014). Salesforce also featured in FORTUNE's 100 Best Companies to Work For in 2012-2014.
With over 2 million customers (amassed in a short while), including popular data giants such as Fujitsu, Facebook, Coca-Cola, L'Oreal, Sony, and Vodafone, Salesforce CRM is truly the undisputed market leader in the CRM software market. The integrator Bluewolf interviewed close to 450 client companies and concluded that 84% of enterprises believe customer engagement is the primary key to future growth of the enterprise. Buyers perceive Salesforce as the integral platform for rendering a qualitative customer engagement experience.
Tien Tzuo, the present CEO of Zuora and ex-CMO of Salesforce, said that "CRM software is an entry point. What we (Salesforce) are planning to be is the business web. That means that we want to be the platform for all business apps. When a business starts its day, they open up their apps on us and all day, all their apps are running on our foundation."

Click here to know more about Salesforce training in ADM 201 and DEV 401
certifications.
5 REASONS WHY SALESFORCE IS THE BEST CRM
1. CURRENT TRENDS AND SUSTAINED GROWTH

According to Gartner, the overall value of the global CRM software market was 20.4 billion USD at the end of 2013 and is anticipated to increase by 13.7% every year, making the global CRM software market a multi-billion dollar space. Salesforce is right at the top, contributing 16% of the overall CRM software market share, making Salesforce and CRM software inseparable.

Yvonne Genovese, Vice President at Gartner's Marketing Leaders Research, said, "Marketing will be the largest growing Salesforce CRM category through 2017." The International Data Corporation (IDC) expects the overall market for marketing automation to grow from $3.2 billion in 2010 to $4.8 billion in 2015.

Image Credit: marketrealist.com

According to Gartner research reports, 94% of Salesforce CRM revenue is generated from support and subscription fees, while only 6% of the revenue comes from professional services. Salesforce is concentrating on support and subscriptions to grow its revenue exponentially.

Image Credit: marketrealist.com

The company's CEO Marc Benioff stated that Salesforce.com has recorded constant currency and deferred revenue growth of 30% or more year-over-year. The company expects revenue to rise to between $6.45 billion and $6.50 billion in 2016.

2. VISION AND EXECUTION


With a complete long-term vision and a high capability to execute it, Salesforce CRM software claims the highest position in the Customer Relationship Management software market. The ability to execute this vision on the ground has helped Salesforce reach the top.
In 2014, Gartner once again titled Salesforce the leading CRM software; Salesforce has maintained this position for 4 years in a row. Salesforce emerged as the leader in Gartner's Magic Quadrant for 2014 (the image below illustrates the leading position of Salesforce when compared to other CRM software players).
3. STRATEGIC ACQUISITIONS
Salesforce has a smart acquisition policy to build enhanced capabilities. As part of their long-term vision, they have made numerous strategic acquisitions since 2006. These strategic acquisitions have helped Salesforce CRM build a highly broadened marketing ecosystem, referred to as the Marketing Cloud.

Image Credit: salesforceben.com

In June 2013, ExactTarget was acquired for $2.5 billion, which helped enhance the Marketing Cloud through email campaign management and marketing automation. The ExactTarget acquisition also resulted in Salesforce owning Pardot, an application for marketing automation that primarily works in the area of online marketing campaigns. Pardot helps boost sales by creating and deploying marketing campaigns, and it contributes significantly to the increased revenue and efficiency of Salesforce CRM software.

The most recent strategic acquisition has been that of RelateIQ for $390 million, which helps eliminate manual data entry through automated tracking of relationships in the CRM space. This will certainly be a critical value addition to the Salesforce Marketing Cloud offering.

Image Credit: http://marketrealist.com/

Image Credit:marketrealist.com

4. HIGHLY DIVERSIFIED OFFERINGS


Salesforce is the innovator of the SaaS approach in the enterprise space. Its proficiency in on-demand software helps reduce the cost for customers, as it provides a common networking, hardware and software platform. Customers can improve sales and enhance communication through the various SaaS offerings of Salesforce, namely the Salesforce Chatter, Sales Cloud, and Service Cloud applications.

Image Credit: thoughtexecution.com

Salesforce acquired Heroku in 2010 to provide its customers with a PaaS (Platform as a Service) offering that supports various programming languages. Users can customize their applications with developer tools like the AppExchange and Database.com. With diverse offerings and a wide product portfolio, Salesforce is inventing the future while other competing CRM software applications like Siebel are just trying to get into it.
5. Dawn of Salesforce1
Gartner has stated that by 2015, an overwhelming 60% of Internet users will have a
preference for mobile customer service applications, with various devices and applications
being available on a single platform.

To keep pace with the increasing demand and growing trend, Salesforce launched Salesforce1 in October 2013 - an innovative CRM platform for software vendors, developers, and customers for connecting applications and third-party services such as Dropbox, Evernote and LinkedIn. Instant and customized customer service, on-screen guided support and live video support are just some of the remarkable features of Salesforce1 which contribute to its dominance in the CRM software space.

Salesforce1 has seen significant growth in active mobile application users - a whopping 96% - with a 46% increase in active users of customized mobile applications. Thus, Salesforce1 is successfully leveraging the growth in the Customer Relationship Management software market by meeting the increasing demand from mobile device users and service providers.

What does the future hold for Salesforce CRM Software?


Going by the various Salesforce reviews on the web, it is clear that though the Salesforce
CRM Software service costs more, it has the most user-friendly interface when compared
to other CRM software such as Siebel and SugarCRM. According to IDC, the overall
Marketing Automation market is poised to grow from $3.7 billion in 2011 to $5.5 billion in
2016, which presents exceptional growth projections for Salesforce CRM in the marketing
category.
Salesforce is poised to provide an array of diversified cloud applications and services to
meet the insatiable demand for cloud computing. Salesforce Customer Relationship
Management software is all set to grow exponentially in the CRM software market.

5 Big Data and Hadoop Use Cases in Retail


02 Apr 2015
Data analytics, used for improving marketing efficiency and raising profits, is not a new fad in
the retail industry. Being a highly competitive space, the retail market has companies
innovating in the field of consumer behavior analysis to find new buying trends.
Popularly termed market basket analysis, it refers to the analysis of data that comprises the
various buying habits and preferences of customers. This allows retail outlets to place, say,
shampoo and conditioner or toothpaste and toothbrushes together, as the data pattern
shows that people who buy a shampoo will invariably also buy a conditioner. Placing the two
products together narrows the customer's choice and subtly influences the purchase
behavior.
A survey conducted in June 2013 by Gartner predicts that Big Data spending will cross the
$232 billion mark by 2016. Close to two-thirds of the 720 business and IT leaders Gartner
surveyed said they have already invested in Big Data or expect to invest in it by the end of
June 2015. The Gartner survey also projects that 73% of retail organizations plan to invest
in Retail Analytics within the next 2 years. The Retail Big Data Analytics market is
anticipated to grow from $1.8 billion in 2014 to $4.5 billion in 2019.
Retail Analytics truly took off with Target figuring out, quite early on, that data analytics
can take the consumer buying experience to a whole other level. When Target statistician
Andrew Pole built a data mining algorithm that ran test after test on the data, useful patterns
emerged which showed that consumers as a whole exhibit similar purchase behaviors. It got
a little out of hand when Target accurately predicted that a teen girl was pregnant (even
before her family knew) and sent her a customized product catalogue to ease her buying
needs.
Today, the breed of consumers has changed significantly. Big Data related to consumer
purchase behavior is volatile and voluminous, and the veracity of this data has big retail
companies scrambling to innovate new data mining techniques and resorting to cheaper
open source software like Hadoop to accurately store and analyze data in real time.
Big retail corporations such as Tesco ($130 billion), Walmart ($473 billion) and Target ($73
billion) have all experienced a significant decline in revenue due to the fickle and
unpredictable shopping habits of the new generation of customers.
The new millennial prefers looking at a smartphone for price comparison and immediately
placing the order without even walking down to the store. Companies like Walmart and
Tesco have realized that the only way to bring about a change in data analytics is to partner
with startups that have built-in platforms to understand and hold on to consumers through
intense data analysis.
Need for Retail Big Data Analytics
The supermarket chain Tesco has 600 million records of retail data, growing at a rapid pace
of a million records every week, with 5 years of sales history across 350 stores. It would be
practically impossible to analyze this amount of data at once with the help of legacy
systems. There would be a significant amount of data loss, as the processing speed of
legacy systems is limited.
With the increase in retail channels and the growing use of social media, consumers are able
to compare services, products and prices regardless of whether they shop online or in retail
stores. With access to this pool of information, consumers interact with retail channels via
social media platforms, empowering themselves to influence other customers to shift from
one brand to another through online reviews, comments and tweets.

Image Credit: slideshare.net


Challenges posed by Big Data in Retail
1) It is becoming more and more difficult for retailers to predict consumer buying habits.
Technology is a double-edged sword: on one edge, retailers have access to more information
about their consumers; on the other, consumers can choose from a wide variety of options,
which increases the risk of changes in consumer buying habits.
2) Trends in retail are changing at a very rapid pace due to enhanced methods of
communication, changing technologies and varying consumer tastes, making it difficult for
retailers to keep up with the changing trends without Retail Analytics.
3) As the saying goes, a happy customer will tell 1 person but an unhappy customer will
tell 10. Thus, it is extremely important for retailers to employ sentiment analysis
using Hadoop for precise and accurate predictions, as customers are unforgiving. Any
negative encounter with a particular retailer is likely to go viral on the web through
various social media platforms (see the sentiment scoring sketch after this list).
4) Customers may cry, laugh and sigh at product promotions and advertisements that strike
their sentimental heartstrings. Retailers spend millions of dollars on advertising their
products, but it is very likely that consumers still might not purchase them. Therefore, it is
necessary for a retail organization to adopt retail analytics to understand customers'
purchase behavior.
5 Big Data and Hadoop Use Cases in Retail
1) Retail Analytics in Fraud Detection and Prevention

Data breaches have become so frequent that every week some mega retailer is hit by fraud.
Fraud detection is a serious concern, aimed at avoiding losses and maintaining customers'
trust. The most commonly observed frauds include fraudulent returns of purchased products
and stolen credit or debit card information.

Image Credit :deloitte.wsj.com


Fraudsters are always inventing novel tools and techniques, and retailers must employ retail
analytics to identify fraudulent activities and prevent them before they take place. With a
swarm of big data technologies like Hadoop, MapReduce and Spark, it is possible to analyze
more than 50 petabytes of data to accurately predict risks, which was previously impossible.

Image Credit:slideshare.net
The giant retailer Amazon has an intensive program to detect and prevent credit card fraud,
which led to a 50% reduction in fraud within the first 6 months. Amazon developed fraud
detection tools that use a scoring approach in predictive analysis. This retail analytics
depends on huge datasets that contain not just the financial information of transactions but
also browser information, the IP addresses of users and any other related technical data that
might help Amazon refine its analytic models to detect and prevent fraudulent activities.
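A minimal sketch of what such a scoring approach might look like, with entirely hypothetical features, weights and threshold (the source does not describe Amazon's actual model); in practice the weights would be learned by a model trained on labeled historical transactions:

```python
# Hypothetical rule-based fraud score; feature names and weights are illustrative only.
def fraud_score(txn: dict) -> float:
    score = 0.0
    if txn["ip_country"] != txn["billing_country"]:
        score += 0.4                       # order placed from an unexpected country
    if txn["amount"] > 10 * txn["avg_order_amount"]:
        score += 0.3                       # order far larger than the customer's average
    if txn["new_device"]:
        score += 0.2                       # first time this browser/device is seen
    if txn["shipping_address_changed_recently"]:
        score += 0.1
    return score

txn = {
    "ip_country": "RO", "billing_country": "US", "amount": 900.0,
    "avg_order_amount": 60.0, "new_device": True,
    "shipping_address_changed_recently": False,
}
if fraud_score(txn) >= 0.5:                # threshold chosen for illustration
    print("flag transaction for manual review")
```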
2) Retail Analytics in localization and personalization of Customer
Driven promotions
Retailers personalize a wide range of elements, including store formats, promotion
strategies, product pricing and staffing. Personalization may depend on various factors such
as demographics, location-specific attributes (proximity to certain other businesses) and the
purchase behavior of the customer.
eCommerce companies like Amazon and eBay have mastered the art of personalized service.
Retailers are trying to do the same by providing customized messages, shopping offers and
seasonal freebies. With big data technologies like Hadoop, a personalized experience
translates into better customer service and, ultimately, happier customers.

Image Credit:bigdata.lgcns.com
Localization and personalization require various analytical approaches, including behavioral
targeting, price optimization and store site selection analytics. If the goal is to localize by
clusters, then a clustering technique needs to be used. Localization in the retail sector is not
always geographically oriented; retailers can also target pricing, offers and product
assortments based on the behavior of the customer to provide a personalized shopping
experience.

Image Credit:slideshare.net

Amazon has pioneered the personalization strategy by using product-based collaborative
filtering. Amazon provides data-driven recommendations to customers based on previous
purchase history, browser cookies and wish lists.
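A minimal sketch of item-based collaborative filtering, the general technique behind such product recommendations (the data and similarity measure here are illustrative, not Amazon's actual implementation); item-to-item similarities are typically precomputed offline on the full purchase history, for example as a Hadoop or Spark batch job:

```python
from collections import defaultdict
from itertools import combinations
import math

# Hypothetical purchase histories: customer -> set of products bought.
purchases = {
    "alice": {"camera", "tripod", "sd_card"},
    "bob":   {"camera", "sd_card"},
    "carol": {"tripod", "camera_bag"},
}

# Count per-item purchases and co-purchases.
item_count = defaultdict(int)
pair_count = defaultdict(int)
for items in purchases.values():
    for item in items:
        item_count[item] += 1
    for a, b in combinations(sorted(items), 2):
        pair_count[(a, b)] += 1

def similarity(a, b):
    """Cosine-style similarity between two items based on co-purchase counts."""
    co = pair_count.get(tuple(sorted((a, b))), 0)
    return co / math.sqrt(item_count[a] * item_count[b])

def recommend(customer, top_n=2):
    """Recommend items similar to something the customer already bought."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for other in item_count:
            if other not in owned:
                scores[other] += similarity(item, other)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("bob"))  # suggests 'tripod' first, since camera buyers also buy tripods
```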
3) Retail Analytics in Supply Chain Management
In 2012, Cognizant's Sethuraman M.S. wrote in the paper "Big Data's Impact on
the Data Supply Chain": "The world's leading commercial information providers
deal with more than 200 million business records, refreshing them more than 1.5
million times a day to provide accurate information to a host of businesses and
consumers. They source data from various organizations in over 250 countries,
100 languages and cover around 200 currencies. Their databases are updated
every four to five seconds."
Big Data has the prospect of transforming processes across various industries, and this tech
trend could be the way to increase efficiency in retail supply chain management. Supply
chain management matters to retailers in the long run: they make every effort to create an
optimized, flexible, global and event-driven supply chain model to increase efficiency and
enhance relationships with supply chain stakeholders.
Supply chain management is inefficient without Retail Big Data Analytics, because it
would be very difficult to track individual packages in real time and gather useful information
about shipments. Retail analytics in the supply chain involves optimizing inventory,
replenishment and shipment costs.

Image Credit: slideshare.net

The retailer Metro Group uses retail analytics to detect the movement of goods inside its
stores and display relevant information to store personnel and customers. For example, if a
consumer takes an item into the trial room, the product recommendation system
recommends other related products while the customer is trying on the apparel. Store
personnel can inform customers whether the products are in stock or not. Metro Group's
retail analytics system also keeps track of movement patterns on and off the shelf for later
customer analytics. Retail analytics also alerts managers at Metro Group about product
abnormalities by identifying unusual patterns, for example a product that is taken off the
shelf several times but never purchased.
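A minimal sketch of how such an unusual-pattern alert could be computed from shelf-sensor events, assuming a simple event log of pickups and purchases per product (the event format and threshold are purely illustrative):

```python
from collections import Counter

# Hypothetical shelf-sensor event log: (product_id, event_type)
events = [
    ("blue_shirt", "pickup"), ("blue_shirt", "pickup"), ("blue_shirt", "pickup"),
    ("blue_shirt", "purchase"),
    ("red_scarf", "pickup"), ("red_scarf", "pickup"), ("red_scarf", "pickup"),
    ("red_scarf", "pickup"),
]

pickups = Counter(p for p, e in events if e == "pickup")
purchases = Counter(p for p, e in events if e == "purchase")

# Flag products handled often but rarely bought - a possible pricing or quality issue.
for product, n_pickups in pickups.items():
    conversion = purchases[product] / n_pickups
    if n_pickups >= 3 and conversion < 0.1:
        print(f"alert: {product} picked up {n_pickups} times, bought {purchases[product]} times")
```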
4) Retail Analytics in Dynamic Pricing
100% price transparency is a prerequisite, with customers increasingly comparing online and
showroom prices. There is a need to build a dynamic pricing platform with retail analytics
that can power millions of pricing decisions for the biggest retailers. Dynamic pricing in retail
analytics can be implemented in 2 ways (a combined sketch follows this list):
1) Internal Profitability Intelligence - Every online transaction is tracked at unit-level
profitability by taking into consideration various variable costs such as vendor funding,
COGS (Cost of Goods Sold) and shipping charges.
2) External Competitor Intelligence - For a given set of retailer products, retail analytics
provides real-time intelligence about those products on competitors' websites along with the
corresponding prices.

Image Credit: onlinemarketexperts.com

Amazon's analytical platform has a great advantage in dynamic pricing, as it responds to the
competitive market rapidly by changing the prices of its products every 2 minutes (if
required), whilst other retailers change their prices every 3 months.

Image Credit: slideshare.net


Staples.com is another giant retailer leveraging retail analytics for dynamic pricing by
identifying various opportunities for price optimization to generate incremental revenue and
margin. Other retailers that leverage Retail Big Data Analytics are RadioShack and Groupon.
5) Retail Analytics in Integrated Forecasting
Forecasting demand and sales volume in retail is much more difficult than in any other
industry: millions of products and their variants, seasonal impacts on demand, changing
fashion trends, shifting customer preferences, thousands of retail outlets and various
promotional influences all affect the demand and sales volume of any product. All of this
generates huge amounts of Big Data, which can be leveraged through Retail Analytics to get
a precise view of demand and sales volume.
Big Data in retail can be analyzed to generate millions of predictions on a daily basis at the
product or store level.
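A minimal sketch of a per-store, per-product forecast using a simple moving average over recent weekly sales; the data is invented, and a production system would replace this with seasonal or machine-learned models trained on years of history across all outlets:

```python
# Hypothetical weekly unit sales keyed by (store, product), oldest to newest.
weekly_sales = {
    ("store_101", "printer_paper"): [120, 135, 128, 150],
    ("store_101", "stapler"):       [30, 28, 35, 33],
}

def moving_average_forecast(history, window=3):
    """Forecast next week's sales as the average of the last `window` weeks."""
    recent = history[-window:]
    return sum(recent) / len(recent)

forecasts = {key: round(moving_average_forecast(hist), 1)
             for key, hist in weekly_sales.items()}
print(forecasts)  # e.g. {('store_101', 'printer_paper'): 137.7, ...}
```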

Image Credit : slideshare.net


Staples, a US-based office supply chain, uses Hadoop and other Big Data technologies to
forecast sales by processing close to 10 million transactions every week as input, and
forecasts the daily and weekly sales of office supplies across 1,100 retail outlets in the US.
Staples uses these predictions to target marketing promotions by geographical area.
Staples saw a significant drop, close to 25%, in its overall promotion costs with the use of
Retail Analytics.
Retailers are making the most of Big Data technologies like Hadoop to reduce costs and
maximize profitability. Big Data in retail needs to be integrated with best-in-class
technologies like Hadoop to gain insights that help retailers react quickly to changing trends.
If there are any other efficient Big Data and Hadoop use cases in the retail sector that
exploit the power of big data technologies, please do share them with us in the comments
below!

Top 6 Hadoop Vendors providing Big Data Solutions in Open Data Platform
08 Apr 2015

Marcus Collins, a Research Analyst at Gartner, said: "Big data analytics and the Apache
Hadoop open source project are rapidly emerging as the preferred Big Data
solutions to address business and technology trends that are disrupting
traditional data management and processing."

Image Credit: alliedmarketresearch.com

Allied Market Research predicts that the Hadoop-as-a-Service market will grow to $50.2
billion by 2020. The Global Hadoop Market is anticipated to reach $8.74 billion by 2016,
growing at a CAGR of 55.63% from 2012 to 2016.

Wikibon's latest market analysis states that spending on Hadoop software and subscriptions
accounted for less than 1% of the $27.4 billion in overall Big Data spending in 2014, or
approximately $187 million. Wikibon predicts that spending on Hadoop software and
subscriptions will increase to approximately $677 million by the end of 2017, with the overall
big data market anticipated to reach the $50 billion mark.

Image Credit: Hortonworks.com

Big Data and Hadoop are on the verge of revolutionizing enterprise data management
architectures. Cloud and enterprise vendors are competing to stake a claim in the big data
gold-rush market alongside several pure-play Hadoop vendors. Apache Hadoop is an open
source big data technology with HDFS, Hadoop Common, Hadoop MapReduce and Hadoop
YARN as the core components. However, without the packaged solutions and support of
commercial Hadoop vendors, the open source distribution alone can go unnoticed in the
enterprise.
Need for Commercial Hadoop Vendors
Today, Hadoop is an open-source, catch-all technology solution with incredible scalability,
low-cost storage and fast-paced big data analytics on economical server hardware. Hadoop
vendor distributions overcome the drawbacks and issues of the open source edition of
Hadoop. These distributions add functionality that focuses on:

Support: Most Hadoop vendors provide technical guidance and assistance that makes it
easy for customers to adopt Hadoop for enterprise-level tasks and mission-critical
applications.

Reliability: Hadoop vendors respond promptly whenever a bug is detected. With the intent of
making commercial solutions more stable, patches and fixes are deployed immediately.

Completeness: Hadoop vendors couple their distributions with various other add-on tools
which help customers customize the Hadoop application to address their specific tasks.
Here is a list of top Hadoop Vendors who will play a key role in big data market growth for
the coming years:

Image Credit: randomramblings.postach.io

1) Amazon Web Services Elastic MapReduce Hadoop Distribution


Amazon has been a Hadoop vendor since the dawn of Hadoop, and Hadoopers boast of its
success stories as an innovative Hadoop distribution in the open data platform. AWS Elastic
MapReduce provides an easy-to-use and well-organized data analytics platform built on the
powerful HDFS architecture. With a major focus on map/reduce workloads, AWS EMR
exploits Hadoop tools to a great extent by providing a highly scalable and secure
infrastructure platform to its users. Amazon Web Services EMR is among the top commercial
Hadoop distributions, with the highest market share leading the global market.

AWS EMR handles important big data use cases like web indexing, scientific simulation, log
analysis, bioinformatics, machine learning, financial analysis and data warehousing. AWS
EMR is a good choice for organizations that do not want to manage thousands of servers
directly, as they can rent Amazon's cloud-ready infrastructure for big data analysis.

DynamoDB is another major NoSQL database offering from AWS, originally built to run
Amazon's own giant consumer website. Redshift is a fully managed, petabyte-scale data
analytics solution that is cost-effective for big data analysis with BI tools, with costs as low
as $1,000 per terabyte annually. According to Forrester, Amazon is the "King of the Cloud"
for companies in need of publicly cloud-hosted Hadoop platforms for big data management
services.
2) Hortonworks Hadoop Distribution
Hortonworks features in the list of Red Herring's Top 100 winners. Hortonworks is a
pure-play Hadoop company that drives open source Hadoop distributions in the IT market.
The main goal of Hortonworks is to drive all its innovations through the Hadoop open data
platform and build an ecosystem of partners that speeds up Hadoop adoption amongst
enterprises.
Principal Analyst of Forrester, Mike Gualtieri said "Where the open source community
isn't moving fast enough, Hortonworks will start new projects and commit
Hortonworks resources to get them off the ground."
Apache Ambari is a Hadoop cluster management console developed with Hortonworks'
backing for provisioning, managing and monitoring Hadoop clusters. Hortonworks is reported
to attract 60 new customers every quarter, with some giant accounts like Samsung, Spotify,
Bloomberg and eBay, and it has garnered strong engineering partnerships with Red Hat,
Microsoft, SAP and Teradata.
Hortonworks has grown its revenue at a rapid pace: revenue totaled $33.38 million in the
first nine months of 2013, a significant increase of 109.5% over the previous year. However,
the professional services revenue generated by Hortonworks grows at a faster pace than its
support and subscription revenue.

3) Cloudera Hadoop Distribution
Cloudera ranks at the top of the big data vendors list, having worked to make Hadoop a
reliable platform for business use since 2008. Cloudera, founded by a group of engineers
from Yahoo, Google and Facebook, is focused on providing enterprise-ready Hadoop
solutions with additional customer support and training. Cloudera has close to 350 paying
customers, including the U.S. Army, Allstate and Monsanto. Some of them boast of
deploying 1,000 nodes on a Hadoop cluster to crunch big data analytics on a petabyte of
data. Cloudera owes its long-term success to corporate partners such as Oracle, IBM, HP,
NetApp and MongoDB that have been consistently pushing its services.
Cloudera is on the right path towards its goal, with 53% of the Hadoop market compared to
11% held by MapR and 16% by Hortonworks. Forrester says Cloudera's "approach to
innovation is to be loyal to core Hadoop but to innovate quickly and aggressively to meet
customer demands and differentiate its solution from those of other commercial Hadoop
vendors."

4) MapR Hadoop Distribution


MapR has been recognized extensively for its advanced Hadoop distributions, earning a
place in the Gartner report "Cool Vendors in Information Infrastructure and Big Data, 2012".
MapR has scored the top place for its Hadoop distributions amongst all other vendors.

MapR has made considerable investments to overcome the obstacles to worldwide adoption
of Hadoop, which include enterprise-grade reliability, data protection, ease of integrating
Hadoop into existing environments, and infrastructure support for real-time operations.

In 2015, MapR plans to make further investments to maintain its significance in the Big Data
vendors list. Apart from this, MapR is all set to announce its technical innovations for
Hadoop with the intent of supporting business as it happens, to increase revenue, mitigate
risks and reduce costs.

The image below compares the top 3 Hadoop vendors and can help in making a better
choice.

Image Credit: experfy.com

5) IBM Infosphere BigInsights Hadoop Distribution


IBM InfoSphere BigInsights is an industry-standard IBM Hadoop distribution that combines
Hadoop with enterprise-grade characteristics. IBM provides BigSheets and BigInsights as a
service via its SmartCloud Enterprise infrastructure. With IBM Hadoop distributions, users
can easily set up and move data to Hadoop clusters in no more than 30 minutes, at a price
of 60 cents per Hadoop cluster per hour. With IBM BigInsights, customers can rapidly bring
to market applications that incorporate advanced Big Data analytics by harnessing the
power of Hadoop.

6) Microsoft Hadoop Distribution


Forrester rates the Microsoft Hadoop distribution 4/5, based on the big data vendors' current
Hadoop distributions, market presence and strategy, with Cloudera and Hortonworks scoring
5/5.

Microsoft is an IT organization not known for embracing open source software, but it has
made efforts to run this open data platform on Windows. Microsoft's Hadoop-as-a-service
offering is best leveraged through its public cloud product, Windows Azure's HDInsight,
developed specifically to run on Azure. Another production-ready Microsoft feature,
PolyBase, lets users query Hadoop data together with data in SQL Server. Microsoft plays a
significant role in delivering a growing Hadoop stack to its customers.
According to analyst Mike Gualtieri at Forrester: "Hadoop's momentum is unstoppable
as its open source roots grow wildly into enterprises. Its refreshingly unique
approach to data management is transforming how companies store, process,
analyze, and share big data."
Commercial Hadoop vendors continue to mature over time with increased worldwide
adoption of Big Data technologies and growing vendor revenue. The top Hadoop vendors,
namely Hortonworks, Cloudera, Microsoft and IBM, face tough competition in the open data
platform. With the war heating up amongst big data vendors, nobody is sure who will top the
list of commercial Hadoop vendors. With the Hadoop buying cycle on the upswing, Hadoop
vendors must capture market share at a rapid pace to keep their venture investors happy.

Cloud Computing vs. Distributed Computing


11 Apr 2015
Global Industry Analysts predicts that the global cloud computing services market will reach
$127 billion by the end of 2017.

Most organizations today use cloud computing services either directly or indirectly. For
example, when we use the services of Amazon or Google, we are storing data directly in the
cloud. Using Twitter is an example of using cloud computing services indirectly, as Twitter
stores all our tweets in the cloud. Distributed and cloud computing emerged as novel
computing technologies because there was a need for better networking of computers to
process data faster.

Centralized computing systems, for example IBM mainframes, have been around in
technical computing for decades. In centralized computing, one central computer controls
all the peripherals and performs complex computations. However, centralized computing
systems proved ineffective and costly for processing huge volumes of transactional data and
supporting tons of concurrent online users. This paved the way for cloud and distributed
computing to exploit parallel processing technology commercially.

What is Distributed Computing?


According to Tanenbaum and Van Steen, authors of the book Distributed Systems:
Principles and Paradigms, a distributed system is defined as "a collection of independent
computers that appears to its users as a single coherent system."
Distributed computing can be defined as the use of a distributed system to solve a single
large problem by breaking it down into several tasks, where each task is computed on an
individual computer of the distributed system. A distributed system consists of more than
one self-directed computer communicating through a network. All the computers connected
in the network communicate with each other to attain a common goal, making use of their
own local memory. At the same time, different users may have different requirements, and
the distributed system coordinates the shared resources, helping nodes communicate with
one another to achieve their individual tasks.
Generally, there are fault-tolerance mechanisms in place for individual computer failures.
However, the cardinality, topology and overall structure of the system are not known
beforehand; everything is dynamic.

Distributed Computing System Examples

World Wide Web

Social Media Giant Facebook

Hadoop's Distributed File System (HDFS)

ATM

Cloud Network Systems (a specialized form of Distributed Computing Systems)

Google Bots, Google Web Server, Indexing Server


To a normal user, a distributed computing system appears as a single system, whereas
internally it is made up of several connected nodes which perform the designated computing
tasks. Let's consider the Google web server from the user's point of view. When users
submit a search query, they perceive the Google web server as a single system: they simply
go to Google.com and search for the required term. What really happens underneath is
distributed computing technology, where Google deploys several servers distributed across
different geographical locations to provide the search result in seconds or, at times,
milliseconds.
The image below illustrates the working of the master/slave architecture model of distributed
computing, where the master node has unidirectional control over one or more slave nodes.
The task is distributed by the master node to the configured slaves, and the results are
returned to the master node.
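A minimal single-machine sketch of this master/slave pattern using Python's standard multiprocessing pool, where the parent process plays the master and the pool workers play the slaves; in a real distributed system the workers would run on separate machines and communicate over the network:

```python
from multiprocessing import Pool

def work(chunk):
    """Slave-side computation: here, simply sum a chunk of numbers."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Master splits the job into tasks and hands them to the workers.
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
    with Pool(processes=4) as pool:
        partial_sums = pool.map(work, chunks)   # results come back to the master
    print(sum(partial_sums))                    # master combines the partial results
```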

Image Credit :researchgate.net

Benefits of Distributed Computing


1) Distributed computing systems provide a better price/performance ratio than a centralized
computer, because adding microprocessors is more economical than adding mainframes.

2) Distributed computing systems have more computational power than centralized
(mainframe) computing systems. They also provide incremental growth, so that organizations
can add software and computing power in increments as the business needs it.

Cloud computing
In a world of intense competition, users will simply drop you if the application freezes or
slows down. Thus, downtime has to be very close to zero. Whether users are in California,
Japan, New York or England, the application has to be up 24/7, 365 days a year. Mainframes
cannot scale to meet the mission-critical business requirement of processing huge structured
and unstructured datasets. This paved the way for cloud distributed computing technology,
which enables business processes to perform critical functions on large datasets.
Facebook has close to 757 million daily active users, with 2 million photos viewed every
second, more than 3 billion photos uploaded every month, and more than one million
websites using Facebook Connect, generating 50 million operations every second.
Distributed computing systems alone cannot provide such high availability, fault tolerance
and scalability. Thus, cloud computing, or rather cloud distributed computing, is the need of
the hour to meet these computing challenges.

What is Cloud computing?


David Cearley, VP at Gartner, said: "Cloud computing is a major technology trend
that has permeated the market over the last two years. It sets the stage for a
new approach to IT that enables individuals and businesses to choose how they'll
acquire or deliver IT services, with reduced emphasis on the constraints of
traditional software and hardware licensing models."
Cloud computing is a style of computing in which massively scalable and flexible IT-related
capabilities are delivered as a service to users over Internet technologies; the services may
include infrastructure, platform, applications and storage space. Users pay only for the
services and resources they actually use, and do not need to build infrastructure of their
own.

Image Credit: imscindiana.com

According to TechTarget, "Cloud computing enables companies to consume
computing resources as a utility, just like electricity, rather than having to
build and maintain computing infrastructures in-house."
Cloud computing usually refers to providing a service via the internet. This service can be
pretty much anything, from business software that is accessed via the web to off-site storage
or computing resources, whereas distributed computing means splitting a large problem so
that a group of computers can work on it at the same time.

Examples of Cloud computing

YouTube is a prime example of cloud storage, hosting millions of user-uploaded video files.

Picasa and Flickr host millions of digital photographs, allowing their users to create photo
albums online by uploading pictures to the services' servers.

Google Docs is another good example of cloud computing, allowing users to upload
presentations, word documents and spreadsheets to its data servers. Google Docs lets users
edit files and publish their documents for other users to read or edit.
Benefits of Cloud computing

Image Credit :belden.com

1) Research has found that 42% of working millennials would compromise on salary if they
could telecommute, and would be happy working at a 6% pay cut on average. Cloud
computing globalizes your workforce at an economical cost, as people across the globe can
access your cloud as long as they have internet connectivity.
2) A study found that 73% of knowledge workers collaborate with each other across different
locations and time zones. If an organization does not use cloud computing, workers have to
share files via email, and one single file ends up with multiple names and formats. With
cloud computing services, companies can provide better document control to their
knowledge workers by placing the file in one central location, so that everybody works on a
single central copy with increased efficiency.

Frost & Sullivan conducted a survey and found that companies using cloud computing
services for increased collaboration are generating 400% ROI. Ryan Park, Operations
Engineer at Pinterest said "The cloud has enabled us to be more efficient, to try out
new experiments at a very low cost, and enabled us to grow the site very
dramatically while maintaining a very small team."
Cloud Computing vs. Distributed Computing
1) Goals
The goal of Distributed Computing is to provide collaborative resource sharing by connecting
users and resources. Distributed Computing strives to provide administrative scalability
(number of domains in administration), size scalability (number of processes and users), and
geographical scalability (maximum distance between the nodes in the distributed system).
Cloud computing is all about delivering services or applications in an on-demand
environment, with the targeted goals of achieving increased scalability and transparency,
security, monitoring and management. In cloud computing systems, services are delivered
transparently, without regard to the physical implementation within the cloud.

2) Types
Distributed Computing is classified into three types:

Distributed Information Systems - The main goal of these systems is to distribute
information across different servers through various communication models like RMI and
RPC.

Distributed Pervasive Systems - These distributed systems consist of embedded computer
devices such as portable ECG monitors, wireless cameras, PDAs, sensors and mobile
devices. Distributed pervasive systems are identified by their instability when compared to
more traditional distributed systems.

Distributed Computing Systems - In this kind of system, the computers connected within a
network communicate through message passing to keep track of their actions.
Cloud Computing is classified into 4 different types of cloud:

Private Cloud - A cloud infrastructure dedicated to a particular IT organization, allowing it to
host applications while having complete control over the data without any fear of a security
breach.

Public Cloud - A cloud infrastructure hosted by service providers and made available to the
public. In this kind of cloud, customers have no control over or visibility into the
infrastructure. For example, Google and Microsoft own and operate their own public cloud
infrastructure, providing access to the public through the Internet.

Community Cloud - A multi-tenant cloud infrastructure shared by several IT organizations.

Hybrid Cloud - A combination of 2 or more of the above-mentioned cloud types (Private,
Public and Community) forms the Hybrid cloud infrastructure, where each cloud remains a
single entity but all the clouds are combined to provide the advantage of multiple
deployment models.
3) Characteristics
In distributed computing, a task is distributed amongst different computers so that
computational functions can be performed at the same time using Remote Method
Invocation or Remote Procedure Calls, whereas in cloud computing systems an on-demand
network model is used to provide access to a shared pool of configurable computing
resources.

Distributed cloud computing has become the buzz-phrase of IT, with vendors and analysts
agreeing that distributed cloud technology is gaining traction in the minds of customers and
service providers. Distributed cloud computing services are on the verge of helping
companies become more responsive to market conditions while restraining IT costs. The
cloud has created a story that is "to be continued", with 2015 shaping up as a momentous
year for cloud computing services to mature.

Looking for a perfect match - Why not try big data analysis this time?
14 Apr 2015
A couple of months ago an article circulated on wired.com about how Chris McKinlay, a
35-year-old UCLA Ph.D. graduate, devised an algorithm to hack OkCupid by optimally using
the data that was already there. McKinlay was not satisfied with the compatibility matching
algorithms the dating sites were using, as they did not help him find a Mrs. Perfect with
similar tastes who could become his soul mate. He devised a match-making algorithm that
surfaced 20,000 women compatible with his tastes and preferences. After dating several
women matching his compatibility percentage, he finally found his soul mate, Tien Wang, on
his 88th date. Technological innovations in big data paved the way for perfect match making
online.

Image Credit: wired.com


Online dating statistics show that, of the 54 million singles in the US, close to 40 million
have signed up with one of the popular dating sites like Match.com, OkCupid, eHarmony,
Hinge or Tinder. Users of online dating websites spend an average of 22 minutes per visit
and close to 12 hours a week on online dating activities. 66% of these users have gone on a
date with someone they met through these online dating sites.
An analysis of online dating statistics shows that 1 in 10 Americans uses a dating site and
25% of them have found their soul mates through these websites. A Kelton study in 2015
found that one-third of Americans (close to 80 million people) have used an online dating
app or site to find their soul mate. Big data analysis has never been so amusing, with
millions of American singles pouring their hearts (and mobile phone batteries) out in search
of true love.
According to 2014 market research by IBISWorld, the online dating industry in the US is
worth $2 billion, having grown at a rate of 3.5% since 2008, while the Canadian dating
industry amounts to $153 million. Juniper Research estimates that, driven by the heavy use
of mobile apps, the online dating market is set to rise from $1 billion in 2011 to $2.3 billion in
2016. With intense competition in the online dating industry, companies are making every
effort to maintain their credibility by matching the perfect partner to the perfect person at the
perfect time.

Image Credit: ibisworld.com


Match.com is now 20 years old and has helped create 517,000 relationships, 92,000
marriages and 1 million babies. Match.com claims that it has more than 70 terabytes of data
about its customers, which helps it unlock the mysteries of their hearts. According to
eHarmony, 542 eHarmony users get married every day in the US.

Image Credit: washingtonpost.com


The secret behind perfect match making by OkCupid (acquired by Match.com in 2011 for
$50 million), Match.com and eHarmony is the big data analysis techniques behind the
scenes.
In a world of data-driven supremacy, it is not possible to connect people unless dating sites
connect with online dating data. Online dating data generally comes in the form of a
questionnaire that helps users describe their likes, dislikes, interests, passions and other
useful information. It is not a short questionnaire where you simply name your favorite sport
and color and the results help you find your life partner: online dating companies provide
questionnaires of up to 400 questions. Users have to answer questions on topics ranging
from hypothetical situations to political views and taste preferences to increase their online
dating success rate.

Image Credit: linkurio.us


Dating sites need to generate as much online dating data as possible to improve the
probability of matching up partners who like each other. Though the questionnaire helps
generate large datasets, there are still weaknesses in the nature of the online dating data
collected this way, which makes big data analytics in dating more challenging. There is a
high probability that users are not honest in answering the questionnaire, or that they end up
providing inaccurate information unintentionally.
For example, females usually tend to lie about their weight, age and build, while males might
intentionally provide inaccurate information about their height, income and age. Another
instance where a user might provide inaccurate data unintentionally: he or she might believe
they love listening to classical music, but the accuracy of this claim can be better determined
by analyzing their Spotify playlist or iTunes history.
Weaknesses in online dating data can lead to an incompatible match. Some dating websites
are making efforts to generate online dating data for big data analytics by analyzing the
behavior of users on the site, based on the kinds of profiles they visit. A few other dating
sites employ collaborative filtering (where the preferences and tastes of several users are
grouped into sets of similar users) to recommend dates based on those preferences and
tastes.
The unpredictability of human behavior has made big data analytics the key to finding Mr. or
Mrs. Right through online dating sites or apps, because big data never lies. Online dating
data is collected from social media platforms, credit rating agencies, online shopping
histories and various online behaviors like media consumption. Online dating sites then
apply big data analytics to this treasure trove of information, which helps them determine the
attributes that are attractive to online daters so that they can provide better matches and
perfect soul mates to their customers. With sophisticated technology in place, big data
analytics promises to help you find true love via online dating algorithms and predictive
analytics, sifting through a store of big data covering millions of user profiles.
Online dating giants like Match.com, eHarmony and OkCupid collect online dating data for
big data analytics from Facebook profiles and online shopping pages to determine a person's
likes and dislikes, as the data from these sites is much more helpful in predicting human
behavior from actions than what users fill out in the questionnaire.
A McKinsey report states that "Companies must be able to apply advanced analytics
to the large amount of structured and unstructured data at their disposal to gain
a 360-degree view of their customers. Their engagement strategies should be
based on an empirical analysis of customers' recent behaviors and past
experiences with the company, as well as the signals embedded in customers'
mobile or social-media data."
Match.com provides its users with a questionnaire of 15 to 100 questions, and points are
then allocated to the user based on pre-defined parameters in the system such as religion,
income, education, hair color, age, etc. Users are then matched to people who have similar
points. Match.com uses advanced big data analytics to find discrepancies between what
people actually do on the website and what they profess. If any discrepancies are found, the
match-making algorithms adjust the compatible match results based on this behavior.
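A minimal sketch of such a points-based matching scheme, with made-up parameters, weights and profiles (the source does not disclose Match.com's actual scoring rules); a real system would also fold in the behavioral adjustments described above:

```python
# Hypothetical scoring weights for questionnaire parameters.
WEIGHTS = {"religion": 3, "education": 2, "age_band": 2, "income_band": 1}

def profile_points(profile):
    """Turn a questionnaire profile into a single score by weighting each answer."""
    return sum(WEIGHTS[key] * value for key, value in profile.items())

profiles = {
    "user_a": {"religion": 1, "education": 3, "age_band": 2, "income_band": 2},
    "user_b": {"religion": 1, "education": 3, "age_band": 3, "income_band": 1},
    "user_c": {"religion": 2, "education": 1, "age_band": 1, "income_band": 3},
}

def best_matches(user, max_gap=3):
    """Match a user to people whose total points fall within a small gap."""
    target = profile_points(profiles[user])
    return [other for other, p in profiles.items()
            if other != user and abs(profile_points(p) - target) <= max_gap]

print(best_matches("user_a"))
```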
Amarnath Thombre, President at Match.com, said: "People have a checklist of what
they want, but if you look at who they are talking to, they break their own rules.
They might list money as an important quality in a partner, but then we see
them messaging all the artists and guitar players."
Match.com takes no chances in determining the accuracy of online dating data for big data
analysis. Match.com has started using facial recognition technology that helps it find the
category of matches that a user prefers and highlight the features users are more attracted
to.

Big data professionals at Match.com say that even if people are not specific about height,
weight, hair color or race, they definitely have some kind of facial shape they go for in a
partner. Match.com aims to find a person's type by facial feature analysis so that it can pair
them up with the category of people who fit their type. These exclusive services cost 5,000
USD for 6 months; however, Match.com is willing to pay the price as it gives them a sharper
edge in a competitive world.
With more than 565,000 couples married successfully and 438 people in the US saying "I
do" every day because of eHarmony, much of the credit goes to the IBM Big Data and
Analytics product IBM PureData System for Hadoop, which renders personalized matches
accurately and quickly. Statistics on the online dating site eHarmony show that it generates
approximately 13 million matches a day for its 54 million user base, and altogether has more
than 125 TB of data to analyze, a figure that grows every day.
eHarmony asks its users to fill in a questionnaire of 400 questions when signing up, which
helps it collect online dating data on physical traits, location-based preferences, hobbies,
passions and much more. eHarmony's dataset is greater than 4 TB, photos excluded. The
best thing is that eHarmony's match-making algorithms use all the online dating data it
collects to find the perfect match for its users. The 400-question questionnaire is not the
end: eHarmony also collects data on user behavior on the website, such as how many
pictures they upload, how many times they log in, what kinds of profiles they visit frequently,
etc. The collected data is processed by specialized analysis algorithms which help users
find a perfect match.
Jason Chunk, Vice President of eHarmony, said: "From the data, you can tell who is
more introverted, who is likely to be an initiator, and we can also see if we give
people matches at certain times of the day, they would be more likely to make
communication with their matches. It kind of snowballs from there. We use a
number of tools on top of that, as well."
eHarmony uses MongoDB to ease the match-making process for couples. eHarmony's Big
Data and machine learning processes use a flow algorithm which processes a billion
prospective matches a day. eHarmony's compatibility matching system was initially built on
an RDBMS, but it took more than 2 weeks for the matching algorithm to execute. By
switching to MongoDB, eHarmony successfully reduced the execution time of the
compatibility matching algorithm by 95%, to less than 12 hours.
It is evident that big data plays a vital role in the online dating revolution. Dating companies
are harnessing the power of big data analytics to perfect the art of helping people find true
love online. As dating sites continue to collect tons of online dating data from different
sources and refine their match-making algorithms to harness the power of big data, we are
not far from the day when dating sites will know better than we do who our soul mate is.

Emerging Trends in Big Data Analysis for 2015


17 Apr 2015
Gartner predicts that 85% of Fortune 500 companies will exploit big data for competitive
advantage in 2015. IDC also forecasts that the Big Data Analytics market will surge from
$3.2 billion in 2010 to $17 billion in 2015, with estimates that the Big Data Analytics services
market is growing 6 times faster than the entire IT sector. IDC estimates that cloud-based
big data analytics will grow 3 times faster than on-premise solutions in 2015.

Image Credit : hpc-asia.com


A recent big data news item on Forbes highlighted how "An Apple and IBM each day can
keep you healthy." Apple announced its partnership with IBM to use big data analytics in
transforming digital health: saving lives by using IBM's renowned supercomputer Watson to
crunch healthcare data collected through Apple's gadgets.
Last year, when Twitter and IBM announced their partnership, it seemed an unlikely pairing,
but recent big data news in the New York Times shows the partnership taking a leap
forward, with IBM's Watson all set to mine Tweets for sentiment. The new analytics offering
from IBM will harvest big data from millions of Tweets and use the Watson supercomputer
to analyse them for sentiment and behaviour.

These big data trends show how organizations are harnessing the power of big data and
making technological advancements in big data analytics to gain competitive advantage. Big
data analytics is making waves in every industry sector with novel tools and technology
trends. The ability to harness the ever-growing amount of big data being generated has
transformed almost every sector: decoding human DNA in minutes, pinpointing marketing
efforts, controlling blood pressure levels, tracking calories consumed, finding true love by
predicting human behaviour, predicting a player's performance level based on historical
data, foiling terrorist attacks, providing personalized medicine to cancer patients, offering
personalized shopping recommendations to users, and so on.
Hadoop, NoSQL, MongoDB and Apache Spark are the buzzwords in big data technologies,
as almost every aspect of everyone's life now leaves a digital trace that can be used for
analysis. The big data analytics market in 2015 will revolve around the Internet of Things
(IoT), social media sentiment analysis, the increase in sensor-driven wearables, and more.
1) Big Data Analysis to drive Datafication
Eric Schmidt, Executive Chairman at Google, says: "From the dawn of civilization until
2003, humankind generated five exabytes of data. Now we produce five
exabytes every two days, and the pace is accelerating."
The process that makes a business data-driven, by collecting huge amounts of data from
various sources and storing it in centralized places to find new insights that lead to better
opportunities, can be termed Datafication. Datafication will take big data analysis to new
heights: real insights, future predictions and intelligent decisions.
A recent CivSource news article highlighted the creation of a big data transit team in
Toronto, charting a path for big data analytics in the transportation sector. TomTom, a global
leader in traffic, navigation and map products, found that in Vancouver, Montreal and
Toronto, commuters lose an average of 84 hours a year to delays caused by heavy traffic.
As a solution to this problem, Toronto created a big data transit team for the analysis of big
data in the transportation services department and partnered with McMaster University to
analyse historical travel data. To establish Toronto as a truly smart city, it has asked vendors
to showcase proven products for measuring and monitoring traffic and travel.
Datafication is not a new trend, but the speed with which data is being generated in real-time
operational analytics systems is breathtaking. This is likely to bring about novel trends in big
data analytics. The datafication of organizations will soon impact our lives in this
fast-changing world by shaping a data-driven society.

Image Credit : slideshare.net


Datafication and the IoT (Internet of Things) are the upcoming trends in 2015. The number of
devices connected to the Internet is anticipated to exceed 25 billion by the year 2020,
according to Gartner. The Internet of Things connects conventional devices and products to
the web for analysis. Organizations can explore several prospective areas, such as how
people move at their workplace (workforce analytics), how various products are used
(product behaviour analytics) and how a driver behaves on the road (transportation
analytics).
People expect that if they are hungry, their smart band will provide suggestions and a route
map to the nearest café. They would not mind a voice-enabled mobile application for
pre-ordering drinks and meals while driving, which displays a notification when their order is
ready. This might seem to have a futuristic ring to it, but smart applications and gadgets
today learn more and more about the wants and requirements of users. They generate
real-time big data that will help businesses serve their customers better through intense
analytic processes.
2015 will see more sensor-driven datafication that makes human lives more datafied in
various areas: datafication of cultures, datafication of relationships, emotions and
sentiments, datafication of back-office and offline processes, datafication of speech, and
much more. Many organizations are foraying into the business of sensor-driven datafication,
focusing on extensive research and development so that they can gradually increase their
market share.
A coffee vending machine that can interact with a person's mattress to sense when she or
he is going to wake up, post a notification to the user's smartphone asking which flavour
they would prefer on waking, and automatically order those coffee beans from Amazon when
the user is running low on supplies: that is the wave of future analytics.

2) Big Data Analytics to gain power of novel Security tools


A technology news source recently published an update about the new funding plans of the
security analytics firm Niara. Niara is building its big data security analytics platform to
detect sophisticated threats that existing security tools cannot. With $20 million in funding,
the security platform is anticipated to be available by the second half of 2015.
Advances in big data security are another important emerging trend for 2015, protecting the
public as a whole. The Target data breach is recorded as one of the largest data breaches in
US history: it was reported that close to 40 million debit and credit card numbers were stolen
during the busiest shopping time of the year (Nov 27 to Dec 15). Big data breaches have
been quite common in the US; every week there is some announcement in the media, from
the US government or from businesses directly, that they have been impacted by a big data
breach.

Image Credit: forbes.com


Big data breaches damage the reputation of organizations and lead to various legal,
regulatory and financial consequences. Businesses are aiming to invest in advanced
methods of encryption to prevent big data breaches, adopting new security technologies
and training staff on security.
The capabilities of traditional security software are anticipated to increase in 2015 with novel
techniques employed for data collection, storage and analysis. Organizations' data assets
are maturing, and stakeholders plan to adopt various predictive analysis techniques to
bridge the security gap in big data by predicting probable threats. Industry experts have
started looking at big data analysis as a robust tool for protecting data security by identifying
the signals of persistent security threats. In 2015, big data security has the potential to make
more noise in the market as an emerging trend.
Neil Cook, CTO of Cloudmark, says: "In identifying the source of a current spam attack by
tracking where the attacker has sourced target email addresses, it is possible to identify
other address lists that attacker has downloaded and use that information to predict, and
prevent, the next attack."
2015 will welcome the dawn of big data analytics security tools to combine text mining,
ontology modelling and machine learning to provide comprehensive and integrated security
threat detection, prediction and prevention programs.
3) Deep Learning soon to become the buzz word in Big Data Analysis
With deep learning, there could come a day when big data analysis is used to identify
different kinds of data such as colors, objects or shapes in a video. The world will experience
a great push from big data vendors in cognitive engagement and advanced analytics. Let's
hope for some innovative applications of deep learning to real-time business situations by
the end of 2015. Google's latest deep learning system, built on recurrent neural networks
and feature pooling networks, aims to identify motion in videos and interpret the various
objects present in them.
Deep learning is a machine learning technique based on artificial neural networks. Deep
learning involves feeding big data into neural networks to receive predictions in response.
Deep learning is still an evolving technology but has great potential to solve business
problems. IT giants are conducting research in deep learning, striving hard to meet customer
choice and expectations.

Image Credit : Youtube.com


Deep learning helps systems find items of interest in huge amounts of binary and
unstructured big data without the need for specific programming instructions or hand-built
models. Deep learning employs artificial neural networks to find patterns in large
unstructured data sets without having to program specific functions manually.
For instance, a deep learning algorithm identified from Wikipedia data that Texas and California are popular states in the US. The biggest difference between a conventional machine learning algorithm and the deep learning algorithm in this example is that the latter need not be explicitly modelled to understand the fundamental concepts of state and country.
This is just the beginning; big data has much left to explore in diverse and unstructured text using advanced analytic techniques. Deep learning is an evolving branch of machine learning that is yet to be fully tested in real-time applications.
4) NoSQL Matures - Increased demand for more, better NoSQL
According to the Allied Market Research report "Global NoSQL Market - Size, Industry Analysis, Trends, Opportunities, Growth and Forecast, 2013-2020", the global NoSQL market is projected to reach $4.2 billion by 2020, recording a CAGR of 35.1% during 2014-2020. The maturing usage of NoSQL in big data analysis will drive the NoSQL market as it gains momentum. There are approximately 20 NoSQL databases, each with its own area of expertise. For instance, a NoSQL graph database helps analyse a network of relationships between sales staff and customers more quickly than an RDBMS. In a survey conducted in 2014, 24% of respondents said that they would prefer a NoSQL database as it is faster and renders more flexible development than an RDBMS, whereas 21% said they would prefer a NoSQL database because of its lower software and deployment costs.

Image Credit : dataversity.com


Among the different types of NoSQL databases, the key-value pair stores are anticipated to gain more traction because of their extensive usage in e-commerce, social network management and web session management. The key-value stores are driving the NoSQL market; though NoSQL databases have been around for quite some time, they are gaining momentum because of businesses' growing need for big data analysis. For instance, if a PwC client installs sensors on store shelves to monitor which products are available, how much time customers take to handle them and how long customers stand in front of these shelves, the sensors will undoubtedly generate tons of data that will grow exponentially over time. Under such circumstances, a key-value NoSQL database comes to the rescue because of its high performance and lightweight design.
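As a minimal, hedged sketch of the key-value data model described above (not tied to any particular NoSQL product), the following Java snippet keeps the latest shelf-sensor reading under a shelf key; the class and field names here are hypothetical.

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a key-value model for shelf-sensor readings.
// ShelfSensorStore and SensorReading are hypothetical names, not a real NoSQL API.
public class ShelfSensorStore {

    // Value object holding one shelf reading.
    static class SensorReading {
        final int productsAvailable;
        final double avgHandlingSeconds;

        SensorReading(int productsAvailable, double avgHandlingSeconds) {
            this.productsAvailable = productsAvailable;
            this.avgHandlingSeconds = avgHandlingSeconds;
        }

        @Override
        public String toString() {
            return productsAvailable + " items, " + avgHandlingSeconds + "s avg handling";
        }
    }

    // The store itself: a plain key-value map keyed by shelf id.
    private final Map<String, SensorReading> store = new HashMap<>();

    void put(String shelfId, SensorReading reading) {
        store.put(shelfId, reading);   // overwrite with the latest reading
    }

    SensorReading get(String shelfId) {
        return store.get(shelfId);     // constant-time lookup by key
    }

    public static void main(String[] args) {
        ShelfSensorStore kv = new ShelfSensorStore();
        kv.put("store-42/shelf-7", new SensorReading(18, 4.2));
        System.out.println(kv.get("store-42/shelf-7"));
    }
}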
2014 was a fantastic year for big data analytics, but the emerging big data trends show that 2015 looks even better, with many more technological innovations. If there are any other significant emerging trends in big data analysis that will make noise in the market, please leave a comment below.

Salesforce Careers - Salesforce Administrator vs. Salesforce Developer
18 Apr 2015

Gartner reports recognize Salesforce CRM as the leader in Enterprise Application Platform as a Service (PaaS) as of March 2015.

Image Credit: Gartner.com

Salesforce CEO Benioff said, "Salesforce reached $5 billion in annual revenue faster than any other enterprise software company and now it's our goal to be the fastest to reach $10 billion."
With approximately 200,000 customer companies using the Salesforce1 platform and close to 2,000 companies built on top of it, there is an emerging boom in Salesforce careers in the enterprise application sector. Companies are hiring competent Salesforce developers, Salesforce administrators and architects to implement innovative business solutions and maximize their investment in Salesforce.

The demand for Salesforce careers is on the rise, with salesforce.com hitting huge sales records every quarter. Consultants who combine the integration skills of a Salesforce developer and a Salesforce administrator will be the need of the hour as enterprises connect Salesforce with other legacy solutions and cloud applications. Thus, to make themselves marketable and pursue Salesforce careers, consultants must earn Salesforce certifications to stand out among other candidates.
Global business consulting firm Bluewolf confirms that the demand for Salesforce administrators and Salesforce developers will increase by 25 percent five years down the line.

A recent research study by G2 Crowd found that:
Users rate Salesforce at 85% for overall CRM functionality, above the industry average of 75%.
Salesforce CRM has 80% better scalability than any other CRM vendor outside the leader category in Gartner's quadrant and 23% better scalability than vendors present in the leaders category.
84% of Salesforce customers say they would recommend this CRM to their peers.
85% of Salesforce users believe that it is headed in the right direction.


If you are a business entrepreneur, you might be aware that training professionals in Salesforce requires a lot of investment. To increase the ROI of your business, Salesforce professionals must be aware of all the latest methodologies and technologies that will help you enhance productivity. To maximize the returns, enterprises must hire certified Salesforce Administrators and Developers. The right CRM software like Salesforce, backed by certified professionals, can help you boost your business sales, inquiries and revenues.

The sky is the limit when we talk about Salesforce Customer Relationship Management software. You will not find a single field that a Salesforce Administrator cannot add or a single piece of code that a Salesforce Developer cannot execute. The IT market has a dearth of Salesforce Administrators and Salesforce Developers.

Consultants looking for high-paying Salesforce jobs must dig deep into various aspects: the roles and responsibilities of a Salesforce administrator and developer, Salesforce admin training, Salesforce developer training, and the various Salesforce certifications.

Click here to know more about Salesforce training in ADM 201 and DEV 401
certifications

Salesforce Administrator
A Salesforce Administrator, in broad terms, can be defined as a person responsible for managing and administering the configuration side of Salesforce. He/she is the one who performs the various declarative changes and manages new releases into the production environment. A Salesforce Administrator is a professional responsible for running existing Salesforce instances smoothly. This means that a Salesforce Administrator need not have a deep grasp of integrations and various other downstream consequences, because he/she does not configure any new functionality.
The job of a Salesforce Administrator in a small IT enterprise need not be a full-time position. In the early stages of a Salesforce CRM implementation, the administrator will have to devote about half a day (50% of a full-time position), but once the application is live, managing the daily activities of Salesforce CRM requires only about 10-25% of a full-time position.

Understanding the Responsibilities of a Salesforce Administrator

Image Credit: klipfolio.com

The role of a Salesforce Administrator is to click and not to code:

A Salesforce Administrator is responsible for adding new users, checking system permissions on users to restrict or provide data access, and modifying existing accounts.
It is the responsibility of the Salesforce Administrator to remove any duplicate contacts or accounts by mass updating or merging them.
The Salesforce Administrator is responsible for customizing and developing the setup menu by modifying page layouts and pick list values and creating assignment rules.
The Salesforce Administrator creates reports from the data stored in the Salesforce CRM and produces information assets that help boost business revenue.
He/she should provide ongoing documentation to colleagues and customers by updating the existing documentation so that customers and colleagues stay on track with the latest Salesforce enhancements and releases.
Required Skills for a Salesforce Administrator
With SaaS, system administration has become easier than with conventional software. Even though a Salesforce administrator need not possess extensive programming skills, there is a specific skill set every Salesforce administrator must have in order to build a career as a successful Salesforce professional:

A Salesforce Administrator must have a solid understanding of the organizational structure and the various business processes involved in it, so that he/she can maintain strong relationships with key groups.
Out-of-the-ordinary project management and analytical skills are a must to act on requested changes and classify customizations.
Good presentation, motivational and communication skills are a plus for the career of a Salesforce administrator.
Certifications and Trainings for a Salesforce Administrator
It is always a smart decision to hire trained and certified Salesforce professionals. A certified Salesforce administrator stands for rapid implementation and makes the best out of Salesforce. If you plan to build your career as a Salesforce administrator, then the ADM 201 Salesforce certification is a must to respond to various business requirements and perform administrative tasks with the latest version of Salesforce.
The Salesforce Administrator 201 certification trains an individual in how to customize tabs, fields, page layouts and various other business processes, how to set up workflow automation, how to create high-value reports and dashboards, how to import and maintain clean data, and how to create a safe and secure Salesforce environment. Trained and certified Salesforce administrators give enterprises an assurance that the professional has in-depth knowledge and is confident enough to get the best out of Salesforce.

Salesforce Developer
A Salesforce Developer is responsible for building functionality in a sandbox with Visualforce or Apex before it is handed to the Salesforce Administrator for scheduling deployment.

Understanding the Responsibilities of a Salesforce Developer

Image Credit: salesforce.com

The role of a Salesforce developer is to code the application logic and carry out the following
responsibilities:

Salesforce developers are responsible for building functionality by creating Salesforce triggers and Visualforce pages based on the requirements of the customer.
Salesforce developers contribute towards building customized applications by making the best use of the point-and-click capabilities of the Salesforce platform.
Salesforce Customer Relationship Management software has a large number of application programming interfaces, and a Salesforce developer is responsible for harnessing the remarkable adaptability of these APIs to integrate information and processes so that all the systems within the CRM can communicate efficiently.
A Salesforce developer is responsible for designing new Salesforce solutions and coming up with effective project execution plans. It is the responsibility of the Salesforce developer to add value to the project in all phases: definition, development, and deployment.
Required Skills for a Salesforce Developer

A Salesforce developer should be an individual who is curious about technology. He/she should be a go-getter willing to Google around, look for threads related to a problem and find a solution for it.
A Salesforce developer who can quickly grasp and adapt to the organization's culture is a boon, because culture and positive habits contribute to a successful Salesforce implementation.
For an organization building customer relationship management software, patience, time and incredible vision are the three primary requirements. A Salesforce developer must be a deliberate long-term thinker and a strong CRM believer.
Certification & Trainings for a Salesforce Developer
An individual who aspires to be a successful Salesforce developer must undergo intensive training by Salesforce experts to get a grasp of how to build custom applications that will change the manner in which an enterprise connects with its customers and employees. Apart from extensive training, the Salesforce Developer Certification DEV 401 is a must to exhibit the knowledge, abilities and skills for developing custom applications and analytics by exploiting the platform's declarative capabilities.

Image Credit: blogatsalesforce.blogspot.in

It is more difficult to find professional Salesforce developers than Salesforce administrators, because learning to code is harder than learning the declarative configuration options in Salesforce. Thus, some Salesforce Developers can cover administrator responsibilities and some cannot. Salesforce Developers undertake both the ADM 201 and DEV 401 Salesforce certifications to qualify for the roles of both Salesforce Developer and Salesforce Administrator. These Salesforce certifications can be taken online by individuals who want to master both configuration using code and declarative configuration. With increasing demand for Salesforce jobs, professionals who have expertise in both configuration methods are more valuable to an organization.
The lines between a Salesforce Administrator and a Salesforce Developer are becoming blurrier day by day. A Salesforce professional is expected to take both the ADM and the DEV certifications, regardless of whether he/she joins as a Salesforce administrator or a Salesforce developer.

Top 5 Big Data Startup Success Stories


27 Apr 2015
According to a combined study by EMC and IDC, 2,837 exabytes (an exabyte is a billion gigabytes) of data were generated in the digital universe, and this is expected to grow to 40,000 exabytes by the end of 2020. Competing for a piece of what IDC predicts will be a $32.4 billion market by the end of 2017, several big data start-ups are jumping onto the scene and a few others are refining their analytics strategies.
As the data we generate grows every microsecond and accelerates at a rapid pace, business decision making is shifting its focus towards a foresight-based approach. Big data startups are banking on analytics to disrupt the market with this foresight-based approach. They are leveraging recommendation systems to make sense of big data, predicting user intentions and rendering the services and products people are looking for before they even know that they need them.

Image Credit: visualistan.com


During US President Obama's visit to India, the government installed approximately 15,000 cameras as a security surveillance measure. Where there is not sufficient manpower to keep track of all the video streams, the government could use one of the many big data analytics solutions provided by big data start-ups. These big data solutions would record the unstructured data generated from the video streams and execute pattern matching algorithms on it, detecting events based on various pre-defined sets of parameters.
It is great to have data, but it does not make any sense to the business unless it can be used for quick and effective business decision making that brings in profits. Data analysis and visualization techniques and tools are gaining demand because of the personalized products and services startups have to offer, helping businesses draw deep insights about their customers from the data before the customers themselves articulate their needs.

The big data market is heating up, with startups building viable products that target real-world pain points, backed by huge funding and solid management teams. Big data start-ups are betting that customized, data-driven services and products will give them an edge over the giant IT organizations. Startups are finding ways to outperform that competition through big data analytics by customizing their services and products. From effective marketplaces to frictionless online transactions, from improved customer interactions to perceptive predictions, from platforms that can instantly mine huge troves of data to catalogs of information available for sale, the startup ecosystem loves big data. Big data start-ups that put data first are able to fine-tune faster in a competitive market.
Venture capitalists are eager to invest in cloud computing, big data analytics and software development. The emerging trends in startup funding are giving big data startups a shot at working with giant IT customers like Apple, Facebook and Google. Funding from venture capitalists helps startups establish a strong customer base through personalized services and products. For instance, with smartphones and wearable devices like FitBit fitness trackers and the Apple Watch, big data startups are collecting the huge amounts of unstructured data generated by these devices and analysing it to develop customized products and services that customers can use. Venture capitalists see a bright future in investing in big data startups, as it is these small companies that aggregate tons of data into value their customers can make the best use of.
1) Spotify Big Data Startup Success Stories
With 60 million active users worldwide, close to 6 million paying customers, 20 million songs and approximately 1.5 billion playlists, Spotify produces close to 1.5 TB of compressed data on a daily basis. Spotify has one of the biggest Hadoop clusters, with 694 heterogeneous nodes running close to 7,000 jobs a day.
Spotify delivered 4.5 billion hours of listening time in 2013 and is revolutionizing the manner in which people listen to music, playing a prominent role in the way the music industry evolves. Spotify's streaming services not only use big data to improve music engagement and render a personalized experience, but also identify upcoming music artists and predict their potential for success. Spotify uses predictive analysis through big data to add enjoyment and fun to the user experience. It also uses its huge stream of big data to predict the winners at the Grammy Awards, with a prediction accuracy of 67%.

Image Credit: gizmodo.com


Spotify recently acquired Echo Nest, one of the top companies involved in recommendation technology and music analytics, for $100 million. Echo Nest mines big data and has compiled trillions of data points from 35 million songs by 2.5 million music artists.
Brian Whitman, co-founder of Echo Nest, said: "We crawl the web constantly, scanning over 10 million music related pages a day. Every word anyone utters on the Internet about music goes through our systems that look for descriptive terms, noun phrases and other text, and those terms bucket up into what we call cultural vectors or top terms."
Click here to know more about our IBM Certified Hadoop Developer course
2) Uber Big Data Startup Success Stories
You are no longer just a passenger or a fare to Uber but a big data goldmine that Uber leverages for analytics. Uber knows everything from where you work, where you eat, where you live and where you travel, to when you do all these things. Uber, an analytics-driven start-up, is using the wealth of information collected to render personalized services and to generate huge ROI by selling this data to its customer base.
Uber is innovating new ways of making money by selling the transactional data it has aggregated from its rides. Uber recently partnered with Starwood Hotels and Resorts and launched a service that allows users to connect their Starwood Preferred Guest account. The advantage for customers is that they earn Starwood reward points when they take a ride with Uber. Customers give Uber the complete rights to share all the information about their rides with Starwood when they sign up. The screenshot below depicts the same.

Image Credit: bigdataexchange.com


Starwood will now have access to all Uber ride information, which can be leveraged for analytics. For instance, if you are a regular business traveller to your office in Seattle and frequently take a ride with Uber on your travels, then Uber records all the information about your pick-up and drop-off points. A Starwood marketing person can immediately pitch in if he/she notices that you choose to stay at a property other than Starwood, because Uber knows this. You will be flooded with offers from Starwood ensuring that your next stay in Seattle is at one of the Starwood properties, helping them generate revenue over other competitors.
Uber uses regression analysis to estimate the size of a neighbourhood, which in turn helps Uber find the busiest neighbourhoods on Friday nights so that it can add an additional surge charge to its customers' bills. Uber takes ratings from its drivers and riders and analyses this data for customer satisfaction and loyalty.
This big data start-up has future plans to partner with premier luxury brands, retailers and restaurants to collect data about the shopping malls you visit, the clubs you visit and the places you dine in. It plans to share this information with its customers so that they can make use of it for targeted marketing.
3) Netflix Big Data Startup Success Stories
Netflix is a big data company that meticulously gathers information from more than 50 million subscribers at a remarkable pace. Netflix collects tons of data from its customers to understand the behaviour of users and determine their preferences for movies and TV shows. It uses machine learning to predict an individual's tastes depending on the choices they make. It collects various customer metrics such as what kind of movies people watch, when they watch, where they watch, the time spent on selecting movies, when users stop watching a movie or pause it, what devices they use to watch, searches, ratings, etc.

Image Credit: Netflix.com

This big data startup leverages all the information collected to provide movie recommendations, predicting the movies or TV shows customers are likely to enjoy, since happy customers are likely to continue their subscriptions while bringing in profitability. Netflix derives powerful insights from the collected big data to customize and improve the experience of users on the platform depending on their preferences and tastes.
4) Rubikloud Big Data Startup Success Stories
Rubikloud is one of the top big data startups helping online retailers make use of big data to increase their ROI by predicting consumer behaviour. Rubikloud has signed up several retailers from the health, beauty, fashion and other verticals, together accounting for annual sales of $25 billion.

Image Credit: Rubikloud.com


In January 2015, this big data startup raised funding of $7 million for customer analytics through big data. Rubikloud helps large retailers as well as brick-and-mortar stores collect, normalize and analyse big data by directly connecting to their respective data stores.
5) Vidooly Big Data Startup Success Stories
Vidooly is a big data start-up based in India, founded by serial entrepreneurs Subrat, Ajay and Nishant. Vidooly is the Alexa for YouTube. Vidooly uses Google's BigQuery for YouTube big data analytics to help brands, content creators and multi-channel networks increase their audience base and generate more ROI from the YouTube platform. Vidooly provides its customers with video tag suggestions, competitor tracking, the best time to upload a video, behaviour analysis of subscribers, annotation optimization, comment moderation, etc.
Vidooly's big data engine analyses more than 1 million YouTube channels daily, covering close to 500 million video views per month and more than 20 million audience data points, to track audience behaviour on YouTube.

Image Credit: Vidooly.com


Vidooly recently introduced a Chrome extension, VidLog, that goes beyond YouTube analytics by providing an optimization progress report of any YouTube video straight in the Chrome browser in real time. The customer base of Vidooly includes Indian Food Network, Ping Network, Times Music, Sony Music and Hoopla Kids.
Startups need to adopt a big data strategy to survive the competition in the market. With increasing customer expectations, big data companies that use analytics tools to enhance the consumer experience will continue to prosper, and the startups that don't will not survive. If there are any other booming big data startups leveraging analytics to harness consumer insights that we might have missed, please share them with us in the comments below.

Hottest IT Certifications of 2015 - Hadoop Certification
29 Apr 2015
Peter Goldmacher, a Cowen & Co. analyst, sums up the opportunities for certified Hadoop professionals very delightfully: "We believe Hadoop is a big opportunity and we can envision a small number of billion-dollar companies based on Hadoop. We think the bigger opportunity is Apps and Analytics companies selling products that abstract the complexity of working with Hadoop from end users and sell solutions into a much larger end market of business users. The biggest opportunity in our mind, by far, is the Big Data Practitioners that create entirely new business opportunities based on data where $1M spent on Hadoop is the backbone of a $1B business."
A recent survey by Spiceworks about certifications found that half of IT professionals will be paying for continued IT education, as they think that certifications add value to their careers. 80% of IT respondents said they are targeting to complete some kind of training and certification this year. The number of IT professionals taking Hadoop certification is increasing exponentially, as Hadoop is anticipated to be the most sought-after skill in IT for 2015. IT professionals are racing to acquire Hadoop certification from top-notch vendors to bridge the big data talent gap.
Sarah Sproehnle, Director of Educational Services at Cloudera, said, "Companies are struggling to hire Hadoop talent."

Image Credit : pinterest.com

Expert predictions reveal soaring demand for advanced analytics skills in the next few years.

Image Credit : pinterest.com

Hadoop is gaining huge popularity in the US and all over the world, as companies in industries such as real estate, media, retail, healthcare, finance, energy, sports, dating and utilities embrace it. The industries adopting Hadoop for enterprise big data projects want to ensure that the professionals they hire are experts in handling zettabytes of data. Organizations across different vertical industries are in the process of adopting Hadoop as an enterprise big data solution. Thus, they consider Hadoop training and Hadoop certification as proof of a person's skill in handling big data.
Success in the big data market requires skilled, talented professionals who can demonstrate their mastery of the tools and techniques of the Hadoop stack. Organizations are on the hunt for talent that can launch and scale big data projects. Hadoop certification courses help companies find true big data specialists who have the ability to tackle challenges with live big data sets.

Hadoop certifications from popular vendors open the door to the most in-demand and recognized Hadoop jobs. There are several top-notch big data vendors, like Cloudera, Hortonworks, IBM and MapR, offering Hadoop Developer Certification and Hadoop Administrator Certification.
Hadoop Certification Advantages
IT professionals from different technical backgrounds are making the transition to Hadoop to get high-paying jobs. Professionals opt for Hadoop certification to showcase their exceptional Hadoop skills. A certified Hadoop professional has an edge over a candidate without a Hadoop certification.

Hadoop certification allows individuals to highlight their knowledge and skills to their customers and employers.
Hadoop certified professionals get recognition in the industry for widely sought-after big data skills, which helps them establish customer confidence.
Hadoop certified professionals are confident in speaking about the technological aspects of Hadoop when networking with other, non-certified professionals.
With increased enterprise adoption of Hadoop distributions from popular vendors like Cloudera, Hortonworks, MapR and IBM, more companies are keen to hire Hadoop professionals certified by these vendors.
Top Hadoop Certification Vendors
1) Cloudera Hadoop Certification - Earning a Cloudera Hadoop certification paves the way for interesting career opportunities for IT professionals. Cloudera is one of the top Hadoop certification vendors, providing companies with shining big data talent through its various Hadoop certification courses. The top 3 Cloudera Hadoop certifications that can set you apart as a big data professional are:

CCP DS - CCP DS identifies quality data scientists with expertise in implementing big data science solutions to address real-world situations. The Hadoop certification cost for CCP DS is $200.
CCAH - CCAH aims to identify Hadoop administrators with expertise in configuring, deploying, maintaining and securing Hadoop clusters for enterprise use. The Hadoop certification cost for CCAH (CDH4) is $295 and the cost of upgrading to CDH5 is $125.
CCDH - CCDH aims to identify Hadoop developers with expertise in coding, maintaining and optimizing Hadoop clusters for enterprise projects. The Hadoop developer certification cost for CCDH is $295.
2) Hortonworks Hadoop Certification - Hortonworks Hadoop certification courses give individuals the opportunity to prove Hadoop skills relevant to on-the-job performance, as Hortonworks certifies individuals by testing their ability to perform tasks live on Hadoop clusters. The top Hortonworks Hadoop certifications are:

HCAHD - The HCAHD Hadoop developer certification tests individuals' hands-on experience in working with frameworks like Sqoop, Flume, Pig and Hive. The Hadoop certification cost for HCAHD is $250.
HCAHA - The HCAHA Hadoop administrator certification tests individuals' hands-on experience in deploying and managing Hadoop clusters in the enterprise. The Hadoop certification cost for HCAHA is $250.

3) MapR Hadoop Certification - MapR Hadoop certification provides wide recognition to candidates and gives them a competitive edge in leveraging big data expertise. The top 3 MapR Hadoop certifications are:

MCHD - This Hadoop developer certification demonstrates expertise in the development of YARN and MapReduce programs. The Hadoop certification cost for MCHD is $250.
MCHA - This Hadoop administrator certification demonstrates expertise in MapR and Hadoop cluster administration. The Hadoop certification cost for MCHA is $250.
MCHBD - This Hadoop developer certification demonstrates expertise in the development of HBase (NoSQL datastore) programs. The Hadoop certification cost for MCHBD is $250.
4) IBM Hadoop Certification - IBM Hadoop certification provides quick practical experience in understanding the Hadoop framework. The IBM Hadoop Developer Certification requires the candidate to go through an intense one-on-one Hadoop training program with an industry mentor to get a good grasp of the Hadoop concepts. The IBM Hadoop certification is issued to the candidate once he/she successfully completes an end-to-end hands-on project approved by IBM.
The comprehensive Hadoop certification cost for this, including Hadoop training, is $1100.

Click here to get a discount of $40 on IBM Certified Hadoop Developer course
Hadoop Training - A Pre-Requisite for Hadoop Certification Preparation
According to Sand Hill Group's survey on Hadoop, respondents said that the inadequate number of knowledgeable Hadoop professionals and the gap in Hadoop skills are a major setback when it comes to implementing Hadoop. The only way to address these issues is proper Hadoop training that will help professionals clear a Hadoop certification and meet the industry's demand for Hadoop knowledge and skills.

Hadoop certification can be obtained from any of the top vendors after acquiring online or in-class Hadoop certification training. In-class Hadoop certification training is not practical for full-time IT professionals due to their busy schedules.

DeZyre offers best-in-class, live faculty-led online Hadoop training that will help professionals master the concepts of Hadoop and gain the confidence to take up a challenging Hadoop certification exam from any of the vendors. With DeZyre's Hadoop training material, weekly 1-on-1 meetings with experienced mentors, lifetime access, and accredited IBM Hadoop certification on successful completion of the project, you can gain confidence in building game-changing big data applications.
The above video recordings are an example of the exemplary teaching at DeZyre. The interactive and engaging environment at DeZyre is on par with an in-class Hadoop training environment. DeZyre's expert faculty interact with like-minded people and are quick to clarify candidates' doubts. DeZyre's Hadoop training course addresses the Hadoop objectives of all data professionals: data analysts, Hadoop developers, Hadoop administrators and Hadoop architects.

Please refer to this article to understand the benefits of DeZyre's live online Hadoop training.
Objectives of Hadoop Certification Training
By the end of the Hadoop training course, a professional will be all set to clear the Hadoop certification by mastering the following Hadoop concepts:

1) HDFS and MapReduce Framework

2) Master the concepts of data loading using Flume and Sqoop

3) Master the art of writing complex MapReduce programs

4) Setting up and configuring a Hadoop cluster

5) Analyse data using Pig and Hive

6) Implement any kind of live Hadoop and Big Data project in the enterprise

Apart from taking Hadoop Training, Hadoop Certification preparation involves taking several
practice tests provided by the vendors, understanding and answering the most commonly
asked Hadoop certification questions from various Hadoop certification dumps available on
the Internet, and hands-on experience in implementing the Hadoop concepts.

2015 will be the year when all industries adopt Hadoop as a cornerstone of the business technology agenda and CIOs make the Hadoop platform a priority, so it is essential to take Hadoop training and Hadoop certification to grow with the increasing demand for Hadoop skills. If you have any questions related to Hadoop training or the IBM-accredited Hadoop certification from DeZyre, please leave a comment below and DeZyre experts will get back to you.

Emerging Trends in CRM Software for 2015


04 May 2015
Gartner predicts the CRM software market will grow at a compound annual growth rate of 14.8%, reaching $36 billion by the end of 2017.

Image Credit: Gartner.com


Salesforce CEO Marc Benioff, after forging a new alliance with Microsoft, said: "The market has mostly spoken that the world is moving to services. We're moving toward an ecosystem of services. The best decision I ever made in the industry was to build Salesforce.com from the ground up more than 15 years ago with a philanthropic foundation."
Mary Wardley, Vice President at IDC, says the concept of what a CRM application is has shifted dramatically in the past five years. The wave towards CRM, globally, has become a marvel that none can deny, as several tech giants race each other into this new world of cloud CRM, with continued importance placed on customer experience, customer engagement and customer data analytics. Novel CRM trends will give marketing and sales professionals all the required data inside their inbox.

Image Credit: softwareadvice.com


1) Increasing demand for cloud-based CRM over on-premise
Research on CRM trends by Gartner anticipated that half of the customer relationship management software implementations in 2014 were cloud based, and this is expected to grow by 14.8% annually till 2025. Gartner predicts that 85% of CRM software usage will be cloud based. With better quality data, faster customer relationship management software upgrades and implementations, and lower upfront costs, cloud CRM will be at the heart of digital initiatives for years to come.
A recent news story published on Techworld discussed the migration of Seaco to the AWS cloud. To enhance its ERP system, the shipping container firm Seaco bid adieu to its data centre by moving its SAP platform to the Amazon public cloud, completely evacuating its data centres in the first week of April 2015. Seaco is experiencing great improvements in ERP over its legacy systems after migrating to the AWS cloud. The Director of IT Services at Seaco stated that batch jobs were executing 90% faster and the AWS cloud had helped bring down the run time of the billing cycle, a major pain point earlier, by 70%.
Gartner predicts that by the end of 2015 the percentage of companies using cloud-based customer relationship management software will rise above 50%. The problem with on-premise CRM software systems is that every new development requires a costly upgrade. Thus, with an increasing number of updates and upgrades, companies prefer to switch to cloud CRM software rather than be trapped in expensive new upgrades of on-premise customer relationship management software, as the cloud involves decreased infrastructure costs whilst providing an adaptable platform for future growth.
Seven years ago, Oracle was not willing to adopt the concept of cloud-based CRM, but recently Shawn Price, Vice President at Oracle, agreed in his keynote speech at Oracle CloudWorld 2015 that, with the changing business models of companies and changing consumer behaviour, their focus would be on cloud CRM solutions in 2015.
2) Businesses to focus on Effective Cloud CRM Solutions for Security
The high-profile security breaches at Target and Sony demonstrated how vulnerable any network can be. The increasing cyber-attacks will force businesses to emphasize effective cloud CRM solutions for security. ReportsnReports, a market research organization, predicts that, with increased demand for security in customer relationship management software applications, this market will grow from $4.2 billion in 2014 to $8.7 billion in 2018.
Microsoft has already accomplished a milestone by adopting an ISO standard to protect personal information in cloud environments. This ISO standard will be applicable to Dynamics CRM Online, Intune, Microsoft Azure and Office 365. Microsoft is the first public cloud service provider to accomplish this momentous goal.
With a growing shared knowledge base, cloud security will continue to be refined in sales CRM software applications. We can expect big-data-style analytics to emerge as a major trend in cloud CRM software security for 2015, with complex analytics algorithms detecting anomalous network behaviour and other malicious activities on the cloud network.
Alexander Linden, a Gartner analyst, said: "The biggest driver for implementing Big Data is to enhance the customer experience. Organizations need to direct their Big Data resources to discover opportunities for enhanced business performance and customer-focused competitive advantage."
Click Here to Read More about the Emerging Trends in Big Data and Hadoop
2015 will witness increased collaboration between security experts, with Security-as-a-Service playing a vital role in cloud-based customer relationship management software applications to identify emerging threats with zero-hour accuracy.
3) More powerful social listening tools for Cloud based CRM
Social media is much more than a trend in this ever-changing competitive world, impacting all areas of business. In 2015, sales CRM software vendors will focus on social engagement so that businesses become more knowledgeable about their customers. Nowadays, customers make their decisions based on online reviews and discussions. Conversations that used to happen in person now take place on Twitter, LinkedIn or Facebook. 2015 will see sales CRM software vendors investing in social listening tools to empower the sales and marketing teams of businesses to be customer-centric at every point, leading to more sales, more connections and 100% customer satisfaction.
The Microsoft Dynamics 2015 customer relationship management software update has enhanced social engagement. Marketers can now listen, analyse and drive customer engagement, with social insights about their campaigns and brands displayed on the platform. The Microsoft social listening tool that comes with the Dynamics 2015 update is free if the business has more than 10 professional users. It works within Microsoft Dynamics Marketing, Microsoft Dynamics CRM, or standalone. Businesses can now access social listening information about their brand through the Dynamics customer relationship management software to identify customer problems, discover more opportunities to up-sell and add more value to their business than the salespeople of competing organizations.
2015 will see the onset of businesses engaging with customers by spotting trends through the social pulse in ways never before possible, through cloud CRM applications. As customers become more informed and connected, businesses have started to rethink their CRM processes, software and strategies to surface a competitive advantage. Experts anticipate the demand for CRM software systems that include social customer relationship management tools to be on the rise in 2015.
4) Mobile CRM to become more powerful
Earlier, customer relationship management software applications had limited functionality on mobile, but 2015 is going to be different, with CRM vendors investing hefty sums to make the mobile platforms of CRM applications more powerful. CRM vendors with powerful mobile applications have the potential to effectively organize the sales teams of a business so that they can spend more time selling.
CIO recently covered the launch of Sales Cloud Engage by Salesforce as one of its popular tech trend stories on April 15, 2015. Sales Cloud Engage will provide data, marketing content and deep insights on leads before engagement, so that representatives can craft personalized nurture campaigns without having to deal with the hassle of building ad-hoc marketing campaigns manually. It will provide a way to integrate these lead nurture campaigns with the Salesforce1 mobile app, so that salespeople can add leads directly to their nurture campaigns from their smartphones. It will also help salespeople pull up the complete engagement histories of their prospects with the company. Salesforce announced the availability of Sales Cloud Engage at $50 per month per seat by the end of April 2015.
Click here to know more about Salesforce training in ADM 201 and DEV 401
certifications
With Bring Your Own Device (BYOD) policies becoming common in most organizations, Gartner estimates that 30% of organizations now issue tablets as primary devices for salespeople. CRM software systems are piled up with so much historical marketing and sales data that representatives end up spending more time entering data than using it. Companies will strive hard to strike a balance between data collection and ease of use in 2015. CRM software vendors are at the forefront of building simple, easy platforms that support mobile devices rather than complex systems that nobody wants to use.
Companies choosing a CRM software vendor with powerful mobile apps in 2015 will boost
profitability with salespeople closing more deals, reporting more accurately and being geared
up for better meetings.
5) Wearables - The Next Big Thing in CRM Software Systems
Wearables are the next phase of the mobile revolution in CRM software, which organizations will adopt to connect with their customers in new ways. Integrating wearable computing devices with CRM applications will help organizations across various industries have real-time access to account data, effectively engage with customers, and discover opportunities for up-selling and cross-selling to enhance customer relationships at every encounter.

Image Credit: businessinsider.com


Wearables will revolutionize the manner in which organizations collect intelligence about
their customers. Wearables will become a powerful force in CRM trends as they can provide
several data points other than just text.

A recent news story on enterprise software by CIO revealed that Salesforce is bringing its cloud service to the Apple Watch. Salesforce has already partnered with wearable device makers Google and Philips Healthcare, and now its next venture is Apple. The cloud, mobile, social and data science revolutions will converge on the wrist in 2015 with Salesforce CRM software.
What do you think are the emerging trends in CRM software for 2015 that we have missed? Please let us know in the comments below.

How much Java is required to learn Hadoop?


11 May 2015
One of the questions most frequently asked by prospective Hadoopers is: how much Java should I know to enter the exciting world of Hadoop?
Hadoop is open source software built on Java, making it necessary for every Hadooper to be well-versed in at least core Java basics. Knowledge of advanced Java concepts is a plus when learning Hadoop but is definitely not compulsory. Apache Hadoop is one of the most commonly adopted enterprise solutions among big IT companies, making it one of the top 10 IT job trends for 2015. Thus, it makes sense for technologists to pick up Hadoop quickly, with the Hadoop ecosystem getting bigger day by day. The surging demand for big data analytics is leading many IT professionals to switch their careers to Hadoop technology. Professionals need to consider the required skills before they begin to learn Hadoop.
Skills to Learn Hadoop

Hadoop is written in Java, thus knowledge of Java basics is essential to learn Hadoop.
Hadoop runs on Linux, thus knowing some basic Linux commands will take you a long way in pursuing a successful career in Hadoop.
According to Dice, the combined Java-Hadoop skill set is in great demand in the IT industry, with Hadoop jobs on the rise.
Career counsellors at DeZyre frequently answer the question posed by many prospective students or professionals who want to switch their career to big data or Hadoop: how much Java should I know to learn Hadoop?
Most prospective students show some disappointment when they ask this question; they feel that not knowing Java is a limitation and that they might have to miss out on a great career opportunity. It is one of the biggest myths that a person from a programming background other than Java cannot learn Hadoop.
Several organizations are adopting Apache Hadoop as an enterprise solution as their business requirements and demands change, and the demand for Hadoop professionals in the market varies accordingly. Professionals with any of a range of diversified tech skills, such as mainframes, Java, .NET, PHP or any other programming language, can learn Hadoop.
If an organization runs an application built on mainframes, it might look for candidates who possess Mainframe + Hadoop skills, whereas an organization whose main application is built on Java would demand a Hadoop professional with expertise in Java + Hadoop skills.

Let's consider this with an example.
The image below shows a job posting on Monster.com for the position of Senior Data Engineer.
The job description clearly states that any candidate who knows Hadoop and has strong experience in ETL with Informatica can apply for this job and build a career in Hadoop technology without expert knowledge of Java. The mandatory skills for the job, highlighted in red, include Hadoop, Informatica, Vertica, Netezza, SQL, Pig and Hive. MapReduce in Java is an additional plus but not required.

Here is another image, showing a job posting on Dice.com for the position of Big Data Engineer.
The job description clearly lists the minimum required skills for this role as Java, Linux and Hadoop. Only candidates who have expert knowledge of Java, Linux and Hadoop can apply for this job; somebody with only Java basics would not be the best fit.

Some job roles require the professional to have explicit, in-depth knowledge of Java programming, whereas other roles can be excelled at even by professionals who are well-versed only in Java basics.
To learn Hadoop and build an excellent career in it, basic knowledge of Linux and of the basic programming principles of Java is a must. Thus, to truly excel in the well-entrenched technology of Apache Hadoop, it is recommended that you at least learn Java basics.

Click here to know more about our IBM Certified Hadoop Developer course
activated with free Java course
Java and Linux - Building Blocks of Hadoop
Apache Hadoop is an open source platform built on two technologies: the Linux operating system and the Java programming language. Java is used for storing, analysing and processing large data sets. The choice of Java as the programming language for the development of Hadoop was largely accidental rather than deliberate. Apache Hadoop was initially a sub-project of the open source search engine Nutch, and the Nutch team at that point in time was more comfortable using Java than any other programming language. The choice of Java for Hadoop development nevertheless turned out to be the right decision, with plenty of Java talent available in the market. Hadoop is Java-based, so it typically requires professionals to learn Java for Hadoop.

Apache Hadoop solves big data processing challenges using distributed parallel processing in a novel way. The Apache Hadoop architecture mainly consists of two components:
1. Hadoop Distributed File System (HDFS) - a virtual file system
2. Hadoop Java MapReduce programming model component - a Java-based processing system
HDFS is the virtual file system component of Hadoop that splits a huge data file into smaller blocks to be processed by different processors. These blocks are then replicated and stored on various servers to satisfy fault tolerance constraints. HDFS is a basic file system abstraction where the user need not bother about how it operates or stores files unless he/she is an administrator.
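For illustration, here is a minimal, hedged sketch of how a client program can interact with HDFS through Hadoop's Java FileSystem API; the fs.defaultFS address and the file paths are hypothetical and would depend on your own cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: copy a local file into HDFS and read its size.
// The NameNode address and paths below are hypothetical examples.
public class HdfsCopyExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the NameNode; adjust to your cluster.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

        FileSystem fs = FileSystem.get(conf);

        Path local = new Path("/tmp/sales-2015.csv");       // local source file (hypothetical)
        Path remote = new Path("/data/raw/sales-2015.csv"); // HDFS destination (hypothetical)

        // HDFS transparently splits the file into blocks and replicates them;
        // the client code does not manage blocks or replicas itself.
        fs.copyFromLocalFile(local, remote);

        System.out.println("Stored " + remote + ", size = "
                + fs.getFileStatus(remote).getLen() + " bytes");
        fs.close();
    }
}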

Google's Java MapReduce framework is the workhorse of large-scale data processing (YARN can also be used for data processing with Hadoop 2.0). The Hadoop Java MapReduce component handles the processing of huge data sets without bogging its users down in the complexities of the distributed environment.
The Map function mainly filters and sorts data, whereas Reduce deals with integrating the outcomes of the map() function. The framework provides users with a Java-based programming interface to facilitate interaction between the Hadoop components. There are various high-level abstraction tools like Pig (programmed in Pig Latin) and Hive (programmed using HiveQL) provided by Apache to work with the data sets on your cluster. Programs written in either of these languages are converted to MapReduce programs in Java. MapReduce programs can also be written in various other languages like Perl, Ruby, C or Python that support streaming through the Hadoop streaming API; however, there are certain advanced features that are as of now available only with the Java API.
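To make the Map and Reduce roles concrete, here is a sketch of the classic word count job written against the Hadoop Java MapReduce API; the class names and the input/output paths are illustrative only.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Classic word count: the map step emits (word, 1) pairs,
// the reduce step sums the counts for each word.
public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Filter phase: break each input line into words.
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Integration phase: combine all counts for the same word.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/input"));    // hypothetical paths
        FileOutputFormat.setOutputPath(job, new Path("/data/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}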

Image Credit: saphanatutorial.com

At times, Hadoop developers might be required to dig deep into the Hadoop code to understand the functionality of certain modules or why a particular piece of code is behaving strangely. Under such circumstances, knowledge of Java basics and advanced programming concepts comes as a boon to Hadoop developers. Technology experts advise prospective Hadoopers to learn Java basics before they dive deep into Hadoop, for a well-rounded, real-world Hadoop implementation. Career counsellors suggest that students learn Java for Hadoop before they attempt to work on Hadoop MapReduce.
How to learn Java for Hadoop?
If you are planning to enrol for Hadoop training, ramp up your knowledge of Java beforehand.

Professionals aspiring to pursue a successful career in Hadoop can try to learn Java on their own by reading various e-books or by checking out the free Java tutorials available online. The learning approach through Java tutorials will work if a person is skilled at programming; Java tutorials help you understand and retain information through practical code snippets. This approach might not be the best choice for less experienced programmers, as they might not be able to comprehend the code snippets and other examples in a Java tutorial with ease.
There are several reputed online e-learning classes which provide great options to learn Java for Hadoop. Knowledge experts explain the Java basics, and students can clarify any doubts they have then and there and engage in discussion with other students to improve their knowledge of Java.
Candidates who enrol for DeZyre's IBM certified Hadoop training can activate a free Java course to ramp up their Java knowledge. Individuals who are new to Java can also get started with Hadoop just by understanding the Java basics taught as part of the free Java course curriculum at DeZyre. DeZyre's 20-hour Java course curriculum covers all the Java basics needed to learn Hadoop, such as:

Installing and Configuring Java and Eclipse

Arrays

Objects and Classes

Control Flow Statements

Inheritance and Interfaces

Exception Handling

Serialization

Collections

Reading and Writing files


Spending a few hours on Java basics will act as a great catalyst for learning Hadoop.
If you are interested in becoming a Hadoop developer, but you are concerned about Java,
then you can talk to one of our career counsellors. Please send an email to
rahul@dezyre.com

JAVA
Q1. What is a cookie?
Ans. A cookie is a small piece of text stored on a user's computer by the browser for a specific domain. It is commonly used for authentication, storing site preferences, and server session identification.

Q2. Can we reduce the visibility of an overridden method?
Ans. No.
Q3. What are the different types of inner classes?
Ans. Simple inner class, local inner class, anonymous inner class, and static nested class.
Q4. Difference between TreeMap and HashMap?
Ans. They differ in the way entries are stored in memory. TreeMap stores the keys in sorted order, whereas HashMap stores the key-value pairs in no particular order.
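A small sketch illustrating the ordering difference (the class name is hypothetical):

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// TreeMap iterates keys in sorted order; HashMap makes no ordering guarantee.
public class MapOrderingDemo {
    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> tree = new TreeMap<>();
        for (String k : new String[] {"banana", "apple", "cherry"}) {
            hash.put(k, k.length());
            tree.put(k, k.length());
        }
        System.out.println("HashMap: " + hash.keySet()); // order not guaranteed
        System.out.println("TreeMap: " + tree.keySet()); // [apple, banana, cherry]
    }
}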
Q5. What is the difference between List, Set and Map?
Ans. List - members are stored in sequence and can be accessed through an index.
Set - sequence and index are not relevant; a Set does not contain duplicates (whereas a multiset can).
Map - contains key-value pairs.
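A small sketch contrasting the three collection kinds (the class name is hypothetical):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// List keeps duplicates and insertion order; Set drops duplicates;
// Map associates each key with a single value.
public class CollectionKindsDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();
        Map<String, Integer> map = new HashMap<>();

        for (String s : new String[] {"a", "b", "a"}) {
            list.add(s);
            set.add(s);
            map.merge(s, 1, Integer::sum); // count occurrences per key
        }
        System.out.println(list); // [a, b, a]
        System.out.println(set);  // [a, b] (order not guaranteed)
        System.out.println(map);  // {a=2, b=1} (order not guaranteed)
    }
}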
Q6. Difference between public, private, default and protected?
Ans. private - not accessible outside the class.
public - accessible from anywhere.
default - accessible from anywhere within the same package.
protected - accessible within the class, from subclasses, and from anywhere within the same package.
Q7. What is servlet chaining?
Ans. Multiple servlets serving the request in a chain.
Q8. What are the wrapper classes available for primitive types?
Ans. boolean - java.lang.Boolean
byte - java.lang.Byte
char - java.lang.Character
double - java.lang.Double
float - java.lang.Float
int - java.lang.Integer
long - java.lang.Long
short - java.lang.Short
void - java.lang.Void
Q9. What are the concepts introduced with Java 5?
Ans. Generics, enums, autoboxing, annotations and static import.
Q10. Does the constructor create the object?
Ans. The new operator in Java creates objects; the constructor is a later step in object creation. The constructor's job is to initialize the members after the object has had memory reserved for it.
Q11. Can a static method access instance variables?
Ans. Though static methods cannot access instance variables directly, they can access them through an object reference, as sketched below.
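A minimal sketch of that access pattern (class and field names are hypothetical):

// A static method has no 'this', so it cannot read counter directly,
// but it can read it through an explicit object reference.
public class StaticAccessDemo {
    private int counter = 42; // instance variable

    static int readCounter(StaticAccessDemo instance) {
        // return counter;          // would not compile: no instance in scope
        return instance.counter;    // OK: access via an object reference
    }

    public static void main(String[] args) {
        System.out.println(readCounter(new StaticAccessDemo())); // prints 42
    }
}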
Q12. Does Java support multiple inheritance?
Ans. Java does not support multiple inheritance of classes. A class can implement multiple interfaces, but implementing interfaces is not the same as multiple inheritance.
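A minimal sketch, with hypothetical names, showing a class implementing two interfaces while extending no additional class:

// A class can implement several interfaces, but it can extend only one class.
public class MultipleInterfacesDemo {
    interface Printable { void print(); }
    interface Exportable { void export(); }

    static class Report implements Printable, Exportable {
        public void print()  { System.out.println("printing report"); }
        public void export() { System.out.println("exporting report"); }
    }

    public static void main(String[] args) {
        Report r = new Report();
        r.print();
        r.export();
    }
}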
Q13. Difference between == and .equals()?
Ans. equals() is a method of the Object class which (when overridden) returns true if the contents of the objects are the same, whereas == evaluates whether the object references on the left and right point to the same object in memory.
Q14. Difference between Checked and Unchecked exceptions ?show Answer
Ans. Checked exceptions are the exceptions for which the compiler raises an error if they are not
handled or declared, whereas unchecked exceptions surface only at run time and are therefore not
checked by the compiler.
Q15. What is a Final Variable ?show Answer
Ans. A final variable is a constant whose value cannot be changed after initialization.
Q16. Which class does not override the equals() and hashCode() methods, inheriting them
directly from class Object?show Answer
Ans. java.lang.StringBuffer.
Q17. What is a final method ?show Answer
Ans. Its a method which cannot be overridden. Compiler throws an error if we try to override a method
which has been declared final in the parent class.
Q18. Which interface does java.util.Hashtable implement?show Answer
Ans. java.util.Map
Q19. What is an Iterator?show Answer
Ans. Iterator is an interface that provides methods to iterate over any Collection.
Q20. Which interface provides the capability to store objects using a key-value pair?show
Answer
Ans. java.util.Map
Q21. Difference between HashMap and Hashtable?show Answer
Ans. Hashtable is synchronized whereas HashMap is not.
HashMap allows one null key and null values, whereas Hashtable does not allow null keys or values.

Q22. Does java allow overriding static methods ?show Answer


Ans. No. Static methods belong to the class and not to objects, and hence do not take part in
polymorphic (overriding) behavior; a static method with the same signature in a subclass merely hides the parent method.
Q23. When are static variables loaded in memory ?show Answer
Ans. They are loaded at runtime when the respective Class is loaded.
Q24. Can we serialize static variables ?show Answer
Ans. No. Only Object and its members are serialized. Static variables are shared variables and
doesn't correspond to a specific object.
Q25. What will this code print ?
String a = new String ("TEST");
String b = new String ("TEST");
if(a == b) {
System.out.println ("TRUE");
} else {
System.out.println ("FALSE");
}show Answer
Ans. FALSE.
== operator compares object references, a and b are references to two different objects, hence the
FALSE. .equals method is used to compare string object content.
Q26. There are two objects a and b with same hashcode. I am inserting these two objects
inside a hashmap.
hMap.put(a,a);
hMap.put(b,b);
where a.hashCode()==b.hashCode()
Now tell me how many objects will be there inside the hashmap?show Answer
Ans. Two different keys can have the same hashcode. When two elements have the same hashcode,
Java uses equals() to differentiate them further. So the map will contain one or two
entries depending on whether a.equals(b) is true.
Q27. Difference between long.class and Long.TYPE ?show Answer
Ans. They both represent the long primitive type. They are exactly the same.
Q28. Does Java provides default copy constructor ?show Answer

Ans. No
Q29. What are the common uses of "this" keyword in java ?show Answer
Ans. "this" keyword is a reference to the current object and can be used for following 1. Passing itself to another method.
2. Referring to the instance variable when local variable has the same name.
3. Calling another constructor in constructor chaining.
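A minimal sketch (with hypothetical Printer and Page classes) illustrating the three uses of "this" listed above:

class Printer {
    void print(Page p) {
        System.out.println("Printing " + p);
    }
}

public class Page {
    private int number;

    Page() {
        this(1);                    // 3. calling another constructor (constructor chaining)
    }

    Page(int number) {
        this.number = number;       // 2. referring to the field when the parameter has the same name
    }

    void send(Printer printer) {
        printer.print(this);        // 1. passing the current object to another method
    }

    public static void main(String[] args) {
        new Page().send(new Printer());
    }
}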
Q30. What are the difference between Threads and Processes ?show Answer
Ans. 1. When an OS starts running a program, it creates a new process; a process is a program that is
currently executing, and every process has at least one thread running within it.
2. A thread is a path of code execution within the program, which has its own local variables, program
counter (pointer to the instruction currently being executed) and lifetime.
3. When the Java Virtual Machine (JVM) is started by the operating system, a new process is created.
Within that process, many threads can be created.
4. Consider an example: when you open Microsoft Word, the task manager shows the running program
as a process. When you type into the document, it performs more than one task at the same time -
checking spelling, formatting the words you enter - because different paths of execution (threads) within
that one process do all this work concurrently.
5. Within a process, every thread has an independent path of execution, but there may be situations
where two threads interfere with each other; that is where concurrency issues and deadlock come into
the picture.
6. Just as two processes can communicate (for example, dragging a file from the file explorer onto a
Word document), two threads can also communicate with each other, and the cost of communication
between two threads is relatively low.
7. Every thread in Java is created and controlled by a unique object of the java.lang.Thread class.
8. Prior to JDK 1.5 there was limited support for asynchronous programming in Java, so threads were
the main way to make the runtime environment asynchronous and allow different tasks to run
concurrently.
Q31. How can we run a java program without making any object?show Answer
Ans. By putting code within either static method or static block.
Q32. Explain multithreading in Java ?show Answer
Ans. 1. Multithreading provides better interaction with the user by distributing tasks.
2. Threads in Java appear to run concurrently, so they simulate simultaneous activities. The processor
runs each thread for a short time and switches among the threads (context switching), which makes it
appear that each thread has its own processor. Using this feature, multiple tasks appear to occur
simultaneously when, in fact, each runs for only a brief time before the context is switched to the next
thread.
3. We can do other things while waiting for slow I/O operations.
In the java.io package, the class InputStream has a method, read(), that blocks until a byte is read from
the stream or until an IOException is thrown. The thread that executes this method cannot do anything
else while awaiting the arrival of another byte on the stream.

Q33. Can constructors be synchronized in Java ?show Answer


Ans. No. Java doesn't allow multi thread access to object constructors so synchronization is not even
needed.
Q34. can we create a null as a key for a map collection ?show Answer
Ans. Yes, for HashMap. HashMap permits one null key (and null values), whereas Hashtable does not allow null keys or values.
Q35. What is the use of hashcode in Java ?show Answer
Ans. Hashcode is used for bucketing in hash-based implementations like HashMap, Hashtable and HashSet.
The value returned by hashCode() is used as the bucket number for storing elements; this bucket
number is the address of the element inside the set/map. When you call contains(), it takes the
hashcode of the element, looks for the bucket the hashcode points to, and if more than one
element is found in the same bucket (multiple objects can have the same hashcode) it uses the
equals() method to check whether the objects are equal, and then decides whether contains() is true or
false, or whether the element can be added to the set or not.
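A minimal sketch (hypothetical Point class) of overriding equals() and hashCode() consistently, which is what the hash-based collections described above rely on:

import java.util.Objects;

class Point {
    private final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;     // same content => equal
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);       // equal objects must produce the same hashcode
    }
}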
Q36. Why java doesn't support multiple Inheritence ?show Answer
Ans. class A {
void test() {
System.out.println("A's test() method");
}
}
class B {
void test() {
System.out.println("B's test() method");
}
}
Suppose Java allowed multiple inheritance like this:
class C extends A, B {
}
Both A's and B's test() methods would be inherited by class C.
So which test() method should class C use? As the test() methods of A and B are different, we
would face ambiguity (the classic "diamond problem"), which is why Java does not allow it.

Q37. Why threads block or enters to waiting state on I/O?show Answer


Ans. Threads enter the waiting/blocked state on I/O so that other threads can execute while the I/O
operations are being performed.

Q38. What are transient variables in java?show Answer


Ans. Transient variables are variables that are excluded from serialization.
Q39. What is the difference between yield() and sleep()?show Answer
Ans. When a thread invokes yield(), it returns to the ready (runnable) state. When a thread invokes
sleep(), it enters a not-ready (timed waiting) state for the specified duration.
Q40. What are wrapper classes ?show Answer
Ans. They are wrappers to primitive data types. They allow us to access primitives as objects.
Q41. What is the difference between time slicing and preemptive scheduling ?show Answer
Ans. In preemptive scheduling, highest priority task continues execution till it enters a not running
state or a higher priority task comes into existence. In time slicing, the task continues its execution for
a predefined period of time and reenters the pool of ready tasks.
Q42. What is the initial state of a thread when it is created and started?show Answer
Ans. Ready state.
Q43. What one should take care of, while serializing the object?
show Answer
Ans. One should make sure that all the included objects are also serializable. If any of the objects is
not serializable then it throws a NotSerializableException.
Q44. What is a String Pool ?show Answer
Ans. String pool (String intern pool) is a special storage area in Java heap. When a string is created
and if the string already exists in the pool, the reference of the existing string will be returned, instead
of creating a new object and returning its reference.
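A small illustration of the pool behaviour described above, assuming standard String semantics:

public class StringPoolDemo {
    public static void main(String[] args) {
        String a = "TEST";              // literal goes into the string pool
        String b = "TEST";              // reuses the pooled instance
        String c = new String("TEST");  // always creates a new object on the heap

        System.out.println(a == b);            // true  - same pooled reference
        System.out.println(a == c);            // false - different objects
        System.out.println(a == c.intern());   // true  - intern() returns the pooled reference
    }
}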
Q45. Why is String immutable in Java ?show Answer
Ans. 1. String Pool
When a string is created and if the string already exists in the pool, the reference of the existing string
will be returned, instead of creating a new object. If string is not immutable, changing the string with
one reference will lead to the wrong value for the other references.
2. To Cache its Hashcode
If string is not immutable, One can change its hashcode and hence not fit to be cached.
3. Security
String is widely used as a parameter for many Java classes, e.g. network connections, opening files, etc.
Making it mutable might pose security threats, since another code segment could change the value after validation.

Q46. what is the use of cookie and session ? and What is the difference between them ?show
Answer
Ans. Cookies and sessions are both used to store user information. A cookie stores the information on
the client side, whereas a session stores it on the server side. Both are primarily used for
authentication, user preferences and carrying information across multiple requests. One more thing
that differentiates a cookie from a session: a cookie can store only textual information, whereas a
session can store both textual information and objects.
Q47. Which are the different segments of memory ?
show Answer
Ans. 1. Stack Segment - contains local variables and Reference variables(variables that hold the
address of an object in the heap)
2. Heap Segment - contains all created objects in runtime, objects only plus their object attributes
(instance variables)
3. Code Segment - The segment where the actual compiled Java bytecodes resides when loaded
Q48. Which memory segment loads the java code ?show Answer
Ans. Code segment.
Q49. Which containers use a BorderLayout as their default layout ?show Answer
Ans. The Window, Frame and Dialog classes use BorderLayout as their default layout.
Q50. Can a lock be acquired on a class ?show Answer
Ans. Yes, a lock can be acquired on a class. This lock is acquired on the class's Class object.
Q51. What state does a thread enter when it terminates its processing?show Answer
Ans. When a thread terminates its processing, it enters the dead state.
Q52. How many bits are used to represent Unicode, ASCII, UTF-16, and UTF-8 characters?show
Answer
Ans. Unicode code points were originally defined in 16 bits and ASCII requires 7 bits. Although the ASCII
character set uses only 7 bits, it is usually stored in 8 bits. UTF-8 represents characters using 8-, 16-,
24- or 32-bit patterns (1 to 4 bytes). UTF-16 uses 16-bit and larger bit patterns.
Q53. Does garbage collection guarantee that a program will not run out of memory?show
Answer
Ans. Garbage collection does not guarantee that a program will not run out of memory. It is possible
for programs to use up memory resources faster than they are garbage collected. It is also possible
for programs to create objects that are not subject to garbage collection
Q54. What is an object's lock and which object's have locks?show Answer

Ans. An object's lock is a mechanism that is used by multiple threads to obtain synchronized access
to the object. A thread may execute a synchronized method of an object only after it has acquired the
object's lock. All objects and classes have locks. A class's lock is acquired on the class's Class
object.
Q55. What is casting?show Answer
Ans. There are two types of casting, casting between primitive numeric types and casting between
object references. Casting between numeric types is used to convert larger values, such as double
values, to smaller values, such as byte values. Casting between object references is used to refer to
an object by a compatible class, interface, or array type reference
Q56. What restrictions are placed on method overriding?show Answer
Ans. Overridden methods must have the same name, argument list, and return type. The overriding
method may not limit the access of the method it overrides. The overriding method may not throw any
exceptions that may not be thrown by the overridden method.
Q57. How does a try statement determine which catch clause should be used to handle an
exception?show Answer
Ans. When an exception is thrown within the body of a try statement, the catch clauses of the try
statement are examined in the order in which they appear. The first catch clause that is capable of
handling the exception is executed. The remaining catch clauses are ignored.
Q58. Describe what happens when an object is created in Java ?show Answer
Ans. 1. Memory is allocated from the heap to hold all instance variables and implementation-specific data
of the object and its superclasses. Implementation-specific data includes pointers to class and method
data.
2. The instance variables of the objects are initialized to their default values.
3. The constructor for the most derived class is invoked. The first thing a constructor does is call the
constructor for its superclasses. This process continues until the constructor for java.lang.Object is
called,
as java.lang.Object is the base class for all objects in java.
4. Before the body of the constructor is executed, all instance variable initializers and initialization
blocks are executed. Then the body of the constructor is executed. Thus, the constructor for the base
class completes first and constructor for the most derived class completes last.
Q59. What is the difference between StringBuffer and String class ?show Answer
Ans. A string buffer implements a mutable sequence of characters. A string buffer is like a String, but
can be modified. At any point in time it contains some particular sequence of characters, but the
length and content of the sequence can be changed through certain method calls. The String class
represents character strings. All string literals in Java programs, such as "abc" are constant and
implemented as instances of this class; their values cannot be changed after they are created.
Q60. Describe, in general, how java's garbage collector works ?show Answer
Ans. The Java runtime environment deletes objects when it determines that they are no longer being
used. This process is known as garbage collection. The Java runtime environment supports a

garbage collector that periodically frees the memory used by


objects that are no longer needed. The Java garbage collector is a mark-sweep garbage collector that
scans Java's dynamic memory areas for objects, marking those that are referenced. After all possible
paths to objects are investigated, those objects that are not marked (i.e. are not referenced) are
known to be garbage and are collected.
Q61. What is RMI ?show Answer
Ans. RMI stands for Remote Method Invocation. Traditional approaches to executing code on other
machines across a network have been confusing as well as tedious and error-prone to implement.
The nicest way to think about this problem is that some object happens to live on another machine,
and that you can send a message to the remote object and get a result as if the object lived on your
local machine. This simplification is exactly what Java Remote Method Invocation (RMI) allows you to
do.
Q62. What is JDBC? Describe the steps needed to execute a SQL query using JDBC.show
Answer
Ans. The JDBC is a pure Java API used to execute SQL statements. It provides a set of classes and
interfaces that can be used by developers to write database applications.
The steps needed to execute a SQL query using JDBC:
1. Open a connection to the database.
2. Execute a SQL statement.
3. Process the results.
4. Close the connection to the database.
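A minimal sketch of these four steps; the JDBC URL, credentials and table name are placeholders, and the corresponding driver is assumed to be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcDemo {
    public static void main(String[] args) throws Exception {
        // 1. Open a connection (URL, user and password are hypothetical placeholders)
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "user", "password");
             Statement stmt = con.createStatement();
             // 2. Execute a SQL statement
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM employees")) {
            // 3. Process the results
            while (rs.next()) {
                System.out.println(rs.getInt("id") + " " + rs.getString("name"));
            }
        } // 4. try-with-resources closes the ResultSet, Statement and Connection automatically
    }
}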
Q63. Are constructors inherited? Can a subclass call the parent's class constructor? When?
show Answer
Ans. Constructors are not inherited. That is, you cannot create an instance of a subclass using a
constructor of one of its superclasses. A subclass can, however, call a parent class constructor
explicitly with super(), which must be the first statement in the subclass constructor. One of the main
reasons constructors are not inherited is that you probably don't want a superclass constructor to be
overridden, which would be possible if constructors were inherited; giving the developer the ability to
override a superclass constructor would erode the encapsulation abilities of the language.
Q64. What is JSON ?show Answer
Ans. JSON is "JavaScript Object Notation", primarily used for client-server or server-server
communication. Its a much lighter and readable alternative to XML. JSON is language independent
and is easily parse-able in all programming languages.
Q65. What is the role of JSON.stringify ?show Answer
Ans. JSON.stringify() turns a JavaScript object into JSON text and returns that JSON text as a string.
For example, stringifying an object with name, gender and age properties produces
{"name":"xyz","gender":"male","age":30}

Q66. When were Enums introduced in Java ?show Answer


Ans. Enums were introduced with java 5.
Q67. Which function is used to convert a JSON text into an object ?show Answer
Ans. JSON.parse() (the older eval() approach also works but is unsafe and should be avoided).
Q68. Which data types are supported by JSON ?show Answer
Ans. Number
String
Boolean
Array
Object
null
Q69. What are the benefits of JSON over XML ?show Answer
Ans. Lighter and faster than XML as on-the-wire data format
Object Representation - Information is presented in object notations and hence better understandable.
Easy to parse and conversion to objects for information consumption.
Support multiple data types - JSON supports string, number, array, boolean whereas XML data are all
string.

Q70. What are the methods of Object Class ?show Answer


Ans. clone() - Creates and returns a copy of this object.
equals() - Indicates whether some other object is "equal to" this one.
finalize() - Called by the garbage collector on an object when garbage collection determines that there
are no more references to the object
getClass() - Returns the runtime class of an object.
hashCode() - Returns a hash code value for the object.
toString() - Returns a string representation of the object.
notify(), notifyAll(), and wait() - Play a part in synchronizing the activities of independently running
threads in a program.
Q71. Explain JMS ( Java Messaging Services ) ?show Answer
Ans. JMS Provides high-performance asynchronous messaging. It enables Java EE applications to
communicate with non-Java systems on top of various transports.
Q72. Explain EJB (Enterprise Java Beans) ?show Answer
Ans. EJB Provides a mechanism that make easy for Java developers to use advanced features in
their components, such as remote method invocation (RMI), object/ relational mapping (that is, saving
Java objects to a relational database), and distributed transactions across multiple data sources.
Q73. Which MVC is struts2 based on ?show Answer
Ans. MVC2

Q74. What is an API ( Application Programming Interface ) ?show Answer


Ans. An API is a kind of technical contract which defines functionality that two parties must provide: a
service provider (often called an implementation) and an application. an API simply defines services
that a service provider (i.e., the implementation) makes available to applications.
Q75. What is URL?
show Answer
Ans. URL is Uniform Resource Locator which is representation of HTTP address.
Q76. Explain features of struts2 ?show Answer
Ans. 1) It is an action-based framework which has adopted the MVC2 pattern.
2) Struts2 is a pull-MVC (or MVC2) framework where the action takes the role of the model rather than
the controller. The pull concept refers to the view's ability to pull data from an action, rather than having a
separate model object available.
3) The Model-View-Controller pattern in Struts2 is implemented with five core components: actions,
interceptors, value stack / OGNL, result types and results / view technologies.
4) XML configuration as well as an annotation option is available.
5) POJO-based actions are available, so we can write test cases easily.
6) Integration with Spring, Tiles and the OGNL-based expression language.
7) Theme-based tag libraries integrated with Struts tags as well as support for Ajax tags.
8) Can have various view options like JSP, Velocity, FreeMarker etc.
9) We can embed plugins through which we can modify and extend framework features.
Q77. What is HTTP ?show Answer
Ans. HTTP or Hypertext Transfer Protocol is the internet protocol for transmission of hypertext (text with
metadata) over the internet.
Q78. what is content negotiation?show Answer
Ans. Suppose we want to visit a site for some information; that information can be represented in different
languages like English or German, and its presentation format can also differ, from HTML to PDF or
plain text. When a client makes an HTTP request to a server, the client can specify the media types and
languages it can accept back from the host, and on the basis of availability the host returns the best
match to the client. This is called content negotiation because the client and server negotiate the
language and format of the content to be shared.
Q79. Is tomcat an application or Web server ?show Answer
Ans. Tomcat is essentially a web server and servlet container, not a full Java EE application server.

Q80. Is Apache an application or Web server ?show Answer


Ans. Web server.
Q81. What are the default or implicitly assigned values for data types in java ?show Answer
Ans. boolean ---> false
byte ----> 0
short ----> 0
int -----> 0
long ------> 0L
char -----> '\u0000'
float ------> 0.0f
double ----> 0.0d
any object reference ----> null
Q82. What is difference between Encapsulation And Abstraction?show Answer
Ans. 1.Abstraction solves the problem at design level while encapsulation solves the problem at
implementation level
2.Abstraction is used for hiding the unwanted data and giving relevant data. while Encapsulation
means hiding the code and data into a single unit to protect the data from outside world.
3. Abstraction lets you focus on what the object does instead of how it does it while Encapsulation
means hiding the internal details or mechanics of how an object does something.
4. For example: the outer look of a television - a display screen and channel buttons to change the
channel - illustrates abstraction, whereas the inner implementation detail of the television - how the CRT
and the display screen are connected with each other using different circuits - illustrates encapsulation.
Q83. Can I import same package/class twice? Will the JVM load the package twice at runtime?
show Answer
Ans. One can import the same package or the same class multiple times. Neither the compiler nor the
JVM will complain about it, and the JVM will internally load the class only once no matter how
many times you import the same class.
Q84. Explain static blocks in Java ?show Answer
Ans. A static initialization block is a normal block of code enclosed in braces, { }, and preceded by the
static keyword. Here is an example:
static {
// whatever code is needed for initialization goes here
}
A class can have any number of static initialization blocks, and they can appear anywhere in the class
body. The runtime system guarantees that static initialization blocks are called in the order that they
appear in the source code.

Q85. Which access specifier can be used with Class ?show Answer
Ans. For top level class we can only use "public" and "default". We can use private with inner class.
Q86. Explain Annotations ?show Answer
Ans. Annotations, a form of metadata, provide data about a program that is not part of the program
itself. Annotations have no direct effect on the operation of the code they annotate. Annotations have
a number of uses, among them:
Information for the compiler Annotations can be used by the compiler to detect errors or suppress
warnings.
Compile-time and deployment-time processing Software tools can process annotation information
to generate code, XML files, and so forth.
Runtime processing Some annotations are available to be examined at runtime.

Q87. Give an Example of Annotations ?show Answer


Ans. Suppose that a software group traditionally starts the body of every class with comments
providing important information:
public class Generation3List extends Generation2List {
// Author: John Doe
// Date: 3/17/2002
// Current revision: 6
// Last modified: 4/12/2004
// By: Jane Doe
// Reviewers: Alice, Bill, Cindy
// class code goes here
}
To add this same metadata with an annotation, you must first define the annotation type. The syntax
for doing this is:
@interface ClassPreamble {
String author();
String date();
int currentRevision() default 1;
String lastModified() default "N/A";
String lastModifiedBy() default "N/A";
// Note use of array
String[] reviewers();
}
The annotation type definition looks similar to an interface definition where the keyword interface is
preceded by the at sign (@) (@ = AT, as in annotation type). Annotation types are a form of interface,
which will be covered in a later lesson. For the moment, you do not need to understand interfaces.
The body of the previous annotation definition contains annotation type element declarations, which
look a lot like methods. Note that they can define optional default values.
After the annotation type is defined, you can use annotations of that type, with the values filled in, like
this:
@ClassPreamble (

author = "John Doe",


date = "3/17/2002",
currentRevision = 6,
lastModified = "4/12/2004",
lastModifiedBy = "Jane Doe",
// Note array notation
reviewers = {"Alice", "Bob", "Cindy"}
)
public class Generation3List extends Generation2List {
// class code goes here
}

Q88. What are few of the Annotations pre defined by Java?show Answer
Ans. @Deprecated annotation indicates that the marked element is deprecated and should no longer
be used. The compiler generates a warning whenever a program uses a method, class, or field with
the @Deprecated annotation.
@Override annotation informs the compiler that the element is meant to override an element declared
in a superclass.
@SuppressWarnings annotation tells the compiler to suppress specific warnings that it would
otherwise generate.
@SafeVarargs annotation, when applied to a method or constructor, asserts that the code does not
perform potentially unsafe operations on its varargsparameter. When this annotation type is used,
unchecked warnings relating to varargs usage are suppressed.
@FunctionalInterface annotation, introduced in Java SE 8, indicates that the type declaration is
intended to be a functional interface, as defined by the Java Language Specification.

Q89. What are meta Annotations ?show Answer


Ans. Annotations that apply to other annotations are called meta-annotations.
Q90. Name few meta-annotations ?show Answer
Ans. @Retention annotation specifies how the marked annotation is stored:
@Documented annotation indicates that whenever the specified annotation is used those elements
should be documented using the Javadoc tool. (By default, annotations are not included in Javadoc.)
@Target annotation marks another annotation to restrict what kind of Java elements the annotation
can be applied to.
@Inherited annotation indicates that the annotation type can be inherited from the super class. (This
is not true by default.) When the user queries the annotation type and the class has no annotation for

this type, the class' superclass is queried for the annotation type. This annotation applies only to class
declarations.
@Repeatable annotation, introduced in Java SE 8, indicates that the marked annotation can be
applied more than once to the same declaration or type use. For more information, see Repeating
Annotations.

Q91. How to display and set the Class path in Unix ?show Answer
Ans. To display the current CLASSPATH variable, use these commands in UNIX (Bourne shell):
% echo $CLASSPATH
To delete the current contents of the CLASSPATH variable,
In UNIX: % unset CLASSPATH; export CLASSPATH
To set the CLASSPATH variable,
In UNIX: % CLASSPATH=/home/george/java/classes; export CLASSPATH

Q92. Difference between Abstract and Concrete Class ?show Answer


Ans. Abstract classes are only meant to be sub classed and not meant to be instantiated whereas
concrete classes are meant to be instantiated.
Q93. Difference between Overloading and Overriding ?show Answer
Ans. Overloading - Similar Signature but different definition , like function overloading.
Overriding - Overriding the Definition of base class in the derived class.
Q94. Difference between Vector and ArrayList ?show Answer
Ans. Vectors are synchronized whereas Array lists are not.
Q95. Different ways of implementing Threads in Java ?show Answer
Ans. Threads in Java can be implemented either by extending the Thread class or by implementing the
Runnable interface.
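A minimal sketch showing both approaches (Worker and Task are illustrative class names):

// Approach 1: extend Thread
class Worker extends Thread {
    @Override
    public void run() {
        System.out.println("Running by extending Thread");
    }
}

// Approach 2: implement Runnable
class Task implements Runnable {
    @Override
    public void run() {
        System.out.println("Running by implementing Runnable");
    }
}

public class ThreadDemo {
    public static void main(String[] args) {
        new Worker().start();
        new Thread(new Task()).start();
    }
}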
Q96. What is Volatile keyword used for ?show Answer
Ans. Volatile is a declaration that a variable can be accessed by multiple threads and hence shouldn't
be cached.
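A small sketch of a typical use, a stop flag shared between threads (class and field names are illustrative):

public class VolatileDemo {
    // volatile guarantees that a write by one thread is visible to other threads
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {
                // do some work
            }
            System.out.println("Worker stopped");
        });
        worker.start();
        Thread.sleep(100);
        running = false;   // visible to the worker thread because the flag is volatile
    }
}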
Q97. What is Serialization ?show Answer
Ans. Storing the state of an object in a file or other medium is called serialization.

Q98. What is the use of Transient Keyword ?show Answer


Ans. The transient keyword in Java is used to indicate that a field should not be serialized.
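A minimal sketch combining serialization (Q97) and transient (Q98); the User class and file name are placeholders:

import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class User implements Serializable {
    private String name;
    private transient String password;   // excluded from the serialized form

    User(String name, String password) {
        this.name = name;
        this.password = password;
    }
}

public class SerializationDemo {
    public static void main(String[] args) throws Exception {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream("user.ser"))) {
            out.writeObject(new User("alice", "secret"));   // the password field is not written
        }
    }
}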
Q99. What is a final variable ?show Answer
Ans. Final variable is a constant variable. Variable value can't be changed after instantiation.
Q100. What is a Final Method ?show Answer
Ans. A method that cannot be overridden in a subclass.
Q101. What is a Final Class ?show Answer
Ans. A Class that cannot be sub classed.
Q102. What is an Immutable Object ?show Answer
Ans. Object that can't be changed after instantiation.
Q103. What is an immutable class ?show Answer
Ans. A class whose instances cannot be modified after creation, i.e. all of its objects are immutable.
java.lang.String is a well-known example.
Q104. How to implement an immutable class ?show Answer
Ans. We can make a class immutable by
1. Making all fields private and final, and not providing setters.
2. Setting the fields only within the constructor.
public class ImmutableClass {
    private final int member;
    ImmutableClass(int var) {
        member = var;
    }
}
and then we can initialize an object of the class as
ImmutableClass immutableObject = new ImmutableClass(5);
Now, all members being private and final with no setters, the state of the object cannot be changed.

Q105. Does Declaring an object "final" makes it immutable ?show Answer


Ans. Declaring a primitive variable final makes its value constant. Declaring an object reference final only
means the reference cannot be pointed at another object; the object itself may still be mutable.
Q106. Difference between object instantiation and construction ?show Answer
Ans. Though It's often confused with each other, Object Creation ( Instantiation ) and Initialization
( Construction ) are different things in Java. Construction follows object creation.

Object Creation is the process to create the object in memory and returning its handler. Java provides
New keyword for object creation.
Initialization is the process of setting the initial / default values to the members. Constructor is used for
this purpose. If we don't provide any constructor, Java provides one default implementation to set the
default values according to the member data types.

Q107. Can we override static methods ? Why ?show Answer


Ans. No.
Static methods belong to the class and not to objects, and hence do not fit into polymorphic
(overriding) behavior.
A static method is not associated with any instance of a class, so the concept of overriding for runtime
polymorphism does not apply to static methods.
Q108. Can we access instance variables within static methods ?show Answer
Ans. Yes.
we cannot access them directly but we can access them using object reference.
Static methods belong to a class and not objects whereas non static members are tied to an instance.
Accessing instance variables without the instance handler would mean an ambiguity regarding which
instance the method is referring to and hence its prohibited.
Q109. Can we reduce the visibility of the inherited or overridden method ?show Answer
Ans. No.
Q110. Give an Example of checked and unchecked exception ?show Answer
Ans. ClassNotFoundException is a checked exception whereas NoClassDefFoundError is an unchecked
error.
Q111. Name few Java Exceptions ?show Answer
Ans. IndexOutOfBoundsException, NoClassDefFoundError, OutOfMemoryError, IllegalArgumentException.
Q112. Which of the following is tightly bound ? Inheritance or Composition ?show Answer
Ans. Inheritance.
Q113. How can we make sure that a code segment gets executed even in case of uncaught
exceptions ?show Answer
Ans. By putting it within finally.
Q114. Explain the use of "Native" keyword ?show Answer

Ans. Used in method declarations to specify that the method is not implemented in the same Java
source file, but rather in another language
Q115. What is "super" used for ?
show Answer
Ans. Used to access members of the base class.
Q116. What is "this" keyword used for ?
show Answer
Ans. Used to represent an instance of the class in which it appears.
Q117. Difference between boolean and Boolean ?show Answer
Ans. boolean is a primitive type whereas Boolean is a class.
Q118. What is a finalize method ?show Answer
Ans. finalize() method is called just before an object is destroyed.
Q119. What are Marker Interfaces ? Name few Java marker interfaces ?show Answer
Ans. These are the interfaces which have no declared methods.
Serializable and cloneable are marker interfaces.
Q120. Is runnable a Marker interface ?show Answer
Ans. No , it has run method declared.
Q121. Difference between Process and Thread ?show Answer
Ans. Process is a program in execution whereas thread is a separate path of execution in a program.
Q122. What is a Deadlock ?
show Answer
Ans. When two threads are waiting for each other (each holding a resource the other needs) and neither can proceed, the program is said to be in deadlock.
Q123. Difference between Serialization and Deserialization ?
show Answer
Ans. Serialization is the process of writing the state of an object to a byte stream. Deserialization is
the process of restoring these objects.
Q124. Explain Autoboxing ?
show Answer
Ans. Autoboxing is the automatic conversion that the Java compiler makes between the primitive
types and their corresponding object wrapper classes
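A tiny example of the conversion the compiler performs automatically:

import java.util.ArrayList;
import java.util.List;

public class AutoboxingDemo {
    public static void main(String[] args) {
        Integer boxed = 42;          // autoboxing: int -> Integer
        int unboxed = boxed;         // unboxing: Integer -> int
        List<Integer> list = new ArrayList<>();
        list.add(7);                 // 7 is autoboxed before being stored
        System.out.println(boxed + unboxed + list.get(0));
    }
}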

Q125. What is an Enum type ?


show Answer
Ans. An enum type is a special data type that enables for a variable to be a set of predefined
constants
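A minimal sketch of an enum type (the Day constants are illustrative):

public class EnumDemo {
    enum Day { MONDAY, TUESDAY, WEDNESDAY }   // a fixed set of predefined constants

    public static void main(String[] args) {
        Day today = Day.MONDAY;
        switch (today) {
            case MONDAY:
                System.out.println("Start of the week");
                break;
            default:
                System.out.println(today);
        }
    }
}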
Q126. What are Wrapper Classes ? What are Primitive Wrapper Classes ?
show Answer
Ans. A wrapper class is any class which "wraps" or "encapsulates" the functionality of another class or
component. A Wrapper Class that wraps or encapsulates the primitive data type is called Primitive
Wrapper Class.
Q127. What Design pattern Wrapper Classes implement ?show Answer
Ans. Adapter.
Q128. What is "Import" used for ?
show Answer
Ans. Enables the programmer to abbreviate the names of classes defined in a package.
Q129. Different types of memory used by JVM ?show Answer
Ans. Class , Heap , Stack , Register , Native Method Stack.
Q130. What is a class loader ? What are the different class loaders used by JVM ?show Answer
Ans. Part of JVM which is used to load classes and interfaces.
Bootstrap , Extension and System are the class loaders used by JVM.
Q131. Can we declare interface methods as private ?show Answer
Ans. No.
Q132. What is a Static import ?show Answer
Ans. By static import , we can access the static members of a class directly without prefixing it with
the class name.
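A short example using only standard java.lang.Math members:

import static java.lang.Math.PI;
import static java.lang.Math.sqrt;

public class StaticImportDemo {
    public static void main(String[] args) {
        // PI and sqrt are used without the Math. prefix thanks to static import
        System.out.println(sqrt(2) * PI);
    }
}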
Q133. Difference between StringBuffer and StringBuilder ?show Answer
Ans. StringBuffer is synchronized whereas StringBuilder is not.
Q134. Difference between Map and HashMap ?
show Answer
Ans. Map is an interface where HashMap is the concrete class.
Q135. What is a Property class ?show Answer
Ans. The properties class is a subclass of Hashtable that can be read from or written to a stream.

Q136. Explain the scenerios to choose between String , StringBuilder and StringBuffer ?show
Answer
Ans. If the Object value will not change in a scenario use String Class because a String object is
immutable.
If the Object value can change and will only be modified from a single thread, use a StringBuilder
because StringBuilder is unsynchronized(means faster).
If the Object value may change, and can be modified by multiple threads, use a StringBuffer because
StringBuffer is thread safe(synchronized).
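A brief sketch contrasting the three choices described above:

public class StringChoicesDemo {
    public static void main(String[] args) {
        // Value never changes: plain String (immutable)
        String greeting = "Hello";

        // Value changes, single thread: StringBuilder (unsynchronized, faster)
        StringBuilder sb = new StringBuilder(greeting);
        sb.append(", world");

        // Value changes, shared across threads: StringBuffer (synchronized, thread safe)
        StringBuffer shared = new StringBuffer(greeting);
        shared.append(", threads");

        System.out.println(sb + " / " + shared);
    }
}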
Q137. Explain java.lang.OutOfMemoryError ?
show Answer
Ans. This Error is thrown when the Java Virtual Machine cannot allocate an object because it is out of
memory, and no more memory could be made available by the garbage collector.
Q138. Can we have multiple servlets in a web application and How can we do that ?show
Answer
Ans. Yes by making entries in web.xml
Q139. How can we manage Error Messages in the web application ?show Answer
Ans. Within message.properties file.
Q140. Is JVM, a compiler or interpretor ?
show Answer
Ans. It is primarily an interpreter, though modern JVMs also use a JIT compiler to compile frequently executed bytecode to native code.
Q141. Difference between implicit and explicit type casting ?
show Answer
Ans. An explicit conversion is where you use some syntax to tell the program to do a conversion
whereas in case of implicit type casting you need not provide the data type.
Q142. Difference between loadClass and Class.forName ?show Answer
Ans. loadClass only loads the class but doesn't initialize the object whereas Class.forName initialize
the object after loading it.
Q143. Should we override finalize method ?
show Answer
Ans. Finalize is used by Java for Garbage collection. It should not be done as we should leave the
Garbage Collection to Java itself.
Q144. What is assert keyword used for ?show Answer

Ans. The assert keyword is used to make an assertion - a statement which the programmer believes
is always true at that point in the program. This keyword is intended to aid in testing and debugging.
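A small example; note that assertions take effect only when the JVM is started with the -ea flag:

public class AssertDemo {
    static int divide(int a, int b) {
        assert b != 0 : "divisor must not be zero";   // checked only when -ea is enabled
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(divide(10, 2));
    }
}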
Q145. Difference between Factory and Abstract Factory Design Pattern ?show Answer
Ans. Factory Pattern deals with creation of objects delegated to a separate factory class whereas
Abstract Factory patterns works around a super-factory which creates other factories.
Q146. Difference between Factory and Builder Design Pattern ?show Answer
Ans. Builder pattern is the extension of Factory pattern wherein the Builder class builds a complex
object in multiple steps.
Q147. Difference between Proxy and Adapter ?show Answer
Ans. Adapter object has a different input than the real subject whereas Proxy object has the same
input as the real subject. Proxy object is such that it should be placed as it is in place of the real
subject.
Q148. Difference between Adapter and Facade ?
show Answer
Ans. The Difference between these patterns in only the intent. Adapter is used because the objects in
current form cannot communicate where as in Facade , though the objects can communicate , A
Facade object is placed between the client and subject to simplify the interface.
Q149. Difference between Builder and Composite ?
show Answer
Ans. Builder is a creational Design Pattern whereas Composite is a structural design pattern.
Composite creates Parent - Child relations between your objects while Builder is used to create group
of objects of predefined types.
Q150. Example of Chain of Responsibility Design Pattern ?
show Answer
Ans. Exception Handling Throw mechanism.
Q151. Example of Observer Design Pattern ?
show Answer
Ans. Listeners.
Q152. Difference between Factory and Strategy Design Pattern ?show Answer
Ans. Factory is a creational design pattern whereas Strategy is behavioral design pattern. Factory
revolves around the creation of object at runtime whereas Strategy or Policy revolves around the
decision at runtime.
Q153. Shall we use abstract classes or Interfaces in Policy / Strategy Design Pattern ?show
Answer

Ans. Strategy deals only with decision making at runtime so Interfaces should be used.
Q154. Which kind of memory is used for storing object member variables and function local
variables ?show Answer
Ans. Local variables are stored in stack whereas object variables are stored in heap.
Q155. Why do member variables have default values whereas local variables don't have any
default value ?
show Answer
Ans. Member variables are stored on the heap as part of the object, so they are initialized with default
values when an instance of the class is created. Local variables are stored on the stack and must be
explicitly initialized before they are used.
Q156. What is a Default Constructor ?show Answer
Ans. The no argument constructor provided by Java Compiler if no constructor is specified.
Q157. Will Compiler creates a default no argument constructor if we specify only multi
argument constructor ?show Answer
Ans. No, Compiler will create default constructor only if we don't specify any constructor.
Q158. Can we overload constructors ?
show Answer
Ans. Yes.
Q159. What will happen if we make the constructor private ?show Answer
Ans. We can't create the objects directly by invoking new operator.
Q160. How can we create objects if we make the constructor private ?show Answer
Ans. We can do so through a static public member method or static block.
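A minimal sketch of exposing object creation through a static public method when the constructor is private (a classic singleton-style arrangement; the Registry class name is illustrative):

public class Registry {
    private static final Registry INSTANCE = new Registry();

    private Registry() {
        // private constructor: new Registry() is not allowed outside this class
    }

    public static Registry getInstance() {   // static public member method exposes the object
        return INSTANCE;
    }
}

Clients then obtain the object with Registry.getInstance() instead of the new operator.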
Q161. What will happen if we remove the static keyword from main method ?
show Answer
Ans. Program will compile but will give a "NoSuchMethodError" during runtime.
Q162. Why doesn't Java use pointers ?show Answer
Ans. Pointers are error-prone: slight carelessness in their use may result in memory problems and
security holes, hence Java manages references intrinsically and does not expose raw pointers.
Q163. Can we use both "this" and "super" in a constructor ?show Answer
Ans. No, because both this() and super() must be the first statement in a constructor, so only one of them can be used.
Q164. Do we need to import the java.lang package ?show Answer

Ans. No, It is loaded by default by the JVM.


Q165. Is it necessary for each try block to be followed by a catch block ? show Answer
Ans. No, it should be followed by either a catch block or a finally block (or both).
Q166. Can finally block be used without catch ?show Answer
Ans. Yes, but it must then directly follow a "try" block.
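A small sketch of finally without catch (the file-reading method and path are illustrative):

import java.io.FileInputStream;
import java.io.IOException;

public class FinallyDemo {
    static void read(String path) throws IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            System.out.println(in.read());
        } finally {
            in.close();   // runs whether or not an exception is thrown; no catch needed here
        }
    }

    public static void main(String[] args) throws IOException {
        read(args[0]);   // path supplied on the command line
    }
}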
Q167. What is exception propagation ?show Answer
Ans. Passing the exception object to the calling method.
Q168. Difference between nested and inner classes ?show Answer
Ans. Inner classes are non static nested classes.
Q169. What is a nested interface ?
show Answer
Ans. Any interface declared inside a class or an interface. It is static by default.
Q170. What is an Externalizable interface ?show Answer
Ans. The Externalizable interface gives a class full control over writing and reading its own state to a byte
stream, via the writeExternal() and readExternal() methods.

Q171. Difference between serializable and externalizable interface ?show Answer


Ans. Serializable is a marker interface whereas externalizable is not.
Q172. What is reflection ?show Answer
Ans. Reflection is the ability to examine and modify the structure and behaviour of classes and objects at runtime.
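A small sketch using the standard java.lang.reflect API:

import java.lang.reflect.Method;

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        Class<?> clazz = Class.forName("java.lang.String");
        System.out.println("Class: " + clazz.getName());

        // list methods discovered at runtime
        for (Method m : clazz.getDeclaredMethods()) {
            System.out.println(m.getName());
        }

        // invoke a method reflectively
        Method concat = clazz.getMethod("concat", String.class);
        System.out.println(concat.invoke("Hello, ", "world"));
    }
}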
Q173. Can we instantiate the object of derived class if parent constructor is protected ?
show Answer
Ans. No
Q174. Can we declare an abstract method private ?
show Answer
Ans. No Abstract methods can only be declared protected or public.
Q175. What are the design considerations while making a choice between using interface and
abstract class ?show Answer
Ans. Use an abstract class if it is an "is-a" relationship and the subclass should share a subset or all of the
base functionality. Use an interface if it is a "can-do" / "should-do" relationship that only specifies behaviour.

Q176. What is a config Object? show Answer


Ans. The config object is an instantiation of javax.servlet.ServletConfig and is a direct wrapper around
the ServletConfig object for the generated servlet. This object allows the JSP programmer access to
the Servlet or JSP engine initialization parameters such as the paths or file location.
Q177. What is a pageContext Object? show Answer
Ans. The pageContext object is an instance of a javax.servlet.jsp.PageContext object. The
pageContext object is used to represent the entire JSP page. This object stores references to the
request and response objects for each request. The application, config, session, and out objects are
derived by accessing attributes of this object.The pageContext object also contains information about
the directives issued to the JSP page, including the buffering information, the errorPageURL, and
page scope.
Q178. What is suspend() method used for ?show Answer
Ans. suspend() is used to suspend the execution of a thread; the thread can then be restarted using the
resume() method. (Both methods are deprecated because they are deadlock-prone.)
Q179. Difference between suspend() and stop() ?show Answer
Ans. Suspend method is used to suspend thread which can be restarted by using resume() method.
stop() is used to stop the thread, it cannot be restarted again.
Q180. What are the benefits of using Spring Framework ?show Answer
Ans. Spring enables developers to develop enterprise-class applications using POJOs. The benefit of
using only POJOs is that you do not need an EJB container product.
Spring is organized in a modular fashion. Even though the number of packages and classes are
substantial, you have to worry only about ones you need and ignore the rest.
Spring does not reinvent the wheel; instead, it makes use of existing technologies
like several ORM frameworks, logging frameworks, JEE, Quartz and JDK timers, other view
technologies.
Testing an application written with Spring is simple because environment-dependent code is moved
into this framework. Furthermore, by using JavaBean-style POJOs, it becomes easier to use
dependency injection for injecting test data.
Spring's web framework is a well-designed web MVC framework, which provides a great alternative to
web frameworks such as Struts or other over-engineered or less popular web frameworks.
Spring provides a convenient API to translate technology-specific exceptions (thrown by JDBC,
Hibernate, or JDO, for example) into consistent, unchecked exceptions.
Lightweight IoC containers tend to be lightweight, especially when compared to EJB containers, for
example. This is beneficial for developing and deploying applications on computers with limited
memory and CPU resources.

Spring provides a consistent transaction management interface that can scale down to a local
transaction
Q181. what is the difference between collections class vs collections interface ?show Answer
Ans. Collections class is a utility class having static methods for doing operations on objects of
classes which implement the Collection interface. For example, Collections has methods for finding
the max element in a Collection.
Q182. Will this code give error if i try to add two heterogeneous elements in the arraylist. ? and
Why ?
List list1 = new ArrayList<>();
list1.add(5);
list1.add("5");show Answer
Ans. No, it compiles and runs. If we don't declare the list with a specific type parameter, it is treated as a list of objects.
The int 5 is autoboxed to an Integer and "5" is a String, and hence both are objects.
Q183. Difference between Java beans and Spring Beans ?show Answer
Ans. A JavaBean is any class that follows the JavaBeans conventions (public no-argument constructor,
private properties exposed through getters/setters, usually serializable). A Spring bean is simply an
object that is instantiated, configured and managed by the Spring IoC container; it need not follow the
JavaBeans conventions.
Q184. What is the difference between System.console.write and System.out.println ?show
Answer
Ans. System.console() returns null if your application is not run in a terminal (though you can handle
this in your application)
System.console() provides methods for reading password without echoing characters
System.out and System.err use the default platform encoding, while the Console class output
methods use the console encoding
Q185. What are various types of Class loaders used by JVM ?show Answer
Ans. Bootstrap - Loads JDK internal classes, java.* packages.
Extensions - Loads jar files from JDK extensions directory - usually lib/ext directory of the JRE
System - Loads classes from system classpath.
Q186. How are classes loaded by JVM ?show Answer
Ans. Class loaders are hierarchical. The very first class is specially loaded with the help of static
main() method declared in your class. All the subsequently loaded classes are loaded by the classes,
which are already loaded and running.
Q187. Difference between C++ and Java ?show Answer
Ans. Java does not support pointers.
Java does not support multiple inheritance of classes.
Java does not support destructors but rather adds a finalize() method. Finalize methods are invoked
by the garbage collector prior to reclaiming the memory occupied by the object that has the
finalize() method.
Java does not include structures or unions, because the traditional data structures are implemented within
an object-oriented framework.
C++ compiles to machine language, whereas Java compiles to byte code.
In C++ the programmer needs to worry about freeing the allocated memory, whereas in Java the
garbage collector takes care of unneeded / unused objects.
Java is a platform-independent language, but C++ depends upon the operating system.
Java uses both a compiler and an interpreter, while in C++ there is only a compiler.
C++ supports operator overloading whereas Java doesn't.
Internet support is built into Java but not into C++; however, C++ has support for socket programming
which can be used.
Java does not support header files or included library files like C++; Java uses import to include
different classes and methods.
There is no goto statement in Java.
There is no scope resolution operator :: in Java. It has the . operator, using which we can qualify classes
with the package they came from.
Java is pass by value, whereas C++ is both pass by value and pass by reference.
Java enums are objects, instead of int values as in C++.
C++ programs run as native executable machine code and hence are nearer to the hardware, whereas a
Java program runs in a virtual machine.
C++ was designed mainly for systems programming, extending the C programming language,
whereas Java was created initially to support network computing.
C++ allows low-level addressing of data; you can manipulate machine addresses to look at anything
you want, while Java access is controlled.
C++ has several addressing operators (. * & ->) where Java has only one: the . operator.
We can create our own packages in Java (sets of classes); C and C++ have no direct package equivalent.

Q188. Difference between static vs. dynamic class loading?show Answer

Ans. Static loading - classes are statically loaded with Java's new operator.
Dynamic class loading - dynamic loading is a technique for programmatically invoking the functions of
a class loader at run time, for example:
Class.forName(className);   // className is a String holding the fully qualified class name
Q189. Tell something about BufferedWriter ? What are flush() and close() used for ?show
Answer
Ans. A buffer is a temporary storage area for data. The BufferedWriter class is a character output stream
that buffers characters to provide efficient writing of text.
flush() is used to push all the characters stored in the buffer to the underlying stream and clear the buffer.
close() is used to close the character output stream (flushing it first).
Q190. What is Scanner class used for ? when was it introduced in Java ?show Answer
Ans. The Scanner class was introduced in Java 1.5 for reading a data stream from an input device. Previously
we used to write code to read input using DataInputStream. After reading the stream, we can
convert tokens into the respective data types using in.next() for String, in.nextInt() for int, in.nextDouble() for
double, etc.
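A tiny example reading from standard input:

import java.util.Scanner;

public class ScannerDemo {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter your name and age: ");
        String name = in.next();     // reads the next token as a String
        int age = in.nextInt();      // reads the next token as an int
        System.out.println(name + " is " + age);
        in.close();
    }
}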
Q191. Why Struts 1 Classes are not Thread Safe whereas Struts 2 classes are thread safe ?
show Answer
Ans. Struts 1 actions are singleton. So all threads operates on the single action object and hence
makes it thread unsafe.
Struts 2 actions are not singleton and a new action object copy is created each time a new action
request is made and hence its thread safe.
Q192. What are some Java related technologies used for distributed computing ?show Answer
Ans. sockets, RMI. EJB
Q193. Whats the purpose of marker interfaces ?show Answer
Ans. They just tell the compiler that the objects of the classes implementing the interfaces with no
defined methods need to be treated differently.
Q194. What is the difference between final, finally and finalize() ?show Answer
Ans. final - makes a variable constant, restricts method overriding, and restricts subclassing of a class.
finally - handles exception. The finally block is optional and provides a mechanism to clean up
regardless of what happens within the try block. Use the finally block to close files or to release
other system resources like database connections, statements etc.

finalize() - method helps in garbage collection. A method that is invoked before an object is discarded
by the garbage collector, allowing it to clean up its state.
Q195. When do you get ClassCastException?show Answer
Ans. As we only downcast class in the hierarchy, The ClassCastException is thrown to indicate that
code has attempted to cast an object to a subclass of which it is not an instance.
Q196. Explain Thread States ?show Answer
Ans. Runnable - waiting for its turn to be picked for execution by the thread schedular based on
thread priorities.
Running - The processor is actively executing the thread code. It runs until it becomes blocked, or
voluntarily gives up its turn.
Waiting: A thread is in a blocked state while it waits for some external processing such as file I/O to
finish.
Sleeping - Java threads are put to sleep (suspended) with Thread.sleep; they become runnable again
when the sleep time elapses or the thread is interrupted.
Blocked on I/O - Will move to runnable after I/O condition like reading bytes of data etc changes.
Blocked on synchronization - Will move to Runnable when a lock is acquired.
Dead - The thread is finished working.
Q197. What are strong, soft, weak and phantom references in Java ?show Answer
Ans. Garbage Collector wont remove a strong reference.
A soft reference will only get removed if memory is low.
A weak reference will get removed on the next garbage collection cycle.
A phantom reference will be finalized but the memory will not be reclaimed. Can be useful when you
want to be notified that an object is about to be collected.
Q198. Difference between yield() and sleeping()? show Answer
Ans. When a task invokes yield(), it changes from running state to runnable state. When a task
invokes sleep(), it changes from running state to waiting/sleeping state.
Q199. What is a daemon thread? Give an Example ?show Answer
Ans. These are threads that normally run at a low priority and provide a basic service to a program or
programs when activity on a machine is reduced. The garbage collector thread is an example of a daemon thread.
Q200. What is the difference between AWT and Swing?show Answer

Ans. Swing provides both additional components like JTable, JTree etc and added functionality to
AWT-replacement components.
Swing components can change their appearance based on the current look-and-feel library that's
being used.
Swing components follow the MVC paradigm, and thus can provide a much more flexible UI.
Swing provides extras for components, such as icons on many components, decorative borders for
components, tool tips for components etc.
Swing components are lightweight compared to AWT components (they are drawn in Java rather than relying on native peers).
Swing provides built-in double buffering ,which means an off-screen buffer is used during drawing and
then the resulting bits are copied onto the screen.
Swing provides paint debugging support for when you build your own component.
Q201. What is the order of method invocation in an applet?show Answer
Ans. public void init()
public void start()
public void stop()
public void destroy()
Q202. Name few tools for probing Java Memory Leaks ?show Answer
Ans. JProbe, OptimizeIt
Q203. Which memory areas does instance and static variables use ?show Answer
Ans. Instance variables are stored on the heap as part of the object, whereas static variables are stored in the method area (PermGen / Metaspace) of the JVM.
Q204. What is J2EE? What are J2EE components and services?show Answer
Ans. J2EE or Java 2 Enterprise Edition is an environment for developing and deploying enterprise
applications. The J2EE platform consists of J2EE components, services, Application Programming
Interfaces (APIs) and protocols that provide the functionality for developing multi-tiered and distributed
Web based applications.
Q205. What are the components of J2EE ?show Answer
Ans. applets
Client component like Client side Java codes.
Web component like JSP, Servlet WAR
Enterprise JavaBeans like Session beans, Entity beans, Message driven beans
Enterprise application like WAR, JAR, EAR

Q206. What is XML ?show Answer

Ans. XML or eXtensible Markup Language is a markup languages for describing data and its
metadata.
Q207. Difference between SAX and DOM Parser ?show Answer
Ans. A DOM (Document Object Model) parser creates a tree structure in memory from an input
document whereas A SAX (Simple API for XML) parser does not create any internal structure.
A SAX parser serves the client application always only with pieces of the document at any given time
whereas A DOM parser always serves the client application with the entire document no matter how
much is actually needed by the client.
A SAX parser, however, is much more space efficient in case of a big input document whereas DOM
parser is rich in functionality.
Use a DOM Parser if you need to refer to different document areas before giving back the information.
Use SAX if you just need isolated pieces of information from different areas.
Xerces, Crimson are SAX Parsers whereas XercesDOM, SunDOM, OracleDOM are DOM parsers.
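For illustration, a small DOM parse using the standard javax.xml.parsers API; the XML string is just an assumed sample:

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<employees><employee><name>John</name></employee></employees>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is loaded into memory as a tree
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        String name = doc.getElementsByTagName("name").item(0).getTextContent();
        System.out.println(name);   // John
    }
}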
Q208. What is DTD ?show Answer
Ans. DTD or Document Type Definition is a standard agreed upon way of communication between
two parties. Your application can use a standard DTD to verify that data that you receive
from the outside world is valid and can be parsed by your parser.
Q209. What is XSD ?show Answer
Ans. XSD or Xml Schema Definition is an extension of DTD. XSD is more powerful and extensible
than DTD
Q210. What is JAXP ?show Answer
Ans. Stands for Java API for XML Processing. This provides a common interface for creating and
using SAX, DOM, and XSLT APIs in Java regardless of which vendors implementation is actually
being used.
Q211. What is JAXB ?show Answer
Ans. Stands for Java API for XML Binding. This standard defines a mechanism for writing out Java
objects as XML and for creating Java objects from XML structures.
Q212. What is marshalling ?show Answer
Ans. Its the process of creating XML structures out of Java Objects.
Q213. What is unmarshalling ?show Answer
Ans. Its the process of creating Java Objects out of XML structures.
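A hedged sketch of both operations using JAXB (javax.xml.bind, bundled with the JDK up to Java 8); the Employee class and its field are assumptions for illustration:

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
class Employee {
    public String name;   // a public field is enough for this small demo
}

public class JaxbDemo {
    public static void main(String[] args) throws Exception {
        Employee emp = new Employee();
        emp.name = "John";

        JAXBContext context = JAXBContext.newInstance(Employee.class);

        StringWriter writer = new StringWriter();
        context.createMarshaller().marshal(emp, writer);            // marshalling: Java object -> XML
        String xml = writer.toString();
        System.out.println(xml);

        Employee copy = (Employee) context.createUnmarshaller()
                .unmarshal(new StringReader(xml));                  // unmarshalling: XML -> Java object
        System.out.println(copy.name);
    }
}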
Q214. Which load testing tools have you used ?show Answer

Ans. Rational Robot, JMeter, LoadRunner.


Q215. What are LDAP servers used for ?show Answer
Ans. LDAP servers are typically used in J2EE applications to authenticate and authorise users. LDAP
servers are hierarchical and are optimized for read access, so likely to be faster than database in
providing read access.
Q216. What is the difference between comparable and comparator in java.util pkg?show
Answer
Ans. The Comparable interface is used for single-sequence (natural) sorting, i.e. sorting the objects based on a single
data member, whereas the Comparator interface is used to define external orderings, for example sorting the objects
based on multiple or different data members.
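A small sketch contrasting the two; the Employee class and its fields are illustrative assumptions:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class Employee implements Comparable<Employee> {
    String name;
    int age;
    Employee(String name, int age) { this.name = name; this.age = age; }

    // Comparable: the single natural ordering, defined inside the class (here by name)
    public int compareTo(Employee other) {
        return this.name.compareTo(other.name);
    }
}

public class SortDemo {
    public static void main(String[] args) {
        List<Employee> list = new ArrayList<>();
        list.add(new Employee("Bob", 40));
        list.add(new Employee("Alice", 30));

        Collections.sort(list);                        // uses compareTo, i.e. sorts by name

        // Comparator: an external ordering supplied per call (here by age)
        Collections.sort(list, new Comparator<Employee>() {
            public int compare(Employee a, Employee b) {
                return Integer.compare(a.age, b.age);
            }
        });
    }
}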
Q217. What are different modules of spring ?show Answer
Ans. There are seven core modules in spring
Spring MVC
The Core container
O/R mapping
DAO
Application context
Aspect Oriented Programming or AOP
Web module

Q218. Explain Flow of Spring MVC ?show Answer


Ans. The DispatcherServlet configured in web.xml file receives the request.
The DispatcherServlet finds the appropriate Controller with the help of HandlerMapping and then
invokes associated Controller.
Then the Controller executes the business logic and returns a ModelAndView object to the
DispatcherServlet.
The DispatcherServlet determines the view from the ModelAndView object.
Then the DispatcherServlet passes the model object to the View.
The View is rendered and the Dispatcher Servlet sends the output to the Servlet container. Finally
Servlet Container sends the result back to the user.
Q219. What is Spring configuration file?
show Answer
Ans. Spring configuration file is an XML file. This file contains the classes information and describes
how these classes are configured and introduced to each other.

Q220. Q: What is default scope of bean in Spring framework?show Answer


Ans. The default scope of a bean is singleton in the Spring framework.
Q221. What bean scopes does Spring support? Explain them.
show Answer
Ans. The Spring Framework supports the following five scopes -
singleton
prototype
request
session
global-session

Q222. What is bean auto wiring?show Answer


Ans. The Spring container is able to autowire relationships between collaborating beans. This means
that it is possible to let Spring automatically resolve collaborators (other beans) for your bean by
inspecting the contents of the BeanFactory, without using <constructor-arg> and <property> elements.

Q223. Difference between socket and servlet ?show Answer


Ans. servlet is a small, server-resident program that typically runs automatically in response to user
input.
A network socket is an endpoint of an inter-process communication flow across a computer network.
We can think of it as a difference between door and gate. They are similar as they both are entry
points but they are different as they are put up at different areas.
Sockets are for low-level network communication whereas Servlets are for implementing websites and
web services
Q224. Difference Between this() and super() ?
show Answer
Ans. 1.this is a reference to the current object in which this keyword is used whereas super is a
reference used to access members specific to the parent Class.
2.this is primarily used for accessing member variables if local variables have same name, for
constructor chaining and for passing itself to some method whereas super is primarily used to initialize
base class members within derived class constructor.
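A brief sketch of both keywords in constructor chaining; the class names are illustrative:

class Vehicle {
    int wheels;
    Vehicle(int wheels) { this.wheels = wheels; }   // 'this' distinguishes the field from the parameter
}

class Car extends Vehicle {
    String model;

    Car() {
        this("unknown");        // this(...) chains to another constructor of the same class
    }

    Car(String model) {
        super(4);               // super(...) initializes the base class part first
        this.model = model;
    }
}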

Q225. What are the phases of the JSP life cycle ?show Answer

Ans. Translation of JSP Page


Compilation of JSP Page
Classloading (class file is loaded by the classloader)
Instantiation (Object of the Generated Servlet is created).
Initialization ( jspInit() method is invoked by the container).
Request processing ( _jspService() method is invoked by the container).
Destroy ( jspDestroy() method is invoked by the container).
Q226. Difference between the jsp scriptlet tag and jsp declaration tag?show Answer
Ans. The jsp scriptlet tag can only declare variables not methods whereas jsp declaration tag can
declare
variables as well as methods.
The declaration of scriptlet tag is placed inside the _jspService() method whereas The declaration of
jsp declaration tag is placed outside the _jspService() method.
Q227. What are JSP directives ? What are different types of directives ?show Answer
Ans. The jsp directives are messages that tell the web container how to translate a JSP page into the
corresponding servlet.
There are three types of directives -
page directive
include directive
taglib directive

Q228. What is Java bytecode ?show Answer


Ans. Java bytecode is the usual name for the machine language of the Java Virtual
Machine. Java programs are compiled into Java bytecode, which can then be executed
by the JVM.

Q229. What is a Listener ?show Answer


Ans. In GUI programming, an object that can be registered to be notified when events of
some given type occur. The object is said to listen for the events.
Q230. What is MVC ? show Answer

Ans. The Model/View/Controller pattern, a strategy for dividing responsibility in a GUI component. The
model is the data for the component. The view is the visual presentation of the component on the
screen. The controller is responsible for reacting to events by changing the model. According to the
MVC pattern, these responsibilities should be handled by different objects.
Q231. What is race condition ?show Answer
Ans. A source of possible errors in parallel programming, where one thread can cause an error in
another thread by changing some aspect of the state of the program that the second thread is
depending on (such as the value of variable).
Q232. What is unicode ?show Answer
Ans. A way of encoding characters as binary numbers. The Unicode character set includes
characters used in many languages, not just English. Unicode is the character set that is
used internally by Java.
Q233. What is ThreadFactory ?show Answer
Ans. ThreadFactory is an interface that is meant for creating threads instead of explicitly creating
threads by calling new Thread(). Its an object that creates new threads on demand. Using thread
factories removes hardwiring of calls to new Thread, enabling applications to use special thread
subclasses, priorities, etc.
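A minimal ThreadFactory sketch used with the standard Executors API; the thread-naming scheme is just an assumption:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadFactoryDemo {
    public static void main(String[] args) {
        ThreadFactory factory = new ThreadFactory() {
            private final AtomicInteger counter = new AtomicInteger();
            public Thread newThread(Runnable r) {
                // one place to set names, priorities, daemon status etc. for every pool thread
                return new Thread(r, "worker-" + counter.incrementAndGet());
            }
        };

        ExecutorService pool = Executors.newFixedThreadPool(2, factory);
        pool.submit(() -> System.out.println(Thread.currentThread().getName()));
        pool.shutdown();
    }
}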
Q234. What is PermGen or Permanent Generation ?show Answer
Ans. The memory pool containing all the reflective data of the java virtual machine itself, such as class
and method objects. With Java VMs that use class data sharing, this generation is divided into readonly and read-write areas. The Permanent generation contains metadata required by the JVM to
describe the classes and methods used in the application. The permanent generation is populated by
the JVM at runtime based on classes in use by the application. In addition, Java SE library classes
and methods may be stored here.
Q235. What is metaspace ?show Answer
Ans. Since Java 8, the Permanent Generation (PermGen) space has been completely removed and is
replaced by a new native-memory space called Metaspace. A consequence of the PermGen removal is that
the PermSize and MaxPermSize JVM arguments are ignored and you will never get a
java.lang.OutOfMemoryError: PermGen space error.
Q236. What is the benefit of inner / nested classes ?show Answer
Ans. You can put related classes together as a single logical group.
Nested classes can access all class members of the enclosing class, which might be useful in certain
cases.
Nested classes are sometimes useful for specific purposes. For example, anonymous inner classes
are useful for writing simpler event-handling code with AWT/Swing.
Q237. Explain Static nested Classes ?show Answer

Ans. The accessibility (public, protected, etc.) of the static nested class is defined by the outer class.
A static nested class is not an inner class, it's a top-level nested class.
The name of the static nested class is expressed with OuterClassName.NestedClassName syntax.
When you define an inner nested class (or interface) inside an interface, the nested class is declared
implicitly public and static.
Static nested classes can be declared abstract or final.
Static nested classes can extend another class or it can be used as a base class.
Static nested classes can have static members.
Static nested classes can access the members of the outer class (only static members, obviously).
The outer class can also access the members (even private members) of the nested class through an
object of nested class. If you dont declare an instance of the nested class, the outer class cannot
access nested class elements directly.
Q238. Explain Inner Classes ?show Answer
Ans. The accessibility (public, protected, etc.) of the inner class is defined by the outer class.
Just like top-level classes, an inner class can extend a class or can implement interfaces. Similarly, an
inner class can be extended by other classes, and an inner interface can be implemented or extended
by other classes or interfaces.
An inner class can be declared final or abstract.
Inner classes can have inner classes, but youll have a hard time reading or understanding such
complex nesting of classes.
Q239. Explain Method Local Inner Classes ?show Answer
Ans. You can create a non-static local class inside a body of code. Interfaces cannot have local
classes, and you cannot create local interfaces.
Local classes are accessible only from the body of the code in which the class is defined. The local
classes are completely inaccessible outside the body of the code in which the class is defined.
You can extend a class or implement interfaces while defining a local class.
A local class can access all the variables available in the body of the code in which it is defined. You
can pass only final variables to a local inner class.
Q240. Explain about anonymous inner classes ?show Answer
Ans. Anonymous classes are defined in the new expression itself, so you cannot create multiple
objects of an anonymous class.

You cannot explicitly extend a class or explicitly implement interfaces when defining an anonymous
class.
An anonymous inner class is always created as part of a statement; don't forget to close the
statement after the class definition with a curly brace. This is a rare case in Java, a curly brace
followed by a semicolon.
Anonymous inner classes have no name, and their type must be either a subclass of the named type
or an implementer of the named interface
Q241. What will happen if class implement two interface having common method?show
Answer
Ans. That would not be a problem, as both interfaces are only specifying a contract that the implementing class
has to follow.
If class C implements interface A and interface B, both of which declare print(), then C needs to provide just one
print() implementation; that single method satisfies the contract of both interfaces.
Q242. What is the advantage of using arrays over variables ?show Answer
Ans. Arrays provide a structure wherein multiple values can be accessed using single reference and
index. This helps in iterating over the values using loops.
Q243. What are the disadvantages of using arrays ?show Answer
Ans. Arrays are of fixed size and have to reserve memory prior to use. Hence if we don't know size in
advance arrays are not recommended to use.
Arrays can store only homogeneous elements.
Arrays store their values in contiguous memory locations, which is not suitable if the content is too large and
needs to be distributed in memory.
There is no underlying data structure for arrays and no ready-made method support; for every
requirement we need to code explicitly.
Q244. Difference between Class#getInstance() and new operator ?show Answer
Ans. Class.forName().newInstance() creates the object dynamically at runtime (the class name can even come from
configuration), but it still invokes the accessible no-argument constructor. With the new operator the class is fixed
at compile time, and we need a matching constructor or the compiler should provide a default constructor.
Q245. Can we create an object if a Class doesn't have any constructor ( not even the default
provided by the compiler ) ?show Answer
Ans. A class always has at least one constructor (the compiler supplies a default no-argument constructor if none
is declared). An object can still be created without our code explicitly invoking a constructor, for example through
deserialization or clone().
Q246. What is a cloneable interface and what all methods does it contain?show Answer
Ans. It does not have any methods because it is a MARKER interface.

Q247. When you will synchronize a piece of your code?show Answer


Ans. When you expect your code will be accessed by different threads and these threads may change
a particular data causing data corruption.
Q248. Are there any global variables in Java, which can be accessed by other part of your
program?show Answer
Ans. No. Global variables are not allowed as they wouldn't fit well with the concept of encapsulation.
Q249. What is an applet? What is the lifecycle of an applet?show Answer
Ans. Applet is a dynamic and interactive program that runs inside a web page displayed by a java
capable browser.
Lifecycle methods of Applet init( ) method - Can be called when an applet is first loaded
start( ) method - Can be called each time an applet is started
paint( ) method - Can be called when the applet is minimized or maximized
stop( ) method - Can be used when the browser moves off the applet's page
destroy( ) method - Can be called when the browser is finished with the applet
Q250. What is meant by controls and what are different types of controls in AWT / SWT?show
Answer
Ans. Controls are components that allow a user to interact with your application and SWT / AWT
supports the following types of controls:
Labels, Push Buttons, Check Boxes, Choice Lists, Lists, Scrollbars, Text Components.
These controls are subclasses of Component.
Q251. What is a stream and what are the types of Streams and classes of the Streams?show
Answer
Ans. A Stream is an abstraction that either produces or consumes information. There are two types of
Streams :
Byte Streams: Provide a convenient means for handling input and output of bytes.
Character Streams: Provide a convenient means for handling input & output of characters.
Byte Streams classes: Are defined by using two abstract classes, namely InputStream and
OutputStream.
Character Streams classes: Are defined by using two abstract classes, namely Reader and Writer.
Q252. What is session tracking and how do you track a user session in servlets?show Answer

Ans. Session tracking is a mechanism that servlets use to maintain state about a series of requests from
the same user across some period of time. The methods used for session tracking are:
User Authentication - occurs when a web server restricts access to some of its resources to only
those clients that log in using a recognized username and password
Hidden form fields - fields are added to an HTML form that are not displayed in the client's browser.
When the form containing the fields is submitted, the fields are sent back to the server
URL rewriting - every URL that the user clicks on is dynamically modified or rewritten to include extra
information. The extra information can be in the form of extra path information, added parameters or
some custom, server-specific URL change.
Cookies - a bit of information that is sent by a web server to a browser and which can later be read
back from that browser.
HttpSession - a server-side object that stores per-user session data and is looked up via a session id sent with
each request.
Q253. What is connection pooling?show Answer
Ans. It's a technique to allow multiple clients to make use of a cached set of shared and reusable
connection objects providing access to a database or other resource.
Q254. Advantage of Collection classes over Arrays ?show Answer
Ans. Collections are re-sizable in nature. We can increase or decrease the size as per requirement.
Collections can hold both homogeneous and heterogeneous data.
Every collection follows some standard data structure.
Collections provide many useful built-in methods for traversing, sorting and searching.
Q255. What are the Disadvantages of using Collection Classes over Arrays ?show Answer
Ans. Collections can only hold objects, It can't hold primitive data types.
Collections have performance overheads as they deal with objects and offer dynamic memory
expansion. This dynamic expansion could be a bigger overhead if the collection class needs
consecutive memory location like Vectors.
Collections don't allow modification while traversing as it may lead to
ConcurrentModificationException.

Q256. Can we call constructor explicitly ?show Answer


Ans. Yes.
Q257. Does a class inherit the constructor of its super class?show Answer
Ans. No.
Q258. What is the difference between float and double?show Answer

Ans. float (32-bit) can represent about 7 significant decimal digits accurately, whereas double (64-bit) can
represent about 15 significant decimal digits accurately.
Q259. What is the difference between >> and >>>?show Answer
Ans. Both bitwise right shift operator ( >> ) and bitwise zero fill right shift operator ( >>> ) are used to
shift the bits towards right. The difference is that >> will protect the sign bit whereas the >>> operator
will not protect the sign bit. It always fills 0 in the sign bit.
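A short example of the difference:

public class ShiftDemo {
    public static void main(String[] args) {
        int n = -8;
        System.out.println(n >> 1);    // -4 : sign bit preserved (arithmetic shift)
        System.out.println(n >>> 1);   // 2147483644 : zero filled into the sign bit
    }
}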
Q260. What is the difference between System.out ,System.err and System.in?show Answer
Ans. System.out and System.err both represent the monitor by default and hence can be used to
send data or results to the monitor. But System.out is used to display normal messages and results
whereas System.err is used to display error messages and System.in represents InputStream object,
which by default represents standard input device, i.e., keyboard.
Q261. Is it possible to compile and run a Java program without writing main( ) method?show
Answer
Ans. In Java 6 and earlier it was possible by placing the code in a static block, which ran when the class was
loaded. From Java 7 onward the JVM checks for the main( ) method before class initialization, so a main( )
method is required.
Q262. What are different ways of object creation in Java ?show Answer
Ans. Using new operator - new xyzClass()
Using factory methods - xyzFactory.getInstance( )
Using newInstance( ) method - Class.forName("xyzClass").newInstance( )
By cloning an already available object - (xyzClass)obj1.clone( )

Q263. What is Generalization and Specialization in terms of casting ?show Answer


Ans. Generalization or UpCasting is a phenomenon where a sub class reference is promoted to a super class type,
and hence becomes more general. Generalization needs widening or up-casting. Specialization or
DownCasting is a phenomenon where a super class is narrowed down to a sub class. Specialization
needs narrowing or down-casting.
Q264. Can we call the garbage collector explicitly ?
show Answer
Ans. Yes, We can call garbage collector of JVM to delete any unused variables and unreferenced
objects from memory using gc( ) method. This gc( ) method appears in both Runtime and System
classes of java.lang package.
Q265. How does volatile affect code optimization by compiler?show Answer
Ans. volatile tells the compiler and the JVM that the variable can be accessed by multiple threads and hence
must not be cached in registers or thread-local caches. As volatile variables are never cached, reads and writes
of them cannot be optimized away or freely reordered by the compiler.
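A small sketch of a typical use, assuming a simple stop flag shared between two threads:

public class VolatileDemo {
    // Without volatile the worker thread might keep using a cached value of 'running' and never stop.
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {
                // busy work
            }
            System.out.println("Worker stopped");
        });
        worker.start();

        Thread.sleep(100);
        running = false;        // this write is guaranteed to become visible to the worker
        worker.join();
    }
}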
Q266. Do you think that Java should have had pointers ?show Answer

Ans. Open ended Questions.


Q267. How would you go about debugging a NullPointerException?show Answer
Ans. Open ended Questions.
Q268. How does Java differ from other programming languages you've worked with?show
Answer
Ans. Open ended Questions.
Q269. Should good code be self-documenting, or is it the responsibility of the developer to
document it?show Answer
Ans. Open ended Questions.
Q270. What are points to consider in terms of access modifier when we are overriding any
method?show Answer
Ans. 1. The overriding method cannot be more restrictive than the overridden method.
Reason: with polymorphism, at runtime the JVM invokes the method of the actual object, not of the
reference type. If a subclass were allowed to tighten the access modifier on the overriding method, then
a call made through a superclass reference could suddenly hit an inaccessible method when the JVM invokes
the true object's version rather than the reference type's version, which would be problematic.
2. In case of subclass and superclass define in different package, we can override only those method
which have public or protected access.
3. We can not override any private method because private methods can not be inherited and if
method can not be inherited then method can not be overridden.
Q271. what is covariant return type? show Answer
Ans. A covariant return type means that the return type of the overriding method can be a subtype of the return
type declared in the superclass method. It was introduced in JDK 1.5.
Q272. How compiler handles the exceptions in overriding ?show Answer
Ans. 1)The overriding methods can throw any runtime Exception , here in the case of runtime
exception overriding method (subclass method) should not worry about exception being thrown by
superclass method.
2)If superclass method does not throw any exception then while overriding, the subclass method can
not throw any new checked exception but it can throw any runtime exception
3) Different exceptions in java follow some hierarchy tree(inheritance). In this case , if superclass
method throws any checked exception , then while overriding the method in subclass we can not

throw any new checked exception or any checked exception which are higher in hierarchy than the
exception thrown in superclass method
Q273. Why is Java considered Portable Language ?show Answer
Ans. Java is a portable-language because without any modification we can use Java byte-code in any
platform(which supports Java). So this byte-code is portable and we can use in any other major
platforms.
Q274. Tell something about history of Java ?show Answer
Ans. Java was initially developed in 1991 by James Gosling at Sun Microsystems. At first it was called
"Oak"; in 1995 it was renamed to "Java". Java was designed from the start to be a platform independent
language. Oracle currently owns Java, having acquired Sun Microsystems in 2010.
Q275. How to find if JVM is 32 or 64 bit from Java program. ?show Answer
Ans. You can find whether the JVM is 32 bit or 64 bit by using System.getProperty() from a Java program, for
example the "sun.arch.data.model" or "os.arch" property on Sun/Oracle JVMs.
Q276. Does every class needs to have one non parameterized constructor ?show Answer
Ans. No. Every Class only needs to have one constructor - With parameters or without parameters.
Compiler provides a default non parameterized constructor if no constructors is defined.
Q277. Difference between throw and throws ?show Answer
Ans. throw is used to explicitly throw an exception especially custom exceptions, whereas throws is
used to declare that the method can throw an exception.
We cannot throw multiple exceptions using throw statement but we can declare that a method can
throw multiple exceptions using throws and comma separator.
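A brief illustration; InvalidAgeException is a made-up custom exception:

class InvalidAgeException extends Exception {
    InvalidAgeException(String message) { super(message); }
}

public class ThrowDemo {
    // 'throws' declares what this method may propagate (multiple types, comma separated)
    static void register(int age) throws InvalidAgeException, IllegalStateException {
        if (age < 18) {
            throw new InvalidAgeException("Must be 18 or older");   // 'throw' raises a single exception
        }
    }

    public static void main(String[] args) {
        try {
            register(15);
        } catch (InvalidAgeException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}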
Q278. Can we use "this" within static method ? Why ?show Answer
Ans. No. Even though "this" could mean a reference to the current object if the method were called using an
object reference, it would be ambiguous when the same static method is called using the class name, since
static methods are not tied to any instance.
Q279. Similarity and Difference between static block and static method ?show Answer
Ans. Both belong to the class as a whole and not to the individual objects. Static methods are
explicitly called for execution whereas Static block gets executed when the Class gets loaded by the
JVM.
Q280. What are the platforms supported by Java Programming Language?show Answer
Ans. Java runs on a variety of platforms, such as Windows, Mac OS, and the various versions of
UNIX/Linux like HP-Unix, Sun Solaris, Redhat Linux, Ubuntu, CentOS, etc

Q281. How Java provide high Performance ?show Answer

Ans. Java uses Just-In-Time compiler to enable high performance. Just-In-Time compiler is a program
that turns Java bytecode into instructions that can be sent directly to the processor.
Q282. What is IDE ? List few Java IDE ?show Answer
Ans. IDE stands of Integrated Development Environment. Few Java IDE's are WSAD ( Websphhere
Application Developer ) , RAD ( Rational Application Developer ) , Eclipse and Netbeans.
Q283. What is an Object ?show Answer
Ans. Object is a run time entity whose state is stored in fields and behavior is shown via methods.
Methods operate on an object's internal state and serve as the primary mechanism for object-to-object
communication.
Q284. What is a Class ?show Answer
Ans. A class is a blue print or Mold using which individual objects are created. A class can contain
fields and methods to describe the behavior of an object.
Q285. According to Java Operator precedence, which operator is considered to be with
highest precedence?show Answer
Ans. Postfix operators, i.e. () [] . , are at the highest precedence.

Q286. What data type Variable can be used in a switch statement ?show Answer
Ans. Variables used in a switch statement can be a byte, short, int or char, their wrapper classes, an enum, or
(since Java 7) a String.

Q287. What are the sub classes of Exception class?show Answer


Ans. The Exception class has two main subclasses : IOException class and RuntimeException Class.

Q288. How finally used under Exception Handling?show Answer


Ans. The finally keyword is used to create a block of code that follows a try block. A finally block of
code always executes, whether or not an exception has occurred.

Q289. What things should be kept in mind while creating your own exceptions in Java?show
Answer
Ans. All exceptions must be a child of Throwable.
If you want to write a checked exception that is automatically enforced by the Handle or Declare Rule,
you need to extend the Exception class.
You want to write a runtime exception, you need to extend the RuntimeException class.

Q290. What is Comparable Interface?show Answer


Ans. It is used to sort collections and arrays of objects using Collections.sort() and Arrays.sort() respectively. The
objects of a class implementing the Comparable interface can be ordered by their natural ordering.

Q291. Explain Set Interface?show Answer


Ans. It is a collection of element which cannot contain duplicate elements. The Set interface contains
only methods inherited from Collection and adds the restriction that duplicate elements are prohibited.

Q292. What is the difference between the Reader/Writer class hierarchy and the
InputStream/OutputStream class hierarchy?show Answer
Ans. The Reader/Writer class hierarchy is character-oriented, and the InputStream/OutputStream
class hierarchy is byte-oriented

Q293. What are use cases?show Answer


Ans. It is part of the analysis of a program and describes a situation that a program might encounter
and what behavior the program should exhibit in that circumstance.

Q294. Which Java operator is right associative?show Answer


Ans. The = operator is right associative.

Q295. What is the difference between a break statement and a continue statement?show
Answer
Ans. Break statement results in the termination of the statement to which it applies (switch, for, do, or
while). A continue statement is used to end the current loop iteration and return control to the loop
statement.

Q296. What is the purpose of the System class?show Answer


Ans. The purpose of the System class is to provide access to system resources.

Q297. Variable of the boolean type is automatically initialized as?show Answer


Ans. The default value of the boolean type is false.

Q298. Can try statements be nested?show Answer


Ans. Yes

Q299. What will happen if static modifier is removed from the signature of the main method?
show Answer
Ans. Program throws "NoSuchMethodError" error at runtime .

Q300. What is the Locale class?show Answer


Ans. The Locale class is used to tailor program output to the conventions of a particular geographic,
political, or cultural region

Q301. Define Network Programming?show Answer


Ans. It refers to writing programs that execute across multiple devices (computers), in which the
devices are all connected to each other using a network.

Q302. What are the advantages and Disadvantages of Sockets ?show Answer
Ans. Sockets are flexible and sufficient. Efficient socket-based programming can be easily
implemented for general communications. They cause low network traffic.
Socket based communications allows only to send packets of raw data between applications. Both the
client-side and server-side have to provide mechanisms to make the data useful in any way.

Q303. What environment variables do I need to set on my machine in order to be able to run
Java programs?show Answer
Ans. CLASSPATH and PATH are the two variables.

Q304. What is Externalizable interface?show Answer


Ans. Externalizable is an interface which contains two methods readExternal and writeExternal. These
methods give you a control over the serialization mechanism.
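A minimal sketch of a class controlling its own serialization via Externalizable; the field names are illustrative:

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class User implements Externalizable {
    private String name;
    private String password;            // deliberately never written out

    public User() { }                   // Externalizable requires a public no-arg constructor

    public User(String name, String password) {
        this.name = name;
        this.password = password;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(name);             // we decide exactly what gets serialized
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        this.name = in.readUTF();
    }
}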

Q305. What is the difference between the size and capacity of a Vector?show Answer
Ans. The size is the number of elements actually stored in the vector, while capacity is the maximum
number of elements it can store at a given instance of time.

Q306. What is an enum or enumeration?show Answer


Ans. java.util.Enumeration is a legacy interface containing methods for accessing the underlying data structure
from which the enumeration is obtained; it allows sequential access to all the elements stored in the
collection. An enum (since Java 5), by contrast, is a special data type whose values are a fixed set of constants.

Q307. What is constructor chaining and how is it achieved in Java?


show Answer
Ans. A child object constructor always first needs to construct its parent. In Java this is done via an
implicit call to super(), the no-args superclass constructor, as the first statement of the constructor; constructors
of the same class can also be chained explicitly using this(...).
Q308. What is the best practice configuration usage for files - pom.xml or settings.xml ?show
Answer
Ans. The best practice guideline between settings.xml and pom.xml is that configurations in
settings.xml must be specific to the current user and that pom.xml configurations are specific to the
project.
Q309. Why Java provides default constructor ?show Answer
Ans. At the beginning of an object's life, the Java virtual machine (JVM) allocates memory on the heap
to accommodate the object's instance variables. When that memory is first allocated, however, the
data it contains is unpredictable. If the memory were used as is, the behavior of the object would also
be unpredictable. To guard against such a scenario, Java makes certain that memory is initialized, at
least to predictable default values before it is used by any code.
Q310. In a case where there are no instance variables what does the default constructor
initialize?show Answer
Ans. Java expects the superclass ( Object Class ) constructor to be called while creation of any
object. So super constructor is called in case there are no instance variables to initialize.
Q311. How can I change the default location of the generated jar when I command "mvn
package"?show Answer
Ans. By default, the location of the generated jar is in ${project.build.directory} or in your target
directory. We can change this by configuring the outputDirectory of maven-jar-plugin.
Q312. What is Maven's order of inheritance?show Answer
Ans. 1. parent pom
2. project pom
3. settings
4. CLI parameters

Q313. What is a Mojo?show Answer


Ans. A mojo is a Maven plain Old Java Object. Each mojo is an executable goal in Maven, and a
plugin is a distribution of one or more related mojos.
Q314. How do I determine which POM contains missing transitive dependency?show Answer
Ans. run mvn -X
Q315. Difference between Encapsulation and Data Hiding ?show Answer

Ans. Data Hiding is a broader concept. Encapsulation is an OOP-centric concept which is one way of achieving
data hiding in OOP.
Q316. Difference between Abstraction and Implementation hiding ?show Answer
Ans. Implementation Hiding is a broader concept. Abstraction is a way of implementation hiding in
OOP's.
Q317. What are the features of encapsulation ?show Answer
Ans. Combine the data of our application and its manipulation at one place.
Encapsulation Allow the state of an object to be accessed and modified through behaviors.
Reduce the coupling of modules and increase the cohesion inside them.

Q318. What are the examples of Abstraction in Java ?show Answer


Ans. function calling - hides implementation details
wrapper classes
new operator - Creates object in memory, calls constructor
Q319. What are different ways to create String Object? Explain.show Answer
Ans. String str = new String("abc");
String str1 = "abc";
When we create a String using double quotes, JVM looks in the String pool to find if any other String
is stored with same value. If found, it just returns the reference to that String object else it creates a
new String object with given value and stores it in the String pool.
When we use the new operator, the JVM creates the String object but doesn't store it in the String pool. We
can use the intern() method to store the String object in the String pool, or to return the reference if a String
with an equal value is already present in the pool.
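For example:

public class StringPoolDemo {
    public static void main(String[] args) {
        String a = "abc";                     // goes into the String pool
        String b = "abc";                     // reuses the pooled object
        String c = new String("abc");         // new object on the heap, outside the pool

        System.out.println(a == b);           // true  - same pooled reference
        System.out.println(a == c);           // false - different objects
        System.out.println(a == c.intern());  // true  - intern() returns the pooled reference
    }
}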

Q320. Write a method to check if input String is Palindrome?


show Answer
Ans. private static boolean isPalindrome(String str) {
if (str == null)
return false;
StringBuilder strBuilder = new StringBuilder(str);
strBuilder.reverse();
return strBuilder.toString().equals(str);
}
Q321. Write a method that will remove given character from the String?show Answer

Ans. private static String removeChar(String str, char c) {


if (str == null)
return null;
return str.replaceAll(Character.toString(c), "");
}

Q322. Which String class methods are used to make string upper case or lower case?show
Answer
Ans. toUpperCase and toLowerCase
Q323. How to convert String to byte array and vice versa?show Answer
Ans. We can use String getBytes() method to convert String to byte array and we can use String
constructor new String(byte[] arr) to convert byte array to String.
Q324. Why Char array is preferred over String for storing password?
show Answer
Ans. String is immutable in Java and stored in the String pool. Once created it stays in the pool until it is
garbage collected, so even though we are done with the password it remains available in memory for a longer
duration and there is no way to clear it. It is a security risk because anyone having access to a memory dump
can find the password as clear text, whereas a char array can be explicitly wiped after use.
Q325. Why String is popular HashMap key in Java?show Answer
Ans. Since String is immutable, its hashcode is cached at the time of creation and it doesn't need to
be calculated again. This makes it a great candidate for a key in a Map, and its processing is faster than that of
other HashMap key objects. This is why String is the most commonly used object for HashMap keys.

Q326. What are the getter and setter methods ?show Answer
Ans. Getter and setter methods are used to retrieve and manipulate the private variables in Java beans. A
getter, as its name suggests, retrieves the attribute of the same name. A setter allows you to set the value of
the attribute.
Q327. public class a {
public static void main(String args[]){
final String s1="job";
final String s2="seeker";
String s3=s1.concat(s2);
String s4="jobseeker";
System.out.println(s3==s4); // Output 1
System.out.println(s3.hashCode()==s4.hashCode()); Output 2
}
}
What will be the Output 1 and Output 2 ?show Answer

Ans. s3 and s4 point to different memory locations (concat() builds a new String object at runtime, it is not
folded into the pool), hence Output 1 will be false.
The hash code is calculated from the string's characters and length. As both strings contain the same character
sequence, their hashcodes are the same, so Output 2 will be true.
Q328. What is the use of HashCode in objects ?show Answer
Ans. Hashcode is used for bucketing in Hash implementations like HashMap, HashTable, HashSet
etc.
Q329. Difference between Compositions and Inheritance ?show Answer
Ans. Inheritance means an object inherits the reusable properties and behavior of the base class. Composition
means that an object holds references to other objects.
In Inheritance there is only one object in memory ( derived object ) whereas in Composition , parent
object holds references of all composed objects.
From Design perspective - Inheritance is "is a" relationship among objects whereas Composition is
"has a" relationship among objects.
Q330. Will finally be called always if all code has been kept in try block ?show Answer
Ans. The only time finally won't be called is if you call System.exit() or if the JVM crashes first.
Q331. Will the static block be executed in the following code ? Why ?
class Test
{
static
{
System.out.println("Why I am not executing ");
}
public static final int param=20;
}
public class Demo
{
public static void main(String[] args)
{
System.out.println(Test.param);
}
}show Answer
Ans. No the static block won't get executed as the referenced variable in the Test class is final.
Compiler replaces the content of the final variable within Demo.main method and hence actually no
reference to Test class is made.
Q332. Will static block for Test Class execute in the following code ?

class Test
{
static
{
System.out.println("Executing Static Block.");
}
public final int param=20;
public int getParam(){
return param;
}
}
public class Demo
{
public static void main(String[] args)
{
System.out.println(new Test().param);
}
}show Answer
Ans. Yes.
Q333. What does String intern() method do?show Answer
Ans. intern() method keeps the string in an internal cache that is usually not garbage collected.
Q334. Will the following program display "Buggy Bread" ?
class Test{
static void display(){
System.out.println("Buggy Bread");
}
}
class Demo{
public static void main(String... args){
Test t = null;
t.display();
}
}show Answer
Ans. Yes. A static method is not dispatched on the instance; the call is resolved at compile time against the
reference type, so calling it through a null reference still works. You can call it either by the class name or
through a reference.
Q335. How substring() method of String class create memory leaks?show Answer
Ans. In older JVMs (up to Java 6) the substring method would build a new String object keeping a reference to
the whole char array, to avoid copying it. Hence you could inadvertently keep a reference to a very big character
array with just a one-character string. Since Java 7 update 6, substring() copies the required characters into a
new array, so this leak no longer occurs.

Q336. Write a program to reverse a string iteratively and recursively ?show Answer
Ans. Using StringBuffer - new StringBuffer(str).reverse().toString();

Iterative -
public static String getReverseString(String str){
    StringBuffer strBuffer = new StringBuffer(str.length());
    for(int counter = str.length() - 1; counter >= 0; counter--){
        strBuffer.append(str.charAt(counter));
    }
    return strBuffer.toString();
}

Recursive -
public static String getReverseString(String str){
    if(str.length() <= 1){
        return str;
    }
    return getReverseString(str.substring(1)) + str.charAt(0);
}
Q337. If you have access to a function that returns a random integer from one to five, write
another function which returns a random integer from one to seven.show Answer
Ans. We can do that by building a 3-bit binary number from unbiased random bits and rejecting values outside
1..7. A sketch (here rand5() stands for the given function that returns a uniform random integer from 1 to 5):

// derive an unbiased bit from rand5(): reject 5, then map 1-2 to 0 and 3-4 to 1
static int randomBit() {
    int r;
    do {
        r = rand5();
    } while (r == 5);
    return (r <= 2) ? 0 : 1;
}

static int getRandom7() {
    int value;
    do {
        // three bits give a uniform value in 0..7; reject 0 to get 1..7
        value = (randomBit() << 2) | (randomBit() << 1) | randomBit();
    } while (value == 0);
    return value;
}

Q338. Write a method to convert binary to a number ?show Answer


Ans. static int convert(int binaryInt) {
    int sumValue = 0;
    int multiple = 1;
    while (binaryInt > 0) {
        int binaryDigit = binaryInt % 10;
        binaryInt = binaryInt / 10;
        sumValue = sumValue + (binaryDigit * multiple);
        multiple = multiple * 2;
    }
    return sumValue;
}
Q339. What will the following code print ?
String s1 = "Buggy Bread";
String s2 = "Buggy Bread";
if(s1 == s2)
System.out.println("equal 1");
String n1 = new String("Buggy Bread");
String n2 = new String("Buggy Bread");
if(n1 == n2)
System.out.println("equal 2"); show Answer
Ans. equal 1
Q340. Difference between new operator and Class.forName().newInstance() ?show Answer
Ans. new operator is used to statically create an instance of object. newInstance() is used to create an
object dynamically ( like if the class name needs to be picked from configuration file ). If you know
what class needs to be initialized , new is the optimized way of instantiating Class.
Q341. What is Java bytecode ?show Answer
Ans. Java bytecode is the instruction set of the Java virtual machine. Each instruction consists of a
one-byte opcode, along with zero or more bytes for passing parameters.
Q342. How to find whether a given integer is odd or even without use of modules operator in
java?show Answer
Ans. public static void main(String ar[])
{
    int n = 5;
    if ((n / 2) * 2 == n)
    {
        System.out.println("Even Number");
    }
    else
    {
        System.out.println("Odd Number");
    }
}

Q343. Is JVM a overhead ? show Answer


Ans. Yes and No. JVM is an extra layer that translates Byte Code into Machine Code. So Comparing
to languages like C, Java provides an additional layer of translating the Source Code.
C++ Compiler - Source Code --> Machine Code
Java Compiler - Source Code --> Byte Code , JVM - Byte Code --> Machine Code
Though it looks like an overhead, this additional translation allows Java to run apps on all
platforms, as the JVM provides the translation to machine code for the underlying operating
system.

Q344. Can we use Ordered Set for performing Binary Search ?show Answer
Ans. We need to access values on the basis of an index in Binary search which is not possible with
Sets.
Q345. What is Byte Code ? Why Java's intermediary Code is called Byte Code ?show Answer
Ans. Bytecode is a highly optimized set of instructions designed to be executed by the Java run-time
system. It's called bytecode because each opcode is one byte in size.
Sample instructions in bytecode -
1: istore_1
2: iload_1
3: sipush 1000
6: if_icmpge 44
9: iconst_2
10: istore_2
Q346. Difference between ArrayList and LinkedList ?show Answer
Ans. LinkedList and ArrayList are two different implementations of the List interface. LinkedList
implements it with a doubly-linked list. ArrayList implements it with a dynamically resizing array.
Q347. If you are given a choice to use either ArrayList and LinkedList, Which one would you
use and Why ?show Answer
Ans. ArrayList are implemented in memory as arrays and hence allows fast retrieval through indices
but are costly if new elements are to be inserted in between other elements.
LinkedList allows for constant-time insertions or removals using iterators, but only sequential access
of elements
1. Retrieval - If Elements are to be retrieved sequentially only, Linked List is preferred.
2. Insertion - If new elements are to be inserted in between other elements, LinkedList is preferred, since an
ArrayList would have to shift all subsequent elements.
3. Search - Binary Search and other optimized way of searching is not possible on Linked List.

4. Sorting - Initial sorting could be a pain, but later addition of elements into an already sorted list works well
with a linked list.
5. Adding elements - If a sufficiently large number of elements needs to be added very frequently, LinkedList is
preferable as its elements don't need consecutive memory locations.

Q348. What are the pre-requisite for the collection to perform Binary Search ?show Answer
Ans. 1. Collection should have an index for random access.
2. Collection should have ordered elements.
Q349. Can you provide some implementation of a Dictionary having large number of
words ? show Answer
Ans. Simplest implementation we can have is a List wherein we can place ordered words and hence
can perform Binary Search.
Other implementation with better search performance is to use HashMap with key as first character of
the word and value as a LinkedList.
Further level up, we can have linked Hashmaps like ,
hashmap {
a ( key ) -> hashmap (key-aa , value (hashmap(key-aaa,value)
b ( key ) -> hashmap (key-ba , value (hashmap(key-baa,value)
....................................................................................
z( key ) -> hashmap (key-za , value (hashmap(key-zaa,value)
}
up to n levels (where n is the average length of a word in the dictionary).

Q350. Difference between PATH and CLASSPATH ?show Answer


Ans. PATH is the variable that holds the directories for the OS to look for executables. CLASSPATH is
the variable that holds the directories for JVM to look for .class files ( Byte Code ).
Q351. Name few access and non access Class Modifiers ?show Answer
Ans. private , public and protected are access modifiers.
final and abstract are non access modifiers.
Q352. Which Java collection class can be used to maintain the entries in the order in which
they were last accessed?show Answer
Ans. LinkedHashMap
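A short sketch using the access-order constructor of LinkedHashMap:

import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {
    public static void main(String[] args) {
        // the third argument 'true' orders entries by most recent access instead of insertion
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        map.get("a");                      // accessing "a" moves it to the end
        System.out.println(map.keySet());  // [b, c, a]
    }
}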
Q353. Is it legal to initialize List like this ?

LinkedList<int> l=new LinkedList<int>(); show Answer


Ans. No, Generic parameters cannot be primitives.
Q354. Which of the following syntax is correct ?
import static java.lang.System.*;
or
static import java.lang.System.*;show Answer
Ans. import static java.lang.System.*;
Q355. What will be the output of following code ?
public static void main(String[] args)
{
int x = 10;
int y;
if (x < 100) y = x / 0;
if (x >= 100) y = x * 0;
System.out.println("The value of y is: " + y);
}show Answer
Ans. The code will not compile raising an error that the local variable y might not have been initialized.
Unlike member variables, local variables are not automatically initialized to the default values for their
declared type.
Q356. What will be the output of following Code ?
class BuggyBread {
public static void main(String[] args)
{
String s2 = "I am unique!";
String s5 = "I am unique!";
System.out.println(s2 == s5);
}
}show Answer
Ans. true, due to String Pool, both will point to a same String object.
Q357. What will be the output of following code ?
class BuggyBread2 {
private static int counter = 0;
void BuggyBread2() {

counter = 5;
}
BuggyBread2(int x){
counter = x;
}
public static void main(String[] args) {
BuggyBread2 bg = new BuggyBread2();
System.out.println(counter);
}
}show Answer
Ans. Compile time error as it won't find the constructor matching BuggyBread2(). Compiler won't
provide default no argument constructor as programmer has already defined one constructor.
Compiler will treat user defined BuggyBread2() as a method, as return type ( void ) has been specified
for that.
Q358. What will be the output of following code ?
class BuggyBread1 {
public String method() {
return "Base Class - BuggyBread1";
}
}
class BuggyBread2 extends BuggyBread1{
private static int counter = 0;
public String method(int x) {
return "Derived Class - BuggyBread2";
}
public static void main(String[] args) {
BuggyBread1 bg = new BuggyBread2();
System.out.println(bg.method());
}
}show Answer
Ans. Base Class - BuggyBread1
Though the base class reference holds an object of the derived class, there is no overriding here: the derived
class method takes an argument, so the derived class ends up with both method() and method(int), which is
overloading. The inherited method() is therefore the one invoked.
Q359. What are RESTful Web Services ?show Answer
Ans. REST or Representational State Transfer is a flexible architecture style for creating web services
that recommends the following guidelines -

1. http for client server communication,


2. XML / JSON as the formatting language,
3. Simple URI as address for the services and,
4. stateless communication.

Q360. Which markup languages can be used in restful web services ? show Answer
Ans. XML and JSON ( Javascript Object Notation ).
Q361. Difference between Inner and Outer Join ?show Answer
Ans. Inner join returns only the matching rows of two tables on particular columns (the intersection), whereas
Outer Join also keeps the non-matching rows from one or both tables (closer to a union, with nulls for the
missing values).

Q362. What is a Cursor ?show Answer


Ans. It's a facility that allows traversal over the records pulled from a table or combination of tables. Its
like iterator in Java.
Q363. What is database deadlock ? How can we avoid them?show Answer
Ans. When multiple external resources trying to access the DB locks run into a cyclic wait, it
may make the DB unresponsive.
Deadlock can be avoided using a variety of measures, a few listed below -
We can make a queue wherein we can verify and order the requests to the DB.
Less use of cursors, as they lock the tables for a long time.
Keeping the transactions smaller.

Q364. What are temp tables ?show Answer


Ans. These are the tables that are created temporarily and are deleted once the Stored Procedure is
complete.
For example - we may like to pull some info from a table and then do some operations on that data
and then store the output in final output table. We can store the intermediary values in a temp table
and once we have final output with us, we can just delete it.
Q365. Why Web services use HTTP as the communication protocol ?show Answer

Ans. With the advent of Internet, HTTP is the most preferred way of communication. Most of the
clients ( web thin client , web thick clients , mobile apps ) are designed to communicate using http
only. Web Services using http makes them accessible from vast variety of client applications.
Q366. what will be the output of this code ?
public static void main(String[] args)
{
StringBuffer s1=new StringBuffer("Buggy");
test(s1);
System.out.println(s1);
}
private static void test(StringBuffer s){
s.append("Bread");
}show Answer
Ans. BuggyBread
Q367. what will be the output of this code ?
public static void main(String[] args)
{
String s1=new String("Buggy");
test(s1);
System.out.println(s1);
}
private static void test(StringBuffer s){
s.append("Bread");
}show Answer
Ans. Buggy
Q368. what will be the output of this code ?
public static void main(String[] args)
{
StringBuffer s1=new StringBuffer("Buggy");
test(s1);
System.out.println(s1);
}

private static void test(StringBuffer s){


s=new StringBuffer("Bread");
}show Answer
Ans. Buggy
Q369. what will be the output ?
class Animal {
public void eat() throws Exception {
}
}
class Dog2 extends Animal {
public void eat(){}
public static void main(){
Animal an = new Dog2();
an.eat();
}
}show Answer
Ans. Compile Time Error: Unhandled exception type Exception
Q370. What are advantages of using Servlets over CGI ?show Answer
Ans. Better Performance as Servlets doesn't require a separate process for a single request.
Servlets are platform independent as they are written in Java.
Q372. Does SQL allow null values ? Can we use it within Where clause ?show Answer
Ans. Yes, we can have null values for columns in SQL. A null value represents that the column's value is
unknown or hasn't been filled. We can use it within a where clause to get the rows with null values, but null
must be tested with IS NULL / IS NOT NULL rather than the = operator.
Q373. Can we add duplicate keys in a HashMap ? What will happen if we attempt to add
duplicate values ?show Answer
Ans. No, We cannot have duplicate keys in HashMap. If we attempt to do so , the previous value for
the key is overwritten.
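For example:

import java.util.HashMap;
import java.util.Map;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("id", "first");
        String previous = map.put("id", "second");   // put returns the value being replaced

        System.out.println(previous);                // first
        System.out.println(map.get("id"));           // second - the old value was overwritten
        System.out.println(map.size());              // 1
    }
}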
Q374. What is the use of HTTPSession in relation to http protocol ?show Answer
Ans. The http protocol on its own is stateless. HttpSession helps in identifying the relationship between multiple
stateless requests that come from the same client.

Q375. Why using cookie to store session info is a better idea than just using session info in
the request ?show Answer
Ans. Session info embedded in the request URL can easily be seen, logged and intercepted, and hence is a
vulnerability. A cookie can be read and written by the respective domain only, which helps make sure that the
right session information is being passed by the client.
Q376. What are different types of cookies ?show Answer
Ans. Session cookies , which are deleted once the session is over.
Permanent cookies , which stays at client PC even if the session is disconnected.
Q377. http protocol is by default ... ?show Answer
Ans. stateless
Q378. Can finally block throw an exception ?show Answer
Ans. Yes.
Q379. Can we have try and catch blocks within finally ?show Answer
Ans. Yes
Q380. Which of the following is a canonical path ?
1. C:\directory\..\directory\file.txt
2. C:\directory\subDirectory1\directory\file.txt
3. \directory\file.txtshow Answer
Ans. 2nd
Q381. What will the following code print when executed on Windows ?
public static void main(String[] args){
String parent = null;
File file = new File("/file.txt");
System.out.println(file.getPath());
System.out.println(file.getAbsolutePath());
try {
System.out.println(file.getCanonicalPath());
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}show Answer
Ans. \file.txt
C:\file.txt
C:\file.txt

Q382. What will be the output of following code ?


public static void main(String[] args){
String name = null;
File file = new File("/folder", name);
System.out.print(file.exists());
}show Answer
Ans. NullPointerException at line:
"File file = new File("/folder", name);"
Q383. What will be the output of following code ?
public static void main(String[] args){
String parent = null;
File file = new File(parent, "myfile.txt");
try {
file.createNewFile();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}show Answer
Ans. It will create the file myfile.txt in the current directory.
Q384. What will be the output of following code ?
public static void main(String[] args){
String child = null;
File file = new File("/folder", child);
try {
file.createNewFile();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}show Answer
Ans. NullPointerException at line:
File file = new File("/folder", child);
Q385. What will be the output of following code, assuming that currently we are in c:\Project ?
public static void main(String[] args){
String child = null;
File file = new File("../file.txt");
System.out.println(file.getPath());
System.out.println(file.getAbsolutePath());

try {
System.out.println(file.getCanonicalPath());
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}show Answer
Ans. ..\file.txt
C:\Project\..\file.txt
C:\file.txt
Q386. Which is the abstract parent class of FileWriter ?show Answer
Ans. OutputStreamWriter
Q387. Which class is used to read streams of characters from a file?show Answer
Ans. FileReader
Q388. Which class is used to read streams of raw bytes from a file?show Answer
Ans. FileInputStream
Q389. Which is the Parent class of FileInputStream ?show Answer
Ans. InputStream
Q390. Which of the following code is correct ?
a.

FileWriter fileWriter = new FileWriter("../file.txt");


File file = new File(fileWriter );
BufferedWriter bufferedOutputWriter = new BufferedWriter(fileWriter);
b.
BufferedWriter bufferedOutputWriter = new BufferedWriter("../file.txt");
File file = new File(bufferedOutputWriter );
FileWriter fileWriter = new FileWriter(file);

c.
File file = new File("../file.txt");
FileWriter fileWriter = new FileWriter(file);
BufferedWriter bufferedOutputWriter = new BufferedWriter(fileWriter);
d.

File file = new File("../file.txt");


BufferedWriter bufferedOutputWriter = new BufferedWriter(file);
FileWriter fileWriter = new FileWriter(bufferedOutputWriter );show Answer
Ans. c.
File file = new File("../file.txt");
FileWriter fileWriter = new FileWriter(file);
BufferedWriter bufferedOutputWriter = new BufferedWriter(fileWriter);
Q391. Which exception should be handled in the following code ?
File file = new File("../file.txt");
FileWriter fileWriter = new FileWriter(file);show Answer
Ans. IOException
Q392. Which exceptions should be handled with the following code ?
FileOutputStream fileOutputStream = new FileOutputStream(new File("newFile.txt"));show
Answer
Ans. FileNotFoundException
Q393. Will this code compile fine ?
ObjectOutputStream objectOutputStream = new ObjectOutputStream(new
FileOutputStream(new File("newFile.txt")));show Answer
Ans. Yes, provided the IOException thrown by the stream constructors is caught or declared by the enclosing method.
Q394. What is the problem with this code ?
class BuggyBread1 {
private BuggyBread2 buggybread2;
public static void main(String[] args){
try {
BuggyBread1 buggybread1 = new BuggyBread1();
ObjectOutputStream objectOutputStream = new ObjectOutputStream(new
FileOutputStream(new File("newFile.txt")));
objectOutputStream.writeObject(buggybread1);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}show Answer

Ans. We are trying to serialize a BuggyBread1 object, but the class doesn't declare that it
implements Serializable.
This will throw java.io.NotSerializableException upon execution.
Q395. Will this code run fine if BuggyBread2 doesn't implement Serializable interface ?
class BuggyBread1 implements Serializable{
private BuggyBread2 buggybread2 = new BuggyBread2();
public static void main(String[] args){
try {
BuggyBread1 buggybread1 = new BuggyBread1();
ObjectOutputStream objectOutputStream = new ObjectOutputStream(new
FileOutputStream(new File("newFile.txt")));
objectOutputStream.writeObject(buggybread1);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}show Answer
Ans. No, It will throw java.io.NotSerializableException.
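A minimal sketch of one possible fix, assuming we don't need buggybread2 to be persisted: marking the field transient excludes it from serialization, so BuggyBread2 no longer has to implement Serializable.
import java.io.File;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class BuggyBread2 {}

class BuggyBread1 implements Serializable {
    // transient fields are skipped by serialization
    private transient BuggyBread2 buggybread2 = new BuggyBread2();

    public static void main(String[] args) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new FileOutputStream(new File("newFile.txt")))) {
            out.writeObject(new BuggyBread1());   // no NotSerializableException now
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}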
Q396. Will this code work fine if BuggyBread2 doesn't implement Serializable ?
class BuggyBread1 extends BuggyBread2 implements Serializable{
private int x = 5;
public static void main(String[] args){
try {
BuggyBread1 buggybread1 = new BuggyBread1();
ObjectOutputStream objectOutputStream = new ObjectOutputStream(new
FileOutputStream(new File("newFile.txt")));
objectOutputStream.writeObject(buggybread1);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}show Answer
Ans. Yes. Only the state declared in the Serializable subclass ( x ) is written; the non-serializable superclass state is skipped. ( Deserializing it later would additionally require BuggyBread2 to have an accessible no-arg constructor. )
Q397. Can we compose the Parent Class object like this ?

class BuggyBread1 extends BuggyBread2 {


private BuggyBread2 buggybread2;
public static void main(String[] args){
buggybread2 = new BuggyBread2();
}
}show Answer
Ans. Yes, a class can hold a reference to its parent class. Note, though, that the code shown won't compile as-is because the instance field buggybread2 is assigned from the static main method.
Q398. Will this code Work ? If not , Why ?
java.util.Calendar c = new java.util.Calendar();show Answer
Ans. No. It gives the error "Cannot Instantiate the type Calendar". Calendar is an abstract class and
hence Calendar object should be instantiated using Calendar.getInstance().
Q399. Is java.util.Date an abstract Class ? Is java.util.Calendar an abstract Class ?show Answer
Ans. Date is not an abstract class whereas Calendar is.
Q400. What will the following code print ?
java.util.Calendar c = java.util.Calendar.getInstance();
c.add(Calendar.MONTH, 5);
System.out.println(c.getTime());show Answer
Ans. Date and Time after 5 months from now.
Q401. Which of the following code is correct ?
a. DateFormat df = DateFormat.getInstance();
b. DateFormat df = DateFormat.getDateInstance();
c. DateFormat df = DateFormat.getInstance(DateFormat.FULL);
d. DateFormat df = DateFormat.getDateInstance(DateFormat.FULL);show Answer
Ans. All except c are correct.
Q402. What is the use of parse method in DateFormat ?show Answer
Ans. It parses a String and returns the corresponding Date object.
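A small illustrative sketch ( the date string is an assumption that matches the DateFormat.LONG pattern for Locale.US ):
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;

public class ParseDemo {
    public static void main(String[] args) throws ParseException {
        DateFormat df = DateFormat.getDateInstance(DateFormat.LONG, Locale.US);
        Date date = df.parse("January 26, 2015");   // parse: String -> Date
        System.out.println(df.format(date));        // format: Date -> String
    }
}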
Q403. Which of the following is not a valid java.util.Locale initialization ?
a. new Locale ()
b. new Locale ( String language )
c. new Locale ( String language , String country )show Answer
Ans. a, i.e. new Locale() - Locale has no public no-arg constructor.

Q404. Which of the following is not a valid NumberFormat initialization ?


a. NumberFormat.getInstance()
b. NumberFormat.getDateInstance()
c. NumberFormat.getCurrencyInstance()
d. NumberFormat.getNumberInstance()show Answer
Ans. b, i.e. NumberFormat.getDateInstance() - NumberFormat has no such factory method.
Q405. What will the following code print ?
public static void main(String[] args){
Integer i1 = new Integer("1");
Integer i2 = new Integer("2");
Integer i3 = Integer.valueOf("3");
int i4 = i1 + i2 + i3;
System.out.println(i4);
}show Answer
Ans. 6
Q406. Which of the following syntax are correct ?
a. LinkedList l=new LinkedList();
b. List l=new LinkedList();
c. LinkedList l=new LinkedList();
d. List l = new LinkedList();show Answer
Ans. c and d are correct.
Q407. Which of the following code is correct ?
a. Date date = DateFormat.newInstance(DateFormat.LONG, Locale.US).parse(str);
b. Date date = DateFormat.newInstance(DateFormat.LONG, Locale.US).format(str);
c. Date date = DateFormat.getDateInstance(DateFormat.LONG, Locale.US).parse(str);show
Answer
Ans. c
Q408. What's wrong with this code ?
public static void main(String[] args) {
String regex = "(\\w+)*";
String s = "Java is a programming language.";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while (matcher.next()) {
System.out.println("The e-mail id is: " + matcher.group());

}
}show Answer
Ans. matcher.find() should have been used instead of matcher.next() within while.
Q409. Which methods of the Pattern class have equivalent methods in the String class? show
Answer
Ans. split() and matches()
Q410. Can we compare Integers by using equals() in Java ?show Answer
Ans. Yes for the Wrapper class Integer but not for the primitive int.
Q411. What is comparator interface used for ?show Answer
Ans. The purpose of the Comparator interface is to define an ordering for objects of the same class, outside the
class itself. Sorted collection classes ( TreeSet, TreeMap ) have been designed to look for
such an ordering to decide where each element goes; a Comparator is supplied to the collection when the element
class does not implement Comparable, or when an order different from the natural one is needed.
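A minimal sketch, assuming a hypothetical Employee class, showing a Comparator passed to a TreeSet so that the ordering is defined outside the element class:
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class Employee {
    String name;
    Employee(String name) { this.name = name; }
}

public class ComparatorDemo {
    public static void main(String[] args) {
        // the Comparator, not the Employee class, defines the sort order
        Set<Employee> employees = new TreeSet<>(new Comparator<Employee>() {
            public int compare(Employee a, Employee b) {
                return a.name.compareTo(b.name);
            }
        });
        employees.add(new Employee("Ravi"));
        employees.add(new Employee("Amit"));
        for (Employee e : employees) {
            System.out.println(e.name);   // prints Amit, then Ravi
        }
    }
}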
Q412. Which are the sorted collections ?show Answer
Ans. TreeSet and TreeMap
Q413. What is the rule regarding overriding the equals and hashCode methods ?show Answer
Ans. A class must override the hashCode method if it overrides the equals method.
Q414. What is the difference between Collection and Collections ?show Answer
Ans. Collection is an interface whereas Collections is a utility class.
Q415. Is Java a statically typed or dynamically typed language ?show Answer
Ans. Statically typed
Q416. What do you mean by "Java is a statically typed language" ?show Answer
Ans. It means that the types of variables are checked at compile time in Java. The main advantage here
is that many kinds of checks can be done by the compiler, which reduces bugs.
Q417. How can we reverse the order in the TreeMap ?show Answer
Ans. Using Collections.reverseOrder()
Map tree = new TreeMap(Collections.reverseOrder());

Q418. TreeMap orders the elements on which field ?show Answer


Ans. Keys

Q419. How TreeMap orders the elements if the Key is a String ?show Answer
Ans. As String implements Comparable, TreeMap uses String's compareTo method to determine the order
relationship among the keys.
Q420. Can we add heterogeneous elements into TreeMap ?show Answer
Ans. No, Sorted collections don't allow addition of heterogeneous elements as they are not
comparable.
Q421. Will it create any problem if We add elements with key as user defined object into the
TreeMap ?show Answer
Ans. It won't create any problem if the objects are comparable i.e we have that class implementing
Comparable interface.
Q422. Can we have null keys in TreeMap ?show Answer
Ans. No, by default it results in a NullPointerException ( unless a Comparator that handles null is supplied ).
Q423. Can value be null in TreeMap ?show Answer
Ans. Yes.
Q424. Which interface TreeMap implements ?show Answer
Ans. TreeMap implements NavigableMap, SortedMap, Serializable and Cloneable.
Q425. Do we have form beans in Struts 2 ?show Answer
Ans. No, because they are no longer required. As action classes are no longer singletons in Struts 2,
user input can be captured in the action itself.
Q426. What is a ConcurrentHashMap ?show Answer
Ans. ConcurrentHashMap is a thread-safe HashMap variant that allows concurrent access from multiple threads;
instead of locking the whole map it uses finer-grained locking, so multiple threads can update different parts of the map at the same time.
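A small sketch of concurrent use; merge() here is just one example of the atomic operations ConcurrentHashMap provides:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentMapDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                hits.merge("page", 1, Integer::sum);   // atomic, no external locking
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(hits.get("page"));   // 2000
    }
}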
Q427. What is the use of double checked locking in createInstance() of Singleton class?
Double checked locking code:
public static Singleton createInstance() {
if(singleton == null){
synchronized(Singleton.class) {
if(singleton == null) {
singleton = new Singleton();
}
}
}
return singleton;
}

Single checked locking code:


public static Singleton createInstance() {
synchronized(Singleton.class) {
if(singleton == null) {
singleton = new Singleton();
}
}
return singleton;
}
What advantage does the first code offer compared to the second ?show Answer
Ans. In the first case, the lock for the synchronized block is acquired only if
singleton == null, whereas in the second case every thread acquires the lock before executing the code.
Synchronization is only needed while the object has not yet been instantiated. Once it is instantiated, the check singleton == null is always false and the same
object is returned without locking. The first null check therefore makes sure that synchronized access
( acquiring locks ) only takes place until the object has been created. Note that on Java 5 and later the
singleton field should also be declared volatile for this idiom to be safe.
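A sketch of the commonly recommended variant with a volatile field, so that a partially constructed instance is never visible to other threads:
public class Singleton {
    // volatile guarantees safe publication of the instance (Java 5+ memory model)
    private static volatile Singleton singleton;

    private Singleton() {}

    public static Singleton createInstance() {
        if (singleton == null) {                    // first check, no locking
            synchronized (Singleton.class) {
                if (singleton == null) {            // second check, under the lock
                    singleton = new Singleton();
                }
            }
        }
        return singleton;
    }
}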
Q428. Why are Getter and Setter better than directly changing and retrieving fields ?show
Answer
Ans. 1. Methods can participate in runtime polymorphism whereas member variables cannot.
2. Validations can be performed before setting the variables.
3. If the input format changes , that can be absorbed by making change ( wrapping ) in the setter and
getter.
Q429. Can we overload main method in Java ?show Answer
Ans. Yes, but overloaded main methods without the single String[] argument don't get any special
status from the JVM. They are just ordinary methods that need to be called explicitly.
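A minimal sketch: only main(String[]) is used as the JVM entry point, the overloads have to be invoked explicitly.
public class MainOverload {
    public static void main(String[] args) {
        System.out.println("JVM entry point");
        main("hello");   // overloads are called like any other method
        main(42);
    }
    public static void main(String arg) {
        System.out.println("main(String): " + arg);
    }
    public static void main(int arg) {
        System.out.println("main(int): " + arg);
    }
}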
Q430. What the Bean scopes provided by Spring ?show Answer
Ans. Singleton , Prototype , Request , Session , global-session
Q431. What are the various Auto Wiring types in Spring ?show Answer
Ans. By Name , By Type and Constructor.
Q432. Difference between first level and second level cache in hibernate ?show Answer
Ans. 1. First level cache is enabled by default whereas Second level cache needs to be enabled
explicitly.
2. First level Cache came with Hibernate 1.0 whereas Second level cache came with Hibernate 3.0.

3. First level Cache is Session specific whereas Second level cache is shared by sessions that is why
First level cache is considered local and second level cache is considered global.
Q433. What are the the methods to clear cache in Hibernate ?show Answer
Ans. evict() and clear(). evict() removes a particular object from the first level cache whereas clear()
empties the complete local ( first level ) cache.
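A brief sketch of the two calls, assuming an open Hibernate Session and a hypothetical Employee entity:
// assumes: Session session = sessionFactory.openSession();
Employee emp = (Employee) session.get(Employee.class, 1L);

session.evict(emp);   // detaches just this object from the first level cache
session.clear();      // detaches every object held in the first level cache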
Q434. What are different types of second level cache ?show Answer
Ans. 1. EHCache ( Easy Hibernate )
2. OSCache ( Open Symphony )
3. Swarm Cache ( JBoss )
4. Tree Cache ( JBoss )
Q435. Can we disable first level cache ? What should one do if we don't want an object to be
cached ?show Answer
Ans. No. We can either call evict() after retrieving the object or use separate sessions.
Q436. How to configure second level cache in Hibernate ?show Answer
Ans. 1. Configure Provider class in Hibernate configuration file.
2. Add Cache usage tag ( read-only or read-write ) in mapping files ( hbm ).
3. Create an XML file called ehcache.xml and place in classpath which contains time settings and
update settings, behavior of cache , lifetime and idletime of Pojos, how many objects are allowed.
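For illustration only, a hedged sketch using classic Hibernate 3 style settings ( the property names and the EHCache provider class are assumptions to verify against your Hibernate version ); the same keys can equally be placed in hibernate.cfg.xml:
// assumption: Hibernate 3.x property names and the EHCache provider
Configuration cfg = new Configuration().configure();
cfg.setProperty("hibernate.cache.use_second_level_cache", "true");
cfg.setProperty("hibernate.cache.provider_class",
        "org.hibernate.cache.EhCacheProvider");
SessionFactory sessionFactory = cfg.buildSessionFactory();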
Q437. What is Hibernate ?show Answer
Ans. Hibernate is a Java ORM Framework.
Q438. What are the advantages of Hibernate ?show Answer
Ans. 1. No need to know SQL, RDBMS, and DB Schema.
2. Underlying Database can be changed without much effort by changing SQL dialect and DB
connection.
3. Improved Performance by means of Caching.

Q439. What are the different types of inheritance in Hibernate ?show Answer
Ans. Table Per Class , Table per Sub Class , Table per Concrete Class
Q440. What is the purpose of dialect configured in Hibernate configuration file ?show Answer
Ans. It tells the framework which SQL variant to generate for the underlying database.
Q441. Please specify in what sequence the objects of following classes will be created ?
Session , SessionFactory, Query , Configurationshow Answer

Ans. Configuration -> SessionFactory -> Session -> Query
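A minimal sketch of that sequence using the classic Hibernate API ( the Employee entity and the HQL string are assumptions ):
Configuration configuration = new Configuration().configure();   // reads hibernate.cfg.xml
SessionFactory sessionFactory = configuration.buildSessionFactory();
Session session = sessionFactory.openSession();
Query query = session.createQuery("from Employee");
List employees = query.list();
session.close();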


Q442. What are different types of associations in Hibernate ?show Answer
Ans. There are 4 types of associations in Hibernate
One to One
One to Many
Many to One
Many to Many

Q443. What are the configuration files in Hibernate ?show Answer


Ans. hibernate.cfg.xml ( Main Configuration File )
and *.hbm.xml files ( Mapping Files )
Q444. What are the contents of Hibernate configuration file ( hibernate.cfg.xml ) ?show Answer
Ans. HBM Files ( Mapping )
DB Connection ( DB Connection String , User Name , Password , Pool Size )
SQL Dialect ( SQL variant to be generated )
Show SQL ( Show / No show SQL on Console )
Auto Commit ( True / False )
Q445. What are the Core Interfaces of Hibernate Framework ? show Answer
Ans. Configuration
SessionFactory
Session
Transaction
Query and Criteria

Q446. What are collection types in Hibernate ?show Answer


Ans. Bag, Set , List , Array, Map
Q447. Difference between load and get ?show Answer
Ans. If the id doesn't exist in the DB, load throws an exception ( when the proxy is accessed ) whereas get returns null in that case.
get makes the call to the DB immediately whereas load returns a proxy and defers the call until the object is actually used.
Q448. What is lazy fetching in Hibernate ?show Answer
Ans. Lazy fetching is the technique of not loading the child objects when parent objects are loaded. By
default Hibernate does not load child objects. One can specify whether to load them or not while doing
the association.

Q449. Different types of Hibernate Instance States ?show Answer


Ans. Transient - In this state, an instance is not associated with any persistence context
Persistent - In this state, an instance is associated with a persistence context
Detached - In this state, an instance was previously associated with a persistence context and has since
been dissociated from it ( for example because the session was closed )
Q450. Is It Good to use Reflection in an application ? Why ?show Answer
Ans. Generally no. Relying on reflection usually means working around the application's design, and it is slower and bypasses compile-time checks.
Q451. Why is Reflection slower ?show Answer
Ans. Because it has to inspect the metadata in the bytecode instead of just using precompiled
addresses and constants.
Q452. When should we use prototype scope and singleton scope for beans ?show Answer
Ans. We should use singleton scope for beans when they are stateless and prototype when they are
stateful.
Q453. Difference between Assert and Verify ?show Answer
Ans. Assert works only if assertions ( -ea ) are enabled which is not required for Verify.
Assert throws an exception and hence doesn't continue with the test if assert evaluates to false
whereas it's not so with Verify.
Q454. What is the difference between ArrayList and LinkedList ?show Answer
Ans. The underlying data structure for ArrayList is an array whereas for LinkedList it is a doubly linked list, and hence the following differences -
1. ArrayList needs a contiguous block of memory and its elements have to be copied to a bigger array when a full list grows, which is not required for LinkedList.
2. Removal and insertion at a specific place in an ArrayList requires shifting the following elements and hence is O(n); for LinkedList the relinking itself is O(1), although reaching the position by traversal still costs O(n).
3. Random access using an index is O(1) for ArrayList, whereas LinkedList has to traverse the list through its references.
4. Because of its compact layout and memory locality, ArrayList is usually the better default choice; LinkedList pays off mainly when elements are frequently added or removed at the ends or through an iterator.
Q455. Which class elements are not persisted ?show Answer
Ans. Static and Transient.

Q456. Which annotations are used in Hibernate ?show Answer


Ans. @Entity
@Table
@Id
@Column
@Temporal
@Basic
@Enumerated
@Access
@Embeddable
@Lob
@AttributeOverride
@Embedded
@GeneratedValue
@ElementCollection
@JoinTable
@JoinColumn
@CollectionId
@GenericGenerator
@OneToOne
@OneToMany
@ManyToOne
@ManyToMany
@NotFound
Q457. What entries we make in the hibernate config file if we are not using hbm files but
Annotations ?show Answer
Ans. We configure Entity classes having annotated mappings.
Q458. How many SessionFactory and Session objects are created ?show Answer
Ans. Typically a single SessionFactory object per database and multiple Session objects - a new Session is opened for each unit of work,
commonly one per thread or request when the thread-bound current session is used.
Q459. What is the way to rollback transaction if something goes wrong using hibernate
API ? show Answer
Ans. We can have the code calling Hibernate API within try block and can have transaction.rollback
within Catch.
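A sketch of that pattern ( sessionFactory and the Employee entity are assumed to exist ):
Session session = sessionFactory.openSession();
Transaction tx = null;
try {
    tx = session.beginTransaction();
    session.save(new Employee("Ravi"));   // any Hibernate work
    tx.commit();
} catch (Exception e) {
    if (tx != null) {
        tx.rollback();   // undo the work if anything went wrong
    }
} finally {
    session.close();
}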
Q460. What is the use of hbm2ddl Configuration in Hibernate ?show Answer
Ans. The hbm2ddl.auto configuration specifies whether Hibernate should create or maintain the schema / tables on its own when the
respective tables are not found.
"validate" only checks the existing schema, "update" alters it and creates missing tables, "create" drops and re-creates the
schema on every startup, and "create-drop" additionally drops it when the SessionFactory is closed.

Q461. What is the difference between these 2 annotations ?


@Entity
@Entity ( name="EMPLOYEES" )show Answer
Ans. The first annotation maps the class to a table with the same name as the class, whereas
the second sets the Entity name to "EMPLOYEES" and hence maps it to the
table named "EMPLOYEES" ( unless a @Table annotation specifies otherwise ).
Q462. "What is the difference between these 2 annotations ?
@Entity ( name ="EMPLOYEES")
@Entity
@Table ( name=""EMPLOYEES"" )
@Entity ( name="EMP")
@Table ( name="EMPLPYEES" )
show Answer
Ans. First Annotation will set the Entity name as EMPLOYEES and hence will try to map with the
same Table name.
The second annotation will make the Entity mapped to table EMPLOYEES irrespective of the Entity
Name ( which is class name in this case ).
Third Annotations will set the different names for Enitity and Table and will explicitly map them.
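A short annotated sketch of the third combination ( the table and column names are assumptions ):
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity(name = "EMP")          // entity name used in HQL / JPQL queries
@Table(name = "EMPLOYEES")     // physical table name in the database
public class Employee {
    @Id
    private Long id;
    // getters and setters omitted for brevity
}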
Q463. What are the different ID generating strategies using @GeneratedValue annotation ?
show Answer
Ans. Auto , Identity , Sequence and Table.
Q464. How to do Eager loading in Hibernate ?show Answer
Ans. Using
lazy = false in hibernate config file
or
@Basic(fetch=FetchType.EAGER) at the mapping
Q465. What is Lazy Initialization in Hibernate ?show Answer
Ans. Lazy initialization means that an associated object or collection is not loaded when its owner is loaded; Hibernate puts a proxy in its place and queries the database only when the data is actually accessed, which must happen while the Session is still open.
Q466. What are the ways to avoid LazyInitializationException ?show Answer
Ans. 1. Set lazy=false in the hibernate config file.
2. Set @Basic(fetch=FetchType.EAGER) at the mapping.

3. Make sure that we are accessing the dependent objects before closing the session.
4. Using Fetch Join in HQL.
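A sketch of option 4, assuming an Employee entity with a mapped addresses collection: the fetch join loads the collection in the same query, so it is already initialized when the session is closed.
List employees = session
        .createQuery("select distinct e from Employee e join fetch e.addresses")
        .list();
session.close();
// the addresses of each employee are already loaded here, so accessing them
// after the session is closed does not throw LazyInitializationException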
Q467. What is cascade ?show Answer
Ans. Instead of Saving Parent as well as Child Entities individually , Hibernate provides the option to
persist / delete the related entities when the Parent is persisted.
Q468. What are the different Cascade types ?show Answer
Ans. Detach, Merge , Persist , Remove , Refresh
Q469. Which type of associated Entities are Eagerly loaded by Default ?show Answer
Ans. OneToOne
Q470. After which Hibernate version , related Entities are initialized lazily ?show Answer
Ans. After Hibernate 3.0
Q471. Can we declare Entity class as final ?show Answer
Ans. Yes, but it is discouraged. Hibernate implements lazy loading through proxy classes that are subclasses of the entity classes;
a final entity class cannot be proxied, so lazy loading is lost
and performance suffers.
Q472. What are the restrictions for the entity classes ?show Answer
Ans. 1. Entity classes should have default constructor.
2. Entity classes should be declared non final.
3. All elements to be persisted should be declared private and should have public getters and setters
in the Java Bean style.
4. All classes should have an ID that maps to Primary Key for the table.
Q473. What is the difference between int[] x; and int x[]; ?show Answer
Ans. No Difference. Both are the acceptable ways to declare an array.
Q474. What are the annotations used in Junit with Junit4 ?show Answer
Ans. @Test
The Test annotation indicates that the public void method to which it is attached can be run as a test
case.
@Before

The Before annotation indicates that this method must be executed before each test in the class, so
as to execute some preconditions necessary for the test.
@BeforeClass
The BeforeClass annotation indicates that the static method to which it is attached must be executed
once, before all tests in the class.
@After
The After annotation indicates that this method gets executed after execution of each test.
@AfterClass
The AfterClass annotation can be used when a method needs to be executed after executing all the
tests in a JUnit Test Case class so as to clean-up the set-up.
@Ignore
The Ignore annotation can be used when you want to temporarily disable the execution of a specific
test.
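A compact JUnit 4 sketch tying the annotations together ( the class and method names are illustrative ):
import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Ignore;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class CalculatorTest {
    @BeforeClass public static void initSuite()  { /* runs once, before all tests */ }
    @AfterClass  public static void closeSuite() { /* runs once, after all tests  */ }

    @Before public void setUp()    { /* runs before each test */ }
    @After  public void tearDown() { /* runs after each test  */ }

    @Test
    public void addsTwoNumbers() {
        assertEquals(4, 2 + 2);   // a trivial assertion stands in for real test logic
    }

    @Ignore("temporarily disabled")
    @Test
    public void notReadyYet() { }
}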
Q475. What is asynchronous I/O ?show Answer
Ans. It is a form of Input Output processing that permits other processing to continue before the I/O
transmission has finished.
Q476. If there is a conflict between Base Class Method definition and Interface Default method
definition, Which definition is Picked ?show Answer
Ans. Base Class Definition.
Q477. What are new features introduced with Java 8 ?show Answer
Ans. Lambda Expressions , Interface Default and Static Methods , Method References , Parameter
Names ( available via reflection ) , Optional , Streams and concurrency enhancements.
Q478. Can we have a default method without a Body ?show Answer
Ans. No. Compiler will give error.
Q479. Does java allow implementation of multiple interfaces having Default methods with
Same name and Signature ?show Answer
Ans. Not unless the implementing class overrides the conflicting method; otherwise it results in a compilation error.
Q480. What are Default Methods ?show Answer
Ans. With Java 8, we can provide method definitions in interfaces; these are inherited by the
classes implementing that interface unless they override them. The keyword "default" is
used to mark the default method.

Q481. Can we have a default method definition in the interface without specifying the keyword
"default" ?show Answer
Ans. No. The compiler complains that it is an abstract method and hence shouldn't have a body.
Q482. Can a class implement two Interfaces having default method with same name and
signature ?
public interface DefaultMethodInterface {
default public void defaultMethod(){
System.out.println("DefaultMethodInterface");
}
}
public interface DefaultMethodInterface2 {
default public void defaultMethod(){
System.out.println("DefaultMethodInterface2");
}
}
public class HelloJava8 implements DefaultMethodInterface,DefaultMethodInterface2 {
public static void main(String[] args){
DefaultMethodInterface defMethIn = new HelloJava8();
defMethIn.defaultMethod();
}
}show Answer
Ans. No. Compiler gives error saying "Duplicate Default Methods"

Q483. What If we make the method as abstract in another Interface ?


public interface DefaultMethodInterface {
default public void defaultMethod(){
System.out.println("DefaultMethodInterface");
}
}
public interface DefaultMethodInterface2 {
public void defaultMethod(){
System.out.println("DefaultMethodInterface2");
}
}
public class HelloJava8 implements DefaultMethodInterface,DefaultMethodInterface2 {
public static void main(String[] args){
DefaultMethodInterface defMethIn = new HelloJava8();
defMethIn.defaultMethod();

}
}show Answer
Ans. Even then the Compiler will give error saying that there is a conflict.
Q484. What if we override the conflicting method in the Class ?
public interface DefaultMethodInterface {
default public void defaultMethod(){
System.out.println("DefaultMethodInterface");
}
}
public interface DefaultMethodInterface2 {
default public void defaultMethod(){
System.out.println("DefaultMethodInterface2");
}
}
public class HelloJava8 implements DefaultMethodInterface,DefaultMethodInterface2 {
public static void main(String[] args){
DefaultMethodInterface defMethIn = new HelloJava8();
defMethIn.defaultMethod();
}
public void defaultMethod(){
System.out.println("HelloJava8");
}
}show Answer
Ans. There won't be any error and upon execution the overriding class method will be executed.
Q485. What will happen if there is a default method conflict as mentioned above and we have
specified the same signature method in the base class instead of overriding in the existing
class ?
show Answer
Ans. There won't be any problem as the Base class method will have precedence over the Interface
Default methods.
Q486. If a method definition has been specified in Class , its Base Class , and the interface
which the class is implementing, Which definition will be picked if we try to access it using
Interface Reference and Class object ? show Answer
Ans. Class method definition is overriding both the definitions and hence will be picked.
Q487. If a method definition has been specified in the Base Class and the interface which the
class is implementing, Which definition will be picked if we try to access it using Interface
Reference and Class object ? show Answer
Ans. Base Class Definition will have precedence over the Interface Default method definition.

Q488. Can we use static method definitions in Interfaces ?show Answer


Ans. Yes, as of Java 8.
Q489. Can we access Interface static method using Interface references ?show Answer
Ans. No, only using Interface Name.
Q490. Can we have default method with same name and signature in the derived Interface as
the static method in base Interface and vice versa ?show Answer
Ans. Yes , we can do that as static methods are not accessible using references and hence cannot
lead to conflict. We cannot do inverse as Default methods cannot be overridden with the static
methods in derived interface.
Q491. What is a Lambda Expression ? What's its use ?show Answer
Ans. It is an anonymous function without any separate declaration. Lambda expressions are useful for writing
short, inline code and hence save the effort of writing lengthy anonymous classes. They improve developer
productivity and make code more readable.
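A small before-and-after sketch: the same Runnable written as an anonymous class and as a lambda.
public class LambdaDemo {
    public static void main(String[] args) {
        // pre-Java 8: anonymous inner class
        Runnable oldStyle = new Runnable() {
            public void run() {
                System.out.println("Hello from an anonymous class");
            }
        };

        // Java 8: lambda expression - same behavior, far less boilerplate
        Runnable newStyle = () -> System.out.println("Hello from a lambda");

        oldStyle.run();
        newStyle.run();
    }
}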
Q492. Difference between Predicate, Supplier and Consumer ? show Answer
Ans. Predicate represents an anonymous function that accepts one argument and returns a boolean.
Supplier represents an anonymous function that accepts no argument and produces a result.
Consumer represents an anonymous function that accepts an argument and produces no result.
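A minimal sketch using the java.util.function versions of the three interfaces:
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class FunctionalDemo {
    public static void main(String[] args) {
        Predicate<Integer> isEven = n -> n % 2 == 0;            // argument in, boolean out
        Supplier<String> greeting = () -> "hello";              // nothing in, value out
        Consumer<String> printer = s -> System.out.println(s);  // value in, nothing out

        System.out.println(isEven.test(10));   // true
        printer.accept(greeting.get());        // hello
    }
}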
Q493. What does the following lambda expression means ?
helloJava8 ( x-> x%2 )show Answer
Ans. helloJava8 is passed a lambda expression that takes a value x and returns x % 2, i.e. the remainder when x is divided by 2.
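One way to read it, assuming helloJava8 is a hypothetical method that accepts a java.util.function.Function<Integer, Integer>:
import java.util.function.Function;

public class ModulusDemo {
    // hypothetical helloJava8: simply applies whatever function it is given
    static int helloJava8(Function<Integer, Integer> f) {
        return f.apply(7);
    }

    public static void main(String[] args) {
        System.out.println(helloJava8(x -> x % 2));   // prints 1, i.e. 7 % 2
    }
}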
Q494. What is the difference between namenode and datanode in Hadoop? show Answer
Ans. NameNode stores MetaData (No of Blocks, On Which Rack which DataNode is stored etc)
whereas the DataNode stores the actual Data.
Q495. Write a program to see if a number is a perfect number or not ?show Answer
Ans. http://www.c4learn.com/c-programs/program-to-check-whether-number-is.html
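A hedged Java sketch of the idea behind the linked C program: a perfect number equals the sum of its proper divisors.
public class PerfectNumber {
    static boolean isPerfect(int n) {
        if (n < 2) return false;
        int sum = 1;                              // 1 is a proper divisor of every n > 1
        for (int i = 2; i * i <= n; i++) {
            if (n % i == 0) {
                sum += i;
                if (i != n / i) sum += n / i;     // add the paired divisor
            }
        }
        return sum == n;
    }

    public static void main(String[] args) {
        System.out.println(isPerfect(28));   // true : 1 + 2 + 4 + 7 + 14 = 28
        System.out.println(isPerfect(12));   // false
    }
}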
Q496. Difference between a Pointer and a Reference ?show Answer
Ans. We can't get the address of a reference like a pointer. Moreover we cannot perform pointer
arithmetic with references.
Q497. Difference between TCP and UDP ?show Answer
Ans. TCP is connection-oriented and reliable ( ordered delivery, acknowledgements, retransmission ) whereas UDP is connectionless and does not guarantee delivery or ordering, but has lower overhead. See http://www.cyberciti.biz/faq/key-differences-between-tcp-and-udp-protocols/

Q498. What things you would care about to improve the performance of Application if its
identified that its DB communication that needs to be improved ?show Answer
Ans. 1. Query Optimization ( Query Rewriting , Prepared Statements )
2. Restructuring Indexes.
3. DB Caching Tuning ( if using ORM )
4. Identifying the problems ( if any ) with the ORM Strategy ( If using ORM )

Q499. Explain Singleton Design Pattern ?show Answer


Ans. The Singleton pattern ensures that only one instance of a class exists per JVM and provides a global point of access to it ( see Q427 for a double checked locking implementation ). http://www.buggybread.com/2014/03/java-design-pattern-singleton-interview.html
