
Subbu BIG DATA/HADOOP DEVELOPER

669-235-4464
bigdata604@gmail.com
Professional Summary:
 6 years of strong experience in the IT industry across the complete Software Development Life Cycle (SDLC), including Business Requirements Gathering, System Analysis & Design, Data Modeling, Development, Testing and Implementation of projects.
 4+ years of experience in developing, implementing and configuring Big Data solutions using Hadoop ecosystem tools such as HDFS, MapReduce, Hive, Oozie, Sqoop, Zookeeper, RabbitMQ, Kafka, Knox, Ranger, Cassandra, HBase, MongoDB, Spark and Spark Streaming.
 Experience in installing, configuring, deploying and managing different Hadoop distributions such as Cloudera and Hortonworks.
 Configured Oozie, Zookeeper, Spark, Kafka, Cassandra and HBase on existing Hadoop clusters.
 Experience in importing and exporting data with Sqoop between the Hadoop Distributed File System (HDFS) and relational database systems.
 Experience in handling various file formats such as Avro, SequenceFile, text, XML and Parquet.
 Imported data from HDFS into Spark RDDs for in-memory computation to generate output responses.
 Experience with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
 Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core (a sketch follows this summary).
 Experience building real-time streaming data pipelines that collect data from different sources using Kafka and store it in HDFS.
 Extended Hive core functionality with custom User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs).
 Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
 Experience with NoSQL databases such as HBase and Cassandra and their integration with Hadoop clusters.
 Implemented an HBase cluster as part of a POC to address HBase limitations.
 Improved the performance and optimization of existing Hadoop algorithms with Spark, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
 Strong expertise in the Informatica PowerCenter ETL tool (Designer, Workflow Manager, Repository Manager, Data Quality) and ETL concepts.
 Experience with continuous data ingestion using Kafka, Spark and various NoSQL databases.
 Knowledge of using NiFi to automate data movement between different Hadoop systems.
 Implemented Hadoop security using Knox and Ranger, integrating an LDAP store with a Kerberos KDC.
 Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
 Experience with cloud integration using Amazon Elastic MapReduce (EMR), Amazon EC2, Amazon S3 and Microsoft Azure.
 Experienced with relational database management systems such as Teradata, PostgreSQL, Oracle and SQL Server.
 Experienced in scheduling and monitoring production jobs using Oozie and Azkaban.
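A minimal sketch of the Spark Core pattern summarized above: raw text from HDFS parsed into a case class via RDD transformations, counted with an action, and aggregated through a DataFrame. The paths, field names and case class are illustrative assumptions, not details of any specific engagement.

    import org.apache.spark.sql.SparkSession

    // Illustrative record type; the fields are assumptions made for this sketch.
    case class Trade(symbol: String, qty: Long, price: Double)

    object TradeSummary {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("TradeSummary").getOrCreate()
        import spark.implicits._

        // RDD transformations: parse raw CSV lines from HDFS into case classes.
        val trades = spark.sparkContext
          .textFile("hdfs:///data/trades/input")              // hypothetical input path
          .map(_.split(","))
          .filter(_.length == 3)
          .map(f => Trade(f(0), f(1).toLong, f(2).toDouble))

        // Action: materialize a count of the parsed records.
        println(s"Parsed ${trades.count()} trades")

        // DataFrame view of the same RDD for a SQL-style aggregation.
        trades.toDF()
          .groupBy("symbol")
          .sum("qty")
          .write.mode("overwrite")
          .parquet("hdfs:///data/trades/summary")             // hypothetical output path

        spark.stop()
      }
    }

The case class gives the RDD a schema when converted to a DataFrame, so the same data can be worked on with both the RDD and Spark SQL APIs.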

TECHNICAL SKILLS:
Hadoop Ecosystems            Hadoop, HDFS, MapReduce, Hive, YARN, Oozie, Zookeeper, Spark, Impala, Spark SQL, Spark Streaming, Hue, Kafka, RabbitMQ, Solr, Sqoop, NiFi, Knox, Ranger and Kerberos.

Cloud Services               Amazon Elastic MapReduce (EMR), Amazon EC2, Amazon S3 and Microsoft Azure.

Languages                    Java, Scala, Python, PL/SQL, Unix Shell Scripting.

Java Technologies            Spring MVC, JDBC, JSP, JSON, Applets, Swing, JNDI, JSTL, RMI, JMS, Servlets, EJB, JSF.

UI Technologies              HTML5, JavaScript, CSS3, Angular, XML, JSP, JSON, AJAX.

Development Tools            Eclipse, IntelliJ, Ant, Maven, Insomnia, Postman, Scala IDE.

Frameworks/Web Servers       Spring, JSP, Hibernate, Hadoop, WebLogic, WebSphere, Tomcat.

SQL/NoSQL Databases          Teradata, PostgreSQL, Oracle, HBase, MongoDB, Cassandra, CouchDB, MySQL and DB2.

Version Control/Build Tools  GitHub, Bitbucket, SVN, JIRA, SourceTree, Maven.

Professional Experience:

Client: ETrade, TX May 2017 – Present


Sr. Big Data Hadoop Developer

Description: ETrade is an IT services company where the data was present in JSON format; the target of this project was to provide access to end users by bringing the data into a structured format. The aim of the project was to deliver a clearly noticeable improvement for the end user.

Responsibilities:

 Involved in installing and configuring the Hortonworks Data Platform (HDP) Hadoop distribution.
 Involved in requirements gathering, writing user stories, and reviewing and merging development team code into Dev repositories, following Agile SDLC methodologies.
 Imported data from various data sources, performed transformations using Hive, loaded data into HDFS and extracted data from SQL Server into HDFS using Sqoop.
 Exported the analyzed data to relational databases for visualization and to generate reports for the BI team.
 Worked on different data formats such as Avro, Parquet and XML.
 Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
 Worked on batch and streaming data ingestion into the Cassandra database.
 Expertise in integrating Kafka with Spark Streaming for high-speed data processing.
 Developed multiple Kafka producers and consumers as per the software requirement specifications.
 Worked on the implementation of a log producer in Scala that watches application logs, transforms incremental log entries and sends them to a Kafka and Zookeeper based log collection platform (a sketch follows this list).
 Created data pipelines as per the business requirements and scheduled them using Oozie coordinators.
 Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and RDDs.
 Good understanding of machine learning and data mining.
 Developed and performed unit testing using the JUnit framework in a Test-Driven Development (TDD) environment.
 Involved in integration testing and production support.
 Coordinated with offshore and onsite teams to understand the requirements and prepared high-level and low-level design documents from the requirements specification.
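A minimal sketch of the Scala log producer described in this list. The broker address, topic name and log path are illustrative assumptions; the file is read once here for brevity, whereas the original producer watched for incremental log entries.

    import java.util.Properties
    import scala.io.Source
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object LogProducer {
      def main(args: Array[String]): Unit = {
        // Basic producer configuration; the broker address is a placeholder.
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("acks", "all")

        val producer = new KafkaProducer[String, String](props)
        val topic = "app-logs"                               // hypothetical topic name

        try {
          // Send each log line as a record, keyed by the file name so that
          // lines from the same file land in the same partition.
          for (line <- Source.fromFile("/var/log/app/application.log").getLines())
            producer.send(new ProducerRecord[String, String](topic, "application.log", line))
          producer.flush()
        } finally {
          producer.close()
        }
      }
    }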

Environment: Spark, Spark SQL, GitHub, Maven, HDFS, Hive, Big Data, Oozie, Sqoop, Kafka, Shell Scripting, Linux, HBase, Scala, R, Tableau, SQL Server.

Client: Western Digital, CA Feb 2016 – Mar 2017


Sr. Big Data Hadoop Developer

Description: Western Digital provides cost-effective solutions for the collection, management, protection and use of digital information. WD is a long-term innovator and storage industry leader. A high volume of sales and finance data from users, clients and other networks is streamed daily and loaded into the Hadoop cluster. The data is used to analyze business needs and to identify trends and requirements in different ways.
Responsibilities:

 Installed and configured a distributed data solution using the Cloudera Distribution of Hadoop (CDH).
 Involved in the complete big data flow of the application: ingesting data from upstream sources into HDFS, processing the data in HDFS and analyzing it.
 Imported data in various formats such as JSON, SequenceFile, Avro and Parquet into the HDFS cluster.
 Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
 Configured Hive and wrote Hive UDFs and UDAFs.
 Implemented advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities.
 Imported and exported data into HDFS and Hive using Sqoop and Kafka.
 Used Spark Streaming APIs to perform transformations and actions on the fly, building the common learner data model that gets data from Kafka in near real time and persists it into HBase (a sketch follows this list).
 Analyzed the performance of Spark streaming and batch jobs using Spark tuning parameters.
 Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
 Developed Spark scripts using Python as per the requirements.
 Used Hive join queries to join multiple source system tables and loaded the results into Elasticsearch tables.
 Experience in managing and reviewing huge Hadoop log files.
 Involved in cluster maintenance, cluster monitoring and troubleshooting.
 Created data pipelines as per the business requirements and scheduled them using Oozie coordinators.
 Maintained technical documentation for each step of the development environment and for launching Hadoop clusters.
 Built the automated build and deployment framework using Bitbucket, SourceTree and Maven.
 Used IntelliJ IDEA to develop the code and for debugging.
 Worked on BI tools such as Tableau to create weekly, monthly and daily dashboards and reports using Tableau Desktop, and published them to the HDFS cluster.
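A minimal sketch of the Kafka-to-HBase path described in this list, using the Spark Streaming direct Kafka API and the HBase client. The broker, consumer group, topic, table name and column layout are illustrative assumptions; the real learner-model schema is not shown.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object KafkaToHBase {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHBase")
        val ssc = new StreamingContext(conf, Seconds(10))    // 10-second micro-batches

        // Kafka consumer settings; broker, group and topic names are placeholders.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "learner-model",
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams))

        // Persist each micro-batch into HBase, opening one connection per partition.
        stream.map(_.value).foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            val hbaseConn = ConnectionFactory.createConnection(HBaseConfiguration.create())
            val table = hbaseConn.getTable(TableName.valueOf("learner_model")) // hypothetical table
            records.foreach { value =>
              // Row key and column family/qualifier are illustrative only.
              val put = new Put(Bytes.toBytes(s"${System.currentTimeMillis()}-${value.hashCode}"))
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event"), Bytes.toBytes(value))
              table.put(put)
            }
            table.close()
            hbaseConn.close()
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }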

Environment: Bitbucket, SourceTree, Maven, Hadoop, Big Data, HDFS, Hive, Oozie, Sqoop, Spark, Kafka, Elasticsearch, Shell Scripting, Linux, HBase, Scala, Python, Tableau, MySQL.

Client: Transamerica - TX Jan 2015 – Dec 2015


Big Data/Hadoop Developer

Description: Transamerica is the US-based brand of Aegon, a Dutch financial services firm. Aegon is one of the
world's leading providers of life insurance, pensions and asset management and is helping approximately 30 million
customers globally to achieve a lifetime of financial security. As a Hadoop developer I was responsible for
developing automated scripts to analyze the incoming data from different processes.

Responsibilities:
 Involved in requirements gathering and writing user stories, and worked through the complete Software Development Life Cycle (SDLC) based on Agile methodologies.
 Involved in installing, configuring, supporting and managing Hadoop clusters, from Hortonworks Distribution (HDP) to Cloudera Distribution of Hadoop (CDH).
 Worked on Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
 Experienced in managing and reviewing Hadoop log files and documenting issues daily in the resolution portal.
 Implemented dynamic partitions and buckets in Hive.
 Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python (a sketch follows this list).
 Experience configuring spouts and bolts in various Storm topologies and validating data in the bolts.
 Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
 Loaded and transformed large sets of structured, semi-structured and unstructured data, using Sqoop to move data from the Hadoop Distributed File System to relational database systems and vice versa.
 Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS and databases such as HBase using Scala.
 Established and implemented firewall rules and validated them with vulnerability scanning tools.
 Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
 Implemented Storm builder topologies to perform cleansing operations before moving data into HBase.
 Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
 Developed a custom file system plug-in for Hadoop so it can access files on the data platform. This plug-in allows Hadoop MapReduce programs, Cassandra, Pig and Hive to work unmodified and access files directly.
 Used Spark to create APIs in Java and Python for big data analysis.
 Experience in troubleshooting errors in Cassandra, Hive and MapReduce.
 Used version control tools such as GitHub and SourceTree to pull changes from upstream into local branches, check for conflicts, and review other developers' code.
 Involved with development teams to discuss JIRA stories and understand the requirements.
 Actively involved in the complete Agile life cycle to design, develop, deploy and support solutions.
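A minimal sketch of the Hive-to-Spark conversion mentioned in this list, using the Spark 1.x HiveContext of that period: the same aggregation is expressed once as HiveQL and once as an RDD transformation. The table and column names are illustrative assumptions.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveToSpark {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HiveToSpark"))
        val hiveContext = new HiveContext(sc)

        // Original HiveQL: total premium per policy type (illustrative table/columns).
        val fromHive = hiveContext.sql(
          "SELECT policy_type, SUM(premium) AS total_premium FROM policies GROUP BY policy_type")

        // Equivalent RDD transformation on the same Hive table
        // (premium is assumed to be a DOUBLE column for this sketch).
        val fromRdd = hiveContext.table("policies")
          .select("policy_type", "premium")
          .rdd
          .map(row => (row.getString(0), row.getDouble(1)))
          .reduceByKey(_ + _)                               // aggregate per policy type

        fromRdd.take(10).foreach(println)
        fromHive.show(10)

        sc.stop()
      }
    }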

Environment: Hadoop, Hive, Pig, Storm, Cassandra, Sqoop, Impala, Oozie, Java, Python, Shell Scripting, MapReduce, Java Collections, MySQL, Maven.

Client: Skandia - TX May 2013 – Oct 2014


Hadoop Developer

Description: Skandia is a leading aircraft interiors specialist providing innovative products and expert services to the aviation industry.

Responsibilities:
 Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
 Involved in loading data from the UNIX file system to HDFS.
 Installed and configured Hive and wrote Hive UDFs.
 Evaluated business requirements and prepared detailed specifications that follow project guidelines for developing programs.
 Devised procedures that solve complex business problems with due consideration for hardware/software capacity and limitations, operating times and desired results.
 Analyzed large data sets to determine the optimal way to aggregate and report on them.
 Provided quick responses to ad hoc internal and external client requests for data, and experienced in creating ad hoc reports.
 Responsible for building scalable distributed data solutions using Hadoop.
 Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files. Worked hands-on with the ETL process.
 Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
 Extracted data from Teradata into HDFS using Sqoop.
 Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior such as shopping enthusiasts, travelers and music lovers.
 Created lookup Hive tables and JSON-format Hive tables.
 Used Maven extensively for building JAR files of MapReduce programs and deployed them to the cluster.
 Modeled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning (a sketch follows this list).
 Exported the analyzed patterns back into Teradata using Sqoop.
 Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
 Installed the Oozie workflow engine to run multiple Hive jobs.
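A minimal sketch of the partitioned Hive table pattern mentioned above, issuing HiveQL from Scala over the Hive JDBC driver. The HiveServer2 URL, table names and columns are illustrative assumptions, not the project's actual schema.

    import java.sql.DriverManager

    object PartitionedHiveLoad {
      def main(args: Array[String]): Unit = {
        // HiveServer2 JDBC connection; host and database are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default", "", "")
        val stmt = conn.createStatement()

        try {
          // Partition by event date so queries can prune to only the days they need.
          stmt.execute(
            """CREATE TABLE IF NOT EXISTS user_activity (
              |  user_id STRING,
              |  category STRING,
              |  spend DOUBLE)
              |PARTITIONED BY (event_date STRING)
              |STORED AS PARQUET""".stripMargin)

          // Dynamic-partition load from a raw staging table (illustrative name);
          // the partition column must come last in the SELECT list.
          stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
          stmt.execute(
            """INSERT OVERWRITE TABLE user_activity PARTITION (event_date)
              |SELECT user_id, category, spend, event_date FROM raw_user_activity""".stripMargin)
        } finally {
          stmt.close()
          conn.close()
        }
      }
    }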

Environment: Java, Oracle, HTML, XML, SQL, J2EE, JUnit, JDBC, JSP, Tomcat, SQL Server, MongoDB,
JavaScript, GitHub, SourceTree, NetBeans.
Java Developer June 2012 – Feb 2013
Nutecinfoservices - Hyderabad, India.

Description: This project was designed and developed to process online order requests. It consists of different modules such as online user registration, updating user information, online order submission, payment, order processing and order delivery.

Responsibilities:
 Analyzing and preparing the requirement Analysis Document.
 Deploying the Application to the JBOSS Application Server.
 Requirement gatherings from various stakeholders of the project.
 Effort-estimation and estimating timelines for development tasks.
 Used to J2EE and EJB to handle the business flow and Functionality.
 Interact with Client to get the confirmation on the functionalities and implementation.
 Involved in the complete SDLC of the Development with full system dependency.
 Actively coordinated with deployment manager for application production launch.
 Provide Support and update for the period under warranty.
 Produce detailed low-level designs from high level design
 Specifications for components of low level complexity.
 Develops, builds and unit tests components of low level
 Complexity from detailed low-level designs.
 Developed user and technical documentation.
 Monitoring of test cases to verify actual results against expected results.
 Performed Functional, User Interface test and Regression Test.
 Carrying out Regression testing to track the problem tracking.
 Implemented Model View Controller (MVC) architecture at the Web tier level to isolate each layer of the
application to avoid the complexity of integration and ease of maintenance along with Validation Framework

Environment: Java, JEE, CSS, HTML, SVN, EJB, UNIX, XML, Workflow, MyEclipse, JMS, JIRA, Oracle, JBoss.

EDUCATION:
Bachelor of Technology in Information Technology
