Name : Mukesh Kharga Mobile : 9762180563 / 7989375626

Role : Spark Developer E-Mail : kharga.mukesh@gmail.com

Objective:
Data Science enthusiast with Hadoop skill sets, an insatiable curiosity, and the ability to
mine hidden insights from large sets of structured, semi-structured, and unstructured data.

Professional Summary:

3.7 years of experience as a software developer, close to 2 of them in Big Data. At present
working at Tata Consultancy Services.
Close to 2 years of hands-on experience with Apache Spark and related technologies such as
Spark Core, Spark SQL, and Hive on Spark, using the Scala programming language.
1 year 8 months of experience as a SQL Server developer (database developer).
Experience in developing, implementing, and refining data engineering solutions for large
volumes of data.
Good understanding of the Banking and Finance domain and its processes, with
techno-functional skills.
Good understanding of the Healthcare domain and its processes, with techno-functional skills.
Exposure to various Hadoop ecosystem components such as Pig, Sqoop, Hive, Oozie, and Hue.
Exposure to NoSQL systems such as HBase.
Familiarity with data loading tools such as Flume, Kafka, and Sqoop.
Strong experience in writing SQL queries.
Good verbal and written communication skills.

Technical Skills:

Programming Languages : Scala, C, Unix shell scripting, OOP concepts.
Big Data Eco-System : Spark, Hive, Sqoop, YARN, Hue, HDFS, Oozie, Pig, HBase.
Methodology : Agile
Databases : Microsoft SQL Server, Oracle.
Tools : IntelliJ IDEA, Eclipse, Maven, Git, Jira, Control-M, TeamCity,
Nexus, uDeploy, HP ALM.

Academic Profile:

M.Tech (IT) from Indian Institute of Information Technology, Allahabad (IIIT-A)

Projects:

Project # 4: DH
Domain : BFS
Client : Deutsche Bank
Duration : June 2016 - Present
Role : Spark Developer
Tools : Spark-Core, Spark-SQL, Spark-Hive, Git, Control-M, IntelliJ IDEA, Hue, Oracle, Hive
Language : Scala
Description:
DH (Data Harmonization) is a Spark-based application that takes data from Oracle, from other
deployed Hadoop projects (described below), and from various file sources (CSV, JSON, XML), then
processes, combines, and integrates the data into a single unambiguous entity.

Subprojects:

Webtrek (January 2016 - June 2017)


Webtrek is a third-party application that captures and stores bank customer information in the
form of JSON and CSV files. A Spark-based framework was developed to process the JSON and CSV
files and store them in Hive and Impala tables, which are then used by QlikView for reporting (see
the sketch below).
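
A minimal sketch of what such an ingestion step could look like, assuming a Hive-enabled
SparkSession; the HDFS paths and Hive table names are hypothetical placeholders, not the actual
framework:

    import org.apache.spark.sql.SparkSession

    object WebtrekIngest {
      def main(args: Array[String]): Unit = {
        // Hive support is required so saveAsTable writes to real Hive tables.
        val spark = SparkSession.builder()
          .appName("WebtrekIngest")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical HDFS input paths.
        val jsonDf = spark.read.json("hdfs:///data/webtrek/json/")
        val csvDf = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/webtrek/csv/")

        // Persist both feeds as Hive tables for downstream QlikView reporting.
        jsonDf.write.mode("overwrite").saveAsTable("webtrek.customer_events")
        csvDf.write.mode("overwrite").saveAsTable("webtrek.customer_profiles")

        spark.stop()
      }
    }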

MiFID (September 2016 - March 2017)


MiFID is a data management framework developed using Spark to read data from the CDM project
(which was built on Oracle) and various CSV files coming from multiple sources, and finally store
the results in Hive tables.

URL (June 2016 - October 2016)


URL (Unified Reporting Layer) is an asset management project used for regulatory reporting. A
Hadoop-based solution was designed to move source data from an Oracle database to HDFS and
eventually create Hive tables, which are consumed by the reporting tool Tableau for further
processing.

Roles and Responsibilities:

Designed Spark frameworks using Scala for data analysis in all the above-mentioned projects.
Prepared an object-oriented framework and applied functional programming paradigms with the
help of Scala.
Implemented Spark RDDs, transformations, and actions in Scala.
Developed Spark-SQL and Spark-Hive statements for data processing.
Developed Spark DataFrames, case classes, tuples, objects, and functions as required.
Developed a framework for in-memory processing and adjusted the number of partitions as
needed during processing.
Designed Spark frameworks to load data from Oracle and from Hive into DataFrames and process
them to produce the desired results (see the sketch after this list).
Developed PairRDDs to perform data aggregation in Spark using Scala, including Spark's
aggregate functions.
Designed Spark frameworks to process CSV, XML, and JSON data using Scala.
Performed data curation: combining data from multiple sources.
Imported and exported data between HDFS and the local Linux file system.
Unit tested the developed code.
Wrote Hive DDL statements.
Triggered jobs on the cluster using Control-M and checked the code in to Git.
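
As an illustration of the Oracle-to-DataFrame pattern above, here is a minimal sketch; the JDBC
URL, credentials, and table and column names are hypothetical placeholders, and the Oracle JDBC
driver (ojdbc jar) is assumed to be on the classpath:

    import org.apache.spark.sql.SparkSession

    object OracleLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("OracleLoad")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical Oracle connection details.
        val accounts = spark.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")
          .option("dbtable", "BANK.ACCOUNTS")
          .option("user", "etl_user")
          .option("password", sys.env("ORACLE_PWD"))
          .load()

        // Process with Spark SQL, then adjust the number of partitions
        // before writing, as described in the list above.
        accounts.createOrReplaceTempView("accounts")
        val balances = spark.sql(
          "SELECT branch_id, SUM(balance) AS total_balance " +
          "FROM accounts GROUP BY branch_id")
        balances.repartition(8).write.mode("overwrite").saveAsTable("dh.branch_balances")

        // The same aggregation expressed as a PairRDD, matching the data
        // aggregation item above (Oracle NUMBER maps to java.math.BigDecimal).
        val byBranch = accounts.rdd
          .map(row => (row.getAs[String]("BRANCH_ID"),
                       row.getAs[java.math.BigDecimal]("BALANCE").doubleValue))
          .reduceByKey(_ + _)
        byBranch.take(10).foreach(println)

        spark.stop()
      }
    }
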
Project # 3: AMPV
Domain : BFS
Client : Deutsche Bank
Duration : Dec 2015 - May 2016
Role : Spark Developer
Tools : Spark-Core, Spark-SQL, Spark-Hive, Git, Control-M, IntelliJ IDEA, Hue, Oracle, Hive
Language : Scala
Description:
AMPV (Asset Management Performance Visualization) is a Spark-based application that takes
customer data from third-party rating agencies such as Morningstar, processes the useful data in
Spark, and finally stores it in Hive tables used for reporting. The source files are in XML format,
and a Spark framework was created to process them.
Roles and Responsibilities:
Designed a Spark framework to process the XML data (see the sketch after this list).
Developed a DataFrame to read the XML data from an HDFS location.
Developed transformations and filters to process the DataFrame and unpivot the data.
Developed Spark-SQL statements to perform joins and data processing.
Repartitioned the final DataFrame and stored it in HDFS.
Wrote Hive DDL statements.
Triggered the job on the cluster using Control-M and checked the code in to Git.
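
A minimal sketch of the XML-processing idea, assuming the external spark-xml package
(com.databricks:spark-xml) for reading XML; the row tag, paths, column names, and table name are
hypothetical placeholders:

    import org.apache.spark.sql.SparkSession

    object AmpvXmlJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("AmpvXmlJob")
          .enableHiveSupport()
          .getOrCreate()

        // Spark has no built-in XML source; the spark-xml package is assumed.
        val funds = spark.read
          .format("com.databricks.spark.xml")
          .option("rowTag", "fund")
          .load("hdfs:///data/morningstar/ratings.xml")

        // Hypothetical unpivot: turn one row per fund with quarterly columns
        // into one row per (fund, quarter) using Spark SQL's stack() generator.
        val unpivoted = funds.selectExpr(
          "fundId",
          "stack(4, 'Q1', q1_return, 'Q2', q2_return, 'Q3', q3_return, 'Q4', q4_return) " +
            "AS (quarter, quarterly_return)")

        // Repartition and persist, as in the responsibilities above.
        unpivoted.repartition(4).write.mode("overwrite").saveAsTable("ampv.fund_returns")
        spark.stop()
      }
    }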

Project # 2: Automation
Domain : BFS
Client : Deutsche Bank
Duration : Dec 2015 - Present
Role : Spark Developer
Tools : Spark-Core, Spark-SQL, Spark-Hive, Git, Control-M, IntelliJ IDEA, Hue, Oracle,
Hive, HP ALM
Language : Scala
Description:
Spark Automation is a testing tool built on the Spark framework for all the above-mentioned
projects; testers use it to perform regression testing.
Roles and Responsibilities:
Designed a Spark framework to capture data from different Hive tables, filter it with Spark as per
the testers' requirements, and create a baseline DataFrame (see the sketch after this list).
Fetched the source data for the baseline and prepared a source DataFrame.
Compared the two DataFrames using a hashing mechanism to complete the testing.
Triggered the job on the cluster using Control-M.
Created test cases in HP ALM and updated ALM according to the results.
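
A minimal sketch of the hash-based comparison idea, assuming both DataFrames share the same
schema; sha2 and concat_ws are standard Spark SQL functions, while the Hive table names are
hypothetical placeholders:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{concat_ws, sha2}

    object RegressionCompare {
      // Reduce every row to a single SHA-256 hash so two DataFrames can be
      // compared cheaply with set operations.
      def withRowHash(df: DataFrame): DataFrame =
        df.withColumn("row_hash", sha2(concat_ws("|", df.columns.map(df(_)): _*), 256))

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("RegressionCompare")
          .enableHiveSupport()
          .getOrCreate()

        // Placeholder Hive tables for the baseline and the fresh source run.
        val baseline = withRowHash(spark.table("qa.baseline_positions"))
        val source = withRowHash(spark.table("qa.source_positions"))

        // Rows present on one side but not the other indicate a regression.
        val missing = baseline.select("row_hash").except(source.select("row_hash"))
        val unexpected = source.select("row_hash").except(baseline.select("row_hash"))

        val passed = missing.count() == 0 && unexpected.count() == 0
        println(s"Regression test passed: $passed")
        spark.stop()
      }
    }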

Project # 1: Reggi
Domain : Healthcare
Client : TCS
Duration : September 2011 - May 2013
Role : Database Developer
Description:
Reggi is a product developed for hospitals to store complete patient records and to reduce the
effort patients need to make an inquiry and receive benefits as soon as possible. It is a complete
healthcare solution for managing all the information of a hospital, accessible by all healthcare
service providers, with functionality that can be hidden depending on the user.
Roles and Responsibilities:
Created tables as per requirements.
Prepared data-mod scripts for inserting and updating data in the tables.
Created and updated stored procedures as per requirements.
Created triggers as per requirements.
Performed unit testing on the code and checked the code in to the Git repository.

Personal Details:

Nationality : INDIAN
Father's name : Mahendra Kharga
Gender : MALE
Languages known : ENGLISH and HINDI
Passport : Yes

Declaration:

I hereby declare that all the details furnished above are true to the best of my knowledge and belief.

Place: Pune Mukesh Kharga
