
Data Cleansing Test

Background
This test is designed to show us how Data Scientist candidates approach raw,
unreliable data.

We believe that this test mimics the position’s responsibilities at ilmuOne Data. The data that
our clients collect often do not align with their business goals or the analysis that they ask our
team to perform.

By completing this test, you will help us gain a better understanding of your potential with
regard to this vacancy. Aside from your programming skills, you should consider this test
a chance to show your problem-solving skills and your ability to think outside the box.

Please note our evaluation will not be limited to your final submission. We will also consider
intangibles including but not limited to your effort and professionalism. Our evaluation will also
be subject to our judgement of your experience based on your CV and first interview. For
example, if you are a fresh graduate, this test would highlight your ability to learn new
concepts. Even if you are unable to finish the test, please submit your best possible attempt.
Good luck!

Problem Statement

Swiftnet is a conventional telecommunications company that conducts most of its Customer
Relationship Management (CRM) through inbound call centers. Swiftnet’s new management
has made it clear that its CRM operations are below their standards. As such, they are very
eager to analyze the call center data they have collected throughout the years. Unfortunately,
the data is messy and difficult to work with.

Swiftnet has enlisted your help to perform data cleansing on their call center data.
However, they wish to first see your capabilities before granting you access to their entire
database. Thus, you are provided with two (2) sample datasets taken from:
• Their customer service log (call_logging.csv); and,
• Their user database (user_data.csv)

With this data, Swiftnet has two (2) specific requests:

1. Call Unification

Rows in the customer service log do NOT represent unique calls, due to the way Swiftnet
tracks their data. Swiftnet’s CRM system creates a new row whenever a customer service
agent picks up the phone. However, a customer might speak to multiple service agents
over the course of a single call. As such, data from the same call is often spread across
separate rows in the database.

In order to reliably analyze customer experience, rows from the same call need to be marked
with a shared, unique identifier called call_id. So far, Swiftnet’s analysts have performed this
manually. Thus, your task is to automate this process. Please write a script which takes a
subset of the customer service log as input and generates a new column, call_id. The script
must meet the following two (2) criteria:
• If the script is run multiple times on the same data point from the customer
service log, it should always generate the same call_id for that data point.
• If the script is run on two non-overlapping subsets of the customer service
log, and the results are concatenated, the column call_id should still function as a
unique identifier.
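
One way to read the two criteria is that call_id should be derived from the data itself rather than from row order or a running counter. The following is a minimal sketch under stated assumptions, not a reference solution. The column names customer_id and pickup_time, the ISO-8601 timestamp format, and the five-minute handover gap are all hypothetical; the real call_logging.csv may need a different grouping rule.

```python
import hashlib
from datetime import datetime, timedelta

# Hypothetical column names and threshold; adjust to the real call_logging.csv.
CUSTOMER_COL = "customer_id"
PICKUP_COL = "pickup_time"
MAX_GAP = timedelta(minutes=5)  # assumed maximum gap between agent handovers


def assign_call_ids(rows):
    """Return copies of `rows` (a list of dicts) with a 'call_id' key added."""
    # Visit rows ordered by customer, then pickup time, without reordering the input.
    order = sorted(
        range(len(rows)),
        key=lambda i: (
            str(rows[i][CUSTOMER_COL]),
            datetime.fromisoformat(rows[i][PICKUP_COL]),
        ),
    )
    out = [dict(r) for r in rows]
    prev_customer, prev_time, call_id = None, None, None
    for i in order:
        customer = str(rows[i][CUSTOMER_COL])
        pickup = datetime.fromisoformat(rows[i][PICKUP_COL])
        # A new call starts on a new customer or after a long silence.
        if customer != prev_customer or pickup - prev_time > MAX_GAP:
            # Hash the customer and the first pickup time of the call, so the
            # ID depends only on the data, not on row position or run count.
            key = f"{customer}|{pickup.isoformat()}"
            call_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
        out[i]["call_id"] = call_id
        prev_customer, prev_time = customer, pickup
    return out
```

Because the identifier is a hash of values found in the rows, repeated runs give the same result, and two non-overlapping subsets do not produce colliding IDs for different calls, short of a hash collision. One caveat: if the rows of a single call are split across two subsets, this sketch gives them different IDs, which is worth raising with the client.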

2. Descriptive Analysis

Please describe the sample datasets you have received and compile your findings in a
report. Assuming that the sample datasets are representative of Swiftnet’s data, please
highlight all findings which you believe would be of interest to Swiftnet’s
management. You are also encouraged to list questions you would like to ask the client
and to recommend additional data points which may strengthen your analysis.
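
A first pass at describing a dataset could be a per-column profile of missing and distinct values. The sketch below is illustrative only. The helper name profile_columns and the set of strings treated as missing are assumptions, and the column names in the real files are unknown here.

```python
from collections import Counter

# Assumed markers for missing values; the real data may use others.
MISSING = {"", "na", "n/a", "null", "none", "nan"}


def profile_columns(rows):
    """Summarise each column of a list of dict rows (e.g. from csv.DictReader)."""
    # Collect column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    summary = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [
            v for v in values
            if v is not None and str(v).strip().lower() not in MISSING
        ]
        counts = Counter(present)
        summary[name] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0] if counts else None,
        }
    return summary
```

The rows could come from csv.DictReader applied to user_data.csv or call_logging.csv. Columns with many missing values or unexpectedly few distinct values are natural candidates for questions to the client.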

Expected Output
1. A script (.ipynb, .py, or .R) which generates a new column, call_id, containing unique
identifiers for the customer service log

2. A descriptive analysis report
