
Analysis of CalTaxi dataset

Submitted By:
Harshit Rastogi
Vivek Kumar Jha
VGSoM, IIT Kharagpur

Contents
Case Objectives
Analysis using Descriptive Statistics
Inferences from Descriptive Analysis
Determination of Response Variable
Determination of Predictor Variables
Linear Regression Model for the data
Inferences from the model
Determination of number of extra cars needed
Summary
Scope for further enhancements

Case Objectives
To draw inferences from the dataset through descriptive analysis.
To build a model showing how the various factors contribute to the booking of cars from CalTaxi.
To predict the number of extra cars that should be bought to reduce dependency on external parties.

Analysis using Descriptive Statistics (1/2)
[Bar chart: Mean number of Bookings in a Week in Descending order. x-axis, in plotted order: Sunday, Saturday, Friday, Thursday, Tuesday, Wednesday, Monday; y-axis: 0 to 90.]

Analysis using Descriptive Statistics (2/2)
[Bar chart: Mean Bookings on Holiday and Non Holiday. x-axis: Holiday, Non-Holiday; y-axis: 0 to 90.]
[Bar chart: Mean bookings on different weather conditions. x-axis: Cloudy, Rainy, Sunny; y-axis: 69 to 79.]

Inferences from Descriptive Analysis
The basic plots show that people book more cars on weekends and when the weather is Cloudy or Rainy.
Clearly, the dependency on third parties would be high on such days.
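
The group means behind plots like these can be computed with a few lines of Python. This is only a minimal sketch: the rows below are made-up illustrative values, not the CalTaxi data, and the helper name mean_by is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (weekday, holiday flag, climate, cnt).
# These values are made up for illustration and are NOT from the CalTaxi dataset.
rows = [
    ("Sunday", 1, "Cloudy", 90),
    ("Sunday", 1, "Rainy", 84),
    ("Monday", 0, "Sunny", 60),
    ("Monday", 0, "Cloudy", 66),
    ("Friday", 0, "Rainy", 75),
]

def mean_by(rows, key_index, value_index=3):
    """Return {group: mean of the value column}, sorted by mean, descending."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    means = {key: mean(values) for key, values in groups.items()}
    return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))

weekday_means = mean_by(rows, 0)   # mean cnt per day of the week
holiday_means = mean_by(rows, 1)   # mean cnt for holiday (1) vs non-holiday (0)
climate_means = mean_by(rows, 2)   # mean cnt per weather condition
```

Sorting by the mean reproduces the "descending order" layout of the weekday chart.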

Determination of Response Variable
Variable cnt gives the total number of bookings (sum of ac_cars and non_ac_cars) each day.
cnt shows the engagement of cars every day and is hence the Response variable.
As the maximum rent time is 24 hours, a car can be booked multiple times a day.

Determination of Predictor Variables
Correlation table between cnt and other variables of the dataset*

Variable            Correlation with cnt
Climate             0.115144
Weekday             0.527369
Holiday             0.560225
Temp                0.215078
aTemp               0.297384
Hum                 0.401956
Windspeed           0.004516
Cent_book_status    0.115368
Avg_dist            0.365542
Hence, we have the following predictor variables:
Climate
Weekday
Holiday
Temp
Atemp
Hum
Windspeed
Avg_dist
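
A correlation of each variable with cnt can be computed along the following lines. This is a sketch only: it assumes the Pearson coefficient was used (the slides do not say which measure was applied), and the column values are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical columns (made-up values, not the CalTaxi data).
columns = {
    "temp": [20, 25, 30, 35, 40],
    "hum": [80, 60, 70, 50, 55],
    "windspeed": [10, 12, 9, 11, 10],
}
cnt = [50, 62, 70, 85, 88]

# One correlation per candidate predictor, mirroring the single-row table above.
correlations = {name: pearson(values, cnt) for name, values in columns.items()}
```

The sign is kept here; whether the table reports signed or absolute values is not stated in the slides.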

Linear Regression Model for the data
A multiple linear regression model using these variables gives us the following summary*:

Inferences from the model
The model gives an R-squared value of 0.79, meaning it explains about 79% of the variance in cnt using the chosen predictor variables.
We ran the model on the test data and got the following values of cnt*:

A mean and median value of around 75 supports the validity of the model, as this is approximately the value that we got from the Descriptive Analysis earlier.
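
For readers who want to see what a multiple linear regression and its R-squared involve, here is a minimal sketch using the normal equations in plain Python. It is not the tool or code used for the analysis above; the two features (temp, holiday) and all values are hypothetical, and the function names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(features, target):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in features]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, target)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, features):
    """Predicted target for each feature row."""
    return [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in features]

def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Hypothetical rows: (temp, holiday) -> cnt. Made-up values, not the CalTaxi data.
features = [(20, 0), (25, 0), (30, 1), (35, 0), (40, 1), (28, 1)]
target = [52, 60, 85, 74, 98, 80]
coefs = fit_ols(features, target)
r2 = r_squared(target, predict(coefs, features))
```

In practice a statistics package would also report standard errors and p-values, which this sketch omits.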


Determination of number of extra cars needed (1/2)
Assumptions:
1. The average speed of a car is 50 km/hr around Calicut.
Hence, we introduce a new variable avg_dur, which gives the average duration for which a car is booked on any day
(avg_dur = avg_dist/50)

2. If a car has been booked for more than 10 hours in a day, it won't be booked again (considering its maintenance and performance).
Using this assumption, we calculated the average number of visits per car (visits_per_car).


Determination of number of extra cars needed (2/2)
The total number of cars required under these assumptions can be calculated as:
cars_req = cnt/visits_per_car
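
The chain from avg_dist to cars_req can be sketched as follows. The two constants come from the stated assumptions; everything else is hypothetical. In particular, the slides do not say how visits_per_car was derived, so the floor-based estimate below is only one plausible reading, and avg_dist is assumed to be in km and greater than zero.

```python
from math import ceil, floor

AVG_SPEED_KMPH = 50       # assumption 1 from the slides
MAX_HOURS_PER_DAY = 10    # assumption 2 from the slides

def avg_duration(avg_dist):
    """avg_dur = avg_dist / 50, as defined in the slides (hours, if avg_dist is in km)."""
    return avg_dist / AVG_SPEED_KMPH

def visits_per_car(avg_dur):
    """Hypothetical estimate: whole bookings that fit within the 10-hour cap, at least 1.
    The slides do not state the actual derivation of visits_per_car."""
    return max(1, floor(MAX_HOURS_PER_DAY / avg_dur))

def cars_required(cnt, avg_dist):
    """cars_req = cnt / visits_per_car; rounding up to whole cars is an assumption."""
    return ceil(cnt / visits_per_car(avg_duration(avg_dist)))

# Example with made-up inputs: 75 bookings in a day, 200 km average distance.
example = cars_required(75, 200)
```

With these made-up inputs, avg_dur is 4 hours, each car serves 2 bookings, and 38 cars are needed. A different rule for visits_per_car would change the result, so the figures reported in the slides should not be expected to match this sketch.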

A summary of cars_req gave the following values:
Mean number of cars required according to dataset = 48
Median number of cars required according to dataset = 39

We performed the same calculation for the weekdays, or days when it wasn't a holiday. Summary in that case:
Mean number of cars required according to dataset = 44
Median number of cars required according to dataset = 36


Summary of the Analysis
Hence, CalTaxi should proceed with buying 8-10 cars to reduce its dependency on third parties.
Buying this number of cars will reduce the dependency to an almost negligible value on weekdays.
On holidays, the number of bookings is generally high. To completely end the dependency, the number of purchases would need to increase significantly. But that won't be fruitful, as those additional purchases may act as a sunk cost on weekdays.

Scope for further enhancements


The analysis and predictions can be improved by including more information. Some examples of such information:
YoY increment in car bookings over the 4 years.
Effect of fuel prices on car booking status.
Promotional offers and discounts by CalTaxi and their effect on the number of bookings.