29/03/2019 Use C# and ML.NET Machine Learning To Predict Taxi Fares In New York

Use C# And ML.NET Machine Learning To Predict Taxi Fares In New York
Mark Farragher
Mar 25 · 6 min read

Building machine learning apps in C# has never been easier!

ML.NET is Microsoft’s new machine learning library. It can run linear
regression, logistic classification, clustering, deep learning, and many
other machine learning algorithms.

ML.NET is a first-class .NET library. There’s no need to use Python; you
can easily tap into this library using any .NET language, including C#.

Microsoft is pouring all their effort into ML.NET right now. This is
going to be their go-to solution for all machine learning in .NET going
forward.

And it’s super easy to use. Watch this: I’m going to build an app that can
predict taxi fares in New York.

The first thing I need is a data file with thousands of New York taxi
rides. The NYC Taxi & Limousine Commission provides yearly TLC Trip
Record Data files which have exactly what I need.

https://medium.com/machinelearningadvantage/use-c-and-ml-net-machine-learning-to-predict-taxi-fares-in-new-york-519546f52591 1/19

The data file looks like this:

I’m using the awesome Rainbow CSV plugin for Visual Studio Code
which is highlighting my CSV data file with these nice colors.

The plugin can also run simple RBQL queries directly on the file:

The final column in the data file has the taxi fare I’m trying to predict.

I’ll use all the other columns as input data:

• The data provider vendor ID

• The rate code (standard, JFK, Newark, Nassau, negotiated, group)

• Number of passengers

• Trip time

• Trip distance

• Payment type (credit card, cash)

I’ll build a machine learning model in C# that takes these columns as
input and predicts the taxi fare for every trip.

And I will use .NET Core to build my app.

.NET Core is really cool. It’s the multi-platform version of the .NET
Framework and it runs flawlessly on Windows, macOS, and Linux.

I’m using the 3.0 preview on my Mac right now and haven’t touched my
Windows 10 virtual machine in days.

Here’s how to set up a new console project in .NET Core:

$ dotnet new console -o PricePrediction
$ cd PricePrediction

Next, I need to install the ML.NET NuGet package:

$ dotnet add package Microsoft.ML --version 0.10.0

Now I’m ready to add some classes. I’ll need one to hold a taxi trip, and
one to hold my model’s predictions.

I will modify the Program.cs file like this:

/// <summary>
/// The TaxiTrip class represents a single taxi trip.
/// </summary>
public class TaxiTrip
{
    [Column("0")] public string VendorId;
    [Column("1")] public string RateCode;
    [Column("2")] public float PassengerCount;
    [Column("3")] public float TripTime;
    [Column("4")] public float TripDistance;
    [Column("5")] public string PaymentType;
    [Column("6")] public float FareAmount;
}

The TaxiTrip class holds a single taxi trip. Note how each field is
adorned with a Column attribute that tells the CSV data loading code
which column to import data from.

I’m also declaring a TaxiTripFarePrediction class which will hold a
single fare prediction.
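
The listing above is cut off before the prediction class appears. Here is a minimal sketch of what such a class could look like. It assumes the model writes its regression output to a column named Score, which is the default for ML.NET regression trainers, and that the ColumnName attribute from Microsoft.ML.Data maps that column onto a field. The field name FareAmount is an assumption and may differ from the original code:

```csharp
using Microsoft.ML.Data;

/// <summary>
/// Holds a single fare prediction.
/// Sketch only: the field name is an assumption, not taken from the article.
/// </summary>
public class TaxiTripFarePrediction
{
    // ML.NET regression trainers write the predicted value to the "Score" column
    [ColumnName("Score")]
    public float FareAmount;
}
```

Keeping the prediction in its own class separates what goes into the model (TaxiTrip) from what comes out of it.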

Now I’m going to load the training data in memory:

/// <summary>
/// The program class.
/// </summary>
class Program
{
    // file paths to data files
    static readonly string trainingDataPath = Path.Combine(…
    static readonly string testDataPath = Path.Combine(Envi…

    /// <summary>
    /// The main application entry point.
    /// </summary>
    /// <param name="args">The command line arguments.</param>
    static void Main(string[] args)
    {
        // create the machine learning context
        var mlContext = new MLContext();

        // set up the text loader
        var textLoader = mlContext.Data.CreateTextLoader(
            new TextLoader.Arguments()
            {
                Separators = new[] { ',' },
                HasHeader = true,
                Column = new[]
                {
                    new TextLoader.Column("VendorId", DataK…
                    new TextLoader.Column("RateCode", DataK…
                    new TextLoader.Column("PassengerCount",…
                    new TextLoader.Column("TripTime", DataK…

This code sets up a TextLoader to load the CSV data into memory. Note
that all column data types are what you’d expect, except RateCode.
This column holds a numeric value from 1 to 6, but I’m loading it as a
text field.

I’m doing this because RateCode is an enumeration with the following
values:

• 1 = standard

• 2 = JFK

• 3 = Newark

• 4 = Nassau

• 5 = negotiated

• 6 = group

The actual numbers in this context don’t mean anything. And I
certainly don’t want the machine learning model to start believing that
a trip to Newark is three times as important as a standard fare.

So converting these values to strings is a perfect trick to show the
model that RateCode is just a label, and the underlying numbers don’t
mean anything.

With the TextLoader all set up, a single call to Read() is sufficient to
load the entire data file in memory.

Now I’m ready to start building the machine learning model:

// set up a learning pipeline
var pipeline = mlContext.Transforms.CopyColumns(
    inputColumnName: "FareAmount",
    outputColumnName: "Label")

    // one-hot encode all text features
    .Append(mlContext.Transforms.Categorical.OneHotEncoding…
    .Append(mlContext.Transforms.Categorical.OneHotEncoding…
    .Append(mlContext.Transforms.Categorical.OneHotEncoding…

    // combine all input features into a single column
    .Append(mlContext.Transforms.Concatenate(
        "Features",
        "VendorId",
        "RateCode",
        "PassengerCount",
        "TripTime",
        "TripDistance",
        "PaymentType"))

Machine learning models in ML.NET are built with pipelines, which are
sequences of data-loading, transformation, and learning components.

My pipeline has the following components:

• CopyColumns, which copies the FareAmount column to a new
column called Label. This Label column holds the actual taxi fare
that the model has to predict.

• A group of three OneHotEncodings to perform one-hot encoding
on the three columns that contain enumerative data: VendorId,
RateCode, and PaymentType. This is a required step because
machine learning models cannot handle enumerative data
directly.

• Concatenate, which combines all input data columns into a single
column called Features. This is a required step because ML.NET
can only train on a single input column.

• A final FastTree regression learner, which will train the model to
make accurate predictions.
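
The one-hot encoding step from the list above can be sketched in plain C#, without ML.NET. This is only an illustration of the idea, not how the library implements it; the OneHotSketch class and its Encode method are hypothetical names made up for this example:

```csharp
using System;
using System.Collections.Generic;

public static class OneHotSketch
{
    // Returns a vector with one slot per known category and a single 1
    // in the slot that matches the given value.
    public static float[] Encode(IReadOnlyList<string> categories, string value)
    {
        var vector = new float[categories.Count];
        for (int i = 0; i < categories.Count; i++)
        {
            if (categories[i] == value)
            {
                vector[i] = 1f;
            }
        }
        return vector; // stays all zeros if the value is not a known category
    }

    public static void Main()
    {
        // the six rate codes, kept as strings so they act as labels
        var rateCodes = new[] { "1", "2", "3", "4", "5", "6" };

        // "3" (Newark) turns into a vector with a 1 in the third slot only
        float[] newark = Encode(rateCodes, "3");
        Console.WriteLine(string.Join(", ", newark)); // prints 0, 0, 1, 0, 0, 0
    }
}
```

Every category gets its own slot, so no rate code ends up numerically "larger" than another.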

The FastTreeRegressionTrainer is a very nice training algorithm that
uses gradient boosting, a machine learning technique for regression
problems.

A gradient boosting algorithm builds up a collection of weak regression
models. It starts out with a weak model that tries to predict the taxi
fare. Then it adds a second model that attempts to correct the error in
the first model. And then it adds a third model, and so on.

The result is a fairly strong prediction model that is actually just an
ensemble of weaker prediction models stacked on top of each other.
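
To make the "each model corrects the error of the previous ones" idea concrete, here is a toy sketch in plain C#. It is not the FastTree algorithm, which builds full regression trees and is far more sophisticated. The weak model here is a swapped-in one-threshold "stump" on a single feature, and the class names, the learning rate, and the distance/fare pairs are all made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BoostingSketch
{
    // A weak model: one threshold on one feature, one constant per side.
    public sealed class Stump
    {
        public double Threshold, Left, Right;
        public double Predict(double x) => x <= Threshold ? Left : Right;
    }

    // Finds the stump with the lowest squared error against the targets.
    public static Stump FitStump(double[] xs, double[] targets)
    {
        Stump best = null;
        double bestError = double.MaxValue;
        foreach (double threshold in xs.Distinct())
        {
            var left = targets.Where((t, i) => xs[i] <= threshold).ToArray();
            var right = targets.Where((t, i) => xs[i] > threshold).ToArray();
            double leftMean = left.Average(); // never empty: threshold is one of xs
            double rightMean = right.Length > 0 ? right.Average() : leftMean;
            double error = left.Sum(t => (t - leftMean) * (t - leftMean))
                         + right.Sum(t => (t - rightMean) * (t - rightMean));
            if (error < bestError)
            {
                bestError = error;
                best = new Stump { Threshold = threshold, Left = leftMean, Right = rightMean };
            }
        }
        return best;
    }

    // Each round fits a new stump to the error left over by the ensemble so far.
    public static List<Stump> Boost(double[] xs, double[] ys, int rounds, double learningRate)
    {
        var ensemble = new List<Stump>();
        var residuals = (double[])ys.Clone();
        for (int round = 0; round < rounds; round++)
        {
            Stump stump = FitStump(xs, residuals);
            ensemble.Add(stump);
            for (int i = 0; i < xs.Length; i++)
            {
                residuals[i] -= learningRate * stump.Predict(xs[i]);
            }
        }
        return ensemble;
    }

    // The ensemble prediction is the scaled sum of all weak predictions.
    public static double Predict(List<Stump> ensemble, double x, double learningRate)
        => ensemble.Sum(s => learningRate * s.Predict(x));

    public static void Main()
    {
        // made-up (distance in miles, fare in dollars) pairs, for illustration only
        double[] distances = { 0.5, 1.0, 2.0, 3.0, 4.0, 6.0 };
        double[] fares = { 4.0, 6.0, 9.0, 12.5, 15.0, 21.0 };

        var ensemble = Boost(distances, fares, rounds: 50, learningRate: 0.3);
        Console.WriteLine(Predict(ensemble, 3.0, 0.3));
    }
}
```

The learning rate scales down each stump's contribution, so no single weak model dominates and later rounds still have errors left to correct.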

With the pipeline fully assembled, I can train the model with a call to
Fit().

I now have a fully trained model. Next, I need to load some test
data, predict the taxi fare for each trip, and calculate the
accuracy of my model:

// load test data
Console.Write("Loading test data....");
IDataView testView = textLoader.Read(testDataPath);
Console.WriteLine("done");

// get a set of predictions for the test data
Console.Write("Evaluating the model....");
var predictions = model.Transform(testView);

// get regression metrics to score the model
var metrics = mlContext.Regression.Evaluate(predictions, "L…
Console.WriteLine("done");

// show the metrics
…

This code uses the TextLoader class to load another taxi trip data file
for testing. And with a single call to Transform(…) I can set up
predictions for every single trip in the file.

The Evaluate(…) method compares these predictions to the actual taxi
fares and automatically calculates three very handy metrics for me:

• Rms: this is the root mean square error or RMSE value. It’s the
go-to metric in the field of machine learning to evaluate models and
rate their accuracy. Geometrically, RMSE is the length of the vector
of individual prediction errors in n-dimensional space, divided by
the square root of n.

• L1: this is the mean absolute prediction error, expressed in
dollars.

• L2: this is the mean square prediction error, or MSE value. Note
that RMSE and MSE are related: RMSE is just the square root of
MSE.
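
The three metrics above are easy to compute by hand. The sketch below does so in plain C# on four made-up fares. The class and method names are hypothetical; in the app itself the values come from the metrics object returned by Evaluate(…):

```csharp
using System;
using System.Linq;

public static class RegressionMetricsSketch
{
    // L1: the mean absolute error
    public static double MeanAbsoluteError(double[] actual, double[] predicted)
        => actual.Zip(predicted, (a, p) => Math.Abs(a - p)).Average();

    // L2: the mean squared error
    public static double MeanSquaredError(double[] actual, double[] predicted)
        => actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Average();

    // Rms: the square root of the mean squared error
    public static double RootMeanSquaredError(double[] actual, double[] predicted)
        => Math.Sqrt(MeanSquaredError(actual, predicted));

    public static void Main()
    {
        // made-up fares in dollars, for illustration only
        double[] actual = { 10.0, 20.0, 30.0, 40.0 };
        double[] predicted = { 11.0, 19.0, 33.0, 39.0 };

        // absolute errors are 1, 1, 3, 1, so:
        //   L1  = (1 + 1 + 3 + 1) / 4 = 1.5
        //   L2  = (1 + 1 + 9 + 1) / 4 = 3
        //   Rms = sqrt(3), roughly 1.732
        Console.WriteLine(MeanAbsoluteError(actual, predicted));
        Console.WriteLine(MeanSquaredError(actual, predicted));
        Console.WriteLine(RootMeanSquaredError(actual, predicted));
    }
}
```

Note how the single 3-dollar miss pulls Rms above L1: squaring the errors penalizes large misses more heavily than small ones.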

To wrap up, let’s use the model to make a prediction.

I’m going to take a taxi trip for 3.75 miles and the trip will take me 19
minutes. I’ll be the only passenger and I’ll pay by credit card.

Here’s how to make the prediction:

// create a prediction engine for one single prediction
var predictionFunction = model.CreatePredictionEngine<TaxiT…

// prep a single taxi trip
var taxiTripSample = new TaxiTrip()
{
    VendorId = "VTS",
    RateCode = "1",
    PassengerCount = 1,
    TripTime = 1140,
    TripDistance = 3.75f,
    PaymentType = "CRD",
    FareAmount = 0 // actual fare for this trip = 15.5
};

I use the CreatePredictionEngine<…>(…) method to set up a
prediction engine. The two type arguments are the input data class and
the class to hold the prediction. And once my prediction engine is set
up, I can simply call Predict(…) to make a single prediction.

I know from the data file that this trip is supposed to cost $15.50. How
accurate will the model prediction be?

Here’s the code running in the Visual Studio Code debugger on my
Mac:

The output is a bit small, so here’s the app again running in a zsh shell:

I get an RMSE value of 2.06 and an L1 value of 0.42. This means that
my predictions are on average only 42 cents off.

How about that!

According to the model, my 19-minute trip covering 3.75 miles will
cost me $15.79. But the actual fare price is $15.50, so in this case my
model prediction is off by only 29 cents.

So what do you think?

Are you ready to start writing C# machine learning apps with
ML.NET?

Add a comment and let me know!
