Recreating the ML.NET Getting Started Tutorial in F#
Microsoft have two tutorial sites for ML.NET. One at dotnet.microsoft.com and one at docs.microsoft.com.
Each of the tutorials in the second link have the F# equivalent on GitHub. However, GitHub does not have the F# sample for the dotnet.microsoft.com tutorial. You can find implementations of it on Google but most of them use the legacy LearningPipeline
and not the new MLContext
used in the tutorial.
Creating the project
To begin, call the following commmands to create your project.
dotnet new console -lang F# -o myApp
cd myApp
After this, you can mostly follow the tutorial to get the data set and add the ML.NET package.
The Code
The F# equivalent of the tutorial code is mostly straightforward and is displayed below.
open System
open Microsoft.ML
open Microsoft.ML.Runtime.Api
open Microsoft.ML.Runtime.Data
open Microsoft.ML.Core.Data
// STEP 1: Define your data structures
// IrisData is used to provide training data, and as
// input for prediction operations
// - First 4 properties are inputs/features used to predict the label
// - Label is what you are predicting, and is only set when training
[<CLIMutable>]
type IrisData = {
SepalLength : float32
SepalWidth : float32
PetalLength : float32
PetalWidth : float32
Label : string
}
[<CLIMutable>]
// IrisPrediction is the result returned from prediction operations
type IrisPrediction = {
[<ColumnName("PredictedLabel")>]
PredictedLabel : string
}
[<EntryPoint>]
let main argv =
// STEP 2: Create a ML.NET environment
let mlContext = new MLContext()
// If working in Visual Studio, make sure the 'Copy to Output Directory'
// property of iris-data.txt is set to 'Copy always'
let dataPath = "iris-data.txt";
let reader =
mlContext.Data.TextReader(
TextLoader.Arguments(
Separator = ",",
HasHeader = true,
Column =
[|
TextLoader.Column("SepalLength", Nullable DataKind.R4, 0)
TextLoader.Column("SepalWidth", Nullable DataKind.R4, 1)
TextLoader.Column("PetalLength", Nullable DataKind.R4, 2)
TextLoader.Column("PetalWidth", Nullable DataKind.R4, 3)
TextLoader.Column("Label", Nullable DataKind.Text, 4)
|]
)
)
let trainingDataView = reader.Read(MultiFileSource(dataPath))
// Helper functions to help with creating the pipeline
let append (estimator : IEstimator<'a>) (pipeline : IEstimator<'b>) =
match pipeline with
| :? IEstimator<ITransformer> as p ->
p.Append estimator
| _ -> failwith "The pipeline has to be an instance of IEstimator<ITransformer>."
let downcastPipeline (pipeline : IEstimator<'a>) =
match pipeline with
| :? IEstimator<ITransformer> as p -> p
| _ -> failwith "The pipeline has to be an instance of IEstimator<ITransformer>."
// STEP 3: Transform your data and add a learner
// Assign numeric values to text in the "Label" column, because only
// numbers can be processed during model training.
// Add a learning algorithm to the pipeline. e.g.(What type of iris is this?)
// Convert the Label back into original text (after converting to number in step 3)
let pipeline =
mlContext.Transforms.Categorical.MapValueToKey("Label")
|> append(mlContext.Transforms.Concatenate("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"))
|> append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(label="Label", features="Features"))
|> append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"))
|> downcastPipeline
let model = pipeline.Fit(trainingDataView)
let prediction =
model.MakePredictionFunction<IrisData, IrisPrediction>(mlContext).Predict(
{
SepalLength = 3.3f
SepalWidth = 1.6f
PetalLength = 0.2f
PetalWidth = 5.1f
Label = ""
}
)
printfn "Predicted flower type is: %s" prediction.PredictedLabel
0 // return an integer exit code
One big difference from the C# code is in the way the pipeline is created. Due to how F# deal with type constraint, it is necessary to do a type check to call the Append
function and to downcast it. In the code above, two functions is created to make the process easier.