2. Important Information
• Schedule
• https://data-anz.azurewebsites.net
• Mobile App
• https://data-anz.sessionize.com/
• Note: session links go live only when the session starts. Please use the Schedule
links to join in advance.
3. Important Information - continued
• Prize Draws
• Draws take place at 03:15 UTC and 08:35 UTC
• You have to be present in the Session to win!
• Join links for the session are in the schedule (https://data-anz.azurewebsites.net)
• Sponsors
• 2 Dedicated time slots for Sponsors: 01:45 UTC and 05:40 UTC
• Drop in for possible 1-on-1 discussions at the other open times
• Join links for the Sponsors sessions are in the schedule (https://data-anz.azurewebsites.net)
8. Machine Learning Workflow
[Diagram] Three stages: Prepare your data, Build & train, Run
• Model creation: datasets and the app/tools for training the ML model produce the ML model
• Model consumption: an end-user app uses the ML model
9. ML.NET
A free, open source, and cross-platform machine learning framework for the .NET developer platform.
• Designed for .NET developers (C#, F#)
• Custom ML made easy
• Extensible: TensorFlow, ONNX & more
• Trusted & proven at scale
http://dot.net/ml
16. ML.NET CLI (Command-Line Interface)
> mlnet auto-train --ml-task binary-classification --dataset "customer-reviews.tsv" --label-column-name Sentiment
• Use the CLI to easily build custom ML models with Automated ML
• Cross-platform (Windows, Linux, macOS)
• Generate code for training and consumption
17. ML.NET Model Builder (Visual Studio)
• A simple UI to easily build custom ML models with Automated ML
• Load from files and databases
• Generate code for training and consumption
• Run everything locally
Download the Visual Studio extension (VSIX): http://aka.ms/mlnetmodelbuilder
21. MLContext
Starting point for all ML.NET operations.
It provides ways to create components for:
• Data preparation
• Feature engineering
• Training
• Prediction
• Model evaluation
• Logging
• Execution control
• Seeding
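The list above can be sketched in a few lines. This is a minimal sketch, assuming the Microsoft.ML NuGet package is referenced; the class name `MLContextSketch` is hypothetical. It shows seeding and logging directly, and the comments name the catalogs that cover the other bullets.

```csharp
using System;
using Microsoft.ML;

public static class MLContextSketch
{
    public static MLContext Create()
    {
        // Seeding: a fixed seed makes operations that rely on randomness repeatable.
        var mlContext = new MLContext(seed: 0);

        // Logging: the Log event reports messages emitted by ML.NET components.
        mlContext.Log += (sender, e) => Console.WriteLine(e.Message);

        // The other areas are reached through catalogs on the context, for example:
        //   mlContext.Data                 (loading and preparing data)
        //   mlContext.Transforms           (feature engineering)
        //   mlContext.BinaryClassification (training and evaluation for one task)
        //   mlContext.Model                (saving and loading models)
        return mlContext;
    }
}
```

A single context created this way is typically shared by all later steps (loading, training, evaluation).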
22. IDataView
Data in ML.NET is represented as IDataView
High-dimensional
Lazy loading and memory efficient
Immutable
23. DataViewSchema
• Data schema of IDataView =
set of columns, their names, types, and other annotations
• Before loading data, you must define how the schema of data will look
(column names & types)
• Use class definitions to define IDataView schemas
[Diagram] Dataset + class definition of schema, loaded as an IDataView
Dataset:
Label            SepalLength  SepalWidth  PetalLength  PetalWidth
Iris-setosa      5.1          3.5         1.4          0.2
Iris-versicolor  7.0          3.2         4.7          1.4
Iris-setosa      4.9          3.0         1.5          0.1
…
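A class definition for the Iris dataset above might look like the following. This is a sketch under assumptions: the column order follows the dataset layout on the slide, the file is assumed to be tab-separated with a header row, and the names `IrisData` and `IrisLoader` are hypothetical. It assumes the Microsoft.ML NuGet package.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Each property becomes a column of the IDataView; LoadColumn gives the
// zero-based index of the field in the source file.
public class IrisData
{
    [LoadColumn(0)] public string Label { get; set; }
    [LoadColumn(1)] public float SepalLength { get; set; }
    [LoadColumn(2)] public float SepalWidth { get; set; }
    [LoadColumn(3)] public float PetalLength { get; set; }
    [LoadColumn(4)] public float PetalWidth { get; set; }
}

public static class IrisLoader
{
    // Assumption: a tab-separated text file with a header row.
    public static IDataView Load(MLContext mlContext, string path)
    {
        return mlContext.Data.LoadFromTextFile<IrisData>(
            path, separatorChar: '\t', hasHeader: true);
    }
}
```

Because loading is lazy, the call returns quickly and rows are read only when something consumes the IDataView.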
24. Data sources
Files
Load data from sources like text, binary, and image files into an IDataView (IDV)
Supports:
• Text: .csv, .tsv, .txt
• Images: .png, .jpg, .bmp
Databases
Load data and train directly from a relational database.
Supports:
• SQL Server, Azure SQL Database, Oracle, SQLite, PostgreSQL, Progress, IBM DB2, and more
Other sources
Load from Enumerable (in-memory collections)
Supports:
• JSON/XML, …
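Loading from an in-memory collection can be sketched as follows. The `Reading` class and `InMemoryLoader` helper are hypothetical names used only for illustration; for JSON or XML, the objects would presumably come from a deserializer first. The sketch assumes the Microsoft.ML NuGet package.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical row type used only for this illustration.
public class Reading
{
    public float Size { get; set; }
    public float Price { get; set; }
}

public static class InMemoryLoader
{
    // Wraps any in-memory sequence of objects as an IDataView.
    public static IDataView Load(MLContext mlContext, IEnumerable<Reading> rows)
    {
        return mlContext.Data.LoadFromEnumerable(rows);
    }

    // Reads the rows back as objects, e.g. to inspect the loaded data.
    public static List<Reading> ToList(MLContext mlContext, IDataView data)
    {
        return mlContext.Data
            .CreateEnumerable<Reading>(data, reuseRowObject: false)
            .ToList();
    }
}
```

The schema is inferred from the public properties of the row type, so no LoadColumn attributes are needed here.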
25. Database Loader
[Diagram] C# training code, which creates an ML model, trains on a relational database:
• SQL Server
• Oracle
• PostgreSQL
• MySQL
• etc.
Scenarios:
• Training directly against relational databases
• Simple coding
• Supports any RDBMS supported by System.Data
• Available from the 1.4-preview release onwards
• Requires the NuGet package System.Data.SqlClient
26. Database Loader sample code
-- SQL: the source table
CREATE TABLE [House]
(
    [HouseId] INT NOT NULL IDENTITY,
    [Size] REAL NOT NULL,
    [NumBed] INT NOT NULL,
    [Price] REAL NOT NULL,
    CONSTRAINT [PK_House] PRIMARY KEY ([HouseId])
);

// C#: requires using System.Data.SqlClient; using Microsoft.ML; using Microsoft.ML.Data;
// Schema class: every numeric column is a float (Real), the type ML.NET algorithms expect.
public class HouseData
{
    public float Size { get; set; }
    public float NumBed { get; set; }
    public float Price { get; set; }
}

MLContext mlContext = new MLContext();
DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<HouseData>();

string connectionString = @"your-connection-string";
// NumBed is an INT in the table, so CAST converts it to REAL in the query.
string sqlCommand = "SELECT Size, CAST(NumBed as REAL) as NumBed, Price FROM House";

DatabaseSource dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, sqlCommand);
IDataView data = loader.Load(dbSource);
27. IEstimator & ITransformer
[Diagram] IEstimator, ITransformer
• Normalization: Min-Max, Binning, Mean variance
• Missing values: Indicate, Replace
• Column mapping: Concatenate, Copy columns, Drop columns
• Type conversion: Convert type, Map value to key, Hash
• Text transforms: Featurize text, Remove stop words, N-grams, Word bags
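The relationship between the two interfaces can be sketched with two of the transforms listed above (Concatenate and Min-Max normalization). This is a minimal sketch, assuming the Microsoft.ML NuGet package and an IDataView with float columns named Size and NumBed, as in the HouseData class on the database loader slide; `PipelineSketch` is a hypothetical name.

```csharp
using Microsoft.ML;

public static class PipelineSketch
{
    public static IDataView Prepare(MLContext mlContext, IDataView data)
    {
        // IEstimator: a description of the operations; nothing is computed yet.
        var estimator = mlContext.Transforms
            .Concatenate("Features", "Size", "NumBed")
            .Append(mlContext.Transforms.NormalizeMinMax("Features"));

        // Fit learns what the operations need from the data (here, the min and
        // max of each feature) and returns an ITransformer.
        ITransformer transformer = estimator.Fit(data);

        // Transform applies the fitted operations and returns a new IDataView;
        // the input view is left unchanged.
        return transformer.Transform(data);
    }
}
```

The same fitted transformer can later be applied to new data, so training and prediction see identical preprocessing.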
32. Problem to solve
We want to build a model that helps identify whether a woman is likely to
develop diabetes, based on historical data from other patients:
• Number of Pregnancies
• Glucose level
• Blood Pressure
• Skin Thickness
• Insulin
• BMI
• Diabetes Pedigree Function
• Age
• Outcome
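One way the problem above could be set up in ML.NET is sketched below. Everything beyond the column list is an assumption: Outcome is treated as the boolean label and the other eight fields as features, the trainer (SDCA logistic regression) is just one possible choice, and the names `PatientData` and `DiabetesTrainer` are hypothetical. The sketch assumes the Microsoft.ML NuGet package.

```csharp
using Microsoft.ML;

// Hypothetical schema class: one property per column listed on the slide.
public class PatientData
{
    public float Pregnancies { get; set; }
    public float Glucose { get; set; }
    public float BloodPressure { get; set; }
    public float SkinThickness { get; set; }
    public float Insulin { get; set; }
    public float Bmi { get; set; }
    public float DiabetesPedigreeFunction { get; set; }
    public float Age { get; set; }
    public bool Outcome { get; set; }   // assumed label: true if diabetes developed
}

public static class DiabetesTrainer
{
    public static ITransformer Train(MLContext mlContext, IDataView trainingData)
    {
        var pipeline = mlContext.Transforms
            // Combine the eight inputs into a single feature vector.
            .Concatenate("Features",
                nameof(PatientData.Pregnancies), nameof(PatientData.Glucose),
                nameof(PatientData.BloodPressure), nameof(PatientData.SkinThickness),
                nameof(PatientData.Insulin), nameof(PatientData.Bmi),
                nameof(PatientData.DiabetesPedigreeFunction), nameof(PatientData.Age))
            // Scale features to a common range.
            .Append(mlContext.Transforms.NormalizeMinMax("Features"))
            // Assumed trainer; other binary classification trainers would also fit.
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: nameof(PatientData.Outcome),
                featureColumnName: "Features"));

        return pipeline.Fit(trainingData);
    }
}
```

The returned model can be evaluated on held-out data, for instance after splitting with `mlContext.Data.TrainTestSplit` and scoring with `mlContext.BinaryClassification.Evaluate`.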
35. Call to Action
Start here http://dot.net/ml
Check the samples http://aka.ms/mlnetsamples
Read the documentation http://aka.ms/mlnetdocs
Contribute or request features http://aka.ms/mlnet
Watch the videos https://aka.ms/mlnetyoutube
Slide 1 – Sponsors
Please call out and thank them, and ask the attendees to pay them a visit (it’s hard for sponsors in a virtual event, so please give them some love)
Slide 2 – Core Information
Remind attendees where they can get the schedule and join links
Slide 3 – More Info
Remind them they need to be in the Prize Draw sessions to win
Remind them the Sponsors have dedicated sessions at specific times (Redgate & Telstra Purple only)
Machine learning is a branch of artificial intelligence.
It can be considered a method of teaching computers to make predictions based on data.
One of its characteristics is that it learns: it performs a semi-automated extraction of knowledge from data.
Data is important. ML always starts from data in order to learn.
The automation comes from an algorithm and a computer doing the job.
It is not fully automated because it requires many smart decisions by a human.
There are tens of thousands of machine learning algorithms.
Hundreds are generated by research every year.
Every machine learning algorithm has three components:
Representation
Evaluation
Optimization
Built for .NET developers
With ML.NET, you can use your existing .NET skills to easily integrate ML into your .NET apps without any prior ML experience.
Custom ML made easy with AutoML
ML.NET offers AutoML and productive tools to help you easily build, train, and deploy high-quality custom ML models.
Extended with TensorFlow & more
ML.NET allows you to leverage other popular ML libraries like Infer.NET, TensorFlow, and ONNX for additional ML scenarios.
Trusted and proven at scale
Use the same ML framework used by recognized Microsoft products like Power BI, Microsoft Defender, Outlook, and Bing.
Classical ML tasks: ML.NET supports many classical machine learning scenarios and tasks, such as classification, regression, time series, and more. ML.NET provides more than 40 trainers (algorithms targeting a specific task), so you can select and fine-tune the specific algorithm that achieves higher accuracy and better solves your ML problem.
Computer Vision: Starting in ML.NET 1.4-Preview, ML.NET also offers image-based training tasks (image classification/recognition) with your own custom images, which uses TensorFlow under the covers for training. Microsoft is working on adding support for object detection training as well.
Thanks to this, computer vision and other deep learning tasks are supported in ML.NET.
You can add intelligence based on deep neural network models to your .NET applications.
Microsoft’s strategy is to integrate low-level libraries and runtimes into ML.NET. ML.NET has therefore been designed as an extensible platform that lets you consume other popular ML frameworks (TensorFlow, ONNX, Infer.NET, and more) and gives you access to even more machine learning scenarios, like image classification, object detection, and more.
Available in PowerShell (Windows) or Bash (macOS, Linux)
ML.NET offers Model Builder (a simple UI tool) and ML.NET CLI to make it super easy to build custom ML Models.
These tools use Automated ML (AutoML), a cutting edge technology that automates the process of building best performing models for your Machine Learning scenario.
All you have to do is load your data, and AutoML takes care of the rest of the model building process.
With ML.NET, you can create custom ML models using C# or F# without having to leave the .NET ecosystem.
ML.NET lets you re-use all the knowledge, skills, code, and libraries you already have as a .NET developer so that you can easily integrate machine learning into your web, mobile, desktop, games, and IoT apps.
IDV = Flexible, efficient way of describing tabular data
The IDataView component provides very efficient, compositional processing of tabular data (columns and rows), made especially for machine learning and advanced analytics applications. It is designed to efficiently handle high-dimensional data and large data sets. It is also suitable for single-node processing of data partitions belonging to larger distributed data sets.
Immutable: you can’t change an IDataView directly. To change the data, you create a new copy rather than modifying the existing view.
Lazy: rows are read as needed, so the data does not all have to be in memory.
https://github.com/dotnet/machinelearning/blob/master/docs/code/IDataViewDesignPrinciples.md
FlowerType is what we’re trying to predict. This becomes the Label property. This Label is the label or target variable. In an IDataView the column names are the names of the properties. The way in which data is loaded is determined by LoadColumn which specifies the index where to find that data point in the file.
https://github.com/dotnet/machinelearning/blob/master/docs/code/IDataViewDesignPrinciples.md
https://github.com/dotnet/machinelearning/blob/master/docs/code/SchemaComprehension.md
Numerical data that is not of type Real has to be converted to Real.
The Real type is represented as a single-precision floating-point value or Single, the input type expected by ML.NET algorithms.
In this sample, the NumBed column is an integer in the database.
Using the CAST built-in function, it's converted to Real.
Estimators
Untrained transformer
Definition of the operations that are to take place: normalization, missing-value handling, column mapping, type conversion
It is the plan of action
Transformers
Component that carries out the transformations defined by the estimators
With ML.NET, the same algorithm can be applied to different tasks. For example, Stochastic Dual Coordinate Ascent (SDCA) can be used for Binary Classification, Multiclass Classification, and Regression. The difference is in how the output of the algorithm is interpreted to match the task.
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-choose-an-ml-net-algorithm
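The point about one algorithm serving several tasks can be illustrated with the SDCA trainers that ML.NET exposes under each task catalog. This is a minimal sketch, assuming the Microsoft.ML NuGet package; the class name `SdcaAcrossTasks` is hypothetical, and the trainers are created with their default column names ("Label" and "Features").

```csharp
using Microsoft.ML;

public static class SdcaAcrossTasks
{
    // Creates an SDCA-based trainer for each task and returns their type names.
    public static string[] TrainerNames(MLContext mlContext)
    {
        // Binary classification: output interpreted as a probability of the positive class.
        var binary = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression();

        // Multiclass classification: output interpreted as one score per class.
        var multiclass = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy();

        // Regression: output interpreted as a numeric value.
        var regression = mlContext.Regression.Trainers.Sdca();

        return new[]
        {
            binary.GetType().Name,
            multiclass.GetType().Name,
            regression.GetType().Name
        };
    }
}
```

Each trainer is an estimator, so it can be appended to a pipeline in the same way regardless of the task.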