BigML.io is a RESTful API for creating and managing BigML resources programmatically. These slides explain how to create, retrieve, update and delete BigML Sources, Datasets, Models, and Predictions.
The document discusses GraphQL and Relay concepts including queries, mutations, fragments, and arguments. It also provides examples of GraphQL queries to fetch user and repository data, including nested and filtered data. Relay concepts like prefetch caching, server data updating, and optimistic updates are briefly mentioned as well.
The document provides an overview of GraphQL and GraphQL clients. It discusses:
- The evolution of APIs from RESTful to GraphQL, which provides a more efficient way to query complex data.
- How GraphQL uses a single endpoint and allows clients to specify exactly the data they need through queries.
- Basic GraphQL queries, including selecting fields, nested fields, arguments, variables, fragments, and mutations.
- GraphQL type definitions that serve as documentation.
- GraphQL clients like Relay that optimize data fetching and caching.
- Tools like GraphiQL that allow testing GraphQL queries in an interactive environment.
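The query features listed above can be sketched in a few lines. Below is a minimal example, assuming a GitHub-style schema (the `user` and `repositories` fields, and the variable names, are illustrative, not taken from the decks): a query with arguments, variables, and a fragment, packaged as the single JSON body a client would POST to the one GraphQL endpoint.

```python
import json

# A GitHub-style schema is assumed for illustration; field names like
# `user` and `repositories` are hypothetical.
query = """
query UserRepos($login: String!, $first: Int!) {
  user(login: $login) {
    ...userFields
    repositories(first: $first) {
      nodes { name stargazerCount }
    }
  }
}
fragment userFields on User {
  name
  email
}
"""

# GraphQL sends the query text and its variables together in one JSON
# body to a single endpoint (e.g. POST /graphql).
payload = json.dumps({"query": query,
                      "variables": {"login": "octocat", "first": 5}})
print(payload[:40])
```

The client specifies exactly the fields it needs; the server returns a JSON object mirroring the query's shape.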
The core Search frameworks in Liferay 7 have been significantly retooled to benefit not only from Liferay's new modular architecture, but also from one of the most innovative players in the market: Elasticsearch, which replaces Lucene as the default search engine in Portal. This session will cover topics like clustering and scalability, unveil improvements (both Elasticsearch and Solr) like aggregations, filters, geolocation, "more like this" and other new query types, and also hot new features for the Enterprise like out-of-the-box Marvel cluster monitoring and Shield security.
André "Arbo" Oliveira joined Liferay in early 2014 as a senior engineer and leads the Search Infrastructure team. He's been writing code for a living for 22 years, 14 of them as a Java developer and architect. Ever since discovering Elasticsearch, he's vowed never to write another SQL WHERE clause again.
1) JSON-LD has seen widespread adoption with over 2 million HTML pages including it and it being a required format for Linked Data platforms.
2) A primary goal of JSON-LD was to let JSON developers work with it much as they would with ordinary JSON, while also providing mechanisms to reshape JSON-LD documents into a deterministic structure for processing.
3) JSON-LD 1.1 includes additional features like using objects to index into collections, scoped contexts, and framing capabilities.
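One of those 1.1 features — using an object to index into a collection — can be sketched with stdlib `json` alone. The context terms and data below are illustrative, and the "expansion" step is done by hand rather than by a real JSON-LD processor:

```python
import json

# A JSON-LD 1.1 document using an @index container: the `post` term maps
# an object keyed by index label onto a collection of nodes.
doc = {
    "@context": {
        "@version": 1.1,
        "schema": "http://schema.org/",
        "post": {"@id": "schema:blogPost", "@container": "@index"}
    },
    "@id": "http://example.org/blog",
    "post": {
        "en": {"schema:headline": "Hello"},
        "de": {"schema:headline": "Hallo"}
    }
}

# A conforming processor, when expanding, folds the index keys away and
# yields the plain collection of nodes; we sketch that reshaping by hand.
expanded_posts = [node for key, node in sorted(doc["post"].items())]
print(json.dumps(expanded_posts))
```

The point of the container is authoring convenience: the JSON stays keyed the way the application wants, while processors still recover a deterministic collection.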
Audio available: https://www.liferay.com/web/events-symposium-north-america/recap
Liferay makes it easy to integrate your application with powerful search engines. However, it may be hard to diagnose why your most important content isn't showing up the way you need it to. This session will recap the key concepts for indexing and querying with Liferay Search, and present a number of techniques to guarantee your documents will be found with best possible relevance.
André de Oliveira joined Liferay in early 2014 as a senior engineer and leads the Search Infrastructure team. He's been a Java developer and architect for the last 15 years. Ever since discovering Elasticsearch, he's vowed never to write another SQL WHERE clause again.
JSON-LD 1.1 is being developed to address issues and feature requests since JSON-LD 1.0 was published over 3 years ago. Key changes in JSON-LD 1.1 include allowing objects to index into collections, framing datasets instead of just graphs, scoped contexts, and improved compact IRIs. The timeline suggests community drafts will be completed in Q4 2017 with the goal of a Working Group producing a Recommendation. Related topics that can use JSON-LD like Shape Expressions, Decentralized Identifiers, Linked Data Signatures, and Verifiable Claims were also discussed.
Media owners are turning to MongoDB to drive social interaction with their published content. The way customers consume information has changed and passive communication is no longer enough. They want to comment, share and engage with publishers and their community through a range of media types and via multiple channels whenever and wherever they are. There are serious challenges with taking this semi-structured and unstructured data and making it work in a traditional relational database. This webinar looks at how MongoDB's schemaless design and document orientation give organisations like the Guardian the flexibility to aggregate social content and scale out.
Tabular data represents a large amount of published data on the web. The W3C CSV on the Web working group aims to improve interoperability of CSV and similar tabular formats by developing specifications for metadata, data models, and conversions between CSV, JSON, and RDF. The specifications define methods for parsing CSV into structured models, associating CSV with metadata to provide data typing and relationships, and converting between tabular and other common data formats.
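The metadata-driven part of that model can be sketched with the stdlib `csv` module. The column metadata below is a hypothetical, minimal subset of CSVW-style annotations — just a name and a datatype per column — used to turn raw CSV cells into typed values, which can then be serialized to JSON (one of the conversion targets the group specifies):

```python
import csv
import io
import json

# Hypothetical, minimal slice of CSVW-style column metadata: each column
# gets a name and a datatype that drives how its cells are parsed.
metadata = {"columns": [
    {"name": "city", "datatype": "string"},
    {"name": "population", "datatype": "integer"},
]}

raw = "city,population\nParis,2148000\nLyon,513000\n"
casts = {"string": str, "integer": int}

rows = []
for record in csv.DictReader(io.StringIO(raw)):
    typed = {col["name"]: casts[col["datatype"]](record[col["name"]])
             for col in metadata["columns"]}
    rows.append(typed)

# The typed rows now carry real integers, not strings, and round-trip to JSON.
print(json.dumps(rows))
```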
Google Code Search was a code search engine that indexed open source code from various sources online. It allowed programmers to search code using regular expressions, keywords, and other metadata tags. However, Google discontinued the service in 2013. Popular alternatives to Google Code for hosting and searching code include GitHub, Bitbucket, and CodePlex. These services provide version control, code review, issue tracking, and other collaboration features for developers.
This document discusses using Elasticsearch for social media analytics and provides examples of common tasks. It introduces Elasticsearch basics like installation, indexing documents, and searching. It also covers more advanced topics like mapping types, facets for aggregations, analyzers, nested and parent/child relations between documents. The document concludes with recommendations on data design, suggesting indexing strategies for different use cases like per user, single index, or partitioning by time range.
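A search-plus-aggregation request of the kind described can be sketched as a plain query-DSL body. Note a hedge: the talk predates the facets-to-aggregations transition, and the shape below follows the newer aggregation DSL; the index and field names (`text`, `posted_at`) are assumptions for a social-media corpus, not taken from the deck.

```python
import json

# Hypothetical field names for a social-media corpus. This body would be
# POSTed to e.g. /tweets/_search; here we only build and inspect the JSON.
search_body = {
    # full-text match on the post body
    "query": {"match": {"text": "elasticsearch"}},
    # bucket matching posts per day (aggregations superseded facets)
    "aggs": {
        "mentions_over_time": {
            "date_histogram": {"field": "posted_at",
                               "calendar_interval": "day"}
        }
    },
    "size": 10,
}

print(json.dumps(search_body, indent=2)[:60])
```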
Presented on 10/11/12 at the Boston Elasticsearch meetup held at the Microsoft New England Research & Development Center. This talk gave a very high-level overview of Elasticsearch to newcomers and explained why ES is a good fit for Traackr's use case.
Building Client-side Search Applications with Solr (lucenerevolution)
Presented by Daniel Beach, Search Application Developer, OpenSource Connections
Solr is a powerful search engine, but creating a custom user interface can be daunting. In this fast paced session I will present an overview of how to implement a client-side search application using Solr. Using open-source frameworks like SpyGlass (to be released in September) can be a powerful way to jumpstart your development by giving you out-of-the box results views with support for faceting, autocomplete, and detail views. During this talk I will also demonstrate how we have built and deployed lightweight applications that are able to be performant under large user loads, with minimal server resources.
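The client-side pattern boils down to building requests against Solr's `/select` handler and consuming JSON responses. A minimal sketch of that URL construction follows; the host, port, and collection name are assumptions, while the parameter names (`q`, `wt`, `facet`, `facet.field`, `rows`) are standard Solr query parameters:

```python
from urllib.parse import urlencode

# Hypothetical Solr host and collection; /select is the standard
# search request handler.
base = "http://localhost:8983/solr/products/select"
params = {
    "q": "title:laptop",   # the user's query
    "wt": "json",          # JSON responses are easy to consume client-side
    "facet": "true",
    "facet.field": "brand",  # facet counts for a results sidebar
    "rows": 10,
}
url = base + "?" + urlencode(params)
print(url)
```

A browser-side app would fetch this URL (or proxy it) and render the `response.docs` and `facet_counts` sections of the JSON it gets back.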
The document discusses MongoDB transactions and concurrency. It provides code examples of how to perform transactions in MongoDB using logical sessions, including inserting a document into a collection and updating related documents in another collection atomically. It also discusses some of the features and timeline for implementing distributed transactions in sharded MongoDB clusters.
MongoDB World 2018: Building Intelligent Apps with MongoDB & Google Cloud (MongoDB)
Building intelligent apps involves combining real-time analytics, machine learning, and artificial intelligence to provide personalized recommendations and automate tasks for customers. Developers can use MongoDB and Google Cloud to build intelligent apps in 3 steps: 1) create a base ecommerce app, 2) add a recommendation engine using machine learning, and 3) enable shopping via chat with artificial intelligence. This brings data scientists and developers together to create applications that understand and assist customers.
Norberto Leite gives an introduction to MongoDB. He discusses that MongoDB is a document database that is open source, high performance, and horizontally scalable. He demonstrates how to install MongoDB, insert documents into collections, query documents, and update documents. Leite emphasizes that MongoDB allows for flexible schema design and the ability to evolve schemas over time to match application needs.
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia... (Ícaro Medeiros)
In my talk I walk through Semantic Web initiatives, like RDF and SPARQL, and linked data principles, discuss some implementation and adoption issues, and talk about semantic annotation in HTML. Semantic annotation using the Schema.org vocabulary is demonstrated using both HTML5 Microdata and JSON-LD input. The benefits seen in Google search results with Rich Snippets, Actions in Email, and Google Now are strongly highlighted with real examples.
This document proposes a content model and API to unify access to different types of content like wikis, RDF, binaries, and more. It aims to be used in projects like NEPOMUK, WAVES, and WIF. The model represents content at different levels of granularity from words to documents. Content can be annotated with semantic statements and metadata. All content is addressable and versioned. The API provides functions for basic CRUD operations as well as fulltext search and auto-completion support through a keyword index.
On Tuesday 18th March, the MongoDB team held an online Cloud Workshop in place of the in-person event which was planned.
Attendees learnt how to build modern, event driven applications powered by MongoDB Atlas in Google Cloud Platform (GCP) and were shown relevant operational and security best practices, to get started immediately with their own digital transformations.
MongoDB Europe 2016 - MongoDB 3.4 preview and introduction to MongoDB Atlas (MongoDB)
This document contains notes from a MongoDB conference presentation focused on improvements, extensions, and innovations to MongoDB. Key topics discussed include improvements to Wired Tiger storage engine and replica set election processes, extensions like document validation and $lookup features, and innovations like aggregation pipeline improvements and mixed storage engine sets. Demos were given on Compass UI tool and Atlas cloud database service. The presentation emphasized ongoing work to improve performance and capabilities, extend MongoDB to new domains, and innovate with features like zones and cloud-native services.
Started from the Bottom: Exploiting Data Sources to Uncover ATT&CK Behaviors (JamieWilliams130)
The document discusses enhancing ATT&CK data sources by developing data models. It proposes opportunities like addressing lack of context, redundancy, and broad scope in ATT&CK data sources. It then describes a process to model adversary behavior like COR_PROFILER using relevant data sources and fields. This includes initial detection modeling, adversary simulation, and defining a detection model to validate whether relevant events are detected. Proper data modeling and mapping data sources to their elements can help identify coverage and gaps to enhance ATT&CK.
This document provides an overview of Spring Data and its support for MongoDB. Spring Data provides common repositories and abstraction for data access across NoSQL and SQL databases. It includes the MongoRepository interface which provides basic CRUD functionality for MongoDB. Custom queries can be written for MongoDB through the MongoRepository interface. Spring Data also includes the MongoTemplate class which provides a template-based API for MongoDB similar to its native driver.
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle... (Rothamsted Research, UK)
Workshop within the Integrative Bioinformatics Conference (IB2018, Harpenden, 2018).
We describe how to use Semantic Web Technologies and graph databases like Neo4j to serve life science data and address the FAIR data principles.
MongoDB Launchpad 2016: What's New in the 3.4 Server (MongoDB)
Asya Kamsky, a lead product manager at MongoDB, discussed improvements, extensions, and innovations in MongoDB. These included improvements to the Wired Tiger storage engine, replica set election process, and initial sync process. MongoDB was also extended with features like document validation, partial indexes, $lookup, read-only views, and faceted search. Innovations involved improvements to the aggregation pipeline, mixed storage engine sets, zones, and BI connectors.
Raiding the MongoDB Toolbox with Jeremy Mikola (MongoDB)
This document provides an overview and examples of various MongoDB tools and techniques, including full-text indexing, geospatial queries, data aggregation, creating a job queue, and using tailable cursors. It demonstrates how to create indexes, perform searches, aggregate data, select and process jobs asynchronously, and consume the oplog in real-time applications. The document is intended to help readers explore the MongoDB toolbox.
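The job-queue technique mentioned above hinges on an atomic find-and-modify: one worker claims a pending job and flips its status in a single operation so no other worker can grab it. To keep this sketch runnable without a server, only the filter, update, and sort documents are built here (the job-document field names are hypothetical); with a live deployment you would hand them to `collection.find_one_and_update(...)`:

```python
import datetime

# Hypothetical job-document fields. Passed to find_one_and_update, the
# filter+update pair claims exactly one pending job atomically.
claim_filter = {"status": "pending"}
claim_update = {
    "$set": {
        "status": "running",
        "claimed_at": datetime.datetime(2020, 1, 1, 12, 0, 0),
    }
}
# Highest priority first, then oldest first among equals.
sort_spec = [("priority", -1), ("created_at", 1)]

print(claim_filter, claim_update["$set"]["status"])
```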
Understanding N1QL Optimizer to Tune Queries (Keshav Murthy)
Every flight has a flight plan. Every query has a query plan. You must have seen its text form, called EXPLAIN PLAN. Query optimizer is responsible for creating this query plan for every query, and it tries to create an optimal plan for every query. In Couchbase, the query optimizer has to choose the most optimal index for the query, decide on the predicates to push down to index scans, create appropriate spans (scan ranges) for each index, understand the sort (ORDER BY) and pagination (OFFSET, LIMIT) requirements, and create the plan accordingly. When you think there is a better plan, you can hint the optimizer with USE INDEX. This talk will teach you how the optimizer selects the indices, index scan methods, and joins. It will teach you the analysis of the optimizer behavior using EXPLAIN plan and how to change the choices optimizer makes.
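The hint-and-verify loop described above can be sketched as plain query text. The bucket and index names below are assumptions (Couchbase's `travel-sample` sample bucket is used for flavor), while `USE INDEX (... USING GSI)` and the `EXPLAIN` prefix are standard N1QL syntax:

```python
# Hypothetical index name on an assumed sample bucket; USE INDEX is the
# standard N1QL optimizer hint.
statement = """
SELECT t.destinationairport, t.sourceairport
FROM `travel-sample` AS t
USE INDEX (idx_route_src USING GSI)
WHERE t.sourceairport = "SFO"
ORDER BY t.destinationairport
LIMIT 10 OFFSET 0
"""

# Prefixing with EXPLAIN returns the query plan instead of rows, which is
# how you check whether the optimizer honored the hint and what spans it built.
explain = "EXPLAIN " + statement.strip()
print(explain.splitlines()[0])
```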
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO (Chris Mungall)
NOTE THAT I HAVE MOVED AWAY FROM SLIDESHARE TO ZENODO
The identical presentation is now here:
https://doi.org/10.5281/zenodo.7778641
General introduction to LinkML, The Linked Data Modeling Language.
Adapted from a presentation given to NIH in May 2022
https://linkml.io/linkml
Using Spring Data and MongoDB with Cloud Foundry (Chris Harris)
- The document discusses using Spring and MongoDB with Cloud Foundry. It covers challenges with data access like scaling horizontally and heterogeneous data needs.
- Spring Framework provides data access support for MongoDB through Spring Data. It includes APIs, object mapping, and generic repositories that improve productivity.
- Spring Data for MongoDB includes MongoTemplate for direct access, converters for mapping documents to POJOs, and MongoRepository for common CRUD operations. Examples demonstrate basic usage.
- The document shows how to integrate MongoDB documents with JPA entities for cross-store domain models and provides an example of saving to MongoDB via Spring on Cloud Foundry.
The document discusses best practices for crafting evolvable API responses. It advocates taking back control of representations by thinking of responses as messages rather than objects. This allows APIs to build payloads with just enough data to solve the problem and survive changes over time. The document explores using attribute groups, links, and established formats like HAL and JSON-LD to build representations that are minimal yet provide essential context.
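A response treated as a message rather than an object can be sketched as a minimal HAL-style payload: a small attribute group plus a `_links` section so clients navigate by relation instead of hardcoded URLs. The resource fields and link relations below are illustrative, not taken from the deck:

```python
import json

# A minimal HAL-style message: just enough attributes to solve the
# client's problem, plus links for navigation. Fields are illustrative.
order = {
    "total": 42.50,
    "status": "shipped",
    "_links": {
        "self": {"href": "/orders/523"},
        "customer": {"href": "/customers/87"},
    },
}

body = json.dumps(order)
print(body)
```

Because clients follow the `customer` relation rather than constructing `/customers/{id}` themselves, the server can later move or reshape that resource without breaking them — the evolvability the talk argues for.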
Garbage collection has largely removed the need to think about memory management when you write Java code, but there is still a benefit to understanding and minimizing the memory usage of your applications, particularly with the growing number of deployments of Java on embedded devices. This session gives you insight into the memory used as you write Java code and provides you with guidance on steps you can take to minimize your memory usage and write more-memory-efficient code. It shows you how to
• Understand the memory usage of Java code
• Minimize the creation of new Java objects
• Use the right Java collections in your application
• Identify inefficiencies in your code and remove them
Video available from Parleys.com:
https://www.parleys.com/talk/how-write-memory-efficient-java-code
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery (Chris Schalk)
This document introduces several new Google cloud technologies: Google Storage for storing data in Google's cloud, the Prediction API for machine learning and predictive analytics, and BigQuery for interactive analysis of large datasets. It provides overviews and examples of using each service, highlighting their capabilities for scalable data storage, predictive modeling, and fast querying of massive amounts of data.
This document provides an introduction and overview of Google App Engine and developing applications with Python on the platform. It discusses what App Engine is, who uses it, how much it costs, recommended development tools and frameworks, and some of the key services provided like the datastore, blobstore, task queues, and URL fetch. It also notes some limitations of App Engine and alternatives to running your own version of the platform.
Amazon Elasticsearch Service Deep Dive - AWS Online Tech Talks (Amazon Web Services)
Learning Objectives:
- Learn how to configure a secure, petabyte-scale Amazon ES cluster and ingest data into it
- Learn how to build Kibana dashboards to analyze and visualize your data in Amazon ES
- Take away best practices to make your cluster reliable, take backups, and debug slow-running queries and indexing operations
Introduction to Google Cloud platform technologies (Chris Schalk)
This is a presentation given by Google Developer Advocate Chris Schalk at Spring One 2GX on Oct 21st, 2010. It introduces Google Storage for Developers, Prediction API, and BigQuery.
The document discusses Google's big data and machine learning cloud products and services including:
- BigQuery for querying large datasets using SQL.
- Cloud Dataflow for parallel processing of batch and streaming data.
- Cloud ML for machine learning tasks like neural networks.
It provides demonstrations of these services using various datasets and examples of analyzing Wikipedia data and movie recommendations. The document also covers Google's vision, speech, and natural language APIs for tasks like image labeling, text transcription, and sentiment analysis.
LinkML is a modeling language for building semantic models that can be used to represent biomedical and other scientific knowledge. It allows generating various schemas and representations like OWL, JSON Schema, GraphQL from a single semantic model specification. The key advantages of LinkML include simplicity through YAML files, ability to represent models in multiple forms like JSON, RDF, and property graphs, and "stealth semantics" where semantic representations like RDF are generated behind the scenes.
Explaining the Rise of JSON-LD (machine readable JS data). Why its important and how to make sure your website has enabled…
future action buttons.
* Recent changes & examples in the wild
* Live demo of Googles mark-up validator
* GTM config files to take away & enable.
This document provides information about a MongoDB class taught by Alexandre Bergere. The class covers topics including Big Data, NoSQL, MongoDB architecture and modeling, CRUD operations, replication, security, and aggregation. It includes Alexandre's background and credentials, as well as sources and use cases for MongoDB.
The document provides an overview of a 7-week MongoDB course. It includes the course syllabus which covers topics such as CRUD operations, schema design, performance, aggregation framework, application engineering, and case studies. Key concepts taught in Week 1 include what MongoDB is, how it differs from relational databases, and how to install and use MongoDB with Python and the Bottle framework. The document also provides an example of building a "Hello World" app with MongoDB.
This document discusses tools for finding source code on the web and their usage scenarios. It introduces CodeGenie, an Eclipse plugin that searches for code examples given test cases. SAS searches for API usage examples in large code repositories given queries. Koders' Eclipse plugin searches for code examples given a method signature. Koders, Google Code Search, Krugle and Sourcerer can find examples given keywords. Exemplar searches open source projects to find relevant code examples and API usage. The document demonstrates these tools and asks for feedback in a survey.
Prairie DevCon 2015 - Crafting Evolvable API Responsesdarrelmiller71
Web frameworks help you build an API quickly but most have little support for dealing with an API that needs to evolve, forcing you to prematurely version your API. But many industry professionals are telling us not to version. How can we avoid it? Take back control of the content you send over the wire. API responses are the "user interface" of your API and should be crafted with same attention to detail that cause designers to fret over color choices, shadows and highlights. In this talk I’ll show techniques that can be used to build responses that are easier to evolve and highlight the types of practices that encourage breaking changes and force you to version your API.
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...Edureka!
The free webinar on Python titled "Mastering Python - An Excellent tool for Web Scraping and Data Analysis" was conducted by Edureka on 14th November 2014
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
UiPath Test Automation using UiPath Test Suite series, part 5
BigML.io - The BigML API
1. BigML.io: The BigML API
October 12, 2012
BigML Inc BigML.io: The BigML API October 12, 2012 1 / 66
2. 1 Introduction
2 BigML Resources
3 Sources
4 Datasets
5 Models
6 Predictions
7 Evaluations
8 Bindings
9 Final Remarks
3. BigML.io: Base URL
Base URL
https://bigml.io
A RESTful API for creating and managing BigML resources
programmatically.
All accesses are performed over HTTPS.
4. BigML.io: Development Mode
Dev Mode
https://bigml.io/dev/
No credits are charged.
Limited to 1MB per resource but unlimited in the number of resources.
5. BigML.io: Version
Version
https://bigml.io/andromeda/
BigML.io's first version is named andromeda.
If you omit the version name in your API requests, you will get access to
the latest API version.
6. BigML.io: Authentication
Authentication
BIGML_USERNAME=alfred
BIGML_API_KEY=62270d2ad14eba4e349432e80d749342de5550a4
BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY"
All accesses to BigML.io must be authenticated.
Authentication is performed by including your username and your BigML
API key in every request.
If you use an environment variable (e.g. BIGML_AUTH), you can keep your
credentials out of your source code.
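As a quick sanity check, the three variables above compose into the query string that every call appends (alfred and the key below are the placeholder credentials from this slide, not real ones):

```shell
# Placeholder credentials from the slide -- substitute your own.
BIGML_USERNAME=alfred
BIGML_API_KEY=62270d2ad14eba4e349432e80d749342de5550a4
BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY"

# Every request simply appends the credentials as query-string parameters:
url="https://bigml.io/source?$BIGML_AUTH"
echo "$url"
```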
7. BigML Resources
Source A source is a file containing the raw data that you want to
use to create a predictive model.
Dataset A dataset is a structured version of a data source where
each column has been assigned a type.
Model A model is created using a dataset as input, selecting which
fields to use as input and which field will be the objective.
Prediction A prediction is created using a model and the new
instance that you want to classify as input.
8. BigML.io: Source
Create a New Source
sepal length,sepal width,petal length,petal width,species
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
5.8,2.7,5.1,1.9,Iris-virginica
A source is the raw data that you want to use to create a predictive
model.
A source is usually a (big) file in tabular format.
Each column in the file represents a feature or field.
By default, the last column represents the class or objective field.
The file may have a first row or header with a name for each field.
9. BigML.io: Source
Source Base URL
https://bigml.io/source
Sources can be created from several kinds of data:
Local files
Remote data accessed via HTTP or HTTPS
Files in S3 buckets
Blobs in Windows Azure storage
Inline data contained in the datasource creation request
Data must be in tabular format, cannot be bigger than 64GB, and
can be compressed (.Z or .gz, but not .zip)
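Since .gz is accepted but .zip is not, a minimal sketch of preparing a compressed upload (tiny.csv is a made-up file; the final curl is the same file post shown on the next slide):

```shell
# Create a toy CSV and gzip it; BigML accepts .Z and .gz, but not .zip.
printf 'a,b\n1,2\n3,4\n' > tiny.csv
gzip -f tiny.csv            # replaces tiny.csv with tiny.csv.gz
ls tiny.csv.gz

# The upload itself would then be the usual multipart file post:
#   curl "https://bigml.io/source?$BIGML_AUTH" -F file=@tiny.csv.gz
```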
10. BigML.io: Creating a Source using a local file
Creating a Source
curl https://bigml.io/source?$BIGML_AUTH -F file=@iris.csv
The file must be attached in the post as a file upload
The Content-Type in your HTTP request must be
multipart/form-data, as specified by RFC2388.
11. BigML.io: Creating a Source using a remote URL
Creating a Remote Source
curl https://bigml.io/source?$BIGML_AUTH
-X "POST"
-H "content-type: application/json"
-d '{"remote": "https://static.bigml.com/csv/iris.csv"}'
The Content-Type in your HTTP request must be application/json.
URLs can be HTTP or HTTPS with realm authentication, public or
private Amazon S3, or Windows Azure files.
12. BigML.io: Creating a Source using inline data
Creating an Inline Source
curl https://bigml.io/source?$BIGML_AUTH
-X "POST"
-H "content-type: application/json"
-d '{"data": "a,b,c,d\n1,2,3,4\n5,6,7,8"}'
The Content-Type in your HTTP request must be application/json.
Source data is included in the JSON body as a string with key
“data”.
Maximum size of inline sources is 10MB.
14. BigML.io: Source Arguments
One of (required) Type Description
file multipart form data File.
remote String URL of the remote source.
data String Inline data in tabular format.
Optional Type Description
category Integer The category that best describes the data.
description String A description of the source of up to 8192 characters.
name String The name you want to give to the new source.
private Boolean Whether you want your source to be private or not.
source parser Object Set of parameters to parse the source.
tags List A list of strings that help classify and index your source.
Table: Source Arguments
15. BigML.io: Creating a Source with args
Creating a Source with args
curl https://bigml.io/source?$BIGML_AUTH
-X "POST"
-H "content-type: application/json"
-d '{"remote": "https://static.bigml.com/csv/iris.csv", "name": "iris"}'
17. BigML.io: Updating a Source
Updating a Source
curl https://bigml.io/source/4f64191d03ce89860a000000?$BIGML_AUTH
-X PUT
-H 'content-type: application/json'
-d '{"name": "a new name", "source_parser": {"locale": "es-ES"}}'
18. BigML.io: Deleting a Source
Deleting a Source
curl "https://bigml.io/source/4f603fe203ce89bb2d000000?$BIGML_AUTH"
-X DELETE
Response HTTP/1.1 204 NO CONTENT
19. BigML.io: Retrieving a Source
Retrieving a Source via BigML.io
curl
"https://bigml.io/source/4eee50b90a590f7d5c000008?$BIGML_AUTH"
Visualizing a Source via BigML.com
https://bigml.com/dashboard/source/4eee50b90a590f7d5c000008
20. BigML.io: Source Properties
property type filterable sortable updatable
category Integer yes yes yes
code Integer no no no
content type String yes yes no
created Datetime yes yes no
credits Float yes yes no
description String yes yes yes
fields Object no no no
file name String yes yes no
md5 String no no no
name String yes yes yes
number of datasets Integer yes yes no
number of models Integer yes yes no
number of predictions Integer yes yes no
private Boolean yes yes yes
resource String no no no
rows Integer yes yes no
size Integer yes yes no
source String yes yes no
source status String yes yes no
status Object no no no
tags List yes yes yes
updated Datetime yes yes no
Table: Source Properties
21. BigML.io: Listing Sources
Listing Sources
curl "https://bigml.io/source?limit=10;offset=10;$BIGML_AUTH"
limit Specifies the number of sources to retrieve. Must be less
than or equal to 200.
offset The position in the whole source list at which the
retrieved source list will start.
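To page through more than one batch of sources, the offset advances by the limit on each call; a sketch of the URLs such a loop would request (the $BIGML_AUTH suffix is left literal here as a placeholder):

```shell
# Walk the source list 10 at a time; each URL would be passed to curl
# exactly as in the listing example above.
limit=10
for offset in 0 10 20; do
  echo "https://bigml.io/source?limit=$limit;offset=$offset;\$BIGML_AUTH"
done
```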
23. BigML.io: Filtering Sources
Retrieving sources bigger than 1 MB
curl "https://bigml.io/source?size_gt=1048576;$BIGML_AUTH"
Filter Description
lt Less than
lte Less than or equal to
gt Greater than
gte Greater than or equal to
Table: Filtering Arguments
24. BigML.io: Sorting Sources
Sorting sources by size
curl "https://bigml.io/source?order_by=-size;$BIGML_AUTH"
order by Specifies the order of the sources to retrieve. Must be one
of the sortable fields. If you prefix the field name with “-”,
the order will be descending.
25. BigML.io: Dataset
Dataset Base URL
https://bigml.io/dataset
A dataset is a structured version of a source where each field has
been processed and serialized according to its type.
A field can be numeric or categorical.
Datetime and text fields are coming down the pike.
26. BigML.io: Create a New Dataset
Create a New Dataset
curl "https://bigml.io/andromeda/dataset?$BIGML_AUTH"
-X POST
-H 'content-type: application/json'
-d '{"source": "/source/4ee5761c80e1c664f1000000"}'
28. BigML.io: Dataset Arguments
Required Type Description
source String Valid source/id
Optional Type Description
category Integer The category that best describes the dataset.
description String A description of the dataset of up to 8192 characters.
fields Object The fields that you want to use to create the dataset.
name String Name of the dataset.
private Boolean Whether you want your dataset to be private or not.
size Integer Maximum number of bytes to process.
tags List A list of strings that help classify and index your dataset.
Table: Dataset Arguments
29. BigML.io: Creating a Dataset with args
Creating a Dataset with args
curl "https://bigml.io/dataset?$BIGML_AUTH"
-X POST
-H 'content-type: application/json'
-d '{"source": "/source/4ee5761c80e1c664f1000000", "name": "my dataset"}'
30. BigML.io: Updating a Dataset
Updating a Dataset
curl https://bigml.io/dataset/4f66a0b903ce8940c5000000?$BIGML_AUTH
-X PUT
-H 'content-type: application/json'
-d '{"name": "a new name"}'
31. BigML.io: Deleting a Dataset
Deleting a Dataset
curl "https://bigml.io/dataset/4f66a0b903ce8940c5000000?$BIGML_AUTH"
-X DELETE
Response HTTP/1.1 204 NO CONTENT
32. BigML.io: Retrieving a Dataset
Retrieving a Dataset via BigML.io
curl "https://bigml.io/dataset/4f66a0b903ce8940c5000000?$BIGML_AUTH"
Retrieving a Dataset via BigML.com
https://bigml.com/dashboard/dataset/4f66a0b903ce8940c5000000
33. BigML.io: Dataset Properties
property type filterable sortable updatable
category Integer yes yes yes
code Integer no no no
columns Integer yes yes no
created Datetime yes yes no
credits Float yes yes no
description String yes yes yes
fields Object no no no
locale String no no no
name String yes yes yes
number of models Integer yes yes no
number of predictions Integer yes yes no
private Boolean yes yes yes
resource String no no no
rows Integer yes yes no
size Integer yes yes no
source String yes yes no
source status Boolean yes yes no
status Object no no no
tags List yes yes yes
updated Datetime yes yes no
Table: Dataset Properties
34. BigML.io: Listing Datasets
Listing Datasets
curl "https://bigml.io/dataset?limit=10;offset=10;$BIGML_AUTH"
limit The total number of datasets to retrieve (≤ 200).
offset The offset at which the dataset listing will start.
36. BigML.io: Filtering Datasets
Retrieving datasets bigger than 1 MB
curl "https://bigml.io/dataset?size_gt=1048576;$BIGML_AUTH"
Filter Description
lt Less than
lte Less than or equal to
gt Greater than
gte Greater than or equal to
Table: Filtering Arguments
37. BigML.io: Sorting Datasets
Sorting datasets by size
curl "https://bigml.io/dataset?order_by=-size;$BIGML_AUTH"
order by Specifies the order of the datasets to retrieve. Must be one
of the sortable fields. If you prefix the field name with “-”,
they will be given in descending order.
38. BigML.io: Model
Model Base URL
https://bigml.io/model
A model is a tree-like representation of your dataset with
predictive power.
You can create a model by selecting which fields from your dataset
you want to use as input fields (or predictors) and which field you
want to predict: the objective field.
39. BigML.io: Create a New Model
Create a New Model
curl https://bigml.io/model?$BIGML_AUTH
-X POST
-H 'content-type: application/json'
-d '{"dataset": "dataset/4f66a80803ce8940c5000006"}'
40. New Model
{
  "category": 0,
  "code": 201,
  "columns": 5,
  "created": "2012-05-25T07:13:07.243623",
  "credits": 0.03515625,
  "dataset": "dataset/4f66a80803ce8940c5000006",
  "dataset_status": true,
  "description": "",
  "holdout": 0.0,
  "input_fields": [],
  "locale": "en_US",
  "max_columns": 5,
  "max_rows": 150,
  "name": "iris' dataset model",
  "number_of_predictions": 0,
  "objective_fields": [],
  "private": true,
  "range": [1, 150],
  "resource": "model/4f67c0ee03ce89c74a000006",
  "rows": 150,
  "size": 4608,
  "source": "source/4f665b8103ce8920bb000006",
  "source_status": true,
  "status": {
    "code": 1,
    "message": "The model is being processed and will be created soon"
  },
  "tags": [],
  "updated": "2012-05-25T07:13:07.243658"
}
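The resource field of a response like the one above is what the next call needs (a prediction is created from this model/id). A dependency-free way to pull it out of a saved response, using a response body abridged from this slide:

```shell
# Abridged response; in practice this file would come from something like
#   curl "https://bigml.io/model?$BIGML_AUTH" ... -o response.json
cat > response.json <<'EOF'
{"resource": "model/4f67c0ee03ce89c74a000006", "status": {"code": 1}}
EOF

# Extract the new model's id so it can be fed to the prediction call.
model_id=$(sed -n 's/.*"resource": "\([^"]*\)".*/\1/p' response.json)
echo "$model_id"
```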
41. BigML.io: Model Arguments
Required Type Description
dataset String Valid dataset/id
Optional Type Description
category Integer The category that best describes the model.
description String A description of the model of up to 8192 characters.
input fields List The fields that you want to use to create the model.
name String Name of the model.
objective fields List The field that you want to predict.
private Boolean Whether you want your model to be private or not.
range List The range of successive instances to build the model.
tags List A list of strings that help classify your model.
Table: Model Arguments
42. BigML.io: Creating a Model with args
Creating a Model with args
curl https://bigml.io/andromeda/model?$BIGML_AUTH
-X POST
-H 'content-type: application/json'
-d '{"dataset": "dataset/4f66a80803ce8940c5000006", "input_fields": ["000001", "000003"]}'
43. BigML.io: Updating a Model
Updating a Model
curl https://bigml.io/model/4f67c0ee03ce89c74a000006?$BIGML_AUTH
-X PUT
-H 'content-type: application/json'
-d '{"name": "a new name"}'
44. BigML.io: Deleting a Model
Deleting a Model
curl "https://bigml.io/model/4f67c0ee03ce89c74a000006?$BIGML_AUTH"
-X DELETE
Response HTTP/1.1 204 NO CONTENT
45. BigML.io: Retrieving a Model
Retrieving a Model via BigML.io
curl "https://bigml.io/model/4f66a80803ce8940c5000006?$BIGML_AUTH"
Retrieving a Model via BigML.com
https://bigml.com/dashboard/model/4f66a80803ce8940c5000006
46. BigML.io: Model Properties
property type filterable sortable updatable
category Integer yes yes yes
code Integer no no no
columns Integer yes yes no
created Datetime yes yes no
credits Float yes yes no
dataset String yes yes no
dataset status Boolean yes yes no
description String yes yes yes
input fields Object no no no
locale String no no no
max columns Integer yes yes no
max rows Integer yes yes no
model Object no no no
name String yes yes yes
number of predictions Integer yes yes no
objective fields List no no no
private Boolean yes yes yes
range List no no no
resource String no no no
size Integer yes yes no
statistical pruning Boolean yes yes no
status Object no no no
tags List yes yes yes
updated Datetime yes yes no
Table: Model Properties
47. BigML.io: Listing Models
Listing Models
curl "https://bigml.io/model?limit=10;offset=10;$BIGML_AUTH"
limit The number of models to retrieve (≤ 200).
offset The offset at which the model listing will start off.
49. BigML.io: Filtering Models
Retrieving models bigger than 1 MB
curl "https://bigml.io/model?size_gt=1048576;$BIGML_AUTH"
Filter Description
lt Less than
lte Less than or equal to
gt Greater than
gte Greater than or equal to
Table: Filtering Arguments
50. BigML.io: Sorting Models
Sorting models by size
curl "https://bigml.io/model?order_by=-size;$BIGML_AUTH"
order by Specifies the order of the models to retrieve. Must be one
of the sortable fields. If you prefix the field name with “-”,
they will be given in descending order.
51. BigML.io: Prediction
Prediction Base URL
https://bigml.io/prediction
A prediction is created using a model/id and the properties of the
new instance (input data) for which you wish to create a prediction.
To create a new prediction, BigML.io will automatically navigate the
corresponding model to find the leaf node that best classifies the
new instance.
52. BigML.io: Create a New Prediction
Create a New Prediction
curl https://bigml.io/prediction?$BIGML_AUTH
-X POST
-H 'content-type: application/json'
-d '{"model": "model/4f67c0ee03ce89c74a000006",
"input_data": {"000001": 3}}'
54. BigML.io: Prediction Arguments
Required Type Description
model String Valid model/id.
input data Object Field’s id/value pairs representing the instance.
Optional Type Description
category Integer The category that best describes the prediction.
description String A description of the prediction of up to 8192 characters.
name String Name of the prediction.
private Boolean Whether you want your prediction to be private or not.
tags List A list of strings that help classify and index your prediction.
Table: Prediction Arguments
55. BigML.io: Creating a Prediction with args
Creating a Prediction with args
curl https://bigml.io/andromeda/prediction?$BIGML_AUTH
-X POST
-H 'content-type: application/json'
-d '{"input_data": {"000001": 3},
"model": "model/4f67c0ee03ce89c74a000006",
"name": "my prediction"}'
56. BigML.io: Updating a Prediction
Updating a Prediction
curl https://bigml.io/prediction/4f6a014b03ce89584500000f?$BIGML_AUTH
-X PUT
-H 'content-type: application/json'
-d '{"name": "a new name"}'
57. BigML.io: Deleting a Prediction
Deleting a Prediction
curl "https://bigml.io/prediction/4f6a014b03ce89584500000f?$BIGML_AUTH"
-X DELETE
Response HTTP/1.1 204 NO CONTENT
58. BigML.io: Retrieving a Prediction
Retrieving a Prediction via BigML.io
curl "https://bigml.io/prediction/4f6a014b03ce89584500000f?$BIGML_AUTH"
Retrieving a Prediction via BigML.com
https://bigml.com/dashboard/prediction/4f6a014b03ce89584500000f
59. BigML.io: Prediction Properties
property type filterable sortable updatable
category Integer yes yes yes
code Integer no no no
created Datetime yes yes no
credits Float yes yes no
dataset String yes yes no
dataset status Boolean yes yes no
description String yes yes yes
fields Object no no no
input data Object no no no
locale String no no no
model String yes yes no
model status Boolean yes yes no
name String yes yes yes
objective fields List yes yes no
prediction Object yes yes no
prediction path Object no no no
private Boolean yes yes yes
resource String no no no
source String yes yes no
source status Boolean yes yes no
status Object no no no
tags List yes yes yes
updated Datetime yes yes no
Table: Prediction Properties
60. BigML.io: Listing Predictions
Listing Predictions
curl "https://bigml.io/prediction?limit=10;offset=10;$BIGML_AUTH"
limit The number of predictions to retrieve (≤ 200).
offset The offset at which the prediction listing will start off.
62. BigML.io: Filtering Predictions
Retrieving predictions created after January 12, 2012
curl "https://bigml.io/prediction?created__gt=2012-01-12;$BIGML_AUTH"
Filter Description
lt Less than
lte Less than or equal to
gt Greater than
gte Greater than or equal to
Table: Filtering Arguments
63. BigML.io: Sorting Predictions
Sorting predictions by name
curl "https://bigml.io/prediction?order_by=-name;$BIGML_AUTH"
order by Specifies the order of the predictions to retrieve. Must be
one of the sortable fields. If you prefix the field name with
“-”, they will be given in descending order.
64. BigML.io: Evaluation
Evaluation Base URL
https://bigml.io/evaluation
An evaluation measures how well a model predicts the objective
field of a pre-labeled test set.
An evaluation is created using the model/id of the model under
evaluation and the dataset/id of the test set.
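The evaluation slides give no curl example, but by analogy with the other resources the create call presumably posts both ids (the ids below are reused from earlier slides as placeholders):

```shell
# Hypothetical request body pairing the model under evaluation with the
# test set's dataset/id.
body='{"model": "model/4f67c0ee03ce89c74a000006",
       "dataset": "dataset/4f66a80803ce8940c5000006"}'

# The create call would then mirror the other resources:
#   curl "https://bigml.io/evaluation?$BIGML_AUTH" \
#        -X POST -H 'content-type: application/json' -d "$body"
echo "$body"
```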
65. BigML.io: Public Bindings
Bash https://github.com/bigmlcom/bigml-bash
Python https://github.com/bigmlcom/python
R https://github.com/bigmlcom/bigml-r
iOS https://github.com/fgarcialainez/ML4iOS
Java https://github.com/javinp/bigml-java
Ruby http://vigosan.github.com/big ml/
66. BigML.io: Final Remarks
dev mode Remember to include /dev in your URL requests to avoid
credit charges.
version Remember to include the current version name
/andromeda in your URL requests to make sure that
future versions of the BigML API do not interfere with your
application.
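Both remarks combine into a single base URL that pins the version and stays in dev mode (the path composition below is assumed from the dev-mode and version slides earlier):

```shell
# Dev mode prefix plus explicit version name, per the two remarks above.
base="https://bigml.io/dev/andromeda"
BIGML_AUTH="username=alfred;api_key=placeholder"   # placeholder credentials
echo "$base/source?$BIGML_AUTH"
```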