sparktk linear_regression

Functions

def load(path, tc=<class 'sparktk.arguments.implicit'>)

Loads a LinearRegressionModel from the given path.

def train(frame, value_column, observation_columns, elastic_net_parameter=0.0, fit_intercept=True, max_iterations=100, reg_param=0.0, standardization=True, convergence_tolerance=1e-06)

Creates a LinearRegressionModel by training on the given frame

Parameters:

frame

(Frame):

A frame to train the model on

value_column

(str):

Column name containing the value for each observation.

observation_columns

(List[str]):

List of column(s) containing the observations.

elastic_net_parameter

(double):

Parameter for the ElasticNet mixing. Default is 0.0

fit_intercept

(bool):

Whether to fit an intercept term. Default is True

max_iterations

(int):

Parameter for maximum number of iterations. Default is 100

reg_param

(double):

Parameter for regularization. Default is 0.0

standardization

(bool):

Whether to standardize the training features before fitting the model. Default is True

convergence_tolerance

(double):

Parameter for the convergence tolerance for iterative algorithms. Default is 1E-6

Returns

(LinearRegressionModel):

A trained linear regression model
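
The reg_param and elastic_net_parameter arguments are easiest to read together. The sketch below assumes the usual elastic-net convention (the one Spark ML documents), in which reg_param scales the whole penalty and elastic_net_parameter mixes the L1 term against the L2 term. Whether sparktk passes these values through unchanged is an assumption, and elastic_net_penalty is a hypothetical helper, not part of sparktk.

```python
def elastic_net_penalty(weights, reg_param=0.0, elastic_net_parameter=0.0):
    """Penalty added to the loss under the common elastic-net convention.

    Assumption: penalty = reg_param * (alpha * L1 + (1 - alpha) / 2 * L2^2),
    where alpha is elastic_net_parameter. Hypothetical helper, not sparktk API.
    """
    alpha = elastic_net_parameter
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return reg_param * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


weights = [2.0, -1.0]
# alpha = 0.0 keeps only the L2 (ridge) term
ridge_only = elastic_net_penalty(weights, reg_param=0.1, elastic_net_parameter=0.0)
# alpha = 1.0 keeps only the L1 (lasso) term
lasso_only = elastic_net_penalty(weights, reg_param=0.1, elastic_net_parameter=1.0)
# With both defaults at 0.0 the penalty vanishes
no_penalty = elastic_net_penalty(weights)
```

Under this reading, the default reg_param=0.0 switches regularization off entirely, so the value of elastic_net_parameter only matters once reg_param is positive.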

Classes

class LinearRegressionModel

Linear Regression Model

Example:

>>> rows = [[0,0],[1, 2.5],[2, 5.0],[3, 7.5],[4, 10],[5, 12.5],[6, 13.0],[7, 17.15], [8, 18.5],[9, 23.5]]
>>> schema = [("x1", float),("y", float)]
>>> frame = tc.frame.create(rows, schema)

Consider the following frame with two columns.

>>> frame.inspect()
[#]  x1  y
==============
[0]   0      0
[1]   1    2.5
[2]   2    5.0
[3]   3    7.5
[4]   4     10
[5]   5   12.5
[6]   6   13.0
[7]   7  17.15
[8]   8   18.5
[9]   9   23.5

>>> model = tc.models.regression.linear_regression.train(frame,'y',['x1'])
[===Job Progress===]

>>> model
explained_variance      = 49.2759280303
intercept               = -0.0327272727273
iterations              = 1
mean_absolute_error     = 0.529939393939
mean_squared_error      = 0.630096969697
objective_history       = [0.0]
observation_columns     = [u'x1']
r2                      = 0.987374330661
root_mean_squared_error = 0.793786476136
value_column            = y
weights                 = [2.4439393939393925]

>>> linear_regression_test_return = model.test(frame, 'y')
[===Job Progress===]

>>> linear_regression_test_return
explained_variance      = 49.2759280303
mean_absolute_error     = 0.529939393939
mean_squared_error      = 0.630096969697
r2                      = 0.987374330661
root_mean_squared_error = 0.793786476136

>>> predicted_frame = model.predict(frame, ["x1"])
[===Job Progress===]

>>> predicted_frame.inspect()
[#]  x1   y      predicted_value
=================================
[0]  0.0    0.0  -0.0327272727273
[1]  1.0    2.5     2.41121212121
[2]  2.0    5.0     4.85515151515
[3]  3.0    7.5     7.29909090909
[4]  4.0   10.0     9.74303030303
[5]  5.0   12.5      12.186969697
[6]  6.0   13.0     14.6309090909
[7]  7.0  17.15     17.0748484848
[8]  8.0   18.5     19.5187878788
[9]  9.0   23.5     21.9627272727

>>> model.save("sandbox/linear_regression_model")

>>> restored = tc.load("sandbox/linear_regression_model")

>>> restored.value_column == model.value_column
True

>>> restored.intercept == model.intercept
True

>>> set(restored.observation_columns) == set(model.observation_columns)
True

>>> restored.test(frame, 'y').r2
0.987374330660537

The trained model can also be exported to a .mar file, to be used with the scoring engine:

>>> canonical_path = model.export_to_mar("sandbox/linearRegressionModel.mar")
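
The weight and intercept reported in the example can be cross-checked by hand, since the model was trained with the default reg_param=0.0. The following is a minimal sketch in plain Python using the closed-form least-squares formulas for a single feature. It is only a sanity check, not a description of how sparktk fits the model, and fit_simple_ols is a hypothetical name.

```python
rows = [[0, 0], [1, 2.5], [2, 5.0], [3, 7.5], [4, 10], [5, 12.5],
        [6, 13.0], [7, 17.15], [8, 18.5], [9, 23.5]]


def fit_simple_ols(rows):
    """Closed-form least squares for one feature: returns (weight, intercept)."""
    n = float(len(rows))
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    # Sums of cross-deviations and squared deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    s_xx = sum((x - mean_x) ** 2 for x, _ in rows)
    weight = s_xy / s_xx
    intercept = mean_y - weight * mean_x
    return weight, intercept


weight, intercept = fit_simple_ols(rows)
print(weight, intercept)  # approximately 2.44394 and -0.03273
```

These values agree with the weights and intercept shown in the model output above.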

Ancestors (in MRO)

LinearRegressionModel
sparktk.propobj.PropertiesObject
__builtin__.object

Instance variables

var explained_variance

The explained variance regression score

var intercept

The intercept of the trained model

var iterations

The number of training iterations until termination

var mean_absolute_error

The risk function corresponding to the expected value of the absolute error loss (L1-norm loss)

var mean_squared_error

The risk function corresponding to the expected value of the squared error loss or quadratic loss

var objective_history

Objective function (scaled loss + regularization) at each iteration

var observation_columns

List of column(s) containing the observations.

var r2

The coefficient of determination of the trained model

var root_mean_squared_error

The square root of the mean squared error

var value_column

Column name containing the value for each observation.

var weights

Weights of the trained model
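
The metric definitions above can be checked against the example output. The sketch below is plain Python rather than sparktk code, and regression_metrics is a hypothetical helper. One assumption to flag: explained_variance is computed here as the mean squared deviation of the predictions from the mean observed value, which is inferred from the example numbers rather than taken from the library source.

```python
import math


def regression_metrics(observed, predicted):
    """Compute the regression metrics listed above from two equal-length lists."""
    n = float(len(observed))
    mean_obs = sum(observed) / n
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return {
        # Assumed definition: mean of (prediction - mean observed value)^2
        "explained_variance": sum((p - mean_obs) ** 2 for p in predicted) / n,
        "mean_absolute_error": mae,
        "mean_squared_error": mse,
        "root_mean_squared_error": math.sqrt(mse),
        # r2 = 1 - (residual sum of squares / total sum of squares)
        "r2": 1.0 - (mse * n) / ss_total,
    }


y = [0, 2.5, 5.0, 7.5, 10, 12.5, 13.0, 17.15, 18.5, 23.5]
# Predictions from the example model: intercept + weight * x1 for x1 = 0..9
predicted = [-0.0327272727273 + 2.4439393939393925 * x for x in range(10)]
metrics = regression_metrics(y, predicted)
```

With the example data, each entry of metrics matches the corresponding value printed for the trained model to within floating-point rounding.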

Methods

def __init__(self, tc, scala_model)

def export_to_mar(self, path)

Exports the trained model as a model archive (.mar) to the specified path.

Parameters:

path

(str):

Path to save the trained model

Returns

(str):

Full path to the saved .mar file

def predict(self, frame, observation_columns)

Predict values for a frame using a trained Linear Regression model

Parameters:

frame

(Frame):

The frame to predict on

observation_columns

(Optional(List[str])):

List of column(s) containing the observations

Returns

(Frame):

A frame with the predicted_value column added
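
For a linear model, each predicted value is the intercept plus the weighted sum of the observation columns. The following is a minimal plain-Python sketch of that arithmetic, using the weight and intercept from the example model. predict_row is a hypothetical helper, not a sparktk function.

```python
def predict_row(observations, weights, intercept):
    """Linear prediction for one row: intercept + sum(w_i * x_i)."""
    return intercept + sum(w * x for w, x in zip(weights, observations))


# Values taken from the trained model in the example
weights = [2.4439393939393925]
intercept = -0.0327272727273

predictions = [predict_row([x1], weights, intercept) for x1 in [0.0, 1.0, 9.0]]
print(predictions)  # approximately [-0.03273, 2.41121, 21.96273]
```

These agree with rows 0, 1 and 9 of the predicted_value column in the example.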

def save(self, path)

Saves the model to the given path

Parameters:

path

(str):

Path to save the model to

def test(self, frame, value_column, observation_columns=None)

Tests the trained model against the given frame

Parameters:

frame

(Frame):

The frame to test the model on

value_column

(str):

Column name containing the value for each observation

observation_columns

(Optional(List[str])):

List of column(s) containing the observations

Returns

(LinearRegressionTestMetrics):

LinearRegressionTestMetrics object consisting of results from model test

def to_dict(self)

def to_json(self)
