sparktk max

MAX (Moving Average with Exogenous Variables) Model

Functions

def load(path, tc=<class 'sparktk.arguments.implicit'>)

Loads a MaxModel from the given path.

def train(frame, ts_column, x_columns, q, x_max_lag, include_original_x=True, include_intercept=True, init_params=None)

Creates a Moving Average with Exogenous Variables (MAX) model from the specified time series values.

Given a time series, fits a Moving Average with Exogenous Variables (MAX) model. q is the moving average order (the number of lagged error terms), and x_max_lag is the maximum lag order for the exogenous variables. If include_original_x is True, the model is fitted with the original (non-lagged) exogenous variables. If include_intercept is True, the model is fitted with an intercept.
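To make the roles of x_max_lag and include_original_x concrete, here is a minimal sketch in plain Python (not sparktk code) of how lagged copies of a single exogenous column could be laid out. The helper name lagged_features and the choice to drop the first max_lag rows are assumptions made for illustration only; sparktk's internal handling may differ.

```python
def lagged_features(values, max_lag, include_original=True):
    """Return one row per time step t >= max_lag.

    Each row holds the value at lag 0 (only when include_original is True)
    followed by the values at lags 1..max_lag.
    """
    first_lag = 0 if include_original else 1
    rows = []
    for t in range(max_lag, len(values)):
        rows.append([values[t - lag] for lag in range(first_lag, max_lag + 1)])
    return rows


# First five values of the temperature column "T" from the example frame.
temperature = [13.6, 13.3, 11.9, 11.0, 11.2]

# Lag 0 and lag 1 for each usable time step.
with_original = lagged_features(temperature, max_lag=1, include_original=True)

# Lag 1 only (the non-lagged value is left out).
lags_only = lagged_features(temperature, max_lag=1, include_original=False)
```

With max_lag=1, each usable time step yields two values per column when the original values are included and one value per column when they are not.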

Parameters:
frame(Frame):Frame used for training.
ts_column(str):Name of the column that contains the time series values.
x_columns(List[str]):Names of the column(s) that contain the values of exogenous regressors.
q(int):Moving average order
x_max_lag(int):The maximum lag order for exogenous variables.
include_original_x(Optional(boolean)):If True, the model is fit with the original (non-lagged) exogenous variables. Default is True.
include_intercept(Optional(boolean)):If True, the model is fit with an intercept. Default is True.
init_params(Optional(List[float])):A set of user-provided initial parameters for optimization. If the list is empty or not provided (the default is None), the parameters are initialized using the Hannan-Rissanen algorithm. If provided, the order of the parameters should be: intercept term (mostly 0), MA parameters (in increasing order of lag), and parameters for exogenous variables (in increasing order of lag).

Returns(MaxModel): Trained MAX model
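The parameter order described for init_params can be sketched as follows. This is a plain Python illustration, not part of sparktk: build_init_params is a hypothetical helper, and every numeric value is a placeholder rather than a recommended starting point. The six exogenous values are an assumption chosen to match the six xreg coefficients shown in the example on this page.

```python
def build_init_params(intercept, ma_params, xreg_params, q):
    """Concatenate initial values in the documented order:
    intercept, then MA parameters, then exogenous parameters."""
    if len(ma_params) != q:
        raise ValueError(
            "expected %d MA parameters, got %d" % (q, len(ma_params))
        )
    params = [float(intercept)]
    params.extend(float(p) for p in ma_params)    # increasing order of lag
    params.extend(float(p) for p in xreg_params)  # increasing order of lag
    return params


init_params = build_init_params(
    intercept=0.0,
    ma_params=[0.1, 0.1],    # placeholder values for q = 2
    xreg_params=[0.0] * 6,   # placeholder values only
    q=2,
)
```

A list built this way could then be passed as the init_params argument of train.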

Classes

class MaxModel

A trained MAX model.

Example:

Data from Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Consider the following model trained and tested on the sample data set in frame 'frame'. The frame has four columns, where "CO_GT" is the time series value and "C6H6_GT", "PT08_S2_NMHC" and "T" are exogenous inputs.

  • CO_GT - True hourly averaged CO concentration in mg/m^3
  • C6H6_GT - True hourly averaged benzene concentration in microg/m^3
  • PT08_S2_NMHC - Titania hourly averaged sensor response (nominally NMHC targeted)
  • T - Temperature in C

>>> frame.inspect()
[#]  CO_GT  C6H6_GT  PT08_S2_NMHC  T
=======================================
[0]    2.6     11.9        1046.0  13.6
[1]    2.0      9.4         955.0  13.3
[2]    2.2      9.0         939.0  11.9
[3]    2.2      9.2         948.0  11.0
[4]    1.6      6.5         836.0  11.2
[5]    1.2      4.7         750.0  11.2
[6]    1.2      3.6         690.0  11.3
[7]    1.0      3.3         672.0  10.7
[8]    2.9      2.3         609.0  10.7
[9]    2.6      1.7         561.0  10.3


>>> model = tc.models.timeseries.max.train(frame, "CO_GT", ["C6H6_GT", "PT08_S2_NMHC", "T"], 2, 1)
[===Job Progress===]

>>> model.ma
[0.5777638449118448, -0.06530007715221572]

>>> model.xreg
[-0.021849032465107086, 0.0009772982251014968, 0.028419655845061332, 1.329220909234935, -0.026697271514035982, -0.099174926201381]
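As a quick sanity check on the coefficient lists above, the sketch below compares their lengths with the training arguments. It is plain Python and does not call sparktk. The formula in expected_xreg_count (one coefficient per exogenous column per lag, plus one per column for the non-lagged value) is an assumption that happens to agree with this example; it is not taken from the sparktk documentation.

```python
# Coefficients copied from the output above.
ma = [0.5777638449118448, -0.06530007715221572]
xreg = [-0.021849032465107086, 0.0009772982251014968, 0.028419655845061332,
        1.329220909234935, -0.026697271514035982, -0.099174926201381]

# Arguments passed to train in this example.
q = 2
x_columns = ["C6H6_GT", "PT08_S2_NMHC", "T"]
x_max_lag = 1
include_original_x = True  # the default


def expected_xreg_count(n_columns, max_lag, include_original):
    # Assumed layout: one coefficient per column for each lag 1..max_lag,
    # plus one per column for lag 0 when the original values are included.
    return n_columns * (max_lag + (1 if include_original else 0))


ma_ok = len(ma) == q
xreg_ok = len(xreg) == expected_xreg_count(
    len(x_columns), x_max_lag, include_original_x
)
```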

In this example, we will call predict using the same frame that was used for training, again specifying the name of the time series column and the names of the columns that contain exogenous regressors.

>>> predicted_frame = model.predict(frame, "CO_GT", ["C6H6_GT", "PT08_S2_NMHC", "T"])
[===Job Progress===]

The predicted_frame that is returned has a new column called predicted_y, which contains the predicted time series values.

>>> predicted_frame.column_names
[u'CO_GT', u'C6H6_GT', u'PT08_S2_NMHC', u'T', u'predicted_y']

>>> predicted_frame.inspect(columns=["CO_GT","predicted_y"])
[#]  CO_GT  predicted_y
=========================
[0]    2.6  2.61411194003
[1]    2.0  2.41046087551
[2]    2.2  2.39156069744
[3]    2.2  2.36598300718
[4]    1.6  2.37166693834
[5]    1.2  2.37166693834
[6]    1.2  2.37450890393
[7]    1.0  2.35745711042
[8]    2.9  2.35745711042
[9]    2.6  2.34608924808
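One way to summarize how closely predicted_y tracks CO_GT is to compute simple error metrics over the ten rows shown above. The sketch below is plain Python with the values copied from the table; the helper names are illustrative and are not part of sparktk.

```python
import math

# Values copied from the inspect() output above.
actual = [2.6, 2.0, 2.2, 2.2, 1.6, 1.2, 1.2, 1.0, 2.9, 2.6]
predicted = [2.61411194003, 2.41046087551, 2.39156069744, 2.36598300718,
             2.37166693834, 2.37166693834, 2.37450890393, 2.35745711042,
             2.35745711042, 2.34608924808]


def mean_absolute_error(y_true, y_pred):
    # Average of the absolute differences between observed and predicted.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    # Square root of the average squared difference.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

Because the frame used for prediction is the same one used for training, these figures describe in-sample fit only.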

The trained model can be saved to be used later:

>>> model_path = "sandbox/savedMaxModel"

>>> model.save(model_path)

The saved model can be loaded through the tk context and then used for forecasting values the same way that the original model was used.

>>> loaded_model = tc.load(model_path)

>>> predicted_frame = loaded_model.predict(frame, "CO_GT", ["C6H6_GT", "PT08_S2_NMHC", "T"])

>>> predicted_frame.inspect(columns=["CO_GT","predicted_y"])
[#]  CO_GT  predicted_y
=========================
[0]    2.6  2.61411194003
[1]    2.0  2.41046087551
[2]    2.2  2.39156069744
[3]    2.2  2.36598300718
[4]    1.6  2.37166693834
[5]    1.2  2.37166693834
[6]    1.2  2.37450890393
[7]    1.0  2.35745711042
[8]    2.9  2.35745711042
[9]    2.6  2.34608924808

The trained model can also be exported to a .mar file, to be used with the scoring engine:

>>> canonical_path = model.export_to_mar("sandbox/max.mar")

Ancestors (in MRO)

  • MaxModel
  • sparktk.propobj.PropertiesObject
  • __builtin__.object

Instance variables

var ar

Coefficient values from the trained model (AR with increasing degrees).

var c

Intercept

var includeIntercept

A boolean flag indicating if the intercept should be included.

var includeOriginalXreg

A boolean flag indicating if the non-lagged exogenous variables should be included.

var init_params

A set of user provided initial parameters for optimization

var ma

Coefficient values from the trained model (MA with increasing degrees).

var q

Moving average order

var x_max_lag

The maximum lag order for exogenous variables.

var xreg

Coefficient values from the trained model for exogenous variables with increasing degrees.

Methods

def __init__(self, tc, scala_model)

def export_to_mar(self, path)

Exports the trained model as a model archive (.mar) to the specified path.

Parameters:
path(str):Path to save the trained model

Returns(str): Full path to the saved .mar file

def predict(self, frame, ts_column, x_columns)

Returns a new frame with a column of predicted y values.

Predict the time series values for a test frame, based on the specified x values. Creates a new frame revision with the existing columns and a new predicted_y column.

Parameters:
frame(Frame):Frame used for predicting the ts values
ts_column(str):Name of the time series column
x_columns(List[str]):Names of the column(s) that contain the values of the exogenous inputs.

Returns(Frame): A new frame containing the original frame's columns and a column predicted_y

def save(self, path)

Save the trained model to the specified path

Parameters:
path(str):Path to save the trained model

def to_dict(self)

def to_json(self)