sparktk arimax
ARIMAX (Autoregressive Integrated Moving Average with Exogenous Variables) Model
Functions
def load(
path, tc=<class 'sparktk.arguments.implicit'>)
Loads an ArimaxModel from the given path.
def train(
frame, ts_column, x_columns, p, d, q, x_max_lag, include_original_x=True, include_intercept=True, init_params=None)
Creates Autoregressive Integrated Moving Average with Explanatory Variables (ARIMAX) Model from the specified time series values.
Given a time series, fits a non-seasonal Autoregressive Integrated Moving Average with Explanatory Variables (ARIMAX) model of order (p, d, q), where p is the number of autoregressive terms, d is the order of differencing, and q is the number of moving average error terms. x_max_lag is the maximum lag order for the exogenous variables. If include_original_x is True, the model is fitted with the original (non-lagged) exogenous variables. If include_intercept is True, the model is fitted with an intercept.
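To make the role of the differencing order d concrete, here is a minimal pure-Python sketch of differencing a series d times. The helper name `difference` is hypothetical and is not part of sparktk; the library handles differencing internally during training, and this sketch only illustrates the idea.

```python
def difference(series, d):
    """Apply first-order differencing d times (illustrative only)."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with the change between neighbours,
        # so the result is one element shorter than its input.
        values = [b - a for a, b in zip(values, values[1:])]
    return values


# Integer toy data keeps the arithmetic exact.
print(difference([1, 4, 9, 16, 25], 1))  # [3, 5, 7, 9]
print(difference([1, 4, 9, 16, 25], 2))  # [2, 2, 2]
```

With d=0 the series is returned unchanged, which corresponds to fitting the model on the raw values.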
frame | (Frame): | Frame used for training. |
ts_column | (str): | Name of the column that contains the time series values. |
x_columns | (List[str]): | Names of the column(s) that contain the values of exogenous regressors. |
p | (int): | Autoregressive order |
d | (int): | Differencing order |
q | (int): | Moving average order |
x_max_lag | (int): | The maximum lag order for exogenous variables. |
include_original_x | (Optional(boolean)): | If True, the model is fit with the original (non-lagged) exogenous variables. Default is True. |
include_intercept | (Optional(boolean)): | If True, the model is fit with an intercept. Default is True. |
init_params | (Optional(List[float])): | A set of user-provided initial parameters for optimization. If none are provided (default), the parameters are initialized using the Hannan-Rissanen algorithm. If provided, the order of the parameters should be: intercept term, AR parameters (in increasing order of lag), MA parameters (in increasing order of lag), and parameters for the exogenous variables (in increasing order of lag). |
Returns | (ArimaxModel): | Trained ARIMAX model |
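As an illustration of the init_params ordering described above, the following sketch assembles a flat list in that order. The helper `build_init_params` and all starting values are hypothetical; the list lengths are placeholders, and whether an intercept slot is expected when include_intercept is False is not stated in this reference.

```python
def build_init_params(intercept, ar, ma, xreg):
    """Concatenate starting values in the documented order (hypothetical helper)."""
    # Documented order: intercept term, AR parameters, MA parameters,
    # then parameters for the exogenous variables.
    params = [float(intercept)]
    params += [float(v) for v in ar]
    params += [float(v) for v in ma]
    params += [float(v) for v in xreg]
    return params


# Placeholder starting values: two AR terms, one MA term, two exogenous terms.
init_params = build_init_params(0.0, ar=[0.1, 0.05], ma=[0.2], xreg=[0.0, 0.0])
print(init_params)  # [0.0, 0.1, 0.05, 0.2, 0.0, 0.0]
```

A list built this way could then be passed to train through the init_params argument.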
Classes
class ArimaxModel
A trained ARIMAX model.
Data from Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Consider the following model trained and tested on the sample data set in frame 'frame'. The frame has four columns, where "CO_GT" is the time series value and "C6H6_GT", "PT08_S2_NMHC" and "T" are exogenous inputs.
CO_GT - True hourly averaged concentration CO in mg/m^3
C6H6_GT - True hourly averaged Benzene concentration in microg/m^3
PT08_S2_NMHC - Titania hourly averaged sensor response (nominally NMHC targeted)
T - Temperature in C
>>> frame.inspect()
[#] CO_GT C6H6_GT PT08_S2_NMHC T
=======================================
[0] 2.6 11.9 1046.0 13.6
[1] 2.0 9.4 955.0 13.3
[2] 2.2 9.0 939.0 11.9
[3] 2.2 9.2 948.0 11.0
[4] 1.6 6.5 836.0 11.2
[5] 1.2 4.7 750.0 11.2
[6] 1.2 3.6 690.0 11.3
[7] 1.0 3.3 672.0 10.7
[8] 2.9 2.3 609.0 10.7
[9] 2.6 1.7 561.0 10.3
>>> model = tc.models.timeseries.arimax.train(frame, "CO_GT", ["C6H6_GT", "PT08_S2_NMHC", "T"], 1, 1, 1, 1, True, False)
[===Job Progress===]
>>> model.c
0.24886373113659435
>>> model.ar
[-0.8612398115782316]
>>> model.ma
[-0.45556700539598505]
>>> model.xreg
[0.09496697769170012, -0.00043805552312166737, 0.0006888829627820128, 0.8523170824191132, -0.017901092786057428, 0.017936687425751337]
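The six values in model.xreg above are consistent with one coefficient per exogenous column for each included lag. The sketch below encodes that rule as an assumption inferred from this example, not as documented library behavior; the helper name `expected_xreg_count` is hypothetical, and the ordering of the coefficients within the list is not addressed here.

```python
def expected_xreg_count(n_x_columns, x_max_lag, include_original_x=True):
    """Guess the number of exogenous coefficients (assumed rule, not from the docs)."""
    # Assumption: one coefficient per exogenous column for each lag 1..x_max_lag,
    # plus one per column for the non-lagged values when include_original_x is True.
    lags_per_column = x_max_lag + (1 if include_original_x else 0)
    return n_x_columns * lags_per_column


# Three exogenous columns, x_max_lag=1, include_original_x=True, as in the example.
print(expected_xreg_count(3, 1, True))  # 6
```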
In this example, we will call predict using the same frame that was used for training, again specifying the name of the time series column and the names of the columns that contain exogenous regressors.
>>> predicted_frame = model.predict(frame, "CO_GT", ["C6H6_GT", "PT08_S2_NMHC", "T"])
[===Job Progress===]
The predicted_frame that is returned has a new column called predicted_y. This column contains the predicted time series values.
>>> predicted_frame.column_names
[u'CO_GT', u'C6H6_GT', u'PT08_S2_NMHC', u'T', u'predicted_y']
>>> predicted_frame.inspect(columns=["CO_GT","predicted_y"])
[#] CO_GT predicted_y
=========================
[0] 2.6 2.83896716391
[1] 2.0 2.89056663602
[2] 2.2 2.84550712171
[3] 2.2 2.88445194591
[4] 1.6 2.85091111286
[5] 1.2 2.8798667019
[6] 1.2 2.85451566607
[7] 1.0 2.87634898739
[8] 2.9 2.85726970866
[9] 2.6 2.87356376648
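One simple way to summarize how close the predictions are to the observed values is the mean absolute error. The sketch below is plain Python, with the ten rows shown above copied in by hand; the helper `mean_absolute_error` is hypothetical and not a sparktk function.

```python
# Values copied from the CO_GT and predicted_y columns shown above.
actual = [2.6, 2.0, 2.2, 2.2, 1.6, 1.2, 1.2, 1.0, 2.9, 2.6]
predicted = [
    2.83896716391, 2.89056663602, 2.84550712171, 2.88445194591,
    2.85091111286, 2.8798667019, 2.85451566607, 2.87634898739,
    2.85726970866, 2.87356376648,
]


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between paired values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


mae = mean_absolute_error(actual, predicted)
print(round(mae, 2))  # 0.92
```

Because the model was trained and evaluated on the same ten rows, this figure describes in-sample fit only.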
The trained model can be saved to be used later:
>>> model_path = "sandbox/savedArimaxModel"
>>> model.save(model_path)
The saved model can be loaded through the tk context and then used for forecasting values the same way that the original model was used.
>>> loaded_model = tc.load(model_path)
>>> predicted_frame = loaded_model.predict(frame, "CO_GT", ["C6H6_GT", "PT08_S2_NMHC", "T"])
>>> predicted_frame.inspect(columns=["CO_GT","predicted_y"])
[#] CO_GT predicted_y
=========================
[0] 2.6 2.83896716391
[1] 2.0 2.89056663602
[2] 2.2 2.84550712171
[3] 2.2 2.88445194591
[4] 1.6 2.85091111286
[5] 1.2 2.8798667019
[6] 1.2 2.85451566607
[7] 1.0 2.87634898739
[8] 2.9 2.85726970866
[9] 2.6 2.87356376648
The trained model can also be exported to a .mar file, to be used with the scoring engine:
>>> canonical_path = model.export_to_mar("sandbox/arimax.mar")
Ancestors (in MRO)
- ArimaxModel
- sparktk.propobj.PropertiesObject
- __builtin__.object
Instance variables
var ar
Coefficient values from the trained model (AR with increasing degrees).
var c
Intercept
var d
Differencing order
var includeIntercept
A boolean flag indicating if the intercept should be included.
var includeOriginalXreg
A boolean flag indicating if the non-lagged exogenous variables should be included.
var init_params
A set of user provided initial parameters for optimization
var ma
Coefficient values from the trained model (MA with increasing degrees).
var p
Autoregressive order
var q
Moving average order
var x_max_lag
The maximum lag order for exogenous variables.
var xreg
Coefficient values from the trained model for exogenous variables with increasing degrees.
Methods
def __init__(
self, tc, scala_model)
def export_to_mar(
self, path)
Exports the trained model as a model archive (.mar) to the specified path.
path | (str): | Path to save the trained model |
Returns | (str): | Full path to the saved .mar file |
def predict(
self, frame, ts_column, x_columns)
New frame with column of predicted y values
Predict the time series values for a test frame, based on the specified x values. Creates a new frame revision with the existing columns and a new predicted_y column.
frame | (Frame): | Frame used for predicting the ts values |
ts_column | (str): | Name of the time series column |
x_columns | (List[str]): | Names of the column(s) that contain the values of the exogenous inputs. |
Returns | (Frame): | A new frame containing the original frame's columns and a column *predicted_y* |
def save(
self, path)
Saves the trained model to the specified path.
path | (str): | Path to save the trained model |
def to_dict(
self)
def to_json(
self)