sparktk svm
Functions
def load(
path, tc=<class 'sparktk.arguments.implicit'>)
Loads an SvmModel from the given path
def train(
frame, label_column, observation_columns, intercept=True, num_iterations=100, step_size=1.0, reg_type=None, reg_param=0.01, mini_batch_fraction=1.0)
Creates an SVM model by training on the given frame
frame | (Frame): | frame of training data |
label_column | (str): | Column containing the label for each observation |
observation_columns | (list(str)): | Column(s) containing the observations |
intercept | (boolean): | Flag indicating if the algorithm adds an intercept. Default is True |
num_iterations | (int): | Number of iterations for SGD. Default is 100 |
step_size | (float): | Initial step size for the SGD optimizer. Default is 1.0 |
reg_type | (Optional(str)): | Regularization type, "L1" or "L2". Default is None, in which case "L2" is used |
reg_param | (float): | Regularization parameter. Default is 0.01 |
mini_batch_fraction | (float): | Fraction of the data to be used for each SGD iteration. Default is 1.0, which corresponds to deterministic/classical gradient descent |
Returns | (SvmModel): | The SVM model trained with SGD |
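The effect of reg_type and reg_param on a single SGD step can be sketched in plain Python. This is an illustrative sketch, not the sparktk or MLlib code: the helper names soft_threshold and apply_update are hypothetical, the L1 branch uses a standard soft-thresholding step, and treating reg_type=None as "L2" is an assumption based on the default documented above.

```python
import math


def soft_threshold(value, shrinkage):
    """Shrink value toward zero by shrinkage, clamping at zero (L1 step)."""
    return math.copysign(max(abs(value) - shrinkage, 0.0), value)


def apply_update(weights, gradient, step, reg_param=0.01, reg_type=None):
    """Take one regularized gradient step and return the new weight list."""
    if reg_type == "L1":
        # Plain gradient step first, then shrink every weight toward zero.
        stepped = [w - step * g for w, g in zip(weights, gradient)]
        return [soft_threshold(w, step * reg_param) for w in stepped]
    # Assumption: "L2" and None both take the L2 (weight decay) path.
    return [w * (1.0 - step * reg_param) - step * g
            for w, g in zip(weights, gradient)]
```

With a zero gradient, the L1 step drives small weights exactly to zero, while the L2 step only scales every weight by a constant factor.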
Support Vector Machine (SVM) is a supervised algorithm used to perform binary classification. An SVM constructs a separating hyperplane in a high-dimensional space; a good separation is achieved by the hyperplane that has the largest distance to the nearest training-data point of any class. This model runs the MLlib implementation of SVM with the SGD optimizer. The SVM model is initialized, trained on columns of a frame, used to predict the labels of observations in a frame, and then tested: during testing, the labels of the observations are predicted and compared against the true labels using the built-in binary classification metrics.
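To make the training procedure concrete, here is a minimal pure-Python sketch of a linear SVM fitted by subgradient descent on the hinge loss with L2 regularization. It is not the MLlib implementation: the function train_linear_svm is hypothetical, its parameter names simply mirror those of train, and the step_size / sqrt(iteration) decay schedule and the unregularized intercept are assumptions made for illustration.

```python
import math
import random


def train_linear_svm(rows, labels, num_iterations=100, step_size=1.0,
                     reg_param=0.01, mini_batch_fraction=1.0, seed=0):
    """Fit weights and an intercept by (mini-batch) subgradient descent."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    intercept = 0.0
    for iteration in range(1, num_iterations + 1):
        # A fraction of 1.0 uses every row, i.e. classical gradient descent.
        if mini_batch_fraction >= 1.0:
            batch = list(range(len(rows)))
        else:
            batch = [i for i in range(len(rows))
                     if rng.random() < mini_batch_fraction]
        if not batch:
            continue
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for i in batch:
            y = 2.0 * labels[i] - 1.0  # map labels {0, 1} to {-1, +1}
            score = intercept + sum(w * x for w, x in zip(weights, rows[i]))
            if y * score < 1.0:  # inside the margin or misclassified
                for j, x in enumerate(rows[i]):
                    grad_w[j] -= y * x
                grad_b -= y
        step = step_size / math.sqrt(iteration)  # assumed decay schedule
        weights = [w * (1.0 - step * reg_param) - step * g / len(batch)
                   for w, g in zip(weights, grad_w)]
        intercept -= step * grad_b / len(batch)
    return weights, intercept


# Same observations and labels as the SvmModel example in this document.
data = [-48.0, -75.0, -63.0, -57.0, 73.0, -33.0, 100.0,
        -54.0, 78.0, 48.0, -55.0, 23.0, 45.0, 75.0]
labels = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
weights, intercept = train_linear_svm([[x] for x in data], labels)
predicted = [1 if intercept + weights[0] * x > 0.0 else 0 for x in data]
```

On this linearly separable data the sketch learns a negative weight for the data column, so negative observations are assigned label 1 and positive ones label 0.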
Classes
class SvmModel
A trained SVM model
>>> frame = tc.frame.create([[-48.0,1], [-75.0,1], [-63.0,1], [-57.0,1], [73.0,0], [-33.0,1], [100.0,0],
... [-54.0,1], [78.0,0], [48.0,0], [-55.0,1], [23.0,0], [45.0,0], [75.0,0]],
... [("data", float),("label", str)])
>>> frame.inspect()
[#] data label
=================
[0] -48.0 1
[1] -75.0 1
[2] -63.0 1
[3] -57.0 1
[4] 73.0 0
[5] -33.0 1
[6] 100.0 0
[7] -54.0 1
[8] 78.0 0
[9] 48.0 0
>>> model = tc.models.classification.svm.train(frame, 'label', ['data'])
>>> model.label_column
u'label'
>>> model.observation_columns
[u'data']
>>> predicted_frame = model.predict(frame, ['data'])
>>> predicted_frame.inspect()
[#] data label predicted_label
==================================
[0] -48.0 1 1
[1] -75.0 1 1
[2] -63.0 1 1
[3] -57.0 1 1
[4] 73.0 0 0
[5] -33.0 1 1
[6] 100.0 0 0
[7] -54.0 1 1
[8] 78.0 0 0
[9] 48.0 0 0
>>> test_metrics = model.test(predicted_frame)
>>> test_metrics
accuracy = 1.0
confusion_matrix = Predicted_Pos Predicted_Neg
Actual_Pos 7 0
Actual_Neg 0 7
f_measure = 1.0
precision = 1.0
recall = 1.0
>>> model.save("sandbox/svm")
>>> restored = tc.load("sandbox/svm")
>>> restored.label_column == model.label_column
True
>>> restored.intercept == model.intercept
True
>>> set(restored.observation_columns) == set(model.observation_columns)
True
>>> predicted_frame2 = restored.predict(frame)
>>> predicted_frame2.inspect()
[#] data label predicted_label
==================================
[0] -48.0 1 1
[1] -75.0 1 1
[2] -63.0 1 1
[3] -57.0 1 1
[4] 73.0 0 0
[5] -33.0 1 1
[6] 100.0 0 0
[7] -54.0 1 1
[8] 78.0 0 0
[9] 48.0 0 0
>>> canonical_path = model.export_to_mar("sandbox/SVM.mar")
Ancestors (in MRO)
- SvmModel
- sparktk.propobj.PropertiesObject
- __builtin__.object
Instance variables
var intercept
intercept used during model training
var label_column
column containing the label used during model training
var mini_batch_fraction
mini-batch fraction used to train the model
var num_iterations
max number of iterations allowed during model training
var observation_columns
columns containing the observation values used during model training
var reg_param
regularization parameter used to train the model
var reg_type
regularization type used to train the model
var step_size
step size value used to train the model
Methods
def __init__(
self, tc, scala_model)
def export_to_mar(
self, path)
Exports the trained model as a model archive (.mar) to the specified path
path | (str): | Path to save the trained model |
Returns | (str): | Full path to the saved .mar file |
def predict(
self, frame, columns=None)
Predicts the labels for the observation columns in the given input frame. Creates a new frame with the existing columns and a new predicted column.
frame | (Frame): | Frame used for predicting the values |
columns | (List[str]): | Names of the observation columns. Default is None |
Returns | (Frame): | A new frame containing the original frame's columns and a prediction column |
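The shape of the result can be illustrated without Spark. The sketch below applies a linear decision rule to plain Python rows and appends a predicted_label column. The function predict_rows, the list-based schema, and the fitted weight of -1.0 are all hypothetical, and the rule "label 1 when the score is positive" is an assumption about the underlying linear model rather than a statement about sparktk internals.

```python
def predict_rows(rows, schema, weights, intercept, columns):
    """Return (new_schema, new_rows) with a predicted_label column appended."""
    indices = [schema.index(name) for name in columns]
    new_rows = []
    for row in rows:
        score = intercept + sum(w * row[i] for w, i in zip(weights, indices))
        # Original columns are kept; the prediction goes in a new last column.
        new_rows.append(list(row) + [1 if score > 0.0 else 0])
    return schema + ["predicted_label"], new_rows


schema = ["data", "label"]
rows = [[-48.0, 1], [73.0, 0], [-33.0, 1], [100.0, 0]]
# Hypothetical fitted parameters for the single observation column.
new_schema, predicted_rows = predict_rows(
    rows, schema, weights=[-1.0], intercept=0.0, columns=["data"])
```

The input rows are left unmodified, mirroring the way predict returns a new frame instead of changing the one passed in.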
def save(
self, path)
Saves the trained model to the given path
def test(
self, frame, columns=None)
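Tests the trained model against the given frame: predicts a label for each observation and compares it with the true label, returning binary classification metrics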
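The metrics shown in the SvmModel example (accuracy, confusion matrix, f_measure, precision, recall) follow the usual binary-classification definitions. A minimal sketch of how they can be computed from true and predicted labels is given here; binary_metrics is a hypothetical helper, not part of sparktk, and treating label 1 as the positive class is an assumption.

```python
def binary_metrics(actual, predicted, pos_label=1):
    """Compute confusion-matrix counts and the metrics derived from them."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == pos_label:
            if p == pos_label:
                tp += 1
            else:
                fn += 1
        elif p == pos_label:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    precision = float(tp) / (tp + fp) if tp + fp else 0.0
    recall = float(tp) / (tp + fn) if tp + fn else 0.0
    f_measure = (2.0 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = float(tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        # Rows are actual (pos, neg); columns are predicted (pos, neg).
        "confusion_matrix": [[tp, fn], [fp, tn]],
    }


# Labels from the example frame, where every prediction matched.
true_labels = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
perfect = binary_metrics(true_labels, true_labels)
```

For the example frame, where all 14 predictions are correct, this yields 7 true positives, 7 true negatives, and a value of 1.0 for every metric.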
def to_dict(
self)
def to_json(
self)