sparktk.models.classification.naive_bayes API documentation

def load(

path, tc=<class 'sparktk.arguments.implicit'>)

Loads a NaiveBayesModel from the given path

def train(

frame, label_column, observation_columns, lambda_parameter=1.0)

Creates a Naive Bayes model by training on the given frame

frame

(Frame):

frame of training data

label_column

(str):

Column containing the label for each observation

observation_columns

(List[str]):

Column(s) containing the observations

lambda_parameter

(float):

Additive smoothing parameter. Default is 1.0.

Returns

(NaiveBayesModel):

Trained Naive Bayes model
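
The effect of lambda_parameter can be illustrated with a minimal sketch of additive (Laplace/Lidstone) smoothing. This is a generic illustration in plain Python, not sparktk code: the helper name `smoothed_likelihoods` and its input are hypothetical, and it assumes the usual multinomial form in which lambda is added to every per-feature total before normalizing.

```python
def smoothed_likelihoods(feature_totals, lambda_parameter=1.0):
    """Return additively smoothed per-feature likelihoods for one class.

    feature_totals is a hypothetical input for this sketch: the summed
    feature values of one class, one entry per observation column.
    """
    n_features = len(feature_totals)
    denominator = float(sum(feature_totals)) + lambda_parameter * n_features
    return [(total + lambda_parameter) / denominator for total in feature_totals]


# One class in which the second feature was never observed.
likelihoods = smoothed_likelihoods([3.0, 0.0], lambda_parameter=1.0)
print(likelihoods)  # [0.8, 0.2]

# Without smoothing, the unseen feature gets probability zero.
unsmoothed = smoothed_likelihoods([3.0, 0.0], lambda_parameter=0.0)
print(unsmoothed)  # [1.0, 0.0]
```

A larger lambda_parameter pulls the estimates toward a uniform distribution; a value of 0 disables smoothing, so a feature unseen for a class receives zero probability.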

class NaiveBayesModel

A trained Naive Bayes model

Example:

>>> frame = tc.frame.create([[1,19.8446136104,2.2985856384],
...                          [1,16.8973559126,2.6933495054],
...                          [1,5.5548729596, 2.7777687995],
...                          [0,46.1810010826,3.1611961917],
...                          [0,44.3117586448,3.3458963222],
...                          [0,34.6334526911,3.6429838715]],
...                          [('Class', int), ('Dim_1', float), ('Dim_2', float)])

>>> model = tc.models.classification.naive_bayes.train(frame, 'Class', ['Dim_1', 'Dim_2'], 0.9)

>>> model.label_column
u'Class'

>>> model.observation_columns
[u'Dim_1', u'Dim_2']

>>> model.lambda_parameter
0.9

>>> predicted_frame = model.predict(frame, ['Dim_1', 'Dim_2'])

>>> predicted_frame.inspect()
[#]  Class  Dim_1          Dim_2         predicted_class
========================================================
[0]      1  19.8446136104  2.2985856384              0.0
[1]      1  16.8973559126  2.6933495054              1.0
[2]      1   5.5548729596  2.7777687995              1.0
[3]      0  46.1810010826  3.1611961917              0.0
[4]      0  44.3117586448  3.3458963222              0.0
[5]      0  34.6334526911  3.6429838715              0.0

>>> model.save("sandbox/naivebayes")

>>> restored = tc.load("sandbox/naivebayes")

>>> restored.label_column == model.label_column
True

>>> restored.lambda_parameter == model.lambda_parameter
True

>>> set(restored.observation_columns) == set(model.observation_columns)
True

>>> metrics = model.test(frame)

>>> metrics.precision
1.0

>>> predicted_frame2 = restored.predict(frame, ['Dim_1', 'Dim_2'])

>>> predicted_frame2.inspect()
[#]  Class  Dim_1          Dim_2         predicted_class
========================================================
[0]      1  19.8446136104  2.2985856384              0.0
[1]      1  16.8973559126  2.6933495054              1.0
[2]      1   5.5548729596  2.7777687995              1.0
[3]      0  46.1810010826  3.1611961917              0.0
[4]      0  44.3117586448  3.3458963222              0.0
[5]      0  34.6334526911  3.6429838715              0.0

>>> canonical_path = model.export_to_mar("sandbox/naivebayes.mar")

Ancestors (in MRO)

NaiveBayesModel
sparktk.propobj.PropertiesObject
__builtin__.object

Instance variables

var label_column

var lambda_parameter

var observation_columns

Methods

def __init__(

self, tc, scala_model)

def export_to_mar(

self, path)

Exports the trained model as a model archive (.mar) to the specified path

Parameters:

path

(str):

Path to save the trained model

Returns

(str):

Full path to the saved .mar file

def predict(

self, frame, columns=None)

Predicts the labels for the observation columns in the given input frame. Creates a new frame with the existing columns and a new predicted column.

Parameters:

frame

(Frame):

Frame used for predicting the values

columns

(List[str]):

Names of the observation columns.

Returns

(Frame):

A new frame containing the original frame's columns and a prediction column
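
As a rough illustration of what a Naive Bayes prediction involves, the sketch below applies the textbook multinomial decision rule: pick the label with the highest log prior plus feature-weighted log likelihoods. It is plain Python, not sparktk code. The function `predict_class` and the prior and likelihood values are hypothetical and are not the parameters learned from the example frame; whether sparktk scores rows in exactly this form is an assumption.

```python
import math


def predict_class(observation, priors, likelihoods):
    """Return the label with the highest log-posterior score."""
    best_label, best_score = None, float("-inf")
    for label in sorted(priors):
        # log P(label) + sum_j x_j * log P(feature_j | label)
        score = math.log(priors[label]) + sum(
            x * math.log(p) for x, p in zip(observation, likelihoods[label]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical model parameters for two classes and two features.
priors = {0: 0.5, 1: 0.5}
likelihoods = {0: [0.9, 0.1], 1: [0.4, 0.6]}

print(predict_class([10.0, 1.0], priors, likelihoods))  # 0
print(predict_class([1.0, 5.0], priors, likelihoods))   # 1
```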

def save(

self, path)

def test(

self, frame, columns=None)
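
The class example reads `metrics.precision` from the result of `test`. The sketch below reproduces that value by hand from the `Class` and `predicted_class` columns shown in the example output. It is plain Python, not sparktk code, and it assumes label 1 is treated as the positive class; the helper `precision` is hypothetical.

```python
# Values copied from the Class and predicted_class columns of the example.
labels = [1, 1, 1, 0, 0, 0]
predictions = [0, 1, 1, 0, 0, 0]


def precision(labels, predictions, positive=1):
    """Fraction of rows predicted positive that are actually positive."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if p == positive and y == positive)
    false_pos = sum(1 for y, p in pairs if p == positive and y != positive)
    predicted_pos = true_pos + false_pos
    return float(true_pos) / predicted_pos if predicted_pos else 0.0


print(precision(labels, predictions))  # 1.0
```

Row 0 is a false negative, which lowers recall but leaves precision at 1.0 because every row predicted as class 1 is labeled class 1.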

def to_dict(

self)

def to_json(

self)
