sparktk pca
Functions
def load(
path, tc=<class 'sparktk.arguments.implicit'>)
Loads a PcaModel from the given path.
def train(
frame, columns, mean_centered=True, k=None)
Creates a PcaModel by training on the given frame
| frame | (Frame): | A frame of training data. | 
| columns | (str or list[str]): | Names of columns containing the observations for training. | 
| mean_centered | (bool): | Whether to mean center the columns. | 
| k | (int): | Principal component count. Default is the number of observation columns. | 
| Returns | (PcaModel): | The trained PCA model | 
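To illustrate what `mean_centered=True` implies, the following minimal sketch computes the per-column means and subtracts them from every row. It is plain Python, not sparktk code; the helper names `column_means` and `mean_center` are hypothetical, and the data is the five-row frame from the PcaModel example.

```python
# Illustrative sketch only: plain Python, not the sparktk implementation.
# `column_means` and `mean_center` are hypothetical helper names.

def column_means(rows):
    """Return the arithmetic mean of each column."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def mean_center(rows, means):
    """Subtract the column mean from every value in each row."""
    return [[value - mean for value, mean in zip(row, means)] for row in rows]

rows = [[2.6, 1.7, 0.3, 1.5, 0.8, 0.7],
        [3.3, 1.8, 0.4, 0.7, 0.9, 0.8],
        [3.5, 1.7, 0.3, 1.7, 0.6, 0.4],
        [3.7, 1.0, 0.5, 1.2, 0.6, 0.3],
        [1.5, 1.2, 0.5, 1.4, 0.6, 0.4]]

means = column_means(rows)
centered = mean_center(rows, means)

# Rounded to two decimals, the means agree with model.column_means in the example.
print([round(m, 2) for m in means])
```

After centering, every column sums to zero (up to floating-point error), which is the form of the data the principal components are then derived from.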
Classes
class PcaModel
Principal Component Analysis Model
>>> frame = tc.frame.create([[2.6,1.7,0.3,1.5,0.8,0.7],
...                          [3.3,1.8,0.4,0.7,0.9,0.8],
...                          [3.5,1.7,0.3,1.7,0.6,0.4],
...                          [3.7,1.0,0.5,1.2,0.6,0.3],
...                          [1.5,1.2,0.5,1.4,0.6,0.4]],
...                         [("1", float), ("2", float), ("3", float), ("4", float), ("5", float), ("6", float)])
-etc-
>>> frame.inspect()
[#]  1    2    3    4    5    6
=================================
[0]  2.6  1.7  0.3  1.5  0.8  0.7
[1]  3.3  1.8  0.4  0.7  0.9  0.8
[2]  3.5  1.7  0.3  1.7  0.6  0.4
[3]  3.7  1.0  0.5  1.2  0.6  0.3
[4]  1.5  1.2  0.5  1.4  0.6  0.4
>>> model = tc.models.dimreduction.pca.train(frame, ['1','2','3','4','5','6'], mean_centered=True, k=4)
>>> model.columns
[u'1', u'2', u'3', u'4', u'5', u'6']
>>> model.column_means
[2.92, 1.48, 0.4, 1.3, 0.7, 0.52]
>>> model.singular_values
[1.804817009663242, 0.8835344148403884, 0.7367461843294286, 0.15234027471064396]
>>> model.right_singular_vectors
[[-0.9906468642089336, 0.11801374544146298, 0.02564701035332026, 0.04852509627553534], [-0.07735139793384983, -0.6023104604841426, 0.6064054412059492, -0.4961696216881456], [0.028850639537397756, 0.07268697636708586, -0.24463936400591005, -0.17103491337994484], [0.10576208410025367, 0.5480329468552814, 0.7523059089872701, 0.2866144016081254], [-0.024072151446194616, -0.30472267167437644, -0.011259366445851784, 0.48934541040601887], [-0.00617295395184184, -0.47414707747028795, 0.0753345822621543, 0.6329307498105843]]
>>> predicted_frame = model.predict(frame, mean_centered=True, t_squared_index=True, columns=['1','2','3','4','5','6'], k=3)
-etc-
>>> predicted_frame.inspect()
[#]  1    2    3    4    5    6    p_1              p_2
===================================================================
[0]  1.5  1.2  0.5  1.4  0.6  0.4    1.44498618058   0.150509319195
[1]  2.6  1.7  0.3  1.5  0.8  0.7   0.314738695012  -0.183753549226
[2]  3.5  1.7  0.3  1.7  0.6  0.4  -0.549024749481   0.235254068619
[3]  3.3  1.8  0.4  0.7  0.9  0.8  -0.471198363594  -0.670419608227
[4]  3.7  1.0  0.5  1.2  0.6  0.3  -0.739501762517   0.468409769639
<BLANKLINE>
[#]  p_3              t_squared_index
=====================================
[0]  -0.163359836968   0.719188122813
[1]   0.312561560113   0.253649649849
[2]   0.465756549839   0.563086507007
[3]  -0.228746130528   0.740327252782
[4]  -0.386212142456   0.723748467549
>>> model.save('sandbox/pca1')
>>> model2 = tc.load('sandbox/pca1')
>>> model2.k
4
>>> predicted_frame2 = model2.predict(frame, mean_centered=True, t_squared_index=True, columns=['1','2','3','4','5','6'], k=3)
>>> predicted_frame2.inspect()
[#]  1    2    3    4    5    6    p_1              p_2
===================================================================
[0]  1.5  1.2  0.5  1.4  0.6  0.4    1.44498618058   0.150509319195
[1]  2.6  1.7  0.3  1.5  0.8  0.7   0.314738695012  -0.183753549226
[2]  3.5  1.7  0.3  1.7  0.6  0.4  -0.549024749481   0.235254068619
[3]  3.3  1.8  0.4  0.7  0.9  0.8  -0.471198363594  -0.670419608227
[4]  3.7  1.0  0.5  1.2  0.6  0.3  -0.739501762517   0.468409769639
<BLANKLINE>
[#]  p_3              t_squared_index
=====================================
[0]  -0.163359836968   0.719188122813
[1]   0.312561560113   0.253649649849
[2]   0.465756549839   0.563086507007
[3]  -0.228746130528   0.740327252782
[4]  -0.386212142456   0.723748467549
>>> canonical_path = model.export_to_mar("sandbox/pca.mar")
Ancestors (in MRO)
- PcaModel
- sparktk.propobj.PropertiesObject
- __builtin__.object
Instance variables
var column_means
var columns
var k
var mean_centered
var right_singular_vectors
var singular_values
Methods
def __init__(
self, tc, scala_model)
def export_to_mar(
self, path)
Exports the trained model as a model archive (.mar) to the specified path
| path | (str): | Path to save the trained model | 
| Returns | (str): | Full path to the saved .mar file | 
def predict(
self, frame, columns=None, mean_centered=None, k=None, t_squared_index=False)
Projects the observation columns of the given input frame onto the principal components. Creates a new frame with the existing columns plus one column per principal component (p_1 through p_k) and, if requested, a t_squared_index column.
| frame | (Frame): | Frame used for predicting the values | 
| columns | (list[str]): | Names of the observation columns. | 
| mean_centered | (bool): | Whether to mean center the columns. Default is True. | 
| k | (int): | The number of principal components to compute; must be <= the k used in training. Default is the trained k. | 
| t_squared_index | (bool): | Whether to compute the t-squared index. Default is False. | 
| Returns | (Frame): | A new frame containing the original frame's columns plus the principal component columns (and t_squared_index, if requested). | 
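The sketch below shows, in plain Python, how the p_1 through p_3 and t_squared_index values in the PcaModel example can be reproduced from the model's properties for a single row. It is not the sparktk implementation. The helper names `project_row` and `t_squared` are hypothetical, and the t-squared formula (the sum of each component divided by its singular value, squared) is an assumption inferred from the example output rather than a documented definition.

```python
# Illustrative sketch only: plain Python, not the sparktk implementation.
# `project_row` and `t_squared` are hypothetical helper names.

# Values copied from the PcaModel example (first three components only, k=3).
column_means = [2.92, 1.48, 0.4, 1.3, 0.7, 0.52]
singular_values = [1.804817009663242, 0.8835344148403884, 0.7367461843294286]
# One row per input column, one entry per principal component.
right_singular_vectors = [
    [-0.9906468642089336, 0.11801374544146298, 0.02564701035332026],
    [-0.07735139793384983, -0.6023104604841426, 0.6064054412059492],
    [0.028850639537397756, 0.07268697636708586, -0.24463936400591005],
    [0.10576208410025367, 0.5480329468552814, 0.7523059089872701],
    [-0.024072151446194616, -0.30472267167437644, -0.011259366445851784],
    [-0.00617295395184184, -0.47414707747028795, 0.0753345822621543],
]

def project_row(row, means, vectors, mean_centered=True):
    """Project one observation onto the principal components."""
    if mean_centered:
        row = [value - mean for value, mean in zip(row, means)]
    k = len(vectors[0])
    return [sum(row[i] * vectors[i][j] for i in range(len(row)))
            for j in range(k)]

def t_squared(components, values):
    """Assumed formula: sum of (component / singular value) squared."""
    return sum((p / s) ** 2 for p, s in zip(components, values))

row = [1.5, 1.2, 0.5, 1.4, 0.6, 0.4]
components = project_row(row, column_means, right_singular_vectors)
index = t_squared(components, singular_values)
print(components, index)
```

For this row the sketch yields components of roughly 1.445, 0.151 and -0.163 and an index of roughly 0.719, in line with the corresponding row of the example's `predicted_frame.inspect()` output.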
def save(
self, path)
def to_dict(
self)
def to_json(
self)