sparktk.frame.ops.timeseries_from_observations module
# vim: set encoding=utf-8
# Copyright (c) 2016 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
def timeseries_from_observations(self, date_time_index, timestamp_column, key_column, value_column):
    """
    Returns a frame that has the observations formatted as a time series.

    :param date_time_index: List of date/time strings. DateTimeIndex to conform all series to.
    :param timestamp_column: The name of the column telling when the observation occurred.
    :param key_column: The name of the column that contains the string key each observation belongs to.
    :param value_column: The name of the column that contains the observed value.
    :return: Frame formatted as a time series (with a column for key and a column for the vector of values).

    Uses the specified timestamp, key, and value columns and the date/time index provided to format the observations
    as a time series.  The time series frame will have columns for the key and a vector of the observed values that
    correspond to the date/time index.

    Examples
    --------

    In this example, we will use a frame of observations of resting heart rate for three individuals over three days.
    The data is accessed from a Frame object called *my_frame*:

        >>> my_frame.inspect(my_frame.count())
        [#]  name     date                  resting_heart_rate
        ======================================================
        [0]  Edward   2016-01-01T12:00:00Z                  62
        [1]  Stanley  2016-01-01T12:00:00Z                  57
        [2]  Edward   2016-01-02T12:00:00Z                  63
        [3]  Sarah    2016-01-02T12:00:00Z                  64
        [4]  Stanley  2016-01-02T12:00:00Z                  57
        [5]  Edward   2016-01-03T12:00:00Z                  62
        [6]  Sarah    2016-01-03T12:00:00Z                  64
        [7]  Stanley  2016-01-03T12:00:00Z                  56

    We then need to create a list that contains the date/time index,
    which will be used when creating the time series.  Since our data
    is for three days, our date/time index will just contain those
    three dates:

        >>> datetimeindex = ["2016-01-01T12:00:00.000Z","2016-01-02T12:00:00.000Z","2016-01-03T12:00:00.000Z"]

    Then we can create our time series frame by specifying our date/time
    index along with the name of our timestamp column (in this example, it's
    "date"), key column (in this example, it's "name"), and value column (in
    this example, it's "resting_heart_rate").

        >>> ts = my_frame.timeseries_from_observations(datetimeindex, "date", "name", "resting_heart_rate")
        [===Job Progress===]

    Take a look at the resulting time series frame schema and contents.
    Sarah has no observation for 2016-01-01, so the first element of her
    vector is None:

        >>> ts.schema
        [(u'name', <type 'unicode'>), (u'resting_heart_rate', vector(3))]

        >>> ts.inspect()
        [#]  name     resting_heart_rate
        ================================
        [0]  Stanley  [57.0, 57.0, 56.0]
        [1]  Edward   [62.0, 63.0, 62.0]
        [2]  Sarah    [None, 64.0, 64.0]

    """
    # The date/time index must be a Python list; other sequence types are rejected.
    if not isinstance(date_time_index, list):
        raise TypeError("date_time_index should be a list of date/times")

    # Convert the Python list of date/time strings to a Scala list for the JVM call.
    scala_date_list = self._tc.jutils.convert.to_scala_date_time_list(date_time_index)
    from sparktk.frame.frame import Frame
    return Frame(self._tc,
                 self._scala.timeSeriesFromObseravations(scala_date_list, timestamp_column, key_column, value_column))
Functions
def timeseries_from_observations(
self, date_time_index, timestamp_column, key_column, value_column)
Returns a frame that has the observations formatted as a time series.
date_time_index:   List of date/time strings. DateTimeIndex to conform all series to.
timestamp_column:  The name of the column telling when the observation occurred.
key_column:        The name of the column that contains the string key each observation belongs to.
value_column:      The name of the column that contains the observed value.
Returns:           Frame formatted as a time series (with a column for key and a column for the vector of values).
Uses the specified timestamp, key, and value columns and the date/time index provided to format the observations as a time series. The time series frame will have columns for the key and a vector of the observed values that correspond to the date/time index.
In this example, we will use a frame of observations of resting heart rate for three individuals over three days. The data is accessed from a Frame object called my_frame:
>>> my_frame.inspect(my_frame.count())
[#]  name     date                  resting_heart_rate
======================================================
[0]  Edward   2016-01-01T12:00:00Z                  62
[1]  Stanley  2016-01-01T12:00:00Z                  57
[2]  Edward   2016-01-02T12:00:00Z                  63
[3]  Sarah    2016-01-02T12:00:00Z                  64
[4]  Stanley  2016-01-02T12:00:00Z                  57
[5]  Edward   2016-01-03T12:00:00Z                  62
[6]  Sarah    2016-01-03T12:00:00Z                  64
[7]  Stanley  2016-01-03T12:00:00Z                  56
We then need to create a list that contains the date/time index, which will be used when creating the time series. It must be a Python list, because any other type raises a TypeError. Since our data is for three days, our date/time index will just contain those three dates:
>>> datetimeindex = ["2016-01-01T12:00:00.000Z","2016-01-02T12:00:00.000Z","2016-01-03T12:00:00.000Z"]
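For longer ranges, typing the index by hand becomes tedious. The following is a minimal sketch, not part of sparktk, that builds the same list with the standard library. The helper name `build_date_time_index` is hypothetical, and the sketch assumes evenly spaced UTC timestamps written in the string format used above.

```python
from datetime import datetime, timedelta


def build_date_time_index(start, periods, step=timedelta(days=1)):
    """Return `periods` evenly spaced date/time strings starting at `start`.

    Hypothetical helper for illustration only. `start` is treated as UTC and
    the millisecond part is always written as ".000".
    """
    return [
        (start + i * step).strftime("%Y-%m-%dT%H:%M:%S.000Z")
        for i in range(periods)
    ]


# Three daily entries starting at noon on 2016-01-01.
generated_index = build_date_time_index(datetime(2016, 1, 1, 12, 0, 0), 3)
```

Because the function returns a plain Python list, the result passes the `isinstance(date_time_index, list)` check in the source above.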
Then we can create our time series frame by specifying our date/time index along with the name of our timestamp column (in this example, it's "date"), key column (in this example, it's "name"), and value column (in this example, it's "resting_heart_rate").
>>> ts = my_frame.timeseries_from_observations(datetimeindex, "date", "name", "resting_heart_rate")
[===Job Progress===]
Take a look at the resulting time series frame schema and contents. Sarah has no observation for 2016-01-01, so the first element of her vector is None:
>>> ts.schema
[(u'name', <type 'unicode'>), (u'resting_heart_rate', vector(3))]
>>> ts.inspect()
[#]  name     resting_heart_rate
================================
[0]  Stanley  [57.0, 57.0, 56.0]
[1]  Edward   [62.0, 63.0, 62.0]
[2]  Sarah    [None, 64.0, 64.0]
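To make the reshaping in the example concrete, here is a plain-Python sketch of the same idea: group observations by key, place each value at the position of its timestamp in the date/time index, and leave None where a key has no observation. This is not how sparktk implements the operation (the real work is delegated to Scala and Spark). The names `parse_utc` and `conform_observations` are hypothetical, and the sketch assumes timestamps are UTC strings with or without a fractional-second part.

```python
from collections import defaultdict
from datetime import datetime


def parse_utc(stamp):
    """Parse a UTC date/time string with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized date/time string: %r" % stamp)


def conform_observations(rows, date_time_index):
    """Map each key to a list of values aligned with date_time_index.

    rows is an iterable of (key, timestamp, value) tuples. Observations whose
    timestamp is not in the index are ignored in this sketch.
    """
    if not isinstance(date_time_index, list):
        raise TypeError("date_time_index should be a list of date/times")
    # Position of every index entry, keyed by its parsed date/time.
    position = {parse_utc(s): i for i, s in enumerate(date_time_index)}
    # Every key starts with a vector of None, one slot per index entry.
    series = defaultdict(lambda: [None] * len(date_time_index))
    for key, stamp, value in rows:
        slot = position.get(parse_utc(stamp))
        if slot is not None:
            series[key][slot] = float(value)
    return dict(series)


observations = [
    ("Edward", "2016-01-01T12:00:00Z", 62),
    ("Stanley", "2016-01-01T12:00:00Z", 57),
    ("Edward", "2016-01-02T12:00:00Z", 63),
    ("Sarah", "2016-01-02T12:00:00Z", 64),
    ("Stanley", "2016-01-02T12:00:00Z", 57),
    ("Edward", "2016-01-03T12:00:00Z", 62),
    ("Sarah", "2016-01-03T12:00:00Z", 64),
    ("Stanley", "2016-01-03T12:00:00Z", 56),
]
index = ["2016-01-01T12:00:00.000Z", "2016-01-02T12:00:00.000Z", "2016-01-03T12:00:00.000Z"]
result = conform_observations(observations, index)
```

With the observations from the example, `result["Sarah"]` is `[None, 64.0, 64.0]`, matching the row shown in the time series frame.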