You are viewing the RapidMiner Studio documentation for version 10.1.

Forecast Validation (Time Series)

Synopsis

This operator performs a validation of a forecast model, which predicts the future values of a time series.

Description

The operator creates sliding windows from the input time series, which is specified by the time series attribute parameter. In each validation step, the training window is provided at the inner training set port of the Training subprocess. Its size is defined by the parameter window size. The training window can be used to train a forecast model (e.g. an ARIMA model trained by the ARIMA operator), which has to be delivered to the model port of the Training subprocess.

The inner test set port of the Testing subprocess contains the values of the test window. Its size is defined by the parameter horizon size. The forecast model from the Training subprocess is used to predict these values. For the next validation fold, the training and test windows are shifted by the number of values defined by the parameter step size.
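A minimal Python sketch of the default example based windowing may help. It assumes 0-based example positions, windows laid out from the first example, and that a fold is dropped once its test window would run past the end of the series. The function `example_windows` is hypothetical and not part of RapidMiner.

```python
from typing import List, Tuple


def example_windows(n: int, window_size: int, horizon_size: int,
                    step_size: int) -> List[Tuple[range, range]]:
    """Return (training window, test window) index ranges, one pair per fold.

    Hypothetical helper: the first training window starts at example 0 and
    every following fold is shifted by step_size examples.
    """
    folds = []
    start = 0
    # Stop as soon as the test window would extend past the last example.
    while start + window_size + horizon_size <= n:
        training = range(start, start + window_size)
        test = range(start + window_size, start + window_size + horizon_size)
        folds.append((training, test))
        start += step_size
    return folds


for training, test in example_windows(n=12, window_size=5, horizon_size=2, step_size=2):
    print(list(training), list(test))
```

With 12 examples, a window size of 5, a horizon size of 2 and a step size of 2, this yields three folds whose test windows cover positions 5-6, 7-8 and 9-10.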

Contrary to the Cross Validation operator, the number of values forecasted by the forecast model is fixed to the horizon size, and the forecasted values are already added to the ExampleSet provided at the test set port. Hence an additional Apply Forecast operator is not necessary. The attribute holding the test window values has the label role, while the attribute holding the forecasted values has the prediction role. Thus a Performance operator (e.g. Performance (Regression)) can be used to calculate the performance of the forecast.
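The following Python sketch shows how a single fold is scored once the forecast has been attached to the test window. Two stand-ins are used for illustration. A naive forecast that repeats the last training value replaces the model trained in the Training subprocess (e.g. ARIMA). The root mean squared error replaces the Performance operator. Both substitutions, and the function names, are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def naive_forecast(training: Sequence[float], horizon_size: int) -> List[float]:
    """Stand-in forecast model: repeat the last training value horizon_size times."""
    return [training[-1]] * horizon_size


def score_fold(series: Sequence[float], start: int, window_size: int,
               horizon_size: int) -> Tuple[List[Tuple[float, float]], float]:
    """Build (label, prediction) rows for one fold and return them with the RMSE."""
    training = series[start:start + window_size]
    labels = series[start + window_size:start + window_size + horizon_size]
    # Exactly horizon_size forecasted values, one per test example.
    predictions = naive_forecast(training, horizon_size)
    rows = list(zip(labels, predictions))
    rmse = math.sqrt(sum((label - pred) ** 2 for label, pred in rows) / len(rows))
    return rows, rmse


rows, rmse = score_fold([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], start=0, window_size=4, horizon_size=2)
print(rows, rmse)
```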

The behavior described above is the default, example based windowing. It can be changed to time based windowing or custom windowing with the unit parameter. For time based windowing, the window parameters are specified as time durations/periods. For custom windowing, an additional ExampleSet has to be provided at the "custom windows" input port. It holds the start values (and optionally the stop values) of the windows. For more details, see the unit parameter and the descriptions of the corresponding parameters.

Expert settings (for example no overlapping windows or empty window handling) are shown when the expert settings parameter is selected.

If the model port of the Forecast Validation operator is connected, a final window with the same size as the training windows, but ending at the last example of the input series, is used to train a final forecast model. This final model is provided at the model output port and can be used directly by the Apply Forecast operator to predict the future values of the input time series. The operator also delivers all test set ExampleSets, appended to one ExampleSet, and the averaged Performance Vector.
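The outputs described above can be sketched in Python as follows. The sketch assumes that the final training window is simply the last window size values of the series. It also assumes that the averaged performance is the arithmetic mean of the per-fold values. The helper names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def final_training_window(series: Sequence[float], window_size: int) -> List[float]:
    """Same size as the training windows, but ending at the last example."""
    if not 0 < window_size <= len(series):
        raise ValueError("window_size must be between 1 and the series length")
    return list(series[-window_size:])


def append_test_sets(test_sets: Sequence[List[Dict[str, float]]]) -> List[Dict[str, float]]:
    """Append the test set of every fold into one combined table."""
    combined: List[Dict[str, float]] = []
    for rows in test_sets:
        combined.extend(rows)
    return combined


def average_performance(fold_values: Sequence[float]) -> float:
    """Average a performance value (e.g. an RMSE) over all folds."""
    return mean(fold_values)


print(final_training_window([10.0, 11.0, 12.0, 13.0], window_size=2))
```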

This operator works on all time series (numerical, nominal and time series with date time values).

Input

  • example set (Data Table)

    The ExampleSet which contains the time series data as an attribute.

  • custom windows (Data Table)

    The example set which contains the start (and stop) values of the custom windows. Only needs to be connected if the parameter unit is set to custom.

Output

  • model (Model)

    If the model port of the Forecast Validation operator is connected, a final window with the same size as the training windows, but ending at the last example of the input series is used to train a final forecast model, which is delivered at this port. The final forecast model can be directly used by the Apply Forecast operator to predict the future values for the input time series.

  • example set (Data Table)

    The ExampleSet that was given as input is passed through without changes.

  • test result set (Data Table)

    All test set ExampleSets, appended to one ExampleSet.

  • performance (Performance Vector)

    This is an expandable port. You can connect any performance vector (result of a Performance operator) to the result port of the inner Testing subprocess. The performance output port delivers the average of the performances over all folds of the validation.

Parameters

  • time_series_attribute

    The time series attribute holding the time series values for which the forecast model shall be built. The attribute name can be selected from the drop down box of the parameter if the meta data is known.

  • has_indices

    This parameter indicates if there is an index attribute associated with the time series. If this parameter is set to true, the index attribute has to be selected.

  • indices_attribute

    If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be either a date, date_time or numeric value type attribute. The attribute name can be selected from the drop down box of the parameter if the meta data is known.

  • sort_time_series

    If this parameter is selected, the input time series will be sorted according to the selected indices attribute before the time series operation is applied. If it is not selected and the input time series is not sorted, a corresponding User Error is thrown.

    Keep in mind that the index values still need to be unique. If the values are non-unique, a corresponding User Error is thrown.

  • expert_settings

    This parameter can be selected to show expert settings for a more detailed configuration of the operator. The expert settings are: windows defined, custom start point, custom end point, date format, no overlapping windows and empty window handling.

  • unit

    The mode in which windows are defined. It defines the unit of the window parameters (window size, step size, horizon size and horizon offset).

    • example based: The window parameters are specified in number of examples. This is the default option.
    • time based: The window parameters are specified as time durations/periods (units ranging from milliseconds to years).
    • custom: An additional ExampleSet has to be provided at the "custom windows" input port. It holds the start values (and optionally the stop values) of the windows.

  • windows_defined

    This parameter defines the reference point from which the windows are set up. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

    • from start: The first window will start at the first example of the input data set. The following windows are set up according to the window parameters.
    • from end: The last window will end at the last example of the input data set. The previous windows are set up according to the window parameters.
    • custom start: The first window will start at the custom start point provided by the parameter custom start point / custom start time. The following windows are set up according to the window parameters.
    • custom end: The last window will end at the custom end point provided by the parameter custom end point / custom end time. The previous windows are set up according to the window parameters.

  • custom_start_point

    If the parameter windows defined is set to custom start and the unit is set to example based, this parameter defines the custom point from which the windows start. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

  • custom_end_point

    If the parameter windows defined is set to custom end and the unit is set to example based, this parameter defines the custom point where the windows end. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

  • custom_start_time

    If the parameter windows defined is set to custom start and the unit is set to time based, this parameter defines the custom date time point from which the windows start.

    The date time format used to interpret the string provided in this parameter is defined by the parameter date format. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

  • custom_end_time

    If the parameter windows defined is set to custom end and the unit is set to time based, this parameter defines the custom date time point where the windows end.

    The date time format used to interpret the string provided in this parameter is defined by the parameter date format. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

  • date_format

    Date format used for the custom start time and custom end time parameters. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

  • window_size

    The number of values in the training window. The ExampleSet provided at the training set port of the Training subprocess will have window size number of examples. The window size has to be smaller than or equal to the length of the time series.

  • window_size_time

    The time duration/period of the training window.

    The example set provided at the training set port of the Training subprocess will have all examples which are in the corresponding window.

    The window size time has to be smaller than or equal to the time duration of the time series.

  • no_overlapping_windows

    If this parameter is set to true, the parameter step size is determined automatically so that windows and horizons do not overlap: the step size is set to window size + horizon size. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

  • step_size

    The step size between the first values of two consecutive windows. E.g. with a window size of 10 and a step size of 2, the first window has the values from 0, ..., 9, the second window the values from 2, ..., 11 and so on. If no overlapping windows is set to true the step size is automatically determined depending on window size and horizon size.

  • step_size_time

    The step size (in units of time) between the start points of two consecutive windows. E.g. with a window size of 1 week and a step size of 2 days, the first window has the days from 0, ..., 6, the second window the days from 2, ..., 8 and so on. If no overlapping windows is set to true the step size time is automatically determined depending on window size time, horizon size time and horizon offset time.

  • horizon_size

    The number of values in the test window. The ExampleSet provided at the test set port of the Testing subprocess will have horizon size number of examples. It will have an attribute holding the original time series values in the test window (attribute name is the name of the time series attribute parameter), and an attribute holding the values in the test window, forecasted by the forecast model from the Training subprocess (attribute name is forecast of <time series attribute>). In addition, the ExampleSet has an attribute with the forecast position, ranging from 1 to horizon size. If the parameter has indices is set to true the ExampleSet has also an attribute holding the last index value of the training window.

  • horizon_size_time

    The time duration/period of the test window.

    The ExampleSet provided at the test set port of the Testing subprocess will contain the examples in the corresponding window. It will have an attribute holding the original time series values in the test window (attribute name is the name of the time series attribute parameter), and an attribute holding the values in the test window forecasted by the forecast model from the Training subprocess (attribute name is forecast of <time series attribute>). In addition, the ExampleSet has an attribute with the forecast position, ranging from 1 to the maximum number of horizon values. If the parameter has indices is set to true, the ExampleSet also has an attribute holding the last index value of the training window.

  • windows_stop_definition

    Defines whether the ends of the custom windows are defined by the start of the next window (the windows then span the whole index range) or by additional attributes.

    • from next window start: The end of each window is defined by the start of the next window (the windows span the whole index range). Training windows end at the start of the next horizon window (or at the start of the next training window, if there are no horizon windows). Horizon windows end at the start of the next training window. Be aware that the last value of the start definition values (the last value of the horizon start attribute, or the last value of the window start attribute if there are no horizon windows) is only used as the end of the final window.
    • from attribute: The end of each window is defined by additional attribute(s) in the custom window example set. The attribute names have to be provided by the parameters window stop attribute and horizon stop attribute.

  • window_start_attribute

    This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the start values for the custom training windows.

    The window start attribute, window stop attribute, horizon start attribute and horizon stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.

  • window_stop_attribute

    This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the end values for the custom training windows.

    The window start attribute, window stop attribute, horizon start attribute and horizon stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.

  • horizon_start_attribute

    This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the start values for the custom horizon windows.

    The window start attribute, window stop attribute, horizon start attribute and horizon stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.

  • horizon_stop_attribute

    This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the stop values for the custom horizon windows.

    The window start attribute, window stop attribute, horizon start attribute and horizon stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.

  • empty_window_handling

    This parameter defines how empty windows (windows which do not contain an Example) will be handled. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

    • add empty exampleset: Empty windows will be added as an empty ExampleSet, or a row with missing values.
    • skip: Empty windows will be skipped completely in the processing. If horizon windows are created as well and either the training or the horizon window is empty, the processing for both windows is skipped.
    • fail: A user error is thrown if an empty window occurs.

  • enable_parallel_execution

    This parameter enables the parallel execution of the inner processes. Please disable the parallel execution if you run into memory problems.

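The checks described for the sort_time_series parameter can be sketched in Python as below. This minimal sketch assumes that the indices and values are given as two parallel lists. `UserError` is a hypothetical stand-in for RapidMiner's User Error, and `prepare_series` is not a RapidMiner API.

```python
from typing import List, Sequence, Tuple


class UserError(Exception):
    """Hypothetical stand-in for a RapidMiner User Error."""


def prepare_series(indices: Sequence, values: Sequence,
                   sort_time_series: bool) -> Tuple[List, List]:
    """Validate (and optionally sort) a series by its index attribute."""
    # Index values must be unique, even when sorting is enabled.
    if len(set(indices)) != len(indices):
        raise UserError("index values are not unique")
    is_sorted = all(a < b for a, b in zip(indices, indices[1:]))
    if is_sorted:
        return list(indices), list(values)
    if not sort_time_series:
        raise UserError("time series is not sorted by its index attribute")
    # Sort by index only, so the values never have to be comparable.
    pairs = sorted(zip(indices, values), key=lambda pair: pair[0])
    return [i for i, _ in pairs], [v for _, v in pairs]


print(prepare_series([3, 1, 2], [30.0, 10.0, 20.0], sort_time_series=True))
```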
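For the windows_defined parameter, the difference between from start and from end can be sketched in Python for example based windowing. The sketch assumes that a window here means a training window followed by its test window. It also assumes that from end places the last test window so that it ends at the last example. `window_starts` is a hypothetical helper.

```python
from typing import List


def window_starts(n: int, window_size: int, horizon_size: int, step_size: int,
                  from_end: bool = False) -> List[int]:
    """Start positions of the training windows, in ascending order."""
    span = window_size + horizon_size  # training window plus test window
    if n < span:
        return []
    if not from_end:
        # First window starts at example 0; later windows follow by step_size.
        return list(range(0, n - span + 1, step_size))
    # Last window ends at example n - 1; earlier windows precede it by step_size.
    return sorted(range(n - span, -1, -step_size))


print(window_starts(12, 5, 2, 2))                 # [0, 2, 4]
print(window_starts(12, 5, 2, 2, from_end=True))  # [1, 3, 5]
```

With 12 examples, the from start layout leaves the last example unused. The from end layout leaves the first example unused instead.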
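The step_size_time example (a window size of 1 week and a step size of 2 days) can be reproduced with Python's datetime module. The sketch covers only the training windows. It assumes daily data and half-open windows [start, start + window size). It also assumes that a window is only produced if it fits completely before the end of the data. `time_windows` is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def time_windows(first_day: date, last_day: date, window_size_time: timedelta,
                 step_size_time: timedelta) -> List[Tuple[date, date]]:
    """Return half-open (start, end) windows that fit between first_day and last_day."""
    windows = []
    start = first_day
    end_of_data = last_day + timedelta(days=1)  # exclusive end of the daily data
    while start + window_size_time <= end_of_data:
        windows.append((start, start + window_size_time))
        start += step_size_time
    return windows


first = date(2024, 1, 1)
for start, end in time_windows(first, date(2024, 1, 10), timedelta(weeks=1), timedelta(days=2)):
    # Print the covered day offsets relative to the first day.
    print([(start - first).days + d for d in range((end - start).days)])
```

The two printed windows cover days 0 to 6 and days 2 to 8, matching the example in the parameter description.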
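The layout of the test set described for horizon_size can be sketched in Python as a list of rows. The name of the original values attribute and of the forecast of <time series attribute> attribute follow the parameter description. The names "forecast position" and "last index" used here are assumptions, as is the helper `build_test_set`.

```python
from typing import Dict, List, Optional, Sequence


def build_test_set(attribute: str, labels: Sequence[float], forecasts: Sequence[float],
                   last_training_index: Optional[int] = None) -> List[Dict[str, object]]:
    """One row per test example: label, prediction and forecast position."""
    if len(labels) != len(forecasts):
        raise ValueError("one forecast is needed per test window value")
    rows = []
    for position, (label, forecast) in enumerate(zip(labels, forecasts), start=1):
        row: Dict[str, object] = {
            attribute: label,                      # original value (label role)
            f"forecast of {attribute}": forecast,  # forecasted value (prediction role)
            "forecast position": position,         # runs from 1 to the horizon size
        }
        if last_training_index is not None:        # only if has_indices is true
            row["last index"] = last_training_index
        rows.append(row)
    return rows


for row in build_test_set("sales", [12.0, 15.0], [11.5, 14.0], last_training_index=9):
    print(row)
```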
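For windows_stop_definition set to from next window start, the pairing of start values into windows can be sketched in Python. The sketch covers only the case without horizon windows. It assumes half-open windows, so the start of the next window is not part of the current one. `windows_from_next_start` and the quarter dates are hypothetical.

```python
from datetime import date
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def windows_from_next_start(starts: Sequence[T]) -> List[Tuple[T, T]]:
    """Each window ends where the next one starts.

    The last start value only serves as the end of the final window, so
    n start values yield n - 1 windows.
    """
    return list(zip(starts[:-1], starts[1:]))


# Hypothetical fiscal quarter starts: four quarters need five start values.
quarter_starts = [date(2023, 1, 1), date(2023, 4, 1), date(2023, 7, 1),
                  date(2023, 10, 1), date(2024, 1, 1)]
for start, end in windows_from_next_start(quarter_starts):
    print(start, "->", end)
```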
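The three options of empty_window_handling can be sketched in Python on pairs of training and horizon windows. The mode strings, `UserError` and `handle_empty_windows` are hypothetical. For add empty exampleset, the sketch simply keeps the empty window, which corresponds to adding an empty ExampleSet.

```python
from typing import List, Sequence, Tuple

Window = List[float]


class UserError(Exception):
    """Hypothetical stand-in for a RapidMiner User Error."""


def handle_empty_windows(pairs: Sequence[Tuple[Window, Window]],
                         mode: str) -> List[Tuple[Window, Window]]:
    """Apply 'add', 'skip' or 'fail' handling to (training, horizon) window pairs."""
    if mode not in ("add", "skip", "fail"):
        raise ValueError(f"unknown mode: {mode}")
    result = []
    for training, horizon in pairs:
        if training and horizon:
            result.append((training, horizon))
        elif mode == "skip":
            continue  # skip both windows if either one is empty
        elif mode == "fail":
            raise UserError("an empty window occurred")
        else:
            result.append((training, horizon))  # keep the empty window
    return result


print(handle_empty_windows([([1.0], [2.0]), ([], [3.0])], "skip"))
```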

Tutorial Processes

Validate the performance of an ARIMA model for Lake Huron

In this process the Forecast Validation operator is used to validate the performance of an ARIMA model for the Lake Huron data set. The ARIMA model is trained on a training window of size 20. This model is used to forecast the next 5 (horizon size) values of the time series. The forecasted values are compared to the original ones to calculate the performance of the forecast model.

The step size is set to 5, so the training and test windows are shifted by 5 in each validation fold.
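With a window size of 20, a horizon size of 5 and a step size of 5, the number of validation folds depends only on the length of the series. The following short Python sketch assumes that windows are laid out from the first example and that incomplete final folds are dropped. `number_of_folds` is hypothetical.

```python
def number_of_folds(n: int, window_size: int = 20, horizon_size: int = 5,
                    step_size: int = 5) -> int:
    """Count folds whose training and test windows both fit into n examples."""
    span = window_size + horizon_size
    if n < span:
        return 0
    return (n - span) // step_size + 1


# Cross-check the closed form against explicit enumeration of fold starts.
for n in range(0, 60):
    assert number_of_folds(n) == len(range(0, n - 25 + 1, 5))

print(number_of_folds(50))  # 6
```

Because the step size equals the horizon size, consecutive test windows are adjacent and never overlap. Each value after the first training window is therefore forecast at most once.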

Use time based windowing to train and test on complete months of a daily Sales data set

In this tutorial process a fictitious Sales data set with daily entries is created. The Forecast Validation operator with time based windowing is used to train a Holt-Winters forecast model on three months of the input data, and the model is validated on the data of the following months.
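Training on complete calendar months can be sketched in Python as below. The sketch uses three training months followed by one test month, with folds shifted by one month. The one-month test window and one-month step are illustrative assumptions, and `monthly_folds` and `add_months` are hypothetical. Daily rows would then be selected by checking whether their date falls into a span.

```python
from datetime import date
from typing import List, Tuple

Span = Tuple[date, date]  # half-open [start, end)


def add_months(first_of_month: date, months: int) -> date:
    """First day of the month that lies `months` months later."""
    total = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def monthly_folds(first_month: date, end_exclusive: date, train_months: int = 3,
                  test_months: int = 1, step_months: int = 1) -> List[Tuple[Span, Span]]:
    """(training, test) month spans that fit completely before end_exclusive."""
    folds = []
    start = first_month
    while add_months(start, train_months + test_months) <= end_exclusive:
        split = add_months(start, train_months)
        folds.append(((start, split), (split, add_months(split, test_months))))
        start = add_months(start, step_months)
    return folds


for training, test in monthly_folds(date(2023, 1, 1), date(2023, 7, 1)):
    print("train", training, "test", test)
```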

Use custom windowing to define your own training and test windows

In this process an ExampleSet holding the fictitious dates of a company's fiscal quarters is created. This ExampleSet is used as custom windows for the Forecast Validation operator to define custom training and test windows.

On a fictitious Sales data set, the Forecast Validation operator trains a Holt-Winters forecast model on the custom training window and evaluates its performance on the following custom test window.