RapidMiner Studio documentation, version 9.10

Apply Feature Set (Model Simulator)

Synopsis

This operator applies a feature set, i.e. the result of a fully automated feature engineering process covering feature selection and feature generation, to an example set.

Description

A feature set object describes a set of features that should be part of an example set, or expressions that describe how to generate new features. It is the output of automatic feature selection and generation methods, mainly of the Automatic Feature Engineering operator.

A feature set can be applied to new example sets with the help of this operator. It ensures that the resulting example set has the structure described by the given feature set.

There are three main ways to use a feature set. First, it can simply describe the complete input example set; in this case, all features are used and none are generated. Second, it can describe a subset of the input features, in which case the other features are dropped. Third, it can contain expressions describing how to create new features (in addition to or instead of simple feature subsets).
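
The three cases can be illustrated with a minimal Python sketch. It is not RapidMiner's implementation or API: the representation of a feature set as a list of (name, expression) pairs and the helper `apply_feature_set` are assumptions made purely for illustration.

```python
# Illustrative sketch only. A "feature set" is modelled here as a list of
# (name, expression) pairs; this is an assumption, not RapidMiner's format.

def apply_feature_set(rows, feature_set):
    """Return new rows containing exactly the features named in feature_set.

    rows        -- list of dicts, one dict per example
    feature_set -- list of (name, expr) pairs; expr is None for a plain
                   selection, or a function row -> value for a generated feature
    """
    result = []
    for row in rows:
        new_row = {}
        for name, expr in feature_set:
            new_row[name] = row[name] if expr is None else expr(row)
        result.append(new_row)
    return result


rows = [{"a": 1.0, "b": 2.0, "c": 3.0}, {"a": 4.0, "b": 5.0, "c": 6.0}]

# Case 1: the feature set describes the complete input; nothing is dropped
# and nothing is generated.
all_features = [("a", None), ("b", None), ("c", None)]

# Case 2: the feature set describes a subset; "c" is dropped.
subset = [("a", None), ("b", None)]

# Case 3: the feature set contains a generation expression next to a selection.
generated = [("a", None), ("a_times_b", lambda r: r["a"] * r["b"])]

full_result = apply_feature_set(rows, all_features)
subset_result = apply_feature_set(rows, subset)
generated_result = apply_feature_set(rows, generated)
```

In all three cases the columns of the result are determined by the feature set alone, which is what makes the output structure predictable.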

Input

  • example set (Data Table)

    This input port expects a data set which should be transformed with the specified feature set.

  • feature set

    The feature set which describes the desired transformations, i.e. selections and generation expressions.

Output

  • example set (Data Table)

    The resulting data with the feature set descriptions applied to it.

  • feature set

    The feature set which was given as input.

Parameters

  • handle missings

    Indicates whether missing and infinite values should be replaced by the average / mode of the known values if they appear as a result of feature generation. Range: boolean

  • keep originals

    Indicates whether attributes in the data which are not part of the feature set should still be kept. Range: boolean

  • originals special role

    Indicates whether original attributes which are kept should get a special role instead of a regular one, so that they are not used by machine learning operators. Range: boolean

  • recreate missing attributes

    Indicates whether columns which are missing in the given data should be recreated in the output. This way, the resulting data is compatible with the data used to create this feature set, which is often useful in scoring situations. Range: boolean
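
One possible reading of these four parameters is sketched below in Python. The function `apply_with_options`, its flag names, and the data layout are hypothetical and only mirror the descriptions above; the sketch handles numeric values only (so it uses the average, never the mode), and it simply leaves out a column that is absent when recreation is switched off, which may differ from what the operator does.

```python
import math


def apply_with_options(rows, feature_set, handle_missings=True,
                       keep_originals=False, originals_special_role=True,
                       recreate_missing_attributes=True):
    """Hypothetical helper mirroring the four parameters (numeric data only).

    feature_set is a list of (name, expr) pairs; expr is None for a selected
    feature or a function row -> value for a generated one.
    Returns (rows, roles), where roles maps each kept original to its role.
    """
    names = [name for name, _ in feature_set]
    generated = [name for name, expr in feature_set if expr is not None]
    roles = {}
    out = []
    for row in rows:
        new_row = {}
        for name, expr in feature_set:
            if expr is not None:
                try:
                    new_row[name] = expr(row)
                except (KeyError, ZeroDivisionError):
                    new_row[name] = None  # treat a failed computation as missing
            elif name in row:
                new_row[name] = row[name]
            elif recreate_missing_attributes:
                new_row[name] = None  # recreate the absent column as missing
        if keep_originals:
            for name, value in row.items():
                if name not in names:
                    new_row[name] = value
                    roles[name] = "special" if originals_special_role else "regular"
        out.append(new_row)

    if handle_missings:
        # Only generated features are repaired, using the average of known values.
        for name in generated:
            known = [r[name] for r in out
                     if r[name] is not None and math.isfinite(r[name])]
            if not known:
                continue
            average = sum(known) / len(known)
            for r in out:
                if r[name] is None or not math.isfinite(r[name]):
                    r[name] = average
    return out, roles


rows = [{"a": 1.0, "b": 0.0, "c": 7.0},
        {"a": 4.0, "b": 2.0, "c": 8.0},
        {"a": 6.0, "b": 3.0, "c": 9.0}]

# "d" is absent from the data; "a_div_b" fails for the first row (division by zero).
feature_set = [("a", None), ("d", None), ("a_div_b", lambda r: r["a"] / r["b"])]

result, roles = apply_with_options(rows, feature_set, keep_originals=True)
```

Here the failed division in the first row is filled with the average of the two known values, the absent column "d" is recreated as missing, and the kept originals "b" and "c" are marked as special so that a learner would ignore them.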

Tutorial Processes

Finding feature sets and applying them

This process creates an optimal feature set which is then applied to the complete training data to build the final model. The same feature set is also applied to an independent validation set before the prediction model is applied.
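
The point of applying the same feature set twice can be sketched in Python as follows. Everything here is a stand-in: the helper `apply_feature_set`, the feature set itself, and the `predict` function are hypothetical, and both the feature engineering step and the model training step are omitted.

```python
def apply_feature_set(rows, feature_set):
    # Hypothetical helper: keep selected features, compute generated ones.
    return [
        {name: (row[name] if expr is None else expr(row))
         for name, expr in feature_set}
        for row in rows
    ]


# Stand-in for the feature set found by the feature engineering step.
feature_set = [("x1", None), ("x1_plus_x2", lambda r: r["x1"] + r["x2"])]

training = [{"x1": 1.0, "x2": 2.0, "x3": 0.5},
            {"x1": 2.0, "x2": 1.0, "x3": 0.1}]
validation = [{"x1": 5.0, "x2": 5.0, "x3": 0.9}]

# The same feature set is applied to both data sets.
train_features = apply_feature_set(training, feature_set)
valid_features = apply_feature_set(validation, feature_set)

# Both results now have identical columns, so a model built on the
# transformed training data can be applied to the validation data.
same_columns = list(train_features[0]) == list(valid_features[0])


def predict(row):
    # Placeholder "model" that just returns the generated feature.
    return row["x1_plus_x2"]


predictions = [predict(row) for row in valid_features]
```

Because the validation data passes through the same feature set, the generated column the model relies on is guaranteed to exist at scoring time.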