RapidMiner Studio documentation, version 9.1
Execute Python (Python Scripting)
Synopsis
Executes a Python script.
Description
Before using this operator you need to specify the path to your Python installation under Tools -> Preferences -> Python Scripting. Your Python installation must include the pandas module since example sets get converted to pandas.DataFrames.
This operator executes a script file, supplied through either the script file port or the script file parameter, or otherwise the script entered in the script parameter. The arguments of the script correspond to the input ports, where example sets are converted to pandas.DataFrames. Analogously, the values returned by the script are delivered at the output ports of the operator, where pandas.DataFrames are converted to example sets.
The console output of Python is shown in the Log View (View -> Show View -> Log).
Input
- script file (File)
A file containing a Python script to be executed. The file has to comply with the script parameter rules. This port is optional; a file can also be provided through the script file parameter.
- input
The Script operator can have multiple inputs. An input must be either an example set, a file object or a Python object which was generated by an 'Execute Python' operator.
Output
- output
The Script operator can have multiple outputs. An output can be either an example set, a file object or a Python object generated by this operator.
Parameters
- script
The Python script to execute. Define a method named 'rm_main' with as many arguments as there are connected input ports, or alternatively with a *args argument to accept a dynamic number of inputs. The return values of the method 'rm_main' are delivered to the connected output ports. If the method returns a tuple, the individual entries of the tuple are delivered to the output ports. Entries of type 'pandas.DataFrame' are converted to example sets, files are converted to File Objects, and other Python objects are serialized and can be used by other 'Execute Python' operators or stored in the repository. Serialized Python objects have to be smaller than 2 GB.
If you pass an example set to your script through an input port, the meta data of the example set (types and roles) is available in the script. You can access it by reading the attribute rm_metadata of the associated pandas.DataFrame. For a DataFrame received as the argument data, for example, data.rm_metadata is a dictionary that maps each attribute name to a tuple of attribute type and attribute role.
You can influence the meta data of an example set that you return as a pandas.DataFrame by setting the attribute rm_metadata. If you don't specify attribute types in this dictionary, they will be determined using the data types in Python. You can specify your own roles or use the standard roles of RapidMiner like 'label'.
For more information about the meta data handling in a Python operator check the tutorial process 'Meta data handling' below.
If a script file is provided either through the script file port or parameter (port takes precedence), that script will be used instead of the value of this parameter.
Range: text
- script_file
A file containing a Python script to be executed. The file has to comply with the script parameter rules. This parameter is optional.
Range: filename
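The 'rm_main' rules described for the script parameter can be sketched as follows. This is a minimal illustration, not an official template: the argument names data and other, the two-port layout, and the helper name rm_main_dynamic are assumptions made for the example. Inside the operator, the entry point must be named 'rm_main'; the second function is only named differently so that both variants fit in one listing.

```python
import pandas as pd


def rm_main(data, other):
    """Entry point with one argument per connected input port (two here)."""
    # Example sets arrive as pandas.DataFrames.
    # Console output such as this print is shown in the Log View.
    print("rows received:", len(data), len(other))

    combined = pd.concat([data, other], ignore_index=True)
    summary = combined.describe()

    # Returning a tuple delivers each entry to its own output port;
    # DataFrames are converted back to example sets.
    return combined, summary


def rm_main_dynamic(*args):
    """Variant for a varying number of connected inputs (hypothetical name)."""
    # Inputs may also be files or other Python objects, so keep DataFrames only.
    frames = [a for a in args if isinstance(a, pd.DataFrame)]
    return pd.concat(frames, ignore_index=True)
```

With two output ports connected, the first port would receive combined and the second would receive summary.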
Tutorial Processes
Clustering using Python
Random data is generated and then fed to the Python script. The script clusters the data in Python using as many clusters as are specified in the macro. The resulting ExampleSet contains the cluster in the 'cluster' attribute.
Building a model and applying it using Python
This tutorial process uses two 'Execute Python' operators to first build a decision tree model using the 'Deals' data and then apply it to the 'Deals Testset' data. Before the data is used, the nominal values are converted to unique integers. The first Python scripting operator 'build model' builds the model and delivers it to its output port. The second Python scripting operator 'apply model' applies this model to the test set, adding a column called prediction. After specifying the 'label' and 'prediction' columns with 'Set Role', the result can be viewed.
Creating a plot using Python and storing it in your repository
This tutorial process uses the 'Execute Python' operator to first fetch example data, then create a plot and return both to the output ports. Please store the process in your repository. The data is shown as an example set and the plot is stored in the repository as an image.
Reading an example set from a file using Python
This tutorial process uses one 'Execute Python' operator to save example data in a CSV file. A second 'Execute Python' operator receives this file, reads the data and returns a part of the data to the output port. The result is an example set.
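The reading step of this process might look like the sketch below. It is not the tutorial's actual script. It assumes that the file input reaches 'rm_main' as something pandas.read_csv can consume (a path or an open file-like object), and the choice of the first five rows as "a part of the data" is arbitrary.

```python
import io

import pandas as pd


def rm_main(file):
    # Assumption: `file` is a path or an open file-like object
    # that pandas.read_csv can read directly.
    data = pd.read_csv(file)

    # Return only a part of the data; the first five rows are an
    # arbitrary choice. The DataFrame becomes an example set.
    return data.head(5)


# Stand-alone check outside RapidMiner, using an in-memory CSV file.
sample = io.StringIO("a,b\n1,x\n2,y\n3,z\n")
result = rm_main(sample)
```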
Meta data handling
This tutorial process shows how to access the meta data of incoming example sets inside an 'Execute Python' operator. It also explains how to set the meta data for the outgoing example sets.
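As a rough sketch of the meta data handling described for the script parameter (not the tutorial's own script), the following reads rm_metadata from the incoming DataFrame and sets it on the returned one. The column names value and score, the type name "real", the role "prediction", and the use of None for a type that should be inferred are all assumptions made for illustration. Outside RapidMiner the rm_metadata attribute has to be attached by hand, as done at the end of the listing.

```python
import warnings

import pandas as pd


def rm_main(data):
    # Incoming meta data: attribute name -> (attribute type, attribute role).
    for name, (attr_type, role) in data.rm_metadata.items():
        print(name, attr_type, role)

    result = data.copy()
    result["score"] = result["value"] * 2

    metadata = dict(data.rm_metadata)
    metadata["score"] = ("real", "prediction")  # assumed type name and role
    metadata["value"] = (None, "label")  # assumption: None leaves the type to be inferred

    # pandas emits a UserWarning when a dict is assigned to a new
    # attribute of a DataFrame; the assignment itself still succeeds.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        result.rm_metadata = metadata
    return result


# Stand-alone simulation of what RapidMiner would pass in.
frame = pd.DataFrame({"value": [1.0, 2.0]})
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    frame.rm_metadata = {"value": ("real", None)}
out = rm_main(frame)
```

The incoming dictionary is copied before it is modified, so the meta data of the input DataFrame stays unchanged.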