RapidMiner and Python
This page collects the features that open up RapidMiner to data-loving people who prefer to work with (Python) code on their projects. It also shows how to turn this into a team effort by collaborating with teammates who prefer RapidMiner's proven authoring method of operators and processes.
Call Python from RapidMiner
As a RapidMiner user working on a project, you will often find it useful to call Python code from a RapidMiner process. Going one step further, you can "package" your model training or ETL transformation written in Python as a RapidMiner operator and distribute it to others on your team. Some typical scenarios where this will come in handy:
- you find it easier or more convenient to write a data prep step or a modeling step as Python code
- you want to reuse a piece of Python code that someone on your team has created
- you want to extend RapidMiner with a cutting-edge Python library
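As an illustration of the first scenario, here is a minimal sketch of a data prep step written as Python code. It assumes the convention of an entry-point function named `rm_main` that receives its input table as a pandas DataFrame and returns the result as a DataFrame; check the Python Scripting extension documentation for the exact contract. The column names `price` and `quantity` are hypothetical.

```python
import pandas as pd


def rm_main(data: pd.DataFrame) -> pd.DataFrame:
    """Data prep step: drop incomplete rows and add a derived column.

    Assumption: the calling process passes the input table as a pandas
    DataFrame and expects a DataFrame back.
    """
    # Work on a copy so the input table is left untouched.
    cleaned = data.dropna().copy()
    # "price" and "quantity" are hypothetical column names for illustration.
    cleaned["total"] = cleaned["price"] * cleaned["quantity"]
    return cleaned
```

Because the function is plain Python, it can be tried outside RapidMiner by calling `rm_main` on any small DataFrame that has the two assumed columns.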
Call RapidMiner from Python
When working with Python code (possibly in a notebook), you may want to access data and metadata stored in RapidMiner projects and repositories. It can also be useful to call RapidMiner Studio or RapidMiner AI Hub to run processes. We provide a Python library that lets you handle any of these typical scenarios:
- you want to leverage data stored and prepped in a RapidMiner repository or project
- you want to run a process built in RapidMiner and use its output as an input in your code
- you want to access an external data source without the hassle of handling credentials in your code
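The first two scenarios above can be sketched as follows. This is not the library's documented API: the `Connector` interface, its method names `read_resource` and `run_process`, and the repository paths are placeholders chosen for illustration, and `InMemoryConnector` is a fake that never contacts RapidMiner. The sketch only shows the shape of the workflow: read a stored dataset, run a process on it, and use the output in your own code.

```python
from typing import Dict, List, Protocol

Table = List[Dict[str, float]]  # simplified stand-in for a tabular dataset


class Connector(Protocol):
    """Hypothetical interface; the real library's names may differ."""

    def read_resource(self, path: str) -> Table: ...

    def run_process(self, path: str, inputs: Table) -> Table: ...


class InMemoryConnector:
    """Fake connector for illustration; it never contacts RapidMiner."""

    def __init__(self, resources: Dict[str, Table]) -> None:
        self._resources = resources

    def read_resource(self, path: str) -> Table:
        # Return copies so callers cannot modify the stored data.
        return [dict(row) for row in self._resources[path]]

    def run_process(self, path: str, inputs: Table) -> Table:
        # Pretend the process at `path` adds a "score" column.
        return [{**row, "score": row["value"] * 2} for row in inputs]


def score_dataset(connector: Connector, data_path: str, process_path: str) -> float:
    """Read a stored dataset, run a process on it, and average the scores."""
    data = connector.read_resource(data_path)
    scored = connector.run_process(process_path, inputs=data)
    return sum(row["score"] for row in scored) / len(scored)
```

Writing the downstream logic against a small interface like this keeps it testable without a running RapidMiner installation; in real use, an object from the actual Python library would take the place of the fake connector.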
RapidMiner Notebooks
RapidMiner Notebooks offer a familiar notebook-based code authoring environment as part of RapidMiner AI Hub. They allow code-savvy data scientists and data engineers to work in a familiar way, while also enabling out-of-the-box collaboration with team members who use RapidMiner for authoring and deployment. Some typical scenarios where RapidMiner Notebooks will come in handy:
- your company has adopted RapidMiner AI Hub and you wish to keep working in a notebook environment
- you are collaborating with others using a RapidMiner project
- you need to use a dataset stored in a RapidMiner project or repository for your notebook-based project
- you need the output of a RapidMiner process as an input for your notebook-based project
Architecture
This diagram explains the high-level components that together implement the integration of Python code authoring and execution in RapidMiner AI Hub. These components enable all of the above-mentioned use cases with little or no manual configuration.
Platform Admin lets you centrally manage coding environments across AI Hub, as indicated by the dashed arrows.
The remaining arrows represent RapidMiner process execution from Python code using our Python library.
The Python Scripting Extension enables Python code execution in various components of the product (RapidMiner Server for web service-like execution, RapidMiner Job Agents for scheduled and ad-hoc batch execution).