What's New in RapidMiner Studio 9.8.0?
Released: October 14th, 2020
The following sections describe what's new in RapidMiner Studio 9.8.0:
New Features
- Utilize AI Hub 9.8 support for large files in Projects. Files larger than 10 MB and stored ExampleSets are automatically versioned as expected, but stored more efficiently. This is backed by Git LFS, which means Python or R coders can continue to easily work with these projects as long as they have the Git LFS extension installed.
- Time Series Windowing Update:
- Added time-based windowing (window parameters are specified in time units) and custom windowing (start and stop values of the windows are provided by an additional example set) for all windowing operators (Windowing, Process Windows, Forecast Validation, Sliding Window Validation)
- Added a few more parameters: expert settings (hides several expert parameters unless it is selected), windows defined (specifies from which point windows are defined), empty window handling
- Changed the computation of the final model for the Forecast Validation and Sliding Window Validation operators: the model is now computed on a final window that has the same size as the training windows and ends at the last example of the input series
- Time Series: Added new aggregation methods (median, maximum, minimum, standard deviation, variance) to Moving Average Filter
- Cloud Connectivity
- Added connectivity to Azure Data Lake Storage Gen2:
- Read Azure Data Lake Storage Gen2
- Loop Azure Data Lake Storage Gen2
- Write Azure Data Lake Storage Gen2
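To illustrate two of the time series items above, here is a minimal sketch in plain Java. It is not RapidMiner's implementation: the class and method names (TimeSeriesSketch, finalWindow, movingMedian) are hypothetical, the window is assumed to be counted in examples rather than time units, and the median is assumed to be taken over a trailing window, with positions that do not yet have a full window set to NaN.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and edge-case behavior are assumptions. */
final class TimeSeriesSketch {

    /**
     * Bounds of the final training window: same size as the training
     * windows, ending at the last example of the series.
     * Returns {startInclusive, endExclusive}.
     */
    static int[] finalWindow(int seriesLength, int windowSize) {
        if (windowSize <= 0 || windowSize > seriesLength) {
            throw new IllegalArgumentException("windowSize must be in 1..seriesLength");
        }
        return new int[] {seriesLength - windowSize, seriesLength};
    }

    /**
     * Median over a trailing window of windowSize values.
     * Assumption: positions without a complete window yield NaN.
     */
    static double[] movingMedian(double[] values, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (i + 1 < windowSize) {
                result[i] = Double.NaN;
                continue;
            }
            // Copy the window so sorting does not disturb the input series.
            double[] window = Arrays.copyOfRange(values, i + 1 - windowSize, i + 1);
            Arrays.sort(window);
            int mid = windowSize / 2;
            result[i] = (windowSize % 2 == 1)
                    ? window[mid]
                    : (window[mid - 1] + window[mid]) / 2.0;
        }
        return result;
    }
}
```

For a series of 100 examples and training windows of 20 examples, finalWindow(100, 20) covers indices 80 (inclusive) to 100 (exclusive). The other new aggregations (maximum, minimum, standard deviation, variance) would follow the same loop with a different reduction over the window.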
Enhancements
- H2O
- New operator: K-Means (H2O), which implements K-Means clustering using the bundled H2O library. Key features include:
- Estimate the optimal value of k, when a good initial guess is not available from the user
- Built-in standardization and nominal encoding
- Quick and memory efficient execution
- Note: estimate k strongly favors low k values. Make sure to double-check that the results are in line with expectations.
- Newly created repositories and projects are now by default stored in the current user's "Documents" folder. The location continues to be customizable on repository / project creation
- When opening a process or RapidMiner file using "Open with..." RapidMiner Studio, the process will be loaded from the repository registered for the path. Process files that are not stored in a repository will be imported just like the menu item "Import Process" would
- IOObject collections are now stored in a new, zip-based file format, ending with .collection
- Incorporated a new library to better make use of system proxy settings if "system" is selected in the preferences, especially w.r.t. Windows and WPAD/PAC files. This will drastically improve the experience in complex corporate network setups
- HTML5 safe mode is now way more performant
- Upgraded Chromium binaries to version 79
- Improved error message for remote repository creation (central AI Hub repository and projects) when the authentication is mismatched (user/password vs SSO)
- Added Settings option to optimize internal file browser for mapped network drives
- Time Series: Moved Moving Average Filter into the Transformation operator group and removed the obsolete Filter operator group
- Time Series: Reordered the output ports of the Multi Label Performance and Multi Horizon Performance operators
Bugfixes
- Fixed wrong metadata being shown after renaming an entry in the new repositories and then creating a new entry with the previous name
- Fixed rare issues that could cause problems when trying to view Visualizations on certain machines
- Fixed Mixed Euclidean Distance for nominal values and Nominal Distance
- A JNA library on the Windows PATH no longer results in an error
- Fixed issue that could cause charts in the Deployments view to not show up.
- Fixed problem that caused the legacy SMTP password setting in the Preferences dialog to break when the dialog was saved more than once after changing the value. Note that this setting is no longer recommended; use the new Send Mail connection instead.
- Fixed a similar problem with the legacy connection UI encrypting passwords and tokens multiple times
- Auto Model Results calculated on AI Hub can now be opened via Results view after the folder with all results has been moved/copied
- Upgraded bundled JRE to 8u265
- Deployments keep working now after the Server repository has been renamed
- Fixed a problem where unsigned extensions could not make use of the new connection objects inside operators
- Fixed potential IllegalArgumentException in Google Storage operators when running on Server
- ExampleSets with huge nominal values can be retrieved again from the repository
- Time Series: Fixed a bug in Equalize Time Stamps which caused an infinite loop in some cases when the calendar time was set to 'domain' and the input data already consisted of partially equidistant time stamps
Development
Modularization
RapidMiner Studio has been modularized! Well, to be fair, that's a bit of an overstatement right now, but we laid the foundation for developing future features in modules by moving some basics that are used absolutely everywhere into modules. The noteworthy thing about modules is that they can be referenced as a library without the entire Studio Core, and that they do not depend on the Studio Core themselves. This has obviously resulted in a change of the project structure, where next to rapidminer-studio-core you now have an open-source folder with sub-folders for each module. We have NOT changed package structures for this change, so your extensions should work just as they did before, with no change required on your side. Caveat: We changed some deep internal APIs, so if you were very naughty and used internal APIs, there actually is a chance your project breaks with RapidMiner Studio 9.8, but 99% of projects should be perfectly fine.
As previously mentioned, most of the modules that were created with 9.8 are not very exciting; they just pave the way for us to develop in a more modular way in the future (for which these modules are extremely helpful). However, there are three modules that might be of interest straight away:
- rapidminer-studio-encryption: An encryption library based on Google Tink with up-to-date and trivial to use symmetric and streaming encryption, featuring algorithms like Authenticated Encryption with Associated Data via AES256-GCM. Get started by looking at com.rapidminer.tools.encryption.EncryptionProvider
- rapidminer-studio-globalsearch: An indexing and querying library based on Apache Lucene which can be used to make anything in your project searchable. It is used in Studio to power the Global Search in the top-right corner. Get started by looking at com.rapidminer.search.GlobalSearchRegistry as well as com.rapidminer.search.GlobalSearchIndexer
- rapidminer-studio-settings: A very simple component which offers settings, either globally or for given contexts. They are simple String/String key-value pairs that can be changed at any time. It includes the ability to have protected settings that cannot be changed without special permissions. It also features listeners, so you can react to changes of those settings at runtime. While not very exciting in itself, this module is used by all other modules for customizable settings that each module offers. You can change settings by using the com.rapidminer.settings.Settings class.
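To make the "Authenticated Encryption with Associated Data via AES256-GCM" idea from the encryption module concrete, here is a minimal sketch using only the standard javax.crypto API. It deliberately does not use Google Tink or com.rapidminer.tools.encryption.EncryptionProvider, whose API is not shown in these notes; the class AeadSketch, its method names, and the message layout (12-byte nonce followed by ciphertext and tag) are assumptions made for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Illustrative AES-256-GCM sketch with plain JCA; not the module's API. */
final class AeadSketch {
    private static final int NONCE_BYTES = 12; // common nonce size for GCM
    private static final int TAG_BITS = 128;   // authentication tag length

    /** Generates a fresh 256-bit AES key. */
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES is not available", e);
        }
    }

    /** Encrypts plaintext and binds it to associatedData. Output: nonce || ciphertext+tag. */
    static byte[] encrypt(SecretKey key, byte[] plaintext, byte[] associatedData) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            new SecureRandom().nextBytes(nonce); // never reuse a nonce with the same key
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
            cipher.updateAAD(associatedData); // authenticated, but not encrypted
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] message = new byte[NONCE_BYTES + ciphertext.length];
            System.arraycopy(nonce, 0, message, 0, NONCE_BYTES);
            System.arraycopy(ciphertext, 0, message, NONCE_BYTES, ciphertext.length);
            return message;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Decrypts a message from encrypt(); fails if data or associatedData were altered. */
    static byte[] decrypt(SecretKey key, byte[] message, byte[] associatedData) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, message, 0, NONCE_BYTES));
            cipher.updateAAD(associatedData);
            return cipher.doFinal(message, NONCE_BYTES, message.length - NONCE_BYTES);
        } catch (GeneralSecurityException e) {
            // Includes AEADBadTagException when authentication fails.
            throw new IllegalStateException("decryption or authentication failed", e);
        }
    }
}
```

The associated data is not stored in the message; the caller must supply the same bytes on decryption, which ties a ciphertext to its context (for example, the name of the setting it belongs to). A library such as Tink wraps these details, including key management and nonce handling, behind a simpler interface.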
New Data Core
Our new data core based on the Belt project is finally here! Forget about the ExampleSet class and start using the Table class, RapidMiner's new representation of example sets. Click here to learn everything you need to know about the new framework and to get started writing operators using Belt today. For a short time the new framework will stay in a beta phase, but we encourage you to start using it as soon as possible since the old API will be deprecated in the near future.
Meta Data
- Deprecated string/object values for generic MetaData. Please use Annotations or create subclasses.
- Deprecated Annotations for attributes and attribute meta data