You are viewing the RapidMiner Hub documentation for version 9.10.

Project Versioning

AI Hub data storage is backed by integrated Git and Git Large File Storage (LFS) servers, which keep all data of your Projects.

The integrated Git server keeps track of smaller files, while the integrated LFS server handles larger files. By default, if LFS is enabled for a Project, files with the extensions .ioo, .rmhdf5table, .collection and .conninfo are stored on the LFS server.
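To illustrate the default rule, the following Python sketch decides whether a file would go to the LFS server based only on the four default extensions listed above. stored_in_lfs is a hypothetical helper, not part of AI Hub; it ignores any custom rules in the Project's .gitattributes file and, as an assumption, compares extensions case-sensitively.

```python
from pathlib import PurePosixPath

# Extensions this page lists as stored in LFS by default (when LFS is enabled).
DEFAULT_LFS_EXTENSIONS = {".ioo", ".rmhdf5table", ".collection", ".conninfo"}


def stored_in_lfs(path: str, lfs_enabled: bool) -> bool:
    """Return True if the default rule would send this file to the LFS server."""
    if not lfs_enabled:
        # Without LFS enabled for the Project, the Git server keeps every file.
        return False
    # Case-sensitive suffix comparison (an assumption, mirroring Git's default).
    return PurePosixPath(path).suffix in DEFAULT_LFS_EXTENSIONS


for example in ["data/customers.ioo", "connections/db.conninfo", "notes/readme.txt"]:
    target = "LFS" if stored_in_lfs(example, lfs_enabled=True) else "Git"
    print(f"{example} -> {target}")
```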

It is recommended to always store binary data, such as Excel sheets or pictures, in LFS and to enable LFS for all Projects by default.

You can define additional file extensions to be tracked by LFS by editing the .gitattributes file in a Project.
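In standard Git LFS, a tracked pattern is a .gitattributes line such as "*.xlsx filter=lfs diff=lfs merge=lfs -text", which is what git lfs track "*.xlsx" writes. Assuming AI Hub Projects follow these standard conventions, the Python sketch below checks whether a path would be tracked by LFS. The two example lines and the tracked_by_lfs helper are hypothetical, and the matching is deliberately simplified: only slash-free glob patterns are handled (matched against the file name), and the last matching line that sets the filter attribute wins.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical additions to a Project's .gitattributes, in standard Git LFS syntax.
GITATTRIBUTES = """\
*.xlsx filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
"""


def tracked_by_lfs(path: str, gitattributes: str) -> bool:
    """Simplified check: does the last matching line set filter=lfs?

    Directory patterns, '**', and attribute macros are not implemented.
    """
    name = PurePosixPath(path).name
    decision = False
    for raw in gitattributes.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        if "/" in pattern or not fnmatchcase(name, pattern):
            continue
        # Later lines (and later attributes) override earlier ones.
        for attr in attrs:
            if attr == "filter=lfs":
                decision = True
            elif attr in ("-filter", "!filter") or attr.startswith("filter="):
                decision = False
    return decision


print(tracked_by_lfs("data/report.xlsx", GITATTRIBUTES))
print(tracked_by_lfs("notes/readme.txt", GITATTRIBUTES))
```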

Storage backend

The integrated Git and LFS servers store their data inside the RapidMiner AI Hub home directory, in $rmHomeDir/data/repositories/git_server and $rmHomeDir/data/repositories/git_lfs_server, respectively. In Git terminology, the Git data is stored in bare Git repositories. In the integrated LFS server, each file name matches the SHA-256 checksum of the file's content.
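Because LFS file names match their SHA-256 checksum, an administrator could audit the LFS storage directory for files whose content no longer matches their name. The Python sketch below is read-only and assumes that every regular file under the directory passed to it is an LFS object; the internal layout of git_lfs_server is not documented on this page, and sha256_of and mismatched_objects are hypothetical helpers. Hashing every object can take a long time on large installations.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_objects(lfs_root: Path) -> list[Path]:
    """List files under lfs_root whose name differs from their SHA-256 checksum.

    Read-only. Assumes every regular file below lfs_root is an LFS object, so
    temporary or auxiliary files, if any exist, would also be reported.
    """
    return [
        path
        for path in sorted(lfs_root.rglob("*"))
        if path.is_file() and path.name != sha256_of(path)
    ]
```

For example, calling mismatched_objects(Path("/path/to/rmHomeDir/data/repositories/git_lfs_server")) with your actual home directory returns an empty list when every file name matches its content.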

Advanced configuration for upload, disk space availability and consistency checks

Because the integrated Git and LFS servers store their data inside the RapidMiner AI Hub home directory, they depend on sufficient free disk space being available there.
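A free-space check of this kind can be approximated with Python's standard library. The sketch below assumes the 5120M default used by repositories.gitDiskspaceCheckThreshold and repositories.minLfsDiskspaceCheckThreshold and interprets M as mebibytes (1024 x 1024 bytes); has_enough_space is a hypothetical helper, not AI Hub's actual implementation.

```python
import shutil

MIB = 1024 ** 2
# Assumption: "5120M" means 5120 mebibytes, i.e. 5 GiB.
DEFAULT_THRESHOLD = 5120 * MIB


def has_enough_space(home_dir: str, threshold_bytes: int = DEFAULT_THRESHOLD) -> bool:
    """Return True if the file system holding home_dir has at least threshold_bytes free."""
    free = shutil.disk_usage(home_dir).free
    return free >= threshold_bytes
```

Calling has_enough_space with the AI Hub home directory reports whether the default threshold is met on the volume that holds it.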

To avoid corrupted files after an upload, the integrated Git and LFS servers require a minimum amount of free disk space, regardless of the size of the uploaded files. In addition, when large files are uploaded to a Project, the integrated LFS server verifies their expected size and SHA-256 checksum.
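The size and checksum verification can be sketched in Python as follows. This is an illustration of the checks described above, not the LFS server's code: verify_upload and its keyword flags are hypothetical, chosen to mirror repositories.lfsEnableUploadSizeCheck, repositories.lfsEnableUploadChecksumCheck and repositories.lfsRemoveUnsuccessfulUploads. The expected checksum is assumed to be a lowercase hex SHA-256 string, as used for Git LFS object IDs.

```python
import hashlib
import os


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_upload(path: str, expected_oid: str, expected_size: int, *,
                  check_size: bool = True, check_checksum: bool = True,
                  remove_on_failure: bool = True) -> bool:
    """Return True if the uploaded file passes the enabled consistency checks."""
    ok = True
    # Cheap size comparison first (cf. repositories.lfsEnableUploadSizeCheck).
    if check_size and os.path.getsize(path) != expected_size:
        ok = False
    # Full SHA-256 only if the size check passed (cf. lfsEnableUploadChecksumCheck).
    if ok and check_checksum and sha256_hex(path) != expected_oid:
        ok = False
    # Discard failed uploads (cf. repositories.lfsRemoveUnsuccessfulUploads).
    if not ok and remove_on_failure:
        os.remove(path)
    return ok
```

Checking the size before hashing avoids reading a large file in full when a truncated upload can already be detected from its length.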

The following table lists the properties in the execution.properties file that control upload limits, disk space checks and consistency checks.

Property | Description | Availability
repositories.maxUploadSize | Maximum size of a single file uploaded to the LFS server. By default, only files smaller than 5 gigabytes can be uploaded. Units such as Gb (gigabytes) and Mb (megabytes) are supported. | Any version supporting Projects
repositories.gitEnableDiskspaceCheckHook | Verifies that at least repositories.gitDiskspaceCheckThreshold of free disk space is available in the RapidMiner AI Hub home directory. | >= 9.10.4
repositories.gitDiskspaceCheckThreshold | Minimum free disk space for the Git disk space check. Defaults to 5120M. Units such as G (gigabytes) and M (megabytes) are supported. | >= 9.10.4
repositories.lfsEnableDiskspaceCheck | Verifies that at least repositories.minLfsDiskspaceCheckThreshold of free disk space is available in the RapidMiner AI Hub home directory. | >= 9.10.4
repositories.minLfsDiskspaceCheckThreshold | Minimum free disk space for the LFS disk space check. Defaults to 5120M and is doubled when repositories.lfsRemoveUnsuccessfulUploads is enabled. Units such as G (gigabytes) and M (megabytes) are supported. | >= 9.10.4
repositories.lfsRemoveUnsuccessfulUploads | Defaults to true. When a consistency check (size or checksum) fails during upload, the uploaded file is removed immediately so that failed uploads are not kept. | >= 9.10.4
repositories.lfsEnableUploadSizeCheck | Defaults to true. Enables verification of the size of LFS files being uploaded. | >= 9.10.4
repositories.lfsEnableUploadChecksumCheck | Defaults to true. Enables SHA-256 checksum verification of LFS files being uploaded. | >= 9.10.4
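To see how these values combine, the following Python sketch reads a hypothetical execution.properties excerpt, converts size values such as 5120M or 10Gb to bytes, and computes the effective LFS disk space threshold, including the doubling described for repositories.minLfsDiskspaceCheckThreshold. It assumes 1024-based units and a plain key=value format; parse_size, parse_properties and effective_lfs_threshold are hypothetical helpers, and AI Hub's own parsing may differ.

```python
from __future__ import annotations

import re

# Assumption: units are 1024-based; only M and G (optionally followed by b) are accepted.
UNITS = {"M": 1024 ** 2, "G": 1024 ** 3}
SIZE_RE = re.compile(r"^\s*(\d+)\s*([MG])B?\s*$", re.IGNORECASE)


def parse_size(value: str) -> int:
    """Parse values like '5120M', '5G', '5Gb' or '500Mb' into bytes."""
    match = SIZE_RE.match(value)
    if not match:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit.upper()]


def parse_properties(text: str) -> dict[str, str]:
    """Minimal key=value reader; ignores comments, ':' separators and continuations."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def effective_lfs_threshold(props: dict[str, str]) -> int:
    """Free bytes required by the LFS check; doubled when failed uploads are removed."""
    base = parse_size(props.get("repositories.minLfsDiskspaceCheckThreshold", "5120M"))
    remove = props.get("repositories.lfsRemoveUnsuccessfulUploads", "true").lower() == "true"
    return base * 2 if remove else base


EXAMPLE = """\
# Hypothetical excerpt of execution.properties
repositories.maxUploadSize=10Gb
repositories.minLfsDiskspaceCheckThreshold=8G
repositories.lfsRemoveUnsuccessfulUploads=true
"""

props = parse_properties(EXAMPLE)
print(parse_size(props["repositories.maxUploadSize"]))  # 10737418240
print(effective_lfs_threshold(props))  # 17179869184
```

With the defaults from the table (5120M, lfsRemoveUnsuccessfulUploads enabled), the sketch yields an effective LFS threshold of 10240M, i.e. 10 GiB under the 1024-based assumption.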