A filesystem block is an object that allows you to read data from and write data to paths. Prefect provides multiple built-in file system types that cover a wide range of use cases.
Be aware that LocalFileSystem access is limited to the exact path provided. This file system may not be ideal for some use cases. The execution environment for your workflows may not have the same file system as the environment you are writing and deploying your code on.
Use of this file system can limit the availability of results after a flow run has completed or prevent the code for a flow from being retrieved successfully at the start of a run.
The RemoteFileSystem block enables interaction with arbitrary remote file systems. Under the hood, RemoteFileSystem uses fsspec and supports any file system that fsspec supports.
RemoteFileSystem properties include:
| Property | Description |
| --- | --- |
| basepath | String path to the location of files on the remote filesystem. Access to files outside of the base path will not be allowed. |
| settings | Dictionary containing extra parameters required to access the remote file system. |
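The base path restriction described above can be illustrated with a minimal sketch. The helper `resolve_within_basepath` is hypothetical and is not Prefect's implementation; it assumes plain POSIX-style paths (not URLs such as `s3://...`) and only shows the general idea of rejecting paths that resolve outside a base path.

```python
import posixpath


def resolve_within_basepath(basepath: str, path: str) -> str:
    """Join `path` onto `basepath` and reject results that escape it.

    Hypothetical helper for illustration only; assumes POSIX-style paths.
    """
    base = posixpath.normpath(basepath)
    # normpath collapses "." and ".." segments; join discards `base`
    # entirely when `path` is absolute, which the check below rejects.
    candidate = posixpath.normpath(posixpath.join(base, path))
    prefix = base.rstrip("/") + "/"
    if candidate != base and not candidate.startswith(prefix):
        raise ValueError(f"{path!r} resolves outside of base path {basepath!r}")
    return candidate
```

A path such as `results/run.json` resolves under the base path, while `../secrets.txt` or an absolute path like `/etc/passwd` raises `ValueError`.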
The GitHub filesystem block enables interaction with GitHub repositories. This block is read-only and works with both public and private repositories.
GitHub properties include:
| Property | Description |
| --- | --- |
| reference | An optional reference to pin to, such as a branch name or tag. |
| repository | The URL of a GitHub repository to read from, in either HTTPS or SSH format. |
| access_token | A GitHub Personal Access Token (PAT) with repo scope. |
To create a block:
    from prefect.filesystems import GitHub

    block = GitHub(
        repository="https://github.com/my-repo/",
        access_token=<my_access_token>  # only required for private repos
    )
    block.get_directory("folder-in-repo")  # specify a subfolder of repo
    block.save("dev")
The SMB file system block enables interaction with SMB shared network storage. Under the hood, SMB uses smbprotocol. It is used to connect to Windows-based SMB shares from Linux-based Prefect flows. The SMB file system block can copy files, but cannot create directories.
SMB properties include:
| Property | Description |
| --- | --- |
| basepath | String path to the location of files on the remote filesystem. Access to files outside of the base path will not be allowed. |
| smb_host | Hostname or IP address where SMB network share is located. |
Handling credentials for cloud object storage services
If you leverage S3, GCS, or Azure storage blocks, and you don't explicitly configure credentials on the respective storage block, those credentials will be inferred from the environment. Make sure to set those either explicitly on the block or as environment variables, configuration files, or IAM roles within both the build and runtime environment for your deployments.
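As a rough sketch of checking the environment-variable route, the snippet below reports which commonly used credential variables are unset. The mapping and the function name `unset_credential_vars` are assumptions made for illustration; the variable names are ones typically read by the respective cloud SDKs, and the list is not exhaustive.

```python
import os

# Illustrative mapping (an assumption, not an exhaustive list) of
# environment variables commonly read by each cloud SDK.
CREDENTIAL_ENV_VARS = {
    "s3": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "gcs": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "azure": ["AZURE_STORAGE_CONNECTION_STRING"],
}


def unset_credential_vars(service, environ=None):
    """Return the credential variables for `service` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in CREDENTIAL_ENV_VARS[service] if not environ.get(name)]


if __name__ == "__main__":
    for service in CREDENTIAL_ENV_VARS:
        print(service, "unset:", unset_credential_vars(service))
```

An unset variable does not necessarily mean credentials are missing, since they may also come from configuration files or IAM roles. Running the same check in both the build and runtime environments can still help spot differences between them.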
A Prefect installation doesn't include filesystem-specific package dependencies such as s3fs, gcsfs, or adlfs. This also applies to the Prefect base Docker images.
You must ensure that filesystem-specific libraries are installed in an execution environment where they will be used by flow runs.
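One way to verify this is to check, inside the execution environment, whether the required packages can be imported. The following is a minimal sketch using only the standard library; the function name `missing_filesystem_packages` is hypothetical, and the package list is just the examples named above.

```python
from importlib.util import find_spec


def missing_filesystem_packages(packages):
    """Return the top-level package names that cannot be imported here."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in packages if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_filesystem_packages(["s3fs", "gcsfs", "adlfs"])
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("All filesystem packages are importable.")
```

Running this in the same container or environment that executes flow runs shows which of the packages still need to be installed there.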
In Dockerized deployments using the Prefect base image, you can leverage the EXTRA_PIP_PACKAGES environment variable. Those dependencies will be installed at runtime within your Docker container or Kubernetes Job before the flow starts running.
In Dockerized deployments using a custom image, you must include the filesystem-specific package dependency in your image.
Here is an example from a deployment YAML file showing how to specify the installation of s3fs in your image:
    infrastructure:
      type: docker-container
      env:
        EXTRA_PIP_PACKAGES: s3fs  # could be gcsfs, adlfs, etc.
You may specify multiple dependencies by providing a comma-delimited list.