Using ruamel.yaml, support cross-file references and YAML composition in YAML files using tags !reference, !reference-all, !flatten, and !merge.
Install the package from PyPI with:
# pip
pip install yaml-reference
# poetry
poetry add yaml-reference
# uv
uv add yaml-reference
This Python library implements the YAML specification for cross-file references and YAML composition in YAML files using tags !reference, !reference-all, !flatten, and !merge as defined in the yaml-reference-specs project.
# root.yaml
version: "3.1"
services:
  - !reference
    path: "services/website.yaml"
  - !reference
    path: "services/database.yaml"
networkConfigs:
  !reference-all
  glob: "networks/*.yaml"
tags: !flatten
  - !reference { path: "common/tags.yaml" }
  - "web"
  - "service"
config: !merge
  - !reference { path: "config/defaults.yaml" }
  - !reference { path: "config/overrides.yaml" }
Supposing services/website.yaml and services/database.yaml exist relative to the directory containing root.yaml, along with a networks directory of YAML files, the document above can be loaded with its references expanded using the following Python code:
from yaml_reference import load_yaml_with_references
data = load_yaml_with_references("root.yaml")
print(data)
# {"networkConfigs": [{"network": "vpn","version": "1.1"},{"network": "nfs","version": "1.0"}],"services": ["website","database"],"version": "3.1"}
# With path restrictions for security
data = load_yaml_with_references("root.yaml", allow_paths=["/allowed/path"])
Note that the load_yaml_with_references function instantiates a ruamel.yaml.YAML loader class (typ='safe') to perform the deserialization of the YAML files, and returns a Python dictionary with the recursively-expanded YAML data.
If you wish to resolve one "layer" of references without recursively exhausting the entire reference graph, the parse_yaml_with_references function can be used to obtain the original YAML document's contents with !reference/!reference-all tags as dedicated objects called Reference and ReferenceAll.
from yaml_reference import parse_yaml_with_references
data = parse_yaml_with_references("root.yaml")
print(data["networkConfigs"])
# ReferenceAll(glob="networks/*.yaml", location="/path/to/root.yaml")
# With path restrictions for security
data = parse_yaml_with_references("root.yaml", allow_paths=["/allowed/path"])
The !merge tag combines multiple YAML mappings (dictionaries) into a single mapping. This is useful for composing configuration from multiple sources or applying overrides. When you use !merge, you provide a sequence of mappings that will be merged together, with later mappings overriding keys from earlier ones.
# Example: Merge default and override configurations
config: !merge
- {host: "localhost", port: 8080, debug: false}
- {port: 9000, debug: true} # Overrides port and debug from the first mapping
When loaded with load_yaml_with_references, this becomes {"host": "localhost", "port": 9000, "debug": true}. The !merge tag can also be nested and combined with !reference and !flatten tags for complex YAML composition scenarios.
Note that, if a nested sequence of mappings is provided to !merge, the sequence argument will be flattened first, and then the resulting mappings will be merged together. For example:
config: !merge
- - a: 1
- b: 2
- c: 3
- - [{c: 5, a: 5}]
This will be processed into {"config": {"a": 5, "b": 2, "c": 5}} because the nested sequence of mappings is flattened into a single sequence of mappings before merging.
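The flatten-then-merge behaviour described above can be illustrated in plain Python. This is a minimal sketch and not the library's implementation: the helper names `flatten` and `merge_mappings` are hypothetical, and the merge is assumed to be shallow (top-level keys only), which the text above does not confirm either way.

```python
from typing import Any


def flatten(items: list[Any]) -> list[Any]:
    """Recursively flatten nested lists into one flat list."""
    flat: list[Any] = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def merge_mappings(items: list[Any]) -> dict[str, Any]:
    """Flatten `items`, then merge the mappings from left to right.

    Assumption: a shallow merge, where later mappings replace whole
    top-level keys from earlier ones.
    """
    merged: dict[str, Any] = {}
    for mapping in flatten(items):
        if not isinstance(mapping, dict):
            raise ValueError(f"expected a mapping, got {type(mapping).__name__}")
        merged.update(mapping)  # later mappings override earlier keys
    return merged


# Python equivalent of the nested sequence from the example above.
nested = [[{"a": 1}], {"b": 2}, {"c": 3}, [[{"c": 5, "a": 5}]]]
print(merge_mappings(nested))  # {'a': 5, 'b': 2, 'c': 5}
```

Flattening first means the depth of nesting does not affect the result; only the left-to-right order of the mappings does.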
Both !reference and !reference-all tags support an optional anchor parameter that allows you to import only a specific anchored section from a file, rather than the entire file contents. This is useful when you want to extract a particular part of a larger YAML document.
# main.yaml
database_config: !reference
  path: "config.yaml"
  anchor: db_settings
api_keys: !reference-all
  glob: "secrets/*.yaml"
  anchor: api_key
In this example, if config.yaml contains multiple anchored sections, only the one labeled with &db_settings will be imported. Similarly, !reference-all will extract the &api_key anchor from each file matching the glob pattern.
Here's a practical example:
# config.yaml
app_name: MyApplication
db_settings: &db_settings
  host: localhost
  port: 5432
  database: myapp
cache_settings: &cache_settings
  ttl: 3600
# main.yaml
config: !reference
  path: "config.yaml"
  anchor: db_settings
When loaded with load_yaml_with_references("main.yaml"), the result will be:
{
  "config": {
    "host": "localhost",
    "port": 5432,
    "database": "myapp"
  }
}
Note that the app_name and cache_settings fields from config.yaml are not included in the result because only the anchored section was imported. If the specified anchor is not found in the referenced file, a ValueError will be raised.
To get rid of red squigglies in VSCode when using the !reference, !reference-all, !flatten, and !merge tags, you can add the following to your settings.json file:
"yaml.customTags": [
  "!reference mapping",
  "!reference-all mapping",
  "!flatten sequence",
  "!merge sequence"
]
This package includes a CLI that reads a YAML file containing !reference tags and dumps its contents as pretty-printed JSON with references expanded. This is useful for generating a single file for deployment or other purposes. Note that the keys of mappings will be sorted alphabetically. The CLI is also used to test the contract of this package against the yaml-reference-specs project.
$ yaml-reference-cli -h
usage: yaml-reference-cli [-h] [--allow ALLOW_PATHS] input_file
Compile a YAML file containing !reference tags into a new YAML file with resolved references. Expects a YAML file to be provided via the "input_file" argument.
Outputs JSON content to stdout.
positional arguments:
input_file Path to the input YAML file with references to resolve and print as JSON.
options:
-h, --help show this help message and exit
--allow ALLOW_PATHS Path to allow references from.
$ yaml-reference-cli root.yaml
{
"networkConfigs": [
{
"network": "vpn",
"version": "1.1"
},
{
"network": "nfs",
"version": "1.0"
}
],
"services": [
"website",
"database"
],
"tags": [
"common:aws",
"common:http",
"common:security",
"common:waf",
"web",
"service"
],
"version": "3.1"
}
The result can still be produced as YAML by piping the output through the yq CLI tool (mikefarah/yq).
$ yaml-reference-cli root.yaml | yq -P
networkConfigs:
  - network: vpn
    version: 1.1
  - network: nfs
    version: 1.0
services:
- website
- database
tags:
- common:aws
- common:http
- common:security
- common:waf
- web
- service
version: 3.1
# Pipe it to a result file
$ yaml-reference-cli root.yaml | yq -P > .compiled/root.yaml
As required by the yaml-reference-specs specification, this package includes circular reference detection to prevent infinite recursion. If a circular reference is detected (e.g., A references B, B references C, C references A), a ValueError will be raised with a descriptive error message. This protects against self-references and circular chains in both !reference and !reference-all tags.
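One common way to detect such chains is a depth-first walk that remembers which files are currently being resolved. The sketch below illustrates that idea only; it is not taken from the package. The function `check_no_cycles` and the `references` dictionary (a stand-in for actually reading files) are hypothetical, and the error message format is an assumption.

```python
def check_no_cycles(
    name: str,
    references: dict[str, list[str]],
    stack: tuple[str, ...] = (),
) -> None:
    """Raise ValueError if following references from `name` loops back.

    `references` maps a file name to the files it references (hypothetical
    stand-in for parsing the files). `stack` holds the chain of files that
    are currently being resolved.
    """
    if name in stack:
        chain = " -> ".join((*stack[stack.index(name):], name))
        raise ValueError(f"Circular reference detected: {chain}")
    for target in references.get(name, []):
        check_no_cycles(target, references, (*stack, name))


# A references B, B references C, C references A.
graph = {"A.yaml": ["B.yaml"], "B.yaml": ["C.yaml"], "C.yaml": ["A.yaml"]}
try:
    check_no_cycles("A.yaml", graph)
except ValueError as exc:
    print(exc)  # Circular reference detected: A.yaml -> B.yaml -> C.yaml -> A.yaml
```

Because only the active chain is tracked, two files may both reference a shared third file without triggering an error; only a path that returns to a file still being resolved counts as circular.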
By default, !reference and !reference-all tags can only reference files within the same directory as the source YAML file (or child subdirectories). To allow references to files in other disparate directory trees, you must explicitly specify allowed paths using the allow_paths parameter:
from yaml_reference import load_yaml_with_references
# Allow references from specific directories only
data = load_yaml_with_references(
    "config.yml",
    allow_paths=["/allowed/path1", "/allowed/path2"]
)
In the CLI, use the --allow flag:
yaml-reference-cli input.yml --allow /allowed/path1 --allow /allowed/path2
Whether or not allow_paths is specified, the default behavior is to allow references to files in the same directory as the source YAML file (or subdirectories). "Back-navigating" out of the root directory is not allowed (".." local references in a root YAML file). This provides a secure baseline to prevent unsafe access which is not explicitly allowed.
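The containment rule can be pictured as a purely lexical path check. The following is a rough sketch under several assumptions, not the package's actual logic: `is_reference_allowed` is a hypothetical helper, the source file path is assumed to be absolute and POSIX-style, symlinks are not resolved, and a ".." reference is treated as acceptable when it lands inside an explicitly allowed directory (the library may be stricter).

```python
import posixpath
from collections.abc import Sequence
from pathlib import PurePosixPath


def is_reference_allowed(
    source_file: str,
    reference: str,
    allow_paths: Sequence[str] = (),
) -> bool:
    """Return True if `reference` stays inside a permitted directory tree.

    Permitted trees are the directory of `source_file` plus every entry in
    `allow_paths`. Only path strings are compared; nothing is read from disk.
    """
    if posixpath.isabs(reference):
        return False  # absolute reference paths are rejected outright
    source_dir = posixpath.dirname(source_file)
    # Collapse "." and ".." segments so "../x" cannot hide an escape.
    target = PurePosixPath(posixpath.normpath(posixpath.join(source_dir, reference)))
    return any(target.is_relative_to(root) for root in (source_dir, *allow_paths))


print(is_reference_allowed("/project/root.yaml", "services/website.yaml"))  # True
print(is_reference_allowed("/project/root.yaml", "../shared/tags.yaml"))  # False
print(is_reference_allowed("/project/root.yaml", "../shared/tags.yaml", ["/shared"]))  # True
```

Comparing whole path components (rather than string prefixes) keeps a sibling such as /projectx from being mistaken for a child of /project.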
!reference-all applies silent-omission semantics when individual glob matches fall outside the allowed path set. Disallowed paths are filtered out before any file is opened (security invariant: disallowed file contents are never loaded into memory). The result is the subset of glob matches that are both reachable and allowed:
| Scenario | Behaviour | Exit |
|---|---|---|
| Glob matches zero files | Returns [] | rc=0 |
| Some matched files are outside allow_paths | Disallowed files are silently dropped; remaining files are returned | rc=0 |
| All matched files are outside allow_paths | Returns [] | rc=0 |
| Glob pattern is absolute (starts with /) | Hard error – ValueError raised | rc=1 |
| A matched file is the calling file (self-reference) | Hard error – circularity ValueError raised | rc=1 |
| A matched file transitively references the caller | Hard error – circularity ValueError raised | rc=1 |
This design keeps !reference-all resilient against partially-populated directory trees while still enforcing absolute-path and circularity invariants as hard failures.
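The first four rows of the table can be mimicked with a small filter that runs before any file is opened. This is an illustrative sketch only: `select_glob_matches` is a hypothetical name, `matches` stands in for the absolute paths a glob expansion would have produced, and the circularity rows are left out.

```python
from pathlib import PurePosixPath


def select_glob_matches(
    pattern: str,
    matches: list[str],
    allowed_roots: list[str],
) -> list[str]:
    """Keep only the matches that sit under one of `allowed_roots`.

    An absolute pattern is a hard error; disallowed matches are dropped
    silently. No file contents are read at any point.
    """
    if pattern.startswith("/"):
        raise ValueError(f"Absolute glob patterns are not allowed: {pattern}")
    return [
        match
        for match in matches
        if any(PurePosixPath(match).is_relative_to(root) for root in allowed_roots)
    ]


found = ["/project/networks/vpn.yaml", "/elsewhere/nfs.yaml"]
print(select_glob_matches("networks/*.yaml", found, ["/project"]))
# ['/project/networks/vpn.yaml']
print(select_glob_matches("networks/*.yaml", [], ["/project"]))  # []
```

Filtering on paths alone is what lets the stated invariant hold: a disallowed file is excluded before anything could load its contents.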
References using absolute paths (e.g., /tmp/file.yml) are explicitly rejected with a ValueError. All reference paths must be relative to the source file's directory. If you absolutely must reference an absolute path, relative paths to symlinks can be used. Note that their target directories must be explicitly allowed to avoid permission errors (see the above section about "Path restriction and allow_paths").
Contributor(s):
- David Sillman dsillman2000@gmail.com
- Personal website: https://www.dsillman.com
- Ryan Johnson