dsillman2000/yaml-reference

yaml-reference

yaml-reference uses ruamel.yaml to support cross-file references and YAML composition through the tags !reference, !reference-all, !flatten, and !merge.

Install the package from PyPI with:

# pip
pip install yaml-reference
# poetry
poetry add yaml-reference
# uv
uv add yaml-reference

Spec

This Python library implements the specification for cross-file references and YAML composition defined in the yaml-reference-specs project, covering the tags !reference, !reference-all, !flatten, and !merge.

Example

# root.yaml
version: "3.1"
services:
  - !reference
    path: "services/website.yaml"

  - !reference
    path: "services/database.yaml"

networkConfigs:
  !reference-all
  glob: "networks/*.yaml"

tags: !flatten
  - !reference { path: "common/tags.yaml" }
  - "web"
  - "service"

config: !merge
  - !reference { path: "config/defaults.yaml" }
  - !reference { path: "config/overrides.yaml" }

Suppose services/website.yaml and services/database.yaml exist relative to the directory containing root.yaml, along with a networks directory of YAML files. The following Python code loads root.yaml and expands the referenced files into the result:

from yaml_reference import load_yaml_with_references

data = load_yaml_with_references("root.yaml")
print(data)
# {"networkConfigs": [{"network": "vpn","version": "1.1"},{"network": "nfs","version": "1.0"}],"services": ["website","database"],"version": "3.1"}

# With path restrictions for security
data = load_yaml_with_references("root.yaml", allow_paths=["/allowed/path"])

Note that the load_yaml_with_references function instantiates a ruamel.yaml.YAML loader class (typ='safe') to perform the deserialization of the YAML files, and returns a Python dictionary with the recursively-expanded YAML data.

If you want to work with a single "layer" of the document without recursively exhausting the entire reference graph, use the parse_yaml_with_references function. It returns the original YAML document's contents with each !reference and !reference-all tag represented as a dedicated Reference or ReferenceAll object.

from yaml_reference import parse_yaml_with_references

data = parse_yaml_with_references("root.yaml")
print(data["networkConfigs"])
# ReferenceAll(glob="networks/*.yaml", location="/path/to/root.yaml")

# With path restrictions for security
data = parse_yaml_with_references("root.yaml", allow_paths=["/allowed/path"])

The !merge Tag

The !merge tag combines multiple YAML mappings (dictionaries) into a single mapping. This is useful for composing configuration from multiple sources or applying overrides. When you use !merge, you provide a sequence of mappings that will be merged together, with later mappings overriding keys from earlier ones.

# Example: Merge default and override configurations
config: !merge
  - {host: "localhost", port: 8080, debug: false}
  - {port: 9000, debug: true}  # Overrides port and debug from the first mapping

When loaded with load_yaml_with_references, this becomes {"host": "localhost", "port": 9000, "debug": true}. The !merge tag can also be nested and combined with !reference and !flatten tags for complex YAML composition scenarios.

Note that, if a nested sequence of mappings is provided to !merge, the sequence argument will be flattened first, and then the resulting mappings will be merged together. For example:

config: !merge
  - - a: 1
    - b: 2
  - c: 3
  - - [{c: 5, a: 5}]

This is processed into {"config": {"a": 5, "b": 2, "c": 5}} because the nested sequences are flattened into a single sequence of mappings before merging.
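The flatten-then-merge behaviour described above can be sketched in plain Python. This is an illustration of the semantics only, not the library's implementation: the helper names flatten and merge are hypothetical, the merge is assumed to be shallow (top-level keys only), and rejecting non-mapping items with a ValueError is an assumption.

```python
from collections.abc import Mapping


def flatten(items):
    """Recursively flatten nested lists into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def merge(items):
    """Flatten `items`, then merge the mappings left to right (shallow)."""
    merged = {}
    for mapping in flatten(items):
        if not isinstance(mapping, Mapping):
            # Assumption: anything that is not a mapping is rejected.
            raise ValueError(f"expected a mapping, got {type(mapping).__name__}")
        merged.update(mapping)  # later keys override earlier ones
    return merged


# Same shape as the nested !merge example above.
result = merge([[{"a": 1}, {"b": 2}], {"c": 3}, [[{"c": 5, "a": 5}]]])
print(result)  # {'a': 5, 'b': 2, 'c': 5}
```

The key "c" is first set to 3 and then overridden by the last mapping, which is why ordering within the sequence matters.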

Using Anchors with !reference and !reference-all

Both !reference and !reference-all tags support an optional anchor parameter that allows you to import only a specific anchored section from a file, rather than the entire file contents. This is useful when you want to extract a particular part of a larger YAML document.

# main.yaml
database_config: !reference
  path: "config.yaml"
  anchor: db_settings

api_keys: !reference-all
  glob: "secrets/*.yaml"
  anchor: api_key

In this example, if config.yaml contains multiple anchored sections, only the one labeled with &db_settings will be imported. Similarly, !reference-all will extract the &api_key anchor from each file matching the glob pattern.

Here's a practical example:

# config.yaml
app_name: MyApplication
db_settings: &db_settings
  host: localhost
  port: 5432
  database: myapp
cache_settings: &cache_settings
  ttl: 3600

# main.yaml
config: !reference
  path: "config.yaml"
  anchor: db_settings

When loaded with load_yaml_with_references("main.yaml"), the result will be:

{
  "config": {
    "host": "localhost",
    "port": 5432,
    "database": "myapp"
  }
}

Note that the app_name and cache_settings fields from config.yaml are not included in the result because only the anchored section was imported. If the specified anchor is not found in the referenced file, a ValueError will be raised.

VSCode squigglies

To get rid of red squigglies in VSCode when using the !reference, !reference-all, !flatten, and !merge tags, you can add the following to your settings.json file:

    "yaml.customTags": [
        "!reference mapping",
        "!reference-all mapping",
        "!flatten sequence",
        "!merge sequence"
    ]

CLI interface

This package includes a CLI that reads a YAML file containing !reference tags and prints its contents as pretty-printed JSON with all references expanded. This is useful for generating a single file for deployment or other purposes. Note that the keys of mappings are sorted alphabetically in the output. The same CLI is used to test the contract of this package against the yaml-reference-specs project.

$ yaml-reference-cli -h
  usage: yaml-reference-cli [-h] [--allow ALLOW_PATHS] input_file

  Compile a YAML file containing !reference tags into a new YAML file with resolved references. Expects a YAML file to be provided via the "input_file" argument.
  Outputs JSON content to stdout.

  positional arguments:
    input_file           Path to the input YAML file with references to resolve and print as JSON.

  options:
     -h, --help           show this help message and exit
     --allow ALLOW_PATHS  Path to allow references from.

$ yaml-reference-cli root.yaml
  {
    "networkConfigs": [
      {
        "network": "vpn",
        "version": "1.1"
      },
      {
        "network": "nfs",
        "version": "1.0"
      }
    ],
    "services": [
      "website",
      "database"
    ],
    "tags": [
      "common:aws",
      "common:http",
      "common:security",
      "common:waf",
      "web",
      "service"
    ],
    "version": "3.1"
  }
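For readers who call the Python API instead of the CLI, a similar pretty-printed, key-sorted JSON rendering can be produced with the standard library. This is a sketch assuming the loaded data is a plain dictionary. The data literal below is a hand-written stand-in, and whether the CLI itself uses json.dumps is not stated here.

```python
import json

# Stand-in for the dictionary returned by load_yaml_with_references.
data = {
    "version": "3.1",
    "services": ["website", "database"],
    "networkConfigs": [{"version": "1.1", "network": "vpn"}],
}

# sort_keys orders mapping keys alphabetically at every nesting level;
# sequences keep their original order.
output = json.dumps(data, indent=2, sort_keys=True)
print(output)
```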

To get YAML output instead, pipe the result through the yq CLI tool (mikefarah/yq).

$ yaml-reference-cli root.yaml | yq -P
networkConfigs:
  - network: vpn
    version: 1.1
  - network: nfs
    version: 1.0
services:
  - website
  - database
tags:
  - common:aws
  - common:http
  - common:security
  - common:waf
  - web
  - service
version: 3.1
# Pipe it to a result file
$ yaml-reference-cli root.yaml | yq -P > .compiled/root.yaml

Circular reference protection

As required by the yaml-reference-specs specification, this package includes circular reference detection to prevent infinite recursion. If a circular reference is detected (e.g., A references B, B references C, C references A), a ValueError will be raised with a descriptive error message. This protects against self-references and circular chains in both !reference and !reference-all tags.
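One common way to detect such cycles is to track the chain of files currently being resolved and fail as soon as a file reappears in its own chain. The sketch below illustrates that idea on an in-memory graph. The resolve function, the graph dictionary, and the error message wording are all hypothetical and are not taken from the package.

```python
def resolve(name, graph, stack=()):
    """Depth-first expansion of `name`; `graph` maps each file to the files it references."""
    if name in stack:
        chain = " -> ".join(stack + (name,))
        raise ValueError(f"Circular reference detected: {chain}")
    # Expand children with the current file pushed onto the resolution stack.
    children = graph.get(name, [])
    return {name: [resolve(child, graph, stack + (name,)) for child in children]}


# c.yaml is reached twice, but never through itself, so this is not a cycle.
acyclic = {"a.yaml": ["b.yaml", "c.yaml"], "b.yaml": ["c.yaml"]}
print(resolve("a.yaml", acyclic))

cyclic = {"a.yaml": ["b.yaml"], "b.yaml": ["c.yaml"], "c.yaml": ["a.yaml"]}
try:
    resolve("a.yaml", cyclic)
except ValueError as exc:
    print(exc)  # Circular reference detected: a.yaml -> b.yaml -> c.yaml -> a.yaml
```

Tracking the active chain rather than a global "visited" set means a file may legitimately be referenced from several places, as long as it never references itself transitively.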

Security considerations

Path restriction and allow_paths

By default, !reference and !reference-all tags can only reference files within the same directory as the source YAML file (or its subdirectories). To allow references to files in other directory trees, you must explicitly specify allowed paths using the allow_paths parameter:

from yaml_reference import load_yaml_with_references

# Allow references from specific directories only
data = load_yaml_with_references(
    "config.yml",
    allow_paths=["/allowed/path1", "/allowed/path2"]
)

In the CLI, use the --allow flag:

yaml-reference-cli input.yml --allow /allowed/path1 --allow /allowed/path2

Whether or not allow_paths is specified, references to files in the same directory as the source YAML file (or its subdirectories) are always allowed. "Back-navigating" out of the root directory, for example with ".." in a reference inside a root YAML file, is not allowed. This provides a secure baseline that prevents access to any location that has not been explicitly allowed.
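The rule above amounts to a containment check on resolved paths. The following is a minimal sketch of such a check, not the package's code: check_reference and is_within are hypothetical helpers, the baseline directory is assumed to be the directory of the referencing file, and raising PermissionError for a disallowed target is an assumption.

```python
from pathlib import Path


def is_within(path, directory):
    """True if `path` is `directory` itself or lies somewhere beneath it."""
    try:
        path.relative_to(directory)
        return True
    except ValueError:
        return False


def check_reference(source_file, ref_path, allow_paths=()):
    """Return the resolved target of `ref_path`, or raise if it is not permitted."""
    if Path(ref_path).is_absolute():
        raise ValueError(f"Absolute reference paths are not allowed: {ref_path}")
    root = Path(source_file).resolve().parent
    target = (root / ref_path).resolve()  # resolve() collapses ".." and follows symlinks
    allowed_dirs = [root, *(Path(p).resolve() for p in allow_paths)]
    if not any(is_within(target, d) for d in allowed_dirs):
        # Assumption: the exception type for a disallowed target.
        raise PermissionError(f"Reference escapes the allowed directories: {ref_path}")
    return target


print(check_reference("/srv/app/root.yaml", "services/website.yaml"))
print(check_reference("/srv/app/root.yaml", "../shared/tags.yaml", ["/srv/shared"]))
try:
    check_reference("/srv/app/root.yaml", "../secrets.yaml")
except PermissionError as exc:
    print(exc)
```

Resolving before comparing matters: a purely textual prefix check would let "services/../../secrets.yaml" slip through.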

Glob matching behavior for !reference-all

!reference-all applies silent-omission semantics when individual glob matches fall outside the allowed path set. Disallowed paths are filtered out before any file is opened (security invariant: disallowed file contents are never loaded into memory). The result is the subset of glob matches that are both reachable and allowed:

| Scenario | Behaviour | Exit |
|---|---|---|
| Glob matches zero files | Returns [] | rc=0 |
| Some matched files are outside allow_paths | Disallowed files are silently dropped; remaining files are returned | rc=0 |
| All matched files are outside allow_paths | Returns [] | rc=0 |
| Glob pattern is absolute (starts with /) | Hard error: ValueError raised | rc=1 |
| A matched file is the calling file (self-reference) | Hard error: circularity ValueError raised | rc=1 |
| A matched file transitively references the caller | Hard error: circularity ValueError raised | rc=1 |

This design keeps !reference-all resilient against partially-populated directory trees while still enforcing absolute-path and circularity invariants as hard failures.
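The silent-omission rows of the table can be sketched as a filter applied to glob matches before anything is opened. This is an illustration under stated assumptions rather than the package's code: select_glob_matches is a hypothetical helper, results are sorted only to make the output deterministic, and the circularity rows are not covered here.

```python
import glob
import os
from pathlib import Path


def select_glob_matches(source_file, pattern, allow_paths=()):
    """Return the allowed files matching `pattern`, relative to `source_file`'s directory."""
    if os.path.isabs(pattern):
        raise ValueError(f"Absolute glob patterns are not allowed: {pattern}")
    root = Path(source_file).resolve().parent
    allowed_dirs = [root, *(Path(p).resolve() for p in allow_paths)]

    selected = []
    # glob.escape protects special characters in the directory part only.
    for match in sorted(glob.glob(os.path.join(glob.escape(str(root)), pattern))):
        target = Path(match).resolve()
        # Filter on the resolved path alone; the file is never opened here.
        if any(target == d or d in target.parents for d in allowed_dirs):
            selected.append(target)
    return selected  # may be empty; that is not an error
```

With this shape, a pattern such as "../outside/*.yaml" yields an empty list unless the outside directory is passed in allow_paths, while an absolute pattern fails immediately.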

Absolute path restrictions

References using absolute paths (e.g., /tmp/file.yml) are explicitly rejected with a ValueError. All reference paths must be relative to the source file's directory. If you need to reach a file at an absolute location, reference a symlink to it through a relative path instead. Note that the symlink's target directory must be explicitly allowed to avoid permission errors (see the section "Path restriction and allow_paths" above).
