YAML loading and dumping improvements#62932
Conversation
|
This PR is marked as draft until I complete the following tasks:
|
3943125 to
c0bc1a7
Compare
|
I believe this is ready to be reviewed now. |
9951e47 to
de0e1cc
Compare
| assert isinstance(dataset["spam"], collections.OrderedDict) | ||
| assert isinstance(dataset["spam"]["foo"], collections.OrderedDict) |
There was a problem hiding this comment.
I'm going to trigger a full test run on this PR, simply to find out if there are any other tests that may fail due to these changes.
|
re-run full all |
|
FWIW I'm on board with this idea! I don't yet know if I'm on board with the implementation - as @MKLeb implied with the full test run, this is a pretty intensive change with (potentially) broad repercussions... hopefully all of them good! But I'm sure someone out there is relying on some backwards funky behavior 🙃 I would love if we could bridge the existing behavior and this with like... a feature flag and deprecation warning, like... That way folks could opt-in to the new behavior, and if they have a bunch of problems then they can file some bugs and still have a better fallback than just downgrading their salt installation. In any case, I'm sure a lot of us would feel better with an in-depth code review! |
I can do that. One thing: There is already an option for legacy YAML behavior: I'm tempted to make the option a version selector rather than a boolean so that the YAML behavior can be changed again in the future without much fanfare. Maybe something like from salt.version import SaltStackVersion
want_v = __opts__.get("override_yaml_parser_version")
if want_v:
want_v = SaltStackVersion.parse(want_v)
else:
want_v = SaltStackVersion.current_release()
if want_v < SaltStackVersion(3006):
do_old_stuff()
else:
do_this_new_stuff()Note that the new behavior would be the default behavior, so if it breaks a user they would have to take action to switch to the old behavior. Thoughts? |
If we think this change (or a future change) is too risky to make the default right away, we could predate the version and add a warning: if want_v < SaltStackVersion(3007):
if not __opts__.get("override_yaml_parser_version"):
salt.utils.versions.warn_until(3007, "Salt's YAML parsing behavior will change in v3007.0; see xyz for details")
do_old_stuff()
else:
do_this_new_stuff() |
|
re-run full all |
|
I set this to draft until all of the tasks in the PR description are done. |
|
@waynew I'm trying to figure out how to coax the loader to set |
|
@MKLeb Thanks, though it looks like |
|
@s0undt3ch Any ideas here? |
|
Not sure if that's helpful, but you can always manually load Lines 291 to 303 in 327664a import salt.config
import salt.syspaths
from pathlib import Path
__opts__ = None
def my_func():
global __opts__
if __opts__ is None:
__opts__ = salt.config.minion_config(Path(salt.syspaths.CONFIG_DIR) / "minion")
# in case you need grains in __opts__, see the linked example
# ... do stuff |
This does not change the behavior, but it does simplify the code.
The `salt.loader` module is intentially imported late in `salt.config.call_id_function()` to avoid a circular import error. However, `salt.loader` is also used in `salt.config.mminion_config()`, so import it there as well. This avoids an error if `mminion_config()` is called before `call_id_function()`, or if the import statement in `call_id_function()` is removed in the future.
The same YAML value can be represented in different ways, so tests that compare a generated YAML string with a manually typed string are fragile. Use `salt.utils.yaml` to generate the reference YAML so that the implementation of the YAML dumper can change without breaking the tests.
The delayed `import` statements match `import` statements at the top of the file so they are effectively no-ops.
This commit is a prerequisite to adding version-selected behavior changes.
This will make it possible for users to switch to an older YAML loader/dumper behavior in case a change breaks something. Note that several top-of-file imports were moved into functions to avoid circular import errors.
I believe this was the original intention. Even if it was not, the symmetry with `salt.utils.yaml.save_dump()` makes `dump()` less surprising now, and it makes it easier to introduce representer changes that affect both `dump()` and `safe_dump()`.
Before, the indentation would only be increased if the compiled module
`yaml.CSafeDumper` did not exist. Now it is indented even if it does.
Behavior before (assuming `yaml.CSafeDumper` exists):
```pycon
>>> import salt.utils.yaml as y
>>> print(y.dump({"foo": ["bar"]}, Dumper=y.IndentedSafeOrderedDumper, default_flow_style=False))
foo:
- bar
```
Behavior after:
```pycon
>>> import salt.utils.yaml as y
>>> print(y.dump({"foo": ["bar"]}, Dumper=y.IndentedSafeOrderedDumper, default_flow_style=False))
foo:
- bar
```
The `SaltYamlSafeLoader.add_constructor()` method is a class method,
not an instance method. Therefore, any call to that method in an
instance method (such as `__init__`) affects all future instances, not
just the current instance (`self`). That's not a problem if the
registration calls simply re-register the same constructor functions
over and over, but that was not the case here: different functions
were registered depending on the value of the `dictclass` parameter to
`__init__`.
The net effect was that an `!!omap` node was correctly processed as a
sequence of single-entry maps (and a list of (key, value) tuples
returned) until the first time a SaltYamlSafeLoader was constructed
with a non-`dict` class. After that, every `!!omap` node was always
incorrectly processed as a map node regardless of `dictclass`.
Now `!!omap` processing is performed as originally intended:
* If the `dictclass` parameter is `dict` (the default), the behavior
when loading a `!!map` or `!!omap` node is unchanged from the base
class's behavior.
* If the `dictclass` parameter is not `dict`:
* `!!map` nodes are loaded like they are in the base class
except the custom class is used instead of `dict`.
* `!!omap` nodes are loaded the same as `!!map` nodes. (This is
a bug because an `!!omap` node is a sequence node of
single-valued map nodes, not an ordinary map node. A future
commit will fix this.)
Behavior before:
```pycon
>>> import salt.utils.yaml as y
>>> import collections
>>> y.load("!!omap [{foo: bar}, {baz: bif}]")
[('foo', 'bar'), ('baz', 'bif')]
>>> y.SaltYamlSafeLoader("", dictclass=collections.OrderedDict) # created for side-effect only
<salt.utils.yamlloader.SaltYamlSafeLoader object at 0xc31a10>
>>> y.load("!!omap [{foo: bar}, {baz: bif}]") # exact same as before
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "salt/utils/yamlloader.py", line 159, in load
return yaml.load(stream, Loader=Loader)
File "venv/lib/python3.8/site-packages/yaml/__init__.py", line 81, in load
return loader.get_single_data()
File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 51, in get_single_data
return self.construct_document(node)
File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 60, in construct_document
for dummy in generator:
File "salt/utils/yamlloader.py", line 45, in construct_yaml_map
value = self.construct_mapping(node)
File "salt/utils/yamlloader.py", line 56, in construct_mapping
raise ConstructorError(
yaml.constructor.ConstructorError: expected a mapping node, but found sequence
in "<unicode string>", line 1, column 1
```
Behavior after:
```pycon
>>> import salt.utils.yaml as y
>>> import collections
>>> y.load("!!omap [{foo: bar}, {baz: bif}]")
[('foo', 'bar'), ('baz', 'bif')]
>>> y.SaltYamlSafeLoader("!!omap [{foo: bar}, {baz: bif}]", dictclass=collections.OrderedDict).get_single_data()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 51, in get_single_data
return self.construct_document(node)
File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 60, in construct_document
for dummy in generator:
File "salt/utils/yamlloader.py", line 42, in construct_yaml_omap
return (yield from self.construct_yaml_map(node))
File "salt/utils/yamlloader.py", line 36, in construct_yaml_map
value = self.construct_mapping(node)
File "salt/utils/yamlloader.py", line 52, in construct_mapping
raise ConstructorError(
yaml.constructor.ConstructorError: expected a mapping node, but found sequence
in "<unicode string>", line 1, column 1
>>> y.load("!!omap [{foo: bar}, {baz: bif}]")
[('foo', 'bar'), ('baz', 'bif')]
```
This commit also adds a unit test for the `dictclass` parameter,
though it's important to note that the new test is not a regression
test for the bug fixed by this commit. (I don't think it would be
worthwhile to write a regression test because the test code would be
complicated and unreadable.)
YAML nodes tagged with `!!timestamp` nodes now become Python
`datetime` objects rather than strings. To preserve compatibility
with existing YAML files, timestamp-like strings without `!!timestamp`
are still loaded as strings.
Behavior before:
```pycon
>>> import datetime
>>> import salt.utils.yaml as y
>>> y.load("'2022-10-21T18:16:03.1-04:00'")
'2022-10-21T18:16:03.1-04:00'
>>> y.load("!!timestamp '2022-10-21T18:16:03.1-04:00'")
'2022-10-21T18:16:03.1-04:00'
```
Behavior after:
```pycon
>>> import datetime
>>> import salt.utils.yaml as y
>>> y.load("'2022-10-21T18:16:03.1-04:00'")
'2022-10-21T18:16:03.1-04:00'
>>> y.load("!!timestamp '2022-10-21T18:16:03.1-04:00'")
datetime.datetime(2022, 10, 21, 18, 16, 3, 100000, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000)))
```
Before, `!!omap` nodes were incorrectly assumed to be mapping nodes
when `dictclass` was not `dict`. Now they are correctly processed as
sequences of single-entry mappings. The resulting Python object has
type `dictclass` (usually `collections.OrderedDict` or a subclass).
This commit does not change the behavior when `dictclass` is `dict`;
loading an `!!omap` node still returns a list of (key, value)
tuples (PyYAML's default behavior). I consider that to be a bug in
PyYAML, so a future commit may change the behavior in the `dict` case
to match the non-`dict` behavior. (This commit uses `dictclass` for
the non-`dict` case to match what appears to be the original
intention.)
Behavior before:
```pycon
>>> import salt.utils.yaml as y
>>> import collections
>>> y.load("!!omap [{foo: bar}, {baz: bif}]")
[('foo', 'bar'), ('baz', 'bif')]
>>> y.SaltYamlSafeLoader("!!omap [{foo: bar}, {baz: bif}]", dictclass=collections.OrderedDict).get_single_data()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 51, in get_single_data
return self.construct_document(node)
File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 60, in construct_document
for dummy in generator:
File "salt/utils/yamlloader.py", line 42, in construct_yaml_omap
return (yield from self.construct_yaml_map(node))
File "salt/utils/yamlloader.py", line 36, in construct_yaml_map
value = self.construct_mapping(node)
File "salt/utils/yamlloader.py", line 52, in construct_mapping
raise ConstructorError(
yaml.constructor.ConstructorError: expected a mapping node, but found sequence
in "<unicode string>", line 1, column 1
```
Behavior after:
```pycon
>>> import salt.utils.yaml as y
>>> import collections
>>> y.load("!!omap [{foo: bar}, {baz: bif}]")
[('foo', 'bar'), ('baz', 'bif')]
>>> y.SaltYamlSafeLoader("!!omap [{foo: bar}, {baz: bif}]", dictclass=collections.OrderedDict).get_single_data()
OrderedDict([('foo', 'bar'), ('baz', 'bif')])
```
Relevant bug: saltstack#12161
Now the behavior is consistent regardless of the value of the
Loader's `dictclass` constructor parameter.
Starting with Python 3.6, Python `dict` objects always iterate in
insertion order, so iteration order was already guaranteed when using
an ordinary YAML map. However, `!!omap` provides a stronger guarantee
to users (plain map iteration order can be thought of as a Salt
implementation detail that might change in the future), and it allows
them to make their `.sls` files self-documenting. For example,
instead of:
```yaml
my_pillar_data:
key1: val1
key2: val2
```
users can now do:
```yaml
my_pillar_data: !!omap
- key1: val1
- key2: val2
```
to make it clear to readers that the entries are intended to be
processed in order.
Behavior before:
```pycon
>>> import salt.utils.yaml as y
>>> import collections
>>> y.load("!!omap [{foo: bar}, {baz: bif}]")
[('foo', 'bar'), ('baz', 'bif')]
>>> y.SaltYamlSafeLoader("!!omap [{foo: bar}, {baz: bif}]", dictclass=collections.OrderedDict).get_single_data()
OrderedDict([('foo', 'bar'), ('baz', 'bif')])
```
Behavior after:
```pycon
>>> import salt.utils.yaml as y
>>> import collections
>>> y.load("!!omap [{foo: bar}, {baz: bif}]")
OrderedDict([('foo', 'bar'), ('baz', 'bif')])
>>> y.SaltYamlSafeLoader("!!omap [{foo: bar}, {baz: bif}]", dictclass=collections.OrderedDict).get_single_data()
OrderedDict([('foo', 'bar'), ('baz', 'bif')])
```
Relevant bug: saltstack#12161
My main motivation for adding this is to facilitate testing, though it
also gives module authors greater flexibility (in particular: a
`tuple` can be a `dict` key, but a `list` cannot because it is not
hashable), and I don't see a strong reason why this shouldn't be
added.
Behavior before:
```pycon
>>> import salt.utils.yaml as y
>>> y.load("!!python/tuple [foo, bar]")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "salt/utils/yamlloader.py", line 159, in load
return yaml.load(stream, Loader=Loader)
File "venv/lib/python3.8/site-packages/yaml/__init__.py", line 81, in load
return loader.get_single_data()
File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 51, in get_single_data
return self.construct_document(node)
File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 55, in construct_document
data = self.construct_object(node)
File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 100, in construct_object
data = constructor(self, node)
File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 427, in construct_undefined
raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple'
in "<unicode string>", line 1, column 1
```
Behavior after:
```pycon
>>> import salt.utils.yaml as y
>>> y.load("!!python/tuple [foo, bar]")
('foo', 'bar')
```
Override the implicit resolver for timestamp strings so that `datetime.datetime` objects are always written with an accompaying `!!timestamp` tag. This is a behavior change: Any users that depend on a `datetime` becoming a plain YAML string (which would be read in as a `str` object) must convert the object to string themselves. This change preserves the semantics of the object, and it preserves round-trip identity. Behavior before: ```pycon >>> import datetime >>> import salt.utils.yaml as y >>> print(y.dump(datetime.datetime(2022, 10, 21, 18, 16, 3, 100000, tzinfo=datetime.timezone(datetime.timedelta(hours=-4))))) 2022-10-21 18:16:03.100000-04:00 ``` Behavior after: ```pycon >>> import datetime >>> import salt.utils.yaml as y >>> print(y.dump(datetime.datetime(2022, 10, 21, 18, 16, 3, 100000, tzinfo=datetime.timezone(datetime.timedelta(hours=-4))))) !!timestamp 2022-10-21 18:16:03.100000-04:00 ```
Behavior before:
```pycon
>>> from salt.utils.odict import OrderedDict
>>> import salt.utils.yaml as y
>>> import collections
>>> print(y.dump(OrderedDict([("foo", "bar")]), default_flow_style=False))
foo: bar
>>> print(y.safe_dump(OrderedDict([("foo", "bar")]), default_flow_style=False))
foo: bar
>>> print(y.dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False))
!!python/object/apply:collections.OrderedDict
- - - foo
- bar
>>> print(y.safe_dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False))
NULL
```
Behavior after:
```pycon
>>> from salt.utils.odict import OrderedDict
>>> import salt.utils.yaml as y
>>> import collections
>>> print(y.dump(OrderedDict([("foo", "bar")]), default_flow_style=False))
foo: bar
>>> print(y.safe_dump(OrderedDict([("foo", "bar")]), default_flow_style=False))
foo: bar
>>> print(y.dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False))
foo: bar
>>> print(y.safe_dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False))
foo: bar
```
Before, an `OrderedDict` object was represented as a plain YAML
mapping. Now it is represented as an `!!omap` node, which is a
sequence of single-valued mappings.
This is a behavior change: Any users that depend on an `OrderedDict`
becoming a plain YAML mapping (which would be read in as a `dict`
object) must first convert the `OrderedDict` to `dict`.
This change preserves the semantics of the object, and it preserves
round-trip identity.
Behavior before:
```pycon
>>> from salt.utils.odict import OrderedDict
>>> import salt.utils.yaml as y
>>> import collections
>>> print(y.dump(OrderedDict([("foo", "bar")]), default_flow_style=False))
foo: bar
>>> print(y.safe_dump(OrderedDict([("foo", "bar")]), default_flow_style=False))
foo: bar
>>> print(y.dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False))
foo: bar
>>> print(y.safe_dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False))
foo: bar
```
Behavior after:
```pycon
>>> from salt.utils.odict import OrderedDict
>>> import salt.utils.yaml as y
>>> import collections
>>> print(y.dump(OrderedDict([("foo", "bar")]), default_flow_style=False))
!!omap
- foo: bar
>>> print(y.safe_dump(OrderedDict([("foo", "bar")]), default_flow_style=False))
!!omap
- foo: bar
>>> print(y.dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False))
!!omap
- foo: bar
>>> print(y.safe_dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False))
!!omap
- foo: bar
```
Before, `!!python/tuple` was only used for `OrderedDumper`. Now it is
also used for `SafeOrderedDumper` and `IndentedSafeOrderedDumper`.
This is a behavior change: Any users that depended on a `tuple`
becoming a plain YAML sequence (which would be read in as a `list`
object) must first convert the `tuple` to `list`.
This change preserves the semantics of the object, and it preserves
round-trip identity. Preserving round-trip identity is particularly
important if the tuple is used as a key in a `dict` because `list`
objects are not hashable.
Behavior before:
```pycon
>>> import salt.utils.yaml as y
>>> print(y.dump(("foo", "bar"), default_flow_style=False))
!!python/tuple
- foo
- bar
>>> print(y.safe_dump(("foo", "bar"), default_flow_style=False))
- foo
- bar
```
Behavior after:
```pycon
>>> import salt.utils.yaml as y
>>> print(y.dump(("foo", "bar"), default_flow_style=False))
!!python/tuple
- foo
- bar
>>> print(y.safe_dump(("foo", "bar"), default_flow_style=False))
!!python/tuple
- foo
- bar
```
Failure to represent an object is a bug so it should be noisy, not quietly produce `NULL`. The representer this commit disables was added in commit 2e2ce5f. Behavior before: ```pycon >>> import salt.utils.yaml as y >>> y.safe_dump(type("GeneratedObjectSubclass", (), {})) 'NULL\n' ``` Behavior after: ```pycon >>> import salt.utils.yaml as y >>> y.safe_dump(type("GeneratedObjectSubclass", (), {})) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "salt/utils/yamldumper.py", line 270, in safe_dump return dump(data, stream, Dumper=SafeOrderedDumper, **kwargs) File "salt/utils/yamldumper.py", line 261, in dump return yaml.dump(data, stream, **kwargs) File "venv/lib/python3.8/site-packages/yaml/__init__.py", line 253, in dump return dump_all([data], stream, Dumper=Dumper, **kwds) File "venv/lib/python3.8/site-packages/yaml/__init__.py", line 241, in dump_all dumper.represent(data) File "venv/lib/python3.8/site-packages/yaml/representer.py", line 27, in represent node = self.represent_data(data) File "venv/lib/python3.8/site-packages/yaml/representer.py", line 58, in represent_data node = self.yaml_representers[None](self, data) File "venv/lib/python3.8/site-packages/yaml/representer.py", line 231, in represent_undefined raise RepresenterError("cannot represent an object", data) yaml.representer.RepresenterError: ('cannot represent an object', <class '__main__.GeneratedObjectSubclass'>) ```
|
Closing this due to inactivity. Anyone should feel free to re-open it if they want to see it through to the end in one release cycle. |
What does this PR do?
Introduces a new
yaml_compatibilityoption that controls the behavior of YAML loading/dumping. If a user wants to revert to the YAML behavior from a previous version of Salt, or preview upcoming changes to YAML behavior, they can set it to the desired version of Salt. For example, settingyaml_compatibilityto 3006 will pin the current behavior, and setting it to 3007 allows the user to preview the behavior that will become the default in Salt v3007.In addition, the following behavior changes take effect immediately:
dictclass to thesalt.utils.yaml.SaltYamlSafeLoaderconstructor no longer causes all future!!omapnodes to raise an exception when loaded.!!python/tupleis now supported, and produces a Pythontupleobject. Before, such nodes would raise an exception. (This change does not break backwards compatibility, which is why it takes effect immediately.)After v3006 is released (
salt.version.SaltVersionsInfo.SULFUR.releasedis changed toTrue), the following behavior changes will automatically take effect. These can be previewed now by settingyaml_compatibilityto 3007.salt.utils.yaml.dump()will default tosalt.utils.yaml.OrderedDumperrather than PyYAML'syaml.Dumperclass. (I believe this was the original intention, and nowdump()'s behavior is symmetric withsafe_dump()).salt.utils.yaml.IndentedSafeOrderedDumperlist indentation (was broken only whenyaml.CSafeDumperexists).!!timestampnode will produces a Pythondatetime.datetimeobject (currently they are always loaded as strings).!!omapnodes (currently they are sometimes incorrectly parsed as mapping nodes rather than sequence nodes containing mapping nodes).!!omapnode will always produce acollections.OrderedDictobject. Currently, it sometimes produces a list of (key, value) tuples and sometimes raises an exception.datetime.datetimeobject to YAML now explicitly tags the node with!!timestamp.collections.OrderedDictobject to YAML now consistently produces a YAML!!omapnode, which is a sequence of single-valued mappings. Before, the behavior varied depending on the specific object type (collections.OrderedDictvs.salt.utils.odict.OrderedDict) and the dump function (salt.utils.yaml.dump()vs.salt.utils.yaml.safe_dump()).tupleobject to YAML now consistently produces a YAML sequence node tagged with!!python/tuple. Before, onlysalt.utils.yaml.dump()would tag the node with!!python/tuple;salt.utils.yaml.safe_dump()did not.NULLnode.Clean-ups and code health:
salt.utils.yamlunit tests to pytest.What issues does this PR fix or reference?
#12161 (can be closed even without this PR because the bug no longer exists, although this PR adds a regression test in the form of an integration test)
Behavior Changes
IndentedSafeOrderedDumper:Before (assuming
yaml.CSafeDumperexists):After:
This fix required a switch to the pure Python
yaml.SafeDumperclass (forIndentedSafeOrderedDumperonly), which will be slower thanyaml.CSafeDumper.Loading YAML
!!omapnodes to Python:Before:
After:
Loading YAML
!!timestampnodes to Python:Before:
After:
Loading YAML
!!python/tuplenodes to Python:Before:
After:
Dumping Python
datetime.datetimeobjects to YAML:Before:
After:
Dumping Python
OrderedDictobjects to YAML:Before:
After:
Dumping Python
tupleobjects to YAML:Before:
After:
Dumping unsupported objects:
Behavior before:
Behavior after:
Merge requirements satisfied?
[NOTICE] Bug fixes or features added to Salt require tests.
Commits signed with GPG?
No
Tasks