Do not deduplicate captured args while expanding format_args! #149926

ShoyuVanilla wants to merge 1 commit into rust-lang:main

Conversation
|
Some changes occurred in src/tools/clippy: cc @rust-lang/clippy. Some changes occurred in compiler/rustc_ast_lowering/src/format.rs: cc @m-ou-se.
|
r? @spastorino

rustbot has assigned @spastorino. Use `r?` to explicitly pick a reviewer.
|
@rustbot author

(force-pushed from fb523ee to 3ebdaa4)
|
@rustbot ready
|
Nominating as per #145739 (comment)
|
It'd be worth adding a test for the drop behavior.
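The PR's actual test isn't shown here, but a minimal sketch of the kind of drop-behavior test being asked for could look like this (all names invented). It uses positional arguments, which always evaluate each argument, so a `const` with a `Drop` impl is materialized into a fresh temporary, and later dropped, once per use:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Global counter so drops are observable.
static DROPS: AtomicU32 = AtomicU32::new(0);

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::Relaxed);
    }
}

impl std::fmt::Display for Noisy {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "noisy")
    }
}

// Each use of a const evaluates it to a fresh value.
const N: Noisy = Noisy;

/// Format `N` twice positionally and report how many drops that caused.
fn run() -> (String, u32) {
    let before = DROPS.load(Ordering::Relaxed);
    let s = format!("{} {}", N, N);
    let after = DROPS.load(Ordering::Relaxed);
    (s, after - before)
}

fn main() {
    let (s, drops) = run();
    assert_eq!(s, "noisy noisy");
    // Two positional uses of `N` mean two evaluations and two drops.
    assert_eq!(drops, 2);
}
```

Under the change being proposed, the captured form `"{N} {N}"` would behave the same way, instead of evaluating `N` only once.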
(force-pushed from 3ebdaa4 to af89685)
|
Given that this makes more sense for the language, along with the clean crater results and the intuition that it'd be surprising if anything actually leaned on this, I propose: @rfcbot fcp merge lang
|
Team member @traviscross has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
|
I don't think we should do this. It will make the generated code for […] I don't want to end up in a situation where it would make sense for Clippy to suggest something like: […]

Adding @rust-rfcbot concern equivalence
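The suggestion being alluded to is elided above; as a hedged illustration (hypothetical, not actual Clippy output), the rewrite in question would name the repeated capture once as an explicit argument so the expression appears only once:

```rust
fn main() {
    let name = String::from("world");

    // Repeated inline capture: without deduplication, `name` would be
    // captured (by reference) once per placeholder.
    let a = format!("hello {name}, {name}");

    // Manually deduplicated form: the expression appears once, as a named
    // argument referenced by both placeholders.
    let b = format!("hello {n}, {n}", n = name);

    assert_eq!(a, b);
}
```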
|
🔔 This is now entering its final comment period, as per the review above. 🔔
|
@rfcbot concern just-does-not-seem-worth-it

So I've been kind of sitting this out, and I hate to do this, but I am going to raise a concern for a bit more discussion. The bottom line is that I don't really buy the "consistency" motivation for this change. I see no reason to expect that […]

It seems like the consistency concern comes up when you look at something like […] But I see a rather consistent mental model, which is that embedded […]

I gather that the main argument in FAVOR of this change is that it'd be simpler if the embedded […] I have a kind of internal heuristic which is like: don't break compatibility if you don't have to. Right now, if this were a fresh design, I suppose I'd be on the fence because of […]

Am I missing something? Is there a more complex argument? If I do have this right, can somebody summarize the perf/cost hit and point me at the comments about the complexity of the proposed optimization? (The optimization smacks to me of "sufficiently smart compiler" when a rather "dumb compiler hack" would do the trick...)

EDIT: I read over more of the comments. I can see that the […]
|
Actually, what would help me is to write out the precise desugaring expected for […]

EDIT: I guess that based on this comment... ...it's clear enough.
What about between […]
I think the best case for the simple argument is demonstrated by the earnest surprise that @theemathas (in #145739) and @Jules-Bertholet (in rust-lang/rfcs#3626 (comment)) had about the behavior. Obviously both are Rust experts. As @Jules-Bertholet said: […]
The simple argument really is that simple. From lang, we've pushed a consistent evaluation opsem for consts, the interpolation behavior is inconsistent with that, and that surprises people.

The slightly-more-complex argument, in an RFC 3626 context, goes as follows. Let's say we want to desugar this:

```rust
fn main() {
    println!("{X.f} {X.f}"); //~ Format string to desugar.
}
```
```rust
// Feel free to ignore this scaffolding...
use core::{fmt, sync::atomic::{AtomicU8, Ordering::Relaxed}};

const X: W = W(Y);
const Y: U = U { f: V(AtomicU8::new(0)) };

struct W(U);
struct U { f: V }
struct V(AtomicU8);

impl fmt::Display for V {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let x = self.0.fetch_add(1, Relaxed);
        write!(f, "displayV({x})")
    }
}

impl Drop for W {
    fn drop(&mut self) {
        let x = self.0.f.0.fetch_add(1, Relaxed);
        println!("dropW({x})");
    }
}

impl core::ops::Deref for W {
    type Target = U;
    fn deref(&self) -> &Self::Target {
        let x = self.0.f.0.fetch_add(1, Relaxed);
        println!("derefW({x})");
        &self.0
    }
}
```

There are three ways that come to mind:

```rust
fn main() {
    //println!("{X.f} {X.f}"); //~ Format string to desugar.
    println!("-- 1:");
    println!("{} {}", X.f, X.f); //~ Desugaring 1.
    println!("-- 2:");
    { let x = X; println!("{} {}", x.f, x.f) }; //~ Desugaring 2.
    println!("-- 3:");
    println!("{} {0:}", X.f); //~ Desugaring 3.
}
```

These produce outputs: […]

I find the output of desugaring 1 satisfying, given our intended opsem expectations for consts. I find the output of desugarings 2 and 3 various degrees of unsatisfying. In particular, the idea that with […]

For my part, if we were not to clean this up, then I'd become more skeptical of whether we'd want to expand our interpolation syntax at all. Is it worth it? I really can't imagine there being meaningful breakage here. Regarding the performance, I'm hopeful that as-if optimizations can cover the common cases, but maybe there would be some cost. Maybe in some cases people should indeed prefer to write […]
|
Thinking on this. I think that looking at particular examples isn't that fruitful for me. I get a lot more value out of thinking about the desugaring, making sure it's sensible, that it does the right thing in the common cases, and letting the semantics of edge cases fall out from the desugaring. I think this aligns well with the way people learn: first learn the easy stuff, then learn the details, then let those details guide their intuitions (the Terence Tao "more to math than rigor" process).

So, let's assume that backwards compatibility is not an issue here for the moment. Then I think we are arguing about the desugarings. TC is proposing that a nice desugaring is going to be: […] There are other options you could take, e.g., it behaves differently in the particular case of a single variable, or we create one argument based on the literal bytes of the expression, or more complex things. I do also think that when we look forward to […]

To me, there are several conflicting design axioms here. One of them is "Rust is straightforward" or something like that, but another is "idiomatic nice Rust does the right thing modulo a specific set of compiler optimizations" (and should we include this particular thing in that case).

I am curious: do we have any data about how often it occurs in practice that the same variable is repeated? (I don't recall.) Thinking about it, I can certainly see the argument that "why should it work differently than repeating variable references in any other context? If that's an efficiency problem, shouldn't we address it holistically?"
|
The previous few posts make me ponder this: Is there an existing complexity we can re-use for this? Like what if we imagine that

```rust
f"{a.b.foo} {a.b.bar}"
```

desugared to something more like

```rust
tuple_closure_format!("{} {}", || (&a.b.foo, &a.b.bar))
```

Then we could say "well, obviously that follows the same capture rules as closures", using the less-straightforward-but-more-useful capturing rules we've already defined for closures. (And by not needing to do it on tokens we could have much smarter rules -- maybe even things smart about […])

For clarity, this is currently a thought exercise, not a concrete "we should change it to work that way" proposal. There might well be reasons this doesn't work at all; I haven't thought through it in detail.
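`tuple_closure_format!` is only a thought experiment, but the capture rules it would inherit can be sketched with a plain function that takes the argument tuple as a closure (all names here are invented for illustration):

```rust
struct B { foo: u32, bar: u32 }
struct A { b: B }

// Hypothetical stand-in for what a `tuple_closure_format!`-style expansion
// might bottom out in: a function that receives the arguments as a closure.
fn fmt_pair<F: FnOnce() -> (String, String)>(args: F) -> String {
    let (x, y) = args();
    format!("{} {}", x, y)
}

fn main() {
    let a = A { b: B { foo: 1, bar: 2 } };
    // Under ordinary closure capture rules (precise, per-field capture in
    // edition 2021+), this closure borrows `a.b.foo` and `a.b.bar`,
    // not all of `a`.
    let s = fmt_pair(|| (a.b.foo.to_string(), a.b.bar.to_string()));
    assert_eq!(s, "1 2");
}
```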
|
OK, I've given this some more thought -- but before I write anything, I want to confirm something: the optimization we've been discussing -- is the point here to remove the "extra fields" from the format args struct? Can someone link me to the PR or description? That sounds like a pretty tailored and complex optimization; is it intended to be special-purpose for the format-args struct, or something that would apply more generally?
|
The PR with the optimization is #152480.
|
@RalfJung that's a big PR, is there a shorter description of what it does and how?
|
@nikomatsakis the relevant part is the changes to AST→HIR lowering, https://github.com/rust-lang/rust/pull/152480/changes#diff-8b601779832d29e42b24bb88ccc3a5762217d72f148e8070767ee787b271190c. the way it's implemented there is as a transformation on the AST representation of […]
|
Thanks Dianne. Let me see if I can dash off a comment explaining where I stand here. The TL;DR is that my proposal would be: […]

This does cost some consistency, but I think in areas where consistency isn't necessarily expected. For example, it's already the case that […] The alternative gains in consistency but loses some optimization and backwards compatibility. It's not clear to me how much the optimization matters, but I think that it may, particularly since we use format-args all over the place in code. I know that people often find the machinery is heavyweight in embedded land.

I don't think the vast majority of users will care whichever way we decide here, but there will be some that are surprised by const-drops running or not running, and some who are surprised that their structs are bigger than they should be. I tend to think the former will only matter to Rust supermavens, and they can learn the way that the desugaring works; it's straightforward enough. The latter is an invisible tax across Rust codebases that may impact every user.

Expanded version: I think there's definite tension between several good Rust design principles:

- Efficient by default -- the idiomatic, obvious Rust code should generate efficient things, ~the same as what you would get if you did it by hand, or perhaps more efficient. To that end, if you desugared […]
- No need for a 'sufficiently smart compiler' -- we should not be leaning on super whiz-bang optimizations to get that efficient by default, just the "obvious" ones that compilers typically do, such as inlining, copy prop, CSE. The kind of thing you would do by hand automatically. (We kinda cheat on this one, sometimes, leaning on fancy alias analysis; I think that the work on minirust etc. may let us out of that trap.)
- Stability without stagnation -- we should try to avoid changing behavior without a strong reason.
- Compiler and stdlib aren't special -- we try to expose primitives users could build themselves (or at least have a plan that they can eventually do so...).
- Context-free programming -- this is a tricky one, but obviously we aim to reduce the context needed for people to understand what some code will do when it executes. Probably need to either expand or refine this to be more specific.
- This one is somewhat aspirational, but: Define through desugaring -- there should be a convenient syntax and an explicit syntax; the convenient one should desugar to the explicit one in a straightforward way. That is then used to resolve non-obvious edge cases around the convenient syntax.

Looking forward, I think we want […]
Reading over the proposed optimization, it seems to violate "compiler isn't special", in that I don't think we would ever expect to expose that kind of test to a user-defined macro. That's not the end of the world, but it seems unfortunate. Doing no optimization violates efficient by default -- this may not matter, it's only a small thing, on its own I might say "whatever", but it also changes behavior. My inclination is to try and preserve wins when we can. I think my proposal wins on stability and perf by default; I think it is neutral towards "define through desugaring" -- it's a bit more of a complex desugaring, but not wildly so, and it increases consistency with some other things (e.g., […]). The optimization loses big on "stdlib isn't special" and I think that's kinda worse.
|
FWIW, Niko's "compiler and stdlib aren't special" argument has mostly won me over. If […] However, this is in tension with the desire to support […]
|
I've given this some thought; I've also talked to @traviscross and @joshtriplett. I'm finding that I have a hard time convincing myself one way or the other on this! I suspect that BOTH of these are, to a first approximation, true: […]

If you could convince me that one of those was not true -- that somebody would notice -- that'd push me one way or the other more firmly. But I'd need some data. I'm going to try and see how often repeated variables occur in practice. Assuming my assumptions are valid -- that neither is all that big a deal -- then you have two competing, but largely abstract, principles: […]

I think both are important. I go back and forth on which I think is more important. When it comes to the "fancy compiler optimization", yeah, it kind of lets you have both, but I find it overengineered for the problem, and it expands our "scope" of what it takes to achieve efficiency. If efficiency matters that much, I might rather do it the simple way of saying "we deduplicate at the string level". It is, in a way, less surprising to me.
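"Deduplicate at the string level" would mean two textually identical placeholders share one argument slot, which is already spellable by hand today with positional indices. A minimal illustration:

```rust
fn main() {
    let x = 42;
    // Two placeholders, one argument: positional indices let a format
    // string reference the same argument slot any number of times.
    let s = format!("{0} + {0} = {1}", x, x + x);
    assert_eq!(s, "42 + 42 = 84");
}
```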
FWIW, I sympathize with that. I have also gone through phases of preferring either approach to this.^^ Though IMO the compiler optimizations actually provide a nice way out of this, I find them an elegant solution (have our semantic cake and eat the perf benefits, too) -- except that @m-ou-se doesn't like them, which gives me pause.
|
So I wrote a little script (gist) to find all string literals, count the number with ANY interpolation variables (well, braces anyhow) and then count the number with repeats. I ran it across the rust repo and got 2.5%, though that number includes tests. I'd like to run it across crates.io. Do with that what you will.
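The gist itself isn't inlined above, but a hedged sketch of the per-literal check such a script might perform could look like this (`repeated_interpolations` is an invented name, and this deliberately ignores positional and empty placeholders, as the description suggests the heuristic is about named variables):

```rust
use std::collections::HashMap;

/// Given the contents of one string literal, report whether any `{ident}`
/// interpolation appears more than once.
fn repeated_interpolations(lit: &str) -> bool {
    let mut counts: HashMap<&str, u32> = HashMap::new();
    let mut chars = lit.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if c != '{' {
            continue;
        }
        // `{{` is an escaped brace, not an interpolation.
        if let Some(&(_, '{')) = chars.peek() {
            chars.next();
            continue;
        }
        if let Some(end) = lit[i + 1..].find('}') {
            // Strip any format spec, so `{x:?}` counts as `x`.
            let name = lit[i + 1..i + 1 + end].split(':').next().unwrap_or("");
            // Count only plain identifiers, skipping `{}` and positional `{0}`.
            let is_ident = !name.is_empty()
                && name.chars().all(|c| c.is_alphanumeric() || c == '_')
                && !name.starts_with(|c: char| c.is_ascii_digit());
            if is_ident {
                *counts.entry(name).or_insert(0) += 1;
            }
        }
    }
    counts.values().any(|&n| n > 1)
}

fn main() {
    assert!(repeated_interpolations("{x} {x}"));
    assert!(repeated_interpolations("{x:?} vs {x}"));
    assert!(!repeated_interpolations("{x} {y}"));
    assert!(!repeated_interpolations("{0} {0}")); // positional, not a capture
    assert!(!repeated_interpolations("{{x}} {{x}}")); // escaped braces
}
```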
|
I ran the script across the top 222 crates from crates.io. I found that about 5% of strings have repeated variables... ...that's actually more than I expected! It pushes me to think I am right to hold this concern.
|
One more piece of data: I found exactly ZERO instances of repeated "capital" identifiers, e.g., […]
|
If we assume that 5% is common enough that we DO want to avoid an extra field for local variables, then I think it's reasonable to assume we also want to avoid an extra field for fields. I don't see why […] Deref impls can, technically, have side-effects. So while […] This further pushes me to the conclusion that the most appealing options are […]
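The Deref point is easy to demonstrate concretely. A small sketch (all names invented here) showing that each spelled-out field access through a `Deref` impl re-runs the deref, side effects and all:

```rust
use std::{cell::Cell, ops::Deref};

struct Inner { field: &'static str }
struct Wrapper { inner: Inner, derefs: Cell<u32> }

impl Deref for Wrapper {
    type Target = Inner;
    fn deref(&self) -> &Inner {
        // The side effect: count how many times the deref actually runs.
        self.derefs.set(self.derefs.get() + 1);
        &self.inner
    }
}

fn main() {
    let w = Wrapper { inner: Inner { field: "hi" }, derefs: Cell::new(0) };
    // Spelling the accesses out positionally evaluates `w.field` (and thus
    // the deref) once per use:
    let s = format!("{} {}", w.field, w.field);
    assert_eq!(s, "hi hi");
    assert_eq!(w.derefs.get(), 2);
}
```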
I think a key variable may be how much you think it matters whether […]
I do. Because although I would very likely write

```rust
format!("{x} {x}")
```

then also if I ever stumble across this code:

```rust
format!("{self.field} {self.field}")
```

I would very likely, and spontaneously, deduplicate it myself into the following to improve readability:

```rust
format!("{x} {x}", x = self.field)
```

Which I think is, if I'm not alone, an argument in favour of syntactic deduplication.
|
@nikomatsakis: I don't know. Presumably we still wouldn't want to deduplicate value expressions. I just struggle with the idea that […]

Let's also dig into the axiom that the compiler and stdlib aren't special. In a world where functions return places, the lowering would need type information to deduplicate only place expressions. That seems more complicated to me than @dianne's AST → HIR lowering optimization. But if the desugaring doesn't deduplicate place expressions, then this isn't a problem.
|
We already can't really syntactically distinguish place expressions from value expressions -- we need name resolution at least, which arguably is a semantic analysis (user-defined macros cannot do it). |
I'm working on a somewhat more precise analysis. I've instrumented rustc to capture details about format strings. I'll be running this through crater. It's built on top of @dianne's branch (in #152480) so that we can see the effect of that optimization and how much it matters whether we do it at all.

So far, I've run this on a stage 2 build of the compiler and standard library (including all dependencies). In this sample, of format strings that use interpolation at all, only 0.5% have any duplicates. Of these, almost all are places, and @dianne's optimization recovers 96.1% of the size cost (i.e., all but 48 bytes total). The cost of not doing deduplication at all (i.e., not doing dianne's optimization) on the […] See below for the full results.
|
Based on an instrumented top-10k crater run (in #154205, which includes 10,885 crates), here's what I found. Excluding 22 outlier crates, except where mentioned:

Only 1.8% of crates and 0.1% of […]

The median per-crate cost of not deduplicating at all is zero bytes. Considering only the 194 affected crates, the median cost is 32 bytes (mean: 46 bytes). The total cost across all these crates, summed together, is under 9KB. Including the 22 outliers, the total cost (summed across all 216 affected crates) is under 28KB. With dianne's optimization, only 27 non-outlier and 37 total crates are affected, with a total cost (summed across all affected crates) of under 1.4KB and 7.4KB, respectively.

The outlier crate most affected by turning off deduplication is hddsgen at 7KB. Every single duplicate for this crate, though, is a place, so dianne's optimization would drop this cost to zero. This crate was first published 27 days ago. The next-largest outlier crate is pikpaktui, first published 6 weeks ago, at 2.8KB. This one doesn't benefit at all from the optimization.

I don't mean any judgment by saying this, but these two and many earlier outlier crates I looked at seem heavily AI-generated. There's something about the way the models write format strings (and the number of them they write), at least in these outlier cases, that seems different to me than what humans do.

Anyway, the full report is below. For my part, I judge the practical cost of not deduplicating as ε-zero.
For awareness, there is a crater issue regarding top-{n} not actually testing the top-n crates: […]
|
*checks* Yup, it doesn't even compile majorly-used crates like […]
|
OK. I'll schedule a full crater run then. |
Resolves #145739
I ran crater with #149291.
While there are still a few seemingly flaky, spurious results, no crates appear to be affected by this breaking change.
The only hit from the lint was
https://github.com/multiversx/mx-sdk-rs/blob/813927c03a7b512a3c6ef9a15690eaf87872cc5c/framework/meta-lib/src/tools/rustc_version_warning.rs#L19-L30,
which performs formatting on consts of type
::semver::Version. These constants contain a nested::semver::Identifier(Version.pre.identifier) that has a custom destructor. However, this case is not impacted by the change, so no breakage is expected.