admin: Draft policy on use of AI coding assistants #5072
lgritz wants to merge 2 commits into AcademySoftwareFoundation:main from
Conversation
Signed-off-by: Larry Gritz <lg@larrygritz.com>
docs/dev/AI_Policy.md
> **AI tools may not be used to fix GitHub issues labeled as "good first issue"**, and are strongly discouraged for "Dev Days" work. Cultivating and educating new contributors is part of our job, and as such, we do not want people to swoop in and use tools to trivially solve these tasks that were curated specifically for somebody to actually learn from.
Just as some students will secretly use AI to do their assignments, I suspect the same will happen here whether we allow it or not.
What we could do, however, is only allow a contributor to pick one item from the "good first issue". After that they would need approval from the maintainers if they want to keep submitting from this list. This would prevent someone grabbing the entire list. If they use AI for their one submission, hopefully they still got to learn something about OIIO...
Unlike a school assignment, there is literally nothing to be gained by breaking the rules here. I think we just need to let people know our expectations, and not worry about or have any overhead of allowing/enforcing this particular point. The same situation would occur if a senior developer just spent the day (without AI) doing 10 different GFIs and left none for beginners. We'd tell them to knock it off, but we wouldn't worry about putting any kind of approval system in place.
I think some people might see plenty of gain in this. I believe some might see getting a bunch of PRs submitted and accepted as helping their prospects of getting hired. Others might see it as a gamification where their goal is to get as many PRs completed across projects so that their "score" goes up. Whether they learned anything or whether they truly care about the project might be irrelevant to them.
I'm fine with starting with simple requests, like in the proposal, and only adding in enforcement if this ever starts to become a problem.
I see your point, @ThiagoIze. I guess we've already seen something like that in people who breeze through with a fuzz-induced crash who want to round it up to a major security issue to get the credit for finding it. I assume they are receiving some kind of accolades elsewhere?
There is nothing about this that lasts forever. If we're seeing a problem in practice, we can adjust and find some way to prevent it.
I think that orthogonal to AI tools, we can certainly say in our DD explanations that people can do more than one PR if they want, but only one should be GFI.
I highly recommend designing project-wide SKILL.md and AGENTS.md files. For example, llama.cpp has a prebuilt agents file that already defines some rulesets. From our discussions and my personal experience, the most important is SKILL.md, where you define the coding style, allowed language subsets, level of abstraction, and optimizations (like no inline lambdas or no exceptions on hot paths). I can share my cpp-hpc skill, which could be adapted for OIIO style (e.g., loosening the rule against lambdas 😅).
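To make this concrete, here is a minimal sketch of what such a SKILL.md fragment might look like; every rule below is a hypothetical illustration, not actual OIIO policy:

```
# SKILL: cpp-hpc (hypothetical example fragment)

## Coding style
- Follow the formatting of the surrounding file; never reformat untouched code.

## Language subset
- C++17 only; no compiler-specific extensions.

## Hot paths
- No exceptions and no heap allocation inside per-pixel loops.
- Avoid inline lambdas in performance-critical inner loops.

## Testing
- Every new feature must come with a unit test; run the full test suite
  before proposing a change.
```

A tool that reads this file up front applies the same constraints to every generated change, rather than relying on per-prompt reminders.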
Another critical rule is fuzzing and tests. For example, both OpenMeta implementations (C++ and Rust) are covered not only by corpus tests but also by fuzz tests, and the LLM runs these tests on every new feature it adds.
I would also recommend requiring cross-review using a different LLM. For example, code from Claude Opus/Sonnet must be reviewed by GPT5.4/Gemini 3.1 or the latest Kimi, GLM, or Qwen. Part of the PR could be code-review.md files from LLMs that were not used for the coding. We could also define an LLM_REVIEW_RULES.md with the set of questions an LLM should answer when reviewing code.
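As a sketch, such an LLM_REVIEW_RULES.md question set might look like the following; the filename convention and questions are hypothetical examples, not an agreed standard:

```
# LLM_REVIEW_RULES.md (hypothetical example)

For each changed function, the reviewing model must answer:
1. Does the change alter any public API or ABI?
2. Are the new code paths covered by existing or new tests?
3. Could any input trigger undefined behavior (overflow, out-of-bounds
   access, uninitialized reads)?
4. Does the code match the idioms and style of the surrounding file?
5. Does any added code closely resemble well-known external code?

Record the answers in a code-review file attached to the PR, noting
which model produced the review.
```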
Regarding "prompt" sharing: that idea probably comes from a lack of understanding of what an LLM-assisted coding workflow actually looks like.
docs/dev/AI_Policy.md
> to take reasonable care that code is not copied from a source with an incompatible license.

> We believe that the reputable coding assistant tools automatically exclude
I can read this as (1) wishful thinking ;) or (2) an ironic statement that there are no reputable coding assistant tools. I don't believe anything in this paragraph can be demonstrated, or is actually implemented. I can find no evidence in any source-available assistants, and the secondary implication that such avoidance may be in the training or fine-tuning is, again, not demonstrable or documented. Better just to omit, I think.
Am I wrong in reading the claims from Claude, Copilot, and maybe others as implying that they have some internal safeguards that prevent them from giving you answers that contain wholesale copying?
But I can omit that if you think it's best.
I think we just don't know, and I certainly don't trust the documentation from those tools being honest.
meshula left a comment
Just a comment that we should probably not imply that coding tools are more capable than they are; there is no evidence, for example, that GPL-based code is absent from the training corpus of any closed-provenance LLM.
I highly recommend watching the 3blue1brown video series about neural networks and how they work: https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&si=xKDbI8nsk8rrmpjT
I'm pretty sure I said that if you pre-write a spec, you should share the spec, but if there was an extended conversation, a short summary was sufficient. Like, the following would be fine as far as I'm concerned: "I used the Coffee 5.2 model to design a rewrite of class X to change from a hash map to a sorted list with binary search, there were several iterations about the specifics, and when I was satisfied, I had it perform the analogous transformation of classes Y and Z in a totally automated fashion with no further intervention needed." But if you happened to write several specific paragraphs of instructions as a full spec and it proceeded from that in a nearly automated way, and you think it would be at all helpful for people to see exactly what you did, then you are encouraged to just paste the spec into the commit (and that's probably less work than writing a summary paragraph).
We can recommend trying multiple tools to check each other as a sound practice if available, but I don't see how we can require it. It's not reasonable to assume people have access to multiple paid services as a prerequisite to using any of them. The review that counts is the human one; the AI reviews are merely advisory anyway.
Yes, I'm hoping that as people gain experience, we'll end up populating the repo with various skills and other useful things that can be shared by the developers using the popular tools.
Yes, this is a recommendation from practice. An LLM, especially in a continuous session, can miss critical points just as a human can.
Oh, I hope I didn't make it sound like that. I'm sure they're all trained on everything. But I am under the impression that the big paid tools like Claude Code and Copilot have some provisions built into the tooling that prevent the answers they give you from containing substantial portions exactly copied from anything. But even aside from that, in practice, my experience is that when operating inside an existing code base, these things are so good at making code that conforms to the style, practice, and idioms of the surrounding code and meshes into it well, that my intuition is that it's not a direct copy of anything elsewhere. If I saw something that looked like it didn't belong, I'd be suspicious and probably reject it or rewrite it myself. It's hard to codify that intuition, but my feeling and experience are that I can kinda tell when something it suggests is almost certainly on the safe side of the line, versus when it is suspicious. I have much less intuition, and much less trust, about asking it to create a codebase from scratch -- then I can't help but wonder where it may have been copied from. But of course, these rules are for this project, which is a large established code base into which wholesale copying from another code base is likely to look foreign. We should also take a step back and remember where we are today, with no AI: We actually have no safeguards in place that prevent somebody from (purposely or inadvertently) not abiding by the DCO/CLA, and having their "human-written" PR actually contain code that is too similar to code elsewhere that they have seen, referred to while writing, or directly copied. We have always lived with a certain amount of that risk. It's not at all clear to me that using code assistants increases that risk.
I think we all know how they work. The concern about GPL is that sometimes, transformers really can spit out text that's so similar to an extended passage from the corpus that it's essentially a copy, even though it was a probabilistic process that got there. It's only the end product that's important in some sense -- if we end up with a large section "copied" (or very nearly so) from a GPL project, saying "but the AI doesn't work like that" is not going to help; we'll need to fix it. Fortunately, in an established code base, I tend to see the results match the surrounding code base in style and idiom so well that it seems highly improbable that a protectable amount of the code matches some other codebase in a close to verbatim way. I'm not super worried about it in practice in this project, as perhaps I would be in a new project starting from scratch that had no surrounding code to conform to.
As an overall comment, I think it's really important to state a few assumptions: The genie is out of the bottle -- we know that people are going to use these tools for code in their PRs. They already are. Most of the senior developers I know actually want to use these tools. And for the most part, I trust them to have good judgment about when and how to use them.

If we don't have any guidelines, people will use them any which way, including ways we wish they wouldn't or that are detrimental to the project. But if we try to ban the tools or make the rules too burdensome, people will just lie to evade the rules. It will be hard to "catch" them, except in the most obvious and inept cases. So all we're really trying to do here is find a balance of how to communicate our values and expectations in a way that uses the lightest touch necessary to prevent most of the unwanted behaviors.

So we're boiling it down to just the basics: Use coding tools if you find them helpful, but you're still on the hook for fully understanding and standing behind what you submit. Interact with the project and community yourself, not by agent. Disclose what tools you used and how (to a non-burdensome level of detail). Don't waste maintainer time with low quality PRs or interfere with other project goals like saving the curated introductory issues for actual beginners.

The rest is just fleshing these points out with a little more explanation and rationale. We don't want to get TOO prescriptive, though we might have more specific recommendations over time as we gain experience with how the tools pan out in this code base.
@lgritz how about creating an LLM-friendly folder with Codex/Claude subfolders, to use for tailored SKILL and AGENTS files? (Sadly there is no single standard there.)
Yes, that's precisely my intent.
It's worth highlighting up front that the project maintainers will not even LOOK at a PR until the CLA is signed, and the CLA must be signed by an actual human or corporation. Maybe this is in place already, but the policy should be to auto-reject PRs without a signed CLA. In legit cases where the submitter signs after submission (which happens frequently), maintainers can always re-open the PR. Admittedly I do occasionally help a submitter through fixing a PR and getting the CLA signed at the same time, when they appear legit and worth helping.
It's not frequently, it's always for new contributors. When, as CLA manager, I add an employee to the CLA list in EasyCLA, that does not complete the process. They still have to submit the PR, get the "you don't have a CLA on record" comment appended, and then get diverted to a secondary process where they have to "accept" being included in the company's CLA. And God forbid they do the submission from the wrong account or didn't have every commit signed using the very same email associated with the GH userid that the CLA system knows about. Given the current state of how the system works, auto-rejecting the entire PR of literally every first-time contributor (and every regular who makes any of several simple mistakes) feels like it will be perceived as intentionally hostile to developers. I think the auto-close suggestion could make sense if EasyCLA were completely overhauled -- by which I mean, there was a simple and foolproof way to do all the paperwork ahead of the PR submission and be 100% sure that the PR would go through without a hitch, and a totally reliable way to see while preparing your submission (but before you hit the final "submit" button) whether you are all clear from the CLA perspective. And anyway, it would only protect against a true drive-by by an agent unknown to us. It wouldn't stop somebody who had already signed the CLA once but then turns on agents to do things using their GH credentials.
@meshula and others, here's one of the links supporting this impression, for Claude: And for GitHub Copilot:
Personally I wish we could just ban AI contributions outright, but I know I'm in the minority in this view.
I hear you, brother. I wish this tech had never come along. But it's here and I don't think it's possible to keep people from using it. The best we can do is try to shape the norms of how it's used. Also, the more time goes on, the more developers I like and respect really want to use these tools, and I don't want to be on the side of not trusting them to choose their own tools and use them responsibly. And now I'm becoming one of them; I don't know exactly what role it will have in my workflow a year from now, but I know I don't want to be prevented from figuring that out. But it's on us -- through norms, rigorous code review, etc. -- to ensure that what people are doing doesn't negatively affect the quality of these projects or the way we interact with each other.
I tried to make an AGENTS.md file with Codex 5.4 ExtraHigh.
Claude needs its own CLAUDE.md; it can reference AGENTS.md, but it's probably worth making a tailored file. Similarly for Gemini. It's also worth designing (again, per vendor) SKILL files with more technical specs on coding patterns. By the way, I needed to build llama.cpp on a Jetson and used Codex CLI for this, and one of its first questions was about contribution, with a warning that fully AI-based commits are prohibited by their AGENTS.md.
I'm still a beginner at this, so this is a serious question: Your AGENTS.md seems pretty generic, not model- or tool-specific, and seems like it would be totally adequate as a CLAUDE.md. So what would you imagine needs more customization per-tool, rather than having the tools just refer to it and all share the same general instructions?
* Brief TLDR summary of our principles at the beginning.
* A little more permissive language with what's expected for disclosures, but added "Assisted-by:" sign-off suggestion as the minimum.
* At the risk of being more wordy, revised the IP/DCO section to make suggestions about what factors might make people more confident that it's not copied (detailed spec or revision by human, extending existing code vs new code, seeming to tightly fit existing idioms, use of tools that make statements about copying guardrails).
* Clarify that these rules don't apply to using LLMs to explain or learn about the code base, nor to asking them to review your code prior to submission (assuming you don't ask them to do fixes for you).

Signed-off-by: Larry Gritz <lg@larrygritz.com>
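As a sketch of the minimal disclosure, a commit message carrying the suggested "Assisted-by:" trailer alongside the usual DCO sign-off might look like this; the tool name, author, and commit subject are placeholders, and the trailer format is assumed to mirror the Signed-off-by: convention:

```
Fix off-by-one in tile boundary computation

Initial patch drafted with an AI coding assistant; reviewed, tested,
and revised by hand before submission.

Assisted-by: SomeCodingAssistant vX.Y
Signed-off-by: A. Developer <dev@example.com>
```

Keeping the disclosure as a trailer means it is machine-greppable later without adding any real burden to the contributor.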
@lgritz looks like AGENTS.md can serve as a well-supported generic LLM readme.
I think the policy is entirely reasonable, and I don't really have concrete suggestions to improve it. From experience it's very helpful to have a policy to point to. But it still leaves tricky situations, and probably there is not much to be done other than deal with them case by case.
And some other thoughts about AI contributions:
Thanks for the thoughtful response, @brechtvl. I agree with all of that. Thankfully, I think on most of the ASWF projects, we have small developer communities and can rely a lot on trust of the developers, and trust that the maintainers can make mostly-correct judgments. It may be that none of these things are real problems in practice for us; we shall see. I don't envy the position you're in with Blender, which operates at an entirely different scale and probably will be presented with thorny dilemmas on a daily basis.
Just to close the loop: I believe that there is a general intent, but in practice see https://arxiv.org/abs/2505.12546 and all the "Claude etc. output Harry Potter books" stories. So my vector here is just to omit any expectation that such mechanisms might be operational :)
Draft for comments and discussion!
Have I forgotten something important? Am I over-emphasizing something irrelevant?
Nothing is set in stone. Feedback requested.