Describe the enhancement requested
Google searches for docs take users to older versions: https://bsky.app/profile/tanho.ca/post/3miwyhp63q22w
AI suggested the recommendations below, but I think we might be able to add custom HTML instead, or make a PR to pkgdown if that doesn't work.
We should also make a PR to the arrow site with the fixes for the previously rendered docs.
🤖 analysis below
Three things combine:
- No canonical tags on R docs — The Python/C++ docs (built with Sphinx) have
`<link rel="canonical">` tags on every page, including old versions, pointing to the current URL. The R docs (built with pkgdown)
have none. So Google sees 24 copies of the same content and has to guess which is authoritative.
- URL changed between v12 and v13 — The schema page was Schema.html (capital S) in v12, but became
schema.html (lowercase) from v13 onward. Google indexed the old URL, it still works on v12, and the
current docs return 404 for the capitalized version. Google has no reason to switch.
- All 24 old versions are fully crawlable — No noindex, no canonical tags, no robots.txt restrictions.
The old versions collectively have more inbound links from years of Stack Overflow answers and blog
posts.
What could fix it
The most impactful approach would be:
- Add canonical tags pointing to the current (unversioned) URL on all pages
- Add noindex to old versioned docs so Google stops surfacing them
pkgdown doesn't natively support canonical tags, so this would likely need a post-build script that
injects them into the HTML after pkgdown generates the docs. The Python/C++ docs already solve this via
Sphinx's built-in canonical URL support.
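
As a rough illustration of the post-build script idea, here is a minimal sketch in Python. It is not an existing pkgdown or Arrow tool. The base URL, the function names (`inject_head_tags`, `process_site`), and the `versioned` flag are all assumptions made for the example, and it assumes each page's path relative to the docs root matches the path under the unversioned docs.

```python
import re
from pathlib import Path

# Assumption: the current (unversioned) R docs are served from this base URL.
CANONICAL_BASE = "https://arrow.apache.org/docs/r/"

HEAD_CLOSE = re.compile(r"</head>", re.IGNORECASE)


def inject_head_tags(html: str, rel_path: str, versioned: bool) -> str:
    """Return html with a canonical link (plus noindex for old versions) before </head>."""
    if 'rel="canonical"' in html:
        return html  # already tagged; keeps the script safe to re-run
    tags = [f'<link rel="canonical" href="{CANONICAL_BASE}{rel_path}">']
    if versioned:
        # Hypothetical policy from the list above: hide old versioned docs from search.
        tags.append('<meta name="robots" content="noindex">')
    block = "\n".join(tags) + "\n"
    # Insert before the first </head>; a page without one is returned unchanged.
    return HEAD_CLOSE.sub(lambda m: block + m.group(0), html, count=1)


def process_site(root: Path, versioned: bool) -> int:
    """Rewrite every .html file under root in place; return how many files changed."""
    changed = 0
    for page in root.rglob("*.html"):
        original = page.read_text(encoding="utf-8")
        rel_path = page.relative_to(root).as_posix()
        updated = inject_head_tags(original, rel_path, versioned)
        if updated != original:
            page.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Running `process_site(Path("docs"), versioned=False)` on the freshly built site and `versioned=True` on each archived version directory would cover both cases. Pages whose path changed between versions (such as Schema.html vs schema.html) would still need an explicit mapping, since this sketch reuses the old relative path as-is.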
Component(s)
Documentation