HF Upload guide and storage location updates from Collab Guide #24

Merged
egrace479 merged 5 commits into main from src-upstream
Apr 30, 2026
egrace479 commented Apr 29, 2026

hlapp and others added 3 commits April 28, 2026 21:09
Initially we singled out Google Drive, but the problem is not specific to Google Drive. Any cloud storage or drive tied to an institutional user is unacceptable for the reasons already stated: institutional users can and do go away, resulting in impermanence by design.
Pull from Collab Guide [PR 65](Imageomics/Collaborative-distributed-science-guide#65)

* Initial re-organization of page, includes integrity checks and more links to docs
WIP refining

* Add in links, fix typos

* Additional notes in sections, refs out

* Add updated screenshot with arrow
match UI change to 'contribute'

* Remove extra outdated content, update URLs for new versions, run linting

* Clarify UI upload cases

Co-authored-by: Hilmar Lapp <hlapp@drycafe.net>

* Clarify CLI upload use case

Co-authored-by: Hilmar Lapp <hlapp@drycafe.net>

* Add pointer for considerations with large folder uploads

---------

Co-authored-by: Hilmar Lapp <hlapp@drycafe.net>
egrace479 requested a review from NetZissou April 29, 2026 01:14
egrace479 added the src-upstream label (Update coming from the upstream repo) Apr 29, 2026
Comment thread docs/wiki-guide/The-Hugging-Face-Dataset-Upload-Guide.md Outdated
Comment thread docs/wiki-guide/The-Hugging-Face-Dataset-Upload-Guide.md Outdated
Comment thread docs/wiki-guide/The-Hugging-Face-Dataset-Upload-Guide.md Outdated
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Pull from Collab Guide [PR 69](Imageomics/Collaborative-distributed-science-guide#69)

* Restructure citation template section for improved clarity

set up a standard vs extended citation
also adds references as a key in Zenodo metadata template, with citation pointer recommending use there

* Add support for content tabs

used for citation CFF templates

* Use conference-paper as default for preferred-citation

include examples and explanations for keys

* fix formatting for citation file examples

* Shorten note before template citation files

Move preferred-citation notes to tab where it's included

* Replace note about yaml validator tool with note to check format on branch

validator tool checks yaml, not citation cff format

* Create a subsubsection for citation templates, reduce admonition use to avoid clutter

* Fix formatting of ORCID field

* Clarify use of ORCID number only for zenodo JSON, different from CFF field

* Add comment about when to update commit hash
egrace479 marked this pull request as ready for review April 29, 2026 17:45
egrace479 commented Apr 29, 2026

@NetZissou, this is now ready; I pulled the latest update. Note that it did require conflict resolution to reset to "ABC-Center", so please review.

NetZissou left a comment

LGTM!

  * For already published data usage, see the [Metadata Checklist](Metadata-Checklist.md).
  * **ML Models:** Hugging Face Model Repository ([Model checklist](Model-Checklist.md)).
- * Though alternative storage options may be discussed, **Google Drive is not an acceptable storage location for research data, models, or code**. Folder activity does not include actual file additions or deletions, so content can be changed or removed without a record of when or by whom. All research, data, models, and code must be stored in **a version controlled repository, preferably in more than one location** to ensure preservation and full provenance tracking.
+ * Though alternative storage options may be discussed, **Google Drive, OneDrive, and other institutional user-tied locations are not an acceptable storage location for research data, models, or code**. Folder activity does not include actual file additions or deletions, so content can be changed or removed without a record of when or by whom. All research, data, models, and code must be stored in **a version controlled repository, preferably in more than one location** to ensure preservation and full provenance tracking.

💯

egrace479 merged commit 8cac3f1 into main Apr 30, 2026
1 check passed
egrace479 deleted the src-upstream branch April 30, 2026 15:31
Labels

src-upstream Update coming from the upstream repo

3 participants