Conversation
📝 WalkthroughWalkthroughThis PR replaces the whitelist concept with a generalized data-access model: Changes
Sequence Diagram(s)sequenceDiagram
participant Frontend as Frontend (client)
participant API as Backend API (misc.views)
participant Utils as Business Logic (misc.utils)
participant DB as Database (UserDataAccess table)
Frontend->>API: GET /get-data-access-status?post_id&project_id
API->>Utils: get_data_access_status(user, post_id, project_id)
Utils->>DB: query UserDataAccess (filters: user, project/post/null, view_user_data, api_access_tier)
DB-->>Utils: matching entries (exists/values)
Utils-->>API: (has_data_access, view_deanonymized_data)
API-->>Frontend: JSON { has_data_access, view_deanonymized_data }
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can disable poems in the walkthrough.Disable the |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comments/services/feed.py`:
- Around line 73-79: The current checks treat author_is_staff as truthy so False
is treated like "not provided"; update the conditional logic to detect presence
and explicit True/False values: use "author_is_staff is not None" to detect a
provided boolean and "author_is_staff is True" / "author_is_staff is False" for
behavior decisions. Concretely, change the branch conditions around author and
author_is_staff (the if that currently reads "if author is not None and
author_is_staff", the "elif author_is_staff", and related qs.filter calls) to
explicitly check for is not None and compare to True/False, and implement the
corresponding filters (author_id, author__is_staff=True, author__is_staff=False,
and parent=None where needed).
In `@users/serializers.py`:
- Around line 138-142: get_reduced_api_restriction_projects is returning
duplicate project IDs and loads full WhitelistUser objects; change the query on
user.whitelists to select only project_id and deduplicate in the DB by using
values_list('project_id', flat=True).distinct() combined with the existing
project_id__isnull=False filter so the method returns a lean, unique list of
project IDs without instantiating full model instances.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 912be90c-bc22-4669-8060-485dd631298d
📒 Files selected for processing (8)
comments/serializers/common.pycomments/services/feed.pymisc/migrations/0008_whitelistuser_view_forecaster_data.pymisc/models.pymisc/utils.pyusers/migrations/0016_user_api_access_tier.pyusers/models.pyusers/serializers.py
🚀 Preview EnvironmentYour preview environment is ready!
Details
ℹ️ Preview Environment InfoIsolation:
Limitations:
Cleanup:
|
addresses main site parts of primary spec of #4466 add bot_benchmarking to api access tiers add author_is_staff optional param to comments endpoint
…/api-access/comments-and-bot_benchmarking
2e0916b to
55a2c5f
Compare
…/api-access/endpoint-updates
…from user data permissions - Rename WhitelistUser model to UserDataAccess across backend, frontend, and migrations - Replace view_forecaster_data field with view_user_data (default False) to explicitly gate user-level data access, separate from API tier grants - Add api_access_tier field to UserDataAccess for project/post-scoped API tier overrides - Extract ApiAccessTier enum to users/constants.py - Rename all related identifiers: whitelists -> data_accesses, is_whitelisted -> has_data_access, get-whitelist-status -> get-data-access-status - Update /users/me endpoint: rename reduced_api_restriction_projects to project_data_access, only return it when ?with_data_access=true is passed - Migration backfills view_user_data=True for all existing rows
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
misc/utils.py (1)
27-45:⚠️ Potential issue | 🟠 MajorCheck all projects attached to the post, not just
default_project.
utils/views.py:156-172now treats anypost.projectsmembership as sufficient forhas_data_access. Here, Lines 29-31 only matchpost.default_project, and Lines 37-45 reuse that same narrowed project for the admin shortcut. Users whose access is granted through a non-default project on the post will get a false negative from this helper.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@misc/utils.py` around lines 27 - 45, The helper narrows checks to post.default_project which misses permissions granted via other projects on a post; update the logic in the has_data_access helper to iterate over post.projects (or use post.projects.all()) instead of using post.default_project: when post_id is set, collect data_access_entries for every project in post.projects and also check ProjectUserPermission.objects.filter(user=user, project__in=post.projects.all(), permission=ObjectPermission.ADMIN).exists() so the admin shortcut and data_access_entries include all projects attached to the post rather than only the default_project.utils/csv_utils.py (1)
214-236:⚠️ Potential issue | 🔴 CriticalTighten score scoping for
only_include_user_idsand anonymous callers.
user_forecastsis re-scoped after Line 146, but the score branch is not. If a non-privileged caller providesonly_include_user_ids, Lines 221-228 will return those users’ score rows. And whenuserisNone, Line 232 collapses toQ(user__isnull=True) | Q(), which matches every score. Reapply the caller restriction after the optional ID filter, and use onlyQ(user__isnull=True)for anonymous exports.Possible tightening
- elif only_include_user_ids: + elif only_include_user_ids: + allowed_user_ids = set(only_include_user_ids) + if not (has_data_access or is_staff): + allowed_user_ids &= {user.id} if user else set() # only include user-specific scores for the given user_ids scores = scores.filter( - Q(user_id__in=only_include_user_ids) | Q(user__isnull=True) + Q(user_id__in=allowed_user_ids) | Q(user__isnull=True) ) archived_scores = archived_scores.filter( - Q(user_id__in=only_include_user_ids) | Q(user__isnull=True) + Q(user_id__in=allowed_user_ids) | Q(user__isnull=True) ) elif not (has_data_access or is_staff): # only include user-specific scores for the logged-in user scores = scores.filter( - Q(user__isnull=True) | (Q(user=user) if user else Q()) + Q(user__isnull=True) + | (Q(user=user) if user else Q(user__isnull=True)) ) archived_scores = archived_scores.filter( - Q(user__isnull=True) | (Q(user=user) if user else Q()) + Q(user__isnull=True) + | (Q(user=user) if user else Q(user__isnull=True)) )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@utils/csv_utils.py` around lines 214 - 236, The scores/archived_scores query branch incorrectly allows broader access when only_include_user_ids is set and when user is None; update the logic so that after applying the optional only_include_user_ids filter you reapply the caller restriction when not (has_data_access or is_staff), and for anonymous callers (user is None) use only Q(user__isnull=True) instead of Q(user__isnull=True) | Q(), i.e. ensure scores and archived_scores are additionally filtered to (Q(user__isnull=True) | Q(user=user)) for logged-in callers and to Q(user__isnull=True) for anonymous callers while still honoring only_include_user_ids.
🧹 Nitpick comments (1)
misc/models.py (1)
111-120: Disallow deanonymized-only grants that never take effect.Lines 111-119 allow
view_deanonymized_data=Truewhileview_user_data=False, butmisc/utils.pyfilters entries byview_user_data=Truebefore it checks deanonymization. That makes these rows look more permissive in admin than they are at runtime. A smallCheckConstraintor matching admin validation would keep the permission matrix consistent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@misc/models.py` around lines 111 - 120, Add a constraint and/or admin validation to prevent rows where view_deanonymized_data is True while view_user_data is False: in the model that defines the fields view_user_data and view_deanonymized_data, add a CheckConstraint enforcing "NOT view_deanonymized_data OR view_user_data" (i.e., view_deanonymized_data implies view_user_data) and add matching clean()/ModelAdmin form validation to reject or warn on such combinations so admin UI and runtime filtering remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@misc/utils.py`:
- Around line 27-45: The helper narrows checks to post.default_project which
misses permissions granted via other projects on a post; update the logic in the
has_data_access helper to iterate over post.projects (or use
post.projects.all()) instead of using post.default_project: when post_id is set,
collect data_access_entries for every project in post.projects and also check
ProjectUserPermission.objects.filter(user=user, project__in=post.projects.all(),
permission=ObjectPermission.ADMIN).exists() so the admin shortcut and
data_access_entries include all projects attached to the post rather than only
the default_project.
In `@utils/csv_utils.py`:
- Around line 214-236: The scores/archived_scores query branch incorrectly
allows broader access when only_include_user_ids is set and when user is None;
update the logic so that after applying the optional only_include_user_ids
filter you reapply the caller restriction when not (has_data_access or
is_staff), and for anonymous callers (user is None) use only
Q(user__isnull=True) instead of Q(user__isnull=True) | Q(), i.e. ensure scores
and archived_scores are additionally filtered to (Q(user__isnull=True) |
Q(user=user)) for logged-in callers and to Q(user__isnull=True) for anonymous
callers while still honoring only_include_user_ids.
---
Nitpick comments:
In `@misc/models.py`:
- Around line 111-120: Add a constraint and/or admin validation to prevent rows
where view_deanonymized_data is True while view_user_data is False: in the model
that defines the fields view_user_data and view_deanonymized_data, add a
CheckConstraint enforcing "NOT view_deanonymized_data OR view_user_data" (i.e.,
view_deanonymized_data implies view_user_data) and add matching
clean()/ModelAdmin form validation to reject or warn on such combinations so
admin UI and runtime filtering remain consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b4f09573-398e-498b-8ce8-e437f1a7a28f
📒 Files selected for processing (17)
front_end/src/app/(main)/questions/[id]/components/download_question_data_modal/index.tsxfront_end/src/services/api/posts/posts.shared.tsfront_end/src/types/utils.tsmisc/admin.pymisc/migrations/0008_whitelistuser_api_access_tier.pymisc/models.pymisc/urls.pymisc/utils.pymisc/views.pyusers/constants.pyusers/models.pyusers/serializers.pyusers/views.pyutils/csv_utils.pyutils/serializers.pyutils/tasks.pyutils/views.py
| migrations.AddField( | ||
| model_name="userdataaccess", | ||
| name="api_access_tier", | ||
| field=models.CharField( | ||
| choices=[ | ||
| ("restricted", "Restricted"), | ||
| ("bot_benchmarking", "Bot Benchmarking"), | ||
| ("unrestricted", "Unrestricted"), | ||
| ], | ||
| default="restricted", | ||
| help_text="Indicates the API access tier relevant to this data access entry.", | ||
| max_length=32, | ||
| ), | ||
| ), | ||
| migrations.AddField( | ||
| model_name="userdataaccess", | ||
| name="view_user_data", | ||
| field=models.BooleanField( | ||
| default=False, | ||
| help_text=( | ||
| "If True, the user can view user-level data (e.g., download datasets " | ||
| "with user-level information included). If False, the user can only access " | ||
| "aggregated data or anonymized user-level data." | ||
| ), | ||
| ), | ||
| ), | ||
| migrations.RunPython( | ||
| set_existing_view_user_data, | ||
| migrations.RunPython.noop, | ||
| ), |
There was a problem hiding this comment.
Backfill api_access_tier for renamed whitelist entries.
After the rename, Line 23 adds api_access_tier with a default of "restricted", and the only RunPython here backfills view_user_data. Existing WhitelistUser rows will therefore land in the lowest tier, which drops the project-specific API access those records previously represented. Please add an explicit tier backfill for legacy rows before relying on "restricted" as the default for new entries.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
misc/migrations/0008_whitelistuser_api_access_tier.py (1)
23-35:⚠️ Potential issue | 🟠 MajorBackfill
api_access_tierfor existing rows before relying on the new default.Line 32 sets
"restricted"as the default, and the only data migration at Line 49-52 updatesview_user_dataonly. Existing renamed whitelist entries can lose prior API-tier behavior unlessapi_access_tieris explicitly migrated.Suggested migration update
+def set_existing_api_access_tier(apps, schema_editor): + UserDataAccess = apps.get_model("misc", "UserDataAccess") + UserDataAccess.objects.all().update(api_access_tier="benchmarking") + + class Migration(migrations.Migration): @@ migrations.AddField( model_name="userdataaccess", name="api_access_tier", @@ default="restricted", @@ ), + migrations.RunPython( + set_existing_api_access_tier, + migrations.RunPython.noop, + ), migrations.RunPython( set_existing_view_user_data, migrations.RunPython.noop, ),Also applies to: 49-52
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@misc/migrations/0008_whitelistuser_api_access_tier.py` around lines 23 - 35, The migration adds userdataaccess.api_access_tier with default "restricted" but does not backfill existing rows, so add a RunPython data migration in this migration (or immediately after) that iterates UserDataAccess (model_name="userdataaccess") and sets api_access_tier based on existing state (e.g., preserve prior whitelist/legacy tier fields or map existing boolean/enum fields to "restricted"/"benchmarking"/"unrestricted"), implement an idempotent forward function and a no-op or reversible backward function, and ensure this RunPython runs before any code that relies on the new column (also update the existing data migration that updates view_user_data at lines ~49-52 to include api_access_tier backfill for renamed whitelist entries). Ensure the migration imports migrations.RunPython and uses the model retrieved via apps.get_model to avoid import-time models.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@users/migrations/0016_user_api_access_tier.py`:
- Around line 17-20: Create a new Django data migration that updates existing
rows where users_user.api_access_tier == "bot_benchmarking" to the new value
"benchmarking": implement a RunPython forwards function that queries the User
model (apps.get_model("users", "User")) and performs an
update(queryset.update(api_access_tier="benchmarking")) for those records;
include a reverse function that optionally maps "benchmarking" back to
"bot_benchmarking" (or leave as noop if irreversible) and add the RunPython
operation to the migration's operations list so already-migrated DBs are
corrected.
---
Duplicate comments:
In `@misc/migrations/0008_whitelistuser_api_access_tier.py`:
- Around line 23-35: The migration adds userdataaccess.api_access_tier with
default "restricted" but does not backfill existing rows, so add a RunPython
data migration in this migration (or immediately after) that iterates
UserDataAccess (model_name="userdataaccess") and sets api_access_tier based on
existing state (e.g., preserve prior whitelist/legacy tier fields or map
existing boolean/enum fields to "restricted"/"benchmarking"/"unrestricted"),
implement an idempotent forward function and a no-op or reversible backward
function, and ensure this RunPython runs before any code that relies on the new
column (also update the existing data migration that updates view_user_data at
lines ~49-52 to include api_access_tier backfill for renamed whitelist entries).
Ensure the migration imports migrations.RunPython and uses the model retrieved
via apps.get_model to avoid import-time models.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 26a03672-fcc7-4247-84cf-c06b91c0ca58
📒 Files selected for processing (3)
misc/migrations/0008_whitelistuser_api_access_tier.pyusers/constants.pyusers/migrations/0016_user_api_access_tier.py
| choices=[ | ||
| ("restricted", "Restricted"), | ||
| ("bot_benchmarking", "Bot Benchmarking"), | ||
| ("benchmarking", "Benchmarking"), | ||
| ("unrestricted", "Unrestricted"), |
There was a problem hiding this comment.
Add a forward data migration for legacy bot_benchmarking values.
Line 19 updates the choice value, but already-migrated databases won’t rewrite stored values automatically. Rows still containing bot_benchmarking can fail enum-based checks after this rename.
Please add a new forward migration that updates existing users_user.api_access_tier values from bot_benchmarking to benchmarking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@users/migrations/0016_user_api_access_tier.py` around lines 17 - 20, Create a
new Django data migration that updates existing rows where
users_user.api_access_tier == "bot_benchmarking" to the new value
"benchmarking": implement a RunPython forwards function that queries the User
model (apps.get_model("users", "User")) and performs an
update(queryset.update(api_access_tier="benchmarking")) for those records;
include a reverse function that optionally maps "benchmarking" back to
"bot_benchmarking" (or leave as noop if irreversible) and add the RunPython
operation to the migration's operations list so already-migrated DBs are
corrected.
elisescu
left a comment
There was a problem hiding this comment.
Had some inline comments.
| related_name="data_accesses", | ||
| help_text="Optional. Scopes this entry to a specific project. " | ||
| "The API access tier will apply to this project if it exceeds the user's " | ||
| "base tier. If neither project nor post is set, the entry applies globally.", |
There was a problem hiding this comment.
If neither project nor post is set, the entry applies globally."
I believe that should not be the case, and if both the project and the post are missing, then the entry is just ignored and the user gets its access tier from the User table. Or am I missing something?
| return None | ||
| entries = ( | ||
| user.data_accesses.filter(project_id__isnull=False) | ||
| .exclude(api_access_tier=ApiAccessTier.RESTRICTED) |
There was a problem hiding this comment.
.exclude(api_access_tier=ApiAccessTier.RESTRICTED)
Not sure we should do that? If we don't do it, then we could use this mechanism to restrict a user's global access for certain projects. Unlikely we'll need it but the code looks cleaner in that case, I would say
| return user.has_usable_password() | ||
|
|
||
| def get_project_data_access(self, user: User): | ||
| if not self.context.get("with_data_access"): |
There was a problem hiding this comment.
Would it not be better to not populate the project_data_access serialised field at all when the param is missing, instead of setting it to null?
addresses main site part of optional feature of #4466
followup to #4488
Summary
Renames WhitelistUser model to UserDataAccess and decouples API access tier grants from user-level data permissions.
Key changes
Model rename: WhitelistUser → UserDataAccess, with related_name updated from whitelists to data_accesses across all FKs
New view_user_data field: Replaces view_forecaster_data. Previously, the existence of a whitelist entry implied user data access. Now entries can exist solely for API tier overrides — only entries with view_user_data=True grant user-level data access. Existing rows are backfilled to True.
ApiAccessTier extracted to users/constants.py; BOT_BENCHMARKING renamed to BENCHMARKING
/users/me endpoint: reduced_api_restriction_projects replaced with project_data_access, only returned when ?with_data_access=true is passed
Renamed API surface: get-whitelist-status/ → get-data-access-status/, response key is_whitelisted → has_data_access
Frontend updated to match all backend renames (types, API client, download modal)
All changes consolidated into the existing 0008 migration
Summary by CodeRabbit
Release Notes
New Features
Refactor