Overhaul the devops side of the project. #88
base: 0.36-live
Conversation
@a5ehren is attempting to deploy a commit to the sportsdataverse Team on Vercel. A member of the Team first needs to authorize it.
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the repository's CodeRabbit configuration. You can disable this status message in the CodeRabbit configuration.

Walkthrough

The pull request modernizes packaging infrastructure by migrating from setup.py to pyproject.toml, updates GitHub Actions workflows to newer versions, and introduces defensive error handling and JSON parsing safeguards across multiple sportsdataverse modules. Additionally, it adds comprehensive test coverage for CFB data loaders and removes legacy example files while updating dependencies to newer versions.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 3
🧹 Nitpick comments (4)
sportsdataverse/dl_utils.py (1)
8-8: HTTP error handling in `ESPNHTTP.send_api_request` is safer; consider refining exceptions

Wrapping `requests.get` in a try/except and returning an `ESPNResponse` sentinel (or raising when `raise_exception_on_error=True`) is a solid improvement. Two refinements to consider:

- Define a library-specific exception (e.g. `ESPNHTTPError`) and chain the original error:

```diff
+class ESPNHTTPError(Exception):
+    """ESPN HTTP request failed."""
+
 @@
-        except requests.exceptions.RequestException as e:
-            print(f"Request error for {base_url}: {e}")
-            # Return empty response with error status
-            data = self.espn_response(response="", status_code=500, url=base_url)
-            if raise_exception_on_error:
-                raise Exception(f'RequestError: Failed to fetch data from {base_url}: {e}')
-            return data
+        except requests.exceptions.RequestException as e:
+            print(f"Request error for {base_url}: {e}")
+            data = self.espn_response(response="", status_code=500, url=base_url)
+            if raise_exception_on_error:
+                raise ESPNHTTPError(
+                    f"Failed to fetch data from {base_url}"
+                ) from e
+            return data
```

- Optionally, give `timeout` a non-None default (e.g. 10–30 seconds) so a hung endpoint can't block indefinitely.

These are quality-of-life improvements rather than blockers.
Also applies to: 174-186
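Putting both refinements together, a minimal standalone sketch (the function name, defaults, and `None` return are illustrative, not the actual `send_api_request` signature):

```python
import requests


class ESPNHTTPError(Exception):
    """Raised when an ESPN API request cannot be completed."""


def fetch_espn(base_url, params=None, timeout=15, raise_exception_on_error=False):
    """GET with a bounded timeout; chain the original error when asked to raise."""
    try:
        resp = requests.get(base_url, params=params, timeout=timeout)
        resp.raise_for_status()
        return resp
    except requests.exceptions.RequestException as e:
        if raise_exception_on_error:
            # `from e` preserves the root cause for callers and debuggers.
            raise ESPNHTTPError(f"Failed to fetch data from {base_url}") from e
        return None  # caller treats None as "no data", mirroring the sentinel idea
```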
tests/cfb/test_loaders.py (1)
24-32: Tighten multi-season assertions and consider relaxing message matching

Two minor robustness points in these tests:

1. Multi-season checks (`load_cfb_pbp` / `load_cfb_schedule`)

The comment says "Should have data from both seasons", but the assertion is:

```python
assert result['season'].nunique() >= 1
```

If both seasons are genuinely expected to be present in the underlying data, `>= 2` would better reflect that intent. Otherwise, updating the comment to something like "at least one season present" would avoid confusion.

2. `SeasonNotFoundError` message matching

The invalid-season tests assert on specific message substrings with `match="season cannot be less than ..."` etc. That makes tests brittle to harmless wording changes in the error messages. You might want to either:

- Drop `match=...` and assert only on `SeasonNotFoundError`, or
- Match a shorter, stable fragment (e.g., just the minimum year number).
Also applies to: 64-72, 40-43, 80-83, 117-120, 154-157
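For the second point, a hypothetical test shape (the import paths mirror the code-graph listing further down; the `seasons` keyword and the example year are assumptions):

```python
import pytest

from sportsdataverse.cfb.cfb_loaders import load_cfb_pbp
from sportsdataverse.errors import SeasonNotFoundError


def test_load_cfb_pbp_rejects_invalid_season():
    # Assert only on the exception type; the message wording can change freely.
    with pytest.raises(SeasonNotFoundError):
        load_cfb_pbp(seasons=[1900])
```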
sportsdataverse/cfb/cfb_pbp.py (1)
152-161: Improve exception chaining and logging in new ESPN summary/odds handling

The new guards around ESPN summary and odds responses are a solid robustness improvement, but a couple of small tweaks would make them cleaner and friendlier to tooling:

1. Chain the original JSON error in `espn_cfb_pbp`

Ruff B904 is right that preserving the original exception helps distinguish JSON issues from handler bugs. You can do:

```diff
-    try:
-        summary = json.loads(summary_resp)
-    except (json.JSONDecodeError, ValueError) as e:
-        raise ValueError(f"Failed to decode JSON from {summary_url}: {e}")
+    try:
+        summary = json.loads(summary_resp)
+    except (json.JSONDecodeError, ValueError) as e:
+        raise ValueError(f"Failed to decode JSON from {summary_url}") from e
```

2. Avoid `print` in `__helper__espn_cfb_odds_information__` and centralize defaults

Library code generally shouldn't write to stdout. Since you already import `logging`, consider:

```diff
-    if odds_resp is None:
-        # spread, overUnder, homeFavorite, gameSpreadAvailable
-        return (2.5, 55.5, True, False)
+    DEFAULT_ODDS = (2.5, 55.5, True, False)
+    if odds_resp is None:
+        # spread, overUnder, homeFavorite, gameSpreadAvailable
+        return DEFAULT_ODDS
 ...
-    except (json.JSONDecodeError, ValueError) as e:
-        print(f"Error decoding odds JSON from {odds_url}: {e}")
-        return (2.5, 55.5, True, False)
+    except (json.JSONDecodeError, ValueError) as e:
+        logging.getLogger(__name__).warning(
+            "Error decoding odds JSON from %s: %s", odds_url, e
+        )
+        return DEFAULT_ODDS
```

and reuse `DEFAULT_ODDS` for the empty-items branch as well.

These changes keep the new behavior but make errors easier to trace and keep the library quiet unless a logger is configured.
Also applies to: 225-241
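For reference, the module-level version of that pattern might look like this (a sketch only: the helper name, the `items` key, and the elided field extraction are assumptions, and `DEFAULT_ODDS` is a suggested constant, not existing code):

```python
import json
import logging

logger = logging.getLogger(__name__)

# One shared fallback: (spread, over_under, home_favorite, game_spread_available)
DEFAULT_ODDS = (2.5, 55.5, True, False)


def decode_odds(odds_resp, odds_url):
    """Decode the raw odds response, reusing DEFAULT_ODDS on every failure path."""
    if odds_resp is None:
        return DEFAULT_ODDS
    try:
        odds = json.loads(odds_resp)
    except (json.JSONDecodeError, ValueError) as e:
        logger.warning("Error decoding odds JSON from %s: %s", odds_url, e)
        return DEFAULT_ODDS
    items = odds.get("items") or []
    if not items:
        return DEFAULT_ODDS
    # ...extract spread/overUnder/homeFavorite from items[0] here...
    return DEFAULT_ODDS  # placeholder until the real extraction is wired in
```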
sportsdataverse.egg-info/PKG-INFO (1)
3-3: Consider standard semantic versioning format.

The version `0.0.36.2.32` uses more than the standard three segments, which deviates from semantic versioning (major.minor.patch). While not incorrect, this may cause compatibility issues with tools expecting three-segment versions.
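If you want to sanity-check how Python packaging tooling sees the extra segments (assuming the `packaging` library is installed), PEP 440 parses them fine; the concern is limited to strict-semver tooling:

```python
from packaging.version import Version

v = Version("0.0.36.2.32")
print(v.release)                  # (0, 0, 36, 2, 32) - PEP 440 allows any number of release segments
print(v.major, v.minor, v.micro)  # 0 0 36 - only the first three map onto semver-style parts
```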
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (21)
- .github/workflows/python-publish.yml (1 hunks)
- .github/workflows/tests.yml (1 hunks)
- .gitignore (1 hunks)
- .python-version (1 hunks)
- examples/test_mlbam.py (0 hunks)
- pyproject.toml (1 hunks)
- setup.py (0 hunks)
- sportsdataverse.egg-info/PKG-INFO (1 hunks)
- sportsdataverse.egg-info/SOURCES.txt (1 hunks)
- sportsdataverse.egg-info/requires.txt (1 hunks)
- sportsdataverse.egg-info/top_level.txt (1 hunks)
- sportsdataverse/cfb/cfb_pbp.py (4 hunks)
- sportsdataverse/cfb/cfb_schedule.py (3 hunks)
- sportsdataverse/cfb/cfb_teams.py (1 hunks)
- sportsdataverse/dl_utils.py (2 hunks)
- sportsdataverse/nfl/nfl_games.py (2 hunks)
- sportsdataverse/nfl/nfl_pbp.py (12 hunks)
- tests/cfb/test_loaders.py (1 hunks)
- tests/cfb/test_pbp.py (4 hunks)
- tests/cfb/test_schedule.py (1 hunks)
- tests/cfb/test_teams.py (1 hunks)
💤 Files with no reviewable changes (2)
- examples/test_mlbam.py
- setup.py
🧰 Additional context used
🧬 Code graph analysis (6)
sportsdataverse/cfb/cfb_schedule.py (1)
sportsdataverse/dl_utils.py (1)
underscore(67-86)
tests/cfb/test_schedule.py (1)
sportsdataverse/cfb/cfb_schedule.py (3)
espn_cfb_schedule (7-118)
espn_cfb_calendar (122-184)
most_recent_cfb_season (186-193)
tests/cfb/test_pbp.py (2)
sportsdataverse/cfb/cfb_pbp.py (2)
espn_cfb_pbp (133-217)
run_processing_pipeline (5203-5280)
sportsdataverse/nfl/nfl_pbp.py (1)
run_processing_pipeline(5248-5323)
tests/cfb/test_teams.py (1)
sportsdataverse/cfb/cfb_teams.py (1)
espn_cfb_teams(5-37)
sportsdataverse/cfb/cfb_teams.py (1)
sportsdataverse/dl_utils.py (1)
underscore(67-86)
tests/cfb/test_loaders.py (2)
sportsdataverse/cfb/cfb_loaders.py (5)
load_cfb_pbp (9-35)
load_cfb_schedule (37-64)
load_cfb_rosters (66-92)
load_cfb_team_info (94-123)
get_cfb_teams (125-138)
sportsdataverse/errors.py (1)
SeasonNotFoundError(5-6)
🪛 Ruff (0.14.7)
sportsdataverse/nfl/nfl_games.py
18-18: Probable use of requests call without timeout
(S113)
21-21: Consider moving this statement to an else block
(TRY300)
24-24: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
24-24: Avoid specifying long messages outside the exception class
(TRY003)
27-27: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
27-27: Avoid specifying long messages outside the exception class
(TRY003)
66-66: Probable use of requests call without timeout
(S113)
71-71: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
71-71: Avoid specifying long messages outside the exception class
(TRY003)
74-74: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
74-74: Avoid specifying long messages outside the exception class
(TRY003)
tests/cfb/test_pbp.py
593-593: Local variable json_dict_stuff is assigned to but never used
Remove assignment to unused variable json_dict_stuff
(F841)
sportsdataverse/cfb/cfb_pbp.py
156-156: Avoid specifying long messages outside the exception class
(TRY003)
161-161: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
161-161: Avoid specifying long messages outside the exception class
(TRY003)
sportsdataverse/nfl/nfl_pbp.py
4794-4794: Avoid equality comparisons to False; use not play_df["sp"]: for false checks
Replace with not play_df["sp"]
(E712)
4794-4794: Avoid equality comparisons to True; use play_df["scrimmage_play"]: for truth checks
Replace with play_df["scrimmage_play"]
(E712)
4803-4803: Avoid equality comparisons to False; use not play_df["sp"]: for false checks
Replace with not play_df["sp"]
(E712)
4803-4803: Avoid equality comparisons to True; use play_df["scrimmage_play"]: for truth checks
Replace with play_df["scrimmage_play"]
(E712)
sportsdataverse/dl_utils.py
185-185: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
185-185: Create your own exception
(TRY002)
185-185: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (19)
.gitignore (1)
21-21: Ignore `.coverage` in VCS

Ignoring coverage artifacts is standard and avoids committing local test outputs. Change looks good.
.python-version (1)
1-1: Default runtime 3.13 is reasonable; confirm ecosystem support

Pinning `.python-version` to 3.13 aligns with the test matrix (3.9 and 3.x) and a >=3.9 requirement. Just ensure your dependencies and tooling are validated on 3.13 as well.
.github/workflows/tests.yml (1)

10-29: Matrixed tests across 3.9 and latest 3.x look correct

The updated matrix and action versions correctly exercise the minimum supported Python (3.9) and the latest 3.x, and the install/test steps remain straightforward.
sportsdataverse/cfb/cfb_teams.py (1)
20-36: Defensive teams JSON handling and conditional normalization look solid

Initializing `teams` to an empty DataFrame, safely walking sports/leagues/teams via `.get(..., [{}])`, and only normalizing/renaming columns when `teams` is non-empty gives the function a predictable "empty DataFrame on error" behavior and avoids crashes on bad API responses.
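The pattern described above, in isolation (a sketch: the sports → leagues → teams path comes from the review text, and the function name and payload shape are illustrative):

```python
import pandas as pd


def teams_frame_from_payload(payload):
    """Walk the nested payload defensively and return an empty DataFrame on any miss."""
    teams = pd.DataFrame()
    # The [{}] defaults kick in when a key is missing entirely, so the chained
    # .get() calls never raise AttributeError on an unexpected payload.
    entries = (
        payload.get("sports", [{}])[0]
        .get("leagues", [{}])[0]
        .get("teams", [])
    )
    if entries:
        teams = pd.json_normalize(entries)
    return teams
```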
sportsdataverse/nfl/nfl_pbp.py (3)

7-7: Using `importlib.resources.files` for model paths is a good modernization

Resolving `ep_model`, `wp_spread`, and `qbr_model` via `files("sportsdataverse").joinpath("nfl/models/...")` is the right approach for pyproject-based packages and works with zipped/wheel installs, as long as those model files are included as package data.

Also applies to: 21-23
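In isolation, the pattern looks roughly like this (a sketch; the model file name follows the review text, and `str()` assumes the usual on-disk wheel install):

```python
from importlib.resources import files

import xgboost as xgb

# Resolve a packaged model file without pkg_resources; requires the .model files
# to be declared as package data in pyproject.toml.
ep_model_path = files("sportsdataverse").joinpath("nfl/models/ep_model.model")

ep_model = xgb.Booster()
ep_model.load_model(str(ep_model_path))
```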
3191-3567: Indexing `str.extract(...)[0]` correctly normalizes extracted player columns

Updating the various `play_df.text.str.extract(...)` calls in `__add_player_cols` to index `[0]` ensures a 1-D Series is assigned into each column (instead of a DataFrame), which aligns with pandas' API and avoids shape mismatches. The regex patterns themselves are unchanged, so behavior is preserved with improved compatibility.
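The `[0]` indexing in question, on a toy frame (an illustrative pattern, not one of the real `__add_player_cols` regexes):

```python
import pandas as pd

play_df = pd.DataFrame({"text": ["pass complete to A.Brown", "kneel by B.Smith"]})

# str.extract with a capture group returns a DataFrame; [0] pulls the single
# extracted column out as a 1-D Series so it assigns cleanly into play_df.
play_df["receiver_player_name"] = play_df["text"].str.extract(r"to ([A-Z]\.\w+)")[0]
print(play_df)
```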
4790-4809: Resetting index after grouped cumsums prevents MultiIndex assignment issues

The added `.reset_index(level=0, drop=True)` on the grouped cumsums for `drive_play_index`, `prog_drive_EPA`, `prog_drive_WPA`, and `drive_total_yards` correctly flattens the MultiIndex produced by `groupby(...).apply(...)`, ensuring these Series align with `play_df`'s index and avoiding assignment errors on newer pandas.
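The flattening step on a toy frame (column names are illustrative):

```python
import pandas as pd

play_df = pd.DataFrame({"drive_id": [1, 1, 2, 2], "yards_gained": [5, 7, 3, 10]})

# groupby(...).apply(...) yields a Series keyed by (drive_id, original index);
# dropping level 0 realigns it with play_df's index before assignment.
play_df["drive_total_yards"] = (
    play_df.groupby("drive_id")["yards_gained"]
    .apply(lambda s: s.cumsum())
    .reset_index(level=0, drop=True)
)
print(play_df)
```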
.github/workflows/python-publish.yml (1)

16-32: Publish workflow may need update if packaging migration to pyproject.toml is planned

If `setup.py` is being removed as part of a modernization to `pyproject.toml`, this workflow will fail on release. Verify whether this migration is in progress and, if so, update the build step accordingly.

Consider switching to a pyproject-aware builder:

```diff
-      - name: Install dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install setuptools wheel twine
+      - name: Install build tooling
+        run: |
-          python -m pip install --upgrade pip
+          pip install --upgrade pip
+          pip install build twine
       - name: Build and publish
         env:
           TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
           TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
         run: |
-          python setup.py sdist bdist_wheel
+          python -m build
           twine upload dist/*
```

sportsdataverse/nfl/nfl_games.py (1)
17-27: Add explicit timeouts and chain underlying exceptions in NFL HTTP helpers

The new guarded flows in `nfl_token_gen` and `nfl_game_details` are a big improvement, but two reliability issues remain:

- Both `requests.request(...)` and `requests.get(...)` are invoked without a timeout, so a hung NFL endpoint can block the caller indefinitely.
- Re-raising as `ValueError(...)` without `from e` loses the original stack/context.

Consider something along these lines:

```diff
-def nfl_token_gen():
+def nfl_token_gen(timeout: float = 10.0):
 @@
-    try:
-        response = requests.request("POST", url, headers=headers, data = payload)
+    try:
+        response = requests.request("POST", url, headers=headers, data=payload, timeout=timeout)
         response.raise_for_status()
         access_token = json.loads(response.content)['access_token']
         return access_token
     except requests.exceptions.RequestException as e:
         print(f"Error fetching NFL token from {url}: {e}")
-        raise ValueError(f"Failed to generate NFL API token: {e}")
+        raise ValueError("Failed to generate NFL API token") from e
 @@
-def nfl_game_details(game_id=None, headers=None, raw=False) -> Dict:
+def nfl_game_details(game_id=None, headers=None, raw=False, timeout: float = 10.0) -> Dict:
 @@
-    try:
-        summary_resp = requests.get(summary_url, headers=headers)
+    try:
+        summary_resp = requests.get(summary_url, headers=headers, timeout=timeout)
         summary_resp.raise_for_status()
         summary = summary_resp.json()
     except requests.exceptions.RequestException as e:
         print(f"Error fetching NFL game details from {summary_url}: {e}")
-        raise ValueError(f"Failed to fetch NFL game details: {e}")
+        raise ValueError("Failed to fetch NFL game details") from e
     except json.JSONDecodeError as e:
         print(f"Error parsing NFL game details JSON: {e}")
-        raise ValueError(f"Failed to parse NFL game details: {e}")
+        raise ValueError("Failed to parse NFL game details") from e
```

This keeps the public API essentially the same while preventing indefinite hangs and improving debuggability.
Also applies to: 65-74
sportsdataverse.egg-info/requires.txt (1)
1-30: Runtime and extra dependency pins look consistent with pyproject

The updated requirements mirror the pyproject dependency and extras configuration; no issues from a packaging standpoint as long as both stay in sync.
pyproject.toml (1)
1-71: pyproject configuration is coherent and matches the code's needs

Build backend, metadata, runtime deps, extras, and package-data all align with the module usage (including model files); this is a solid modernization from setup.py.
sportsdataverse.egg-info/top_level.txt (1)
1-5: top_level entries are reasonable for current repo layout

Including docs, examples, and build artifacts here is consistent with the current source tree and shouldn't affect runtime behavior.
sportsdataverse.egg-info/SOURCES.txt (1)
4-16: Updated SOURCES manifest matches the new packaging and examples

Including pyproject.toml, Sphinx config, and example scripts here aligns with the move to modern packaging and expanded examples; nothing problematic.

sportsdataverse/cfb/cfb_pbp.py (1)

7-8: Switch to `importlib.resources.files` for model paths looks good

Resolving `ep_model.model`, `wp_spread.model`, and `qbr_model.model` via `files("sportsdataverse").joinpath(...)` is an appropriate replacement for pkg_resources and works with the package-data entries in pyproject; loading them into global Boosters at import time preserves previous behavior.

Also applies to: 22-33
sportsdataverse/cfb/cfb_schedule.py (1)
136-184: LGTM: Comprehensive error handling added.

The defensive JSON parsing and error handling for both branches (`ondays` and non-`ondays`) is well implemented. The code properly guards against None responses, JSON decoding errors, and missing keys, with appropriate fallbacks to empty DataFrames.
tests/cfb/test_teams.py (1)
1-87: LGTM: Comprehensive test coverage for `espn_cfb_teams`.

The test suite provides excellent coverage with tests for:
- Different parameter combinations (default, FBS, FCS)
- DataFrame structure validation
- Column naming conventions
- Data quality checks (uniqueness, non-null values)
The tests are well-structured and use appropriate assertions.
tests/cfb/test_schedule.py (2)
49-55: Good defensive testing pattern.

The try/except block appropriately handles potential ESPN API failures or missing data for FCS schedules, using `pytest.skip` to document the known issue rather than failing the test.
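Shape-wise, that pattern is roughly the following (a hypothetical test body; the FCS `groups` value, the `dates` argument, and the caught exception types are assumptions):

```python
import pytest

from sportsdataverse.cfb.cfb_schedule import espn_cfb_schedule


def test_fcs_schedule_best_effort():
    try:
        schedule = espn_cfb_schedule(dates=2023, groups=81)
    except (KeyError, ValueError) as exc:
        # ESPN sometimes returns no usable FCS data; record the known issue
        # instead of failing the whole suite.
        pytest.skip(f"FCS schedule unavailable from ESPN: {exc}")
    assert not schedule.empty
```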
162-172: LGTM: Comprehensive logic validation.

The test properly validates the season determination logic based on the current date (after August 15 or in September and later → current year, otherwise → previous year).
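The rule being checked, written out (a sketch of the logic as described in this comment; the exact cutoff comparison in `most_recent_cfb_season` may differ):

```python
from datetime import date
from typing import Optional


def expected_most_recent_cfb_season(today: Optional[date] = None) -> int:
    """Current year once the season is underway (mid-August onward), else the prior year."""
    today = today or date.today()
    if (today.month == 8 and today.day >= 15) or today.month > 8:
        return today.year
    return today.year - 1
```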
sportsdataverse.egg-info/PKG-INFO (1)
31-31: The attrs package version 25.3.0 has been released and is stable. No action required.
```python
try:
    events_txt = json.loads(resp)
    events = events_txt.get('events')
    if events is None:
        events = []
except (json.JSONDecodeError, ValueError) as e:
    print(f"Error decoding JSON from {url}: {e}")
    events = []
```
Guard against malformed event structures in the processing loop.
The JSON error handling correctly defaults to an empty list, but the event processing loop (lines 63-115) assumes each event has the expected nested structure (competitions[0]['competitors'][0]['team'], etc.). If the API returns events with missing or malformed keys, lines 64-91 will raise AttributeError or TypeError.
Consider wrapping the event processing in a try/except block:
```diff
 for event in events:
-    event.get('competitions')[0].get('competitors')[0].get('team').pop('links',None)
-    event.get('competitions')[0].get('competitors')[1].get('team').pop('links',None)
-    if event.get('competitions')[0].get('competitors')[0].get('homeAway')=='home':
+    try:
+        event.get('competitions')[0].get('competitors')[0].get('team').pop('links',None)
+        event.get('competitions')[0].get('competitors')[1].get('team').pop('links',None)
+        if event.get('competitions')[0].get('competitors')[0].get('homeAway')=='home':
+            # ... rest of the processing ...
+    except (AttributeError, TypeError, KeyError, IndexError) as e:
+        print(f"Skipping malformed event: {e}")
+        continue
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In sportsdataverse/cfb/cfb_schedule.py around lines 54 to 61 (and the subsequent
event loop at ~63-115), the code assumes every event has the expected nested
keys and indices which will raise TypeError/AttributeError/IndexError on
malformed events; wrap the per-event processing inside a try/except that catches
KeyError, TypeError, IndexError (or a broad Exception if preferred), log or
print a concise warning including the event id or index and the exception, and
continue to the next event, and where sensible use dict.get() and index/length
checks to provide safe defaults for missing nested fields so malformed events
are skipped without breaking the entire loop.
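One way to get the "safe defaults for missing nested fields" part without repeating try/except everywhere is a tiny accessor helper (hypothetical, not part of the codebase):

```python
def dig(obj, *path, default=None):
    """Walk nested dicts/lists, returning `default` on any missing key or index."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# e.g. inside the event loop:
# home_team = dig(event, "competitions", 0, "competitors", 0, "team", default={})
```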
3cd4016 to 98fcf34
*Convert project to pyproject.toml setup, via uv
*Set minimum Python version to 3.9
*Fix some minor package deprecations arriving in the near future
*Update all dependency minimums to the versions resolved under Python 3.9
*Add significant test coverage for CFB code
*Add graceful error handling for web APIs
*Update GitHub Actions to test multiple versions of Python
*Fix bugs surfaced by the new tests
*Fix bugs against Python 3.14
Summary by CodeRabbit
Bug Fixes
Tests
Chores