Add Basic Basketball Module for NBA Tracking Data #26
Conversation
Hi @not-heavychevy, thanks for the awesome PR! I think before we merge this we need to make sure the behavior of This means, for example:
Additionally, I have a question that I can't check right now. What is the current behavior of the "events"? I see they are being parsed from the data, but where do they go? For example, are they joined to the correct frame somewhere, or stored in a separate dataframe? I'm asking because in the other sports the events (like passes, snaps, shots, etc.) can be used to create labels for the tracking data to train on.
- Add `data` and `settings` fields (with `init=False`) to DefaultDataset base class
- Implement `get_dataframe()` to return `self.data` or `None` if not loaded
- Implement `get_settings()` to return `self.settings`
- Preserve abstract `load()`, `add_graph_ids()`, and `add_dummy_labels()` methods for subclasses
- Remove duplicated `get_dataframe`/`get_settings` definitions from individual Dataset classes
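The base-class change described in this commit message could look roughly like the sketch below. It assumes `DefaultDataset` is a standard-library dataclass (suggested by `init=False`, but not confirmed here), types `data` and `settings` loosely as `Any`, and uses a hypothetical `ToyDataset` subclass purely for illustration; the real class in the repository may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DefaultDataset(ABC):
    # init=False: these are not constructor arguments; load() fills them in.
    data: Optional[Any] = field(init=False, default=None)
    settings: Optional[Any] = field(init=False, default=None)

    def get_dataframe(self) -> Optional[Any]:
        # Returns None until a subclass has loaded data.
        return self.data

    def get_settings(self) -> Optional[Any]:
        return self.settings

    @abstractmethod
    def load(self): ...

    @abstractmethod
    def add_graph_ids(self): ...

    @abstractmethod
    def add_dummy_labels(self): ...


@dataclass
class ToyDataset(DefaultDataset):
    """Hypothetical subclass, used only to exercise the base class."""

    rows: list = field(default_factory=list)

    def load(self):
        self.data = list(self.rows)
        self.settings = {"n_rows": len(self.rows)}
        return self

    def add_graph_ids(self):
        return self

    def add_dummy_labels(self):
        return self
```

With this layout, subclasses inherit `get_dataframe()` and `get_settings()` and only implement the three abstract methods.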
Thanks @not-heavychevy. I only just noticed that the kloppy.io `open_as_file` does not support 7z! So, leave that for now if you haven't found a good solution to that yet.
Thanks @UnravelSports for your follow-up! Let me know if there are any other changes you’d like before this can be merged, or if the implementation is fine as it stands. I’m happy to tweak anything that’s still missing.
Description
This pull request introduces a new module, basketball, designed to process NBA tracking data. The implementation mirrors the approach used for the American Football (BigDataBowl) module by leveraging a Polars back-end and following the established unravelsports structure.
Key Changes
1. New Directory Structure (`unravel/basketball/`)
- `__init__.py`: Exports the module components.
- `dataset/` directory:
  - `__init__.py`
  - `dataset.py`: Implements the `BasketballDataset` class. This class supports two data-loading modes:
    - Downloads a 7zip archive from the provided URL, extracts a JSON file, and loads the data.
    - Loads a JSON file from the local `data/nba/` directory using a game identifier (e.g., `"Celtics@Lakers"`).

    The class processes the raw data into a Polars DataFrame with columns such as `game_id`, `frame_id`, `team`, `player`, `x`, and `y`. Additional fields (e.g., `quarter`, `game_clock`, `shot_clock`) are also incorporated if available.
- `graphs/` directory:
  - `__init__.py`
  - `pitch_dimensions.py`: Contains the `BasketballPitchDimensions` class, which defines court dimensions (length, width, three-point radius, basket position) using official NBA standards.
  - `graph_settings.py`: Contains the `BasketballGraphSettings` class, which stores the conversion parameters.
  - `graph_converter.py`: Implements the `BasketballGraphConverter` class. This class builds, per `frame_id`, a graph dictionary with keys such as `"id"`, `"x"` (node features), `"a"` (adjacency matrix), and `"e"` (edge features).
2. Dynamic Feature Computations
In addition to the basic positional data, this pull request adds new dynamic computations to enrich the dataset with motion-related features. For each record, the following calculations are performed:
Coordinate Differences (`dx`, `dy`):
The change in the x and y coordinates is computed for each frame relative to the previous frame.
Velocity Components (`vx`, `vy`):
These are calculated by dividing the coordinate differences (`dx`, `dy`) by the time difference (`dt`). The time difference is derived from the `game_clock` column if available, or a default value of 1 is used.
Speed (`speed`):
The speed magnitude is calculated using the Euclidean norm (i.e., the square root of the sum of the squares of `vx` and `vy`).
Direction (`direction`):
The movement direction is computed using the `arctan2` function, which returns the angle (in radians) representing the direction of movement.
Acceleration (`acceleration`):
Acceleration is determined as the difference in speed between the current and the previous frame, divided by the time difference (`dt_acc`) between these frames.
Thus, besides the basic positional coordinates, the dataset now includes dynamic features reflecting players' speed, movement direction, and acceleration.
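The computations above can be sketched for a single player's time-ordered records as follows. This is an illustration in plain Python rather than the Polars code in the PR. It assumes `game_clock` is in seconds and counts down, fills the first frame with zeros, and reuses the same time difference for acceleration; the function name `add_motion_features` is hypothetical.

```python
import math


def add_motion_features(frames, default_dt=1.0):
    """Add dx, dy, vx, vy, speed, direction, acceleration to each record.

    `frames` is a time-ordered list of dicts for ONE player with keys
    "x", "y" and, optionally, "game_clock" (assumed seconds, counting down).
    """
    out = []
    prev = None
    for row in frames:
        cur = dict(row)
        if prev is None:
            # Assumption: no previous frame, so all motion features are zero.
            cur.update(dx=0.0, dy=0.0, vx=0.0, vy=0.0,
                       speed=0.0, direction=0.0, acceleration=0.0)
        else:
            dt = default_dt
            if row.get("game_clock") is not None and prev.get("game_clock") is not None:
                elapsed = prev["game_clock"] - row["game_clock"]
                if elapsed > 0:  # fall back to default_dt on clock stops or resets
                    dt = elapsed
            dx = row["x"] - prev["x"]
            dy = row["y"] - prev["y"]
            vx, vy = dx / dt, dy / dt
            speed = math.hypot(vx, vy)           # Euclidean norm of (vx, vy)
            direction = math.atan2(vy, vx)       # angle in radians
            acceleration = (speed - prev["speed"]) / dt
            cur.update(dx=dx, dy=dy, vx=vx, vy=vy, speed=speed,
                       direction=direction, acceleration=acceleration)
        out.append(cur)
        prev = cur
    return out


track = [
    {"x": 0.0, "y": 0.0, "game_clock": 10.0},
    {"x": 3.0, "y": 4.0, "game_clock": 9.0},
    {"x": 3.0, "y": 4.0, "game_clock": 8.5},
]
features = add_motion_features(track)
```

In a Polars implementation the same logic would typically be applied per player and game rather than over one global ordering, so that differences never cross from one player's track into another's.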
3. Adherence to Issue Requirements
Data Loading:
The implementation supports loading data via both a URL (from a 7zip archive) and a game identifier or local file path.
Data Structure:
A Polars DataFrame is used as the primary data structure, where each row represents a player's record for a specific frame.
Essential Classes:
- `BasketballDataset`
- `BasketballPitchDimensions`
- `BasketballGraphSettings`
- `BasketballGraphConverter`

These classes are implemented to parallel existing modules (e.g., for American Football), ensuring consistency across unravelsports.
Dynamic Feature Enhancements:
The new dynamic computations (dx, dy, vx, vy, speed, direction, acceleration) are integrated into the conversion process, enriching the dataset for more in-depth analysis and modeling.
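To illustrate how per-player rows with these features might feed the conversion step, here is a rough sketch that groups rows by `frame_id` and emits one dictionary per frame with the keys `"id"`, `"x"`, `"a"`, and `"e"` mentioned earlier. The fully connected adjacency, the choice of pairwise distance as the only edge feature, and the names `frames_to_graphs` and `NODE_FEATURES` are all assumptions for illustration, not the behavior of `BasketballGraphConverter`.

```python
import math
from collections import defaultdict

# Hypothetical node feature order; missing features default to 0.0.
NODE_FEATURES = ("x", "y", "speed", "direction", "acceleration")


def frames_to_graphs(rows):
    """Group long-format rows (one per player per frame) into per-frame graphs."""
    by_frame = defaultdict(list)
    for row in rows:
        by_frame[row["frame_id"]].append(row)

    graphs = []
    for frame_id, players in sorted(by_frame.items()):
        n = len(players)
        # Node feature matrix: one row per player.
        x = [[float(p.get(f, 0.0)) for f in NODE_FEATURES] for p in players]
        # Assumed adjacency: every player connected to every other, no self-loops.
        a = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
        # Assumed edge feature: Euclidean distance, in row-major order of `a`.
        e = [
            [math.dist((pi["x"], pi["y"]), (pj["x"], pj["y"]))]
            for i, pi in enumerate(players)
            for j, pj in enumerate(players)
            if i != j
        ]
        graphs.append({"id": frame_id, "x": x, "a": a, "e": e})
    return graphs


rows = [
    {"frame_id": 1, "player": "A", "x": 0.0, "y": 0.0, "speed": 1.0},
    {"frame_id": 1, "player": "B", "x": 3.0, "y": 4.0, "speed": 2.0},
    {"frame_id": 2, "player": "A", "x": 1.0, "y": 1.0},
]
graphs = frames_to_graphs(rows)
```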
4. Integration and Testing
Integration:
The new files are integrated into the project structure with the necessary imports added in the corresponding `__init__.py` files.
Testing:
Basic functionality has been verified. Additional tests, similar to those in the American Football module, will be added.
Conclusion
This pull request lays the groundwork for processing NBA tracking data by providing a framework for loading the data and converting it into graph representations. I've also added dynamic features such as velocity, movement direction, and acceleration.