Add Basic Basketball Module for NBA Tracking Data #26

not-heavychevy · 2025-04-10T19:40:52Z

Description

This pull request introduces a new module, basketball, designed to process NBA tracking data. The implementation mirrors the approach used for the American Football (BigDataBowl) module by leveraging a Polars back-end and following the established unravelsports structure.

Key Changes

1. New Directory Structure (`unravel/basketball/`)

__init__.py
Exports the module components.
dataset/ Directory:
- __init__.py
- dataset.py
  Implements the BasketballDataset class. This class supports two data-loading modes:
  - URL Mode:
    Downloads a 7zip archive from the provided URL, extracts a JSON file, and loads the data.
  - Local/Game Identifier Mode:
    Loads a JSON file from the local data/nba/ directory using a game identifier (e.g., "Celtics@Lakers").
  The class processes the raw data into a Polars DataFrame with columns such as game_id, frame_id, team, player, x, and y. Additional fields (e.g., quarter, game_clock, shot_clock) are also incorporated if available.
graphs/ Directory:
- __init__.py
- pitch_dimensions.py
  Contains the BasketballPitchDimensions class, which defines court dimensions (length, width, three-point radius, basket position) using official NBA standards.
- graph_settings.py
  Contains the BasketballGraphSettings class, which stores conversion parameters including:
  - Options for normalizing coordinates.
  - Settings for constructing the adjacency matrix.
  - Maximum speeds for players and the ball.
  - Team value indicators (e.g., for defending or attacking).
- graph_converter.py
  Implements the BasketballGraphConverter class. This class:
  - Normalizes coordinates based on the court dimensions.
  - Computes node features using the normalized x and y coordinates.
  - Builds the adjacency matrix by comparing team values.
  - Calculates simple edge features (Euclidean distances between nodes).
  - Groups the data by frame_id to create a graph dictionary with keys such as "id", "x" (node features), "a" (adjacency matrix), and "e" (edge features).
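
The converter steps listed above can be sketched for a single frame roughly as follows. This is an illustration only, not the code in this PR: `frame_to_graph` is a hypothetical helper, the sketch uses NumPy rather than Polars for brevity, it assumes coordinates are in feet with the origin at a court corner, and it keeps self-loops in the adjacency matrix. The 94 ft by 50 ft court size is the official NBA dimension.

```python
import numpy as np

# Official NBA court size in feet.
COURT_LENGTH_FT = 94.0
COURT_WIDTH_FT = 50.0


def frame_to_graph(frame_id, xs, ys, teams):
    """Build one graph dict for one frame (hypothetical helper, not the PR's API)."""
    # Node features: x and y scaled to [0, 1] by the court dimensions.
    xy = np.column_stack(
        [
            np.asarray(xs, dtype=float) / COURT_LENGTH_FT,
            np.asarray(ys, dtype=float) / COURT_WIDTH_FT,
        ]
    )
    # Adjacency: connect two nodes when their team values match (self-loops included).
    teams = np.asarray(teams)
    a = (teams[:, None] == teams[None, :]).astype(np.int8)
    # Edge features: Euclidean distance between the endpoints of every edge.
    src, dst = np.nonzero(a)
    e = np.linalg.norm(xy[src] - xy[dst], axis=1, keepdims=True)
    return {"id": frame_id, "x": xy, "a": a, "e": e}


graph = frame_to_graph(
    frame_id=1,
    xs=[0.0, 47.0, 94.0],
    ys=[0.0, 25.0, 50.0],
    teams=["home", "home", "away"],
)
```

In the actual converter the same steps would run once per frame_id group, producing one such dictionary per frame.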

2. Dynamic Feature Computations

In addition to the basic positional data, this pull request adds new dynamic computations to enrich the dataset with motion-related features. For each record, the following calculations are performed:

Coordinate Differences (dx, dy):
The change in the x and y coordinates is computed for each frame relative to the previous frame.
Velocity Components (vx, vy):
These are calculated by dividing the coordinate differences (dx, dy) by the time difference (dt). The time difference is derived from the game_clock column if available, or a default value of 1 is used.
Speed (speed):
The speed magnitude is calculated using the Euclidean norm (i.e., the square root of the sum of the squares of vx and vy).
Direction (direction):
The movement direction is computed using the arctan2 function, which returns the angle (in radians) representing the direction of movement.
Acceleration (acceleration):
Acceleration is determined as the difference in speed between the current and the previous frame, divided by the time difference (dt_acc) between these frames.

Thus, besides the basic positional coordinates, the dataset now includes dynamic features reflecting players' speed, movement direction, and acceleration.

3. Adherence to Issue Requirements

Data Loading:
The implementation supports loading data via both a URL (from a 7zip archive) and a game identifier or local file path.
Data Structure:
A Polars DataFrame is used as the primary data structure, where each row represents a player's record for a specific frame.
Essential Classes:
- BasketballDataset
- BasketballPitchDimensions
- BasketballGraphSettings
- BasketballGraphConverter
These classes are implemented to parallel existing modules (e.g., for American Football), ensuring consistency across unravelsports.
Dynamic Feature Enhancements:
The new dynamic computations (dx, dy, vx, vy, speed, direction, acceleration) are integrated into the conversion process, enriching the dataset for more in-depth analysis and modeling.

4. Integration and Testing

Integration:
The new files are integrated into the project structure with the necessary imports added in the corresponding __init__.py files.
Testing:
Basic functionality has been verified. Additional tests—similar to those in the American Football module—will be added to ensure:
- Correct data loading.
- Accuracy of both static (positional) and dynamic feature calculations.
- Proper normalization and formatting of node and edge features.

Conclusion

This pull request lays the groundwork for processing NBA tracking data: it provides a framework for loading tracking data and converting it into graph representations, and it adds dynamic features such as velocity, movement direction, and acceleration.

UnravelSports · 2025-04-11T07:18:50Z

Hi @not-heavychevy, thanks for the awesome PR!

I think before we merge this we need to make sure the behavior of BasketballDataset, BasketballGraphConverter, BasketballGraphSettings and BasketballPitchDimensions is aligned with the existing behavior of the other sports to ensure similar API behavior across the package.

This means, for example:

BasketballDataset:
- Should be a dataclass and inherit from DefaultDataset
  - note: I like the inclusion of get_dataframe you made, but that should for example live in the DefaultDataset such that all Dataset classes can use this.
- The source parameter should probably be named differently (e.g. tracking_data).
- We should probably start using FileLike (an import from kloppy, which is already a dependency anyway). This would allow us to load any file type out of the box (e.g. from github, from local, from cloud etc.), and it already has support for compressed files like 7z. This way we don't have to implement this again ourselves. To see how this might work, check out this example of loading directly from github
- Should take some of the parameters (that is, max_player_speed, max_ball_speed, max_player_acceleration, max_ball_acceleration, orient_ball_owning and sample_rate)
- self.load() should be moved inside the __post_init__. This was a change made in 0.3.0, because we can assume that if someone calls the dataset, they also want to load it.
- The dataset should have a self.settings that relies on DefaultSettings.
  - note: this would require pitch_dimensions inside DefaultSettings to also take a BasketballPitchDimensions object.
  - And, it would require the creation of this BasketballPitchDimensions class similar to AmericanFootballPitchDimensions.
- The dataset should thus have self.data (a Polars dataframe) and self.settings. I'm fine with adding get_dataframe and get_settings options, but these should be added to DefaultDataset
- The dataset should have its own add_dummy_labels and add_graph_ids.
BasketballGraphConverter:
- Should be modeled after AmericanFootballGraphConverter. This means amongst other things
  - It should be using Polars, not a pandas dataframe like we do now.
  - It should have a __exprs_variables property
  - It should have a compute method
  - Applying the settings
  - Doing specific checks to ensure the data is correct
  - Allowing passing a list of graph_feature_cols
  - We should use a Group, Column and Constant approach to ensure consistency and reduce errors.
  - etc.
- As you can see, to_graph_frames is currently different in AmericanFootballGraphConverter and SoccerGraphConverterPolars. Preferably we use the one in SoccerGraphConverterPolars because it's faster and cleaner. This is something that still needs to be updated in AmericanFootballGraphConverter.
BasketballGraphSettings should inherit from DefaultGraphSettings.
- note: DefaultGraphSettings still has parameters max_player_speed, max_ball_speed, max_player_acceleration and max_ball_acceleration. These should be ignored and will be deprecated at some point together with the regular SoccerGraphConverter (non-Polars).
BasketballPitchDimensions should behave like AmericanFootballPitchDimensions as mentioned above.

Additionally, I have a question that I can't check right now: what is the current behavior of the "events"? I see they are getting parsed from the data, but where do they go? For example, are they joined to the correct frame somewhere, or stored in a separate dataframe? I'm asking because in the other sports the Events (like passes, snaps, shots etc.) can be used to create labels for the tracking data to train on.

- Add `data` and `settings` fields (with `init=False`) to DefaultDataset base class
- Implement `get_dataframe()` to return `self.data` or `None` if not loaded
- Implement `get_settings()` to return `self.settings`
- Preserve abstract `load()`, `add_graph_ids()`, and `add_dummy_labels()` methods for subclasses
- Remove duplicated `get_dataframe`/`get_settings` definitions from individual Dataset classes
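
A minimal sketch of the base-class layout these points describe, combined with the earlier request to call `load()` from `__post_init__`. `DefaultDatasetSketch` and `ToyDataset` are hypothetical stand-ins and not the actual unravelsports classes; a plain list stands in for the Polars DataFrame and a dict for the settings object.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DefaultDatasetSketch(ABC):
    """Hypothetical stand-in for the shared dataset base class."""

    # init=False keeps these out of __init__; load() is expected to fill them.
    data: Optional[Any] = field(init=False, default=None)
    settings: Optional[Any] = field(init=False, default=None)

    def get_dataframe(self) -> Optional[Any]:
        # Returns None until load() has populated self.data.
        return self.data

    def get_settings(self) -> Optional[Any]:
        return self.settings

    @abstractmethod
    def load(self) -> Any: ...

    @abstractmethod
    def add_graph_ids(self) -> Any: ...

    @abstractmethod
    def add_dummy_labels(self) -> Any: ...


@dataclass
class ToyDataset(DefaultDatasetSketch):
    """Hypothetical subclass; a list of rows stands in for tracking data."""

    tracking_data: list

    def __post_init__(self) -> None:
        # Creating the dataset loads it immediately.
        self.load()

    def load(self) -> list:
        self.data = list(self.tracking_data)
        self.settings = {"n_rows": len(self.data)}
        return self.data

    def add_graph_ids(self) -> list:
        # Placeholder rule: one graph id per frame.
        self.data = [{**row, "graph_id": row["frame_id"]} for row in self.data]
        return self.data

    def add_dummy_labels(self) -> list:
        # Placeholder rule: every row gets label 0.
        self.data = [{**row, "label": 0} for row in self.data]
        return self.data


ds = ToyDataset(tracking_data=[{"frame_id": 1}, {"frame_id": 2}])
```

With the accessors defined once on the base class, each sport-specific dataset only needs to implement the three abstract methods.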

UnravelSports · 2025-04-27T05:56:14Z

Thanks @not-heavychevy.

I only just noticed that the kloppy.io open_as_file does not support 7z! So, leave that for now if you haven't found a good solution to that yet.

not-heavychevy · 2025-05-01T08:00:49Z

Thanks @UnravelSports for your follow-up!
I double-checked the current Kloppy open_as_file helpers as well and couldn’t find native 7z extraction either, so I kept the small py7zr fallback we already had for now.

Let me know if there are any other changes you’d like before this can be merged, or if the implementation is fine as it stands. I’m happy to tweak anything that’s still missing.

not-heavychevy added 9 commits April 10, 2025 14:18

added BasketballDataset class

bb7d854

added BasketballPitchDimensions class

2abeeff

added graph settings

bd59522

added optimized graph converter

8a83938

added ball handling

f5071c6

added init files

26d6d85

bugfix dataset load() bug

f2d164b

added tests

d86c0af

added additional fields computation

d1c0c73

not-heavychevy added 20 commits April 12, 2025 18:38

BasketballDataset inherits from DefaultDataset

64f5ee3

bugfix

835cd59

files read with kloppy.io

98f09ae

added norm parameters

0502aa7

refactor: move get_dataframe to DefaultDataset

d2f6b52

created post_init

53ea444

added self.settings to BasketballDataset

3482bf9

added add_dummy_labels and add_graph_ids

51a6657

rewritten tests for dataset.py

1352f80

Refactor BasketballPitchDimensions

b0fc5c1

added tests for BasketballPitchDimensions

1e04bfd

Refactor BasketballGraphSettings

627fae8

added tests for BasketballGraphSettings

1bdd740

Merge PitchDimensions and GraphSettings

7c64156

graph_settings test update

a70739c

import bugs fix

ebe0914

graph_converter refactoring

2dcd3fb

dataset separator bugfix

4b96024

added tests for graph_converter

af3a02a

moved the functionality to “features”

8a47337

not-heavychevy added 7 commits April 26, 2025 16:07

tests update

633afca

tests fix

7463b1e

Deprecate speed/acceleration thresholds

dcfa8e4

Refactor _convert to use polars methods

7eb2081

Add unified graph-export API to GraphConverter

b0b9d72

added new tests for public export API

e55d30e
