@not-heavychevy

Description

This pull request introduces a new module, basketball, designed to process NBA tracking data. The implementation mirrors the approach used for the American Football (BigDataBowl) module by leveraging a Polars back-end and following the established unravelsports structure.

Key Changes

1. New Directory Structure (unravel/basketball/)

  • __init__.py
    Exports the module components.

  • dataset/ Directory:

    • __init__.py

    • dataset.py
      Implements the BasketballDataset class. This class supports two data-loading modes:

      • URL Mode:
        Downloads a 7zip archive from the provided URL, extracts a JSON file, and loads the data.
      • Local/Game Identifier Mode:
        Loads a JSON file from the local data/nba/ directory using a game identifier (e.g., "Celtics@Lakers").

      The class processes the raw data into a Polars DataFrame with columns such as game_id, frame_id, team, player, x, and y. Additional fields (e.g., quarter, game_clock, shot_clock) are also incorporated if available.

  • graphs/ Directory:

    • __init__.py
    • pitch_dimensions.py
      Contains the BasketballPitchDimensions class, which defines court dimensions (length, width, three-point radius, basket position) using official NBA standards.
    • graph_settings.py
      Contains the BasketballGraphSettings class, which stores conversion parameters including:
      • Options for normalizing coordinates.
      • Settings for constructing the adjacency matrix.
      • Maximum speeds for players and the ball.
      • Team value indicators (e.g., for defending or attacking).
    • graph_converter.py
      Implements the BasketballGraphConverter class. This class:
      • Normalizes coordinates based on the court dimensions.
      • Computes node features using the normalized x and y coordinates.
      • Builds the adjacency matrix by comparing team values.
      • Calculates simple edge features (Euclidean distances between nodes).
      • Groups the data by frame_id to create a graph dictionary with keys such as "id", "x" (node features), "a" (adjacency matrix), and "e" (edge features).
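
The conversion steps listed for graph_converter.py can be sketched in plain Python. This is only an illustration, not the code in this pull request: the function name frames_to_graphs is hypothetical, rows are plain dictionaries rather than a Polars DataFrame, and it assumes that "comparing team values" means two nodes are connected when their team values are equal (so each node is also connected to itself). The 94 ft by 50 ft court size is the official NBA standard.

```python
import math
from collections import defaultdict

# Official NBA court size in feet; used here only to scale coordinates to [0, 1].
COURT_LENGTH_FT = 94.0
COURT_WIDTH_FT = 50.0


def frames_to_graphs(rows):
    """Turn flat tracking rows into one graph dictionary per frame_id.

    Hypothetical helper: each row is a dict with frame_id, team, x and y.
    """
    by_frame = defaultdict(list)
    for row in rows:
        by_frame[row["frame_id"]].append(row)

    graphs = []
    for frame_id in sorted(by_frame):
        nodes = by_frame[frame_id]
        # Node features: coordinates normalized by the court dimensions.
        x = [[n["x"] / COURT_LENGTH_FT, n["y"] / COURT_WIDTH_FT] for n in nodes]
        # Adjacency (assumption): 1 when two nodes share the same team value.
        a = [[1 if p["team"] == q["team"] else 0 for q in nodes] for p in nodes]
        # Edge features: Euclidean distance between normalized positions.
        e = [[math.dist(p, q) for q in x] for p in x]
        graphs.append({"id": frame_id, "x": x, "a": a, "e": e})
    return graphs


rows = [
    {"frame_id": 0, "team": "home", "player": "A", "x": 47.0, "y": 25.0},
    {"frame_id": 0, "team": "home", "player": "B", "x": 94.0, "y": 25.0},
    {"frame_id": 0, "team": "away", "player": "C", "x": 47.0, "y": 50.0},
    {"frame_id": 1, "team": "home", "player": "A", "x": 48.0, "y": 25.0},
]
graphs = frames_to_graphs(rows)
print(graphs[0]["a"])  # adjacency matrix for frame 0
```

Keeping one dictionary per frame mirrors the "id", "x", "a", "e" keys described above; a Polars implementation would do the grouping with group_by on frame_id instead of a Python loop.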

2. Dynamic Feature Computations

In addition to the basic positional data, this pull request adds new dynamic computations to enrich the dataset with motion-related features. For each record, the following calculations are performed:

  • Coordinate Differences (dx, dy):
    The change in the x and y coordinates is computed for each frame relative to the previous frame.

  • Velocity Components (vx, vy):
    These are calculated by dividing the coordinate differences (dx, dy) by the time difference (dt). The time difference is derived from the game_clock column if available, or a default value of 1 is used.

  • Speed (speed):
    The speed magnitude is calculated using the Euclidean norm (i.e., the square root of the sum of the squares of vx and vy).

  • Direction (direction):
    The movement direction is computed using the arctan2 function, which returns the angle (in radians) representing the direction of movement.

  • Acceleration (acceleration):
    Acceleration is determined as the difference in speed between the current and the previous frame, divided by the time difference (dt_acc) between these frames.

Thus, besides the basic positional coordinates, the dataset now includes dynamic features reflecting players' speed, movement direction, and acceleration.
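
As a rough illustration of the calculations described above, here is a minimal sketch for a single player's track, written in plain Python rather than Polars so the arithmetic stays visible. The function name add_dynamic_features is hypothetical. It assumes game_clock holds the seconds remaining (so it counts down), it reuses the same time difference for acceleration instead of a separate dt_acc, and it leaves the values of the first frame as None because there is no previous frame to compare against.

```python
import math


def add_dynamic_features(track, default_dt=1.0):
    """Add dx, dy, vx, vy, speed, direction and acceleration to each frame.

    `track` is a list of dicts for one player, sorted by frame, each with
    "x", "y" and optionally "game_clock" (assumed to count down in seconds).
    """
    out = []
    prev = None
    for row in track:
        cur = dict(row)
        if prev is None:
            # First frame: nothing to difference against, so leave values empty.
            for key in ("dx", "dy", "vx", "vy", "speed", "direction", "acceleration"):
                cur[key] = None
        else:
            dx = cur["x"] - prev["x"]
            dy = cur["y"] - prev["y"]
            # Use the game clock when available, otherwise the default of 1.
            dt = default_dt
            if "game_clock" in cur and "game_clock" in prev:
                elapsed = prev["game_clock"] - cur["game_clock"]
                if elapsed > 0:
                    dt = elapsed
            vx, vy = dx / dt, dy / dt
            speed = math.hypot(vx, vy)  # Euclidean norm of (vx, vy)
            prev_speed = prev["speed"]
            cur.update(
                dx=dx, dy=dy, vx=vx, vy=vy, speed=speed,
                direction=math.atan2(vy, vx),  # angle in radians
                acceleration=None if prev_speed is None else (speed - prev_speed) / dt,
            )
        out.append(cur)
        prev = cur
    return out


track = [
    {"x": 0.0, "y": 0.0, "game_clock": 10.0},
    {"x": 3.0, "y": 4.0, "game_clock": 9.5},
    {"x": 9.0, "y": 12.0, "game_clock": 9.0},
]
for frame in add_dynamic_features(track):
    print(frame["speed"], frame["acceleration"])
```

In a Polars version the same differences would be taken per player (for example with a window over the player column), so that one player's first frame is never compared with another player's last frame.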

3. Adherence to Issue Requirements

  • Data Loading:
    The implementation supports loading data via both a URL (from a 7zip archive) and a game identifier or local file path.

  • Data Structure:
    A Polars DataFrame is used as the primary data structure, where each row represents a player's record for a specific frame.

  • Essential Classes:

    • BasketballDataset
    • BasketballPitchDimensions
    • BasketballGraphSettings
    • BasketballGraphConverter

    These classes are implemented to parallel existing modules (e.g., for American Football), ensuring consistency across unravelsports.

  • Dynamic Feature Enhancements:
    The new dynamic computations (dx, dy, vx, vy, speed, direction, acceleration) are integrated into the conversion process, enriching the dataset for more in-depth analysis and modeling.

4. Integration and Testing

  • Integration:
    The new files are integrated into the project structure with the necessary imports added in the corresponding __init__.py files.

  • Testing:
    Basic functionality has been verified. Additional tests, similar to those in the American Football module, will be added to ensure:

    • Correct data loading.
    • Accuracy of both static (positional) and dynamic feature calculations.
    • Proper normalization and formatting of node and edge features.

Conclusion

This pull request lays the groundwork for processing NBA tracking data by providing a framework for loading the data and converting it into graph representations. I've also added dynamic features such as velocity, movement direction, and acceleration.

@UnravelSports
Owner

UnravelSports commented Apr 11, 2025

Hi @not-heavychevy, thanks for the awesome PR!

I think before we merge this we need to make sure the behavior of BasketballDataset, BasketballGraphConverter, BasketballGraphSettings and BasketballPitchDimensions is aligned with the existing behavior of the other sports to ensure similar API behavior across the package.

This means, for example:

  • BasketballDataset:
    • Should be a dataclass and inherit from DefaultDataset
      • note: I like the inclusion of get_dataframe you made, but that should live in DefaultDataset, for example, so that all Dataset classes can use it.
    • The source parameter should probably be named differently (e.g. tracking_data).
    • We should probably start using FileLike (an import from kloppy, which is already a dependency anyway). This would allow us to load any file type out of the box (e.g. from GitHub, from local, from cloud etc.), and it already has support for compressed files like 7z. This way we don't have to implement this again ourselves. For an example of how this might work check out this example of loading directly from GitHub.
    • Should take some of the settings parameters (namely max_player_speed, max_ball_speed, max_player_acceleration, max_ball_acceleration, orient_ball_owning and sample_rate).
    • self.load() should be moved inside the __post_init__. This was a change made in 0.3.0, because we can assume that if someone calls the dataset, they also want to load it.
    • The dataset should have a self.settings that relies on DefaultSettings.
      • note: this would require pitch_dimensions inside DefaultSettings to also accept a BasketballPitchDimensions object.
      • It would also require creating this BasketballPitchDimensions class, similar to AmericanFootballPitchDimensions.
    • The dataset should thus have self.data (a Polars dataframe) and self.settings. I'm fine with adding get_dataframe and get_settings options, but these should be added to DefaultDataset.
    • The dataset should have its own add_dummy_labels and add_graph_ids.
  • BasketballGraphConverter:
    • Should be modeled after AmericanFootballGraphConverter. This means amongst other things
      • It should be using Polars, not a pandas dataframe like we do now.
      • It should have a __exprs_variables property
      • It should have a compute method
      • Applying the settings
      • Doing specific checks to ensure the data is correct
      • Allowing passing a list of graph_feature_cols
      • We should use a Group, Column and Constant approach to ensure consistency and reduce errors.
      • etc.
    • As you can see, to_graph_frames currently differs between AmericanFootballGraphConverter and SoccerGraphConverterPolars. Preferably we use the one in SoccerGraphConverterPolars because it's faster and cleaner. This is something that still needs to be updated in AmericanFootballGraphConverter.
  • BasketballGraphSettings should inherit from DefaultGraphSettings.
    • note: DefaultGraphSettings still has the parameters max_player_speed, max_ball_speed, max_player_acceleration and max_ball_acceleration. These should be ignored and will be deprecated at some point together with the regular SoccerGraphConverter (non-Polars).
  • BasketballPitchDimensions should behave like AmericanFootballPitchDimensions as mentioned above.

Additionally, I have a question that I can't check right now: what is the current behavior of the "events"? I see they are being parsed from the data, but where do they go? For example, are they joined to the correct frame somewhere, or stored in a separate dataframe? I'm asking because in the other sports the Events (like passes, snaps, shots etc.) can be used to create labels for the tracking data to train on.

- Add `data` and `settings` fields (with `init=False`) to DefaultDataset base class
- Implement `get_dataframe()` to return `self.data` or `None` if not loaded
- Implement `get_settings()` to return `self.settings`
- Preserve abstract `load()`, `add_graph_ids()`, and `add_dummy_labels()` methods for subclasses
- Remove duplicated `get_dataframe`/`get_settings` definitions from individual Dataset classes
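
For readers following along, the shape of the base class described in the bullets above might look roughly like the sketch below. It is not the actual unravelsports code: the field types are loosened to Any (the real data field holds a Polars DataFrame), and ToyDataset is a hypothetical subclass that exists only to show load() being called from __post_init__, as requested in the review.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DefaultDataset(ABC):
    # Populated by load(), so both fields are excluded from __init__.
    data: Optional[Any] = field(init=False, default=None)
    settings: Optional[Any] = field(init=False, default=None)

    def get_dataframe(self) -> Optional[Any]:
        """Return the loaded data, or None if nothing has been loaded."""
        return self.data

    def get_settings(self) -> Optional[Any]:
        """Return the settings object created while loading."""
        return self.settings

    @abstractmethod
    def load(self): ...

    @abstractmethod
    def add_graph_ids(self): ...

    @abstractmethod
    def add_dummy_labels(self): ...


@dataclass
class ToyDataset(DefaultDataset):
    """Hypothetical subclass, only here to exercise the base class."""

    tracking_data: list

    def __post_init__(self):
        # Loading on construction, as suggested in the review.
        self.load()

    def load(self):
        self.data = list(self.tracking_data)
        self.settings = {"n_rows": len(self.data)}
        return self

    def add_graph_ids(self):
        return self

    def add_dummy_labels(self):
        return self


dataset = ToyDataset(tracking_data=[{"frame_id": 0}, {"frame_id": 1}])
print(dataset.get_dataframe(), dataset.get_settings())
```

Because data and settings are declared with init=False, subclasses can still add required constructor arguments such as tracking_data without running into the dataclass rule about non-default fields following default ones.
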
@UnravelSports
Owner

Thanks @not-heavychevy.

I only just noticed that the kloppy.io open_as_file does not support 7z! So, leave that for now if you haven't found a good solution to that yet.

@not-heavychevy
Author

Thanks @UnravelSports for your follow-up!
I double-checked the current Kloppy open_as_file helpers as well and couldn’t find native 7z extraction either, so I kept the small py7zr fallback we already had for now.

Let me know if there are any other changes you’d like before this can be merged, or if the implementation is fine as it stands. I’m happy to tweak anything that’s still missing.
