This project implements a GPU-accelerated database management system that processes SQL queries using both CPU and GPU execution paths. The system uses DuckDB as the query optimizer and planner, while implementing custom GPU kernels for query execution.
- CUDA Toolkit (version 11.0 or higher)
- CMake (version 3.10 or higher)
- C++17 compatible compiler
- WSL2 (if running on Windows)
dbms/
├── data/ # Contains CSV data files and query files
├── include/ # Header files
│ ├── constants/ # Database constants
│ ├── csv_parser/ # CSV parsing utilities
│ ├── dbms/ # Core DBMS components
│ ├── kernels/ # CUDA kernels
│ └── physical_plan/ # Physical plan execution
├── src/ # Source files
├── vendor/ # Third-party dependencies
│ └── duckdb/ # DuckDB source code
└── CMakeLists.txt # Build configuration
- Clone DuckDB into the vendor directory:
cd vendor
git clone https://github.com/duckdb/duckdb.git
cd duckdb
- Build DuckDB:
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j
- Return to the project root, then create a build directory and navigate to it:
mkdir build
cd build
- Configure and build the project:
cmake ..
make

The project requires two command-line arguments:
- Path to the data folder containing CSV files
- Path to the query file
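The argument handling described above could be sketched as follows. This is a minimal illustration, not the project's actual code: the `CliArgs` struct and `parse_args` function are hypothetical names, and the error messages are assumptions.

```cpp
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical holder for the two required arguments.
struct CliArgs {
    fs::path data_dir;    // folder containing the CSV files
    fs::path query_file;  // text file containing the SQL query
};

// Returns the parsed arguments, or std::nullopt (with a reason in `error`)
// when the argument count is wrong or a path does not exist.
std::optional<CliArgs> parse_args(int argc, const char* const argv[], std::string& error) {
    if (argc != 3) {  // program name + two arguments
        error = "usage: PC-project <data_folder> <query_file>";
        return std::nullopt;
    }
    CliArgs args{argv[1], argv[2]};
    if (!fs::is_directory(args.data_dir)) {
        error = "data folder not found: " + args.data_dir.string();
        return std::nullopt;
    }
    if (!fs::is_regular_file(args.query_file)) {
        error = "query file not found: " + args.query_file.string();
        return std::nullopt;
    }
    return args;
}
```

A `main` function would call `parse_args(argc, argv, error)` first and exit with a non-zero status, printing `error`, when no value is returned.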
Example usage:
./build/PC-project /path/to/data/folder /path/to/query.txt
For example:
./build/PC-project data data/query2.txt

The program will:
- Import CSV files from the specified data folder
- Execute the query from the specified query file
- Run both CPU and GPU execution paths
- Display performance metrics using the built-in profiler
- Generate output files in the current directory:
Team9_<query_filename>.csv: GPU execution results
For example, if you run with query1.txt, the following file will be created:
Team9_query1.csv
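The naming rule above can be expressed in a few lines of C++17. This is a sketch under the assumption that the name is derived from the query file's stem (file name without directory or extension); `output_file_name` is a hypothetical helper, not necessarily what the project uses.

```cpp
#include <filesystem>
#include <string>

// Derives the result file name from the query file path:
// the directory and extension are dropped and "Team9_" is prepended.
std::string output_file_name(const std::filesystem::path& query_file) {
    return "Team9_" + query_file.stem().string() + ".csv";
}
```

With this helper, `output_file_name("data/query2.txt")` yields `Team9_query2.csv`, matching the pattern shown above.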
The system expects CSV files in the following format:
- Each table should have a corresponding CSV file in the data directory
- CSV files should be named according to their table names (e.g., table_1.csv)
- The first row should contain column names
- Data should be comma-separated
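To make the expected layout concrete, the sketch below shows one way a table name and its column names could be read from such a file. Both function names are hypothetical, and the splitter is deliberately simple: it assumes plain comma-separated values with no quoted fields or embedded commas.

```cpp
#include <filesystem>
#include <sstream>
#include <string>
#include <vector>

// The table name is the file name without its extension,
// e.g. "data/table_1.csv" -> "table_1".
std::string table_name_from_path(const std::filesystem::path& csv_file) {
    return csv_file.stem().string();
}

// Splits the header row on commas. Quoted fields are NOT handled;
// a trailing '\r' (Windows line endings) is stripped first.
std::vector<std::string> split_header(const std::string& header_line) {
    std::string line = header_line;
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
    std::vector<std::string> columns;
    std::istringstream stream(line);
    std::string column;
    while (std::getline(stream, column, ',')) {
        columns.push_back(column);
    }
    return columns;
}
```

For a file `data/table_1.csv` whose first row is `id,value`, these helpers would give the table name `table_1` and the columns `id` and `value`.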
Queries should be written in standard SQL format and saved in a text file. The system supports:
- SELECT statements
- JOIN operations
- WHERE clauses
- Basic aggregations
Example query file (query1.txt):
SELECT * FROM table_1 JOIN table_4 ON table_1.id = table_4.id WHERE table_1.value > 100;

The system includes a built-in profiler that measures:
- Total execution time
- CSV import time
- Logical plan generation time
- Physical plan generation time
- CPU execution time
- GPU execution time
The profiler output will be displayed after each run.
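For readers curious how per-phase timing of this kind can be collected, here is a minimal host-side sketch using `std::chrono`. The `Profiler` class and the phase names are hypothetical and are not taken from the project's source.

```cpp
#include <chrono>
#include <iostream>
#include <map>
#include <string>

// Hypothetical profiler: accumulates elapsed milliseconds per named phase.
class Profiler {
public:
    using Clock = std::chrono::steady_clock;

    void start(const std::string& phase) { started_[phase] = Clock::now(); }

    void stop(const std::string& phase) {
        const auto it = started_.find(phase);
        if (it == started_.end()) return;  // stop() without start(): ignore
        const std::chrono::duration<double, std::milli> elapsed =
            Clock::now() - it->second;
        totals_ms_[phase] += elapsed.count();
        started_.erase(it);
    }

    // Accumulated time for a phase, or 0 if it was never measured.
    double total_ms(const std::string& phase) const {
        const auto it = totals_ms_.find(phase);
        return it == totals_ms_.end() ? 0.0 : it->second;
    }

    // Prints one line per measured phase.
    void report(std::ostream& out) const {
        for (const auto& [phase, ms] : totals_ms_) {
            out << phase << ": " << ms << " ms\n";
        }
    }

private:
    std::map<std::string, Clock::time_point> started_;
    std::map<std::string, double> totals_ms_;
};
```

A caller would wrap each stage, for example `profiler.start("csv_import")` before loading and `profiler.stop("csv_import")` after, then call `profiler.report(std::cout)` at the end of the run. Because CUDA kernel launches are asynchronous, a host-side timer like this only measures GPU work correctly if the device is synchronized (or CUDA events are used) before `stop` is called.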