GPU-Accelerated Database Management System

This project implements a GPU-accelerated database management system that processes SQL queries using both CPU and GPU execution paths. The system uses DuckDB as the query optimizer and planner, while implementing custom GPU kernels for query execution.

Prerequisites

CUDA Toolkit (version 11.0 or higher)
CMake (version 3.10 or higher)
C++17 compatible compiler
WSL2 (if running on Windows)

Project Structure

dbms/
├── data/                # Contains CSV data files and query files
├── include/             # Header files
│   ├── constants/       # Database constants
│   ├── csv_parser/      # CSV parsing utilities
│   ├── dbms/            # Core DBMS components
│   ├── kernels/         # CUDA kernels
│   └── physical_plan/   # Physical plan execution
├── src/                 # Source files
├── vendor/              # Third-party dependencies
│   └── duckdb/          # DuckDB source code
└── CMakeLists.txt       # Build configuration

Setting Up DuckDB

Clone DuckDB into the vendor directory (create the directory first if it does not exist):

mkdir -p vendor
cd vendor
git clone https://github.com/duckdb/duckdb.git
cd duckdb

Build DuckDB:

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)

Building the Project

From the project root, create a build directory and navigate to it:

mkdir build
cd build

Configure and build the project:

cmake ..
make

Running the Project

The project requires two command-line arguments:

Path to the data folder containing CSV files
Path to the query file

Example usage (run from the project root):

./build/PC-project /path/to/data/folder /path/to/query.txt

For example:

./build/PC-project data data/query2.txt
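The two-argument contract above can be checked before any work starts. The following is a minimal sketch, not the project's actual entry point: `RunConfig` and `parse_args` are hypothetical names introduced here for illustration, and the sketch only validates the argument count.

```cpp
#include <filesystem>
#include <optional>

// Hypothetical container for the two required arguments.
struct RunConfig {
    std::filesystem::path data_dir;    // folder containing the CSV files
    std::filesystem::path query_file;  // text file containing the SQL query
};

// Hypothetical helper: returns the parsed arguments, or std::nullopt when
// the argument count is wrong (argv[0] is the program name, so argc must be 3).
std::optional<RunConfig> parse_args(int argc, const char* const argv[]) {
    if (argc != 3) {
        return std::nullopt;
    }
    return RunConfig{argv[1], argv[2]};
}
```

A caller would typically print a usage message and exit with a non-zero status when `parse_args` returns no value.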

The program will:

Import CSV files from the specified data folder
Execute the query from the specified query file
Run both CPU and GPU execution paths
Display performance metrics using the built-in profiler
Generate an output file in the current directory:
- Team9_<query_name>.csv: GPU execution results (<query_name> is the query file name without its extension)

For example, if you run with query1.txt, the following file will be created:

Team9_query1.csv
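The naming rule can be expressed with `std::filesystem`. This is a sketch that assumes the rule is exactly "prefix, query file name without extension, `.csv`", as the example suggests; `output_file_name` is a hypothetical helper, not a function from the project.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: derives the result file name from the query file path.
// Assumes the "Team9_" prefix and ".csv" suffix described in this section.
std::string output_file_name(const std::filesystem::path& query_path) {
    // stem() drops the directory part and the final extension,
    // e.g. data/query1.txt becomes query1.
    return "Team9_" + query_path.stem().string() + ".csv";
}
```

With this sketch, `output_file_name("data/query2.txt")` yields `Team9_query2.csv`, matching the pattern shown above.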

Data Format

The system expects CSV files in the following format:

Each table should have a corresponding CSV file in the data directory
CSV files should be named according to their table names (e.g., table_1.csv)
The first row should contain column names
Data should be comma-separated
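To illustrate these conventions, the sketch below maps a CSV file to its table name and splits a header row into column names. Both helpers are hypothetical and are not the project's CSV parser. As a simplifying assumption, quoted fields containing commas are not handled.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: the table name is the CSV file name without extension,
// e.g. data/table_1.csv becomes table_1.
std::string table_name_from_file(const std::filesystem::path& csv_path) {
    return csv_path.stem().string();
}

// Hypothetical helper: splits one comma-separated line into its fields.
// Empty fields are preserved; quoted fields are NOT supported in this sketch.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));  // last field on the line
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}
```

Applying `split_csv_line` to the first row of a file gives the column names; applying it to later rows gives the values in the same column order.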

Query Format

Queries should be written in standard SQL format and saved in a text file. The system supports:

SELECT statements
JOIN operations
WHERE clauses
Basic aggregations

Example query file (query1.txt):

SELECT * FROM table_1 JOIN table_4 ON table_1.id = table_4.id WHERE table_1.value > 100;

Performance Profiling

The system includes a built-in profiler that measures:

Total execution time
CSV import time
Logical plan generation time
Physical plan generation time
CPU execution time
GPU execution time

The profiler output will be displayed after each run.
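Per-phase timings like these are commonly collected with a scoped timer. The sketch below shows one possible design using `std::chrono`. `Profiler` and `ScopedTimer` are hypothetical names and do not describe the project's actual profiler.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical profiler: accumulates elapsed milliseconds per named phase.
class Profiler {
public:
    void record(const std::string& phase, double ms) { totals_[phase] += ms; }

    // Returns the accumulated time for a phase, or 0 if it was never recorded.
    double total_ms(const std::string& phase) const {
        const auto it = totals_.find(phase);
        return it == totals_.end() ? 0.0 : it->second;
    }

private:
    std::map<std::string, double> totals_;
};

// Hypothetical RAII timer: measures from construction to destruction and
// reports the elapsed time to the profiler under the given phase name.
class ScopedTimer {
public:
    ScopedTimer(Profiler& profiler, std::string phase)
        : profiler_(profiler),
          phase_(std::move(phase)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start_;
        profiler_.record(phase_, elapsed.count());
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    Profiler& profiler_;
    std::string phase_;
    std::chrono::steady_clock::time_point start_;
};
```

A phase is timed by placing a `ScopedTimer` in the block that performs it. For GPU phases, note that CUDA kernel launches are asynchronous, so a host-side timer like this one only reflects kernel execution time if the device is synchronized (for example with `cudaDeviceSynchronize()`) before the timer goes out of scope.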
