diff --git a/.dockerignore b/.dockerignore
index f995b72..aba4a6b 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -3,6 +3,9 @@
 .gitignoreold
 .dockerignore
 .tool-versions
+compose.yaml
+compose.yml
+docker-compose.yaml
 docker-compose.yml
 envs/
 .venv/
diff --git a/ADMIN_FEATURES.md b/ADMIN_FEATURES.md
index 98de7d2..7112394 100644
--- a/ADMIN_FEATURES.md
+++ b/ADMIN_FEATURES.md
@@ -42,7 +42,7 @@ When you submit a date:
 2. The task runs asynchronously in the background
 3. You'll see confirmation: "{date} submitted"
 4. Monitor progress via:
-   - Celery logs: `docker-compose logs -f celery`
+   - Celery logs: `docker compose logs -f celery`
    - Flower dashboard: http://localhost:5555
 
 ## Scheduled ETL
@@ -66,4 +66,4 @@ Note: The system also runs ETL automatically:
 For more admin capabilities, use:
 - **Flower** (http://localhost:5555): Monitor Celery tasks
 - **Database queries**: Direct PostgreSQL access for data inspection
-- **API endpoints**: Public API for viewing statistics at /api/
\ No newline at end of file
+- **API endpoints**: Public API for viewing statistics at /api/
diff --git a/CLAUDE.md b/CLAUDE.md
index 46500c6..b14cc4b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -106,7 +106,7 @@ PyPIStats.org is a Flask-based web application that provides analytics and visua
 
 ### Setup:
 ```bash
-make pypistats # Launch complete dev environment with docker-compose
+make pypistats # Launch complete dev environment with Docker Compose
 ```
 
 ### Project Structure:
@@ -155,7 +155,7 @@ pypistats/
 ## Deployment
 
 ### Docker:
-- Dockerfile and docker-compose.yml provided
+- Dockerfile and compose.yaml provided
 - docker-entrypoint.sh for container initialization
 
 ### Kubernetes:
@@ -179,4 +179,4 @@ pypistats/
 - All timestamps are in UTC
 - Package statistics exclude known mirror downloads by default
 - Maximum lookback period is 180 days to manage database size
-- Uses BigQuery's public PyPI dataset (bigquery-public-data.pypi.file_downloads)
\ No newline at end of file
+- Uses BigQuery's public PyPI dataset (bigquery-public-data.pypi.file_downloads)
diff --git a/CONFIGURATION.md b/CONFIGURATION.md
index 59338f2..43d2d49 100644
--- a/CONFIGURATION.md
+++ b/CONFIGURATION.md
@@ -49,7 +49,7 @@ Web server configuration that uses:
 - `WEB_CONCURRENCY` for worker count
 - `LOG_LEVEL` for logging verbosity
 
-### Docker Compose (`docker-compose.yml`)
+### Docker Compose (`compose.yaml`)
 Provides default values for local development:
 - PostgreSQL: `admin`/`root` on port 5433
 - Redis: port 6379
diff --git a/ETL_TESTING.md b/ETL_TESTING.md
index 78ffc7e..e8c2c99 100644
--- a/ETL_TESTING.md
+++ b/ETL_TESTING.md
@@ -17,7 +17,7 @@
 
 1. Start the services:
    ```bash
-   docker-compose up -d
+   docker compose up -d
    ```
 
 2. Access the admin panel:
@@ -33,16 +33,16 @@
 
 1. Start all services:
    ```bash
-   docker-compose up -d
+   docker compose up -d
    ```
 
 2. Trigger the ETL task manually:
    ```bash
    # Run ETL for yesterday's data (default)
-   docker-compose exec celery python -c "from pypistats.tasks.pypi import etl; etl.delay()"
+   docker compose exec celery python -c "from pypistats.tasks.pypi import etl; etl.delay()"
 
    # Run ETL for a specific date
-   docker-compose exec celery python -c "from pypistats.tasks.pypi import etl; etl.delay('2025-08-13')"
+   docker compose exec celery python -c "from pypistats.tasks.pypi import etl; etl.delay('2025-08-13')"
    ```
 
 3. Monitor the task in Flower:
@@ -54,12 +54,12 @@
 
 1. Start services:
    ```bash
-   docker-compose up -d
+   docker compose up -d
    ```
 
 2. Enter Flask shell:
    ```bash
-   docker-compose exec web flask shell
+   docker compose exec web flask shell
    ```
 
 3. Run the ETL function directly (synchronously):
@@ -79,7 +79,7 @@
 
 1. Create a test script:
    ```bash
-   docker-compose exec web python
+   docker compose exec web python
    ```
 
 2. Test the BigQuery connection:
@@ -113,22 +113,22 @@
 
 ### Check Celery Logs
 ```bash
-docker-compose logs -f celery
+docker compose logs -f celery
 ```
 
 ### Check Celery Beat Schedule
 ```bash
-docker-compose logs -f beat
+docker compose logs -f beat
 ```
 
 ### Verify Database Tables
 ```bash
-docker-compose exec postgresql psql -U admin -d pypistats -c "\dt"
+docker compose exec postgresql psql -U admin -d pypistats -c "\dt"
 ```
 
 ### Check Recent Downloads
 ```bash
-docker-compose exec postgresql psql -U admin -d pypistats -c "SELECT * FROM overall ORDER BY date DESC LIMIT 10;"
+docker compose exec postgresql psql -U admin -d pypistats -c "SELECT * FROM overall ORDER BY date DESC LIMIT 10;"
 ```
 
 ## Troubleshooting
@@ -136,7 +136,7 @@ docker-compose exec postgresql psql -U admin -d pypistats -c "SELECT * FROM over
 ### Common Issues
 
 1. **"GOOGLE_SERVICE_ACCOUNT_JSON environment variable is required"**
-   - Ensure the environment variable is set in docker-compose.yml or .env file
+   - Ensure the environment variable is set in compose.yaml or .env file
    - The JSON must be a valid, complete service account key
 
 2. **BigQuery Permission Denied**
@@ -144,7 +144,7 @@ docker-compose exec postgresql psql -U admin -d pypistats -c "SELECT * FROM over
    - Check the project_id in the service account JSON matches your project
 
 3. **Connection to PostgreSQL Failed**
-   - Ensure PostgreSQL container is running: `docker-compose ps`
+   - Ensure PostgreSQL container is running: `docker compose ps`
    - Check DATABASE_URL is correctly set
 
 4. **No Data Retrieved**
@@ -156,7 +156,7 @@ docker-compose exec postgresql psql -U admin -d pypistats -c "SELECT * FROM over
 The ETL is configured to run daily at 1 AM UTC via Celery Beat.
 To verify it's scheduled:
 ```bash
-docker-compose exec beat celery -A pypistats.extensions.celery inspect scheduled
+docker compose exec beat celery -A pypistats.extensions.celery inspect scheduled
 ```
 
 ## Sample .env File for Local Testing
@@ -172,8 +172,8 @@ BASIC_AUTH_PASSWORD=secret
 PYPISTATS_SECRET=dev-secret-key
 ```
 
-Then update docker-compose.yml to use the .env file:
+Then update compose.yaml to use the .env file:
 ```yaml
 x-envs: &envs
   env_file: .env
-```
\ No newline at end of file
+```
diff --git a/Makefile b/Makefile
index 80187dc..d3c111b 100644
--- a/Makefile
+++ b/Makefile
@@ -1,23 +1,23 @@
 # format everything
 fmt:
-	docker-compose run --rm web isort .
-	docker-compose run --rm web black .
+	docker compose run --rm web isort .
+	docker compose run --rm web black .
 
 # check formatting without modifying files
 check-fmt:
-	docker-compose run --rm web isort . --check-only
-	docker-compose run --rm web black . --check
+	docker compose run --rm web isort . --check-only
+	docker compose run --rm web black . --check
 
-# launch the application in docker-compose
+# launch the application in docker compose
 .PHONY: pypistats
 pypistats:
-	docker-compose down
-	docker-compose build
-	docker-compose up
+	docker compose down
+	docker compose build
+	docker compose up
 
 # bring down the application and destroy the db volumes
 cleanup:
-	docker-compose down -v
+	docker compose down -v
 
 # setup a local environment
 setup:
diff --git a/README.rst b/README.rst
index 6518912..af56bfe 100644
--- a/README.rst
+++ b/README.rst
@@ -23,11 +23,11 @@ Development
 -----------
 
 1. Copy ``.env.example`` to ``.env`` and configure your environment variables:
-   
+
    .. code-block:: bash
-   
+
       cp .env.example .env
      # Edit .env with your configuration
 
-2. Run ``make pypistats`` to launch a complete development environment using docker-compose.
+2. Run ``make pypistats`` to launch a complete development environment using Docker Compose.
 
diff --git a/backfill_examples.md b/backfill_examples.md
index 330f8f3..5db38d7 100644
--- a/backfill_examples.md
+++ b/backfill_examples.md
@@ -13,17 +13,17 @@ The backfill system provides several ways to populate historical PyPI download s
 
 ```bash
 # Check what data exists for July 2024
-docker-compose run --rm celery python manage_backfill.py status 2024-07-01 2024-07-31
+docker compose run --rm celery python manage_backfill.py status 2024-07-01 2024-07-31
 ```
 
 ### 2. Backfill Recent Days
 
 ```bash
 # Backfill last 7 days (skipping existing data)
-docker-compose run --rm celery python manage_backfill.py recent 7
+docker compose run --rm celery python manage_backfill.py recent 7
 
 # Or via Python
-docker-compose run --rm celery python -c "
+docker compose run --rm celery python -c "
 from pypistats.tasks.backfill import backfill_recent_days
 backfill_recent_days(30)  # Last 30 days
 "
@@ -33,7 +33,7 @@ backfill_recent_days(30)  # Last 30 days
 
 ```bash
 # Backfill July 2024, one day at a time, with 2-second delay between days
-docker-compose run --rm celery python manage_backfill.py sequential \
+docker compose run --rm celery python manage_backfill.py sequential \
   2024-07-01 2024-07-31 \
   --delay 2 \
   --skip-existing
@@ -43,7 +43,7 @@ docker-compose run --rm celery python manage_backfill.py sequential \
 
 ```bash
 # Backfill Q3 2024 with 3 parallel workers, 7 days per chunk
-docker-compose run --rm celery python manage_backfill.py parallel \
+docker compose run --rm celery python manage_backfill.py parallel \
   2024-07-01 2024-09-30 \
   --workers 3 \
   --chunk-days 7
@@ -53,7 +53,7 @@ docker-compose run --rm celery python manage_backfill.py parallel \
 
 ```bash
 # Backfill January through June 2024
-docker-compose run --rm celery python manage_backfill.py monthly \
+docker compose run --rm celery python manage_backfill.py monthly \
   2024-01 2024-06 \
   --delay 2 \
   --skip-existing
@@ -63,7 +63,7 @@ docker-compose run --rm celery python manage_backfill.py monthly \
 
 ```bash
 # Backfill all of 2024
-docker-compose run --rm celery python manage_backfill.py year 2024 --workers 2
+docker compose run --rm celery python manage_backfill.py year 2024 --workers 2
 ```
 
 ### 7. Custom Backfill via Python
@@ -92,10 +92,10 @@ if status['summary']['days_missing'] > 0:
 
 ```bash
 # Monitor active tasks
-docker-compose run --rm celery celery -A pypistats.extensions.celery inspect active
+docker compose run --rm celery celery -A pypistats.extensions.celery inspect active
 
 # Check task result
-docker-compose run --rm celery python -c "
+docker compose run --rm celery python -c "
 from celery.result import AsyncResult
 result = AsyncResult('YOUR_TASK_ID')
 print(f'Status: {result.status}')
@@ -128,19 +128,19 @@ For a fresh instance, backfill in stages:
 
 ```bash
 # 1. Last 7 days (for immediate data)
-docker-compose run --rm celery python manage_backfill.py recent 7
+docker compose run --rm celery python manage_backfill.py recent 7
 
 # 2. Current month
-docker-compose run --rm celery python manage_backfill.py monthly 2024-08 2024-08
+docker compose run --rm celery python manage_backfill.py monthly 2024-08 2024-08
 
 # 3. Previous 3 months (parallel)
-docker-compose run --rm celery python manage_backfill.py parallel \
+docker compose run --rm celery python manage_backfill.py parallel \
   2024-05-01 2024-07-31 \
   --workers 2 \
   --chunk-days 15
 
 # 4. Historical data (monthly batches)
-docker-compose run --rm celery python manage_backfill.py monthly \
+docker compose run --rm celery python manage_backfill.py monthly \
   2024-01 2024-04 \
   --delay 3 \
   --skip-existing
@@ -161,14 +161,14 @@ If a backfill fails:
 
 1. Check which dates completed:
    ```bash
-   docker-compose run --rm celery python manage_backfill.py status START_DATE END_DATE
+   docker compose run --rm celery python manage_backfill.py status START_DATE END_DATE
    ```
 
 2. Resume with `--skip-existing` flag:
    ```bash
-   docker-compose run --rm celery python manage_backfill.py sequential \
+   docker compose run --rm celery python manage_backfill.py sequential \
      START_DATE END_DATE \
      --skip-existing
    ```
 
-The system will skip dates that already have data and only process missing days.
\ No newline at end of file
+The system will skip dates that already have data and only process missing days.
diff --git a/docker-compose.yml b/compose.yaml
similarity index 68%
rename from docker-compose.yml
rename to compose.yaml
index b1118cb..aa1ead4 100644
--- a/docker-compose.yml
+++ b/compose.yaml
@@ -1,15 +1,15 @@
 x-envs: &envs
   env_file: .env
   environment:
-    - FLASK_APP=pypistats/run.py
-    - FLASK_ENV=development
-    - FLASK_DEBUG=1
-    - DATABASE_URL=${DATABASE_URL:-postgresql://admin:root@postgresql:5432/pypistats}
-    - REDIS_URL=${REDIS_URL:-redis://redis:6379/0}
-    - BASIC_AUTH_USER=${BASIC_AUTH_USER:-user}
-    - BASIC_AUTH_PASSWORD=${BASIC_AUTH_PASSWORD:-password}
-    - PYPISTATS_SECRET=${PYPISTATS_SECRET:-dev-secret-key}
-    - GOOGLE_SERVICE_ACCOUNT_JSON=${GOOGLE_SERVICE_ACCOUNT_JSON:-}
+    FLASK_APP: pypistats/run.py
+    FLASK_ENV: development
+    FLASK_DEBUG: 1
+    DATABASE_URL: ${DATABASE_URL:-postgresql://admin:root@postgresql:5432/pypistats}
+    REDIS_URL: ${REDIS_URL:-redis://redis:6379/0}
+    BASIC_AUTH_USER: ${BASIC_AUTH_USER:-user}
+    BASIC_AUTH_PASSWORD: ${BASIC_AUTH_PASSWORD:-password}
+    PYPISTATS_SECRET: ${PYPISTATS_SECRET:-dev-secret-key}
+    GOOGLE_SERVICE_ACCOUNT_JSON: ${GOOGLE_SERVICE_ACCOUNT_JSON:-}
 
 volumes:
   pgdata: {}
@@ -80,9 +80,9 @@ services:
   postgresql:
     image: "postgres:16"
     environment:
-      - POSTGRES_USER=admin
-      - POSTGRES_PASSWORD=root
-      - POSTGRES_DB=pypistats
+      POSTGRES_USER: admin
+      POSTGRES_PASSWORD: root
+      POSTGRES_DB: pypistats
     ports:
       - "5433:5432"
     volumes:
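The `compose.yaml` hunks above also migrate the `environment` key from Compose's list syntax (`- KEY=value`) to its map syntax (`KEY: value`). The two forms are equivalent, and `${VAR:-default}` interpolation behaves the same in both; a minimal sketch of the resulting map-style block, using values taken from the diff:

```yaml
# Map-style environment block, equivalent to the list style it replaces.
# ${VAR:-default} falls back to the default when VAR is unset in the shell or .env.
x-envs: &envs
  env_file: .env
  environment:
    FLASK_APP: pypistats/run.py
    DATABASE_URL: ${DATABASE_URL:-postgresql://admin:root@postgresql:5432/pypistats}
    REDIS_URL: ${REDIS_URL:-redis://redis:6379/0}
```

Top-level keys prefixed with `x-` are ignored by Compose, which is what makes the shared `x-envs` block legal; services presumably pull it in through the `&envs` anchor (e.g. `<<: *envs`).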