4 changes: 2 additions & 2 deletions ADMIN_FEATURES.md
@@ -42,7 +42,7 @@ When you submit a date:
2. The task runs asynchronously in the background
3. You'll see confirmation: "{date} submitted"
4. Monitor progress via:
-   - Celery logs: `docker-compose logs -f celery`
+   - Celery logs: `docker compose logs -f celery`
- Flower dashboard: http://localhost:5555

## Scheduled ETL
@@ -66,4 +66,4 @@ Note: The system also runs ETL automatically:
For more admin capabilities, use:
- **Flower** (http://localhost:5555): Monitor Celery tasks
- **Database queries**: Direct PostgreSQL access for data inspection
-- **API endpoints**: Public API for viewing statistics at /api/
\ No newline at end of file
+- **API endpoints**: Public API for viewing statistics at /api/
4 changes: 2 additions & 2 deletions CLAUDE.md
@@ -106,7 +106,7 @@ PyPIStats.org is a Flask-based web application that provides analytics and visua

### Setup:
```bash
-make pypistats  # Launch complete dev environment with docker-compose
+make pypistats  # Launch complete dev environment with Docker Compose
```

### Project Structure:
@@ -179,4 +179,4 @@ pypistats/
- All timestamps are in UTC
- Package statistics exclude known mirror downloads by default
- Maximum lookback period is 180 days to manage database size
-- Uses BigQuery's public PyPI dataset (bigquery-public-data.pypi.file_downloads)
\ No newline at end of file
+- Uses BigQuery's public PyPI dataset (bigquery-public-data.pypi.file_downloads)
28 changes: 14 additions & 14 deletions ETL_TESTING.md
@@ -17,7 +17,7 @@

1. Start the services:
```bash
-docker-compose up -d
+docker compose up -d
```

2. Access the admin panel:
@@ -33,16 +33,16 @@

1. Start all services:
```bash
-docker-compose up -d
+docker compose up -d
```

2. Trigger the ETL task manually:
```bash
# Run ETL for yesterday's data (default)
-docker-compose exec celery python -c "from pypistats.tasks.pypi import etl; etl.delay()"
+docker compose exec celery python -c "from pypistats.tasks.pypi import etl; etl.delay()"

# Run ETL for a specific date
-docker-compose exec celery python -c "from pypistats.tasks.pypi import etl; etl.delay('2025-08-13')"
+docker compose exec celery python -c "from pypistats.tasks.pypi import etl; etl.delay('2025-08-13')"
```

3. Monitor the task in Flower:
@@ -54,12 +54,12 @@

1. Start services:
```bash
-docker-compose up -d
+docker compose up -d
```

2. Enter Flask shell:
```bash
-docker-compose exec web flask shell
+docker compose exec web flask shell
```

3. Run the ETL function directly (synchronously):
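   The call itself is collapsed in this view; it is presumably along these lines (the date argument mirrors the async examples above — the exact signature is an assumption):

   ```python
   # In the Flask shell — calling the task function directly
   # (not .delay) runs it synchronously in-process
   from pypistats.tasks.pypi import etl

   etl("2025-08-13")
   ```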
@@ -79,7 +79,7 @@

1. Create a test script:
```bash
-docker-compose exec web python
+docker compose exec web python
```

2. Test the BigQuery connection:
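   The snippet itself is collapsed in this diff; a minimal connectivity check along these lines, using the standard `google-cloud-bigquery` client, would work (requires valid credentials via `GOOGLE_APPLICATION_CREDENTIALS` — this sketch is not the project's actual test code):

   ```python
   from google.cloud import bigquery

   # Client() picks up credentials from GOOGLE_APPLICATION_CREDENTIALS
   client = bigquery.Client()

   # A trivial query proves both authentication and connectivity
   rows = list(client.query("SELECT 1 AS ok").result())
   print(rows[0].ok)
   ```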
@@ -113,22 +113,22 @@

### Check Celery Logs
```bash
-docker-compose logs -f celery
+docker compose logs -f celery
```

### Check Celery Beat Schedule
```bash
-docker-compose logs -f beat
+docker compose logs -f beat
```

### Verify Database Tables
```bash
-docker-compose exec postgresql psql -U admin -d pypistats -c "\dt"
+docker compose exec postgresql psql -U admin -d pypistats -c "\dt"
```

### Check Recent Downloads
```bash
-docker-compose exec postgresql psql -U admin -d pypistats -c "SELECT * FROM overall ORDER BY date DESC LIMIT 10;"
+docker compose exec postgresql psql -U admin -d pypistats -c "SELECT * FROM overall ORDER BY date DESC LIMIT 10;"
```

## Troubleshooting
@@ -144,7 +144,7 @@ docker-compose exec postgresql psql -U admin -d pypistats -c "SELECT * FROM over
- Check the project_id in the service account JSON matches your project

3. **Connection to PostgreSQL Failed**
-   - Ensure PostgreSQL container is running: `docker-compose ps`
+   - Ensure PostgreSQL container is running: `docker compose ps`
- Check DATABASE_URL is correctly set

4. **No Data Retrieved**
@@ -156,7 +156,7 @@ docker-compose exec postgresql psql -U admin -d pypistats -c "SELECT * FROM over
The ETL is configured to run daily at 1 AM UTC via Celery Beat. To verify it's scheduled:

```bash
-docker-compose exec beat celery -A pypistats.extensions.celery inspect scheduled
+docker compose exec beat celery -A pypistats.extensions.celery inspect scheduled
```
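For reference, a daily 1 AM UTC entry in Celery's beat schedule typically looks like the sketch below. The task path is taken from this document; the `beat_schedule`/`crontab` config shape is standard Celery, but the project's actual wiring may differ:

```python
from celery.schedules import crontab

beat_schedule = {
    "daily-etl": {
        "task": "pypistats.tasks.pypi.etl",
        "schedule": crontab(hour=1, minute=0),  # 01:00 UTC daily
    },
}
```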

## Sample .env File for Local Testing
@@ -176,4 +176,4 @@ Then update docker-compose.yml to use the .env file:
```yaml
x-envs: &envs
env_file: .env
-```
\ No newline at end of file
+```
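The variable values themselves are collapsed in this view; a minimal sketch of what such a `.env` might contain (all values are hypothetical — `DATABASE_URL` is referenced in the troubleshooting section, and `GOOGLE_APPLICATION_CREDENTIALS` is the standard Google Cloud credentials variable):

```shell
# Hypothetical example values — adjust for your environment
DATABASE_URL=postgresql://admin:password@postgresql:5432/pypistats
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```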
18 changes: 9 additions & 9 deletions Makefile
@@ -1,23 +1,23 @@
# format everything
fmt:
-docker-compose run --rm web isort .
-docker-compose run --rm web black .
+docker compose run --rm web isort .
+docker compose run --rm web black .

# check formatting without modifying files
check-fmt:
-docker-compose run --rm web isort . --check-only
-docker-compose run --rm web black . --check
+docker compose run --rm web isort . --check-only
+docker compose run --rm web black . --check

-# launch the application in docker-compose
+# launch the application in docker compose
.PHONY: pypistats
pypistats:
-docker-compose down
-docker-compose build
-docker-compose up
+docker compose down
+docker compose build
+docker compose up

# bring down the application and destroy the db volumes
cleanup:
-docker-compose down -v
+docker compose down -v

# setup a local environment
setup:
6 changes: 3 additions & 3 deletions README.rst
@@ -23,11 +23,11 @@ Development
-----------

1. Copy ``.env.example`` to ``.env`` and configure your environment variables:

.. code-block:: bash

cp .env.example .env
# Edit .env with your configuration

-2. Run ``make pypistats`` to launch a complete development environment using docker-compose.
+2. Run ``make pypistats`` to launch a complete development environment using Docker Compose.

32 changes: 16 additions & 16 deletions backfill_examples.md
@@ -13,17 +13,17 @@ The backfill system provides several ways to populate historical PyPI download s

```bash
# Check what data exists for July 2024
-docker-compose run --rm celery python manage_backfill.py status 2024-07-01 2024-07-31
+docker compose run --rm celery python manage_backfill.py status 2024-07-01 2024-07-31
```

### 2. Backfill Recent Days

```bash
# Backfill last 7 days (skipping existing data)
-docker-compose run --rm celery python manage_backfill.py recent 7
+docker compose run --rm celery python manage_backfill.py recent 7

# Or via Python
-docker-compose run --rm celery python -c "
+docker compose run --rm celery python -c "
from pypistats.tasks.backfill import backfill_recent_days
backfill_recent_days(30) # Last 30 days
"
@@ -33,7 +33,7 @@

```bash
# Backfill July 2024, one day at a time, with 2-second delay between days
-docker-compose run --rm celery python manage_backfill.py sequential \
+docker compose run --rm celery python manage_backfill.py sequential \
2024-07-01 2024-07-31 \
--delay 2 \
--skip-existing
@@ -43,7 +43,7 @@ docker-compose run --rm celery python manage_backfill.py sequential \

```bash
# Backfill Q3 2024 with 3 parallel workers, 7 days per chunk
-docker-compose run --rm celery python manage_backfill.py parallel \
+docker compose run --rm celery python manage_backfill.py parallel \
2024-07-01 2024-09-30 \
--workers 3 \
--chunk-days 7
@@ -53,7 +53,7 @@ docker-compose run --rm celery python manage_backfill.py parallel \

```bash
# Backfill January through June 2024
-docker-compose run --rm celery python manage_backfill.py monthly \
+docker compose run --rm celery python manage_backfill.py monthly \
2024-01 2024-06 \
--delay 2 \
--skip-existing
@@ -63,7 +63,7 @@ docker-compose run --rm celery python manage_backfill.py monthly \

```bash
# Backfill all of 2024
-docker-compose run --rm celery python manage_backfill.py year 2024 --workers 2
+docker compose run --rm celery python manage_backfill.py year 2024 --workers 2
```
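The parallel and chunked modes above split the requested range into fixed-size chunks handed to workers. A self-contained sketch of such chunking (hypothetical helper — the actual `manage_backfill.py` logic may differ):

```python
from datetime import date, timedelta

def chunk_range(start, end, chunk_days):
    """Split [start, end] into consecutive chunks of at most chunk_days days."""
    chunks = []
    cur = start
    while cur <= end:
        # Each chunk covers chunk_days days, clipped at the range end
        chunk_end = min(cur + timedelta(days=chunk_days - 1), end)
        chunks.append((cur, chunk_end))
        cur = chunk_end + timedelta(days=1)
    return chunks
```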

### 7. Custom Backfill via Python
@@ -92,10 +92,10 @@ if status['summary']['days_missing'] > 0:

```bash
# Monitor active tasks
-docker-compose run --rm celery celery -A pypistats.extensions.celery inspect active
+docker compose run --rm celery celery -A pypistats.extensions.celery inspect active

# Check task result
-docker-compose run --rm celery python -c "
+docker compose run --rm celery python -c "
from celery.result import AsyncResult
result = AsyncResult('YOUR_TASK_ID')
print(f'Status: {result.status}')
@@ -128,19 +128,19 @@ For a fresh instance, backfill in stages:

```bash
# 1. Last 7 days (for immediate data)
-docker-compose run --rm celery python manage_backfill.py recent 7
+docker compose run --rm celery python manage_backfill.py recent 7

# 2. Current month
-docker-compose run --rm celery python manage_backfill.py monthly 2024-08 2024-08
+docker compose run --rm celery python manage_backfill.py monthly 2024-08 2024-08

# 3. Previous 3 months (parallel)
-docker-compose run --rm celery python manage_backfill.py parallel \
+docker compose run --rm celery python manage_backfill.py parallel \
2024-05-01 2024-07-31 \
--workers 2 \
--chunk-days 15

# 4. Historical data (monthly batches)
-docker-compose run --rm celery python manage_backfill.py monthly \
+docker compose run --rm celery python manage_backfill.py monthly \
2024-01 2024-04 \
--delay 3 \
--skip-existing
@@ -161,14 +161,14 @@ If a backfill fails:

1. Check which dates completed:
```bash
-docker-compose run --rm celery python manage_backfill.py status START_DATE END_DATE
+docker compose run --rm celery python manage_backfill.py status START_DATE END_DATE
```

2. Resume with `--skip-existing` flag:
```bash
-docker-compose run --rm celery python manage_backfill.py sequential \
+docker compose run --rm celery python manage_backfill.py sequential \
START_DATE END_DATE \
--skip-existing
```

-The system will skip dates that already have data and only process missing days.
\ No newline at end of file
+The system will skip dates that already have data and only process missing days.
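
The skip logic amounts to diffing the requested range against dates already present in the database. A self-contained sketch (hypothetical helper — not the actual `manage_backfill.py` code):

```python
from datetime import date, timedelta

def missing_dates(start, end, existing):
    """Return dates in [start, end] that are not in `existing`."""
    existing = set(existing)  # O(1) membership checks
    days = (end - start).days + 1
    return [start + timedelta(d) for d in range(days)
            if start + timedelta(d) not in existing]
```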