
Commit 31c08a6

Move tests/examples to examples. Add READMEs to all examples.
1 parent e7e6e2c commit 31c08a6

16 files changed: +34, -14 lines

.pre-commit-config.yaml

Lines changed: 9 additions & 3 deletions
@@ -49,28 +49,34 @@ repos:
 types: ['python']
 exclude: (?x)(
 tests/examples|
-tests/workspace
+tests/workspace|
+examples
 )
 - id: pylint
 name: Pylint
 entry: poetry run pylint
 language: system
 types: ['python']
+exclude: (?x)(
+examples/
+)
 - id: pydocstyle
 name: pydocstyle
 entry: poetry run pydocstyle
 language: system
 types: ['python']
 exclude: (?x)(
 docs/|
-tests/
+tests/|
+examples/
 )
 - id: mypy
 name: mypy
 entry: poetry run mypy --follow-imports=silent
 language: system
 exclude: (?x)(
 tests/examples|
-tests/workspace
+tests/workspace|
+examples
 )
 types: ['python']
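pre-commit tests each file path against a hook's `exclude` regex; to our understanding it does this with Python's unanchored `re.search`. The sketch below is ours, not part of the repository: it copies the two patterns this commit introduces and checks paths against them. The names `WIDE_EXCLUDE`, `PYLINT_EXCLUDE` and `is_excluded`, and any path other than those under `examples/`, are hypothetical.

```python
import re

# Pattern used by the first hook in the hunk and by mypy after this commit.
# (?x) enables verbose mode, so the newlines and indentation inside the group
# are ignored and the three lines form one alternation.
WIDE_EXCLUDE = r"""(?x)(
    tests/examples|
    tests/workspace|
    examples
)"""

# Narrower pattern added for pylint (pydocstyle uses the same `examples/` entry).
PYLINT_EXCLUDE = r"""(?x)(
    examples/
)"""


def is_excluded(pattern: str, path: str) -> bool:
    """Return True if `pattern` matches anywhere in `path` (unanchored search)."""
    return re.search(pattern, path) is not None


if __name__ == "__main__":
    for path in ["examples/airbnb/airbnb_generators.py", "sqlsynthgen/examples.py"]:
        print(path, is_excluded(WIDE_EXCLUDE, path), is_excluded(PYLINT_EXCLUDE, path))
```

Because the search is unanchored, the bare `examples` alternative already covers `tests/examples` and would also skip a file such as a hypothetical `sqlsynthgen/examples.py`, while `examples/` only matches a directory component. Anchoring with `^examples/` would restrict either pattern to the new top-level folder.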

docs/source/introduction.rst

Lines changed: 3 additions & 3 deletions
@@ -11,7 +11,7 @@ This is done in a manner that maintains transparency and control over how the or
 
 In this tutorial, we go through the different mechanisms SSG has for configuring the data generation, and the different levels of fidelity they can provide and different kinds of utility they can have.
 To showcase SSG, we will use the `AirBnb User Bookings dataset, available at Kaggle <https://www.kaggle.com/competitions/airbnb-recruiting-new-user-bookings/data>`_.
-The original dataset is a collection CSV files that can be ported to a relational database using `this Python script <https://github.com/alan-turing-institute/sqlsynthgen/blob/migrate-adult-dataset-to-SQL/tests/examples/airbnb/csv_to_database.py>`_ (it requires having SSG `previously installed <https://sqlsynthgen.readthedocs.io/en/latest/installation.html#enduser>`_).
+The original dataset is a collection CSV files that can be ported to a relational database using `this Python script <https://github.com/alan-turing-institute/sqlsynthgen/blob/main/examples/airbnb/csv_to_database.py>`_ (it requires having SSG `previously installed <https://sqlsynthgen.readthedocs.io/en/latest/installation.html#enduser>`_).
 The script assumes you have a local PostgresSQL server running at port 5432, username ``postgres`` and password ``password``, with a database called ``airbnb`` to upload the data to.
 These assumptions can be edited in the ``main`` function of the script.
 
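The tutorial text above lists the connection assumptions baked into `csv_to_database.py`. How that script actually builds its connection is not shown in this commit, so the following is only a hypothetical helper, `make_postgres_url`, that turns those defaults into the `postgresql://user:password@host:port/database` URL form that SQLAlchemy accepts.

```python
from urllib.parse import quote_plus


def make_postgres_url(
    user: str = "postgres",
    password: str = "password",
    host: str = "localhost",
    port: int = 5432,
    database: str = "airbnb",
) -> str:
    """Build a PostgreSQL connection URL from the tutorial's assumed defaults."""
    # quote_plus escapes characters such as '@' or ':' that would otherwise
    # break the URL if they appeared in the username or password.
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"
```

Keeping the defaults as keyword arguments means a different server can be targeted by overriding only the parameters that differ, for example `make_postgres_url(password="secret")`.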
@@ -88,7 +88,7 @@ The ``generic`` object on line 9 is an instance of the Mimesis type `generic pro
 Mimesis is a package for creating random data and has a wide array of providers (the Mimesis term for data generators) for different scenarios, which SSG makes extensive use of.
 
 Similar edits as above for the ``users`` table need to be made for the primary key columns of the other tables.
-See `this Python file <https://github.com/alan-turing-institute/sqlsynthgen/blob/migrate-adult-dataset-to-SQL/tests/examples/airbnb/ssg_manual_edit.py>`_ for the full changes to the ``ssg.py`` file.
+See `this Python file <https://github.com/alan-turing-institute/sqlsynthgen/blob/main/examples/airbnb/ssg_manual_edit.py>`_ for the full changes to the ``ssg.py`` file.
 
 Now when we run ``create-data`` we get valid, if not very sensible, values in each of our tables. For example:
 
@@ -585,4 +585,4 @@ Note that we make here the same trade off as we did before: generating very high
 * Full transparency and control over the ways in which the source data is utilised, and thus the ways in which privacy could in principle be at risk, including easy implementation of differential privacy guarantees.
 * The possibility of starting from very low fidelity data, and incrementally adding fidelity to particular aspects of the data, as is needed to serve the utility of whatever use case the synthetic data is created for.
 
-Examples of the complete files generated by the tutorial can be found at: ``/sqlsynthgen/tests/examples/airbnb``.
+Examples of the complete files generated by the tutorial can be found at: ``/sqlsynthgen/examples/airbnb``.
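The documentation edits in this file follow one pattern: GitHub links move from the `migrate-adult-dataset-to-SQL` branch to `main`, and every `tests/examples/` path becomes `examples/`. The commit does not say whether these edits were made by hand; the hypothetical helper `rewrite_example_links` below sketches the same rewrite, assuming branch names contain no `/`.

```python
import re

# GitHub blob links into the old location, on any branch whose name has no '/'.
OLD_LINK = re.compile(
    r"(https://github\.com/alan-turing-institute/sqlsynthgen/blob/)"
    r"[^/\s]+/tests/examples/"
)


def rewrite_example_links(text: str) -> str:
    """Point links and relative paths at the top-level examples/ folder."""
    # First retarget GitHub links to the main branch and the new folder...
    text = OLD_LINK.sub(r"\1main/examples/", text)
    # ...then fix remaining plain paths such as ../../tests/examples/loans/...
    return text.replace("tests/examples/", "examples/")
```

Running the URL substitution first matters: otherwise the plain string replacement would fix the path inside a link but leave it pointing at the old branch.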

docs/source/loan_data.rst

Lines changed: 4 additions & 4 deletions
@@ -68,7 +68,7 @@ we see that they are always 0 or 1 so we will pick randomly from 0 and 1 for our
 
 **config.yaml**
 
-.. literalinclude:: ../../tests/examples/loans/config1.yaml
+.. literalinclude:: ../../examples/loans/config1.yaml
 :language: yaml
 
 We run SqlSynthGen's ``make-generators`` command to create ``ssg.py``, which contains a generator class for each table in the source database:
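Sphinx resolves a relative `literalinclude` path against the directory of the `.rst` file containing the directive, so `../../` from `docs/source/loan_data.rst` climbs to the repository root. The hypothetical helper `resolve_include` below mimics that lookup with POSIX path rules to show where the old and new paths land.

```python
import posixpath


def resolve_include(rst_file: str, include_path: str) -> str:
    """Resolve a relative include path the way Sphinx does for relative paths."""
    base_dir = posixpath.dirname(rst_file)  # e.g. "docs/source"
    # normpath collapses the ".." segments into a repository-relative path.
    return posixpath.normpath(posixpath.join(base_dir, include_path))
```

For instance, `resolve_include("docs/source/loan_data.rst", "../../examples/loans/config1.yaml")` yields the new top-level location, whereas the old `../../tests/examples/...` argument yields the path this commit moved away from.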
@@ -101,7 +101,7 @@ We notice that the ``districts`` table doesn't contain any sensitive data so we
 
 **config.yaml**
 
-.. literalinclude:: ../../tests/examples/loans/config2.yaml
+.. literalinclude:: ../../examples/loans/config2.yaml
 :language: yaml
 
 We can export the vocabularies to `.yaml` files, delete the old synthetic data, import the vocabularies and create new synthetic data with:
@@ -202,14 +202,14 @@ We can take the real values in the right proportions, and even add noise to make
 
 **config.yaml**
 
-.. literalinclude:: ../../tests/examples/loans/config3.yaml
+.. literalinclude:: ../../examples/loans/config3.yaml
 :language: yaml
 
 We define a custom row-generator to use the source statistics and Python's ``random.choices()`` function to choose a value:
 
 **my_row_generators.py**
 
-.. literalinclude:: ../../tests/examples/loans/my_row_generators.py
+.. literalinclude:: ../../examples/loans/my_row_generators.py
 :language: python
 
 As before, we will need to re-create ``ssg.py`` and the data.
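The tutorial text describes a row generator that feeds source statistics into `random.choices()`. The actual `my_row_generators.py` is not reproduced in this commit, and SSG's row-generator signature is not shown here, so `choose_by_frequency` below is a hypothetical stand-in: it takes a mapping from value to count and draws one value with probability proportional to its count.

```python
import random
from typing import Dict, Hashable, Optional


def choose_by_frequency(
    counts: Dict[Hashable, int], rng: Optional[random.Random] = None
) -> Hashable:
    """Pick one value with probability proportional to its count in the source data."""
    rng = rng or random.Random()
    values = list(counts)
    weights = [counts[v] for v in values]
    # random.choices returns a list of k samples; take the single element.
    return rng.choices(values, weights=weights, k=1)[0]
```

Passing an explicit `random.Random(seed)` makes the output reproducible, which helps when comparing synthetic runs; the counts themselves would come from whatever source statistics the configuration collects.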

examples/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Examples
+
+This folder holds example configurations of how sqlsynthgen may be used on various data schemas.

examples/airbnb/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Introductory example config using AirBnB data
+
+This is the SSG configuration developed in the [introductory tutorial](https://sqlsynthgen.readthedocs.io/en/latest/introduction.html) in our docs.

tests/examples/airbnb/airbnb_generators.py renamed to examples/airbnb/airbnb_generators.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 import datetime
 import random
-from typing import Optional, Generator, Tuple
+from typing import Generator, Optional, Tuple
 
 
 def user_dates_provider(generic):
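The only change to `airbnb_generators.py` reorders the names imported from `typing`. The commit does not name the tool responsible; plain alphabetical ordering reproduces the new line, as the hypothetical helper `sort_from_import` below shows for single-line `from X import a, b` statements. Real import sorters such as isort apply more detailed rules than this sketch.

```python
def sort_from_import(line: str) -> str:
    """Alphabetically sort the names in a single-line `from X import a, b, c`."""
    head, _, names = line.partition(" import ")
    sorted_names = sorted(name.strip() for name in names.split(","))
    return f"{head} import {', '.join(sorted_names)}"
```

For example, `sort_from_import("from typing import Optional, Generator, Tuple")` gives the `+` line of the hunk above.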
File renamed without changes.
Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,7 @@
 from typing import Any, Callable, List, Type
+
 import numpy as np
 import pandas as pd
-from tqdm import tqdm
 from sqlalchemy import (
 Column,
 Date,
@@ -14,6 +14,7 @@
 )
 from sqlalchemy.ext.declarative import declarative_base
 from sqlalchemy.orm import Session
+from tqdm import tqdm
 
 Base = declarative_base()
 
File renamed without changes.
