docs/source/introduction.rst
3 additions & 3 deletions
@@ -11,7 +11,7 @@ This is done in a manner that maintains transparency and control over how the or
In this tutorial, we go through the different mechanisms SSG has for configuring the data generation, and the different levels of fidelity they can provide and different kinds of utility they can have.
To showcase SSG, we will use the `AirBnb User Bookings dataset, available at Kaggle <https://www.kaggle.com/competitions/airbnb-recruiting-new-user-bookings/data>`_.
-The original dataset is a collection CSV files that can be ported to a relational database using `this Python script <https://github.com/alan-turing-institute/sqlsynthgen/blob/migrate-adult-dataset-to-SQL/tests/examples/airbnb/csv_to_database.py>`_ (it requires having SSG `previously installed <https://sqlsynthgen.readthedocs.io/en/latest/installation.html#enduser>`_).
+The original dataset is a collection CSV files that can be ported to a relational database using `this Python script <https://github.com/alan-turing-institute/sqlsynthgen/blob/main/examples/airbnb/csv_to_database.py>`_ (it requires having SSG `previously installed <https://sqlsynthgen.readthedocs.io/en/latest/installation.html#enduser>`_).
The script assumes you have a local PostgresSQL server running at port 5432, username ``postgres`` and password ``password``, with a database called ``airbnb`` to upload the data to.
These assumptions can be edited in the ``main`` function of the script.
@@ -88,7 +88,7 @@ The ``generic`` object on line 9 is an instance of the Mimesis type `generic pro
Mimesis is a package for creating random data and has a wide array of providers (the Mimesis term for data generators) for different scenarios, which SSG makes extensive use of.
Similar edits as above for the ``users`` table need to be made for the primary key columns of the other tables.
-See `this Python file <https://github.com/alan-turing-institute/sqlsynthgen/blob/migrate-adult-dataset-to-SQL/tests/examples/airbnb/ssg_manual_edit.py>`_ for the full changes to the ``ssg.py`` file.
+See `this Python file <https://github.com/alan-turing-institute/sqlsynthgen/blob/main/examples/airbnb/ssg_manual_edit.py>`_ for the full changes to the ``ssg.py`` file.
Now when we run ``create-data`` we get valid, if not very sensible, values in each of our tables. For example:
@@ -585,4 +585,4 @@ Note that we make here the same trade off as we did before: generating very high
* Full transparency and control over the ways in which the source data is utilised, and thus the ways in which privacy could in principle be at risk, including easy implementation of differential privacy guarantees.
* The possibility of starting from very low fidelity data, and incrementally adding fidelity to particular aspects of the data, as is needed to serve the utility of whatever use case the synthetic data is created for.
-Examples of the complete files generated by the tutorial can be found at: ``/sqlsynthgen/tests/examples/airbnb``.
+Examples of the complete files generated by the tutorial can be found at: ``/sqlsynthgen/examples/airbnb``.