Commit 059cdeb

Merge pull request #160 from alan-turing-institute/docsdocsdocs
Fix typos in docs
2 parents 1987946 + c07a744 commit 059cdeb

5 files changed: 17 additions and 17 deletions.

docs/source/configuration.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Configuration Reference
 =======================
 
-SqlSynthGen is configured using a YAML file, which is passed to several commands with the ``--config`` option.
+SqlSynthGen is configured using a YAML file, which is passed to several commands with the ``--config-file`` option.
 Throughout the docs, we will refer to this file as ``config.yaml`` but it can be called anything (the exception being that there will be a naming conflict if you have a vocabulary table called ``config``).
 
 Below, we see the schema for the configuration file.

docs/source/faq.rst

Lines changed: 2 additions & 2 deletions
@@ -5,14 +5,14 @@ Can SqlSynthGen work with two different schemas?
 ************************************************
 
 SqlSynthGen can only work with a single source schema and a single destination schema at a time.
-However, you can choose for the destination schema to have a different name to the source schema by setting the DST_SCHEMA environment variable.
+However, you can choose for the destination schema to have a different name to the source schema by setting the ``DST_SCHEMA`` environment variable.
 
 Which DBMSs does SqlSynthGen support?
 *************************************
 
 * SqlSynthGen most fully supports **PostgresSQL**, which it uses for its end-to-end functional tests.
 * SqlSynthGen also supports **MariaDB**, as long as you don't set ``use-asyncio: true`` in your config.
-* SqlSynthGen *might*, work with **SQLite** but this is largely untested.
+* SqlSynthGen *might* work with **SQLite** but this is largely untested.
 * SqlSynthGen may also work with SQL Server.
 To connect to SQL Server, you will need to install `pyodbc <https://pypi.org/project/pyodbc/>`_ and an `ODBC driver <https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver16>`_, after which you should be able to use a DSN setting similar to ``SRC_DSN="mssql+pyodbc://username:password@hostname/dbname?driver=ODBC Driver 18 for SQL Server"``.
 
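Editorial sketch (not part of the commit): the SQL Server DSN quoted in the FAQ hunk above contains spaces in the driver name. One conservative way to build such a string is to URL-encode the variable parts with the standard library. The helper name ``build_mssql_dsn`` and the sample credentials are hypothetical, and whether SqlSynthGen needs the encoded form is an assumption that has not been checked against the project.

```python
from urllib.parse import quote_plus


def build_mssql_dsn(username, password, hostname, dbname,
                    driver="ODBC Driver 18 for SQL Server"):
    """Assemble a mssql+pyodbc DSN, URL-encoding the parts that may hold special characters.

    Hypothetical helper for illustration only; it is not part of SqlSynthGen.
    """
    return (
        f"mssql+pyodbc://{quote_plus(username)}:{quote_plus(password)}"
        f"@{hostname}/{dbname}?driver={quote_plus(driver)}"
    )


# Placeholder values taken from the FAQ text; replace them with real ones.
dsn = build_mssql_dsn("username", "password", "hostname", "dbname")
print(dsn)
```

The resulting string could then be exported as ``SRC_DSN``. Encoding also protects passwords that contain characters such as ``@`` or ``/``.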
docs/source/health_data.rst

Lines changed: 8 additions & 8 deletions
@@ -13,10 +13,10 @@ The full configuration we wrote for the CCHIC data set is available `here <https
 
 Before getting into the config itself, we need to discuss a few peculiarities of the OMOP CDM that need to be taken into account:
 
-1. Some versions of OMOP contain a circular foreign key, for instance between the `vocabulary`, `concept`, and `domain` tables.
-2. There are several standardized vocabulary tables (`concept`, `concept_relationship`, etc).
+1. Some versions of OMOP contain a circular foreign key, for instance between the ``vocabulary``, ``concept``, and ``domain`` tables.
+2. There are several standardized vocabulary tables (``concept``, ``concept_relationship``, etc).
 These should be marked as such in the sqlsynthgen config file.
-The tables will be exported to ``.yaml`` files during the ``make-tables`` step.
+The tables will be exported to ``.yaml`` files during the ``make-generators`` step.
 However, some of these vocabulary tables may be too large to practically be writable to ``.yaml`` files, and will need to be dealt with manually.
 You should also check the license agreement of each standardized vocabulary before sharing any of the ``.yaml`` files.
 
@@ -195,7 +195,7 @@ Here is our config for the person table:
 columns_assigned: care_site_id
 
 ``num_rows_per_pass`` is set to 0, because all rows are generated by the story generator.
-Let's use the gender columns as an emxample.
+Let's use the gender columns as an example.
 Here is the relevant function from ``row_generators.py``.
 
 .. code-block:: python
@@ -355,8 +355,8 @@ You can find examples of this in the `full configuration <https://github.com/ala
 After creating a person, ``patient_story`` creates possibly an entry in the ``death`` table, and then one for ``visit_occurrence``.
 The configurations and generators for these aren't very interesting, their main point is to make the chronology and time scales make sense, so that people born a long time ago are more likely to have died, and the order of birth, visit start, visit end, and possible death is correct.
 
-After that the story generates a set of rows for tables like `observation`, `measurement`, `condition_occurrence`, etc., the ones that involve procedures and events that took place during the hospital stay.
-The procedure is very similar for each one of these, we'll discuss `measurement` as an example.
+After that the story generates a set of rows for tables like ``observation``, ``measurement``, ``condition_occurrence``, etc., the ones that involve procedures and events that took place during the hospital stay.
+The procedure is very similar for each one of these, we'll discuss ``measurement`` as an example.
 
 The first stop is the ``avg_measurements_per_hour`` src-stats query, which looks like this
 
@@ -394,11 +394,11 @@ The first stop is the ``avg_measurements_per_hour`` src-stats query, which looks
 upper: 100
 
 Note how the ``query`` part, which is executed on the database server, tries to do as much of the work as possible:
-It extracts the number of `measurement` entries, divided by the length of the hospital stay, for each person.
+It extracts the number of ``measurement`` entries, divided by the length of the hospital stay, for each person.
 The ``dp-query`` then only computes the average.
 This is both to circumvent the limitations of SNSQL, which can't for instance do subqueries or differences between columns, and also to minimise the data transferred to and work done on the local machine running SSG.
 
-Based on that information, we generate a set of times, roughly at the right frequency, at which a `measurement` entry should generated for our synthetic patient.
+Based on that information, we generate a set of times, roughly at the right frequency, at which a ``measurement`` entry should generated for our synthetic patient.
 The relevant `src-stats queries <https://github.com/alan-turing-institute/sqlsynthgen/blob/main/examples/cchic_omop/>`_ for this are
 
 * ``count_measurements``, which counts the relative frequencies of various types of measurements, like blood pressure, pulse taking, different lab results, etc.
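
Editorial sketch (not part of the commit): the hunk above says that measurement times are generated "roughly at the right frequency" from an average number of measurements per hour. The documentation shown here does not say how the times are drawn, so the following is only one plausible approach: a homogeneous Poisson process with exponentially distributed gaps. The function name ``sample_event_times`` and all values are hypothetical.

```python
import random
from datetime import datetime, timedelta


def sample_event_times(start, end, avg_per_hour, rng):
    """Draw event times between start and end at roughly avg_per_hour events per hour.

    Assumption: gaps between events are exponential (a Poisson process).
    This is not taken from SqlSynthGen's own generators.
    """
    times = []
    if avg_per_hour <= 0:
        return times
    current = start
    while True:
        # expovariate takes the rate, so the mean gap is 1 / avg_per_hour hours.
        current = current + timedelta(hours=rng.expovariate(avg_per_hour))
        if current >= end:
            break
        times.append(current)
    return times


# Example: a 48 hour stay with about one measurement every two hours.
visit_start = datetime(2020, 1, 1, 8, 0)
visit_end = visit_start + timedelta(hours=48)
measurement_times = sample_event_times(visit_start, visit_end, 0.5, random.Random(0))
print(len(measurement_times), "measurement times drawn")
```

Each drawn time would then become one ``measurement`` row, with its type chosen separately, for instance from the relative frequencies that ``count_measurements`` provides.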

docs/source/introduction.rst

Lines changed: 5 additions & 5 deletions
@@ -106,7 +106,7 @@ Now when we run ``create-data`` we get valid, if not very sensible, values in ea
 - 485
 - 534
 
-SSG’s default generators have minimal fidelity: All data is generated based purely on the datatype of the its column, e.g. random strings in string columns.
+SSG’s default generators have minimal fidelity: All data is generated based purely on the datatype of the column, e.g. random strings in string columns.
 Foreign key relations are respected by picking random rows from the table referenced.
 Even this synthetic data, nearly the crudest imaginable, can be useful for instance for testing software pipelines.
 Note that this data has no privacy implications, since it is only based on the schema.
@@ -121,7 +121,7 @@ This should of course only be done for tables that hold no privacy-sensitive dat
 For instance, in the AirBnB dataset, the ``users`` table has a foreign key reference to a table of world countries: ``users.country_destination`` references the ``countries.country_destination`` primary key column.
 Since the ``countries`` table doesn’t contain personal data, we can make it a vocabulary table.
 
-Besides manual edition, on SSG we can also customise the generation of ``ssg.py`` via a YAML file,
+Besides manually editing it, we can also customise the generation of ``ssg.py`` via a YAML file,
 typically named ``config.yaml``.
 We identify ``countries`` as a vocabulary table in our ``config.yaml`` file:
 
@@ -164,7 +164,7 @@ We need to truncate any tables in our destination database before importing the
 $ sqlsynthgen remove-data --config-file config.yaml
 $ sqlsynthgen create-vocab
 
-Since ``make-generators`` rewrote ``ssg.py``, we must now re-edit it to add the primary key ``VARCHAR`` workaroundsfor the ``users`` and ``age_gender_bkts`` tables, as we did in section above.
+Since ``make-generators`` rewrote ``ssg.py``, we must now re-edit it to add the primary key ``VARCHAR`` workarounds for the ``users`` and ``age_gender_bkts`` tables, as we did in section above.
 Once this is done, we can generate random data for the other three tables with::
 
 $ sqlsynthgen create-data
@@ -293,7 +293,7 @@ Then, we tell SSG to import our custom ``airbnb_generators.py`` and assign the r
 columns_assigned: ["date_account_created", "date_first_booking"]
 
 Note how we pass the ``generic`` object as a keyword argument to ``user_dates_provider``.
-Row generators can have positional arguments specified as a list under the ``args`` list and keyword arguments as a dictionary under the ``kwargs`` entry.
+Row generators can have positional arguments specified as a list under the ``args`` entry and keyword arguments as a dictionary under the ``kwargs`` entry.
 
 Limitations to this approach to increasing fidelity are that rows can not be correlated with other rows in the same table, nor with any rows in other tables, except for trivially fulfilling foreign key constraints as in the default configuration.
 We will see how to address this later when we talk about :ref:`story generators <story-generators>`.
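
Editorial sketch (not part of the commit): the hunk above describes row generators that receive positional arguments from an ``args`` list and keyword arguments from a ``kwargs`` dictionary, with results going to the columns in ``columns_assigned``. In plain Python that corresponds to unpacking with ``*`` and ``**``. Everything below is a toy model: ``example_dates_provider`` and ``run_row_generator`` are hypothetical names, a ``random.Random`` instance stands in for the ``generic`` object, and assigning returned values to columns in order is an assumption rather than confirmed SSG behaviour.

```python
import datetime as dt
import random


def example_dates_provider(generic, earliest_year=2010):
    """Hypothetical row generator: returns an account creation date and a later booking date."""
    created = dt.date(earliest_year, 1, 1) + dt.timedelta(days=generic.randint(0, 365))
    first_booking = created + dt.timedelta(days=generic.randint(0, 30))
    return created, first_booking


# Python counterpart of one row generator entry in the config (stand-in values).
entry = {
    "args": [],
    "kwargs": {"generic": random.Random(42)},
    "columns_assigned": ["date_account_created", "date_first_booking"],
}


def run_row_generator(func, entry):
    """Call func with the configured arguments and pair the results with the assigned columns."""
    values = func(*entry.get("args", []), **entry.get("kwargs", {}))
    columns = entry["columns_assigned"]
    if len(columns) == 1:
        values = (values,)  # a single column receives the single return value
    return dict(zip(columns, values))


row = run_row_generator(example_dates_provider, entry)
print(row)
```

Because both dates come from one call, they can be kept consistent with each other, which per-column generators cannot guarantee.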
@@ -537,7 +537,7 @@ For instance, it may first yield a row specifying a person in the ``users`` tabl
 Three features make story generators more practical than simply manually writing code that creates the synthetic data bit-by-bit:
 
 1. When a story generator yields a row, it can choose to only specify values for some of the columns. The values for the other columns will be filled by custom row generators (as explained in a previous section) or, if none are specified, by SSG's default generators. Above, we have chosen to specify the value for ``first_device_type`` but the date columns will still be handled by our ``user_dates_provider`` and the age column will still be populated by the ``user_age_provider``.
-2. Any default values that are set when the rows yielded by the story generator are written into the database are available to the story generator when it resumes. In our example, the user's ``id`` is available so that we can respect the foreign key relationship between ``users`` and ``sessions``, even though we did not explicitly set the user's ``id`` when creating the user.
+2. Any default values that are set when the rows yielded by the story generator are written into the database are available to the story generator when it resumes. In our example, the user's ``id`` is available so that we can respect the foreign key relationship between ``users`` and ``sessions``, even though we did not explicitly set the user's ``id`` when creating the user on line 8.
 
 To use and get the most from story generators, we will need to make some changes to our configuration:
 
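Editorial sketch (not part of the commit): point 2 in the hunk above says that values filled in when a yielded row is written become available to the story generator when it resumes. Python generators support this through ``send``. The toy driver below assumes that a story yields a table name with a partial row and receives the completed row back. ``example_story``, ``fill_defaults``, ``run_story`` and the ``action`` column are hypothetical, and the loop stands in for whatever SSG actually does when it writes to the database.

```python
import itertools


def example_story():
    """Hypothetical story: create a user, then a session that refers to that user."""
    # Only some columns are specified; the driver fills in the rest, including the id.
    user = yield "users", {"first_device_type": "iPhone"}
    # The completed row, with its generated id, is available once the generator resumes.
    yield "sessions", {"user_id": user["id"], "action": "signup"}


_ids = itertools.count(1)


def fill_defaults(table, partial_row):
    """Stand-in for the database write: assign an id if the story did not set one."""
    row = dict(partial_row)
    row.setdefault("id", next(_ids))
    return row


def run_story(story):
    """Drive a story generator, sending each completed row back into it."""
    written = []
    try:
        table, partial_row = next(story)
        while True:
            full_row = fill_defaults(table, partial_row)
            written.append((table, full_row))
            table, partial_row = story.send(full_row)
    except StopIteration:
        pass
    return written


rows = run_story(example_story())
for table, row in rows:
    print(table, row)
```

The ``sessions`` row ends up with a ``user_id`` equal to the ``id`` assigned to the user, although the story never set that ``id`` itself.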
docs/source/loan_data.rst

Lines changed: 1 addition & 1 deletion
@@ -104,7 +104,7 @@ We notice that the ``districts`` table doesn't contain any sensitive data so we
 .. literalinclude:: ../../examples/loans/config2.yaml
 :language: yaml
 
-We can export the vocabularies to `.yaml` files, delete the old synthetic data, import the vocabularies and create new synthetic data with:
+We can export the vocabularies to ``.yaml`` files, delete the old synthetic data, import the vocabularies and create new synthetic data with:
 
 .. code-block:: console
 