 * Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
-* Homepage: https://github.com/sdv-dev/CTGAN

 ## Overview

-Based on previous work ([TGAN](https://github.com/sdv-dev/TGAN)) on synthetic data generation,
-we develop a new model called CTGAN. Several major differences make CTGAN outperform TGAN.
-
-- **Preprocessing**: CTGAN uses a more sophisticated Variational Gaussian Mixture Model to detect
-modes of continuous columns.
-- **Network structure**: TGAN uses LSTM to generate synthetic data column by column. CTGAN uses
-fully-connected networks, which are more efficient.
-- **Features to prevent mode collapse**: We design a conditional generator and resample the
-training data to prevent mode collapse on discrete columns. We use WGAN-GP and PacGAN to
-stabilize the training of GAN.
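The **Preprocessing** bullet refers to what the CTGAN paper calls mode-specific normalization. Each continuous value is represented by two things: a one-hot indicator of a mixture mode, and a scalar that gives the value's position inside that mode. The sketch below covers only the per-value arithmetic, and it rests on three stated assumptions. First, the mixture weights, means and standard deviations are illustrative numbers, not the output of a fitted Variational Gaussian Mixture. Second, the mode is picked by maximum weighted density, whereas the paper samples it. Third, `normalize_value` is a hypothetical helper and not part of the `ctgan` API. The scaling `(value - mean) / (4 * std)` follows the formula given in the paper.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of the normal distribution N(mean, std**2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normalize_value(
    value: float,
    weights: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
) -> Tuple[float, List[int]]:
    """Hypothetical helper: encode one continuous value as (alpha, beta).

    beta is a one-hot vector marking the selected mixture mode and alpha is
    the value's offset from that mode's mean, scaled by 4 standard deviations.
    The mixture parameters are assumed to be already fitted.
    """
    # Weighted density of the value under each mode; proportional to the
    # probability that the value belongs to that mode.
    densities = [w * gaussian_pdf(value, m, s) for w, m, s in zip(weights, means, stds)]
    # Simplification: pick the most likely mode rather than sampling one.
    # If every density underflows to 0.0, max() falls back to mode 0.
    k = max(range(len(densities)), key=densities.__getitem__)
    alpha = (value - means[k]) / (4.0 * stds[k])
    beta = [1 if i == k else 0 for i in range(len(means))]
    return alpha, beta


if __name__ == "__main__":
    # Illustrative two-mode mixture (made-up parameters, not fitted to data).
    print(normalize_value(10.4, weights=[0.5, 0.5], means=[0.0, 10.0], stds=[1.0, 1.0]))
```

For readability, the sketch works with plain densities. An implementation that must handle values far from every mode would usually compare log-densities instead.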
+CTGAN is a collection of Deep Learning based Synthetic Data Generators for single table data, which are able to learn from real data and generate synthetic clones with high fidelity.

+Currently, this library implements the **CTGAN** and **TVAE** models proposed in the [Modeling Tabular data using Conditional GAN](https://arxiv.org/abs/1907.00503) paper. For more information about these models, please check out the respective user guides:
+* [CTGAN User Guide](https://sdv.dev/SDV/user_guides/single_table/ctgan.html).
+* [TVAE User Guide](https://sdv.dev/SDV/user_guides/single_table/tvae.html).

 # Install

@@ -49,9 +44,6 @@ pip install ctgan

 This will pull and install the latest stable release from [PyPI](https://pypi.org/).

-If you want to install from source or contribute to the project please read the
-[Contributing Guide](CONTRIBUTING.rst).
-
 ## Install with conda

 **CTGAN** can also be installed using [conda](https://docs.conda.io/en/latest/):
 This will pull and install the latest stable release from [Anaconda](https://anaconda.org/).


-# Data Format
-
-**CTGAN** expects the input data to be a table given as either a `numpy.ndarray` or a
-`pandas.DataFrame` object with two types of columns:
-
-* **Continuous Columns**: Columns that contain numerical values and which can take any value.
-* **Discrete columns**: Columns that only contain a finite number of possible values, whether
-these are string values or not.
-
-This is an example of a table with 4 columns:
-
-* A continuous column with float values
-* A continuous column with integer values
-* A discrete column with string values
-* A discrete column with integer values
-
-|   | A    | B   | C   | D |
-|---|------|-----|-----|---|
-| 0 | 0.1  | 100 | 'a' | 1 |
-| 1 | -1.3 | 28  | 'b' | 2 |
-| 2 | 0.3  | 14  | 'a' | 2 |
-| 3 | 1.4  | 87  | 'a' | 3 |
-| 4 | -0.1 | 69  | 'b' | 2 |
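The Data Format section leaves it to the user to say which columns are discrete. Column D shows why: its integer values alone do not reveal that it is categorical. Below is a minimal sketch of that bookkeeping. It assumes the example table is held as plain Python lists rather than a `pandas.DataFrame`, and `TABLE` and `split_columns` are hypothetical names used only for illustration. In `ctgan` itself, the discrete column names are handed to `CTGANSynthesizer.fit` through its `discrete_columns` argument.

```python
from typing import Dict, List, Sequence, Tuple

# The four-column example table, stored column-wise (hypothetical representation).
TABLE: Dict[str, list] = {
    "A": [0.1, -1.3, 0.3, 1.4, -0.1],  # continuous, float values
    "B": [100, 28, 14, 87, 69],        # continuous, integer values
    "C": ["a", "b", "a", "a", "b"],    # discrete, string values
    "D": [1, 2, 2, 3, 2],              # discrete, integer values
}


def split_columns(
    table: Dict[str, list], discrete_columns: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (continuous, discrete) column names.

    Raises ValueError if a declared discrete column does not exist, or if a
    column left as continuous holds non-numeric values.
    """
    missing = [name for name in discrete_columns if name not in table]
    if missing:
        raise ValueError(f"unknown discrete columns: {missing}")
    discrete = [name for name in table if name in discrete_columns]
    continuous = [name for name in table if name not in discrete_columns]
    for name in continuous:
        # bool is a subclass of int, so it is excluded explicitly.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in table[name]
        )
        if not numeric:
            raise ValueError(f"column {name!r} is not numeric; declare it as discrete")
    return continuous, discrete


if __name__ == "__main__":
    print(split_columns(TABLE, ["C", "D"]))
```

Forgetting to declare column C raises an error, because its strings cannot be continuous. Forgetting column D passes silently, because only the user knows that its integers are categories.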
+# Usage Example

+> :warning: **WARNING**: If you're just getting started with synthetic data, we recommend using the SDV library which provides user-friendly APIs for interacting with CTGAN. To learn more about using CTGAN through SDV, check out the user guide [here](https://sdv.dev/SDV/user_guides/single_table/ctgan.html).

-**NOTE**: CTGAN does not distinguish between float and integer columns, which means that it will
-sample float values in all cases. If integer values are required, the outputted float values
-must be rounded to integers in a later step, outside of CTGAN.
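The NOTE above leaves integer recovery to the user. Below is a minimal post-processing sketch. It assumes the sampled rows are available as a list of dictionaries, and `round_integer_columns` is a hypothetical helper. With a `pandas.DataFrame`, the same step is usually written as `df[cols] = df[cols].round().astype(int)`.

```python
from typing import Dict, Iterable, List


def round_integer_columns(
    rows: Iterable[Dict[str, object]], integer_columns: Iterable[str]
) -> List[Dict[str, object]]:
    """Hypothetical helper: round sampled floats back to ints in the given columns."""
    names = set(integer_columns)
    rounded = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are left untouched
        for name in names:
            # round() with a single argument returns an int. It rounds halves
            # to the nearest even integer, so 2.5 -> 2 and 3.5 -> 4.
            new_row[name] = round(new_row[name])
        rounded.append(new_row)
    return rounded


if __name__ == "__main__":
    samples = [{"A": 0.12, "B": 99.6}, {"A": -1.27, "B": 27.4}]
    print(round_integer_columns(samples, ["B"]))
```

Because of the round-half-to-even behaviour, check that this matches how the original integer column should be reconstructed before relying on it.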
+To get started with CTGAN, you should prepare your data as either a `numpy.ndarray` or a `pandas.DataFrame` object with two types of columns:

-# Python Quickstart
+* **Continuous Columns**: can contain any numerical value.
+* **Discrete Columns**: contain a finite number of values, whether these are string values or not.

-In this short tutorial we will guide you through a series of steps that will help you
-get started with **CTGAN**.
+In this example we load the [Adult Census Dataset](https://archive.ics.uci.edu/ml/datasets/adult), which is a built-in demo dataset. We then model it using the **CTGANSynthesizer** and generate a synthetic copy of it.

-## 1. Model the data
-
-### Step 1: Prepare your data
-
-Before being able to use CTGAN you will need to prepare your data as specified above.
-
-For this example, we will be loading some data using the `ctgan.load_demo` function.

 ```python3
+from ctgan import CTGANSynthesizer
 from ctgan import load_demo

 data = load_demo()
-```
-
-This will download a copy of the [Adult Census Dataset](https://archive.ics.uci.edu/ml/datasets/adult) as a dataframe:
-
-| age | workclass | fnlwgt | ... | hours-per-week | native-country | income |
 **Note that this code does not guarantee workclass=" Private"**

-## 4. Save and load the synthesizer

-To save a trained ctgan synthesizer, you can call the `save` method passing a path to the file
-in which the model will be saved:
-
-```python3
-ctgan.save('ctgan.pkl')
-```
-
-Later on, you can restore the saved synthesizer by passing the path to the `load`
-method of the `CTGANSynthesizer` class:
+# Join our community

-```python3
-ctgan = CTGANSynthesizer.load('ctgan.pkl')
-```

-# Join our community
+1. Please have a look at the [Contributing Guide](https://sdv.dev/SDV/developer_guides/contributing.html) to see how you can contribute to the project.
+2. If you have any doubts, feature requests or detect an error, please [open an issue on github](https://github.com/sdv-dev/CTGAN/issues) or [join our Slack Workspace](https://sdv-space.slack.com/join/shared_invite/zt-gdsfcb5w-0QQpFMVoyB2Yd6SRiMplcw#/).
+3. Also, do not forget to check the [project documentation site](https://sdv.dev/SDV/)!

-1. If you would like to try more dataset examples, please have a look at the [examples folder](
-https://github.com/sdv-dev/CTGAN/tree/master/examples) of the repository. Please contact us
-if you have a usage example that you would want to share with the community.
-2. If you want to contribute to the project code, please head to the [Contributing Guide](
-CONTRIBUTING.rst) for more details about how to do it.
-3. If you have any doubts, feature requests or detect an error, please [open an issue on github](
-https://github.com/sdv-dev/CTGAN/issues)

 # Citing TGAN

@@ -260,3 +135,15 @@ A package to easily deploy **CTGAN** onto a remote server. This package is devel

 More details can be found in the corresponding repository: https://github.com/oregonpillow/ctgan-server-cli