````diff
 ---
 title: Preprocessing Real-World Data
-jupyter: julia-1.7
 ---
 
 ```{julia}
-using Pkg; Pkg.activate("dev")
-```
+#| echo: false
 
-```{julia}
-include("dev/utils.jl")
-using AlgorithmicRecourseDynamics
-using CounterfactualExplanations, Flux, Plots, PlotThemes, Random, LaplaceRedux, LinearAlgebra
-theme(:wong)
+include("docs/src/paper/setup.jl")
+eval(setup)
 output_path = output_dir("real_world")
 www_path = www_dir("real_world")
 data_path = data_dir("real_world")
 ```
 
 ## California Housing Data
 
-Fetching the data using Python's `sklearn`:
+Fetching the data using Python's `sklearn` (run this in the Python REPL):
 
-```{python}
+```python
 from sklearn.datasets import fetch_california_housing
 df, y = fetch_california_housing(return_X_y=True, as_frame=True)
 df["target"] = y.values
-data_path = "../../artifacts/upload/data/real_world"
+data_path = "dev/artifacts/upload/data/real_world"
 import os
+if not os.path.isdir(os.path.join(data_path,"raw")):
+    os.makedirs(os.path.join(data_path,"raw"))
 df.to_csv(os.path.join(data_path,"raw/cal_housing.csv"), index=False)
 ```
 
 Loading the data into Julia session:
 
 ```{julia}
-using CSV, DataFrames, Statistics, StatsBase
 df = CSV.read(joinpath(data_path, "raw/cal_housing.csv"), DataFrame)
 # Features:
 X = Matrix(df[:,Not(:target)])
-dt = fit(ZScoreTransform, X, dims=1)
+dt = StatsBase.fit(ZScoreTransform, X, dims=1)
 StatsBase.transform!(dt, X)
 # Target:
 y = df.target
````
|
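The standardization step in the Julia block (`StatsBase.fit(ZScoreTransform, X, dims=1)` followed by `StatsBase.transform!(dt, X)`) fits one mean and one standard deviation per column of the observations-by-features matrix `X`, then centres and scales each feature in place. The following is a minimal, dependency-free Python sketch of the same column-wise z-score idea, not the StatsBase implementation. The helper names `fit_zscore` and `transform_zscore` and the toy matrix are hypothetical. The sketch assumes the sample (n - 1) standard deviation, which is StatsBase's default, and returns a new matrix instead of overwriting `X`.

```python
from statistics import mean, stdev


def fit_zscore(rows):
    """Per-column mean and sample standard deviation (n - 1 denominator),
    analogous to fitting along dims=1 on an observations-by-features matrix."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]
    return means, stds


def transform_zscore(rows, means, stds):
    """Return a new matrix with every feature centred and scaled.
    (StatsBase.transform! instead overwrites its input in place.)"""
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical toy data: rows are observations, columns are features.
X = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0]]

means, stds = fit_zscore(X)
Z = transform_zscore(X, means, stds)
print(means)  # [2.0, 20.0]
print(stds)   # [1.0, 10.0]
print(Z)      # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

After the transform, each column has mean 0 and sample standard deviation 1, so features measured on very different scales (for example, median income versus population in the housing data) contribute comparably to a downstream model.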