Skip to content

Generalising the training module to accomodate varying input #20

Open
surbhigoel77 wants to merge 24 commits into
mainfrom
training
Open

Generalising the training module to accomodate varying input #20
surbhigoel77 wants to merge 24 commits into
mainfrom
training

Conversation

@surbhigoel77
Copy link
Copy Markdown
Collaborator

@surbhigoel77 surbhigoel77 commented Mar 4, 2024

Closes #19

In order to accomodate data coming from multiple sources, we need to replace the hardcoded inputs with variables whose values are extracted from the dataset during the code run.

Files that need updating:

  1. loaddata.py
  2. Model.py
  3. train.py

In train.py, the .nc data files are read and the variables are extracted and normalised. This part should be taken out of train.py and be kept in a separate preprocessing.py file, to keep train.py common for different data sources. train.py can be converted into a module instead. We can also have a test notebook in the repo.

@surbhigoel77 surbhigoel77 self-assigned this Mar 4, 2024
@surbhigoel77 surbhigoel77 added enhancement New feature or request Python-repo Part of the python NN repo labels Mar 4, 2024
@surbhigoel77 surbhigoel77 marked this pull request as draft March 4, 2024 18:19
@surbhigoel77
Copy link
Copy Markdown
Collaborator Author

surbhigoel77 commented Mar 4, 2024

In Model.py, the architecture of the NN is defined. The input/output layer of the NN is hardcoded with the number of input/output variables :

  • 93 vertical levels
  • 7 input variables that change with vertical levels
  • 4 input variables that do not change with vertical levels
  • 2 output variables that change with vertical levels

Assuming that the number of inputs would be different for all three sources, we need to generalise the code for Model.py as well (for varying no. of input/output nodes for different sources).

@surbhigoel77 surbhigoel77 requested a review from yqsun91 March 4, 2024 18:48
@surbhigoel77
Copy link
Copy Markdown
Collaborator Author

We can include the normalisation of the variables in the neural network architecture instead of train.py

@jatkinson1000 jatkinson1000 linked an issue Mar 18, 2024 that may be closed by this pull request
@surbhigoel77 surbhigoel77 marked this pull request as ready for review July 30, 2024 09:46
Copy link
Copy Markdown
Member

@jatkinson1000 jatkinson1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking much better @surbhigoel77 with a few comments on there.
There also appear to be a number of conflicts with the main branch, so perhaps a rebase is needed (though I suggest testing this on a new branch.)

The big thing that is missing is documentation on how to use it which would be nice to allow handing over to collaborators for future re-use. Perhaps #14 can help?

The other thing to consider is some CI though this requires out collaborators understanding how to run formatters and linting. And I don;t think there are any tests yet...?
Was there any way to check that the same results are being generated after the refactor?

Comment thread newCAM_emulation/Model.py
Comment on lines +96 to +136
# if model is not None:
# # torch.save(model.state_dict(), 'conv_torch.pth')
# torch.save(model.state_dict(), 'trained_models/weights_conv')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this dead code, or should it be wrapped in a conditional of some sort, or replaced by a comment?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure as to it's purpose, ask @yqsun91.

Comment thread newCAM_emulation/Model.py
Comment on lines +23 to +68
layers = []
input_size = in_ver * ilev + in_nover
for _ in range(hidden_layers):
layers.append(nn.Linear(input_size, hidden_size, dtype=torch.float64))
layers.append(nn.SiLU())
input_size = hidden_size
layers.append(nn.Linear(hidden_size, out_ver * ilev, dtype=torch.float64))
self.linear_stack = nn.Sequential(*layers)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A brief comment to summarise what this code is doing for those used to writing it in the previous layout might be useful here.

Comment thread newCAM_emulation/Model.py
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is looking much better, the net is much cleaner and the docstrings really help understand what is going on.

Comment thread newCAM_emulation/Model.py



class EarlyStopper:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately clear to me why this should be a class rather than a function, and whether it belons here with the Model, or if it would be better being moved to train.py with the training loop/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@surbhigoel77 agrees this should be a singe function with inputs, and moved to train.py

Comment thread newCAM_emulation/NN_pred.py
dim_NNout = int(out_ver * ilev)
x_train = np.zeros([dim_NN, Ncol])
y_train = np.zeros([dim_NNout, Ncol])
target_var = ['UTGWSPEC', 'VTGWSPEC']
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have made the input names a variable input, but hardcoded the output variables.
Is it worth making both variable?

Comment thread newCAM_emulation/main.py
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this presents a much cleaner overview of the development, with details abstracted away into modules.

Comment thread newCAM_emulation/train.py
for t in range(epochs):
if t % 2 ==0:
print(f"Epoch {t+1}\n-------------------------------")
def train_loop(dataloader, model, loss_fn, optimizer):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ambiguous name?
This seems to be training a single epoch i.e. a single iteration of the full training loop?

Comment thread newCAM_emulation/train.py
if early_stopper.early_stop(val_loss):
print("BREAK!")
if early_stopper.early_stop(val_loss, model):
# print("BREAK!")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code?
Or, more likely, should we be writing some output to tell the user that training has finished because early stopping criteria was met?

Comment on lines +106 to +107
ilev : int
Number of vertical levels.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing some docs for input variables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request Python-repo Part of the python NN repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Generalise the training module

2 participants