Generalising the training module to accommodate varying input by surbhigoel77 · Pull Request #20 · DataWaveProject/CAM_GW_pytorch_emulator

surbhigoel77 · 2024-03-04T18:19:03Z

Closes #19

To accommodate data coming from multiple sources, the hardcoded inputs need to be replaced with variables whose values are extracted from the dataset at run time.

Files that need updating:

loaddata.py
Model.py
train.py

In train.py, the .nc data files are read and the variables are extracted and normalised. This part should be moved out of train.py into a separate preprocessing.py file, so that train.py stays common across data sources. train.py can then be converted into a module rather than a script. We could also add a test notebook to the repo.
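A minimal sketch of what the normalisation step in a standalone preprocessing.py could look like. It assumes the variables have already been read from the .nc files into NumPy arrays (for example with xarray); the helper name `normalise_variables` and the per-variable z-score scheme are assumptions for illustration, and the normalisation currently done in train.py may differ.

```python
import numpy as np


def normalise_variables(variables):
    """Z-score normalise each variable independently (hypothetical helper).

    Parameters
    ----------
    variables : dict[str, np.ndarray]
        Raw arrays keyed by variable name, as extracted from the .nc files.

    Returns
    -------
    normalised : dict[str, np.ndarray]
        Normalised arrays with the same keys and shapes.
    stats : dict[str, tuple[float, float]]
        (mean, std) per variable, kept so the same scaling can be reused
        at inference time.
    """
    normalised = {}
    stats = {}
    for name, data in variables.items():
        mean = float(np.mean(data))
        std = float(np.std(data))
        # Guard against constant fields, which would otherwise divide by zero.
        if std == 0.0:
            std = 1.0
        normalised[name] = (data - mean) / std
        stats[name] = (mean, std)
    return normalised, stats
```

Returning the statistics alongside the scaled data means train.py only ever receives already-normalised arrays, whichever source they came from.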

surbhigoel77 · 2024-03-04T18:32:27Z

Model.py defines the architecture of the NN. The sizes of its input and output layers are hardcoded from the number of input and output variables:

93 vertical levels
7 input variables that change with vertical levels
4 input variables that do not change with vertical levels
2 output variables that change with vertical levels

Since the number of inputs is likely to differ between the three sources, Model.py also needs to be generalised to support a varying number of input/output nodes per source.
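The hardcoded counts above could instead be inferred from the data. A hedged sketch, assuming each vertically varying variable is stored as an array of shape (ilev, Ncol) and each single-level variable as shape (Ncol,); the helper name `count_inputs` is hypothetical.

```python
import numpy as np


def count_inputs(variables):
    """Infer (in_ver, in_nover, ilev) from array shapes (hypothetical helper).

    Assumes vertically varying fields have shape (ilev, Ncol) and
    single-level fields have shape (Ncol,).
    """
    ilev = None
    in_ver = 0
    in_nover = 0
    for name, data in variables.items():
        if data.ndim == 2:
            # All vertically varying fields must share the same number of levels.
            if ilev is None:
                ilev = data.shape[0]
            elif data.shape[0] != ilev:
                raise ValueError(f"{name} has {data.shape[0]} levels, expected {ilev}")
            in_ver += 1
        elif data.ndim == 1:
            in_nover += 1
        else:
            raise ValueError(f"{name} has unexpected shape {data.shape}")
    return in_ver, in_nover, ilev
```

For the current configuration (7 variables on 93 levels plus 4 single-level variables) this gives an input layer of 7 × 93 + 4 = 655 nodes, and the 2 vertically varying outputs give 2 × 93 = 186 output nodes.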

surbhigoel77 · 2024-03-11T16:19:58Z

We could include the normalisation of the variables in the neural network architecture instead of in train.py.
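One way to move normalisation into the architecture is a small module applied as the first layer of the network. This is a sketch under assumptions: the per-feature mean and std are computed beforehand from the training data, and the class name `Normalise` is hypothetical. float64 is used to match the dtype of the linear layers in this PR.

```python
import torch
from torch import nn


class Normalise(nn.Module):
    """Apply fixed z-score normalisation inside the network (hypothetical layer).

    The mean and std are registered as buffers, so they are saved in the
    model's state_dict but are not updated by the optimiser.
    """

    def __init__(self, mean, std):
        super().__init__()
        self.register_buffer("mean", torch.as_tensor(mean, dtype=torch.float64))
        self.register_buffer("std", torch.as_tensor(std, dtype=torch.float64))

    def forward(self, x):
        return (x - self.mean) / self.std
```

It could then be prepended to the existing stack, e.g. nn.Sequential(Normalise(mean, std), ...). Because the statistics live in the state_dict, a saved model carries its own normalisation and inference code does not need to repeat it.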

jatkinson1000

This is looking much better, @surbhigoel77; I have left a few comments.
There also appear to be a number of conflicts with the main branch, so perhaps a rebase is needed (though I suggest testing this on a new branch).

The main thing missing is documentation on how to use the code, which would make it easier to hand over to collaborators for future reuse. Perhaps #14 can help?

The other thing to consider is some CI, though this requires our collaborators to understand how to run the formatters and linters. And I don't think there are any tests yet...?
Was there any way to check that the same results are being generated after the refactor?
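A simple regression check, assuming predictions from the pre-refactor code have been saved (e.g. with np.save) and the refactored code is run on the same inputs with the same trained weights. The helper name and tolerances are assumptions; exact equality is unlikely if operations were reordered, hence the tolerance.

```python
import numpy as np


def outputs_match(reference, refactored, rtol=1e-6, atol=1e-8):
    """Compare predictions from the old and refactored code paths (hypothetical helper).

    Both arguments are predictions produced from the same inputs and the
    same trained weights.
    """
    reference = np.asarray(reference)
    refactored = np.asarray(refactored)
    # A shape mismatch usually means the reshaping/flattening changed.
    if reference.shape != refactored.shape:
        return False
    return bool(np.allclose(reference, refactored, rtol=rtol, atol=atol))
```

A check like this could also become the first test in the repo and run in CI.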

jatkinson1000 · 2024-08-05T08:32:36Z

+            # if model is not None:
+            #     # torch.save(model.state_dict(), 'conv_torch.pth')
+            #     torch.save(model.state_dict(), 'trained_models/weights_conv')


Is this dead code, or should it be wrapped in a conditional of some sort, or replaced by a comment?

Unsure as to its purpose; ask @yqsun91.

jatkinson1000 · 2024-08-05T08:37:16Z

+        layers = []
+        input_size = in_ver * ilev + in_nover  
+        for _ in range(hidden_layers):
+            layers.append(nn.Linear(input_size, hidden_size, dtype=torch.float64))
+            layers.append(nn.SiLU())
+            input_size = hidden_size
+        layers.append(nn.Linear(hidden_size, out_ver * ilev, dtype=torch.float64))
+        self.linear_stack = nn.Sequential(*layers)


A brief comment summarising what this code does, for those used to writing it in the previous layout, might be useful here.
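For readers used to the previous layout, which declared each layer explicitly, the loop is equivalent to a fixed chain of layers whose sizes depend on the arguments. The pure-Python sketch below (the function name is hypothetical) lists the (in_features, out_features) of each nn.Linear the loop would build.

```python
def linear_layer_shapes(in_ver, in_nover, out_ver, ilev, hidden_layers, hidden_size):
    """Return the (in_features, out_features) of each nn.Linear the loop builds.

    Mirrors the loop in the diff: hidden_layers blocks of Linear + SiLU,
    then a final Linear mapping to the vertically varying outputs.
    """
    shapes = []
    # Flattened input: every vertical-level variable on every level,
    # plus the single-level variables.
    input_size = in_ver * ilev + in_nover
    for _ in range(hidden_layers):
        shapes.append((input_size, hidden_size))  # followed by nn.SiLU()
        input_size = hidden_size  # the next layer consumes this layer's output
    # Output layer: one node per output variable per vertical level.
    shapes.append((input_size, out_ver * ilev))
    return shapes
```

For example, with in_ver=7, in_nover=4, out_ver=2, ilev=93 and, say, two hidden layers of width 500, this gives [(655, 500), (500, 500), (500, 186)]. One difference from the diff: the sketch uses input_size for the final layer, whereas the diff uses hidden_size. The two are identical whenever hidden_layers >= 1, but with hidden_layers = 0 the diff's final layer would expect hidden_size inputs while receiving in_ver * ilev + in_nover.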

jatkinson1000 · 2024-08-05T08:37:34Z

This file is looking much better, the net is much cleaner and the docstrings really help understand what is going on.

jatkinson1000 · 2024-08-05T08:39:32Z

-
-
-
+class EarlyStopper:


It's not immediately clear to me why this should be a class rather than a function, and whether it belongs here with the Model or would be better moved to train.py with the training loop?

@surbhigoel77 agrees this should be a single function with explicit inputs, moved to train.py.
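A sketch of what a single function might look like, assuming the usual patience/min_delta early-stopping logic. The signature and defaults are assumptions; the EarlyStopper in this PR (whose early_stop also takes the model) may behave differently.

```python
def early_stop(val_loss, best_loss, counter, patience=5, min_delta=0.0):
    """Decide whether to stop training (hypothetical replacement for EarlyStopper).

    Returns (stop, best_loss, counter); the caller keeps best_loss and
    counter between epochs instead of storing them on an object.
    """
    if val_loss < best_loss - min_delta:
        # Improvement: record the new best and reset the patience counter.
        return False, val_loss, 0
    counter += 1
    return counter >= patience, best_loss, counter
```

The trade-off is that state is threaded through explicitly: the training loop holds best_loss and counter, and can save the model itself whenever the counter is reset to 0.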

jatkinson1000 · 2024-08-05T08:51:07Z

+    dim_NNout = int(out_ver * ilev)
+    x_train = np.zeros([dim_NN, Ncol])
+    y_train = np.zeros([dim_NNout, Ncol])
+    target_var = ['UTGWSPEC', 'VTGWSPEC']


You have made the input variable names an argument but hardcoded the output variables.
Is it worth making both configurable?
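A hedged sketch of passing both lists in, assuming the data is held as a dict of arrays shaped (ilev, Ncol) or (Ncol,). The helper name `build_arrays` is hypothetical, and it concatenates rows rather than filling preallocated np.zeros arrays as the diff does.

```python
import numpy as np


def build_arrays(data, input_vars, target_vars):
    """Stack named variables into (features, Ncol) arrays (hypothetical helper).

    Both input_vars and target_vars are arguments, so neither list is
    hardcoded. Each entry of data is either (ilev, Ncol) or (Ncol,).
    """

    def stack(names):
        # Promote single-level fields to (1, Ncol) so everything can be
        # concatenated along the feature axis, in the order given.
        rows = [np.atleast_2d(data[name]) for name in names]
        return np.concatenate(rows, axis=0)

    return stack(input_vars), stack(target_vars)
```

The current behaviour would then be a call such as build_arrays(data, input_vars, ['UTGWSPEC', 'VTGWSPEC']), with other sources supplying their own target names.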

jatkinson1000 · 2024-08-05T08:52:48Z

I think this presents a much cleaner overview of the development, with details abstracted away into modules.

jatkinson1000 · 2024-08-05T08:56:26Z

-    for t in range(epochs):
-        if t % 2 ==0:
-            print(f"Epoch {t+1}\n-------------------------------")
+def train_loop(dataloader, model, loss_fn, optimizer):


Ambiguous name?
This seems to train a single epoch, i.e. a single iteration of the full training loop?
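To illustrate the naming point, a sketch that separates the per-epoch function from the loop over epochs. All names are hypothetical, and `step` stands in for the forward/backward/optimiser code currently inside train_loop.

```python
def train_one_epoch(batches, step):
    """Run one pass over the training data and return the mean batch loss.

    `step` is any callable that performs a forward/backward/optimiser update
    on one batch and returns that batch's loss as a float.
    """
    total = 0.0
    n = 0
    for batch in batches:
        total += step(batch)
        n += 1
    return total / n if n else float("nan")


def train(batches, step, epochs):
    """Outer training loop: call train_one_epoch once per epoch."""
    history = []
    for _ in range(epochs):
        history.append(train_one_epoch(batches, step))
    return history
```

With this split, a name like train_loop naturally belongs to the outer function, while the per-epoch function says what it does.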

jatkinson1000 · 2024-08-05T08:57:40Z

-        if early_stopper.early_stop(val_loss):
-            print("BREAK!")
+        if early_stopper.early_stop(val_loss, model):
+            # print("BREAK!")


Dead code?
Or, more likely, should we write some output to tell the user that training has finished because the early stopping criterion was met?
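If the answer is to report it, a sketch using the standard logging module rather than print, so the message can be silenced or redirected. The function and its arguments are hypothetical, with val_losses standing in for the validation loss computed after each epoch.

```python
import logging

logger = logging.getLogger(__name__)


def run_with_early_stopping(val_losses, patience):
    """Return the 1-based epoch at which training stops (hypothetical sketch).

    Logs a message when early stopping triggers, instead of a
    commented-out print.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best:
            best, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:
            logger.info(
                "Early stopping at epoch %d: validation loss has not improved "
                "for %d epochs (best %.4g).",
                epoch, patience, best,
            )
            return epoch
    logger.info("Training completed all %d epochs.", len(val_losses))
    return len(val_losses)
```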

jatkinson1000 · 2024-08-05T09:31:15Z

+    ilev : int
+        Number of vertical levels.


Missing some docs for input variables.
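For reference, the numpydoc style already used for ilev could be extended to the remaining arguments. The descriptions below are inferred from the input/output sizes described earlier in this PR and should be checked against the code; the function itself is a hypothetical stand-in.

```python
def network_dims(in_ver, in_nover, out_ver, ilev):
    """Compute the input and output layer sizes of the network.

    Parameters
    ----------
    in_ver : int
        Number of input variables that vary with vertical level.
    in_nover : int
        Number of input variables that do not vary with vertical level.
    out_ver : int
        Number of output variables that vary with vertical level.
    ilev : int
        Number of vertical levels.

    Returns
    -------
    tuple[int, int]
        (input_size, output_size) of the network.
    """
    return in_ver * ilev + in_nover, out_ver * ilev
```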


surbhigoel77 self-assigned this Mar 4, 2024

surbhigoel77 added enhancement New feature or request Python-repo Part of the python NN repo labels Mar 4, 2024

surbhigoel77 marked this pull request as draft March 4, 2024 18:19

surbhigoel77 requested a review from yqsun91 March 4, 2024 18:48

surbhigoel77 mentioned this pull request Feb 26, 2024

Generalise the training module #19

Closed

jatkinson1000 linked an issue Mar 18, 2024 that may be closed by this pull request

Generalise the training module #19

Closed

surbhigoel77 marked this pull request as ready for review July 30, 2024 09:46

surbhigoel77 requested a review from jatkinson1000 July 30, 2024 09:47

jatkinson1000 requested changes Aug 5, 2024


jatkinson1000 reviewed Aug 5, 2024


surbhigoel77 added 17 commits August 5, 2024 13:10

Updating README

3da88a0

basic linting

4cc821f

basic linting

b29d46a

Added ruff linting tool

793f693

Updating train.py

992cb06

Updating train.py

da1eb23

Reverted the changes introduced by 9dde06f

5cf19cc

Adding normalisation in the model definition

0b595cc

created convection subfolder in demodata and moving reading of data files to loaddata from train.py

88aa237

changed data_loader function

8b6175c

Added a comment on how the data will flow from loaddata to model.py

2639bbe

Updated the notebook - work in progress

fec3e25

Updated loaddata, Model, train files and removed NN_pred, and added a main file

28dd18f

Updated Model.py

b0ca08d

removed a side comment

ae1ffa7

removed redundant NN_pred.py, replaced with main.py

460db98

removed

e85a661

surbhigoel77 added 7 commits August 5, 2024 14:05

Updated name of the file Model.py to model.py

83e88ab

removed modelrun file, unused here

8f53dba

Changed hard coded input output values to variables

08b6987

Removed hard-coded values in data_loader

667d5f7

Updated model saving to trained_models folder

0d71a7d

Updated reshaping in loaddata dataloader, included print message in train and deleted commented out code in model

81f0167

Rebasing with main

fd780f5

surbhigoel77 force-pushed the training branch from cb8c6ff to fd780f5 August 5, 2024 13:11

omarjamil self-requested a review November 19, 2024 13:48

Conversation

surbhigoel77 commented Mar 4, 2024 • edited by jatkinson1000

surbhigoel77 commented Mar 4, 2024 • edited

surbhigoel77 commented Mar 11, 2024

jatkinson1000 left a comment • edited


2 participants
