
Create _validate_map_items_post_run function #594

Open
dale-wahl wants to merge 1 commit into master from move_map_item_warning


@dale-wahl
Member

Run a validation on map_item after a new DataSet is created in BasicProcessor.

Why? The existing warn_unmappable, with its option to send a warning to admins, fired every time a dataset called iterate_items, provided a processor object was passed to iterate_items. (That parameter may be a separate issue worth addressing: it is not required and is otherwise only used to check for interruption.) As a result, every processor run on a DataSet with unmappable items created an alert. I recently added a more robust alert in the Search worker, but that left us missing any other DataSet that might have a map_item issue, e.g. other datasources with changes to their form, or processors that use map_item such as the LLM prompter.

This is mostly non-blocking: it will hold up the queue for the same processor type, but subsequent processors should still be able to run first.

Error handling and alerting improvements:

  • Moved the new methodology for Zeeschuimer datasources in Search for imported files to a new _validate_map_item_post_run method in BasicProcessor
  • Removed the accumulation and emission of rolled-up admin alerts for unmappable items from the import_from_file method in search.py and simplified the per-item logging (which now only updates the DataSet log).

Hopefully this streamlines error reporting and reduces redundant admin notifications.
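
To illustrate the idea of a single post-run check, here is a minimal sketch of validating items once after a dataset is finished and sending at most one rolled-up alert. It is not the code in this PR: the function signature, `example_map_item`, and the `alert_admins` callback are hypothetical stand-ins, and the real method lives on BasicProcessor and works on a DataSet.

```python
from typing import Any, Callable, Iterable


def validate_map_items_post_run(
    items: Iterable[dict[str, Any]],
    map_item: Callable[[dict[str, Any]], Any],
    alert_admins: Callable[[str], None],
) -> int:
    """Try to map every item once and send at most one rolled-up alert.

    All three parameters are hypothetical stand-ins, not the project's real API.
    Returns the number of items that could not be mapped.
    """
    unmappable = 0
    for item in items:
        try:
            map_item(item)
        except Exception:
            # Assumption: any exception raised by map_item marks the item as unmappable.
            unmappable += 1

    if unmappable:
        # One alert per dataset, no matter how often the dataset is iterated later.
        alert_admins(f"{unmappable} item(s) could not be mapped")
    return unmappable


def example_map_item(item: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical mapper: raises KeyError when the 'id' field is missing."""
    return {"id": item["id"]}


alerts: list[str] = []
items = [{"id": 1}, {}, {"id": 2}, {}]
count = validate_map_items_post_run(items, example_map_item, alerts.append)
print(count, alerts)  # 2 ['2 item(s) could not be mapped']
```

Because the check runs once when the dataset is created, later processors iterating over the same items do not trigger further alerts.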

This does have a downside: I left the current map_item check in Search.import_from_file. We already modify each item there, so we are iterating anyway, and checking map_item at that point lets us update the status and clearly inform the user (which I cannot do in the validation function, since it runs after the dataset is finished). The result is that we now iterate through those DataSets twice. There may be a better solution, so perhaps the reviewer could give it a brief thought. Overall, this still seems cleaner and more comprehensive.

