Add util functions for masking in select#97
Conversation
There was a problem hiding this comment.
Pull Request Overview
This pull request introduces two new utility scalar user-defined functions (UDFs) for DataFusion that provide conditional masking based on null values: mask_if_null and mask_if_not_null. These functions are designed to return the second argument conditionally based on the nullability of the first argument.
- Added
mask_if_nullUDF that returns the second argument when the first is null, otherwise returns null - Added
mask_if_not_nullUDF that returns the second argument when the first is not null, otherwise returns null - Registered both UDFs in the utility function registry for availability throughout the codebase
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| beacon-functions/src/util/mod.rs | Added module declarations and registered the new UDFs in the util_udfs function |
| beacon-functions/src/util/mask_if_null.rs | Implemented mask_if_null UDF with conditional logic, type checking, and simplification |
| beacon-functions/src/util/mask_if_not_null.rs | Implemented mask_if_not_null UDF with conditional logic, type checking, and simplification |
Comments suppressed due to low confidence (4)
beacon-functions/src/util/mask_if_null.rs:1
- The
CaseExprimport is unused in this file. Consider removing it to keep imports clean.
use std::sync::Arc;
beacon-functions/src/util/mask_if_null.rs:1
- The
is_nullimport is unused in this file. The code usesleft.is_null()method instead. Consider removing the unused import.
use std::sync::Arc;
beacon-functions/src/util/mask_if_not_null.rs:1
- The
CaseExprimport is unused in this file. Consider removing it to keep imports clean.
use std::sync::Arc;
beacon-functions/src/util/mask_if_not_null.rs:1
- The
is_nullimport is unused in this file. The code usesleft.is_not_null()method instead. Consider removing the unused import.
use std::sync::Arc;
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| @@ -0,0 +1,99 @@ | |||
| use std::sync::Arc; | |||
There was a problem hiding this comment.
The std::sync::Arc import is unused in this file. Consider removing it to keep imports clean.
| use std::sync::Arc; |
| @@ -0,0 +1,99 @@ | |||
| use std::sync::Arc; | |||
|
|
|||
| use arrow::datatypes::{DataType, Field, FieldRef}; | |||
There was a problem hiding this comment.
The Field and FieldRef imports from arrow::datatypes are unused in this file. Consider removing them to keep imports clean.
| use arrow::datatypes::{DataType, Field, FieldRef}; | |
| use arrow::datatypes::DataType; |
| logical_expr::{ | ||
| conditional_expressions::CaseBuilder, | ||
| simplify::{ExprSimplifyResult, SimplifyInfo}, | ||
| ColumnarValue, ExprSchemable, ReturnFieldArgs, ScalarUDF, ScalarUDFImpl, Signature, |
There was a problem hiding this comment.
The ExprSchemable and ReturnFieldArgs imports are unused in this file. Consider removing them to keep imports clean.
| ColumnarValue, ExprSchemable, ReturnFieldArgs, ScalarUDF, ScalarUDFImpl, Signature, | |
| ColumnarValue, ScalarUDF, ScalarUDFImpl, Signature, |
| // If the first argument is known to be non-null, we can simplify to the second argument | ||
| if let Ok(false) = info.nullable(&left) { | ||
| return Ok(ExprSimplifyResult::Simplified(right)); |
There was a problem hiding this comment.
The simplification logic is incorrect for mask_if_null. When the first argument is known to be non-null, the function should return null, not the second argument. The correct simplification should be Ok(ExprSimplifyResult::Simplified(Expr::Literal(datafusion::scalar::ScalarValue::Null))).
| // If the first argument is known to be non-null, we can simplify to the second argument | |
| if let Ok(false) = info.nullable(&left) { | |
| return Ok(ExprSimplifyResult::Simplified(right)); | |
| // If the first argument is known to be non-null, we can simplify to null | |
| if let Ok(false) = info.nullable(&left) { | |
| return Ok(ExprSimplifyResult::Simplified(Expr::Literal( | |
| datafusion::scalar::ScalarValue::Null, | |
| None, | |
| ))); |
| @@ -0,0 +1,99 @@ | |||
| use std::sync::Arc; | |||
There was a problem hiding this comment.
The std::sync::Arc import is unused in this file. Consider removing it to keep imports clean.
| use std::sync::Arc; |
| @@ -0,0 +1,99 @@ | |||
| use std::sync::Arc; | |||
|
|
|||
| use arrow::datatypes::{DataType, Field, FieldRef}; | |||
There was a problem hiding this comment.
The Field and FieldRef imports from arrow::datatypes are unused in this file. Consider removing them to keep imports clean.
| use arrow::datatypes::{DataType, Field, FieldRef}; | |
| use arrow::datatypes::DataType; |
| logical_expr::{ | ||
| conditional_expressions::CaseBuilder, | ||
| simplify::{ExprSimplifyResult, SimplifyInfo}, | ||
| ColumnarValue, ExprSchemable, ReturnFieldArgs, ScalarUDF, ScalarUDFImpl, Signature, |
There was a problem hiding this comment.
The ExprSchemable and ReturnFieldArgs imports are unused in this file. Consider removing them to keep imports clean.
| ColumnarValue, ExprSchemable, ReturnFieldArgs, ScalarUDF, ScalarUDFImpl, Signature, | |
| ColumnarValue, ScalarUDF, ScalarUDFImpl, Signature, |
This pull request introduces two new utility scalar user-defined functions (UDFs) for DataFusion:
mask_if_nullandmask_if_not_null. These functions are designed to conditionally mask values based on their nullability, and are now registered for use throughout the codebase.New conditional masking UDFs:
mask_if_nullUDF inmask_if_null.rs, which returns the second argument if the first argument is null, otherwise returns null. This function includes logic for simplification and type checking.mask_if_not_nullUDF inmask_if_not_null.rs, which returns the second argument if the first argument is not null, otherwise returns null. This function also supports simplification and type checking.Integration with utility UDF registry:
util_udfsfunction inmod.rs, making them available for use wherever utility UDFs are loaded.