Add trim_cr utility function and apply it to CSV parsing in multiple files#417
Open
drbergman wants to merge 1 commit into
Open
Add trim_cr utility function and apply it to CSV parsing in multiple files#417drbergman wants to merge 1 commit into
drbergman wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses CRLF (\r\n) line endings causing trailing \r characters to remain after std::getline on Unix/macOS, which breaks CSV parsing and downstream string comparisons in PhysiCell/BioFVM loaders and parsers.
Changes:
- Added a shared inline
trim_cr(std::string&)utility inBioFVM/BioFVM_utilities.h. - Applied
trim_crimmediately afterstd::getlinein multiple CSV-reading code paths (cells CSV loaders, rules CSV parsers, and substrate initial conditions CSV loader). - Added an explicit include of
BioFVM_utilities.hwhere needed.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
modules/PhysiCell_geometry.cpp |
Trims trailing \r for cell CSV parsing (v1/v2 and version detection). |
core/PhysiCell_rules.cpp |
Trims trailing \r for multiple rules CSV parser versions. |
BioFVM/BioFVM_utilities.h |
Introduces trim_cr as a reusable inline helper. |
BioFVM/BioFVM_microenvironment.cpp |
Trims trailing \r in substrate initial conditions CSV loader and adds required include. |
Comments suppressed due to low confidence (1)
BioFVM/BioFVM_microenvironment.cpp:1479
char c = line.c_str()[0];(and later fixed-position indexing intoline) assumesstd::getlinesucceeded and that the line has enough characters. If the CSV is empty or the first line is blank/short, this is undefined behavior. Add a guard forline.empty()(and length checks before using[2]/[4]) and emit a clear error if the header/first row is missing or malformed.
std::string line;
std::getline( file , line );
trim_cr(line);
char c = line.c_str()[0];
std::vector<int> substrate_indices;
bool header_provided = false;
if( c == 'X' || c == 'x' )
{
// do not support this with a header yet
if ((line.c_str()[2] != 'Y' && line.c_str()[2] != 'y') || (line.c_str()[4] != 'Z' && line.c_str()[4] != 'z'))
{
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix CRLF line endings breaking CSV parsers on Unix/macOS
Problem
On Unix and macOS,
std::getlinedoes not strip the carriage return (\r) from lines with Windows-style CRLF (\r\n) endings. This is a common real-world scenario: Excel exports CSV with CRLF line endings even on macOS, and files authored on Windows carry the same encoding. Every line retains a trailing\raftergetline, causing string comparisons to silently fail — e.g."default\r" != "default","oxygen\r" != "oxygen". This affects cell loading, rules parsing, and substrate initial conditions.Closes / related: #272
Prior to this fix, loading a CRLF cell CSV would produce console output like:
The first cell is skipped because
"default\r"does not match"default". The last line of the file succeeds because it has no trailing\r(no CRLF after the final newline). The garblednot found!ll_Definition for defaultline is caused by the\rreturning the cursor to column 0 mid-output and overwriting part of the previous message.What changed
Added
trim_cr(std::string& line)as a sharedinlineutility inBioFVM/BioFVM_utilities.h. It strips a trailing\rif present and is a no-op on LF-only files. It is called immediately after everystd::getlinethat reads from a file in the affected parsers.Files changed
BioFVM/BioFVM_utilities.hinline void trim_cr(std::string& line)BioFVM/BioFVM_microenvironment.cpp#include "BioFVM_utilities.h"; calltrim_crafter each filegetlineinload_initial_conditions_from_csvmodules/PhysiCell_geometry.cpptrim_crafter each filegetlineinload_cells_csv,load_cells_csv_v1,load_cells_csv_v2core/PhysiCell_rules.cpptrim_crafter each filegetlineinparse_csv_rules_v0,parse_csv_rules_v1,parse_csv_rules_v3Testing
Create a minimal cell CSV:
To save as CRLF in VS Code: Click the
LForCRLFindicator in the bottom-right status bar → selectCRLF→ save.To save as LF in VS Code: Same control → select
LF→ save.Confirm cells load correctly with both versions. Prior to this fix, the CRLF version would fail to match cell type names and skip all cells or exit with an error.