SmarterCSV automatically normalizes the values in each row. All transformations are configurable — most are enabled by default because they're the right behavior for the vast majority of CSV files.
Transformations run in this order for every row:
| Step | Option | Default | What it does |
|---|---|---|---|
| 1 | strip_whitespace | true | Strips leading/trailing whitespace from all values (and headers) at parse time |
| 2 | nil_values_matching | nil | Sets values matching the regexp to nil |
| 3 | remove_empty_values | true | Removes keys whose value is nil or blank |
| 4 | remove_zero_values | false | Removes keys whose value is numeric zero |
| 5 | convert_values_to_numeric | true | Converts numeric-looking strings to Integer or Float |
| 6 | value_converters | nil | Applies per-key custom converter lambdas or classes |
| 7 | remove_empty_hashes | true | Drops rows that are entirely empty after all transformations |

Steps 2–6 run per field in order.
value_converters receive the value after numeric conversion, so guard against receiving Integer or Float if your converter expects a string.
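
The ordering of steps 2–6 can be sketched in plain Ruby. This is only an illustration of the order described above, not SmarterCSV's implementation: `transform_row` and its keyword arguments are hypothetical names, and the numeric patterns are simplified assumptions.

```ruby
# Hypothetical stand-in for per-field steps 2-6; not SmarterCSV's actual code.
def transform_row(row, nil_matching: nil, remove_empty: true, remove_zero: false,
                  to_numeric: true, converters: {})
  row.each_with_object({}) do |(key, value), out|
    # Step 2: values matching the regexp become nil
    value = nil if nil_matching && value.is_a?(String) && value.match?(nil_matching)
    # Step 3: drop nil or blank values
    next if remove_empty && (value.nil? || value.to_s.strip.empty?)
    # Step 4: drop numeric zeros (values are still strings at this point)
    next if remove_zero && value.is_a?(String) && value.match?(/\A0+(\.0+)?\z/)
    # Step 5: numeric-looking strings become Integer or Float
    if to_numeric && value.is_a?(String)
      case value
      when /\A[+-]?\d+\z/      then value = value.to_i
      when /\A[+-]?\d+\.\d+\z/ then value = value.to_f
      end
    end
    # Step 6: the per-key converter sees the already-converted value
    value = converters[key].call(value) if converters.key?(key)
    out[key] = value
  end
end

seen = []
spy = ->(v) { seen << v.class; v } # records the class the converter receives

row = { name: "Alice", score: "42", notes: "", status: "NULL" }
result = transform_row(row, nil_matching: /\ANULL\z/, converters: { score: spy })
```

In this sketch the converter for `:score` is handed an Integer rather than the string "42", which is the situation the note above warns about.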
strip_whitespace

Default: true
Strips leading and trailing whitespace from all header names and all field values at parse time, before any other transformation runs.

    # CSV with padded values:
    #   name, score
    #   Alice , 42
    #   Bob , 0
    data = SmarterCSV.process(file)
    # => [{name: "Alice", score: 42}, {name: "Bob", score: 0}]
    #    ↑ "Alice " stripped to "Alice", " 42" stripped to "42" then converted

    data = SmarterCSV.process(file, strip_whitespace: false)
    # => [{"name"=>"Alice ", " score"=>" 42"}, ...]
    #    ↑ whitespace preserved in both headers and values

nil_values_matching

Default: nil (disabled)
Set values matching the given regular expression to nil. Combined with the default remove_empty_values: true, matching values are removed from the result hash. With remove_empty_values: false, the key is retained with a nil value — useful when you need to distinguish "field was absent" from "field had a sentinel value".
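
The difference between a retained nil key and a removed key can be shown in plain Ruby, without SmarterCSV. This is a minimal sketch that imitates steps 2 and 3 with `transform_values` and `compact`; the sample row and the variable names are assumptions made for illustration.

```ruby
# Pattern from the examples in this section; \A and \z anchor it to the whole value.
nullish = /\A(NULL|N\/A)\z/i

row = { name: "Alice", score: "n/a", notes: "NULLABLE column" }

# Step 2 alone: matching values become nil, but every key is still present.
nilified = row.transform_values { |v| v.match?(nullish) ? nil : v }

# Step 3 on top (the remove_empty_values: true behavior): nil entries disappear.
compacted = nilified.compact

nilified.key?(:score)   # true, the key is retained with a nil value
compacted.key?(:score)  # false, the key is gone
compacted[:notes]       # "NULLABLE column", untouched because the pattern is anchored
```

Anchoring the pattern with `\A` and `\z` matters here: an unanchored `/NULL/i` would also nil-ify the `notes` value.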

    # Treat common null sentinels as nil and remove them
    data = SmarterCSV.process(file, nil_values_matching: /\A(NULL|N\/A|NA|#N\/A|\(null\))\z/i)

    # Nil-ify but retain the key (don't remove)
    data = SmarterCSV.process(file,
      nil_values_matching: /\A(NULL|N\/A)\z/i,
      remove_empty_values: false)
    # => [{name: "Alice", score: nil}]   ← key retained with nil value

    # Remove Excel error values
    data = SmarterCSV.process(file, nil_values_matching: /\A(#VALUE!|#REF!|#DIV\/0!|NaN)\z/)

Deprecated: remove_values_matching: still works but emits a deprecation warning. Use nil_values_matching: instead.

remove_empty_values

Default: true
Removes key/value pairs where the value is nil or an empty string after strip_whitespace and nil_values_matching have run. This is why SmarterCSV result hashes only contain keys with actual values — sparse CSV rows don't produce hashes cluttered with nil entries.

    # CSV: name,score,notes
    #      Alice,42,
    #      Bob,,great player
    data = SmarterCSV.process(file)
    # => [{name: "Alice", score: 42}, {name: "Bob", notes: "great player"}]
    #    ↑ empty :notes and :score keys are dropped automatically

    data = SmarterCSV.process(file, remove_empty_values: false)
    # => [{name: "Alice", score: 42, notes: nil}, {name: "Bob", score: nil, notes: "great player"}]

remove_zero_values

Default: false
When enabled, removes key/value pairs where the value is numeric zero (0, 0.0, "0", "0.0"). Useful when zero and absent mean the same thing in your domain.

    # CSV: product,quantity,discount
    #      Widget,10,0
    #      Gadget,0,5
    data = SmarterCSV.process(file, remove_zero_values: true)
    # => [{product: "Widget", quantity: 10}, {product: "Gadget", discount: 5}]
    #    ↑ :discount=>0 and :quantity=>0 removed

convert_values_to_numeric

Default: true
Converts string values that look like integers or floats to the appropriate numeric type. This is one of the most common sources of silent data loss if not configured carefully — fields like ZIP codes, phone numbers, and account numbers with leading zeros will be silently corrupted if not excluded.
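
Why the loss is silent can be seen with plain Ruby string conversion: once a value has become an Integer, the leading zero cannot be recovered. The helper `leading_zero_risk?` below is a hypothetical pre-flight check, not part of SmarterCSV, for deciding which columns to exclude.

```ruby
zip = "01234"
round_trip = zip.to_i.to_s # "1234": the leading zero cannot be recovered

# Hypothetical helper: true if any sampled value is a 0 followed by more digits.
def leading_zero_risk?(values)
  values.any? { |v| v.is_a?(String) && v.match?(/\A0\d+\z/) }
end

zip_column    = %w[01234 90210]
amount_column = %w[42 0 3.14]

leading_zero_risk?(zip_column)    # true, so exclude this column from conversion
leading_zero_risk?(amount_column) # false, a lone "0" or a decimal is safe
```

Running a check like this over a sample of each column is one way to build the exclusion list before processing the full file.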

    data = SmarterCSV.process(file)
    # "42"    => 42    (Integer)
    # "3.14"  => 3.14  (Float)
    # "01234" => 1234  ← leading zero lost! exclude this column

    # Exclude specific columns from numeric conversion
    data = SmarterCSV.process(file,
      convert_values_to_numeric: { except: [:zip, :phone, :account_number] })
    # => [{zip: "01234", phone: "800-555-0100", amount: 99.99}]

    # Only convert specific columns (all others stay as strings)
    data = SmarterCSV.process(file,
      convert_values_to_numeric: { only: [:quantity, :price] })

remove_empty_hashes

Default: true
After all per-field transformations, removes rows that have no remaining key/value pairs. This handles blank lines and rows where every field was empty or matched nil_values_matching.

    # CSV with a blank line between records:
    #   name,score
    #   Alice,42
    #
    #   Bob,99
    data = SmarterCSV.process(file)
    # => [{name: "Alice", score: 42}, {name: "Bob", score: 99}]
    #    ↑ blank line silently dropped

    data = SmarterCSV.process(file, remove_empty_hashes: false)
    # => [{name: "Alice", score: 42}, {}, {name: "Bob", score: 99}]

value_converters

For type conversions beyond numeric (dates, booleans, currency, etc.), use value_converters. They run last in the pipeline, after numeric conversion. See Value Converters for full documentation.

    data = SmarterCSV.process(file, value_converters: {
      date:   ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
      active: ->(v) { v&.match?(/\Atrue\z/i) },
    })
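
Because converters run after numeric conversion, a value such as "1" may already be an Integer by the time a converter sees it. The lambdas below are a sketch of type-guarded converters: the names `to_date` and `to_bool`, the accepted boolean spellings, and the date format are illustrative assumptions.

```ruby
require 'date'

# Parses US-style dates; anything that is not a String passes through unchanged.
to_date = lambda do |v|
  return v unless v.is_a?(String)
  Date.strptime(v, '%m/%d/%Y')
end

# Accepts "true"/"yes" as strings, and 1 as an Integer, since "1" would
# already have been converted to a number by this point in the pipeline.
to_bool = lambda do |v|
  case v
  when String  then v.match?(/\A(true|yes)\z/i)
  when Integer then v == 1
  else v
  end
end

to_date.call("12/25/2024") # a Date for 25 December 2024
to_date.call(20241225)     # returned unchanged, no NoMethodError
to_bool.call("Yes")        # true
to_bool.call(1)            # true
```

Lambdas like these could be passed in the same way as the ones above, for example `value_converters: { date: to_date, active: to_bool }`.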