A Mojo DataFrame library with a pandas-compatible API.
bison aims to be a drop-in replacement for pandas. The goal is that swapping
import pandas as pd for import bison as bs requires minimal changes to
calling code. The library provides the full pandas DataFrame and Series API;
methods that are not yet implemented natively raise an error until they are
ported to Mojo.
If you are writing a Mojo program that processes tabular data, the alternative to bison is wrapping pandas at the Python boundary — every DataFrame call then crosses the Python/Mojo language boundary and carries Python object overhead. bison runs natively in Mojo: data lives in Apache Arrow-backed columns, aggregations use SIMD kernels from marrow, and there is no Python interop in the hot path.
The pandas-compatible API means the transition cost is low. For most scripts
replacing import pandas as pd with import bison as bs is the bulk of the
change. See Migrating from pandas for the full
walkthrough. Methods not yet implemented natively raise immediately with a
clear message so you know exactly what to work around.
Most of the pandas DataFrame and Series API is implemented natively in Mojo across DataFrame, Series, GroupBy, string and datetime accessors, native CSV and JSON I/O, and reshape. A small number of DataFrame methods remain as stubs and raise:
bison.<method>: not implemented
from_pandas() and to_pandas() are available for wrapping and unwrapping
pandas objects.
Native I/O highlights:
- read_csv: pure Mojo reader with automatic dtype inference (bool > int64 > float64 > String), configurable delimiter, usecols, nrows, skiprows, and NA-value handling.
- read_json: pure Mojo reader supporting the records, split, columns, index, and values orient formats, plus JSON Lines / NDJSON (lines=True).
- read_parquet / to_parquet: native Parquet I/O via marrow (Apache Arrow for Mojo) for int64, float64, bool, and string columns. Falls back to pandas for object columns or when row-group filters are supplied.
- read_ipc / write_ipc: Arrow IPC (Feather v2) via PyArrow interop.
- read_excel: delegates to pandas (requires openpyxl or xlrd).
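The dtype inference order for read_csv (bool > int64 > float64 > String) can be pictured as "try the narrowest type first and widen only when a value does not fit". The following is a minimal Python sketch of that idea, not bison's actual implementation: the helper names, the NA marker set, and the handling of all-NA columns are assumptions made for illustration.

```python
from typing import List

# Hypothetical NA markers for this sketch; bison's actual defaults may differ.
NA_VALUES = {"", "NA", "NaN", "null"}


def _is_bool(s: str) -> bool:
    return s.lower() in ("true", "false")


def _is_int(s: str) -> bool:
    try:
        int(s)
        return True
    except ValueError:
        return False


def _is_float(s: str) -> bool:
    try:
        float(s)
        return True
    except ValueError:
        return False


def infer_dtype(values: List[str]) -> str:
    """Return the first dtype in priority order that fits every non-NA value."""
    present = [v.strip() for v in values if v.strip() not in NA_VALUES]
    if not present:
        # All-NA column: this sketch falls back to String (an assumption).
        return "String"
    checks = (("bool", _is_bool), ("int64", _is_int), ("float64", _is_float))
    for name, check in checks:
        if all(check(v) for v in present):
            return name
    return "String"
```

Under these assumptions, a column of "1" and "2.5" is inferred as float64 because the int64 check fails on "2.5", and a single non-numeric value pushes the whole column to String.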
Mojo is distributed via the MAX conda channel. Pixi manages the environment.
curl -fsSL https://pixi.sh/install.sh | sh
git clone https://github.com/JRedrupp/bison.git
cd bison
pixi install

import bison as bs
from std.python import Python

def main() raises:
    var pd = Python.import_module("pandas")
    var pd_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    # Wrap a pandas DataFrame
    var df = bs.DataFrame.from_pandas(pd_df)
    print(df.shape()) # (3, 2)
    print(df.columns()) # ["a", "b"]
    # Sum each column natively
    var totals = df.sum() # Series: a=6.0, b=15.0
    # Get the backing pandas object back
    var original = df.to_pandas()

Read a CSV file directly without pandas:
import bison as bs

def main() raises:
    # Dtype is inferred automatically: bool > int64 > float64 > String
    var df = bs.read_csv("data.csv")
    print(df.shape())
    print(df["price"].mean())

import bison as bs
print(bs.__version__)

pixi run test

bison is faster than pandas for row and slice indexing, fillna, and simple reductions such as sum and mean. Complex operations (groupby, sort, merge) are slower at this stage and are being actively optimized.
Ratio = bison time / pandas time. Values below 1.0 mean bison is faster.
| Operation | bison | pandas | Ratio |
|---|---|---|---|
| iloc row | 0.002 ms | 0.041 ms | 0.04x |
| series_mean | 0.047 ms | 0.134 ms | 0.35x |
| loc slice | 0.014 ms | 0.031 ms | 0.45x |
| fillna | 0.037 ms | 0.069 ms | 0.53x |
| series_sum | 0.055 ms | 0.096 ms | 0.57x |
| csv roundtrip | 506 ms | 343 ms | 1.48x |
| series_apply | 0.817 ms | 0.221 ms | 3.70x |
| sort_values | 24.4 ms | 5.68 ms | 4.29x |
| merge | 13.1 ms | 2.0 ms | 6.56x |
| groupby_sum | 64.5 ms | 4.4 ms | 14.75x |
Benchmarks run at 100k rows (aggregation, groupby, sort) and 10k rows (apply). Full history and per-commit charts at jredrupp.github.io/bison.
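As a worked example of the ratio definition, the short Python sketch below recomputes the Ratio column for three rows using the rounded timings printed in the table. Because the table's timings are rounded, a ratio recomputed this way can differ from the published figure in the last digit for some rows.

```python
# Timings in milliseconds, copied from three rows of the table above.
timings_ms = {
    "series_mean": (0.047, 0.134),
    "csv roundtrip": (506.0, 343.0),
    "series_apply": (0.817, 0.221),
}


def ratio(bison_ms: float, pandas_ms: float) -> float:
    """Ratio = bison time / pandas time; below 1.0 means bison is faster."""
    return bison_ms / pandas_ms


for name, (bison_ms, pandas_ms) in timings_ms.items():
    r = ratio(bison_ms, pandas_ms)
    verdict = "bison faster" if r < 1.0 else "pandas faster"
    print(f"{name}: {r:.2f}x ({verdict})")
```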
pixi run bench # run the full suite
pixi run gen-report # merge results into docs/data.json

For most scripts the required changes are minimal:

- Replace import pandas as pd with import bison as bs.
- Wrap your entry point in def main() raises:.
- Declare variables with var.
# Before: Python + pandas
import pandas as pd

df = pd.read_csv("sales.csv")
totals = df.groupby("region")["revenue"].sum()
df_clean = df.dropna(subset=["revenue"])
df.to_parquet("out.parquet")

# After: Mojo + bison
import bison as bs

def main() raises:
    var df = bs.read_csv("sales.csv")
    var totals = df.groupby("region")["revenue"].sum()
    var df_clean = df.dropna(subset=List[String]("revenue"))
    df.to_parquet("out.parquet")

If a method you rely on is not yet implemented natively, bison raises
immediately with a clear message (bison.<method>: not implemented).
See docs/migrating-from-pandas.md for a full walkthrough covering common patterns, known differences, and how to work around unimplemented methods.
The function-application methods (apply, applymap, pipe) each provide two overloads: string-based and compile-time.
The string-based overload dispatches known function names to native methods. Unknown names raise an error listing the supported set:
var totals = df.apply("sum", axis=0) # -> Series (delegates to agg)
var row_sums = df.apply("sum", axis=1) # -> Series (row-wise)
var abs_df = df.applymap("abs") # -> DataFrame (element-wise)
var log_df = df.applymap("log") # -> DataFrame (element-wise)
var piped = df.pipe("abs") # -> DataFrame

Supported element-wise string names for applymap, transform, and pipe:
abs, round, sqrt, exp, log, log10, ceil, floor, neg.
transform additionally supports cumsum, cumprod, cummin, cummax.
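The string-based dispatch described above can be pictured as a lookup table from names to functions, with an error that lists the supported set when the name is unknown. The Python sketch below illustrates that pattern only. The helper applymap_by_name and the exact error wording are hypothetical, and bison's native Mojo dispatch may be structured differently.

```python
import math
from typing import Callable, Dict, List

# Name-to-function table covering the element-wise names listed above.
ELEMENTWISE: Dict[str, Callable[[float], float]] = {
    "abs": abs,
    "round": round,
    "sqrt": math.sqrt,
    "exp": math.exp,
    "log": math.log,
    "log10": math.log10,
    "ceil": math.ceil,
    "floor": math.floor,
    "neg": lambda v: -v,
}


def applymap_by_name(name: str, column: List[float]) -> List[float]:
    """Hypothetical helper: look up `name` and apply it to every value."""
    try:
        func = ELEMENTWISE[name]
    except KeyError:
        # Unknown names fail fast and tell the caller what is available.
        supported = ", ".join(sorted(ELEMENTWISE))
        raise ValueError(
            f"unknown function '{name}'; supported: {supported}"
        ) from None
    return [func(v) for v in column]
```

A table like this keeps the supported set in one place, so the error message and the dispatch logic cannot drift apart.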
The compile-time overload accepts a user-defined function as a compile-time parameter. This is fully native Mojo with no string dispatch:
def double(v: Float64) -> Float64:
    return v * 2.0

var result = df.apply[double]() # element-wise on numeric columns
var mapped = df.applymap[double]() # same as apply[double]()

def add_rank(d: DataFrame) raises -> DataFrame:
    # whole-DataFrame transform
    return d.abs()

var piped = df.pipe[add_rank]()

Series.apply and Series.map also accept compile-time functions:

var result = s.apply[double]()

Runtime closures that capture variables are not yet supported in parameter
type constraints. Use clip() or where() for threshold-style operations
in the meantime.
- Getting started — installation, first DataFrame, core operations
- Migrating from pandas — step-by-step guide for porting pandas scripts
- API reference — full method listing with native/stub status
- Architecture — column storage, type predicates, marrow integration
- Mojo patterns — language-specific tips and pitfalls
- Testing — how to run and write tests
- CI/CD — GitHub Actions, pre-commit hooks, benchmarks, releasing
- Profiling — perf, samply, and callgrind guides
- Query/eval spec — grammar and null semantics
See CONTRIBUTING.md for the full guide.
- Pick a stub method (any _not_implemented call in bison/).
- Replace the _not_implemented call with a native Mojo implementation.
- Update the corresponding test: remove the "expect raise" assertion and add real assertions comparing against pandas output.
- Submit a pull request.
Apache 2.0. See LICENSE.