How It Started
TableForge started as a small internal script.
I was working on a budgeting product that depended on bank and mobile money statements. The only way to get historical data was through PDF exports. Every bank. Every wallet. Every format — different.
At first, I assumed this was a solved problem.
It wasn’t.
The First Attempt
The initial approach was straightforward:
- Extract text from the PDF
- Split lines
- Guess columns based on spacing
- Export to CSV
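The whole first version fit in a few lines. Here is a minimal sketch of that approach, assuming the text has already been pulled out of the PDF; the statement excerpt below is made up for illustration:

```python
import csv
import io
import re

# Illustrative text, standing in for the output of a PDF text extractor.
SAMPLE = """\
Date        Description        Amount
01/03/2024  Grocery store      -45.20
02/03/2024  Salary payment    2500.00
"""

def naive_to_rows(text):
    """Guess columns by splitting each line on runs of two or more spaces."""
    return [
        re.split(r"\s{2,}", line.strip())
        for line in text.splitlines()
        if line.strip()
    ]

rows = naive_to_rows(SAMPLE)

# Export to CSV.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_output = buf.getvalue()
```

This works exactly as long as every description is short enough to leave two spaces before the next column. That assumption is the whole problem.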
It worked — until it didn’t.
A single layout change in a document would break everything:
- A longer description pushed amounts into the next column
- Headers repeated halfway down the page
- Transactions wrapped onto two lines
- Totals appeared where rows were expected
Worse, some failures were silent.
The spreadsheet looked right, but values were shifted.
That was unacceptable.
The Real Problem
The mistake wasn’t the code.
The mistake was treating PDFs like structured data.
PDFs don’t contain rows and columns. They contain positioned text. What looks like a table to a human is just coordinates and font metrics to a machine.
Once I accepted that, the problem became clearer: you can’t reliably convert a PDF unless you understand its visual structure first.
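To make that concrete, here is roughly what a parser sees for one transaction row. The field names mirror what positional extractors such as pdfplumber report (text plus bounding-box coordinates); the values are invented for illustration:

```python
# One "row" of a statement, as a PDF parser sees it: positioned text
# runs, not cells. (Coordinates are made up for illustration.)
words = [
    {"text": "01/03/2024", "x0": 40.0,  "top": 120.5},
    {"text": "Grocery",    "x0": 140.0, "top": 120.5},
    {"text": "store",      "x0": 183.1, "top": 120.7},
    {"text": "-45.20",     "x0": 480.3, "top": 120.5},
]

# The row only exists once you decide that words with nearly equal
# vertical positions belong together, then order them left to right.
row = sorted(
    (w for w in words if abs(w["top"] - 120.5) < 2.0),
    key=lambda w: w["x0"],
)
```

Nothing in the file says these four fragments form a row. That decision is yours.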
Rebuilding the Pipeline
Instead of optimizing for speed, I rebuilt the pipeline for correctness.
The new approach:
- Extract raw text with positional data
- Group text into lines based on Y-axis proximity
- Infer columns using consistent X-axis boundaries
- Merge wrapped rows deliberately
- Validate numeric columns before export
Each step was explicit and debuggable.
If something went wrong, I could see where it went wrong.
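The grouping, column-inference, and validation steps can be sketched like this. The tolerance and column boundaries are illustrative placeholders, not TableForge's real values; the word dictionaries assume the positional format shown above, and wrapped-row merging is omitted for brevity:

```python
from collections import defaultdict

Y_TOLERANCE = 2.0                                   # max vertical gap within one line
COLUMN_BOUNDS = [(0, 120), (120, 400), (400, 600)]  # x-range per column (illustrative)

def group_into_lines(words):
    """Group words whose vertical positions fall within Y_TOLERANCE."""
    lines = defaultdict(list)
    for w in sorted(words, key=lambda w: w["top"]):
        for top in lines:
            if abs(top - w["top"]) <= Y_TOLERANCE:
                lines[top].append(w)
                break
        else:
            lines[w["top"]].append(w)
    return [sorted(ws, key=lambda w: w["x0"]) for _, ws in sorted(lines.items())]

def assign_columns(line):
    """Place each word into the column whose x-range contains its left edge."""
    cols = ["" for _ in COLUMN_BOUNDS]
    for w in line:
        for i, (lo, hi) in enumerate(COLUMN_BOUNDS):
            if lo <= w["x0"] < hi:
                cols[i] = (cols[i] + " " + w["text"]).strip()
                break
    return cols

def is_numeric(value):
    """Validation step: fail loudly rather than export a shifted column."""
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False

# Invented sample words covering two transaction lines.
WORDS = [
    {"text": "01/03/2024", "x0": 40.0,  "top": 120.5},
    {"text": "Grocery",    "x0": 140.0, "top": 120.5},
    {"text": "store",      "x0": 183.1, "top": 120.7},
    {"text": "-45.20",     "x0": 480.3, "top": 120.5},
    {"text": "02/03/2024", "x0": 40.0,  "top": 138.0},
    {"text": "Salary",     "x0": 140.0, "top": 138.0},
    {"text": "2,500.00",   "x0": 480.3, "top": 138.2},
]

table = [assign_columns(line) for line in group_into_lines(WORDS)]
```

The payoff is the last function: if a long description ever pushes text into the amount column, `is_numeric` fails on that cell, and the run stops instead of producing a spreadsheet that only looks right.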
Why It Became TableForge
At some point, this stopped being a one-off solution.
The same problems appeared again and again:
- Different banks
- Different layouts
- Same failure modes
I extracted the logic into a reusable tool and named it TableForge — because the output needed to be shaped, not guessed.
The goal wasn’t perfect automation. The goal was repeatable, explainable results.
What Changed
With TableForge:
- Failed conversions were obvious, not silent
- Columns stayed aligned across pages
- CSV and XLSX exports required little to no cleanup
- Debugging took minutes instead of hours
More importantly, I trusted the output again.
What I Learned
- PDF-to-table conversion is not a data problem — it’s a layout problem
- “Mostly correct” is worse than visibly wrong
- Deterministic parsing beats clever heuristics for financial data
- Tooling should surface uncertainty, not hide it
TableForge exists because these lessons were learned the hard way.
Where It’s Used Now
TableForge now sits at the start of data pipelines:
- Importing bank statements
- Converting reports for analysis
- Feeding downstream systems that expect clean spreadsheets
It’s not flashy. It’s reliable.
And that’s the point.