How It Started
TableForge started as a small internal script.
I was working on a budgeting product that depended on bank and mobile money statements. The only way to get historical data was through PDF exports. Every bank. Every wallet. Every format — different.
At first, I assumed this was a solved problem.
It wasn’t.
The First Attempt
The initial approach was straightforward:
- Extract text from the PDF
- Split lines
- Guess columns based on spacing
- Export to CSV
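The whole first version fit in a few lines. Here is a minimal sketch of that approach, assuming the text has already been pulled out of the PDF; the statement excerpt below is made up for illustration:

```python
import csv
import io
import re

# Illustrative text, standing in for the output of a PDF text extractor.
SAMPLE = """\
Date        Description        Amount
01/03/2024  Grocery store      -45.20
02/03/2024  Salary payment    2500.00
"""

def naive_to_rows(text):
    """Guess columns by splitting each line on runs of two or more spaces."""
    return [
        re.split(r"\s{2,}", line.strip())
        for line in text.splitlines()
        if line.strip()
    ]

rows = naive_to_rows(SAMPLE)

# Export to CSV.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_output = buf.getvalue()
```

This works exactly as long as every description is short enough to leave two spaces before the next column. That assumption is the whole problem.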
It worked — until it didn’t.
A single layout change in a document would break everything:
- A longer description pushed amounts into the next column
- Headers repeated halfway down the page
- Transactions wrapped onto two lines
- Totals appeared where rows were expected
Worse, some failures were silent.
The spreadsheet looked right, but values were shifted.
That was unacceptable.
The Real Problem
The mistake wasn’t the code.
The mistake was treating PDFs like structured data.
PDFs don’t contain rows and columns. They contain positioned text. What looks like a table to a human is just coordinates and font metrics to a machine.
Once I accepted that, the problem became clearer: you can’t reliably convert a PDF unless you understand its visual structure first.
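To make that concrete, here is roughly what a parser sees for one transaction row. The field names mirror what positional extractors such as pdfplumber report (text plus bounding-box coordinates); the values are invented for illustration:

```python
# One "row" of a statement, as a PDF parser sees it: positioned text
# runs, not cells. (Coordinates are made up for illustration.)
words = [
    {"text": "01/03/2024", "x0": 40.0,  "top": 120.5},
    {"text": "Grocery",    "x0": 140.0, "top": 120.5},
    {"text": "store",      "x0": 183.1, "top": 120.7},
    {"text": "-45.20",     "x0": 480.3, "top": 120.5},
]

# The row only exists once you decide that words with nearly equal
# vertical positions belong together, then order them left to right.
row = sorted(
    (w for w in words if abs(w["top"] - 120.5) < 2.0),
    key=lambda w: w["x0"],
)
```

Nothing in the file says these four fragments form a row. That decision is yours.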
Rebuilding the Pipeline
Instead of optimizing for speed, I rebuilt the pipeline for correctness.
The new approach:
- Extract raw text with positional data
- Group text into lines based on Y-axis proximity
- Infer columns using consistent X-axis boundaries
- Merge wrapped rows deliberately
- Validate numeric columns before export
Each step was explicit and debuggable.
If something went wrong, I could see where it went wrong.
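The grouping, column-inference, and validation steps can be sketched like this. The tolerance and column boundaries are illustrative placeholders, not TableForge's real values; the word dictionaries assume the positional format shown above, and wrapped-row merging is omitted for brevity:

```python
from collections import defaultdict

Y_TOLERANCE = 2.0                                   # max vertical gap within one line
COLUMN_BOUNDS = [(0, 120), (120, 400), (400, 600)]  # x-range per column (illustrative)

def group_into_lines(words):
    """Group words whose vertical positions fall within Y_TOLERANCE."""
    lines = defaultdict(list)
    for w in sorted(words, key=lambda w: w["top"]):
        for top in lines:
            if abs(top - w["top"]) <= Y_TOLERANCE:
                lines[top].append(w)
                break
        else:
            lines[w["top"]].append(w)
    return [sorted(ws, key=lambda w: w["x0"]) for _, ws in sorted(lines.items())]

def assign_columns(line):
    """Place each word into the column whose x-range contains its left edge."""
    cols = ["" for _ in COLUMN_BOUNDS]
    for w in line:
        for i, (lo, hi) in enumerate(COLUMN_BOUNDS):
            if lo <= w["x0"] < hi:
                cols[i] = (cols[i] + " " + w["text"]).strip()
                break
    return cols

def is_numeric(value):
    """Validation step: fail loudly rather than export a shifted column."""
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False

# Invented sample words covering two transaction lines.
WORDS = [
    {"text": "01/03/2024", "x0": 40.0,  "top": 120.5},
    {"text": "Grocery",    "x0": 140.0, "top": 120.5},
    {"text": "store",      "x0": 183.1, "top": 120.7},
    {"text": "-45.20",     "x0": 480.3, "top": 120.5},
    {"text": "02/03/2024", "x0": 40.0,  "top": 138.0},
    {"text": "Salary",     "x0": 140.0, "top": 138.0},
    {"text": "2,500.00",   "x0": 480.3, "top": 138.2},
]

table = [assign_columns(line) for line in group_into_lines(WORDS)]
```

The payoff is the last function: if a long description ever pushes text into the amount column, `is_numeric` fails on that cell, and the run stops instead of producing a spreadsheet that only looks right.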
Why It Became TableForge
At some point, this stopped being a one-off solution.
The same problems appeared again and again:
- Different banks
- Different layouts
- Same failure modes
I extracted the logic into a reusable tool and named it TableForge — because the output needed to be shaped, not guessed.
The goal wasn’t perfect automation. The goal was repeatable, explainable results.
What Changed
With TableForge:
- Failed conversions were obvious, not silent
- Columns stayed aligned across pages
- CSV and XLSX exports required little to no cleanup
- Debugging took minutes instead of hours
More importantly, I trusted the output again.
What I Learned
- PDF-to-table conversion is not a data problem — it’s a layout problem
- “Mostly correct” is worse than visibly wrong
- Deterministic parsing beats clever heuristics for financial data
- Tooling should surface uncertainty, not hide it
TableForge exists because these lessons were learned the hard way.
Where It’s Used Now
TableForge now sits at the start of data pipelines:
- Importing bank statements
- Converting reports for analysis
- Feeding downstream systems that expect clean spreadsheets
It’s not flashy. It’s reliable.
And that’s the point.