# Projects

Canonical URL: https://www.ambacelar.com/projects

Things I've built, or am still building.

## Kimbundu.org

**Technologies:** Next.js, TypeScript, Python, Tesseract OCR, Ollama, OpenAI API, Tailwind CSS, Vercel, JSON Corpus Engineering

Built a digital language-preservation platform for Kimbundu by turning a scanned historical Kimbundu–Portuguese dictionary into a structured, auditable lexical corpus. Designed a multi-stage pipeline covering PDF page extraction, column segmentation, OCR capture, deterministic parsing, corpus reconstruction, conservative LLM auditing, and editorial merge workflows. Produced a final merged corpus of 10,679 entries with provenance and review tracking, then published it as a public dataset that powers kimbundu.org.

- Built a multi-stage OCR → corpus reconstruction pipeline for a historical dictionary spanning hundreds of scanned pages
- Produced a final merged editorial corpus of 10,679 entries with provenance, cleanup metadata, and review workflows
- Published a public dictionary dataset and website experience to support Kimbundu language preservation
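
To illustrate what "conservative LLM auditing" with provenance and review tracking could look like, here is a minimal sketch. It is not the project's actual code: the `Entry` fields, the `apply_audit` helper, the status strings, and the sample values are all hypothetical. The assumption is that an audit suggestion never overwrites the OCR-derived text and only flags the entry for editorial review.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical corpus entry; field names are illustrative only."""
    headword: str
    gloss: str                      # definition text as parsed from OCR
    source_page: int                # provenance: scanned page number
    source_column: int              # provenance: column on that page
    review_status: str = "unreviewed"
    notes: Tuple[str, ...] = ()     # audit trail for editors


def apply_audit(entry: Entry, suggested_gloss: Optional[str]) -> Entry:
    """Record an audit suggestion without changing the original gloss.

    If the auditor has no suggestion, or agrees with the parsed text,
    the entry is returned unchanged. Otherwise the entry is flagged
    for human review and the suggestion is kept as a note.
    """
    if suggested_gloss is None or suggested_gloss == entry.gloss:
        return entry
    return replace(
        entry,
        review_status="needs_review",
        notes=entry.notes + (f"audit suggests: {suggested_gloss}",),
    )


# Placeholder data, not taken from the real dictionary.
original = Entry("exemplo", "texto ocr", source_page=12, source_column=1)
flagged = apply_audit(original, "texto corrigido")
```

In this sketch the original `gloss` and its page and column provenance survive every audit pass, so an editor can always compare the suggestion against what the scan actually produced before merging.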

[Live site](https://www.kimbundu.org) | [Case study](https://www.ambacelar.com/projects/kimbundu-org)

