
The Lion Learns to Write

April 15, 2026


Part 3 of 3: Techno-Feudalism and the AI Divide


There is an African proverb that I think about constantly:

"Until the lion learns how to write, the hunter will forever be the hero of the story."

I am a software engineer. I write code for a living. But for the past year, the most important thing I have written is a dictionary.

Kimbundu is one of the major Bantu languages of Angola. It is spoken by millions of people. It is culturally significant to the Ambundu people, to Angolan history, and to my family. If you go looking for serious digital resources for Kimbundu — structured data, searchable corpora, machine-readable lexicons — the landscape gets thin fast. The best references that exist are scanned books and old PDFs. No search engine can read them. No language model can learn from them.

I decided to change that. I took a historical Kimbundu–Portuguese dictionary, built a twelve-stage pipeline to digitise it — OCR, column segmentation, deterministic parsing, corpus reconstruction, AI-assisted audit, editorial merge — and published the result as a structured, searchable public resource at kimbundu.org.

The project taught me something about techno-feudalism that no amount of reading Varoufakis could have.

What the machines do not know

Large language models are trained on text. The text they are trained on determines what they know, how they reason, and what they treat as real. If a language is poorly digitised, the model will not speak it well. If a culture's history exists mainly in oral traditions, printed books, and institutional archives that have never been indexed, the model will not know that history.

This is not a bug. It is the system working as designed. The models are trained on what is digitally available, and what is digitally available reflects decades of investment, infrastructure, and institutional power that was never evenly distributed. English dominates because English-speaking institutions digitised their knowledge first, most extensively, and with the most resources. The same applies, to varying degrees, to Mandarin, Spanish, French, German.

Kimbundu was not digitised because there was no institutional incentive to digitise it. No major tech company's revenue depends on Kimbundu language data. No venture-funded startup saw a market in Angolan lexicography. The language fell through the cracks — not because it is unimportant, but because the systems that determine importance are calibrated to different values.

This is the cultural face of techno-feudalism. The cloud landlords do not deliberately exclude African languages and histories. They simply do not need them. The models are optimised for the populations that generate the most cloud rent. Everyone else is a rounding error.

The question of who tells the story

This is where the proverb bites hardest.

We are entering a period where more and more people will not search the web in the traditional sense. They will ask models. They will rely on generated summaries, synthetic answers, and systems that compress the entire breadth of human knowledge into a handful of probable responses. The model becomes the story. And whoever trained the model chose what stories it knows.

Consider Queen Nzinga — Ana de Sousa Nzinga Mbande — the 17th-century queen of the Ndongo and Matamba kingdoms in what is now Angola. She resisted Portuguese colonisation for decades. She was a military strategist, a diplomat, and a sovereign who refused to be subordinate. Her story matters. And it matters that it is told right.

Will a language model distinguish between the real Queen Nzinga and a fictional portrayal? Will it know the difference between her documented political strategies and a romanticised Western narrative? Will it know the Kimbundu words her court would have spoken, the cultural context in which her resistance took shape?

Maybe. If someone has digitised the right sources, tagged them properly, and made them available in a format the training pipeline can ingest. If not, the model will fill the gap with whatever adjacent information it has. That information will probably be in English or Portuguese, written from an external perspective, and missing the cultural specificity that makes the history meaningful.

This is already happening, with every model that ships. Every language that has not been digitised is a language that will be misrepresented or absent. Every history that exists only in books, oral traditions, and community knowledge is a history that the machines will not tell correctly — if they tell it at all.

The open-source illusion

There is a comforting narrative in AI discourse that open-source models will democratise access. The weights are public. Anyone can download them. The playing field is level.

I have worked with these models. The playing field is not level.

An open-source model gives you weights. It does not give you the compute to run them: a serious model requires GPU infrastructure that costs tens of thousands of dollars per month to operate. It does not give you the training data, which is often proprietary and scraped from sources that are themselves unevenly distributed across languages and cultures. And it does not give you the institutional knowledge to fine-tune, evaluate, or deploy the model effectively.

For a developer in London or San Francisco, open-source AI is genuinely powerful. They have the hardware, the connectivity, the talent ecosystem, and the financial resources to make use of it. For a developer in Luanda or Kinshasa, the same open-source model is a PDF behind a paywall of infrastructure.

This is what "open" means in a techno-feudal economy. The blueprints are free. The land you need to build on is not. You can download the model. You cannot download the data centre.

And even if you could run the model, what would it know about your language, your history, your context? The training data reflects the same power asymmetries as everything else. Open-source models are trained overwhelmingly on English-language internet text. Kimbundu is not in there. Umbundu is not in there. Kikongo is not in there. The model is "open" in the sense that you are free to use a tool that was not built for you.

Why I built what I built

I did not build kimbundu.org to prove a point about techno-feudalism. I built it because the dictionary existed, it was trapped in a scan, and I had the skills to get it out.

But the process made the structural dynamics visible in a way that reading about them never did.

The pipeline I built is technically interesting — OCR, column segmentation, deterministic parsing, corpus reconstruction, conservative LLM audit. But the most important design decision was not technical. It was political. I chose to keep the pipeline deterministic at its core. The AI was used as an auditor, not an editor. Every entry traces back to its source page and column. The corpus is inspectable, not generated.

I made that choice because the source material is irreplaceable. A historical dictionary is not data to be cleaned. It is a cultural record that deserves the same care as any archive. If I let a language model rewrite it, the corpus would become a product of Silicon Valley's probabilistic guesswork about a language it does not speak. That is precisely the dynamic I am arguing against.
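To make the "auditor, not editor" distinction concrete, here is a minimal sketch of the pattern. All names here are hypothetical illustrations, not the actual kimbundu.org code: the point is that every entry is immutable and carries provenance back to its scan, and the model can only raise flags for a human editor, never rewrite the record.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """One dictionary entry, immutable, with provenance back to the scan."""
    headword: str
    gloss: str
    page: int     # page of the source scan this entry was parsed from
    column: int   # column on that page

def audit(entry: Entry, model_flag) -> list[str]:
    """Conservative audit: deterministic checks plus model-raised flags.
    Returns concerns for a human editor; never mutates the entry."""
    flags = []
    if not entry.headword:
        flags.append("missing headword")
    if not entry.gloss:
        flags.append("missing gloss")
    # The model contributes suspicions only, e.g. "gloss looks truncated".
    flags.extend(model_flag(entry))
    return flags

# The editorial merge stays human: a flagged entry is corrected against
# the cited page and column, not regenerated by a model.
entry = Entry(headword="kima", gloss="thing", page=112, column=2)
print(audit(entry, model_flag=lambda e: []))
```

The frozen dataclass is the political choice expressed as a type: the corpus is a record to be inspected and corrected, not a buffer for a model to overwrite.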

The final corpus contains 10,679 entries. Each one carries provenance. Each one is searchable. Each one is now legible to the machines that will increasingly mediate how people access knowledge.

That is a small thing. But it is the kind of small thing that, if done a thousand times for a thousand languages and cultural archives, starts to change the shape of what the machines know.

What this actually requires

I am sceptical of the policy recommendations that tend to appear at the end of articles like this. "International cooperation." "Public-private partnerships." "Inclusive AI development." These phrases are not wrong, but they are frictionless. They cost nothing to write and commit no one to anything.

So instead, here is what I think is actually needed, from the perspective of someone who builds things:

Digital cultural libraries are infrastructure. They are not side projects, academic curiosities, or nice-to-haves. They are the substrate on which future AI systems will either include or exclude entire populations. Every language that gets a structured, high-quality digital corpus becomes visible to the next generation of models. Every one that does not stays invisible. Governments, universities, and cultural institutions in the Global South need to treat digitisation as infrastructure investment, not as archival preservation.

Compute sovereignty matters. It is not enough to train developers if the only place they can deploy is someone else's cloud. Regional compute infrastructure — even modest — changes the economic relationship from pure tenancy to something with more agency. This does not mean every country needs a hyperscaler. It means that regional cooperatives, university compute clusters, and publicly funded data centres should be part of the conversation alongside AWS and Azure.

Data governance needs teeth. If African data trains models that African companies then pay to use, the extraction is circular. Data residency laws, licensing requirements for training data sourced from specific populations, and transparency mandates on what goes into training corpora — these are not protectionist measures. They are the bare minimum for ensuring that the value chain does not flow entirely in one direction.

The work has to be done by the people closest to it. I could not have built kimbundu.org by contracting it out to a lab in Mountain View. The editorial judgement, the cultural knowledge, the understanding of what matters and what does not — these are not things a general-purpose model can supply. The lion has to write the story. No one else will get it right.

The dictionary is not the endpoint

Kimbundu.org is a dictionary. It is also a proof of concept. The broader goal is a digital cultural library: grammar resources, noun-class guides, stories, proverbs, songs, Bible texts in Kimbundu, eventually linked across languages and reference materials. A foundation that learners, researchers, and future tools can build on.

I do not want my children to grow up in a world where their access to Angolan culture is mediated by whatever fragments Silicon Valley happened to ingest. I want them to have access to real archives, real voices, and materials built with care and proximity to the people and histories involved.

That is not a technological problem. It is a civilisational one. And the technology either serves it or it doesn't.

Final thought

Varoufakis is right that something has changed. Whether you call it techno-feudalism or rentier capitalism or platform monopoly, the structural reality is the same: a small number of companies own the infrastructure that the rest of the world rents. AI accelerates this by raising the barrier to entry and deepening the dependency.

But structures are built by people. And they can be rebuilt by people.

The lion is learning to write. The question is whether anyone is building the library.


This is Part 3 of a three-part series on techno-feudalism and the AI divide. Part 1: Cloud Rent and the New Landlords | Part 2: The Infrastructure Trap

Adilson Bacelar is a senior software engineer based in the UK. He is the creator of kimbundu.org, a digital Kimbundu cultural library. You can find more of his work at ambacelar.com.