Open source, open collaboration, open data governance, and an inspectable technical approach. Taiwan's local AI should not depend only on closed APIs, and its key language and speech capabilities should not be owned entirely by a few overseas platforms.
About
Hear, speak, read, and remember Taiwan.
OpenFormosa is a Taiwan-rooted open AI foundation-model initiative. The goal is not a model that merely answers in Traditional Chinese, but an AI infrastructure that understands Taiwan's context, voices, culture, documents, and everyday expression.
Why "OpenFormosa"?
The name carries the whole idea: an open, inspectable Taiwan AI foundation model — built from Taiwan, facing the world.
The beautiful island and its diversity: Traditional Chinese, Taigi, Hakka, Indigenous languages, Bopomofo, Tailo, local speech, internet language, historical memory, and the natural environment, all woven together.
OpenFormosa = an open Taiwan AI foundation model. Not a single product, but a model family, a data-engineering effort, a culture-preservation effort, and open infrastructure.
Why the jia-zhi bag?
The woven market bag is an everyday Taiwanese object — cheap, durable, and instantly recognizable. It is the perfect symbol for what OpenFormosa wants to build.
It carries Taiwan's corpora, voices, culture, documents, knowledge, and applications.
It lets model capability flow to researchers, developers, enterprises, and educators.
Not a one-off demo, but a base model to fine-tune, deploy, distill, and extend over and over.
A Taiwan symbol that does not shut others out — rooted in Taiwan, facing the world.
Its weave mirrors a lattice of tokens and data: text, language, speech, and signals woven into capability.
Taiwan is not a translation patch
Traditional Chinese is only the surface — Taiwan's accents, scripts, local terms, and documents all have their own texture.
Generic models often understand Traditional Chinese only as text. OpenFormosa focuses on the real texture of Taiwan: accents, code-switching, local terms, Bopomofo, Taigi, receipts, menus, public notices, and the living culture that appears in audio and documents.
It publishes training recipes, model cards, benchmark methods, and release evidence, so Taiwan-local capability can be inspected, not just asserted.
Taiwan context is foundation-model work.
This is not about translating a generic model into Traditional Chinese. Taiwan terms, speech, documents, and cultural memory need to be designed into the tokenizer, training recipe, evaluation suite, and task adapters from the beginning.
Not just Chinese
A model can write Traditional Chinese and still miss Taiwan: local institutions, place names, speech habits, internet tone, addresses, forms, and public-sector language.
Small can be infrastructure
A compact 1B-class model is not a toy if it is cheap to run, easy to fine-tune, deployable in private environments, and useful across ASR, TTS, OCR, RAG, and education workflows.
Open builds trust
Open means people can inspect model cards, evaluation sets, tokenizer choices, training recipes, benchmark artifacts, and release notes instead of guessing what happened.
Adapters, not chaos
Speech and document models should share a Taiwan language backbone, while ASR, TTS, and OCR keep their own adapters, heads, and evaluation rules.
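One way to picture that split is a single shared backbone plus a small registry of task adapters, each with its own head and its own evaluation rule. The sketch below is only an illustration under stated assumptions: `shared_backbone`, `TaskAdapter`, `ADAPTERS`, the toy scale parameters, and the metric names are all hypothetical and do not describe OpenFormosa's actual architecture or evaluation suite.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = List[float]


def shared_backbone(text: str) -> Features:
    """Toy stand-in for a shared language backbone: one number per character."""
    return [float(ord(ch) % 97) for ch in text]


@dataclass(frozen=True)
class TaskAdapter:
    name: str
    scale: float                        # toy task-specific adapter parameter
    head: Callable[[Features], float]   # task-specific output head
    metric: str                         # task-specific evaluation rule (illustrative)

    def run(self, text: str) -> float:
        features = shared_backbone(text)              # same backbone for every task
        adapted = [self.scale * f for f in features]  # task-specific adaptation
        return self.head(adapted)


# Hypothetical registry: every task reuses the backbone but keeps its own
# adapter parameter, head, and evaluation rule.
ADAPTERS: Dict[str, TaskAdapter] = {
    "asr": TaskAdapter("asr", 0.5, head=sum, metric="character error rate"),
    "tts": TaskAdapter("tts", 2.0, head=max, metric="listener rating"),
    "ocr": TaskAdapter("ocr", 1.0, head=min, metric="character error rate"),
}

if __name__ == "__main__":
    for key, adapter in ADAPTERS.items():
        print(key, adapter.metric, adapter.run("台灣"))
```

In this picture, improving the backbone would benefit all three tasks at once, while changing one task's adapter or evaluation rule leaves the others untouched.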
OpenFormosa is an open AI model family rooted in Taiwan. Inspired by the jia-zhi bag, we weave Taiwan's voices, texts, images, and memories into open AI infrastructure.
We build compact, deployable, Taiwan-native language and multimodal models that understand Traditional Chinese, Taiwanese Mandarin, local speech, Bopomofo, Taigi, Taiwanese internet language, documents, and cultural context — so the world can see that Taiwan is not only using AI, but building its own.
— Hear Taiwan, speak Taiwan, read Taiwan, and remember Taiwan.