OpenFormosa

OpenFormosa is a Taiwan-oriented workstation for model training and evaluation: OpenFormosa-Base comes first, followed by ASR, TTS, OCR, and Taiwan-local benchmarks.
https://openformosa.com/
2026-06-23T00:00:00+08:00

BlueMagpie-TTS: Taiwanese-accent, Chinese–English code-switching speech synthesis
https://openformosa.com/blog/2026/06/23/bluemagpie-tts/
2026-06-23T00:00:00+08:00

An open Taiwanese-accent text-to-speech model that handles Chinese–English code-switching. It keeps VoxCPM's acoustic stack, swaps in the Barbet language model, and cuts character error rate by about 58% on a hard test set.
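
Character error rate is conventionally the character-level edit distance between a transcript and its reference, divided by the reference length. The sketch below assumes that standard definition and reads "about 58%" as a relative reduction; both are assumptions, since the summary does not say how the figure was computed. The function names and sample numbers are illustrative and are not taken from the BlueMagpie-TTS evaluation.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error that was removed."""
    return (baseline - improved) / baseline


# Illustrative numbers only: a drop from 10.0% to 4.2% CER
# corresponds to a relative reduction of about 58%.
example_reduction = relative_reduction(0.10, 0.042)
```

Working per character rather than per word suits Chinese–English code-switched text, where Chinese has no whitespace word boundaries.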

Barbet 1B Base: a hybrid decoder-only language model for Traditional Chinese
https://openformosa.com/blog/2026/06/21/barbet-1b-base/
2026-06-21T00:00:00+08:00

A 1B-parameter hybrid decoder-only causal language model: global and sliding-window attention layers interleaved with Mamba layers, a context window of up to 1M tokens, tied input and output embeddings, and PangolinTokenizer as its tokenizer.
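
To make two of these terms concrete, the sketch below builds the boolean mask for causal attention, either global or restricted to a sliding window, and counts the parameters that embedding tying saves. It illustrates the general techniques only: the window size, vocabulary size, and model width used here are hypothetical, and nothing in it describes Barbet's actual layer layout or dimensions.

```python
from typing import List, Optional


def attention_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives global causal attention (every earlier position).
    window=w gives sliding-window causal attention (the last w positions,
    including the current one).
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q                      # causal: no future tokens
            if window is not None:
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask


def tied_embedding_parameters_saved(vocab_size: int, d_model: int) -> int:
    """Tying reuses the input embedding matrix as the output projection,
    so one vocab_size x d_model matrix is not stored separately."""
    return vocab_size * d_model


global_mask = attention_mask(4)             # lower-triangular
local_mask = attention_mask(4, window=2)    # each row sees at most 2 positions
```

Global attention costs grow with the full sequence length per query, while a sliding window caps the number of visible keys, which is one common reason to mix the two in long-context models.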

PangolinTokenizer: a byte-level BPE tokenizer for Traditional Chinese and Taiwan
https://openformosa.com/blog/2026/06/20/pangolin-tokenizer/
2026-06-20T00:00:00+08:00

A byte-level BPE tokenizer built for Taiwan: 114,688 merges, with the lowest tokens per character on PangolinBench and the smallest vocabulary.
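
Tokens per character can be illustrated with a toy byte-level BPE. The sketch starts from raw UTF-8 bytes, where a Traditional Chinese character typically takes three bytes, then applies a couple of hand-written merges and measures the ratio again. The merge list, helper names, and sample text are hypothetical; this is not PangolinTokenizer's merge table, and PangolinBench may define the metric differently.

```python
from typing import Callable, List, Sequence, Tuple

Merge = Tuple[bytes, bytes]


def byte_tokens(text: str) -> List[bytes]:
    """Byte-level starting point: one token per UTF-8 byte."""
    return [bytes([b]) for b in text.encode("utf-8")]


def apply_merges(tokens: List[bytes], merges: Sequence[Merge]) -> List[bytes]:
    """Apply BPE merges in learned order, fusing adjacent matching pairs."""
    for left, right in merges:
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


def tokens_per_character(texts: Sequence[str],
                         tokenize: Callable[[str], List[bytes]]) -> float:
    """Total tokens divided by total Unicode code points; lower is better."""
    total_tokens = sum(len(tokenize(t)) for t in texts)
    total_chars = sum(len(t) for t in texts)
    return total_tokens / total_chars


# Hypothetical merges that fuse the three bytes of "台" into one token.
tai = "台".encode("utf-8")
toy_merges = [(tai[0:1], tai[1:2]), (tai[0:2], tai[2:3])]

raw_ratio = tokens_per_character(["台灣"], byte_tokens)
merged_ratio = tokens_per_character(
    ["台灣"], lambda t: apply_merges(byte_tokens(t), toy_merges))
```

In this toy case the two merges collapse "台" into a single token while "灣" stays as three bytes, so the ratio falls from the byte-level baseline. A real tokenizer learns its merges from corpus statistics rather than by hand.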