<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>OpenFormosa</title>
  <subtitle>A Taiwan-oriented workstation for model training and evaluation: OpenFormosa-Base first, then ASR, TTS, OCR, and Taiwan-local benchmarks.</subtitle>
  <link href="https://openformosa.com/feed.xml" rel="self" type="application/atom+xml"/>
  <link href="https://openformosa.com/" rel="alternate" type="text/html"/>
  <id>https://openformosa.com/</id>
  <updated>2026-06-23T00:00:00+08:00</updated>
  
  <entry>
    <title>BlueMagpie-TTS: Taiwanese-accent, Chinese–English code-switching speech synthesis</title>
    <link href="https://openformosa.com/blog/2026/06/23/bluemagpie-tts/" rel="alternate" type="text/html"/>
    <id>https://openformosa.com/blog/2026/06/23/bluemagpie-tts/</id>
    <published>2026-06-23T00:00:00+08:00</published>
    <updated>2026-06-23T00:00:00+08:00</updated>
    <summary>An open Taiwanese-accent text-to-speech model that handles Chinese–English code-switching. It keeps VoxCPM&apos;s acoustic stack, swaps in the Barbet language model, and cuts character error rate by about 58% on a hard test set.</summary>
  </entry>
  
  <entry>
    <title>Barbet 1B Base: a hybrid decoder-only language model for Traditional Chinese</title>
    <link href="https://openformosa.com/blog/2026/06/21/barbet-1b-base/" rel="alternate" type="text/html"/>
    <id>https://openformosa.com/blog/2026/06/21/barbet-1b-base/</id>
    <published>2026-06-21T00:00:00+08:00</published>
    <updated>2026-06-21T00:00:00+08:00</updated>
    <summary>A 1B-parameter hybrid decoder-only causal language model: global and sliding-window attention interleaved with Mamba, a context window of up to 1M tokens, and tied embeddings, built on PangolinTokenizer.</summary>
  </entry>
  
  <entry>
    <title>PangolinTokenizer: a byte-level BPE tokenizer for Traditional Chinese and Taiwan</title>
    <link href="https://openformosa.com/blog/2026/06/20/pangolin-tokenizer/" rel="alternate" type="text/html"/>
    <id>https://openformosa.com/blog/2026/06/20/pangolin-tokenizer/</id>
    <published>2026-06-20T00:00:00+08:00</published>
    <updated>2026-06-20T00:00:00+08:00</updated>
    <summary>A byte-level BPE tokenizer built for Taiwan: 114,688 merges and the lowest tokens-per-character ratio on PangolinBench, achieved with the smallest vocabulary.</summary>
  </entry>
  
</feed>
