Roadmap
From benchmarks to OpenFormosa-Base.
A staged roadmap for building a Taiwan-rooted base model first, then ASR, TTS, and OCR branches with reproducible evaluation and release evidence.
Benchmarks and release templates
Publish the evaluation workflow and the model card and training data sheet templates, along with the reproducibility and release-evidence expectations every release must meet.
Tokenizer, benchmarks, base recipe
Prepare Taiwan multilingual tokenizer checks, training recipes, and evaluation sets for ASR correction, TTS normalization, OCR structured output, and base-model perplexity.
OpenFormosa-Base pretraining
Train the shared Taiwan-rooted base model only on inputs that have already been cleared, then publish run notes, checkpoints, and benchmark reports.
ASR / TTS / OCR task branches
Release small ASR, TTS, and OCR demos or adapters on top of OpenFormosa-Base with task-specific evaluation gates.
Safety-reviewed model demos
Launch generic Taiwan voice demos, OCR extraction tools, and ASR correction flows only after they pass misuse, privacy, and benchmark checks.
Enterprise private deployment
Offer private fine-tuning and evaluation for partners without mixing customer data into the public base model.