Roadmap

From benchmarks to OpenFormosa-Base.

A staged roadmap: build a Taiwan-rooted base model first, then add ASR, TTS, and OCR branches, each with reproducible evaluation and release evidence.

Benchmarks and release templates

Publish the evaluation workflow, plus templates for the model card and training data sheet, along with reproducibility and release-evidence expectations.

Tokenizer, benchmarks, base recipe

Prepare Taiwan multilingual tokenizer checks, training recipes, and evaluation sets for ASR correction, TTS normalization, OCR structured output, and base-model perplexity.
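As an illustration of the base-model perplexity metric named above, here is a minimal sketch that assumes the model under test can report a natural-log probability for each token of a held-out text. The function name `perplexity` and the input format are assumptions made for this example, not part of any published OpenFormosa tooling.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    Hypothetical helper: assumes the caller has already scored a held-out
    text and collected one log-probability per token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among four options.
uniform_four = [math.log(0.25)] * 8
print(round(perplexity(uniform_four), 6))
```

Because perplexity is normalized per token, scores are only comparable between models that share a tokenizer, which is one reason to settle the tokenizer checks before comparing base recipes.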

OpenFormosa-Base pretraining

Train the shared Taiwan-rooted base model from already cleared inputs, then publish run notes, checkpoints, and benchmark reports.

ASR / TTS / OCR task branches

Release small ASR, TTS, and OCR demos or adapters on top of OpenFormosa-Base with task-specific evaluation gates.
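One way to picture a task-specific evaluation gate is as a named metric, a threshold, and a direction, checked against a benchmark report before release. The sketch below is only an assumption about how such a gate could be encoded; the metric names (`asr_cer`, `ocr_field_f1`) and the thresholds are hypothetical placeholders, not project targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """A single release gate (hypothetical structure for illustration)."""
    metric: str
    threshold: float
    higher_is_better: bool

    def passes(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def failed_gates(gates, results):
    """Return the metric names that are missing from results or miss their threshold."""
    failed = []
    for gate in gates:
        value = results.get(gate.metric)
        if value is None or not gate.passes(value):
            failed.append(gate.metric)
    return failed


# Placeholder thresholds: error rates must stay low, F1 must stay high.
gates = [
    Gate("asr_cer", 0.10, higher_is_better=False),
    Gate("ocr_field_f1", 0.90, higher_is_better=True),
]
report = {"asr_cer": 0.08, "ocr_field_f1": 0.85}
print(failed_gates(gates, report))
```

Treating a missing metric as a failure keeps a branch from shipping simply because its evaluation was never run.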

Safety-reviewed model demos

Launch generic Taiwan voice demos, OCR extraction tools, and ASR correction flows only after they pass misuse, privacy, and benchmark checks.

Enterprise private deployment

Offer private fine-tuning and evaluation for partners without mixing customer data into the public base model.