Gio Paik*, Yongbeom Kim, Soungmin Lee, Sangmin Ahn†, and Chanwoo Kim†, EACL Findings 2026
* Corresponding Author, † Equal Contribution
HiKE is the first Korean-English Code-Switching (CS) Automatic Speech Recognition (ASR) benchmark composed of high-quality, natural CS data across various topics. We use Mixed Error Rate (MER) and Point of Interest Error Rate (PIER) [1] to precisely evaluate the CS-ASR capability of models.
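MER is commonly computed over mixed token units so that a single error rate covers both languages. The sketch below assumes one such convention for Korean-English text: each Hangul syllable counts as one token, each run of Latin letters or digits counts as one lowercased English word, and punctuation is dropped. The HiKE evaluation code may tokenize differently, so treat `mixed_tokens` and `mer` as hypothetical, illustrative helpers rather than the repository's implementation.

```python
import re
from typing import List

# Assumed tokenization: one token per Hangul syllable (U+AC00..U+D7A3),
# one token per run of Latin letters, digits, or apostrophes.
_TOKEN_RE = re.compile(r"[\uac00-\ud7a3]|[A-Za-z0-9']+")


def mixed_tokens(text: str) -> List[str]:
    """Split Korean into syllables and English into lowercased words; drop everything else."""
    return [t.lower() for t in _TOKEN_RE.findall(text)]


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or match at zero cost)
            )
        prev = cur
    return prev[-1]


def mer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level MER: total edit operations divided by total reference tokens."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = mixed_tokens(ref), mixed_tokens(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total if total else 0.0
```

For example, `mer(["나는 bus를 탔다"], ["나는 버스를 탔다"])` gives 2/6 ≈ 0.33: `bus` is substituted and the extra syllable `스` is inserted, even though both transcripts are arguably correct. This is the loanword ambiguity addressed by the loanword labels described below.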
Experimental results show that all evaluated multilingual ASR models exhibit significantly higher error rates on code-switching data, and that their CS-ASR capability can be improved through fine-tuning.
For further details, please refer to our paper.
[1] Ugan et al., “PIER: A Novel Metric for Evaluating What Matters in Code-Switching”, ICASSP 2025
To provide a more fine-grained comparison of model performance across different forms of code-switching, we labeled each utterance according to the following levels:
- Word-level CS: Code-switching that occurs at the word level, typically as the substitution of a single noun or adjective.
- Phrase-level CS: Code-switching in which a multi-word phrase within a sentence appears in another language.
- Sentence-level CS: The alternation between languages on a sentence-by-sentence basis.
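The level labels make it possible to report error rates separately for each form of CS. A minimal sketch of that breakdown, assuming per-utterance edit counts and reference lengths have already been computed; the `UtteranceScore` record and the level strings are hypothetical and are not fields of the HiKE dataset:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class UtteranceScore:
    level: str       # hypothetical CS level label, e.g. "word", "phrase", "sentence"
    errors: int      # edit operations (S + D + I) for this utterance
    ref_tokens: int  # number of reference tokens in this utterance


def error_rate_by_level(scores: Iterable[UtteranceScore]) -> Dict[str, float]:
    """Pool errors and reference tokens within each level, then divide."""
    errors: Dict[str, int] = defaultdict(int)
    tokens: Dict[str, int] = defaultdict(int)
    for s in scores:
        errors[s.level] += s.errors
        tokens[s.level] += s.ref_tokens
    return {lvl: errors[lvl] / tokens[lvl] for lvl in tokens if tokens[lvl] > 0}
```

Pooling within each level before dividing follows the usual corpus-level convention for WER-style metrics, so longer utterances carry proportionally more weight than shorter ones.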
Loanwords are words adopted from a foreign language and adapted to the phonology and orthography of the borrowing language. For example, the Korean loanword '버스' [bəs] and the English word 'bus' [bʌs] are pronounced almost identically and can be used interchangeably in a CS context, so it is ambiguous which form a correct transcription should use. To handle this ambiguity, we meticulously labeled all loanwords contained in our dataset.
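One way loanword labels can be used during scoring is to map both spellings to a single canonical form before computing the error rate, so that either transcription is accepted. The sketch below assumes a hypothetical table from English spellings to their labeled Korean loanword forms and rewrites the English form into the Korean one; `LOANWORDS` and `normalize_loanwords` are illustrative names, and how HiKE actually consumes its loanword labels may differ.

```python
import re
from typing import Dict

# Hypothetical loanword table: English spelling (lowercase) -> Korean loanword spelling.
LOANWORDS: Dict[str, str] = {"bus": "버스", "computer": "컴퓨터"}


def normalize_loanwords(text: str, table: Dict[str, str]) -> str:
    """Replace each English word that has a labeled loanword equivalent with its Korean form."""

    def repl(m) -> str:
        word = m.group(0)
        # Case-insensitive lookup; words without an entry are left unchanged.
        return table.get(word.lower(), word)

    # Runs of Latin letters; this also matches English attached to Korean particles, e.g. "bus를".
    return re.sub(r"[A-Za-z]+", repl, text)
```

Applying the same normalization to both the reference and the hypothesis before scoring keeps the comparison symmetric: `normalize_loanwords("나는 bus를 탔다", LOANWORDS)` and `normalize_loanwords("나는 버스를 탔다", LOANWORDS)` both yield `"나는 버스를 탔다"`.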
Install:

```bash
git clone --recurse-submodules https://github.com/ThetaOne-AI/HiKE
cd HiKE
pip install -r requirements.txt
apt-get update && apt-get install -y ffmpeg # install ffmpeg if needed
```

Run the evaluation (Whisper example):

```bash
bash scripts/evaluate_whisper.sh
# or
python src/main.py --model whisper --model_name openai/whisper-large --batch_size 8
```

The results will be saved in `./outputs`.
- Implement a class that follows the `BaseASR` interface in `src/models/your_model.py`, and register it in `src/main.py`.
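Create `src/models/your_model.py`:

```python
from typing import Any, Dict, List

from src.models import BaseASR


class YourModel(BaseASR):
    def __init__(self, model_name: str = "your/model-or-config"):
        self.model_name = model_name
        # TODO: load your model or API client here

    def _transcribe(self, x) -> str:
        # TODO: replace with your model's inference call; return the transcript text
        raise NotImplementedError

    def generate(self, input, batch_size: int | None = None, **kwargs) -> List[Dict[str, Any]]:
        # Accept either a single input or a list of inputs
        if not isinstance(input, list):
            input = [input]
        # Return one {"text": ...} dict per input, in the same order
        return [{"text": self._transcribe(x)} for x in input]
```

Register in `src/main.py`: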
```python
elif model == "your_model":
    from models.your_model import YourModel
    asr = YourModel(model_name)
```

Run:
```bash
python src/main.py --model your_model --model_name your/model-or-name
```

Citation:

```bibtex
@inproceedings{paik2026hike,
    title = "{H}i{KE}: Hierarchical Evaluation Framework for {K}orean-{E}nglish Code-Switching Speech Recognition",
    author = "Paik, Gio and
      Kim, Yongbeom and
      Lee, Soungmin and
      Ahn, Sangmin and
      Kim, Chan Woo",
    editor = "Demberg, Vera and
      Inui, Kentaro and
      Marquez, Llu{\'i}s",
    booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {EACL} 2026",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.findings-eacl.33/",
    doi = "10.18653/v1/2026.findings-eacl.33",
    pages = "673--681",
    ISBN = "979-8-89176-386-9"
}
```
