Splitting on whitespace does not reliably yield words for at least the following languages:
- Chinese (Simplified & Traditional)
- Japanese
- Khmer
- Lao
- Myanmar
- Thai
- Vietnamese
From MDN's documentation for Intl.Segmenter:
const text = '吾輩は猫である。名前はたぬき。';
const japaneseSegmenter = new Intl.Segmenter('ja-JP', { granularity: 'word' });
console.log([...japaneseSegmenter.segment(text)].map((s) => s.segment));
//-> ['吾輩', 'は', '猫', 'で', 'ある', '。', '名前', 'は', 'たぬき', '。']
Compare this with the current splitting, which uses String.prototype.split:
const text = '吾輩は猫である。名前はたぬき。';
console.log(text.split(/(\s+)/).filter(t => !!t));
//-> ['吾輩は猫である。名前はたぬき。'] (the text contains no whitespace, so it comes back as a single token)
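One way the two approaches could be combined is sketched below. It is only an illustration: the helper name `splitWords` and the fallback behaviour are assumptions, not existing code. It uses Intl.Segmenter when the runtime provides it and otherwise falls back to the whitespace-based split shown above.

```javascript
// Hypothetical helper (name and fallback policy are assumptions).
function splitWords(text, locale) {
  if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
    const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
    // Each item is { segment, index, input, isWordLike }; keep only the text.
    return Array.from(segmenter.segment(text), (s) => s.segment);
  }
  // Fallback: whitespace-based split, keeping whitespace runs as tokens.
  return text.split(/(\s+)/).filter((t) => t !== '');
}

const text = '吾輩は猫である。名前はたぬき。';
const tokens = splitWords(text, 'ja-JP');
console.log(tokens);
```

Both branches keep every character of the input, so joining the tokens reproduces the original text. The exact word boundaries in the segmenter branch depend on the runtime's locale data.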