Support for script-continuous languages #112

Word splitting should work for at least the following languages:
- Chinese (Simplified & Traditional)
- Japanese
- Khmer
- Lao
- Myanmar
- Thai
- Vietnamese

From [MDN's documentation for `Intl.Segmenter`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/Segmenter):

```js
const text = '吾輩は猫である。名前はたぬき。';
const japaneseSegmenter = new Intl.Segmenter('ja-JP', { granularity: 'word' });
// Map each segment object to its text; filtering on `s.segment` would keep the full objects.
console.log([...japaneseSegmenter.segment(text)].map((s) => s.segment));
//-> ['吾輩', 'は', '猫', 'で', 'ある', '。', '名前', 'は', 'たぬき', '。']
```

Compare this with the current splitting, which uses `String.prototype.split` on whitespace:

```js
const text = '吾輩は猫である。名前はたぬき。';
console.log(text.split(/(\s+)/).filter(t => !!t));
// The text contains no whitespace, so it comes back as a single token:
//-> ['吾輩は猫である。名前はたぬき。']
```
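
One possible way to combine the two approaches is sketched below. It is only an illustration, not the project's actual code: the helper name `splitWords` and its `locale` parameter are hypothetical. It uses `Intl.Segmenter` when the runtime provides it and otherwise falls back to the whitespace split shown above.

```js
// Hypothetical helper (not part of the existing codebase).
function splitWords(text, locale) {
  // Older runtimes lack Intl.Segmenter: fall back to whitespace splitting.
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    return text.split(/(\s+)/).filter((t) => !!t);
  }
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  // Keep every segment (words, punctuation, whitespace) so that
  // joining the result reproduces the input exactly.
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

console.log(splitWords('吾輩は猫である。名前はたぬき。', 'ja-JP'));
console.log(splitWords('hello world', 'en'));
```

Both branches keep whitespace and punctuation as separate tokens, so `splitWords(text).join('') === text` holds either way. The exact word boundaries depend on the ICU data shipped with the runtime. To drop punctuation and whitespace, each segment object also carries an `isWordLike` flag that could be filtered on before mapping.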
