UTF-8 decode fails on chunk if a character is split #78

https://github.com/textmate/python.tmbundle/blob/02dbf8b59c13419181efdaf41e885b501f186a3e/Support/bin/pycheckmate.py#L196

A multi-byte Unicode character happened to start at position 4095 of a 4096-byte chunk, so it was split across two reads and the UTF-8 decode failed.

```
  File ".../pycheckmate.py", line 196, in poll
    bufs[fd] += os.read(fd, 4096).decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 4095: unexpected end of data
```

I'm not sure about this solution but it seems to work:
```
        if sys.version_info < (3, 0):
            for fd in fds:
                bufs[fd] += os.read(fd, 4096)
        else:
            for fd in fds:
                b = os.read(fd, 4096)
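                # A UTF-8 character is at most 4 bytes long, so at most
                # 3 extra bytes are needed to complete a split character.
                for i in range(4):
                    try:
                        bufs[fd] += b.decode('UTF-8')
                        break
                    except UnicodeDecodeError:
                        # i runs from 0 to 3; re-raise on the last attempt
                        # instead of silently dropping the chunk.
                        if i < 3:
                            b += os.read(fd, 1)
                        else:
                            raise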
```
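
An alternative that might be simpler than retrying: the standard library's incremental decoder keeps the trailing bytes of an incomplete character and completes them on the next chunk. Below is a minimal sketch of that idea, not a tested patch for `pycheckmate.py`. The helper name `read_all_text` and the pipe used for the demo are hypothetical; in `poll` it would presumably need one decoder per `fd`, stored alongside `bufs`.

```python
import codecs
import os


def read_all_text(fd, chunk_size=4096):
    """Read fd until EOF and return its contents decoded as UTF-8.

    Hypothetical helper for illustration only. The incremental decoder
    buffers an incomplete trailing character between calls, so a
    character split across two chunks decodes correctly.
    """
    decoder = codecs.getincrementaldecoder('utf-8')()
    parts = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:
            # EOF: final=True raises if a truncated character is left over.
            parts.append(decoder.decode(b'', final=True))
            break
        parts.append(decoder.decode(chunk))
    return ''.join(parts)


# Demo: reproduce the case from the traceback, where a two-byte
# character starts at position 4095 of the first 4096-byte read.
r, w = os.pipe()
os.write(w, b'a' * 4095 + u'\u00e9'.encode('utf-8'))
os.close(w)
text = read_all_text(r)
os.close(r)
```

Compared with reading extra bytes one at a time, this never issues an additional `os.read` that could block when no more data is available yet.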
