How to read data from a big txt file in Dart

Issue

When I read data from a big text file block by block, I get the error below:

Unfinished UTF-8 octet sequence (at offset 4096)
Code:

File file = File(path!);
RandomAccessFile _raf = await file.open();
_raf.setPositionSync(skip ?? 0);
var data = _raf.readSync(block);// block = 64*64 
content.value = utf8.decode(data.toList());

Solution

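UTF-8 is a variable-length encoding: a character takes 1 to 4 bytes.
The error occurs because the block does not end on a UTF-8 character boundary, so the last character is cut in the middle.
One workaround is to trim the continuation bytes at the left and right ends of the data before calling utf8.decode.
This drops a partial first character and the last character. Alternatively, read a few more bytes to cover the last character and align with the UTF-8 boundary.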

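import 'dart:convert';
import 'dart:io';

// True for a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isDataByte(int i) {
  return (i & 0xc0) == 0x80;
}

Future<void> main(List<String> arguments) async {
  const skip = 0; // byte offset to start reading from
  var _raf = await File('utf8.txt').open();
  _raf.setPositionSync(skip);
  var data = _raf.readSync(8 * 8);
  await _raf.close();

  var utfData = data.toList();
  int l, r;
  // Skip continuation bytes left over from a character cut off on the left.
  // The bounds check must come before the index access.
  for (l = 0; l < utfData.length && isDataByte(utfData[l]); l++) {}

  // Step back to the lead byte of the last character, which may be incomplete.
  for (r = utfData.length - 1; r > l && isDataByte(utfData[r]); r--) {}
  if (r < l) r = l; // empty block or nothing decodable
  // sublist(l, r) excludes index r, so the last character is dropped.
  var value = utf8.decode(utfData.sublist(l, r));
  print(value);
}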
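
The trimming step can also be tried without a file. The following is a minimal sketch, assuming an in-memory byte list; `trimToCharBoundary` is a hypothetical helper name and not part of `dart:convert`. It cuts a slice that starts and ends in the middle of a character, shows that decoding it fails, and then decodes the trimmed slice.

```dart
import 'dart:convert';

// True for a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isDataByte(int i) => (i & 0xc0) == 0x80;

// Hypothetical helper: drops a partial character at the start and the
// last (possibly partial) character at the end of the byte list.
List<int> trimToCharBoundary(List<int> bytes) {
  var l = 0;
  while (l < bytes.length && isDataByte(bytes[l])) {
    l++;
  }
  var r = bytes.length - 1;
  while (r > l && isDataByte(bytes[r])) {
    r--;
  }
  if (r < l) r = l; // empty input or nothing decodable
  return bytes.sublist(l, r);
}

void main() {
  // 'é' is 2 bytes, 'a' 1 byte, '€' 3 bytes, 'b' 1 byte: 10 bytes in total.
  final bytes = utf8.encode('éa€b€');
  // Start inside 'é' and stop inside the second '€'.
  final slice = bytes.sublist(1, 9);
  try {
    utf8.decode(slice);
  } on FormatException catch (e) {
    print('decode failed: ${e.message}');
  }
  print(utf8.decode(trimToCharBoundary(slice))); // prints "a€b"
}
```

Note that the helper drops the last character even when it is complete, which matches the behaviour described above.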

Optionally, read 4 more bytes and extend the range to cover the last character:

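import 'dart:convert';
import 'dart:io';

// True for a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isDataByte(int i) {
  return (i & 0xc0) == 0x80;
}

Future<void> main(List<String> arguments) async {
  const skip = 0; // byte offset to start reading from
  var _raf = await File('utf8.txt').open();
  _raf.setPositionSync(skip);
  var block = 8 * 8;
  // Read a few extra bytes so the last character can be completed.
  var data = _raf.readSync(block + 4);
  await _raf.close();

  var utfData = data.toList();
  int l, r;
  // Skip continuation bytes left over from a character cut off on the left.
  for (l = 0; l < utfData.length && isDataByte(utfData[l]); l++) {}

  // Extend the right edge past the continuation bytes of the last character.
  // Near the end of the file fewer than block + 4 bytes may come back.
  for (r = block < utfData.length ? block : utfData.length;
      r < utfData.length && isDataByte(utfData[r]);
      r++) {}
  if (r < l) r = l; // nothing decodable in this block

  var value = utf8.decode(utfData.sublist(l, r));
  print(value);
}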
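
If the file is read sequentially from the start, another option is to let a chunked decoder carry an unfinished character over from one block to the next, so no manual trimming is needed. The following is a minimal sketch under that assumption; `decodeInBlocks` is a hypothetical name, and the text is collected in a `StringBuffer` only to keep the example short. It does not help when reading starts at an arbitrary `skip` offset, which still needs the alignment shown above.

```dart
import 'dart:convert';
import 'dart:io';

// Hypothetical helper: reads the file sequentially in blocks of `block`
// bytes and feeds each block to a chunked UTF-8 decoder. The decoder holds
// on to the bytes of an unfinished character until the next block arrives.
String decodeInBlocks(String path, int block) {
  final out = StringBuffer();
  final sink = utf8.decoder
      .startChunkedConversion(StringConversionSink.fromStringSink(out));
  final raf = File(path).openSync();
  try {
    while (true) {
      final data = raf.readSync(block);
      if (data.isEmpty) break; // end of file
      sink.add(data);
    }
  } finally {
    raf.closeSync();
  }
  sink.close(); // throws FormatException if the file ends mid-character
  return out.toString();
}

void main() {
  // Write a small sample file so the sketch runs on its own.
  final dir = Directory.systemTemp.createTempSync('utf8_demo');
  final file = File('${dir.path}/utf8.txt')..writeAsStringSync('aé€b€');
  // A 2-byte block size forces the 3-byte '€' to be split across reads.
  print(decodeInBlocks(file.path, 2)); // prints "aé€b€"
  dir.deleteSync(recursive: true);
}
```

For a file that is really large, each decoded piece would be processed as it arrives instead of being accumulated in memory.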

Answered By – Chart Chuo

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
