Arch Linux

FS#54828 - [libxml2] Unable to open some .DOCX files

Attached to Project: Arch Linux
Opened by Natrio (natrio) - Saturday, 15 July 2017, 18:28 GMT
Last edited by Jan de Groot (JGC) - Friday, 01 September 2017, 13:50 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: Jan de Groot (JGC)
Architecture: x86_64
Severity: Medium
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 1
Private: No


After the libxml2 update from 2.9.4+16+g07418011-2 to 2.9.4+96+gfb56f80e-1, LibreOffice can't open some large (book-sized) .DOCX files and fails with errors like this:

SAXParseException: '[word/document.xml line 2]: Input is not proper UTF-8, indicate encoding !
Bytes: 0xD0 EOF
', Stream 'word/document.xml', Line 2, Column 121509

Downgrading to libxml2-2.9.4+16+g07418011-2 solves the problem.

An example of a large book .docx file is attached.
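
Since the older libxml2 opens the same file, the file itself is probably valid and the parser is at fault. A minimal Python sketch for confirming that, assuming the standard DOCX layout in which the main text lives in the ZIP member word/document.xml (the function name check_docx_part is hypothetical and not part of libxml2 or LibreOffice):

```python
import io
import zipfile


def check_docx_part(docx, part="word/document.xml"):
    """Check whether one member of a DOCX (ZIP) archive is valid UTF-8.

    `docx` may be a path or a binary file-like object.
    Returns (True, None) if the member decodes cleanly, otherwise
    (False, offset) with the byte offset of the first bad sequence.
    """
    with zipfile.ZipFile(docx) as archive:
        data = archive.read(part)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return False, exc.start
    return True, None


if __name__ == "__main__":
    # Build a tiny stand-in archive in memory; a real check would pass
    # the path of the attached .docx file instead.
    sample = io.BytesIO()
    with zipfile.ZipFile(sample, "w") as archive:
        archive.writestr("word/document.xml", "<w:t>Привет</w:t>".encode("utf-8"))
    print(check_docx_part(sample))
```

If this reports the part as valid UTF-8, the "Input is not proper UTF-8" message points to the parser rather than the document. Note that 0xD0 is the lead byte of a two-byte UTF-8 sequence (most Cyrillic letters start with 0xD0 or 0xD1), so "Bytes: 0xD0 EOF" would be consistent with the parser seeing only the first half of a character at the end of its available input. That reading is an assumption and is not confirmed in this report.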

Closed by Jan de Groot (JGC)
Friday, 01 September 2017, 13:50 GMT
Reason for closing: Fixed
Additional comments about closing: 2.9.5rc2
Comment by Natrio (natrio) - Saturday, 15 July 2017, 18:29 GMT

Comment by Natrio (natrio) - Sunday, 20 August 2017, 13:12 GMT
Not fixed. After upgrading to 2.9.4+99+g27f310d4-1, the same files still fail to open:
File format error found at
SAXParseException: '[word/document.xml line 2]: PCDATA invalid Char value 0
', Stream 'word/document.xml', Line 2, Column 121509(row,col).
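
"Invalid Char value 0" means the parser believes it read a NUL character, which the XML 1.0 Char production does not allow. To tell whether the document really contains such a character or the parser is misreading its buffer, the decoded text can be scanned directly. This is a rough sketch; the helper names is_xml_char and first_invalid_xml_char are hypothetical:

```python
def is_xml_char(codepoint):
    """Return True if the code point matches the XML 1.0 Char production."""
    return (
        codepoint in (0x9, 0xA, 0xD)
        or 0x20 <= codepoint <= 0xD7FF
        or 0xE000 <= codepoint <= 0xFFFD
        or 0x10000 <= codepoint <= 0x10FFFF
    )


def first_invalid_xml_char(text):
    """Return (index, codepoint) of the first disallowed character, or None."""
    for index, char in enumerate(text):
        if not is_xml_char(ord(char)):
            return index, ord(char)
    return None


if __name__ == "__main__":
    # Stand-in strings; a real check would pass the decoded
    # contents of word/document.xml.
    print(first_invalid_xml_char("<w:t>text</w:t>"))
    print(first_invalid_xml_char("ab\x00c"))
```

If the scan finds nothing in word/document.xml, the NUL reported at column 121509 would have to originate inside the parser, which would fit the observation that the older libxml2 build opens the same file.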