Universal character encoding detector for Python 2
Chardet takes a sequence of bytes in an unknown character encoding, and attempts to determine the encoding.
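A minimal usage sketch follows. It assumes the `chardet` package is installed and uses Python 3 syntax; the sample text and the variable names are illustrative only. `chardet.detect()` accepts a bytes object and returns a dictionary that includes the guessed encoding and a confidence value between 0 and 1.

```python
import chardet  # third-party package, e.g. installed with: pip install chardet

# Bytes whose encoding we pretend not to know (illustrative sample: UTF-8 text).
raw = "Привет, мир! Это пример текста на русском языке.".encode("utf-8")

# detect() returns a dict with at least the keys "encoding" and "confidence".
result = chardet.detect(raw)
encoding = result["encoding"]      # may be None if no guess could be made
confidence = result["confidence"]  # float between 0.0 and 1.0

# Decode only when a guess is available; replace undecodable bytes
# instead of raising, since the guess can be wrong.
if encoding is not None:
    text = raw.decode(encoding, errors="replace")
else:
    text = None
```

The result is a statistical guess, so short or ambiguous inputs may yield a low confidence or a wrong encoding; checking the confidence value before trusting the guess is usually worthwhile.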
Supported encodings:
* ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
* Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
* EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
* EUC-KR, ISO-2022-KR (Korean)
* KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
* ISO-8859-2, windows-1250 (Hungarian)
* ISO-8859-5, windows-1251 (Bulgarian)
* windows-1252 (English)
* ISO-8859-7, windows-1253 (Greek)
* ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
* TIS-620 (Thai)
This library is a port of the auto-detection code in Mozilla.
released on 8 June 2017
git clone https://github.com/chardet/chardet
This entry (in part or in whole) was last reviewed on 28 April 2020.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.
The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.