Extracts metadata from files.
'libextractor' extracts metadata from files of arbitrary type. It uses helper libraries to perform the actual extraction, and can easily be extended by linking against external extractors for additional file types. Its goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library for obtaining metadata about files. 'libextractor' includes the command "extract", which extracts metadata from a file and prints the results to stdout. Currently, it supports the formats HTML, PDF, PS, OLE2 (doc, xls, ppt), StarOffice, OpenOffice, MAN, DVI, MP3 (ID3v1, ID3v2), OGG, WAV, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, Real, QT, MPEG, RIFF (AVI), ASF, and ELF. It also detects various MIME types and can compute hash functions (SHA-1, MD5, RIPEMD-160). A Java binding (JNI) is available.
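The design described above, where a core library dispatches to per-format helpers and also computes hashes, can be illustrated with a small self-contained sketch. This is not libextractor's API or implementation: the names `EXTRACTORS`, `register`, `extract_png`, `extract_gif`, and `extract_metadata` are hypothetical, and the sketch assumes the file type is recognized from its leading magic bytes, handling only PNG and GIF.

```python
import hashlib
import struct

# Hypothetical plugin registry: (magic-byte prefix, extractor function) pairs.
# This only illustrates the "core plus per-format helpers" idea; it is not
# how libextractor itself is implemented.
EXTRACTORS = []


def register(magic):
    """Register an extractor for data that starts with the given magic bytes."""
    def decorator(func):
        EXTRACTORS.append((magic, func))
        return func
    return decorator


@register(b"\x89PNG\r\n\x1a\n")
def extract_png(data):
    meta = {"mimetype": "image/png"}
    # After the 8-byte signature comes the IHDR chunk: 4-byte length,
    # 4-byte type, then big-endian 32-bit width and height.
    if len(data) >= 24 and data[12:16] == b"IHDR":
        meta["width"], meta["height"] = struct.unpack(">II", data[16:24])
    return meta


@register(b"GIF87a")
@register(b"GIF89a")
def extract_gif(data):
    meta = {"mimetype": "image/gif"}
    # The logical screen width and height follow the 6-byte header
    # as little-endian 16-bit integers.
    if len(data) >= 10:
        meta["width"], meta["height"] = struct.unpack("<HH", data[6:10])
    return meta


def extract_metadata(data):
    """Return hashes for any input, plus format metadata when a helper matches."""
    meta = {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
    for magic, func in EXTRACTORS:
        if data.startswith(magic):
            meta.update(func(data))
            break
    return meta


if __name__ == "__main__":
    import sys

    # Mirrors the spirit of a command-line extractor: print key/value pairs.
    with open(sys.argv[1], "rb") as handle:
        for key, value in extract_metadata(handle.read()).items():
            print(f"{key} - {value}")
```

Supporting another format in this sketch means adding one more decorated function, which loosely mirrors how additional extractors extend the library without changes to the core.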
17 May 2018
Leaders and contributors
Resources and communication
|Debian (Ref) (R)||https://tracker.debian.org/pkg/libextractor-python|
|VCS Repository Webview||https://gnunet.org/git/libextractor.git/|
|Debian (Ref) (R)||https://tracker.debian.org/pkg/libextractor-java|
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.
The copyright and license notices on this page only apply to the text on this page. Any software, copyright licenses, or other similar notices described in this text have their own copyright notice and license, which can usually be found in the distribution or license text itself.