Difference between revisions of "Boilerpipe"
(Debian import)
(Added Python and Ruby link)
(2 intermediate revisions by the same user not shown)
Line 14: | Line 14:
  is usually quite accurate.
  |Homepage URL=http://code.google.com/p/boilerpipe
−
+ |Is High Priority Project=No
−
+ |Decommissioned/Obsolete=No
−
+ |Accepts cryptocurrency donations=No
−
−
−
  |Version identifier=1.2.0-1
− |Version download=http://ftp.debian.org/debian/pool/main/b/boilerpipe/boilerpipe_1.2.0.orig.tar.gz
+ |Version download=http://ftp.debian.org/debian/pool/main/b/boilerpipe/boilerpipe_1.2.0.orig.tar.gz
−
+ |Test entry=No
+ |Last review by=Bendikker
+ |Last review date=2018/04/16
  |Submitted date=2015-07-17
−
+ |Is GNU=No
−
−
−
−
−
  }}
  {{Project license
Line 49: | Line 43:
  |Role=contact
  |Email=christian@kohlschutter.com
+ }}
+ {{Resource
+ |Resource audience=Python (Ref)
+ |Resource URL=https://pypi.org/project/boilerpipe
+ }}
+ {{Resource
+ |Resource audience=Ruby (Ref)
+ |Resource URL=https://rubygems.org/gems/boilerpipe
+ }}
+ {{Resource
+ |Resource audience=Debian (Ref)
+ |Resource URL=https://tracker.debian.org/pkg/boilerpipe
  }}
  {{Resource
  |Resource kind=Download
  |Resource URL=http://code.google.com/p/boilerpipe/
+ }}
+ {{Software category}}
+ {{Featured}}
+ {{Import
+ |Date=2015-07-17
+ |Source=Debian
+ |Source link=http://packages.debian.org/sid/boilerpipe
  }}
Latest revision as of 11:59, 16 April 2018
Boilerpipe
http://code.google.com/p/boilerpipe
Boilerplate removal and fulltext extraction from HTML pages
The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.
Extraction is very fast (milliseconds), needs only the input document (no global or site-level information is required), and is usually quite accurate.
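To make the idea of boilerplate removal concrete, here is a minimal Python sketch that uses only the standard library. It is not boilerpipe's code or API: it is a toy illustration of the general approach of classifying text blocks by shallow features (word count and link density) computed from the single input document. The names `BlockCollector` and `extract_main_text`, the tag sets, and the thresholds `min_words` and `max_link_density` are all assumptions chosen for this example.

```python
from html.parser import HTMLParser

# Hypothetical tag sets for this sketch; real extractors use richer rules.
BLOCK_TAGS = {"p", "div", "li", "td", "tr", "table", "ul", "ol",
              "h1", "h2", "h3", "nav", "header", "footer",
              "section", "article", "body"}
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Split a page into text blocks and count words inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # (text, word_count, linked_word_count)
        self._words = []
        self._linked = 0
        self._in_link = 0
        self._skip = 0

    def flush(self):
        # Close the current block, if it holds any words.
        if self._words:
            self.blocks.append(
                (" ".join(self._words), len(self._words), self._linked))
        self._words, self._linked = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style contents
        words = data.split()
        self._words.extend(words)
        if self._in_link:
            self._linked += len(words)


def extract_main_text(html, min_words=10, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by link text."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()
    kept = [text for text, n, linked in collector.blocks
            if n >= min_words and linked / n <= max_link_density]
    return "\n".join(kept)
```

Calling `extract_main_text(page_html)` on a page made of a navigation bar, one article paragraph, and a short footer would keep only the paragraph under these thresholds. Real pages need far more careful rules, which is what the library's ready-made strategies are for.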
Download
http://ftp.debian.org/debian/pool/main/b/boilerpipe/boilerpipe_1.2.0.orig.tar.gz
Categories
Licensing
Verified by: Debian: Emmanuel Bourg <ebourg@apache.org>
Verified on: 20 June 2013
Notes: License: apache-2.0
Leaders and contributors
Contact(s) | Role |
---|---|
Christian Kohlschütter | contact |
Resources and communication
Software prerequisites
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.
The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.