Boilerpipe

From Free Software Directory
Revision as of 11:59, 16 April 2018 by Bendikker


Boilerpipe

http://code.google.com/p/boilerpipe
Boilerplate removal and fulltext extraction from HTML pages

The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.

The library already provides specific strategies for common tasks (for example, news article extraction) and can easily be extended for individual problem settings.

Extraction is very fast (on the order of milliseconds), needs only the input document (no global or site-level information is required), and is usually quite accurate.
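To make the idea of per-document boilerplate removal concrete, here is a minimal Python sketch. It is not boilerpipe's implementation or API: it is a toy heuristic, assumed for illustration only, that keeps text blocks with enough words and a low share of linked words. The names `BlockSplitter` and `extract_main_text` and the thresholds `min_words` and `max_link_density` are all hypothetical.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries in this toy example (an assumption,
# not a list taken from boilerpipe).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "article",
              "section", "nav", "footer", "header", "ul", "ol", "body"}
SKIP_TAGS = {"script", "style"}


class BlockSplitter(HTMLParser):
    """Split a document into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished (word_count, linked_word_count, text)
        self._words = []      # words of the block being built
        self._link_words = 0  # how many of those words sit inside <a>
        self._in_link = 0
        self._skip = 0

    def flush(self):
        """Close the current block, if it contains any words."""
        if self._words:
            self.blocks.append(
                (len(self._words), self._link_words, " ".join(self._words)))
        self._words = []
        self._link_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style contents
        words = data.split()
        self._words.extend(words)
        if self._in_link:
            self._link_words += len(words)


def extract_main_text(html, min_words=10, max_link_density=0.33):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()
    kept = [text for count, linked, text in parser.blocks
            if count >= min_words and linked / count <= max_link_density]
    return "\n".join(kept)
```

Calling `extract_main_text(page_html)` on a single HTML string returns the retained blocks joined by newlines. Like the library described above, the sketch needs only the input document, but its thresholds are arbitrary and it will misclassify many real pages.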





Licensing

License: Other
Verified by: Debian: Emmanuel Bourg <ebourg@apache.org>
Verified on: 20 June 2013
Notes: License: apache-2.0




Leaders and contributors

Contact(s)                Role
Christian Kohlschütter    contact


Resources and communication

Audience        Resource type   URI
Python (Ref)    -               https://pypi.org/project/boilerpipe
-               Download        http://code.google.com/p/boilerpipe/
Ruby (Ref)      -               https://rubygems.org/gems/boilerpipe
Debian (Ref)    -               https://tracker.debian.org/pkg/boilerpipe









Date 2015-07-17
Source Debian
Source link http://packages.debian.org/sid/boilerpipe




Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software, copyright licenses, or other similar notices described in this text have their own copyright notice and license, which can usually be found in the distribution or license text itself.