- Anolis is an HTML document post-processor that takes an input HTML file, adds section numbers, a table of contents, and cross-references, and writes the output to another file.
- BHL is an Emacs mode that lets you convert plain text files into HTML, LaTeX, and SGML (Linuxdoc) files. The BHL mode handles common font styles, three levels of sections, footnotes, and any kind of list, table, URL, or horizontal rule. It also handles a table of contents: you can browse the toc, insert the toc where you want, and update the section numbers with one keystroke.
- Beautiful Soup
- Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Three features make it powerful:
  1. Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.
  2. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.
  3. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just have to specify the original encoding.
- Beautiful Soup parses anything you give it and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the table heading that's got bold text, then give me that text." Valuable data that was once locked up in poorly designed websites is now within your reach. Projects that would have taken hours take only minutes with Beautiful Soup.
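
The queries quoted in the Beautiful Soup entry can be sketched roughly as follows. This is an illustration only: it assumes the current `beautifulsoup4` package (imported as `bs4`) with Python's built-in `html.parser` backend, which may differ from the release this entry was written for. The sample markup, URLs, and variable names are made up for the example.

```python
import re

from bs4 import BeautifulSoup  # assumption: the beautifulsoup4 package is installed

# Hypothetical, deliberately sloppy markup: the <p> tags are never closed
# and there is no <html> or <body> wrapper.
html = """
<p>See <a class="externalLink" href="http://foo.com/page">Foo</a>
<p>and <a href="/local">a local page</a>
<table><tr><th><b>Price</b></th><th>Notes</th></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# "Find all the links"
all_links = [a["href"] for a in soup.find_all("a")]

# "Find all the links of class externalLink"
external = [a["href"] for a in soup.find_all("a", class_="externalLink")]

# "Find all the links whose URLs match foo.com"
foo_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"foo\.com"))]

# "Find the table heading that's got bold text, then give me that text"
bold_heading = next(th.b.get_text() for th in soup.find_all("th") if th.b)

print(all_links)
print(external)
print(foo_links)
print(bold_heading)
```

Each query is a single `find_all` call with a filter, so no custom parser is needed for a one-off scraping task.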
- 'Bib2html' converts the data in a BibTeX database to HTML files. Please note that there is another package by the name 'bib2html' (http://directory.fsf.org/bib2html.html) written by Kiri Wagstaff.
- 'bib2xhtml' is a program that converts BibTeX files into HTML (specifically, XHTML 1.0). The conversion is mostly done by specialized BibTeX style files, derived from a converted bibliography style template. This ensures that the original BibTeX styles are faithfully reproduced. Some post-processing is performed by Perl code. This is an update of the bib2html program written by David Hull in 1996 and maintained by him until 1998.
- Bluefish is a programmer's HTML editor written using GTK, designed to save the experienced webmaster some keystrokes. It features a multiple file editor, multiple toolbars, custom menus, image and thumbnail dialogs, open from the Web, CSS dialogs, PHP, SSI and RXML support, HTML validation, and lots of wizards. It is available in 11 languages.
- ClientTable is a Python module for generic HTML table parsing. It is most useful when used in conjunction with other parsers (htmllib or HTMLParser, regular expressions, etc.), to divide up the parsing work between your own code and ClientTable.
- 'cssed' is a CSS editor and validator with support for other web and programming languages, and it can be extended through plugins. Although full-featured, it is meant to be small and to consume few resources, and it can run on a P100 with 32 MB of RAM.
- Docvert takes word processor files (typically .doc) and converts them to OpenDocument and clean HTML. The resulting OpenDocument is then optionally converted to HTML or any XML. This is done with XML Pipelines, an approach that supports XSLT, breaking up content over headings or sections, and saving those results to multiple files (e.g., chapter1.html, chapter2.html…). The result is returned in a .zip file.