Category: Web-authoring
- BHL is an Emacs mode that lets you convert plain text files into HTML, LaTeX, and SGML (Linuxdoc) files. The BHL mode handles common font styles, three levels of sections, footnotes, lists of any kind, tables, URLs, and horizontal rules. It also handles a table of contents: you can browse the TOC, insert it where you want, and update the section numbers with one keystroke.
- Beautiful Soup
- Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Three features make it powerful:
- 1. Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.
- 2. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.
- 3. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just have to specify the original encoding.
- Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the table heading that's got bold text, then give me that text." Valuable data that was once locked up in poorly designed websites is now within your reach. Projects that would have taken hours take only minutes with Beautiful Soup.
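The queries quoted above can be sketched in Python roughly as follows. This is a minimal illustration, assuming the current `beautifulsoup4` package (imported as `bs4`) and Python's built-in `html.parser` backend; the sample markup, the URLs, and the variable names are made up for the example.

```python
import re

from bs4 import BeautifulSoup  # assumption: the beautifulsoup4 package is installed

# Hypothetical, deliberately sloppy markup: the <td>, <tr>, and <p> tags are never closed.
html = """
<table><tr><th><b>Price</b></th><th>Notes</th></tr>
<tr><td>42<td>cheap
</table>
<p>See <a class="externalLink" href="http://foo.com/a">foo</a>,
<a href="/local">local page</a>
<a class="externalLink" href="http://bar.org/">bar</a>
"""

soup = BeautifulSoup(html, "html.parser")

# "Find all the links"
all_links = [a["href"] for a in soup.find_all("a")]

# "Find all the links of class externalLink"
external = [a["href"] for a in soup.find_all("a", class_="externalLink")]

# "Find all the links whose URLs match 'foo.com'"
foo_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"foo\.com"))]

# "Find the table heading that's got bold text, then give me that text"
heading = soup.find(lambda tag: tag.name == "th" and tag.find("b") is not None)
bold_text = heading.b.get_text()

print(all_links)
print(external)
print(foo_links)
print(bold_text)
```

Each query is a single call on the parse tree, so no custom parser is written even though the input markup is not well-formed.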
- 'Bib2html' converts the data in a BibTeX database to HTML files. Please note that there is another package by the name 'bib2html' (http://directory.fsf.org/bib2html.html) written by Kiri Wagstaff.
- Docvert takes word processor files (typically .doc) and converts them to OpenDocument and clean HTML. The resulting OpenDocument is then optionally converted to HTML or any XML. This is done with XML Pipelines, an approach that supports XSLT, breaking up content over headings or sections, and saving those results to multiple files (e.g., chapter1.html, chapter2.html, …). The result is returned in a .zip file.
- Genshi is a Python library that provides an integrated set of components for parsing, generating, and processing HTML, XML or other textual content for output generation on the web. The main feature is a template language that is smart about markup: unlike conventional template languages that only deal with bytes and (if you're lucky) characters, Genshi knows the difference between tags, attributes, and actual text nodes, and uses that knowledge to your advantage.
- Grutatxt is a plain text to HTML converter. It converts subtle text markup for lists, bold, italics, tables, and headings into the corresponding HTML tags, so you do not have to write unreadable source text files.
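To make the idea of "subtle text markup" concrete, here is a small Python sketch of a plain-text-to-HTML conversion. It is not Grutatxt's code, and it does not claim to follow Grutatxt's actual markup rules. The conventions used (`*bold*`, `_italics_`, `* ` list items, a heading underlined with `=`) and the function names `inline` and `to_html` are assumptions chosen only for illustration.

```python
import html
import re


def inline(text: str) -> str:
    """Escape HTML special characters, then convert *bold* and _italics_."""
    text = html.escape(text)
    text = re.sub(r"\*([^*\n]+)\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\b_([^_\n]+)_\b", r"<em>\1</em>", text)
    return text


def to_html(source: str) -> str:
    """Convert blank-line-separated blocks of lightly marked-up text to HTML."""
    out = []
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = block.splitlines()
        if len(lines) == 2 and lines[1] and set(lines[1]) == {"="}:
            # A line underlined with "=" characters becomes a heading.
            out.append(f"<h1>{inline(lines[0])}</h1>")
        elif all(line.lstrip().startswith("* ") for line in lines):
            # Every line starting with "* " becomes a list item.
            items = "".join(f"<li>{inline(l.lstrip()[2:])}</li>" for l in lines)
            out.append(f"<ul>{items}</ul>")
        else:
            # Anything else is treated as a paragraph.
            out.append(f"<p>{inline(' '.join(lines))}</p>")
    return "\n".join(out)


source = """Shopping
========

Buy *fresh* bread & _ripe_ fruit.

* apples
* pears
"""

result = to_html(source)
print(result)
```

The source text stays readable as plain text, and the converter infers the structure from layout cues alone.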
- HTML Code Convert
- HTML Code Convert helps speed up the conversion of HTML code into different formats, including JavaScript, JavaServer Pages, PHP, Perl, and the UNIX shell. It is particularly useful in CGI scripting.
- HTML Merge
- HTML::Merge is an embedded HTML/Perl/SQL tool used to create dynamic Web content. It uses TAG-based embedded Perl and SQL integration in templates that are used to automatically generate Perl code, which is run in the deployment mode.
- Converts files from HTML to XSL-FO (xml:fo) format. The HTML code can be written with StarOffice or other WYSIWYG editors and need not be 100% valid; you will get some sort of output even with badly formatted code. The program supports tables and internal and external links.
- HTML_ToPDF takes the hassle out of generating a PDF file from a Web page. It converts any HTML document into a format that looks the same on any platform and printer. It includes support for converting images, using stylesheets to customize the look of the PDF file, and error handling.