Semantic search



Java2html
Given a Java source file, the program produces an HTML file with syntax highlighting.
Jaxml
Jaxml is a Python module designed to automatically generate human-readable documents.
Juriscraper
Juriscraper is a scraper library that gathers judicial opinions and oral arguments in the American court system. It is currently able to scrape:
  • a variety of pages and reports within the PACER system
  • opinions from all major appellate Federal courts
  • opinions from all state courts of last resort except for Georgia (typically their "Supreme Court")
  • oral arguments from all appellate federal courts that offer them
Kwaff
Kwaff is a tool that converts Kwaff-format documents into XML documents, and XML documents into Kwaff. The Kwaff format is friendlier for humans to read and write than XML; it makes XML as easy to read and write as YAML.
Kwalify
Kwalify is a parser, schema validator, and data binding tool for YAML and JSON. YAML and JSON are simple formats for structured data that are easier for humans to read and write than XML, but there has been no schema language for YAML comparable to RelaxNG or DTD. Kwalify fills this gap. Since version 0.7, Kwalify also supports data binding: if you specify a class name in the schema file, the Kwalify YAML parser creates instances of that class instead of Hash objects, so you no longer have to convert Hashes into proper objects yourself. Data binding makes YAML much easier to handle and manipulate.
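The validate-then-bind idea described above can be illustrated with a minimal Python sketch. It is not Kwalify's API or schema syntax: the `SCHEMA` layout, the `bind` function, and the `Person` class are hypothetical names chosen for illustration, and the input is assumed to be a mapping that a YAML or JSON parser has already produced.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


# Hypothetical schema: the expected type of each required key, plus the
# class that a validated mapping should be bound to.
SCHEMA = {"class": Person, "fields": {"name": str, "age": int}}


def bind(data, schema):
    """Validate a parsed mapping against the schema, then build an instance."""
    errors = []
    for key, expected in schema["fields"].items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    for key in data:
        if key not in schema["fields"]:
            errors.append(f"unexpected key: {key}")
    if errors:
        raise ValueError("; ".join(errors))
    # Binding step: return an object of the named class, not a plain dict.
    return schema["class"](**data)


person = bind({"name": "Ada", "age": 36}, SCHEMA)
print(person.name, person.age)
```

Invalid input, such as a mapping without `age`, raises `ValueError` instead of producing a half-built object.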
MESICON
Free software to assist in cataloguing challenging items, for example in museum libraries.
Majix
'MajiX' transforms RTF files such as Microsoft Word documents into XML. It can convert headings, lists (numbered or not), simple tables, bold, italics, and underline.
Mandoc
Mandoc is a suite of tools compiling mdoc, the roff macro language of choice for BSD manual pages, and man, the predominant historical language for UNIX manuals. It is small, ISO C, ISC-licensed, and quite fast. The main component of the toolset is the mandoc utility program, based on the libmandoc validating compiler, to format output for UTF-8 and ASCII UNIX terminals, HTML 5, PostScript, and PDF.
Mod xslt
mod-xslt is an Apache module that converts XML files into HTML files on the fly using XSLT stylesheets. It was written to overcome most of the limits of similar modules and uses a standard API, which can be used for other applications or to support more servers. It can dynamically parse generated documents, both in POST and GET requests, includes a fully featured language to choose the stylesheet to load from both configuration files and from .xml files, and allows stylesheets to access server variables. It supports redirects, dynamically generated stylesheets, and Apache versions 1 and 2.
OneModel
Today: You can take notes with it. Rearrange them easily, up and down in a list, or up/down in the hierarchy. Link them to each other. Navigate across links with simple keypresses. Make deeply nested lists. Link lists to lists. Compose long paragraphs and attach them. Or do more complicated things if desired, by creating relationship types and using those. Import txt or export txt or html. It's better than the alternatives for some people, because the navigation takes fewer keystrokes, you don't have to read a manual (it's all on the screen, or so I like to think), you can have the same thing in as many places as you want, it is Free (some alternatives are, others are not), and it has immense future potential for becoming a better-structured, much more powerful and flexible wikipedia-like tool, if we work together.
Vision: The idea is to have the most efficient personal knowledge organizer (now available in a usable text-based interface), then support mobile access, easy internal automation, and effective sharing and collaboration. Then, to combine efforts and learn as we go until we integrate humankind's knowledge over time. The key differentiators are that it is to be Free, and based on an object model (easily created on the fly as a side-effect of using the system), rather than on massive amounts of words. The knowledge is the same, even if the words can change. One can think of that as "using building blocks of knowledge, starting at an atomic level (i.e. numbers, relationships...), free and efficient." Or, taking the best experiences of online organizer tools and wikis, but more structured, efficient, Free, open, and collaborative; and allowing full individual or organizational control.
OpenDAM
OpenDAM is now a Service As A Software Substitute. The source code is not available from the creator, and (unofficial cloned) repositories on GitHub have not been updated since 2013 so the code there is orphaned. No code on SourceForge.
PDFResurrect
PDFResurrect is a tool aimed at analyzing PDF documents. The PDF format allows for previous document changes to be retained in a more recent version of the document, thereby creating a running history of changes for the document. This tool attempts to extract all previous versions while also producing a summary of changes between versions. It can also "scrub" or write data over the original instances of PDF objects that have been modified or deleted, in an effort to disguise information from previous versions that might not be intended for anyone else to read.
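The version history described above comes from PDF incremental updates: each save can append new objects and a new trailer to the end of the file, and each such section ends with an `%%EOF` marker. A rough Python sketch of estimating the number of saved revisions follows. It is a heuristic for illustration only, not a description of how PDFResurrect works; `count_revisions` is a hypothetical name, the sample bytes are synthetic rather than a valid PDF, and some files (linearized PDFs, for instance) can make the count misleading.

```python
def count_revisions(pdf_bytes: bytes) -> int:
    """Rough estimate of saved revisions: one per %%EOF marker."""
    return pdf_bytes.count(b"%%EOF")


# Synthetic stand-in for a file that was saved once and then updated once.
sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"trailer\n<< /Root 1 0 R >>\n%%EOF\n"
    b"1 0 obj\n<< /Type /Catalog /Lang (en) >>\nendobj\n"
    b"trailer\n<< /Root 1 0 R /Prev 0 >>\n%%EOF\n"
)
print(count_revisions(sample))
```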
PDFedit
Complete editing of PDF documents is possible with PDFedit. You can change raw PDF objects (for advanced users) or use the many GUI functions. Functionality can be easily extended using a scripting language (ECMAScript).
PDFrecycle
pdfrecycle creates a PDF file by composing pages from other PDF files. It lets you define PDF bookmarks, scale and rotate pages, put multiple logical pages onto each physical sheet, and add metadata. pdfrecycle uses a simple text file format to define the layout and which pages to include. From this input file pdfrecycle creates a LaTeX file and then runs pdflatex to produce the PDF file.
PDFtoPNG
PDFtoPNG is a small utility that can convert documents in the Portable Document Format (PDF) to Portable Network Graphics (PNG).
Pandoc
Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can read Markdown and (subsets of) Textile, reStructuredText, HTML, LaTeX, MediaWiki markup, and DocBook XML; and it can write plain text, Markdown, reStructuredText, XHTML, HTML 5, LaTeX (including beamer slide shows), ConTeXt, RTF, DocBook XML, OpenDocument XML, ODT, Word docx, GNU Texinfo, MediaWiki markup, EPUB (v2 or v3), FictionBook2, Textile, groff man pages, Emacs Org-Mode, AsciiDoc, and Slidy, Slideous, DZSlides, or S5 HTML slide shows. It can also produce PDF output on systems where LaTeX is installed. Pandoc's enhanced version of markdown includes syntax for footnotes, tables, flexible ordered lists, definition lists, fenced code blocks, superscript, subscript, strikeout, title blocks, automatic tables of contents, embedded LaTeX math, citations, and markdown inside HTML block elements (these enhancements can be disabled). In contrast to most existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document, and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer. PDF output via PDFLaTeX requires the package texlive-latex-recommended, via XeLaTeX it additionally requires texlive-xetex, and via LuaTeX additionally texlive-luatex.
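The reader/writer design described above can be sketched in a few lines of Python. This is a toy illustration, not Pandoc's code or API (Pandoc is written in Haskell): the block representation, the function names, and the `READERS` and `WRITERS` tables are all hypothetical. Readers parse text into a shared native representation, and writers render that representation into a target format.

```python
import html

# Native representation (assumed for this sketch): a list of (kind, text) blocks.


def read_markdown(text):
    """Toy reader: '# ' lines become headings, other non-blank lines paragraphs."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            blocks.append(("heading", line[2:]))
        else:
            blocks.append(("para", line))
    return blocks


def write_html(blocks):
    """Toy writer: render blocks as HTML elements, escaping special characters."""
    tags = {"heading": "h1", "para": "p"}
    return "\n".join(
        f"<{tags[kind]}>{html.escape(text)}</{tags[kind]}>" for kind, text in blocks
    )


def write_plain(blocks):
    """Toy writer: render blocks as plain text with upper-cased headings."""
    return "\n\n".join(
        text.upper() if kind == "heading" else text for kind, text in blocks
    )


READERS = {"markdown": read_markdown}
WRITERS = {"html": write_html, "plain": write_plain}


def convert(text, source, target):
    """Any reader can be paired with any writer via the shared representation."""
    return WRITERS[target](READERS[source](text))


print(convert("# Title\n\nHello & welcome.", "markdown", "html"))
```

Supporting another output format in this sketch means adding one function to `WRITERS`; the reader is left unchanged.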
Poppler
A PDF rendering library.
Poppler Encoding Data
File encoding data for the Poppler PDF library.
RTFX
'rtfx' converts RTF files into a generic XML format. It focuses on keeping metadata such as style names rather than every bit of formatting. This makes it handy for converting RTF documents into a custom XML format (using XSL or an additional processing step). The package currently supports the following features: page breaks, section breaks, style names, lists (various types), tables, info block, bold, italic, underline, hidden text, strike out, and text color. This package was formerly known as 'rtfm', but was changed due to a naming conflict.
Refafit
Refafit reduces the size of PDF files by recompressing their images. It also adjusts the contrast of scanned documents and resamples them, extracts plain text, and joins and splits files. It is useful in both desktop and server environments. The graphical interface is a context-menu option for rebuilding the selected file.
Rlib
RLIB is a reporting engine that makes it possible to easily create professional reports in PDF, HTML, text, and CSV from one simple XML report definition file. It supports direct input from MySQL, PostgreSQL, ODBC, and programmable pluggable inputs, and has PHP and Python language bindings built in.
Rusty
Rusty is a collection of extensions (directives and roles) for the Sphinx documentation framework. While the extensions are somewhat compatible with docutils, the use of Sphinx is currently required.
Scdoc
scdoc is a documentation tool created by Drew DeVault.
Slidedown
Generate HTML slides from Markdown.
Sphinx
Sphinx is a tool that makes it easy to create intelligent and beautiful documentation for Python projects (or other documents consisting of multiple reStructuredText sources), written by Georg Brandl. It was originally created to translate the new Python documentation, but has now been cleaned up in the hope that it will be useful to many other projects. Sphinx uses reStructuredText as its markup language, and many of its strengths come from the power and straightforwardness of reStructuredText and its parsing and translating suite, the Docutils.
Src-highlite
Source-highlight reads source language specifications dynamically, so it can be easily extended to handle new languages without recompiling the sources. It reads output format specifications dynamically as well, so new output formats can be added in the same way. The syntax for these specifications is quite easy (take a look at the manual). Source-highlight is a command-line program, and it can also be used as a CGI. Note that Source-highlight can also be used as a formatter (i.e., without highlighting): you can, for instance, format a text file as HTML, and it will take care of translating special characters such as <, >, and &. Since version 2.2, Source-highlight can also generate cross references; to do this it relies on Exuberant Ctags. These are the output formats already supported:
  • HTML
  • XHTML
  • LATEX
  • TEXINFO
  • ANSI color escape sequences (you can use this feature with less)
  • DocBook
These are the input languages (or input formats) already supported (in alphabetical order):
  • Ada
  • Autoconf files
  • C/C++
  • C#
  • Bib
  • Bison
  • Caml
  • Changelog
  • Css
  • Diff
  • Flex
  • Fortran
  • GLSL
  • Haxe
  • Html
  • ini files
  • Java
  • Javascript
  • KDE desktop files
  • Latex
  • Ldap files
  • Logtalk
  • Log files
  • lsm files (Linux Software Map)
  • Lua
  • Makefile
  • M4
  • ML
  • Pascal
  • Perl
  • PHP
  • Postscript
  • Prolog
  • Properties files
  • Python
  • RPM Spec files
  • Ruby
  • Scala
  • Shell
  • S-Lang
  • Sql
  • Tcl
  • XML
  • XOrg conf files
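The formatter role mentioned above (plain text to HTML, with special characters translated) can be illustrated with a short Python sketch. It does not use or reproduce Source-highlight itself; `format_as_html` is a hypothetical helper built on the standard library's `html.escape`.

```python
import html


def format_as_html(text: str) -> str:
    """Wrap plain text in <pre>, escaping <, >, and & for HTML."""
    return "<pre>" + html.escape(text, quote=False) + "</pre>"


print(format_as_html("if (a < b && b > c)"))
```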
TeXML
'TeXML' is an XML vocabulary for TeX. Its processor transforms TeXML markup into TeX markup, escaping special and out-of-encoding characters. It is intended for developers who automatically generate TeX files.
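The kind of escaping a generator of TeX files needs can be sketched in Python as follows. This is not TeXML's processor or its rules: `escape_tex` and the replacement table are assumptions made for illustration, covering only the ten characters that are special in ordinary TeX text and leaving out-of-encoding characters aside.

```python
# Assumed replacements for characters that are special in TeX text mode.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}

# A translation table maps each character in a single pass, so the braces
# and backslashes produced by one replacement are never escaped again.
_TABLE = str.maketrans(TEX_SPECIALS)


def escape_tex(text: str) -> str:
    """Escape TeX special characters in plain text."""
    return text.translate(_TABLE)


print(escape_tex("50% of $10 & {x_1}"))
```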
Txt2man
Txt2man converts flat ASCII text into the man page format. This allows man pages to be authored without knowledge of nroff macros. It is a shell script that uses GNU awk, and it should run on any Unix-like system.
Txt2tags
txt2tags is a format conversion tool written in Python that generates documents from a plain text file with minimal markup. Unlike other conversion tools, it is generic rather than target-specific (as a txt2html tool would be). This way, you can keep just one source text file and one tool for all your formatting needs.
Unrtf
UnRTF is a command-line program which converts documents in Rich Text (.rtf) format to HTML, LaTeX, PostScript, and other formats. Converting to HTML, it supports font attributes, tables, embedded images (stored to separate files), paragraph alignment, and more.
Vim2html
vim2html exports Vim-editable files into well-formed HTML, simulating a Vim session. It fully supports Vim colorization (customizable) and authentic Vim syntax highlighting. It fully supports valid CSS and XHTML-1.0/Transitional or Strict with HTMLtidy.
Wkhtmltopdf
wkhtmltopdf is a utility for converting local or remote web pages to image or PDF format. This can be done without access to a desktop environment, for example from the command line on a remote server. A C library implementation (libwkhtmltox) is also available.
Yodl
Yodl is a package that implements a pre-document language and tools to process it. The idea of Yodl is that you write up a document in a pre-language, then use the tools (e.g. yodl2html) to convert it to some final document language. Current converters are for HTML, man, LaTeX SGML and texinfo, a poor-man's text converter and an experimental XML converter. Main document types are "article", "report", "book", "manpage" and "letter". The Yodl document language was designed to be easy to use and extensible.
Z2html
'z2html' translates files written in the free Z Document Language to compliant XHTML 1.1. It is both SGML and XML-compliant.
Zotero-better-bibtex
Better BibTeX (BBT) is an extension for Zotero and Juris-M that makes it easier to manage bibliographic data, especially for people authoring documents using text-based toolchains (e.g. based on LaTeX / Markdown).


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.