Semantic search
This entry is published by the Free Software Foundation.
Enca 'Enca' (Extremely Naive Charset Analyser) detects the encoding of text files, based on knowledge of their language. It can also convert them to other encodings, letting you recode files without knowing their current encoding. It supports most Central and East European languages, and a few Unicode variants, independently of language.
Fribidi
FriBidi is a free implementation of the Unicode Bidirectional (BiDi) Algorithm. It also provides utility functions to aid in the development of interactive editors and widgets that implement BiDi functionality. The BiDi algorithm is a prerequisite for supporting right-to-left scripts such as Hebrew, Arabic, Syriac, and Thaana.
HTMLatex htmlatex does on-the-fly rendering of LaTeX source to HTML documents. htmlatex is a mod_python application that uses memcached to reduce the massive overhead of repeatedly rendering the same equation. It has an option to sanitize the LaTeX source, removing any potentially dangerous code. It is fairly generous about the HTML and LaTeX it accepts. LaTeX source code is typed directly into an HTML file. The file is left unchanged, and the output is filtered, replacing raw source with images.
Html2ps html2pdf Convert nearly any URL or HTML document to PostScript or PDF using this PHP system. The PDF converter may use Ghostscript, FPdf, or PDFLib, and it supports all common PDF versions. Over 200 CSS and HTML properties are supported, including floating elements (DHTML). An advanced API and complete documentation are included. Freely distributed with 100% of source code on http://www.sourceforget.net and http://www.tufat.com
Htmlrecode 'htmlrecode' applies modifications to an HTML file. For example, you can completely change the character set you are using without making any of the characters unreadable.
JGloss JGloss is an application for adding reading and translation annotations to words in a Japanese text document. This can be done automatically and manually. When a text document is first opened, kanji words will be looked up in a dictionary and the first reading and translation (if any) used to annotate the word. The user can then edit the annotations: choose among the readings and translations found in the dictionaries, enter their own readings and translations, remove annotations, and add new annotations. The document can be exported as plain text with annotations, HTML, or LaTeX.
Leet-Generator Leet-Generator converts plaintext to leettext.
Lengualibre Lengualibre is a project to write a free online Spanish language dictionary (Spanish words and definitions, *not* a Spanish-English bilingual dictionary). It currently consists of a project manifesto, which outlines the philosophy and goals of the project. The Web site is entirely in Spanish; all documentation, which must be licensed under the GNU Free Documentation License, will also be in Spanish. There are currently no entries in the dictionary. The project will also include GPL'd programs that will interact with the entries in the dictionary. Currently under development is Fichalibre, a program designed to let people enter lexicographical units. This project was a GNU package. It has since been decommissioned and is no longer developed.
Libtranslate 'libtranslate' is a library for translating text and Web pages between natural languages. Its modular infrastructure lets users implement new translation services separately from the core library. It is shipped with a generic module that supports Web-based translation services (e.g., BabelFish) and lets new services be added simply by adding a few lines to an XML file. The distribution also includes a command line interface.
Libuninum libuninum is a library for converting Unicode strings to integers and integers to Unicode strings. Internal computation is done using arbitrary precision arithmetic, so there is no limit on the size of the integer that can be converted. Values are passed and returned as ASCII decimal strings, GNU MP mpz_t objects, or unsigned long integers. Auto-detection of the number system is provided. Group delimitation for output strings is fully controllable. Virtually all known number systems are supported.
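The conversion described in the libuninum entry can be illustrated with a minimal Python sketch. It is not libuninum's API: the function names `string_to_int` and `int_to_string` are hypothetical, and the sketch assumes a positional decimal number system whose ten digits occupy consecutive code points (as the Arabic-Indic and Devanagari digits do), which is a far narrower scope than the library claims. Python integers are arbitrary precision, so the size of the value is not limited here either.

```python
import unicodedata


def string_to_int(digits: str) -> int:
    """Convert a string of Unicode decimal digits to an integer."""
    if not digits:
        raise ValueError("empty digit string")
    value = 0
    for ch in digits:
        # unicodedata.decimal raises ValueError for non-decimal characters.
        value = value * 10 + unicodedata.decimal(ch)
    return value


def int_to_string(number: int, zero: str = "0") -> str:
    """Render a non-negative integer using the digit block that starts at `zero`."""
    if number < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    # Assumption: digits 0-9 of the target script are contiguous code points.
    return "".join(chr(ord(zero) + int(d)) for d in str(number))
```

For example, `string_to_int("\u0661\u0662\u0663")` reads the Arabic-Indic digits one, two, three, and `int_to_string(123, zero="\u0966")` writes the same value in Devanagari digits.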
Otl otl is intended to convert a text file to an HTML or XHTML file. It is different from many other text-to-HTML programs in that the input format (by default a simple highly readable plain text format) can be customized by the user, and the output format (by default XHTML) can be user-defined. It can process complex structures such as ordered and unordered lists (nested or not), and add custom "headers" and "footers" to documents. The conversion utilizes Perl regex, adding quite a bit of flexibility and power to the conversion process. Since both the syntax of the source file and of the output can be readily customized, otl in theory can be used for many types of conversions. The package also includes tag-remove, a script for stripping HTML/XHTML-ish tags from documents.
Pango The Pango project intends to provide a framework with which to lay out and render internationalized text. It uses Unicode for all of its encoding, and will eventually support output in all the world's major languages. Since Pango is an offshoot of the GTK+ and GNOME projects, the initial focus is operation in those environments. However, there is nothing fundamentally GTK+ or GNOME specific about Pango. Project goals include modularity for a faster development process, font system and toolkit independence, and high quality rendering of a large set of languages.
QaMoose QaMoose is an English-Arabic user-defined dictionary intended for use by translators and technical writers to establish and retain consistency in the terms used. Features include the ability to suggest new terms for approval and to search an approved database of terms. The term form includes English and Arabic spelled with Latin characters. All code is UTF-8 friendly. There is also an 'admin' page where you can apply for a term inspector/approver position.
Recode The program recognizes or produces approx. 150 character sets and can convert almost any character set to almost any other. When exact translations are not possible, the program may get rid of offending characters or use approximations. Particular attention has been paid to the proper representation of French language diacritics.
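The fallback behaviour mentioned in the Recode entry (dropping offending characters or approximating them) can be sketched in Python. This is an illustration only, not how Recode itself is implemented: the function name `recode_bytes` is hypothetical, and Unicode NFKD decomposition is swapped in as one simple way to approximate accented letters by their base letters.

```python
import unicodedata


def recode_bytes(data: bytes, source: str, target: str, approximate: bool = True) -> bytes:
    """Convert `data` from the `source` charset to the `target` charset."""
    text = data.decode(source)
    try:
        # Exact conversion whenever every character exists in the target charset.
        return text.encode(target)
    except UnicodeEncodeError:
        pass
    if approximate:
        # Split accented letters into base letter + combining mark, so the
        # base letter survives when the mark is dropped below.
        text = unicodedata.normalize("NFKD", text)
    # Drop whatever still cannot be represented in the target charset.
    return text.encode(target, errors="ignore")
```

With `approximate=True`, Latin-1 "café" converted to ASCII keeps the unaccented "e"; with `approximate=False`, the offending character is simply removed.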
Rxvt-unicode 'rxvt-unicode' is a clone of the well-known terminal emulator, rxvt, modified to store text in Unicode, and use locale-correct input and output. It also supports using multiple fonts at the same time, including xft fonts.
Turma Turma (Text Utils with Recursive Mambojambo Actions) is a search and (optionally) replace utility, which operates on multiple files following a given pattern, with the possibility to recurse into subdirectories. It can handle not only words or lines of text, but also blocks (paragraphs) of text.
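The kind of operation the Turma entry describes can be sketched in Python as follows. This is not Turma's code or command-line interface: the function `search_replace` and its parameters are hypothetical, and the sketch assumes UTF-8 text files and a regular expression as the search pattern.

```python
import re
from pathlib import Path


def search_replace(root, file_glob, pattern, replacement=None, recurse=True):
    """Count matches of `pattern` per file; rewrite files only if `replacement` is given."""
    # DOTALL lets "." cross newlines, so one pattern can span a block of text.
    regex = re.compile(pattern, re.DOTALL)
    base = Path(root)
    candidates = base.rglob(file_glob) if recurse else base.glob(file_glob)
    counts = {}
    for path in sorted(candidates):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        matches = sum(1 for _ in regex.finditer(text))
        if matches == 0:
            continue
        counts[path] = matches
        if replacement is not None:
            # Replacing is optional: without a replacement this is a pure search.
            path.write_text(regex.sub(replacement, text), encoding="utf-8")
    return counts
```

Calling `search_replace("docs", "*.txt", r"BEGIN.*?END")` only reports which files contain such a block, while passing a fourth argument rewrites each block in place.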
WorldPrint WorldPrint is a filter for Mozilla (Galeon, etc.), Htmldoc, and Netscape PostScript output that uses TrueType fonts to allow the printing of pages written in Unicode, Big5, SJIS, KOI-8, the ISO-8859* charsets, and others.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.
The copyright and license notices on this page only apply to the text on this page. Any software described in this text has its own copyright notice and license, which can usually be found in the distribution itself.
