Semantic search

This entry published by the Free Software Foundation.



ASPSeek ASPSeek is an Internet search engine. It consists of an indexing robot, a search daemon, and a CGI search frontend. It supports Webspaces, which means that the user can combine and perform searches within several Web sites simultaneously, instead of browsing each site individually. It can index as many as a few million URLs and search for words and phrases, use wildcards, and do a Boolean search. Search results can be limited to a given time period, site, or Web space (set of sites) and sorted by relevance (PageRanks are used) or date. ASPSeek is optimized for multiple sites (threaded index, async DNS lookups, grouping results by site), but can be used for searching one site as well. Other features include stopwords and ispell support, a charset and language guesser, HTML templates for search results, excerpts, and query word highlighting.

Conflux Conflux is a data collection and management suite. Conflux can:

  • help you perform searches which best meet your result needs, across a number of search services
  • help you retain a list of search results for later use
  • help share research material among a team or workgroup
  • help you organize useful searches and search results into groups (bundles)
  • help you maintain a list of the resources you use most, giving you access to this list from a central location

Conflux allows you to submit searches to be performed, either immediately or later, by the Conflux collector. Once the search has been performed, you will have ready access to the URLs of the results provided by the search engine. Using Conflux, you can manage your past searches, keep better track of materials you have found useful, and submit follow-up searches more easily. Both searches and search results may be bundled together, so you can organize your information as you see fit. You may also enable your entire team to use Conflux, sharing the benefits of your research. The Conflux suite currently consists of a PHP3 frontend, a MySQL database schema, and a Perl 'collector' (which may be run as a daemon, from cron, or standalone).

DataparkSearch DataparkSearch is an Internet and Intranet search engine tool. Key features:

  • Support for http, https, ftp, nntp and news URL schemes.
  • htdb virtual URL scheme support for indexing SQL databases.
  • Built-in support for the text/html, text/xml, text/plain, audio/mpeg (MP3) and image/gif MIME types.
  • External parser support for other document types.
  • Ability to index multilingual sites using content negotiation.
  • Searching all of the word forms using ispell affixes and dictionaries.
  • Stopwords and synonyms lists.
  • Boolean query language support.
  • Results sorting by relevancy, popularity rank, last modified time and by importance (a multiplication of relevancy and popularity rank).
  • Various character sets support.
  • Accent insensitive search.
  • Phrases segmenting for Chinese, Japanese, Korean, and Thai.
  • mod_dpsearch - search module for Apache web server.
  • Internationalized Domain Names support
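
The "importance" ordering in the list above is described as a multiplication of relevancy and popularity rank. A minimal sketch of that idea follows. The Hit class, its field names, the example URLs and the scores are all hypothetical, and this is not DataparkSearch's actual code or API.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search result; all fields are illustrative assumptions."""
    url: str
    relevancy: float   # how well the document matches the query
    popularity: float  # link-based popularity rank of the document

    @property
    def importance(self) -> float:
        # Importance as described in the text: relevancy times popularity.
        return self.relevancy * self.popularity


def sort_by_importance(hits):
    """Return hits ordered from most to least important."""
    return sorted(hits, key=lambda h: h.importance, reverse=True)


hits = [
    Hit("http://example.org/a", relevancy=0.9, popularity=0.2),
    Hit("http://example.org/b", relevancy=0.5, popularity=0.8),
    Hit("http://example.org/c", relevancy=0.7, popularity=0.4),
]
ranked = sort_by_importance(hits)
for hit in ranked:
    print(f"{hit.importance:.2f}  {hit.url}")
```

A page that matches the query very well but is rarely linked (a) can therefore rank below a moderately relevant but popular page (b).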

Doodle Doodle quickly searches the documents on a computer, then builds an index using meta-data contained in the documents and allows fast searches on the resulting database. It supports approximate searches and full-text indexing, and comes with a library for accessing the doodle database and making it easy to integrate doodle's functionality into other applications or user interfaces. Users can keep the doodle database always up-to-date by updating the database on-the-fly by using 'doodled' and 'fam' whenever files on the system change. They can also build one doodle database for all users on a multi-user system without compromising user privacy.

GIFT The GNU Image Finding Tool is a Content Based Image Retrieval System (CBIRS). You can do Query By Example on images, giving you the opportunity to improve query results by relevance feedback. The program relies entirely on the content of the images to process queries, so you needn't annotate images before querying the collection. It comes with a tool which lets you index whole directory trees containing images in one go. You then can use the GIFT server and its clients to browse your own image collections.

Googleware Googleware automatically queries Google. Users are notified by mail each time a new entry is found, and can browse query results with any regular Web browser.

Ht: Dig The ht://Dig system is a complete World Wide Web indexing and searching system for a domain or intranet. The system is not meant to replace Internet-wide search engines such as Alta Vista, but instead to cover the search needs for a single company, campus, or even a particular subsection of a large Web site. ht://Dig can easily span several Web servers; the type of server doesn't matter as long as it covers common protocols like HTTP. Many different types of searches can be set up using a common database. Additional features include support for robot exclusion, Boolean expression and fuzzy configurable search results, ability to search both text and HTML files, searches on subsections of the database, the ability to index a protected server, limit the depth of the search, and add keywords to HTML documents.

Hyper Estraier Hyper Estraier is a full-text search system. It searches large document collections for the documents that contain specified words. If you run a web site, it is useful as your own search engine for the pages on your site. It is also useful as a search utility for mailboxes and file servers. Its characteristics are the following:

  • High performance of search
  • High scalability of target documents
  • Perfect recall ratio by N-gram method
  • High precision by hybrid mechanism of N-gram and morphological analyzer
  • Phrase search, regular expressions, attribute search, and similarity search
  • Multilingualism with Unicode
  • Independent of file format and repository
  • Intelligent web crawler
  • Simple and powerful API
  • Supporting P2P architecture
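
The recall claim in the list above rests on a property of N-gram indexing: any document that contains the query string also contains every N-gram of that string, so no true match is filtered out. The sketch below illustrates the idea with character bigrams. The function names and sample documents are hypothetical, and this is not Hyper Estraier's implementation or API.

```python
from collections import defaultdict


def ngrams(text, n=2):
    """Return the set of character n-grams of text, lowercased."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(docs, n=2):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index, docs, query, n=2):
    """Find documents containing query as a substring."""
    grams = ngrams(query, n)
    if not grams:
        # Queries shorter than n have no n-grams; this sketch returns nothing.
        return set()
    # Candidates must contain every n-gram of the query.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Verify candidates, since shared n-grams alone can give false positives.
    return {d for d in candidates if query.lower() in docs[d].lower()}


docs = {
    1: "full-text search system",
    2: "image retrieval",
    3: "text searching for mail",
}
index = build_index(docs)
print(search(index, docs, "search"))
```

The final substring check is what restores precision. The source attributes that role to a hybrid of N-gram and morphological analysis, which this sketch does not attempt.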

Mapnik Mapnik is a toolkit for developing GIS applications. At the core is a C++ shared library providing algorithms/patterns for spatial data access and visualization. Essentially a collection of geographic objects (map, layer, datasource, feature, and geometry), the library doesn't rely on "windowing systems" and can be deployed in any server environment. It is intended to play fair in a multi-threaded environment and is aimed primarily, but not exclusively, at Web-based development. High-level Python bindings (boost.python) facilitate rapid application development, targeting zope3, django, etc.

MnoGoSearch Search engine for intranet and internet servers, from searching within your site to a specialized search such as cooking recipes or newspaper search, ftp archive search, news articles search, etc. It has full-text indexing and searching for HTML, PDF, and text documents. mnoGoSearch consists of two parts. The first is an indexing mechanism (indexer). The indexer walks through HTTP, FTP, NEWS servers or local files, recursively grabbing all the documents and storing meta-data about that document in a SQL database. After every document is referenced by its corresponding URL, the meta-data collected by the indexer is used later in a search process. The search is performed via Web interface. The distribution includes C, CGI, PHP and Perl search front ends.

OpenFTS OpenFTS (Open Source Full Text Search engine) is an advanced PostgreSQL-based search engine that provides online indexing of data and relevance ranking for database searching. Close integration with database allows use of metadata to restrict search results.

Pagecast Pagecast makes it easy to submit lists of URLs. It also has more advanced features such as the ability to check the URLs for problematic conditions. It is designed to be simple to use and effective at what it does. Pagecast runs either from the command line or as a mail-robot. It was developed and tested on a GNU/Linux system, and should run on any Unix-like system and possibly Windows, Macintosh, or any other system Python supports. Running as a mail-robot means that anyone who knows the right Subject: line can email an account on the system where Pagecast is set up, putting the URLs to submit in the body of the email. Pagecast will process them and then send a reply reporting what happened. All of these features are also available from the command line.

PhpDig PhpDig is a search engine written in PHP that uses a MySQL database backend. It indexes both static and dynamic pages, spiders almost all links in HTML content, hrefs, areamaps, and frames, and supports full text indexing. The appearance of the search results is skinnable using a very simple template system.

PhpSera 'phpSERA' is a PHP/MySQL-based tool for Search Engine Ranking Analysis (SERA). The rankings are based on parsing output of search engines, using simple regular expressions. There is a list of supported search engines on the package's home page.

Pinot Pinot is a metasearch tool for the Free Desktop built around the Xapian Information Retrieval library, the language guessing functionality of libtextcat and the GTKmm toolkit. It enables one to query sources and to display, analyze and locally index the returned results. Supported sources are search plugins (either Open Search Description XML or Sherlock files as used by Firefox), the Google SOAP API and Xapian indexes (local or remotely served by xapian-tcpsrv). Supported document types include plain text, HTML, PDF, RTF, MS Word, XML, OpenDocument/StarOffice, mbox, MP3 and Ogg Vorbis. It is expected that more formats will be supported through plugins as the project matures. The main goal is to make all these search engines easily available to the end-user. The second goal is to harness Xapian (and maybe other Information Retrieval toolkits in the future) to index the user's personal documents. Pinot is moving towards what Beagle and Kat do, while still retaining a focus on metasearch. All code is covered by the GNU GPL. The author is Fabrice Colin.

SPINdex 'SPINdex' is a site searching suite. It currently includes a live (real-time) search engine, with plans to add enterprise-level indexing and database searching very soon. SPINdex traverses the directories specified as "search directories", recursively runs through each of the subdirectories (except for those flagged as excluded), and checks each file with a suitable extension for the string specified with a case-insensitive regular expression. If the expression matches, the name (specified by the text between the tags) and URL are recorded.
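
The traversal described for SPINdex can be sketched roughly as follows. This is an illustration of the general technique only, not SPINdex's code: the function name live_search, its parameters, the default extensions and the base URL are all assumptions, and the sketch records only a URL per match because the source does not say which tags supply the name.

```python
import os
import re
import tempfile


def live_search(search_dirs, pattern, extensions=(".html", ".txt"),
                excluded=(), base_url="http://example.org/"):
    """Walk search_dirs and return URLs of files whose text matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)  # case-insensitive match
    excluded = {os.path.abspath(p) for p in excluded}
    matches = []
    for root_dir in search_dirs:
        for dirpath, dirnames, filenames in os.walk(root_dir):
            # Prune excluded subdirectories in place so os.walk skips them.
            dirnames[:] = [
                d for d in dirnames
                if os.path.abspath(os.path.join(dirpath, d)) not in excluded
            ]
            for name in filenames:
                if not name.lower().endswith(tuple(extensions)):
                    continue  # only files with a suitable extension
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    if regex.search(fh.read()):
                        rel = os.path.relpath(path, root_dir)
                        matches.append(base_url + rel.replace(os.sep, "/"))
    return sorted(matches)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as site:
    os.makedirs(os.path.join(site, "docs"))
    os.makedirs(os.path.join(site, "private"))
    files = {
        "index.html": "Welcome to the Search page",
        os.path.join("docs", "notes.txt"): "nothing relevant here",
        os.path.join("docs", "guide.html"): "how to SEARCH the archive",
        os.path.join("private", "secret.html"): "search me not",
        "logo.png": "search",
    }
    for rel_path, text in files.items():
        with open(os.path.join(site, rel_path), "w", encoding="utf-8") as fh:
            fh.write(text)
    found = live_search([site], r"search",
                        excluded=[os.path.join(site, "private")])

print(found)
```

Because every query re-reads the files, results always reflect the current state of the site, which fits the "live" description. The cost is that search time grows with the size of the tree.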

Search4files This program acts as a frontend for different file search engines. The interface is intentionally lightweight and simple, but it takes configuration options from the command line. Currently find, (s)locate, tracker and beagle are supported as backends.

SearchMonkey searchMonkey provides powerful text searches on GNU/Linux using regular expressions for both the file name and the search text. It is the graphical equivalent of find + grep.

Snatcher Snatcher is a simple full-text search engine for Japanese or English text. It features full-text retrieval of a Web site by the use of only one command. Snatcher can be extended by using preprocessor programs. It can deal not only with plain text, but also with HTML, XML, man, PDF and so on. Snatcher features not only keyword search with a boolean information retrieval model, but also relational document search with a vector space model. Snatcher was designed for Japanese, and documents and log messages are only in Japanese. But it can process both English and Japanese documents.
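
The two retrieval models named in the Snatcher entry can be contrasted in a few lines. The sketch below is a generic illustration, not Snatcher's code: the function names and sample documents are hypothetical, and the whitespace tokenizer is an assumption that would not be adequate for Japanese text.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive whitespace tokenizer (an assumption; unsuitable for Japanese)."""
    return text.lower().split()


def boolean_and(docs, terms):
    """Boolean model: return ids of documents containing every term."""
    wanted = {t.lower() for t in terms}
    return [d for d, text in docs.items() if wanted <= set(tokenize(text))]


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def related(docs, doc_id):
    """Vector space model: rank other documents by similarity to doc_id."""
    vectors = {d: Counter(tokenize(t)) for d, t in docs.items()}
    target = vectors[doc_id]
    scores = [(cosine(target, v), d) for d, v in vectors.items() if d != doc_id]
    return [d for _, d in sorted(scores, reverse=True)]


docs = {
    "a": "free software search engine",
    "b": "search engine for free text",
    "c": "image retrieval tool",
}
print(boolean_and(docs, ["search", "engine"]))
print(related(docs, "a"))
```

The Boolean query gives a yes-or-no answer per document, while the vector space ranking orders documents by how much vocabulary they share with a chosen document.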

Teardrop Teardrop provides a way to query multiple search engines at the same time, and explore their results as a single source. It is available in both a command-line and a graphical version.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software described in this text has its own copyright notice and license, which can usually be found in the distribution itself.


Copyright © 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.

Licensed under the GNU Free Documentation License, version 1.3 or later.
