Category/Web-authoring/search-engine

From Free Software Directory
 
search-engine (27)



ASPSeek
ASPSeek is an Internet search engine. It consists of an indexing robot, a search daemon, and a CGI search frontend. It supports Webspaces, which means that the user can combine and perform searches within several Web sites simultaneously, instead of browsing each site individually. It can index as many as a few million URLs and search for words and phrases, use wildcards, and do a Boolean search. Search results can be limited to a given time period, site, or Web space (set of sites) and sorted by relevance (PageRanks are used) or date. ASPSeek is optimized for multiple sites (threaded index, async DNS lookups, grouping results by site), but can be used for searching one site as well. Other features include stopwords and ispell support, a charset and language guesser, HTML templates for search results, excerpts, and query word highlighting.
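
To illustrate what a Boolean search over an index involves, here is a minimal sketch in Python. It is not ASPSeek's implementation: the function names and the sample documents are hypothetical, and the tokenizer simply splits on whitespace.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_and(index, *words):
    """Boolean AND: documents that contain every given word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

def search_or(index, *words):
    """Boolean OR: documents that contain at least one given word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.union(*sets) if sets else set()

# Hypothetical sample documents, keyed by an arbitrary id.
docs = {
    "a": "free search engine",
    "b": "free image tool",
    "c": "search daemon",
}
index = build_index(docs)
```

With this index, `search_and(index, "free", "search")` returns only document `a`, while `search_or(index, "image", "daemon")` returns documents `b` and `c`.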

Conflux
Conflux is a data collection and management suite. Conflux can:

  • help you perform searches which best meet your result needs, across a number of search services
  • help you retain a list of search results for later use
  • help share research material among a team or workgroup
  • help you organize useful searches and search results into groups (bundles)
  • help you maintain a list of the resources you use most, giving you access to this list from a central location

Conflux allows you to submit searches to be performed, either immediately or later, by the Conflux collector. Once the search has been performed, you will have ready access to the URLs of the results provided by the search engine. Using Conflux, you can manage your past searches, keep better track of materials you have found useful, and submit follow-up searches more easily. Both searches and search results may be bundled together, so you can organize your information as you see fit. Best of all, you may enable your entire team to use Conflux, sharing the benefits of your research. The Conflux suite currently consists of a PHP3 frontend, a MySQL database schema, and a Perl 'collector' (which may be run as a daemon, from cron, or standalone).

DataparkSearch
DataparkSearch is an Internet and Intranet search engine tool. Key features:

  • Support for http, https, ftp, nntp and news URL schemes.
  • htdb virtual URL scheme support for indexing SQL databases.
  • Built-in support for the text/html, text/xml, text/plain, audio/mpeg (MP3) and image/gif MIME types.
  • External parser support for other document types.
  • Ability to index multilingual sites using content negotiation.
  • Searching all of the word forms using ispell affixes and dictionaries.
  • Stopwords and synonyms lists.
  • Boolean query language support.
  • Results sorting by relevancy, popularity rank, last modified time and by importance (a multiplication of relevancy and popularity rank).
  • Various character sets support.
  • Accent insensitive search.
  • Phrases segmenting for Chinese, Japanese, Korean, and Thai.
  • mod_dpsearch - search module for Apache web server.
  • Internationalized Domain Names support
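
The "importance" ordering in the list above is described as relevancy multiplied by popularity rank. A minimal sketch of that ordering in Python follows. The `Hit` class, its field names, the score values, and the URLs are all hypothetical and are not taken from DataparkSearch.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One search result with two independent scores (hypothetical fields)."""
    url: str
    relevancy: float
    popularity: float

    @property
    def importance(self):
        # Importance as described above: relevancy times popularity rank.
        return self.relevancy * self.popularity

def sort_by_importance(hits):
    """Return hits ordered from most to least important."""
    return sorted(hits, key=lambda h: h.importance, reverse=True)

# Made-up scores for illustration only.
hits = [
    Hit("http://example.org/a", relevancy=0.9, popularity=0.2),
    Hit("http://example.org/b", relevancy=0.5, popularity=0.8),
    Hit("http://example.org/c", relevancy=0.7, popularity=0.3),
]
ranked = [h.url for h in sort_by_importance(hits)]
```

In this example the most relevant page, `/a`, ends up last because its popularity is low, which is the practical difference between sorting by relevancy alone and sorting by importance.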

Doodle
Doodle quickly searches the documents on a computer, then builds an index using meta-data contained in the documents and allows fast searches on the resulting database. It supports approximate searches and full-text indexing, and comes with a library for accessing the doodle database and making it easy to integrate doodle's functionality into other applications or user interfaces. Users can keep the doodle database always up-to-date by updating the database on-the-fly by using 'doodled' and 'fam' whenever files on the system change. They can also build one doodle database for all users on a multi-user system without compromising user privacy.

GIFT
The GNU Image Finding Tool is a Content Based Image Retrieval System (CBIRS). You can do Query By Example on images, giving you the opportunity to improve query results by relevance feedback. The program relies entirely on the content of the images to process queries, so you needn't annotate images before querying the collection. It comes with a tool which lets you index whole directory trees containing images in one go. You then can use the GIFT server and its clients to browse your own image collections.

Googleware
Googleware automatically queries Google. Users are notified by mail each time a new entry is found, and can browse query results with any regular Web browser.

Ht: Dig
The ht://Dig system is a complete World Wide Web indexing and searching system for a domain or intranet. The system is not meant to replace Internet-wide search engines such as Alta Vista, but instead to cover the search needs of a single company, campus, or even a particular subsection of a large Web site. ht://Dig can easily span several Web servers; the type of server doesn't matter as long as it supports common protocols like HTTP. Many different types of searches can be set up using a common database. Additional features include support for robot exclusion, Boolean expressions, fuzzy searching, and configurable search results, as well as the ability to search both text and HTML files, search subsections of the database, index a protected server, limit the depth of the search, and add keywords to HTML documents.
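
Robot exclusion means that an indexer checks a site's robots.txt rules before fetching a page. The sketch below shows that check with Python's standard `urllib.robotparser` module. It does not reflect how ht://Dig itself implements the feature; the rules, the agent name `example-indexer`, and the URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real indexer would download it
# from the site being indexed.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def allowed(url, agent="example-indexer"):
    """Return True if the given agent may fetch the URL under the rules."""
    return parser.can_fetch(agent, url)

public_ok = allowed("http://example.org/index.html")
private_ok = allowed("http://example.org/private/report.html")
```

Here `public_ok` is true and `private_ok` is false, so a well-behaved indexer would skip everything under `/private/`.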

Hyper Estraier
Hyper Estraier is a full-text search system. It searches large collections of documents for those containing specified words. If you run a web site, it is useful as your own search engine for the pages of your site. It is also useful as a search utility for mailboxes and file servers. Hyper Estraier has the following characteristics:

  • High performance of search
  • High scalability of target documents
  • Perfect recall ratio by N-gram method
  • High precision by hybrid mechanism of N-gram and morphological analyzer
  • Phrase search, regular expressions, attribute search, and similarity search
  • Multilingualism with Unicode
  • Independent of file format and repository
  • Intelligent web crawler
  • Simple and powerful API
  • Supporting P2P architecture
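
The recall claim for the N-gram method in the list above can be illustrated with a small character-bigram index. The Python sketch below is not Hyper Estraier's code: all names and sample documents are hypothetical, and the precision step here is a plain substring check standing in for the hybrid N-gram and morphological analysis that the list mentions.

```python
from collections import defaultdict

def ngrams(text, n=2):
    """Return the set of all character n-grams in the text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def build_ngram_index(docs, n=2):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text.lower(), n):
            index[gram].add(doc_id)
    return index

def candidates(index, query, n=2):
    """Documents containing every n-gram of the query.

    Any document that contains the query as a substring is guaranteed
    to be in this set, so no true match is missed.
    """
    grams = ngrams(query.lower(), n)
    if not grams:
        return set()
    return set.intersection(*(index.get(g, set()) for g in grams))

def search(index, docs, query, n=2):
    """Filter candidates with a substring check to drop false positives."""
    q = query.lower()
    return {d for d in candidates(index, q, n) if q in docs[d].lower()}

# Hypothetical sample documents.
docs = {1: "full-text search", 2: "text retrieval", 3: "next step"}
index = build_ngram_index(docs)
```

For the query `text`, `candidates` returns all three documents, because "next step" happens to contain the bigrams `te`, `ex`, and `xt`. The substring check in `search` then drops document 3, which shows why an N-gram index needs a second mechanism to reach high precision.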

Lucene
Lucene is a Java full-text search engine. It's not a complete application, but rather a code library and API that can easily be used to add search capabilities to applications.

This package is also known as Lucene Core. It is part of the Apache Lucene project, which includes other free software such as Solr and PyLucene.

Mapnik
Mapnik is a toolkit for developing GIS applications. At the core is a C++ shared library providing algorithms/patterns for spatial data access and visualization. Essentially a collection of geographic objects (map, layer, datasource, feature, and geometry), the library doesn't rely on "windowing systems" and can be deployed in any server environment. It is intended to play fair in a multi-threaded environment and is aimed primarily, but not exclusively, at Web-based development. High-level Python bindings (boost.python) facilitate rapid application development, targeting zope3, django, etc.

… further results



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.

