Mc 2

This entry published by the Free Software Foundation.



mc

http://www.cs.utexas.edu/users/jfan/dm/
MC is a C++ program that creates vector-space models from text documents for use in text mining applications. MC provides an efficient multi-threaded implementation that can process very large document collections. The MC program:
1. Recursively descends directories, finding text files.
2. Processes files selectively through full regular expression matching of file names.
3. Builds a sparse matrix of word/token counts. The particular sparse matrix format used is given here.
4. Treats user-specified text formats (such as email addresses or URLs) as whole tokens, matched through regular expressions or FLEX definitions.
5. Prunes the vocabulary by word length and frequency.
6. Excludes user-specified stop words.
7. Sets word vector weights according to any of the txx, txn, tfn, tfx, lxx, lxn, lfn, lfx scaling schemes.
8. Writes all data structures to disk in the Compressed Column Storage format.
The application does not have English parsing or part-of-speech tagging facilities, and it lacks complete documentation.
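To make the word-count matrix and the Compressed Column Storage layout mentioned above more concrete, here is a minimal C++ sketch. It is not taken from MC's source: the names CcsMatrix, build_ccs and count_at are hypothetical, tokenization is plain whitespace splitting, and pruning, stop words, weighting and MC's actual on-disk file layout are all left out. It only assumes the usual in-memory CCS convention, where each column (here, one document) stores its nonzero rows (word ids) and values contiguously.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Compressed Column Storage (CCS) for a word-by-document count matrix.
// Column j is one document; its nonzeros occupy positions
// col_ptr[j] .. col_ptr[j + 1] - 1 of row_idx and values.
struct CcsMatrix {
    std::size_t n_rows = 0;            // vocabulary size
    std::vector<std::size_t> col_ptr;  // number of columns + 1 offsets
    std::vector<std::size_t> row_idx;  // word id of each nonzero
    std::vector<double> values;        // raw count of each nonzero
};

// Hypothetical helper: builds the matrix from in-memory documents.
// Word ids are assigned in order of first appearance and stored in vocab.
CcsMatrix build_ccs(const std::vector<std::string>& docs,
                    std::map<std::string, std::size_t>& vocab) {
    CcsMatrix m;
    m.col_ptr.push_back(0);
    for (const std::string& doc : docs) {
        std::map<std::size_t, double> counts;  // word id -> count, ordered by id
        std::istringstream in(doc);
        std::string tok;
        while (in >> tok) {
            // emplace leaves an existing entry untouched, so ids stay stable.
            auto it = vocab.emplace(tok, vocab.size()).first;
            counts[it->second] += 1.0;
        }
        for (const auto& kv : counts) {
            m.row_idx.push_back(kv.first);
            m.values.push_back(kv.second);
        }
        m.col_ptr.push_back(m.row_idx.size());  // end of this column
    }
    m.n_rows = vocab.size();
    return m;
}

// Looks up one entry by scanning the nonzeros of a single column.
double count_at(const CcsMatrix& m, std::size_t row, std::size_t col) {
    assert(col + 1 < m.col_ptr.size());
    for (std::size_t k = m.col_ptr[col]; k < m.col_ptr[col + 1]; ++k) {
        if (m.row_idx[k] == row) return m.values[k];
    }
    return 0.0;  // entries that are not stored are zero
}
```

For example, calling build_ccs on the two documents "a b a" and "b c" assigns ids a=0, b=1, c=2 and stores four nonzeros; count_at(m, 0, 0) then returns 2 because "a" occurs twice in the first document. Only the nonzero counts are stored, which is what keeps this kind of matrix manageable for large collections where most words are absent from most documents.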

Documentation

User README available in HTML format from http://www.cs.utexas.edu/users/jfan/dm/README.html

This is a GNU package

Download

Download version 2.19 (stable)
released on 26 June 2001

Licensing

License   Verified by   Verified on   Notes
GPLv2     Janet Casey   2 July 2001


Leaders and contributors

Contact(s)                          Role
James Fan (jfan@cs.utexas.edu)      Maintainer

Resources and communication

Audience                           Resource type   URI
Bug Tracking, Developer, Support   E-mail          mailto:jfan@cs.utexas.edu


Software prerequisites

Kind                Description
Required to build   pthread library
Required to build   STL
Required to build   FLEX




This entry (in part or in whole) was last reviewed on 30 April 2008.



















Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software described in this text has its own copyright notice and license, which can usually be found in the distribution itself.


This page was last modified on 12 April 2011, at 12:21.


Copyright © 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
