This entry published by the Free Software Foundation.

Mc 2
Computer languages C++
Database administration
Documentation note User README available in HTML format from http://www.cs.utexas.edu/users/jfan/dm/README.html
Full description MC is a C++ program that creates vector-space models from text documents, for use in text mining applications. MC provides an efficient multi-threaded implementation that can process very large document collections. The MC program:
1. Recursively descends directories, finding text files.
2. Processes files selectively through full regular expression matching of file names.
3. Builds a sparse matrix of word/token counts. The particular sparse matrix format used is given here.
4. Treats any user-specified text format (such as email addresses or URLs) as a single token, through regular expression matching or a FLEX definition.
5. Prunes the vocabulary by word length and frequency.
6. Excludes user-specified stop words.
7. Sets word vector weights according to any of the txx, txn, tfn, tfx, lxx, lxn, lfn, lfx scaling schemes.
8. Writes all data structures to disk in the Compressed Column Storage format.
The application does not have English parsing or part-of-speech tagging facilities, and its documentation is incomplete.
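To make the word-count matrix and the Compressed Column Storage layout from the description concrete, here is a minimal C++ sketch. It is not taken from the MC source: the names CcsMatrix, tokenize and build_counts are hypothetical, the whitespace-and-lowercase tokenizer is only a stand-in for MC's regular expression and FLEX based tokenization, and pruning, stop words, weighting and multi-threading are all left out. It assumes the usual CCS convention of one column per document, with a column-pointer array, a row-index array and a value array.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container: word-by-document counts in Compressed Column Storage.
// Each column is one document; each row is one vocabulary word.
struct CcsMatrix {
    std::vector<std::string> vocabulary;  // row index -> word, in sorted order
    std::vector<std::size_t> col_ptr;     // col_ptr[j]..col_ptr[j+1] spans document j
    std::vector<std::size_t> row_ind;     // row (word) index of each stored entry
    std::vector<double> values;           // count of each stored entry
};

// Assumption: split on whitespace and lowercase. A stand-in tokenizer only.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        for (char& c : word) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        tokens.push_back(word);
    }
    return tokens;
}

CcsMatrix build_counts(const std::vector<std::string>& documents) {
    // Pass 1: count words per document and collect the global vocabulary.
    // std::map keeps keys sorted, so row indices follow alphabetical order.
    std::vector<std::map<std::string, int>> per_doc;
    std::map<std::string, std::size_t> word_to_row;
    for (const std::string& doc : documents) {
        std::map<std::string, int> counts;
        for (const std::string& tok : tokenize(doc)) {
            ++counts[tok];
            word_to_row[tok] = 0;  // placeholder, numbered below
        }
        per_doc.push_back(counts);
    }

    CcsMatrix m;
    std::size_t next_row = 0;
    for (auto& entry : word_to_row) {
        entry.second = next_row++;
        m.vocabulary.push_back(entry.first);
    }

    // Pass 2: emit one column per document; only nonzero counts are stored.
    m.col_ptr.push_back(0);
    for (const auto& counts : per_doc) {
        for (const auto& kv : counts) {
            m.row_ind.push_back(word_to_row.at(kv.first));
            m.values.push_back(static_cast<double>(kv.second));
        }
        m.col_ptr.push_back(m.values.size());
    }
    assert(m.col_ptr.size() == documents.size() + 1);
    return m;
}
```

Calling build_counts on a vector of document strings returns the three CCS arrays plus the vocabulary. Because only nonzero counts are stored, memory grows with the number of distinct word/document pairs rather than with vocabulary size times collection size, which is what makes the format practical for large collections.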
Homepage URL http://www.cs.utexas.edu/users/jfan/dm/
Interface command-line
Is GNU true
Keywords data mining, text mining, vector space model, bag of words, MC
Last review by James Fan
Last review date 30 April 2008
License GPLv2
License verified by Janet Casey
License verified date 2 July 2001
Name mc
Prerequisite description STL, pthread library, FLEX
Prerequisite kind Required to build
Real name James Fan
Resource URL mailto:jfan@cs.utexas.edu
Resource audience Bug Tracking, Developer, Support
Resource kind E-mail
Revisionid 253
Revisiontimestamp 12 April 2011 12:21:01
Revisionuser WikiSysop
Role Maintainer
Short description Converts text documents into a vector space model
Submitted by Database conversion
Submitted date 1 April 2011
User level none
Version comment 2.19 stable released 2001-06-26
Version date 26 June 2001
Version download http://www.cs.utexas.edu/users/jfan/dm/src/
Version identifier 2.19
Version status stable
Works-with database
Modification date 24 May 2012 22:06:33
Page has default form Entry
Email jfan@cs.utexas.edu

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software described in this text has its own copyright notice and license, which can usually be found in the distribution itself.



Copyright © 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
