Difference between revisions of "User:Ritacon"

 
|?Full description=Description
|?Homepage URL=Homepage
|?License=
|limit=100
}}

Revision as of 15:35, 22 March 2014

Aletheia
In short, Aletheia is software for getting science published and into the hands of everyone, for free. It's a decentralised and distributed database used as a publishing platform for scientific research.

So, Aletheia is software. But software without people is nothing. To comprehensively answer the question of what Aletheia is: Aletheia is software surrounded by a community of people who want to change the world through open access to scientific knowledge.

For a more in-depth explanation, Aletheia is an Ethereum blockchain application utilising IPFS for decentralised storage, to which anyone can upload documents and from which anyone can download documents, and which also handles the academic peer review process. The application runs on individual PCs, all forming part of the IPFS database. This gives us an open source platform that cannot be bought out by the large publishers (and any derivative works must also be open source) and that should be hard to take down, because the database is spread across the globe in multiple legal jurisdictions. Aletheia is designed to be a resilient platform run transparently by the community, not by some black-box corporation or editorial board, meaning all users can see the decisions Aletheia is making and have a stake in that decision-making process if they so desire. Because it is decentralised, Aletheia has no key-person risk: should the core group who invented Aletheia disappear, Aletheia won't cease to exist; it will continue to be run by the community. The community moderates content through various mechanisms (peer review, reputation scores, etc.) to ensure quality of content.
Homepage: https://aletheia-foundation.io/
License: GPL

Apophenia
'Apophenia' is a statistical library for C. It provides functions on the same level as those of the typical stats package (OLS, probit, singular value decomposition, &c.) but doesn't tie the user to an ad hoc language or environment. It uses the GNU Scientific Library for number crunching and SQLite for data management, so the library itself focuses on model estimation and quickly processing data.
Homepage: http://apophenia.info
License: GPLv2 or later, GPLv2

AutoClass
AutoClass solves the problem of automatic discovery of classes in data (sometimes called clustering or unsupervised learning), as distinct from the generation of class descriptions from labeled examples (called supervised learning). It aims to discover the 'natural' classes in the data. AutoClass is applicable to observations of things that can be described by a set of attributes, without referring to other things. The data values corresponding to each attribute are limited to be either numbers or the elements of a fixed set of symbols. With numeric data, a measurement error must be provided.
Homepage: http://ic-www.arc.nasa.gov/ic/projects/bayes-group/autoclass/
License: Public Domain

Bc
'Bc' is an arbitrary precision numeric processing language. Its syntax is similar to C, but differs in many substantial areas. It supports interactive execution of statements. 'Bc' is a utility included in the POSIX P1003.2/D11 draft standard. This version does not use the historical method of having bc be a compiler for the dc calculator (the POSIX document doesn't specify how bc must be implemented). This version has a single executable that both compiles the language and runs the resulting 'byte code.' The byte code is not the dc language.
Homepage: https://www.gnu.org/software/bc/
License: GPLv2 or later, GPLv3 or later, LGPLv3 or later

cl-ana
cl-ana is a library of modular utilities for reasonably high performance data analysis & visualization using Common Lisp. ("Reasonably" means I have to be able to use it for analyzing particle accelerator data.) The library is made of various sublibraries and is designed in a very bottom-up way, so that if you don't care about some feature you don't have to load it.

The functionality supported so far includes:

  • Tabular data analysis: Reading and writing large datasets stored in HDF5 files is supported, along with ntuple datasets, CSVs, and in-memory data tables. Users can add their own table types by defining 4 methods and extending the table CLOS type.
  • Histograms: Binned data analysis is supported with both contiguous and sparse histogram types; a functional interface is provided via map (which allows reduce/fold) and filter.
  • Plotting: Uses gnuplot for plotting dataset samples, plain-old Lisp functions, histograms, strings-as-formulae, and anything else the user wishes to add via methods on a couple of generics.
  • Fitting: Uses GSL for non-linear least squares fitting. Uses plain-old Lisp functions as the fit functions and can fit against dataset samples, histograms, and whatever the user adds.
  • Generic mathematics: CL doesn't provide extendable math functions, so cl-ana provides these, as well as a convenient mechanism (a single function) for using these functions instead of the non-extendable versions. Already included are error propagation and quantities (values with units, e.g. 5 meters), as well as GNU Octave-style handling of sequences (e.g. (+ (1 2) (3 4)) --> (4 6)).
Homepage: https://github.com/ghollisjr/cl-ana
License: GPLv3

Dap
Dap is a small statistics and graphics package, based on C, that provides core methods of data management, analysis, and graphics commonly used in statistical consulting practice. Anyone familiar with basic C syntax can learn Dap quickly and easily from the manual and the examples in it. Advanced features of C are not necessary, although they are available. As of Version 3.0, Dap can read SBS programs, thereby freeing the user from having to learn any C at all to run straightforward analyses. The manual contains a brief introduction to the C syntax needed for C-style programming for Dap. Because Dap processes files one line at a time, rather than reading entire files into memory, it can be, and has been, used on data sets that have very many lines and/or very many variables.
Homepage: https://www.gnu.org/software/dap/
License: GPLv3 or later

Data Frame
In the R language, a dataframe object is a way to group tabular data. The functions in this package allow the manipulation of data in a similar way in Octave. Dataframe objects in Octave can be created in a variety of ways (from other objects or from tabular data in a file) and then can be accessed either as a matrix or by column name. This Octave add-on package is part of the Octave-Forge project.
Homepage: http://octave.sourceforge.net/dataframe/
License: GPLv3 or later

DataMelt
DataMelt (DMelt) is an environment for numeric computation, statistical analysis, data mining, and graphical data visualization on the Java platform. This multiplatform Java program is integrated with a number of scripting languages: Jython (Python), Groovy, JRuby, and BeanShell. DMelt can be used to plot functions and data in 2D and 3D, perform statistical tests, data mining, numeric computations, function minimization, linear algebra, and solving of systems of linear and differential equations. Linear, non-linear and symbolic regression are also available. Neural networks and various data-manipulation methods are integrated using a powerful Java API. Elements of symbolic computation using Octave/Matlab scripting are supported.
Homepage: https://jwork.org/dmelt/
License: LGPLv3

DataStatix
DataStatix is free software for GNU/Linux and Windows useful for managing data of every kind (although it has been written to manage biomedical data), creating descriptive statistics and graphs, and exporting items easily to the R environment or to other statistics software. In order to handle large amounts of data and many concurrent users properly, DataStatix works with a MySQL database; it has been developed and tested with MySQL Community Edition 5.5.

Some features of the software are: user management (create, delete, modify password) within the software; different user levels of data access (administrator, default, read only); user-defined templates (models) of data, to create new databases easily; importation and exportation of data in CSV format (used also by Calc and Excel); updating of existing data from a CSV file created with DataStatix; descriptive statistics from any data (some more kinds of statistics to come); and graphs from any data.
Homepage: https://sites.google.com/site/datastatix/
License: GPLv3

Datamash
Datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files. Datamash is designed to be portable and reliable, and to aid researchers in easily automating analysis pipelines without writing code or even short scripts.
Homepage: https://www.gnu.org/software/datamash/
License: GPLv3 or later

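As a rough illustration of the kind of pipeline step Datamash is meant for, the sketch below drives it from Python. It assumes only that the datamash binary is installed and on the PATH, and uses the standard mean and median operations on the first field.

    # Minimal sketch: pipe a single column of numbers through GNU Datamash.
    # Assumes the `datamash` executable is installed and on the PATH.
    import subprocess

    data = "\n".join(str(x) for x in [1, 2, 3, 4, 5]) + "\n"
    result = subprocess.run(
        ["datamash", "mean", "1", "median", "1"],
        input=data, capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())  # mean and median of field 1, e.g. "3\t3"
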
Dinrhiw2
The primary aim of dinrhiw is to be a linear algebra and machine learning library. For this reason dinrhiw implements PCA and neural network code. Currently, the neural network code only supports:

  • Hamiltonian Monte Carlo sampling (HMC) and a simple Bayesian neural network
  • second order L-BFGS search
  • gradient descent (backpropagation)

It also provides mathematical routines for arbitrary precision mathematics, Hermite curve interpolation and many other things.
Homepage: https://github.com/cslr/dinrhiw2
License: GPLv3

Gretl
Gretl, an acronym for Gnu Regression, Econometrics and Time-series Library, is a package for performing statistical computations for econometrics. It consists of both a command-line client and a graphical client. It features a variety of estimators such as least squares and maximum likelihood; several time series methods such as ARIMA and GARCH; limited dependent variable models such as logit, probit and tobit; and a powerful scripting language. It can output models as LaTeX files. It may also be linked to GNU R and GNU Octave for further data analysis.
Homepage: http://gretl.sourceforge.net/
License: GPLv3 or later

INFOTOPO
Programs for Information Topology Data Analysis. Information Topology is a program written in Python (compatible with Python 3.4.x), with a graphic interface built using Tkinter [1], plots drawn using Matplotlib [2], calculations made using NumPy [3], and scaffold representations drawn using NetworkX [4]. It computes all the results on information presented in the study [5], that is, all the usual information functions: entropy, joint entropy between k random variables (Hk), mutual information between k random variables (Ik), and conditional entropies and mutual informations, and it provides their cohomological (and homotopy) visualisation in the form of information landscapes and information paths, together with an approximation of the minimum information energy complex [5]. It is applicable to any set of empirical data, that is, data with several trials-repetitions-essays (parameter m), and it also allows one to compute the undersampling regime, the degree k above which the sample size m is too small to provide good estimations of the information functions [5]. The computational exploration is restricted to the simplicial sublattice of random variables (all the subsets of k <= n random variables) and hence has a complexity in O(2^n). In this simplicial setting we can exhaustively estimate information functions on the simplicial information structure, that is, joint entropy Hk and mutual informations Ik at all degrees k <= n and for every k-tuple, with a standard commercial personal computer (a laptop with processor Intel Core i7-4910MQ CPU @ 2.90GHz * 8) up to k = n = 21 in reasonable time (about 3 hours). Using the expression of joint entropy and the probability obtained using equation and marginalization [5], it is possible to compute the joint entropy and marginal entropy of all the variables. The alternated expression of n-mutual information given by equation then allows a direct evaluation of all of these quantities. The definitions, formulas and theorems are sufficient to obtain the algorithm [5]. We will further develop a refined interface (help welcome), but for the moment it works like this and requires minimal knowledge of Python. Please contact pierre.baudot [at] gmail.com for questions, requests, developments, etc.

[1] J.W. Shipman. Tkinter reference: a GUI for Python. New Mexico Tech Computer Center, Socorro, New Mexico, 2010.
[2] J.D. Hunter. Matplotlib: a 2D graphics environment. Comput. Sci. Eng., 9:22–30, 2007.
[3] S. Van Der Walt, C. Colbert, and G. Varoquaux. The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng., 13:22–30, 2011.
[4] A.A. Hagberg, D.A. Schult, and P.J. Swart. Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference (SciPy2008), Gael Varoquaux, Travis Vaught, and Jarrod Millman (Eds), Pasadena, CA, USA, pages 11–15, 2008.
[5] M. Tapia, P. Baudot, M. Dufour, C. Formisano-Tréziny, S. Temporal, M. Lasserre, J. Gabert, K. Kobayashi, JM. Goaillard. Information topology of gene expression profile in dopaminergic neurons. doi: https://doi.org/10.1101/168740 http://www.biorxiv.org/content/early/2017/07/26/168740
Homepage: https://github.com/pierrebaudot/INFOTOPO
License: GPLv3 or later

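The core quantities named above can be illustrated with a few lines of NumPy. The sketch below is not INFOTOPO's own code; it simply applies the standard plug-in estimators and the identity I(X;Y) = H(X) + H(Y) - H(X,Y) to two columns of empirical data.

    # Illustrative plug-in estimates of entropy and mutual information for two
    # discrete variables (not INFOTOPO's code; just the standard formulas).
    import numpy as np

    def entropy(labels):
        """Shannon entropy (in bits) of the empirical distribution of `labels`."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log2(p)))

    x = np.array([0, 0, 1, 1, 0, 1, 0, 1])
    y = np.array([0, 0, 1, 1, 1, 1, 0, 0])
    h_x, h_y = entropy(x), entropy(y)
    h_xy = entropy([f"{a},{b}" for a, b in zip(x, y)])  # joint entropy H(X,Y)
    print(h_x, h_y, h_xy, h_x + h_y - h_xy)  # H(X), H(Y), H(X,Y), I(X;Y)
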
KNIME
KNIME [naim] is a user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting. The open integration platform provides over 1000 modules (nodes), including those of the KNIME community and its extensive partner network.
Homepage: http://www.knime.org
License: GPLv3 with exception

MLPACK
MLPACK is a C++ machine learning library with emphasis on scalability, speed, and ease of use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and flexibility for expert users. MLPACK contains the following algorithms: Collaborative Filtering, Density Estimation Trees, Euclidean Minimum Spanning Trees, Fast Exact Max-Kernel Search (FastMKS), Gaussian Mixture Models (GMMs), Hidden Markov Models (HMMs), Kernel Principal Component Analysis (KPCA), K-Means Clustering, Least-Angle Regression (LARS/LASSO), Local Coordinate Coding, Locality-Sensitive Hashing (LSH), Logistic Regression, Naive Bayes Classifier, Neighbourhood Components Analysis (NCA), Non-negative Matrix Factorization (NMF), Principal Components Analysis (PCA), Independent Component Analysis (ICA), Rank-Approximate Nearest Neighbor (RANN), Simple Least-Squares Linear Regression (and Ridge Regression), Sparse Coding, Tree-based Neighbor Search (all-k-nearest-neighbors, all-k-furthest-neighbors), and Tree-based Range Search.
Homepage: http://mlpack.org
License: LGPLv3 or later

Mastrave
Mastrave is a free software library written to perform vectorized scientific computing and to be as compatible as possible with both the GNU Octave and Matlab computing frameworks, offering general-purpose, portable and freely available features to the scientific community. Mastrave is mostly oriented towards easing complex modeling tasks such as those typically needed within environmental models, even when they involve irregular and heterogeneous data series.

Semantic array programming

The Mastrave project attempts to allow more effective, quick interoperability between GNU Octave and Matlab users by providing a reasonably well documented wrapper around the main incompatibilities between those computing environments and by promoting a reasonably general idiom based on their common, stable syntagms. It also promotes the systematic adoption of data-transformation abstractions and lightweight semantic constraints to enable concise and reliable implementations of models following the paradigm of semantic array programming.

There are a couple of underlying ideas: library design is language design and vice versa (Bell Labs); and language notation is definitely a "tool of thought" (Iverson), in the sense that there is a feedback between programming/mathematical notation and the ability to think new scientific insights. And perhaps ethical ones.

Science and society

Mastrave is free software, which is software respecting your freedom. Like many other free scientific software packages, it is offered to the scientific community to also promote the development of a free society more concerned with cooperation than with competitiveness, heading toward freedom of knowledge and culture.

Such a vision implies the possibility for motivated individuals to freely access, review and contribute even to cutting-edge academic culture. This possibility relies on the development of tools and methodologies that help to overcome economic, organizational and institutional barriers (i.e. knowledge oligopolies) while systematically promoting reproducible research. This is a long-term goal to which the free software paradigm can contribute, and has actively contributed.
Homepage: http://mastrave.org
License: GPLv3 or later

MathGene
MathGene is a comprehensive JavaScript mathematics engine that delivers the ability to perform advanced numerical and symbolic mathematics processing of LaTeX expressions and send the output to pure HTML for rendering on a conventional web browser or via a web server.

MathGene has two modules:

  • mg_translate.js, which translates between LaTeX, HTML, and native MG format.
  • mg_calculate.js, which performs the calculations.

mg_translate.js can be used without mg_calculate.js to perform mathematics rendering only. Both modules are required to perform calculations.
Homepage: https://github.com/MathGene/MathGene
License: GPLv3 or later

MCSim
MCSim is a simulation and statistical inference tool for algebraic or differential equation systems. While other programs have been created to the same end, many of them are not optimal for performing computer-intensive and sophisticated Monte Carlo analyses. MCSim was created specifically to perform Monte Carlo analyses in an optimized and easy to maintain environment.
Homepage: https://www.gnu.org/software/mcsim/
License: GPLv3 or later

MedianTracker
MedianTracker supports efficient median queries on, and dynamic additions to, a list of values. It provides both the lower and upper median of all values seen so far. Any __cmp__()-able object can be tracked, in addition to numeric types. add() takes log(n) time for a tracker with n items; lower_median() and upper_median() run in constant time. Since all values must be stored, memory usage is proportional to the number of values added (O(n)).
Homepage: http://mediantracker.sourceforge.net/
License: Expat

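MedianTracker's own implementation isn't described here, but the stated bounds (logarithmic add(), constant-time medians) are typically achieved with two heaps: a max-heap of the lower half and a min-heap of the upper half. The following is a purely illustrative Python sketch of that technique, not MedianTracker's code.

    # Illustrative two-heap median tracking (NOT MedianTracker's actual code):
    # a max-heap holds the lower half, a min-heap the upper half, so add() is
    # O(log n) and both medians are read off the heap tops in O(1).
    import heapq

    class TwoHeapMedian:
        def __init__(self):
            self.lower = []   # max-heap via negated values
            self.upper = []   # min-heap

        def add(self, value):
            if self.lower and value > -self.lower[0]:
                heapq.heappush(self.upper, value)
            else:
                heapq.heappush(self.lower, -value)
            # Rebalance so len(lower) is len(upper) or len(upper) + 1.
            if len(self.lower) > len(self.upper) + 1:
                heapq.heappush(self.upper, -heapq.heappop(self.lower))
            elif len(self.upper) > len(self.lower):
                heapq.heappush(self.lower, -heapq.heappop(self.upper))

        def lower_median(self):
            return -self.lower[0]

        def upper_median(self):
            # With an odd count both medians coincide; with an even count the
            # upper median is the smallest element of the upper half.
            if len(self.upper) == len(self.lower):
                return self.upper[0]
            return -self.lower[0]

    tracker = TwoHeapMedian()
    for v in [5, 1, 3, 2]:
        tracker.add(v)
    print(tracker.lower_median(), tracker.upper_median())  # 2 3
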
OpenCalc
OpenCalc is a terminal-based open source calculator.
Homepage: https://github.com/communotron/opencalc
License: GPLv3

OpenNN
OpenNN is a class library written in C++ which implements neural networks.

The library is intended for advanced users, with strong C++ and machine learning skills. OpenNN provides an effective framework for the research and development of data mining and predictive analytics algorithms and applications.

OpenNN is based on the most popular neural network model, the multilayer perceptron. The package comes with unit testing, many examples and extensive documentation.

The library has been designed to learn from data sets. Some typical applications are function regression (modelling), pattern recognition (classification) and time series prediction (forecasting).

OpenNN is being developed by Intelnics, a company specialized in the development and application of neural networks.
Homepage: http://www.intelnics.com/opennn
License: LGPLv3

OptionMatrix
These calculators are real-time multi-model option chain pricers with analytics and interactive controls. optionmatrix is the GTK+ graphical user interface version and optionmatrix_console is the Curses version. Both programs feature: greeks, decimal date to real-date translations, real-date to decimal date translations, real-time time bleeding, configurable option expiration date engines, calendars, strike control systems, tickers and more than 168 option models. optionmatrix also supports: spreads, bonds, term structures, cash flow editing, source code viewing and text exporting.
Homepage: http://sourceforge.net/projects/optionmatrix/?source=directory
License: GPLv3 or later

Papertrail
Papertrail is ballot counting software. It helps with scanning and counting regular paper ballots, as known from common election situations. It is free software, licensed under the GPLv3, to make the election process as dependable as possible while speeding up the manual counting process considerably.
Homepage: http://antcom.de/papertrail
License: GPLv3 or later

Ploticus
Ploticus produces full-color lineplots, bargraphs, histograms, scatterplots, pie graphs, rangebars, boxplots, tables, tabular plots, etc. It offers many labeling and style features, and produces graphs for publications, slides, posters, web pages and intranets. It plots from tabular data sets and handles numeric, date, time, and alphanumeric data. It is script-driven and non-interactive, and can render in PostScript, PNG, GIF, or X11.
Homepage: http://ploticus.sourceforge.net/
License: GPLv2 or later

PredictionIO
PredictionIO is a free software machine learning server system. It enables developers and data engineers to build smarter web and mobile applications through a simple set of APIs. An admin UI is provided for developers to select and tune algorithms.
Homepage: http://prediction.io
License: AGPLv3

PSPP
PSPP is a program for statistical analysis of sampled data. It is a replacement for IBM SPSS.

It is a powerful tool which can be used for exploratory data analysis, hypothesis testing, data preprocessing and visualisation. Available procedures include t-tests, ANOVA, linear and logistic regression, non-parametric tests, factor analysis, principal components analysis, cluster analysis, receiver operating characteristic analysis and many more.

It can be used either with a command line or a graphical user interface.
Homepage: https://www.gnu.org/software/pspp/
License: GPLv3 or later

PuffinPlot
PuffinPlot is a user-friendly, cross-platform program which analyses and plots palaeomagnetic data. It provides several plot types and analysis functions commonly used in palaeomagnetism, user-configurable graph layout, CSV data export, and SVG and PDF graph export. It has facilities for both interactive and bulk analysis, and can also be controlled and extended using any JVM-based scripting language (including Python). PuffinPlot is written in Java.
Homepage: http://talvi.net/puffinplot/
License: GPLv3 or later

PyChem
The purpose of this software is to provide a simple-to-install and easy-to-use graphical interface to multivariate algorithms. The package currently supports: storage of supporting experimental data (metadata); data pre-processing; principal components analysis (PCA); discriminant function analysis (DFA, CVA, LDA, DA); cluster analysis; partial least squares regression (PLSR, PLS1); and a genetic algorithm (GA) based variable selector coupled to PLS and DFA.
Homepage: http://fruitcake.mib.man.ac.uk/pychem/
License: GPLv2 or later

R
R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.

The core of R is an interpreted computer language which allows branching and looping as well as modular programming using functions. Most of the user-visible functions in R are written in R. It is possible for the user to interface to procedures written in the C, C++, or FORTRAN languages for efficiency. The R distribution contains functionality for a large number of statistical procedures. Among these are: linear and generalized linear models, nonlinear regression models, time series analysis, classical parametric and nonparametric tests, clustering and smoothing. There is also a large set of functions which provide a flexible graphical environment for creating various kinds of data presentations. Additional modules ("add-on packages") are available for a variety of specific purposes.
Homepage: https://www.r-project.org/
License: GPLv2 or later

RStudio
RStudio is an integrated development environment (IDE) for R, a programming language and software environment. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging and workspace management.
Homepage: https://www.rstudio.com/products/rstudio/
License: AGPL-3.0

SalStat
SalStat is a small application for the statistical analysis of scientific data (with a special concentration on psychology). It can already do 18 kinds of descriptive statistics; t-tests (paired, unpaired and one sample); 3 kinds of correlations; linear regression and point biserial tests; and single factor ANOVA (both within and between subjects). Data is entered on an easy-to-use datagrid like a spreadsheet, and all the analyses are driven by menus and dialog boxes. Output can be formatted to HTML.
Homepage: http://salstat.sourceforge.net/
License: GPLv2 or later

Statist
Statist is a small and portable statistics program written in C. It is terminal-based, but can utilise gnuplot for plotting purposes. It is simple to use and can be run in scripts. Big datasets are handled reasonably well on small machines.
Homepage: http://wald.intevation.org/projects/statist/
License: GPLv2 or later

StatistX
StatistX is a GUI frontend for the statistics program statist. Currently, it provides about 20 different statistical tests and regressions. Results are presented either as text or as Gnuplot graphs. It is not intended to replace tools like R.
Homepage: http://www.usf.uni-osnabrueck.de/~abeyer/private/StatistX/
License: GPLv2 or later

Statlib
The goal of the project is to combine several Python statistics modules into a single package.
Homepage: http://code.google.com/p/python-statlib/
License: Public Domain, Expat

TensorFlow
TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
Homepage: https://www.tensorflow.org/
License: Apache 2.0

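To make the dataflow-graph model concrete, here is a minimal sketch using the classic 1.x-era Python API, where operations are first assembled into a graph and then executed in a session; newer TensorFlow releases execute eagerly by default, so the exact calls differ there.

    # Minimal dataflow-graph sketch, TensorFlow 1.x-style Python API.
    # Nodes (matmul, add) are operations; the arrays flowing along the
    # edges between them are tensors.
    import tensorflow as tf

    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # 2x2 tensor
    b = tf.constant([[1.0], [1.0]])            # 2x1 tensor
    y = tf.matmul(a, b) + 1.0                  # builds graph nodes, no computation yet

    with tf.Session() as sess:                 # run the graph
        print(sess.run(y))                     # [[4.], [8.]]
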
Vilno
A full-featured statistics package needs data preparation software to manipulate data and prepare it for analysis. This software package, called Vilno, is data transformation software. It can be used instead of the SAS datastep for data transformation. It can be used to clean and prepare data before importing the derived data into R (a statistics package using the S programming language).
Homepage: http://code.google.com/p/vilno
License: GPLv2


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software, copyright licenses, or other similar notices described in this text have their own copyright notices and licenses, which can usually be found in the distribution or license text itself.