# INFOTOPO

### INFOTOPO

https://github.com/pierrebaudot/INFOTOPO

Programs for Information Topology Data Analysis

Information Topology is a program written in Python (compatible with Python 3.4.x), with a graphical interface built using Tkinter [1], plots drawn using Matplotlib [2], calculations made using NumPy [3], and scaffold representations drawn using NetworkX [4]. It computes all the information results presented in the study [5], that is, all the usual information functions: entropy, joint entropy between k random variables (Hk), mutual information between k random variables (Ik), and conditional entropies and mutual informations. It provides their cohomological (and homotopy) visualisation in the form of information landscapes and information paths, together with an approximation of the minimum information energy complex [5]. It is applicable to any set of empirical data, that is, data with several trials-repetitions-essays (parameter m), and it also computes the undersampling regime: the degree k above which the sample size m is too small to provide good estimates of the information functions [5].

The computational exploration is restricted to the simplicial sublattice of the random variables (all the subsets of the n random variables) and hence has a complexity in O(2^n). In this simplicial setting, the information functions can be estimated exhaustively on the simplicial information structure, that is, the joint entropy Hk and the mutual information Ik at every degree k ≤ n and for every k-tuple. On a standard personal computer (a laptop with an Intel Core i7-4910MQ CPU @ 2.90GHz × 8) this is feasible up to k = n = 21 in reasonable time (about 3 hours). Using the expression of the joint entropy and the probabilities obtained from the corresponding equation and marginalization [5], the joint entropy and the marginal entropies of all the variables can be computed. The alternating-sum expression of the n-mutual information given in [5] then allows a direct evaluation of all of these quantities.
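For reference, the quantities named above can be written as follows. This is a sketch in our own notation, assuming the standard plug-in (empirical-frequency) estimator over the m trials and the alternating-sum form of Ik; the exact conventions are those of [5]. Here S is a set of k variable indices and x_S^(j) is the joint value observed at trial j.

```latex
\begin{aligned}
H_k(X_S) &= -\sum_{x} \hat p(x)\,\log_2 \hat p(x),
  \qquad \hat p(x) = \frac{\#\{\, j \le m : x_S^{(j)} = x \,\}}{m},\\
I_k(X_1;\dots;X_k) &= \sum_{i=1}^{k} (-1)^{i-1}
  \sum_{\substack{I \subseteq \{1,\dots,k\} \\ |I| = i}} H_i(X_I),\\
H_k(X_S) &\le \log_2 m .
\end{aligned}
```

For k = 2 the second line reduces to the familiar I2 = H(X1) + H(X2) - H(X1, X2). The last line holds because at most m distinct joint outcomes can be observed in m trials, so a plug-in estimate of Hk saturates at log2 m; watching Hk approach that ceiling is one heuristic way to see the undersampling regime, not necessarily the exact criterion used in [5].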
The definitions, formulas and theorems given in [5] are sufficient to obtain the algorithm. We will further develop a refined interface (help welcome), but for the moment the programs work as described below and require only minimal knowledge of Python. Please contact pierre.baudot [at] gmail.com with questions, requests, development proposals, etc.
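The exhaustive computation described above can be sketched in a few lines of plain Python. This is a minimal sketch, not the INFOTOPO code, and it rests on several assumptions: the data are already discretized (one list of m integer levels per variable, i.e. the rows of D), entropies use the plug-in estimator, the dictionaries are keyed by tuples of 1-based variable indices such as (1, 2, 5), and Ik is evaluated by the alternating sum over subsets. The function names `joint_entropy` and `all_information` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(data, subset):
    """Plug-in joint entropy, in bits, of the variables listed in `subset`.

    data: list of n rows (one per variable), each holding m discrete levels.
    subset: tuple of 1-based variable indices, e.g. (1, 2, 5).
    """
    m = len(data[0])
    # Trial j contributes one joint outcome: the tuple of the selected values.
    counts = Counter(tuple(data[i - 1][j] for i in subset) for j in range(m))
    return -sum((c / m) * log2(c / m) for c in counts.values())


def all_information(data):
    """Return (entropy, infomut) for every non-empty subset of the n variables."""
    n = len(data)
    entropy = {}
    for k in range(1, n + 1):
        for subset in combinations(range(1, n + 1), k):
            entropy[subset] = joint_entropy(data, subset)

    infomut = {}
    for subset in entropy:
        # Alternating sum over the non-empty subsets I of the k-tuple:
        # I_k = sum_{i=1..k} (-1)^(i-1) * sum_{|I|=i} H_i(X_I)
        total = 0.0
        for i in range(1, len(subset) + 1):
            sign = 1 if i % 2 == 1 else -1
            total += sign * sum(entropy[sub] for sub in combinations(subset, i))
        infomut[subset] = total
    return entropy, infomut


# Three binary variables over m = 4 trials, with X3 = X1 XOR X2.
data = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]
entropy, infomut = all_information(data)
print(infomut[(1, 2)], infomut[(1, 2, 3)])  # 0.0 -1.0
```

In the XOR example every pair of variables is independent (I2 = 0) while I3 = -1 bit, the classic case of a negative 3-mutual information. The empty tuple (H = 0) is omitted, so each dictionary holds 2^n - 1 entries. This direct evaluation revisits all 2^k - 1 subsets of every k-tuple, so it does more work than strictly necessary; it is written for readability rather than speed.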

- [1] J.W. Shipman. Tkinter reference: a GUI for Python. New Mexico Tech Computer Center, Socorro, New Mexico, 2010.
- [2] J.D. Hunter. Matplotlib: a 2D graphics environment. Comput. Sci. Eng., 9:90–95, 2007.
- [3] S. van der Walt, C. Colbert, and G. Varoquaux. The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng., 13:22–30, 2011.
- [4] A.A. Hagberg, D.A. Schult, and P.J. Swart. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008), Gaël Varoquaux, Travis Vaught, and Jarrod Millman (Eds.), Pasadena, CA, USA, pages 11–15, 2008.
- [5] M. Tapia, P. Baudot, M. Dufour, C. Formisano-Tréziny, S. Temporal, M. Lasserre, J. Gabert, K. Kobayashi, J.-M. Goaillard. Information topology of gene expression profile in dopaminergic neurons. bioRxiv, 2017. doi: https://doi.org/10.1101/168740, http://www.biorxiv.org/content/early/2017/07/26/168740

### Documentation

INFOTOPO is currently divided into 3 programs:

INFOTOPO_COMPUTATION_V1.1.py: This program computes all the information quantities for all k-tuples with k ≤ n = Nb_var and saves them in object files. The input is an Excel (.xlsx) table containing the data values, e.g. the matrix D, whose first row and first column contain the labels. The rows are the random variables (computation on a usual PC up to n = 21 rows, n = Nb_var = Nb_vartot) and the columns are the different trials-repetitions-essays (parameter m). The program first estimates the joint probability density at a given graining (resampling) of the variables (parameter N = Nb_bins) [5], and displays the histograms of the raw and resampled values and of the raw and resampled matrices. The information functions are then estimated for each k-tuple and saved in the following object files:

- 'ENTROPY'.pkl saves the object Nentropy: a dictionary (x,y) with x a list of variable indices such as (1,2,5) and y the corresponding Hk value in bits. It contains the 2^n values of the joint entropies.
- 'ENTROPY_ORDERED'.pkl saves the object Nentropy_per_order_ordered, the ordered dictionary of Nentropy where the order is given by the entropy values.
- 'INFOMUT'.pkl saves the object Ninfomut: a dictionary (x,y) with x a list of variable indices such as (1,2,5) and y the corresponding Ik value in bits. It contains the 2^n values of the mutual informations.
- 'ENTROPY_SUM'.pkl saves the object entropy_sum_order: a dictionary (k,y) with k the degree and y the mean Hk value over all k-tuples, in bits.
- 'INFOMUT_ORDERED'.pkl saves the object Ninfomut_per_order_ordered, the ordered dictionary of Ninfomut where the order is given by the mutual information values.
- 'INFOMUT_ORDEREDList'.pkl saves the object infomut_per_order: the same content as INFOMUT_ORDERED, but saved as a list.
- 'INFOMUT_SUM'.pkl saves the object Infomut_sum_order: a dictionary (k,y) with k the degree and y the mean Ik value over all k-tuples, in bits.
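The pre- and post-processing around that computation can be sketched as follows. Several points are assumptions, since the text does not specify them: the graining uses equal-width bins between each variable's minimum and maximum, the dictionaries are keyed by tuples of 1-based indices, and the "ordered" objects are sorted by increasing value. Reading the .xlsx table (for instance with pandas.read_excel, which may not be what INFOTOPO uses) is left out. The helper names `resample`, `mean_per_order`, `ordered_by_value` and `save` are hypothetical.

```python
import pickle
from collections import OrderedDict, defaultdict

import numpy as np


def resample(matrix, nb_bins):
    """Map each row (one variable over m trials) to integer levels 0..nb_bins-1.

    Assumption: equal-width bins between the row's minimum and maximum.
    """
    matrix = np.asarray(matrix, dtype=float)
    out = np.empty(matrix.shape, dtype=int)
    for r, row in enumerate(matrix):
        lo, hi = row.min(), row.max()
        if hi == lo:  # constant variable: a single level
            out[r] = 0
            continue
        # Interior edges only, so the maximum falls in the last bin (nb_bins - 1).
        edges = np.linspace(lo, hi, nb_bins + 1)[1:-1]
        out[r] = np.digitize(row, edges)
    return out


def mean_per_order(values):
    """Mean value over all k-tuples for each degree k (cf. the *_SUM objects)."""
    groups = defaultdict(list)
    for subset, value in values.items():
        groups[len(subset)].append(value)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


def ordered_by_value(values):
    """Copy of the dictionary sorted by increasing value (cf. the *_ORDERED objects)."""
    return OrderedDict(sorted(values.items(), key=lambda item: item[1]))


def save(obj, path):
    """Pickle one result object to `path`."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


levels = resample([[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]], 3)
print(levels.tolist())  # [[0, 0, 1, 1, 2, 2]]
```

A call such as save(mean_per_order(Nentropy), "ENTROPY_SUM.pkl") would then produce a per-degree summary in the spirit of the files listed above. Equal-width bins are only one possible graining; equal-frequency bins would change the estimated entropies.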

INFOTOPO_READFILE_V1.1.py: A short program that simply prints the contents of the saved .pkl files.
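Reading the saved objects back only needs the standard pickle module. This is a minimal sketch, not the INFOTOPO_READFILE code; the script name read_pkl.py and the helpers `load` and `format_items` are hypothetical.

```python
import pickle
import sys


def load(path):
    """Unpickle one of the saved object files (e.g. ENTROPY.pkl)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def format_items(obj, limit=20):
    """Return up to `limit` 'key: value' lines for a dict, or 'index: item' for a list."""
    items = obj.items() if isinstance(obj, dict) else enumerate(obj)
    lines = []
    for n, (key, value) in enumerate(items):
        if n >= limit:
            lines.append("...")
            break
        lines.append("{}: {}".format(key, value))
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_pkl.py ENTROPY.pkl
    for line in format_items(load(sys.argv[1])):
        print(line)
```

Only unpickle files you trust: pickle.load can execute arbitrary code embedded in a malicious file.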

INFOTOPO_VISUALIZATION_V1.1.py: This program produces all the visualizations of the information quantities in the form of distributions, information landscapes, mean landscapes and information paths, together with an approximation of the minimum information energy complex and the scaffolds of the information. The input is the saved .pkl object files. At the beginning of the program, choose the .pkl files to load and the output figures you want by assigning a boolean value to each of the following flags:

- SHOW_results_ENTROPY = False
- SHOW_results_INFOMUT = False
- SHOW_results_COND_INFOMUT = False
- SHOW_results_ENTROPY_HISTO = True
- SHOW_results_INFOMUT_HISTO = False
- SHOW_results_INFOMUT_path = False
- SHOW_results_SCAFOLD = False
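The landscapes can be thought of as, for each degree k, the distribution of the Hk or Ik values over all k-tuples. The following sketch computes that data with NumPy and leaves the drawing to Matplotlib; it illustrates the idea under that assumption and is not the INFOTOPO plotting code. The function name `landscape` is hypothetical.

```python
from collections import defaultdict

import numpy as np


def landscape(values, nb_bins=10):
    """Return (edges, {k: counts}): one histogram per degree k on shared bins.

    values: dict keyed by tuples of variable indices, e.g. the Ninfomut object.
    """
    by_degree = defaultdict(list)
    for subset, value in values.items():
        by_degree[len(subset)].append(value)
    # Shared bin edges over all degrees, so the histograms are comparable.
    all_values = np.fromiter(values.values(), dtype=float)
    edges = np.histogram_bin_edges(all_values, bins=nb_bins)
    counts = {k: np.histogram(v, bins=edges)[0] for k, v in sorted(by_degree.items())}
    return edges, counts


# Ik values of the XOR example: I1 = 1, I2 = 0 for each pair, I3 = -1.
values = {(1,): 1.0, (2,): 1.0, (3,): 1.0,
          (1, 2): 0.0, (1, 3): 0.0, (2, 3): 0.0, (1, 2, 3): -1.0}
edges, counts = landscape(values, nb_bins=2)
print(edges.tolist(), counts[3].tolist())  # [-1.0, 0.0, 1.0] [1, 0]
```

Stacking the per-degree counts column by column (degree on the x-axis, value bins on the y-axis) gives an array that Matplotlib's imshow can display as a landscape-like image.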

## Download

https://github.com/pierrebaudot/INFOTOPO/archive/master.zip

Version V1.1 (alpha), released on 1 September 2017.

### Microblog

https://github.com/pierrebaudot/INFOTOPO/issues

### Categories

- Biology:genetics
- Biology:health-and-medicine
- Mathematics:statistics
- Programming-language:python
- Runs-on:GNU/Linux
- Runs-on:Windows
- Science:artificial-intelligence
- Science:biology
- Science:engineering
- Science:physics
- Science:scientific-visualization
- UI Toolkit:sdl
- UI Toolkit:wxwidgets
- Use:mathematics
- Version-control:git
- Works-with:database
- Works-with:spreadsheet

## Licensing

License | Verified by | Verified on | Notes |
---|---|---|---|
License:GPLv3orlater | | | |

## Leaders and contributors

Contact(s) | Role |
---|---|
Pierre Baudot (Pierre.Baudot) | researcher (math-info-bio) |

## Resources and communication

Audience | Resource type | URI |
---|---|---|

## Software prerequisites

Kind | Description |
---|---|
Required to use | Readme notice |

This entry (in part or in whole) was last reviewed on 1 September 2017.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.