http://carmelab.huji.ac.il/software/MVA/mva.html
Multivariate Analysis Toolbox for Matlab®
written by: Liran Carmel
Last modified: 13:14, Mon 13-Sep-2010
We have tried to break down a typical multivariate data analysis process into its key components. We then built a fully object-oriented toolbox, with one object for each of those components.
Data objects. We have identified three entities that are the building blocks of any multivariate data process. The sampleset object carries information about the different samples, also called observations, conditions, or experiments. The grouping object carries information about the labeling of the samples, i.e., their association with specific clusters. The measurements themselves are stored in a datamatrix. The datamatrix object is the general framework from which more specialized data matrices are derived by object-oriented inheritance; these specialized matrices cover most of the data organization forms one may encounter. The vsmatrix object describes a rectangular two-way matrix of variables-by-samples. For example, the result of a gene array experiment in the form of genes-by-conditions is represented in our toolbox by a vsmatrix object. The ssmatrix object describes relationships between samples; for example, a distance matrix is represented in our toolbox as an ssmatrix. The vvmatrix object describes relationships between variables; for example, a correlation matrix is represented in our toolbox as a vvmatrix.
Graph theory. The graph object describes a general mathematical graph. The toolbox also includes more specific graphs (such as digraphs and trees) that are derived from this general object.
Pairwise data objects. These objects describe specific forms of pairwise data and are all derived from either ssmatrix or vvmatrix (see above). The covmatrix object describes a covariance or a correlation matrix. The distmatrix object describes a distance matrix. The dissimatrix and simatrix objects describe pairwise dissimilarity and similarity information, respectively.
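The inheritance structure described above can be pictured with a small Python sketch. The class names below mirror the toolbox's object names but are hypothetical Python stand-ins, not the toolbox's Matlab classes; storing every matrix as a list of rows and enforcing squareness on pairwise matrices are assumptions made for illustration.

```python
class DataMatrix:
    """General two-way matrix; every specialized data matrix derives from it."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @property
    def shape(self):
        return (len(self.rows), len(self.rows[0]) if self.rows else 0)


def _require_square(matrix):
    """Pairwise (ss/vv) matrices relate items to items, so they must be square."""
    n_rows, n_cols = matrix.shape
    if n_rows != n_cols:
        raise ValueError(f"{type(matrix).__name__} must be square, got {matrix.shape}")


class VSMatrix(DataMatrix):
    """Variables-by-samples: one row per variable, one column per sample."""


class SSMatrix(DataMatrix):
    """Samples-by-samples relationships."""

    def __init__(self, rows):
        super().__init__(rows)
        _require_square(self)


class VVMatrix(DataMatrix):
    """Variables-by-variables relationships."""

    def __init__(self, rows):
        super().__init__(rows)
        _require_square(self)


class CovMatrix(VVMatrix):
    """Covariance or correlation matrix."""


class DistMatrix(SSMatrix):
    """Distance matrix."""


class DissiMatrix(SSMatrix):
    """Pairwise dissimilarities."""


class SiMatrix(SSMatrix):
    """Pairwise similarities."""
```

Because `DistMatrix` inherits from `SSMatrix`, any code written against the general samples-by-samples type also accepts a distance matrix, which is the point of deriving the specialized objects rather than defining them independently.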
Dimensionality reduction algorithms. Each object in this group stands for a particular dimensionality reduction technique. Currently available are pcatrans, which performs principal component analysis (PCA); wpcatrans, which performs weighted principal component analysis (wPCA); and fishtrans, which identifies discriminant directions according to Fisher's linear discriminant analysis.
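PCA finds the directions along which the samples vary most. As a rough illustration (a Python sketch, not how pcatrans is implemented), the leading principal component can be obtained by power iteration on the sample covariance matrix; the function name `leading_component` and the list-of-samples input are assumptions.

```python
import math
import random


def leading_component(samples, iters=500, seed=0):
    """First principal direction (unit vector) and the variance along it.

    samples: list of equal-length feature vectors, one per sample.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d), normalized by n - 1.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    def times_cov(v):
        return [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]

    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = times_cov(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all samples identical: no variance to capture
            break
        v = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(v, times_cov(v)))
    return v, variance
```

For points lying on the line y = 2x, the returned direction is (1, 2)/√5 up to sign, and the variance equals the total variance of the data. A full PCA would deflate the covariance matrix and repeat, or use an eigendecomposition routine directly.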
Statistics. This portion of the toolbox includes general statistical functions, mainly various hypothesis testing procedures, as well as the object ctable that describes a contingency table.
Core objects:
- grouping: labeling of the data according to a classification scheme
- sampleset: information about samples (observations, conditions, experiments)
- variable: information about variables (features, coordinates)
Core data objects:
- datamatrix: a two-way matrix object
- dataset: repository of information regarding a certain dataset
- graph: general undirected graph
Pairwise data objects:
- covmatrix: covariance matrix (inherits vvmatrix)
- dissimatrix: dissimilarity matrix (inherits ssmatrix)
- distmatrix: distance matrix (inherits ssmatrix)
- simatrix: similarity matrix (inherits ssmatrix)
Dimensionality reduction algorithms:
- lintrans: general dimensionality reduction by linear transformation
Statistics:
- ctable: contingency table
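A contingency table is typically analyzed by comparing its observed counts with the counts expected under independence of rows and columns. The Python sketch below computes Pearson's chi-square statistic for a table given as a list of rows; it is a hypothetical illustration, not the ctable API, it applies no continuity correction (whether the toolbox does is not stated), and it reports a p-value only for the one-degree-of-freedom case, where P(χ² > x) = erfc(√(x/2)).

```python
import math


def chi2_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and (2x2 only) p-value."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    # With one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None
    return stat, dof, p_value
```

For the table [[10, 20], [20, 10]] every expected count is 15, so the statistic is 20/3 with one degree of freedom, giving a p-value just under 0.01.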
Combinatorics:
- multinom: computes the multinomial coefficient.
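The multinomial coefficient (k₁ + … + kₘ)! / (k₁! ⋯ kₘ!) counts the ways to split n = k₁ + … + kₘ items into groups of the given sizes. A minimal Python sketch follows; passing the group sizes as separate arguments is an assumption, since the list does not give multinom's signature.

```python
from math import factorial


def multinomial(*counts):
    """(k1 + ... + km)! / (k1! * ... * km!) using exact integer arithmetic."""
    result = factorial(sum(counts))
    for k in counts:
        # Each partial quotient is itself an integer, so floor division is exact.
        result //= factorial(k)
    return result
```

For example, the letters of MISSISSIPPI (1 M, 4 I, 4 S, 2 P) can be arranged in `multinomial(1, 4, 4, 2)` = 34650 distinct ways.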
Data manipulations:
- lineup: ranks a vector in increasing order.
- majority: finds the most frequent entry.
- subs_incomp_data: substitutes given data into an incomplete data array.
- subsample: picks a random subsample of a vector.
- substitute: substitutes values in a list with a different set of values.
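Three of these manipulations are simple enough to sketch in Python. The function names are hypothetical, and the behaviors that the one-line descriptions leave open are assumptions: ties in the most-frequent search go to the entry encountered first, substitution maps the k-th old value to the k-th new value, and subsampling draws without replacement.

```python
import random
from collections import Counter


def most_frequent(values):
    """Most frequent entry; ties go to the entry encountered first."""
    # Counter.most_common orders equal counts by first appearance.
    return Counter(values).most_common(1)[0][0]


def substitute_values(values, old, new):
    """Replace every occurrence of old[k] with new[k]; other entries pass through."""
    mapping = dict(zip(old, new))
    return [mapping.get(v, v) for v in values]


def random_subsample(values, k, seed=None):
    """k entries drawn at random without replacement."""
    return random.Random(seed).sample(values, k)
```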
Graph Theory:
- chowliu: applies the Chow-Liu algorithm.
- code2dag: finds the DAG associated with a DAG code.
- code2rank: finds the rank of a DAG code.
- dispdagcode: displays a DAG code on the screen.
- enumdagcodes: enumerates all DAG codes for a fixed number of nodes.
- enummarkovclasses: enumerates all DAG codes for a fixed number of nodes.
- nodags: computes the number of DAGs with a fixed number of nodes.
- rank2code: finds the DAG code with a given rank r.
- thd2wgt: computes, given THD, a default weight matrix.
- wgt2thd: computes, given weights, a default THD matrix.
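Assuming nodags counts labeled DAGs (the list does not say whether nodes are labeled), the count follows Robinson's recurrence a(n) = Σₖ₌₁ⁿ (−1)^(k+1) C(n, k) 2^(k(n−k)) a(n−k), with a(0) = 1. The Python sketch below is an illustration of that formula, not the toolbox's implementation.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))
```

The sequence grows super-exponentially (1, 1, 3, 25, 543, 29281 for n = 0 to 5), which is why exhaustive enumeration of DAGs is feasible only for very small node counts.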
Grouping:
- group: turns a list into an assignment vector and a naming cell array.
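The idea of turning a list of labels into an assignment vector plus a list of group names can be sketched in Python as follows. The function name is hypothetical; ordering the names by first appearance and using 1-based, Matlab-style indices are assumptions (sorted names would be an equally plausible convention).

```python
def group_labels(labels):
    """Return (assignment, names): 1-based group index per item, and group names."""
    names = []
    index = {}
    assignment = []
    for label in labels:
        if label not in index:
            index[label] = len(names) + 1  # 1-based, Matlab-style
            names.append(label)
        assignment.append(index[label])
    return assignment, names
```

For `['ctrl', 'treated', 'ctrl', 'mock']` this gives the assignment `[1, 2, 1, 3]` and the names `['ctrl', 'treated', 'mock']`.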
Hypothesis testing:
- testbinom: computes the p-value of testing a binomial parameter.
- testchi2hist: uses the chi2 test to compare a histogram to a standard.
- testchi2hists: uses the chi2 test to compare two histograms.
- testchi2independence: computes the p-value of the independence hypothesis.
- testfisherexact: computes the p-value of Fisher's exact test (© A. Trujillo-Ortiz et al.).
- testfisheromnibus: computes the p-value of the Fisher omnibus test.
- testkshist: uses the KS test to compare a histogram to a standard.
- testkshists: uses the KS test to compare two histograms.
- testmultinom: computes the p-value of testing multinomial parameters.
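Fisher's omnibus test combines k independent p-values through X = −2 Σ ln pᵢ, which under the joint null hypothesis follows a chi-square distribution with 2k degrees of freedom. Because the degrees of freedom are even, the tail probability has the closed form e^(−x/2) Σᵢ₌₀^(k−1) (x/2)ⁱ / i!. The Python sketch below uses that identity; it illustrates the test, not the toolbox's testfisheromnibus code.

```python
import math


def fisher_omnibus(p_values):
    """Return (X, combined p-value) for Fisher's method of combining p-values."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    tail = 0.0
    for i in range(k):
        if i > 0:
            term *= half / i
        tail += term
    return x, math.exp(-half) * tail
```

A single p-value is returned unchanged, as it should be; two p-values of 0.05 combine to roughly 0.0175, stronger evidence than either alone.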
Information Theory:
- centropy: computes the conditional entropy of one variable given another.
- emutualentropy: estimates pairwise mutual entropy.
- entropy: computes the entropy of a distribution.
- kdiv: computes the K-divergence between distributions p and q.
- kl: computes the relative entropy (Kullback-Leibler divergence).
- ldiv: computes the L-divergence between distributions p and q.
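The quantities in this group can be sketched directly from their definitions. The Python below assumes base-2 logarithms and the usual definitions of the K- and L-divergences (K(p, q) = D(p ‖ (p + q)/2) and L(p, q) = K(p, q) + K(q, p)); the toolbox's log base and conventions are not stated, and the function names are hypothetical.

```python
import math


def entropy_bits(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def relative_entropy(p, q):
    """D(p || q) in bits; infinite if q_i = 0 where p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def k_divergence(p, q):
    """D(p || m) with m = (p + q) / 2; always finite, unlike D(p || q)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return relative_entropy(p, m)


def l_divergence(p, q):
    """Symmetric version: K(p, q) + K(q, p)."""
    return k_divergence(p, q) + k_divergence(q, p)


def conditional_entropy(joint):
    """H(Y | X) = H(X, Y) - H(X) for a joint probability table joint[x][y]."""
    flat = [v for row in joint for v in row]
    marginal_x = [sum(row) for row in joint]
    return entropy_bits(flat) - entropy_bits(marginal_x)
```

Averaging with (p + q)/2 is what keeps the K- and L-divergences finite even for distributions with disjoint support, where the relative entropy is infinite.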
Linear transformations:
- fa_engin: performs factor analysis on the data.
- factorscores: estimates the factor scores after factor analysis.
- fish_engin: performs the Fisher transformation of a grouped dataset.
- pca_engin: performs PCA on the data.
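For two groups, Fisher's discriminant direction is w ∝ S_W⁻¹(m_a − m_b), where m_a and m_b are the group means and S_W is the pooled within-group scatter matrix. The Python sketch below handles only the two-group, two-dimensional case so that the 2×2 inverse can be written out; it is an illustration of the technique, not the fish_engin implementation, which works on general grouped datasets.

```python
def fisher_direction(class_a, class_b):
    """Unit vector along S_W^{-1} (mean_a - mean_b) for two groups of 2-D points."""
    def mean(points):
        n = len(points)
        return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

    def scatter(points, m):
        # Sum of outer products of deviations from the group mean.
        s = [[0.0, 0.0], [0.0, 0.0]]
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-group scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]
```

Unlike PCA, which ignores group labels, this direction weighs the separation of the group means against the spread within each group; the sign convention here points from group b toward group a.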
Pairwise Relationships:
- distmat: calculates a distance matrix.
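A distance matrix is the kind of samples-by-samples (ssmatrix) data described earlier. The Python sketch below assumes the Euclidean metric and a list of samples as input; the list does not say which metrics distmat supports, and the function name is hypothetical.

```python
import math


def distance_matrix(samples):
    """Symmetric Euclidean distance matrix between samples, with a zero diagonal."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it, since distances are symmetric.
            d[i][j] = d[j][i] = math.dist(samples[i], samples[j])
    return d
```

For the samples (0, 0), (3, 4), and (0, 4), the off-diagonal distances are 5, 4, and 3. If the data are held as a variables-by-samples matrix, the samples are its columns.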
Regression analysis:
- regress1d: linearly regresses one variable on another.
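Regressing y on x by ordinary least squares gives slope = S_xy / S_xx and intercept = ȳ − slope·x̄. A minimal Python sketch follows; whether regress1d also returns goodness-of-fit statistics is not stated, so only the two coefficients are computed, and the function name is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit y ~ intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept
```

For x = 0, 1, 2, 3 and y = 1, 3, 5, 7 the fit is exact, with slope 2 and intercept 1.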
Statistics:
- allstats: computes all common statistics (© D.C. Hanselman).
- fdr: calculates
- kendall: computes the Kendall rank correlation matrix.
- pearson: computes the Pearson (linear) correlation matrix.
- spearman: computes the Spearman rank correlation matrix.
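The Pearson coefficient measures linear association, while the Spearman coefficient is simply the Pearson coefficient computed on ranks, so it captures any monotone relationship. The toolbox functions return full correlation matrices; the Python sketch below computes a single pairwise coefficient, i.e., one entry of such a matrix. Giving tied values their average rank is an assumption, and all names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson (linear) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks in increasing order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

The relationship between x = 1, 2, 3, 4 and y = x³ is not linear, yet its Spearman coefficient is exactly 1 because the ordering is perfectly preserved.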
Visualization:
- scatter2d_engin: the engine used for 2D scatter plots.
- scatter3d_engin: the engine used for 3D scatter plots.