The use of high-density oligonucleotide arrays to
measure thousands of mRNA abundance levels in parallel
has become commonplace. To take further advantage of
the growing body of such data, and to enable others
to do so, we have developed a method and
computer program to mine the hybridization patterns in
oligonucleotide array-based gene expression data to
identify genes with sequence differences. The program
enables the broad, unbiased and opportunistic
extraction of genetic information from new or
pre-existing gene expression data obtained with
high-density oligonucleotide arrays.
This analysis method relates predictor variables
collected on the samples to the variation in the
pairwise distance values summarized in a distance
matrix. The proposed multivariate method avoids the
need to reduce the dimensionality of the matrix, can
be used to assess relationships between the data used
to construct the matrix and additional information
collected on the samples under study, and can be used
to analyze individual data points or groups of data
points identified in different ways. It provides a
formal statistical test, rooted in traditional linear
models, of how independent variables are associated
with the variation present in a pairwise distance
matrix.
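To make the idea concrete, here is a minimal sketch of the Gower-centred pseudo-F statistic commonly used for this kind of distance-based regression, with a permutation p-value. It is only an illustration. It assumes a symmetric distance matrix `d` and a numeric predictor array, and the function names `gower_center`, `pseudo_f` and `distance_regression_test` are hypothetical, not the program's interface. The statistic compares the share of centred distance variation captured by the predictors (through the least-squares hat matrix) with the residual share. That comparison is what links the test to traditional linear models.

```python
import numpy as np


def gower_center(d):
    """Gower-centre an n x n distance matrix: G = C A C, A = -0.5 * d**2, C = I - 11'/n."""
    n = d.shape[0]
    a = -0.5 * np.asarray(d, dtype=float) ** 2
    c = np.eye(n) - np.full((n, n), 1.0 / n)
    return c @ a @ c


def pseudo_f(g, x):
    """Pseudo-F of design matrix x (n x m, intercept in column 0) against centred matrix g."""
    n, m = x.shape
    h = x @ np.linalg.pinv(x)  # hat matrix H = X (X'X)^-1 X' for full-rank X
    r = np.eye(n) - h          # residual-forming matrix I - H
    explained = np.trace(h @ g @ h) / (m - 1)
    residual = np.trace(r @ g @ r) / (n - m)
    return explained / residual


def distance_regression_test(d, predictors, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) relating predictors (n or n x k) to d."""
    predictors = np.asarray(predictors, dtype=float)
    n = d.shape[0]
    g = gower_center(d)
    f_obs = pseudo_f(g, np.column_stack([np.ones(n), predictors]))
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        # shuffle which sample each predictor row belongs to, keeping g fixed
        shuffled = predictors[rng.permutation(n)]
        if pseudo_f(g, np.column_stack([np.ones(n), shuffled])) >= f_obs:
            hits += 1
    # the observed arrangement counts as one of the permutations
    return f_obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    group = np.repeat([0.0, 1.0], 10)  # hypothetical two-group predictor
    profiles = rng.normal(size=(20, 5)) + 3.0 * group[:, None]
    diff = profiles[:, None, :] - profiles[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean distance matrix
    print(distance_regression_test(d, group, n_perm=199))
```

For Euclidean distances computed from a single variable, this pseudo-F reduces to the ordinary regression F statistic. The p-value comes from permutation rather than an F table because, for general distance measures, the statistic need not follow an F distribution.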
A Multivariate Likelihood Ratio Test that Simultaneously
Assesses Mean and Covariance Matrix Differences Among a
Set of Variables
This methodology accounts for the fact that groups
(e.g., subjects broken down into haplotype or genotype
categories) can differ with respect to the mean values
of a set of traits or with respect to the relationships
between those traits. The proposed tests are therefore
more comprehensive than many traditional multivariate
tests, which use information about either the means or
the covariance matrices, but not both, in their
construction. The proposed tests are modifications and
extensions of multivariate ANOVA (MANOVA) and other
techniques. A simultaneous test of the equality of means
and covariance matrices across the genotypic categories
can be constructed simply as the product of the
likelihood ratio for equal covariance matrices and the
likelihood ratio for equal means given a common
covariance matrix.
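The product construction described above can be checked numerically. The sketch below assumes multivariate normal traits and uses the textbook large-sample form without small-sample corrections. It is not the modified tests described here. It computes -2 log λ for three hypotheses: equal covariance matrices, equal means given a common covariance matrix (the Wilks' Λ statistic of one-way MANOVA), and both at once. Because the likelihood ratios multiply, the joint statistic equals the sum of the other two. The function name `mean_cov_lrt` is hypothetical.

```python
import numpy as np


def _logdet(m):
    """Log-determinant of a symmetric positive-definite matrix."""
    sign, value = np.linalg.slogdet(m)
    if sign <= 0:
        raise ValueError("matrix is not positive definite; each group needs n_g > p")
    return value


def mean_cov_lrt(groups):
    """-2 log lambda statistics and degrees of freedom for k groups of p traits.

    groups: list of (n_g x p) arrays, one per genotype or haplotype category.
    """
    data = np.vstack(groups)
    n_total, p = data.shape
    k = len(groups)
    sizes = [len(g) for g in groups]
    # within-group SSCP matrices A_g, their pooled sum W, and the total SSCP matrix T
    a = [(g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups]
    w = sum(a)
    centred = data - data.mean(axis=0)
    t = centred.T @ centred
    group_term = sum(n_g * _logdet(a_g / n_g) for n_g, a_g in zip(sizes, a))
    cov_stat = n_total * _logdet(w / n_total) - group_term    # H0: equal covariances
    mean_stat = n_total * (_logdet(t) - _logdet(w))           # H0: equal means | common cov.
    joint_stat = n_total * _logdet(t / n_total) - group_term  # H0: equal means and covs.
    return {
        "covariance": (cov_stat, (k - 1) * p * (p + 1) // 2),
        "means": (mean_stat, (k - 1) * p),
        "joint": (joint_stat, (k - 1) * p * (p + 3) // 2),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # hypothetical data: 3 genotype groups, 2 traits; third group shifted in mean and scale
    groups = [rng.normal(size=(40, 2)),
              rng.normal(size=(40, 2)),
              rng.normal(loc=0.5, scale=1.5, size=(40, 2))]
    for name, (stat, df) in mean_cov_lrt(groups).items():
        print(name, round(stat, 2), df)
```

Each statistic is referred to a chi-square distribution with the returned degrees of freedom, for example via `scipy.stats.chi2.sf(stat, df)`. Every group needs more observations than traits so that its covariance matrix is nonsingular.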
SNPEM is used to assess the accuracy of haplotype
frequency estimation as a function of a number of
factors, including sample size, the number of loci
studied, allele frequencies, and locus-specific
allelic departures from Hardy-Weinberg and linkage
equilibrium. Many haplotype-analysis methods require
phase information, which can be difficult to obtain
from samples of nonhaploid species. SNPEM employs
strategies based on the expectation-maximization (EM)
algorithm to estimate haplotype frequencies from
unphased diploid genotype data collected on a sample
of individuals, overcoming the missing phase
information.
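The core EM step can be sketched as follows. This is a minimal illustration, not SNPEM's source. It assumes biallelic SNPs coded as allele counts 0/1/2 with no missing data. Like the standard EM approach, it also assumes haplotypes pair at random (Hardy-Weinberg proportions) when splitting each individual's genotype across its possible phase configurations. The function names are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs (0/1 tuples) consistent with one unphased genotype.

    genotype: tuple of allele counts per biallelic SNP (0, 1 or 2 copies of allele 1).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]  # homozygous loci are fixed
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    # fixing haplotype 1 to carry allele 1 at the first heterozygous locus lists each
    # unordered pair once: 2 ** (len(het) - 1) phase configurations in total
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for locus, b in zip(het, (1,) + bits):
            h1[locus], h2[locus] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, max_iter=500, tol=1e-10):
    """EM estimate of haplotype frequencies from unphased diploid genotypes."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform starting values
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = dict.fromkeys(haplotypes, 0.0)
        for pairs in pair_lists:
            # E-step: probability of each phase configuration under random pairing of
            # haplotypes (Hardy-Weinberg); heterozygous pairs arise in two orders
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: new frequency = expected haplotype count / number of chromosomes
        new_freq = {h: c / n_chrom for h, c in counts.items()}
        delta = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if delta < tol:
            break
    return freq


if __name__ == "__main__":
    # hypothetical unphased genotypes at three SNPs (allele counts 0/1/2)
    sample = [(0, 0, 0), (1, 1, 0), (2, 2, 1), (1, 1, 1), (0, 1, 0), (2, 1, 1)]
    estimates = em_haplotype_frequencies(sample)
    for hap, f in sorted(estimates.items(), key=lambda kv: -kv[1]):
        print("".join(map(str, hap)), round(f, 3))
```

The number of phase configurations doubles with each heterozygous locus, so this direct enumeration is practical only for a modest number of loci. One way to examine estimation accuracy as a function of sample size or allele frequency is to run the estimator on genotypes simulated from known haplotype frequencies and compare the estimates with the truth.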

External Programs

From the CompLearn website:
CompLearn is a suite of simple-to-use utilities
that you can use to apply compression techniques to
the process of discovering and learning patterns.
The compression-based approach used is powerful
because it can mine patterns in completely different
domains. It can classify musical styles of pieces of
music and identify unknown composers. It can identify
the language of bodies of text. It can discover the
relationships between species of life and even the
origin of new unknown viruses such as SARS. Other
uncharted areas are up to you to explore.
This method is so general that it requires
no background knowledge about any particular
classification. There are no domain-specific parameters
to set and only a handful of general settings. Just
feed and run.
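The compression-based similarity behind this kind of approach is usually expressed as the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed length. The sketch below computes it with Python's standard `zlib` and `bz2` modules. It illustrates the idea only and is not CompLearn's code or command-line interface. The helper names are hypothetical.

```python
import bz2
import zlib

# real-world compressors stand in for the ideal (uncomputable) compressor
COMPRESSORS = {"zlib": lambda b: zlib.compress(b, 9), "bz2": bz2.compress}


def compressed_size(data: bytes, method: str = "zlib") -> int:
    """Length in bytes of data after compression with the chosen compressor."""
    return len(COMPRESSORS[method](data))


def ncd(x: bytes, y: bytes, method: str = "zlib") -> float:
    """Normalized compression distance: small for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x, method), compressed_size(y, method)
    cxy = compressed_size(x + y, method)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(items, method="zlib"):
    """Pairwise NCD matrix, the input a clustering or tree-building step would consume."""
    return [[ncd(a, b, method) for b in items] for a in items]


if __name__ == "__main__":
    texts = [b"the quick brown fox jumps over the lazy dog " * 20,
             b"the quick brown fox leaps over the lazy cat " * 20,
             b"lorem ipsum dolor sit amet consectetur " * 20]
    for row in ncd_matrix(texts):
        print(["%.2f" % v for v in row])
```

No domain knowledge enters the calculation: only byte strings and a compressor. Because real compressors fall short of the ideal, NCD(x, x) is small but not exactly zero, and values can slightly exceed 1.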