Interactive Power Analysis Tool

Interactive Power Analysis Tool for Microarray Hypothesis Testing and Generation

Jinwook Seo, , and Eric P. Hoffman / 2006



Motivation

Human clinical projects typically require a priori statistical power analysis. Towards this end, we sought to build a flexible and interactive power analysis tool for microarray studies integrated into our public domain HCE 3.5 software package. We then sought to determine if probe set algorithms or organism type strongly influenced power analysis results.

Results

The HCE 3.5 power analysis tool was designed to import any pre-existing microarray project and interactively test the effects of user-defined values of α (significance level), β (1 − power), sample size, and effect size. The tool generates a filter, for all probe sets or for more focused ontology-based subsets and with or without noise filters, that can be used to limit analyses of a future project to appropriately powered probe sets. We studied projects from three organisms (Arabidopsis, rat, human) and three probe set algorithms (MAS5.0, RMA, dChip PM/MM). We found large differences in power results based on probe set algorithm selection and noise filters. RMA provided exquisite sensitivity for low numbers of arrays, but this came at the cost of a high false positive rate (24% false positives in the human project studied). Our data suggest that a priori power calculations are important for experimental design in both hypothesis testing and hypothesis generation, as well as for selection of optimized data analysis parameters.

Design and implementation of a power analysis tool for microarrays

We designed and implemented an interactive power analysis method that enables rational design of experiments, thereby minimizing ethical concerns while maximizing the effectiveness of the study. The design strategy is shown in Fig. 3. Researchers first identify a pre-existing project that best matches their proposed project using any one of the existing data repositories (e.g. GEO, ArrayExpress, PEPR). The project must use the same microarray as in the proposed (future) experiment. The power analysis tool in HCE can use either a one-sample t-Test (one group of microarrays corresponding to replicates with a single variable), or a two-sample t-Test (two groups of microarrays differing by one variable).
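As a rough illustration of how α, sample size, and effect size combine into a power value for the two models named above, here is a minimal sketch. It uses a normal approximation to the t-Test rather than whatever calculation HCE performs internally, and the function name `approx_power` and its parameters are assumptions made for this example only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(effect_size, n_per_group, alpha=0.05, two_sample=True):
    """Approximate two-sided power with a normal approximation to the t-Test.

    effect_size is the standardized difference (mean difference divided by
    the standard deviation). This is an illustrative approximation, not the
    calculation used inside HCE.
    """
    if n_per_group < 2:
        raise ValueError("need at least 2 arrays per group")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    # Shift of the test statistic under the alternative hypothesis:
    # d * sqrt(n / 2) for two equal groups, d * sqrt(n) for one group.
    scale = sqrt(n_per_group / 2) if two_sample else sqrt(n_per_group)
    shift = abs(effect_size) * scale
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability that the statistic falls in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)
```

For very small groups (such as n=3) the normal approximation tends to overstate power compared with a t-based calculation, so the values are best read as an upper guide.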


Interactive Power Analysis Framework in HCE3.5

Effect of biological noise and probe set algorithms on power analysis results

We used three different pre-existing microarray projects to test our power calculation tool, one from a plant (Arabidopsis), one from a rat spinal cord damage project, and one from a human muscular dystrophy patient muscle biopsy project. We chose these three projects to test the effects of two variables, biological noise and probe set algorithms, on the resulting power calculations.

  • A higher proportion of probe sets were sufficiently powered by the one-sample t-Test, compared to the two-sample.
  • All projects also showed an increased proportion of sufficiently powered probe sets as the number of replicates per group increased.
  • There was a clear correlation between the amount of biological noise and the resulting power calculations.
  • Unexpectedly, the three probe set algorithms gave very different power results.
  • As the number of samples per group increased, the results from the three probe set algorithms began to converge.
  • Our power calculation studies were consistent with the project-based normalizations being more effective at reducing variance than the chip-based normalization.

Effect of noise filters on power calculations

Not all genes are expressed into mRNA in each cell or tissue type. Those probe sets detecting mRNAs that are not expressed, or expressed at very low levels, are expected to result in signals that are at or near background (noise) levels. We therefore tested the effects of a “present call” noise filter on the resulting power calculations; we expected that the “performance” (e.g. proportion of sufficiently powered probe sets for any given number of arrays) would improve with this noise filter.
We applied a fairly stringent noise filter (50% present calls).

  • Both dChip PM/MM and MAS5.0 showed improved performance in all organisms and at all numbers of microarrays, as expected, except for a slight degradation of dChip with the rat data.
  • The RMA algorithm showed a consistent degradation of performance in each organism, with the noise filter leading to a decrease in the proportion of sufficiently powered probe sets.
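A "present call" noise filter of this kind can be sketched as follows. The sketch assumes that detection calls are coded as P (present), M (marginal), and A (absent), and that "50% present calls" means a probe set is kept when at least half of its arrays are called P. Both the interpretation and the function name `present_call_filter` are assumptions for illustration.

```python
def present_call_filter(calls_by_probe_set, min_present_fraction=0.5):
    """Return the probe set IDs whose fraction of 'P' calls meets the threshold.

    calls_by_probe_set maps a probe set ID to its list of detection calls,
    one call per array. Marginal ('M') and absent ('A') calls both count
    as not present in this sketch.
    """
    kept = []
    for probe_id, calls in calls_by_probe_set.items():
        if not calls:
            continue  # no arrays, nothing to judge
        present = sum(1 for call in calls if call.strip().upper() == "P")
        if present / len(calls) >= min_present_fraction:
            kept.append(probe_id)
    return kept
```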

Effect of noise filter on concordance of power analysis results by probe set algorithms

Given the strong effects of the probe set algorithm choice on the proportion of sufficiently powered probe sets (Fig. 4), we then tested the intersection of the appropriately powered probe sets. For this test, we selected a gene ontology group, inflammatory response genes, where we expected many of the probe sets to show relatively low signals. We used the two-sample t-Test, and studied both the rat and human data. It should be noted that both of these projects are known to show increased inflammatory gene expression in one of the two groups (severe damage rat group; Duchenne muscular dystrophy human group). We also studied the intersection with and without a 50% present call noise filter.

For the rat project, there were 110 probe sets included within the "inflammatory response" group.

  • Without the noise filter and n=3 per group, there was relatively good concordance between RMA and dChip, with approximately 64% of probe sets showing sufficient powering, and about half of these concordant between the two algorithms.
  • MAS5.0 showed poor sensitivity for these same settings, with only 7% of probe sets showing sufficient powering.
  • Use of the noise filter resulted in loss of 77% of the appropriately powered probe sets in the intersection between RMA and dChip.
  • For inflammatory genes in this example (rat project, two group, n=3), a good data mask would be the intersection of RMA and dChip with no noise filter.

For the human project with the same parameters as in the rat project,

  • There was considerably less concordance between probe set algorithms than in the rat project. Without a noise filter, only 1% of appropriately powered probe sets using the RMA algorithm were concordant with dChip, and none were concordant with MAS5.0.
  • Application of the noise filter significantly reduced the number of genes entered into the power calculation, but also reduced the concordance.
  • The levels of confounding biological noise were too high for the relatively low number of replicates (n=3) in this example.
  • RMA is indeed a much more sensitive probe set algorithm, but this sensitivity comes at a cost of a relatively high false positive rate, and for this noisy human project, this is about 24% of probe sets deemed “significant” by RMA.
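The concordance figures and the intersection "data mask" discussed for the rat and human projects can be expressed with plain set operations. In this sketch, concordance is taken to be the fraction of one algorithm's sufficiently powered probe sets that are also sufficiently powered under another algorithm. That definition, and the names `concordance` and `data_mask`, are assumptions for illustration.

```python
def concordance(reference, other):
    """Fraction of probe sets in `reference` that also appear in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)


def data_mask(*powered_sets):
    """Probe sets that are sufficiently powered under every algorithm given."""
    if not powered_sets:
        return set()
    return set.intersection(*(set(s) for s in powered_sets))
```

A mask built this way from the RMA and dChip results would keep only the probe sets that both algorithms report as sufficiently powered.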

How to prepare input files

You need to prepare two files: a probe set signal file and a probe set detection call file (or a probe set detection p-value file). As shown in the figure, the probe set detection p-value file from MAS5 can be used with signal files generated by probe set algorithms other than MAS5.

File names

The two files should be in the same folder.  The extension of the probe set detection call file should be pma.
Please refer to the following example.

  1. Using tab delimited text files

    If the signal file name is mah-mas5.exp (or mah-mas5.txt), the probe set detection call file name should be mah-mas5.pma.

  2. Using Excel files

    If the signal file name is mah-mas5.xls, the probe set detection call file name should be mah-mas5.pma.xls.
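The naming rule in the two cases above can be checked with a small helper before loading data. The function name `expected_call_file` is hypothetical, and the helper only encodes the extensions listed here (.exp, .txt, .xls).

```python
from pathlib import Path


def expected_call_file(signal_file):
    """Return the detection call file path expected for a given signal file."""
    path = Path(signal_file)
    suffix = path.suffix.lower()
    if suffix == ".xls":
        # Excel input: mah-mas5.xls -> mah-mas5.pma.xls
        return path.with_name(path.stem + ".pma.xls")
    if suffix in {".exp", ".txt"}:
        # Tab-delimited input: mah-mas5.exp -> mah-mas5.pma
        return path.with_suffix(".pma")
    raise ValueError(f"unsupported signal file extension: {path.suffix!r}")
```

Because the result keeps the directory of the signal file, it also reflects the requirement that the two files sit in the same folder.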

Format of the input files

  • Each row is a probe set, and each column is a chip (or a sample).
  • The order of rows and columns should be the same in both signal file and p-value file.  
  • As shown in the following figure, each column can have a class ID that represents the known biological group of the sample.  
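A quick way to confirm that two tab-delimited files share the same row and column order is sketched below. It assumes that the first row holds the sample names and the first column holds the probe set IDs. The exact layout (including where class IDs sit) is defined by the figure and the example files, so treat this layout and the names `read_layout` and `same_layout` as assumptions.

```python
import csv


def read_layout(handle):
    """Return (sample_names, probe_set_ids) from a tab-delimited table.

    Assumes the first row lists sample names and the first column lists
    probe set IDs; values in the body of the table are ignored.
    """
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        raise ValueError("empty table")
    sample_names = rows[0][1:]
    probe_set_ids = [row[0] for row in rows[1:]]
    return sample_names, probe_set_ids


def same_layout(signal_handle, call_handle):
    """True when both tables list the same samples and probe sets in the same order."""
    return read_layout(signal_handle) == read_layout(call_handle)
```

Both functions take open file objects, for example the results of `open("spinal-cord.txt", newline="")` and `open("spinal-cord.pma", newline="")`.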

Example

Please take a close look at these small example input files (spinal-cord.txt and spinal-cord.pma) in spinal-cord.zip.

Probe set signal file


Probe set detection call file (generated by MAS5)


Please note that the order of rows and columns is the same as in the signal file.

To perform power analysis in HCE 3.5

  • load your data
  • go to Tool -> Power Calculation & Filtering
  • assign samples to groups
  • select a model (one or two-sample t-Test)
  • select a dependent parameter
  • adjust values for independent parameters
  • click "Calculate" button
  • adjust the double sided slider control to change the thresholds for the dependent parameter
  • export or highlight the resulting probe sets ("export" or "highlight" button)
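The idea of choosing one dependent parameter and fixing the others can be illustrated by solving for sample size given α, power, and effect size. The sketch below uses the standard normal-approximation formula, not HCE's own calculation, and the function name `approx_sample_size` is an assumption for this example.

```python
from math import ceil
from statistics import NormalDist


def approx_sample_size(effect_size, alpha=0.05, power=0.8, two_sample=True):
    """Approximate arrays needed per group for a two-sided test.

    Normal approximation: n = k * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    with k = 2 for two equal groups and k = 1 for a single group.
    """
    if effect_size == 0:
        raise ValueError("effect size must be non-zero")
    if not 0 < alpha < 1 or not 0 < power < 1:
        raise ValueError("alpha and power must be between 0 and 1")
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    n = (z_sum / abs(effect_size)) ** 2
    if two_sample:
        n *= 2
    # A t-Test needs at least two observations per group.
    return max(2, ceil(n))
```

With a standardized effect size of 0.5, α = 0.05, and power = 0.8, this gives 63 arrays per group for the two-sample case and 32 for the one-sample case.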

Download

Download HCE 3.5 version for interactive power analysis, April 25, 2006 (first released on Nov. 11, 2005)
Old user manual (new one for ver. 3.5 in preparation)

System requirements
Intel® Pentium® processor
Microsoft® Windows 2000® or Windows XP®

Publications