Topic Overview:
As many new high-throughput technologies were developed and rapidly improved over the past decade, a tremendous amount of genomic data has been generated and stored in the public domain. Information integration of multiple genomic studies (genomic meta-analysis) has since become commonplace in many areas of biomedical research. Advantages of such analyses include: (1) increased statistical power when individual studies have small sample sizes and weak signals, (2) robust and validated conclusions from different but relevant studies, and (3) enhanced knowledge and ability to narrow gene targets and better design future experiments. Currently, classical analysis methods (such as the Venn diagram, vote counting, and Fisher’s method) are regularly used in genomic meta-analysis applications. More sophisticated and efficient statistical methods, customized for specific genomic data and biological purposes, are seldom developed. This absence is mainly due to a lack of awareness or understanding of the deep complexity of genomic meta-analysis; each study involves an intricate set of research problems that depend on both the available data structure and the specific biological questions being asked.
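
As a point of reference for the classical methods mentioned above, Fisher’s method combines independent p-values p_1, ..., p_k for one gene into the statistic X = -2 Σ log(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The following is a minimal sketch only, not part of the software described here; the function name and the example p-values are hypothetical, and the p-values are assumed to be independent across studies.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values for one gene across studies (Fisher's method).

    Hypothetical helper for illustration; assumes independent studies.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(log p_i); we keep X / 2 for convenience.
    half_stat = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2k, the chi-square upper tail has a closed form:
    # P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


# Made-up p-values for a single gene in three studies.
combined = fisher_combined_pvalue([0.04, 0.03, 0.06])
print(f"combined p-value: {combined:.4g}")
```

With a single study the combined p-value reduces to the original p-value, which is a quick sanity check on the formula.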

In the past few years, our group has focused on developing new computational and statistical methods for combining multiple transcriptomic studies. We designed and built a software suite, called “MetaGenomics,” which contains the following seven sub-packages: (1) MetaDiagnosis, a quantitative quality assessment for inclusion/exclusion criteria for genomic meta-analysis; (2) MetaDE, methods to combine genomic studies for biomarker (differentially expressed gene) detection; (3) MetaPath, methods to combine genomic studies for pathway identification; (4) MetaClust, methods to combine genomic studies for gene clustering; (5) MetaDimR, methods to combine genomic studies for dimension reduction; (6) MetaPredict, methods to combine genomic studies for inter-study prediction; and (7) MetaNetwork, methods to combine genomic studies for network or co-expression analysis. This discussion will provide an overview of our established and ongoing efforts and present in-depth examples of these applications.