Comparison of Methods for Differential Gene Expression Using Proteomics Count Data
The main goal of this thesis is to identify genes whose expression, measured through proteomics count data, is associated with particular experimental conditions or diseases. Many researchers have compared statistical methods for identifying differentially expressed genes; however, very few of these comparisons address proteomics datasets. The present research examines modeling, transformation, and normalization methods, selects leading software packages whose built-in methods suit proteomics datasets, and uses them to detect genes whose mean expression differs between the treatment and control groups. Two methods, TweeDEseq and Limma-Voom, are recommended because they outperform the other approaches in modeling proteomics data and in data handling. TweeDEseq, built on the Poisson-Tweedie model, is designed to accommodate a wide range of over-dispersed data. Limma-Voom does not fit a count distribution such as the negative binomial directly; instead it applies linear models to log-transformed counts, and its Voom step adds flexibility by estimating a precision weight for each observation. Both methods achieve a good trade-off between statistical power and false discovery rate (FDR) control.
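As an illustration only, the sketch below shows two generic building blocks behind the workflow described above, assuming a plain Python setting rather than the R packages the thesis uses. The first is the log-counts-per-million transform that voom applies before estimating its precision weights, log2((count + 0.5) / (library size + 1) × 10^6). The second is the Benjamini-Hochberg adjustment, a standard way to control the FDR (it is the default adjustment in limma's `topTable`). It does not estimate voom's precision weights or fit the Poisson-Tweedie model, and the function names `log_cpm`, `benjamini_hochberg`, and `call_de` are hypothetical.

```python
import math
from typing import List, Sequence


def log_cpm(counts: Sequence[int]) -> List[float]:
    """Voom-style log2 counts per million for one sample (column).

    Uses log2((count + 0.5) / (library_size + 1) * 1e6); the offsets
    keep zero counts (and empty libraries) finite.
    """
    lib_size = sum(counts)
    return [math.log2((c + 0.5) / (lib_size + 1) * 1e6) for c in counts]


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest so that a running minimum
    # keeps the adjusted values monotone; starting at 1.0 caps them at 1.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_de(p_values: Sequence[float], fdr: float = 0.05) -> List[int]:
    """Indices of genes declared differentially expressed at the given FDR."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q <= fdr]


if __name__ == "__main__":
    # Hypothetical per-gene p-values, e.g. from a treatment-vs-control test.
    p = [0.01, 0.04, 0.03, 0.20]
    print(benjamini_hochberg(p))  # approximately [0.04, 0.0533, 0.0533, 0.2]
    print(call_de(p, fdr=0.05))   # [0]
    print(call_de(p, fdr=0.10))   # [0, 1, 2]
```

Raising the FDR threshold from 0.05 to 0.10 declares more genes significant in this toy example, which is the power versus FDR trade-off the comparison weighs.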
Language
- English
Degree
- Master of Science
Program
- Applied Mathematics
Granting Institution
- Ryerson University
LAC Thesis Type
- Thesis