This Data is a bigger data set with 96 micro arrays files for analysis. The experiment conditions are also different. - There is no WT strain - Plus serves as a within strain control - There are unequal number of samples per group
‘Array quality metrics’ package provides us with a repeatable qc metrics. It looks at the quality of our files in terms of bias from experiment and confounding variables.
The output is saved as an HTML file in the working directory.
# arrayQualityMetrics(expressionset = rawData,
#     outdir = "/Users/hp/R projects/LupusProject",
#     force = TRUE, do.logtransform = TRUE,
#     intgroup = c("Strain", "Tissue", "Weeks"))Once we’ve loaded up the data we can look at the distribution of the expressions across all samples using a box plot. The arrayqualitymetrics package provides us with insight into what celfiles are not within the range of the majority.
boxplot(rawData,target="core") # choose the level of summarization: "core"After normalizing the data set using RMA algorithm we can see that the expressions follow a more normal distribution.
boxplot(norm.data, target= "core")We will annotate the probesets using the ‘mogene11sttranscriptcluster’ database. This database was created using the probset information from affymetirix and can be downloaded from Bioconductor. We will annotate the data with the Gene Symbol, Name, Entrez ID, Gene Ontology & Description. And save those with the featuredata.
We choose not to remove probes with no annotation at this stage. However, in further analysis we will remove them. If you wanted to remove them now the chunck below does so.
There is a considerable amount of batch effect. But it is hard to determine if this is from experiment error or if it is from the difference in tissues. i.e, it could be batch effect due to biological difference.
plotMDS(norm.data) #checking batch effectIn the pca from the array quality metrics we noticed that there was
significant groping of the celfiles based on pc1 and pc2 we will
replicate the PCA plot and compare it with the normalized data.
For the normalized Data:
##                index       filename Strain      Tissue Subject Weeks
## 1_MoGene.CEL       1   1_MoGene.CEL   plus    Amygdala     537    24
## 101_MoGene.CEL     2 101_MoGene.CEL    lpr      Kidney    1127    12
## 102_MoGene.CEL     3 102_MoGene.CEL    lpr      Kidney     724    12
## 103_MoGene.CEL     4 103_MoGene.CEL    lpr      Kidney     738    12
## 104_MoGene.CEL     5 104_MoGene.CEL    lpr      Kidney     874     6
## 105_MoGene.CEL     6 105_MoGene.CEL    lpr      Kidney     992     6
## 106_MoGene.CEL     7 106_MoGene.CEL    lpr      Kidney     993     6
## 107_MoGene.CEL     8 107_MoGene.CEL    lpr      Kidney     994     6
## 108_MoGene.CEL     9 108_MoGene.CEL   lpr2      Kidney    1104    18
## 109_MoGene.CEL    10 109_MoGene.CEL   lpr2      Kidney    1018    18
## 11_MoGene.CEL     11  11_MoGene.CEL    lpr    Amygdala     439    24
## 110_MoGene.CEL    12 110_MoGene.CEL   lpr2      Kidney    1003    12
## 112_MoGene.CEL    13 112_MoGene.CEL   lpr2      Kidney    1005    12
## 113_MoGene.CEL    14 113_MoGene.CEL   lpr2      Kidney    1137     6
## 114_MoGene.CEL    15 114_MoGene.CEL   lpr2      Kidney    1181     6
## 115_MoGene.CEL    16 115_MoGene.CEL   lpr2      Kidney    1136     6
## 116_MoGene.CEL    17 116_MoGene.CEL   plus      Kidney     869    24
## 117_MoGene.CEL    18 117_MoGene.CEL   plus      Kidney     752    12
## 118_MoGene.CEL    19 118_MoGene.CEL   plus      Kidney     751    12
## 119_MoGene.CEL    20 119_MoGene.CEL   plus      Kidney    1164     6
## 12_MoGene.CEL     21  12_MoGene.CEL    lpr    Amygdala     472    24
## 120_MoGene.CEL    22 120_MoGene.CEL   plus      Kidney    1165     6
## 125_MoGene.CEL    23 125_MoGene.CEL    lpr    Amygdala    1118    18
## 129_MoGene.CEL    24 129_MoGene.CEL    lpr    Amygdala    1119    18
## 13_MoGene.CEL     25  13_MoGene.CEL    lpr    Amygdala     473    24
## 132_MoGene.CEL    26 132_MoGene.CEL    lpr    Amygdala     700    12
## 134_MoGene.CEL    27 134_MoGene.CEL    lpr    Amygdala     699    12
## 136_MoGene.CEL    28 136_MoGene.CEL   lpr2    Amygdala     622    24
## 137_MoGene.CEL    29 137_MoGene.CEL   lpr2    Amygdala     796    24
## 14_MoGene.CEL     30  14_MoGene.CEL    lpr    Amygdala     551    24
## 142_MoGene.CEL    31 142_MoGene.CEL   lpr2    Amygdala    1138     6
## 146_MoGene.CEL    32 146_MoGene.CEL   plus    Amygdala     604    18
## 147_MoGene.CEL    33 147_MoGene.CEL   plus    Amygdala     592    18
## 15_MoGene.CEL     34  15_MoGene.CEL    lpr    Amygdala     764    24
## 155_MoGene.CEL    35 155_MoGene.CEL    lpr Hippocampus    1118    18
## 156_MoGene.CEL    36 156_MoGene.CEL    lpr Hippocampus     700    12
## 158_MoGene.CEL    37 158_MoGene.CEL    lpr Hippocampus     701    12
## 159_MoGene.CEL    38 159_MoGene.CEL    lpr Hippocampus     724    12
## 16_MoGene.CEL     39  16_MoGene.CEL    lpr    Amygdala    1024    24
## 162_MoGene.CEL    40 162_MoGene.CEL    lpr Hippocampus     849     6
## 163_MoGene.CEL    41 163_MoGene.CEL    lpr Hippocampus     981     6
## 165_MoGene.CEL    42 165_MoGene.CEL   lpr2 Hippocampus     794    24
## 166_MoGene.CEL    43 166_MoGene.CEL   lpr2 Hippocampus     622    24
## 167_MoGene.CEL    44 167_MoGene.CEL   lpr2 Hippocampus     834    18
## 168_MoGene.CEL    45 168_MoGene.CEL   lpr2 Hippocampus     863    18
## 17_MoGene.CEL     46  17_MoGene.CEL    lpr    Amygdala    1025    24
## 174_MoGene.CEL    47 174_MoGene.CEL    lpr      Kidney     997    18
## 19_MoGene.CEL     48  19_MoGene.CEL    lpr    Amygdala     844    18
## 2_MoGene.CEL      49   2_MoGene.CEL   plus    Amygdala     538    24
## 20_MoGene.CEL     50  20_MoGene.CEL    lpr    Amygdala     997    18
## 21_MoGene.CEL     51  21_MoGene.CEL    lpr    Amygdala     998    18
## 22_MoGene.CEL     52  22_MoGene.CEL    lpr    Amygdala    1110    18
## 23_MoGene.CEL     53  23_MoGene.CEL    lpr    Amygdala    1117    18
## 24_MoGene.CEL     54  24_MoGene.CEL    lpr    Amygdala     693    12
## 25_MoGene.CEL     55  25_MoGene.CEL    lpr    Amygdala     694    12
## 26_MoGene.CEL     56  26_MoGene.CEL    lpr    Amygdala     728    12
## 27_MoGene.CEL     57  27_MoGene.CEL    lpr    Amygdala     738    12
## 29_MoGene.CEL     58  29_MoGene.CEL    lpr    Amygdala    1128    12
## 3_MoGene.CEL      59   3_MoGene.CEL   plus    Amygdala     544    24
## 30_MoGene.CEL     60  30_MoGene.CEL    lpr    Amygdala     849     6
## 31_MoGene.CEL     61  31_MoGene.CEL    lpr    Amygdala     844    18
## 32_MoGene.CEL     62  32_MoGene.CEL    lpr    Amygdala     998    18
## 33_MoGene.CEL     63  33_MoGene.CEL    lpr    Amygdala    1110    18
## 34_MoGene.CEL     64  34_MoGene.CEL    lpr    Amygdala     994     6
## 35_MoGene.CEL     65  35_MoGene.CEL    lpr Hippocampus     809    18
## 36_MoGene.CEL     66  36_MoGene.CEL   lpr2    Amygdala    1137     6
## 38_MoGene.CEL     67  38_MoGene.CEL   lpr2    Amygdala    1083     6
## 45_MoGene.CEL     68  45_MoGene.CEL    lpr Hippocampus    1110    18
## 46_MoGene.CEL     69  46_MoGene.CEL    lpr Hippocampus     600    24
## 47_MoGene.CEL     70  47_MoGene.CEL    lpr Hippocampus     764    24
## 5_MoGene.CEL      71   5_MoGene.CEL   plus    Amygdala     582    18
## 50_MoGene.CEL     72  50_MoGene.CEL    lpr Hippocampus     844    18
## 51_MoGene.CEL     73  51_MoGene.CEL    lpr    Amygdala     738    12
## 53_MoGene.CEL     74  53_MoGene.CEL    lpr Hippocampus    1111    18
## 54_MoGene.CEL     75  54_MoGene.CEL    lpr Hippocampus     694    12
## 57_MoGene.CEL     76  57_MoGene.CEL    lpr Hippocampus    1127    12
## 58_MoGene.CEL     77  58_MoGene.CEL    lpr Hippocampus    1128    12
## 59_MoGene.CEL     78  59_MoGene.CEL    lpr Hippocampus     874     6
## 6_MoGene.CEL      79   6_MoGene.CEL   plus    Amygdala     590    18
## 60_MoGene.CEL     80  60_MoGene.CEL    lpr Hippocampus     992     6
## 61_MoGene.CEL     81  61_MoGene.CEL    lpr Hippocampus     993     6
## 62_MoGene.CEL     82  62_MoGene.CEL    lpr Hippocampus     994     6
## 63_MoGene.CEL     83  63_MoGene.CEL    lpr    Amygdala    1128    12
## 64_MoGene.CEL     84  64_MoGene.CEL   lpr2 Hippocampus    1005    12
## 65_MoGene.CEL     85  65_MoGene.CEL   lpr2 Hippocampus    1186    12
## 66_MoGene.CEL     86  66_MoGene.CEL   lpr2 Hippocampus    1136     6
## 67_MoGene.CEL     87  67_MoGene.CEL    lpr    Amygdala     849     6
## 68_MoGene.CEL     88  68_MoGene.CEL   lpr2 Hippocampus    1181     6
## 7_MoGene.CEL      89   7_MoGene.CEL   plus    Amygdala     752    12
## 8_MoGene.CEL      90   8_MoGene.CEL    lpr    Amygdala     809    18
## 84_MoGene.CEL     91  84_MoGene.CEL    lpr Hippocampus     598    24
## 86_MoGene.CEL     92  86_MoGene.CEL    lpr Hippocampus     998    18
## 9_MoGene.CEL      93   9_MoGene.CEL   plus    Amygdala    1173     6
## 97_MoGene.CEL     94  97_MoGene.CEL    lpr      Kidney    1024    24
## 98_MoGene.CEL     95  98_MoGene.CEL    lpr      Kidney    1117    18
## 99_MoGene.CEL     96  99_MoGene.CEL    lpr      Kidney     843    18LIMMA (Linear Models for Microarray Analysis) is a statistical method for identifying differentially expressed genes in microarray experiments. It uses linear models to analyze the intensity values of the genes measured in the microarray, taking into account factors such as the experimental design and the inherent variability of the data. The result of a LIMMA analysis is a list of genes that are significantly differentially expressed between the experimental conditions being compared.
I’m renaming the columns because, make.contrasts throws an error when using ‘:’ and ‘.’.
Remove the samples that do not have a value. therefore, the design becomes a full rank.
design<-design[,-c(12,20,24,30)]In this code, lmFit is used to fit a linear model to the data in norm.data using the design matrix design. This model is then passed to the eBayes function, which applies empirical Bayes moderation to the standard errors of the model.
fit1 <- lmFit(norm.data, design)
fit1 <- eBayes(fit1, trend=TRUE, robust=TRUE) # the compute moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a global value. 
results <- decideTests(fit1)
summary(results)##        lprAmygdalaW6 lprHippocampusW6 lprKidneyW6 lpr2AmygdalaW6
## Down               0                0           0              0
## NotSig             0                0           0              0
## Up             35556            35556       35556          35556
##        lpr2HippocampusW6 lpr2KidneyW6 plusAmygdalaW6 plusKidneyW6
## Down                   0            0              0            0
## NotSig                 0            0             11            0
## Up                 35556        35556          35545        35556
##        lprAmygdalaW12 lprHippocampusW12 lprKidneyW12 lpr2HippocampusW12
## Down                0                 0            0                  0
## NotSig              0                 0            0                  1
## Up              35556             35556        35556              35555
##        lpr2KidneyW12 plusAmygdalaW12 plusKidneyW12 lprAmygdalaW18
## Down               0               0             0              0
## NotSig             0              27             1              0
## Up             35556           35529         35555          35556
##        lprHippocampusW18 lprKidneyW18 lpr2HippocampusW18 lpr2KidneyW18
## Down                   0            0                  0             0
## NotSig                 0            0                  0             0
## Up                 35556        35556              35556         35556
##        plusAmygdalaW18 lprAmygdalaW24 lprHippocampusW24 lprKidneyW24
## Down                 0              0                 0            0
## NotSig               0              0                 0           13
## Up               35556          35556             35556        35543
##        lpr2AmygdalaW24 lpr2HippocampusW24 plusAmygdalaW24 plusKidneyW24
## Down                 0                  0               0             0
## NotSig               1                  0               0             6
## Up               35555              35556           35556         35550table(targets$Strain, targets$Tissue, targets$Weeks)## , ,  = 6
## 
##       
##        Amygdala Hippocampus Kidney
##   lpr         3           6      4
##   lpr2        3           2      3
##   plus        1           0      2
## 
## , ,  = 12
## 
##       
##        Amygdala Hippocampus Kidney
##   lpr         9           6      3
##   lpr2        0           2      2
##   plus        1           0      2
## 
## , ,  = 18
## 
##       
##        Amygdala Hippocampus Kidney
##   lpr        11           6      3
##   lpr2        0           2      2
##   plus        4           0      0
## 
## , ,  = 24
## 
##       
##        Amygdala Hippocampus Kidney
##   lpr         7           3      1
##   lpr2        2           2      0
##   plus        3           0      1Based on the table above the possible comparisions are between 1 - Amygdala and Hippocampus for LPR vs LPR 2 throughout the weeks 6, 24 2 - LPR Amygdala at Weeks 6, 12, 18, 24 + Vs. Plus 3 - LPR2 Amygdala at Weeks 6, 24 + Vs. Plus 4 - LPR Hippocampus at Weeks 6, 12, 18, 24 5 - LPR2 Hippocampus at Weeks 6, 12, 18, 24 6 - Change in Plus Amygdala throughout the weeks 7 - Kidney LPR Vs LPR2 at Weeks 6, 12, 18 8 - Kidney LPR for the Week 6, 12, 18, 24 9 - Kidney LPR2 for the Week 6, 12, 18 10 - Kidney Plus for the Week 6, 12, 24
##        lprAmygdalaW6 - lprHippocampusW6 lprAmygdalaW6 - lpr2AmygdalaW6
## Down                               1526                              4
## NotSig                            32856                          35538
## Up                                 1174                             14
##        lprHippocampusW6 - lpr2HippocampusW6 lpr2AmygdalaW6 - lpr2HippocampusW6
## Down                                      5                                869
## NotSig                                35547                              34058
## Up                                        4                                629
##        lprAmygdalaW24 - lprHippocampusW24 lprAmygdalaW24 - lpr2AmygdalaW24
## Down                                 1249                            12933
## NotSig                              33303                            11728
## Up                                   1004                            10895
##        lprHippocampusW24 - lpr2HippocampusW24
## Down                                        1
## NotSig                                  35546
## Up                                          9
##        lpr2AmygdalaW24 - lpr2HippocampusW24
## Down                                   9835
## NotSig                                14397
## Up                                    11324##        lprAmygdalaW6 - lprAmygdalaW12 lprAmygdalaW6 - lprAmygdalaW18
## Down                              123                             10
## NotSig                          35344                          35530
## Up                                 89                             16
##        lprAmygdalaW6 - lprAmygdalaW24 lprAmygdalaW6 - plusAmygdalaW6
## Down                               14                          15133
## NotSig                          35496                           8348
## Up                                 46                          12075
##        lprAmygdalaW12 - lprAmygdalaW18 lprAmygdalaW12 - lprAmygdalaW24
## Down                               951                            1513
## NotSig                           34068                           32733
## Up                                 537                            1310
##        lprAmygdalaW12 - plusAmygdalaW12 lprAmygdalaW18 - lprAmygdalaW24
## Down                               1344                               0
## NotSig                            32936                           35554
## Up                                 1276                               2
##        lprAmygdalaW18 - plusAmygdalaW18 lprAmygdalaW24 - plusAmygdalaW24
## Down                               2645                                4
## NotSig                            30012                            35530
## Up                                 2899                               22##        lpr2AmygdalaW6 - lpr2AmygdalaW24 lpr2AmygdalaW6 - plusAmygdalaW6
## Down                              12310                           14987
## NotSig                            12782                            8293
## Up                                10464                           12276
##        lprAmygdalaW24 - plusAmygdalaW24
## Down                                  4
## NotSig                            35530
## Up                                   22##        lprHippocampusW6 - lprHippocampusW12
## Down                                     83
## NotSig                                35471
## Up                                        2
##        lprHippocampusW6 - lprHippocampusW18
## Down                                     10
## NotSig                                35543
## Up                                        3
##        lprHippocampusW6 - lprHippocampusW24
## Down                                    113
## NotSig                                35422
## Up                                       21
##        lprHippocampusW12 - lprHippocampusW18
## Down                                       0
## NotSig                                 35556
## Up                                         0
##        lprHippocampusW12 - lprHippocampusW24
## Down                                       0
## NotSig                                 35555
## Up                                         1
##        lprHippocampusW18 - lprHippocampusW24
## Down                                       0
## NotSig                                 35556
## Up                                         0##        lpr2HippocampusW6 - lpr2HippocampusW12
## Down                                        0
## NotSig                                  35556
## Up                                          0
##        lpr2HippocampusW6 - lpr2HippocampusW18
## Down                                     6699
## NotSig                                  22200
## Up                                       6657
##        lpr2HippocampusW6 - lpr2HippocampusW24
## Down                                        0
## NotSig                                  35556
## Up                                          0
##        lpr2HippocampusW12 - lpr2HippocampusW18
## Down                                      5454
## NotSig                                   24332
## Up                                        5770
##        lpr2HippocampusW12 - lpr2HippocampusW24
## Down                                         0
## NotSig                                   35556
## Up                                           0
##        lpr2HippocampusW18 - lpr2HippocampusW24
## Down                                      5755
## NotSig                                   24112
## Up                                        5689##        plusAmygdalaW6 - plusAmygdalaW12 plusAmygdalaW6 - plusAmygdalaW18
## Down                              11119                            12043
## NotSig                            11148                             8742
## Up                                13289                            14771
##        plusAmygdalaW6 - plusAmygdalaW24 plusAmygdalaW12 - plusAmygdalaW18
## Down                              12205                                 9
## NotSig                             8139                             35531
## Up                                15212                                16
##        plusAmygdalaW12 - plusAmygdalaW24 plusAmygdalaW18 - plusAmygdalaW24
## Down                                 366                               149
## NotSig                             34713                             35157
## Up                                   477                               250##        lprKidneyW6 - lpr2KidneyW6 lprKidneyW12 - lpr2KidneyW12
## Down                           23                            7
## NotSig                      35517                        35538
## Up                             16                           11
##        lprKidneyW18 - lpr2KidneyW18
## Down                            125
## NotSig                        35194
## Up                              237##        lprKidneyW6 - lprKidneyW12 lprKidneyW6 - lprKidneyW18
## Down                          487                        847
## NotSig                      35009                      34382
## Up                             60                        327
##        lprKidneyW6 - lprKidneyW24 lprKidneyW12 - lprKidneyW18
## Down                          637                          11
## NotSig                      34880                       35536
## Up                             39                           9
##        lprKidneyW12 - lprKidneyW24 lprKidneyW18 - lprKidneyW24
## Down                            32                           5
## NotSig                       35524                       35549
## Up                               0                           2##        lpr2KidneyW6 - lpr2KidneyW12 lpr2KidneyW6 - lpr2KidneyW18
## Down                             59                          751
## NotSig                        35461                        34081
## Up                               36                          724
##        lpr2KidneyW12 - lpr2KidneyW18
## Down                              41
## NotSig                         35495
## Up                                20##        plusKidneyW6 - plusKidneyW12 plusKidneyW6 - plusKidneyW24
## Down                              7                           74
## NotSig                        35543                        35455
## Up                                6                           27
##        plusKidneyW12 - plusKidneyW24
## Down                              14
## NotSig                         35537
## Up                                 5To extract the significantly expressed genes we create a for loop to go through each coefficient (comparison) for each contrast matrix. We then apply the topTable function to extract genes with Log Fold Change above 1.5.
Volcano plots plot the negative logarithm of the p-value on the y-axis and the logarithm of the fold change on the x-axis, with genes that are significantly differentially expressed appearing as points above a threshold line on the plot. This allows for quick visualization of which genes are most significantly differentially expressed, as well as their direction of expression change.
We use the cutoff LFC 1.5 and P-value < 0.1
## ERROR : non-numeric argument to mathematical function## ERROR : non-numeric argument to mathematical function## ERROR : non-numeric argument to mathematical function## ERROR : non-numeric argument to mathematical function## ERROR : non-numeric argument to mathematical function## ERROR : non-numeric argument to mathematical functionHeatmaps are often used to visualize the expression levels of genes across a set of samples. This can be useful for identifying patterns and trends in the data, such as which genes are co-expressed or differentially expressed between groups of samples.
topGO is an R package used to perform Gene Ontology (GO) enrichment analysis. GO enrichment analysis can be used to identify which GO terms are significantly overrepresented in a set of genes of interest. The output from the topGO analysis has been saved in a PDFs and is available in the data folder. Rectangle color represents the relative significant, ranging from dark red (most significant) to bright yellow (least significant).
Reactome is a database of biological pathways, which provides detailed information about the molecular events that take place in cells. We use the ‘ReactomePA’ package for performing enrichment analysis on sets of genes or proteins to identify pathways that are significantly enriched in the data. The output from the Reactome analysis has been saved in a PDFs and is available in the data folder.
The Following code runs through the genes extracted and grabs the NCBI summary/abstract from the NCBI website. However, it is extremely slow taking about 2 seconds per gene and there are close to 80,000 Genes to be referenced. I have left it commented out. You can run it bu removing the #s (ctrl-shift-c).
In the meantime, I’ve explored running the loop in parallel using the NCBI api. It is faster and more efficient. Average run time is 50 seconds per 200 genes using parlapply (parallel version of lapply). You can run the code below by switching which cgene_comp to use.
# # Set the number of cores to use for parallel processing
# # Don't use all your core's your computer WILL crash. I've left 2 for the OS. 
# par_data<- cgene_comp9 #change to comparison of interest
# ncores <- (detectCores()-2)
# 
# # Initialize a parallel backend
# cl <- makeCluster(ncores)
# entrez_ids <- par_data$Entrez_IDs 
# 
# # Use parLapply to run the API calls in parallel
# summary_text <- parLapply(cl, entrez_ids, function(entrez_id) {
#   #pass the required package to the worker nodes
#   library("httr")
#   library("RCurl")
#   library("XML")
#     #api_key generated from NCBI. If you don't have one, you can create an account with NCBI and generate one
#     api_key <- readLines("api.txt")
#     url <- paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=",
#                   entrez_id, "&api_key=", api_key)
#     response <- getURL(url, ssl.verifypeer = FALSE)
#     # Pause for 1 second after each API request limit of 10 requests per second or your API gets banned
#     Sys.sleep(1)
#     # Parse the XML data in the response
#     gene_summary <- xmlParse(response)
#     # Extract the summary information from the XML data
#     # (Note: the exact method for extracting the information will depend on the structure of the XML data)
#     summary_text_list <- xmlToList(gene_summary)
#     summary_text_comp9<- summary_text_list[["DocumentSummarySet"]][["DocumentSummary"]][["Summary"]] #change name to match comparison of interest
# })
# 
# # Stop the parallel backend when done
# stopCluster(cl)The gene ontology annotation takes a long time as well. I have not found an API to make it faster, but a parallel version is a work in progress.
The data extracted from the analysis was exported as excel files for easier visualization outside of R Studio.