#Github integration
This data has already been analyzed in Excel. Now, we are going to recruit the power of R to do a deeper analysis.
The raw data was loaded up into R by splitting the master data by week and loading it using the import dataset function. We will utilize the pipe function when selecting the specific tests.
Let’s first load the data and run some simple summaries.
Calculating this to determine if we need to re-run video in Ethovision and also understand the how the percent error might affect any conclusions we make.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -64.733 -3.789 -1.983 -3.721 -0.875 0.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -50.3556 -2.7083 -1.3444 -2.4108 -0.2667 0.0000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -96.8000 -2.7611 -0.8778 -3.9922 -0.2222 0.0000
Percentage error calculations for Pre-enrichment data samples
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -89.38422 -0.40067 0.02523 -0.57817 0.54389 21.92253
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -21.9918 -0.6498 -0.4106 -1.4753 -0.2553 0.2116
Percentage error calculations for Preference Testing
Clean up the data based on the necessary values. I’m keeping the Weeks separate so we can compare with the 2022 data easily + one r function away from combining all.
For 2022 samples
Split the data by Retention interval.Cleaned data has all variables as Characters except for Object_Zone_1 & 2. I’ll calculate percent time first for each zone and sort based on object side.
For 2022 data we can use the FO test to measure the object side preference so we won’t split the data based on side. Calculating percent time on each side.
##Preference testing 2022
##
## Welch Two Sample t-test
##
## data: NOa and NOb
## t = -0.2715, df = 24.658, p-value = 0.7883
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -40.47468 31.05235
## sample estimates:
## mean of x mean of y
## 46.51645 51.22762
##
## Paired t-test
##
## data: PrefData22$Percent_time_zone1 and PrefData22$Percent_time_zone2
## t = 34.112, df = 26, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 80.59930 90.93581
## sample estimates:
## mean difference
## 85.76755
Lets calculate a quick summary and errors for temp1 and same for temp2
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Test Subject Week Sex
## Length:28 Min. : 1.00 Min. :0 Length:28
## Class :character 1st Qu.:18.75 1st Qu.:0 Class :character
## Mode :character Median :34.50 Median :0 Mode :character
## Mean :31.57 Mean :0
## 3rd Qu.:45.25 3rd Qu.:0
## Max. :57.00 Max. :0
## Novel_object Treatment Percent_time_zone2
## Min. :2.0 Length:28 Min. : 4.397
## 1st Qu.:2.0 Class :character 1st Qu.: 61.863
## Median :2.5 Mode :character Median : 93.197
## Mean :2.5 Mean : 75.405
## 3rd Qu.:3.0 3rd Qu.: 98.988
## Max. :3.0 Max. :100.000
## Test Subject Week Sex
## Length:29 Min. : 2.00 Min. :0 Length:29
## Class :character 1st Qu.:14.00 1st Qu.:0 Class :character
## Mode :character Median :25.00 Median :0 Mode :character
## Mean :27.52 Mean :0
## 3rd Qu.:43.00 3rd Qu.:0
## Max. :58.00 Max. :0
## Novel_object Treatment Percent_time_zone1
## Min. :2.000 Length:29 Min. : 0.00
## 1st Qu.:2.000 Class :character 1st Qu.: 28.36
## Median :3.000 Mode :character Median : 54.29
## Mean :2.724 Mean : 56.33
## 3rd Qu.:3.000 3rd Qu.: 95.66
## Max. :3.000 Max. :100.00
Lets calculate a quick summary and errors for temp1 and same for temp2 for week 1
## Test Subject Week Sex
## Length:28 Min. : 2.00 Min. :1 Length:28
## Class :character 1st Qu.:12.00 1st Qu.:1 Class :character
## Mode :character Median :30.00 Median :1 Mode :character
## Mean :29.68 Mean :1
## 3rd Qu.:48.50 3rd Qu.:1
## Max. :58.00 Max. :1
## Novel_object Treatment Percent_time_zone2
## Length:28 Length:28 Min. : 5.716
## Class :character Class :character 1st Qu.:32.219
## Mode :character Mode :character Median :63.890
## Mean :57.918
## 3rd Qu.:81.999
## Max. :97.814
## Test Subject Week Sex
## Length:29 Min. : 1.00 Min. :1 Length:29
## Class :character 1st Qu.:16.00 1st Qu.:1 Class :character
## Mode :character Median :28.00 Median :1 Mode :character
## Mean :29.31 Mean :1
## 3rd Qu.:43.00 3rd Qu.:1
## Max. :54.00 Max. :1
## Novel_object Treatment Percent_time_zone1
## Length:29 Length:29 Min. : 0.00
## Class :character Class :character 1st Qu.: 30.41
## Mode :character Mode :character Median : 54.88
## Mean : 56.69
## 3rd Qu.: 89.79
## Max. :100.00
Lets calculate a quick summary and errors for temp1 and same for temp2 for week 2
## Test Subject Week Sex
## Length:31 Min. : 2.00 Min. :2 Length:31
## Class :character 1st Qu.:10.50 1st Qu.:2 Class :character
## Mode :character Median :26.00 Median :2 Mode :character
## Mean :25.42 Mean :2
## 3rd Qu.:38.00 3rd Qu.:2
## Max. :50.00 Max. :2
## Novel_object Treatment Percent_time_zone2
## Length:31 Length:31 Min. : 0.00
## Class :character Class :character 1st Qu.: 37.87
## Mode :character Mode :character Median : 81.85
## Mean : 65.49
## 3rd Qu.: 93.19
## Max. :100.00
## Test Subject Week Sex
## Length:27 Min. : 1.00 Min. :2 Length:27
## Class :character 1st Qu.:19.50 1st Qu.:2 Class :character
## Mode :character Median :36.00 Median :2 Mode :character
## Mean :34.19 Mean :2
## 3rd Qu.:51.50 3rd Qu.:2
## Max. :58.00 Max. :2
## Novel_object Treatment Percent_time_zone1
## Length:27 Length:27 Min. : 0.04151
## Class :character Class :character 1st Qu.: 24.67149
## Mode :character Mode :character Median : 54.50628
## Mean : 55.14182
## 3rd Qu.: 79.07906
## Max. :100.00000
grid.arrange(g1, g2, ncol = 2) #WEEK 0
grid.arrange(g1W1, g2W1, ncol = 2) #WEEK 1
grid.arrange(g1W2, g2W2, ncol = 2) #WEEK 2
Lets calculate a quick summary and errors for FO vs NOa and FO vs NOb for pre-enrichment (w0) 2022
## Treatment n mean sd se
## Control :1 Min. :16 Min. :47.82 Min. :34.76 Min. :7.773
## Enriched:1 1st Qu.:17 1st Qu.:48.72 1st Qu.:34.98 1st Qu.:8.056
## Median :18 Median :49.61 Median :35.19 Median :8.339
## Mean :18 Mean :49.61 Mean :35.19 Mean :8.339
## 3rd Qu.:19 3rd Qu.:50.51 3rd Qu.:35.41 3rd Qu.:8.622
## Max. :20 Max. :51.41 Max. :35.62 Max. :8.905
## ic
## Min. :16.27
## 1st Qu.:16.95
## Median :17.63
## Mean :17.63
## 3rd Qu.:18.30
## Max. :18.98
## Treatment n mean sd se
## Control :1 Min. :15.00 Min. :43.98 Min. :31.18 Min. :6.973
## Enriched:1 1st Qu.:16.25 1st Qu.:47.63 1st Qu.:31.46 1st Qu.:7.313
## Median :17.50 Median :51.29 Median :31.73 Median :7.654
## Mean :17.50 Mean :51.29 Mean :31.73 Mean :7.654
## 3rd Qu.:18.75 3rd Qu.:54.95 3rd Qu.:32.00 3rd Qu.:7.994
## Max. :20.00 Max. :58.61 Max. :32.28 Max. :8.334
## ic
## Min. :14.59
## 1st Qu.:15.41
## Median :16.23
## Mean :16.23
## 3rd Qu.:17.05
## Max. :17.87
grid.arrange(g1W022, g2W022, ncol = 2)
## Warning in wilcox.test.default(c(35.2607019679448, 21.3649851632047,
## 32.0754716981132, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(92.9944095038435, 24.8190719455087,
## 94.7269081042822, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(55.6015586945933, 99.7462465637556,
## 54.8819044076623, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(30.1065043742868, 9.77777777777778,
## 99.8336106489185, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(70.8781869688385, 10.4862119013062,
## 99.1807963421604, : cannot compute exact p-value with ties
# Pre enrichment: preference for side grouped by sex
W0_NO22 %>% filter(Novel_Object_Side == 'Right') %>%
ggplot( aes(x= Sex, y= Percent_time_zone1, fill = Treatment))+
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_signif(comparisons = list(c("F", "M")),
map_signif_level=TRUE)+
geom_jitter(color="black", size=0.4, alpha=0.9) +
labs(title = "Plot of time spent with Familiar Object at zone 1 pre-enrichment")
## Warning in wilcox.test.default(c(54.5454297915598, 0, 25.1336675623235, :
## cannot compute exact p-value with ties
W0_NO22 %>% filter(Novel_Object_Side == 'Left')%>%
ggplot( aes(x= Sex, y= Percent_time_zone2, fill = Treatment))+
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_signif(comparisons = list(c("F", "M")),
map_signif_level=TRUE)+
geom_jitter(color="black", size=0.4, alpha=0.9) +
labs(title = "Plot of time spent with Familiar Object at zone 2 pre-enrichment")
## Warning in wilcox.test.default(c(100, 100, 86.8995687475095, 43.3333083443285,
## : cannot compute exact p-value with ties
# pre-enrichment
W0_22_combined %>%
ggplot( aes(x= Sex, y= Percent_time_obj, fill = Treatment))+
geom_violin(width=1) +
geom_boxplot(width=1, color="black", alpha=0.2) +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
labs(title = "Plot of time spent with FO Vs. NOa pre-enrichment")+
geom_jitter(color="black", size=0.4, alpha=0.9)
W0_22_combined %>%
ggplot( aes(x= Test, y= Percent_time_obj, fill = Treatment))+
geom_violin(width=1) +
geom_boxplot(width=1, color="black", alpha=0.2) +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
labs(title = "Plot of time spent with Familiar Objects pre-enrichment")+
geom_jitter(color="black", size=0.4, alpha=0.9)
##
## Paired t-test
##
## data: W0_NO$Percent_time_zone1 and W0_NO$Percent_time_zone2
## t = -1.9057, df = 56, p-value = 0.06183
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -37.9842017 0.9481825
## sample estimates:
## mean difference
## -18.51801
##
## Paired t-test
##
## data: W1_NO$Percent_time_zone1 and W1_NO$Percent_time_zone2
## t = -0.11271, df = 56, p-value = 0.9107
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -18.25094 16.30653
## sample estimates:
## mean difference
## -0.9722035
##
## Paired t-test
##
## data: W2_NO$Percent_time_zone1 and W2_NO$Percent_time_zone2
## t = -1.273, df = 57, p-value = 0.2082
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -30.295217 6.746863
## sample estimates:
## mean difference
## -11.77418
pre enrichment
##
## Paired t-test
##
## data: W0_NO22$Percent_time_zone1 and W0_NO22$Percent_time_zone2
## t = -0.14096, df = 70, p-value = 0.8883
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -14.77579 12.82505
## sample estimates:
## mean difference
## -0.975369
##
## Welch Two Sample t-test
##
## data: NOa and NOb
## t = -0.8952, df = 42.695, p-value = 0.3757
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -27.67475 10.66111
## sample estimates:
## mean of x mean of y
## 60.47659 68.98341
##
## Welch Two Sample t-test
##
## data: NOa and NOb
## t = -2.1384, df = 55, p-value = 0.03695
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -33.73552 -1.09384
## sample estimates:
## mean of x mean of y
## 48.43292 65.84760
##
## Welch Two Sample t-test
##
## data: NOa and NOb
## t = -0.64883, df = 54.501, p-value = 0.5192
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -23.94469 12.23391
## sample estimates:
## mean of x mean of y
## 57.64565 63.50105
Pre enrichment Control vs Enriched RI 1
E22<-W0_22_combined %>% filter(Treatment == 'Enriched') %>% select(Percent_time_obj)
C22<-W0_22_combined%>% filter(Treatment == 'Control') %>% select(Percent_time_obj)
t.test(E22,C22)
##
## Welch Two Sample t-test
##
## data: E22 and C22
## t = -0.66817, df = 63.571, p-value = 0.5064
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.33468 10.64128
## sample estimates:
## mean of x mean of y
## 47.69234 53.03904
## Top-Zone Middle-Zone Bottom-Zone
## Week-0 0.07425683 0.39094419 0.17451771
## Week-1 0.23510452 0.88198472 0.57619875
## Week-2 0.08668103 0.08771739 0.01791661
## [1] "factor"
Pre-enrichment RI6
Paired charts
## Warning in geom_boxplot(aes(fill = Novel_object), alpha = 0.2, ylim = c(0, :
## Ignoring unknown parameters: `ylim`
Preenrichment Paired analysis
The primary study design we attempted to replicate in this experiment is from a paper by May et. al. tittled “Object recognition memory in zebrafish”. Link: https://pubmed.ncbi.nlm.nih.gov/26376244/
The discrimination indices were statistically compared to a theoretical mean of 0 for D1 and D2 and a mean of 0.5 for D3. A significant difference from this theoretical mean represents a preference for the familiar or novel object in T2, with a positive value indicating a preference for the familiar object and a negative index indicating a preference for the novel object (May et al., 2016).
##
## Paired t-test
##
## data: W0_Lego$D1 and W0_Lego$E1
## t = -4.3153, df = 56, p-value = 6.548e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -76.86978 -28.12812
## sample estimates:
## mean difference
## -52.49895
##
## Pearson's product-moment correlation
##
## data: W0_Lego$D1 and W0_Lego$E1
## t = 3.6755, df = 55, p-value = 0.0005398
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2075037 0.6315533
## sample estimates:
## cor
## 0.4440615
## `geom_smooth()` using formula = 'y ~ x'
##
## Paired t-test
##
## data: W1_Lego$D1 and W1_Lego$E1
## t = -5.7058, df = 56, p-value = 4.552e-07
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -115.81231 -55.62348
## sample estimates:
## mean difference
## -85.71789
##
## Pearson's product-moment correlation
##
## data: W1_Lego$D1 and W1_Lego$E1
## t = -0.31819, df = 55, p-value = 0.7515
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3000808 0.2201619
## sample estimates:
## cor
## -0.04286475
## `geom_smooth()` using formula = 'y ~ x'
#Other analysis: ` ## Principle component analysis
W0_numeric <- W0_Lego %>% select(E1,E2,D1,D2,D3)
W0_pca<- prcomp(W0_numeric, center = TRUE,scale. = TRUE)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 1.7272 1.2062 0.59649 0.4539 4.47e-16
## Proportion of Variance 0.5967 0.2910 0.07116 0.0412 0.00e+00
## Cumulative Proportion 0.5967 0.8876 0.95880 1.0000 1.00e+00
## PC1 PC2 PC3 PC4 PC5
## E1 -0.3345125 -0.566405650 -0.74232997 -0.1274058 0.000000e+00
## E2 -0.2820646 -0.650343647 0.54687254 0.4454471 -2.264968e-16
## D1 -0.5279654 0.008550449 0.36314403 -0.7676626 4.264310e-16
## D2 -0.5146812 0.357885925 -0.09487639 0.3130805 7.071068e-01
## D3 -0.5146812 0.357885925 -0.09487639 0.3130805 -7.071068e-01
## Warning in FUN(newX[, i], ...): NAs introduced by coercion
## Warning in FUN(newX[, i], ...): NAs introduced by coercion
## Warning in FUN(newX[, i], ...): NAs introduced by coercion
## Warning in FUN(newX[, i], ...): NAs introduced by coercion
## Warning in FUN(newX[, i], ...): NAs introduced by coercion
## Warning in FUN(newX[, i], ...): NAs introduced by coercion
#calculate total variance explained by each principal component
var_explained <- W0_pca$sdev^2 / sum(W0_pca$sdev^2)
?ggplot
#create scree plot
qplot(c(1:length(var_explained)), var_explained) +
geom_line() +
xlab("Principal Component (Dimension)") +
ylab("Variance Explained") +
ggtitle("Scree Plot") +
ylim(0, 1)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
#Moving Forward:
This document (R-Markdown file) provides all the necessary code to replicate this analysis on a different data set. However, the error calculation at the top should ALWAYS be performed first in order to avoid analyzing biased data. In addition, this analysis ios far from over. There should be a repeated trial with a larger sample to confirm findings.
After discussing how much error is acceptable for your question, you can go ahead with the analysis. Load up the data using readxl or readcsv depending on the file type or read URL if online. I would recommend copying the code I used before modifying it and then replacing the names with your data. You should mostly just be replacing the names. The formula should remain the same for the most part.
If you want to modify something you can check what it does first by typing: ?TERM, i.e, ?ggplot would bring the r documentation for ggplot.
Helpful links: - info for graphs: https://r-graph-gallery.com/ggplot2-package.html - How to pretty much anything in R: https://www.statology.org/r-guides/ - Questions: https://www.geeksforgeeks.org/r-tutorial/ or stack overflow - tip while Googling add ‘how to … in r’ so it is r specific and doesn’t return python/pandas