#Github integration

Enrichment Data Analysis in R

This data has already been analyzed in Excel. Now, we are going to recruit the power of R to do a deeper analysis.

The raw data was loaded up into R by splitting the master data by week and loading it using the import dataset function. We will utilize the pipe function when selecting the specific tests.

Let’s first load the data and run some simple summaries.

Load Data

Calculate the percent error for time

Calculating this to determine if we need to re-run video in Ethovision and also understand the how the percent error might affect any conclusions we make.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -64.733  -3.789  -1.983  -3.721  -0.875   0.000
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -50.3556  -2.7083  -1.3444  -2.4108  -0.2667   0.0000
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -96.8000  -2.7611  -0.8778  -3.9922  -0.2222   0.0000

Calculate the percent error for time 2022 sample

Percentage error calculations for Pre-enrichment data samples

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -89.38422  -0.40067   0.02523  -0.57817   0.54389  21.92253
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -21.9918  -0.6498  -0.4106  -1.4753  -0.2553   0.2116

Percentage error calculations for Preference Testing

Novel Tank test

Calculate time spent at Novel Tank Zone

Novel Object test

Data wrangling and sorting

Clean up the data based on the necessary values. I’m keeping the Weeks separate so we can compare with the 2022 data easily + one r function away from combining all.

For 2022 samples

Split the data by Retention interval.Cleaned data has all variables as Characters except for Object_Zone_1 & 2. I’ll calculate percent time first for each zone and sort based on object side.

For 2022 data we can use the FO test to measure the object side preference so we won’t split the data based on side. Calculating percent time on each side.

##Preference testing 2022

## 
##  Welch Two Sample t-test
## 
## data:  NOa and NOb
## t = -0.2715, df = 24.658, p-value = 0.7883
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -40.47468  31.05235
## sample estimates:
## mean of x mean of y 
##  46.51645  51.22762
## 
##  Paired t-test
## 
## data:  PrefData22$Percent_time_zone1 and PrefData22$Percent_time_zone2
## t = 34.112, df = 26, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  80.59930 90.93581
## sample estimates:
## mean difference 
##        85.76755

Standard error for time spent on each side

Lets calculate a quick summary and errors for temp1 and same for temp2

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
##      Test              Subject           Week       Sex           
##  Length:28          Min.   : 1.00   Min.   :0   Length:28         
##  Class :character   1st Qu.:18.75   1st Qu.:0   Class :character  
##  Mode  :character   Median :34.50   Median :0   Mode  :character  
##                     Mean   :31.57   Mean   :0                     
##                     3rd Qu.:45.25   3rd Qu.:0                     
##                     Max.   :57.00   Max.   :0                     
##   Novel_object  Treatment         Percent_time_zone2
##  Min.   :2.0   Length:28          Min.   :  4.397   
##  1st Qu.:2.0   Class :character   1st Qu.: 61.863   
##  Median :2.5   Mode  :character   Median : 93.197   
##  Mean   :2.5                      Mean   : 75.405   
##  3rd Qu.:3.0                      3rd Qu.: 98.988   
##  Max.   :3.0                      Max.   :100.000
##      Test              Subject           Week       Sex           
##  Length:29          Min.   : 2.00   Min.   :0   Length:29         
##  Class :character   1st Qu.:14.00   1st Qu.:0   Class :character  
##  Mode  :character   Median :25.00   Median :0   Mode  :character  
##                     Mean   :27.52   Mean   :0                     
##                     3rd Qu.:43.00   3rd Qu.:0                     
##                     Max.   :58.00   Max.   :0                     
##   Novel_object    Treatment         Percent_time_zone1
##  Min.   :2.000   Length:29          Min.   :  0.00    
##  1st Qu.:2.000   Class :character   1st Qu.: 28.36    
##  Median :3.000   Mode  :character   Median : 54.29    
##  Mean   :2.724                      Mean   : 56.33    
##  3rd Qu.:3.000                      3rd Qu.: 95.66    
##  Max.   :3.000                      Max.   :100.00

Lets calculate a quick summary and errors for temp1 and same for temp2 for week 1

##      Test              Subject           Week       Sex           
##  Length:28          Min.   : 2.00   Min.   :1   Length:28         
##  Class :character   1st Qu.:12.00   1st Qu.:1   Class :character  
##  Mode  :character   Median :30.00   Median :1   Mode  :character  
##                     Mean   :29.68   Mean   :1                     
##                     3rd Qu.:48.50   3rd Qu.:1                     
##                     Max.   :58.00   Max.   :1                     
##  Novel_object        Treatment         Percent_time_zone2
##  Length:28          Length:28          Min.   : 5.716    
##  Class :character   Class :character   1st Qu.:32.219    
##  Mode  :character   Mode  :character   Median :63.890    
##                                        Mean   :57.918    
##                                        3rd Qu.:81.999    
##                                        Max.   :97.814
##      Test              Subject           Week       Sex           
##  Length:29          Min.   : 1.00   Min.   :1   Length:29         
##  Class :character   1st Qu.:16.00   1st Qu.:1   Class :character  
##  Mode  :character   Median :28.00   Median :1   Mode  :character  
##                     Mean   :29.31   Mean   :1                     
##                     3rd Qu.:43.00   3rd Qu.:1                     
##                     Max.   :54.00   Max.   :1                     
##  Novel_object        Treatment         Percent_time_zone1
##  Length:29          Length:29          Min.   :  0.00    
##  Class :character   Class :character   1st Qu.: 30.41    
##  Mode  :character   Mode  :character   Median : 54.88    
##                                        Mean   : 56.69    
##                                        3rd Qu.: 89.79    
##                                        Max.   :100.00

Lets calculate a quick summary and errors for temp1 and same for temp2 for week 2

##      Test              Subject           Week       Sex           
##  Length:31          Min.   : 2.00   Min.   :2   Length:31         
##  Class :character   1st Qu.:10.50   1st Qu.:2   Class :character  
##  Mode  :character   Median :26.00   Median :2   Mode  :character  
##                     Mean   :25.42   Mean   :2                     
##                     3rd Qu.:38.00   3rd Qu.:2                     
##                     Max.   :50.00   Max.   :2                     
##  Novel_object        Treatment         Percent_time_zone2
##  Length:31          Length:31          Min.   :  0.00    
##  Class :character   Class :character   1st Qu.: 37.87    
##  Mode  :character   Mode  :character   Median : 81.85    
##                                        Mean   : 65.49    
##                                        3rd Qu.: 93.19    
##                                        Max.   :100.00
##      Test              Subject           Week       Sex           
##  Length:27          Min.   : 1.00   Min.   :2   Length:27         
##  Class :character   1st Qu.:19.50   1st Qu.:2   Class :character  
##  Mode  :character   Median :36.00   Median :2   Mode  :character  
##                     Mean   :34.19   Mean   :2                     
##                     3rd Qu.:51.50   3rd Qu.:2                     
##                     Max.   :58.00   Max.   :2                     
##  Novel_object        Treatment         Percent_time_zone1 
##  Length:27          Length:27          Min.   :  0.04151  
##  Class :character   Class :character   1st Qu.: 24.67149  
##  Mode  :character   Mode  :character   Median : 54.50628  
##                                        Mean   : 55.14182  
##                                        3rd Qu.: 79.07906  
##                                        Max.   :100.00000
grid.arrange(g1, g2, ncol = 2)   #WEEK 0

grid.arrange(g1W1, g2W1, ncol = 2)   #WEEK 1

grid.arrange(g1W2, g2W2, ncol = 2)   #WEEK 2

Lets calculate a quick summary and errors for FO vs NOa and FO vs NOb for pre-enrichment (w0) 2022

##     Treatment       n           mean             sd              se       
##  Control :1   Min.   :16   Min.   :47.82   Min.   :34.76   Min.   :7.773  
##  Enriched:1   1st Qu.:17   1st Qu.:48.72   1st Qu.:34.98   1st Qu.:8.056  
##               Median :18   Median :49.61   Median :35.19   Median :8.339  
##               Mean   :18   Mean   :49.61   Mean   :35.19   Mean   :8.339  
##               3rd Qu.:19   3rd Qu.:50.51   3rd Qu.:35.41   3rd Qu.:8.622  
##               Max.   :20   Max.   :51.41   Max.   :35.62   Max.   :8.905  
##        ic       
##  Min.   :16.27  
##  1st Qu.:16.95  
##  Median :17.63  
##  Mean   :17.63  
##  3rd Qu.:18.30  
##  Max.   :18.98
##     Treatment       n              mean             sd              se       
##  Control :1   Min.   :15.00   Min.   :43.98   Min.   :31.18   Min.   :6.973  
##  Enriched:1   1st Qu.:16.25   1st Qu.:47.63   1st Qu.:31.46   1st Qu.:7.313  
##               Median :17.50   Median :51.29   Median :31.73   Median :7.654  
##               Mean   :17.50   Mean   :51.29   Mean   :31.73   Mean   :7.654  
##               3rd Qu.:18.75   3rd Qu.:54.95   3rd Qu.:32.00   3rd Qu.:7.994  
##               Max.   :20.00   Max.   :58.61   Max.   :32.28   Max.   :8.334  
##        ic       
##  Min.   :14.59  
##  1st Qu.:15.41  
##  Median :16.23  
##  Mean   :16.23  
##  3rd Qu.:17.05  
##  Max.   :17.87
grid.arrange(g1W022, g2W022, ncol = 2)

Side preference grouped by sex

Week 0

## Warning in wilcox.test.default(c(35.2607019679448, 21.3649851632047,
## 32.0754716981132, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(92.9944095038435, 24.8190719455087,
## 94.7269081042822, : cannot compute exact p-value with ties

Week 1

## Warning in wilcox.test.default(c(55.6015586945933, 99.7462465637556,
## 54.8819044076623, : cannot compute exact p-value with ties

Week 2

## Warning in wilcox.test.default(c(30.1065043742868, 9.77777777777778,
## 99.8336106489185, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(70.8781869688385, 10.4862119013062,
## 99.1807963421604, : cannot compute exact p-value with ties

preenrichment 22’

# Pre enrichment:  preference for side grouped by sex
W0_NO22 %>% filter(Novel_Object_Side == 'Right') %>%
  ggplot( aes(x= Sex, y= Percent_time_zone1, fill = Treatment))+
  geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6)  +
  geom_signif(comparisons = list(c("F", "M")), 
              map_signif_level=TRUE)+
    geom_jitter(color="black", size=0.4, alpha=0.9)    +
  labs(title = "Plot of time spent with Familiar Object at zone 1 pre-enrichment")
## Warning in wilcox.test.default(c(54.5454297915598, 0, 25.1336675623235, :
## cannot compute exact p-value with ties

W0_NO22 %>% filter(Novel_Object_Side == 'Left')%>%
  ggplot( aes(x= Sex, y= Percent_time_zone2, fill = Treatment))+
  geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6)  +
  geom_signif(comparisons = list(c("F", "M")), 
              map_signif_level=TRUE)+
    geom_jitter(color="black", size=0.4, alpha=0.9) +
  labs(title = "Plot of time spent with Familiar Object at zone 2 pre-enrichment")
## Warning in wilcox.test.default(c(100, 100, 86.8995687475095, 43.3333083443285,
## : cannot compute exact p-value with ties

Novel Object prefrence comparision

Grouped by Sex

# pre-enrichment 

W0_22_combined %>%
  ggplot( aes(x= Sex, y= Percent_time_obj, fill = Treatment))+
  geom_violin(width=1) + 
  geom_boxplot(width=1, color="black", alpha=0.2) +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
  labs(title = "Plot of time spent with FO Vs. NOa pre-enrichment")+
    geom_jitter(color="black", size=0.4, alpha=0.9) 

Grouped by Novel Object

W0_22_combined %>%
  ggplot( aes(x= Test, y= Percent_time_obj, fill = Treatment))+
  geom_violin(width=1) + 
  geom_boxplot(width=1, color="black", alpha=0.2) +
  
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
  labs(title = "Plot of time spent with Familiar Objects pre-enrichment")+
    geom_jitter(color="black", size=0.4, alpha=0.9) 

T-TEST

T-TEST side preference

## 
##  Paired t-test
## 
## data:  W0_NO$Percent_time_zone1 and W0_NO$Percent_time_zone2
## t = -1.9057, df = 56, p-value = 0.06183
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -37.9842017   0.9481825
## sample estimates:
## mean difference 
##       -18.51801
## 
##  Paired t-test
## 
## data:  W1_NO$Percent_time_zone1 and W1_NO$Percent_time_zone2
## t = -0.11271, df = 56, p-value = 0.9107
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -18.25094  16.30653
## sample estimates:
## mean difference 
##      -0.9722035
## 
##  Paired t-test
## 
## data:  W2_NO$Percent_time_zone1 and W2_NO$Percent_time_zone2
## t = -1.273, df = 57, p-value = 0.2082
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -30.295217   6.746863
## sample estimates:
## mean difference 
##       -11.77418

pre enrichment

## 
##  Paired t-test
## 
## data:  W0_NO22$Percent_time_zone1 and W0_NO22$Percent_time_zone2
## t = -0.14096, df = 70, p-value = 0.8883
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -14.77579  12.82505
## sample estimates:
## mean difference 
##       -0.975369

T-TEST object preference

## 
##  Welch Two Sample t-test
## 
## data:  NOa and NOb
## t = -0.8952, df = 42.695, p-value = 0.3757
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -27.67475  10.66111
## sample estimates:
## mean of x mean of y 
##  60.47659  68.98341
## 
##  Welch Two Sample t-test
## 
## data:  NOa and NOb
## t = -2.1384, df = 55, p-value = 0.03695
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -33.73552  -1.09384
## sample estimates:
## mean of x mean of y 
##  48.43292  65.84760
## 
##  Welch Two Sample t-test
## 
## data:  NOa and NOb
## t = -0.64883, df = 54.501, p-value = 0.5192
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -23.94469  12.23391
## sample estimates:
## mean of x mean of y 
##  57.64565  63.50105

Pre enrichment Control vs Enriched RI 1

E22<-W0_22_combined %>% filter(Treatment == 'Enriched') %>% select(Percent_time_obj)
C22<-W0_22_combined%>% filter(Treatment == 'Control')  %>% select(Percent_time_obj)
t.test(E22,C22)
## 
##  Welch Two Sample t-test
## 
## data:  E22 and C22
## t = -0.66817, df = 63.571, p-value = 0.5064
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -21.33468  10.64128
## sample estimates:
## mean of x mean of y 
##  47.69234  53.03904

T-TEST Novel Tank Test

##          Top-Zone Middle-Zone Bottom-Zone
## Week-0 0.07425683  0.39094419  0.17451771
## Week-1 0.23510452  0.88198472  0.57619875
## Week-2 0.08668103  0.08771739  0.01791661

RI1 Vs. RI6

## [1] "factor"

Pre-enrichment RI6

Paired charts

## Warning in geom_boxplot(aes(fill = Novel_object), alpha = 0.2, ylim = c(0, :
## Ignoring unknown parameters: `ylim`

Preenrichment Paired analysis

Applying Lego Paper

The primary study design we attempted to replicate in this experiment is from a paper by May et. al. tittled “Object recognition memory in zebrafish”. Link: https://pubmed.ncbi.nlm.nih.gov/26376244/

The discrimination indices were statistically compared to a theoretical mean of 0 for D1 and D2 and a mean of 0.5 for D3. A significant difference from this theoretical mean represents a preference for the familiar or novel object in T2, with a positive value indicating a preference for the familiar object and a negative index indicating a preference for the novel object (May et al., 2016).

## 
##  Paired t-test
## 
## data:  W0_Lego$D1 and W0_Lego$E1
## t = -4.3153, df = 56, p-value = 6.548e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -76.86978 -28.12812
## sample estimates:
## mean difference 
##       -52.49895
## 
##  Pearson's product-moment correlation
## 
## data:  W0_Lego$D1 and W0_Lego$E1
## t = 3.6755, df = 55, p-value = 0.0005398
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2075037 0.6315533
## sample estimates:
##       cor 
## 0.4440615
## `geom_smooth()` using formula = 'y ~ x'

## 
##  Paired t-test
## 
## data:  W1_Lego$D1 and W1_Lego$E1
## t = -5.7058, df = 56, p-value = 4.552e-07
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -115.81231  -55.62348
## sample estimates:
## mean difference 
##       -85.71789
## 
##  Pearson's product-moment correlation
## 
## data:  W1_Lego$D1 and W1_Lego$E1
## t = -0.31819, df = 55, p-value = 0.7515
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3000808  0.2201619
## sample estimates:
##         cor 
## -0.04286475
## `geom_smooth()` using formula = 'y ~ x'

#Other analysis: ` ## Principle component analysis

W0_numeric <- W0_Lego %>% select(E1,E2,D1,D2,D3) 
W0_pca<- prcomp(W0_numeric, center = TRUE,scale. = TRUE)
## Importance of components:
##                           PC1    PC2     PC3    PC4      PC5
## Standard deviation     1.7272 1.2062 0.59649 0.4539 4.47e-16
## Proportion of Variance 0.5967 0.2910 0.07116 0.0412 0.00e+00
## Cumulative Proportion  0.5967 0.8876 0.95880 1.0000 1.00e+00
##           PC1          PC2         PC3        PC4           PC5
## E1 -0.3345125 -0.566405650 -0.74232997 -0.1274058  0.000000e+00
## E2 -0.2820646 -0.650343647  0.54687254  0.4454471 -2.264968e-16
## D1 -0.5279654  0.008550449  0.36314403 -0.7676626  4.264310e-16
## D2 -0.5146812  0.357885925 -0.09487639  0.3130805  7.071068e-01
## D3 -0.5146812  0.357885925 -0.09487639  0.3130805 -7.071068e-01

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

Scree Plots:

#calculate total variance explained by each principal component
var_explained <- W0_pca$sdev^2 / sum(W0_pca$sdev^2)
?ggplot
#create scree plot
qplot(c(1:length(var_explained)), var_explained) + 
  geom_line() + 
  xlab("Principal Component (Dimension)") + 
  ylab("Variance Explained") +
  ggtitle("Scree Plot") +
  ylim(0, 1)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

#Moving Forward:

This document (R-Markdown file) provides all the necessary code to replicate this analysis on a different data set. However, the error calculation at the top should ALWAYS be performed first in order to avoid analyzing biased data. In addition, this analysis ios far from over. There should be a repeated trial with a larger sample to confirm findings.

After discussing how much error is acceptable for your question, you can go ahead with the analysis. Load up the data using readxl or readcsv depending on the file type or read URL if online. I would recommend copying the code I used before modifying it and then replacing the names with your data. You should mostly just be replacing the names. The formula should remain the same for the most part.

If you want to modify something you can check what it does first by typing: ?TERM, i.e, ?ggplot would bring the r documentation for ggplot.

Helpful links: - info for graphs: https://r-graph-gallery.com/ggplot2-package.html - How to pretty much anything in R: https://www.statology.org/r-guides/ - Questions: https://www.geeksforgeeks.org/r-tutorial/ or stack overflow - tip while Googling add ‘how to … in r’ so it is r specific and doesn’t return python/pandas