1.2 - Tutorial material and case study
Within this tutorial, we use data from a comparative transcriptomics case study on the species Sparus aurata, previously published in Pérez-Sánchez et al. (2019). The tutorial material consists of nine RNA-seq samples from spleen biopsies of S. aurata specimens, which were separated into two groups: control fish (BC, n = 4) and parasite-infected fish (BI, n = 5). Table 1 lists the SRA accession and library name of each of the nine fastq files, together with the tag that assigns each sample to its group.
Table 1: Samples and case study groups
SRA accession | Library name | Tag |
---|---|---|
SRR8255970 | ZFG-17-12_03_26333_S7_R1_001.fastq | BC1 |
SRR8255963 | ZFG-17-12_06_26336_S10_R1_001.fastq | BC2 |
SRR8255962 | ZFG-17-12_09_26339_S13_R1_001.fastq | BC3 |
SRR8255949 | ZFG-17-12_12_26342_S16_R1_001.fastq | BC4 |
SRR8255945 | ZFG-17-12_16_26346_S2_R1_001.fastq | BI1 |
SRR8255941 | ZFG-17-12_20_26350_S6_R1_001.fastq | BI2 |
SRR8255956 | ZFG-17-12_24_26354_S10_R1_001.fastq | BI3 |
SRR8255952 | ZFG-17-12_28_26358_S14_R1_001.fastq | BI4 |
SRR8255939 | ZFG-17-12_32_26362_S18_R1_001.fastq | BI5 |
*BC = control fish; BI = infected fish.
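Downstream differential expression steps usually need the sample-to-group assignment in machine-readable form. The sketch below transcribes Table 1 into a small Python sample sheet and derives the group from the tag prefix (BC or BI). The CSV column names (`accession`, `library`, `tag`, `group`) are illustrative choices, not a format required by any of the tools used in this tutorial.

```python
import csv
import io

# Table 1 transcribed as (SRA accession, library name, tag).
SAMPLES = [
    ("SRR8255970", "ZFG-17-12_03_26333_S7_R1_001.fastq", "BC1"),
    ("SRR8255963", "ZFG-17-12_06_26336_S10_R1_001.fastq", "BC2"),
    ("SRR8255962", "ZFG-17-12_09_26339_S13_R1_001.fastq", "BC3"),
    ("SRR8255949", "ZFG-17-12_12_26342_S16_R1_001.fastq", "BC4"),
    ("SRR8255945", "ZFG-17-12_16_26346_S2_R1_001.fastq", "BI1"),
    ("SRR8255941", "ZFG-17-12_20_26350_S6_R1_001.fastq", "BI2"),
    ("SRR8255956", "ZFG-17-12_24_26354_S10_R1_001.fastq", "BI3"),
    ("SRR8255952", "ZFG-17-12_28_26358_S14_R1_001.fastq", "BI4"),
    ("SRR8255939", "ZFG-17-12_32_26362_S18_R1_001.fastq", "BI5"),
]

# Tag prefix -> experimental group.
GROUPS = {"BC": "control", "BI": "infected"}


def group_of(tag: str) -> str:
    """Derive the experimental group from the first two letters of the tag."""
    return GROUPS[tag[:2]]


def sample_sheet_csv() -> str:
    """Return a CSV sample sheet (header plus one row per sample)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["accession", "library", "tag", "group"])  # illustrative column names
    for acc, lib, tag in SAMPLES:
        writer.writerow([acc, lib, tag, group_of(tag)])
    return buf.getvalue()


if __name__ == "__main__":
    print(sample_sheet_csv(), end="")
```

Deriving the group from the tag, rather than typing it a second time, keeps the group sizes (4 controls, 5 infected) consistent with Table 1 by construction.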
The nine fastq files can be downloaded from NCBI (BioProject PRJNA507368) at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA507368. If you need help downloading this material from NCBI, contact us for support at https://forum.biotechvana.com.
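One common way to fetch runs by accession is the `fasterq-dump` command of the NCBI SRA Toolkit. The sketch below assumes the SRA Toolkit is installed and on your `PATH`; it only builds and prints one command per accession in Table 1 unless you explicitly disable the dry run. The output directory name `fastq` is a hypothetical choice.

```python
import shlex
import subprocess

# SRA run accessions from Table 1 (BC1-BC4, then BI1-BI5).
ACCESSIONS = [
    "SRR8255970", "SRR8255963", "SRR8255962", "SRR8255949",
    "SRR8255945", "SRR8255941", "SRR8255956", "SRR8255952", "SRR8255939",
]


def download_commands(outdir: str = "fastq") -> list[list[str]]:
    """Build one fasterq-dump command per run; outdir is a hypothetical folder name."""
    return [["fasterq-dump", "--outdir", outdir, acc] for acc in ACCESSIONS]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print every command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops at the first failed download


if __name__ == "__main__":
    run(download_commands())  # pass dry_run=False to actually download
```

The dry-run default lets you inspect the commands (or paste them into a shell script) before starting nine potentially large downloads.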
RefSeq material: to complete the tutorial, you will need the following reference files:
- The genome assembly draft of S. aurata (fSpaAur1.1, Torre de la Sal release), used as the reference genome sequence in the Tophat/Hisat2 & Cufflinks protocol.
- The GTF file with the annotation of the coding genes of the fSpaAur1.1 release, used as the reference genome annotation in the Tophat/Hisat2 & Cufflinks protocol.
- The RefSeq file of S. aurata transcripts (fSpaAur1.1 release), used as the reference transcriptome in the Mapping & counting protocol.
- A csv file with the functional descriptions and annotations of all gene features of S. aurata (fSpaAur1.1 release), used to add functional information, such as Gene Ontology (GO) categories, descriptions and other annotations, to the differential expression results.
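Before running either protocol, it can be useful to sanity-check the GTF file, for example by counting its feature types and distinct gene and transcript IDs, and confirming that the gene IDs match those used in the annotation csv. The sketch below is a minimal plain-Python GTF summarizer; it assumes the standard nine tab-separated GTF columns with `key "value";` attributes, and the file name in the usage line is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field: str) -> dict[str, str]:
    """Parse 'gene_id "g1"; transcript_id "t1";' into a dict (last value wins)."""
    return dict(ATTR_RE.findall(field))


def summarize_gtf(lines: Iterable[str]) -> dict:
    """Count feature types and distinct non-empty gene/transcript IDs."""
    features: Counter = Counter()
    genes: set[str] = set()
    transcripts: set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        features[cols[2]] += 1
        attrs = parse_attributes(cols[8])
        if attrs.get("gene_id"):
            genes.add(attrs["gene_id"])
        if attrs.get("transcript_id"):
            transcripts.add(attrs["transcript_id"])
    return {"features": features, "genes": len(genes), "transcripts": len(transcripts)}


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python summarize_gtf.py annotation.gtf
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            print(summarize_gtf(fh))
```

Empty `transcript_id ""` values, which some GTF files use on gene lines, are ignored rather than counted as a transcript.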
You can download the RefSeq material from the TorreLaSal CSIC Nutrigroup at https://nutrigroup-iats.org/welcome/request_file. For more details, contact Professor Jaume Perez-Sanchez (jaime.perez.sanchez@csic.es).
Alternatively, you can use the RefSeq release provided by NCBI at https://www.ncbi.nlm.nih.gov/assembly/GCF_900880675.1. However, please note that the NCBI release for S. aurata differs in size and annotation from the TorreLaSal release, so your differential expression results will likely differ from those presented in this tutorial (which is based on the TorreLaSal release).
GOseq input material: the tutorial demonstrates how to run GOseq enrichment analyses on the differential expression (DE) results of either the “Tophat/Hisat2 & Cufflinks” or the “Mapping & Counting” protocol. Enrichment analyses are performed with the GOseq software (Young et al., 2010). Because S. aurata has to be handled as a custom species in this software, you need four input files for the analysis: 1) assayed genes; 2) differentially expressed genes; 3) gene sizes; and 4) GO terms per gene.
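Gene sizes are required because, in RNA-seq data, longer genes are more likely to be detected as differentially expressed, and GOseq corrects its enrichment test for this length bias. The sketch below is a simplified illustration of the quantity being corrected for, not GOseq's implementation: it sorts genes by length, splits them into equal-count bins and reports the fraction of DE genes per bin. In GOseq itself, the `nullp()` function fits a monotonic spline to this relationship to obtain a probability weighting function, which `goseq()` then uses (by default through the Wallenius approximation). The toy gene names and lengths are made up.

```python
def binned_de_fraction(lengths: dict[str, int], de_genes: set[str], n_bins: int = 4):
    """Sort genes by length, split them into n_bins (nearly) equal-count bins
    and return (mean length, fraction of DE genes) for each bin.
    Simplified illustration only; GOseq fits a spline instead of raw bins."""
    genes = sorted(lengths, key=lengths.get)  # shortest to longest
    bins = []
    for i in range(n_bins):
        start = i * len(genes) // n_bins
        end = (i + 1) * len(genes) // n_bins
        chunk = genes[start:end]
        if not chunk:
            continue  # more bins than genes
        mean_len = sum(lengths[g] for g in chunk) / len(chunk)
        frac_de = sum(g in de_genes for g in chunk) / len(chunk)
        bins.append((mean_len, frac_de))
    return bins


if __name__ == "__main__":
    toy_lengths = {f"g{i}": 100 * i for i in range(1, 9)}  # made-up toy data
    toy_de = {"g6", "g7", "g8"}
    for mean_len, frac in binned_de_fraction(toy_lengths, toy_de):
        print(f"mean length {mean_len:.0f}: {frac:.0%} DE")
```

If the fraction of DE genes rises with length, as in this toy example, an enrichment test that ignores gene size would favour GO categories rich in long genes.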
To facilitate this tutorial, we provide you with the following material:
- “Assayed genes” → assayed_genes.txt
- “Differentially expressed genes” → diff_genes.csv
- “Gene sizes” → length_genes.txt
- “GO terms” → Go_final_saurata.txt
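The four inputs have to agree with each other: every DE gene should be among the assayed genes, every assayed gene needs a size, and the gene IDs must match those in the GO table. The sketch below works on in-memory Python collections, so it makes no assumption about the exact layout of the provided files; reading them into lists and dicts is left to you. It also builds the 0/1 DE indicator over the assayed genes, which corresponds to the named binary vector that GOseq's `nullp()` expects as its `DEgenes` argument (with the gene sizes passed as `bias.data`).

```python
def de_indicator(assayed: list[str], de_genes) -> dict[str, int]:
    """1 if the gene is DE, 0 otherwise, for every assayed gene (order preserved)."""
    de = set(de_genes)
    return {g: int(g in de) for g in assayed}


def check_goseq_inputs(assayed, de_genes, lengths, gene2go) -> list[str]:
    """Return human-readable consistency problems between the four inputs.
    assayed: gene IDs; de_genes: DE gene IDs; lengths: {gene: size};
    gene2go: {gene: [GO terms]}."""
    problems = []
    assayed_set = set(assayed)
    not_assayed = set(de_genes) - assayed_set
    if not_assayed:
        problems.append(f"{len(not_assayed)} DE gene(s) missing from assayed genes")
    no_length = assayed_set - set(lengths)
    if no_length:
        problems.append(f"{len(no_length)} assayed gene(s) without a length")
    if not assayed_set & set(gene2go):
        problems.append("no assayed gene has GO terms (check that gene IDs match)")
    return problems


if __name__ == "__main__":
    # Made-up toy inputs; replace them with the contents of your own files.
    assayed = ["g1", "g2", "g3"]
    print(de_indicator(assayed, ["g2"]))
    print(check_goseq_inputs(assayed, ["g2"], {"g1": 900, "g2": 1500, "g3": 700},
                             {"g1": ["GO:0006955"]}))
```

Genes without any GO term are not reported as a problem, since unannotated genes are common; only a complete mismatch of IDs is flagged.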
The contents of these four files will differ depending on whether the analysis is performed with the “Tophat/Hisat2 & Cufflinks” or the “Mapping & Counting” protocol. However, the procedure is identical in both cases. For this reason, we provide these four files pre-built for the “Tophat/Hisat2 & Cufflinks” protocol only, mainly to show you the format of each input file.
Remember that these files are only valid for GOseq analyses performed with the “Tophat/Hisat2 & Cufflinks” protocol. If you want to complete a GOseq analysis with the “Mapping & Counting” protocol, you need to prepare these four files yourself. Similarly, the pre-built GOseq files are not valid if you use the NCBI release, so in that case they must also be prepared separately.
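For the “Mapping & Counting” protocol, which uses the RefSeq transcript file as its reference, one possible way to obtain sizes is to measure each sequence in that FASTA file. The sketch below assumes that the sequence ID is the first word of each FASTA header and that a transcript-to-gene mapping (`tx2gene`, hypothetical here) is available if you count at gene level; collapsing to the longest transcript per gene is one design choice among several (median transcript length is another), not a requirement of GOseq. The tab-separated `ID<TAB>length` output is likewise an assumed layout, so compare it with the provided length_genes.txt before use.

```python
from typing import Callable, Iterable


def fasta_lengths(lines: Iterable[str]) -> dict[str, int]:
    """Return {sequence ID: length}; the ID is assumed to be the first word of the header."""
    lengths: dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # a sequence may span several lines
    return lengths


def gene_lengths(tx_lengths: dict[str, int], tx2gene: dict[str, str],
                 how: Callable[[list[int]], float] = max) -> dict[str, float]:
    """Collapse transcript lengths to one value per gene (default: longest transcript)."""
    per_gene: dict[str, list[int]] = {}
    for tx, n in tx_lengths.items():
        if tx in tx2gene:
            per_gene.setdefault(tx2gene[tx], []).append(n)
    return {gene: how(ns) for gene, ns in per_gene.items()}


def length_table(lengths: dict) -> str:
    """Tab-separated 'ID<TAB>length' lines sorted by ID (assumed layout)."""
    return "".join(f"{k}\t{v}\n" for k, v in sorted(lengths.items()))
```

To use a different summary per gene, pass another function as `how`, for example `statistics.median`.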