1.2 - Tutorial material and case study
Within this tutorial, we use data from a comparative transcriptomics case study based on 15 oral biopsies: 10 from human patients with Proliferative Verrucous Leukoplakia (PVL) lesions and 5 from the mucosa of healthy individuals. The study was previously published in (Llorens, et al., 2021), based on the research of Professor Jose Vte Bagan from the University of Valencia.
In Table 1, we list the 15 fastq files with their SRA accessions and the group to which each sample is assigned.
Table 1: Samples and case study groups
SRA accession | Library Names | Groups |
---|---|---|
SAMN13426702 | JVB-R1_S1_R1_001.fastq | PVL |
SAMN13426703 | JVB-R2_S2_R1_001.fastq | PVL |
SAMN13426704 | JVB-R3_S8_R1_001.fastq | PVL |
SAMN13426705 | JVB-R4_S9_R1_001.fastq | PVL |
SAMN13426706 | JVB-R5_S3_R1_001.fastq | PVL |
SAMN13426707 | JVB-R6_S4_R1_001.fastq | PVL |
SAMN13426708 | JVB-R7_S10_R1_001.fastq | PVL |
SAMN13426709 | JVB-R8_S11_R1_001.fastq | PVL |
SAMN13426710 | JVB-R9_S5_R1_001.fastq | PVL |
SAMN13426711 | JVB-R10_S12_R1_001.fastq | PVL |
SAMN13426712 | JVB-R11_S6_R1_001.fastq | control |
SAMN13426713 | JVB-R12_S7_R1_001.fastq | control |
SAMN13426714 | JVB-R13_S13_R1_001.fastq | control |
SAMN13426715 | JVB-R14_S14_R1_001.fastq | control |
SAMN13426716 | JVB-R15_S15_R1_001.fastq | control |
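If you prefer to handle Table 1 programmatically, the sketch below rebuilds it as a small sample sheet in Python. It is only an illustration: the function names `build_sample_sheet` and `libraries_by_group` are our own and are not part of any tool used in this tutorial. The accessions, library names and groups are those shown in Table 1.

```python
# S-numbers in the library names follow no simple pattern, so they are listed
# explicitly in sample order R1..R15 (taken from Table 1).
S_NUMBERS = [1, 2, 8, 9, 3, 4, 10, 11, 5, 12, 6, 7, 13, 14, 15]
FIRST_ACCESSION = 13426702  # SAMN13426702 belongs to sample R1
N_PVL = 10                  # samples R1..R10 are PVL, R11..R15 are controls


def build_sample_sheet():
    """Return one dict per sample with its accession, library name and group."""
    sheet = []
    for index, s_number in enumerate(S_NUMBERS, start=1):
        sheet.append({
            "accession": f"SAMN{FIRST_ACCESSION + index - 1}",
            "library": f"JVB-R{index}_S{s_number}_R1_001.fastq",
            "group": "PVL" if index <= N_PVL else "control",
        })
    return sheet


def libraries_by_group(sheet):
    """Map each group name to the list of its library file names."""
    groups = {}
    for row in sheet:
        groups.setdefault(row["group"], []).append(row["library"])
    return groups


sheet = build_sample_sheet()
for group, libraries in libraries_by_group(sheet).items():
    print(group, len(libraries), libraries[0], "...", libraries[-1])
```

A structure like this makes it easy to confirm that every downloaded file is assigned to the intended group before defining the comparison.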
The 15 fastq files can be downloaded from NCBI at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA507368. If you need help downloading this material from NCBI, contact us for support at https://forum.biotechvana.com.
RefSeq material: To complete the tutorial you will need the following reference sequences:
- The human genome assembly (GRCh38.95 version in the Ensembl release 95) that will be used as a reference genome sequence in the Tophat/Hisat2 & Cufflinks protocol. The genome sequence is available in fasta format and as an index for mapping with Tophat or Bowtie.
- The GTF file of the GRCh38.95 release that will be used as the reference gene annotation in the Tophat/Hisat2 & Cufflinks protocol.
- The human RefSeq transcriptome of the GRCh38.95 genome version (Ensembl release 95) that will be used as a reference sequence in the Mapping and Count protocol. The RefSeq transcriptome is available in fasta format and as an index for mapping with Tophat or Bowtie.
- A .csv file called gene_annotations_95.csv with the functional descriptions and gene ontology (GO) annotations (Huntley, et al., 2015) for all gene features of the human genome GRCh38.95 version. This file will be used to integrate functional information, such as GO descriptions and enzyme commission (EC) number annotations, into the results of differential expression.
- A .csv file called pathways_annotations.csv with the annotations and descriptions of metabolic pathways and their KEGG maps (Kotera, et al., 2012) associated with the ECs annotated for all gene features. This file will be used to integrate pathway annotations into the results of differential expression.
You can download this material from the RefSeq folder. If you need help, contact us for support at https://forum.biotechvana.com.
Goseq input material: The tutorial shows how to execute GOseq analyses for testing differential enrichment of GOs using either “Tophat/Hisat2 & Cufflinks” or “Mapping & Counting”. Enrichment analyses are performed with the software GOseq (Young, et al., 2010). As the analysis is customized, you need four input files: 1) assayed genes; 2) differentially expressed genes; 3) gene sizes; and 4) GO terms per gene.
To facilitate this tutorial, we provide you with the following material:
- “Assayed genes” → assayed_genes.txt
- “Differentially expressed genes” → diff_genes.csv
- “Gene size” → length_genes.txt
- “GO terms” → gos_homo_sapiens.txt
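Before running the enrichment, it can help to confirm that the four inputs agree with each other. The following Python sketch shows one possible check; it is not part of GOseq or of the tutorial pipeline. The function names and the toy gene identifiers (`geneA` to `geneD`) are hypothetical, and we assume the four files have already been parsed into plain Python lists and dictionaries, since their exact layout is not described here.

```python
def check_goseq_inputs(assayed, de_genes, lengths, go_terms):
    """Report inconsistencies between the four GOseq inputs.

    assayed  : list of gene IDs that were tested
    de_genes : list of differentially expressed gene IDs
    lengths  : dict gene ID -> gene size
    go_terms : dict gene ID -> list of GO terms
    """
    assayed_set = set(assayed)
    return {
        # Every differentially expressed gene should also be an assayed gene.
        "de_not_assayed": sorted(set(de_genes) - assayed_set),
        # Every assayed gene needs a size for the length-bias correction.
        "missing_length": sorted(assayed_set - set(lengths)),
        # Informational only: genes without GO terms are simply unannotated.
        "without_go": sorted(assayed_set - set(go_terms)),
    }


def de_indicator(assayed, de_genes):
    """Build a 0/1 vector in assayed order (1 = differentially expressed)."""
    de_set = set(de_genes)
    return [1 if gene in de_set else 0 for gene in assayed]


# Toy data with made-up gene identifiers, for illustration only.
assayed = ["geneA", "geneB", "geneC", "geneD"]
de_genes = ["geneB", "geneD"]
lengths = {"geneA": 1200, "geneB": 800, "geneC": 2500}
go_terms = {"geneA": ["GO:0008150"], "geneB": ["GO:0008150"]}

print(check_goseq_inputs(assayed, de_genes, lengths, go_terms))
print(de_indicator(assayed, de_genes))
```

In this toy example, `geneD` is reported as lacking a size, which is the kind of mismatch worth fixing before the analysis is launched.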
Pathseq input material: The tutorial shows how to execute GOseq analyses for testing differential enrichment of metabolic pathways using either “Tophat/Hisat2 & Cufflinks” or “Mapping & Counting”. Enrichment analyses are performed with the software GOseq (Young, et al., 2010). As the analysis is customized, you need four input files: 1) assayed genes; 2) differentially expressed genes; 3) gene sizes; and 4) metabolic maps per gene.
To facilitate this tutorial, we provide you with the following material:
- “Assayed genes” → assayed_genes.txt
- “Differentially expressed genes” → diff_genes.csv
- “Gene size” → length_genes.txt
- “Pathway maps” → maps_homo_sapiens.txt
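To illustrate how the "maps per gene" input relates to the other files, the Python sketch below inverts a gene-to-map table and counts, for each pathway map, how many assayed genes and how many differentially expressed genes it contains. This is only a raw tally for inspecting the inputs, not the enrichment test performed by GOseq. The function names, the gene identifiers and the assignment of genes to maps are hypothetical, and we assume the files have already been parsed into Python lists and dictionaries.

```python
from collections import defaultdict


def genes_per_map(maps_per_gene):
    """Invert a dict gene ID -> list of map IDs into map ID -> set of gene IDs."""
    by_map = defaultdict(set)
    for gene, maps in maps_per_gene.items():
        for map_id in maps:
            by_map[map_id].add(gene)
    return dict(by_map)


def tally_maps(maps_per_gene, assayed, de_genes):
    """Return map ID -> (DE genes in map, assayed genes in map)."""
    assayed_set = set(assayed)
    de_set = set(de_genes)
    counts = {}
    for map_id, genes in genes_per_map(maps_per_gene).items():
        in_assay = genes & assayed_set  # ignore genes that were never tested
        counts[map_id] = (len(in_assay & de_set), len(in_assay))
    return counts


# Toy data with made-up gene identifiers and map assignments.
maps_per_gene = {
    "geneA": ["map00010", "map00020"],
    "geneB": ["map00010"],
    "geneC": ["map00020"],
    "geneX": ["map00010"],  # annotated but not among the assayed genes
}
assayed = ["geneA", "geneB", "geneC"]
de_genes = ["geneB"]

for map_id, (n_de, n_assayed) in sorted(tally_maps(maps_per_gene, assayed, de_genes).items()):
    print(map_id, n_de, n_assayed)
```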