Tutorial¶

print view
notebook

Experiment: Arabidopsis thaliana (Thale Cress)¶

  • Comparing wild type (WT) and atrx-1 mutant plants to analyze how the loss of function of ATRX protein results in changes in gene expression.

  • The ATRX protein is a histone chaperone known to be an important player in the regulation of gene expression.

  • RNA was isolated from three WT replicates and three mutant replicates.

  • RNA-seq was carried out on an Illumina Hiseq 2500.

  • The sequencing reads are paired-end data, hence we have 2 files per replicate.

Create directory in which we will be doing the analysis¶

  • create directory: mkdir -p tutorial/RNAseq
  • Go to the directory: cd tutorial/RNAseq

Download the data:¶

  • wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR442/003/SRR4420293/SRR4420293_1.fastq.gz
  • wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR442/003/SRR4420293/SRR4420293_2.fastq.gz
  • wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR442/006/SRR4420296/SRR4420296_1.fastq.gz
  • wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR442/006/SRR4420296/SRR4420296_2.fastq.gz

Get the Genome reference file and associated GTF/GFF file for Arabidopsis.¶

  • Create a directory for the genome: mkdir Genome
  • Go to the directory: cd Genome
  • wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1/GCF_000001735.4_TAIR10.1_genomic.fna.gz
  • wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1/GCF_000001735.4_TAIR10.1_genomic.gtf.gz

Unzip the files¶

  • gunzip GCF_000001735.4_TAIR10.1_genomic.fna.gz
  • gunzip GCF_000001735.4_TAIR10.1_genomic.gtf.gz
  • Check if they have been unzipped

Quality Check¶

  • Create a folder for fastqc files: mkdir fastqcReports
  • Check the number of cores:
    • import os
    • os.cpu_count()
  • fastqc -t 4 -o ./fastqcReports/ ./*.gz

Detour: When more than one QC file: Compile all the results together using multiqc¶

  • install multiqc: pip3 install multiqc
  • multiqc -p -o ./fastqcReports/ ./fastqcReports/ --title "Arabidopsis_RNAseq"

Detour: adapter trimming¶

  • installation: conda install bioconda::trimmomatic
  • mkdir adapter
  • cd adapter
  • download the adapter fasta:
    • wget https://github.com/usadellab/Trimmomatic/blob/main/adapters/TruSeq3-PE.fa
  • apply trimmomatic
  • trimmomatic PE -threads 4
    SRR4420293_1.fastq.gz SRR4420293_2.fastq.gz
    SRR4420293_1_trimmed_paired.fastq.gz SRR4420293_1_trimmed_unpaired.fastq.gz
    SRR4420293_2_trimmed_paired.fastq.gz SRR4420293_2_trimmed_unpaired.fastq.gz
    ILLUMINACLIP:./adapter/TruSeq3-PE.fa:2:30:10
    LEADING:3 TRAILING:3
    SLIDINGWINDOW:4:15
    MINLEN:36

Aligning the reads:¶

  • Install hisat2: conda install -c bioconda hisat2

Extract splice sites and exons from GTF (optional but recommended)¶

  • hisat2_extract_splice_sites.py ./Genome/GCF_000001735.4_TAIR10.1_genomic.gtf > ./splicesites.txt
  • hisat2_extract_exons.py ./Genome/GCF_000001735.4_TAIR10.1_genomic.gtf > ./exons.txt

Build HISAT2 index with splice site information¶

hisat2-build \ -p 4 \ --ss ./splicesites.txt \ --exon ./exons.txt \ ./GCF_000001735.4_TAIR10.1_genomic.fna \ hisat_index/TAIR10

Now run HISAT2 with the index PREFIX to align the samples¶

hisat2 -p 4 \ -x ./genome/hisat_index/TAIR10 \ -1 SRR4420293_1.fastq.gz \ -2 SRR4420293_2.fastq.gz \ -S aligned_reads/SRR4420293.sam \ --summary-file ./aligned_reads/SRR4420293_alignment_summary.txt

install samtools¶

  • conda install -c bioconda samtools
  • verify the installation: samtools --version

Convert to sorted BAM¶

samtools view -@ 4 -bS aligned_reads/SRR4420293.sam | \ samtools sort -@ 4 -o aligned_reads/SRR4420293.sorted.bam && \ samtools index aligned_reads/SRR4420293.sorted.bam && \ rm aligned_reads/SRR4420293.sam

install featurecounts¶

  • conda install -c bioconda subread
  • verify installation: featureCounts -v

Let's count the features¶

featureCounts \ -a ./genome/GCF_000001735.4_TAIR10.1_genomic.gtf \ -o ./gene_counts/SRR4420293_counts.txt \ -p \ -t exon \ -g gene_id \ -T 4 \ ./aligned_reads/SRR4420293.sorted.bam

Quiz¶

  • Convert and save the counts txt file(s) into a csv file