Context Navigation

← Previous Change
Wiki History
Next Change →

Changes between Initial Version and Version 1 of SOPs/gzipping

Timestamp:: 11/15/13 12:19:04 (12 years ago)
Author:: gbell
Comment:: --

Legend:

: Unmodified
: Added
: Removed
: Modified

SOPs/gzipping

               v1
+== Creating an analysis pipeline of compressed files ==
+Almost all steps of a short-read analysis pipeline can be created with gzipped files to save storage space.  For example,
+# Untar from solexa_public, gunzip, and gzip \\
+tar -xOzf /lab/solexa_public/LAB/RUN/QualityScore/Foo.1.tar.gz | gzip -f > fastq/My_sample_2.1.fq.gz \\
+tar -xOzf /lab/solexa_public/LAB/RUN/QualityScore/Foo.2.tar.gz | gzip -f > fastq/My_sample_2.2.fq.gz \\
+# Run FastQC \\
+fastqc fastq/My_sample_2.1.fq.gz \\
+fastqc fastq/My_sample_2.2.fq.gz \\
+# Trim adapters \\
+gunzip -c fastq/My_sample_2.1.fq.gz | fastx_clipper -v -z -l 30 -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG -o fastq/My_sample_2.noVec.1.fq.gz \\
+gunzip -c fastq/My_sample_2.2.fq.gz | fastx_clipper -v -z -l 30 -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG -o fastq/My_sample_2.noVec.2.fq.gz \\
+fastqc fastq/My_sample_2.noVec.1.fq.gz \\
+fastqc fastq/My_sample_2.noVec.2.fq.gz \\
+# Drop reads of low quality \\
+gunzip -c fastq/My_sample_2.noVec.1.fq.gz | fastq_quality_filter -v -q 20 -p 75 -z -o fastq/My_sample_2.noVec.qgt20.1.fq.gz \\
+gunzip -c fastq/My_sample_2.noVec.2.fq.gz | fastq_quality_filter -v -q 20 -p 75 -z -o fastq/My_sample_2.noVec.qgt20.2.fq.gz \\
+# Trim low-quality bases at end of reads \\
+gunzip -c fastq/My_sample_2.noVec.qgt20.1.fq.gz | fastq_quality_trimmer -v -t 20 -l 30 -z -o fastq/My_sample_2.noVec.qgt20.t.1.fq.gz \\
+gunzip -c fastq/My_sample_2.noVec.qgt20.2.fq.gz | fastq_quality_trimmer -v -t 20 -l 30 -z -o fastq/My_sample_2.noVec.qgt20.t.2.fq.gz \\
+# After trimming, get reads that are still paired (and output as regular fastq) \\
+/nfs/BaRC_Public/BaRC_code/Perl/cmpfastq/cmpfastqgz.pl fastq/My_sample_2.noVec.qgt20.t.1.fq.gz fastq/My_sample_2.noVec.qgt20.t.2.fq.gz
+# Map paired reads with bowtie \\
+# THIS STEP CANNOT USE GZ FILES \\
+bowtie -l 30 -n 1 -X 1000 --best -S /nfs/genomes/a.thaliana_TAIR_10/bowtie/tair10_SI -1 fastq/My_sample_2.noVec.qgt20.t.1.fq-common.out -2 fastq/My_sample_2.noVec.qgt20.t.2.fq-common.out mapped_reads/My_sample_2.30.1.sam
+# Sort and index mapped reads \\
+/nfs/BaRC_Public/BaRC_code/Perl/SAM_to_BAM_sort_index/SAM_to_BAM_sort_index.pl mapped_reads/My_sample_2.30.1.sam \\
+# Delete SAM file (since BAM file has all the same info) \\
+rm -f mapped_reads/My_sample_2.30.1.sam \\
+# Get summary counts \\
+zcat fastq/My_sample_2.1.fq.gz | echo $((`wc -l`/4)) \\
+zcat fastq/My_sample_2.noVec.1.fq.gz | echo $((`wc -l`/4)) \\
+zcat fastq/My_sample_2.noVec.2.fq.gz | echo $((`wc -l`/4)) \\
+zcat fastq/My_sample_2.noVec.qgt20.1.fq.gz | echo $((`wc -l`/4)) \\
+zcat fastq/My_sample_2.noVec.qgt20.2.fq.gz | echo $((`wc -l`/4)) \\
+more fastq/My_sample_2.noVec.qgt20.t.1.fq | echo $((`wc -l`/4)) \\
+more fastq/My_sample_2.noVec.qgt20.t.2.fq | echo $((`wc -l`/4)) \\
+zmore fastq/My_sample_2.noVec.qgt20.t.1.fq-common.out.gz | echo $((`wc -l`/4)) \\
+samtools flagstat mapped_reads/My_sample_2.30.1.sorted.bam > mapped_reads/My_sample_2.30.1.flagstat.txt \\
+grep ' mapped (' mapped_reads/My_sample_2.30.1.flagstat.txt