Files

clipper
- HyperLevelDB
- samtools
- sparsehash-2.0.3
- t
- tidx
- CHANGES
- Chrdex.pm
- Grun.pm
- Makefile
- README
- affy-csv-to-bed.pl
- affy-liftover.pl
- aggregate-results-by-gene.pl
- alc
- align-sw.cpp
- bam-filter.cpp
- bowtie-gzip.patch
- bwa-to-bowtie
- check-clipper.sh
- contig-stats
- count-ambig.cpp
- debug
- determine-phred
- ea-bcl2fastq.cpp
- ea-utils.spec
- ea-utils.spex
- fasta-qual-to-fastq
- fastq-clipper.cpp
- fastq-join.cpp
- fastq-join.t
- fastq-lib.cpp
- fastq-lib.h
- fastq-mcf.cpp
- fastq-multx.cpp
- fastq-stats.cpp
- fastq-to-fasta
- fastx-graph
- gcModel.cpp
- gcModel.h
- getgenbankannot
- getline.c
- gff2gtf
- grun
- gtf2bed
- install-bamtools.sh
- master-barcodes.txt
- merge-fifo
- mirna-quant.cpp
- mixfastqs.pl
- multx.sh
- pbi-clean.cpp
- qsh
- qsub
- randomFQ
- sam-stats.cpp
- sam-stats.pl
- seqsig.cpp
- testchrdex-speed.pl
- testchrdex.pl
- testsuf.pl
- utils.cpp
- utils.h
- varcall-matrix
- varcall.cpp
- xjoin
- zhead
Dockerfile
README.md

clipper

Merge pull request #63 from wltrimbl/testEOFbeginning

Oct 20, 2021

ae6747c · Oct 20, 2021


README

OVERVIEW:

fastq-mcf

Scans a sequence file for adapters, and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also does skewing detection and quality filtering.

fastq-multx

Demultiplexes a fastq file. Capable of auto-determining barcode IDs based on a master set of fields. Keeps multiple reads in sync during demultiplexing. Can also verify that the reads are in sync, and fail if they are not.

fastq-join

Similar to audy's stitch program, but written in C, more efficient, and with support for some automatic benchmarking and tuning. It uses the same "squared distance for anchored alignment" as other tools.

fastq-stats

Outputs stats for fastq files.

sam-stats

Outputs stats for sam/bam files.

varcall

Variant caller. Takes bam or pileup output and does variant calling with advanced features like PCR duplicate filtering, homopolymer repeat filtering, and calculation of error rate and detectability (minimum percentage) thresholds.

REQUIRES:

On Ubuntu:

sudo apt-get install subversion zlib1g-dev libgsl0-dev

To build sam-stats, please install bamtools first:

https://github.com/pezmaster31/bamtools/wiki/Building-and-installing

QUICK FAQ:

This is based on feedback/emails, etc.

fastq-mcf does a 300k sub-sampling to determine what to do. There are lots of parameters to play with, but the "automatic" mode should do the right thing most of the time. If it doesn't, I really would like to hear why and what it did. The point of this tool is that basic quality and adapter filtering should be done automagically 90% of the time, not by manually picking parameters for each run. Because it makes decisions "for the user", it will probably change more over time than the other tools.

If you want fastq-mcf to behave like other tools, pass -m XX and -s 100 so that it's fixed-length. If you run it on unrealistic or "test" data, the heuristic won't work. Instead, try a subsample of 50000 or so "real" reads.
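One simple way to get a small set of "real" reads is to take the first N records of an existing fastq. The sketch below is not part of ea-utils; the helper name first_reads is hypothetical, and it assumes plain 4-line fastq records (no wrapped sequence lines).

```python
from itertools import islice


def first_reads(lines, n_reads):
    """Return up to n_reads records from an iterable of FASTQ lines.

    Assumes each record is exactly 4 lines: header, bases, '+', qualities.
    A trailing partial record is dropped.
    """
    it = iter(lines)
    records = []
    while len(records) < n_reads:
        rec = list(islice(it, 4))
        if len(rec) < 4:
            break  # end of input, or a truncated final record
        records.append(tuple(s.rstrip("\n") for s in rec))
    return records
```

With a real file, something like first_reads(open("reads.fq"), 50000) would give a sample of the size suggested above. Taking the head of the file is the crudest form of subsampling; reads from the start of a run may not be representative.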

fastq-mcf doubles as a read-filtering program and supports a broad range of filtering arguments.

fastq-join produces a "report". This is just a list of lengths of joined reads. It also chooses the "better quality base" where reads overlap. Very stable code at this point.
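The "better quality base" idea can be sketched as follows. This is an illustration only, not fastq-join's actual code: it assumes the two overlapping segments are already aligned and in the same orientation, that qualities use the same ASCII encoding in both reads, and that ties go to the first read. The name merge_overlap is hypothetical.

```python
def merge_overlap(seq_a, qual_a, seq_b, qual_b):
    """Merge two aligned, equal-length overlap segments base by base.

    At each position the base with the higher quality character wins,
    and its quality is carried over. Ties keep the base from read A.
    """
    if not (len(seq_a) == len(qual_a) == len(seq_b) == len(qual_b)):
        raise ValueError("overlap segments must all have the same length")
    bases, quals = [], []
    for a, qa, b, qb in zip(seq_a, qual_a, seq_b, qual_b):
        if qa >= qb:  # same encoding, so comparing characters compares scores
            bases.append(a)
            quals.append(qa)
        else:
            bases.append(b)
            quals.append(qb)
    return "".join(bases), "".join(quals)
```

What quality value the real tool assigns to a merged base is not stated here; keeping the winner's quality is just the simplest choice for the sketch.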

fastq-multx is intended to keep mates in sync, so you can demultiplex in one pass. For single reads, it's not better than other tools out there, except that you don't need to predefine your sets, which can help logistics in high-volume situations. Also, notice the output file's "%-sign" substitution, which replaces lots of prefix and suffix arguments. The mismatch algorithm is "maximal unique": if it's possible that 2 barcodes can match, it won't use *either*. Qualities are *no longer* ignored; you can explicitly set low-quality reads as mismatches. Minimum edit distances can be specified, which is useful for recovering when CASAVA demultiplexing was poor, especially for dual-indexed runs. Very stable code at this point.
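A rough sketch of the "maximal unique" rule and the "%-sign" substitution, for illustration only. It is not the fastq-multx implementation: it assumes barcodes sit at the start of the read, uses plain Hamming distance, and the names assign_barcode and output_name are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read, barcodes, max_mismatches=1):
    """Return the id of the only barcode that matches the start of the read.

    barcodes maps id -> sequence. If no barcode, or more than one barcode,
    is within max_mismatches, return None (an ambiguous read uses neither).
    """
    hits = [
        bid
        for bid, seq in barcodes.items()
        if len(read) >= len(seq)
        and hamming(read[: len(seq)], seq) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def output_name(template, barcode_id):
    """Replace the % sign in an output file template with the barcode id."""
    return template.replace("%", barcode_id)
```

For example, with barcodes ACGT and ACGA and one mismatch allowed, a read starting with ACGT is within range of both, so it is left unassigned rather than guessed.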

Dual-indexed codes are listed as SEQUENCE-SEQUENCE in the barcode file. I haven't tried mixing them with others in the autodetect code; I can't imagine there's a reason to do that.

The latest version ignores bases that have extremely low qualities (<5), and refuses to match a barcode that isn't a minimum distance from another best match. It's a lot safer, but for some poor-quality runs these features will need to be disabled.
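The two safety checks can be sketched like this. It is one possible reading of the behaviour described above, not the tool's code: it assumes Phred+33 qualities, reads "minimum distance" as a required gap between the best and second-best barcode, and uses hypothetical names and defaults (masked_mismatches, best_with_margin, min_gap=2).

```python
def masked_mismatches(bases, quals, barcode, min_qual=5, offset=33):
    """Count mismatches against a barcode, skipping very low quality bases.

    Positions whose Phred score (ord(q) - offset) is below min_qual are
    ignored entirely: they count neither as a match nor as a mismatch.
    """
    count = 0
    for base, q, expected in zip(bases, quals, barcode):
        if ord(q) - offset < min_qual:
            continue
        if base != expected:
            count += 1
    return count


def best_with_margin(bases, quals, barcodes, max_mismatches=1, min_gap=2):
    """Return the best barcode id, or None if the call is not safe.

    The best barcode must be within max_mismatches, and the runner-up
    must be at least min_gap mismatches worse than the best.
    """
    scored = sorted(
        (masked_mismatches(bases, quals, seq), bid)
        for bid, seq in barcodes.items()
    )
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] - scored[0][0] < min_gap:
        return None
    return scored[0][1]
```

Setting min_qual=0 and min_gap=0 in this sketch corresponds to turning both checks off, which is the trade-off mentioned above for poor-quality runs.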

sam-stats takes a lot of options for a variety of reports. The most important ones to note are -D, which builds a huge hash of probe IDs, and -R, which produces a coverage matrix. It could autodetect whether reads are sorted by probe ID and save RAM. It could also reduce RAM by removing common prefixes from the hash after some X reads. It doesn't do those things now.

INSTALL:

You should be able to run "make install" on most machines that have g++ installed. On Windows, install a copy of the MinGW environment. You'll need zlib installed for some tools. fastq-mcf, fastq-stats, etc. are pretty basic and work without any external libs.

Example:

PREFIX=/usr make install

OR to a subdir:

BINDIR=/usr/bin/ea-utils make install

Or with other options:

CC=g++ PREFIX=/usr/local make install
