Advanced usage

capC-MAP is a suite of programs written in C++, together with a Python “front end” that allows the whole processing pipeline to be run with a single command. For advanced usage, each of the component programs can be run independently; these are documented here. In addition to the four core capC-MAP programs, there is one further tool, capClocation2fragment.

capCdigestfastq

The program capCdigestfastq performs an in silico restriction enzyme digestion of a pair of fastq sequence files. The program can run in two modes. In the standard mode, as used in the capC-MAP pipeline, the program reads the two paired-end fastq files line by line, checking that the read names match for each pair. It splits each read of the pair into smaller restriction enzyme fragments at the specified cutting sequence. As well as the cut sequence, a cut position within that sequence must be specified; this does not have to match the real cut position of the enzyme and will not affect downstream capC-MAP results. All fragments are output to a single fastq file, with read pair names given in a format suitable for use with the capC-MAP program capCmain once the fastq has been mapped to the reference genome.

The program can also run in an alternative “long” mode, in which only the longest restriction enzyme fragment from each read of the pair is retained, and the output is written to two separate fastq files. This output is not suitable for use with the capCmain program.
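The splitting logic behind both modes can be illustrated with a short Python sketch. This is not the capCdigestfastq source (which is C++), and the function names are hypothetical. It assumes a single recognition sequence, a cut position given as an offset into that sequence, and that each read's quality string is sliced at the same positions as its bases. The example uses the DpnII site GATC with a cut position of 0.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def fragment_spans(seq: str, site: str, cut_pos: int) -> List[Span]:
    """Return (start, end) slices of seq after cutting cut_pos bases into
    every occurrence of the recognition site (hypothetical helper)."""
    spans = []
    start = 0
    i = seq.find(site)
    while i != -1:
        cut = i + cut_pos
        if cut > start:  # avoid an empty fragment when the read starts at a cut
            spans.append((start, cut))
        start = cut
        i = seq.find(site, i + 1)
    if start < len(seq):
        spans.append((start, len(seq)))
    return spans


def digest_read(seq: str, qual: str, site: str, cut_pos: int) -> List[Tuple[str, str]]:
    """Standard mode (sketch): every fragment, with its quality string."""
    return [(seq[a:b], qual[a:b]) for a, b in fragment_spans(seq, site, cut_pos)]


def longest_fragment(seq: str, qual: str, site: str, cut_pos: int) -> Tuple[str, str]:
    """'Long' mode (sketch): keep only the longest fragment of one read.
    Ties go to the first fragment; an empty read yields empty strings."""
    a, b = max(fragment_spans(seq, site, cut_pos),
               key=lambda s: s[1] - s[0], default=(0, 0))
    return seq[a:b], qual[a:b]
```

For example, `digest_read("AAGATCTTGATCCC", "ABCDEFGHIJKLMN", "GATC", 0)` gives the fragments `AA`, `GATCTT` and `GATCCC`. How capCdigestfastq itself breaks ties in long mode is not stated here.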

capCmain

capCmain is the main workhorse program of capC-MAP. It takes as input a name-sorted SAM file, generated by using bowtie to map the fastq file produced by capCdigestfastq to the reference genome. It also requires a map of the restriction enzyme fragments in the reference genome (as generated by capC-MAP genomedigest), and a bed file containing a list of target restriction enzyme fragments. The output is a list of intrachromosomal interactions and a list of interchromosomal interactions for each target.
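The core bookkeeping described above can be sketched in Python. This is an illustration, not capCmain itself. It assumes that all alignments derived from one original read pair are adjacent in the name-sorted SAM file and can be recognised through a caller-supplied `pair_id` function (hypothetical, since the exact read naming written by capCdigestfastq is not described here); that each alignment is assigned to a fragment by its leftmost mapped position through a caller-supplied `lookup` function; and that every other fragment in a read pair containing a target is reported for that target. Any filtering the real program applies is omitted.

```python
from itertools import groupby
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Fragment = Tuple[str, int, int]  # (chromosome, 0-based start, end)


def sam_position(line: str) -> Optional[Tuple[str, int]]:
    """Return (reference name, 0-based leftmost position), or None if unmapped."""
    fields = line.rstrip("\n").split("\t")
    if int(fields[1]) & 0x4:  # FLAG bit 0x4 marks an unmapped segment
        return None
    return fields[2], int(fields[3]) - 1  # SAM POS is 1-based


def interactions(
    sam_lines: Iterable[str],
    pair_id: Callable[[str], str],  # hypothetical: QNAME -> original read-pair ID
    lookup: Callable[[str, int], Optional[Fragment]],  # position -> fragment or None
    targets: Dict[Fragment, str],  # target fragment -> target name
) -> Iterator[Tuple[str, Fragment, str]]:
    """Yield (target name, reporter fragment, "intra" or "inter") tuples."""
    records = (l for l in sam_lines if not l.startswith("@"))  # skip header lines
    # Name-sorted input keeps all alignments from one read pair adjacent.
    for _, group in groupby(records, key=lambda l: pair_id(l.split("\t", 1)[0])):
        frags = set()
        for line in group:
            pos = sam_position(line)
            frag = lookup(*pos) if pos is not None else None
            if frag is not None:
                frags.add(frag)
        # Report every other fragment of this read pair against each target hit.
        for target in sorted(f for f in frags if f in targets):
            for other in sorted(frags - {target}):
                kind = "intra" if other[0] == target[0] else "inter"
                yield targets[target], other, kind
```

Passing `lookup` as a parameter keeps the sketch independent of how the fragment map is stored; a per-chromosome binary search over fragment start coordinates is one natural choice.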

capCpair2bg

The capCpair2bg program reads in a single bed file listing intrachromosomal interactions (as output by capCmain) and generates a “pile-up” of interaction counts at each restriction enzyme fragment in bedGraph format, i.e. an interaction profile.
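A pile-up of this kind amounts to counting how often each fragment occurs. The Python sketch below assumes the interactions have already been reduced to the (chromosome, start, end) coordinates of the reporter fragment for each interaction; the actual column layout of the capCmain output is not described here, so `read_bed3` simply takes the first three columns. Fragments with no interactions are omitted, which bedGraph permits, although capCpair2bg may handle them differently.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Fragment = Tuple[str, int, int]


def read_bed3(lines: Iterable[str]) -> List[Fragment]:
    """Parse chrom, start, end from BED-like lines, skipping blank,
    comment, track and browser lines."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        out.append((fields[0], int(fields[1]), int(fields[2])))
    return out


def pileup_bedgraph(reporter_fragments: Iterable[Fragment]) -> List[str]:
    """Count interactions per fragment and return sorted bedGraph lines."""
    counts = Counter(reporter_fragments)
    # bedGraph columns: chrom, 0-based start, end, value (tab-separated)
    return [f"{c}\t{s}\t{e}\t{n}" for (c, s, e), n in sorted(counts.items())]
```

A file would be processed as `pileup_bedgraph(read_bed3(open(path)))`, where `path` is the hypothetical interactions file.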

capCpileup2binned

The capCpileup2binned program reads in an intrachromosomal interaction profile at restriction enzyme fragment resolution (as generated by capCpair2bg) and generates a binned, smoothed and normalized interaction profile. The optional parameters are a binning step size S and a window size W (as detailed in section capC-MAP run), and a total number of reads T used to normalize to reads per million. If only S and W are specified, the profile is binned but not normalized; if only T is provided, a normalized profile at restriction enzyme fragment resolution is generated.
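One common way to combine a step size and a window size is a sliding window: the profile is reported in consecutive bins of width S, and each bin takes the summed counts of all fragments whose midpoints fall in a window of width W centered on that bin. The Python sketch below implements that scheme for a single chromosome, normalizing to reads per million as count × 1,000,000 / T. This is an assumption for illustration only; the exact binning and smoothing used by capCpileup2binned is described in section capC-MAP run and may differ, for example in how fragments overlapping a window boundary are treated.

```python
import bisect
from itertools import accumulate
from typing import List, Optional, Tuple


def binned_profile(
    pileup: List[Tuple[int, int, int]],  # (start, end, count) for one chromosome
    step: Optional[int] = None,  # S
    window: Optional[int] = None,  # W
    total_reads: Optional[int] = None,  # T
    chrom_length: Optional[int] = None,
) -> List[Tuple[int, int, float]]:
    scale = 1e6 / total_reads if total_reads else 1.0  # reads per million if T given
    if step is None or window is None:
        # Only T (or nothing) given: stay at restriction fragment resolution.
        return [(s, e, c * scale) for s, e, c in pileup]
    # Represent each fragment by its midpoint; prefix sums make each window
    # total two binary searches (O(log n) per bin).
    points = sorted(((s + e) // 2, c) for s, e, c in pileup)
    mids = [m for m, _ in points]
    prefix = [0] + list(accumulate(c for _, c in points))
    if chrom_length is None:
        chrom_length = max((e for _, e, _ in pileup), default=0)
    profile = []
    for lo in range(0, chrom_length, step):
        centre = lo + step // 2  # window of width W centered on this bin
        w_lo = centre - window // 2
        w_hi = w_lo + window
        i = bisect.bisect_left(mids, w_lo)
        j = bisect.bisect_left(mids, w_hi)
        profile.append((lo, min(lo + step, chrom_length), (prefix[j] - prefix[i]) * scale))
    return profile
```

Because W can exceed S, neighbouring bins share fragments, which is what produces the smoothing.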

capClocation2fragment

The capClocation2fragment program reads in a bed file of genome intervals and a genome-wide map of restriction enzyme fragments (as generated by capC-MAP genomedigest). It finds the midpoint of each interval in the input file and outputs the restriction enzyme fragment in which that midpoint falls. This is useful for generating the targets file required by capC-MAP, since each target must appear in the genome-wide fragment map.
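The midpoint-to-fragment lookup can be sketched with a per-chromosome binary search over fragment start coordinates. The Python below is a minimal illustration, assuming non-overlapping fragments in BED-style 0-based, half-open coordinates, with the midpoint computed by integer division; the function names are hypothetical, and capClocation2fragment's own rounding convention is not stated here.

```python
import bisect
from typing import Dict, Iterable, List, Optional, Tuple

Fragment = Tuple[str, int, int]
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(fragments: Iterable[Fragment]) -> Index:
    """Group fragments by chromosome as parallel sorted start/end lists."""
    index: Index = {}
    for chrom, start, end in sorted(fragments):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    return index


def location_to_fragment(index: Index, chrom: str, start: int, end: int) -> Optional[Fragment]:
    """Return the fragment containing the interval's midpoint, or None."""
    if chrom not in index:
        return None
    mid = (start + end) // 2
    starts, ends = index[chrom]
    k = bisect.bisect_right(starts, mid) - 1  # last fragment starting at or before mid
    if k < 0 or mid >= ends[k]:
        return None  # midpoint falls outside every fragment
    return chrom, starts[k], ends[k]
```

Building the index once and querying it for every interval keeps each lookup logarithmic in the number of fragments on that chromosome.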

A typical procedure for generating the targets file is:

  1. Run the capture oligo sequences through BLAST to find their locations within the reference genome.
  2. Format the resulting list of locations as a bed file, then run this file through capClocation2fragment to find the restriction enzyme fragments to which the oligos map.
  3. Edit the output bed file to add useful target names and remove duplicate entries (oligos are typically designed so that there is one at either end of a restriction enzyme fragment).
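Steps 2 and 3 lend themselves to small scripts. The Python sketch below assumes BLAST was run with tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), so the subject ID gives the chromosome and sstart/send give 1-based hit coordinates, reversed for minus-strand hits. It converts each hit to a 0-based, half-open BED record named after the oligo and, assuming the fragment-level entries carry a name column, keeps the first entry per fragment. Selecting the best hit per oligo and choosing meaningful target names are left to the user.

```python
from typing import Iterable, List, Tuple

BedRecord = Tuple[str, int, int, str]  # (chrom, 0-based start, end, name)


def blast_tab_to_bed(lines: Iterable[str]) -> List[BedRecord]:
    """Step 2: convert BLAST -outfmt 6 hits into BED records named by oligo."""
    bed = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        sstart, send = int(f[8]), int(f[9])  # 1-based, inclusive; reversed on minus strand
        lo, hi = min(sstart, send), max(sstart, send)
        bed.append((sseqid, lo - 1, hi, qseqid))  # BED is 0-based, half-open
    return bed


def dedupe_fragments(fragment_bed: Iterable[BedRecord]) -> List[BedRecord]:
    """Step 3: keep the first entry for each fragment (dicts keep insertion order)."""
    first = {}
    for chrom, start, end, name in fragment_bed:
        first.setdefault((chrom, start, end), name)
    return [(c, s, e, n) for (c, s, e), n in first.items()]
```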