massachusetts institute of technology (mit)
computer science and artificial intelligence laboratory (csail)
theory of computation group (toc)

computation and biology group (compbio)

email queries

JOKERCAKE is a software to generate the shortest sequnece to cover k-mers using joker characters.
The algorithm is based on 2 methods. The first is a greedy heuristic that adds k-mers that cover as many uncovered k-mers. The second is an ILP formulation solved by black-box solvers.

Input and Output

JOKERCAKE takes as input nine parameters:

The algorithm is given:
1. joker template - string over 0's and 1's, where 1's tell where the joker is.
2. alphabet + joker character - for example, DNA: ACGTN, Amino acid: ARNDCQEGHILKMFPSTWYVX
3. multiplicity filename - in the file for each k-mer how many times it has to occur.
4. reverse complement - 0 for coering all k-mers, 1 for covering either a k-mer or its reverse complement
5. time_limit - time limit for the ILP run (in seconds). When set to 0, no ILP solution is provided.
6. heuristic of the first algorithm - 0 no heuristic, 1 for greedy, 2 for random.
7. first algorithm output filename.
8. ILP output filename.

It outputs two sequence files as textual files.

JOKERCAKE was developed by Yaron Orenstein in Bonnie Berger's group at Massachusetts Institute of Technology: MIT.

Get the software

A binary is available here (requires java 1.7 or higher):

For GUROBI solver, see (free for academic use).

How to use it

java -jar joker.jar <template> <alphabet+joker> <multi_file> <reverse_complement> <time_limit> <heuristic_to_use> <heuristic_filename> <ILP_filename>

Example run:

java -jar joker.jar 00001 ACGTN multi1_1024.txt 0 200 1 heuristic.txt ILP.txt

Without ILP:

java -jar joker.jar 00001 ACGTN multi1_1024.txt 0 0 2 heuristic.txt ILP.txt

The file multi1_1024.txt has 1 for each of the 1024 k-mers.

Interpreting the output

The output file is a text file containing the sequences. Each file contains a different sequence.

Computed sequences

Results of greedy heuristic and ILP on multiplicity 1, DNA 5 <= k <= 8, amino acid 3 <= k <= 4
  • Results of greedy heuristic on DNA k=6, 1 <= multiplicity <= 16
  • Pho4 MITOMI experiment used in this study

  • Pho4_Seq.fas
  • Pho4_rNN.txt
  • Cite this work:

    Yaron Orenstein, Ryan Kim, Polly Fordyce and Bonnie Berger. Joker de Bruijn: sequence libraries to cover all k-mers using joker characters. RECOMB 2017.
    Yaron Orenstein, Robert Pucinelli, Ryan Kim, Polly Fordyce and Bonnie Berger. Optimized sequence library design for efficient in vitro interaction mapping. Cell Systems 2017.
    Yaron Orenstein, Yun William Yu and Bonnie BErger. Joker de Bruijn: covering k-mers using joker characters. Submitted.