This tool rewrites your coding DNA sequence using synonymous codon changes to reduce
predicted splice sites, while keeping the amino acid sequence unchanged.
1. What is being optimized?
Each time you run the optimizer, your sequence is sent (via a small local proxy) to the
ASSP splice-site prediction server. ASSP returns a table of predicted donor/acceptor
sites with a score and confidence for each site. The optimizer focuses on
undesired sites, defined as:
- Constitutive donors (always considered unwanted), or
- Sites with a score above the configured score threshold, or
- Intermediate-score sites (score between 5.0 and your threshold) that have confidence above the configured confidence threshold.
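The three rules above can be written as a single predicate. This is a minimal sketch, assuming a hypothetical `Site` record for one row of the ASSP table; the field names, the strict `>` comparisons, and treating both ends of the 5.0-to-threshold band as inclusive are assumptions, not the tool's actual code. The thresholds in the usage lines (8.0 and 0.5) are illustrative only.

```python
from dataclasses import dataclass

INTERMEDIATE_MIN_SCORE = 5.0  # lower edge of the "intermediate" band, from the text


@dataclass
class Site:
    # Hypothetical record for one row of the ASSP output table.
    kind: str           # "donor" or "acceptor"
    constitutive: bool  # True if ASSP labels the site constitutive
    score: float
    confidence: float


def is_undesired(site: Site, score_threshold: float, confidence_threshold: float) -> bool:
    """Apply the three rules listed above (band boundaries are an assumption)."""
    if site.kind == "donor" and site.constitutive:
        return True  # rule 1: constitutive donors are always unwanted
    if site.score > score_threshold:
        return True  # rule 2: high-scoring site of any kind
    # Rule 3: intermediate score, kept only if the confidence is high enough.
    return (INTERMEDIATE_MIN_SCORE <= site.score <= score_threshold
            and site.confidence > confidence_threshold)


print(is_undesired(Site("acceptor", False, 6.0, 0.9), 8.0, 0.5))  # True
print(is_undesired(Site("acceptor", False, 6.0, 0.2), 8.0, 0.5))  # False
```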
For each sequence, the optimizer computes a global cost:
(number of undesired sites, maximum score among those sites). Sequences with
fewer undesired sites are considered better; if two sequences have the same count, the
one with the lower maximum score is preferred.
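Because the cost is a pair compared first by count and then by maximum score, a Python tuple already has exactly this ordering. A minimal sketch, where `global_cost` is a hypothetical helper that receives only the scores of the undesired sites and uses 0.0 as the maximum when none remain (an assumption):

```python
def global_cost(undesired_scores: list[float]) -> tuple[int, float]:
    """(number of undesired sites, maximum score among them)."""
    return (len(undesired_scores), max(undesired_scores, default=0.0))


a = global_cost([9.1, 6.2])       # (2, 9.1)
b = global_cost([7.0, 6.5, 5.4])  # (3, 7.0)
c = global_cost([8.0, 5.5])       # (2, 8.0)
print(a < b)  # True: fewer undesired sites wins, even with a higher maximum
print(c < a)  # True: same count, lower maximum score wins
```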
2. How codons are chosen for mutation
In each iteration, the optimizer picks the worst undesired site: the highest-scoring constitutive donor if one is present, otherwise the highest-scoring, highest-confidence site overall. It then:
- Uses the mixed-case motif reported by ASSP to locate the exon–intron boundary in your sequence.
- Identifies the “center” codon that contains the boundary, and constructs a small codon window around it (C0, C−1, C+1, and optionally neighbors).
- Only considers synonymous codons in this window (codons that encode the same amino acid), so your protein sequence stays unchanged.
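The boundary-and-window step might look roughly like this. The sketch assumes ASSP marks the exon and intron parts of the motif by letter case, uses the first occurrence of the motif in your sequence, and takes as C0 the codon holding the first base after the case change; `find_boundary` and `codon_window` are hypothetical names. Dropping the first and last codons mirrors the rule in section 5 that only internal codons are mutated.

```python
def find_boundary(seq: str, motif: str) -> int:
    """0-based index in seq of the first base after the motif's case change."""
    # Assumption: upper vs. lower case in the motif separates exon from intron.
    split = next((i for i in range(1, len(motif))
                  if motif[i].isupper() != motif[i - 1].isupper()), None)
    if split is None:
        raise ValueError("motif has no case change")
    start = seq.upper().find(motif.upper())  # first occurrence only
    if start < 0:
        raise ValueError("motif not found in sequence")
    return start + split


def codon_window(seq: str, boundary: int, radius: int = 1) -> list[int]:
    """Codon indices C-radius .. C+radius around the boundary, internal codons only."""
    n_codons = len(seq) // 3
    center = boundary // 3  # C0: codon holding the first base after the boundary
    return [c for c in range(center - radius, center + radius + 1)
            if 1 <= c <= n_codons - 2]  # never touch the start or stop codon


seq = "ATGGCCAAGGTAAGTTGA"               # ATG GCC AAG GTA AGT TGA
b = find_boundary(seq, "AAGgtaagt")      # 9
print(codon_window(seq, b))              # [2, 3, 4] -> C-1, C0, C+1
print(codon_window(seq, b, radius=2))    # [1, 2, 3, 4]; codon 5 (stop) is excluded
```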
3. Search strategy and tiers
The optimizer runs in up to three tiers of increasing search cost:
- Fast-path (C0 only): Tries synonymous codons for the center codon C0, in a human-leaning order. For each candidate sequence, it calls ASSP again and computes the new cost. The first candidate that strictly improves the global cost is accepted and the optimizer moves to the next iteration.
- Slow-path (local window): If the fast-path cannot find an improvement, the optimizer searches a small window of codons around the junction (C0, C−1, C+1). It evaluates synonymous changes in this window and keeps the single candidate that gives the best improvement in global cost (steepest descent within the window).
- Very-slow-path (extended window): If the local window still cannot improve the cost and you selected the full multi-tier strategy, the optimizer extends the window further (to include C−2 and C+2) and repeats the steepest-descent search.
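The fast path and the windowed tiers differ mainly in first improvement versus steepest descent. In the sketch below, `cost_fn` is an assumed stand-in for "call ASSP, then compute the global cost"; the helper names are hypothetical, and synonyms are tried in genetic-code table order rather than the tool's human-leaning order.

```python
from typing import Callable, Optional

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

CostFn = Callable[[str], tuple[int, float]]


def synonyms(codon: str) -> list[str]:
    """Other codons that encode the same amino acid."""
    return [c for c, aa in CODE.items() if aa == CODE[codon] and c != codon]


def with_codon(seq: str, index: int, codon: str) -> str:
    """Return seq with codon number `index` replaced; the frame is unchanged."""
    return seq[:3 * index] + codon + seq[3 * index + 3:]


def fast_path(seq: str, center: int, cost_fn: CostFn) -> Optional[str]:
    """Tier 1: return the first C0 synonym that strictly lowers the cost."""
    base = cost_fn(seq)
    for alt in synonyms(seq[3 * center:3 * center + 3]):
        candidate = with_codon(seq, center, alt)
        if cost_fn(candidate) < base:
            return candidate
    return None


def steepest_descent(seq: str, window: list[int], cost_fn: CostFn) -> Optional[str]:
    """Tiers 2 and 3: score every single-codon synonym in the window, keep the best."""
    best, best_cost = None, cost_fn(seq)
    for idx in window:
        for alt in synonyms(seq[3 * idx:3 * idx + 3]):
            candidate = with_codon(seq, idx, alt)
            cost = cost_fn(candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best


def toy_cost(s: str) -> tuple[int, float]:
    return (s.count("GGT"), 0.0)  # toy stand-in for ASSP + global cost


seq = "ATGAAGGTGTGA"                            # ATG AAG GTG TGA
print(fast_path(seq, 2, toy_cost))              # None: every Val codon starts with GT
print(steepest_descent(seq, [1, 2], toy_cost))  # ATGAAAGTGTGA (AAG -> AAA)
```

The toy shows why the wider tier exists: the unwanted motif spans a codon junction, so no change to C0 alone can remove it, while a synonymous change to the neighbouring C−1 can.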
The optimization strategy setting controls how far the algorithm goes:
- Fast-path only: Only mutate C0; stop as soon as C0 cannot be improved.
- Fast + windowed slow-path: Try C0 first; if that fails, try the local window. Stop once the window cannot find any improvement.
- Full multi-tier: Use the fast-path, then the slow-path, then (if needed) the extended window around the site before giving up.
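Each strategy setting amounts to a list of tiers to try in order. A minimal sketch, assuming hypothetical strategy keys and a caller-supplied `try_tier(radius, mode)` that returns True when that tier found an improvement:

```python
from typing import Callable

# Hypothetical keys for the three settings; each tier is (window radius, search mode).
STRATEGY_TIERS = {
    "fast": [(0, "first-improvement")],
    "fast+window": [(0, "first-improvement"), (1, "steepest")],
    "full": [(0, "first-improvement"), (1, "steepest"), (2, "steepest")],
}


def run_tiers(strategy: str, try_tier: Callable[[int, str], bool]) -> bool:
    """Try tiers in order, stopping at the first that improves; False means give up."""
    for radius, mode in STRATEGY_TIERS[strategy]:
        if try_tier(radius, mode):
            return True
    return False


# Suppose only the extended window (radius 2) can help at this site:
print(run_tiers("full", lambda r, m: r == 2))         # True
print(run_tiers("fast+window", lambda r, m: r == 2))  # False
```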
4. Stopping conditions and safeguards
The optimizer stops when any of the following happens:
- No undesired sites remain according to your thresholds.
- The search around the targeted site, up to the widest window allowed by the selected strategy, cannot find any single-codon change that improves the global cost.
- The maximum number of iterations is reached.
To avoid looping, the optimizer also keeps a cache of all sequences it has already
evaluated. Candidate sequences that were seen before are skipped immediately, so the
search always moves to genuinely new codon patterns.
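The stopping conditions and the cache combine into an outer loop along these lines. This is a simplified sketch: `neighbors` is a hypothetical callable standing in for the tiered candidate generation, each iteration keeps the best unseen candidate, and `max_iterations=50` is an arbitrary default rather than the tool's.

```python
from typing import Callable, Iterable


def optimize(seq: str,
             cost_fn: Callable[[str], tuple[int, float]],
             neighbors: Callable[[str], Iterable[str]],
             max_iterations: int = 50) -> str:
    """Improve seq until one of the three stopping conditions is met."""
    seen = {seq}                     # cache of every sequence already evaluated
    current, current_cost = seq, cost_fn(seq)
    for _ in range(max_iterations):  # stop: iteration cap reached
        if current_cost[0] == 0:
            break                    # stop: no undesired sites remain
        best, best_cost = None, current_cost
        for candidate in neighbors(current):
            if candidate in seen:
                continue             # seen before: skip without re-scoring
            seen.add(candidate)
            cost = cost_fn(candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
        if best is None:
            break                    # stop: no improving change found
        current, current_cost = best, best_cost
    return current


def flip_one(s: str) -> list[str]:
    """Toy neighbors: turn one 'A' into 'B'."""
    return [s[:i] + "B" + s[i + 1:] for i, ch in enumerate(s) if ch == "A"]


print(optimize("AAB", lambda s: (s.count("A"), 0.0), flip_one))  # BBB
```

Adding a candidate to `seen` as soon as it is scored means a rejected candidate is never sent to ASSP twice.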
5. What is and is not changed
- Only internal codons are mutated, so your start and stop codons are left unchanged. The reading frame is preserved because every edit swaps one codon for another of the same length; the input is also required to have a length that is a multiple of three and to contain only A/C/G/T.
- All changes are synonymous with respect to the standard nuclear genetic code: the amino acid sequence remains the same.
- Codon choices are biased toward human-preferred codons but still allow alternatives when needed to reduce problematic splice motifs.
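These guarantees are easy to check on a result. A minimal sketch using the standard genetic code (NCBI table 1); `validate`, `translate`, and `is_safe_rewrite` are hypothetical helper names, not part of the tool:

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def validate(seq: str) -> str:
    """Upper-case seq and enforce the length and alphabet rules."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    if set(seq) - set("ACGT"):
        raise ValueError("only A/C/G/T are allowed")
    return seq


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks a stop codon."""
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_safe_rewrite(original: str, optimized: str) -> bool:
    """True if the protein and the first and last codons are all unchanged."""
    a, b = validate(original), validate(optimized)
    return (translate(a) == translate(b)
            and a[:3] == b[:3] and a[-3:] == b[-3:])


print(translate("ATGAAGGTGTGA"))                        # MKV*
print(is_safe_rewrite("ATGAAGGTGTGA", "ATGAAAGTGTGA"))  # True (AAG -> AAA, both Lys)
print(is_safe_rewrite("ATGAAGTGA", "ATGAACTGA"))        # False (AAC encodes Asn)
```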
The result is a coding sequence that should be better behaved with respect to predicted
cryptic and constitutive splice sites, while keeping the encoded protein identical.