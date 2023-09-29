The FML-seq method

The protocol for FML-seq comprises only three steps (Fig 1A). First, genomic DNA (gDNA) is digested by a methylation-dependent restriction endonuclease that cuts at a certain distance from the 5-methylcytosine or 5-hydroxymethylcytosine in its motif and leaves a short overhang (10). Second, a master mix is added with combined reagents for sticky-end adapter ligation, preparation of the specially designed adapters (15), and indexing PCR. Finally, a single cleanup without size selection is sufficient to purify the library, because the digestion does not produce unusably short fragments (Fig S1) and the adapter design prevents byproducts without gDNA inserts (Fig S2). The resulting library contains unaltered genome sequences alignable by standard pipelines. Each end of a library fragment is derived from a methylation-dependent digestion, so paired-end sequencing detects two methylated cytosine positions per fragment (Fig 1B). FML-seq represents a substantial simplification of previous protocols based on methylation-dependent digestion.

Figure 1. Diagram of fragmentation at methylated loci and sequencing (FML-seq). (A) Library preparation reactions. Genomic DNA is digested by a methylation-dependent restriction endonuclease that cuts at a known distance from the methylated cytosine in its motif and leaves a short single-strand overhang of unknown bases. Stem-loop (hairpin) sequencing adapters with complementary random overhangs are ligated to the digested genomic DNA fragments, but the phosphodiester backbone is completed only on one strand because the adapters lack a 5′ phosphate. The resulting single-strand nick is extended by DNA polymerase to fill in a second strand complementary to the adapter’s loop, whereas the unneeded stem strand is degraded. This library of genomic DNA inserts between double-stranded linear short adapters is then amplified by standard polymerase chain reaction with long indexing primers to produce a sequencing-ready library. A standard solid-phase reversible immobilization bead cleanup without size selection is sufficient to purify the library. Paired-end sequencing reads imply the locations of the two methylated cytosines that gave rise to each observed fragment. (B) Counting FML-seq fragments as hits at methylated motif sites. The restriction endonuclease used here, MspJI, recognizes the motif mCNNR. Each copy of this motif on either strand implies a potential cut site at a certain distance past its 3′ end. When paired-end sequence reads are aligned to the reference genome, each end of a sequenced fragment counts as one hit for the corresponding motif site; for example, the fragment marked by an asterisk tallies one hit each for the red and green motif sites. The number of hits for a given motif site corresponds to the fraction of genome copies methylated at that motif’s cytosine position.
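The motif bookkeeping in panel B can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names `find_motif_sites` and `tally_hits` are hypothetical, positions are 0-based, and the step that assigns a fragment end to a motif site (which depends on the enzyme's cut offsets) is assumed to have been done already. The sketch relies only on the fact that a CNNR site on the reverse strand appears on the forward strand as its reverse complement, YNNG.

```python
import re
from collections import Counter

# IUPAC codes: R = A or G, Y = C or T, N = any base.
# Lookaheads are used so that overlapping motif copies are all reported.
FORWARD_MOTIF = re.compile(r"(?=(C[ACGT]{2}[AG]))")   # CNNR on the forward strand
REVERSE_MOTIF = re.compile(r"(?=([CT][ACGT]{2}G))")   # CNNR on the reverse strand, read as YNNG


def find_motif_sites(reference):
    """Return sorted (cytosine_position, strand) pairs for CNNR sites on both strands.

    Positions are 0-based forward-strand coordinates of the cytosine that
    would carry the methyl group (for "-" sites, the base paired with it).
    """
    reference = reference.upper()
    sites = []
    for match in FORWARD_MOTIF.finditer(reference):
        sites.append((match.start(), "+"))
    for match in REVERSE_MOTIF.finditer(reference):
        # The reverse-strand cytosine pairs with the final G of YNNG.
        sites.append((match.start() + 3, "-"))
    return sorted(sites)


def tally_hits(fragment_end_sites):
    """Count one hit per fragment end for the motif site it was assigned to."""
    return Counter(fragment_end_sites)
```

For the palindrome `TACGAA` (an instance of YNCGNR), `find_motif_sites` reports one site on each strand, at the two cytosines of the CpG, which is the situation drawn in Fig S1C.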

Figure S1. DNA digestion by MspJI restriction endonuclease. (A) The enzyme’s recognition domain (blue) binds a motif (blue) containing methylated cytosine (red), mCNNR, whereas its endonuclease domain (magenta) cuts at distances of 13 and 17 bp from the methylated cytosine position. Digestion leaves a 5′ overhang of 4 nt with a terminal phosphate, and a terminal hydroxyl on the 3′ underhang. (B) Digestion by MspJI cannot result in very short fragments. The shortest theoretically possible library insert would be generated by recognition of a mCNNR site immediately flanking the 4 nt overhang resulting from a previous digestion, leaving a total insert of 21 bp after ligating the adapters. In reality, MspJI may not be able to bind its motif without additional paired bases on both sides. Empirically, in high-input libraries without size selection, we observe only about 0.04% of library inserts shorter than 22 bp, 0.05% shorter than 25 bp, and 0.33% shorter than 30 bp (Fig S7A). (C) In the special case of a fully methylated CpG within a YNCGNR palindrome, two MspJI enzymes may digest the DNA symmetrically and produce a 32-bp insert. In FML-seq libraries from human genomic DNA, inserts of this length are disproportionately common but far from the majority, about 4% of all inserts (Fig S7A).

Figure S2. FML-seq molecule sequences using MspJI restriction endonuclease and Nextera sequencing adapters. For convenience a palindromic YNCGNR site with two complementary CGNR motif sites is shown, though fragments can be produced by a non-palindromic single site and a given fragment does not necessarily contain the motif sites responsible for its digestion. The stem-loop (hairpin) adapters are an equimolar mix of two versions, one for each end of a sequencing-ready library molecule (Illumina P5 and P7), but except for their slightly different sequences, they are used identically and the ligation product has no polarity. By chance, half of the digestion products will be ligated between two of the same adapter (P5 and P5 or P7 and P7), producing an unsequenceable molecule that is unlikely to amplify because of PCR suppression; this limits the efficiency of the library synthesis to 50%. Nick extension replaces the second adapter strand, whereas uracil-DNA glycosylase excises uracil and the abasic sites are hydrolyzed, leaving an amplifiable library molecule. Because the stem-loop adapters lack 5′ phosphate, they ligate to a genomic DNA fragment on only one strand, and if two adapters anneal without a genomic DNA insert, the nick extension stops at the nick on the opposite strand. Oligonucleotide sequences © 2021 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited.
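The 50% ceiling stated in the Figure S2 legend follows from enumerating the adapter pairings. The sketch below assumes, as the legend states, an equimolar mix and no ligation preference between the two adapter versions; the function name `sequenceable_fraction` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Equimolar mix: each fragment end receives P5 or P7 with equal probability.
ADAPTERS = ("P5", "P7")


def sequenceable_fraction():
    """Fraction of ligation products carrying one P5 and one P7 adapter."""
    outcomes = list(product(ADAPTERS, repeat=2))  # (one end, other end), equally likely
    usable = [pair for pair in outcomes if pair[0] != pair[1]]
    return Fraction(len(usable), len(outcomes))
```

Of the four equally likely pairings, two (P5/P7 and P7/P5) are sequenceable, so the function returns 1/2.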

FML-seq uses sequencing adapters with 5′ overhangs of 4 random bases (4N) for efficient sticky-end ligation to the corresponding overhangs of unknown bases resulting from digestion by a restriction endonuclease whose motif includes methylated cytosine. This would be expected to result in an overwhelming byproduct from dimerization of adapters that ligate directly to each other without a gDNA insert. FML-seq’s library protocol uses a combination of several techniques to prevent this behavior: (1) The adapters are added to the gDNA before digestion. After digestion there is a heat denaturation of the restriction endonuclease, and at this step, adapters that annealed to each other during storage at higher concentration may melt apart and reanneal to the ends of new gDNA fragments instead. (2) The adapters lack a phosphate at the 5′ terminus, which is required for DNA ligase to connect that terminus to a matching 3′ hydroxyl. Thus, neither strand in an adapter dimer can be ligated, and even if two adapters with complementary overhangs temporarily anneal, 4 bp of hydrogen bonding may not keep them together when the temperature is raised in subsequent steps. (3) Before PCR, the same polymerase that performs the PCR is used to fill in the second strand of the adapter by extension from the nick where the adapter’s 5′ end is not joined to the gDNA insert’s 3′ end. Even if an adapter dimer holds together at the PCR polymerase’s extension temperature, it will also have a nick on the opposite strand, so the polymerase would have to jump over a gap in the template strand to fill in the new strand. (4) The 5′ end of the stem-loop (hairpin) adapter has the 4N sticky-end overhang and a complementary sequence that forms the double-stranded stem. In addition to causing PCR suppression, this stem could cause adapter dimerization by melting and reannealing to a different molecule. However, in the stem sequence, all thymine bases are replaced with uracil, and a dU-intolerant proofreading polymerase is used to prevent the stem sequence from being replicated. (5) Furthermore, the stem sequence’s uracils are excised before PCR by uracil-DNA glycosylase (UDG), leaving abasic sites that further deter replication and may be fully destroyed by hydrolysis at PCR denaturation temperature. In this protocol, UDG from a hyperthermophile (Archaeoglobus fulgidus) is included in the combined ligation/loop-breaking/PCR master mix, because the traditional Escherichia coli UDG interferes with the low-temperature ligation and would need to be added in an additional step between ligation and PCR, whereas the hyperthermophilic UDG appears inert at ligation temperature. (6) For the Illumina sequencing platform, the adapters are based on the less traditional but equally well-supported Nextera sequence (Tn5 transposon) rather than standard TruSeq, because that sequence is T-rich on the 5′ strand and therefore this protocol’s adapters are U-rich in the portion removed by UDG. (7) These adapter sequences are incomplete and require long PCR primers to extend them to full length with multiplexing indexes. Like the Nextera design, the PCR primers are also incomplete and only partially overlap the ligated adapter sequences. Thus, both an adapter and a PCR primer are required to create a full-length product matching both the flowcell’s amplification primers and the sequencing primers, so neither the original adapters nor the PCR primers can form sequenceable byproducts alone.
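The reason random 4N overhangs invite adapter dimers can be illustrated by enumerating them. The sketch below is an assumption-laden toy, not part of the protocol: it treats all 256 overhangs as equally abundant, counts only perfect Watson-Crick matches, and uses hypothetical helper names. Two 5′ overhangs, both written 5′ to 3′, anneal perfectly when one is the reverse complement of the other.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def all_overhangs(length=4):
    """Every possible random overhang of the given length (4**length sequences)."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def matching_fraction(target="ACGT", length=4):
    """Fraction of random overhangs that pair perfectly with one fixed overhang."""
    overhangs = all_overhangs(length)
    partner = reverse_complement(target)
    return sum(1 for overhang in overhangs if overhang == partner) / len(overhangs)
```

Under these assumptions, exactly 1 in 256 adapter overhangs matches any given gDNA overhang, but the same holds between adapters: every adapter overhang has a perfect partner somewhere in the adapter pool, which is why the measures listed above are needed to keep annealed adapter pairs from becoming library molecules.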

An additional consequence of destroying the 5′ ends of the original adapters is that the random overhang is also destroyed; even if it contains a mismatch to the complementary gDNA sequence that is tolerated by the ligase, the mismatched base is not part of the final molecule’s sequence. The final sequence is derived only from the gDNA and the random overhang does not contribute mismatches at the ends of sequence reads.

Previous methods that used MspJI or another methylation-specific restriction endonuclease focused exclusively on the 32-bp fragment produced by two complete, symmetric digestions of a fully methylated CpG (Fig S1C). This requires a precise size selection such as cutting a band out of an acrylamide electrophoresis gel, and discards about 96% of the library product (see Fig S7). In contrast, FML-seq treats every cut site as informative of a methylated base, even where digestion is incomplete or the sequence motif is not palindromic.
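The cost of selecting only the 32-bp band can be expressed as a simple calculation over insert lengths. The sketch below is illustrative only: the function names are hypothetical, and the input is assumed to be a list of insert lengths (for example, template lengths from aligned read pairs), not any specific file format.

```python
from collections import Counter


def fraction_with_length(insert_lengths, target_length=32):
    """Fraction of library inserts whose length equals target_length."""
    counts = Counter(insert_lengths)
    return counts[target_length] / len(insert_lengths)


def fraction_discarded_by_size_selection(insert_lengths, target_length=32):
    """Fraction of inserts lost if only target_length inserts are kept."""
    return 1.0 - fraction_with_length(insert_lengths, target_length)
```

With a made-up list in which 4 of every 100 inserts are 32 bp long, matching the roughly 4% reported for Fig S7A, the discarded fraction comes out at about 0.96, the figure quoted above.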