Sequencing DNA means
determining the exact order of nucleotides in a DNA molecule. DNA sequencing
began in the 1970s, when two research groups developed different methods for
sequencing, the Maxam-Gilbert method and the Sanger method, at almost the same
time. These conventional DNA sequencing technologies were useful but had high cost
and low throughput. Next-generation technologies were introduced to
overcome these limitations, and made it practical to sequence the entire
genomes of organisms.
DNA sequencing is
the process of determining, by any method or technology, the precise order of
the four bases (adenine, guanine, cytosine, and thymine) in a DNA strand. An
alteration in a DNA sequence can lead to an altered or non-functional protein,
and hence to a genetic disorder. DNA sequencing therefore plays an important role
in detecting the types of mutation behind genetic diseases, which is important for
understanding the changes that occur in the characters of organisms.
Knowledge of DNA
sequences has become indispensable for basic biological research, for other
research branches that use DNA sequencing, and in numerous applied fields such
as diagnostics, biotechnology, forensic biology and biological systematics. The
advent of DNA sequencing has significantly accelerated biological research and
discovery. The rapid speed of sequencing attained with modern DNA sequencing
technology has been instrumental in sequencing the human genome in the
Human Genome Project. Related projects, often carried out by scientific collaborations
across continents, have generated the complete DNA sequences of microbes and plants.
Deoxyribonucleic acid (DNA)
was first discovered and isolated by Friedrich Miescher in 1869, but it
remained understudied for many decades because proteins, rather than DNA, were
thought to hold the genetic blueprint of life. This situation
changed after 1944, when experiments by Oswald
Avery, Colin MacLeod, and Maclyn McCarty demonstrated that
purified DNA could change one strain of bacteria into another. This was the
first time that DNA was shown to be capable of transforming the properties of cells.
DNA is the information store
that ultimately dictates the structure of every gene product and delineates every
part of the organism. The order of the bases along DNA contains the complete
set of instructions that make up the genetic inheritance. In
1953 James Watson and Francis Crick put forward
their double-helix model of DNA, based on X-ray diffraction
data being studied by Rosalind Franklin. According to the model, DNA
is composed of two strands of nucleotides coiled around each other, linked
together by hydrogen bonds and running in opposite directions. Each strand is
composed of four nucleotides, adenine (A), cytosine (C), guanine
(G) and thymine (T), which pair in a complementary way: an A on one strand is
always paired with a T on the other, and C is always paired with G. They proposed
that such a structure allowed each strand
to be used to reconstruct the other, an idea central to the passing on of
hereditary information between generations.
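The pairing rule above is enough to rebuild one strand from the other. A minimal Python sketch follows; the function name `reverse_complement` and the example sequence are illustrative assumptions, not taken from any cited work.

```python
# Watson-Crick pairing as described in the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written in its own 5'->3' direction.

    The two strands run in opposite directions, so the partner is the
    complement of the input read backwards.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


partner = reverse_complement("ATGCGT")
print(partner)                      # ACGCAT
print(reverse_complement(partner))  # ATGCGT (the original strand is recovered)
```

Applying the function twice returns the original sequence, which is the sense in which each strand carries the information needed to reconstruct the other.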
Frederick Sanger was a pioneer of sequencing and is one of the few scientists
to have been awarded two Nobel Prizes, one for the sequencing of proteins and
the other for the sequencing of DNA. He laid the foundation for sequencing
proteins, and by 1955 had completed
the sequence of all the amino acids in insulin, a small protein secreted
by the pancreas.
of Sequencing Methods
In 1975, Sanger
introduced the plus and minus method for DNA sequencing (Sanger and Coulson, 1975).
This was a critical transition technique leading to the modern generation of
methods that have completely dominated sequencing over the past 30 years. The
key to this advance was the use of polyacrylamide gels to separate the products
of primed synthesis by DNA polymerase in order of increasing chain length.
Maxam and Gilbert (1977) developed a DNA sequencing method that was similar to the Sanger
method in using polyacrylamide gels to resolve bands that terminated at each
base throughout the target sequence, but very different in the way that
products ending in a specific base were generated.
The plus-minus and Maxam-Gilbert methods were rapidly replaced by the
chain-terminator method (Sanger, Nicklen and Coulson, 1977), also known as Sanger
sequencing (after its developer Frederick Sanger) or dideoxy sequencing.
In 1986 the
laboratory of Leroy Hood at Caltech, in collaboration with Applied Biosystems (ABI),
published the first report of automation of DNA sequencing (Smith et al, 1986),
which established the dye-terminator variant of Sanger sequencing.
Sanger sequencing using dye-terminators became the dominant sequencing technique until
the introduction of so-called next-generation sequencing technologies.
The National Human Genome
Research Institute (NHGRI) has echoed this need through its vision for genomics
research (Collins et al 2003).
With the advancement of sequencing
techniques, the past decade will be remembered as the decade of genome research,
following the publication of the first complete human genomes (Lander et al 2001;
Venter et al 2001).
Using the first automated fluorescent
DNA sequencing equipment, the complete gene locus for the hypoxanthine-guanine
phosphoribosyltransferase (HPRT) gene was sequenced, applying the paired-end
sequencing approach for the first time (Edwards et al 1990).
Landmarks were the
sequencing of the first small phage genome (5386 bases in length) and the sequencing
of the human genome of ~3
billion bases (Lander et al 2001; Venter et al 2001). It is remarkable that such
progress has been made using methods that are refinements of the basic 'dideoxy'
method introduced by Sanger in 1977.
Lander ES, Linton LM, Birren
B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K et al (2001) reported the initial sequencing and analysis of the human genome.
Aigrain et al (2016)
suggested that the emergence of PCR-free protocols, and of simplified protocols
that merge several steps into one, will improve not only the workflow and the
overall and hands-on times of DNA library
preparation, but also the chemical efficiency of these protocols.
Fouhy et al (2016)
compared the Illumina MiSeq and Ion PGM sequencers and showed
that both offer good sequencing depth and provide
information at species level, which was not attainable using older platforms.
of DNA sequencing
A. Sequencing by
-Sanger’s chain-termination method
-Maxam & Gilbert chemical method
B. Sequencing by
C. Sequencing by degradation
D. Sequencing by hybridization
-Oligo-probes microarray and fluorescently labeled
unknown DNA fragments
E. Direct sequencing
F. Single molecule Sequencing
1. Sanger's chain-termination method: The first method described by Sanger and Coulson for DNA
sequencing was called 'plus and minus' (Sanger & Coulson, 1975). This
method used Escherichia coli DNA polymerase I and DNA polymerase from
bacteriophage T4 (Englund, 1971, 1972) with different limiting nucleoside
triphosphates. The products generated by the polymerases were resolved by
ionophoresis on acrylamide gels. Because the 'plus and minus' method was
inefficient, two years later Sanger and his co-workers described a new breakthrough
method for sequencing oligonucleotides via enzymic polymerization (Sanger et
al. 1977), known as the chain termination method or the dideoxynucleotide method.
This method consisted
of a catalyzed enzymic reaction that polymerizes the DNA fragments
complementary to the template DNA of interest (unknown DNA). Briefly, a 32P-labelled
primer (short oligonucleotide with a sequence complementary to the template
DNA) was annealed to a specific known region on the template DNA, which
provided a starting point for DNA synthesis. In the presence of DNA
polymerases, catalytic polymerization of deoxynucleoside triphosphates (dNTP)
onto the DNA occurred. The polymerization was extended until the enzyme
incorporated a modified nucleoside called a terminator or dideoxynucleoside
triphosphate (ddNTP) into the growing chain. This method was performed in four
different tubes, each containing the appropriate amount of one of the four
terminators. All the generated fragments had the same 5′-end whereas the
residue at the 3′-end was determined by the dideoxynucleotide used in the
reaction. After all four reactions were completed, the mixture of
different-sized DNA fragments was resolved by electrophoresis on a denaturing
polyacrylamide gel, in four parallel lanes. The pattern of bands showed the
distribution of the termination in the synthesized strand of DNA and the
unknown sequence could be read by autoradiography.
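The logic of reading a chain-termination gel can be sketched in a few lines of Python. This is an idealised illustration under stated assumptions: every possible termination product is taken to be present, fragment lengths are counted from the 3′ end of the primer, and the names `chain_termination_lanes` and `read_gel` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
BASES = "ACGT"


def chain_termination_lanes(template_3to5: str) -> dict:
    """Return, for each ddNTP reaction, the lengths of the terminated fragments.

    The template is given in the order the polymerase reads it (3'->5').
    A fragment of length n ends in the ddNTP complementary to template
    position n, so it appears in that terminator's lane.
    """
    lanes = {base: [] for base in BASES}
    for length, template_base in enumerate(template_3to5, start=1):
        lanes[COMPLEMENT[template_base]].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the four lanes from the shortest fragment to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = chain_termination_lanes("TACGGCAT")
print(lanes)            # {'A': [1, 8], 'C': [4, 5], 'G': [3, 6], 'T': [2, 7]}
print(read_gel(lanes))  # ATGCCGTA (the newly synthesized strand, 5'->3')
```

Sorting by length plays the role of electrophoresis, and the lane in which each band appears identifies the base at that position.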
2. Maxam & Gilbert
chemical method: A sequencing method based on a chemical degradation was
described by Maxam & Gilbert (1977). In this method, end-labelled DNA
fragments are subjected to random cleavage at adenine, cytosine, guanine, or
thymine positions using specific chemical agents. The chemical attack is based
on three steps: base modification, removal of the modified base from its sugar,
and DNA strand breaking at that sugar position (Maxam & Gilbert, 1977). The
products of these four reactions are then separated using polyacrylamide gel
electrophoresis. The sequence can be easily read from the four parallel lanes
in the sequencing gel. The template used in this sequencing method can be
either double-stranded (ds) DNA or ssDNA from chromosomal DNA. In general, the
fragments are first digested with an appropriate restriction enzyme (Maxam
& Gilbert, 1980), but they can also be prepared from an inserted or
rearranged DNA region (Maxam, 1980). These DNA templates are then end-labelled
on one of the strands. Originally, this labelling was done with phosphate or
with a nucleotide linked to 32P and enzymically incorporated into
the end fragment (Maxam & Gilbert, 1977). Alternatively, restriction
fragments were labelled using 35S dideoxyadenosine 5′-(α-thio) triphosphate (35S
ddATPαS) and terminal deoxynucleotidyltransferase (Ornstein &
Kashdan, 1985). This substitution showed several advantages, including a
longer lifetime, lower emission energy, increased autoradiograph resolution
and higher stability after labelling.
Dye-terminator sequencing utilizes labelling of the chain terminator ddNTPs, which permits
sequencing in a single reaction, rather than four reactions as in the
labelled-primer method. In dye-terminator sequencing, each of the four
dideoxynucleotide chain terminators is labelled with a fluorescent dye, and each dye
emits light at a different wavelength.
Next-generation sequencing (NGS):
The emergence of
next-generation sequencing (NGS) technologies in the past decade has allowed
the democratization of DNA sequencing, both in terms of price per sequenced
base and in the ease of producing DNA libraries. The common feature of next-generation
sequencing methods is that they are massively parallel, meaning that the number
of sequence reads from a single experiment is vastly greater than the 96 reads
obtained with modern capillary electrophoresis-based Sanger sequencers.
At present this very
high throughput is achieved at the expense of the length and accuracy of the
individual reads when compared to Sanger sequencing. Nonetheless, assemblies of
such data can be highly accurate because of the high degree of sequence
coverage obtainable. The methods are designed for projects that employ the
whole-genome shotgun (WGS)
approach. They are most readily applied to resequencing, in which sequence data
are aligned with a reference genome sequence in order to look for differences
from that reference. Three platforms for massively parallel DNA sequencing read
production are in reasonably widespread use at present: the Roche/454 Life
Sciences GS FLX, the Illumina/Solexa Genome Analyzer, and the Applied
Biosystems SOLiD System.
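The relationship between coverage, accuracy and resequencing can be illustrated with a short Python sketch. Mean coverage is estimated as number of reads × read length / genome size. The read counts, sequences and function names below are hypothetical values chosen for illustration, and the reads are assumed to be already aligned and of equal length.

```python
from collections import Counter


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base of the genome is sequenced."""
    return num_reads * read_length / genome_size


def consensus(aligned_reads: list) -> str:
    """Majority vote at each position over already-aligned, equal-length reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


def differences(reference: str, sequence: str) -> list:
    """List (position, reference base, observed base) wherever the two differ."""
    return [
        (pos, ref, obs)
        for pos, (ref, obs) in enumerate(zip(reference, sequence), start=1)
        if ref != obs
    ]


# Hypothetical run: 40 million 75-base reads over a 100 Mb genome.
print(mean_coverage(40_000_000, 75, 100_000_000))  # 30.0

# Five reads over the same region; two carry a single read error each.
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC", "ACGTTC"]
called = consensus(reads)
print(called)                         # ACGTAC
print(differences("ACGTAA", called))  # [(6, 'A', 'C')]
```

The isolated read errors are outvoted at their positions, while the difference from the reference at position 6 is supported by every read and survives into the consensus.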
1. 454 pyrosequencing: The first of the massively parallel methods to become commercially
available was developed by 454 Life Sciences and is based on the pyrosequencing
technique (Margulies et al, 2005). This system allows shotgun sequencing of
whole genomes without cloning in E. coli or any host cell. First DNA is
randomly sheared and ligated to linker sequences that permit individual
molecules captured on the surface of a bead to be amplified while isolated
within an emulsion droplet. A very large collection of such beads is arrayed in
the 1.6 million wells of a fiber-optic slide.
As with the Sanger
method, sequencing is carried out using primed synthesis by DNA polymerase. The
array is presented with each of the four dNTPs, sequentially, and the amount of
incorporation is monitored by luminometric detection of the pyrophosphate
released (hence the name “pyrosequencing”). A CCD imager coupled to the fiber-optic
array collects the data. In sequencing across a homopolymer run, the run length
is estimated from the amount of pyrophosphate released, which is proportional
to the number of residues incorporated. Misjudging the
length of homopolymer runs produces single-base insertions and deletions
(indels). These constitute the major source of errors in 454 data. The 454
Genome Sequencer FLX is reportedly able to produce 100 Mb of sequence with
99.5% accuracy for individual reads averaging over 250 bases in length.
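The way homopolymer runs are read, and how a misjudged signal becomes an indel, can be sketched as follows. This is a simplified model under stated assumptions: the flow order "TACG", the function names and the noisy signal value are illustrative, and the signal is taken to be exactly proportional to the number of bases incorporated.

```python
def flowgram(read: str, flow_order: str = "TACG", max_flows: int = 400) -> list:
    """Simulate ideal signals for a read (the strand being synthesized).

    Each flow presents one dNTP; the signal equals the number of consecutive
    bases of that type incorporated at the current position (0 if none).
    """
    signals = []
    position = 0
    flow = 0
    while position < len(read) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run_length = 0
        while position < len(read) and read[position] == nucleotide:
            run_length += 1
            position += 1
        signals.append((nucleotide, run_length))
        flow += 1
    return signals


def call_bases(signals: list) -> str:
    """Convert signals back to bases by rounding each to a whole run length."""
    return "".join(nucleotide * round(signal) for nucleotide, signal in signals)


ideal = flowgram("TTAGGGC")
print(ideal)
# [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1)]
print(call_bases(ideal))  # TTAGGGC

# A noisy reading of the GGG run (3.6 instead of 3) is rounded to four bases.
noisy = [("T", 2), ("A", 1), ("C", 0), ("G", 3.6), ("T", 0), ("A", 0), ("C", 1)]
print(call_bases(noisy))  # TTAGGGGC (a single-base insertion)
```

Because a whole run is read in one flow, any error in estimating the signal changes the run length rather than the identity of the base, which is why indels dominate the error profile.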
2. Illumina (Solexa) sequencing: The second next-generation sequencing technology to be
released (in 2006) was Illumina (Solexa) sequencing (Bennett, 2004). A key difference
between this method and the 454 method is that it uses chain-terminating nucleotides.
The fluorescent label on the terminating base can be removed to leave an
unblocked 3′ terminus, making chain termination a reversible process. The
method reads each base in a homopolymer run in a separate step and therefore
does not produce as many indels within such runs as the 454. Because the
reversible dye terminator nucleotides are not incorporated efficiently, the
read length of the Solexa method is less than for 454. Also more
base-substitution errors are observed due to the use of modified polymerase and
dye terminator nucleotides. The method sequences clusters of DNA molecules
amplified from individual fragments attached randomly on the surface of a flow
cell. Because of the very high densities of clusters that can be analyzed, the
machine can reportedly produce more than one billion bases (1 Gb) of 75-base
(paired) reads in a single run.
3. ABI’s SOLiD
sequencing: Applied Biosystems has also developed a massively parallel
sequencer, its Supported Oligonucleotide Ligation and Detection system (SOLiD),
released in 2008. The technology is based on hybridization-ligation chemistry
(Shendure et al, 2005). The sample preparation for this technology,
including library preparation and clonal amplification of the target DNA by
emulsion PCR (emPCR) on beads, is very similar in principle to the 454 process. However,
the size of the beads used for emPCR (1 µm versus 26 µm) and the array format
(random versus ordered) are different. These differences give the SOLiD
technology the potential to generate a significantly higher-density
sequencing array (potentially over a few hundred-fold higher), as well as more flexibility
in terms of sample input format. The sequence is interrogated through
repeated cycles of hybridization of a mixture of sequencing primers and fluorescently
labeled probes, followed by ligation of the sequencing primers to the probes and
detection of the fluorescent signals on the probes, which encode the bases
being interrogated. Although its read length is short, about 25-35 bases,
it can generate ~2-3 Gb of sequence per run.
Next (3rd) – generation sequencing
Single molecule sequencing: Single-molecule sequencing was initially conceived as a laser-based
technique that allows the fast sequencing of DNA fragments of 40 kb or more at
a rate of 100–1000 bases per second (Jett et al. 1989).
1. HeliScope single molecule sequencing: HeliScope sequencing uses DNA fragments with added
polyA tail adapters, which are attached to the flow cell surface. The next
steps involve extension-based sequencing with cyclic washes of the flow cell
with fluorescently labeled nucleotides. The reads are performed by the
HeliScope sequencer. The reads are short, up to 55 bases per run, but recent improvements
of the methodology allow more accurate reading of homopolymers and RNA.
2. Single molecule
SMRT(TM) sequencing: SMRT sequencing is based on the sequencing by synthesis
approach. The DNA is synthesized in so-called zero-mode waveguides (ZMWs),
small well-like containers with the capturing tools located at the bottom of
the well. The sequencing is performed using unmodified polymerase and
fluorescently labelled nucleotides flowing freely in the solution. The wells
are constructed so that only the fluorescence occurring at the bottom of
the well is detected. The fluorescent label is detached from the nucleotide on
its incorporation into the DNA strand, leaving an unmodified DNA strand. The
SMRT technology allows detection of nucleotide modifications. This happens
through the observation of polymerase kinetics. This approach allows reads of
3. Single molecule real
time (RNAP) sequencing: This method is based on RNA polymerase (RNAP), which is
attached to a polystyrene bead, while the distal end of the DNA being sequenced is
attached to another bead; both beads are placed in optical traps. RNAP motion during
transcription brings the beads closer together, and the change in their relative
distance can be recorded at single-nucleotide resolution. The sequence is
deduced from four readouts, each taken with a lowered concentration of one of the
four nucleotide types.
DNA sequencing has been applied in forensic science to identify particular
individuals, because every individual has a unique DNA sequence. It is
particularly used to identify criminals from evidence found at the
crime scene in the form of hair, nail, skin or blood samples. DNA sequencing is
also used to determine the paternity of a child. Similarly, it is used to
identify endangered and protected species.
In medical research, DNA sequencing can be used to detect the genes that are
associated with hereditary or acquired diseases. Scientists use different
techniques of genetic engineering, such as gene therapy, to identify the defective
genes and replace them with healthy ones.
DNA sequencing has played a vital role in the field of agriculture. The mapping
and sequencing of the whole genomes of microorganisms has allowed
agriculturists to make them useful for crops and food plants. For example,
specific bacterial genes have been used in some food plants to increase their
resistance to insects and pests, and as a result the productivity and
nutritional value of the plants also increase. These plants can also help meet
the need for food in poor countries. Similarly, sequencing has been useful in the
production of livestock with improved quality of meat and milk.
Sequencing is used
in molecular biology to study genomes and the proteins they encode. Information obtained using
sequencing allows researchers to identify changes in genes, associations with
diseases and phenotypes, and identify potential drug targets.
Since DNA is an
informative macromolecule in terms of transmission from one generation to
another, DNA sequencing is used in evolutionary biology to study how
different organisms are related and how they evolved.
The field of
metagenomics involves the identification of organisms present in a body of water,
sewage, dirt, debris filtered from the air, or swab samples from organisms. Knowing
which organisms are present in a particular environment is critical to research
in ecology, epidemiology, microbiology, and other fields.
Sequencing enables researchers to determine which types of microbes may be
present in a microbiome.
DNA sequencing in Nepal
DNA sequencing in Nepal
is still a developing technology. Some work related to sequencing is as follows.
-PCR-based diagnosis of citrus Huanglongbing disease in Nepal.
-Molecular characterization and DNA barcoding of medicinal and aromatic plants of Nepal, using PCR-based and DNA sequencing-based molecular markers.
-Exploration, molecular and biotechnological characterization of medicinal plants and fungal biodiversity of Khumbu region.
-Exploration of hot spring thermophiles for the production of industrially important enzymes.
-Exploration, molecular and biotechnological characterization of probiotic microorganisms of the dairy products of Nepal.
-Himalayan seed bank for utilization of medicinal and aromatic plants and wild plant biodiversity of Nepal.
-Ancient DNA sequenced from 'Sky Cave' burials in Nepal.
The Sanger method and the Maxam & Gilbert method were milestones in DNA sequencing.
With the increasing application and improvement of sequencing methods, the scale has grown
from a few kilobases to the first human genome, and now to millions of human
genomes and a myriad of other genomes, with next (2nd) generation
and next-next (3rd) generation sequencing methods.
There are good prospects for the emergence of new and non-conventional methods
of DNA sequencing, which may one day revolutionize the field. DNA sequencing
has been extensively and creatively repurposed, including as a 'counter' for a
vast range of molecular phenomena, and has a wide range of applications in
fields such as molecular biology, medicine, agriculture and forensics.