Abstract
Three codes are reported for storing written information in DNA. We refer to these codes as the Huffman code, the comma code and the alternating code. The Huffman code was devised using Huffman's algorithm for constructing economical codes. The comma code uses a single base to punctuate the message, creating an automatic reading frame and DNA which is obviously artificial. The alternating code comprises an alternating sequence of purines and pyrimidines, again creating DNA that is clearly artificial. The Huffman code would be useful for routine, short-term storage purposes, supposing – not unrealistically – that very fast methods for assembling and sequencing large pieces of DNA can be developed. The other two codes would be better suited to archiving data over long periods of time (hundreds to thousands of years).
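As a rough illustration of the first idea, the sketch below applies Huffman's algorithm to a four-letter alphabet (A, C, G, T), so that frequent characters receive short base sequences. It is a minimal sketch under stated assumptions, not the codeword table reported in the article: the function names, the base ordering in `BASES`, and any frequency table passed in are hypothetical. A small helper also checks the purine/pyrimidine alternation that the alternating code relies on.

```python
import heapq
from itertools import count

BASES = "ACGT"  # assumed ordering of the four bases; the article may differ


def huffman_dna_code(freqs):
    """Build a prefix-free code over A, C, G, T from symbol weights.

    Expects at least two symbols in `freqs` (hypothetical input table).
    """
    tie = count()  # unique tie-breaker so heapq never compares tree nodes
    heap = [(w, next(tie), sym) for sym, w in freqs.items()]
    # A 4-ary Huffman tree needs (n - 1) divisible by 3, so pad the
    # symbol set with zero-weight dummy leaves (None) where necessary.
    while (len(heap) - 1) % (len(BASES) - 1) != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)
    # Repeatedly merge the four lightest nodes into one internal node.
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(len(BASES))]
        total = sum(w for w, _, _ in group)
        children = [node for _, _, node in group]
        heapq.heappush(heap, (total, next(tie), children))

    code = {}

    def walk(node, prefix):
        if isinstance(node, list):      # internal node: one branch per base
            for base, child in zip(BASES, node):
                walk(child, prefix + base)
        elif node is not None:          # real symbol; dummies are skipped
            code[node] = prefix

    walk(heap[0][2], "")
    return code


def encode(text, code):
    """Concatenate the codeword of each character."""
    return "".join(code[ch] for ch in text)


def decode(dna, code):
    """Greedy decoding; unambiguous because the code is prefix-free."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for base in dna:
        buf += base
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


def is_alternating(dna):
    """True if purines (A, G) and pyrimidines (C, T) strictly alternate."""
    purines = set("AG")
    return all((a in purines) != (b in purines) for a, b in zip(dna, dna[1:]))
```

Calling `huffman_dna_code` with a table of character weights returns a dictionary from characters to base strings. `encode` and `decode` then round-trip any text drawn from those characters.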
References
Abramson N (1963) Information Theory and Coding. New York: McGraw-Hill, pp. 77–81.
Anonymous (2000) A Y3K bug. Nat. Biotechnol. 18: 1.
Bancroft C, Bowler T, Bloom B, Clelland CT (2001) Long-term storage of information in DNA. Science 293: 1763–1765.
Brenner S, Williams SR, Vermaas EH, Storck T, Moon K, McCollum C, Mao J-I, Luo S, Kirchner JJ, Eletr S, DuBridge RB, Burcham T, Albrecht G (2000) In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs. Proc. Natl. Acad. Sci. USA 97: 1665–1670.
Cello J, Paul AV, Wimmer E (2002) Chemical synthesis of poliovirus cDNA: generation of infectious virus in the absence of natural template. Science 297: 1016–1018.
Cox JPL (2001) Long-term data storage in DNA. Trends Biotechnol. 19: 247–250.
Crick FHC, Griffith JS, Orgel LE (1957) Codes without commas. Proc. Natl. Acad. Sci. USA 43: 416–421.
Doig AJ (1997) Improving the efficiency of the genetic code by varying the codon length – the perfect genetic code. J. Theor. Biol. 188: 355–360.
Gehani A, LaBean TH, Reif JH (2000) DNA-based cryptography. DIMACS Series in Discrete Mathematics and Theoretical Computer Science 54: 233–249.
Golomb SW (1962) Efficient coding for the desoxyribonucleic channel. Proceedings of the Fifteenth Symposium for Applied Mathematics 14: 87–100.
Huffman DA (1952) A method for the construction of minimum-redundancy codes. Proceedings of the I.R.E. 40: 1098–1101.
Sauer B (1996) Multiplex Cre/lox recombination permits selective site-specific DNA targeting to both a natural and an engineered site in the yeast genome. Nucl. Acids Res. 24: 4608–4613.
Singh S (1999) The Code Book. London: Fourth Estate, p. 19.
Stemmer WPC, Crameri A, Ha KD, Brennan TM, Heyneker HL (1995) Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene 164: 49–53.
Withers-Martinez C, Carpenter EP, Hackett F, Ely B, Sajid M, Grainger M, Blackman MJ (1999) PCR-based gene synthesis as an efficient approach for expression of the A+T-rich malaria genome. Prot. Eng. 12: 1113–1120.
Wong PC, Wong KK, Foote H (2003) Organic data memory using the DNA approach. Commun. ACM 46: 95–98.
Cite this article
Smith, G.C., Fiddes, C.C., Hawkins, J.P. et al. Some possible codes for encrypting data in DNA. Biotechnology Letters 25, 1125–1130 (2003). https://doi.org/10.1023/A:1024539608706
DOI: https://doi.org/10.1023/A:1024539608706