Some possible codes for encrypting data in DNA

Biotechnology Letters

Abstract

Three codes are reported for storing written information in DNA. We refer to these codes as the Huffman code, the comma code and the alternating code. The Huffman code was devised using Huffman's algorithm for constructing economical codes. The comma code uses a single base to punctuate the message, creating an automatic reading frame and DNA which is obviously artificial. The alternating code comprises an alternating sequence of purines and pyrimidines, again creating DNA that is clearly artificial. The Huffman code would be useful for routine, short-term storage purposes, supposing – not unrealistically – that very fast methods for assembling and sequencing large pieces of DNA can be developed. The other two codes would be better suited to archiving data over long periods of time (hundreds to thousands of years).
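The three ideas named in the abstract can be illustrated with a short Python sketch. The abstract does not give the authors' codeword tables, so everything concrete here is an assumption made for illustration: the symbol frequencies are simply counted from the sample message, the comma code uses three-base words over A, C and T followed by a G as the comma, and the alternating code stores one bit per base. None of these choices should be read as the published codes.

```python
import heapq
from collections import Counter
from itertools import count

BASES = "ACGT"
PURINES = set("AG")                       # pyrimidines are C and T
ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumed 27-symbol alphabet (3**3)


def huffman_dna_code(freqs):
    """Build a 4-ary Huffman code that maps each symbol to a string of bases."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(w, next(tie), {s: ""}) for s, w in freqs.items()]
    # Pad with zero-weight dummies so every merge combines exactly four nodes.
    while (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tie), {}))
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, total = {}, 0
        for base in BASES:
            weight, _, group = heapq.heappop(heap)
            total += weight
            for symbol, suffix in group.items():
                merged[symbol] = base + suffix
        heapq.heappush(heap, (total, next(tie), merged))
    return heap[0][2]


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(dna, code):
    """Greedy decoding; valid because Huffman codes are prefix-free."""
    reverse = {word: symbol for symbol, word in code.items()}
    out, buf = [], ""
    for base in dna:
        buf += base
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return "".join(out)


def comma_encode(text):
    """Toy comma code (assumed layout): three bases from A/C/T, then G."""
    words = []
    for ch in text:
        i = ALPHABET.index(ch)
        words.append("ACT"[i // 9] + "ACT"[i // 3 % 3] + "ACT"[i % 3] + "G")
    return "".join(words)


def comma_decode(dna):
    out = []
    for word in dna.split("G")[:-1]:  # the final split piece is empty
        i = 0
        for base in word:
            i = i * 3 + "ACT".index(base)
        out.append(ALPHABET[i])
    return "".join(out)


def alternating_encode(bits):
    """Toy alternating code (assumed): purine (A=0, G=1), then pyrimidine (C=0, T=1)."""
    return "".join(("AG" if i % 2 == 0 else "CT")[int(b)] for i, b in enumerate(bits))


def is_alternating(dna):
    """True if purines and pyrimidines strictly alternate along the sequence."""
    kinds = [base in PURINES for base in dna]
    return all(a != b for a, b in zip(kinds, kinds[1:]))


message = "three codes for storing written information in dna"
code = huffman_dna_code(Counter(message))
dna = encode(message, code)
assert decode(dna, code) == message
print(len(dna), "bases (Huffman) vs", len(comma_encode(message)), "bases (toy comma code)")
```

In this sketch the Huffman code gives frequent characters shorter codewords, which is what makes it economical, while the comma and alternating variants spend extra bases on a regular, visibly artificial pattern. In the toy comma code a G falls on every fourth base, so the reading frame can be recovered from any starting point.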

Author information

Corresponding author

Correspondence to Jonathan P.L. Cox.

About this article

Cite this article

Smith, G.C., Fiddes, C.C., Hawkins, J.P. et al. Some possible codes for encrypting data in DNA. Biotechnology Letters 25, 1125–1130 (2003). https://doi.org/10.1023/A:1024539608706

  • DOI: https://doi.org/10.1023/A:1024539608706
