Transgene Design
A transgene is an artificial gene, and transgene design must incorporate
all appropriate elements critical for gene expression. A simple
construction scheme has been developed that provides the best transgene
expression: each transgene contains a promoter, an intron, a protein coding
sequence (termed the reporter), and a transcriptional stop sequence. These
elements are typically assembled in a bacterial plasmid, and sequences are
usually chosen from previous transgenes with proven function. In addition,
the construct must be linearized and prokaryotic sequences removed before
injection into the nucleus of a mouse zygote.
The promoter. The transgene promoter is a regulatory sequence that will
determine in which cells and at what time the transgene is active. The
promoter is typically derived from sequences of a mammalian gene upstream
from the start site of transcription, and has been tested to contain the
appropriate transcriptional regulatory elements. For example, the insulin
promoter sequence spanning from -585 to the start site of transcription
will direct transgene expression exclusively to pancreatic β-cells (1).
Each promoter will produce the same expression pattern with any protein
reporter. A large number of promoters have now been characterized that
direct a wide variety of transgene expression patterns. When choosing
which promoter to use, it is worthwhile to carefully read the original
papers defining the promoter expression pattern. Subtle differences from
the expression pattern of the original gene are often evident, as well as
differences in developmental expression or expression in ectopic tissues.
The exact promoter sequence that was originally characterized and published
should be used to construct your transgene. The promoter sequence normally
contains the transcriptional start site as well as the transcription
regulatory sequences. The promoter sequence also typically contains some
extraneous sequence downstream of the transcriptional start site, as
discussed below. Synthetic promoters have also been designed for inducible
gene expression and other specialized applications.
The reporter (protein coding sequence). Transgenes are normally designed
to produce a protein, and must contain a valid protein coding sequence
(CDS). This sequence is usually derived from the cDNA for the protein of
interest. The CDS must contain a translational start codon (ATG) and
translational stop codon, plus a Kozak sequence upstream of the start
codon. The ideal Kozak sequence is GCCGCCACC, but the ten nucleotides
found just five prime to any start codon can be assumed to have appropriate
function. Frequently, extra linker sequences are incorporated between the
transcriptional start site and the translational start codon as byproducts
of assembling the transgene fragments. These sequences should be examined
for ATG start codons or other potential regulatory elements (see below),
but otherwise rarely cause problems. Five prime and three prime
non-translated sequences from the protein coding transcript should be
avoided as much as possible, since these may contain regulatory elements
controlling translation or mRNA stability.
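The checks described above for a coding sequence can be sketched in a few
lines of Python. This is only an illustrative sketch, not part of the
protocol: the function names check_cds and has_ideal_kozak are hypothetical,
the input is assumed to be a plain DNA string of the coding strand, and the
Kozak test looks only for the ideal GCCGCCACC sequence quoted above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds: str) -> list[str]:
    """Return a list of problems found in a coding sequence (empty if none)."""
    seq = cds.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("missing ATG start codon")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of three")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature in-frame stop codon")
    return problems


def has_ideal_kozak(upstream: str, ideal: str = "GCCGCCACC") -> bool:
    """True if the sequence immediately 5' of the ATG ends with the ideal Kozak."""
    return upstream.upper().endswith(ideal)


if __name__ == "__main__":
    print(check_cds("ATGGCCTAA"))
    print(has_ideal_kozak("ttGCCGCCACC"))
```

A sequence that fails has_ideal_kozak is not necessarily a problem, since
native sequence upstream of a start codon may work equally well; the check
only flags constructs that deserve a closer look.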
Introns and transcriptional stop sequences. Transgenes are incorporated
into the murine genome at random sites and transgene expression will vary
depending on the sequences that surround the insertion site. The same
transgene will be more active in one site than another, and a four log
variation in activity is not unusual among insertion sites. This variation
in expression levels complicates the study of factors that influence
transgene activity. However, inclusion of an intron in a transgene
construct results in a significantly greater percentage of active
transgenes (2-4).
In a direct comparison, 6/7 transgenes with an intron had detectable
activity, while only 2/5 otherwise identical constructs without an intron
had detectable, albeit weaker, expression (3).
The exact mechanism for this
effect is unknown, but is hypothesized to be related to the known
functional link between transcription and splicing.
Each transgene must also contain a transcriptional stop signal to match the
start signal typically included in the promoter. Eukaryotic
transcriptional stop signals include a polyA addition sequence (AAUAAA) as
well as hundreds of downstream nucleotides whose function is important but
not clearly understood. Numerous introns and transcriptional stop
sequences have been tested in transgenes. The most convenient arrangement
is to include a gene or gene fragment at the end of the coding sequence
with both an intron and transcriptional stop sequence. Examples of introns
commonly used are the rabbit β-globin intron or SV40 intron. Examples of
transcriptional stop sequences are those from SV40 or human growth hormone.
Examples of combined sequences are an SV40 intron/stop, the last exon of
human growth hormone plus stop sequences, or the entire human growth
hormone gene.
Transgene linearization and removal of bacterial ori and prokaryotic
sequences. Transgenes are subject to epigenetic regulation, which may be
influenced by the transgene sequence as well as the integration site.
Transgenes that include prokaryotic ori sequences are less likely to be
expressed, and are expressed at lower levels, than transgenes without the
prokaryotic sequences. In a direct comparison, 4/4 transgenes without
vector sequences were expressed while only 2/6 transgenes that included
vector sequences were active at detectable levels (5).
In addition, transgene incorporation into the murine genome is increased
by orders of magnitude if the construct is linear as opposed to circular.
It is
normally convenient to remove the prokaryotic sequences and purify the
linear transgene fragment with a single restriction endonuclease digest,
and the transgene construction scheme should include a strategy for cutting
out the transgene from the plasmid backbone.
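One way to plan the excision step is to screen candidate enzymes against the
construct sequence. The sketch below is a simplified illustration under
stated assumptions: the function names are hypothetical, the enzyme table is
a small hand-picked set of common palindromic sites (so searching one strand
is enough), and the plasmid is treated as a linear string in which the
transgene occupies positions start to end (0-based, end exclusive). An enzyme
is reported as usable if it cuts on both sides of the transgene and never
within it.

```python
# Recognition sites for a few common enzymes (all palindromic).
ENZYME_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "XhoI": "CTCGAG",
}


def find_sites(seq: str, site: str) -> list[int]:
    """Return every start position of site in seq, including overlapping hits."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def excising_enzymes(plasmid: str, start: int, end: int,
                     enzymes: dict[str, str] = ENZYME_SITES) -> list[str]:
    """List enzymes that flank the transgene [start, end) without cutting inside."""
    seq = plasmid.upper()
    usable = []
    for name, site in enzymes.items():
        hits = find_sites(seq, site)
        cuts_left = any(p + len(site) <= start for p in hits)
        cuts_right = any(p >= end for p in hits)
        # A site overlapping the transgene at all counts as an internal cut.
        cuts_inside = any(p < end and p + len(site) > start for p in hits)
        if cuts_left and cuts_right and not cuts_inside:
            usable.append(name)
    return usable
```

Additional cuts within the backbone are tolerated here because they only
produce extra backbone fragments; whether those fragments separate cleanly
from the transgene during purification still has to be checked by size.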
Linker and extra sequences. Typical bacterial cloning methods will result
in inclusion of extra sequences between each segment of the transgene that
derive from sequences between restriction endonuclease sites or plasmid
polylinker sequences. These sequences may be as much as one hundred
nucleotides in length and contain multiple restriction sites, but they do
not normally affect transgene function as long as inadvertent regulatory
elements are not created. For example, the promoter will contain the start
site of transcription and usually tens of nucleotides downstream of the
start site of transcription that will ultimately be incorporated into the
5' untranslated sequence of the transgene transcript. This sequence must
be verified to be free of translational start or stop sites. Any
extraneous sequences must be examined to ensure the absence of unwanted
functional elements. In addition, any plasmid sequences included in the
final linearized transgene construct should be known to be free of
regulatory function. For example, if the transgene is freed from the
plasmid backbone with restriction endonucleases that leave an extra hundred
nucleotides at the ends of the transgene, these extra sequences should be
known to be free of eukaryotic enhancers or promoters that are frequently
present in plasmids.
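Screening the transcribed leader for unwanted start codons is easy to
automate. The following minimal sketch assumes the leader is supplied as a
DNA string running from the transcriptional start site up to, but not
including, the intended ATG; the function name leader_atgs is hypothetical.
It reports each upstream ATG together with whether it lies in frame with the
intended start codon.

```python
def leader_atgs(leader: str) -> list[tuple[int, bool]]:
    """Find ATG triplets in a 5' leader.

    Returns (position, in_frame) pairs, where position is the 0-based index
    of the A and in_frame is True when the triplet shares the reading frame
    of the intended start codon that directly follows the leader.
    """
    seq = leader.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] == "ATG":
            in_frame = (len(seq) - i) % 3 == 0
            hits.append((i, in_frame))
    return hits


if __name__ == "__main__":
    # Hypothetical leader: a polylinker remnant followed by a Kozak sequence.
    print(leader_atgs("GGATCCATGGCGCCGCCACC"))
```

Any hit deserves inspection regardless of frame, since an upstream ATG can
divert ribosomes from the intended start codon.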
A note on transgene construction strategy. Transgenes are normally
assembled from proven promoters, introns, etc., and the protein coding
sequence of interest. A cloning scheme should be designed with a clear
understanding of all the elements and a strategy to free the transgene from
vector sequences once constructed. Meticulous attention to detail in
transgene design will be rewarded by avoiding the time and expense spent
generating a transgene that does not function as anticipated.
Synthetic promoters. Most transgene promoters are derived from endogenous
mammalian gene regulatory sequences, but synthetic promoters with
specialized functions have also been developed. The most common synthetic
promoters respond to synthetic activators as part of a binary system to
allow for inducible gene expression (6). For example, one transgene will
produce a synthetic transcription factor that contains a prokaryotic
tetracycline binding domain and a tet operator (tetO) DNA binding domain
coupled to a eukaryotic acidic transcriptional activation domain. The
second
transgene will be constructed of multimerized tetO binding sites upstream
of a minimal promoter that contains only a TATA box. The second transgene
will only be active in the presence of the synthetic transcription factor
and in the absence of tetracycline, allowing inducible, tissue-specific
transgene activation. These transgenes are designed and assembled the same
way as other transgenes.
References
1. Hanahan, D. Heritable formation of pancreatic β-cell tumours in
transgenic mice expressing recombinant insulin/simian virus 40 oncogenes.
Nature 315, 115-122 (1985).
2. Clark, A.J., Archibald, A.L., McClenaghan, M., Simons, J.P., Wallace,
R., Whitelaw, C.B. Enhancing the efficiency of transgene expression. Philos
Trans R Soc Lond B Biol Sci 339, 225-232 (1993).
3. Choi, T., Huang, M., Gorman, C., Jaenisch, R. A generic intron increases
gene expression in transgenic mice. Molecular and Cellular Biology 11,
3070-3074 (1991).
4. Duncker, B.P., Davies, P.L., Walker, V.K. Introns boost transgene
expression in Drosophila melanogaster. Mol Gen Genet 254, 291-296 (1997).
5. Kjer-Nielsen, L., Holmberg, K., Perera, J.D., McCluskey, J. Impaired
expression of chimaeric major histocompatibility complex transgenes
associated with plasmid sequences. Transgenic Research 1, 182-187 (1992).
6. Lewandoski, M. Conditional control of gene expression in the mouse. Nat
Rev Genet 2, 743-755 (2001).