Introduction

The transition from a hunter-gatherer lifestyle to one including food production has had profound effects on the technology, demography, social organization and even the biology of most human populations (Diamond 1998). Two contrasting population models have been put forward for this transition. The first model, demic diffusion, attributes the spread of farming to the local growth and expansion of farmers (Ammerman and Cavalli-Sforza 1973, 1984). The second model, cultural diffusion, involves farming being passed from one local group to the next without substantial movement of farming populations (Edmonson 1961; Zvelebil and Zvelebil 1988). The genetic consequences of these two modes of diffusion are different; the demic diffusion of farmers involves the spread of the farmers’ genes, whereas the cultural diffusion model is expected to leave no genetic signature from the farmers. Genetic evidence has been used to support a demic diffusion model for the expansion of farming to Europe (Barbujani et al. 1994; Menozzi et al. 1978) and to India (Cordaux et al. 2004). It is important to note, however, that these two models are not mutually exclusive and that in fact both processes probably contributed to the expansion of agriculture and pastoralism (Ammerman and Cavalli-Sforza 1984).

The history and pre-history of North Africa have recently been the subject of intense investigation, a main goal of which has been to determine whether contemporary North African populations are primarily the descendants of Neolithic agriculturalists/pastoralists, or whether they have a significant local ancestry that extends into the Mesolithic. The Berber occupy a crucial position in this debate as they have been largely regarded as aboriginal North Africans since antiquity (Brett and Fentress 1996). The 14–20 million Berber speakers are concentrated mainly in Morocco and Algeria but extend into the whole Sahara and part of the northern Sahel. The Berber have a pastoralist component to their subsistence economy (Blench 2001, Table 14). For example, the Iwellemmeden Kel Denneg, a group of Berber-speaking pastoralists from the Sahara, obtain close to half their calories from milk and its products (Bernus 1990). The Berber do not seem to have retained a pre-Neolithic language; the Berber languages clearly belong to the Afro-Asiatic family (Greenberg 1963) whose expansion into Africa is often attributed to the spread of Neolithic populations from its homeland in the Levant (Militarev 2003; Renfrew 1991), although others locate the homeland of Afro-Asiatic in Africa (Diakonoff 1998; Ehret 1984).

It has been argued that the rapid, widespread and sudden development of pastoralism in North Africa is better understood in terms of responses by local hunter-gatherers to aridification without substantial influence from incoming Neolithic farmers and pastoralists from the Middle East (Barker 2003). Although there is general agreement among archaeologists that certain elements of a Neolithic economy were introduced into North Africa and were not developed there independently (Lubell 1984), the observation that the Capsian culture persisted well into the Neolithic indicates a possible persistence of the Mesolithic population and a cultural adoption of agriculture and animal husbandry with little Neolithic admixture (i.e. the cultural diffusion model), rather than extensive occupation by Neolithic populations originating in the Middle East (i.e. the demic diffusion model) (Camps-Fabrer 1989; Camps 1982; MacDonald 1998; McBurney 1967; Roubet 1978).

The genetic evidence is certainly no less controversial. Several studies have suggested that the Berber have a significant local North African ancestry that extends into the Mesolithic (Bosch et al. 1997, 2001; Flores et al. 2000a, b, 2001; Macaulay et al. 1999; Rando et al. 1998). Bosch et al. (1997) argue that the Neolithic transition did not involve massive gene flow into northwest Africa, and Barbujani et al. (1994) found little statistical support for a coupled language-gene Neolithic expansion for the Afro-Asiatic language family from the Levant. While none of these studies rejects the possibility of a Neolithic contribution altogether, they do support the notion that a substantial proportion of the Berber’s genetic ancestry is derived from Mesolithic North African populations.

In sharp contrast to previous results, a recent analysis by Arredi et al. (2004) suggests that the patterns of Y-chromosome variation in North Africa are predominantly of Middle Eastern Neolithic origin. They describe an east-west cline in the frequency of haplogroup E3b2 and an inverse cline for E3b2 STR variation and interpret these results as evidence for a demic diffusion from the Middle East (Arredi et al. 2004). Their time to the most recent common ancestor (TMRCA) calculations for the relevant haplogroups suggest that this demic diffusion into North Africa was of a Neolithic origin and involved the expansion of Afro-Asiatic speaking pastoralists from the Middle East.

The putative expansion of pastoralists from the Middle East into North Africa would have likely brought the tradition of milk drinking along with it. The consensus view, also known as the “culture historical hypothesis”, is that populations who adopted a culture that relied on milk as a main nutritional source co-directed their own biological evolution by creating a selection pressure for lactose tolerance (McCracken 1971; Simoons 1970). A single mutation ~14 kb 5′ of the lactase gene (C-13910T) has been shown to associate completely with lactose tolerance in a large sample of Finns (Enattah et al. 2002). Transfection studies support the notion that this SNP is located in an enhancer element and that the two alleles show a difference in function (Olds and Sibley 2003; Troelsen et al. 2003). However, a recent study by Mulcare et al. (2004) found that −13910T was completely absent in numerous sub-Saharan African dairying populations and therefore concluded that it is not the worldwide causal allele.

The expansion of pastoralists from the Middle East into North Africa would presumably have resulted in the spread of lactose tolerance and thus the recently identified −13910T allele may provide useful insights into questions concerning the spread of dairying. In order to investigate the origin(s) and spread of dairying in North Africa, we genotyped the putatively causal −13910T allele and constructed haplotypes from several polymorphic sites in and around the lactase gene from three Berber populations and compared our results to previously published data. We discuss our results in light of other archaeological, linguistic and genetic evidence.

Materials and methods

Samples

In total, 105 Berber samples were tested from three groups: 33 Mzab–Wargla speakers from the Mzab oases of Algeria; 33 Tamazight speakers from the Middle Atlas mountain range in Morocco; and 39 Tamazight speakers from Amizmiz, a village in the High Atlas mountains of Morocco. The DNA was extracted from blood with a standard phenol/chloroform procedure. Informed consent was obtained from all subjects who contributed DNA and the appropriate ethical procedures were followed.

Laboratory analysis

Primers flanking the polymorphism(s) of interest were designed from GenBank sequences M81834 and M6185. All primer sequences are available from the author’s website (http://email.eva.mpg.de/~myles/files/pubs.html). Each 10-μl PCR reaction contained 20 ng genomic DNA, 0.625 mM deoxyribonucleoside triphosphates (dNTPs), 2–3 mM MgCl2, 0.5 mM of each PCR primer (MWG, Ebersberg, Germany), 1× reaction buffer, and 1.25 U Thermoprime Plus DNA Polymerase (ABgene, Epsom, UK). Cycling conditions included an initial denaturation at 95 °C for 5 min; 35 cycles at 95 °C for 30 s, T for 30 s, 72 °C for 30 s, where T is the annealing temperature, which varied from 59 to 62 °C depending on the primer pairs used; and 1 cycle at 72 °C for 5 min. The PCR products were verified by agarose-gel electrophoresis and purified using a QIAquick PCR purification kit (Qiagen, Crawley, UK). We sequenced each amplified segment in both directions using the original PCR primers and internal primers where necessary. Cycle sequencing was performed in 10-μl reaction volumes by use of the ABI BigDye Terminator Sequencing Kit (Applied Biosystems, Warrington, UK). Cycling conditions included 30 cycles of 95 °C for 30 s, 55 °C for 30 s, and 60 °C for 4 min. After cleaning, the samples were electrophoresed on an ABI 3700 automated sequencer (Applied Biosystems). Sequence trace files were evaluated using DNAStar (DNAStar, Madison, Wis., USA). Polymorphisms were verified by visual evaluation of the individual sequence traces.

Statistical analyses

The allele frequency for each polymorphism was determined by gene counting and the standard error of allele frequencies was calculated as \(s = \sqrt{pq/n}\), where p and q are the allele frequencies and n is the number of chromosomes tested. Deviations from Hardy–Weinberg equilibrium were tested using the random-permutation procedure implemented in Arlequin (Schneider et al. 2000). Allele frequency differences between the three Berber populations were evaluated by implementing the exact test of population differentiation (Goudet et al. 1996) and computing a population pairwise Fst matrix (Slatkin 1995) in Arlequin. Maximum-likelihood estimates of haplotypes were conducted by use of the program PHASE (Stephens et al. 2001).
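As an illustration only, the gene-counting estimate and its binomial standard error can be sketched in a few lines of Python. The function names and the genotype counts in the first example are hypothetical and are not taken from the study; the second example reuses the published comparison value of 0.22 on 60 chromosomes quoted in the Results.

```python
import math


def allele_frequency_by_counting(n_hom_alt, n_het, n_hom_ref):
    """Gene counting for a biallelic site.

    Each homozygote contributes two copies of its allele and each
    heterozygote contributes one copy of each allele.
    Returns (frequency of the alternative allele, chromosomes counted).
    """
    n_chromosomes = 2 * (n_hom_alt + n_het + n_hom_ref)
    if n_chromosomes == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_alt + n_het) / n_chromosomes
    return p, n_chromosomes


def allele_frequency_se(p, n_chromosomes):
    """Binomial standard error s = sqrt(p * q / n), with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n_chromosomes <= 0:
        raise ValueError("n_chromosomes must be positive")
    return math.sqrt(p * (1.0 - p) / n_chromosomes)


# Hypothetical genotype counts (not data from this study):
# 2 TT, 10 CT and 18 CC individuals, i.e. 60 chromosomes.
p, n = allele_frequency_by_counting(2, 10, 18)
print(f"p = {p:.3f}, n = {n}, s = {allele_frequency_se(p, n):.3f}")

# A frequency of 0.22 on 60 chromosomes gives s of about 0.053.
print(f"s = {allele_frequency_se(0.22, 60):.3f}")
```

Individuals with a missing genotype at a site would simply be left out of the counts for that site, which is why n is reported per site as a number of chromosomes.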

Results

Figure 1 shows the positions of the polymorphic sites analysed. All sites have been described elsewhere (Harvey et al. 1995; Hollox et al. 1999, 2001). The raw genotype data from the three Berber populations are available from the author’s website (http://email.eva.mpg.de/~myles/files/pubs.html). The allele frequencies of the two putatively causal alleles (−13910T and −22018A) and the 11 sites that make up the lactase haplotypes in the Berber are shown in Table 1. Out of the 11 previously identified polymorphisms, three SNPs were found to be invariant in the Berber. These SNPs include C-942G, which occurs at low frequencies only in Asia; A-946G, which occurs at low frequencies only in sub-Saharan Africa; and C458intT, which is also sub-Sahara specific. It is worth noting that the data set contained missing data: 41 individuals had one missing genotype and nine individuals had two.

Fig. 1 The region of interest on chromosome 2q21. The 11 SNPs from the 5′-flanking region and exons 2 and 17 of the lactase gene (LCT) were used to construct haplotypes. The two putatively causal polymorphisms are located in introns 9 and 13 of the neighboring MCM6 gene

Table 1 Allele frequencies for the 11 polymorphic sites that make up the lactase haplotypes and the two putatively causal alleles for lactose tolerance in three Berber populations (n indicates the number of chromosomes). The standard error of each allele frequency was estimated by assuming that the allele frequencies were binomially distributed

The expected genotype frequencies, assuming Hardy–Weinberg equilibrium, were calculated for all ten polymorphic sites in the entire Berber sample and two sites (C-958T and G666A) were found to deviate significantly from equilibrium expectations (P <0.05, with Bonferroni correction). The observed deficiency of heterozygotes at these two sites could be the result of population subdivision within the sample. The results from both the exact test of population differentiation and the population pairwise Fst matrix suggest that there is significant population differentiation within the Berber sample (P<0.05 for each pairwise comparison in at least one test). We therefore tested Hardy–Weinberg equilibrium on the three Berber populations separately and found no significant deviations from equilibrium expectations when correcting for multiple comparisons.
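The link between population subdivision and a deficiency of heterozygotes (the Wahlund effect) can be made concrete with a short Python sketch. The sub-population allele frequencies below are hypothetical values chosen for illustration, not estimates from the Berber samples, and the Bonferroni line assumes a nominal threshold of 0.05 divided across the ten sites tested.

```python
def hwe_genotype_freqs(p):
    """Expected genotype frequencies (AA, Aa, aa) under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def pooled_heterozygosity(freqs, sizes):
    """Observed and expected heterozygosity when sub-populations are pooled.

    Each sub-population is assumed to be in Hardy-Weinberg equilibrium on
    its own. 'Observed' is the size-weighted mean of the within-group
    heterozygosities; 'expected' is what equilibrium would predict from
    the pooled allele frequency.
    """
    total = float(sum(sizes))
    h_obs = sum(n * hwe_genotype_freqs(p)[1] for p, n in zip(freqs, sizes)) / total
    p_bar = sum(n * p for p, n in zip(freqs, sizes)) / total
    h_exp = hwe_genotype_freqs(p_bar)[1]
    return h_obs, h_exp


# Hypothetical allele frequencies in three equally sized groups.
h_obs, h_exp = pooled_heterozygosity([0.2, 0.5, 0.8], [1, 1, 1])
print(f"observed = {h_obs:.2f}, expected = {h_exp:.2f}")  # observed is lower

# Bonferroni-corrected per-site threshold for ten tests at a nominal 0.05.
bonferroni_alpha = 0.05 / 10
print(bonferroni_alpha)
```

Whenever the groups differ in allele frequency, the pooled sample shows fewer heterozygotes than equilibrium predicts, even though every group is in equilibrium by itself. This is why testing the three populations separately is the appropriate follow-up.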

In order to compare lactase haplotype frequencies in the Berber to other populations worldwide, haplotypes of the same 11 polymorphic sites that were used in Hollox et al. (2001) were deduced by the maximum-likelihood approach implemented in the computer program PHASE. The allelic composition of each haplotype is described elsewhere (Hollox et al. 2001). Haplotypes A, B, D, F, J, P and W* were all identified unambiguously by the presence of individuals who were homozygous at all sites or heterozygous at only one site. Haplotypes that were found at frequencies ≤0.01 and were assigned to one or no individuals were left out of the analysis because they were considered non-informative and could be artifacts of the maximum-likelihood procedure. By these criteria, only one haplotype, haplotype W*, was inferred from the Berber data but was not found by Hollox et al. (2001) in their analysis of the same 11 polymorphic sites in 1,338 chromosomes from 11 populations. One Amizmiz individual was homozygous for haplotype W*, so it can be concluded that this haplotype truly exists and is not simply an artifact of maximum-likelihood analysis. Haplotype W* differs from haplotype W by the T to C transition at position 5579 in exon 17 and also from haplotype A by the G to A substitution at position 666 of exon 2. The T5579C polymorphism could be the result of a mutation at a CpG site. The notion that the T5579C site is highly mutable is supported by the fact that it differentiates several haplotypes from one another at many points in the haplotype network (see Fig. 5a in Hollox et al. 2001). Table 2 provides an overview of lactase haplotype frequencies in the three Berber samples and the worldwide sample from Hollox et al. (2001).
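The criterion for unambiguous haplotype identification (homozygous at all sites, or heterozygous at only one) follows from counting the haplotype pairs compatible with an unphased genotype. The Python sketch below illustrates this counting argument; it is not a reimplementation of PHASE, the function names and example genotype are hypothetical, and genotypes are assumed to be complete (no missing data).

```python
def heterozygous_sites(genotype):
    """Number of heterozygous sites in an unphased multi-site genotype.

    'genotype' is a sequence of (allele1, allele2) tuples, one per site.
    """
    return sum(1 for a, b in genotype if a != b)


def n_compatible_haplotype_pairs(genotype):
    """Number of distinct haplotype pairs consistent with the genotype.

    With k heterozygous sites there are 2**(k - 1) possible pairs
    (and exactly one pair when k is 0 or 1).
    """
    k = heterozygous_sites(genotype)
    return 1 if k == 0 else 2 ** (k - 1)


def phase_is_unambiguous(genotype):
    """True when the two haplotypes can be read directly off the genotype."""
    return heterozygous_sites(genotype) <= 1


# Hypothetical four-site genotype, heterozygous at three sites.
example = [("C", "T"), ("G", "G"), ("G", "A"), ("T", "C")]
print(heterozygous_sites(example))            # 3
print(n_compatible_haplotype_pairs(example))  # 4
print(phase_is_unambiguous(example))          # False
```

An individual who is heterozygous at three or more sites is therefore compatible with at least four haplotype pairs, and the assignment has to rest on the probabilities estimated from the rest of the sample.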

Table 2 Worldwide lactase haplotype frequencies. The allelic constitution of each haplotype is described in Hollox et al. (2001). The three Berber populations are found in the final three columns and all other data are from Hollox et al. (2001) (n indicates the number of chromosomes sampled in each population)

The two sites identified by Enattah et al. (2002) as being the two most likely candidates for the causal element for lactose tolerance were typed in the three Berber samples. The −13910T allele showed complete association with lactose tolerance in individuals of Finnish origin while the −22018A allele showed a tight association with lactose tolerance (Enattah et al. 2002). Our data are consistent with the findings of Enattah et al. (2002) and Poulter et al. (2003), whereby individuals who carry −13910T always carry the −22018A allele, but not vice versa: there were four individuals in our data set who were homozygous −13910CC and heterozygous −22018GA. Individuals have been found who carry the −22018A allele and are unambiguously lactose intolerant (Enattah et al. 2002; Poulter et al. 2003) and transfection studies have demonstrated that the G-22018A polymorphism confers minimal or no enhancement of the lactase promoter (Olds and Sibley 2003; Troelsen et al. 2003). These data support the notion that −22018A is not the causal change and we therefore focus our attention on the most likely causal candidate, −13910T, for the remaining analyses.

The frequency of −13910T in our sample of Mozabites from the Mzab oases in Algeria (0.17) is in close agreement with the results of Bersaglieri et al. (2004) (0.22, n=60 chromosomes) from the same population. Figure 2 summarises data from several data sets and provides an overview of the frequency of −13910T and the frequency of lactose tolerance in populations who rely on milk as a main nutritional source or who have a significant pastoralist component to their subsistence pattern.

Fig. 2 Correlation between the frequency of lactose tolerance as measured by lactose tolerance tests and the frequency of lactose tolerance as predicted by the frequency of the −13910T allele assuming Hardy–Weinberg equilibrium in Eurasian, North African and sub-Saharan African populations. The diagonal line represents a perfect correlation. A perfect correlation is not expected because the phenotype data were collected separately from the genotype data. Data on the frequency of the −13910T allele were obtained from or referenced in Swallow (2003), Bersaglieri et al. (2004), Mulcare et al. (2004) and the present study. Frequencies of lactose tolerance are summarized in Swallow (2000, 2003). Populations are abbreviated as follows: NE Northern Europe; US United States (Northern European); IR Ireland; FI Finland; FR France; IN Northern India; SI Sindhi (Pakistan); IT Northern Italy; RU Russia; SE Southern Europe; FU Fulbe (Cameroon); HA Hausa (Cameroon); BE Berber; GA Gaali (Northern Sudan); WO Wolof (Senegal); SH Shaigi (Northern Sudan); DI Dinka (Southern Sudan); NU Nuer (Southern Sudan and Ethiopia)
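The predicted frequency of lactose tolerance plotted in Fig. 2 can be sketched as follows. This is an illustrative calculation rather than the authors' script: it assumes, in addition to Hardy–Weinberg equilibrium, that tolerance is dominant, so that every carrier of at least one −13910T allele is counted as tolerant. The function name is hypothetical.

```python
def predicted_tolerance_frequency(p_t):
    """Predicted frequency of lactose-tolerant individuals.

    Assumes Hardy-Weinberg equilibrium and a dominant effect of the
    -13910T allele: tolerant individuals are TT or CT, so the predicted
    frequency is p**2 + 2*p*q, which equals 1 - q**2 with q = 1 - p.
    """
    if not 0.0 <= p_t <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p_t
    return 1.0 - q * q


# Allele frequencies quoted in the text for the Mzab population:
# 0.17 (present study) and 0.22 (Bersaglieri et al. 2004).
for p in (0.17, 0.22):
    print(f"p = {p:.2f} -> predicted tolerance = {predicted_tolerance_frequency(p):.2f}")
```

Under these assumptions an allele frequency of 0.17 corresponds to a predicted tolerance frequency of about 0.31, which illustrates why a modest allele frequency can still translate into a substantial proportion of tolerant individuals.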

It has previously been hypothesised that the high frequency of the A haplotype in Northern Europeans was caused by selection for the linked allele for lactose tolerance (Hollox et al. 2001) and it has been demonstrated that −13910T occurs exclusively on the background of a very extended A haplotype (Poulter et al. 2003). In order to examine which haplotypes are associated with −13910T in the Berber, we looked at which haplotypes were assigned to individuals who were homozygous −13910TT or heterozygous −13910CT. All four individuals who were homozygous −13910TT were assigned two A haplotypes. However, eight of the 22 individuals who were heterozygous −13910CT were assigned two non-A haplotypes. All eight individuals were homozygous 666AA, a derived allele that is not part of the A haplotype. Thus, the association of the −13910T allele with non-A haplotypes is not an artifact of maximum-likelihood analysis. None of these individuals had missing data and none of their haplotypes could be inferred unambiguously (all eight individuals were heterozygous at ≥3 sites). Four of these eight individuals were assigned haplotypes W and B with probabilities ≥0.94. The genotypes at each site, the haplotype assignments and their associated probabilities as deduced by PHASE for each of these individuals are listed in Table 3.

Table 3 Individuals with the −13910T allele on non-A haplotypes. The genotypes for every polymorphic site that make up the lactase haplotypes are shown along with the genotypes at the two putatively causal sites, C-13910T and G-22018A. Haplotypes were deduced using the program PHASE, which lists the most likely pairs of haplotypes for each individual, together with their probability. Haplotypes that have not been previously identified are shown as multi-locus genotypes with each allele separated by a period

Discussion

The persistence of lactase in the small intestine into adulthood is caused by a cis-acting element, which is believed to be the target of natural selection in populations who have relied on fresh milk as a main nutritional source. Hollox et al. (2001) hypothesised that the unusually high frequency of the A haplotype in Northern Europeans (0.86) is the result of selection for this linked cis-acting element. Haplotype A is present in every population and is the most common haplotype in all populations except the Papua New Guineans, the San and Bantu-speaking South Africans (Hollox et al. 2001). The frequency of the A haplotype in the Berber is 0.25 and represents an intermediate frequency.

Several features of the distribution of haplotypes in the Berber are worth noting. For example, haplotype P was previously found at appreciable frequencies only in sub-Saharan Africa (4 and 10% in the Bantu and San, respectively) and is found at relatively high frequencies in the Berber (20 and 11% in the Amizmiz and Mzab, respectively). In addition, haplotypes X and Y, which were previously found only in the Bantu (X=0.07) and the San (Y=0.13), were observed in the Moyen Atlas (X=0.03) and the Mzab (Y=0.02), albeit at relatively low frequencies. The distribution of haplotypes P, X and Y is consistent with previous studies which have demonstrated that the Sahara has been a strong but not impenetrable barrier to gene flow (Arredi et al. 2004; Bosch et al. 2001; Comas et al. 2000; Rando et al. 1998).

It has been shown that the patterns of variation in and surrounding the lactase gene show a strong signature of recent positive selection (Bersaglieri et al. 2004) and that the putatively causal allele for lactose tolerance, −13910T, occurs on an A haplotype that extends over at least 1 Mb in Europeans (Poulter et al. 2003). Transfection studies provide additional evidence for a functional role for −13910T (Olds and Sibley 2003; Troelsen et al. 2003). Lactose tolerance is associated with the A haplotype (Harvey et al. 1998), although occasionally lactose tolerance was found in combination with a non-A haplotype (Harvey et al. 1998; Wang et al. 1995). In a sample of Northern Europeans, all non-A chromosomes carried −13910C and all the −13910T alleles were carried on A haplotype chromosomes (Poulter et al. 2003). In the Berber, eight individuals carried −13910T on a non-A chromosome (see Table 3). This observation could be the result of a recombination event which disassociated −13910T from the rest of the A haplotype. Another possibility is that −13910T arose independently on non-A haplotype backgrounds. Additional studies that look at the relationship between the lactase haplotypes and the −13910T allele, especially in Africa, will help to elucidate the history of this polymorphism.

We suggested that the frequency and distribution of −13910T could aid us in elucidating the origin(s) and diffusion(s) of dairying culture. It has previously been shown that, while the frequency of −13910T strongly predicts the frequency of lactose tolerance in Northern Europeans, this is not the case for many sub-Saharan milk-drinking populations. In fact, −13910T was found to be completely absent in several dairying populations from Malawi, Senegal, Sudan and Ethiopia (Mulcare et al. 2004). Figure 2 shows that the frequency of lactose tolerance is strongly predicted by the frequency of −13910T in the Berber and all of the Eurasian populations for which data were available, but that this is generally not the case in sub-Saharan Africa. The two sub-Saharan exceptions, the Fulbe and the Hausa from Cameroon, could be the result of a proposed recent back-migration from Asia into sub-Saharan Africa (Cruciani et al. 2002), which may have introduced the −13910T allele into these populations (Mulcare et al. 2004). We propose that the observed correlation between the frequencies of the −13910T allele and lactose tolerance in Eurasian and Berber populations and the absence of −13910T in several sub-Saharan African dairying populations support a scenario in which the Berber share a dairying origin with European and Asian pastoralists but not with sub-Saharan African pastoralists.

This conclusion rests on a correlational analysis: it has not been demonstrated that the −13910T allele causes lactose tolerance. Also, since the mutation(s) responsible for lactose tolerance in sub-Saharan African dairying populations have yet to be identified, it is not known whether the mutation(s) are shared with North African or other populations. Future studies that involve the determination of an individual’s genotype and lactose tolerance phenotype in numerous African dairying populations will be required before any robust conclusions about shared dairying origins can be drawn.

It could be argued that −13910T was geographically widespread in the Mesolithic, that selection raised it to high frequencies in populations who independently adopted dairying and that the correlation between −13910T frequency and lactose tolerance among populations does not necessarily point to a shared dairying origin. Poulter et al. (2003) compared the occurrence of −13910T with many alleles that subdivide the A haplotype and suggest that −13910T is the most recent of these alleles. In addition, Bersaglieri et al. (2004) show that −13910T is likely not so old as to pre-date the differentiation of European and African populations. They found −13910T at high frequencies in Pakistan and at somewhat lower frequencies in the Middle East and noted that −13910T was observed exclusively on the A haplotype in these populations as in Europeans. Although we observed −13910T on non-A haplotypes in eight individuals (see Table 3), it was primarily found in association with the A haplotype in the Berber. Thus, in general the data suggest that −13910T is a relatively young mutation and that it was most likely not geographically widespread among pre-Neolithic populations. However, our finding that −13910T is found on non-A haplotypes will hopefully stimulate further studies that aim to estimate the age of the −13910T allele.

The possibility that −13910T was introduced from Europe via the Gibraltar Strait seems unlikely since numerous studies have shown that the Gibraltar Strait has been a strong boundary to gene flow (Arredi et al. 2004; Bosch et al. 1999, 2000, 2001; Rando et al. 1998; Simoni et al. 1999) and −13910T reaches an appreciable frequency (0.17) in our Berber sample. In addition, both historical records (McEvedy 1980) and genetic evidence (Bosch et al. 1997, 1999, 2000; Comas et al. 2000; Flores et al. 2001) support the notion that more recent historical processes [e.g. invasions into North Africa by the Phoenicians (814 BC), the Vandals (AD 429), the Byzantines (AD 533) and the Arabs (7th century)] had very limited demographic impact.

The use of domestic ovicaprids (i.e. sheep and goats) is believed to be the first type of farming to be practiced in the coastal regions of North Africa (Phillipson 1993) and it dates back to ~7,000 BP (McBurney 1967; Roubet 1978). It is widely believed that ovicaprids were domesticated in the Near East and introduced into North Africa (Clutton-Brock 1993; Higgs 1967). Also, archaeological evidence indicates that the change to pastoralism in coastal North Africa was abrupt and not developed locally over a long period of time (Holl 1998). It has been suggested, based on linguistic evidence, that the ovicaprid herders who spread west from the Nile Valley from ~7,000 BP were speaking some form of Berber (Blench 2001) and recent Y-chromosomal data support a model in which the Neolithic transition in North Africa involved the demic diffusion of pastoralists from the Middle East (Arredi et al. 2004). The data presented here are consistent with a scenario in which proto-Berber-speaking ovicaprid pastoralists introduced the −13910T allele, and thereby lactose tolerance, into North Africa. This scenario implies a genetic input from migrating pastoralists from the Middle East and suggests that contemporary Berber populations share a Middle Eastern dairying origin with other Eurasian populations.