SNP-based genetic linkage map of tobacco (Nicotiana tabacum L.) using next-generation RAD sequencing

Abstract

Background

Tobacco (Nicotiana tabacum L.) is an important model system that has been widely used in plant physiological studies and is particularly useful as a bioreactor. Despite its importance, only limited molecular marker resources are available for genome analysis, genetic mapping and breeding. Restriction-site associated DNA sequencing (RAD-seq) is a powerful method for targeted sequencing across the genomes of many individuals, and it has broad potential for genetic analysis through linkage mapping.

Results

We constructed RAD libraries using genomic DNA from a BC1 backcross population. Sequencing of 196 individuals was performed on an Illumina HiSeq 2500. Two linkage maps were constructed: one with a reference genome and one without, the latter termed de novo identification of single nucleotide polymorphisms (SNPs) by RAD-seq. Overall, 4138 and 2162 SNP markers, with total lengths of 1944.74 and 2700.9 cM, were mapped to 24 linkage groups in the genetic maps built with and without the reference genome, respectively.

Conclusions

Using two different SNP discovery methods based on next-generation RAD sequencing technology, we mapped 2162 and 4138 SNPs, respectively, in our backcross population. This study provides an example of high density linkage map construction, irrespective of genome sequence availability, and supplies dense marker information for downstream genetic investigations such as quantitative trait locus analyses or genomic selection (e.g. of cultivars suitable as bioreactors).

Background

Tobacco (Nicotiana tabacum L., 2n = 4x = 48) is an important model system in plant biotechnology [1] because of its advantages over other plant species. It has a relatively short generation time and high protein content, and it can be easily genetically transformed [2, 3]. For these reasons, tobacco has been widely used in studies of plant responses to pathogens [4], pyridine alkaloid (e.g. nicotine) biosynthesis [5], the cell cycle [6, 7], oxidative stress [8] and pollen tube development [9]. More importantly, tobacco is an attractive green bioreactor that has been shown to produce a wide range of therapeutic proteins, including antibodies [10–12], vaccines [13, 14] and immunomodulatory molecules such as cytokines [15, 16].

Despite the prospective applications of tobacco in pharmaceutical production, few cultivars with low nicotine and alkaloid contents exist. Breeding new cultivars suitable for pharmaceutical production is further complicated by the limited genomic information available to the public. Genetic linkage mapping based on molecular markers permits the elucidation of genome structure and organization [17] and provides critical information for quantitative trait locus (QTL) analysis and marker-assisted selection. For some economically important plants, including potato (Solanum tuberosum), tomato (Solanum lycopersicum), eggplant (Solanum melongena), pepper (Capsicum species) and Petunia (Petunia hybrida), whole genome sequencing and genetic linkage maps have elucidated genome structure and assisted marker-based breeding of cultivars [18]. A high density linkage map of the tetraploid tobacco genome would therefore improve the genetic tools available in the search for new cultivars. Thus far, linkage maps for tobacco have been constructed with low-throughput molecular markers such as simple sequence repeats (SSRs), which resulted in low density linkage maps [19, 20].

Single nucleotide polymorphisms (SNPs), the most abundant type of DNA variation, are widely used as genetic markers because of their wide distribution in the genome [21]. Compared with genetic markers based on size discrimination or hybridization, SNPs directly interrogate sequence variation and have the potential to reduce genotyping errors [22]. SNP discovery is amenable to high-throughput next-generation sequencing (NGS) technologies, which produce DNA sequences at a rate several orders of magnitude faster than conventional sequencing methods [17].

According to unpublished data, the genome size of tobacco is approximately 4.5 Gb, and a genome of this size poses considerable challenges for sequencing-based analysis. Reduced representation library sequencing is an efficient approach that has been used in many genome studies [23]. Restriction site associated DNA sequencing (RAD-seq) technology [24–26] facilitates genetic variant discovery by allowing orthologous sequences to be targeted in multiple individuals [27]. The method relies on sequencing the DNA regions that flank the recognition sites of a chosen restriction enzyme. In brief, DNA fragments produced by digestion with the restriction enzyme are ligated to an adapter containing a molecular identifying sequence (MID) unique to each sample. The DNA sequences flanking each restriction site are then sequenced with massively parallel Illumina sequencing technology [28]. RAD sequencing has been highly successful in re-identifying genomic regions controlling known phenotypes [29–31].

To generate a high density linkage map for tobacco, we developed 4138 SNP markers using the Illumina HiSeq 2500 high-throughput platform. The mapping population was generated by crossing two tobacco (N. tabacum L.) cultivars and backcrossing the F1 progeny to one of the parents. A total of 193 progenies were generated, and all of them were used for linkage map construction. We conducted SNP detection both with and without a reference genome, the latter referred to as de novo identification of SNP by RAD-seq (DISR). We compared the two methods and constructed genetic maps of tobacco based on this backcross (BC1) population.

Results

RAD library preparation and sequencing

A total of 196 individuals from three generations, namely HD (Hong hua Da jin yuan), RBST (Resistance to Black Shank Tobacco), F1 (HD × RBST) and 193 BC1 progenies, were used to construct 10 libraries for RAD sequencing (Table 1). In summary, 2641 Gb of raw data were obtained, comprising 26.4 billion paired-end (2 × 100 bp) raw reads. Detailed library information is provided in Additional file 1. We removed the following types of reads: (a) reads with >10 % unidentified nucleotides (N), (b) reads with >40 bases having Phred quality ≤7, and (c) putative PCR duplicates generated by PCR amplification during library construction (i.e., pairs in which read 1 and read 2 were completely identical to those of another pair). The filtered reads were assigned to samples by their index sequences to obtain clean data for each sample (Fig. 1). In total, 2481 Gb of clean data comprising 24.8 billion clean reads remained after filtering, with an average of 12.11 Gb per sample, corresponding to an average sequencing depth of 2.7× (the unpublished tobacco genome size is approximately 4.5 Gb).

Table 1 Library information and data output
Fig. 1 Number of reads for each sample

SNP calling and genotyping

Two distinct protocols were used for SNP calling and genotyping: the first used a reference genome; the second did not, and we refer to it as DISR. In the first protocol, 24.8 billion clean reads were aligned to the reference sequences (unpublished data) using SOAPaligner [32] (Release 2.21, http://soap.genomics.org.cn/). The mapping results were processed with Samtools [33]. Variations were called using the Unified Genotyper (Version 3.1, Genome Analysis Tool Kit) [34]. Any nucleotide difference between the reads and the reference genome was initially called as a variant. This produced a large output of 7,343,419 raw SNPs, which required further filtering. Three parameters generated by the Unified Genotyper (genotype coverage, genotype quality, and SNP quality) were used as criteria for filtering the variant output.

Using a maximum missing data (MMD) threshold of 45 % in the BC1 population for each locus, a total of 8664 SNPs (p < 0.01) were recovered. Although this criterion is much looser than in many other studies [31], the effective number of genotyped individuals per locus remains above 100, which is sufficient for linkage map construction. In total, 5286 markers (χ² < 15) were selected for genetic map construction using JoinMap 4.0 [35] (Table 2).
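The two locus-level filters described above can be illustrated with a minimal Python sketch. It assumes that each BC1 locus is scored in two genotype classes expected in a 1:1 ratio; the genotype codes "a" and "h", the function names and the toy data are hypothetical, and only the 45 % missing-data limit and the χ² < 15 cutoff are taken from the text. It is an illustration, not the pipeline used in this study.

```python
from typing import Optional, Sequence


def missing_fraction(genotypes: Sequence[Optional[str]]) -> float:
    """Fraction of individuals without a genotype call (None) at one locus."""
    return sum(g is None for g in genotypes) / len(genotypes)


def chi_square_1to1(genotypes: Sequence[Optional[str]]) -> float:
    """Chi-square statistic against an assumed 1:1 ratio of two classes."""
    n_a = sum(g == "a" for g in genotypes)  # hypothetical code: recurrent-parent type
    n_h = sum(g == "h" for g in genotypes)  # hypothetical code: heterozygous
    total = n_a + n_h
    if total == 0:
        return float("inf")  # nothing called, so the locus cannot pass
    expected = total / 2
    return (n_a - expected) ** 2 / expected + (n_h - expected) ** 2 / expected


def keep_locus(genotypes: Sequence[Optional[str]],
               max_missing: float = 0.45,
               max_chi2: float = 15.0) -> bool:
    """Keep a locus only if it passes both the missing-data and chi-square filters."""
    return (missing_fraction(genotypes) <= max_missing
            and chi_square_1to1(genotypes) < max_chi2)


if __name__ == "__main__":
    balanced = ["a"] * 60 + ["h"] * 50 + [None] * 83   # 193 toy individuals
    distorted = ["a"] * 100 + ["h"] * 20 + [None] * 73
    print(keep_locus(balanced), keep_locus(distorted))
```

With 193 progenies, a 45 % missing-data limit leaves at least 106 called genotypes per locus, which is consistent with the statement that the effective sample size stays above 100.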

Table 2 Statistics for SNPs based on the two different methods

In the second protocol (DISR), 181,770 raw SNPs were obtained after the clean reads were processed. Using the same MMD threshold as in the first protocol, a total of 7457 SNPs (p < 0.01) were recovered. Of these, 3282 markers were selected by the χ² test for genetic map construction in JoinMap 4.0 [35] (Table 2).

Linkage mapping

The first linkage map, based on the reference genome, was constructed from a total of 8664 SNPs (p < 0.01), of which 4138 markers were mapped to 24 linkage groups (LGs) with a total length of 1944.74 cM. The LGs ranged from 33.58 to 129.176 cM in length. Six LGs contained over 220 marker loci. LG09, LG23 and LG24 carried the fewest markers (65 loci) and spanned 73.937–107.485 cM, whereas LG05 carried the most, with 494 loci over 60.73 cM and a marker density of 0.123 cM/locus. Marker densities ranged from 0.117 cM/locus in LG12 to 1.679 cM/locus in LG23, with an average distance of 0.712 cM between markers for the entire map (Table 3; Fig. 2).

Table 3 Statistics of 24 linkage groups with the reference genome
Fig. 2 Linkage map based on the reference genome. The map was constructed from a total of 8664 SNPs (p < 0.01), of which 4138 markers were mapped to 24 linkage groups (LGs) with a total length of 1944.74 cM. LG lengths ranged from 33.58 to 129.176 cM. Six LGs contained over 220 marker loci; Haldane’s map unit was used for these LGs and Kosambi’s map unit for the others. LG09, LG23 and LG24 carried the fewest markers (65 loci) and spanned 73.937–107.485 cM, whereas LG05 carried the most, with 494 loci over 60.73 cM and a marker density of 0.123 cM/locus. Marker densities ranged from 0.117 cM/locus in LG12 to 1.679 cM/locus in LG23, with an average distance of 0.712 cM between markers for the entire map

The second linkage map, based on DISR, was constructed from 7457 SNPs that gave 3282 markers. Of those, 2162 markers were mapped to 24 LGs with a total length of 2700.9 cM. The LGs ranged from 58.1 to 238.4 cM in length, and only one LG contained over 220 marker loci. LG24 carried the fewest markers, with only 13 loci, whereas LG01 carried the most, with 224 loci over 159.9 cM and a marker density of 0.7 cM/locus. Marker densities ranged from 0.5 cM/locus in LG02 to 5.6 cM/locus in LG24, with an average distance of 1.8 cM between markers for the entire map (Table 4; Fig. 3).
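The per-group marker densities quoted for both maps follow from dividing map length by the number of loci. The short Python sketch below reproduces that arithmetic for the values given in the text; the function name and the dictionary layout are illustrative assumptions, and per-group values for the remaining LGs would have to come from Tables 3 and 4.

```python
def marker_density(length_cm: float, n_loci: int) -> float:
    """Average map distance per locus (cM/locus) for one linkage group or map."""
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    return length_cm / n_loci


# (length in cM, number of loci) as reported in the text
reported = {
    "reference map, LG05": (60.73, 494),
    "DISR map, LG01": (159.9, 224),
    "reference map, all LGs": (1944.74, 4138),
    "DISR map, all LGs": (2700.9, 2162),
}

if __name__ == "__main__":
    for name, (length_cm, n_loci) in reported.items():
        print(f"{name}: {marker_density(length_cm, n_loci):.3f} cM/locus")
```

Dividing the total length by the total marker count gives roughly 0.47 and 1.25 cM per marker for the two maps, so the reported whole-map averages of 0.712 and 1.8 cM were presumably computed in another way, for example as a mean of per-group values; the text does not state this.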

Table 4 Statistics of 24 linkage groups without the reference genome (DISR)
Fig. 3 Linkage map based on DISR. The map was constructed from 7457 SNPs that produced 3282 markers. Of those, 2162 markers were mapped to 24 LGs with a total length of 2700.9 cM. LG lengths ranged from 58.1 to 238.4 cM. LG24 carried the fewest markers, with only 13 loci, whereas LG01 carried the most, with 224 loci over 159.9 cM and a marker density of 0.7 cM/locus (Haldane’s distance was used for this LG and Kosambi’s distance for the others). Marker densities ranged from 0.5 cM/locus in LG02 to 5.6 cM/locus in LG24, with an average distance of 1.8 cM between markers for the entire map

Comparison of the DISR and the reference genome methods

The two methods were compared by the proportion of markers shared between the genetic map based on the reference genome and the DISR map. The DISR consensus sequences were mapped back to the reference genome to locate their SNPs, and the DISR markers were then compared with the markers generated by the reference genome method. Consistent markers were recorded and presented as a Venn diagram. In total, 677 overlapping markers were observed, constituting 30 % of the DISR map and 16 % of the map based on the reference genome. In addition, 1535 markers were specific to the DISR map and 3461 markers were specific to the map based on the reference genome (Fig. 4).
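The comparison amounts to a set intersection once every marker has a position on the reference genome. The following Python sketch shows that bookkeeping under the assumption that a marker is identified by a (scaffold, position) pair; the scaffold names and positions are made-up toy values, and the function is not the authors' script.

```python
from typing import Dict, Set, Tuple

Marker = Tuple[str, int]  # (scaffold name, 1-based SNP position); hypothetical key


def overlap_summary(ref_markers: Set[Marker],
                    disr_markers: Set[Marker]) -> Dict[str, float]:
    """Counts and percentages for the three regions of a two-set Venn diagram."""
    if not ref_markers or not disr_markers:
        raise ValueError("both marker sets must be non-empty")
    shared = ref_markers & disr_markers
    return {
        "shared": len(shared),
        "ref_only": len(ref_markers - disr_markers),
        "disr_only": len(disr_markers - ref_markers),
        "pct_of_ref": 100.0 * len(shared) / len(ref_markers),
        "pct_of_disr": 100.0 * len(shared) / len(disr_markers),
    }


if __name__ == "__main__":
    ref = {("scaf1", 10), ("scaf1", 20), ("scaf2", 5), ("scaf3", 7)}
    disr = {("scaf1", 20), ("scaf2", 5), ("scaf4", 1)}
    print(overlap_summary(ref, disr))
```

Applied to the reported counts, 677 shared markers out of 4138 reference-based markers is about 16 %, in line with the figure given above.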

Fig. 4 Comparison of the two maps. In total, 677 overlapping markers were observed, constituting 30 % of the DISR map and 16 % of the map based on the reference genome. In addition, 1535 markers were specific to the DISR map and 3461 markers were specific to the map based on the reference genome

Discussion

Although tobacco has proved to be an attractive green bioreactor for the production of therapeutic proteins, the paucity of cultivars with low nicotine and alkaloid contents has hindered its move from bench to field scale. A high density genetic map can provide the information needed to accelerate genome-based breeding. Previous genetic linkage maps for tobacco were built with molecular marker techniques including restriction fragment length polymorphism (RFLP) [36], conserved ortholog sequences (COS) [37] and simple sequence repeat (SSR) markers [19, 20]. The best of these, the SSR linkage map, comprises 2318 SSR markers mapping to 2363 loci in 24 clearly defined LGs with a total length of 3270 cM [19] (Table 5). In comparison, our approach generated 4138 SNP markers for tobacco that defined 24 LGs with a total coverage of 1944.7 cM. This result not only improves on previous reports but also confirms that SNPs provide excellent marker density for linkage mapping and genomic selection [38]. To our knowledge, the tobacco linkage maps from this study, particularly the map generated with a reference genome, provide the highest number of markers among all available population-specific linkage maps.

Table 5 Comparison of linkage maps for tobacco

The Mendelian basis of quantitative traits provides a genetic framework for the dissection of polygenic traits [39] and can pave the way for the identification of candidate loci controlling the inheritance of complex traits. NGS technology makes it possible to achieve dense SNP marker coverage of genomes without the need for a reference sequence [24, 26]. An example is restriction-site associated DNA sequencing (RAD-seq), which was originally developed as a tool for genetic mapping in fish and fungi [29] and later extended to many other species, including plants (Lolium perenne L., Momordica charantia, Corchorus olitorius L.) [25, 30, 40, 41]. In this study, a separate linkage map that did not need a reference genome was also obtained with the DISR method. The DISR linkage map contains 2162 markers with a total coverage of 2700.9 cM and an average distance of 1.8 cM between markers. These two high density linkage maps (Table 5) are therefore promising tools for gene and QTL mapping and for marker-assisted breeding [42].

A comparison of the two maps showed an overlap of 677 markers (Fig. 4). Comparing the proportions of overlap between the two protocols, we found that SNP calling with a reference genome was more efficient than SNP calling without one. In the DISR method, only one end of each read pair is used for SNP calling, whereas SNP calling with a reference genome uses information from the whole genome. A reference-free approach is nevertheless often required in practice, particularly when building linkage maps for species that lack a complete genome sequence. Integrating the two protocols could yield a higher density map and thus assist in breeding further cultivars with low nicotine and alkaloid contents.

Conclusions

Using next-generation RAD sequencing technology with two distinct SNP discovery methods, we mapped 2162 and 4138 SNPs, respectively, in tobacco. This study provides an example of high density linkage map construction, irrespective of reference genome sequence availability, and supplies dense marker information for downstream genetic investigations such as QTL analyses or genomic selection (e.g. of cultivars suitable as bioreactors).

Methods

Mapping population

Two tobacco varieties, Hong hua Da jin yuan (HD) and Resistance to Black Shank Tobacco (RBST), were used to develop the BC1 backcross population. HD is a high-leaf-mass cultivar from southwest China. RBST has high resistance to tobacco black shank disease. The BC1 population was generated through a (HD × RBST) × HD cross in a breeding unit in Yuxi, Yunnan Province.

RAD library preparation and sequencing

Fresh young leaves were collected from HD, RBST, F1 (HD × RBST) and 193 individuals of the BC1 (F1 × HD) population. Leaf samples were snap frozen in liquid nitrogen and stored at −80 °C. Genomic DNA was isolated and purified using a DNA extraction kit (Qiagen). DNA quality was checked on a 1 % agarose gel, and the concentration of the extracted DNA was determined with a spectrophotometer. Approximately 15 μg of purified DNA was processed to obtain 10 RAD libraries, each including about 20 individuals, following the protocol of Baird et al. [29] and the instructions of the reagent manufacturers. Genomic DNA from individual samples was digested with EcoRI (New England Biolabs). Adapters carrying individual-specific barcodes were ligated with T4 DNA ligase for sample multiplexing. Ligated DNA samples were pooled and sheared, and then electrophoresed on a 1.5 % agarose gel to isolate DNA fragments of 300–700 bp. The Quick Blunting Kit (New England Biolabs) was used to generate phosphorylated blunt ends. Klenow Fragment (3′ → 5′ exo-; New England Biolabs) was used to add adenosine to the 3′ ends. An adapter with divergent ends (P2 adapter) was ligated to enable selective PCR. The samples were PCR-amplified and the libraries purified with a MinElute column (Qiagen) to obtain approximately 100 μl (>50 ng μl−1) of sequencing libraries. The RAD libraries were sequenced on an Illumina HiSeq 2500 as 100 bp paired-end reads.
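Because each pooled library carries individual-specific barcodes, reads must be assigned back to samples before SNP calling. The Python sketch below shows one simple way this could be done. It assumes that the forward read starts with the barcode followed by the remnant of the EcoRI site (taken here to be AATTC) and that all barcodes have equal length; the barcode sequences, sample names and function name are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, List, Tuple


def demultiplex(reads: Iterable[str],
                barcodes: Dict[str, str],
                site: str = "AATTC") -> Tuple[Dict[str, List[str]], int]:
    """Assign reads to samples by leading barcode and check the cut-site remnant.

    Returns the per-sample reads (barcode trimmed off) and the number of
    reads that could not be assigned.
    """
    if not barcodes:
        raise ValueError("at least one barcode is required")
    barcode_len = len(next(iter(barcodes)))  # assumes equal-length barcodes
    bins: Dict[str, List[str]] = {sample: [] for sample in barcodes.values()}
    unassigned = 0
    for read in reads:
        sample = barcodes.get(read[:barcode_len])
        has_site = read[barcode_len:barcode_len + len(site)] == site
        if sample is None or not has_site:
            unassigned += 1
            continue
        bins[sample].append(read[barcode_len:])
    return bins, unassigned


if __name__ == "__main__":
    toy_barcodes = {"ACGTA": "BC1_001", "TGCAT": "BC1_002"}  # hypothetical
    toy_reads = ["ACGTAAATTCGGGG", "TGCATAATTCCCCC",
                 "ACGTAGGGGGGGGG", "NNNNNAATTCAAAA"]
    print(demultiplex(toy_reads, toy_barcodes))
```

This sketch requires exact barcode matches; production tools usually also tolerate a sequencing error in the barcode, which is omitted here for brevity.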

SNP calling with reference genome

Raw reads were removed according to the following criteria: (a) reads with >10 % unidentified nucleotides (N), (b) reads with >40 bases having Phred quality ≤7, and (c) putative PCR duplicates generated by PCR amplification during library construction (i.e., pairs in which read 1 and read 2 were completely identical to those of another pair). All clean reads were aligned to the reference sequences (unpublished data) using SOAPaligner (Release 2.21, http://soap.genomics.org.cn/) [32]. Because long reads can have high error rates at their 3′ ends, the 5′ 32 bp subsequence of each read was used as the alignment seed, while the entire read length was used for the alignment itself. Five mismatches per read were allowed (important arguments: -l 32 -v 5). The resulting SAM files were converted with Samtools [33]. Variations were called using the Unified Genotyper (Version 3.1, Genome Analysis Tool Kit) [34]. Any nucleotide difference between the reads and the reference genome was identified as a variant. This criterion generated a large variant output, which was filtered by three parameters reported by the Unified Genotyper: genotype coverage, genotype quality, and SNP quality.
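The three read-removal criteria at the start of this section can be sketched in a few lines of Python. The sketch assumes that quality values are already decoded to integer Phred scores, that criteria (a) and (b) are applied to each mate separately, and that a pair is dropped when either mate fails; these choices, the type aliases and the function names are assumptions made for illustration rather than details stated in the text.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Read = Tuple[str, List[int]]  # (sequence, decoded Phred scores)
Pair = Tuple[Read, Read]      # (read 1, read 2)


def read_fails(seq: str, quals: List[int],
               max_n_frac: float = 0.10,
               max_low_q: int = 40,
               low_q: int = 7) -> bool:
    """True if a read has >10 % N bases or >40 bases with Phred quality <= 7."""
    if not seq:
        return True
    n_frac = seq.upper().count("N") / len(seq)
    n_low = sum(q <= low_q for q in quals)
    return n_frac > max_n_frac or n_low > max_low_q


def filter_pairs(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Yield pairs that pass criteria (a) and (b) and are not exact duplicates (c)."""
    seen: Set[Tuple[str, str]] = set()
    for (seq1, q1), (seq2, q2) in pairs:
        if read_fails(seq1, q1) or read_fails(seq2, q2):
            continue
        key = (seq1, seq2)  # duplicate = both mates identical to an earlier pair
        if key in seen:
            continue
        seen.add(key)
        yield (seq1, q1), (seq2, q2)


if __name__ == "__main__":
    good = ("ACGT" * 25, [30] * 100)
    n_rich = ("N" * 11 + "A" * 89, [30] * 100)
    low_quality = ("A" * 100, [5] * 41 + [30] * 59)
    pairs = [(good, good), (good, good), (n_rich, good), (good, low_quality)]
    print(len(list(filter_pairs(pairs))))
```

Keeping the set of seen pairs in memory is only practical for small inputs; at the scale of billions of reads a sorted or hashed on-disk approach would be needed.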

SNP calling without reference genome (DISR)

In addition to the reference-based method, we called SNPs by DISR. Instead of aligning reads to a reference, we used a multistep process to identify RAD tag loci within populations, assign a consensus sequence to each individual at each RAD tag locus, and align consensus sequences across populations (Fig. 5). A flowchart is also provided for clarity in Additional file 2.

Fig. 5 SNP calling based on DISR. a Nicotiana tabacum L. has 24 nuclear chromosomes, each of which contains multiple EcoRI cut sites (red marks). The genomic DNA is digested, barcoded with a population-specific sequence, and amplified, resulting in multiple sequence reads from each of the RAD tag sites in the genome. Each sequence consists of a population-specific 5-bp barcode (black), the enzyme-recognition sequence (red), and the downstream sequence. b The de novo RAD tag pipeline compares all the sequenced reads and builds clusters of exactly matching tags. c Pairwise comparisons are made between all clusters. d A cluster in the locus that carries a SNP. e The consensus sequence for that RAD tag site within the population

Within each individual, identical reads were grouped into clusters (termed stacks in other studies) (Fig. 5b–d). The pairwise sequence divergence among clusters was used to group them into putative loci (Fig. 5e). A locus was defined as a set of clusters such that for each cluster there is another cluster in the locus that is at most one nucleotide divergent. Clusters containing excessive numbers of sequence reads can occur when multiple repetitive sites in the genome are all within a single nucleotide of one another. For this analysis, all clusters with a depth of coverage greater than two standard deviations above the mean cluster depth were removed, and the remaining clusters were merged into loci. For each nucleotide site in a locus, a likelihood ratio test of the read counts of alternative nucleotides was used to test whether the allele frequency of the most observed nucleotide was significantly larger than a threshold p, following the method of Emerson et al. [43]. After these steps, an in-house Perl script was used to integrate the clusters of the parents and the F1 progeny into a catalog and to create a set of all possible loci in the mapping cross. Clusters of the BC1 progenies were then matched against the catalog to determine the genotype at each locus in every individual of the cross population.
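The clustering steps above (exact-match clusters, removal of over-deep clusters, and merging of clusters that differ by at most one nucleotide) can be sketched for toy data as follows. This is a minimal illustration, not the in-house pipeline: it assumes equal-length reads, uses the population standard deviation for the depth cutoff, merges clusters by single linkage with a quadratic pairwise comparison, and omits the likelihood ratio test and the catalog step. All names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, Iterable, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_loci(reads: Iterable[str],
               max_mismatch: int = 1,
               sd_cutoff: float = 2.0) -> List[List[str]]:
    """Group reads into clusters, drop over-deep clusters, merge the rest into loci."""
    clusters = Counter(reads)  # identical reads form one cluster; value = depth
    if not clusters:
        return []
    depths = list(clusters.values())
    limit = mean(depths) + sd_cutoff * pstdev(depths)  # population SD is an assumption
    kept = [seq for seq, depth in clusters.items() if depth <= limit]

    # Union-find for single-linkage merging of clusters within max_mismatch.
    parent: Dict[str, str] = {seq: seq for seq in kept}

    def find(seq: str) -> str:
        while parent[seq] != seq:
            parent[seq] = parent[parent[seq]]
            seq = parent[seq]
        return seq

    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if len(a) == len(b) and hamming(a, b) <= max_mismatch:
                parent[find(a)] = find(b)

    loci: Dict[str, List[str]] = {}
    for seq in kept:
        loci.setdefault(find(seq), []).append(seq)
    return sorted(sorted(members) for members in loci.values())


if __name__ == "__main__":
    toy_reads = (["AAAA"] * 5 + ["AAAT"] * 5 + ["CCCC"] * 5 + ["GGGG"] * 5
                 + ["TTTT"] * 5 + ["ACGT"] * 5 + ["TGCA"] * 5 + ["CACA"] * 500)
    for locus in build_loci(toy_reads):
        print(locus)
```

In the toy input, the cluster with depth 500 stands in for a repetitive site and is discarded, while AAAA and AAAT differ by one nucleotide and end up in the same locus, which is where a candidate SNP would be called.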

Genotyping and linkage mapping

Markers showing segregation distortion in a chi-square test (p < 0.01) were removed before map construction (markers with χ² < 15 were retained for JoinMap 4.0) [35]. LGs were identified with an independence logarithm of odds (LOD) threshold of 7. Because of the large number of markers segregating in the population, for linkage groups with more than 220 markers we used the maximum likelihood mapping algorithm in JoinMap 4.0 to determine marker order, for computational efficiency, and calculated genetic distances (cM) with Haldane’s mapping function [44]. However, the resulting lengths of these linkage groups (3000–6000 cM) exceeded the range supported by JoinMap 4.0, and the linkage lengths were therefore divided by 100 for map presentation. For linkage groups with 220 markers or fewer, a linear regression algorithm and Kosambi’s mapping function were used for map construction and genetic distance estimation [45]. Following the initial mapping, potential errors that appeared as doubtful double recombinants were identified using the genotype probabilities function of JoinMap 4.0 [35] (p < 0.001). Each suspicious genotype was replaced by a missing value, as suggested by Isidore et al. [46] and Van Ooijen [35]. The linkage map was then constructed afresh from the corrected dataset. Error elimination and linkage map construction were iterated until no dubious genotype was identified. Markers with >45 % missing values or with distorted segregation (χ² test, p < 0.001, d.f. = 2) were removed in each step of the iteration.
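For reference, the two mapping functions named above convert a recombination fraction r between adjacent markers into a map distance. The Python sketch below implements the standard textbook forms of Haldane’s and Kosambi’s functions in centimorgans; the function names are illustrative, and the code is independent of how JoinMap 4.0 computes distances internally.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM (allows for crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    for r in (0.01, 0.05, 0.10, 0.20):
        print(f"r = {r:.2f}: Haldane {haldane_cm(r):.2f} cM, "
              f"Kosambi {kosambi_cm(r):.2f} cM")
```

For small r both functions give nearly r × 100 cM, while for larger r Haldane’s function yields the longer distance, so lengths of groups mapped with different functions are not directly comparable.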

References

  1. Olmstead RG, Bohs L, Migid HA, Santiago-Valentin E, Garcia VF, Collier SM. A molecular phylogeny of the Solanaceae. Taxon. 2008;57:1159–81.

  2. Komori T, Imayama T, Kato N, Ishida Y, Ueki J, Komari T. Current status of binary vectors and superbinary vectors. Plant Physiol. 2007;145:1155–60.

  3. Sparkes IA, Runions J, Kearns A, Hawes C. Rapid, transient expression of fluorescent fusion proteins in tobacco plants and generation of stably transformed plants. Nat Protoc. 2006;1:2019–25.

  4. Romeis T. Protein kinases in the plant defence response. Curr Opin Plant Biol. 2001;4:407–14.

  5. Häkkinen ST, Tilleman S, Swiatek A, De Sutter V, Rischer H, Vanhoutte I, et al. Functional characterisation of genes involved in pyridine alkaloid biosynthesis in tobacco. Phytochemistry. 2007;68:2773–85.

  6. Mironov V, Veylder LD, Montagu MV, Inzéa D. Cyclin-dependent kinases and cell division in plants—the nexus. Plant Cell. 1999;11:509–22.

  7. Nakagami H, Sekine M, Murakami H, Shinmyo A. Tobacco retinoblastoma-related protein phosphorylated by a distinct cyclin-dependent kinase complex with Cdc2/cyclin D in vitro. Plant J. 1999;18:243–52.

  8. Langebartels C, Kerner K, Leonardi S, Schraudner M, Trost M, Heller W, et al. Biochemical plant responses to ozone: I. Differential induction of polyamine and ethylene biosynthesis in tobacco. Plant Physiol. 1991;95:882–9.

  9. Taylor LP, Hepler PK. Pollen germination and tube growth. Annu Rev Plant Physiol Plant Mol Biol. 1997;48:461–91.

  10. Tremblay R, Wang D, Jevnikar AM, Ma S. Tobacco, a highly efficient green bioreactor for production of therapeutic proteins. Biotechnol Adv. 2010;28:214–21.

  11. Villani ME, Morgun B, Brunetti P, Marusic C, Lombardi R, Pisoni I, et al. Plant pharming of a full-sized, tumour-targeting antibody using different expression strategies. Plant Biotechnol J. 2009;7:59–72.

  12. Sack M, Paetz A, Kunert R, Bomble M, Hesse F, Stiegler G, et al. Functional analysis of the broadly neutralizing human anti-HIV-1 antibody 2F5 produced in transgenic BY-2 suspension cultures. FASEB J. 2007;21:1655–64.

  13. Brandsma M, Wang X, Diao H, Kohalmi SE, Jevnikar AM, Ma S. A proficient approach to the production of therapeutic glucagon-like peptide-1 (GLP-1) in transgenic plants. Open Biotechnol J. 2009;3:9.

  14. Burtin D, Chabre H, Olagnier D, Didierlaurent A, Couret MN, Comeau D, et al. Production of native and modified recombinant Der p 1 molecules in tobacco plants. Clin Exp Allergy. 2009;39:10.

  15. Menassa R, Du C, Yin Z, Ma S, Poussier P, Brandle J, et al. Therapeutic effectiveness of orally administered transgenic low-alkaloid tobacco expressing human interleukin-10 in a mouse model of colitis. Plant Biotechnol J. 2007;5:50–9.

  16. Wang D, Brandsma M, Yin Z, Wang A, Jevnikar A, Ma S. A novel platform for biologically active recombinant human interleukin-13 production. Plant Biotechnol J. 2008;6:11.

  17. Zhou X, Xia Y, Ren X, Chen Y, Huang L, Huang S, et al. Construction of a SNP-based genetic linkage map in cultivated peanut based on large scale marker development using next-generation double-digest restriction-site-associated DNA sequencing (ddRADseq). BMC Genom. 2014;15:351.

  18. Foolad MR, Panthee DR. Marker-assisted selection in tomato breeding. Crit Rev Plant Sci. 2012;31:93–123.

  19. Bindler G, Plieske J, Bakaher N, Gunduz I, Ivanov N, Van der Hoeven R, et al. A high density genetic map of tobacco (Nicotiana tabacum L.) obtained from large scale microsatellite marker development. Theor Appl Genet. 2011;123:219–30.

  20. Bindler G, van der Hoeven R, Gunduz I, Plieske J, Ganal M, Rossi L, et al. A microsatellite marker based linkage map of tobacco. Theor Appl Genet. 2006;114:341–9.

  21. Brookes AJ. The essence of SNPs. Gene. 1999;234:177–86.

  22. Oliver RE, Lazo GR, Lutz JD, Rubenfield MJ, Tinker NA, Anderson JM, et al. Model SNP development for complex genomes based on hexaploid oat using high-throughput 454 sequencing technology. BMC Genom. 2011;12:77.

  23. Young AL, Abaan HO, Zerbino D, Mullikin JC, Birney E, Margulies EH. A new strategy for genome assembly using short sequence reads and reduced representation libraries. Genome Res. 2010;20:249–56.

  24. Barchi L, Lanteri S, Portis E, Acquadro A, Valè G, Toppino L, et al. Identification of SNP and SSR markers in eggplant using RAD tag sequencing. BMC Genom. 2011;12:304.

  25. Hegarty M, Yadav R, Lee M, Armstead I, Sanderson R, Scollan N, et al. Genotyping by RAD sequencing enables mapping of fatty acid composition traits in perennial ryegrass (Lolium perenne (L.)). Plant Biotechnol J. 2013;11:572–81.

  26. Bonaventure G, Barchi L, Lanteri S, Portis E, Valè G, Volante A, et al. A RAD tag derived marker based eggplant linkage map and the location of QTLs determining anthocyanin pigmentation. PLoS One. 2012;7:e43740.

  27. Baxter SW, Davey JW, Johnston JS, Shelton AM, Heckel DG, Jiggins CD, et al. Linkage mapping and comparative genomics using next-generation RAD sequencing of a non-model organism. PLoS One. 2011;6:e19315.

  28. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456:53–9.

  29. Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One. 2008;3:e3376.

  30. Pfender WF, Saha MC, Johnson EA, Slabaugh MB. Mapping with RAD (restriction-site associated DNA) markers to rapidly identify QTL for stem rust resistance in Lolium perenne. Theor Appl Genet. 2011;122:1467–80.

  31. Chutimanitsakun Y, Nipper RW, Cuesta-Marcos A, Cistué L, Filichkina T, Johnson EA, et al. Construction and application for QTL analysis of a restriction site associated DNA (RAD) linkage map in barley. BMC Genom. 2011;12:4.

  32. Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009;25:1966–7.

  33. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.

  34. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–8.

  35. Van Ooijen JW. JoinMap ® 4, Software for the calculation of genetic linkage maps in experimental populations. Wageningen: Kyazma BV; 2006.

  36. Fulton TM, Van der Hoeven R, Eannetta NT, Tanksley SD. Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. Plant Cell. 2002;14:1457–67.

  37. Wu F, Mueller LA, Crouzillat D, Pétiard V, Tanksley SD. Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics. 2006;174:1407–20.

  38. Poland J, Endelman J, Dawson J, Rutkoski J, Wu S, Manes Y, et al. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Gen. 2012;5:103–13.

  39. Jansen RC. Complex plant traits: time for polygenic analysis. Trends Plant Sci. 1996;1:89–94.

  40. Matsumura H, Miyagi N, Taniai N, Fukushima M, Tarora K, Shudo A, et al. Mapping of the gynoecy in bitter gourd (Momordica charantia) using RAD-seq analysis. PLoS One. 2014;9:e87138.

  41. Kundu A, Chakraborty A, Mandal NA, Das D, Karmakar PG, Singh NK, et al. A restriction-site-associated DNA (RAD) linkage map, comparative genomics and identification of QTL for histological fibre content coincident with those for retted bast fibre yield and its major components in jute (Corchorus olitorius L., Malvaceae s. l.). Mol Breed. 2015;35:19.

  42. Xu Y, Crouch JH. Marker-assisted selection in plant breeding: from publications to practice. Crop Sci. 2008;48:391.

  43. Emerson KJ, Merz CR, Catchen JM, Hohenlohe PA, Cresko WA, Bradshaw WE, et al. Resolving postglacial phylogeography using high-throughput sequencing. Proc Natl Acad Sci USA. 2010;107:16196–200.

  44. Haldane J. The combination of linkage values and the calculation of distances between the loci of linked factors. J Genet. 1919;8:299–309.

  45. Kosambi DD. The estimation of map distances from recombination values. Ann Eugen. 1943;12:172–5.

  46. Isidore E, van Os H, Andrzejewski S, Bakker J, Barrena I, Bryan GJ, et al. Toward a marker-dense meiotic map of the potato genome: lessons from linkage group I. Genetics. 2003;165:2107–16.

  47. Hong-Bo M, Jian-Min Q, Yan-Kun L, Jing-Xia L, Tao W, Tao L, et al. Construction of a molecular genetic linkage map of tobacco based on SRAP and ISSR markers. Acta Agron Sin. 2008;34:1958–63.

  48. Cai C-C, Chai L-G, Wang Y, Xu F-S, Zhang J-J, Lin G-P. Construction of genetic linkage map of burley tobacco (Nicotiana tabacum L.) and genetic dissection of partial traits. Acta Agron Sin. 2009;35:1646–54.

  49. Lu X, Gui Y, Xiao B, Li Y, Tong Z, Liu Y, et al. Development of DArT markers for a linkage map of flue-cured tobacco. Chin Sci Bull. 2012;58:641–8.

  50. Tong Z-J, Jiao F-C, Wu X-F, Wang F-Q, Chen X-J, Li X-Y, et al. Mapping of quantitative trait loci underlying six agronomic traits in flue-cured tobacco. Acta Agron Sin. 2013;38:1407–15.

Authors’ contributions

BX constructed the mapping population, extracted tobacco DNA, performed the sequencing and wrote part of the text. YT performed most of the bioinformatic analysis and wrote part of the text. NL performed data preprocessing. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by grants from CNTC [110201201003 (JY-03), 110201301006 (JY-06)] and YNTC (2012YN01, 2013YN01).

Compliance with ethical guidelines

Competing interests

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Yongping Li.

Additional information

Bingguang Xiao and Yuntao Tan contributed equally to this work.

Additional files

40709_2015_34_MOESM1_ESM.xlsx

Additional file 1. Library detail information.

40709_2015_34_MOESM2_ESM.pdf

Additional file 2. A flowchart for bioinformatic analysis procedure in this study.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Xiao, B., Tan, Y., Long, N. et al. SNP-based genetic linkage map of tobacco (Nicotiana tabacum L.) using next-generation RAD sequencing. J of Biol Res-Thessaloniki 22, 11 (2015). https://doi.org/10.1186/s40709-015-0034-3

  • DOI: https://doi.org/10.1186/s40709-015-0034-3

Keywords