Introduction

Among the food crops, cereals including rice (Oryza sativa), wheat (Triticum aestivum), maize (Zea mays), barley (Hordeum vulgare), pearl millet (Pennisetum glaucum), and sorghum (Sorghum bicolor), and legumes including chickpea (Cicer arietinum), pigeon pea (Cajanus cajan), cowpea (Vigna unguiculata), pea (Pisum sativum), common bean (Phaseolus vulgaris), faba bean (Vicia faba), soybean (Glycine max), groundnut (Arachis hypogaea), and lentil (Lens culinaris) are major sources of energy, carbohydrates, proteins, and fibers in the human diet. Cereals and legumes contribute globally 26 and 9%, respectively to total human food (1271.4 Mt) (Dwivedi et al. 2018). Legume crops like groundnut and soybean are also main sources of edible oil. As the world population reaches ~9.8 billion by 2050 (Nawaz and Chung 2020), these crops are going to be the main food sources to meet the expected demand of a growing population (FAO 2009). However, global warming is becoming a threat to agriculture and food security (Nawaz and Chung 2020), and rising global temperatures are showing their visible negative impact on crop yield (Arora 2019). Changes in the current climatic conditions are impacting crop growth and yield due to increasing episodes of drought, heat, and water logging, infestations of insect-pests, diseases, weed prevalence, and a decreasing population of pollinating insects (Myers et al. 2017). Also, a decline in the nutritional quality of foods is leading to adverse impacts on human and animal health (Dwivedi et al. 2013). Taking these into consideration, emphasis is being given to developing nutritionally rich and climate-resilient cultivars of cereals and legumes to secure food and the nutritional demand for an ever-increasing world population.

Genes determine the phenotype of an individual and serve as the units of heredity that transfer that phenotype from one generation to the next. Therefore, functional knowledge of genes is essential for breeding nutritionally dense and climate-resilient cultivars that sustain nutritional value and crop yield under changing environments. In the last century, considerable genetic advances were made in identifying genes that account for phenotypic variation among individuals. Advances in next-generation sequencing (NGS) have made whole-genome sequences available for many crop plants (Türktaş et al. 2015; Chen et al. 2011). This revolutionized genetic and genomic research and made available many genes, but with unknown functions. Developing cis-/transgenic plants with a cloned/edited gene is one way to determine the function of a particular gene. Cloned genes can be inserted into a plant using various delivery mechanisms. If the cloned gene is inserted into a sexually compatible plant species, the result is cisgenic. If the cloned gene is inserted into a sexually incompatible plant species, the result is transgenic, i.e., a foreign gene has been inserted. However, expression of a cloned gene in the background of transgenic plants is a major challenge (Kooter et al. 1999; Fagard and Vaucheret 2000; Li et al. 2009). The conventional forward genetic approach is also difficult and time-consuming because it requires a series of mutants. Reverse genetics emerged as a complementary approach to forward genetics to decipher the unknown function of a gene sequence. In this approach, an available nucleotide sequence of a gene with unknown function is used either to modify the function of the similar gene(s) in plants, which changes the phenotype of the transgenic plants, or to associate the gene sequence with a mutant phenotype. 
This approach can also determine the function of a gene family in addition to individual genes (Ahringer 2006). Consequently, mutant populations are not needed to determine the function of gene(s). The reverse genetic approach is more useful than the forward genetic approach for determining the function(s) of genes controlling agronomically important traits. Identifying such genes helps to manipulate the plant phenotype in desirable directions through marker-assisted breeding. During the last few years, significant research has used reverse genetic approaches for functional characterization of genes, and several review articles have been published, which mostly focused on (i) a specific approach of reverse genetics (Tierney and Lamour 2005; Balyan et al. 2008; Barley and Wang 2008; Gilchrist and Haughn 2010; Saurabh et al. 2014; Borrelli et al. 2018; Das et al. 2018), (ii) use of reverse genetics in a specific crop (Slade et al. 2005; Caldwell et al. 2004; Dalmais et al. 2008; Uauy et al. 2009), (iii) challenges of using reverse genetics in polyploid plant species (Fitzgerald et al. 2012), and (iv) reverse genetics for functional genomics (Bouchez and Höfte 1998). However, different reverse genetic approaches have also been used to understand the function of genes that control traits related to biotic and abiotic stress resistance/tolerance, metabolic/biochemical function, and agronomic performance, and these genes have been exploited in breeding programs to develop nutrient-rich and climate-resilient cereal and legume crops. This review article provides information on (i) reverse genetic approaches used in functional characterization of unknown genes, (ii) use of functionally characterized genes in breeding, and (iii) future prospects of this technology, alone or in combination with others, for developing nutrient-rich and climate-resilient cultivars of cereals and legumes to meet the projected food demands of a growing human population under a changing climate.

An overview of reverse genetic approaches

Reverse genetic approaches use mutant populations, generated through mutations in either specific or random genomic regions, to discover the function of a gene (Meng et al. 2017). An overview of different reverse genetic approaches is presented in Fig. 1. Approaches that target specific genomic regions include genome editing by homologous recombination [e.g., site-specific recombination using site-specific recombinases (SSRs)] and by site-directed mutagenesis [using zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), CRISPR-Cpf1, and CRISPR-Cas9] followed by the homologous recombination (HR) or non-homologous end joining (NHEJ) DNA repair systems, gene silencing [e.g., RNA interference (RNAi) and virus-induced gene silencing (VIGS)], and ectopic overexpression (Meng et al. 2017). These approaches make changes in the targeted gene under study, leading to a series of mutant lines with different phenotypes. Efforts have also been made to develop a site-specific recombination system, based on a heat shock of 38 °C, that is more useful for studying the expression of targeted gene(s) in wheat and barley (Harrington et al. 2020).

Fig. 1: Reverse genetics for confirming function of genes in crop plants.

Illustrative steps describing procedures that target random and specific genomic regions, with or without genetic transformation and screening systems for functional characterization of genes.

Chemical mutagenesis-based targeting induced local lesions in genomes (TILLING), fast neutron-based Deleteagene, and insertional mutagenesis using transposable elements/T-DNA are reverse genetic approaches that target random areas of the genome, leading to sequence variation and generating mutant populations, which are used to associate allelic variation in a target gene with a phenotype (Agarwal et al. 2013; Meng et al. 2017). Next-generation sequencing (NGS) has also facilitated screening for the presence of genome-wide induced mutations in targeted genes using TILLING populations (Table 1). This approach is known as TILLING-by-Sequencing+ (TbyS+), which has been used to identify mutations in genes controlling stress resistance in peanut (Guo et al. 2015) and the fatty acid biosynthesis pathway in soybean (Guo et al. 2015; Lakhssassi et al. 2021). Similarly, TbyS has been used to identify the role of the GIGANTEA (GI), RAMOSUS (RMS), and TERMINAL FLOWER1 (TFL1) genes, which control flowering and thereby alter plant architecture, in mungbean (Varadaraju et al. 2021). TILLING technology offers new prospects for cloning genes for disease resistance and abiotic stress tolerance in cereal crops under the current scenario of climate change (Bettgenhaeuser and Krattinger 2019). Insertional mutagenesis has been used in Medicago truncatula, a model legume species, in which a MADS-box gene was mutated through insertion of the Tnt1 retrotransposon, leading to identification of the mutant line mtpim. This mutant carried a mutated sequence affecting inflorescence architecture and flower development in M. truncatula (Benlloch et al. 2006). Tnt1, an LTR retrotransposon cloned from tobacco (Nicotiana tabacum) (Grandbastien et al. 1989), has been widely used to develop insertion populations and for gene tagging in many plant species. 
In soybean, stable and preferential insertion of Tnt1 into the protein-coding regions of 27 independent transgenic lines suggested that it can be used at large scale for insertional mutagenesis (Cui et al. 2013). In another study, insertion of the mmPing20F activation tag led to overexpression of nearby soybean genes. These activation tags produced more phenotypes, which became useful resources for discovering the function of genes (Johnson et al. 2021). In cereals, insertional mutagenesis through T-DNA, activator/dissociation (Ac/Ds) insertions, transposons, or retrotransposons has been used to generate mutant libraries that allow functional characterization of several genes (Ram et al. 2019; Kim et al. 2018a). A collection of T-DNA insertion mutant lines, designated the Rice Functional Genomic Express Database (RiceGE), has been developed in rice (http://signal.salk.edu/cgi-bin/RiceGE/). The RiceGE database has been used to elucidate the function of OsHKT1;4 (high-affinity K+ transporter 1;4), a gene responsible for salt tolerance (Oda et al. 2018). In maize, a mutant population has been generated through insertion of the Mutator (Mu) transposon in the background of inbred line B73 (Table 2). This mutant population, known as BonnMu, has been used for functional analysis of genes in maize (Marcon et al. 2020).
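The screening step behind TILLING-type approaches, in which each mutagenized line is compared against the reference sequence of a target gene, can be illustrated with a minimal Python sketch. It assumes that the sequences are already aligned and of equal length; the function names, line identifiers, and toy sequences are hypothetical and are not taken from any cited study. Actual TbyS workflows operate on aligned sequencing reads with dedicated variant callers, so this is only a conceptual outline.

```python
from typing import Dict, List, NamedTuple


class PointMutation(NamedTuple):
    position: int  # 1-based position in the reference sequence
    ref: str       # reference base
    alt: str       # base observed in the mutant line


def find_point_mutations(reference: str, sample: str) -> List[PointMutation]:
    """List single-base differences between two pre-aligned sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to equal length")
    mutations = []
    for i, (r, s) in enumerate(zip(reference.upper(), sample.upper())):
        # Ignore ambiguous bases (N) rather than calling them mutations.
        if r != s and r != "N" and s != "N":
            mutations.append(PointMutation(i + 1, r, s))
    return mutations


def screen_population(
    reference: str, population: Dict[str, str]
) -> Dict[str, List[PointMutation]]:
    """Return only those lines that carry at least one change in the target gene."""
    hits = {}
    for line_id, sequence in population.items():
        mutations = find_point_mutations(reference, sequence)
        if mutations:
            hits[line_id] = mutations
    return hits


# Hypothetical toy data: a 12-bp target fragment and three mutagenized lines.
REFERENCE = "ATGGCCGATTGA"
POPULATION = {
    "M1": "ATGGCCGATTGA",  # identical to the reference
    "M2": "ATGACCGATTGA",  # one base changed
    "M3": "ATGGCCAATTGA",  # one base changed
}

if __name__ == "__main__":
    for line_id, muts in screen_population(REFERENCE, POPULATION).items():
        print(line_id, [f"{m.ref}{m.position}{m.alt}" for m in muts])
```

Lines flagged in this way would then be phenotyped to link the allelic variant to a trait, which is the association step described in the text.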

Table 1 Published TILLING populations in various cereal and legume crops used for functional analysis of genes.
Table 2 Insertional mutagenesis populations used for functional analysis of genes in rice and maize.

These reverse genetic approaches follow three primary ways to identify the function of gene(s): (i) knocking out, altering, or silencing the target gene(s), leading to mutants with altered phenotypes (e.g., RNAi, VIGS, and homologous recombination), (ii) functional analysis of target candidate gene(s) through expression in a transformed species (e.g., ectopic expression/overexpression), and (iii) screening the target gene(s) in mutant populations developed by random disruption of genes through mutagens or T-DNA/TE (TILLING, insertional mutagenesis). The approaches can be further grouped by whether they require genetic transformation to study the function of gene(s). These reverse genetic approaches cannot all be used widely in crop plants, including cereals and legumes, mainly because of disadvantages associated with each approach, such as the unavailability of an efficient genetic transformation system (except in model plants like Arabidopsis) and of mutant populations, low efficiency, low throughput, risk of off-target effects, unstable or lethal/sterile phenotypes, and the complexity of large/polyploid genomes (Aklilu 2021).
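The grouping described above can be written down as a small lookup structure, which may help when comparing approaches. The following Python sketch encodes only the examples named in this section; the class, constant, and function names are hypothetical, and the list is illustrative rather than exhaustive.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Target(Enum):
    SPECIFIC = "specific genomic region"
    RANDOM = "random genomic region"


class Strategy(Enum):
    DISRUPT = "knock out, alter, or silence the target gene"
    EXPRESS = "express the candidate gene in a transformed species"
    SCREEN = "screen the target gene in a randomly mutagenized population"


@dataclass(frozen=True)
class Approach:
    name: str
    target: Target
    strategy: Strategy


# Entries follow the three ways (i)-(iii) listed in the text.
APPROACHES = (
    Approach("RNAi", Target.SPECIFIC, Strategy.DISRUPT),
    Approach("VIGS", Target.SPECIFIC, Strategy.DISRUPT),
    Approach("homologous recombination", Target.SPECIFIC, Strategy.DISRUPT),
    Approach("ectopic expression/overexpression", Target.SPECIFIC, Strategy.EXPRESS),
    Approach("TILLING", Target.RANDOM, Strategy.SCREEN),
    Approach("insertional mutagenesis", Target.RANDOM, Strategy.SCREEN),
)


def approaches_by_strategy(strategy: Strategy) -> List[str]:
    """Names of the approaches that follow a given strategy."""
    return [a.name for a in APPROACHES if a.strategy is strategy]


def approaches_by_target(target: Target) -> List[str]:
    """Names of the approaches that act on specific or on random regions."""
    return [a.name for a in APPROACHES if a.target is target]


if __name__ == "__main__":
    for strategy in Strategy:
        print(f"{strategy.value}: {', '.join(approaches_by_strategy(strategy))}")
```

A further field recording whether an approach requires genetic transformation could be added in the same way, but it is left out here because the text does not assign that property to each approach individually.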

Functional characterization of genes

During the past few years, efforts have been made to identify the function of genes using different approaches of reverse genetics. These genes showed their functional association with different nutritional, disease and insect-pest resistance, adaptive traits, and metabolic, biochemical, and physiologic traits responsible for agronomic performance and hence paved the way to breed nutrient-rich and climate-resilient cultivars in cereal and legume crops. These have been comprehensively discussed in the following sub-sections.

Identification of genes controlling disease resistance

Different reverse genetic methods have been used to identify genes that control disease resistance in cereals and legumes (Table 3). The genome editing approach of reverse genetics has been used widely to explain the function of genes controlling disease resistance in cereal crops. For example, in rice, editing of the eIF4G gene resulted in resistance to rice tungro spherical virus (RTSV) (Macovei et al. 2018). Editing of the TaMLO-A1 gene significantly increased resistance to powdery mildew in bread wheat (Wang et al. 2014a). In another study, editing of the OsSEC3A gene using CRISPR-Cas9 technology gave a dwarf stature and lesion-mimic phenotype. The mutant generated by editing this gene contained higher levels of salicylic acid (SA) and showed improved resistance to the fungal pathogen causing blast disease (Ma et al. 2018). In contrast, transgenic plants in which the ethylene responsive factor OsERF922 was mutated through CRISPR-Cas9 editing showed no changes in phenotypes related to agronomic traits. However, these mutated transgenic plants had a reduced number of blast lesions at the seedling and tillering stages (Wang et al. 2016a; Borrelli et al. 2018). Thus, the CRISPR-Cas9 system of genome editing has demonstrated a strong and beneficial impact on the development of improved cultivars with resistance to fungal diseases. In rice, a mutant population generated by editing the OsSWEET13 gene using CRISPR-Cas9 has been used to identify mutant plant(s) with resistance to bacterial blight disease (Zhou et al. 2015). This gene confers susceptibility and encodes a sucrose transporter involved in the plant-pathogen interaction. Expression of the OsSWEET13 gene in the host plant is controlled by the effector protein PthXo2 of the blight pathogen (i.e., Xanthomonas oryzae). 
However, an earlier study identified OsSWEET14 gene for susceptibility to bacterial blight disease; a mutation in this gene prevented the binding of effector protein with OsSWEET14 that made the rice plant resistant to blight disease (Li et al. 2012a). Recently editing of this gene through CRISPR-Cas9 showed its function as a sucrose-efflux transporter causing bacterial blight resistance in rice (Zeng et al. 2020). A null mutation in OsSWEET13 also expanded the understanding of PthXo2-based disease susceptibility in rice and null mutants were found to be resistant to bacterial blight disease (Zhou et al. 2015). Further genome editing strategies for multiplexed recessive resistance using a combination of the major effectors and other resistance (R) genes will be the next step to achieve bacterial blight resistance.

Table 3 Disease resistance genes identified using different reverse genetic approaches in cereal and legume crop species.

VIGS was used to confirm the requirement for Sgt1, Rar1, and Hsp90 genes in the Mla13-mediated resistance response to powdery mildew in barley (Hein et al. 2005). In rice, VIGS of Xa38 compromised the resistance towards bacterial blight disease (Kant et al. 2021). However, limited efforts have been made to determine the function of genes related to disease resistance in legumes by using reverse genetic approaches. Only a few studies used RNAi to elucidate the function of IFS (Isoflavone synthase) and CHR (Chalcone reductase) genes in soybean and identified their role in 5-deoxyisoflavonoids that suppress race-specific resistance and hypersensitive cell death in Phytophthora sojae infected tissues (Graham et al. 2007). More recently, editing of Rpp1L and Rps1 loci, which belonged to nucleotide-binding-site-leucine-rich-repeat (NBS-LRR) family, led to new disease resistance specificities against plant pathogens (Nagy et al. 2021).

Abiotic stress responsive genes for breeding

Knowledge of genes controlling abiotic stress tolerance is one of the essential components for breeding climate-resilient crops. Therefore, over the years, various reverse genetic approaches including ectopic expression, TILLING, Eco-TILLING, gene editing, and gene silencing using RNAi and VIGS have been used for this purpose in cereal and legume crops (Table 4). Most of these studies used ectopic expression analysis to explain the function of genes such as MtPHD6 in alfalfa, PvERF35i in common bean, CaGolS in chickpea, and CcCDR and CcCYP in pigeon pea (Quan et al. 2019; Kavas et al. 2020; Salvi et al. 2020; Tamirisa et al. 2014; Juturu et al. 2021). Among cereals, VIGS established the function of genes for drought tolerance (TaEra1, TaSal1, TaBTF3, TaPGR5, TdAtg8, and TaH2B-7D; Manmathan et al. 2013; Kang et al. 2013; Wang et al. 2014b; Kuzuoglu-Ozturk et al. 2012; Wang et al. 2019), cold tolerance (Hsp90, BBI, REP14, PAP6; Zhang et al. 2016, 2017), and drought and salinity response (Rong et al. 2014). Because WRKYs form one of the largest transcription factor families in plants and play a crucial role in plant development under drought stress conditions (Zhang et al. 2017), functional expression analysis was carried out for ZmWRKY106, a maize gene belonging to this family, and showed that it enhances tolerance to drought and heat stresses. In that study, overexpression of ZmWRKY106 conferred greater tolerance to drought and heat stresses in transgenic Arabidopsis plants (Wang et al. 2018b), suggesting active participation of this gene in multiple abiotic stress responses. In another study, the function of the ARGOS8 gene in drought tolerance was first identified through overexpression analysis in transgenic plants and later confirmed by CRISPR-Cas9 genome editing (Shi et al. 2017). 
In this study, gene editing either replaced the native promoter of ARGOS8 with the native maize GOS2 promoter (responsible for a moderate level of constitutive expression) or inserted the GOS2 promoter into the 5′-untranslated region of the gene, generating several variants with elevated levels of ARGOS8 transcripts. Field evaluation of these variants showed increased grain yield compared to the control under drought conditions at the flowering stage and no yield loss under well-watered conditions (Shi et al. 2017). In rice, the OsCTZFP8 gene encodes a C2H2 zinc finger protein that contains a typical zinc-finger motif, a potential nuclear localization signal (NLS), and a leucine-rich region (L-box). Agrobacterium-mediated overexpression of the OsCTZFP8 gene in transgenic rice led to significantly higher pollen fertility and seed setting, resulting in higher yield under cold conditions and thus demonstrating its role in cold tolerance (Jin et al. 2018). More recently, ectopic expression analysis showed the role of the AtGRXS17 gene in controlling drought tolerance in maize (Tamang et al. 2021). Eco-TILLING helped to identify the association of the BORON EXCESS TOLERANT1 gene with boron tolerance and of the OsCP17, OsCPK17, OsRMC, OsNHX1, and OsHKT1;5 genes with salt tolerance in barley and rice, respectively (Ochiai et al. 2011; Negrão et al. 2013). Using the same approach, Yu et al. (2012) reported genes encoding transcription factors associated with drought tolerance in rice. Loss-of-function mutants generated through CRISPR-Cas9 gene editing were used to elucidate the function of the SAPK2 (osmotic stress/ABA-activated protein kinase 2) gene in rice. Mutants with an edited SAPK2 gene were sensitive to drought stress and reactive oxygen species (ROS), indicating a role for this gene in the response to drought. The gene increased drought tolerance by (i) reducing water loss and (ii) inducing the expression of genes for antioxidant enzymes. It also contributed to tolerance of salt and PEG stresses. 
Thus, it has been suggested as a candidate gene for breeding climate-resilient cultivars of rice (Lou et al. 2017). In legumes, editing of 4-coumarate ligase (4CL) and Reveille 7 (RVE7) genes through CRISPR-Cas9 showed their involvement in controlling drought tolerance in chickpea (Badhan et al. 2021).

Table 4 List of genes for abiotic stress tolerance identified using different reverse genetic approaches in cereal and legume crop species.

Functional analysis of genes involved in biosynthetic pathways of nutritional traits

Reverse genetic approaches have also been used to explain the function of genes involved in the biosynthesis of metabolites and biochemical compounds. These biochemicals and metabolites are active during the growth and development of crop plants and contribute to the nutritional value and other agronomically important traits of both cereals and legumes. These traits are also known as biochemical and metabolic traits. Reverse genetics-based functional characterization of several genes showed their association with metabolites or biochemical compounds that are required to improve the nutritional value of cereals and legumes (Table 5). For example, by applying VIGS to silence the P23k gene, Oikawa et al. (2007) demonstrated its role in the biosynthesis of cell wall polysaccharides and the formation of secondary walls in barley leaves. In wheat, the function of the TaRSR1 gene has been associated with starch synthesis using VIGS (Liu et al. 2016). In maize, candidate gene association mapping was used to unravel the function of the Arogenate dehydratase gene. This gene is involved in the biosynthesis of phenylalanine, an essential aromatic amino acid associated with the nutritional value of maize (Wen et al. 2018). The function of two other genes, Opaque2 and acetolactate synthase 1, has also been determined using this approach (Deng et al. 2017; Liu et al. 2017). Opaque2 encodes a bZIP transcription factor that regulates the expression of endosperm storage protein genes during maize kernel development (Deng et al. 2017). Acetolactate synthase 1 is involved in the biosynthesis of branched-chain amino acids and catalyzes the first step of valine and leucine biosynthesis (Deng et al. 2017). In wheat, the Tryptophan decarboxylase gene has been associated with the synthesis of tryptamine from tryptophan using candidate gene association mapping (Peng et al. 2018). 
In barley, RNAseq and comparative analysis of wild type and nec3 mutants resulted in the identification of a candidate gene Nec3, which is responsible for biosynthesis of Tryptamine 5-Hydroxylase. This enzyme functions as a terminal serotonin biosynthetic enzyme in the tryptophan pathway of plants (Ameen et al. 2021).

Table 5 Functional analysis of genes responsible for biochemical compounds and metabolites related to traits of nutritional and agronomic importance using reverse genetic approaches in cereal and legume crop species.

In rice, fragrance is an important grain quality trait. Therefore, efforts have been made to identify genes controlling fragrance development in rice grains using reverse genetic approaches. It has been shown that 2-acetyl-1-pyrroline (2AP) is an important metabolite for developing fragrance in non-scented rice, and inhibition of the betaine aldehyde dehydrogenase 2 (OsBADH2) gene through RNAi led to synthesis of aroma in rice grains. Silencing of this gene increased production of 2AP by ~30–40% in seeds of a transgenic IR-64 line, which indicates that this gene regulates aroma development (Khandagale et al. 2020). The function of this gene had earlier been established through a genome editing approach (Shan et al. 2015).

Removing toxic compounds from grains is another aspect of improving the nutritional value of legumes and cereals. Among cereal crops, gene function has been associated with metabolites/biochemical compounds that are involved in metal production or uptake in rice. Suppressing the expression of the OsPCS1 (phytochelatin synthase) gene through RNAi restricted the accumulation of the toxic heavy metal cadmium (Li et al. 2007). In soybean, the role of myo-inositol-1-phosphate in seed development metabolism has been identified through RNAi-based silencing of the GmMIPS1 gene (Nunes et al. 2006). A high level of oleic acid in soybean seeds enhances their nutritional importance, and thus characterization of genes controlling the oleic acid level is important for breeding oleic acid-rich soybean cultivars. In a recent study, allelic variants of the fatty acid desaturase (GmFAD2) gene were identified using the TILLING approach. These variants increased the oleic acid content of soybean seed (Lakhssassi et al. 2021). Thus, available knowledge of genes controlling nutritional traits is beneficial for precise breeding of bio-fortified cereal and legume crops.

Functional analysis of genes controlling agro-morphological traits

Many agro-morphological traits enhance adaptive plasticity, which helps crop plants to survive and/or grow under changing environmental conditions. Breeding for these traits can increase the resilience of crops under changing conditions, leading to sustainable productivity. Knowledge of the function of genes controlling these traits can help to breed climate-resilient varieties (Kumar et al. 2019a). Over the years, efforts have been made to identify the function of genes associated with agro-morphological or adaptive traits such as flowering time, male sterility, wax formation, seed size, anther development, heterosis, internode length, plant growth, tiller number, and grain number by using different reverse genetic approaches (Table 6). Male sterility is required to breed hybrid varieties that can provide phenotypic plasticity under changing environments (Liu et al. 2021). In rice, the gene OsGEN-L, which belongs to the RAD2/XPG nuclease family, has been studied through RNAi and found to play a role in male sterility (Moritoh et al. 2005). Functional analysis of the MS45 gene showed that it controls male sterility in maize (Cigan et al. 2005).

Table 6 Functional analysis of genes controlling agro-morphological traits using reverse genetic approaches in cereals and legumes.

Breeding for flowering time helps new crop varieties adapt to different environments (Kumar et al. 2019a; Liu et al. 2021). Therefore, the function of several genes related to flowering time or floral development has been studied in cereal and legume crops, including OsMADS in rice (Jeon et al. 2000), PvE1L, MtE1L, and GmMS in soybean (Zhang et al. 2016b; Sha et al. 2015), SUPERMAN (SUP) in alfalfa (Rodas et al. 2021), and zm401, si, ZmHox1a/ZmHox1b, and ZmSOC1 in maize (Uberlacker et al. 1996; Ma et al. 2005; Luo et al. 2020; Han et al. 2021). The rice gene OsPHL3 encodes a G2-like family transcription factor that delayed flowering when overexpressed and resulted in early flowering when its function was lost through gene editing by CRISPR-Cas9 (Zeng et al. 2018). In another study, editing of the upstream open reading frames of the Hd2 gene (Hd2 uORFs) resulted in delayed flowering in rice. This editing also significantly reduced the expression of the Ehd1, Hd3a, and RFT1 genes, but no change was detected in the transcript level of Hd2 itself. Thus, editing the uORF region of a flowering repressor could be an efficient approach for breeding rice varieties with delayed heading (Liu et al. 2021). In soybean, editing of the E1 gene decreased its expression, which increased the expression of GmFT2a/5a and led to early flowering under long-day conditions. Thus, gene editing efforts laid the foundation for breeding photo-insensitive varieties of soybean suitable for high latitudes (Han et al. 2019). In addition, editing and overexpression analysis of another gene, GmAP1, resulted in early flowering and reduced plant height in soybean. This gene is part of regulatory networks, as changes in it altered the expression of several other genes related to flowering and gibberellic acid metabolism. 
Thus, this gene has been identified as invaluable for developing cultivars with improved yield in soybean (Chen et al. 2020).

In hexaploid wheat, heritable mutations have been generated by editing the TaGW2, TaLpx-1, and TaMLO genes. For instance, the knockout mutations in all three homoeologous copies of the TaGW2 gene resulted in a considerable increase in seed size and 1000-grain weight (Wang et al. 2018c). VIGS has been used to silence the P23k gene in wheat, which led to abnormal leaf development, asymmetric orientation of main veins, and cracked leaf edges caused by mechanical weakness (Bennypaul et al. 2012). A gene NUMBER OF GRAINS 1 (NOG1) has been identified in rice, which regulated grain number and yield. NOG1 encodes an enoyl-CoA hydratase/isomerase (ECH), a key enzyme involved in fatty acid β-oxidation pathway. Up-regulation of NOG1 significantly enhanced grain number and yield without negative effects on panicle number, grain weight, seed-setting rate, and heading date. Thus, this gene enhanced molecular understanding of grain yield regulation and identified a favorable gene for breeding high-yielding rice varieties (Huo et al. 2017).

Using RNAi technology, Fu et al. (2011) reported that down-regulation of the caffeic acid 3-O-methyltransferase (COMT) gene was linked with reduced lignin content, altered lignin composition, improved forage quality, and increased ethanol production in switchgrass (Panicum virgatum) without altering the overall plant phenotype. RNAi-directed knockdown of the Glabrous Rice 1 (GLR1) gene, which encodes a homeodomain protein containing the WOX motif, drastically reduced the trichome number on the leaves and glumes of transgenic rice plants (Li et al. 2012b). Recently, OsSPL6 was reported to control panicle cell death by repressing the transcriptional activation of the ER stress sensor IRE1 (Wang et al. 2018a).

Nitrogen-fixing symbiosis plays an important role in the adaptation of legumes because poor nodulation caused by different stresses leads to poor yields in legume crops (Kumar et al. 2019a). In the past, reverse genetic approaches including RNAi have been used to determine the function of several genes responsible for nitrogen fixation, nodule formation, and nitrogen-fixing symbiosomes. In M. truncatula, the MtsuS1 gene for nitrogen fixation (Baier et al. 2007), PIN genes for nodulation (Huo et al. 2006), and the DMI2 gene for formation of N2-fixing symbiosomes (Limpens et al. 2005) have been functionally characterized and validated using reverse genetic approaches. In another study, insertion of the retrotransposon Tnt1 in the MtMATE67 gene resulted in a loss of functional activity, leading to an accumulation of iron (Fe) in the apoplasm of nodule cells, which caused a significant decline in symbiotic nitrogen fixation and plant growth. Thus, this gene plays a primary role in citrate efflux from nodule cells in response to an Fe signal and supports symbiotic nitrogen fixation (Kryvoruchko et al. 2018). The dehydrin gene MtCAS31 (cold-acclimation-specific 31) has also been functionally characterized using the same approach, which identified its role in symbiotic nitrogen fixation (SNF) under drought conditions in M. truncatula. This gene is expressed in nodules, and its product interacts with the leghemoglobin MtLb120-1, protecting it from damage due to drought stress. Disruption of the targeted gene by insertion of the retrotransposon Tnt1 in a mutant line reduced nitrogenase activity and the ATP/ADP ratio, and increased the activity of nodule senescence genes and the accumulation of amyloplasts under moisture-limited conditions. As a result, a new function for dehydrins in SNF under drought stress conditions was established (Li et al. 2018b). Knockdown of the ethylene biosynthesis gene ACS10 conferred nodulation ability under limited nitrate conditions in M. truncatula (van Zeijl et al. 2018). 
Rhizobia, which are nitrogen-fixing bacteria, require an oxygen-depleted atmosphere and consequently live inside a host plant, which dramatically alters its root development to accommodate the bacteria.

Virus-induced gene silencing (VIGS) has been used to dissect the molecular pathways that lead to nodule formation and the mechanisms of substrate exchange between host and rhizobia. In soybean, virus- or artificial microRNA-mediated silencing of the GmWPR1, GmExo70J7, GmExo70J8, and GmExo70J9 genes resulted in accelerated leaf senescence and reduced nodule formation. Moreover, legume-specific WRKY-like and Exo70-like proteins were found to be essential for the development of sufficient numbers of root nodules in soybean (Wang et al. 2016b). The function of a few genes associated with elevated shoot lipid content in M. truncatula has recently been confirmed using the VIGS approach. As a result, roles for the SDP1 (SUGAR-DEPENDENT 1), APS1 (ADP-GLUCOSE-PYROPHOSPHORYLASE SMALL SUBUNIT 1), and PXA1 (PEROXISOMAL ABC TRANSPORTER 1) genes in controlling shoot lipid content have been identified (Wijekoon et al. 2020).

In addition, genes have been functionally characterized for metabolites/biochemical compounds involved in herbicide tolerance and for other traits such as somatic embryogenesis, flavonol biosynthesis, root elongation and architecture, phosphorus availability, and morphology (Table 5). For example, CRISPR-Cas9-based functional analysis showed that the ALS gene, which encodes the acetolactate synthase enzyme, governs herbicide resistance in soybean (Li et al. 2015), rice (Endo et al. 2016; Sun et al. 2016; Butt et al. 2017), and maize (Svitashev et al. 2015).

Exploitation of reverse genetic-based functionally characterized genes in breeding

Different reverse genetic approaches ultimately make available genes, and their genomic sequences, with a known function for economically important traits. Thus, agriculturally useful genes can be mined effectively for genetic enhancement, and breeders can use these genes to prepare a blueprint of a variety that fulfills the diverse needs of crop production, such as high yield, nutrient richness, multiple stress resistance, and high nutrient-use efficiency (Jiang et al. 2012). So far, a few genes have been validated for their function associated with traits of economic importance in cereal and legume crops. There is now a need to exploit these genes to breed nutrient-rich and climate-resilient cultivars and maximize genetic gain in cereal and legume crops following different breeding strategies (see Fig. 2).

Fig. 2: Breeding strategies for the development of climate-resilient and nutrient-rich cultivars using functionally characterized genes.

Flow chart illustrating the steps for deploying targeted functional gene(s) through marker-assisted breeding and cisgenic/transgenic breeding methods. These strategies allow evaluation and selection of the desired breeding lines for the target traits.

Development of functional markers for accelerating genetic gain through marker-assisted breeding

Availability of nucleotide sequences of functionally characterized genes provides an opportunity to develop gene-specific markers, or functional markers. These markers can be used to mine novel alleles from landraces or wild species, which can subsequently be introgressed to improve current germplasm (Xiao et al. 1998). This can help to widen the cultivated gene pool and accumulate desirable alleles in one background in order to maximize genetic gain (Francki and Appels 2002). Allelic variation at a target locus is responsible for generating phenotypic variation. Such variation arises from changes at single or multiple nucleotide site(s), insertions/deletions (InDels), or copy number variation (CNV) in the gene sequence. It can be detected using the following two types of functional markers.

PCR-based functional markers

PCR-based functional markers have been developed for many genes controlling agronomically important traits in cereal and legume crops (Kumar et al. 2011). In soybean, gene-specific markers have been developed for glycinin genes (i.e., Gy1, Gy2, Gy3, and Gy5) and used to identify allelic variation using Eco-TILLING. When tested for their selection efficiency among breeding lines having different subunits of glycinin seed storage protein, these markers showed their utility for nutritional quality improvement in soybean (Jegadeesan et al. 2012). Similarly, a functional cleaved amplified polymorphic sequence (CAPS) marker was developed for the TaSdr-B1 gene controlling seed dormancy, which was identified through comparative genomics in wheat. This marker was subsequently used for functional characterization of this candidate gene using association and linkage mapping. As a result, a useful functional marker has been identified for developing pre-harvest sprouting (PHS) tolerant cultivars through marker-assisted selection in wheat (Zhang et al. 2014). In another study, two major semi-dwarfing genes, Rht-B1b (Rht1) and Rht-D1b (Rht2), identified through comparative genomics in wheat have been used to develop PCR-based functional markers. Further validation of these markers showed perfect association with dwarfing phenotypes (Ellis et al. 2002). Also, a PCR-based functional STS marker has been developed for the phytoene synthase (PsyA1) gene controlling yellow pigment (YP) content of wheat grain. Different fragment sizes of this co-dominant marker showed close association with high and low YP content in wheat cultivars and advanced lines. Hence, this marker has been found useful for wheat breeding programs targeting improvement in YP content (He et al. 2008). In rice, a Kompetitive allele-specific PCR (KASP) marker developed for the SSIIIa gene associated with low amylose content has been used to select favorable lines through marker-assisted backcross breeding. 
Breeding lines with amylose content ranging from 12.4 to 16.8% have been identified and found useful for breeding high-quality rice for cooking and eating (Kim et al. 2021). The function of the NIGHT LIGHT-INDUCIBLE AND CLOCK-REGULATED 2 (LNK2) gene in shortening flowering time has been established through genome editing in soybean. For this gene, functional markers have been developed to identify novel components of flowering-time control. These markers may benefit the development of soybean cultivars for high-latitude environments through marker-assisted selection (Li et al. 2020b).
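
The principle behind a CAPS marker, as mentioned above for TaSdr-B1, can be illustrated with a minimal Python sketch. It assumes a SNP that abolishes a restriction site inside a PCR amplicon, so that the two alleles give different fragment sizes after digestion. The two amplicon sequences and the function name `digest` are hypothetical and chosen only for illustration; they are not taken from any of the cited studies. The only external fact used is that EcoRI recognizes GAATTC and cuts after the first G.

```python
def digest(amplicon: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting `amplicon` at each `site`.

    `cut_offset` is the position of the cut within the recognition site
    (1 for EcoRI, which cuts G^AATTC). Only the top strand is scanned,
    which is sufficient for a palindromic site such as GAATTC.
    """
    amplicon = amplicon.upper()
    cuts = []
    start = 0
    while True:
        idx = amplicon.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 30-bp amplicons differing by one SNP (A/G) inside the site.
allele_a = "ATGCCGTACGGAATTCTTGACCGTAGGCTA"  # carries GAATTC
allele_b = "ATGCCGTACGGAGTTCTTGACCGTAGGCTA"  # SNP destroys the site

fragments_a = digest(allele_a, "GAATTC", 1)
fragments_b = digest(allele_b, "GAATTC", 1)
print(fragments_a, fragments_b)
```

In this toy case the first allele is cut into two fragments while the second stays intact, which is the kind of band-size difference a co-dominant CAPS marker scores on a gel.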

Sequencing-based functional SNPs for haplotype-based breeding

Next-generation sequencing also makes it possible to re-sequence the whole genomes of many genotypes and thereby survey targeted candidate genes controlling traits of agronomic importance. This paves the way to identify haplo-SNPs for candidate genes strongly associated with targeted traits, leading to the identification of superior haplotypes for deployment in haplotype-based breeding to develop next-generation tailor-made crop varieties. In soybean, resequencing of the GmMYB29 gene in a subset of 30 soybean accessions led to the identification of 12 SNPs and 11 indels (insertions and deletions) associated with isoflavone content. This association study identified 11 probable causative sites responsible for variation in total isoflavone content (TIC), and two sites together accounted for 49.99% of the phenotypic variation. Another site, with a single-nucleotide transversion that led to a substitution of lysine by asparagine, contributed 14.91% of the variation in TIC (Chu et al. 2017). In rice, the OsSNB gene has been identified as controlling grain length, width, and weight using reverse genetic approaches (overexpression and CRISPR-Cas9 analysis). Resequencing of this gene in 168 rice accessions led to the identification of eight haplotypes. One of them, Hap 3, associated with wider grains, had a 225 bp insertion in the promoter, which was used as a functional marker (OsSNB_Indel2) in marker-assisted selection for improved grain width (Ma et al. 2019). In another study, resequencing of 150 accessions, which were evaluated for resistant starch (RS) and predicted glycemic index (PGI), identified favorable SNPs for eight traits. In this study, superior haplotypes for the target traits have been identified among 11 selected candidate genes. The candidate genes Os06g11100 (H4-3.28% for high RS) and Os08g12590 (H13-62.52 as intermediate PGI) had superior haplotypes for RS and PGI. 
Thus, this study provided an opportunity to identify donors having superior haplotype combinations. These donors can be used for tailoring high-quality, healthier rice varieties based on consumer preference and market demand using haplotype-based breeding (Selvaraj et al. 2021). Similarly, in pigeon pea, superior haplotypes for 10 drought-responsive candidate genes have been identified through whole-genome re-sequencing of 292 genotypes. This led to the identification of the most promising haplotypes for three genes regulating five components of drought tolerance (Sinha et al. 2020).
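
The basic logic of picking a superior haplotype from resequencing data can be sketched in a few lines of Python. This is a simplified illustration, not the statistical procedure used in the studies cited above: accessions are grouped by the combination of alleles they carry at the SNP sites of one candidate gene, and the mean trait value of each group is compared. The accession names, allele strings, trait values, and the functions `haplotype_summary` and `superior_haplotype` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def haplotype_summary(genotypes, phenotypes):
    """Map each haplotype to (number of accessions, mean trait value)."""
    groups = defaultdict(list)
    for accession, haplotype in genotypes.items():
        groups[haplotype].append(phenotypes[accession])
    return {hap: (len(vals), mean(vals)) for hap, vals in groups.items()}


def superior_haplotype(summary, min_count=2):
    """Return the haplotype with the highest mean trait value.

    Haplotypes carried by fewer than `min_count` accessions are skipped,
    because a mean based on a single accession is not reliable.
    """
    eligible = {h: m for h, (n, m) in summary.items() if n >= min_count}
    return max(eligible, key=eligible.get)


# Hypothetical data: alleles at three SNP sites of one candidate gene,
# and a trait value (e.g., grain width in mm) for each accession.
genotypes = {"acc1": "AGT", "acc2": "AGT", "acc3": "ACT",
             "acc4": "GCC", "acc5": "GCC"}
phenotypes = {"acc1": 2.0, "acc2": 3.0, "acc3": 4.0,
              "acc4": 1.0, "acc5": 2.0}

summary = haplotype_summary(genotypes, phenotypes)
best = superior_haplotype(summary)
print(summary, best)
```

In practice, haplotype groups would also be compared with a formal statistical test and across environments before an accession is chosen as a donor; the frequency filter here is only a crude stand-in for that step.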

Cis-/transgenic breeding using reverse genetic-based functionally characterized genes

Knowledge of reverse genetics may be used to generate improved cis/transgenic plants for commercialization in two ways: (i) use of functionally characterized gene(s) for over-expression/silencing in transgenic plants, and (ii) editing of target gene(s) in cisgenic plants (Banerjee et al. 2017).

Overexpression/silencing of the introduced gene in transgenic plants

The function of the phosphoenolpyruvate carboxylase gene (PEPC) has been studied through the expression analysis approach of reverse genetics (Izui et al. 1986). A full-length cDNA of the PEPC gene was isolated from maize (a C4 plant) and introduced into wheat (a C3 plant) to improve photosynthetic efficiency. The resulting transgenic plants showed much higher (140%) phosphoenolpyruvate carboxylase activity than non-transformed plants, leading to an increase in seed weight per spike and thousand-grain weight (Hu et al. 2012). Such transgenes can be utilized directly or incorporated into a breeding program for genetic improvement. In rice, expression analysis of the Xa10-Ni and Xa23-Ni genes under the Xa10 promoter showed resistance to X. oryzae pv. oryzae strains in transgenic rice plants. These genes encode functional executor R proteins, which induce cell death and were found useful for engineering broad-spectrum disease resistance to plant pathogenic Xanthomonas spp. (Wang et al. 2017c).

Cisgenic populations carrying edited genes

Genome editing has rapidly emerged as a powerful approach for inferring the molecular function of genes. It has therefore been identified as a most promising “New Plant Breeding Technology” (NPBT) that enables a fast transition of improved cultivars from the lab to the market (Menz et al. 2020). Since genome editing makes changes in the genome or gene(s) identical to those derived from conventional breeding or natural/induced mutations (Grohmann et al. 2019), improved cisgenic plants developed through genome editing are considered non-GMO crops in several countries. Thus, genome editing products that are cisgenic are not regulated like GMO crops in those countries (Menz et al. 2020). This reverse genetic approach has become useful for developing climate-resilient and nutritionally rich crop plants in a short period of time. Several genes controlling grain yield, grain quality, and biotic and abiotic stress tolerance traits have been used successfully to improve lines in cereal and legume crops (Mishra et al. 2018). For example, CRISPR-Cas9 was used to edit the fragrance gene Badh2 in the rice line Zhonghua 11. This resulted in a mutated line carrying an additional T base in the first exon of Badh2, which increased the amount of 2-acetyl-1-pyrroline (2AP) and enhanced fragrance in rice (Shao et al. 2017). Editing of soluble pyrabactin resistance PYR1-like (PYL) genes using CRISPR-Cas9 technology led to increased growth and productivity in rice (Miao et al. 2018). Similarly, the editing of two rice starch branching enzyme (SBE) genes, SBEI and SBEIIb, led to the development of rice with high amylose content (Sun et al. 2017). In this study, SBEIIb mutants showed an increase of as much as 25 and 9.8% in amylose content (AC) and resistant starch (RS) content, respectively; hence, editing of SBEIIb could be crucial in the development of rice varieties with high amylose and RS contents. 
A metal transporter gene, OsNramp5, has been edited using the CRISPR-Cas9 system, resulting in Indica rice lines with low Cd accumulation (Tang et al. 2017). In field trials, Indica rice lines with the edited OsNramp5 gene consistently had grain Cd concentrations <0.05 mg/kg, compared with 0.33 to 2.90 mg/kg in grains of wild-type Indica rice, without any effect on grain yield. This reverse genetic system was also used to knock out the ERF transcription factor gene OsERF922, leading to enhanced resistance to rice blast (Wang et al. 2016a). Similarly, editing of the eIF4G gene in the IR64 variety, which is susceptible to rice tungro spherical virus (RTSV), produced a new source of resistance to rice tungro disease (RTD) that can be used as valuable material for developing more diverse RTSV-resistant varieties (Macovei et al. 2018). In barley, the HvCKX1 gene controls the endogenous cytokinin status. Homozygous transgenic plants with a silenced HvCKX1 gene and azygous knock-out Hvckx1 cisgenic mutants developed through gene editing have been studied for their expression and other phenotypic attributes. Although the trans-/cisgenic lines showed reduced root growth, they produced more tillers and grains than azygous wild-type controls. Yield of the trans-/cisgenic plants increased by up to 15%; however, total grain biomass decreased to 80% of that of the wild type. This study confirmed the key role of the HvCKX1 gene in regulating cytokinin content in barley (Holubová et al. 2018). In maize, novel variants of the ARGOS8 gene for ethylene sensitivity generated through genome editing improved grain yield under drought stress conditions. In this study, the maize GOS2 promoter was either inserted into the 5′-untranslated region of the native ARGOS8 gene or used to replace the native ARGOS8 promoter. 
This resulted in higher grain yield under limited water conditions in the field, with no yield loss under well-watered conditions. This study provided evidence that novel allelic variation can be generated for breeding drought tolerance in crop plants (Shi et al. 2017). In soybean, CRISPR-Cas9-based genome editing generated mutants of the E1 gene that flowered early under long-day conditions. This could be due to the production of a truncated E1 protein, which relieved the inhibition of the GmFT2a/5a genes and thereby increased their expression. This laid a foundation for breeding soybean cultivars suitable for cultivation at high latitudes (Han et al. 2019). In another study, the maize SHRUNKEN2 (SH2) and WAXY (WX) genes were edited through CRISPR-Cas9 with a dual gRNA construct, and single or double mutations were identified for producing super-sweet (sh2) corn, waxy (wx) corn, or super-sweet and waxy corn (SWC) (Dong et al. 2019). In this study, transgene-free (cisgenic) lines homozygous for both the sh2 and wx alleles (sh2sh2wxwx) were identified and named sw lines. These lines have been used for specialty corn breeding. Crosses between sw lines and wx lines resulted in the development of super-sweet-waxy compound F1 plants. Estimation of soluble sugar contents in kernels of fresh ears, stalks, and leaves of these specialty super-sweet sw lines showed higher sugar contents, with an average of 7.38–10.28% in fresh kernels. Some of them had even higher amylopectin content without any effect on other agronomic traits (Dong et al. 2019). In rice, two elite sticky varieties have been developed by editing the waxy gene (Wx). This gene encodes the granule-bound starch synthase (GBSS) enzyme and plays an important role in amylose synthesis. These glutinous (sticky) rice varieties have very low amylose content (2.6–3.2%) (Yunyan et al. 2019). 
Thus, cisgenic plants generated through gene editing can be used directly as commercial cultivars or can be utilized further in a breeding program by introgressing such desirable alleles through marker-assisted breeding.

Prospects of functionally characterized gene(s)

During the last few years, functions of many genes have been identified using different reverse genetic approaches in cereals and legumes (see Tables 2–6 for reference). This has revolutionized the field of functional genomics. Improvements in reverse genetic tools have facilitated the search for functions of many genes of agronomic importance in crop plants. However, only a limited number of genes have been exploited in genetic improvement, for several reasons. Firstly, limited efforts have been made to develop breeder-friendly functional markers that facilitate the selection of desirable plant types in breeding populations. In recent years, NGS has provided an opportunity for SNP marker-based selection of desirable plants through re-sequencing of large populations. However, the high cost of resequencing limits its use by plant breeders for selecting superior lines in early segregating generations, as they need to screen large numbers of lines every year in their breeding programs. Secondly, advancements in RNAi have made available miRNA, siRNA, tasiRNA, natsiRNA, and hpRNA based approaches. These are being used to develop virus-free cultivars and to manipulate metabolic pathways for improving agronomic traits. However, their use is limited by the unavailability of genetic transformation systems in some crop plant species and by environmental and regulatory issues associated with GMO crops. Thirdly, TILLING has convincingly proved its potential in crop improvement in species where genetic transformation is not possible, but TILLING platforms and other databases are not available for every crop. Finally, CRISPR-Cas9-based genome editing technology is emerging as a potential tool to make desirable genetic corrections in targeted gene(s), leading to the generation of desirable alleles for improved types in crop plants. This trans-generational gene editing activity can serve as a source of novel variation in the progeny. 
Plant breeders can cross the plants expressing the gene editing constructs with their lines of interest (Wang et al. 2018c). This approach has been used successfully to develop nutritionally rich super-sweet and waxy cultivars in maize and elite sticky varieties in rice by editing the SH2/WX and waxy genes, respectively (Dong et al. 2019; Yunyan et al. 2019). Among legumes, editing of the E1 gene controlling soybean flowering provided novel mutants that flower early under long-day conditions. Using these novel mutants in crossing schemes enabled the development of cultivars suitable for high latitudes (Han et al. 2019).

In 2018, a soybean line with modified oil composition, developed using TALEN-based genome editing, was harvested on a small scale. The cultivation area increased to approximately 17,000 hectares in 2019. The company Calyxt, which developed the soybean, is marketing it as an identity-preserved product by contracting with farmers and purchasers. Calyxt developed the new soybean cultivars, distributed the seeds to contract farmers, and commercialized the derived product (High Oleic Soybean Oil) as a high-quality food ingredient in 2019. High-oleic soybean oil contained about 80% oleic acid and up to 20% less saturated fat, and had three times higher fry-life and extended shelf-life compared to commodity oils (Calyxt Inc 2020). Using this gene editing approach, Calyxt is expected to launch high-oleic and low-linolenic (HOLL) soybeans by 2023 (Calyxt Inc 2020). It is also targeting commercialization of gene-edited products such as high-fiber wheat, cold-tolerant oat, and pulses with improved protein profiles and flavor in the coming years (Calyxt Inc 2020). Another USA-based company, Yield10 Bioscience Inc., has received non-regulated status for CRISPR-Cas9-based genome-edited products and is planning to commercialize gene-edited products for soybean, corn, sorghum, rice, and wheat crops in the coming years (Yield10 Bioscience Inc. 2020). Further genome-edited plant products are in the pipelines of small and medium-sized biotech companies as well as other international plant breeders. More products will therefore presumably follow soon without severe regulatory hurdles at the United States Food and Drug Administration (USFDA), since soybean and canola have successfully undergone this procedure without any major issues (USFDA 2020). Other companies are using CRISPR-Cas9, CRISPR-Cpf1, CRISPR-Cms1, and other gene editing techniques for genetic improvement in crop plants (see Table 7). 
Further, in the future, a haploid induction (HI) editing technology, HI-Edit, may emerge as a direct editing tool for targeted genes in cereals, especially in wheat and maize (Kelliher et al. 2019). More efforts will be required to develop a genotype-independent strategy for genome editing, as has been developed in other crops using an RNA viral vector-mediated genome-editing methodology (Ellison et al. 2020). However, strong collaboration between plant breeders and molecular biologists is required in the public sector for routine exploitation of such advanced technologies in breeding programs to develop nutrient-rich and climate-resilient crops. Although different reverse genetic tools have their own potentials and drawbacks, their integrated use may become a boon for crop improvement. In the future, advances in high-throughput multi-omics technologies will provide opportunities to identify functionally characterized genes and their interactions underlying phenotypic variation (i.e., genetic mysteries). Thus, Interactome Big Data for functional genes, along with machine learning, will help to understand the networks of functional genes underlying economically important traits (Wu et al. 2021). This functional knowledge of genes, along with germplasm and genomic data, will facilitate genomic-based breeding for developing climate-resilient and nutritionally rich cultivars to mitigate the current threats of climate change. Targeted traits such as yield, quality, resistance to biotic and abiotic stresses, and NUE, lists of the genes and germplasm for these targeted traits, genomic or specific gene selection technologies, and breeding programs for implementation will all be part of this genomic-based designed breeding (Li et al. 2018d). Overall, the focus should be on using reverse genetics along with forward genetics approaches to make desirable genetic improvements in crop plants.

Table 7 Names of private-sector companies that are commercializing genome-edited products of crop plants. Source: Taylor (2019).