Leopard genome sequencing and assembly
We built the reference leopard genome from a muscle sample obtained from a female Amur leopard from the Daejeon O-World of Korea (Additional file
1: Supplemental Methods for details of species identification using mitochondrial DNA (mtDNA) gene analysis; Additional file
2: Figure S1). The extracted DNA was sequenced to 310× average depth of coverage using Illumina HiSeq platforms (Additional file
3: Tables S1 and S2). Sequenced reads were filtered and then error-corrected using a
K-mer analysis. The size of the leopard genome was estimated to be ~2.45 Gb (Additional file
1: Supplemental Methods for details; Additional file
2: Figure S2; Additional file
3: Table S3). The error-corrected reads were assembled using SOAPdenovo2 software [
21] into 265,373 contigs (N50 length of 21.0 kb) and 50,400 scaffolds (N50 length of 21.7 Mb), totaling 2.58 Gb in length (Additional file
1: Supplemental Methods for details; Additional file
3: Table S4). Additionally, 393,866 Illumina TruSeq synthetic long reads [
22] (TSLRs, 2.0 Gb of total bases; ~0.8×) were obtained from two wild Amur leopard individuals (Additional file
3: Tables S5 and S6) and were used to correct erroneous gap regions. The GC content and distribution of the leopard genome were very similar to those of the tiger and domestic cat genomes (Additional file
2: Figure S3), indicating little sequencing and assembly bias. We successfully predicted 19,043 protein-coding genes for the leopard genome by combining
de novo and homologous gene prediction methods (Additional file
3: Table S7; see “
Methods”). In total, 39.04 % of the leopard genome were annotated as transposable elements (Additional file
1: Supplemental Methods for details; Additional file
3: Table S8), which is very similar in composition to the other felid species [
16,
18,
19]. Assembly quality was assessed by aligning the short sequence reads onto the scaffolds (99.7 % mapping rate) and compared with other Felidae species assemblies (cat, tiger, cheetah, and lion) using common assembly metrics (Additional file
3: Tables S9 and S10). The genome assembly and annotation completeness were assessed by the commonly used single-copy ortholog mapping approach [
23] (Additional file
3: Table S11). The leopard genome showed the longest continuity and highest accuracy among the big cat (
Panthera species and cheetah) genome assemblies. Two additional wild Amur leopards from the Russian Far East and a wild Amur leopard cat from Korea were whole genome re-sequenced (Additional file
3: Tables S5 and S12), and were used together with previously reported whole genome data of other felid species [
16] for comparative evolutionary analyses.
Evolutionary analysis of carnivores compared to omnivores and herbivores
To investigate the genomic adaptations to different diets and their associated lifestyles, we performed an extensive orthologous gene comparison among eight carnivorous (leopard, cat, tiger, cheetah, lion, polar bear, killer whale, and Tasmanian devil), five omnivorous (human, mouse, dog, pig, and opossum), and five herbivorous mammalian genomes (giant panda, cow, horse, rabbit, and elephant; Additional file
1: Supplemental Methods for details of species selection criteria; Additional file
3: Table S13). These comparisons revealed numerous genetic signatures consistent with molecular adaptations to a hypercarnivorous lifestyle.
Of the 15,589 orthologous gene families found in the leopard assembly, 11,748 were also found in the other four Felidae genomes and 8648 in the complete set of 18 mammalian genomes across all three dietary groups (Fig.
1a and Additional file
2: Figure S4). The leopard genome displayed 188 expanded and 313 contracted gene families compared with the common ancestor of leopard and lion (Fig.
1b and Additional file
2: Figure S5). The common ancestor of Felidae species showed 52 expanded and 567 contracted gene families compared to the common ancestor of carnivorans. In particular, Felidae expanded gene families were enriched in muscle myosin complex (GO:0005859, nine genes,
P = 1.14 × 10
–13 by EASE scores [modified Fisher’s exact test] with a 10 % false discovery rate [FDR]) and actin cytoskeleton (GO:0015629, 14 genes,
P = 4.71 × 10
–9) functions that are associated with muscle contraction and motor activity (Additional file
3: Tables S14 and S15). Conversely, Felidae clearly showed contracted gene families in starch and sucrose metabolism pathway (
P = 5.62 × 10
–7; Additional file
3: Tables S16 and S17). Notably, the common ancestor of the Carnivora order (compared to the common ancestor of carnivorans and horse) and killer whale (compared to the common ancestor of killer whale and cow) also had contracted gene families associated with starch and sucrose metabolism (
P = 0.0000032 and
P = 0.00048, respectively; Additional file
3: Tables S18–S25), whereas Tasmanian devil (a well-known scavenger as well as a meat-eating carnivore [
24]) did not (compared to the common ancestor of Tasmanian devil and opossum; Additional file
3: Tables S26–S29). UDP-glucuronosyltransferase (UGT) 1 and 2 families playing an important role in detoxification and homeostatic functions were markedly contracted in the carnivores (Fig.
2a and Additional file
3: Table S30). This is in contrast to herbivores that must have acquired detoxification pathways to protect themselves against plant-derived toxicants. It is very likely that the low dietary content of these plant-derived toxicants in carnivores is a major factor in the UGT 1 and 2 contractions in carnivores [
25,
26]. However, the UGT3 family, which is involved in the conjugation with
N-acetylglucosamine and glucose [
27], was expanded only in the Felidae genomes.
UGT8A1 that is involved in conjugation of ceramides and bile acids with galactose [
28] was conserved (in terms of gene copy number) in all 18 mammals. Additionally and expectedly, amylase gene families (AMY1 and AMY2), which catalyze dietary starch and glycogen, were contracted in the carnivores (Additional file
2: Figure S6; Additional file
3: Table S30), providing a genetic mechanism for the very low levels of salivary amylase observed in cats [
29].
It is known that cats lack the ability to synthesize sufficient amounts of vitamin A and arachidonic acid, making them essential [
30]. Interestingly, cytochrome P450 (CYP) family genes, which are involved in retinol/linoleic acid/arachidonic acid catabolism, were commonly contracted in all the carnivorous diet-groups (Felidae, Carnivora order, killer whale, and Tasmanian devil; Additional file
3: Tables S18–S29). Retinoic acid converted from retinol is essential for teeth remineralization and bone growth [
31,
32] and arachidonic acid promotes the repair and growth of skeletal muscle tissue after physical exercise [
33]. We speculate that the contraction of CYP family genes may help carnivores to keep sufficient levels of retinol and arachidonic acid concentration on their body and, therefore, they could have evolved to possess strong muscle, bone, and teeth for successful hunting.
Although carnivores derive their energy and nutrient requirements primarily from animal tissues, they also require regulatory mechanisms to ensure an adequate supply of glucose to tissues, such as the brain [
34]. The glucokinase (GCK) enzyme is responsible for regulating the uptake and storage of dietary glucose by acting as a glucose sensor [
35]. The mutations in gene for glucokinase regulatory protein (
GCKR) have effects on glucose and lipid homeostasis; and GCK and glucokinase regulatory protein (GKRP, encoded by
GCKR gene) have been suggested as a target for diabetes treatment in humans [
35]. It was predicted that
GCKR is pseudogenized by frame-shift mutations in multiple mammalian genomes including cat [
36]. We confirmed that
GCKR is also pseudogenized by frame-shift mutations in all other felids (leopard, tiger, lion, cheetah, snow leopard, and leopard cat; Additional file
2: Figure S7). Interestingly,
GCKR genes of killer whale and domestic ferret (another obligate carnivore not used in this study) [
37] were also pseudogenized by pre-matured and/or frame-shift mutations, whereas polar bear and Tasmanian devil have an intact
GCKR (Additional file
3: Table S31). It has been suggested that carnivores may not need to remove excess glucose from the circulation, as they consume food containing large amounts of protein and little carbohydrate [
36]. Among the non-carnivorous animals,
GCKR genes of cow and opossum were predicted to be pseudogenized. In the case of cow, it was speculated that ruminant animals use volatile fatty acids generated by fermentation in their foregut as main energy source and they may not need to remove excess glucose actively [
36]. Therefore, the evolutionary loss of
GCKR and the accompanying adaptation of the glucose-sensing pathway to carnivory will help us to better understand the abnormal glucose metabolism that characterizes the diabetic state [
34].
To detect genes evolving under selection for a diet specialized in meat, we performed tests for deviations in the
d N /
d S ratio (non-synonymous substitutions per non-synonymous site to synonymous substitutions per synonymous site, branch model) and likelihood ratio tests (branch-site model) [
38,
39]. A total of 586 genes were identified as positively selected genes (PSGs) in the leopard genome (Additional file
4: Datasheet S1). The leopard PSGs were functionally enriched in GTP binding (GO:0005525, 24 genes,
P = 0.00013), regulation of cell proliferation (GO:0042127, 39 genes,
P = 0.00057), and macromolecule catabolic process (GO:0009057, 38 genes,
P = 0.00096; Additional file
3: Table S32). Additionally, 228 PSGs were shared in the Felidae family (cat, tiger, lion, cheetah, and leopard); we defined shared PSGs as those that are found in two or more species (Additional file
4: Datasheet S2). The shared PSGs of Felidae were enriched in polysaccharide binding (GO:0030247, eight genes,
P = 0.00071), lipid binding (GO:0008289, 12 genes,
P = 0.0041), and immune response (GO:0006955, 16 genes,
P = 0.0052; Additional file
3: Table S33). Since felid species are hypercarnivores [
3], selection of the lipid binding associated genes may be associated to their obligatory carnivorous diet and regulation of lipid and cholesterol homeostasis [
16,
40]. We further identified shared PSGs in the eight carnivores (PSGs in three or more species), five omnivores (PSGs in two or more species), or five herbivores (PSGs in two or more species). A total of 184, 221, and 136 genes were found as shared PSGs among carnivores, omnivores, and herbivores, respectively (Additional file
4: Datasheets S3–S5). The carnivores’ shared PSGs were significantly enriched in motor axon guidance (GO:0008045, three genes,
P = 0.0050; Additional file
3: Table S34).
CXCL12 (stromal cell-derived factor 1), which was found as a shared PSG in carnivores, is known to influence the guidance of both migrating neurons and growing axons.
CXCL12/
CXCR4 signaling has been shown to regulate motor axon projection in the mouse [
41,
42]. Two other carnivore-shared PSGs,
DMP1 and
PTN, are known to play an important role in bone development and repair [
43,
44]. In contrast, there was no significant positive selection of the muscle and bone development associated genes in the omnivores and herbivores. Instead, several immune associated functional categories, such as response to cytokine stimulus, cytokine activity, and regulation of leukocyte activation, were enriched in omnivores and herbivores (Additional file
3: Tables S35–S38).
If adaptive evolution affects only a few crucial amino acids over a short time period, none of the methods for measuring selection is likely to succeed in defining positive selection [
45]. Therefore, we investigated target species-specific amino acid changes (AACs) using 15 feline (three leopards, three lions, a snow leopard, three tigers, two leopard cats, a cheetah, and two cats; Additional file
3: Table S39) and additional 13 mammalian genomes. A total of 1509 genes in the felids were predicted to have at least one function altering AAC (Additional file
4: Datasheet S6). Unexpectedly but understandably, the Felidae-specific genes with function altering AACs were enriched in response to DNA damage stimulus (GO:0006974, 53 genes,
P = 7.39 × 10
–7), DNA repair (GO:0006281, 41 genes,
P = 0.000011), and cellular response to stress (GO:0033554, 63 genes,
P = 0.00016; Additional file
2: Figure S8; Additional file
3: Tables S40 and S41). Interestingly, three genes (
MEP1A,
ACE2, and
PRCP), which are involved in the protein digestion and absorption pathway, had function altering AACs specific to Felidae species (Additional file
2: Figures S9–S11). We interpret this result as a dietary adaptation for high meat consumption that is associated with an increased risk of cancer in humans [
46], and that the heme-related reactive oxygen species (ROS) in meat cause DNA damage and disrupt normal cell proliferation [
47,
48]. We speculate that the functional changes found in DNA damage and repair associated genes help reduce diet-related DNA damage in the felid species. This possible felid’s genetic feature can lead to better understanding of human dietary and health research [
34].
We also identified convergent AACs in the carnivores (Felidae, polar bear, killer whale, and Tasmanian devil) and herbivores (giant panda, cow, horse, rabbit, and elephant). Only one embigin (
EMB) gene had a convergent AAC in the carnivores (except Tasmanian devil) and there was no convergent AAC in the herbivores (Fig.
2b), congruent with the suggestion that adaptive molecular convergence linked to phenotypic convergence is rare [
49]. Interestingly,
EMB, which was predicted to be functionally altered in the three carnivore clades, is known to play a role in the outgrowth of motor neurons and in the formation of neuromuscular junctions [
50]. We confirmed that the AAC in
EMB gene is also conserved in the domestic ferret. Additionally, 18 and 56 genes were predicted to be carnivore-specific and herbivore-specific functions, respectively, altered by at least one AAC (Additional file
4: Datasheets S7 and S8). Among the carnivore-specific function altered genes, several genes are known to be associated with muscle contraction (
TMOD4 and
SYNC) and steroid hormone synthesis (
STAR).
Family-wide highly conserved regions
Conservation of DNA sequences across species reflects functional constraints and, therefore, characterizing genetic variation patterns is critical for understanding the dynamics of genomic change and relevant adaptation of each and a group of species [
51,
52]. We scanned for homozygous genomic regions, which are strongly conserved among species within families: Felidae (cat, tiger, lion, cheetah, leopard, snow leopard, and leopard cat, divergence time: ~15.9 million years ago [MYA], carnivores), Hominidae (human, chimpanzee, bonobo, gorilla, and orangutan, ~15.8 MYA, omnivores), and Bovidae (cow, goat, sheep, water buffalo, and yak, ~26 MYA, herbivores) [
53–
55]. These highly conserved regions (HCRs) represent reduction in genetic variation (homozygous regions shared among species belonging to the same family; Fig.
3 and Additional file
3: Tables S39 and S42). A total of 1.13 Gb of Felidae, 0.93 Gb of Hominidae, and 0.88 Gb of Bovidae HCRs were detected with significantly reduced genetic variation (adjusted
P < 0.0001, Fisher’s exact test corrected using the Benjamini–Hochberg method; Additional file
3: Table S43) compared with other genomic regions. A total of 4342 genes in the HCRs were shared in all three families and these genes were enriched in many key biological functions (cell cycle, pathways in cancer, proteasome, and Hedgehog signaling pathway; Fig.
3 and Additional file
3: Tables S44 and S45) as expected. We then investigated family-specific genes (1436 in Felidae, 2477 in Hominidae, and 1561 in Bovidae) in the HCRs. The Felidae-specific genes were significantly enriched in sensory perception of light stimulus (GO:0050953, 27 genes,
P = 0.0022), synaptic transmission (GO:0007268, 33 genes,
P = 0.0044), transmission of nerve impulse (GO:0019226, 37 genes,
P = 0.0054), and axon guidance pathway (20 genes,
P = 0.0054; Additional file
3: Tables S46 and S47), hinting to adaptation for the fast reflexes found in cats. Notably, the Felidae-specific genes were also functionally enriched for carbohydrate biosynthetic process (GO:0016051, 18 genes,
P = 0.00061). This may be related to the predatory feeding pattern of felids (a meat-based diet, so low dietary availability of carbohydrates). On the other hand, the Bovidae-specific genes were enriched in sensory perception of smell (GO:0007608, 82 genes,
P = 2.44 × 10
–16) and cognition (GO:0050890, 113 genes,
P = 2.54 × 10
–9; Additional file
3: Tables S48–S50) functions, indicating herbivores’ adaptation for defense mechanisms from being poisoned by toxic plants [
56].
Genetic diversity and demographic history of Felidae species
Carnivores tend to have smaller population sizes than species belonging to lower trophic groups, a characteristic argued to be associated with a higher propensity for extinction [
1,
2]. We have investigated genetic diversity (which is affected by population size) in Felidae and compared it to different dietary requirement groups, omnivorous Hominidae and herbivorous Bovidae. The Felidae genetic diversity (0.00094 on average), based on the heterozygous single nucleotide variation (SNV) rates, is much lower than those of Hominidae (0.00175) and Bovidae (0.00244; Fig.
4a and Additional file
3: Tables S39 and S42). In terms of genomic similarity, Felidae showed the smallest genetic distances (0.00102 on average; see “
Methods”), whereas larger genetic distances were detected in Hominidae (0.00141 on average) and Bovidae (0.00133 on average), suggesting that the extreme dietary specialization in the felids imposes strong and similar selection pressures on its members [
1,
2]. The heterozygous SNV rates of leopards (0.00047–0.00070) are similar to those of snow leopard (0.00043), cheetah (0.00044), and white lion (0.00063), which have extremely low genetic diversity due to isolation or inbreeding [
16,
19,
57], and smaller than those of lions (0.00074–0.00148) and tigers (0.00087–0.00104). The smaller cat (two leopard cats, 0.00173–0.00216) displays relatively high genetic diversity compared with the larger big cats, as previously reported [
58]. Additionally, the demographic histories of felid species (leopards, tiger, cheetah, lion, snow leopard, and leopard cat) were constructed using a pairwise sequentially Markovian coalescent (PSMC) model inference [
59]. The leopard cat showed a very different demographic history from the big cats: the population size of leopard cats increased between 10 million and 2 million years ago, whereas other big cats showed a consistent population decrease (Fig.
4b). It is predicted that the leopards experienced a severe genetic bottleneck between 2 million to 900 K years ago, whereas other big cats did not. The three leopard genomes showed a similar demographic history. However, over the last 30 K years, the assembled leopard genome showed an explosion in effective population size, whereas the wild leopards did not. The relatively large effective population size likely reflects that admixture occurred very recently between Amur leopard and North-Chinese leopard (
P. pardus japonensis), as confirmed by the pedigree information (~30 % North-Chinese leopard admixture) and mitochondrial sequence analyses (Additional file
2: Figure S1), rather than an actual increase in population size. Cheetah and snow leopard showed low levels of effective population size in the last 3 million years, confirming their low genetic diversity [
16,
19].