Phylogenetic signal and bias in paleontology

doi:10.5281/zenodo.5506853

Published September 14, 2021 | Version v1

Other Open

1. University of Cambridge
2. Durham University

An unprecedented amount of evidence now illuminates the phylogeny of living mammals and birds on the Tree of Life. We use this tree to measure the phylogenetic value of data typically used in paleontology (bones and teeth) from six datasets derived from five published studies. We ask three interrelated questions: 1) Can these data adequately reconstruct known parts of the Tree of Life? 2) Is accuracy generally similar for studies using morphology, or do some morphological datasets perform better than others? 3) Does the loss of non-fossilizable data cause taxa to occur in misleadingly basal positions? Adding morphology to DNA datasets usually increases congruence of resulting topologies to the well-corroborated tree, but this varies among morphological datasets. Extant taxa with a high proportion of missing morphological characters can greatly reduce phylogenetic resolution when analyzed together with fossils. Attempts to ameliorate this by deleting extant taxa missing morphology are prone to decreased accuracy due to long-branch artefacts. We find no evidence that fossilization causes extinct taxa to incorrectly appear at or near topologically basal branches. Morphology comprises the evidence held in common by living taxa and fossils, and phylogenetic analysis of fossils greatly benefits from inclusion of molecular and morphological data sampled for living taxa, whatever methods are used for phylogeny estimation.

Notes

Supplementary Figures:

Figure S1. Percent completeness of morphological data in fossil templates from each study. Horizontal lines represent median, boxes middle quartiles, and whiskers range.

Figure S2. Congruence of artificial-extinction topologies with well-corroborated trees for each dataset, based only on templates that are at least 53% complete (corresponding to the least complete template from Asher). Each horizontal bar shows the number of shared splits using strict consensuses (top Y-axis) or quartet similarity averaged across all MPTs (bottom Y-axis), averaged across all extant subjects per fossil template. Datasets are ordered from largest (left) to smallest (right) difference in median shared splits obtained by real vs. 01-randomized character states (see Table 2). Boxes denote median and interquartile range, whiskers the range; non-overlapping notches represent a significant difference in medians (Chambers et al. 1983). Characters missing in a fossil template were coded as missing in each extant subject. Remaining characters were coded with the real states in each template (blue), randomized using states drawn from different extant taxa (yellow, "noInfo"), or randomized with states 0 or 1 (red, "random01").

Figure S3. Majority Rule consensus (as shown by percentages adjacent to each node) of 11 topologies derived from the Pattinson dataset using equal and implied weighting values (k = 2, 4, 8, 16, 32, 64, 128, 256, 512, 999) that maximize quartet similarity with the well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".

Figure S4. Majority Rule consensus (as shown by percentages adjacent to each node) of 7 topologies derived from the Asher dataset using implied weighting concavity values (k = 8, 16, 32, 64, 128, 512, 999) that maximize quartet similarity with the well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".

Figure S5. Single MPT derived from the Halliday-All dataset using the implied weighting value (k = 2, 11026.82414 steps) that maximizes quartet similarity with the well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".

Figure S6. Majority Rule consensus (as shown by numbers adjacent to each node) of 11 topologies derived from the Huttenlocker dataset using equal and implied weighting values (k = 2, 4, 8, 16, 32, 64, 128, 256, 512, 999) that maximize quartet similarity with the well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".

Figure S7. Single MPT derived from the Livezey-Zusi dataset using the implied weighting value (k = 4, 65443.9752 steps) that maximizes quartet similarity with the well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".

Figure S8. Single MPT derived from the Halliday-50 dataset using the implied weighting concavity value (k = 999, 46.99872 steps) that maximizes quartet similarity with the well-corroborated tree of living taxa (Table 3). Fossils shown with "zz".
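
The artificial-extinction coding described in the caption of Figure S2 can be illustrated with a short Python sketch. It is not the authors' code (their R scripts are listed under Tables S2 and S3), and it rests on several assumptions: characters are single-symbol states in a string, "?" marks missing data, the retained "real" states are taken from the extant subject, and the function name `artificially_extinct` is hypothetical. The "noInfo" treatment is not covered.

```python
import random

MISSING = "?"  # assumed missing-data symbol


def artificially_extinct(subject, template, mode="real", rng=None):
    """Mask an extant subject's characters with a fossil template's missing pattern.

    subject, template: equal-length strings, one symbol per character.
    mode="real": keep the subject's own state wherever the template is scored
                 (assumption about which taxon supplies the "real" states).
    mode="random01": replace each template-scored character with a random 0 or 1.
    """
    if len(subject) != len(template):
        raise ValueError("subject and template must have the same number of characters")
    rng = rng or random.Random()
    out = []
    for subj_state, tmpl_state in zip(subject, template):
        if tmpl_state == MISSING:
            out.append(MISSING)           # missing in the fossil, so missing in the subject
        elif mode == "random01":
            out.append(rng.choice("01"))  # uninformative binary state
        else:
            out.append(subj_state)        # retained state
    return "".join(out)
```

For example, `artificially_extinct("0121", "1??0")` returns `"0??1"`: the two characters unscored in the template become missing, and the others keep the subject's states.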

Supplementary Tables in "asherSmith_suppTablesS1-5.odt":

Table S1, Summary of taxon samples used to join the morphology matrices of Asher, Halliday, and Huttenlocker to the DNA alignment of Upham, and the morphology matrix of Livezey-Zusi to the DNA alignment of Prum.

Table S2, R script for writing ArtEx TNT batch files

Table S3, R scripts for writing randomized binary morphological states (A) and random samples of DNA sites (B) and taxa (C)

Table S4, R script for calculating congruence

Table S5, R script for calculating root-to-node distances
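
As a rough illustration of the kind of quantity Table S5 refers to, the Python sketch below counts, for each tip of a Newick topology, the internal nodes on the path from the root to that tip (root included). This is only one possible reading of "root-to-node distance"; the definition used in the authors' R script may differ. The function name `tip_depths` is hypothetical, and the parser assumes unquoted labels with no parentheses, commas, or colons inside them.

```python
def tip_depths(newick):
    """Map each tip label to the number of internal nodes between it and the root.

    Branch lengths (":x") and internal node labels or support values are ignored.
    """
    depths = {}
    depth = 0
    token = ""
    after_close = False  # True while the current token is attached to an internal node
    for ch in newick.strip().rstrip(";") + ",":  # trailing comma flushes the last token
        if ch in "(),":
            name = token.split(":")[0].strip()
            if name and not after_close:
                depths[name] = depth  # a tip sits at the current nesting depth
            token = ""
            after_close = (ch == ")")
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
        else:
            token += ch
    return depths
```

For the topology `((A,B),(C,(D,E)));` this gives 2 for A, B, and C, and 3 for D and E.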

Appendices:

Appendix S1, Methods for matrix assembly & phylogenetic search strategies

Appendix S2, TNT matrices

filename	dataset	morphology source	DNA source
birdNC.tnt	birds	Livezey & Zusi (2006)	Prum et al. (2015)
pattinsonS1comb.tnt	primates	Pattinson et al. (2015), Seiffert et al. (2009)	Springer et al. (2012)
upham-asher.tnt	mammals Asher	Asher (2007)	Upham et al. (2019)
upham-hallid.tnt	mammals Halliday-all	Halliday et al. (2019)	Upham et al. (2019)
upham-hallid50.tnt	mammals Halliday-50	Halliday et al. (2019)	Upham et al. (2019)
upham-hutt.tnt	mammals Huttenlocker	Huttenlocker et al. (2018)	Upham et al. (2019)

Appendix S3, Newick topologies representing well-corroborated trees (Fig. 1), random samples of DNA sites (Fig. 7) and taxa known for DNA (Fig. 8), bifurcating full-data topologies of extant taxa used to calculate root-to-node distances (Fig. 11), and optimal topologies derived from individual datasets and the literature (Table 3; Figs. S3-S8).

Appendix S4, All MPTs resulting from ArtEx analyses (newick)

Appendix S5, Strict consensuses of MPTs derived from each ArtEx subject-template combination (newick)
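
One of the congruence measures named in Figure S2 is the number of splits shared between a consensus topology and the well-corroborated tree. The Python sketch below shows one way to count shared non-trivial splits between two Newick strings such as those in Appendices S3 and S5. It is not the authors' congruence script (Table S4) and does not compute quartet similarity. The function names are hypothetical, both trees are assumed to contain the same taxa, and labels are assumed to be unquoted.

```python
def clades(newick):
    """Return (all tip names, list of tip sets, one per internal node)."""
    stack, found, tips = [], [], set()
    token, after_close = "", False
    for ch in newick.strip().rstrip(";") + ",":  # trailing comma flushes the last token
        if ch in "(),":
            name = token.split(":")[0].strip()
            if name and not after_close:  # tokens after ")" are internal labels
                tips.add(name)
                if stack:
                    stack[-1].add(name)
            token = ""
            after_close = (ch == ")")
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                done = stack.pop()
                found.append(frozenset(done))
                if stack:
                    stack[-1] |= done  # a clade's tips also belong to its parent
        else:
            token += ch
    return frozenset(tips), found


def splits(newick):
    """Non-trivial bipartitions, each stored as the side lacking a reference tip."""
    tips, found = clades(newick)
    ref = min(tips)  # arbitrary but consistent reference taxon
    out = set()
    for clade in found:
        side = tips - clade if ref in clade else clade
        if 1 < len(side) < len(tips) - 1:  # both sides need at least two tips
            out.add(side)
    return out


def shared_splits(newick_a, newick_b):
    """Count non-trivial splits present in both topologies."""
    return len(splits(newick_a) & splits(newick_b))
```

With `((A,B),(C,(D,E)));` and `((A,C),(B,(D,E)));` the function returns 1, because only the split separating D and E from the rest occurs in both trees.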

Files

Files (11.5 MB)

Name	MD5	Size
appendS1_morphMatrixAssembl.odt	abdb89282260069b06c8a7f195509971	28.6 kB
appendS2-matrices.zip	318a740856b8210122ef229bf3f30bc4	10.9 MB
appendS3.zip	3e654242f340fb8283ae6faf16220925	128.9 kB
asherSmith-suppFigs.pdf	a0160f2238016467772cf72afd16a4d6	324.1 kB
asherSmith_suppTablesS1-5.odt	e4a1c013065f886278d5c24603a3df2b	30.6 kB
README_AsherSmith-suppdata.odt	ec5885e5a6ef6acdd55880228d0bbd54	23.7 kB
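
The MD5 checksums listed with the files can be used to confirm that a download is intact. A minimal Python sketch using the standard `hashlib` module follows. The checksum values are copied from the file list above; the function names and the assumption that files sit in the current directory under their listed names are illustrative only.

```python
import hashlib

# Checksums as listed in this record's file table.
EXPECTED_MD5 = {
    "appendS1_morphMatrixAssembl.odt": "abdb89282260069b06c8a7f195509971",
    "appendS2-matrices.zip": "318a740856b8210122ef229bf3f30bc4",
    "appendS3.zip": "3e654242f340fb8283ae6faf16220925",
    "asherSmith-suppFigs.pdf": "a0160f2238016467772cf72afd16a4d6",
    "asherSmith_suppTablesS1-5.odt": "e4a1c013065f886278d5c24603a3df2b",
    "README_AsherSmith-suppdata.odt": "ec5885e5a6ef6acdd55880228d0bbd54",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, name):
    """True if the file at `path` has the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `matches_record("appendS3.zip", "appendS3.zip")` should be true for an uncorrupted copy of that archive.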

Additional details

Is cited by: 10.1093/sysbio/syab072 (DOI)
Is derived from: 10.5061/dryad.w3r2280q3 (DOI)

	All versions	This version
Views	76	75
Downloads	252	222
Data volume	163.0 MB	153.3 MB
