Do probability arguments refute evolution?

Introduction

In the past few decades, modern science has uncovered a world that is far vaster and more awe-inspiring than ever imagined before, and has revealed a set of elegant natural laws that govern all the universe, deeply resonating with the notion of a cosmic lawgiver in Judeo-Christian religion. It is not just devout believers who find these results inspiring. For example, 46% of Americans (including 54% of atheists, 55% of agnostics and 43% of nones) say that they experience a “deep sense of wonder about the universe” on at least a weekly basis [Masci2016]. With regard to a union between science, philosophy and religion, how could one ask for more?

Nonetheless, some writers, principally of the “creationist” and “intelligent design” schools, are not content with these grand revelations, and argue instead for a worldview that is often at odds with modern science. Old-earth geology and biological evolution are key targets of their objections, although for mostly historical reasons rather than for any fundamental issue (note there is no comparable opposition movement to quantum physics, organic chemistry, cetacean biology or any of a host of other scientific theories).

Both traditional creationists and intelligent design writers have invoked probability arguments in their criticisms of evolution. They argue that certain features of biology are so fantastically improbable that they could never have been produced by a purely natural, “random” process, even assuming billions of years of geologic history. They further maintain that this line of reasoning can be seen by simple “back of the envelope” probability calculations that anyone familiar with basic mathematics can understand. Why, they ask, don’t scientists see the light?

For example, some writers equate the hypothesis of evolution to the absurd suggestion that monkeys randomly typing at a typewriter could compose a selection from the works of Shakespeare, or that an explosion in an aerospace equipment yard could produce a working airliner [Dembski1998; Foster1991; Hoyle1981; Lennox2009]. More recent studies of this genre argue that functional biology operates on an exceedingly small subset of the space of all possible DNA sequences, and that any changes to the “computer program” of biology are, like changes to human computer programs, almost certain to make the organism non-functional [Axe2017; Marks2017].

One creationist-intelligent design argument addresses the human alpha-globin molecule, a component of hemoglobin that performs a key oxygen transfer function. Alpha-globin is a protein chain based on a sequence of 141 amino acids; there are 20 different amino acids common in living systems, so the number of potential chains of length 141 is 20^141, which is roughly 10^183 (i.e., a one followed by 183 zeroes). These writers argue that this figure is so enormous that even after billions of years of random molecular trials, no human alpha-globin protein molecule would ever appear “at random,” and thus the hypothesis that human alpha-globin arose by a “random” evolutionary process is, in their thinking, decisively refuted [Foster1991, pg. 79-83; Hoyle1981, pg. 1-20; Lennox2009, pg. 163-173].

The treacherous world of probability and statistics

While not generally appreciated by the public at large, it is a well-known fact in the world of scientific research that arguments based on probability and statistics are fraught with potential fallacies and errors, and even trained researchers can sometimes fool themselves with invalid reasoning. For these reasons, rigorous courses in probability and statistics are now required of students in virtually all fields of science, and in numerous other disciplines as well, including law [Saini2009] and finance [Bailey2014]. Rigorous probability analyses have been a part of biology for over 100 years.

To illustrate the difficulties with probability arguments, mathematics teachers often ask their class (let’s say it has 30 students) if they think it is likely that two or more persons in the class have exactly the same birthday. Most students say that it is highly unlikely, thinking that the chance that two people share a particular birthday is 1/365, and so 30 times this amount is only 30/365. But this argument is fallacious, because what matters is the number of pairs of students, and in a class of 30 students there are 435 pairs. When the probability calculation is done correctly for the case of 30 students [it is equal to 1 – (364/365 x 363/365 x … x 336/365)], one obtains 70.6%. In general, if there are 23 or more students in the class, then the chance that two or more have the same birthday is greater than 50%. For numerous other examples of how seemingly improbable “coincidences” can happen, see [Hand2014].
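The birthday figures quoted above are easy to check directly. The following is a minimal sketch in Python, assuming 365 equally likely birthdays and ignoring leap years; the function name is illustrative and is not taken from any cited source.

```python
from math import comb

def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming every one of `days` birthdays is equally likely."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

print(comb(30, 2))                      # number of pairs in a class of 30
print(f"{p_shared_birthday(30):.1%}")   # class of 30
print(f"{p_shared_birthday(23):.1%}")   # smallest class size above 50%
print(f"{p_shared_birthday(22):.1%}")   # just below 50%
```

The naive estimate of 30/365 is about 8%, so the intuitive answer is off by nearly a factor of nine even in this very simple setting.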

A similar probability fallacy is seen in the alpha-globin argument, mentioned above. This argument ignores the fact that a large class of alpha-globin molecules can perform the essential oxygen transfer function, so that the computation of the probability of a single instance is misleadingly remote. Indeed, most of the 141 amino acids in alpha-globin can be changed without altering the key oxygen transfer function, as can be seen by noting the great variety in alpha-globin molecules across the animal kingdom (see DNA). When one revises the calculation above, based on only 25 locations essential for the oxygen transport function (which is a generous over-estimate), one obtains 20^25, or roughly 10^33, fundamentally different chains, an enormous figure but incomparably smaller than 10^183. A calculation such as this can be refined further, taking into account other features of alpha-globin and its related biochemistry. But do the details of these calculations really matter? Let us examine this question in detail.
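For reference, the two counts quoted above can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope arithmetic, assuming 20 equally available amino acids at each position; the function name is hypothetical.

```python
from math import log10

def log10_chain_count(length, alphabet_size=20):
    """Return log10 of the number of chains of `length` residues,
    assuming `alphabet_size` equally available amino acids per position."""
    return length * log10(alphabet_size)

full = log10_chain_count(141)      # all 141 positions treated as fixed
essential = log10_chain_count(25)  # only 25 positions treated as essential

print(f"20^141 is about 10^{full:.1f}")
print(f"20^25  is about 10^{essential:.1f}")
print(f"ratio  is about 10^{full - essential:.1f}")
```

The exponents come out near 183.4 and 32.5, consistent with the rounded figures in the text. Changing a single modeling assumption shifts the answer by roughly 150 orders of magnitude, which shows how sensitive such calculations are to the underlying model.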

The fundamental fallacy of creationist probability arguments

Consider first the high-level process of scientific inference in applied mathematics:



As this chart indicates, the key initial step in the chain of applied mathematical reasoning is to formulate an accurate mathematical model of the physical or biological phenomenon in question. If this model is not carefully constructed, then the chain of inference is broken, and it matters not in the least how good or how bad are the remaining steps.

The alpha-globin example above, along with virtually all other such arguments in the creationist-intelligent design literature, suffers from exactly this problem: The argument presumes, as its fundamental mathematical model, that the human alpha-globin arose by a single all-at-once random shot event. But generating the alpha-globin molecule “at random” in a single shot is decidedly not the scientific hypothesis in question. (This “straw man” notion, by the way, is the creationist theory, not the scientific theory, of its origin.) Instead, available evidence from hundreds of published studies on the topic has demonstrated that alpha-globin arose as the end product of a long sequence of intermediate steps, each of which was biologically useful in an earlier context. See, for example, the survey article [Hardison2001], which cites 144 papers on the topic of hemoglobin evolution; many more have been published since this survey.

With regard to hemoglobin, it has long been noted that heme, the key oxygen-carrying component of hemoglobin, is remarkably similar to chlorophyll, the molecule behind photosynthesis. The principal difference is that heme has a central iron atom, whereas chlorophyll has a central magnesium atom; otherwise they are virtually identical. This similarity can hardly be a coincidence, and in fact researchers concluded long ago, based on several lines of evidence, that these two biomolecules must have shared a common lineage (meaning, of course, that organisms which incorporate these biomolecules must have shared a common lineage) [Hendry1980]. Here is a diagram of the two molecules [from MasterOrganicChemistry.com]:



In summary, it does not matter how good or how bad the mathematics used in the analysis of alpha-globin is, given that the underlying probability model (an all-at-once random shot) is a fundamentally invalid description of the phenomenon in question. Any simplistic probability calculation of evolution that does not take into account the step-by-step process by which the structure came to be is deeply fallacious and misleading [Musgrave1998; Rosenhouse2018].

Is evolution a “random” process?

It is also important to keep in mind that the process of natural biological evolution is not really a “random” process. Evolution certainly has some “random” aspects, notably mutations and genetic events during reproduction. But the all-important process of natural selection, acting under the pressure of an extremely competitive landscape, often involving thousands of other individuals of the same species and other species as well, together with numerous complicated environmental pressures such as climate change, is anything but random. This strongly directional nature of natural selection, which is the essence of evolution, by itself invalidates most of these probability calculations.

What’s more, such calculations completely ignore the atomic-level biochemical processes involved, which often exhibit strong affinities for certain types of highly ordered structures. For example, molecular self-assembly occurs in DNA molecule duplication every time a cell divides. If we were to compute the chances of the formation of a human DNA molecule during meiosis, using a simple-minded back-of-the-envelope probability calculation similar to that mentioned above, the result would be something on the order of one in 10^1,000,000,000, which is far, far beyond the possibility of “random” assemblage. Yet this process occurs many times every day in the human body and in every other plant and animal species. In short, extreme back-of-the-envelope probabilities by themselves do not imply supernatural causation.

Snowflakes

Some of the difficulties with probability arguments against evolution can be illustrated by considering snowflakes. Bentley and Humphrey’s book Snow Crystals includes over 2000 high-resolution black-and-white photos of real snowflakes, each with intricate yet highly regular patterns that are almost perfectly six-way symmetric [Bentley1962]. A good online source with numerous high-resolution photographs has been compiled by Kenneth Libbrecht [Libbrecht2012]. Four of Bentley’s photos are shown below. By employing a reckoning based on six-way symmetry, one can calculate the chances that one of these structures can form “at random” as roughly one part in 10^2500. This probability figure is even more extreme than some that have appeared in the creationist-intelligent design literature. So is this proof that each individual snowflake has been designed by a supernatural intelligent entity? Obviously not.

The fallacy here, once again, is presuming an all-at-once random assembly of molecules. Instead, snowflakes, like biological organisms, are formed as the product of a long series of steps acting under well-known physical laws, and the outcomes of such processes very sensitively depend on the starting conditions and numerous environmental parameters. It is thus folly to presume that one can correctly reckon the chances of a given outcome by means of superficial probability calculations that ignore the processes by which they formed.

[Images: snowflake photographs]

Computer programs emulating biological evolution

If there were some underlying principle that forbids biological structures from evolving novel features, surely this principle should be seen in computer simulations of evolution. But quite the opposite is true — such studies abundantly confirm that evolutionary processes really do work, not just in biology but in other arenas as well.

For example, as mentioned above, some critics have equated natural biological evolution to the absurd suggestion that some monkeys typing randomly at a keyboard could generate a passage of Shakespeare. Others have argued that any changes to a “computer program” would surely render the program unusable. But these too are fallacious arguments, since they ignore the all-important process of natural selection. As a single example, a 2009 study by the present author exhibited results of a computer program simulating natural evolution, which “evolved” segments of English text very much akin to actual passages from Charles Dickens. In many instances, a class of college students were unable to distinguish the computer-generated text segments from real text segments taken from Dickens’ Great Expectations. See English-text for details.

Another example is the recent rise of “genetic algorithms” and “evolutionary computing,” namely computer programs that mimic the process of biological evolution to produce novel solutions to scientific and engineering problems. As a single example, in 2017 Google researchers generated 1000 image recognition algorithms, each of which was trained using state-of-the-art deep neural networks to recognize a selected set of images. They then used an array of 250 computers, each running two algorithms, to identify an image. Only the algorithm that scored higher proceeded to the next iteration, where it was changed somewhat, mimicking mutations in biological evolution. The researchers found that their scheme could achieve accuracies as high as 94.6%, better than human efforts [Gershgorn2017]. In another Google-funded research project, a computer was programmed with only the rules of Go, together with an evolution-style “deep learning” algorithm, and then had the program play games against itself. Within a few days it had advanced to the point that it defeated an earlier Google program 100 games to zero. This earlier program, in turn, had previously defeated the world’s champion human Go player [Greenmeier2017].
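The difference that selection makes can be illustrated with a toy program. The sketch below is not the author's 2009 text-generation program, nor any of the Google systems mentioned above; it is a deliberately simple illustration in which the target phrase, alphabet, population size and mutation rate are all arbitrary assumptions. Unlike real natural selection, it scores candidates against a fixed target, so it shows only how cumulative selection differs from single-shot random assembly.

```python
import random
import string
from math import log10

ALPHABET = string.ascii_uppercase + " "   # 27 symbols (assumed alphabet)
TARGET = "METHINKS IT IS LIKE A WEASEL"   # hypothetical target phrase

def fitness(candidate):
    """Number of positions at which the candidate matches the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(parent, rate, rng):
    """Copy the parent, replacing each character with a random symbol
    with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch
                   for ch in parent)

def evolve(pop_size=100, rate=0.05, max_generations=10_000, seed=1):
    """Return the number of generations needed to reach TARGET,
    or None if the generation limit is hit first."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for generation in range(max_generations + 1):
        if parent == TARGET:
            return generation
        offspring = [mutate(parent, rate, rng) for _ in range(pop_size)]
        # Selection step: keep the fittest string, parent included.
        parent = max(offspring + [parent], key=fitness)
    return None

print("generations with selection:", evolve())
print(f"single-shot odds: one in 10^{len(TARGET) * log10(len(ALPHABET)):.0f}")
```

With these settings the loop should finish well within the generation limit, whereas the single-shot odds printed on the last line are of the same astronomically small kind that appears in the probability arguments discussed above.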

Improbable structures and features in nature

Numerous examples from the natural world can be cited to demonstrate the futility of trying to argue against evolution using probability — nature can and often does produce highly improbable structures and features, by the well-understood evolutionary processes of mutation, shuffling of genes and natural selection:

  1. Lenski’s 2012 E. coli experiment: In January 2012, a research team led by Richard Lenski at Michigan State University demonstrated that colonies of viruses can evolve a new trait in as little as 15 days. The researchers studied a virus, known as “lambda,” which infects only the bacterium E. coli. They engineered a strain of E. coli that had almost none of the molecules that this virus normally attaches to, then released them into the virus colony. In 24 of 96 separate experimental lines, the viruses evolved a strain that enabled them to attach to E. coli, using a new molecule that they had never before been observed to utilize. All of the successful runs utilized essentially the same set of four distinct mutations. Justin Meyer, a member of the research team, noted that the chances of all four mutations arising “at random” in a given experimental line (based on a simplistic back-of-the-envelope probability calculation typical of creationist literature) are roughly one in 10^27 (one thousand trillion trillion) [Zimmer2012]. Note also that the chances for this to happen in 24 out of 96 experimental lines are roughly one in 10^626.
  2. Synthesis of RNA nucleotides and other biomolecules: Many scientists hypothesize that RNA (a molecule similar to DNA, with four “bases” or nucleotides) was involved in the origin of life (see Origin). As recently as 1999, the appearance of RNA nucleotides on the primitive Earth was widely thought to be a “near miracle” by researchers in the field [Joyce1999]. Nonetheless, in May 2009 a team led by John Sutherland of the University of Manchester discovered a particular combination of chemicals, very likely to have been plentiful on the early Earth, that synthesized the RNA nucleotides cytosine and uracil, which are known as the pyrimidines [Wade2009]. More recently, in 2016 a team led by Thomas Carell in Munich, Germany succeeded in synthesizing the remaining two, adenine and guanine, which are known as the purines [Service2016]. Finally, in 2018, Carell’s team demonstrated a single process that created all four nucleotides [Service2018]. In short, the natural production of the four RNA nucleotides, once thought to be “impossible,” is now fairly well understood.
  3. Hawaiian crickets. In the 1800s, a species of cricket was introduced to the Hawaiian Islands, where they became quite common in grassy areas. Males attract mates with their chirps, and females select males based on their songs. However, unlike their counterparts in other Pacific islands, the Hawaiian crickets have a fearsome predator — dive-bombing flies that target chirping crickets, then implant their larvae in them. In the 1990s, researchers noted that a field in Kauai that previously was the home to many crickets now seemed silent. However, a nighttime search found that in fact there were lots of crickets there, but very few of the males now chirped — in just five years, or roughly 20 generations, a mutation had arisen that inhibited many of the males from chirping, thus protecting the population [Zuk2013, pg. 81-82].

  4. Recent human evolution. A 2010 study found a total of 30 recently evolved genetic variants that are distinct among natives of Tibetan highlands, permitting them to live well at very high altitudes (more efficient metabolism, avoid overproducing red blood cells in thin air, etc.). They concluded that these changes constitute the fastest documented case of human evolution [Wade2010b]. Some other instances of recent human evolution include: (a) a genetic adaptation has arisen among Southern Chinese that makes them more resistant to alcohol, at the cost of turning red in the face; (b) gene changes in Eskimo populations that help them better cope with bitter cold; (c) genetic changes among certain primitive farming people that enhance vitamin B9 production, which is absent in their tuber diets; (d) two genetic mechanisms for lighter skin color, which among higher-latitude people promotes vitamin D production — Europeans have one, while East Asians have another; (e) a gene that promotes hair with thicker shafts among East Asians, possibly as added protection against cold [Wade2010c].

Many other examples could be listed — see Novelty and Origin.
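The second figure quoted in the first item above (24 successes in 96 lines) is consistent with a simple binomial reckoning. The sketch below assumes the 96 lines are independent and that each succeeds with the simplistic probability of one in 10^27; how the original figure was actually derived is not stated, so this is only one plausible reconstruction, and the function name is hypothetical.

```python
from math import comb, log10

def log10_binomial_pmf(n, k, log10_p):
    """log10 of C(n, k) * p^k, for p so small that the (1 - p)^(n - k)
    factor is indistinguishable from 1."""
    return log10(comb(n, k)) + k * log10_p

# 24 successful lines out of 96, each with naive probability 10^-27.
result = log10_binomial_pmf(96, 24, -27.0)
print(f"about one in 10^{-result:.0f}")
```

The binomial coefficient contributes about 22 orders of magnitude, which is why the exponent is somewhat smaller than 24 x 27 = 648. The observed outcome is, of course, evidence that the naive single-line probability is the wrong model.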

Dembski’s information theory arguments

Intelligent design writer William Dembski invokes both probability and information theory (the mathematical theory of information content in data) in his arguments against Darwinism [e.g., Dembski2002; Marks2017]. However, mathematicians who have examined Dembski’s works have identified major flaws in his reasoning [Elsberry2011]. For a detailed discussion of Dembski’s theories, see Information theory.

Does creationism provide a reasonable alternative?

Does a creationist worldview, in particular the hypothesis of independent creation of each species with no common biological ancestry, provide a reasonable alternative in terms of probability?

Here it is instructive to consider transposons or “jumping genes,” namely sections of DNA that have been “copied” from one part of an organism’s genome and “pasted” seemingly at random in other locations. The human genome, for example, has over four million individual transposons in over 800 families [Mills2007]. In most cases transposons do no harm, because they “land” in an unused section of DNA; and some transposons have subsequently adopted biological functionality (although for the purposes of this discussion it does not matter in the least whether or not they have biological functionality). But because they are distinctive and inherited, they serve as excellent markers for genetic studies. Indeed, transposons have been used to classify a large number of vertebrate species into a family tree, with a result that is virtually identical to what biologists had earlier reckoned based only on physical features and biological functions [Rogers2011, pg. 25-31, 86-92]. As just one example, consider the following table, where columns labeled ABCDE denote five blocks of transposons, and x and o denote that the block is present or absent in the genome [Rogers2011, pg. 89].

						Transposon blocks
			Species		A	B	C	D	E
        /---------	Human		o	x	x	x	x
       /----------	Bonobo		x	x	x	x	x
      / \---------	Chimp		x	x	x	x	x
     /------------	Gorilla		o	o	x	x	x
-----|------------	Orangutan	o	o	o	x	x
     \------------	Gibbon		o	o	o	o	o

It is clear from these data that our closest primate relatives are chimpanzees and bonobos. As another example, here is a classification of four cetaceans (ocean mammals) based on transposon data [Rogers2011, pg. 27]:

						Transposon blocks
		Species			A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P
    /------	Bottlenose dolphin	x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x
   /\------	Narwhal whale		x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x
---|-------	Sperm whale		x  x  x  x  x  o  o  o  o  o  o  o  o  o  o  o
   \-------	Humpback whale		x  x  o  o  o  o  o  o  o  o  o  o  o  o  o  o

Other examples could be listed, encompassing an even broader range of species [Rogers2011, pg. 25-31, 86-92].

Needless to say, these data, which all but scream “descent from common ancestors,” are highly problematic for creationists and others who hold that the individual species were separately created without common biological ancestry. Transposons typically are several thousand DNA base pair letters long, but, since there are often some disagreements from species to species, let us be very conservative and say only 1000 base pair letters long. Then for two species to share even one transposon starting at the same spot, presumably only due to random mutations since creation, the probability (according to the creationist hypothesis) is one in 4^1000 or roughly one in 10^600. For 16 such common transposons, the chances are one in 4^16000 or roughly one in 10^9600. What’s more, as mentioned above, an individual species typically has at least several hundred thousand such transposons. Including even part of these in the reckoning would hugely multiply these odds.

But this is not all, because we have not yet considered the fact that in each diagram above, or in other tables of real biological transposon data, there is a clear hierarchical relationship. This is by no means assured, and in fact is quite improbable: for almost all tables of “random” data, there is no hierarchical pattern, and no way to rearrange the rows to be in a hierarchical pattern. For example, in a computer run programmed by the present author, each column of the above cetacean table was pseudorandomly shuffled (thus maintaining the same number of x and o in each column), and the program checked whether the rows of the resulting table could be rearranged to be in a hierarchical order. There were no successes in 10,000,000 trials. As a second experiment, a 4 x 16 table of pseudorandom data (with a 50-50 chance of x or o) was generated, and then the program attempted to rearrange the rows to be in a hierarchical pattern as before. There were only three successes in 10,000,000 trials.
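A shuffling experiment of this kind can be sketched in a few lines of Python. The author's program and its exact definition of “hierarchical” are not given, so the sketch below rests on an assumption: a table counts as hierarchical if its rows can be reordered so that every column is a run of x followed by a run of o, which is equivalent to the sets of species marked x being nested inside one another. All function names are hypothetical.

```python
import random

CETACEANS = [
    "xxxxxxxxxxxxxxxx",  # Bottlenose dolphin
    "xxxxxxxxxxxxxxxx",  # Narwhal
    "xxxxxooooooooooo",  # Sperm whale
    "xxoooooooooooooo",  # Humpback whale
]

def is_hierarchical(table):
    """True if, for every pair of columns, the set of rows marked 'x'
    in one column contains the set marked 'x' in the other (assumed
    definition; see the lead-in)."""
    col_sets = [frozenset(r for r, row in enumerate(table) if row[c] == "x")
                for c in range(len(table[0]))]
    return all(a <= b or b <= a
               for i, a in enumerate(col_sets)
               for b in col_sets[i + 1:])

def shuffle_columns(table, rng):
    """Shuffle each column independently, keeping its count of x and o."""
    cols = [list(col) for col in zip(*table)]
    for col in cols:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*cols)]

def count_successes(table, trials, seed=0):
    """Count shuffled tables that still pass the hierarchy test."""
    rng = random.Random(seed)
    return sum(is_hierarchical(shuffle_columns(table, rng))
               for _ in range(trials))

print(is_hierarchical(CETACEANS))                 # the real table
print(count_successes(CETACEANS, trials=10_000))  # shuffled tables
```

Because the definition is an assumption, the counts from this sketch need not match the figures reported above, although under this definition a shuffled table should pass the test only very rarely. A different definition of hierarchy (for example, allowing disjoint as well as nested groups) would require changing only `is_hierarchical`.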

Like the calculations mentioned earlier, these calculations are simplified and informal; more careful reckonings can be done, and one can vary the underlying assumptions. But, again, do the fine details of the calculations really matter? One way or the other, it is clear that the creationist hypothesis of separate fiat creation of individual species does not resolve any probability paradoxes; instead it enormously magnifies them.

The only other possibility, from a strict creationist worldview, is to posit that a supreme being separately created species with hundreds of thousands of transposons already in place, essentially just as we see them today. But this merely replaces a scientific failure (the inability of the creationist model to explain the vast phylogenetic patterns in transposon data) with a theological disaster (why did a truth-loving supreme being fill the genomes of the entire biological kingdom with vast amounts of misleading DNA evidence, all pointing unambiguously to an evolutionary descent from common ancestors over the eons, if that is not the conclusion that should be drawn?). Indeed, with regard to the discomfort some have about evolution, the creationist alternative of separate creation is arguably far worse, both scientifically and theologically. See Deceiver for additional discussion.

Conclusions

It is certainly true that there are some perplexing structures in biology. How did these structures first arise, and how did they evolve to the forms we see today? For that matter, was the origin of the first self-reproducing biomolecules on the early Earth an inevitable development, or was it a fantastically improbable event unlikely to have been repeated anywhere else in the Milky Way galaxy? (See Origin.) Along this line, how likely is it that intelligent, technological societies have arisen elsewhere? Why have we not yet seen clear-cut evidence of such extraterrestrial societies (see Fermi’s paradox)? In spite of many published papers on these and similar topics, researchers in these fields would be the first to acknowledge that there is still much that is not yet fully understood. This is a large, dynamic field with many active researchers, spanning numerous scientific disciplines.

However, the simplistic, back-of-the-envelope arguments based on probability or information theory that have appeared in the creationist-intelligent design literature do not help unravel these profound questions. Instead, these arguments are riddled with serious fallacies and are ultimately invalid, as we have seen above. These difficulties include:

  1. Many of these arguments presume a mathematical model where the structure in question came into existence “at random” via a single-shot chance assemblage of atoms. But this is not the scientific hypothesis of how such structures formed (they were the result of a long series of intermediate steps over the eons). Thus all such “straw man” arguments are fatally flawed from the beginning.
  2. Some of these arguments rely on sophisticated mathematical calculations, but given that the underlying probability model is an invalid description of the phenomenon in question, it does not matter in the slightest how good or how bad these reckonings are.
  3. Many of these arguments apply faulty reasoning, such as by ignoring the fact that a very wide range of biomolecules could perform a similar function to the given biomolecule. Thus the odds given against the formation of the given biomolecule are often greatly exaggerated.
  4. These arguments typically ignore the fact that biological evolution is fundamentally not a purely “random” process — mutations may be random, but natural selection is far from random.
  5. These arguments typically ignore reams of evidence from the natural world that evolution can and often does produce highly improbable structures and features.
  6. Some writers attempt to invoke advanced mathematical concepts (e.g., information theory), but derive highly questionable results and misapply these results in ways that render the conclusions invalid in an evolutionary biology context.
  7. The creationist hypothesis of separate creation for each species does not resolve any probability paradoxes; instead it enormously magnifies them.

It is ironic that to the extent that probability-based arguments have any validity at all in analyses of evolution, it is precisely the creationist hypothesis of separate, all-at-once complete formation of individual species or features that is falsified.

Perhaps at some time in the distant future, a super-powerful computer will be able to simulate with convincing fidelity the multi-billion-year biological history of the Earth, in the same way that scientists today attempt to simulate (in a much more modest scope) the Earth’s weather and climate. Then, after thousands of such simulations have been performed, researchers might obtain some meaningful statistics on the chances involved in the origin of life on Earth, or in the formation of some class of biological structures such as hemoglobin. Perhaps also researchers will eventually reconstruct, in the laboratory, additional key steps in the formation of the most primitive biomolecular structures involved in the origin of life. And perhaps one day in the not-too-distant future, researchers will even discover forms of life or even advanced civilizations on other planets.

Until that time, the probability and information theory calculations that appear in creationist-intelligent design literature and elsewhere should be viewed with great skepticism, to say the least. Do these writers, or others who believe that such lines of reasoning have merit, really think that the entire edifice of modern evolutionary biology can be felled with one or two back-of-the-envelope probability calculations, without regard to the massive bodies of evidence that point in the opposite direction, and do they further hold that tens of thousands of research biologists, eager to publish groundbreaking research, have all overlooked such simple arguments? Common sense says otherwise. As mathematician Jason Rosenhouse writes [Rosenhouse2018],

When biologists ascribe to evolution the ability to craft information-rich genomes, they are neither speculating nor guessing. The basic components of evolutionary theory are empirical facts. Genes really do mutate, sometimes leading to new functionalities. The process of gene duplication with subsequent divergence leads to the creation of information by any reasonable definition of the terms. Selection can string small variations together into directional change. On a small scale, this has all been observed. And if small increases in information are an empirical reality on human timescales, then what abstract principle of mathematics is going to rule out much larger increases on geological scales?

Then here come the ID [intelligent design] folks, full of swagger and bravado. They say the accumulated empirical evidence must yield before their back-of-the-envelope probability calculations and abstract mathematical modeling. Evolution should be abandoned in favor of the new theory of intelligent design. This theory states, in its entirety, that an intelligent agent of unspecified motives and abilities did something at some point in natural history. Not very useful.

In a larger context, one has to question whether highly technical issues such as biomolecular structures or calculations of probabilities have any place in a discussion of philosophy or religion. Surely such technical arguments can say nothing one way or the other about whether a supreme being was involved in the creation of life on Earth. For example, one can search in vain for even a single passage in the Bible or other scriptures that is written in the highly rigorous, quantitative style of a modern scientific paper, much less a mathematical treatise. So why attempt to “prove” a point with probability arguments, especially when there are very serious questions as to whether any of this reasoning is valid? One is reminded of a passage in the New Testament: “For if the trumpet gives an uncertain sound, who shall prepare himself for the battle?” [1 Cor. 14:8]. It makes far more sense to leave such matters to peer-reviewed scientific research.

For additional information, see
Complexity, Creationism, Deceiver, Design, DNA, English text, Evolution evidence, God of the gaps, Information theory, Intelligent design, Novelty, Origin and Theory.
