Using network theory to analyze Bach's music

Johann Sebastian Bach (1685-1750) regularly garners the top spot in listings of the greatest Western classical composers, typically followed by Mozart and Beethoven. Certainly in terms of sheer volume of compositions, Bach reigns supreme. The Bach-Werke-Verzeichnis (BWV) catalogue lists 1128 compositions, from short solo pieces to the magnificent Mass in B Minor (BWV 232) and Christmas Oratorio (BWV 248), far more than any other classical composer. Further, Bach’s clever, syncopated style led the way to twentieth-century musical innovations, notably jazz.

There does seem to be a credible connection between the sort of mental gymnastics done by a mathematician and by a musician. To begin with, there are well-known mathematical relationships between the pitch of various notes on the musical keyboard. But beyond mere analysis of pitches, it is clear that the arena of musical syntax and structure has a very deep connection to the sorts of syntax, structure and other regularities that are studied in mathematical research. Bach and Mozart in particular are well-known for music that is both “mathematically” beautiful and structurally impeccable.

Just as some of the best musicians and composers are “mathematical,” so too many of the best mathematicians are quite musical. It is quite common at evening receptions of large mathematical conferences to be serenaded by concert-quality performers who, in their day jobs, are accomplished mathematicians of some renown. Perhaps the best real-life example of a scientist-musician was Albert Einstein, who was an accomplished pianist and violinist. His favorite composers? You guessed it: Bach and Mozart. He later said, “If … I were not a physicist, I would probably be a musician. I often think in music. I live my daydreams in music. I see my life in terms of music.”

For additional details on Bach’s “mathematical” style, some interesting speculations on golden-ratio patterns in Bach’s music, and a listing of a number of particularly listenable works, with audio links, see this previously published Math Scholar article: Bach as mathematician.

In February 2024, a team of researchers from the University of Pennsylvania, Yale and Princeton in the U.S. published a study describing their efforts to analyze Bach’s music using network theory and information theory. As the authors explain,

Bach is a natural case study given his prolific career, the wide appreciation his compositions have garnered, and the influence he had over contemporaneous and subsequent composers. His diverse compositions (from chorales to fugues) for a wide range of musicians (from singers to orchestra members) often share a fundamental underlying structure of repeated—and almost mathematical—musical themes and motifs. These features of Bach’s compositions make them particularly interesting to study using a mathematical framework.

The authors included a wide range of Bach compositions in their study: preludes and fugues from *The Well-Tempered Clavier*, two- and three-part inventions, a selection of Bach’s cantatas, the English suites, the French suites, some chorales, the Brandenburg concertos, and various toccatas and concertos.

Overall, their results have confirmed that Bach’s works have a high information content, and further that different subsets of works have distinct characteristics.

Here is an outline of their methodology: After collecting digitized versions of the above musical selections, they represented each note as a node in a network, with notes from different octaves as distinct nodes. A transition from note A to note B is represented as a directed edge from A to B. Chords are represented with edges between all notes in the first chord to all notes in the second chord. A graphical representation of this process is shown below. To the right of this illustration is the result of this process for four specific Bach compositions: (a) the chorale “Wir glauben all an einen Gott” (BWV 437); (b) Fugue 11 from the *Well-Tempered Clavier (WTC)*, Book I (BWV 856); (c) Prelude 9 from the *WTC*, Book II (BWV 878); and (d) Toccata in G major for harpsichord (BWV 916).
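As a rough illustration of this construction (a toy sketch, not code from the study), a note sequence can be converted into a weighted directed graph of transitions. The melody below is invented, and for simplicity chords are ignored:

```python
from collections import defaultdict

def build_note_network(note_sequence):
    """Represent a melody as a weighted directed graph: edges[(a, b)] counts
    transitions from note a to note b. Notes in different octaves
    (e.g. "C4" vs "C5") are distinct nodes; chords are ignored here."""
    edges = defaultdict(int)
    for a, b in zip(note_sequence, note_sequence[1:]):
        edges[(a, b)] += 1
    return dict(edges)

# An invented seven-note toy melody (not from an actual Bach score):
melody = ["C4", "E4", "G4", "E4", "C4", "G4", "C4"]
print(build_note_network(melody))
```

In the study's full construction, a chord-to-chord transition would additionally add an edge from every note of the first chord to every note of the second.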

The information exhibited in these graphs was quantified as the Shannon entropy of a random walk on the network. In particular, the contribution of the $i$-th node to the entropy is:

$$S_i \; = \; -\sum_{j=1}^n P_{i,j} \log P_{i,j},$$

where $P_{i,j}$ is the transition probability of going from node $i$ to node $j$. Then the entropy of the entire network is a weighted sum of the $S_i$:

$$S \; = \; \sum_{i=1}^n w_i S_i \; = \; -\sum_{i=1}^n w_i \sum_{j=1}^n P_{i,j} \log P_{i,j},$$

where the weights $w_i$ are the stationary distribution probabilities: $w_i$ is the long-run probability that a random walker occupies node $i$.
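A minimal numerical sketch of these two formulas (not the authors' code), assuming a row-stochastic transition matrix and an ergodic walk so that simple power iteration finds the stationary distribution:

```python
import math

def network_entropy(P, iters=200):
    """Entropy of a random walk on a network.
    P is a row-stochastic transition matrix (list of lists): P[i][j] is the
    probability of stepping from node i to node j. Node entropies S_i are
    weighted by the stationary distribution w, found here by power iteration
    (this assumes the walk is ergodic)."""
    n = len(P)
    w = [1.0 / n] * n
    for _ in range(iters):  # iterate w <- w P until convergence
        w = [sum(w[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Per-node entropy S_i = -sum_j P_ij log P_ij, with 0 log 0 taken as 0.
    S = [-sum(p * math.log(p) for p in row if p > 0) for row in P]
    return sum(w[i] * S[i] for i in range(n))

# Two-node example: the walker leaves each node with probability 1/2,
# so the entropy is log 2 ~ 0.693.
P = [[0.5, 0.5],
     [0.5, 0.5]]
print(network_entropy(P))
```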

Some additional information and complete technical details are in the published paper.

The researchers found that, indeed, compared with some other musical compositions they had studied, the Bach pieces tended to have higher information content. What was even more interesting was that the researchers found significant differences between the different classes of Bach compositions:

The chorales, typically meant to be sung by groups in ecclesiastical settings, are shorter and simpler diatonic pieces that display a markedly lower entropy than the rest of the compositions studied. By contrast, the toccatas, characterized by more complex chromatic sections that span a wider melodic range, have a much higher entropy. It is possible that the chorales’ functions of meditation, adoration, and supplication are best supported by predictability and hence low entropy, whereas the entertainment functions of the toccatas and preludes are best supported by unpredictability and hence high entropy.

Full details of the results are given in the published paper.

The researchers have clearly identified a very interesting and very effective technique to analyze, classify and compare musical compositions. Numerous questions for future research could be asked, some of which were suggested by the authors themselves in their concluding section. Here are a few of these potential research questions, including some due to the present author:

- The authors found that Bach’s music networks had a higher number of transitive triangular clusters, enabling them to be learned more efficiently than arbitrary transition structures. Are pieces with a larger number of these triangles also more appealing to a listener?
- How effective are these techniques in analyzing other classical composers?
- How has the information content of a specific composer changed over time?
- Within a single genre such as classical music, how has the information content of the music changed over time?
- How effective are these techniques in analyzing other genres of music, such as modern jazz, hip-hop and country?
- How do these results compare for various non-Western music genres, such as traditional Japanese music (ongaku), Cantonese opera, African tribal music, Tibetan throat singing and Scottish piobaireachd?
- How do human perceptions of music correlate with these measures?
- Do these results offer any insights into the human psychology of musical experience, such as the fundamental question of why humans have evolved to perform and value music?

Clearly an exciting future lies ahead in the realm of fusing mathematical network and information theory with the fine arts.

Overcoming experimenter bias in scientific research

Reproducibility has emerged as a major issue in numerous fields of scientific research, ranging from psychology, sociology, economics and finance to biomedicine, scientific computing and physics. Many of these difficulties arise from experimenter bias (also known as “selection bias”): consciously or unconsciously excluding, ignoring or adjusting certain data that do not seem to be in agreement with one’s preconceived hypothesis; or devising statistical tests post-hoc, namely AFTER data has already been collected and partially analyzed. By some estimates, at least 75% of published scientific papers in some fields contain flaws of this sort that could potentially compromise the conclusions.

Many fields are now undertaking aggressive steps to address such problems. As just one example of many that could be listed, the American pharmaceutical-medical device firm Johnson and Johnson has agreed to make detailed clinical data available to outside researchers, covering past and planned trials, in an attempt to avoid the temptation to only publish results of tests with favorable outcomes.

For further background details, see John Ioannidis’ 2014 article, Sonia Van Gilder Cooke’s 2016 article, Robert Matthews’ 2017 article, Clare Wilson’s 2022 article and Benjamin Mueller’s 2024 article, as well as this previous Math Scholar article, which summarizes several case studies.

Physics, thought by some to be the epitome of objective physical science, has not been immune to experimenter bias. One classic example is the history of measurements of the speed of light. The chart on the right shows numerous published measurements, dating from the late 1800s through 1983, when experimenters finally agreed to the international standard value 299,792,458 meters/second. (Note: Since 1983, the meter has been defined as the distance traveled by light in vacuum in 1/299,792,458 seconds.)

As can be seen in the chart, during the period 1870-1900, the prevailing value of the speed of light was roughly 299,900,000 m/s. But beginning with a measurement made in 1910 through roughly 1950, the prevailing value centered on 299,775,000 m/s. Finally, beginning in the 1960s, a new series of very carefully controlled experiments settled on the current value, 299,792,458 m/s. It is important to note that the currently accepted value was several standard deviations *below* the prevailing 1870-1900 value, and was also several standard deviations *above* the prevailing 1910-1950 value.

Why? Evidently investigators, faced with divergent values, tended to reject or ignore values that did not comport with the then-prevailing consensus figure. Or, as Luke Caldwell explains, “[R]esearchers were probably unconsciously steering their data to fit better with previous values, even though those turned out to be inaccurate. It wasn’t until experimenters had a much better grasp on the true size of their errors that the various measurements converged on what we now think is the correct value.”

In a February 2024 Scientific American article, Luke Caldwell described his team’s work to measure the electron’s electric dipole moment (EDM), in an attempt to better understand why we are made of matter and not antimatter. He described in some detail the special precautions that his team took to avoid experimenter error, and, especially, to avoid consciously or unconsciously filtering the data to reach some preconceived conclusion. It is worth quoting this passage at length:

Another source of systematic error is experimenter bias. All scientists are human beings and, despite our best efforts, can be biased in our thoughts and decisions. This fallibility can potentially affect the results of experiments. In the past it has caused researchers to subconsciously try to match the results of previous experiments. … [Caldwell mentions the speed of light controversy above.]

To avoid this issue, many modern precision-measurement experiments take data “blinded.” In our case, after each run of the experiment, we programmed our computer to add a randomly generated number—the “blind”—to our measurements and store it in an encrypted file. Only after we had gathered all our data, finished our statistical analysis and even mostly written the paper did we have the computer subtract the blind to reveal our true result.

The day of the unveiling was a nerve-racking one. After years of hard work, our team gathered to find out the final result together. I had written a computer program to generate a bingo-style card with 64 plausible numbers, only one of which was the true result. The other numbers varied from “consistent with zero” to “a very significant discovery.” Slowly, all the fake answers disappeared from the screen one by one. It’s a bit weird to have years of your professional life condensed into a single number, and I questioned the wisdom of amping up the stress with the bingo card. But I think it became apparent to all of us how important the blinding technique was; it was hard to know whether to be relieved or disappointed by the vanishing of a particularly large result that would have hinted at new, undiscovered particles and fields but also contradicted the results of previous experiments.

Finally, a single value remained on the screen. Our answer was consistent with zero within our calculated uncertainty. The result was also consistent with previous measurements, building confidence in them as a collective, and it improved on the best precision by a factor of two. So far, it seems, we have no evidence that the electron has an EDM.

Caldwell’s data blinding technique is quite remarkable. If some variation of this methodology were adopted in other fields of research, at least some of the problems with experimenter bias could be avoided.
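A toy version of the blinding protocol might look like the following (the offset range, seed and data are invented for illustration; Caldwell's team additionally stored the blind in an encrypted file):

```python
import random

def blind_measurements(raw_values, seed):
    """Add a single hidden random offset (the 'blind') to every measurement.
    The offset is stored separately and revealed only after the analysis
    is frozen, so the analysts cannot steer toward a preferred answer."""
    rng = random.Random(seed)
    blind = rng.uniform(-1.0, 1.0)  # hidden offset (range is illustrative)
    return [x + blind for x in raw_values], blind

def unblind(result, blind):
    """Subtract the blind from the final analyzed result."""
    return result - blind

# Invented measurements whose true mean is 0.13:
data, blind = blind_measurements([0.12, 0.15, 0.11, 0.14], seed=42)
blinded_mean = sum(data) / len(data)   # analysis done on blinded data
true_mean = unblind(blinded_mean, blind)
print(round(true_mean, 3))  # recovers the raw-data mean, 0.13
```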

The field of finance is deeply afflicted by experimenter bias, because of backtest overfitting, namely the usage of historical market data to develop an investment model, strategy or fund, especially where many strategy variations are tried on the same fixed dataset. Note that this is clearly an instance of the post-hoc probability fallacy, which in turn is a form of experimenter bias (more commonly termed “selection bias” in finance). Backtest overfitting has long plagued the field of finance and is now thought to be the leading reason why investments that look great when designed often disappoint when actually fielded to investors. Models, strategies and funds suffering from this type of statistical overfitting typically target the random patterns present in the limited in-sample test-set on which they are based, and thus often perform erratically when presented with new, truly out-of-sample data.

As an illustration, the authors of this AMS Notices article show that if only five years of daily stock market data are available as a backtest, then no more than 45 variations of a strategy should be tried on this data, or the resulting “optimal” strategy will be overfit, in the specific sense that the strategy’s Sharpe Ratio (a standard measure of financial return) is likely to be 1.0 or greater just by chance, even though the true Sharpe Ratio may be zero or negative.
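This selection-bias effect is easy to reproduce in simulation. The sketch below (illustrative parameters, not the AMS Notices authors' code) draws "strategies" with no true edge at all and reports the best in-sample annualized Sharpe ratio found; with 45 variants on roughly five years of daily data, the best ratio often approaches 1.0 purely by chance:

```python
import math
import random

def best_insample_sharpe(n_strategies, n_days, seed=1):
    """Best annualized Sharpe ratio found among random 'strategies' whose
    true expected daily return is zero: any large value is pure luck."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_strategies):
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]  # zero-skill returns
        mean = sum(daily) / n_days
        var = sum((x - mean) ** 2 for x in daily) / (n_days - 1)
        best = max(best, mean / math.sqrt(var) * math.sqrt(252))  # annualize
    return best

# Roughly five years of daily data (~1260 trading days), 45 variants:
print(best_insample_sharpe(45, 1260))
```

Increasing `n_strategies` only inflates the best in-sample Sharpe ratio further, even though every strategy's true Sharpe ratio is zero.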

The potential for backtest overfitting in the financial field has grown enormously in recent years with the increased utilization of computer programs to search a space of millions or even billions of parameter variations for a given model, strategy or fund, and then to select only the “optimal” choice for academic journal publication or market implementation. The sobering consequence is that a significant portion of the models, strategies and funds employed in the investment world, including many of those marketed to individual investors, may be merely statistical mirages.

Some commonly used techniques to compensate for backtest overfitting, if not used correctly, are also suspect. One example is the “hold-out method” — developing a model or investment fund based on a backtest of a certain date range, then checking the result with a different date range. However, those using the hold-out method may iteratively tune the parameters for their model until the score on the hold-out data, say measured by a Sharpe ratio, is impressively high. But these repeated tuning tests, using the same fixed hold-out dataset, are themselves tantamount to backtest overfitting.

One dramatic visual example of backtest overfitting is shown in the graph at the right, which displays the mean excess return (compared to benchmarks) of newly minted exchange-traded index-linked funds, both in the months of design prior to submission to U.S. Securities and Exchange Commission for approval, and in the months after the fund was actually fielded. The “knee” in the graph at 0 shows unmistakably the difference between statistically overfit designs and actual field experience.

For additional details, see How backtest overfitting in finance leads to false discoveries, which is condensed from this freely available SSRN article.

The p-test, which was introduced by the British statistician Ronald Fisher in the 1920s, assesses whether the result of an experiment is more extreme than what one would expect given the null hypothesis. However, the p-test, especially when used alone, has significant drawbacks, as Fisher himself warned. To begin with, the typically used level of p = 0.05 is not a particularly compelling result. In any event, it is highly questionable to reject a result because its p-value is 0.051, yet accept it as significant because its p-value is 0.049.

The prevalence of the classic p = 0.05 value has led to the egregious practice that Uri Simonsohn of the University of Pennsylvania has termed p-hacking: proposing numerous varied hypotheses until a researcher finds one that meets the 0.05 level. Note that this is a classic multiple testing fallacy of statistics, which in turn is a form of experimenter bias: perform enough tests and one is bound to pass any specific level of statistical significance. Such suspicions are justified given the results of a study by Jelte Wilcherts of the University of Amsterdam, who found that researchers whose results were close to the p = 0.05 level of significance were less willing to share their original data than were others that had stronger significance levels (see also this summary from Psychology Today).
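The arithmetic behind the multiple-testing fallacy is simple: for $k$ independent tests of true null hypotheses, the chance that at least one crosses the p = 0.05 threshold is $1 - 0.95^k$. A quick check:

```python
def prob_false_positive(k, alpha=0.05):
    """Chance that at least one of k independent tests of true null
    hypotheses reaches significance at level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20):
    print(k, round(prob_false_positive(k), 3))
```

With 20 tests, the chance of at least one spurious "significant" result is already about 64%.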

Along this line, it is clear that a sole focus on p-values can muddle scientific thinking, confusing significance with size of the effect. For example, a 2013 study of more than 19,000 married persons found that those who had met their spouses online were less likely to divorce (p < 0.002) and more likely to report higher marital satisfaction (p < 0.001) than those who met in other ways. Impressive? Yes, but the divorce rate for online couples was 5.96%, only slightly down from 7.67% for the larger population, and the marital satisfaction score for these couples was 5.64 out of 7, only slightly better than 5.48 for the larger population (see also this Nature article).

Perhaps a more important consideration is that p-values, even if reckoned properly, can easily mislead. Consider the following example, which is taken from a paper by David Colquhoun: Imagine that we wish to screen persons for potential dementia. Let’s assume that 1% of the population has dementia, and that we have a test for dementia that is 95% accurate (i.e., it is accurate with p = 0.05), in the sense that 95% of persons without the condition will be correctly diagnosed, and assume also that the test is 80% accurate for those who do have the condition. Now if we screen 10,000 persons, 100 presumably will have the condition and 9900 will not. Of the 100 who have the condition, 80% or 80 will be detected and 20 will be missed. Of the 9900 who do not, 95% or 9405 will be cleared, but 5% or 495 will be incorrectly tested positive. So out of the original population of 10,000, 575 will test positive, but 495 of these 575, or 86%, are false positives.
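Colquhoun's arithmetic can be checked directly:

```python
# Colquhoun's screening example: 1% prevalence, 80% sensitivity,
# 95% specificity, 10,000 people screened.
population = 10_000
prevalence = 0.01
sensitivity = 0.80    # fraction of true cases the test detects
specificity = 0.95    # fraction of non-cases the test correctly clears

with_condition = population * prevalence                  # 100 people
without_condition = population - with_condition           # 9900 people
true_positives = with_condition * sensitivity             # 80 detected
false_positives = without_condition * (1 - specificity)   # ~495 false alarms
total_positives = true_positives + false_positives        # ~575 positives
print(false_positives / total_positives)  # ~0.861: 86% of positives are false
```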

Needless to say, a false positive rate of 86% is disastrously high. Yet this is entirely typical of many instances in scientific research where naive usage of p-values leads to surprisingly misleading results. In light of such problems, the American Statistical Association (ASA) has issued a Statement on statistical significance and p-values. The statement concludes:

Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean. No single index should substitute for scientific reasoning.

Several suggestions for overcoming experimenter bias have been mentioned in the text above, and numerous others are in the linked references above. There is no one silver bullet. What is clear is that researchers from all fields of research need to take experimenter bias and the larger realm of systematic errors more seriously. As Luke Caldwell wrote in the *Scientific American* article mentioned above:

Gathering the data set was the quick part. The real challenge of a precision experiment is the time spent looking for systematic errors — ways we might convince ourselves we had measured an eEDM when in fact we had not. Precision-measurement scientists take this job very seriously; no one wants to declare they have discovered new particles only to later find out they had only precisely measured a tiny flaw in their apparatus or method. We spent about two years hunting for and understanding such flaws.

Progress in troubled times: Good news in science, medicine and society

Nonetheless, progress is still being made on many fronts. If anything, 2023 produced a bumper crop of startling new advances, particularly in health, medicine, physics, astronomy, computer science and artificial intelligence. Many of these advances, in turn, are based on the unstoppable forward march of Moore’s Law and computer technology — since 2000, aggregate memory and computer power of comparable devices have increased by factors of approximately 1,000,000. Genome sequencing technology has advanced even faster — sequencing a full human genome cost $3 billion in 2003, but currently costs only about $300, a 10,000,000-fold reduction.

Further, contrary to popular perception, progress has also been made in several metrics of social well-being. For example, after rising worldwide during the Covid-19 pandemic, crime rates fell sharply in most major cities in 2023. And global poverty continues its remarkable decline, with over 100,000 persons escaping dire poverty *per day* (on average) since the 1980s.

Granted, progress has certainly not been universal, and many dire challenges remain, but we should not lose sight of the progress that has been achieved. Here are just a few of the many items of progress worth celebrating, most of them in the past year (2023). Numerous others are listed HERE, from which some of the items below were gleaned.

- Sickle cell CRISPR ‘cure’ is the start of a revolution in medicine.
- WHO announces drop in malaria infections, deaths after vaccine rollout.
- India witnessed 85.1% decline in malaria cases & 83.36% decline in deaths during 2015-2022.
- Genetically modifying T-cells cuts blood cancer progression by 74%, trial finds.
- Lung cancer pill cuts risk of death by half, says ‘thrilling’ study.
- Huge leap in breast cancer survival rate [English women diagnosed in 1993–99 had a 14.4% risk of dying within 5 years; this fell to 4.9% for women diagnosed in 2010–15].
- Some deaf children in China can hear after gene therapy treatment.
- CRISPR gene editing shown to permanently lower high cholesterol.
- CRISPR 2.0: a new wave of gene editors heads for clinical trials.
- Kenya achieves remarkable 68% decline in AIDS-related fatalities.
- ‘A great day for the country’: Uganda declares an end to Ebola outbreak.
- FDA approves first vaccine for RSV, a moment six decades in the making.
- A new class of drugs for weight loss could end obesity.
- Drug startup aims to cure blindness by developing medications in space.
- Moderna’s mRNA cancer vaccine works even better than thought.
- US life expectancy rebounded in 2022 but not back to pre-pandemic levels.

- DeepMind AI creates algorithms that sort data faster than those built by people.
- A.I. Is Coming for Mathematics, Too.
- New AI translates 5,000-year-old cuneiform tablets instantly.
- The race of the AI labs heats up.
- Nvidia predicts AI models one million times more powerful than ChatGPT within 10 years.
- Teachers and students warm up to ChatGPT.
- AI Reduces Timescale of Early Drug Development to Weeks.
- To Teach Computers Math, Researchers Merge AI Approaches.
- A Very Big Small Leap Forward in Graph Theory.
- Hobbyist Finds Math’s Elusive ‘Einstein’ Tile.

- Amazing images from James Webb telescope, two years after launch.
- The Cosmos Is Thrumming With Gravitational Waves, Astronomers Find.
- JWST Discovers Enormous Distant Galaxies That Should Not Exist.
- Mirror-Image Supernova Yields Surprising Estimate of Cosmic Growth.
- Dark Matter Hunters Need Fresh Answers.
- A New Experiment Casts Doubt on the Leading Theory of the Nucleus.
- The Story of Our Universe May Be Starting to Unravel.
- A Possible Crisis in the Cosmos Could Lead to a New Understanding of the Universe.
- The Early Universe Was Bananas.

- Genome Editing Used to Create Disease Resistant Rice.
- Scientists Unlock Key to Drought-Resistant Wheat Plants with Longer Roots.
- Climate change: Fossil fuel emissions from electricity set to fall.
- Bhutan announces a “milestone achievement” with a 39.5% increase in snow leopard numbers.
- Tiger populations grow in India and Bhutan.

- After Rise in Murders During the Pandemic, a Sharp Decline in 2023.
- The Post-Pandemic Murder Wave Is Cresting.
- Crime falls to lowest level on record, ONS [UK’s Office for National Statistics] says.
- March 2023 global poverty update from the World Bank [global poverty, namely US$2.15 per person per day or less in 2017 dollars, has declined from 38% of world population in 1990 to 8.5% in 2023].
- U.S. Approval of Interracial Marriage at New High of 94% [in 1958, only 4% of Americans approved of marriage between Black and White people; in 2023, 94% approved].
- 6 ways the lives of girls are different today than they were a decade ago [more education, fewer pregnancies, fewer HIV infections, fewer child marriages and more].
- Progress on girls’ access to education [50 million more girls have been enrolled in school globally since 2015].

Aliens made this rock: The post-hoc probability fallacy in biology, finance and cosmology

While out hiking, I found this rock. The following table gives measurements made on the rock. The first two rows give the overall length and width of the rock. Each of the next six rows gives thickness measurements, made on a 3cm x 6cm grid of points from the top surface. All measurements are in millimeters:

| Measurement or row | Column 1 | Column 2 | Column 3 |
|--------------------|----------|----------|----------|
| Length             | 105.0    |          |          |
| Width              | 48.21    |          |          |
| Row 1              | 35.44    | 35.38    | 36.54    |
| Row 2              | 38.06    | 38.27    | 38.55    |
| Row 3              | 38.02    | 39.53    | 39.29    |
| Row 4              | 38.66    | 40.50    | 41.96    |
| Row 5              | 39.40    | 43.48    | 43.31    |
| Row 6              | 39.58    | 41.83    | 43.07    |

This is a set of 20 measurements, each given to four significant figures, for a total of 80 digits. Among all rocks of roughly this size, the probability of a rock appearing with this particular set of measurements is thus one in 10^{80}. Note that this probability is so remote that even if the surfaces of each of the ten planets estimated to orbit each of 100 billion stars in the Milky Way were examined in detail, and this were repeated for each of the estimated 100 billion galaxies in the visible universe, it is still exceedingly unlikely that a rock with this exact set of measurements would ever be found. Thus this rock could not have appeared naturally, and must have been created by space aliens…

What is the fallacy in the above argument? First of all, modeling each measurement as a random variable of four equiprobable digits, and then assuming all measurements are independent (so that we can blithely multiply probabilities) is a very dubious reckoning. In real rocks, the measurement at one point is constrained by physics and geology to be reasonably close to that of nearby points. Presuming that every instance in the space of 10^{80} theoretical digit strings is equally probable as a set of rock measurements is an unjustified and clearly invalid assumption. Thus the above reckoning must be rejected on this basis alone.

**Key point:** It is important to note that the post-hoc fallacy does not just weaken a probability or statistical argument. *In most cases, the post-hoc fallacy completely nullifies the argument*. The correct reckoning is “What is the probability of X occurring, given that X has been observed to occur,” which of course is unity. In other words, the laws of probability, when correctly applied to a post-hoc phenomenon, can say nothing one way or the other about the likelihood of the event.

In general, probability reckonings based solely or largely on combinatorial enumerations of theoretical possibilities have no credibility when applied to real-world problems. And reckonings where a statistical test or probability calculation is devised or modified *after the fact* are invalid and should be immediately rejected — some outcome had to occur, and the laws of probability by themselves cannot help to determine its likelihood.

One example of the post hoc probability fallacy in biology can be seen in the attempt by some evolution skeptics (see this Math Scholar article for some references) to claim that the human alpha-globin molecule, a component of hemoglobin that performs a key oxygen transfer function in blood, could not have arisen by “random” evolution. These writers argue that since human alpha-globin is a protein chain based on a sequence of 141 amino acids, and since there are 20 different amino acids common in living systems, the “probability” of selecting human alpha-globin at random is one in 20^{141}, or one in approximately 10^{183}. This probability is so tiny, so they argue, that even after millions of years of random molecular trials covering the entire Earth’s surface, no human alpha-globin protein molecule would ever appear.
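The skeptics' headline figure is just an arithmetic conversion, which is easy to verify: $\log_{10} 20^{141} = 141 \log_{10} 20 \approx 183$.

```python
import math

# The skeptics' reckoning: 20 possible amino acids at each of 141 positions.
n_sequences = 20 ** 141
print(round(math.log10(n_sequences)))  # 183, i.e. about 10^183 possibilities
```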

But this line of reasoning is a dead-ringer for the post-hoc probability fallacy. Note that this probability reckoning was performed *after the fact*, on a single very limited dataset (the human alpha-globin sequence) that has been known in the biology literature for decades. Some sequence had to appear, and the fact that the particular sequence found today in humans was the end result of the course of evolution provides no guidance, by itself, as to the probability of its occurrence.

Further, just like the rock measurements above, the enumeration of combinatorial possibilities in this calculation, devoid of empirical data, has no credibility. Some alpha-globin sequences may be highly likely to be realized while others may be biologically impossible. It may well be that there is nothing special whatsoever about the human alpha-globin sequence, as evidenced, for example, by the great variety in alpha-globin molecules seen across the biological kingdom, all of which perform a similar oxygen transfer function. But there is no way to know for sure, since we have only one example of human evolution. Thus the reckoning used here, namely enumerating combinatorial possibilities and taking the reciprocal to obtain a probability figure, is unjustified and invalid.

In any event, it is clear that the impressive-sounding probability figure claimed by these writers is a vacuous arithmetic exercise, with no foundation in real empirical biology. For a detailed discussion of these issues, see Do probability arguments refute evolution?.

The field of finance is deeply afflicted by the post-hoc fallacy, because of “backtest overfitting,” namely the use of historical market data to develop an investment model, strategy or fund, where many variations are tried on the same fixed dataset (note that this is unavoidably a post-hoc reckoning). Backtest overfitting has long plagued the field of finance and is now thought to be the leading reason why investments that look great when designed often disappoint when actually fielded to investors. Models, strategies and funds suffering from this type of statistical overfitting typically fit the random patterns present in the limited in-sample dataset on which they are based, and thus often perform erratically when presented with new, truly out-of-sample data.

As an illustration, the authors of this AMS Notices article show that if only five years of daily stock market data are available as a backtest, then no more than 45 variations of a strategy should be tried on this data, or the resulting strategy will be overfit, in the specific sense that the strategy’s Sharpe Ratio (a standard measure of financial return) is likely to be 1.0 or greater just by chance, even though the true Sharpe Ratio may be zero or negative.
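A minimal Monte Carlo sketch illustrates the effect (assuming zero-mean Gaussian daily returns and the usual 252-trading-day annualization, not the authors' exact model): every "strategy" below is pure noise with a true Sharpe ratio of zero, yet the best of 45 tries on the same five-year dataset typically shows an impressive in-sample Sharpe ratio.

```python
import math
import random
import statistics

random.seed(42)

def annualized_sharpe(daily_returns):
    """Annualized Sharpe ratio of a daily return series (risk-free rate 0)."""
    mu = statistics.mean(daily_returns)
    sigma = statistics.stdev(daily_returns)
    return (mu / sigma) * math.sqrt(252)  # ~252 trading days per year

DAYS = 5 * 252   # five years of daily data
TRIALS = 45      # number of strategy variations tried on the same data

# Each trial is a fresh noise series: the true mean return is zero,
# so any nonzero in-sample Sharpe ratio is purely an artifact of chance.
sharpes = []
for _ in range(TRIALS):
    returns = [random.gauss(0.0, 0.01) for _ in range(DAYS)]
    sharpes.append(annualized_sharpe(returns))

print(f"Best in-sample Sharpe over {TRIALS} tries: {max(sharpes):.2f}")
```

With these parameters the best in-sample Sharpe ratio typically comes out near 1.0, consistent with the threshold discussed in the AMS Notices article, even though the true Sharpe ratio of every variation is zero.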

The potential for backtest overfitting in the financial field has grown enormously in recent years with the increased utilization of computer programs to search a space of millions or even billions of parameter variations for a given model, strategy or fund, and then to select only the “optimal” choice for publication or market implementation. Compounding the problem is the failure of most academic journals in the field to require authors to disclose the extent of their computer search and optimization. The sobering consequence is that a significant portion of the models, strategies and funds employed in the investment world, including many of those marketed to individual investors, may be merely statistical mirages.

Some commonly used techniques to compensate for backtest overfitting, if not used correctly, are themselves tantamount to overfitting. One example is the “hold-out method” — developing a model or investment fund based on a backtest of a certain date range, then checking the result with a different date range. However, those using the hold-out method may iteratively tune the parameters for their model until the score on the hold-out data, say measured by a Sharpe ratio, is impressively high. But these repeated tuning tests, using the same fixed hold-out dataset, are themselves tantamount to backtest overfitting (and thus subject to the post-hoc fallacy).

One dramatic visual example of backtest overfitting is shown in the graph at the right, which displays the mean excess return (compared to benchmarks) of newly minted exchange-traded index-linked funds, both in the months of design prior to submission to the SEC for approval, and in the months after the fund was fielded. The “knee” in the graph at 0 shows unmistakably the difference between statistically overfit designs and actual field experience.

For additional details, see How backtest overfitting in finance leads to false discoveries, which appeared in the December 2021 issue of the British journal *Significance*. This *Significance* article is condensed from the following manuscript, which is freely available from SSRN: Finance is Not Excused: Why Finance Should Not Flout Basic Principles of Statistics.

Most readers are likely familiar with Fermi’s paradox: There are at least 100 billion stars in the Milky Way, with many if not most now known to host planets. Once a species achieves a level of technology that is at least equal to ours, so that they have become a space-faring civilization, within a few million years (an eyeblink in cosmic time) they or their robotic spacecraft could fully explore the Milky Way. So why do we not see any evidence of their existence, or even of their exploratory spacecraft? For additional details, see this Math Scholar article.

Most “solutions” to Fermi’s paradox, including, sadly, some promulgated by prominent researchers who should know better, are easily defeated. For example, explanations such as “They are under strict orders not to communicate with a new civilization such as Earth,” or “They have lost interest in scientific research, exploration and expansion,” or “They have no interest in a primitive, backward society such as ours,” fall prey to a diversity argument: In any vast, diverse society (an essential prerequisite for advanced technology), there will be exceptions and nonconformists to any rule. Thus claims that “all extraterrestrials are like X” have little credibility, no matter what “X” is. It is deeply ironic that while most scientific researchers and others would strenuously reject stereotypes of religious, ethnic or national groups in human society, many seem willing to hypothesize sweeping, ironclad stereotypes for extraterrestrial societies.

Similarly, dramatic advances in human technology over the past decade, including new space exploration vehicles, new energy sources, state-of-the-art supercomputers, quantum computing, robotics and artificial intelligence, are severely undermining the long-standing assumption that space exploration is fundamentally too difficult.

One of the remaining explanations is that Earth is an exceedingly rare planet (possibly unique in the Milky Way), with a lengthy string of characteristics fostering a long-lived biological regime, which enabled the unlikely rise of intelligent life. John Gribbin, for example, writes “They are not here, because they do not exist. The reasons why we are here form a chain so improbable that the chance of any other technological civilization existing in the Milky Way Galaxy at the present time is vanishingly small.” [Gribbin2011, pg. 205]. Among other things, researchers note that:

- Very few (or possibly none) of the currently known exoplanets are likely to be truly habitable in *all* key factors (temperate, low radiation, water, dry land, position in galaxy, etc.). Exoplanets around brown dwarfs, for example, would be frequently bathed in sterilizing radiation. Thus Earth’s role as a cradle for life may well have been exceedingly rare.
- Even given a habitable environment, the origin of life on Earth is not well understood, in spite of decades of study, so it might well have been an exceedingly improbable event not repeated anywhere else in the Milky Way.
- Earth’s *continual* habitability over four billion years, facilitated in part by plate tectonics, geomagnetism and the ozone shield, might well be exceedingly rare.
- Even after life started, numerous key steps (photosynthesis, complex cells, complex structures) were required before advanced life could appear on Earth; each of these required many millions or even billions of years, suggesting that they may have been highly improbable.
- Even after the rise of complex creatures, the evolution of human-level intelligence may well be a vanishingly rare event; on Earth this happened only once among numerous continental “experiments” over the past 100 million years.

Note that any one of these five items may constitute a “great filter,” i.e., a virtually insuperable obstacle. Compounded together, they suggest that the origin and rise of human-level technological life on Earth may well have been a singular event in the cosmos, unlikely to be repeated anywhere else in the Milky Way, if not in the entire universe. Thus the “rare Earth” explanation of Fermi’s paradox is growing in credibility.

On the other hand, analyses in this arena are hobbled by the post-hoc probability fallacy: We only have one real data point, namely the rise of human technological civilization on a single planet, Earth, and so we have no way to rigorously assess the probability of our existence. It may be that the origin and rise of intelligent life is inevitable on any suitably hospitable planet, and we just need to keep trying to finally contact another similarly advanced species. Or it may be that the rise of a species with enough intelligence and technology to rigorously pose the question of its existence is merely a post-hoc selection effect — if we didn’t live on a hospitable planet that harbored the origin of life and was *continuously* habitable over billions of years, thus allowing the evolution of life that progressed to advanced technology, we would not be here to pose the question.

As Charles Lineweaver of the Australian National University has observed [Lineweaver2024b]:

Our existence on Earth can tell us little about the probability of the evolution of human-like intelligence in the Universe because even if this probability were infinitesimally small and there were only one planet with the kind of intelligence that can ask this question, we the question-askers would, of necessity, find ourselves on that planet.

For additional details, see Where are the extraterrestrials? Fermi’s paradox, diversity and the origin of life.

The dilemma of whether our existence is “special” in some sense extends to the universe itself.

For several decades, researchers in the fields of physics and cosmology have puzzled over deeply perplexing indications, many of them highly mathematical in nature, that the universe seems inexplicably well-tuned to facilitate the evolution of atoms, complex molecular structures and sentient creatures. Some of these “cosmic coincidences” include the following (these and numerous others are presented and discussed in detail in a recent book by Lewis and Barnes):

- If the strong force were slightly stronger or slightly weaker (by just 1% in either direction), then there would be no carbon or any heavier elements anywhere in the universe, and thus no carbon-based life forms to contemplate this intriguing fact.
- Had the weak force been somewhat weaker, the amount of hydrogen in the universe would be greatly decreased, starving stars of fuel for nuclear energy and leaving the universe a cold and lifeless place.
- The neutron’s mass is very slightly more than the combined mass of a proton, an electron and a neutrino. But if its mass were lower by 1%, then all isolated protons would decay into neutrons, and no atoms other than hydrogen, helium, lithium and beryllium could form.
- There is a very slight anisotropy in the cosmic microwave background radiation (roughly one part in 100,000), which is just enough to permit the formation of stars and galaxies. If this anisotropy had been slightly larger or smaller, stable planetary systems could not have formed.
- The cosmological constant paradox derives from the fact that when one calculates, based on known principles of quantum mechanics, the “vacuum energy density” of the universe, one obtains the incredible result that empty space “weighs” 10^{93} grams per cubic centimeter, which is in error by 120 orders of magnitude from the observed level. Physicists thought that perhaps when the contributions of the other known forces are included, all terms would cancel out to exactly zero as a consequence of some heretofore unknown physical principle. But these hopes were shattered with the 1998 discovery that the expansion of the universe is accelerating, which implies that the cosmological constant must be slightly positive. Curiously, this observation is in accord with a prediction made by physicist Steven Weinberg in 1987, who argued from basic principles that the cosmological constant must be nonzero but within roughly one part in 10^{120} of zero, or else the universe either would have dispersed too fast for stars and galaxies to have formed, or would have recollapsed upon itself long ago. Numerous “solutions” have been proposed for the cosmological constant paradox (Lewis and Barnes mention eight — see pg. 163-164), but they all fail, rather miserably.
- A similar coincidence has come to light recently in the wake of the 2012 discovery of the Higgs boson at the Large Hadron Collider. The Higgs boson was found to have a mass of 126 billion electron volts (i.e., 126 GeV). However, a calculation of interactions with other known particles yields a contribution of some 10^{19} GeV. This means that the bare mass of the Higgs boson must almost exactly cancel this enormous number, leaving only 126 GeV, a massive and unexplained cancelation. This difficulty is known as the “hierarchy” problem.
- General relativity allows the space-time fabric of the universe to be open (extending forever, like an infinite saddle), closed (like the surface of a sphere), or flat. The latest measurements confirm that the universe is flat to within 1%. But looking back to the first few minutes of the universe at the big bang, this means that the universe must then have been flat to within one part in 10^{15}. The cosmic inflation theory was proposed by Alan Guth and others around 1980 to explain this and some other phenomena, but recently even some of inflation’s most devoted proponents have acknowledged that the theory is in deep trouble and may have to be substantially revised.
- The overall entropy (disorder) of the universe is, in the words of Lewis and Barnes, “freakishly lower than life requires.” After all, life requires, at most, one galaxy of highly ordered matter to create chemistry and life on a single planet. Extrapolating back to the big bang only deepens this puzzle.

Numerous attempts at explanations have been proposed over the years to explain these difficulties. One of the more widely accepted hypotheses is the multiverse, combined with the anthropic principle. The theory of inflation, mentioned above, suggests that our universe is merely one pocket that separated from many others in the very early universe. Similarly, string theory suggests that our universe is merely one speck in an enormous landscape of possible universes, by one count 10^{500} in number, each corresponding to a different Calabi-Yau manifold.

Thus, the thinking goes, we should not be surprised that we find ourselves in a universe that has somehow beaten the one-in-10^{120} odds to be life-friendly (to pick just the cosmological constant paradox), because it had to happen somewhere, and, besides, if our universe were not life-friendly, then we would not be here to talk about it. In other words, these researchers propose that the multiverse (or the “cosmic landscape”) actually exists in some sense, but acknowledge that the vast majority of these universes are utterly sterile — either very short-lived or else completely devoid of atoms or other structures, much less sentient living organisms like us contemplating the meaning of their existence. We are just lucky.

But other researchers are very reluctant to adopt such reasoning. After all, we have no evidence of these other universes, and as yet we have no conceivable means of rigorously calculating the “probability” of any possible universe outcome, including ours (this is known as the “measure problem” of cosmology). As with Fermi’s paradox, by definition we have only one universe to observe and analyze, and thus, by the post-hoc probability fallacy, simple-minded attempts to reckon “probabilities” are doomed to failure.

For additional details, see Is the universe fine-tuned for intelligent life?.

A good introduction to the post-hoc probability fallacy for the general reader can be found in cognitive psychologist Steven Pinker’s book *Rationality: What It Is, Why It Seems Scarce, Why It Matters*. Pinker likens the post-hoc fallacy to the following joke:

A man tries on a custom suit and says to the tailor, “I need this sleeve taken in.” The tailor says, “No, just bend your elbow like this. See, it pulls up the sleeve.” The customer says, “Well, OK, but when I bend my elbow, the collar goes up the back of my neck.” The tailor says, “So? Raise your head up and back. Perfect.” The man says, “But now the left shoulder is three inches lower than the right one!” The tailor says, “No problem. Bend at the waist and then it evens out.” The man leaves the store wearing the suit, his right elbow sticking out, his head craned back, his torso bent to the left, walking with a herky-jerky gait. A pair of pedestrians pass him by. The first says, “Did you see that poor disabled guy? My heart aches for him.” The second says, “Yeah, but his tailor is a genius — the suit fits him perfectly!”

The operation of sorting a dataset is one of the most fundamental of all operations studied in computer science. Literally trillions of sort operations are performed each day worldwide, more if one counts operations where a relatively small set of elements are merged into a larger set.

Many sorting algorithms are in use, including special routines for datasets of a certain size, and other routines optimized for specific hardware platforms and types of data.

One relatively simple algorithm, which is actually quite efficient, is the “quicksort” algorithm. It may be simply stated as follows: Given a set of elements (integers, real numbers or alphabetical data, for the sake of illustration), select some element as the “pivot.” It typically does not matter much which one is selected — some implementations select an element at random. Then partition the set, moving all elements that compare less than or equal to the pivot to the left, and all those that compare greater than the pivot to the right. Now apply this same scheme to each of the two subsets, continuing until all sets have only one element, so that the set is completely ordered. See the illustration of the quicksort algorithm to the right.
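The steps just described can be sketched in a few lines of Python (a minimal three-way-partition version for illustration; production implementations sort in place and add many refinements):

```python
import random

def quicksort(items):
    """Sort a list using the quicksort scheme described above."""
    if len(items) <= 1:
        return items  # a set with at most one element is already ordered
    pivot = random.choice(items)  # some implementations pick the pivot at random
    left = [x for x in items if x < pivot]
    middle = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    # Apply the same scheme to each of the two subsets
    return quicksort(left) + middle + quicksort(right)

print(quicksort([33, 10, 55, 71, 29, 3, 18]))  # [3, 10, 18, 29, 33, 55, 71]
```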

Before continuing, it is worth reviewing some of the remarkable successes achieved by the DeepMind organization, a research subsidiary of Alphabet, the parent company of Google, headquartered in London, UK.

The ancient Chinese game of Go is notoriously complicated, with strategies that can only be described in vague, subjective terms. Many observers did not expect Go-playing computer programs to beat the best human players for many years, if ever. Then in May 2017, a computer program named “AlphaGo,” developed by DeepMind, defeated Ke Jie, a 19-year-old Chinese Go master thought to be the world’s best human Go player [NY Times]. Later that same year, a new DeepMind program named “AlphaGo Zero” defeated the earlier AlphaGo program 100 games to zero. After 40 days of training by playing games with itself, AlphaGo Zero was as far ahead of Ke Jie as Ke Jie is ahead of a good amateur player [Quanta Magazine].

Proteins are the workhorses of biology. Examples in human biology include actin and myosin, the proteins that enable muscles to work, and hemoglobin, the basis of red blood that carries oxygen to cells. Each protein is specified as a sequence of amino acids (drawn from the 20 amino acids common in living systems), typically several hundred long. Sequencing proteins is now fairly routine, but the key to biology is the three-dimensional shape of the protein — how a protein “folds” [Nature]. Protein shapes can be investigated experimentally, using x-ray crystallography, but this is an expensive, error-prone and time-consuming laboratory operation, so in recent years researchers have been pursuing entirely computer-based solutions.

Given the daunting challenge and importance of the protein folding problem, in 1994 a community of researchers in the field organized a biennial competition known as Critical Assessment of Protein Structure Prediction (CASP) [CASP]. In 2018, a program called AlphaFold, developed by DeepMind, won the competition. For the 2020 CASP competition, the DeepMind team developed a new program, known as AlphaFold 2 [AlphaFold 2], which achieved a 92% average score, far above the 62% achieved by the second-best program in the competition. “It’s a game changer,” exulted German biologist Andrei Lupas. “This will change medicine. It will change research. It will change bioengineering. It will change everything.” [Scientific American].

Building on these earlier successes, in 2022 researchers at DeepMind turned their tools to the analysis of matrix multiplication algorithms. They formulated the problem as a game, called TensorGame, where at each step the player selects how to combine different entries of the matrices to produce a matrix product. The scheme assigns a score for each selection, based on the number of operations required to reach a correct result.

To find optimal solutions to this problem, the researchers applied a “deep reinforcement learning” algorithm. The resulting program, called AlphaTensor, is a specialized neural network system employing a similar underlying design to the above-mentioned Go-playing and protein-folding projects. The scheme employs a single agent to decompose matrix multiplication tensors of different sizes, which permits “learned” matrix multiplication techniques to be shared across different “players.” Full details of the scheme are described in an October 2022 Nature article by the DeepMind researchers.

The AlphaTensor program quickly discovered many of the existing matrix multiplication algorithms, including Strassen’s original algorithm and numerous others. But it also discovered several new algorithms not previously known in the literature. For example, for the product of two $4n \times 4n$ matrices, the previously most efficient algorithm was simply to apply Strassen’s algorithm recursively twice. This approach yields the result in $49$ block multiplications of size $n \times n$. AlphaTensor found a scheme that produces the result in $47$ block multiplications (although this result is valid only for matrices with binary elements). The DeepMind paper presented this scheme in full — it required a full page of small type. Similarly, for the product of two $5n \times 5n$ matrices, AlphaTensor found a scheme that produces the result in $96$ block multiplications of size $n \times n$, compared with $98$ in the literature (with the same restriction as mentioned above for $4n \times 4n$ matrices). Several other new results are listed in their paper, including three cases where their algorithms are applicable for standard real or complex matrices as well as for binary matrices.
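For concreteness, here is Strassen's original $2 \times 2$ scheme, which multiplies two matrices with seven multiplications instead of the naive eight (AlphaTensor's 47-multiplication scheme for the $4 \times 4$ case is far too large to reproduce here):

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's 7 multiplications."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    # The seven Strassen products
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine into the four entries of the product
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The same seven-product recipe works with $n \times n$ matrix blocks in place of scalars, which is why applying it recursively twice to a $4n \times 4n$ product costs $7 \times 7 = 49$ block multiplications, the count that AlphaTensor improved to $47$.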

The DeepMind researchers note that their results are focused on finding algorithms that minimize the total number of element-wise arithmetic operations in a matrix multiplication task. But the same AlphaTensor technology could also be used to optimize for other criteria, such as numerical stability or energy usage. And more generally, their overall methodology could be applied to a broad range of other mathematical problems as well. For additional details, see this earlier Math Scholar article.

DeepMind’s latest achievement is even more remarkable: efficient algorithms for sorting and hashing that are in some cases even more efficient than the best human efforts. In particular, the DeepMind researchers applied their general strategies, used for example in AlphaZero, to sorting, leading to a system they have dubbed AlphaDev.

The DeepMind researchers first applied AlphaDev to very small datasets, with as few as five elements. The program combined both deliberation and “intuition” to select its moves, as in playing Go or some other board game. At each decision point, the program may select among several types of actions, such as comparing values, moving values, or branching to a different part of the program. After each step, it evaluates its performance and receives a “reward” based on how many items in the dataset were sorted correctly. The program continues until either the dataset is completely sorted or the program exceeds some preset length limit.
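As a toy illustration of the kind of short, fixed-length routine being searched for (a hypothetical Python sketch; AlphaDev actually works at the level of assembly instructions), here is a three-element sort built from explicit compare-and-swap steps:

```python
def sort3(a, b, c):
    """Sort three values with three compare-and-swap steps."""
    if a > b:
        a, b = b, a   # now a <= b
    if b > c:
        b, c = c, b   # now c is the maximum
    if a > b:
        a, b = b, a   # now a <= b <= c
    return a, b, c

print(sort3(3, 1, 2))  # (1, 2, 3)
```

Every branch, comparison and move in such a routine is a candidate for elimination, and it is at this instruction-by-instruction level that AlphaDev found shorter sequences than the best human-written ones.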

Once the program mastered very small sets, the same approach was unleashed for larger sets.

The bottom line is that AlphaDev’s best algorithms achieved up to a whopping 71% savings in run time on certain problems, compared with the best human efforts. The researchers noted, however, that for large datasets the savings were more modest, typically only 1-2%.

The researchers were very pleasantly surprised at the outcome: Daniel Mankowitz of DeepMind, who led the research team, said, “We were a bit shocked. … We didn’t believe it at first.” (see Nature article). Emma Brunskill of Stanford University (who was not part of the effort) added, “This is an exciting result.”

The authors mentioned that their algorithms have been integrated into the LLVM standard C++ sort library, and thus are already being used worldwide. For additional details, see the Nature synopsis, or the complete Nature technical article.

As we noted in an earlier Math Scholar article, schemes based on machine learning methods are doing much more than finding new mathematical or computer science results. They are also driving cars, writing code, investing money, creating new digital art, and even writing essays.

One of the most startling recent advances is a language model program called “Generative Pre-Trained Transformer” (GPT-3 is currently in use; the latest version is GPT-4), developed by OpenAI, an organization established by some prominent Silicon Valley technologists. Given a prompt, this program can generate surprisingly realistic text on a wide variety of topics. For example, when given the prompt “Write an essay discussing the role of metafiction in the work of Italo Calvino,” GPT-3 produced a five-paragraph essay that started [NY Times]:

Italian author Italo Calvino is considered a master of metafiction, a genre of writing in which the author breaks the fourth wall to discuss the act of writing itself. For Calvino, metafiction is a way of exploring the nature of reality and the ways in which stories can shape our perceptions of the world. His novels often incorporate playful, labyrinthine structures that play with the boundaries between reality and fiction. In “If on a winter’s night a traveler,” for example, the reader is constantly interrupted by meta-level discussions of the act of reading and the nature of storytelling.

Needless to say, language models such as GPT-3 and GPT-4 have their detractors. They tend to be quite good with some prompts, but produce rather strange responses to others. In a previous Math Scholar article, we asked ChatGPT, which is based on GPT-3, to prove four well-known mathematical theorems:

- Most angles cannot be trisected with ruler and compass.
- $\pi$ is irrational.
- $\pi$ is transcendental.
- Every nonconstant polynomial equation with integer coefficients has a root in the complex numbers.

While the responses were interesting, in each case they fell far short of a complete, rigorous proof. It is clear that for mathematical work, these systems still need improvement. Research mathematicians are in no danger of being unemployed.

In any event, these developments are raising some interesting questions as to how close we are to real artificial general intelligence [Economist]. As Australian cognitive scientist David Chalmers penned in April 2022, shortly after OpenAI released the earlier GPT-3 software [NY Times],

What fascinates me about GPT-3 is that it suggests a potential mindless path to artificial general intelligence. … It is just analyzing statistics of language. But to do this really well, some capacities of general intelligence are needed, and GPT-3 develops glimmers of them.

A 2011 Time article featured an interview with futurist Ray Kurzweil, who has predicted an era (he estimated by 2045), a “singularity,” when machine intelligence will meet, then transcend human intelligence [Time]. Such future intelligent systems will then design even more powerful technology, resulting in a dizzying advance that we can only dimly foresee at the present time [Kurzweil book].

Futurists such as Kurzweil certainly have their skeptics and detractors. But even setting aside technical questions, there are solid reasons to be concerned about the potential societal, legal, financial and ethical challenges of machine intelligence, as exhibited by the current backlash against science, technology and “elites.” As Kevin Roose writes in the New York Times, “We’re in a golden age of progress in artificial intelligence. It’s time to start taking its potential and risks seriously.”

Along this line, in March 2023 more than 1000 tech leaders and researchers signed an open letter urging a moratorium on the development of state-of-the-art AI systems, citing “profound risks to society and humanity.” Others have even claimed that the singularity is already here, or in other words that the genie is already out of the bottle.

However these disputes are settled, it is clear that in our headlong rush to explore technologies such as machine learning, artificial intelligence and robotics, we must find a way to humanely deal with those whose lives and livelihoods will be affected by these technologies. The very fabric of society may hang in the balance.

Updated 20 April 2023

Writers from the discipline known variously as “postmodern science studies” or “sociology of scientific knowledge” are often cited in discussions of science, philosophy and religion. Some of these writers, notably Karl Popper and Thomas Kuhn, have had significant impact on the field of scientific research.

Issues such as ensuring proper credit for the scientific contributions of non-Western societies (such as the ancient mathematics of India and China — see, for example, [Bailey2012]), as well as dealing with the chronic under-representation of women, racial minorities and indigenous people in scientific research, are certainly worth additional attention.

However, other sectors of the “postmodern” literature are very problematic, as we will see below.

Karl Popper, an Austrian-British philosopher, was struck by the differences in approach that he perceived at the time between the writings of some popular Freudians and Marxists, who saw “verifications” of their theories in every news report and clinical visit, and the writings of Albert Einstein, who for instance acknowledged that if the predicted red shift of spectral lines due to gravitation were not observed, then his general theory of relativity would be untenable. Popper was convinced that *falsifiability* was the key distinguishing factor, a view he presented in his oft-cited book *The Logic of Scientific Discovery* [Popper1959, pg. 40-41]:

I shall certainly admit a system as empirical or scientific only if it is capable of being tested by experience. These considerations suggest that not the verifiability but the falsifiability of a system is to be taken as the criterion of demarcation. … It must be possible for an empirical scientific system to be refuted by experience.

Popper’s ideas remain highly influential in scientific research even to the present day. As a single example, several prominent researchers have recently expressed concern about whether it is prudent to continue pursuing string theory, currently a leading candidate for a “theory of everything” in physics, given that string theorists have not yet been able to derive empirically testable consequences even after 25 years of effort. Physicist Lee Smolin, for example, writes, “A scientific theory that makes no predictions and therefore is not subject to experiment can never fail, but such a theory can never succeed either, as long as science stands for knowledge gained from rational argument borne out by evidence.” [Smolin2006, pg. 352].

However, Popper’s ideas do have some limitations, some of which were pointed out by Popper himself. To begin with, in most real modern-day scientific research, major theories are seldom falsified by a single experimental result. There are always questions regarding the underlying experimental design, measurement procedures, and data analysis techniques, not to mention statistical uncertainties. Often multiple follow-on studies, in some cases extending over many years, are necessary to conclusively decide the hypothesis one way or the other. For example, 13 years elapsed between 1998, when two teams of researchers discovered that the expansion of the universe is accelerating, and 2011, when the lead scientists of the two teams were awarded the Nobel Prize in physics.

For that matter, if we were to strictly apply Popper’s principle, Copernicus’ heliocentric theory was falsified from the start and should not have been further considered, because it could not predict planetary motions quite as accurately as the traditional Ptolemaic system. It only gained acceptance when Kepler modified the theory to include elliptical orbits with time-varying speeds, and when Newton showed that this behavior could be mathematically derived, using calculus, from his laws of motion. In a similar way, Newton’s theory was arguably falsified in the mid-19th century, when certain anomalies were noted in the orbit of Mercury. But it would have been irresponsible to discard Newtonian mechanics at that time, because of its overwhelming success in accurately explaining a vast array of phenomena.

In this sense, scientists are more like detectives, in that they must follow leads and hunches, examine evidence, and tentatively proceed with the most likely scenario. Seldom, if ever, are scientific results black-and-white from day one.

It must also be kept in mind that in most cases, “falsified” theories continue to be extremely accurate models of reality within appropriate domains. For example, even today, over 100 years after Newton’s mechanics and Maxwell’s electromagnetics were “falsified” and supplanted by new theories of physics (relativity and quantum mechanics, respectively), they remain the basis of almost all practical engineering and scientific computations, giving results virtually indistinguishable from those of more modern theories. Relativity corrections are employed in the GPS system, which is used by many automobiles and smartphones, and quantum mechanical calculations are employed in semiconductor design, materials science and computational chemistry. But in most other arenas of the modern world, the classical theories of physics are entirely satisfactory.
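As a rough illustration of the size of these relativity corrections, the net clock drift of a GPS satellite can be estimated in a few lines of Python. This is only a back-of-the-envelope sketch: the orbital radius and Earth parameters below are standard approximate textbook values, and the circular-orbit treatment is a simplifying assumption, not a detail drawn from the sources cited here.

```python
import math

GM = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
c = 2.99792458e8      # speed of light, m/s
r_earth = 6.371e6     # mean Earth radius, m
r_orbit = 2.6571e7    # approximate GPS orbital radius, m

v = math.sqrt(GM / r_orbit)                     # orbital speed, roughly 3.9 km/s
grav = GM / c**2 * (1 / r_earth - 1 / r_orbit)  # general-relativistic blueshift
kinem = v**2 / (2 * c**2)                       # special-relativistic time dilation
drift_per_day = (grav - kinem) * 86400          # net seconds gained per day

# The satellite clock runs fast by roughly 38 microseconds per day;
# left uncorrected, this would accumulate to kilometers of ranging error.
```

Without the corresponding corrections built into the system, position fixes would drift noticeably within hours, which is why relativity is engineered into GPS from the start.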

Thomas Kuhn’s work *The Structure of Scientific Revolutions* analyzed numerous historical cases of scientific advancements, and then concluded that in many cases, key paradigm shifts did not come easily [Kuhn1970]. Kuhn was actually trained as a scientist, receiving his Ph.D. in physics from Harvard in 1949. Thus he was able to bring significant scientific insight into his analyses of historical scientific revolutions.

One difficulty with Kuhn’s writings is that there are really two Kuhns: a moderate Kuhn and an immoderate Kuhn [Sokal1988a, pg. 75]. Unfortunately, many modern scholars like to quote only the immoderate Kuhn, such as when he denies that paradigm shifts carry scientists closer to fundamental truth [Kuhn1970, pg. 170], or when he argues that paradigm shifts often occur due to non-experimental factors [Kuhn1970, pg. 135].

Another difficulty is that Kuhn’s “paradigm shift” model has not worked as well in recent years as it did in the historical examples he cited. For example, the “standard model” of physics, the currently reigning fundamental theory of elementary particles and forces, was developed in the 1960s and early 1970s, and was completed in essentially its current form in 1974. Yet by 1980 it had completely displaced previous theories of particle physics, after a very orderly transition — even initial skeptics quickly recognized the new theory’s power, elegance and precision, and soon threw their support behind it [Tipler1994, pg. 88-89].

Kuhn’s writings, much as Popper’s writings before him, have been badly misused by a host of eager but ill-informed writers and scholars who think that they can smash the reigning orthodoxy of modern science. In an interview with *Scientific American* writer John Horgan, Kuhn said he was deeply upset that he had become a patron saint to this type of would-be scientific revolutionary: “I get a lot of letters saying, ‘I’ve just read your book, and it’s transformed my life. I’m trying to start a revolution. Please help me,’ and accompanied by a book-length manuscript.” Kuhn emphasized that in spite of the often iconoclastic way his writings have been interpreted, he remained “pro-science,” noting that science has produced “the greatest and most original bursts of creativity” of any human enterprise [Horgan2012].

Some writers have pointed out that even in the purest of scientific disciplines, namely mathematics and computer science, upon which all other scientific research is based, researchers have identified weaknesses and uncertainties. In particular, it has been known since Gödel’s groundbreaking 1931 paper [Hawking2005, pg. 1089-1118] that the fundamental axioms of mathematics are incomplete (no consistent system of axioms can encompass all mathematical truths) and that their consistency is unprovable (no such system of axioms can demonstrate its own consistency); and it has been known since Alan Turing’s 1937 paper that no computer program can be devised that can infallibly provide a yes/no answer to all questions that can be posed computationally [Hawking2005, pg. 1119-1160]. See these two references for precise statements of these results and full technical details.

The results of Gödel and Turing are very specific and are often misinterpreted to justify a large-scale dismissal of modern mathematics and computer science. Note that Gödel’s result is only that the consistency of the axioms of mathematics cannot be formally proven; it does not establish that they are inconsistent, or that any specific mathematical result is flawed. A more significant concern, which any research mathematician or computer scientist would readily acknowledge, is that mathematical theorems and computer programs are human constructions, and thus susceptible to errors of human reasoning. Indeed, some mathematical results have subsequently been shown to be flawed, and bugs in computer programs are sadly an everyday annoyance. But mathematical results can be (and have been, in numerous cases) checked by computer using very exacting tests, and computer programs can be independently written and compared on different systems. See also the next section.

Even after properly acknowledging the tentative, falsifiable nature of science as taught by Popper and Kuhn, it is clear that modern science has produced a sizable body of broad-reaching theoretical structures that describe the universe and life on earth ever more accurately with each passing year. As a single example of thousands that could be mentioned here, consider the two values of the magnetic dipole moment of the electron (in certain units), one calculated from the present-day standard model (in particular, from the theory of quantum electrodynamics), the other derived from the best available experimental measurements [Sokal1998, pg. 57]:

Theoretical: 1.001159652201 (plus or minus 30 in the last two digits)

Experimental: 1.001159652188 (plus or minus 4 in the last two digits)

Is this astonishing level of agreement — to better than one part in 70 billion — just a coincidence? Numerous other instances of scientific progress are presented in Progress-science.
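The quoted level of agreement is easy to check directly from the two values:

```python
theoretical = 1.001159652201    # plus or minus 30 in the last two digits
experimental = 1.001159652188   # plus or minus 4 in the last two digits

# Relative discrepancy: about 1.3 parts in 100 billion --
# agreement to better than one part in 70 billion.
relative_error = abs(theoretical - experimental) / experimental
```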

Along this line, recently a researcher computed the mathematical constant pi to over 22 trillion decimal place accuracy [Peter2016]. Large computations of this type are subject to a myriad of possible difficulties: the underlying mathematical theory might have flaws; the principles behind the algorithms implementing the formulas might not be sound; the computer programs (typically many thousands of lines) implementing these algorithms might have bugs; the system software might have glitches; and the system hardware might miscompute or suffer memory errors. In a very fragile computation of this sort, any one of these problems would almost certainly produce a completely erroneous result.

One major step in this computation was to compute pi in hexadecimal or base-16 digits (i.e., 0123456789abcdef instead of 0123456789) to over 18 trillion digits. Using a computer program based on one known mathematical formula for pi, running for nearly three months, the researcher found that the base-16 digits of pi starting at position 18,651,926,753,002 were:

35ef47c8 a29c2134 291e3f97 0403383d

Then he employed another computer program, based on a completely different mathematical formula, to compute digits beginning at the same position. The result was:

35ef47c8 a29c2134 291e3f97 0403383d

Again, is this perfect agreement just a coincidence?
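The digit-extraction idea behind such cross-checks can be illustrated with the original Bailey-Borwein-Plouffe (BBP) formula for pi. The sketch below is not the program used in the record computation (which relied on more advanced formulas and heavily optimized software); it merely demonstrates, at small scale, how hexadecimal digits of pi at an arbitrary position can be computed via modular exponentiation, without computing any earlier digits:

```python
def bbp_pi_hex(d, n=8):
    """Return n hexadecimal digits of pi starting at position d+1,
    computed directly via the BBP formula and modular exponentiation."""
    def series(j):
        # Sum_{k>=0} 16^(d-k)/(8k+j), keeping only the fractional part.
        s = 0.0
        for k in range(d + 1):
            s = (s + pow(16, d - k, 8 * k + j) / (8 * k + j)) % 1.0
        k = d + 1
        while 16.0 ** (d - k) / (8 * k + j) > 1e-17:  # tail in plain floating point
            s = (s + 16.0 ** (d - k) / (8 * k + j)) % 1.0
            k += 1
        return s

    # BBP: frac(16^d * pi) = frac(4*S(1) - 2*S(4) - S(5) - S(6))
    x = (4 * series(1) - 2 * series(4) - series(5) - series(6)) % 1.0
    digits = []
    for _ in range(n):
        x *= 16
        digits.append("0123456789abcdef"[int(x)])
        x -= int(x)
    return "".join(digits)
```

A verification of the kind described above amounts to running two such programs, based on two independent formulas, at the same (very large) position and comparing the output strings.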

More recent writings in the postmodern science studies field have greatly extended the scope and sharpness of these critiques, declaring that much of modern science, like literary and historical analysis, is “socially constructed,” dependent on the social environment and power structures of the researchers, with no claim whatsoever to fundamental truth [Koertge1998, pg. 258; Madsen1990, pg. 471; Sokal1998, pg. 234]. Collins and Pinch, for instance, after examining a handful of case studies, assert that “scientists at the research front cannot settle their disagreements through better experimentation, more knowledge, more advanced theories, or clearer thinking” [Collins1993, pg. 143-145; Koertge1998, pg. 258]. Sandra Harding went so far as to describe Newton’s *Principia* as a “rape manual” [Harding1986, pg. 113].

Here are some other examples of this same thinking:

- “The validity of theoretical propositions in the sciences is in no way affected by the factual evidence.” [Gergen1988, pg. 258; Sokal2008, pg. 230].
- “The natural world has a small or non-existent role in the construction of scientific knowledge.” [Collins1981; Sokal2008, pg. 230].
- “Since the settlement of a controversy is the *cause* of Nature’s representation, not the consequence, we can never use the outcome — Nature — to explain how and why a controversy has been settled.” [Latour1987, pg. 99; Sokal2008, pg. 230].
- “For the relativist [such as ourselves] there is no sense attached to the idea that some standards or beliefs are really rational as distinct from merely locally accepted as such.” [Barnes1981, pg. 27; Sokal2008, pg. 230].
- “Science legitimates itself by linking its discoveries with power, a connection which *determines* (not merely influences) what counts as reliable knowledge.” [Aronowitz1988, pg. 204; Sokal2008, pg. 230].

Scientists counter that these scholars have distorted a few historical controversies, and then have parlayed these isolated claims into a global condemnation of the scientific enterprise [Boghossian2006; Brown2009; Gross1996; Gross1998; Koertge1998; Sokal2008]. In other words, these writers are guilty of the “forest fallacy”: pointing out flaws in the bark of a few trees, then trying to claim that the forest doesn’t exist. See Progress-science for additional discussion.

More importantly, observers of the postmodern science literature have also noted: (a) serious confusion on various concepts of science; (b) an emphasis on politically correct conclusions over sound scholarship; (c) engaging in lengthy discussions of mathematical or scientific principles about which the author has only a hazy familiarity; (d) applying highly sophisticated concepts from mathematics or physics into the humanities or social sciences, without justification; (e) displaying superficial erudition by peppering the text with sophisticated technical terms or mathematical formulas; and (f) employing lengthy technical passages that are essentially meaningless [Sokal1998, pg 4-5].

In a curious turn of events, these postmodern science writings, by attempting to undermine scientists’ claim to objective truth, have provided arguments and talking points for the creationism, intelligent design, anti-vaccination and climate change denial movements [Otto2016a]. The far left has met the far right!

The tension between the scientific and postmodernist communities came to a head in 1996, when Alan Sokal, a physicist at New York University, wrote a parody of a postmodern science article, entitled “Transgressing the Boundaries: Toward a Transformative Hermeneutics of Quantum Gravity,” and submitted it to *Social Text*, a prominent journal in the postmodern studies field [Sokal1996a]. The article was filled with page after page of erudite-sounding nonsense, political rhetoric, irrelevant references to arcane scientific concepts, and approving quotations from leading postmodern science scholars. Here are three excerpts:

Rather, [scientists] cling to the dogma imposed by the long post-Enlightenment hegemony over the Western intellectual outlook, which can be summarized briefly as follows: that there exists an external world, whose properties are independent of any individual human being and indeed of humanity as a whole; that these properties are encoded in “eternal” physical laws; and that human beings can obtain reliable, albeit imperfect and tentative, knowledge of these laws by hewing to the “objective” procedures and epistemological strictures prescribed by the (so-called) scientific method. [Sokal1996a, pg. 217; Sokal2008, pg. 7].

In this way the infinite-dimensional invariance group erodes the distinction between the observer and observed; the pi of Euclid and the G of Newton, formerly thought to be constant and universal, are now perceived in their ineluctable historicity; and the putative observer becomes fatally de-centered, disconnected from any epistemic link to a space-time point that can no longer be defined by geometry alone. [Sokal1996a, pg. 222; Sokal2008, pg. 27].

For, as Bohr noted, “a complete elucidation of one and the same object may require diverse points of view which defy a unique description” — this is quite simply a fact about the world, much as the self-proclaimed empiricists of modernist science might prefer to deny it. In such a situation, how can a self-perpetuating secular priesthood of credentialed “scientists” purport to maintain a monopoly on the production of scientific knowledge? [Sokal1996a, pg. 229; Sokal2008, pg. 53].

With regards to the first passage, note that it derides the most basic notions of scientific reality and common sense. With regards to the second passage, the fundamental constants pi and G certainly do not have varying values. With regards to the third passage, quantum mechanics, whose effects are significant only at the atomic level, has absolutely nothing to say about the relative validity of cultural points of view.

In spite of its severe flaws, the article was not only accepted for the journal, but it appeared in a special issue devoted to defending the legitimacy of the postmodern science studies field against its detractors. As Sokal later noted, “I intentionally wrote the article so that any competent physicist or mathematician (or undergraduate physics or math major) would realize that it is a spoof.” [Sokal1996b, pg. 50]. He resorted to the hoax out of a deeply felt concern that the postmodern science world has taken a complete about-face from its roots in the Enlightenment, which identified with science and rationalism and rejected obscurantism. “Theorizing about ‘the social construction of reality’ won’t help us find an effective treatment for AIDS or devise strategies for preventing global warming. Nor can we combat false ideas in history, sociology, economics, and politics if we reject the notions of truth and falsity.” [Lingua2000, pg. 52].

In the same issue as Sokal’s piece, a prominent postmodern writer (in a serious article) asserted:

Most theoretical physicists, for example, sincerely believe that however partial our collective knowledge may be, … one day scientists shall find the necessary correlation between wave and particle; the unified field theory of matter and energy will transcend Heisenberg’s uncertainty principle. [Aronowitz1996, pg. 181].

Einstein’s relativity theory was subjected to official skepticism twenty years after the publication of his Special Theory article in 1905; and equally passionate partisans of wave and matrix mechanics explanations for the behavior of electrons were unable to reach agreement for decades. [Aronowitz1996, pg. 195].

In the first passage, the author is seriously mistaken about wave-particle duality: this is inherent in quantum physics and cannot be removed by a “unified field theory.” In the second passage, even his history is in error: the matrix and wave mechanics formulations of quantum mechanics were resolved within weeks [Gottfried2000]. Also appearing in the same issue with Sokal’s article was the following, written by the chief editor of *Social Text*:

Once it is acknowledged that the West does not have a monopoly on all the good scientific ideas in the world, or that reason, divorced from value, is not everywhere and always a productive human principle, then we should expect to see some self-modification of the universalist claims maintained on behalf of empirical rationality. Only then can we begin to talk about different ways of doing science, ways that downgrade methodology, experiment, and manufacturing in favor of local environments, cultural values, and principles of social justice. [Ross1996, pg. 3-4].

It is easy to imagine the potentially serious consequences if this extreme cultural relativism were widely adopted in modern science. As a single example, a few years ago the Mexican government encouraged potters, for their own safety, to use lead-free glazes, but the local potters were convinced that the lead issue was only a foreign conspiracy. Unfortunately, as Michael Sullivan has noted, “lead does not care who believes what.” [Sullivan1996].

In other postmodern science writing, researchers have attempted to apply arcane scientific and mathematical concepts into the social sciences and the humanities, often with disastrous results. For example a leading French postmodern scholar wrote:

This diagram [the Mobius strip] can be considered the basis of a sort of essential inscription at the origin, in the knot which constitutes the [human] subject. … You can perhaps see that the sphere, that old symbol for totality, is unsuitable. A torus, a Klein bottle, a cross-cut surface, are able to receive such a cut. And this diversity is very important as it explains many things about the structure of mental disease. If one can symbolize the subject by this fundamental cut, in the same way one can show that a cut on a torus corresponds to the neurotic subject, and on a cross-cut surface to another sort of mental disease. [Lacan1970, pg. 192-196; Sokal1998, pg. 19-20].

With regards to this passage, “Mobius strips,” “toruses,” “Klein bottles” and “cross-cut surfaces” are terms from mathematical topology, the theory of continuous functions and continuously deformed surfaces. There is absolutely no connection between this arcane mathematical theory and psychology. Yet this author pressed this absurd connection between psychology and topology further in several other writings, hopelessly misusing sophisticated mathematical concepts such as compactness, open sets, limit points, subcoverings and countable sets [Lacan1998, pg. 9-10; Sokal1998, pg. 21-22].

Numerous examples of gratuitous and often meaningless scientific jargon can be cited in postmodern literature. Here is one example. The reader need not feel bad that he/she does not understand this text. It is complete nonsense, yet it survived peer review in the postmodern science field:

We can clearly see that there is no bi-univocal correspondence between linear signifying links archi-writing, depending on the author, and this multireferential, multidimensional machinic catalysis. The symmetry of scale, the transversality, the pathic non-discursive character of their expansion: all these dimensions re-move us from the logic of the excluded middle and reinforce us in our dismissal of the ontological binarism we criticised previously. A machinic assemblage, through its diverse components, extracts its consistency by crossing ontological thresholds, non-linear thresholds of irreversibility, ontological and phylogenetic thresholds, creative thresholds of heterogenesis and autopoiesis. The notion of scale needs to be expanded to consider fractal symmetries in ontological terms. [Guattari1995, pg. 50; Sokal1998, pg. 166].

The Sokal hoax and several of the writers and works mentioned above are 20 years old, so surely the postmodern field and the larger humanities-science studies arena have learned their lesson? Evidently not. In October 2018, three scholars admitted writing a series of 20 hoax papers and submitting them to high-profile journals in the field. Sadly, six of them were published, and seven others were in the review process; only six had been rejected [Mounk2018; Pluckrose2018].

Here is an excerpt from one of their papers, which was accepted by an international women’s studies forum, pending a revision:

Other means superior to the natural sciences exist to extract alternative knowledges about stars and enriching astronomy, including ethnography and other social science methodologies, careful examination of the intersection of extant astrologies from around the globe, incorporation of mythological narratives and modern feminist analysis of them, feminist interpretative dance (especially with regard to the movements of the stars and their astrological significance), and direct application of feminist and postcolonial discourses concerning alternative knowledges and cultural narratives. [“Stars, Planets, and Gender: A Framework for a Feminist Astronomy,” listed in Pluckrose2018]

Note that the article recommends that astrology and dance movements reminiscent of astrology be incorporated into modern astronomy. Reviewer comments include: “This manuscript holds much promise and is interesting. The goal of a feminist astronomy is very thought-provoking — one that I would be excited to read and learn more about… I wish them luck as they move forward on this interesting piece and hope to someday see it discussed in classrooms, labs, and plenary halls.” [Pluckrose2018].

Perhaps the only bright spot in this experience is that their submissions to the more rigorous journals in the social sciences were not accepted — it was mostly the less rigorous and less scientifically oriented journals that fell prey to the hoax.

In summary, the works of Kuhn and Popper have provided valuable insights into the process of scientific research. In particular, their observations on falsifiability and paradigm shifts have been largely incorporated into the fabric of modern science, although their writings are often misconstrued. Issues such as ensuring proper credit for the scientific contributions of non-Western societies (such as the ancient mathematics of India and China), as well as dealing with the chronic under-representation of women, racial minorities and indigenous people in scientific research, are certainly important and worth discussing. But beyond such considerations, what is generally termed the “postmodern science studies” literature has not been very useful in advancing real scientific research, to say the least. And with the 1996 Sokal hoax and the more recent 2018 hoax, these writers have lost considerable credibility in the eyes of the scientific research community.

One criticism that applies rather broadly to present-day literature of this type is that these scholars work almost entirely from outside the realm of real scientific research. Unlike predecessors such as Kuhn and Popper, who were qualified professional scientists, most of the present-day postmodern science studies writers do not have significant scientific training and/or credentials, do not address state-of-the-art scientific theories or methods in technical depth, and do not participate with scientific research teams in performing real research. Their approach is best exemplified by a comment made by Andrew Ross, editor of *Social Text* during the Sokal hoax episode, in the introduction to one of his published works: “This book is dedicated to all of the science teachers I never had. It could only have been written without them.” [Ross1991].

But this is a point upon which virtually all practicing research scientists sharply disagree. Indeed, state-of-the-art scientific research is all about the details: underlying physical theories; mathematical derivations; testable hypotheses; state-of-the-art equipment; careful experimental design and data collection; rigorous data analysis and statistical methodology; carefully programmed computer simulations; advanced numerical methods; and, of course, cautiously inferred conclusions. Indeed, the details of the methods underlying a study are often as significant as the conclusions.

Thus, to the extent that the postmodern science studies community studiously avoids delving into the full technical details of leading-edge scientific research, these writers cannot possibly hope to have tangible impact in the scientific enterprise. And, needless to say, when leading figures in this community openly express their contempt and disdain for scientific work, they are not building bridges that will lead to productive collaborations with real scientists in the future.

According to an ancient account, when Pharaoh Ptolemy I of Egypt grew frustrated at the degree of effort required to master geometry, he asked Euclid whether there was some easier path. Euclid is said to have replied, “There is no royal road to geometry.” [Durant1975, vol. 2, pg. 501]. Indeed. And there is no royal road to modern science either.

Canadian-American physicist Lawrence Krauss summed up his view of these issues in the following terms [Krauss2012a]:

As both a general reader and as someone who is interested in ideas and culture, I have great respect for and have learned a great deal from a number of individuals who currently classify themselves as philosophers. … What I find common and so stimulating about the philosophical efforts of these intellectual colleagues is the way they thoughtfully reflect on human knowledge, amassed from empirical explorations in areas ranging from science to history, to clarify issues that are relevant to making decisions about how to function more effectively and happily as an individual, and as a member of a society.

As a practicing physicist however, the situation is somewhat different. There, I, and most of the colleagues with whom I have discussed this matter, have found that philosophical speculations about physics and the nature of science are not particularly useful, and have had little or no impact upon progress in my field. Even in several areas associated with what one can rightfully call the philosophy of science I have found the reflections of physicists to be more useful. For example, on the nature of science and the scientific method, I have found the insights offered by scientists who have chosen to write concretely about their experience and reflections, from Jacob Bronowski, to Richard Feynman, to Francis Crick, to Werner Heisenberg, Albert Einstein, and Sir James Jeans, to have provided me with a better practical guide than the work of even the most significant philosophical writers of whom I am aware, such as Karl Popper and Thomas Kuhn.

In spite of these difficulties, some scientists and philosophers look forward to a more respectful dialogue between the two disciplines in the future. As physicist Carlo Rovelli recently wrote [Rovelli2012]:

I think there is narrow-mindedness, if I might say so, in many of my colleague scientists that don’t want to learn what is being said in the philosophy of science. There is also a narrow-mindedness in a lot of probably areas of philosophy and the humanities in which they don’t want to learn about science, which is even more narrow-minded. Somehow cultures reach, enlarge. I’m throwing down an open door if I say it here, but restricting our vision of reality today on just the core content of science or the core content of humanities is just being blind to the complexity of reality that we can grasp from a number of points of view, which talk to one another enormously, and which I believe can teach one another enormously.


Peter Borwein, former professor of mathematics at Simon Fraser University and director of the university’s Centre for Interdisciplinary Research in the Mathematical and Computational Sciences (IRMACS), died on August 23, 2020, at the age of 67, of pneumonia, after courageously battling multiple sclerosis for over 20 years.

The *Notices of the American Mathematical Society* has just published a memorial tribute, written by the present author, that summarizes Peter’s life and career. Here are a few highlights:

Peter Borwein is perhaps best known for discovering (often but not always with his brother Jonathan) new formulas and algorithms for $\pi$ and other mathematical constants. One of these algorithms is the following: Set $a_0 = 6 - 4 \sqrt{2}$ and $y_0 = \sqrt{2} - 1$. Then iterate, for $k \ge 0$,

\begin{align}
y_{k+1} &= \frac{1 - (1 - y_k^4)^{1/4}}{1 + (1 - y_k^4)^{1/4}}, \nonumber \\
a_{k+1} &= a_k (1 + y_{k+1})^4 - 2^{2k+3} y_{k+1} (1 + y_{k+1} + y_{k+1}^2). \label{form:q4}
\end{align}

Then $1/a_k$ converges *quartically* to $\pi$: each iteration approximately *quadruples* the number of correct digits (provided that each iteration is performed with at least the numeric precision required for the final result). This algorithm, together with a quadratically convergent algorithm independently discovered by Brent and Salamin, has been employed in several large computations of $\pi$.
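To make the iteration concrete, here is a minimal Python sketch using the standard decimal module for high-precision arithmetic. The function name and the 20 guard digits are the present author’s illustrative choices, not part of the published algorithm:

```python
from decimal import Decimal, getcontext

def pi_quartic(iterations=3, digits=100):
    """Borwein quartic iteration: 1/a_k converges quartically to pi.
    Working precision must exceed the target accuracy."""
    getcontext().prec = digits + 20        # guard digits (illustrative margin)
    one = Decimal(1)
    a = 6 - 4 * Decimal(2).sqrt()          # a_0 = 6 - 4 sqrt(2)
    y = Decimal(2).sqrt() - 1              # y_0 = sqrt(2) - 1
    for k in range(iterations):
        t = (one - y**4).sqrt().sqrt()     # (1 - y_k^4)^(1/4)
        y = (one - t) / (one + t)
        a = a * (one + y)**4 - Decimal(2)**(2*k + 3) * y * (one + y + y**2)
    return one / a                         # approximation to pi
```

Three iterations already determine $\pi$ to well over 100 decimal digits, consistent with the quadrupling behavior described above.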

In 1995, Peter posed the question to some students and post-docs of whether there was any economical way to calculate digits in some base of a mathematical constant such as $\pi$, beginning at a given digit position, without needing to calculate the preceding digits. Peter and Simon Plouffe subsequently found the following surprisingly simple scheme for binary digits of $\log 2$, based on the formula $\log 2 = \sum_{k \ge 1} 1/(k 2^k)$, due to Euler. First note that the binary digits of $\log 2$ starting at position $d + 1$ can be written ${\rm frac} \, (2^d \log 2)$, where ${\rm frac} \, (x) = x - \lfloor x \rfloor$ denotes the fractional part. Then

\begin{align}
{\rm frac} \, (2^d \log 2) &= {\rm frac} \, \left(\sum_{k=1}^\infty \frac{2^d}{k 2^k} \right)
= {\rm frac} \left( \sum_{k=1}^d \frac{2^{d-k}}{k} + \sum_{k=d+1}^\infty \frac{2^{d-k}}{k} \right) \nonumber \\
&= {\rm frac} \left(\sum_{k=1}^d \frac{2^{d-k} \bmod k}{k} \right) + {\rm frac} \, \left(\sum_{k=d+1}^\infty \frac{2^{d-k}}{k} \right), \label{form:bor4}
\end{align}

where $\bmod \, k$ has been added to the numerator of the first term, since we are only interested in the fractional part after division by $k$. The key point here is that the numerator expression, namely $2^{d-k} \bmod k$, can be computed very rapidly by the binary algorithm for exponentiation mod $k$, without any need for extra-high numeric precision, even when the position $d$ is very large, say one billion or one trillion. The second sum can be evaluated as written, again using standard double-precision or quad-precision floating-point arithmetic. The final result, expressed as a binary floating-point value, gives a string of binary digits of $\log 2$ beginning at position $d+1$.
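This scheme translates almost directly into a short program. The following Python sketch (the function name is illustrative; double precision only, so it is suitable just for a modest run of digits) uses Python’s three-argument pow for the modular exponentiation:

```python
from math import floor

def log2_bin_digits(d, ndigits=16):
    """Binary digits of log(2) starting at position d+1, computed
    directly via the split-sum formula, without the earlier digits."""
    s = 0.0
    # First sum: k = 1..d, with numerators reduced mod k via pow(2, e, k).
    for k in range(1, d + 1):
        s += pow(2, d - k, k) / k
        s -= floor(s)                 # keep only the fractional part
    # Second sum: k > d, where the terms decay geometrically.
    k = d + 1
    while True:
        term = 2.0 ** (d - k) / k
        if term < 1e-17:
            break
        s += term
        k += 1
    s -= floor(s)
    # Peel off binary digits of the fractional result.
    digits = ""
    for _ in range(ndigits):
        s *= 2
        digits += str(int(s))
        s -= int(s)
    return digits
```

For example, starting at position one ($d = 0$) this reproduces the leading binary digits $1011000101110010\ldots$ of $\log 2 \approx 0.6931$.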

In the wake of this observation, Peter and others searched the literature for a formula for $\pi$, analogous to Euler’s formula for $\log 2$, but none was known at the time. Finally, a computer search conducted by Simon Plouffe numerically discovered this formula, now known as the BBP formula for $\pi$:

\begin{align}
\pi = \sum_{k=0}^\infty \frac{1}{16^k}\left(\frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right). \label{form:bbp}
\end{align}

Indeed, this formula permits one to efficiently calculate a string of base-16 digits (and hence base-2 digits) of $\pi$, beginning at an arbitrary starting point, by means of a relatively simple algorithm similar to that described above for $\log 2$. Nicholas Sze used a variation of this scheme to calculate binary digits of $\pi$ starting at position two quadrillion.
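Here is a corresponding Python sketch of hex-digit extraction from the BBP formula (again with an illustrative function name; it uses only double-precision arithmetic, so it is reliable only for short digit runs at modest positions $d$):

```python
from math import floor

HEXCHARS = "0123456789ABCDEF"

def pi_hex_digits(d, ndigits=8):
    """Hex digits of pi starting at fractional position d+1,
    computed from the BBP formula without the earlier digits."""
    def frac_series(j):
        # frac( sum_{k>=0} 16^(d-k) / (8k+j) ), split at k = d
        s = 0.0
        for k in range(d + 1):
            s += pow(16, d - k, 8 * k + j) / (8 * k + j)
            s -= floor(s)
        k = d + 1
        while True:
            term = 16.0 ** (d - k) / (8 * k + j)
            if term < 1e-17:
                break
            s += term
            k += 1
        return s - floor(s)

    x = 4 * frac_series(1) - 2 * frac_series(4) - frac_series(5) - frac_series(6)
    x -= floor(x)                     # reduce to [0, 1)
    digits = ""
    for _ in range(ndigits):
        x *= 16
        digits += HEXCHARS[int(x)]
        x -= int(x)
    return digits
```

Starting at position one ($d = 0$), this recovers the familiar hex expansion $\pi = 3.243{\rm F}6{\rm A}88\ldots$.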

No account of Peter Borwein’s career would be complete without mentioning the remarkable grace with which he faced his condition of multiple sclerosis. Initially diagnosed prior to the year 2000, the disease eventually left him confined to a wheelchair, increasingly dependent on family and caregivers, and, sadly, increasingly unable to pursue research or to effectively collaborate with colleagues. The present author recalls visiting Peter in January 2019 at his home in Burnaby, British Columbia. In spite of his paralysis and infirmity, Peter’s pleasant demeanor and humor were on full display. Would that we could all bear our misfortunes with such equanimity!

Full details of the above and other highlights of Peter Borwein’s remarkable life are in the AMS Notices article.


This year’s puzzle implements a new design for a mathematical crossword, which to the present author’s knowledge has never before been employed. See, for example, clue 40 Across below. In all respects, though, the puzzle conforms to the standards of New York Times crosswords. In terms of overall difficulty (Monday = easiest; Saturday = most difficult), this puzzle most likely would be given a “Tuesday” rating.

As is the usual custom, a pi-related gift will be sent to the first two solvers (U.S. only; previous year winners are eligible), chosen from this selection: Gift A, Gift B, or Gift C. Additional solvers will also be recognized here (send email if you would like to be listed). The solution must be correct (ok, one or two minor errors will be forgiven).

[Added 16 Mar 2023:] Neil Calkin, Eliza Gallagher, Summer Barron, Chris Lohmann, Ross Blocher, Gerard Joseph, Michelle Sidwell, Morgan Marshall and Jeanmarie Gardner have reported solutions.

A full-sized copy of the puzzle, suitable for printing, is HERE.

Here is the puzzle:


Many readers have doubtless heard of ChatGPT, the latest instance of a language generation tool developed by the technology startup OpenAI. This tool, which is now available for public experimental use, takes as input a request or other statement from the user, then responds. It employs a dialogue format, which makes it possible for ChatGPT to answer followup questions, and even admit its mistakes.

It is clear, even from a cursory examination, that ChatGPT represents a rather dramatic advance in artificial intelligence. Some of the results are rather startling. Here, for instance, is what ChatGPT returned in response to the request, “Write a biblical verse in the style of the King James Bible explaining how to remove a peanut butter sandwich from a VCR”:

And it came to pass that a man was troubled by a peanut butter sandwich, for it had been placed within his VCR, and he knew not how to remove it. And he cried out to the Lord, saying, “Oh, Lord, how can I remove this sandwich from my VCR, for it is stuck fast and will not budge?” And the Lord spoke unto him, saying, “Fear not, my child, for I shall guide thy hand and show thee the way. Take thy butter knife, and carefully insert it between the sandwich and the VCR, and gently pry them apart. And with patience and perseverance, the sandwich shall be removed, and thy VCR shall be saved.” And the man did as the Lord commanded, and lo and behold, the sandwich was removed from the VCR, and the man was saved. And the Lord said, “Verily I say unto thee, seek not to put thy peanut butter sandwiches in thy VCR, for it is not a suitable place for such things. Rather, keep thy sandwiches in thy refrigerator or on thy plate, where they belong.” And the man heeded the Lord’s words, and from that day forth, he kept his sandwiches in their proper place, and was saved from trouble and woe. Amen.

Here is another, more serious example, in response to the query, “Can you write me a 4 paragraph academic essay comparing and contrasting the theories of nationalism of Benedict Anderson and Ernest Gellner?”:

Benedict Anderson and Ernest Gellner both have theories about the origins and nature of nationalism, but their perspectives differ in key ways. Anderson believes that nationalism is a cultural construct, arising from the development of print capitalism and the creation of imagined communities. Gellner, on the other hand, sees nationalism as a product of modernization and industrialization, with nations being formed as a means of organizing society and providing a shared identity.

One key difference between the two theories is the role of culture. For Anderson, culture is central to the creation of nationalism, as the spread of print media allows for the creation of imagined communities that transcend physical boundaries. This allows for the development of a national identity based on shared cultural experiences and practices. In contrast, Gellner downplays the importance of culture in the formation of nationalism, arguing that it is primarily a function of economic and political developments. He sees nations as being formed in response to the need for a shared identity and sense of belonging in an industrialized society.

Another point of contrast is the emphasis on agency. Anderson’s theory suggests that people actively participate in the construction of national identity, through their engagement with print media and other cultural practices. In contrast, Gellner’s theory emphasizes the role of external factors in shaping national identity, with individuals being passive recipients of the national identity imposed on them by the forces of modernization.

Overall, while both Anderson and Gellner provide valuable insights into the origins of nationalism, their theories offer different perspectives on the role of culture and agency in the formation of national identity. Anderson emphasizes the cultural construction of nationalism, while Gellner sees it as a product of modernization and industrialization. Both theories offer valuable insights into the complex process of national identity formation, and can be useful in understanding the dynamics of nationalism in different historical contexts.

We might have a good chuckle at some of ChatGPT’s essay output. But how good is it at proving mathematical theorems? To that end, the present author submitted requests to ChatGPT to prove four well-known mathematical theorems:

- Most angles cannot be trisected with ruler and compass.
- $\pi$ is irrational.
- $\pi$ is transcendental.
- Every algebraic equation with integer coefficients has a root in the complex numbers.

Here are the results (after some minor formatting):

1. Input: “Compose a proof that most angles cannot be trisected with ruler and compass”. Result:

+++++

The statement that most angles cannot be trisected with ruler and compass was first proven by Pierre Wantzel in 1837. The proof relies on the fact that if an angle could be trisected with ruler and compass, then the solution to the following equation would be constructible: $$x^3 + x^2 = y^2,$$ where $x$ and $y$ are rational numbers. However, it can be shown that this equation has no solution in rational numbers, using Galois theory. This implies that the roots of the equation cannot be constructed with ruler and compass, and thus the angle cannot be trisected.

In summary, Wantzel’s proof shows that the trisection of an angle is possible with ruler and compass if and only if the cube root of 3 can be constructed with ruler and compass, which is not possible. Therefore, most angles cannot be trisected with ruler and compass.

+++++

2. Input: “Write a proof that pi is irrational”. Result:

+++++

One proof that $\pi$ is irrational uses contradiction. Assume, for the sake of contradiction, that $\pi$ is rational, meaning it can be expressed as a fraction of two integers, $a/b$ (where $b \neq 0$). This fraction, when converted to decimal form, would eventually become periodic. However, it has been proven that the decimal representation of $\pi$ is non-repeating and non-terminating, leading to a contradiction. Thus, $\pi$ must be irrational.

+++++

3. Input: “Write a proof that pi is transcendental”. Result:

+++++

One proof that $\pi$ is transcendental uses contradiction by assuming that $\pi$ is algebraic (i.e. the solution of a polynomial equation with rational coefficients). Assume that $\pi$ is the solution of the polynomial equation $$a_n x^n + a_{n-1} x^{n-1} + … + a_1 x + a_0 = 0,$$ where $n$ is a positive integer and all $a_i$ are rational numbers. By the theorem of algebra, $n$ roots of this equation, including $\pi$, are either rational or algebraic numbers. However, if $\pi$ were algebraic, we could write $\pi = p/q$ for some rational numbers $p$ and $q$, which would mean that $\pi$ satisfies a polynomial equation with rational coefficients of the form $$p^n = q^n a_n + q^{n-1} a_{n-1} + … + q a_1 + a_0.$$ Since $p$ and $q$ are rational, it follows that all the coefficients of the polynomial on the right-hand side are also rational. This contradicts our original assumption that $\pi$ is transcendental, implying that $\pi$ must be transcendental.

+++++

4. Input: “Prove that every algebraic equation with integer coefficients has a root in the complex numbers”. Result:

+++++

This statement is known as the Fundamental Theorem of Algebra, which states that every non-constant polynomial equation with complex coefficients has at least one complex root. This theorem can be proven using the following steps:

- The theorem holds for polynomials of degree 1, as any equation of the form $ax + b = 0$ has a unique root given by $x = -b/a$.
- Suppose that the theorem holds for all polynomials of degree $n$, for some positive integer $n$. Then consider a polynomial of degree $n+1$: $$a_{n+1} x^{n+1} + a_n x^n + … + a_1 x + a_0 = 0.$$
- By the Factor Theorem, $x = r$ is a root of this polynomial if and only if the polynomial divided by $x – r$ is a polynomial of degree n with rational coefficients.
- By the induction hypothesis, this polynomial of degree $n$ has a root in the complex numbers, which means that the original polynomial of degree $n+1$ also has a root in the complex numbers.
- Thus, by mathematical induction, the theorem holds for all polynomials with integer coefficients, including those of arbitrarily high degree.

+++++

It is clear that while ChatGPT’s responses are quite remarkable, nonetheless professional research mathematicians are not soon to join the ranks of the unemployed. Here is a brief assessment of these “proofs”:

- ChatGPT’s “proof” of the impossibility of trisection is superficially correct, but all the nontrivial steps of the proof are swept under the rug with the phrases “using Galois theory” and “if and only if the cube root of 3 can be constructed with ruler and compass, which is not possible.” Readers interested in a complete, self-contained proof of this theorem are referred to the present author’s article here.
- ChatGPT’s “proof” that $\pi$ is irrational is similarly flawed. Note the phrase, “it has been proven that the decimal representation of $\pi$ is non-repeating and non-terminating, leading to a contradiction.” But the fact that the decimal (or any other base) expansion of $\pi$ never repeats is a *consequence* of its irrationality. To the present author’s knowledge, there is no independent proof that $\pi$’s digits never repeat, except on the hypothesis of irrationality. Readers interested in a complete, self-contained proof that $\pi$ is irrational are referred to the present author’s article here.
- ChatGPT’s “proof” that $\pi$ is transcendental concludes with the nonsensical statement, “This contradicts our original assumption that $\pi$ is transcendental, implying that $\pi$ must be transcendental.” Needless to say, this “proof” is completely unacceptable. Readers interested in a full proof that $\pi$ is transcendental are referred to here, among other sources.
- ChatGPT’s “proof” of the fundamental theorem of algebra is similarly unacceptable. Note that in discussing the “Factor Theorem,” it assumes the fact that a root exists, which is the central conclusion of the theorem. In other words, polynomial division by $x - r$ succeeds in reducing the degree only if $r$ is a root. Readers interested in a complete, self-contained proof of the fundamental theorem of algebra are referred to the present author’s article here.

While it may be true that mathematicians’ jobs are safe for the time being, how long will this be the case? And how long before other occupations, white collar and blue collar, are drawn into question?

Consider some of the dramatic advances, just within the past 12 years:

- In 2011, IBM’s “Watson” computer system defeated the two premier champions of the American quiz show Jeopardy!.
- In 2017, AlphaGo Zero, developed by DeepMind, a subsidiary of Alphabet (Google’s parent company), defeated an earlier program, also developed by DeepMind, which in turn had defeated the world’s best Go player, a feat that many observers had not expected to see for decades. By one measure, AlphaGo Zero’s performance is as far above the world’s best Go player as the world’s best Go player is above a typical amateur.
- In 2020, AlphaFold 2, also developed by DeepMind, scored 92% on the 2020 Critical Assessment of Protein Structure Prediction (CASP) test, far above the 62% achieved by the second-best program in the competition. Nobel laureate Venki Ramakrishnan of Cambridge University exulted, “This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred decades before many people in the field would have predicted.”
- The financial industry already relies heavily on financial machine learning methods, and a major expansion of these technologies is coming, possibly displacing or obsoleting thousands of highly paid workers.
- In January 2023, researchers with the Search for Extraterrestrial Intelligence (SETI) project announced that they are deploying machine-learning techniques to sift through large datasets of microwave data. As Alexandra Witze writes in *Nature*, “Will an AI Be the First to Discover Alien Life?”

We have commented before in this forum (see here, for example) on the disruptive potential of artificial intelligence and other machine-learning-based technologies. In the wake of ChatGPT, it is clear, at the very least, that there will soon be another casualty — the college essay. Why go to the trouble to compose an essay, particularly if you are a person who has learned English as a second language, if ChatGPT can do just as well? But if college essays are obsolete, how will students gain the necessary language skills to perform well in the Information Age? More importantly, how will students acquire the all-important skills of critical thinking for the modern global high-tech society?

Indeed, maybe it is time to start talking about how good A.I. is getting.

[Added 21 Feb 2023:] *Nature* has published the article, “How will AI change mathematics? Rise of chatbots highlights discussion. Machine learning tools already help mathematicians to formulate new theories and solve tough problems. But they’re set to shake up the field even more.”

[Added 30 Mar 2023:] The Future of Life Institute has published an Open Letter calling for all AI research laboratories to pause the training of advanced AI systems for at least six months, allowing time to assess the potential risks to society of these developments.


Most of us learn the basic scheme for matrix multiplication in high school. What could be more simple and straightforward? Thus it may come as a surprise to some that the basic scheme is not the most efficient.

German mathematician Volker Strassen was the first to show this, in 1969, by exhibiting a scheme that yields practical speedups even for moderately sized matrices. In the years since Strassen’s discovery, numerous other researchers have found even better schemes for certain specific matrix size classes. For a good overview of these methods, together with some numerical analysis and other considerations, see Higham2022.

The latest development here is that researchers at DeepMind, a research subsidiary of Alphabet (Google’s parent), have devised a machine learning-based program that has not only reproduced many of the specific results in the literature, but has also discovered a few schemes, for certain specific size classes, that are even more efficient than the best known methods. In this article, we present an introduction to these fast matrix multiplication methods, and then describe the latest results.

Strassen’s algorithm is as follows. Consider matrices $A, B$ and $C$, which, for simplicity in the presentation here, may each be assumed to be of size $2n \times 2n$ for some integer $n$ (although the algorithm is also valid for more general size combinations). By decomposing each of these matrices into half-sized (i.e., $n \times n$) submatrices $A_{ij}, B_{ij}$ and $C_{ij}$, one can write

$$A =
\begin{bmatrix}
A_{11} & A_{12} \\
A_{21} & A_{22}
\end{bmatrix}, \quad
B =
\begin{bmatrix}
B_{11} & B_{12} \\
B_{21} & B_{22}
\end{bmatrix}, \quad
C =
\begin{bmatrix}
C_{11} & C_{12} \\
C_{21} & C_{22}
\end{bmatrix},$$

where

$$\begin{bmatrix}
C_{11} & C_{12} \\
C_{21} & C_{22}
\end{bmatrix}
=
\begin{bmatrix}
A_{11} B_{11} + A_{12} B_{21} & A_{11} B_{12} + A_{12} B_{22} \\
A_{21} B_{11} + A_{22} B_{21} & A_{21} B_{12} + A_{22} B_{22}
\end{bmatrix}.$$

With this decomposition, eight multiplications of $n \times n$ submatrices are required to produce the results. The total number of element-wise multiplications is the same as with the standard matrix multiplication scheme, so at first it appears that nothing has been gained. However, Strassen observed that if one computes seven intermediate $n \times n$ matrices

$$\begin{align}
P_1 &= (A_{11} + A_{22}) (B_{11} + B_{22}) \\
P_2 &= (A_{21} + A_{22}) B_{11} \\
P_3 &= A_{11} (B_{12} - B_{22}) \\
P_4 &= A_{22} (B_{21} - B_{11}) \\
P_5 &= (A_{11} + A_{12}) B_{22} \\
P_6 &= (A_{21} - A_{11}) (B_{11} + B_{12}) \\
P_7 &= (A_{12} - A_{22}) (B_{21} + B_{22}),
\end{align}$$

then one can write:

$$\begin{bmatrix}
C_{11} & C_{12} \\
C_{21} & C_{22}
\end{bmatrix}
=
\begin{bmatrix}
P_1 + P_4 - P_5 + P_7 & P_3 + P_5 \\
P_2 + P_4 & P_1 - P_2 + P_3 + P_6
\end{bmatrix}.$$

The important fact to note here is that computing each of the $P_k$ matrices requires only a single $n \times n$ matrix multiplication, so that the final result is obtained in only seven $n \times n$ matrix multiplications instead of eight with the conventional scheme.

This savings comes at a cost of 18 matrix additions/subtractions of size $n \times n$, compared with just four with the conventional scheme. This is certainly not an improvement in the simplest case of $2 \times 2$ matrices (i.e., $n = 1$), where the submatrices are scalars, but when $n = 2^m$ for some moderately large $m$, a recursive application of Strassen’s scheme yields the final result in fewer total element-wise operations. In the present author’s experience (see Bailey1991), practical speedups can be achieved on present-day computer processors even for matrix dimensions as small as, say, 128 or 256. Asymptotically, Strassen’s scheme has a complexity of $O(n^{\log_2(7)}) \approx O(n^{2.8074})$ in the full matrix dimension $n$, compared with $O(n^3)$ for the conventional scheme.
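To make the recursion concrete, here is a minimal pure-Python sketch (the helper names are the author’s; it assumes square matrices whose dimension is a power of two, and it recurses all the way to scalars rather than switching to the conventional scheme at a practical crossover size):

```python
def mat_add(A, B):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    """Multiply two n x n matrices (n a power of two) with Strassen's
    seven-multiplication recursion; scalars form the base case."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split each matrix into four half-sized blocks.
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    # Strassen's seven block products.
    P1 = strassen(mat_add(A11, A22), mat_add(B11, B22))
    P2 = strassen(mat_add(A21, A22), B11)
    P3 = strassen(A11, mat_sub(B12, B22))
    P4 = strassen(A22, mat_sub(B21, B11))
    P5 = strassen(mat_add(A11, A12), B22)
    P6 = strassen(mat_sub(A21, A11), mat_add(B11, B12))
    P7 = strassen(mat_sub(A12, A22), mat_add(B21, B22))
    # Recombine into the quadrants of C.
    C11 = mat_add(mat_sub(mat_add(P1, P4), P5), P7)
    C12 = mat_add(P3, P5)
    C21 = mat_add(P2, P4)
    C22 = mat_add(mat_sub(mat_add(P1, P3), P2), P6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

A practical implementation would stop the recursion well above $1 \times 1$ blocks, since for small sizes the extra additions outweigh the saved multiplication.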

As mentioned above, researchers at DeepMind, a research subsidiary of Alphabet (Google’s parent), have devised a machine learning-based search program that has not only reproduced many of the specific results in the literature, but has also discovered a few schemes, for certain specific size classes, that are even more efficient than the best known methods. First, we will briefly review some of DeepMind’s earlier achievements, upon which their new matrix work is based.

The ancient Chinese game of Go is notoriously complicated, with strategies that can only be described in vague, subjective terms. Many observers did not expect Go-playing computer programs to beat the best human players for many years, if ever. Then in May 2017, a computer program named “AlphaGo,” developed by DeepMind, defeated Ke Jie, a 19-year-old Chinese Go master thought to be the world’s best human Go player [NY Times]. Later that same year, a new DeepMind program named “AlphaGo Zero” defeated the earlier AlphaGo program 100 games to zero. After 40 days of training by playing games with itself, AlphaGo Zero was as far ahead of Ke Jie as Ke Jie is ahead of a good amateur player [Quanta Magazine].

Proteins are the workhorses of biology. Examples in human biology include actin and myosin, the proteins that enable muscles to work, and hemoglobin, the protein in red blood cells that carries oxygen throughout the body. Each protein is specified as a string of amino acids (drawn from an alphabet of 20), typically several hundred units long. Sequencing proteins is now fairly routine, but the key to biology is the three-dimensional shape of the protein — how a protein “folds” [Nature]. Protein shapes can be investigated experimentally, using x-ray crystallography, but this is an expensive, error-prone and time-consuming laboratory operation, so in recent years researchers have been pursuing entirely computer-based solutions.

Given the daunting challenge and importance of the protein folding problem, in 1994 a community of researchers in the field organized a biennial competition known as Critical Assessment of Protein Structure Prediction (CASP) [CASP]. In 2018, a program called AlphaFold, developed by DeepMind, won the competition. For the 2020 CASP competition, the DeepMind team developed a new program, known as AlphaFold 2 [AlphaFold 2], which achieved a 92% average score, far above the 62% achieved by the second-best program in the competition. “It’s a game changer,” exulted German biologist Andrei Lupas. “This will change medicine. It will change research. It will change bioengineering. It will change everything.” [Scientific American].

Building on these earlier successes, in 2022 researchers at DeepMind turned their tools to the analysis of matrix multiplication algorithms. They formulated the problem as a game, called TensorGame, where at each step the player selects how to combine different entries of the matrices to produce a matrix product. The scheme assigns a score for each selection, based on the number of operations required to reach a correct result.

To find optimal solutions to this problem, the researchers applied a “deep reinforcement learning” algorithm. The resulting program, called AlphaTensor, is a specialized neural network system employing an underlying design similar to that of the above-mentioned Go-playing and protein-folding projects. The scheme employs a single agent to decompose matrix multiplication tensors of different sizes, which permits “learned” matrix multiplication techniques to be shared across different “players.” Full details of the scheme are described in an October 2022 Nature article by the DeepMind researchers.

The AlphaTensor program quickly discovered many of the existing matrix multiplication algorithms, including Strassen’s original algorithm and numerous others. But it also discovered several new algorithms not previously known in the literature. For example, for the product of two $4n \times 4n$ matrices, the previously most efficient algorithm was simply to apply Strassen’s algorithm recursively twice. This approach yields the result in $49$ block multiplications of size $n \times n$. AlphaTensor found a scheme that produces the result in $47$ block multiplications (although this result is valid only for matrices with binary elements). The DeepMind paper presented this scheme in full — it required a full page of small type. Similarly, for the product of two $5n \times 5n$ matrices, AlphaTensor found a scheme that produces the result in $96$ block multiplications of size $n \times n$, compared with $98$ in the literature (with the same restriction as mentioned above for $4n \times 4n$ matrices). Several other new results are listed in their paper, including three cases where their algorithms are applicable for standard real or complex matrices as well as for binary matrices.

The DeepMind researchers note that their results are focused on finding algorithms that minimize the total number of element-wise arithmetic operations in a matrix multiplication task. But the same AlphaTensor technology could also be used to optimize for other criteria, such as numerical stability or energy usage. And more generally, their overall methodology could be applied to a broad range of other mathematical problems as well.

In yet another interesting development, the publication of the DeepMind paper on 5 October 2022 has spurred some other researchers to take a fresh look at the problem. Jakob Moosbauer and Manuel Kauers at Johannes Kepler University Linz in Austria have now reported employing a different approach that improves on DeepMind’s method for $5 \times 5$ matrices — they reduced DeepMind’s $96$ block multiplications to $95$. Moosbauer and Kauers promise to release full details soon.

As we noted in an earlier Math Scholar article, schemes based on machine learning methods are doing much more than finding new mathematical or computer science results. They are also driving cars, writing code, investing money, creating new digital art, and even writing essays.

One of the most startling recent advances is a language model program called “Generative Pre-Trained Transformer 3” (GPT-3), developed by OpenAI, an organization established in 2015 by some prominent Silicon Valley technologists. Given a prompt, this program can generate surprisingly realistic text on a wide variety of topics. For example, when given the prompt “Write an essay discussing the role of metafiction in the work of Italo Calvino,” GPT-3 produced a five-paragraph essay that started [NY Times]:

Italian author Italo Calvino is considered a master of metafiction, a genre of writing in which the author breaks the fourth wall to discuss the act of writing itself. For Calvino, metafiction is a way of exploring the nature of reality and the ways in which stories can shape our perceptions of the world. His novels often incorporate playful, labyrinthine structures that play with the boundaries between reality and fiction. In “If on a winter’s night a traveler,” for example, the reader is constantly interrupted by meta-level discussions of the act of reading and the nature of storytelling.

Needless to say, language models such as GPT-3 have their detractors. They tend to be quite good with some prompts, but produce rather strange responses to others, particularly if the prompt has some inconsistent or nonsensical information.

In any event, these developments are raising some interesting questions as to how close we are to real artificial general intelligence [Economist]. As Australian cognitive scientist David Chalmers penned in April 2022, shortly after OpenAI released the GPT-3 software [NY Times],

What fascinates me about GPT-3 is that it suggests a potential mindless path to artificial general intelligence. … It is just analyzing statistics of language. But to do this really well, some capacities of general intelligence are needed, and GPT-3 develops glimmers of them.

A 2011 Time article featured an interview with futurist Ray Kurzweil, who has predicted an era (he estimated by 2045) when machine intelligence will meet, then transcend human intelligence [Time]. Such future intelligent systems will then design even more powerful technology, resulting in a dizzying advance that we can only dimly foresee at the present time [Book].

Futurists such as Kurzweil certainly have their skeptics and detractors. But even setting aside technical questions, there are solid reasons to be concerned about the potential societal, legal, financial and ethical challenges of machine intelligence, as exhibited by the current backlash against science, technology and “elites.” As Kevin Roose writes in the New York Times, “We’re in a golden age of progress in artificial intelligence. It’s time to start taking its potential and risks seriously.”

However these disputes are settled, it is clear that in our headlong rush to explore technologies such as machine learning, artificial intelligence and robotics, we must find a way to humanely deal with those whose lives and livelihoods will be affected by these technologies. The very fabric of society may hang in the balance.
