The present author recalls, while in graduate school at Stanford nearly 50 years ago, hearing two senior mathematicians (including the present author’s advisor) briefly discuss some other researcher’s usage of computation in a mathematical investigation. Their comments and body language clearly indicated a profound disdain for such efforts. Indeed, the prevailing mindset at the time was “real mathematicians don’t compute.”

Some researchers at the time attempted to draw more attention to the potential of computer tools in mathematics. In the early 1990s, Keith Devlin edited a column in the *Notices of the American Mathematical Society* on applications of computers in the field. Journals such as *Mathematics of Computation* and the new *Experimental Mathematics* received a growing number of submissions. But in general, computation and computer tools remained relatively marginal in the field.

How times have changed! Nowadays most mathematicians are comfortable using mathematical software such as *Mathematica*, *Maple* or *Sage*. Many have either personally written computer code in some language as part of a mathematical investigation, or, at the least, have worked closely with a colleague who writes such code. Most present-day mathematicians are familiar with using LaTeX to typeset papers and produce presentation viewgraphs, and many are also familiar with collaboration tools such as blogs, FaceTime, Skype, Zoom, Slack and Microsoft Teams.

One important development is that a growing number of mathematicians have experience using formal proof software to encode and rigorously verify every step of a mathematical proof. Widely used systems include Lean, Coq, HOL and Isabelle.
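To give a flavor of what such formalization looks like, here is a tiny example in Lean 4 syntax (a toy illustration, not drawn from any of the projects discussed here); real formalizations chain together thousands of such steps, drawing on large libraries of previously verified lemmas.

```lean
-- A statement the Lean kernel checks by direct computation:
example : 2 + 2 = 4 := rfl

-- A statement proved by invoking a previously verified lemma
-- from Lean's standard library:
example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

The point is that the checker accepts nothing on faith: every claim must reduce, step by step, to the axioms or to lemmas that have themselves been verified.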

Indeed, in sharp contrast to 50 years ago, perhaps today one would say “real mathematicians must compute,” particularly if they wish to pursue state-of-the-art research in collaboration with other talented mathematicians. Here we mention just a few of the interesting developments in this arena.

The Kepler conjecture is the assertion that the simple scheme of stacking oranges typically seen in a supermarket has the highest possible average density, namely $\pi/(3 \sqrt{2}) = 0.740480489\ldots$, among all possible arrangements, regular or irregular. It is named after the 17th-century astronomer Johannes Kepler (best known for discovering that planets orbit the sun in elliptical paths), who first posed the conjecture in 1611.

In the early 1990s, Thomas Hales, following an approach first suggested by Laszlo Fejes Toth in 1953, determined that the maximum density of all possible arrangements could be obtained by minimizing a certain function with 150 variables. In 1992, assisted by his graduate student Samuel Ferguson (son of famed mathematician-sculptor Helaman Ferguson), Hales embarked on a multi-year effort to find a lower bound for the function for each of roughly 5,000 different configurations of spheres, using a linear programming approach. In 1998, Hales and Ferguson announced that their project was complete, documented by 250 pages of notes and three gigabytes of computer code, data and results.

Hales’ computer-assisted proof generated some controversy — the journal *Annals of Mathematics* initially demurred, although it ultimately accepted his paper. In the wake of this controversy, Hales decided to embark on a collaborative effort to certify the proof via automated techniques. This project, named Flyspeck, employed the proof-checking software HOL Light and Isabelle. The effort commenced in 2003, but was not completed until 2014.

Another good example of computer-based tools in modern mathematics came to light recently in work on the twin prime conjecture: the assertion that there are infinitely many pairs of primes differing by just two, such as $(11, 13), (101, 103)$, and $(4799, 4801)$. Although many eminent mathematicians have investigated the conjecture, few solid results have been obtained, and it remains open.

In 2013, the previously unheralded mathematician Yitang Zhang posted a paper proving that there are infinitely many pairs of primes with gaps between two and 70,000,000. While this certainly did not prove the full conjecture (the 70,000,000 figure would have to be reduced to just two), the paper and its underlying techniques constituted what one reviewer termed “a landmark theorem in the distribution of prime numbers.”

Zhang based his work on a 2005 paper by Dan Goldston, Janos Pintz and Cem Yildirim. Their paper employed a sieving method analogous to the Sieve of Eratosthenes to filter out pairs of primes that are closer together than average. They then proved that for any fraction, no matter how small, there exists some pair of primes closer together than that fraction of the average gap.
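The classical Sieve of Eratosthenes that such methods generalize is easy to sketch in code. The following minimal Python illustration (a toy version of the classical sieve, not the far subtler Goldston–Pintz–Yildirim sieve itself) lists the twin prime pairs below a bound:

```python
def primes_up_to(n):
    """Classical Sieve of Eratosthenes: return all primes <= n."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Cross off every multiple of p starting at p*p.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]

def twin_pairs(n):
    """All pairs of consecutive primes <= n that differ by exactly two."""
    primes = primes_up_to(n)
    return [(p, q) for p, q in zip(primes, primes[1:]) if q - p == 2]

print(twin_pairs(20))  # [(3, 5), (5, 7), (11, 13), (17, 19)]
```

The twin prime conjecture asserts that `twin_pairs(n)` keeps growing without bound as `n` increases; despite overwhelming numerical evidence, no proof is known.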

Immediately after Zhang’s result was made public, researchers wondered if the figure 70,000,000 could be lowered. Soon Terence Tao, James Maynard and several other prominent mathematicians instituted an online collaboration known as Polymath to further reduce the gap. (The Polymath format for massively collaborative online mathematics had previously been pioneered by Timothy Gowers and others, and has an impressive list of results.) When the dust had settled, the collaboration had lowered the limit to just 246, using results developed by Zhang, Maynard and others. The resulting paper was listed as authored by “D.H.J. Polymath.”

It is no secret that the methods of data mining, machine learning and artificial intelligence have made dramatic advances within the past years. Here are some well-known examples:

- In 2011, IBM’s “Watson” computer system defeated two premier champions of the American quiz show Jeopardy!.
- In 2017, AlphaGo Zero, developed by DeepMind, a subsidiary of Alphabet (Google’s parent company), defeated an earlier program, also developed by DeepMind, which in turn had defeated the world’s best Go player, a feat that many observers had not expected to see for decades. By one measure, AlphaGo Zero’s performance is as far above the world’s best Go player as the world’s best Go player is above a typical amateur.
- In 2020, AlphaFold 2, also developed by DeepMind, scored 92% on the 2020 Critical Assessment of Protein Structure Prediction (CASP) test, far above the 62% achieved by the second-best program in the competition. Nobel laureate Venki Ramakrishnan of Cambridge University exulted, “This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred decades before many people in the field would have predicted.”
- The financial industry already relies heavily on machine-learning methods, and a major expansion of these technologies is coming, possibly displacing thousands of highly paid workers or rendering their skills obsolete.
- In January 2023, researchers with the Search for Extraterrestrial Intelligence (SETI) project announced that they are deploying machine-learning techniques to sift through large datasets of microwave data, as Alexandra Witze reports in her *Nature* article “Will an AI Be the First to Discover Alien Life?”
- In 2022, OpenAI launched ChatGPT, a software system with a remarkable facility to generate surprisingly cogent text on many topics and to interact with humans. This in turn has launched an incredible boom in AI research and development by several competing laboratories and firms that shows no sign of abating. In June 2024, Apple announced that it will soon provide OpenAI software in iPhones and other devices. Firms such as Nvidia that produce chips for AI have seen explosive demand for their products.

Advanced machine learning, data mining and AI software is also being applied in mathematical research. One example is known as the Ramanujan Machine. This is a systematic computational approach, developed by a consortium of researchers mostly based in Israel, that attempts to discover mathematical formulas for fundamental constants, and to reveal the underlying structure of various constants. Their techniques have discovered numerous well-known formulas and several new ones, notably some continued fraction representations of $\pi, e$, Catalan’s constant, and values of the Riemann zeta function at integer arguments.

A similar, but quite distinct approach has been taken by a separate group of researchers, including famed mathematician brothers Jonathan Borwein and Peter Borwein (with some participation by the present author). By employing, in part, very high-precision computations, these researchers discovered a large number of new infinite series formulas for various mathematical constants (including $\pi$ and $\log(2)$, among others), some of which have the intriguing property that they permit one to directly compute binary or hexadecimal digits beginning at an arbitrary starting position, without needing to compute any of the digits that come before. Two example references are here and here.
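The best-known of these, the Bailey–Borwein–Plouffe (BBP) formula $\pi = \sum_{k=0}^\infty \frac{1}{16^k}\left(\frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6}\right)$, can be turned into a short digit-extraction program. Here is a minimal Python sketch; with ordinary double-precision arithmetic it reliably yields only a handful of hex digits at modest starting positions, whereas serious implementations use higher-precision techniques:

```python
def pi_hex_digits(start, n=6):
    """Return n hex digits of pi beginning at position `start`
    (position 0 = first digit after the hexadecimal point), via the
    BBP digit-extraction algorithm."""
    def frac_sum(j, d):
        # Fractional part of 16^d * sum_{k>=0} 1 / (16^k * (8k + j)).
        s = 0.0
        for k in range(d):  # head terms, with 16^(d-k) reduced mod (8k + j)
            s = (s + pow(16, d - k, 8 * k + j) / (8 * k + j)) % 1.0
        for k in range(d, d + 25):  # rapidly vanishing tail
            s += 16.0 ** (d - k) / (8 * k + j)
        return s % 1.0

    # Fractional part of 16^start * pi, from the BBP formula.
    x = (4 * frac_sum(1, start) - 2 * frac_sum(4, start)
         - frac_sum(5, start) - frac_sum(6, start)) % 1.0
    digits = ""
    for _ in range(n):  # peel off hex digits one at a time
        x *= 16.0
        digits += "0123456789ABCDEF"[int(x)]
        x -= int(x)
    return digits

print(pi_hex_digits(0))  # 243F6A  (pi = 3.243F6A88... in hex)
```

The key trick is the modular exponentiation `pow(16, d - k, 8*k + j)`, which lets the head of the series be summed while keeping every intermediate quantity small, so no digits before position `start` are ever computed.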

Along this line, one very useful online reference for computational mathematics is N.J.A. Sloane’s Online Encyclopedia of Integer Sequences, which at present contains over 375,000 entries, many of which include numerous references to places where a specific sequence appears in the mathematical literature. With the rise of serious data-mining technology, we may see remarkable new mathematical results discovered by data mining this invaluable dataset.

UCLA Fields Medalist mathematician Terence Tao recently sat down with Christoph Drosser, a writer for *Spektrum der Wissenschaft*, the German edition of *Scientific American*. Drosser and Tao discussed at length the current status of computer tools in mathematical research, including proof-checking software such as Lean and also AI-based tools, and what is likely to be the future course of the field with the ascendance of these tools. Here are some excerpts from this remarkable interview (with some minor editing by the present author):

Drosser: With the advent of automated proof checkers, how is [the trust between mathematicians] changing?

Tao: Now you can really collaborate with hundreds of people that you’ve never met before. And you don’t need to trust them, because they upload code and the Lean compiler verifies it. You can do much larger-scale mathematics than we do normally. When I formalized our most recent results on what is called the Polynomial Freiman-Ruzsa (PFR) conjecture, [I was working with] more than 20 people. We had broken up the proof in lots of little steps, and each person contributed a proof to one of these little steps. And I didn’t need to check line by line that the contributions were correct. I just needed to sort of manage the whole thing and make sure everything was going in the right direction. It was a different way of doing mathematics, a more modern way.

Drosser: German mathematician and Fields Medalist Peter Scholze collaborated in a Lean project — even though he told me he doesn’t know much about computers.

Tao: With these formalization projects, not everyone needs to be a programmer. Some people can just focus on the mathematical direction; you’re just splitting up a big mathematical task into lots of smaller pieces. And then there are people who specialize in turning those smaller pieces into formal proofs. We don’t need everybody to be a programmer; we just need some people to be programmers. It’s a division of labor.

[Continuing after some skip:]

Drosser: I heard about machine-assisted proofs 20 years ago, when it was a very theoretical field. Everybody thought you have to start from square one — formalize the axioms and then do basic geometry or algebra — and to get to higher mathematics was beyond people’s imagination. What has changed that made formal mathematics practical?

Tao: One thing that changed is the development of standard math libraries. Lean, in particular, has this massive project called mathlib. All the basic theorems of undergraduate mathematics, such as calculus and topology, and so forth, have one by one been put in this library. So people have already put in the work to get from the axioms to a reasonably high level. And the dream is to actually get [the libraries] to a graduate level of education. Then it will be much easier to formalize new fields [of mathematics]. There are also better ways to search because if you want to prove something, you have to be able to find the things that it already has confirmed to be true. So also the development of really smart search engines has been a major new development.

Drosser: So it’s not a question of computing power?

Tao: No, once we had formalized the whole PFR project, it only took like half an hour to compile it to verify. That’s not the bottleneck — it’s getting the humans to use it, the usability, the user friendliness. There’s now a large community of thousands of people, and there’s a very active online forum to discuss how to make the language better.

Drosser: Is Lean the state of the art, or are there competing systems?

Tao: Lean is probably the most active community. For single-author projects, maybe there are some other languages that are slightly better, but Lean is easier to pick up in general. And it has a very nice library and a nice community. It may eventually be replaced by an alternative, but right now it is the dominant formal language.

[Continuing after some skip:]

Drosser: So far, the idea for the proof still has to come from the human mathematician, doesn’t it?

Tao: Yes, the fastest way to formalize is to first find the human proof. Humans come up with the ideas, the first draft of the proof. Then you convert it to a formal proof. In the future, maybe things will proceed differently. There could be collaborative projects where we don’t know how to prove the whole thing. But people have ideas on how to prove little pieces, and they formalize that and try to put them together. In the future, I could imagine a big theorem being proven by a combination of 20 people and a bunch of AIs each proving little things. And over time, they will get connected, and you can create some wonderful thing. That will be great. It’ll be many years before that’s even possible. The technology is not there yet, partly because formalization is so painful right now.

Drosser: I have talked to people that try to use large language models or similar machine-learning technologies to create new proofs. Tony Wu and Christian Szegedy, who recently co-founded the company xAI, with Elon Musk and others, told me that in two to three years mathematics will be “solved” in the same sense that chess is solved — that machines will be better than any human at finding proofs.

Tao: I think in three years AI will become useful for mathematicians. It will be a great co-pilot. You’re trying to prove a theorem, and there’s one step that you think is true, but you can’t quite see how it’s true. And you can say, “AI, can you do this stuff for me?” And it may say, “I think I can prove this.” I don’t think mathematics will become solved. If there was another major breakthrough in AI, it’s possible, but I would say that in three years you will see notable progress, and it will become more and more manageable to actually use AI. And even if AI can do the type of mathematics we do now, it means that we will just move to a higher type of mathematics. So right now, for example, we prove things one at a time. It’s like individual craftsmen making a wooden doll or something. You take one doll and you very carefully paint everything, and so forth, and then you take another one. The way we do mathematics hasn’t changed that much. But in every other type of discipline, we have mass production. And so with AI, we can start proving hundreds of theorems or thousands of theorems at a time. And human mathematicians will direct the AIs to do various things. So I think the way we do mathematics will change, but their time frame is maybe a little bit aggressive.

Drosser: I interviewed Peter Scholze when he won the Fields Medal in 2018. I asked him, How many people understand what you’re doing? And he said there were about 10 people.

Tao: With formalization projects, what we’ve noticed is that you can collaborate with people who don’t understand the entire mathematics of the entire project, but they understand one tiny little piece. It’s like any modern device. No single person can build a computer on their own, mine all the metals and refine them, and then create the hardware and the software. We have all these specialists, and we have a big logistics supply chain, and eventually we can create a smartphone or whatever. Right now, in a mathematical collaboration, everyone has to know pretty much all the mathematics, and that is a stumbling block, as [Scholze] mentioned. But with these formalizations, it is possible to compartmentalize and contribute to a project only knowing a piece of it. I think also we should start formalizing textbooks. If a textbook is formalized, you can create these very interactive textbooks, where you could describe the proof of a result in a very high-level sense, assuming lots of knowledge. But if there are steps that you don’t understand, you can expand them and go into details — all the way down to the axioms if you want to. No one does this right now for textbooks because it’s too much work. But if you’re already formalizing it, the computer can create these interactive textbooks for you. It will make it easier for a mathematician in one field to start contributing to another because you can precisely specify subtasks of a big task that don’t require understanding everything.

[Continuing after some skip:]

Drosser: By breaking down a problem and exploring it, you learn a lot of new things on the way, too. Fermat’s Last Theorem, for example, was a simple conjecture about natural numbers, but the math that was developed to prove it isn’t necessarily about natural numbers anymore. So tackling a proof is much more than just proving this one instance.

Tao: Let’s say an AI supplies an incomprehensible, ugly proof. Then you can work with it, and you can analyze it. Suppose this proof uses 10 hypotheses to get one conclusion — if I delete one hypothesis, does the proof still work? That’s a science that doesn’t really exist yet because we don’t have so many AI-generated proofs, but I think there will be a new type of mathematician that will take AI-generated mathematics and make it more comprehensible. Like, we have theoretical and experimental science. There are lots of things that we discover empirically, but then we do more experiments, and we discover laws of nature. We don’t do that right now in mathematics. But I think there’ll be an industry of people trying to extract insight from AI proofs that initially don’t have any insight.

Drosser: So instead of this being the end of mathematics, would it be a bright future for mathematics?

Tao: I think there’ll be different ways of doing mathematics that just don’t exist right now. I can see project manager mathematicians who can organize very complicated projects — they don’t understand all the mathematics, but they can break things up into smaller pieces and delegate them to other people, and they have good people skills. Then there are specialists who work in subfields. There are people who are good at trying to train AI on specific types of mathematics, and then there are people who can convert the AI proofs into something human-readable. It will become much more like the way almost any other modern industry works. Like, in journalism, not everyone has the same set of skills. You have editors, you have journalists, and you have businesspeople, and so forth — we’ll have similar things in mathematics eventually.

The full text of the interview is available here.

As is the custom on this site, we present below a crossword puzzle constructed on a theme of mathematics, computing and the digits of pi. Enjoy!

This puzzle has been constructed in conformance with *New York Times* crossword puzzle standards. In terms of overall difficulty, my mathematician daughter and my genealogist spouse agreed that it would likely rate as a Tuesday (the *New York Times* grades its puzzles, with Monday puzzles being the easiest and Saturday puzzles being the most challenging).

As was the custom with past years’ puzzles here, a pi-themed prize will be awarded to the first two persons who submit correctly completed puzzles (US only; past recipients are eligible). This year’s choices are: a Digits of pi coffee mug, a Pi Day T-shirt, or a Pi Bracelet.

As of this date (2 Mar 2024), correct solutions have been submitted by Neil Calkin, Michael Coons, Gerard Joseph, Ross Blocher and Morgan Marshall. If you wish to be credited here as a solver of the puzzle, please send the author an email (see DHB site for email).

If you wish to print the puzzle, the PNG file is available HERE.

Johann Sebastian Bach (1685-1750) regularly garners the top spot in listings of the greatest Western classical composers, typically followed by Mozart and Beethoven. Certainly in terms of sheer volume of compositions, Bach reigns supreme. The Bach-Werke-Verzeichnis (BWV) catalogue lists 1128 compositions, from short solo pieces to the magnificent Mass in B Minor (BWV 232) and Christmas Oratorio (BWV 248), far more than any other classical composer. Further, Bach’s clever, syncopated style led the way to twentieth-century musical innovations, notably jazz.

There does seem to be a credible connection between the sort of mental gymnastics done by a mathematician and by a musician. To begin with, there are well-known mathematical relationships between the pitch of various notes on the musical keyboard. But beyond mere analysis of pitches, it is clear that the arena of musical syntax and structure has a very deep connection to the sorts of syntax, structure and other regularities that are studied in mathematical research. Bach and Mozart in particular are well-known for music that is both “mathematically” beautiful and structurally impeccable.

Just as some of the best musicians and composers are “mathematical,” so too many of the best mathematicians are quite musical. It is quite common at evening receptions of large mathematical conferences to be serenaded by concert-quality musical performers, who, in their day jobs, are accomplished mathematicians of some renown. Perhaps the best real-life example of a mathematician-musician was Albert Einstein, who was also an accomplished pianist and violinist. His favorite composers? You guessed it: Bach and Mozart. He later said, “If … I were not a physicist, I would probably be a musician. I often think in music. I live my daydreams in music. I see my life in terms of music.”

For additional details on Bach’s “mathematical” style, some interesting speculations on golden ratio patterns in Bach’s music, as well as a listing of a number of particularly listenable works, with audio links, see this previously published Math Scholar article: Bach as mathematician.

In February 2024, a team of researchers from the University of Pennsylvania, Yale and Princeton in the U.S. published a study describing their efforts to analyze Bach’s music using network theory and information theory. As the authors explain,

Bach is a natural case study given his prolific career, the wide appreciation his compositions have garnered, and the influence he had over contemporaneous and subsequent composers. His diverse compositions (from chorales to fugues) for a wide range of musicians (from singers to orchestra members) often share a fundamental underlying structure of repeated—and almost mathematical—musical themes and motifs. These features of Bach’s compositions make them particularly interesting to study using a mathematical framework.

The authors included a wide range of Bach compositions in their study, including some preludes and fugues from the *Well-Tempered Clavier* suite, two- and three-part inventions, a selection of Bach’s cantatas, the English suites, the French suites, some chorales, the Brandenburg concertos, and various toccatas and concertos.

Overall, their results have confirmed that Bach’s works have a high information content, and further that different subsets of works have distinct characteristics.

Here is an outline of their methodology: After collecting digitized versions of the above musical selections, they represented each note as a node in a network, with notes from different octaves as distinct nodes. A transition from note A to note B is represented as a directed edge from A to B. Chords are represented with edges between all notes in the first chord to all notes in the second chord. A graphical representation of this process is shown below. To the right of this illustration is the result of this process for four specific Bach compositions: (a) the chorale “Wir glauben all an einen Gott” (BWV 437); (b) Fugue 11 from the *Well-Tempered Clavier (WTC)*, Book I (BWV 856); (c) Prelude 9 from the *WTC*, Book II (BWV 878); and (d) Toccata in G major for harpsichord (BWV 916).
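The construction outlined above can be sketched in a few lines of Python. This is a minimal illustration of the general idea, not the authors’ actual pipeline, and the example phrase is a hypothetical toy sequence rather than a Bach excerpt:

```python
from collections import defaultdict

def build_note_network(piece):
    """Build a directed transition network from a piece of music.

    `piece` is a list of "sonorities": each element is a tuple of the
    note names sounding together (a single note or a chord).  Each
    distinct note (octave included) becomes a node, and every note of
    one sonority gets a directed edge to every note of the next one;
    edge weights count repeated transitions.
    """
    edges = defaultdict(int)
    for current, following in zip(piece, piece[1:]):
        for a in current:
            for b in following:
                edges[(a, b)] += 1
    return dict(edges)

# A toy alternating phrase (hypothetical, not an actual Bach excerpt):
phrase = [("C4",), ("E4", "G4"), ("F4",), ("E4", "G4"), ("F4",)]
network = build_note_network(phrase)
print(network[("G4", "F4")])  # 2: the chord-to-F4 transition occurs twice
```

Normalizing each node’s outgoing edge weights then yields the transition probabilities $P_{i,j}$ used in the entropy analysis.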

The information exhibited in these graphs was quantified as the Shannon entropy of a random walk on the network. In particular, the contribution of the $i$-th node to the entropy is:

$$S_i \; = \; - \sum_{j=1}^n P_{i,j} \log P_{i,j},$$

where $P_{i,j}$ is the transition probability of going from node $i$ to node $j$. Then the entropy of the entire network is a weighted sum of the $S_i$:

$$S \; = \; \sum_{i=1}^m w_i S_i \; = \; - \sum_{i=1}^m w_i \sum_{j=1}^n P_{i,j} \log P_{i,j},$$

where the weights $w_i$ are the stationary distribution probabilities: $w_i$ is the probability that a walker ends up at node $i$ after infinite time.
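Under these definitions, the entropy can be computed directly from the transition matrix. Here is a small Python/NumPy sketch (a straightforward reading of the formulas above, not the authors’ code):

```python
import numpy as np

def network_entropy(P):
    """Entropy of a random walk on a network.

    P is an (m x m) row-stochastic matrix: P[i, j] is the probability
    of stepping from node i to node j.  The per-node entropies
    S_i = -sum_j P[i, j] log P[i, j] are weighted by the stationary
    distribution w, the left eigenvector of P with eigenvalue 1.
    """
    P = np.asarray(P, dtype=float)
    # Stationary distribution: left eigenvector for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    w = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    w /= w.sum()
    # Per-node entropy, treating 0 log 0 as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P), 0.0)
    S_i = -terms.sum(axis=1)
    return float(w @ S_i)

# Uniform walk on two nodes: maximal uncertainty per step.
print(network_entropy([[0.5, 0.5], [0.5, 0.5]]))  # log 2, about 0.693
```

As a sanity check, a deterministic two-node cycle (transition matrix `[[0, 1], [1, 0]]`) has entropy zero: the walker’s next step is always certain.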

Some additional information and complete technical details are in the published paper.

The researchers found that, indeed, compared with some other musical compositions they had studied, the Bach pieces tended to have higher information content. What was even more interesting was that the researchers found significant differences between the different classes of Bach compositions:

The chorales, typically meant to be sung by groups in ecclesiastical settings, are shorter and simpler diatonic pieces that display a markedly lower entropy than the rest of the compositions studied. By contrast, the toccatas, characterized by more complex chromatic sections that span a wider melodic range, have a much higher entropy. It is possible that the chorales’ functions of meditation, adoration, and supplication are best supported by predictability and hence low entropy, whereas the entertainment functions of the toccatas and preludes are best supported by unpredictability and hence high entropy.

Full details of the results are given in the published paper.

The researchers have clearly identified a very interesting and very effective technique to analyze, classify and compare musical compositions. Numerous questions for future research could be asked, some of which were suggested by the authors themselves in their concluding section. Here are a few of these potential research questions, including some due to the present author:

- The authors found that Bach’s music networks had a higher number of transitive triangular clusters, enabling them to be learned more efficiently than arbitrary transition structures. Are pieces with a larger number of these triangles also more appealing to a listener?
- How effective are these techniques in analyzing other classical composers?
- How has the information content of a specific composer changed over time?
- Within a single genre such as classical music, how has the information content of the music changed over time?
- How effective are these techniques in analyzing other genres of music, such as modern jazz, hip-hop and country?
- How do these results compare for various non-Western music genres, such as traditional Japanese music (ongaku), Cantonese opera, African tribal music, Tibetan throat singing and Scottish piobaireachd?
- How do human perceptions of music correlate with these measures?
- Do these results offer any insights into the human psychology of musical experience, such as the fundamental question of why humans have evolved to perform and value music?

Clearly an exciting future lies ahead in the realm of fusing mathematical network and information theory with the fine arts.

Reproducibility has emerged as a major issue in numerous fields of scientific research, ranging from psychology, sociology, economics and finance to biomedicine, scientific computing and physics. Many of these difficulties arise from experimenter bias (also known as “selection bias”): consciously or unconsciously excluding, ignoring or adjusting certain data that do not seem to be in agreement with one’s preconceived hypothesis; or devising statistical tests post-hoc, namely AFTER data has already been collected and partially analyzed. By some estimates, at least 75% of published scientific papers in some fields contain flaws of this sort that could potentially compromise the conclusions.

Many fields are now undertaking aggressive steps to address such problems. As just one example of many that could be listed, the American pharmaceutical-medical device firm Johnson and Johnson has agreed to make detailed clinical data available to outside researchers, covering past and planned trials, in an attempt to avoid the temptation to only publish results of tests with favorable outcomes.

For further background details, see John Ioannidis’ 2014 article, Sonia Van Gilder Cooke’s 2016 article, Robert Matthews’ 2017 article, Clare Wilson’s 2022 article and Benjamin Mueller’s 2024 article, as well as this previous Math Scholar article, which summarizes several case studies.

Physics, thought by some to be the epitome of objective physical science, has not been immune to experimenter bias. One classic example is the history of measurements of the speed of light. The chart on the right shows numerous published measurements, dating from the late 1800s through 1983, when experimenters finally agreed to the international standard value 299,792,458 meters/second. (Note: Since 1983, the meter has been defined as the distance traveled by light in vacuum in 1/299,792,458 seconds.)

As can be seen in the chart, during the period 1870-1900, the prevailing value of the speed of light was roughly 299,900,000 m/s. But beginning with a measurement made in 1910 through roughly 1950, the prevailing value centered on 299,775,000 m/s. Finally, beginning in the 1960s, a new series of very carefully controlled experiments settled on the current value, 299,792,458 m/s. It is important to note that the currently accepted value was several standard deviations *below* the prevailing 1870-1900 value, and was also several standard deviations *above* the prevailing 1910-1950 value.

Why? Evidently investigators, faced with divergent values, tended to reject or ignore values that did not comport with the then-prevailing consensus figure. Or, as Luke Caldwell explains, “[R]esearchers were probably unconsciously steering their data to fit better with previous values, even though those turned out to be inaccurate. It wasn’t until experimenters had a much better grasp on the true size of their errors that the various measurements converged on what we now think is the correct value.”

In a February 2024 Scientific American article, Luke Caldwell described his team’s work to measure the electron’s electric dipole moment (EDM), in an attempt to better understand why we are made of matter and not antimatter. He described in some detail the special precautions that his team took to avoid experimenter error, and, especially, to avoid consciously or unconsciously filtering the data to reach some preconceived conclusion. It is worth quoting this passage at length:

Another source of systematic error is experimenter bias. All scientists are human beings and, despite our best efforts, can be biased in our thoughts and decisions. This fallibility can potentially affect the results of experiments. In the past it has caused researchers to subconsciously try to match the results of previous experiments. … [Caldwell mentions the speed of light controversy above.]

To avoid this issue, many modern precision-measurement experiments take data “blinded.” In our case, after each run of the experiment, we programmed our computer to add a randomly generated number—the “blind”—to our measurements and store it in an encrypted file. Only after we had gathered all our data, finished our statistical analysis and even mostly written the paper did we have the computer subtract the blind to reveal our true result.

The day of the unveiling was a nerve-racking one. After years of hard work, our team gathered to find out the final result together. I had written a computer program to generate a bingo-style card with 64 plausible numbers, only one of which was the true result. The other numbers varied from “consistent with zero” to “a very significant discovery.” Slowly, all the fake answers disappeared from the screen one by one. It’s a bit weird to have years of your professional life condensed into a single number, and I questioned the wisdom of amping up the stress with the bingo card. But I think it became apparent to all of us how important the blinding technique was; it was hard to know whether to be relieved or disappointed by the vanishing of a particularly large result that would have hinted at new, undiscovered particles and fields but also contradicted the results of previous experiments.

Finally, a single value remained on the screen. Our answer was consistent with zero within our calculated uncertainty. The result was also consistent with previous measurements, building confidence in them as a collective, and it improved on the best precision by a factor of two. So far, it seems, we have no evidence that the electron has an EDM.

Caldwell’s data blinding technique is quite remarkable. If some variation of this methodology were adopted in other fields of research, at least some of the problems with experimenter bias could be avoided.
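The essence of the blinding protocol can be sketched in a few lines of Python. This is a minimal illustration only; all names and numbers here are invented, not Caldwell's actual pipeline, and a real implementation would store the offset in an encrypted file as his team did.

```python
import random

# A minimal sketch of blinded analysis in the spirit of Caldwell's team
# (all names and numbers here are invented for illustration).

def blind(measurements, seed):
    """Add a secret random offset (the 'blind') to every measurement."""
    rng = random.Random(seed)
    offset = rng.uniform(-10, 10)      # stored separately, hidden from analysts
    return [m + offset for m in measurements], offset

data = [0.12, -0.05, 0.08, 0.01, -0.03]
blinded, secret_offset = blind(data, seed=2024)

# ... all statistical analysis is performed on `blinded` only ...
blinded_mean = sum(blinded) / len(blinded)

# Only after the analysis is frozen is the blind subtracted:
true_mean = blinded_mean - secret_offset
print(round(true_mean, 3))   # → 0.026, the mean of the original data
```

The key design point is that the analysts never see the unblinded values until every analysis decision has been committed, so there is no opportunity to steer the result.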

The field of finance is deeply afflicted by experimenter bias, because of backtest overfitting, namely the usage of historical market data to develop an investment model, strategy or fund, especially where many strategy variations are tried on the same fixed dataset. Note that this is clearly an instance of the post-hoc probability fallacy, which in turn is a form of experimenter bias (more commonly termed “selection bias” in finance). Backtest overfitting has long plagued the field of finance and is now thought to be the leading reason why investments that look great when designed often disappoint when actually fielded to investors. Models, strategies and funds suffering from this type of statistical overfitting typically target the random patterns present in the limited in-sample test-set on which they are based, and thus often perform erratically when presented with new, truly out-of-sample data.

As an illustration, the authors of this AMS Notices article show that if only five years of daily stock market data are available as a backtest, then no more than 45 variations of a strategy should be tried on this data, or the resulting “optimal” strategy will be overfit, in the specific sense that the strategy’s Sharpe Ratio (a standard measure of financial return) is likely to be 1.0 or greater just by chance, even though the true Sharpe Ratio may be zero or negative.
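The flavor of this result is easy to reproduce with a quick Monte Carlo sketch (illustrative only; the article's authors derive it analytically): generate 45 pure-noise "strategies" over five years of daily data, each with true Sharpe ratio zero by construction, and look at the best in-sample Sharpe ratio.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 5 * 252                    # five years of daily trading returns
N = 45                         # strategy variations tried on the same data
# Pure-noise daily returns: every strategy's true Sharpe ratio is zero
returns = rng.normal(0.0, 0.01, size=(N, T))
# Annualized in-sample Sharpe ratio of each variation
sharpes = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(252)
print(f"best in-sample Sharpe of {N} noise strategies: {sharpes.max():.2f}")
```

Typically the best of these 45 worthless strategies shows an annualized Sharpe ratio near or above 1.0, which is precisely the overfitting effect described above.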

The potential for backtest overfitting in the financial field has grown enormously in recent years with the increased utilization of computer programs to search a space of millions or even billions of parameter variations for a given model, strategy or fund, and then to select only the “optimal” choice for academic journal publication or market implementation. The sobering consequence is that a significant portion of the models, strategies and funds employed in the investment world, including many of those marketed to individual investors, may be merely statistical mirages.

Some commonly used techniques to compensate for backtest overfitting, if not used correctly, are also suspect. One example is the “hold-out method” — developing a model or investment fund based on a backtest of a certain date range, then checking the result with a different date range. However, those using the hold-out method may iteratively tune the parameters for their model until the score on the hold-out data, say measured by a Sharpe ratio, is impressively high. But these repeated tuning tests, using the same fixed hold-out dataset, are themselves tantamount to backtest overfitting.

One dramatic visual example of backtest overfitting is shown in the graph at the right, which displays the mean excess return (compared to benchmarks) of newly minted exchange-traded index-linked funds, both in the months of design prior to submission to the U.S. Securities and Exchange Commission for approval, and in the months after the fund was actually fielded. The “knee” in the graph at 0 shows unmistakably the difference between statistically overfit designs and actual field experience.

For additional details, see How backtest overfitting in finance leads to false discoveries, which is condensed from this freely available SSRN article.

The p-test, which was introduced by the British statistician Ronald Fisher in the 1920s, assesses how likely a result at least as extreme as the one observed would be if the null hypothesis were true. However, the p-test, especially when used alone, has significant drawbacks, as Fisher himself warned. To begin with, the typically used threshold p = 0.05 is not a particularly compelling result. In any event, it is highly questionable to reject a result whose p-value is 0.051 while accepting as significant one whose p-value is 0.049.

The prevalence of the classic p = 0.05 value has led to the egregious practice that Uri Simonsohn of the University of Pennsylvania has termed p-hacking: proposing numerous varied hypotheses until a researcher finds one that meets the 0.05 level. Note that this is a classic multiple testing fallacy of statistics, which in turn is a form of experimenter bias: perform enough tests and one is bound to pass any specific level of statistical significance. Such suspicions are justified given the results of a study by Jelte Wicherts of the University of Amsterdam, who found that researchers whose results were close to the p = 0.05 level of significance were less willing to share their original data than were those whose results had stronger significance levels (see also this summary from Psychology Today).
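The multiple-testing arithmetic behind p-hacking is easy to demonstrate (a toy sketch, not any particular study's protocol): run a standard two-sided z-test on, say, 100 batches of pure noise, and on the order of five will clear p < 0.05 by chance alone.

```python
import random
from math import sqrt, erf
from statistics import fmean

random.seed(1)

def z_test_p_value(sample):
    """Two-sided p-value for 'mean = 0' given data with known standard deviation 1."""
    z = fmean(sample) * sqrt(len(sample))
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))   # standard normal CDF at |z|
    return 2 * (1 - phi)

# 100 "hypotheses", each tested on 50 samples of pure noise (true mean 0)
false_alarms = sum(
    z_test_p_value([random.gauss(0, 1) for _ in range(50)]) < 0.05
    for _ in range(100)
)
print(f"{false_alarms} of 100 null hypotheses look 'significant' at p < 0.05")
```

Every one of these "discoveries" is a false alarm, since the data are noise by construction; a researcher who reports only the tests that passed commits exactly the fallacy described above.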

Along this line, it is clear that a sole focus on p-values can muddle scientific thinking, confusing significance with size of the effect. For example, a 2013 study of more than 19,000 married persons found that those who had met their spouses online were less likely to divorce (p < 0.002) and more likely to have higher marital satisfaction (p < 0.001) than those who met in other ways. Impressive? Yes, but the divorce rate for online couples was 5.96%, only slightly down from 7.67% for the larger population, and the marital satisfaction score for these couples was 5.64 out of 7, only slightly better than 5.48 for the larger population (see also this Nature article).

Perhaps a more important consideration is that p-values, even if reckoned properly, can easily mislead. Consider the following example, which is taken from a paper by David Colquhoun: Imagine that we wish to screen persons for potential dementia. Let’s assume that 1% of the population has dementia, and that we have a test for dementia that is 95% accurate (i.e., it is accurate with p = 0.05), in the sense that 95% of persons without the condition will be correctly diagnosed, and assume also that the test is 80% accurate for those who do have the condition. Now if we screen 10,000 persons, 100 presumably will have the condition and 9900 will not. Of the 100 who have the condition, 80% or 80 will be detected and 20 will be missed. Of the 9900 who do not, 95% or 9405 will be cleared, but 5% or 495 will be incorrectly tested positive. So out of the original population of 10,000, 575 will test positive, but 495 of these 575, or 86%, are false positives.
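The arithmetic of this example can be transcribed directly (the numbers below are exactly those in the text):

```python
# Colquhoun's dementia-screening example, transcribed from the text
population = 10_000
prevalence = 0.01        # 1% of the population has the condition
specificity = 0.95       # 95% of healthy persons are correctly cleared
sensitivity = 0.80       # 80% of true cases are detected

with_condition = population * prevalence                   # 100
without_condition = population - with_condition            # 9900
true_positives = with_condition * sensitivity              # 80
false_positives = without_condition * (1 - specificity)    # 495
total_positives = true_positives + false_positives         # 575

print(f"false positives among positive tests: {false_positives / total_positives:.0%}")
```

Running this prints "false positives among positive tests: 86%", confirming that the low prevalence, not any flaw in the test itself, drives the disastrous false positive rate.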

Needless to say, a false positive rate of 86% is disastrously high. Yet this is entirely typical of many instances in scientific research where naive usage of p-values leads to surprisingly misleading results. In light of such problems, the American Statistical Association (ASA) has issued a Statement on statistical significance and p-values. The statement concludes:

Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean. No single index should substitute for scientific reasoning.

Several suggestions for overcoming experimenter bias have been mentioned in the text above, and numerous others appear in the linked references. There is no one silver bullet. What is clear is that researchers in all fields need to take experimenter bias and the larger realm of systematic errors more seriously. As Luke Caldwell wrote in the *Scientific American* article mentioned above:

Gathering the data set was the quick part. The real challenge of a precision experiment is the time spent looking for systematic errors — ways we might convince ourselves we had measured an eEDM when in fact we had not. Precision-measurement scientists take this job very seriously; no one wants to declare they have discovered new particles only to later find out they had only precisely measured a tiny flaw in their apparatus or method. We spent about two years hunting for and understanding such flaws.

Progress in troubled times: Good news in science, medicine and society

Nonetheless, progress is still being made on many fronts. If anything, 2023 produced a bumper crop of startling new advances, particularly in health, medicine, physics, astronomy, computer science and artificial intelligence. Many of these advances, in turn, are based on the unstoppable forward march of Moore’s Law and computer technology — since 2000, aggregate memory and computer power of comparable devices have increased by factors of approximately 1,000,000. Genome sequencing technology has advanced even faster — sequencing a full human genome cost $3 billion in 2003, but currently costs only about $300, a 10,000,000-fold reduction.

Further, contrary to popular perception, progress has also been made in several metrics of social well-being. For example, after rising worldwide during the Covid-19 pandemic, crime rates fell sharply in most major cities in 2023. And global poverty continues its remarkable decline, with over 100,000 persons escaping dire poverty *per day* (on average) since the 1980s.

Granted, as emphasized above, progress has certainly not been universal, and many dire challenges remain, but we should not lose sight of the progress that has been achieved. Here are just a few of the many items of progress worth celebrating, most of them in the past year (2023). Numerous others are listed here, from which some of the items below were gleaned.

- Sickle cell CRISPR ‘cure’ is the start of a revolution in medicine.
- WHO announces drop in malaria infections, deaths after vaccine rollout.
- India witnessed 85.1% decline in malaria cases & 83.36% decline in deaths during 2015-2022.
- Genetically modifying T-cells cuts blood cancer progression by 74%, trial finds.
- Lung cancer pill cuts risk of death by half, says ‘thrilling’ study.
- Huge leap in breast cancer survival rate [English women diagnosed in 1993–99 had a 14.4% risk of dying within 5 years; this fell to 4.9% for women diagnosed in 2010–15].
- Some deaf children in China can hear after gene therapy treatment.
- CRISPR gene editing shown to permanently lower high cholesterol.
- CRISPR 2.0: a new wave of gene editors heads for clinical trials.
- Kenya achieves remarkable 68% decline in AIDS-related fatalities.
- ‘A great day for the country’: Uganda declares an end to Ebola outbreak.
- FDA approves first vaccine for RSV, a moment six decades in the making.
- A new class of drugs for weight loss could end obesity.
- Drug startup aims to cure blindness by developing medications in space.
- Moderna’s mRNA cancer vaccine works even better than thought.
- US life expectancy rebounded in 2022 but not back to pre-pandemic levels.

- DeepMind AI creates algorithms that sort data faster than those built by people.
- A.I. Is Coming for Mathematics, Too.
- New AI translates 5,000-year-old cuneiform tablets instantly.
- The race of the AI labs heats up.
- Nvidia predicts AI models one million times more powerful than ChatGPT within 10 years.
- Teachers and students warm up to ChatGPT.
- AI Reduces Timescale of Early Drug Development to Weeks.
- To Teach Computers Math, Researchers Merge AI Approaches.
- A Very Big Small Leap Forward in Graph Theory.
- Hobbyist Finds Math’s Elusive ‘Einstein’ Tile.

- Amazing images from James Webb telescope, two years after launch.
- The Cosmos Is Thrumming With Gravitational Waves, Astronomers Find.
- JWST Discovers Enormous Distant Galaxies That Should Not Exist.
- Mirror-Image Supernova Yields Surprising Estimate of Cosmic Growth.
- Dark Matter Hunters Need Fresh Answers.
- A New Experiment Casts Doubt on the Leading Theory of the Nucleus.
- The Story of Our Universe May Be Starting to Unravel.
- A Possible Crisis in the Cosmos Could Lead to a New Understanding of the Universe.
- The Early Universe Was Bananas.

- Genome Editing Used to Create Disease Resistant Rice.
- Scientists Unlock Key to Drought-Resistant Wheat Plants with Longer Roots.
- Climate change: Fossil fuel emissions from electricity set to fall.
- Bhutan announces a “milestone achievement” with a 39.5% increase in snow leopard numbers.
- Tiger populations grow in India and Bhutan.

- After Rise in Murders During the Pandemic, a Sharp Decline in 2023.
- The Post-Pandemic Murder Wave Is Cresting.
- Crime falls to lowest level on record, ONS [UK’s Office for National Statistics] says.
- March 2023 global poverty update from the World Bank [global poverty, namely US$2.15 per person per day or less in 2017 dollars, has declined from 38% of world population in 1990 to 8.5% in 2023].
- U.S. Approval of Interracial Marriage at New High of 94% [in 1958, only 4% of Americans approved of marriage between Black and White people; in 2023, 94% approved].
- 6 ways the lives of girls are different today than they were a decade ago [more education, fewer pregnancies, fewer HIV infections, fewer child marriages and more].
- Progress on girls’ access to education [50 million more girls have been enrolled in school globally since 2015].

Aliens made this rock: The post-hoc probability fallacy in biology, finance and cosmology

While out hiking, I found this rock. The following table gives measurements made on the rock. The first two rows give the overall length and width of the rock. Each of the next six rows, after the first two, gives thickness measurements, made on a 3cm x 6cm grid of points from the top surface. All measurements are in millimeters:

| Measurement or row | Column 1 | Column 2 | Column 3 |
| --- | --- | --- | --- |
| Length | 105.0 | | |
| Width | 48.21 | | |
| Row 1 | 35.44 | 35.38 | 36.54 |
| Row 2 | 38.06 | 38.27 | 38.55 |
| Row 3 | 38.02 | 39.53 | 39.29 |
| Row 4 | 38.66 | 40.50 | 41.96 |
| Row 5 | 39.40 | 43.48 | 43.31 |
| Row 6 | 39.58 | 41.83 | 43.07 |

This is a set of 20 measurements, each given to four significant figures, for a total of 80 digits. Among all rocks of roughly this size, the probability of a rock appearing with this particular set of measurements is thus one in 10^{80}. Note that this probability is so remote that even if the surfaces of each of the ten planets estimated to orbit each of 100 billion stars in the Milky Way were examined in detail, and this were repeated for each of the estimated 100 billion galaxies in the visible universe, it is still exceedingly unlikely that a rock with this exact set of measurements would ever be found. Thus this rock could not have appeared naturally, and must have been created by space aliens…

What is the fallacy in the above argument? First of all, modeling each measurement as a random variable of four equiprobable digits, and then assuming all measurements are independent (so that we can blithely multiply probabilities) is a very dubious reckoning. In real rocks, the measurement at one point is constrained by physics and geology to be reasonably close to that of nearby points. Presuming that every instance in the space of 10^{80} theoretical digit strings is equally probable as a set of rock measurements is an unjustified and clearly invalid assumption. Thus the above reckoning must be rejected on this basis alone.
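The point can be dramatized in a few lines (a toy sketch): under the naive model of 20 independent four-digit measurements, *every* rock one could pick up is an identical one-in-10^{80} "miracle," so observing one tells us nothing.

```python
import random

SIG_FIGS = 4

def measure_rock():
    """Simulate 20 measurements of a random rock, to four significant figures."""
    return [round(random.uniform(30.0, 45.0), 2) for _ in range(20)]

for trial in range(3):
    rock = measure_rock()
    digits = len(rock) * SIG_FIGS
    # Under the naive model, THIS rock was a 1-in-10^80 event; yet every
    # trial produces such a "miracle", because some outcome had to occur.
    print(f"trial {trial}: naive probability 10^-{digits}")
```

Each simulated rock is assigned the same vanishing probability after the fact, which is exactly why the post-hoc reckoning carries no evidential weight.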

**Key point:** It is important to note that the post-hoc fallacy does not just weaken a probability or statistical argument. *In most cases, the post-hoc fallacy completely nullifies the argument*. The correct reckoning is “What is the probability of X occurring, given that X has been observed to occur,” which of course is unity. In other words, the laws of probability, when correctly applied to a post-hoc phenomenon, can say nothing one way or the other about the likelihood of the event.

In general, probability reckonings based solely or largely on combinatorial enumerations of theoretical possibilities have no credibility when applied to real-world problems. And reckonings where a statistical test or probability calculation is devised or modified *after the fact* are invalid and should be immediately rejected — some outcome had to occur, and the laws of probability by themselves cannot help to determine its likelihood.

One example of the post hoc probability fallacy in biology can be seen in the attempt by some evolution skeptics (see this Math Scholar article for some references) to claim that the human alpha-globin molecule, a component of hemoglobin that performs a key oxygen transfer function in blood, could not have arisen by “random” evolution. These writers argue that since human alpha-globin is a protein chain based on a sequence of 141 amino acids, and since there are 20 different amino acids common in living systems, the “probability” of selecting human alpha-globin at random is one in 20^{141}, or one in approximately 10^{183}. This probability is so tiny, so they argue, that even after millions of years of random molecular trials covering the entire Earth’s surface, no human alpha-globin protein molecule would ever appear.
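The exponent in this reckoning does check out arithmetically (even though, as argued below, the reckoning itself is invalid):

```python
from math import log10

# 141 amino-acid positions, 20 choices each: 20^141 = 10^(141 * log10 20)
exponent = 141 * log10(20)
print(round(exponent))   # → 183, i.e. roughly one in 10^183
```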

But this line of reasoning is a dead-ringer for the post-hoc probability fallacy. Note that this probability reckoning was performed *after the fact*, on a single very limited dataset (the human alpha-globin sequence) that has been known in the biology literature for decades. Some sequence had to appear, and the fact that the particular sequence found today in humans was the end result of the course of evolution provides no guidance, by itself, as to the probability of its occurrence.

Further, just like the rock measurements above, the enumeration of combinatorial possibilities in this calculation, devoid of empirical data, has no credibility. Some alpha-globin sequences may be highly likely to be realized while others may be biologically impossible. It may well be that there is nothing special whatsoever about the human alpha-globin sequence, as evidenced, for example, by the great variety in alpha-globin molecules seen across the biological kingdom, all of which perform a similar oxygen transfer function. But there is no way to know for sure, since we have only one example of human evolution. Thus the reckoning used here, namely enumerating combinatorial possibilities and taking the reciprocal to obtain a probability figure, is unjustified and invalid.

In any event, it is clear that the impressive-sounding probability figure claimed by these writers is a vacuous arithmetic exercise, with no foundation in real empirical biology. For a detailed discussion of these issues, see Do probability arguments refute evolution?.

The field of finance is deeply afflicted by the post-hoc fallacy, because of “backtest overfitting,” namely the usage of historical market data to develop an investment model, strategy or fund, where many variations are tried on the same fixed dataset (note that this is unavoidably a post-hoc reckoning). Backtest overfitting has long plagued the field of finance and is now thought to be the leading reason why investments that look great when designed often disappoint when actually fielded to investors. Models, strategies and funds suffering from this type of statistical overfitting typically target the random patterns present in the limited in-sample test-set on which they are based, and thus often perform erratically when presented with new, truly out-of-sample data.

As an illustration, the authors of this AMS Notices article show that if only five years of daily stock market data are available as a backtest, then no more than 45 variations of a strategy should be tried on this data, or the resulting strategy will be overfit, in the specific sense that the strategy’s Sharpe Ratio (a standard measure of financial return) is likely to be 1.0 or greater just by chance, even though the true Sharpe Ratio may be zero or negative.

The potential for backtest overfitting in the financial field has grown enormously in recent years with the increased utilization of computer programs to search a space of millions or even billions of parameter variations for a given model, strategy or fund, and then to select only the “optimal” choice for publication or market implementation. Compounding the problem is the failure of most academic journals in the field to require authors to disclose the extent of their computer search and optimization. The sobering consequence is that a significant portion of the models, strategies and funds employed in the investment world, including many of those marketed to individual investors, may be merely statistical mirages.

Some commonly used techniques to compensate for backtest overfitting, if not used correctly, are themselves tantamount to overfitting. One example is the “hold-out method” — developing a model or investment fund based on a backtest of a certain date range, then checking the result with a different date range. However, those using the hold-out method may iteratively tune the parameters for their model until the score on the hold-out data, say measured by a Sharpe ratio, is impressively high. But these repeated tuning tests, using the same fixed hold-out dataset, are themselves tantamount to backtest overfitting (and thus subject to the post-hoc fallacy).

One dramatic visual example of backtest overfitting is shown in the graph at the right, which displays the mean excess return (compared to benchmarks) of newly minted exchange-traded index-linked funds, both in the months of design prior to submission to SEC for approval, and in the months after the fund was fielded. The “knee” in the graph at 0 shows unmistakably the difference between statistically overfit designs and actual field experience.

For additional details, see How backtest overfitting in finance leads to false discoveries, which appeared in the December 2021 issue of the British journal *Significance*. This *Significance* article is condensed from the following manuscript, which is freely available from SSRN: Finance is Not Excused: Why Finance Should Not Flout Basic Principles of Statistics.

Most readers are likely familiar with Fermi’s paradox: There are at least 100 billion stars in the Milky Way, with many if not most now known to host planets. Once a species achieves a level of technology that is at least equal to ours, so that they have become a space-faring civilization, within a few million years (an eyeblink in cosmic time) they or their robotic spacecraft could fully explore the Milky Way. So why do we not see any evidence of their existence, or even of their exploratory spacecraft? For additional details, see this Math Scholar article.

Most “solutions” to Fermi’s paradox, including, sadly, some promulgated by prominent researchers who should know better, are easily defeated. For example, explanations such as “They are under strict orders not to communicate with a new civilization such as Earth,” or “They have lost interest in scientific research, exploration and expansion,” or “They have no interest in a primitive, backward society such as ours,” fall prey to a diversity argument: In any vast, diverse society (an essential prerequisite for advanced technology), there will be exceptions and nonconformists to any rule. Thus claims that “all extraterrestrials are like X” have little credibility, no matter what “X” is. It is deeply ironic that while most scientific researchers and others would strenuously reject stereotypes of religious, ethnic or national groups in human society, many seem willing to hypothesize sweeping, ironclad stereotypes for extraterrestrial societies.

Similarly, dramatic advances in human technology over the past decade, including new space exploration vehicles, new energy sources, state-of-the-art supercomputers, quantum computing, robotics and artificial intelligence, are severely undermining the long-standing assumption that space exploration is fundamentally too difficult.

One of the remaining explanations is that Earth is an exceedingly rare planet (possibly unique in the Milky Way), with a lengthy string of characteristics fostering a long-lived biological regime, which enabled the unlikely rise of intelligent life. John Gribbin, for example, writes “They are not here, because they do not exist. The reasons why we are here form a chain so improbable that the chance of any other technological civilization existing in the Milky Way Galaxy at the present time is vanishingly small.” [Gribbin2011, pg. 205]. Among other things, researchers note that:

- Very few (or possibly none) of the currently known exoplanets are likely to be truly habitable in *all* key factors (temperate, low radiation, water, dry land, position in galaxy, etc.). Exoplanets around brown dwarfs, for example, would be frequently bathed in sterilizing radiation. Thus Earth’s role as a cradle for life may well have been exceedingly rare.
- Even given a habitable environment, the origin of life on Earth is not well understood, in spite of decades of study, so it might well have been an exceedingly improbable event not repeated anywhere else in the Milky Way.
- Earth’s *continual* habitability over four billion years, facilitated in part by plate tectonics, geomagnetism and the ozone shield, might well be exceedingly rare.
- Even after life started, numerous key steps (photosynthesis, complex cells, complex structures) were required before advanced life could appear on Earth; each of these required many millions or even billions of years, suggesting that they may have been highly improbable.
- Even after the rise of complex creatures, the evolution of human-level intelligence may well be a vanishingly rare event; on Earth this happened only once among numerous continental “experiments” over the past 100 million years.

Note that any one of these five items may constitute a “great filter,” i.e. a virtually insuperable obstacle. Compounded together they suggest that the origin and rise of human-level technological life on Earth may well have been an exceedingly singular event in the cosmos, unlikely to be repeated anywhere else in the Milky Way if not beyond. Thus the “rare Earth” explanation of Fermi’s paradox is growing in credibility.

On the other hand, analyses in this arena are hobbled by the post-hoc probability fallacy: We only have one real data point, namely the rise of human technological civilization on a single planet, Earth, and so we have no way to rigorously assess the probability of our existence. It may be that the origin and rise of intelligent life is inevitable on any suitably hospitable planet, and we just need to keep trying to finally contact another similarly advanced species. Or it may be that the rise of a species with enough intelligence and technology to rigorously pose the question of its existence is merely a post-hoc selection effect — if we didn’t live on a hospitable planet that harbored the origin of life and was *continuously* habitable over billions of years, thus allowing the evolution of life that progressed to advanced technology, we would not be here to pose the question.

As Charles Lineweaver of the Australian National University has observed [Lineweaver2024b]:

Our existence on Earth can tell us little about the probability of the evolution of human-like intelligence in the Universe because even if this probability were infinitesimally small and there were only one planet with the kind of intelligence that can ask this question, we the question-askers would, of necessity, find ourselves on that planet.

For additional details, see Where are the extraterrestrials? Fermi’s paradox, diversity and the origin of life.

The dilemma of whether our existence is “special” in some sense extends to the universe itself.

For several decades, researchers in the fields of physics and cosmology have puzzled over deeply perplexing indications, many of them highly mathematical in nature, that the universe seems inexplicably well-tuned to facilitate the evolution of atoms, complex molecular structures and sentient creatures. Some of these “cosmic coincidences” include the following (these and numerous others are presented and discussed in detail in a recent book by Lewis and Barnes):

- If the strong force were slightly stronger or slightly weaker (by just 1% in either direction), then there would be no carbon or any heavier elements anywhere in the universe, and thus no carbon-based life forms to contemplate this intriguing fact.
- Had the weak force been somewhat weaker, the amount of hydrogen in the universe would be greatly decreased, starving stars of fuel for nuclear energy and leaving the universe a cold and lifeless place.
- The neutron’s mass is very slightly more than the combined mass of a proton, an electron and a neutrino. But if its mass were lower by 1%, then all isolated protons would decay into neutrons, and no atoms other than hydrogen, helium, lithium and beryllium could form.
- There is a very slight anisotropy in the cosmic microwave background radiation (roughly one part in 100,000), which is just enough to permit the formation of stars and galaxies. If this anisotropy had been slightly larger or smaller, stable planetary systems could not have formed.
- The cosmological constant paradox derives from the fact that when one calculates, based on known principles of quantum mechanics, the “vacuum energy density” of the universe, one obtains the incredible result that empty space “weighs” $10^{93}$ grams per cubic centimeter, which is in error by 120 orders of magnitude from the observed level. Physicists thought that perhaps when the contributions of the other known forces are included, all terms would cancel out to exactly zero, as a consequence of some heretofore unknown physical principle. But these hopes were shattered with the 1998 discovery that the expansion of the universe is accelerating, which implies that the cosmological constant must be slightly positive. Curiously, this observation is in accord with a prediction made by physicist Steven Weinberg in 1987, who argued from basic principles that the cosmological constant must be nonzero but within roughly one part in $10^{120}$ of zero, or else the universe either would have dispersed too fast for stars and galaxies to have formed, or would have recollapsed upon itself long ago. Numerous “solutions” have been proposed for the cosmological constant paradox (Lewis and Barnes mention eight; see pg. 163-164), but they all fail, rather miserably.
- A similar coincidence has come to light in the wake of the 2012 discovery of the Higgs boson at the Large Hadron Collider. The Higgs was found to have a mass of 126 billion electron volts (i.e., 126 GeV). However, a calculation of its interactions with other known particles yields contributions of some $10^{19}$ GeV. This means that the “bare” mass of the Higgs boson must cancel this enormous quantity almost exactly, so that the sum comes out to just 126 GeV, a massive and unexplained cancelation. This difficulty is known as the “hierarchy” problem.
- General relativity allows the space-time fabric of the universe to be open (extending forever, like an infinite saddle), closed (like the surface of a sphere), or flat. The latest measurements confirm that the universe is flat to within 1%. But extrapolating back to the first few minutes after the big bang, this means that the universe must then have been flat to within one part in $10^{15}$. The cosmic inflation theory was proposed by Alan Guth and others in the late 1970s to explain this and some other phenomena, but recently even some of inflation’s most devoted proponents have acknowledged that the theory is in deep trouble and may have to be substantially revised.
- The overall entropy (disorder) of the universe is, in the words of Lewis and Barnes, “freakishly lower than life requires.” After all, life requires, at most, one galaxy of highly ordered matter to create chemistry and life on a single planet. Extrapolating back to the big bang only deepens this puzzle.

Numerous explanations have been proposed over the years for these difficulties. One of the more widely accepted hypotheses is the multiverse, combined with the anthropic principle. The theory of inflation, mentioned above, suggests that our universe is merely one pocket that separated from many others in the very early universe. Similarly, string theory suggests that our universe is merely one speck in an enormous landscape of possible universes, by one count $10^{500}$ in number, each corresponding to a different Calabi-Yau manifold.

Thus, the thinking goes, we should not be surprised that we find ourselves in a universe that has somehow beaten the one-in-$10^{120}$ odds to be life-friendly (to pick just the cosmological constant paradox), because it had to happen somewhere, and, besides, if our universe were not life-friendly, then we would not be here to talk about it. In other words, these researchers propose that the multiverse (or the “cosmic landscape”) actually exists in some sense, but acknowledge that the vast majority of these universes are utterly sterile — either very short-lived or else completely devoid of atoms or other structures, much less sentient living organisms like us contemplating the meaning of their existence. We are just lucky.

But other researchers are very reluctant to adopt such reasoning. After all, we have no evidence of these other universes, and as yet we have no conceivable means of rigorously calculating the “probability” of any possible universe outcome, including ours (this is known as the “measure problem” of cosmology). As with Fermi’s paradox, by definition we have only one universe to observe and analyze, and thus, by the post-hoc probability fallacy, simple-minded attempts to reckon “probabilities” are doomed to failure.

For additional details, see Is the universe fine-tuned for intelligent life?.

A good introduction to the post-hoc probability fallacy for the general reader can be found in cognitive scientist Steven Pinker’s recent book *Rationality: What It Is, Why It Seems Scarce, Why It Matters*. Pinker likens the post-hoc fallacy to the following joke:

A man tries on a custom suit and says to the tailor, “I need this sleeve taken in.” The tailor says, “No, just bend your elbow like this. See, it pulls up the sleeve.” The customer says, “Well, OK, but when I bend my elbow, the collar goes up the back of my neck.” The tailor says, “So? Raise your head up and back. Perfect.” The man says, “But now the left shoulder is three inches lower than the right one!” The tailor says, “No problem. Bend at the waist and then it evens out.” The man leaves the store wearing the suit, his right elbow sticking out, his head craned back, his torso bent to the left, walking with a herky-jerky gait. A pair of pedestrians pass him by. The first says, “Did you see that poor disabled guy? My heart aches for him.” The second says, “Yeah, but his tailor is a genius — the suit fits him perfectly!”


The operation of sorting a dataset is one of the most fundamental of all operations studied in computer science. Literally trillions of sort operations are performed each day worldwide, more if one counts operations where a relatively small set of elements are merged into a larger set.

Many sorting algorithms are in use, including special routines for datasets of a certain size, and other routines optimized for specific hardware platforms and types of data.

One relatively simple algorithm, which is actually quite efficient, is the “quicksort” algorithm. It may be simply stated as follows: Given a set of elements (integers, real numbers or alphabetical data, for the sake of illustration), select some element as the “pivot.” It typically does not matter much which one is selected — some implementations select an element at random. Then partition the set, moving all elements that compare less than or equal to the pivot to the left, and all those that compare greater than the pivot to the right. Now apply this same scheme to each of the two subsets, continuing until all sets have only one element, so that the set is completely ordered. See the illustration of the quicksort algorithm to the right.
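The scheme just described can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; it uses a three-way partition (grouping elements equal to the pivot) so that repeated values cannot cause unbounded recursion:

```python
def quicksort(items):
    """Sort a list by the quicksort scheme: choose a pivot,
    partition the remaining elements around it, and recurse
    on the two sides."""
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]   # the choice of pivot rarely matters much
    left = [x for x in items if x < pivot]
    middle = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    return quicksort(left) + middle + quicksort(right)
```

For example, `quicksort([3, 1, 4, 1, 5, 9, 2, 6])` returns the list in ascending order. Production versions sort in place and switch to simpler routines for very small subsets, which is exactly the regime discussed below.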

Before continuing, it is worth reviewing some of the remarkable successes achieved by the DeepMind organization, a research subsidiary of Alphabet, the parent company of Google, headquartered in London, UK.

The ancient Chinese game of Go is notoriously complicated, with strategies that can only be described in vague, subjective terms. Many observers did not expect Go-playing computer programs to beat the best human players for many years, if ever. Then in May 2017, a computer program named “AlphaGo,” developed by DeepMind, defeated Ke Jie, a 19-year-old Chinese Go master thought to be the world’s best human Go player [NY Times]. Later that same year, a new DeepMind program named “AlphaGo Zero” defeated the earlier AlphaGo program 100 games to zero. After 40 days of training by playing games with itself, AlphaGo Zero was as far ahead of Ke Jie as Ke Jie is ahead of a good amateur player [Quanta Magazine].

Proteins are the workhorses of biology. Examples in human biology include actin and myosin, the proteins that enable muscles to work, and hemoglobin, the basis of red blood that carries oxygen to cells. Each protein is specified as a string of amino acids (of which there are 20 standard varieties), typically hundreds or thousands of units long. Sequencing proteins is now fairly routine, but the key to biology is the three-dimensional shape of the protein — how a protein “folds” [Nature]. Protein shapes can be investigated experimentally, using x-ray crystallography, but this is an expensive, error-prone and time-consuming laboratory operation, so in recent years researchers have been pursuing entirely computer-based solutions.

Given the daunting challenge and importance of the protein folding problem, in 1994 a community of researchers in the field organized a biennial competition known as Critical Assessment of Protein Structure Prediction (CASP) [CASP]. In 2018, a program called AlphaFold, developed by DeepMind, won the competition. For the 2020 CASP competition, the DeepMind team developed a new program, known as AlphaFold 2 [AlphaFold 2], which achieved a 92% average score, far above the 62% achieved by the second-best program in the competition. “It’s a game changer,” exulted German biologist Andrei Lupas. “This will change medicine. It will change research. It will change bioengineering. It will change everything.” [Scientific American].

Building on these earlier successes, in 2022 researchers at DeepMind turned their tools to the analysis of matrix multiplication algorithms. They formulated the problem as a game, called TensorGame, where at each step the player selects how to combine different entries of the matrices to produce a matrix product. The scheme assigns a score for each selection, based on the number of operations required to reach a correct result.

To find optimal solutions to this problem, the researchers applied a “deep reinforcement learning” algorithm. The resulting program, called AlphaTensor, is a specialized neural network system employing an underlying design similar to that of the above-mentioned Go-playing and protein folding projects. The scheme employs a single agent to decompose matrix multiplication tensors of different sizes, which permits “learned” matrix multiplication techniques to be shared across different “players.” Full details of the scheme are described in an October 2022 Nature article by the DeepMind researchers.

The AlphaTensor program quickly discovered many of the existing matrix multiplication algorithms, including Strassen’s original algorithm and numerous others. But it also discovered several new algorithms not previously known in the literature. For example, for the product of two $4n \times 4n$ matrices, the previously most efficient algorithm was simply to apply Strassen’s algorithm recursively twice. This approach yields the result in $49$ block multiplications of size $n \times n$. AlphaTensor found a scheme that produces the result in $47$ block multiplications (although this result is valid only for matrices with binary elements). The DeepMind paper presented this scheme in full — it required a full page of small type. Similarly, for the product of two $5n \times 5n$ matrices, AlphaTensor found a scheme that produces the result in $96$ block multiplications of size $n \times n$, compared with $98$ in the literature (with the same restriction as mentioned above for $4n \times 4n$ matrices). Several other new results are listed in their paper, including three cases where their algorithms are applicable for standard real or complex matrices as well as for binary matrices.
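For concreteness, Strassen's classic scheme computes a $2 \times 2$ product with seven multiplications instead of the naive eight (at the cost of extra additions); applied recursively twice to a matrix split into $4 \times 4$ blocks, it gives the 49-block-multiplication baseline mentioned above. A minimal sketch:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices (nested lists) with Strassen's
    seven products instead of the naive eight.  The entries may be
    numbers, or themselves blocks supporting +, - and *."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]
```

For instance, `strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, matching the ordinary product. AlphaTensor searches for decompositions of exactly this kind, but with fewer products for larger block sizes.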

The DeepMind researchers note that their results are focused on finding algorithms that minimize the total number of element-wise arithmetic operations in a matrix multiplication task. But the same AlphaTensor technology could also be used to optimize for other criteria, such as numerical stability or energy usage. And more generally, their overall methodology could be applied to a broad range of other mathematical problems as well. For additional details, see this earlier Math Scholar article.

DeepMind’s latest achievement is even more remarkable: efficient algorithms for sorting and hashing that are in some cases even more efficient than the best human efforts. In particular, the DeepMind researchers applied their general strategies, used for example in AlphaZero, to sorting, leading to a system they have dubbed AlphaDev.

The DeepMind researchers first applied AlphaDev to very small datasets of only, say, five elements. The program combined both deliberation and “intuition” to select its moves, as in playing Go or some other board game. At each decision point, the program may select among several types of actions, such as comparing values, moving values, or branching to a different part of the program. After each step, it evaluates its performance and receives a “reward” based on how many items in the dataset were sorted correctly. The program continues until either the dataset is completely sorted or the program exceeds some preset length limit.
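To make the search space concrete, here is a hypothetical illustration (in Python rather than the assembly language AlphaDev actually manipulates, and not the routine it discovered) of the kind of short, fixed-length program involved: a three-element sorting network built from compare-and-swap moves.

```python
def compare_swap(a, i, j):
    """The basic move in a sorting network: exchange a[i] and a[j]
    if they are out of order."""
    if a[i] > a[j]:
        a[i], a[j] = a[j], a[i]

def sort3(a):
    """Sort a 3-element list with three fixed compare-swaps; AlphaDev
    searches over sequences of low-level moves like these."""
    compare_swap(a, 0, 1)
    compare_swap(a, 1, 2)
    compare_swap(a, 0, 1)
    return a
```

Each compare-swap corresponds to only a few machine instructions, so shaving even one move from a routine executed trillions of times per day is a meaningful gain.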

Once the program mastered very small sets, the same approach was unleashed for larger sets.

The bottom line is that AlphaDev’s best algorithms achieved up to a whopping 71% savings in run time on certain problems, compared with the best human efforts. The researchers noted, however, that for large datasets the savings were more modest, typically only 1-2%.

The researchers were very pleasantly surprised at the outcome: Daniel Mankowitz of DeepMind, who led the research team, said, “We were a bit shocked. … We didn’t believe it at first.” (see Nature article). Emma Brunskill of Stanford University (who was not part of the effort) added, “This is an exciting result.”

The authors mentioned that their algorithms have been integrated into the LLVM standard C++ sort library, and thus are already being used worldwide. For additional details, see the Nature synopsis, or the complete Nature technical article.

As we noted in an earlier Math Scholar article, schemes based on machine learning methods are doing much more than finding new mathematical or computer science results. They are also driving cars, writing code, investing money, creating new digital art, and even writing essays.

One of the most startling recent advances is a language model program called “Generative Pre-Trained Transformer” (GPT-3 is currently in use; the latest version is GPT-4), developed by OpenAI, an organization established by some prominent Silicon Valley technologists. Given a prompt, this program can generate surprisingly realistic text on a wide variety of topics. For example, when given the prompt “Write an essay discussing the role of metafiction in the work of Italo Calvino,” GPT-3 produced a five-paragraph essay that started [NY Times]:

Italian author Italo Calvino is considered a master of metafiction, a genre of writing in which the author breaks the fourth wall to discuss the act of writing itself. For Calvino, metafiction is a way of exploring the nature of reality and the ways in which stories can shape our perceptions of the world. His novels often incorporate playful, labyrinthine structures that play with the boundaries between reality and fiction. In “If on a winter’s night a traveler,” for example, the reader is constantly interrupted by meta-level discussions of the act of reading and the nature of storytelling.

Needless to say, language models such as GPT-3 and GPT-4 have their detractors. They tend to be quite good with some prompts, but produce rather strange responses to others. In a previous Math Scholar article, we asked ChatGPT, which is based on GPT-3, to prove four well-known mathematical theorems:

- Most angles cannot be trisected with ruler and compass.
- $\pi$ is irrational.
- $\pi$ is transcendental.
- Every algebraic equation with integer coefficients has a root in the complex numbers.

While the responses were interesting, in each case they fell far short of a complete, rigorous proof. It is clear that for mathematical work, these systems still need improvement. Research mathematicians are in no danger of being unemployed.

In any event, these developments are raising some interesting questions as to how close we are to real artificial general intelligence [Economist]. As Australian cognitive scientist David Chalmers penned in April 2022, shortly after OpenAI released the earlier GPT-3 software [NY Times],

What fascinates me about GPT-3 is that it suggests a potential mindless path to artificial general intelligence. … It is just analyzing statistics of language. But to do this really well, some capacities of general intelligence are needed, and GPT-3 develops glimmers of them.

A 2011 Time article featured an interview with futurist Ray Kurzweil, who has predicted the arrival of an era he calls the “singularity” (he estimated by 2045), when machine intelligence will meet, then transcend, human intelligence [Time]. Such future intelligent systems will then design even more powerful technology, resulting in a dizzying advance that we can only dimly foresee at the present time [Kurzweil book].

Futurists such as Kurzweil certainly have their skeptics and detractors. But even setting aside technical questions, there are solid reasons to be concerned about the potential societal, legal, financial and ethical challenges of machine intelligence, as exhibited by the current backlash against science, technology and “elites.” As Kevin Roose writes in the New York Times, “We’re in a golden age of progress in artificial intelligence. It’s time to start taking its potential and risks seriously.”

Along this line, in March 2023 more than 1000 tech leaders and researchers signed an open letter urging a moratorium on the development of state-of-the-art AI systems, citing “profound risks to society and humanity.” Others have even claimed that the singularity is already here, or in other words that the genie is already out of the bottle.

However these disputes are settled, it is clear that in our headlong rush to explore technologies such as machine learning, artificial intelligence and robotics, we must find a way to humanely deal with those whose lives and livelihoods will be affected by these technologies. The very fabric of society may hang in the balance.

Updated 7 April 2024 (c) 2024

Writers from the discipline known variously as “postmodern science studies” or “sociology of scientific knowledge” are often cited in discussions of science, philosophy and religion. Some of these writers, notably Karl Popper and Thomas Kuhn, have had significant impact on the field of scientific research.

Issues such as ensuring proper credit for the scientific contributions of non-Western societies, such as the ancient mathematics of China, India and the Middle East, as well as dealing with the chronic under-representation of women, racial minorities and indigenous people in scientific research, are certainly worth additional attention.

However, other sectors of the “postmodern” literature are very problematic, as we will see below.

Karl Popper, an Austrian-born British philosopher of science, was struck by the differences in approach that he perceived at the time between the writings of some popular Freudians and Marxists, who saw “verifications” of their theories in every news report and clinical visit, and the writings of Albert Einstein, who for instance acknowledged that if the predicted red shift of spectral lines due to gravitation were not observed, then his general theory of relativity would be untenable. Popper was convinced that *falsifiability* was the key distinguishing factor, a view he presented in his oft-cited book *The Logic of Scientific Discovery* [Popper1959, pg. 40-41]:

I shall certainly admit a system as empirical or scientific only if it is capable of being tested by experience. These considerations suggest that not the verifiability but the falsifiability of a system is to be taken as the criterion of demarcation. … It must be possible for an empirical scientific system to be refuted by experience.

Popper’s ideas remain highly influential in scientific research even to the present day. As a single example, several prominent researchers have recently expressed concern about whether it is prudent to continue pursuing string theory, currently a leading candidate for a “theory of everything” in physics, given that string theorists have not yet been able to derive empirically testable consequences even after 25 years of effort. Physicist Lee Smolin, for example, writes, “A scientific theory that makes no predictions and therefore is not subject to experiment can never fail, but such a theory can never succeed either, as long as science stands for knowledge gained from rational argument borne out by evidence.” [Smolin2006, pg. 352].

However, Popper’s ideas do have some limitations, some of which were pointed out by Popper himself. To begin with, in most real modern-day scientific research, major theories are seldom falsified by a single experimental result. There are always questions regarding the underlying experimental design, measurement procedures, and data analysis techniques, not to mention statistical uncertainties. Often multiple follow-on studies, in some cases extending over many years, are necessary to conclusively decide the hypothesis one way or the other. For example, 13 years elapsed between 1998, when two teams of researchers discovered that the expansion of the universe is accelerating, and 2011, when the lead scientists of the two teams were awarded the Nobel Prize in physics.

For that matter, if we were to strictly apply Popper’s principle, Copernicus’ heliocentric theory was falsified from the start and should not have been further considered, because it could not predict planetary motions quite as accurately as the traditional Ptolemaic system. It only gained acceptance when Kepler modified the theory to include elliptical orbits with time-varying speeds, and when Newton showed that this behavior could be mathematically derived, using calculus, from his laws of motion. In a similar way, Newton’s theory was arguably falsified in the mid-19th century, when certain anomalies were noted in the orbit of Mercury. But it would have been irresponsible to discard Newtonian mechanics at that time, because of its overwhelming success in accurately explaining a vast array of phenomena.

In this sense, scientists are more like detectives, in that they must follow leads and hunches, examine evidence, and tentatively proceed with the most likely scenario. Seldom, if ever, are scientific results black-and-white from day one.

It must also be kept in mind that in most cases, “falsified” theories continue to be extremely accurate models of reality within appropriate domains. For example, even today, over 100 years after Newton’s mechanics and Maxwell’s electromagnetics were “falsified” and supplanted by new theories of physics (relativity and quantum mechanics, respectively), they remain the basis of almost all practical engineering and scientific computations, giving results virtually indistinguishable from those of more modern theories. Relativity corrections are employed in the GPS system, which is used by many automobiles and smartphones, and quantum mechanical calculations are employed in semiconductor design, materials science and computational chemistry. But in most other arenas of the modern world, the classical theories of physics are entirely satisfactory.

Thomas Kuhn’s work *The Structure of Scientific Revolutions* analyzed numerous historical cases of scientific advancements, and then concluded that in many cases, key paradigm shifts did not come easily [Kuhn1970]. Kuhn was actually trained as a scientist, receiving his Ph.D. in physics from Harvard in 1949. Thus he was able to bring significant scientific insight into his analyses of historical scientific revolutions.

One difficulty with Kuhn’s writings is that there are really two Kuhns: a moderate Kuhn and an immoderate Kuhn [Sokal1988a, pg. 75]. Unfortunately, many modern scholars like to quote only the immoderate Kuhn, such as when he denies that paradigm shifts carry scientists closer to fundamental truth [Kuhn1970, pg. 170], or when he argues that paradigm shifts often occur due to non-experimental factors [Kuhn1970, pg. 135].

Another difficulty is that Kuhn’s “paradigm shift” model has not worked as well in recent years as it did in the historical examples he cited. For example, the “standard model” of physics, the currently reigning fundamental theory of elementary particles and forces, was developed in the 1960s and early 1970s, and was completed in essentially its current form in 1974. Yet by 1980 it had completely displaced previous theories of particle physics, after a very orderly transition — even initial skeptics quickly recognized the new theory’s power, elegance and precision, and soon threw their support behind it [Tipler1994, pg. 88-89].

Kuhn’s writings, much as Popper’s writings before him, have been badly misused by a host of eager but ill-informed writers and scholars who think that they can smash the reigning orthodoxy of modern science. In a recently published interview of Kuhn by *Scientific American* writer John Horgan, Kuhn was deeply upset that he has become a patron saint to this type of would-be scientific revolutionaries: “I get a lot of letters saying, ‘I’ve just read your book, and it’s transformed my life. I’m trying to start a revolution. Please help me,’ and accompanied by a book-length manuscript.” Kuhn emphasized that in spite of the often iconoclastic way his writings have been interpreted, he remained “pro-science,” noting that science has produced “the greatest and most original bursts of creativity” of any human enterprise [Horgan2012].

Some writers have pointed out that even in the purest of scientific disciplines, namely mathematics and computer science, upon which all other scientific research is based, researchers have identified weaknesses and uncertainties. In particular, it has been known since Gödel’s groundbreaking 1931 paper [Hawking2005, pg. 1089-1118] that the fundamental axioms of mathematics are incomplete (no consistent system of axioms can encompass all mathematical truths) and unprovable (no such system of axioms can demonstrate its own consistency); and it has been known since Alan Turing’s 1937 paper that no computer program can be devised that can infallibly provide a yes/no answer to all questions that can be posed computationally [Hawking2005, pg. 1119-1160]. See these two references for precise statements of these results and full technical details.

The results of Gödel and Turing are very specific and are often misinterpreted to justify a large-scale dismissal of modern mathematics and computer science. Note that Gödel’s result is only that the consistency of the axioms of mathematics cannot be formally proven; it does not establish that they are inconsistent, or that any specific mathematical result is flawed. A more significant concern, which any research mathematician or computer scientist would readily acknowledge, is that mathematical theorems and computer programs are human constructions, and thus susceptible to errors of human reasoning. Indeed, some mathematical results have subsequently been shown to be flawed, and bugs in computer programs are sadly an everyday annoyance. But mathematical results can be (and have been, in numerous cases) checked by computer using very exacting tests; and computer programs can be independently written and compared on different systems. See also the next section.

Even after properly acknowledging the tentative, falsifiable nature of science as taught by Popper and Kuhn, it is clear that modern science has produced a sizable body of broad-reaching theoretical structures that describe the universe and life on earth ever more accurately with each passing year. As a single example of thousands that could be mentioned here, the numerical value of the magnetic moment of the electron, calculated from the present-day standard model (in particular, from the theory of quantum electrodynamics) on one hand, and calculated from best available experimental measurements on the other hand, are [Cliff2024, pg. 98]:

Theoretical: 2.00231930436321

Experimental: 2.00231930436118

Is this astonishing level of agreement — to roughly one part in one trillion, comfortably within the level of experimental uncertainty — just a coincidence? Numerous other instances of scientific progress are presented in Progress-science.
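The quoted level of agreement is easy to check with a few lines of arithmetic; the two values below are simply the numbers given above.

```python
# Electron magnetic moment (theory vs. experiment), as quoted above.
theoretical = 2.00231930436321
experimental = 2.00231930436118

# Relative discrepancy between the two determinations.
rel_diff = abs(theoretical - experimental) / experimental
print(f"relative difference: {rel_diff:.2e}")  # on the order of 1e-12
```

The discrepancy works out to roughly one part in $10^{12}$, i.e., about one part per trillion, as stated.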

Along this line, recently a researcher computed the mathematical constant pi to over 100 trillion decimal place accuracy [Ranous2024]. Large computations of this type are subject to a myriad of possible difficulties: the underlying mathematical theory might have flaws; the principles behind the algorithms implementing the formulas might not be sound; the computer programs (typically many thousands of lines) implementing these algorithms might have bugs; the system software might have glitches; and the system hardware might miscompute or suffer memory errors. In a very fragile computation of this sort, any one of these problems would almost certainly produce a completely erroneous result.

One major step in this computation was to compute pi in hexadecimal or base-16 digits (i.e., with the digits 0123456789abcdef instead of 0123456789) to over 83 trillion digits. Using a computer program based on one known mathematical formula for pi, running for several months, the researcher found that the base-16 digits of pi starting at position 83,048,202,372,150 were:

4757d05f3f 35d1b41de3 7d8b3b2289 a4c8a3eb18 262cc3818d

Then he employed another computer program, based on a completely different mathematical formula, derived from a completely different line of mathematical reasoning, to compute digits beginning at the same position. The result was:

4757d05f3f 35d1b41de3 7d8b3b2289 a4c8a3eb18 262cc3818d

Again, is this perfect agreement just a coincidence?

More recent writings in the postmodern science studies field have greatly extended the scope and sharpness of these critiques, declaring that much of modern science, like literary and historical analysis, is “socially constructed,” dependent on the social environment and power structures of the researchers, with no claim whatsoever to fundamental truth [Koertge1998, pg. 258; Madsen1990, pg. 471; Sokal1998, pg. 234]. Collins and Pinch, for instance, after examining a handful of case studies, assert that “scientists at the research front cannot settle their disagreements through better experimentation, more knowledge, more advanced theories, or clearer thinking” [Collins1993, pg. 143-145; Koertge1998, pg. 258]. Sandra Harding went so far as to describe Newton’s *Principia* as a “rape manual” [Harding1986, pg. 113].

Here are some other examples of this same thinking:

- “The validity of theoretical propositions in the sciences is in no way affected by the factual evidence.” [Gergen1988, pg. 258; Sokal2008, pg. 230].
- “The natural world has a small or non-existent role in the construction of scientific knowledge.” [Collins1981; Sokal2008, pg. 230].
- “Since the settlement of a controversy is the *cause* of Nature’s representation, not the consequence, we can never use the outcome — Nature — to explain how and why a controversy has been settled.” [Latour1987, pg. 99; Sokal2008, pg. 230].
- “For the relativist [such as ourselves] there is no sense attached to the idea that some standards or beliefs are really rational as distinct from merely locally accepted as such.” [Barnes1981, pg. 27; Sokal2008, pg. 230].
- “Science legitimates itself by linking its discoveries with power, a connection which *determines* (not merely influences) what counts as reliable knowledge.” [Aronowitz1988, pg. 204; Sokal2008, pg. 230].

Scientists counter that these scholars have distorted a few historical controversies, and then have parlayed these isolated claims into a global condemnation of the scientific enterprise [Boghossian2006; Brown2009; Gross1998; Gross1996; Koertge1998; Sokal2008]. In other words, these writers are guilty of the “forest fallacy”: pointing out flaws in the bark of a few trees, then trying to claim that the forest doesn’t exist. See Progress-science for additional discussion.

More importantly, observers of the postmodern science literature have also noted: (a) serious confusion about various concepts of science; (b) an emphasis on politically correct conclusions over sound scholarship; (c) lengthy discussions of mathematical or scientific principles with which the author has only a hazy familiarity; (d) application of highly sophisticated concepts from mathematics or physics to the humanities or social sciences, without justification; (e) displays of superficial erudition, peppering the text with sophisticated technical terms or mathematical formulas; and (f) lengthy technical passages that are essentially meaningless [Sokal1998, pg. 4-5].

In a curious turn of events, these postmodern science writings, by attempting to undermine scientists’ claim to objective truth, have provided arguments and talking points for the creationism, intelligent design, anti-vaccination and climate change denial movements [Otto2016a]. The far left has met the far right!

The tension between the scientific and postmodernist communities came to a head in 1996, when Alan Sokal, a physicist at New York University, wrote a parody of a postmodern science article, entitled “Transgressing the Boundaries: Toward a Transformative Hermeneutics of Quantum Gravity,” and submitted it to *Social Text*, a prominent journal in the postmodern studies field [Sokal1996a]. The article was filled with page after page of erudite-sounding nonsense, political rhetoric, irrelevant references to arcane scientific concepts, and approving quotations from leading postmodern science scholars. Here are three excerpts:

Rather, [scientists] cling to the dogma imposed by the long post-Enlightenment hegemony over the Western intellectual outlook, which can be summarized briefly as follows: that there exists an external world, whose properties are independent of any individual human being and indeed of humanity as a whole; that these properties are encoded in “eternal” physical laws; and that human beings can obtain reliable, albeit imperfect and tentative, knowledge of these laws by hewing to the “objective” procedures and epistemological strictures prescribed by the (so-called) scientific method. [Sokal1996a, pg. 217; Sokal2008, pg. 7].

In this way the infinite-dimensional invariance group erodes the distinction between the observer and observed; the pi of Euclid and the G of Newton, formerly thought to be constant and universal, are now perceived in their ineluctable historicity; and the putative observer becomes fatally de-centered, disconnected from any epistemic link to a space-time point that can no longer be defined by geometry alone. [Sokal1996a, pg. 222; Sokal2008, pg. 27].

For, as Bohr noted, “a complete elucidation of one and the same object may require diverse points of view which defy a unique description” — this is quite simply a fact about the world, much as the self-proclaimed empiricists of modernist science might prefer to deny it. In such a situation, how can a self-perpetuating secular priesthood of credentialed “scientists” purport to maintain a monopoly on the production of scientific knowledge? [Sokal1996a, pg. 229; Sokal2008, pg. 53].

With regard to the first passage, note that it derides the most basic notions of scientific reality and common sense. With regard to the second passage, the fundamental constants pi and G certainly do not have varying values. With regard to the third passage, quantum mechanics, whose effects are significant only at the atomic level, has absolutely nothing to say about the relative validity of cultural points of view.

In spite of its severe flaws, the article was not only accepted by the journal, but appeared in a special issue devoted to defending the legitimacy of the postmodern science studies field against its detractors. As Sokal later noted, “I intentionally wrote the article so that any competent physicist or mathematician (or undergraduate physics or math major) would realize that it is a spoof.” [Sokal1996b, pg. 50]. He resorted to the hoax out of a deeply felt concern that the postmodern science world had taken a complete about-face from its roots in the Enlightenment, which identified with science and rationalism and rejected obscurantism. “Theorizing about ‘the social construction of reality’ won’t help us find an effective treatment for AIDS or devise strategies for preventing global warming. Nor can we combat false ideas in history, sociology, economics, and politics if we reject the notions of truth and falsity.” [Lingua2000, pg. 52].

In the same issue as Sokal’s piece, a prominent postmodern writer (in a serious article) asserted:

Most theoretical physicists, for example, sincerely believe that however partial our collective knowledge may be, … one day scientists shall find the necessary correlation between wave and particle; the unified field theory of matter and energy will transcend Heisenberg’s uncertainty principle. [Aronowitz1996, pg. 181].

Einstein’s relativity theory was subjected to official skepticism twenty years after the publication of his Special Theory article in 1905; and equally passionate partisans of wave and matrix mechanics explanations for the behavior of electrons were unable to reach agreement for decades. [Aronowitz1996, pg. 195].

In the first passage, the author is seriously mistaken about wave-particle duality: this is inherent in quantum physics and cannot be removed by a “unified field theory.” In the second passage, even his history is in error: the matrix and wave mechanics formulations of quantum mechanics were resolved within weeks [Gottfried2000]. Also appearing in the same issue with Sokal’s article was the following, written by the chief editor of *Social Text*:

Once it is acknowledged that the West does not have a monopoly on all the good scientific ideas in the world, or that reason, divorced from value, is not everywhere and always a productive human principle, then we should expect to see some self-modification of the universalist claims maintained on behalf of empirical rationality. Only then can we begin to talk about different ways of doing science, ways that downgrade methodology, experiment, and manufacturing in favor of local environments, cultural values, and principles of social justice. [Ross1996, pg. 3-4].

It is easy to imagine the potentially serious consequences if this extreme cultural relativism were widely adopted in modern science. As a single example, a few years ago the Mexican government encouraged potters, for their own safety, to use lead-free glazes, but the local potters were convinced that the lead issue was only a foreign conspiracy. Unfortunately, as Michael Sullivan has noted, “lead does not care who believes what.” [Sullivan1996].

In other postmodern science writing, researchers have attempted to apply arcane scientific and mathematical concepts to the social sciences and the humanities, often with disastrous results. For example, a leading French postmodern scholar wrote:

This diagram [the Mobius strip] can be considered the basis of a sort of essential inscription at the origin, in the knot which constitutes the [human] subject. … You can perhaps see that the sphere, that old symbol for totality, is unsuitable. A torus, a Klein bottle, a cross-cut surface, are able to receive such a cut. And this diversity is very important as it explains many things about the structure of mental disease. If one can symbolize the subject by this fundamental cut, in the same way one can show that a cut on a torus corresponds to the neurotic subject, and on a cross-cut surface to another sort of mental disease. [Lacan1970, pg. 192-196; Sokal1998, pg. 19-20].

With regard to this passage, “Möbius strips,” “toruses,” “Klein bottles” and “cross-cut surfaces” are terms from mathematical topology, the theory of continuous functions and continuously deformed surfaces. There is absolutely no connection between this arcane mathematical theory and psychology. Yet this author pressed this absurd connection between psychology and topology further in several other writings, hopelessly misusing sophisticated mathematical concepts such as compactness, open sets, limit points, subcoverings and countable sets [Lacan1998, pg. 9-10; Sokal1998, pg. 21-22].

Numerous examples of gratuitous and often meaningless scientific jargon can be cited in postmodern literature. Here is one example. The reader need not feel bad that he/she does not understand this text. It is complete nonsense, yet it survived peer review in the postmodern science field:

We can clearly see that there is no bi-univocal correspondence between linear signifying links archi-writing, depending on the author, and this multireferential, multidimensional machinic catalysis. The symmetry of scale, the transversality, the pathic non-discursive character of their expansion: all these dimensions re-move us from the logic of the excluded middle and reinforce us in our dismissal of the ontological binarism we criticised previously. A machinic assemblage, through its diverse components, extracts its consistency by crossing ontological thresholds, non-linear thresholds of irreversibility, ontological and phylogenetic thresholds, creative thresholds of heterogenesis and autopoiesis. The notion of scale needs to be expanded to consider fractal symmetries in ontological terms. [Guattari1995, pg. 50; Sokal1998, pg. 166].

As mentioned above, the scientific world has long recognized the need to properly acknowledge the heretofore downplayed scientific contributions of non-Western societies. One example of many that could be cited is the recent recognition of key mathematical contributions in ancient China, India and the Middle East. India in particular is now recognized as the birthplace of our modern system of positional decimal arithmetic with zero, developed by unknown scholar(s) by roughly 200 CE at the latest — see, for example, [Bailey2012]. An even more pressing current issue is to understand and correct the chronic under-representation of women, racial minorities and indigenous people in the scientific and engineering fields.

However, many researchers are concerned that in a headlong rush to promote these social justice issues, solid scientific merit is being compromised. For example, in 2023 a group of scientists published “In defense of merit in science” [Abbot2023], in which they argued:

For science to succeed, it must strive for the non-ideological pursuit of objective truth. Scientists should feel free to pursue political projects in the public sphere as private citizens, but not to inject their personal politics and biases into the scientific endeavor. Maintaining institutional neutrality is also essential for cultivating public trust in science. … Although no system is guaranteed to eliminate all biases, merit-based systems are the best tool to mitigate it. Moreover, they promote social cohesion because they can be observed to maximize fairness.

Clearly, this debate will continue, but a broad range of researchers agree that in the final analysis, the scientific method of objectively evaluating real empirical evidence against proposed theories is the best path forward to uncover truth.

In summary, the works of Kuhn and Popper have provided valuable insights into the process of scientific research. In particular, their observations on falsifiability and paradigm shifts have been largely incorporated into the fabric of modern science, although their writings are often misconstrued. Issues such as ensuring proper credit for the scientific contributions of non-Western societies (such as the ancient mathematics of China, India and the Middle East), as well as responding to the chronic under-representation of women, racial minorities and indigenous people in scientific research, are certainly important and worth discussing. But beyond such considerations, what is generally termed the “postmodern science studies” literature has not been very useful in advancing real scientific research, to say the least. And with the 1996 Sokal hoax, these writers have lost considerable credibility in the eyes of the scientific research community.

One criticism that applies rather broadly to present-day literature of this type is that these scholars work almost entirely from outside the realm of real scientific research. Unlike predecessors such as Kuhn and Popper, who were qualified professional scientists, most of the present-day postmodern science studies writers do not have significant scientific training and/or credentials, do not address state-of-the-art scientific theories or methods in technical depth, and do not participate with scientific research teams in performing real research. Their approach is best exemplified by a comment made by Andrew Ross, editor of *Social Text* during the Sokal hoax episode, in the introduction to one of his published works: “This book is dedicated to all of the science teachers I never had. It could only have been written without them.” [Ross1991].

But this is a point upon which virtually all practicing research scientists will sharply disagree. Indeed, state-of-the-art scientific research is all about the details: underlying physical theories; mathematical derivations; testable hypotheses; state-of-the-art equipment; careful experimental design and data collection; rigorous data analysis and statistical methodology; carefully programmed computer simulations; advanced numerical methods; and, of course, cautiously inferred conclusions. Indeed, the details of the methods underlying a study are often as significant as the conclusions.

Thus, to the extent that the postmodern science studies community studiously avoids delving into the full technical details of leading-edge scientific research, these writers cannot possibly hope to have tangible impact in the scientific enterprise. And, needless to say, when leading figures in this community openly express their contempt and disdain for scientific work, they are not building bridges that will lead to productive collaborations with real scientists in the future.

According to an ancient account, when Pharaoh Ptolemy I of Egypt grew frustrated at the degree of effort required to master geometry, he asked Euclid whether there was some easier path. Euclid is said to have replied, “There is no royal road to geometry.” [Durant1975, vol. 2, pg. 501]. Indeed. And there is no royal road to modern science either.

Canadian-American physicist Lawrence Krauss summed up his view of these issues in the following terms [Krauss2012a]:

As both a general reader and as someone who is interested in ideas and culture, I have great respect for and have learned a great deal from a number of individuals who currently classify themselves as philosophers. … What I find common and so stimulating about the philosophical efforts of these intellectual colleagues is the way they thoughtfully reflect on human knowledge, amassed from empirical explorations in areas ranging from science to history, to clarify issues that are relevant to making decisions about how to function more effectively and happily as an individual, and as a member of a society.

As a practicing physicist however, the situation is somewhat different. There, I, and most of the colleagues with whom I have discussed this matter, have found that philosophical speculations about physics and the nature of science are not particularly useful, and have had little or no impact upon progress in my field. Even in several areas associated with what one can rightfully call the philosophy of science I have found the reflections of physicists to be more useful. For example, on the nature of science and the scientific method, I have found the insights offered by scientists who have chosen to write concretely about their experience and reflections, from Jacob Bronowski, to Richard Feynman, to Francis Crick, to Werner Heisenberg, Albert Einstein, and Sir James Jeans, to have provided me with a better practical guide than the work of even the most significant philosophical writers of whom I am aware, such as Karl Popper and Thomas Kuhn.

In spite of these difficulties, some scientists and philosophers look forward to a more respectful dialogue between the two disciplines in the future. As physicist Carlo Rovelli recently wrote [Rovelli2012]:

I think there is narrow-mindedness, if I might say so, in many of my colleague scientists that don’t want to learn what is being said in the philosophy of science. There is also a narrow-mindedness in a lot of probably areas of philosophy and the humanities in which they don’t want to learn about science, which is even more narrow-minded. Somehow cultures reach, enlarge. I’m throwing down an open door if I say it here, but restricting our vision of reality today on just the core content of science or the core content of humanities is just being blind to the complexity of reality that we can grasp from a number of points of view, which talk to one another enormously, and which I believe can teach one another enormously.


Peter Borwein, former professor of mathematics at Simon Fraser University and director of the university’s Centre for Interdisciplinary Research in the Mathematical and Computational Sciences (IRMACS), died on August 23, 2020, at the age of 67, of pneumonia, after courageously battling multiple sclerosis for over 20 years.

The *Notices of the American Mathematical Society* has just published a memorial tribute, written by the present author, that summarizes Peter’s life and career. Here are a few highlights:

Peter Borwein is perhaps best known for discovering (often but not always with his brother Jonathan) new formulas and algorithms for $\pi$ and other mathematical constants. One of these algorithms is the following: Set $a_0 = 6 - 4 \sqrt{2}$ and $y_0 = \sqrt{2} - 1$. Then iterate, for $k \ge 0$,

\begin{align}
y_{k+1} &= \frac{1 - (1 - y_k^4)^{1/4}}{1 + (1 - y_k^4)^{1/4}}, \nonumber \\
a_{k+1} &= a_k (1 + y_{k+1})^4 - 2^{2k+3} y_{k+1} (1 + y_{k+1} + y_{k+1}^2). \label{form:q4}
\end{align}

Then $1/a_k$ converges *quartically* to $\pi$: each iteration approximately *quadruples* the number of correct digits (provided that each iteration is performed with at least the numeric precision required for the final result). This algorithm, together with a quadratically convergent algorithm independently discovered by Brent and Salamin, has been employed in several large computations of $\pi$.
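As a concrete illustration, the quartic iteration can be sketched in a few lines of Python, using the standard-library `decimal` module for high-precision arithmetic (a simple sketch for illustration only, not the heavily optimized code used in record computations; the function name and precision handling are illustrative choices):

```python
from decimal import Decimal, getcontext

def pi_quartic(iterations=3, digits=60):
    """Approximate pi via the Borwein quartic iteration; each pass
    roughly quadruples the number of correct digits."""
    getcontext().prec = digits + 10          # working precision plus guard digits
    a = Decimal(6) - 4 * Decimal(2).sqrt()   # a_0 = 6 - 4 sqrt(2)
    y = Decimal(2).sqrt() - 1                # y_0 = sqrt(2) - 1
    for k in range(iterations):
        r = (1 - y ** 4).sqrt().sqrt()       # (1 - y_k^4)^(1/4)
        y = (1 - r) / (1 + r)
        a = a * (1 + y) ** 4 - Decimal(2) ** (2 * k + 3) * y * (1 + y + y * y)
    return 1 / a                             # 1/a_k converges quartically to pi
```

Even three iterations of this sketch already agree with $\pi$ to dozens of digits, vividly demonstrating the quartic convergence.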

In 1995, Peter posed the question to some students and post-docs of whether there was any economical way to calculate digits in some base of a mathematical constant such as $\pi$, beginning at a given digit position, without needing to calculate the preceding digits. Peter and Simon Plouffe subsequently found the following surprisingly simple scheme for binary digits of $\log 2$, based on the formula $\log 2 = \sum_{k \ge 1} 1/(k 2^k)$, due to Euler. First note that the binary digits of $\log 2$ starting at position $d + 1$ are given by the binary expansion of ${\rm frac} \, (2^d \log 2)$, where ${\rm frac} \, (x) = x - \lfloor x \rfloor$ denotes the fractional part. Then

\begin{align}
{\rm frac} \, (2^d \log 2) &= {\rm frac} \, \left(\sum_{k=1}^\infty \frac{2^d}{k 2^k} \right)
= {\rm frac} \left( \sum_{k=1}^d \frac{2^{d-k}}{k} + \sum_{k=d+1}^\infty \frac{2^{d-k}}{k} \right) \nonumber \\
&= {\rm frac} \left(\sum_{k=1}^d \frac{2^{d-k} \bmod k}{k} \right) + {\rm frac} \, \left(\sum_{k=d+1}^\infty \frac{2^{d-k}}{k} \right), \label{form:bor4}
\end{align}

where $\bmod \, k$ has been added to the numerator of the first term, since we are only interested in the fractional part after division by $k$. The key point here is that the numerator expression, namely $2^{d-k} \bmod k$, can be computed very rapidly by the binary algorithm for exponentiation mod $k$, without any need for extra-high numeric precision, even when the position $d$ is very large, say one billion or one trillion. The second sum can be evaluated as written, again using standard double-precision or quad-precision floating-point arithmetic. The final result, expressed as a binary floating-point value, gives a string of binary digits of $\log 2$ beginning at position $d+1$.
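For concreteness, the scheme just described can be sketched in Python; here the built-in three-argument `pow(2, d - k, k)` is exactly the binary algorithm for exponentiation mod $k$ mentioned above (the function name and error tolerance are illustrative choices, not part of the original work):

```python
def log2_frac(d):
    """Return frac(2^d * log 2), whose binary expansion gives the
    binary digits of log 2 beginning at position d + 1."""
    s = 0.0
    for k in range(1, d + 1):
        # pow(2, d - k, k) computes 2^(d-k) mod k rapidly, with no
        # need for extra-high numeric precision
        s = (s + pow(2, d - k, k) / k) % 1.0
    # second sum: the tail terms 2^(d-k)/k vanish geometrically
    k = d + 1
    while 2.0 ** (d - k) / k > 1e-17:
        s += 2.0 ** (d - k) / k
        k += 1
    return s % 1.0
```

Note that `log2_frac(0)` simply reproduces the fractional part of $\log 2 \approx 0.6931\ldots$ itself, and doubling the result mod 1 shifts the starting position by one bit, which provides a handy consistency check.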

In the wake of this observation, Peter and others searched the literature for a formula for $\pi$, analogous to Euler’s formula for $\log 2$, but none was known at the time. Finally, a computer search conducted by Simon Plouffe numerically discovered this formula, now known as the BBP formula for $\pi$:

\begin{align}
\pi = \sum_{k=0}^\infty \frac{1}{16^k}\left(\frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right). \label{form:bbp}
\end{align}

Indeed, this formula permits one to efficiently calculate a string of base-16 digits (and hence base-2 digits) of $\pi$, beginning at an arbitrary starting point, by means of a relatively simple algorithm similar to that described above for $\log 2$. Nicholas Sze used a variation of this scheme to calculate binary digits of $\pi$ starting at position two quadrillion.
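A minimal BBP digit-extraction program along these lines can be sketched in Python as follows (an illustration in double precision only, so reliable for just the first several digits at any given position; serious computations use far more careful arithmetic, and the function names here are illustrative):

```python
def pi_hex_digits(d, n=6):
    """Return n hexadecimal digits of pi starting at position d + 1,
    via the BBP formula and modular exponentiation."""
    def bbp_sum(j):
        # frac( sum_k 16^(d-k) / (8k + j) ), head via pow(..., mod)
        s = 0.0
        for k in range(d + 1):
            s = (s + pow(16, d - k, 8 * k + j) / (8 * k + j)) % 1.0
        k = d + 1
        while 16.0 ** (d - k) / (8 * k + j) > 1e-17:  # vanishing tail
            s += 16.0 ** (d - k) / (8 * k + j)
            k += 1
        return s
    x = (4 * bbp_sum(1) - 2 * bbp_sum(4) - bbp_sum(5) - bbp_sum(6)) % 1.0
    digits = ""
    for _ in range(n):                # peel off hex digits one at a time
        x *= 16
        digits += "0123456789abcdef"[int(x)]
        x -= int(x)
    return digits
```

For example, `pi_hex_digits(0)` returns `243f6a`, the familiar leading digits of $\pi = 3.243{\rm F}6{\rm A}88\ldots$ in hexadecimal.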

No account of Peter Borwein’s career would be complete without mentioning the remarkable grace with which he faced his condition of multiple sclerosis. Initially diagnosed prior to the year 2000, the disease eventually left him confined to a wheelchair, increasingly dependent on family and caregivers, and, sadly, increasingly unable to pursue research or to effectively collaborate with colleagues. The present author recalls visiting Peter in January 2019 at his home in Burnaby, British Columbia. In spite of his paralysis and infirmity, Peter’s pleasant demeanor and humor were on full display. Would that we could all bear our misfortunes with such equanimity!

Full details of the above and other highlights of Peter Borwein’s remarkable life are in the AMS Notices article.


This year’s puzzle implements a new design for a mathematical crossword, which to the present author’s knowledge has never before been employed. See, for example, clue 40 Across below. In all respects, though, the puzzle conforms to the standards of New York Times crosswords. In terms of overall difficulty (Monday = easiest; Saturday = most difficult), this puzzle most likely would be given a “Tuesday” rating.

As is the usual custom, a pi-related gift will be sent to the first two solvers (U.S. only; previous year winners are eligible), chosen from this selection: Gift A, Gift B, or Gift C. Additional solvers will also be recognized here (send email if you would like to be listed). The solution must be correct (ok, one or two minor errors will be forgiven).

[Added 16 Mar 2023:] Neil Calkin, Eliza Gallagher, Summer Barron, Chris Lohmann, Ross Blocher, Gerard Joseph, Michelle Sidwell, Morgan Marshall and Jeanmarie Gardner have reported solutions.

A full-sized copy of the puzzle, suitable for printing, is HERE.

Here is the puzzle: