1  The Lore

Learning Objectives:

  • discuss R as a language with a history: use knowledge about the history of R (and of scientific computing more generally) to describe what R is, what people “say” in this language, and why this language has the properties and characteristics that it does.

  • R’s friends: Describe R’s relationship to RStudio, RMarkdown, Shiny, Quarto, and Tidyverse; and, describe what each of these tools is and why someone would use them.

1.1 Early 19th Century: the birth of programming

The idea of a programmable computer is not difficult to understand. The primary goal is to make a machine that you can use for different tasks, depending on what program you feed to that machine. Programs are written in code, which today is stored as text files on a computer. The programmable machine also lives within the computer, so running the program is as simple as telling the machine part of the computer where the program is located; then, the machine part of the computer tries to read find the and read the program at the specified location, and the (hopefully) program runs.

Folks in the 19th century did not have access to digital text files, and so they could not write their programs on them. How did they write programs, and which sort of programs did they write? These are the primary questions I hope to address in this very first section?

1.1.1 Jacquard’s Loom: the punched card

One of the first programmable machines was Jacquard’s loom, which Jacquard patented in 1804. Jacquard was a weaver. Jacquard’s loom was a machine that could create a variety of different patterns, depending on which program was put into it?

So, Jacquard wrote programs to produce beautiful woven fabrics, but more important is how he wrote programs. Jacquard had no text files, so instead he developed a different form of machine input: the punched card. Each line on Jacquard’s punched cards contained information about a single row of the design. The cards could be fed sequentially into the loom to produce a large pattern.

1.1.2 Babbage, Lovelace, and the Analytical Engine

General-purpose digital computers, the sort of computers that can run R, emerged as an idea in the early-to-mid 19th century. Up to that point, computers were either mechanical (mechanical computers are fascinating, by the way) or just humans.

One of the first to develop a design for a general-purpose computer was Charles Babbage, working in the early part of the 19th century. In the 1830’s he proposed a massively complicated, general-purpose, steam-powered computer, which he called the analytical engine. The computer was only capable of carrying out the four basic operations of arithmetic: addition, subtraction, multiplication, and division; it was designed to take input via punched card, just like Jacquard’s loom.

During the 1830’s and 1840’s, Lady Ada Lovelace communicated with Charles Babbage (and several others involved in similar work) with the intention to collaborate with him in studying the analytical engine. It was Lady Lovelace who wrote the first substantial computer program, whose purpose was to compute Fibonacci numbers (Tibees 2020). Her program, written in the iconic note G, used only the four simple arithmetic operations.

Lovelace was interested in discovering the capabilities of the analytical engine. Her program computing Fibonacci numbers was important because it used loops in computation. Lovelace, daughter of the poet Lord Byron, was also interested in non-mathematical applications for the machine. She suggested that a sufficiently mathematical theory of sound could enable to engine to compose complex and scientific symphonies (Tibees 2020). Isn’t that beautiful!

1.2 Middle and Late 19th century

1.2.1 1830-1870(ish)

The middle of the 19th century was a period of massive global shifts. Liberation from enslavement was spreading across the globe after the Haitian revolution left white men horrified at the peculiar institution, which was also losing economic utility (see Capitalism and Slavery by Eric Williams).

Also within that 40 year period was:

  • abolition (as a matter of law, anyways): British Empire (1834), French Empire (1848), Russian Empire (1861), Dutch Empire (and Dutch East India Company; 1863-73), American Empire (1865), Portuguese Empire (1869)

  • Samuel Colt’s invention of a revolver that can be mass-produced (1836?)

  • the development of the telegraph (1830s)

  • the trail of tears (starting 1836)

  • the revolutions of 1848 and the publication of the communist manifesto

  • the first woman’s rights convention in the U.S. (Seneca Falls Convention, 1848)

  • the discovery of the Bessemer Process which enables the mass-production of steel, paving the way for emerging steel tycoons (1855)

  • Darwin published On the Origin of Species (1859)

  • Gatling’s invention of the machine gun (1861)

  • Maxwell publishes his equations, proposing an incredibly successful theory of physics that understands electricity, magnetism, and light as essentially the same thing (1861)

  • the construction and openning of the Suez Canal (1860’s)

  • Mendel’s publication of his laws of genetic inheritance (1865)

  • the discovery of the cell and subsequent elaboration of cell theory (1865 and after)

  • Nobel’s invention of dynamite (1867)

  • Marx’ publication of the first volume of capital (1867)

  • the completion of the transcontinental railroad (U.S., 1869)

  • Mendeleev’s publication of the first periodic table (1869)

In this revisionist history of the computer (and ultimately of R), this period in history marked a transformation of power. The structure and organization of society was changing along with the flow of people, ideas, and commerce. Western, liberal democracies had to develop new technologies of population control in order to prevent all of these liberal changes from challenging their position of authority and power.

1.2.2 Late 19th century

With the relative liberation of black bodies (and other bodies, as well) came a scientific imperative. Power continued to demand that these bodies be inferior, but evidence of inferiority was no longer to come from the conditions and dimensions of the body. Nay, the newly-available technologies of genetic inheritance and natural selection allowed a regime of a new flavor to take hold, one that cited hard science to support and justify the inequities in society. Inferiority was moving through the skin, into the body, and - importantly - into the mind.

Wilhelm Wundt opened the first psychology lab, and William James delivered the first psychology course and textbook. Galton, who was studying intelligence, popularized the idea of the median (Bakker and Gravemeijer 2006). Psychology and with it psychological statistics, was beginning to take shape to meet the new demands of the state: a theory and a technology that will find permanent, internal traits upon which to stratify society into haves and have-nots. The story of the emergence of psychological statistics is incomplete without mention of eugenics. The tools being developed were not neutral and scientific, but overtly political, aimed at achieving the goals of the state.

Also in the late 19th century was what Foucault called the implantation of perversions (Foucault 1978) - the creation of new symbolic threats to the body and to society as a whole. This operated through the invention of new characters that continue to exist within society today.

Firstly, there was the medical specification of the homosexual (Townsend 2011). This began in 1864 with the work of Karl-Heinrich Ulrichs, who was gay himself. He specified men as either urnings or dionings. Urnings and Dionings are both male-bodied creatures, but the urning experiences the desires and character of a female (Townsend 2011). The dioning, by contrast, is normal. Discourse about the urning (renamed to the invert, and then to the homosexual) continued well into the 20th century, and the sissy (the archetype the invert represents) is, obviously, still with us.

Also within this time period, was the medical specification of the hysteric woman, which was initially the perogative of Jean-Martin Charcot.

I’ll mention just one more character that was invented in the later 19th century. For all of American history to this point, immigration law was about the process of naturalization - immigrants becoming citizens. From the beginning of the union, only white men of “good moral character” were allowed to become American citizens (Naturalization act of 1790?). There was little effort to actually prevent bodies from entering the country.

Until 1875. With the passage fo the Page Act of 1875, the United States declared its intention to keep undesirable bodies out of the country for the first time. Shortly thereafter, the “illegal alien” was invented as a result of the Chinese Exclusion Act of 1882, which is the only American immigration law I am aware of that names a specific national group in its title.

All this to say that the nature and enforcement of undesirability were in massive flux in the late 19th century. The foreign element was moving within: the enslaved African could become a citizen and could vote, the invert or the hysteric could be hiding within anyone, and the state took up the power to deport bodies that did not belong. No longer was the anthropologist writing about the inferiority of foreign peoples (although to be clear, they absolutely were still doing that); the pschiatrist was now writing about our own inferiority.

I consider the birth of statistics to be in this time period, which does not have pleasant implications for statistics as a field. There is a lot more to be said about the advent of statistics, and how statistics is designed to serve power (i.e., fulfill the demands of the state). However, I’m going to leave all of that unsaid and refocus on computation in general, and statistical computing in particular.

The late 19th century was also, notoriously, the era of massive trusts in the United States. These monstrous, monopolistic companies exploited both the consumer and the worker, but the United States did not yet have a legal mechanism for breaking them up. The most important monopoly for our purposes: the one that is most influential is the development of S and then R is the AT&T monopoly.

Another monopoly was also forming. Using Jacquard’s punched cards, an American man designed and patented a system to read punched cards. In 1890, this punched card system was used to complete the census, resulting in the 1890 census being completed two years quicker than the 1880 one. The company that developed this technology would go on to become IBM, which enjoyed monopoly status in the computing industry for several decades.

1.3 Early 19th century: the advent of computing

Near the end of the 19th century, a mathematician named David Hilbert decided that mathematics needed to be formalized. Up to that point, it had developed as myriad sub-disciplines that failed to cohere into a single, interconnected web of mathematics. Hilbert believed that it should, and his goal was to formalize this system. He believed that such a system (of mathematical axioms) needed to have three properties:

  1. to be consistent: it should not be possible to derive that a statement is both true and false
  2. to be complete: it should be possible to derive the truth of every true statement (or the falsity of its negation)
  3. to be decidable: there must be an algorithm that can identify all and only true statements in a finite number of steps.

(The excitement about formalizing affected Hilbert, but by no means was he the first or the only to be caught up in this mess. Notoriously, Whitehead and Russel got spun up enough to publish a 126-page long proof that \(1+1=2\). I’m mostly attributing these three demands to Hilbert for sanity’s sake because I cannot stand to write out the sordid details. These three “properties” as I call them, are really inspired very loosely on any specific, cite-able Hilbert publication. He did publish a list of 23 questions, which refer to the properties I mention here, but understand this as a drastically over-simplified view of the mathematical debates unfolding at the time.)

Mathematics was not the only field to be heating up. There was growing speculation in physics that matter may not be as continuous as was previously assumed. In 1900, Max Planck published the first quantum theory in physics, which was aimed at modelling thermal radiation. Shortly thereafter, Albert Einstein published another quantum theory, this time aimed at modeling the the photoelectric effect. Both of these models used quantum stuff (i.e., minimal, discrete units of energy, creating measurements of energy that are always a multiple of the quantum unit), but the authors did not actually believe the world was quantum. Famously, Einstein’s theories of relativity both rely on space-time being continuous. They merely believed quantized math was the best way to explain non-quantum physical phenomena.

Neils Bohr went the whole way, creating his model of the atom, with distinct, orbital electron shells. In the 1920’s quantum mechanics, as we know it today, came into existence. It did not make Einstein happy. Einstein wanted a deterministic world, where each cause has an specific, reliable effect. Quantum mechanics is not a deterministic theory of physics, but a probabilistic one. I take this diversion into the physical sciences not only to stress that this is a transition period within the physical sciences, but to temper my claim from the previous section. The “demands of power” did no less to supercharge the development of statistics and probability than did rapid changes in the way we understand and model the physical world.

During my quantum mechanical tangent, Gödel has proven that achieving the second property of Hilbert’s idealistic system is unlikely. In fact, Gödel establishes that it is logically impossible that any formal mathematical system could be complete, as defined above.

To answer the question about whether mathematics is decidable, a new technology is needed. Before a mathematician can make formal claims about the capabilities or limitations of algorithms in general (as Hilbert demanded), she must first provide a rigorous definition of an algorithm. Two mathematicians took up this task, Alonzo Church who developed the lambda calculus, and Alan Turing who developed the Turing machine. Both men reached the same conclusion: mathematics cannot be decidable. It is logically impossible to make an algorithm (a Turing machine) that can identify all and only true statements (Turing 1936). There are, as it turns out, hard limits on the types of problems algorithms are able to solve (at least in a finite number of steps).

Thus, Turing half accidentally created the field of computer science while trying to answer a question about the foundations of mathematics. This is also an opportune time to introduce the term Turing-complete which refers to anything (model of computation, programming language, a book of instructions used by a human computer) that can simulate the a Turing machine. Any Turing-complete system is essentially equivalent to the original Turing machine described in (Turing 1936). The analytical engine is (theoretically, of course, it never got built) Turing-complete; Jacquard’s loom, by contrast, is not. Modern programming languages are, for the most part, Turing complete, meaning that any function you write in a modern programming language could be performed on the OG Turing machine from (Turing 1936).

The first electric, digital computer was not fully constructed until 1945. It was built by and for the U.S. military, who named the machine ENIAC. Thus, the first computations done on an electric, digital computer were intended to speed up the process of human and earthly destruction. ENIAC was a bunch of coordinated units that ran according to the placement of wires on the machine (Shustek 2016). The machine took IBM punched cards as input (remember the punched card monopolist from the end of the 19th century?).

Initially, the wires on ENIAC had to be moved for each new problem (Shustek 2016). The process of re-configuring the machine for each new problem was tedious, but it was possible, and so ENIAC was Turing-complete. However, having to physically move wires prevented the machine from achieving the utility of a modern programmable computer.

This machine was very quickly modified in a way that dramatically changed its function. Instead of having to move wires, and then feed the machine (punched card) instructions based on the position of those wires, it would be much faster permanently code instructions (functions) into the machine. Then, the input of the machine could describe the sequence of functions. You could achieve looping by instructing ENIAC to perform a function repeatedly and conditional (if-statement) execution by instructing ENIAC to skip functions in the sequence.

This is the idea behind modern programming languages. Instructions for the computer, written in the computer’s language (ENIAC’s language was wires, the one we’ll soon focus on is R) are stored within the machine. “Programming” the machine involves telling it which instructions to perform and in which order. In 1948, the first ENIAC “program” ran under this new computer architecture was a Monte Carlo simulation of neutron decay during nuclear fission (Shustek 2016).

1.4 AT&T, Bell Labs, and S

Monopolies suck, and AT&T did as well. Throughout the beginnning of the century, it gradually became clear that the benefits of a monopolist teleohpne provider were not going to materialize. In 1949, the U.S. Department of Justice sued AT&T for violating the anti-trust act, and the resulting 1956 consent decree prohibited AT&T from entering the computer business (Chang, Hen, and Kan n.d.).

This consent decree did not prevent the further degradation of AT&T’s service, nor did it prevent future anti-trust lawsuits. Throughout the 60’s and early 70’s, the U.S. government dogged AT&T with recurrent anti-trust lawsuits. In 1974, the Department of Justice began their final lawsuit against a monopoly AT&T. Over the course of the next decade, the government proved that AT&T was leveraging its monopoly power to predatory ends, annihilating potential competitors and pricing services far beyond the cost to provide them. This lawsuit ended in 1982 with the dissolution of AT&T into 7 Regional Bell Operating Companies (Chang, Hen, and Kan n.d.).

Bell Labs was probably the most important laboratory within AT&T. During the early 70’s, the researchers at the statistics research department within Bell Labs was using the programming language FORTRAN. FORTRAN is a general purpose, compiled programming language, developed by IBM (the punched card guys).

(This isn’t super relevant, but I think it’s fun. Before 1968, computational statisticians had been using a algorithm called RANDU, which was a FORTRAN function that generated random numbers. In 1968, a mathematician proved that the allegedly random numbers actually all had to lie on a series of parallel hyper-planes, and we thus not actually random. wtf is a series of parallel hyper-planes? See below)

FORTRAN, the name, stands for “formula translation,” and it was primarily used for scientific computing, like computing weather models or doing computational physics - things that have to do with numbers, essentially. It is still used in these fields to some extent, although it is less popular for scientific computing than other, more recent programming languages, like R. FORTRAN is, computationally speaking, incredibly efficient, mostly by natively supporting parallel computation. For this reason, FORTRAN is still used to benchmark supercomputers. You can learn more about FORTRAN on its website.

In any case, in the 1970’s, the statistics research department at Bell Labs found FORTRAN to be somewhat insufficient, and they set out to develop a new language that would more fully suit their needs (Becker 1994).

1.4.1 S

S is a statistical computing language that was developed first at Bell Laboratories in the mid 1970s. At the time, statistics was undergoing a change. Previously, statistics had been developing as a set of methods - essentially algorithms that prescriptively described how to complete a statistical analysis from beginning to end. In the early 70’s, John Tukey was working at Bell Labs and at Princeton, and he was making a lot of noise about the problems with statistics. He popularized a different approach to statistics, establishing something of a binary between data analysis and statistics, just as I did between machine learning and statistics (in Section 1.2; Tukey (1972)).

The statistics research department was beginning to demand a tool that aligned with Tukey’s approach. FORTRAN, developed more than a decade before that demand was created at and by Bell Labs, did not measure up to the task. Instead, they decided to develop a new language, which they named S. Initially, there was a large focus on being able to import FORTRAN functions into S, so that there could be a smooth transition from FORTRAN to S within Bell Labs.

S was built from the ground up to include graphics capabilities, and a structure that enabled and encouraged exploratory data analysis. The basic data structure in S is a vector of like-elements, which were used to make matrices and time-series; S also included lists (key-value maps) and the $ operator, which could be used to retrieve specific components of larger data structures (Becker 1994). It also included all of the arithmetic operators that you need in a desk calculator, making it useful for that purpose, as well.

In 1980, S was distributed outside of Bell Labs for the first time. Initially, it was distributed for a nominal fee and for educational use only, but by 1981 it was widely available (Becker 1994). After it began to be distributed, the developers added explicit looping (i.e., for loops), as well as the apply function, which could be used to loop over a vector while applying a function (Becker 1994). The developers also introduced the “category”, which is now called the factor in R. Categories are vectors of data. They merge numerical and string data types - each entry in the vector is assigned a category label (so that you can read it), as well as a underlying integer (so that you can do math with categories).

Although S was developed initially by statisticians, it clearly had utility as a data manipulation, graphics, and exploratory data analysis tool. In 1988, the developers released the “New S,” renaming the software after some significant changes. The most significant feature of New S was the inclusion of first class functions, which are functions that you can assign to a name and and then refer to by that name. Functions are first class in that they are S objects, just like any vector or matrix. For the first time, S had depreciated functions, which R also has. Depreciated functions are functions for which there is a better alternative. They are generally still included in R and S distributions (so old code that uses depreciated functions can still run), but it’s best to avoid using them (and to use the better alternatives instead). By 1988, many of the FORTRAN functions from the initial development of S were rewritten in C, which is a general purpose programming languages on which New S is built (Becker 1994).

In 1991, the S development team expanded, and there was a focus on adding statistical software to the S language. Although S was developed by statisticians who intended to use it for statistics, the statistics are not inherent in S: “S is a computational language and environment for data analysis and graphics” (Becker 1994). As such, the developers added the formula class, which could be used to specify statistical models. The formula is marked by the ~ operator, with the dependent variable on the left and the independent variable(s) on the right (e.g., y ~ x + w + x*w).

Also in the 1991 release was the data.frame. Matrices are like vectors: they can only have one type of data. If you have a matrix that has even one number in it, then the entire matrix must be numeric, even if you want to use it to represent string data (like names and job titles) or categorical data (Becker 1994). So, a matrix is a combination of multiple vectors, all of the same type. A data.frame, by contrast, is a combination of vectors of any type. You can have a string vector (column) in the data frame representing job title, as well as a numeric vector representing income. As with matrices, you can use the $ operator to pick a vector out of the data frame (e.g., data$income picks out the income vector in the data frame called data).

It’s not really possible for a programming language to die. As we have seen with FORTRAN and S, new programming languages often use code from their older counterparts, especially at the beginning. Even though I can no longer find S on the internet and run it on my computer, a very large number of S functions continue to exist in R.

I am able to find relatively scant documentation about this final period in the history of S, so the rest of this section is at least somewhat speculative (except claims that are cited, of course).

What is the need for R if S exists? Well, well, well. Let’s talk about corporate fuckery, which both killed S and prevented it from dying. I have been making a much bigger deal over anti-trust law than the vast majority of those who introduce R to their students. To this point, as far as AT&T and Bell Labs are concerned, I have presented a world in which they are legally prohibited from selling computers (and presumably, computer software) as a result of the Consent Decree from 1956.

Up to this point, no one was making money off of S. Although Bell was initially charging folks a nominal fee to use the software (Becker 1994), this practice ended quickly, meaning that the software was being distributed for free. As a result of the anti-trust, Bell Labs was not going to monetize this technology. Instead, one of their former employees had to do it.

In the 70s and 80s, the graphical user interface (GUI) was being born. This emerging technology came with a new generation of capitalists who had not been subject to extensive anti-trust, in which former trade union president Ronald Reagan did not believe - the capitalists who bring us Microsoft and Apple, who own outright the operating systems of about 85% of the worlds’ computers (and many phones and other devices, as well).

S wasn’t fated to become Windows; it was fated to become S-PLUS. S-PLUS is/was a statistical computing software with a graphical user interface. It was developed by a company owned by a former Bell Labs employee and University of Washington professor, R Douglas Martin. His work is primarily in econometrics, and he has extensively published about investment risks. Because of this, and because S-PLUS was and is mostly used by economists, a cynic might call it an application to be used for those who are unwilling or unable to learn how to code (similar in character to Microsoft’s SPSS). S-PLUS started circulating (for a fee) in about the year 1987, and it did include features that S did not (like generalized linear models).

Let me just quickly recap, so I can make sure everyone is oriented in time - I’m discussing a lot of events that overlap and are not all well documented. In 1980, S was released to the public; in 1988, S had a significant update, becoming “New S”; around 1987, a former employee of Bell Labs developed S-PLUS; in 1991, S had an update that focused on statistics.

In 1991, two statisticians quietly began work on the project (R) that would more-or-less kill S and S-PLUS.

In 1993, S and S-PLUS were reunited when Bell Labs sold S to the company that had developed S-PLUS. That company, in turn, immediately merged with a company called MathSoft. S-PLUS was only available on windows, and its relationship with Microsoft strengthened when features were added to connect S-PLUS to Excel and to SPSS.

Part of the company (MathSoft) was sold, it got renamed (to Insightful), the exclusive license to distribute S turned into AT&T (i.e., Lucent, one of the companies that remained after AT&T) selling S so that it became the property of Insightful. Then Insightful got bought by a company called TIBCO, and then…

I think you get the general idea. S and S-PLUS got bought, and sold, and licensed, and merged, and acquired to the point that it no longer really exists in any meaningful, public way. But by the 2000s, that didn’t matter.