The Hardware Lottery

H hardwarelottery.github.io ↗

▲ 26 points • 7 comments • by intelkishan • 2mo ago • HN discussion ↗

Pangram verdict · v3.3

We believe that this document is fully human-written

0 %

AI likelihood · overall

Human

100% human-written 0% AI-generated

SEGMENTS · HUMAN 4 of 4

SEGMENTS · AI 0 of 4

WORD COUNT 1,586

PEAK AI % 0% · §2

Analyzed

May 22

backend: pangram/v3.3

Segments scanned

4 windows

avg 397 words each

Distribution

100 / 0%

human / AI fraction

Verdict

Human

Pangram v3.3

Article text · 1,586 words · 4 segments analyzed

Human AI-generated

§1 Human · 0%

Hardware, systems and algorithms research communities have historically had different incentive structures and fluctuating motivation to engage with each other explicitly. This historical treatment is odd given that hardware and software have frequently determined which research ideas succeed (and fail). This essay introduces the term hardware lottery to describe when a research idea wins because it is suited to the available software and hardware and not because the idea is universally superior to alternative research directions. History tells us that hardware lotteries can obfuscate research progress by casting successful ideas as failures and can delay signaling that some research directions are far more promising than others. These lessons are particularly salient as we move into a new era of closer collaboration between hardware, software and machine learning research communities. After decades of treating hardware, software and algorithms as separate choices, the catalysts for closer collaboration include changing hardware economics , a “bigger is better” race in the size of deep learning architectures and the dizzying requirements of deploying machine learning to edge devices. Closer collaboration has centered on a wave of new generation hardware that is "domain specific" to optimize for commercial use cases of deep neural networks. While domain specialization creates important efficiency gains, it arguably makes it more even more costly to stray off of the beaten path of research ideas. While deep neural networks have clear commercial use cases, there are early warning signs that the path to true artificial intelligence may require an entirely different combination of algorithm, hardware and software. This essay begins by acknowledging a crucial paradox: machine learning researchers mostly ignore hardware despite the role it plays in determining what ideas succeed. What has incentivized the development of software, hardware and algorithms in isolation? What follows is part position paper, part historical review that attempts to answer the question, "How does tooling choose which research ideas succeed and fail, and what does the future hold?" For the creators of the first computers the program was the machine. Early machines were single use and were not expected to be re-purposed for a new task because of both the cost of the electronics and a lack of cross-purpose software. Charles Babbage’s difference machine was intended solely to compute polynomial functions (1817). Mark I was a programmable calculator (1944). Rosenblatt’s perceptron machine computed a step-wise single layer network (1958).

§2 Human · 0%

Even the Jacquard loom, which is often thought of as one of the first programmable machines, in practice was so expensive to re-thread that it was typically threaded once to support a pre-fixed set of input fields (1804) . Early computers such as the Mark I were single use and were not expected to be repurposed. While Mark I could be programed to compute different calculations, it was essentially a very powerful reprogramable calculator and could not run the variety of programs that we expect of our modern day machines. The specialization of these early computers was out of necessity and not because computer architects thought one-off customized hardware was intrinsically better. However, it is worth pointing out that our own intelligence is both algorithm and machine. We do not inhabit multiple brains over the course of our lifetime. Instead, the notion of human intelligence is intrinsically associated with the physical 1400g of brain tissue and the patterns of connectivity between an estimated 85 billion neurons in your head . When we talk about human intelligence, the prototypical image that probably surfaces as you read this is of a pink ridged cartoon blob. It is impossible to think of our cognitive intelligence without summoning up an image of the hardware it runs on. Today, in contrast to the necessary specialization in the very early days of computing, machine learning researchers tend to think of hardware, software and algorithm as three separate choices. This is largely due to a period in computer science history that radically changed the type of hardware that was made and incentivized hardware, software and machine learning research communities to evolve in isolation. The general purpose computer era crystalized in 1969, when opinion piece by a young engineer called Gordan Moore appeared in Electronics magazine with the apt title “Cramming more components onto circuit boards” . Moore predicted you could cram double the amount of transistors on an integrated circuit every two years. Originally, the article and subsequent follow-up was motivated by a simple desire -- Moore thought it would sell more chips. However, the prediction held and motivated a remarkable decline in the cost of transforming energy into information over the next 50 years. Moore’s law combined with Dennard scaling enabled a factor of three magnitude increase in microprocessor performance between 1980-2010 . The predictable increases in compute and memory every two years meant hardware design became risk-adverse.

§3 Human · 0%

Even for tasks which demanded higher performance, the benefits of moving to specialized hardware could be quickly eclipsed by the next generation of general purpose hardware with ever growing compute. Moore's law combined with Dennard Scaling motivated a remarkable decline in the cost of transforming energy into information over the next 50 years. Chip design became risk adverse because it was hard to motivate exploration when there were predictable gains in each new generation of hardware. The emphasis shifted to universal processors which could solve a myriad of different tasks. Why experiment on more specialized hardware designs for an uncertain reward when Moore’s law allowed chip makers to lock in predictable profit margins? The few attempts to deviate and produce specialized supercomputers for research were financially unsustainable and short lived . A few very narrow tasks like mastering chess were an exception to this rule because the prestige and visibility of beating a human adversary attracted corporate sponsorship . Treating the choice of hardware, software and algorithm as independent has persisted until recently. It is expensive to explore new types of hardware, both in terms of time and capital required. Producing a next generation chip typically costs $30-80 million dollars and takes 2-3 years to develop . These formidable barriers to entry have produced a hardware research culture that might feel odd or perhaps even slow to the average machine learning researcher. While the number of machine learning publications has grown exponentially in the last 30 years , the number of hardware publications have maintained a fairly even cadence . For a hardware company, leakage of intellectual property can make or break the survival of the firm. This has led to a much more closely guarded research culture. In the absence of any lever with which to influence hardware development, machine learning researchers rationally began to treat the hardware as a sunk cost to work around rather than something fluid that could be shaped. However, just because we have abstracted away hardware doesn’t mean that it has disappeared. Early computer science history tells us there are many hardware lotteries where the choice of hardware and software has determined which ideas succeeded (and which failed). The Hardware Lottery I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail. — Abraham Maslow, 1966 The first sentence of Anna Karenina by Tolstoy reads “Happy families are all alike, every unhappy family is unhappy in it’s own way.”

§4 Human · 0%

Tolstoy is saying that it takes many different things for a marriage to be happy -- financial stability, chemistry, shared values, healthy offspring. However, it only takes one of these aspects to not be present for a family to be unhappy. This has been popularized as the Anna Karenina principle -- “a deficiency in any one of a number of factors dooms an endeavor to failure.” Despite our preference to believe algorithms succeed or fail in isolation, history tells us that most computer science breakthroughs follow the Anna Kerenina principle. Successful breakthroughs are often distinguished from failures by benefiting from multiple criteria aligning surreptitiously. For computer science research, this often depends upon winning what this essay terms the hardware lottery — avoiding possible points of failure in downstream hardware and software choices. The analytical engine designed by Charles Babbage was never built in part because he had difficulty fabricating parts with the correct precision. This image depicts the general plan of the analytical machine in 1840. An early example of a hardware lottery is the analytical machine (1837). Charles Babbage was a computer pioneer who designed a machine that (at least in theory) could be programmed to solve any type of computation. His analytical engine was never built in part because he had difficulty fabricating parts with the correct precision . The electromagnetic technology to actually build the theoretical foundations laid down by Babbage only surfaced during WWII. In the first part of the 20th century, electronic vacuum tubes were heavily used for radio communication and radar. During WWII, these vacuum tubes were re-purposed to provide the compute power necessary to break the German enigma code . As noted in the TV show Silicon Valley, often “being too early is the same as being wrong”. When Babbage passed away in 1871, there was no continuous path between his ideas and modern day computing. The concept of a stored program, modifiable code, memory and conditional branching were rediscovered a century later because the right tools existed to empirically show that the idea worked. The Lost Decades Perhaps the most salient example of the damage caused by not winning the hardware lottery is the delayed recognition of deep neural networks as a promising direction of research.