One day in March of 2000, six of
Google’s best engineers gathered in a makeshift war room. The company was in the midst of an unprecedented emergency. In October, its core systems, which crawled the Web to build an “index” of it, had stopped working. Although users could still type in queries at google.com, the results they received were five months out of date. More was at stake than the engineers realized. Google’s co-founders, Larry Page and Sergey Brin, were negotiating a deal to power a search engine for Yahoo, and they’d promised to deliver an index ten times bigger than the one they had at the time—one capable of keeping up with the World Wide Web, which had doubled in size the previous year. If they failed, google.com would remain a time capsule, the Yahoo deal would likely collapse, and the company would risk burning through its funding into oblivion.
In a conference room by a set of stairs, the engineers laid doors across sawhorses and set up their computers. Craig Silverstein, a twenty-seven-year-old with a small frame and a high voice, sat by the far wall. Silverstein was Google’s first employee: he’d joined the company when its offices were in Brin’s living room and had rewritten much of its code himself. After four days and nights, he and a Romanian systems engineer named Bogdan Cocosel had got nowhere. “None of the analysis we were doing made any sense,” Silverstein recalled. “Everything was broken, and we didn’t know why.”
Silverstein had barely registered the presence, over his left shoulder, of Sanjay Ghemawat, a quiet thirty-three-year-old M.I.T. graduate with thick eyebrows and black hair graying at the temples. Sanjay had joined the company only a few months earlier, in December. He’d followed a colleague of his—a rangy, energetic thirty-one-year-old named Jeff Dean—from Digital Equipment Corporation. Jeff had left D.E.C. ten months before Sanjay. They were unusually close, and preferred to write code jointly. In the war room, Jeff rolled his chair over to Sanjay’s desk, leaving his own empty. Sanjay worked the keyboard while Jeff reclined beside him, correcting and cajoling like a producer in a news anchor’s ear.
Jeff and Sanjay began poring over the stalled index. They discovered that some words were missing—they’d search for “mailbox” and get no results—and that others were listed out of order. For days, they looked for flaws in the code, immersing themselves in its logic. Section by section, everything checked out. They couldn’t find the bug.
Programmers sometimes conceptualize their software as a structure of layers ranging from the user interface, at the top, down through increasingly fundamental strata. To venture into the bottom of this structure, where the software meets the hardware, is to turn away from the Platonic order of code and toward the elemental universe of electricity and silicon on which it depends. On their fifth day in the war room, Jeff and Sanjay began to suspect that the problem they were looking for was not logical but physical. They converted the jumbled index file to its rawest form of representation: binary code. They wanted to see what their machines were seeing.
On Sanjay’s monitor, a thick column of 1s and 0s appeared, each row representing an indexed word. Sanjay pointed: a digit that should have been a 0 was a 1. When Jeff and Sanjay put all the missorted words together, they saw a pattern—the same sort of glitch in every word. Their machines’ memory chips had somehow been corrupted.
Sanjay looked at Jeff. For months, Google had been experiencing an increasing number of hardware failures. The problem was that, as Google grew, its computing infrastructure also expanded. Computer hardware rarely failed, until you had enough of it—then it failed all the time. Wires wore down, hard drives fell apart, motherboards overheated. Many machines never worked in the first place; some would unaccountably grow slower. Strange environmental factors came into play. When a supernova explodes, the blast wave creates high-energy particles that scatter in every direction; scientists believe there is a minute chance that one of the errant particles, known as a cosmic ray, can hit a computer chip on Earth, flipping a 0 to a 1. The world’s most robust computer systems, at
nasa, financial firms, and the like, used special hardware that could tolerate single bit-flips. But Google, which was still operating like a startup, bought cheaper computers that lacked that feature. The company had reached an inflection point. Its computing cluster had grown so big that even unlikely hardware failures were inevitable.
Together, Jeff and Sanjay wrote code to compensate for the offending machines. Shortly afterward, the new index was completed, and the war room disbanded. Silverstein was flummoxed. He was a good debugger; the key to finding bugs was getting to the bottom of things. Jeff and Sanjay had gone deeper.
Until the March index debacle, Google’s systems had been rooted in code that its founders had written in grad school, at Stanford. Page and Brin weren’t professional software engineers. They were academics conducting an experiment in search technology. When their Web crawler crashed, there was no informative diagnostic message—just the phrase “Whoa, horsey!” Early employees referred to BigFiles, a piece of software that Page and Brin had written, as BugFiles. Their all-important indexing code took days to finish, and if it encountered a problem it had to re-start from the beginning. In the parlance of Silicon Valley, Google wasn’t “scalable.”
We say that we “search the Web,” but we don’t, really; our search engines traverse an index of the Web—a map. When Google was still called BackRub, in 1996, its map was small enough to fit on computers installed in Page’s dorm room. In March of 2000, there was no supercomputer big enough to process it. The only way that Google could keep up was by buying consumer machines and wiring them together into a fleet. Because half the cost of these computers was in parts that Google considered junk—floppy drives, metal chassis—the company would order raw motherboards and hard drives and sandwich them together. Google had fifteen hundred of these devices stacked in towers six feet high, in a building in Santa Clara, California; because of hardware glitches, only twelve hundred worked. Failures, which occurred seemingly at random, kept breaking the system. To survive, Google would have to unite its computers into a seamless, resilient whole.
VIDEO FROM THE NEW YORKER
Surfing on Kelly Slater’s Machine-Made Wave
Side by side, Jeff and Sanjay took charge of this effort. Wayne Rosing, who had worked at
Apple on the precursor to the Macintosh, joined Google in November, 2000, to run its hundred-person engineering team. “They were the leaders,” he said. Working ninety-hour weeks, they wrote code so that a single hard drive could fail without bringing down the entire system. They added checkpoints to the crawling process so that it could be re-started midstream. By developing new encoding and compression schemes, they effectively doubled the system’s capacity. They were relentless optimizers. When a car goes around a turn, more ground must be covered by the outside wheels; likewise, the outer edge of a spinning hard disk moves faster than the inner one. Google had moved the most frequently accessed data to the outside, so that bits could flow faster under the read-head, but had left the inner half empty; Jeff and Sanjay used the space to store preprocessed data for common search queries. Over four days in 2001, they proved that Google’s index could be stored using fast random-access memory instead of relatively slow hard drives; the discovery reshaped the company’s economics. Page and Brin knew that users would flock to a service that delivered answers instantly. The problem was that speed required computing power, and computing power cost money. Jeff and Sanjay threaded the needle with software.
Alan Eustace became the head of the engineering team after Rosing left, in 2005. “To solve problems at scale, paradoxically, you have to know the smallest details,” Eustace said. Jeff and Sanjay understood computers at the level of bits. Jeff once circulated a list of “Latency Numbers Every Programmer Should Know.” In fact, it’s a list of numbers that almost no programmer knows: that an L1 cache reference usually takes half a nanosecond, or that reading one megabyte sequentially from memory takes two hundred and fifty microseconds. These numbers are hardwired into Jeff’s and Sanjay’s brains. As they helped spearhead several rewritings of Google’s core software, the system’s capacity scaled by orders of magnitude. Meanwhile, in the company’s vast data centers technicians now walked in serpentine routes, following software-generated instructions to replace hard drives, power supplies, and memory sticks. Even as its parts wore out and died, the system thrived.
Today, Google’s engineers exist in a Great Chain of Being that begins at Level 1. At the bottom are the I.T. support staff. Level 2s are fresh out of college; Level 3s often have master’s degrees. Getting to Level 4 takes several years, or a Ph.D. Most progression stops at Level 5. Level 6 engineers—the top ten per cent—are so capable that they could be said to be the reason a project succeeds; Level 7s are Level 6s with a long track record. Principal Engineers, the Level 8s, are associated with a major product or piece of infrastructure. Distinguished Engineers, the Level 9s, are spoken of with reverence. To become a Google Fellow, a Level 10, is to win an honor that will follow you for life. Google Fellows are usually the world’s leading experts in their fields. Jeff and Sanjay are Google Senior Fellows—the company’s first and only Level 11s.
The Google campus, set beside a highway a few minutes from downtown Mountain View, is a series of squat, unattractive buildings with tinted windows. One Monday last summer, after a morning of programming together, Jeff and Sanjay went to lunch at a campus cafeteria called Big Table, which was named for a system they’d helped develop, in 2005, for treating numberless computers as though they were a single database. Sanjay, who is tall and thin, wore an ancient maroon Henley, gray pants, and small wire-frame glasses. He spied a table outside and walked briskly to claim it, cranking open the umbrella and taking a seat in the shade. He moved another chair into the sun for Jeff, who arrived a minute later, broad-shouldered in a short-sleeved shirt and wearing stylish sneakers.
Like a couple, Jeff and Sanjay tell stories together by contributing pieces of the total picture. They began reminiscing about their early projects.“Behold, as I transform this normal woman into a sexualized prop.”
https://www.facebook.com/dialog/fee...m_brand=the-new-yorker&utm_social-type=earned
“We were writing things by hand,” Sanjay said. His glasses darkened in the sun. “We’d rewrite it, and it was, like, ‘Oh, that seems near to what we wrote last month.’ ”
“Or a slightly different pass in our indexing data,” Jeff added.
“Or slightly different,” Sanjay said. “And that’s how we figure out—”
“This is the essence,” Jeff said.
“—this is the common pattern,” Sanjay said, finishing their thought.
Jeff took a bite of the pizza he’d got. He has the fingers of a deckhand, knobby and leathery; Sanjay, who looks almost delicate in comparison, wondered how they ended up as a pair. “I don’t quite know how we decided that it would be better,” he said.