The Spokesman-Review Newspaper
Spokane, Washington  Est. May 19, 1883

Chipping away at Sapphire Rapids: Inside Intel’s delays in delivering a crucial new microprocessor

By Don Clark, New York Times

SANTA CLARA, Calif. – In May, Sandra Rivera, a top executive at the chip giant Intel, got some alarming news.

Engineers had worked for more than five years to develop a powerful new microprocessor to carry out computing chores in data centers and were confident they had finally made the product right. But signs of a potentially serious technical flaw surfaced during a regular morning meeting to discuss the project.

The issue was so troublesome that Sapphire Rapids, the code name for the microprocessor, had to be delayed – the latest in a series of setbacks for one of Intel’s most important products in years.

“We were pretty dejected,” said Rivera, an executive vice president in charge of Intel’s data center and artificial intelligence group. “It was a painful decision.”

The launch of Sapphire Rapids wound up being pushed from mid-2022 to Tuesday, nearly two years later than once expected. The lengthy development of the product, which combines four chips in one package, underscores some of the challenges facing Intel's turnaround effort at a time when the United States is trying to assert its dominance in the foundational computer technology.

Since the 1970s, Intel has been a leading player in the small slices of silicon that run most electronic devices, best known for a variety called microprocessors, which act as electronic brains in most computers. But the Silicon Valley company in recent years lost its longtime lead in manufacturing technology, which helps determine how fast chips can compute.

Patrick Gelsinger, who became Intel’s CEO in 2021, has vowed to restore its manufacturing edge and build new U.S. factories. He was a leading figure as Congress debated and passed legislation last summer to reduce U.S. dependence on chip manufacturing in Taiwan, which China claims as its territory.

The bumpy development of Sapphire Rapids has implications for whether Intel can rebound to deliver future chips on time. That’s an issue that could affect scores of computer makers and cloud service providers, not to mention the millions of consumers who tap into online services likely to be powered by Intel technology.

“What we want is a stable cadence that is predictable,” said Kirk Skaugen, the executive vice president leading server sales at Lenovo, a Chinese company that is planning 25 new systems based on the new processor. “Sapphire Rapids is the start of a journey.”

For Intel, the pressure is on. Along with falling demand for chips used in personal computers, the company faces stiff competition in the server chips that are its most profitable business. That issue has worried Wall Street, with Intel’s market value plunging more than $120 billion since Gelsinger took charge.

At an online event Tuesday to discuss Sapphire Rapids, which is named after a portion of the Colorado River, Intel customers described plans to use the processor, which they said would bring particular benefits for artificial intelligence tasks. The product, formally called the 4th Gen Intel Xeon Scalable processor, was introduced along with another delayed product. That chip, formerly code-named Ponte Vecchio, was designed to accelerate special-purpose jobs and be used alongside Sapphire Rapids in high-performance computers.

In an interview, Gelsinger said Sapphire Rapids had the makings of a hit, despite the delays. He picked Rivera in 2021 to take over the unit developing it, where she is using lessons from the experience to change how Intel designs and tests its products. He said Intel had conducted several internal reviews of what happened with Sapphire Rapids, and “we’re not done.”

Sapphire Rapids began in 2015, with discussions among a small group of Intel engineers. The product was the company's first attempt at a new approach in chip design. Companies routinely pack tens of billions of tiny transistors on each piece of silicon, but competitors such as Advanced Micro Devices had started making processors from multiple chips bundled together in plastic packages.

Intel engineers came up with a design with four chips, each one sporting 15 processor "cores" that act like individual calculators for general-purpose computing jobs. The company also decided to include extra blocks of circuitry for special tasks, including artificial intelligence and encryption, and for communicating with other components, such as chips that store data.

The interaction among so many elements is “very complex,” said Shlomit Weiss, who jointly leads Intel’s design engineering group. “Complexity usually brings problems.”

The Sapphire Rapids team grappled with bugs, flaws caused by designer errors or manufacturing glitches that can cause a chip to make incorrect calculations, work slowly or stop functioning. They were also affected by delays in the product’s manufacturing process.

But by December 2019, the engineers had hit a milestone called “tape-in.” That’s when electronic files containing a completed design move to a factory to make sample chips.

The sample chips arrived in early 2020, as COVID-19 forced lockdowns. The engineers soon got the computing cores on Sapphire Rapids communicating with one another, said Nevine Nassif, the project’s chief engineer. But more work than expected remained.

One key chore was “validation,” a testing process in which Intel and its customers run software on sample chips to simulate computing chores and catch bugs. Once flaws are found and fixed, designs may go back to the factory to make new test chips, which typically takes more than a month.

Repeating that process led to missed deadlines. Nassif said Sapphire Rapids was designed to counter AMD’s Milan processor, which was introduced in March 2021. But it still wasn’t ready by that June, when Intel announced a delay until the next year to allow more validation.

That was when Rivera stepped in. The longtime Intel executive had successfully built a business in networking products before being appointed in 2019 as chief people officer.

“We had to get our execution mojo back,” Gelsinger said. “I needed somebody who was going to run to the fire and fix this business for me.”

In October 2021, Rivera and a top design executive established weekly Sapphire Rapids status meetings, held each Monday at 7 a.m. Those gatherings showed steady progress in finding and fixing bugs, she said, bolstering confidence about starting production in the second quarter of 2022.

Then came the discovery of the flaw in May. Rivera would not describe it in detail but said it had affected the processor's performance. In June, she used an investor event to announce a delay of at least a quarter, which pushed Sapphire Rapids past the November launch of a competing AMD chip.

“We were ready to ship,” Nassif said. The final delay “was just so sad given all the effort that had gone into it.”

Rivera saw a series of lessons from the setbacks. One was simply that Intel packed too many innovations into Sapphire Rapids, rather than deliver a less ambitious product sooner.

She also concluded that the team should have spent more time perfecting and testing its design using computer simulations. Finding bugs before they reach sample chips is less expensive and would have made it possible to remove features to simplify the product, Rivera said. She has since moved to bolster Intel's simulation and validation abilities.

“We used to have a lot of this kind of muscle that we let atrophy,” Rivera said. “Now we’re rebuilding.”

This article originally appeared in the New York Times.