BlackBerry outage traced to poorly tested upgrade
NEW YORK — After two days of frustrating silence about a lengthy outage in its BlackBerry e-mail service, the company that makes the addictive mobile device issued a jargon-laden update indicating a minor software upgrade had crashed the system.
The statement late Thursday night by Research in Motion Ltd. said the outage from Tuesday evening into Wednesday morning was triggered by “the introduction of a new, non-critical system routine” designed to optimize the cache, or temporary memory, on the computer servers that run the BlackBerry network.
RIM said “the pre-testing of the system routine proved to be insufficient.”
The failed upgrade apparently set off a domino effect of glitches, which the company referred to as “a compounding series of interaction errors between the system’s operational database and cache.”
The Canadian company said a “failover process” to switch to a backup system “did not fully perform to RIM’s expectations.”
That led to a delay in restoring service and “processing the resulting message queue,” a reference to the backlog of undelivered e-mail that accumulated during the outage.
The outage and the company’s delayed, tight-lipped response to the situation angered some customers. RIM has taken the same approach with past service outages, which in fact have been rare.
Yet with the company’s rapid expansion beyond its longtime focus on business users — the new BlackBerry Pearl has been a smash hit with consumers since its launch last summer — some experts say RIM needs to get more savvy in dealing with problems.
“So far, all we have gotten from RIM are explanations fit for engineers, not customers,” said Richard S. Levick, whose firm Levick Strategic Communications LLC specializes in crisis communications.
While most of the latest outage happened outside “work” hours, the always-connected mentality fueled by BlackBerry’s success left many users feeling disjointed and aggravated when their devices stopped buzzing. Grumbles were heard at the highest levels of business and government, including the White House and the Canadian Parliament.
During the last major failures nearly two years ago, RIM waited hours before confirming the problem, then issued a cryptic description of what happened.
This time around, from the time the e-mail ceased flowing Tuesday evening, it took RIM more than 12 hours to issue a vague three-sentence statement acknowledging the disruption. No further updates were provided until late Thursday’s statement, prompting criticism in online forums and Web logs.
“They have to stop thinking like engineers and start thinking like a utility,” Levick said. “When the telephone lines go down or the power goes out, the first thing these utilities do is try to fix the problem while simultaneously communicating with the media and customers. Why does RIM think it can’t do two things at once?”