How one piece of hardware took down an $8 trillion stock market


On Thursday morning in Tokyo, the stewards of the world’s third-largest equity market realized they had a problem.

A data device critical to the Tokyo Stock Exchange’s trading system had malfunctioned, and the automatic backup had failed to kick in. It was less than an hour before the system, called Arrowhead, was due to start processing orders in the US$6 trillion (S$8.2 trillion) equity market. Exchange officials could see no solution.

The full-day shutdown that ensued was the longest since the exchange switched to a fully electronic trading system in 1999. It drew criticism from market participants and authorities and shone a spotlight on a lesser-discussed vulnerability in the world’s financial plumbing – not software or security risks but the danger when one of the hundreds of pieces of hardware that make up a trading system decides to give up the ghost.

“Exchanges are a crucial part of market infrastructure and it’s unacceptable that trading opportunities were denied,” Finance Minister Taro Aso told reporters. “You’re dealing with machines so it’s always possible they will break. They need to create the infrastructure with that possibility of a breakdown in mind.”

The TSE’s Arrowhead system launched to much fanfare in 2010, billed as a modern-day solution after a series of outages on an older system embarrassed the exchange in the 2000s. The “arrow” symbolizes speed of order processing, while the “head” suggests robustness and reliability, according to the exchange. The system of roughly 350 servers that process buy and sell orders had had a few hiccups but no major outages in its first decade.

That all changed on Thursday, when a piece of hardware called the No 1 shared disk device, one of two square-shaped data-storage boxes, detected a memory error. These devices store management data used across the servers, and distribute information such as commands and ID and password combinations for terminals that monitor trades.

When the error happened, the system should have carried out what’s called a failover – an automatic switching to the No 2 device. But for reasons the exchange’s executives couldn’t explain, that process also failed. That had a knock-on effect on servers called information distribution gateways that are meant to send market information to traders.
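
For readers unfamiliar with the term, the idea behind a failover can be sketched in a few lines of Python. This is a simplified illustration only, not a description of how Arrowhead or its vendor implements the switch: the SharedDisk and FailoverStore classes, the DeviceError exception and the "healthy" flag are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


class DeviceError(Exception):
    """Raised when a storage device cannot serve a request."""


@dataclass
class SharedDisk:
    # Hypothetical stand-in for one shared disk device.
    name: str
    healthy: bool = True

    def read(self, key: str) -> str:
        if not self.healthy:
            raise DeviceError(f"{self.name} reported an error")
        return f"{key} from {self.name}"


class FailoverStore:
    """Reads from the active device and switches to the standby on error."""

    def __init__(self, primary: SharedDisk, standby: SharedDisk) -> None:
        self.active = primary
        self.standby = standby
        self.failed_over = False

    def read(self, key: str) -> str:
        try:
            return self.active.read(key)
        except DeviceError:
            if self.failed_over:
                # The standby was already promoted; nothing left to switch to.
                raise
            # Automatic failover: promote the standby and retry once.
            self.active, self.standby = self.standby, self.active
            self.failed_over = True
            return self.active.read(key)
```

In this sketch, a read against a faulty No 1 device is retried transparently on No 2, so callers never see the error. If the switch itself does not complete, every component that depends on the shared data is left without it, which is the kind of knock-on effect described above.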

DISAPPEARING DATA

At 8am, traders preparing at their desks for the market open an hour later should have been seeing indicative prices on their terminals as orders were processed. But many saw nothing, while others reported seeing data appearing and disappearing. They had no idea if the information was accurate.

A minute later, the bourse made its first communication, informing systems administrators at securities firms that there had been an issue. At some brokerages, that didn’t immediately filter down to befuddled trading desks.

At about 8:05am, Twitter – often used by traders to communicate outside of more official communication channels monitored by compliance – began to buzz with rumors of an issue. Traders described a growing sense of confusion as few answers came from the bourse.

“We didn’t know if it was our system or the exchange,” said Masaya Akiba, a broker at Marusan Securities’ stock-trading department. “We only confirmed it when the exchange put out a release.”

At 8:36am, the bourse finally informed securities firms that trading would be halted. Three minutes later, it issued a press release on its public website — although only in Japanese. A confusingly translated English release wouldn’t follow for more than 90 minutes.

It was the first time in almost fifteen years that the exchange had suffered a complete trading outage. The Tokyo bourse has a policy of not shutting even during natural disasters, so for many on trading floors in the capital, this experience was a first.

HISTORIC DECISION

Some market participants fumed at the closure. Others, with nothing to do, occupied their time by reading research notes or trading commodities.

“I didn’t think much of it at first,” said Kiyoshi Ishigane, the chief fund manager at Mitsubishi UFJ Kokusai Asset Management in Tokyo. “Previous outages were quickly resolved so I assumed orders would just be delayed.”

In 2012, after the switchover to Arrowhead, the exchange had quickly resolved limited issues. Many expected the bourse to do the same this time, too.

But as the hours passed, Hajime Sakai, the chief fund manager at Mito Securities, grew increasingly uneasy.

“I really couldn’t pay attention to much else,” he said. “I wasn’t like, ‘Open the market!’ It was more like, ‘whichever it is, make your call on it, fast.’”

The call was a daunting one. After the failed switch to the backup, the exchange had manually forced a switchover to the No 2 shared disk device. At this point, the administrators had a choice: they could seek to restart trading, but this would have entailed a full reset of the system – shutting down the power and rebooting.

Data for orders already received from securities firms would have been lost, without having been canceled. That would have led to anarchy, securities firms told the exchange. After speaking with market participants, the exchange made its decision: trading would be called off for the entire day.

Many in the market say they were relieved. A call to resume trading would have been chaotic, said one worker at a Tokyo-based brokerage, with no way to tell which existing client orders remained active, while also trying to process new asks and bids.

TECHNICAL DISCUSSION

At 4:30pm local time, four TSE executives, including chief executive officer Koichiro Miyahara and chief information officer Ryusuke Yokoyama, faced journalists at the exchange to explain the outage. In a briefing that lasted about 100 minutes, they bowed in apology in front of the crowded room before going into a detailed technical discussion of the breakdown.

If the bourse was criticized for its communications earlier in the day, it won praise for how it handled the press conference. The executives answered questions from the media with relative ease, discussing areas such as systems architecture in highly technical terms. They also squarely accepted responsibility for the incident, rather than trying to deflect blame onto the system vendor Fujitsu. It bore little resemblance to gaffe-filled briefings by other Japanese firms in the past. On Twitter, the Japanese public voiced its approval.

“Management explained very clearly during the briefing last night,” said Megumi Takarada, a senior analyst at Toyo Securities in Tokyo. “The briefing provided some reassurance that management clearly understands the issue.”

Later in the evening, the announcement came that the bourse would restart trading Friday. While that passed without issue, many questions remain unanswered. The Financial Services Agency has ordered the exchange to issue a report on the outage, according to local media, which may give further insight on some of the issues.

But one of the biggest is whether the same kind of hardware-driven failure could happen in other stock markets. For one strategist, it almost certainly could – but that’s not something to worry too much about.

“There’s nothing uniquely Japanese about this,” said Nicholas Smith of CLSA in Tokyo. “I think we’ve just got to put that in the box of ‘stuff happens.’ These things happen. They shouldn’t, but they do.”
