Thanks to a huge effort to fix the most obvious weaknesses and the appointment at last of a single contractor, QSSI, to oversee the work, the website now crashes much less frequently, officials said. That is a major improvement from a month ago, when it was up only 42 percent of the time and 10-hour failures were common. Yet an enormous amount of work remains to be done, all sides agree.
Systems like this should be designed for "five nines" availability from the beginning. This means that the system should be operational 99.999% of the time, which allows for only about 5.26 minutes of downtime per year. I suspect companies like Amazon, Facebook, and Google come close to this standard for high availability. There are all kinds of methods and tricks for achieving this, learned over the past century in telecommunication systems.
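To make the downtime budget concrete, here is a minimal sketch of the arithmetic. The function name and the choice of a 365.25-day average year are my own assumptions for illustration; the 42 percent figure is the uptime reported in the article quoted above.

```python
# Minutes in an average year (365.25 days, so leap years are averaged in).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, implied by an availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Compare the reported 42% uptime with increasingly strict targets.
    for label, percent in [
        ("reported uptime", 42.0),
        ("two nines", 99.0),
        ("three nines", 99.9),
        ("five nines", 99.999),
    ]:
        minutes = downtime_minutes_per_year(percent)
        print(f"{label:>16} ({percent}%): {minutes:,.2f} minutes down per year")
```

At 99.999% the budget works out to roughly 5.26 minutes per year, while 42% uptime corresponds to being down for more than half the year.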
In the last week of September, the disastrous results of the project’s inept management and execution were becoming fully apparent. The agency pressed CGI to explain why a performance test showed that the site could not handle more than 500 simultaneous users. The response once again exhibited the blame-shifting that had plagued the project for months.