The parliamentary dog ate my homework
Posted by Dave Bath on 2009-01-13
If a note, apparently from a parliamentary worker, responding to my thoughts about the implications of a dead parliamentary webserver is correct, then going offline without warning the public, or without redirecting to a "don’t panic" page, still indicates incompetent IT planning and/or contempt for the public.
When an offered excuse is self-condemning, you have to wonder if it reflects the real reason, or whether the full implications of the excuse are understood… either way, not good!
Richard‘s comment to my report of the dead parliament site and my initial thoughts about the possible causes and consequences raises a number of questions which I’ll explore over the fold. It certainly doesn’t explain the continuing misbehaviour of a new Senate submissions system, except by pointing to the same root cause: hopeless IT governance, identified as systemic throughout government in the Gershon Report, commissioned by Tanner and promised by him to be implemented in full.
If the comment about a scheduled power outage is true, it does rule out some of the possible reasons for the loss of service, but raises others so unlikely that I hadn’t initially considered them… these have been partially explored in comments by myself and Robert Merkel (of Benambra and Larvatus Prodeo fame).
The first thing to note is that a scheduled power outage is, by definition, known in advance. A power outage to the parliament (which is really not so much a building as a highly-fortified small suburb) is a pretty big thing, and should be known days, if not weeks, in advance.
Thus, while "Richard" notes that occupants were notified of coming disruptions, the lack of notice given to the public shows contempt and/or negligent management.
Traditionally, sysadmins let all users know that an outage is coming up, even in near emergencies. As I pointed out elsewhere…
As a sysadmin from the mid 1980s, for SCHEDULED maintenance (from, say, 17:00) we’d always give warnings on login ALL DAY (if not the day before), and even in emergency situations, apart from a kernel panic (see also Screens of Death), we’d wall(1) everyone with typically 5 minutes’ grace time so (wherever possible) they could save the files/records they were editing.
Although you may be correct, a rude shutdown without warnings for "maintenance" is a Bad Thing, should not be tolerated, but is unsurprising in these latter days of poor rigor (or rather, IT cognitive rigor mortis).
After all, when you are in an office, and there is a sudden need to shut servers down, what happens? You get emails, on screen messages, and IT staff running around like headless chooks telling everyone to save what they are currently doing and not start anything else.
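The traditional warning drill described above is trivial to script. Here is a minimal sketch (with hypothetical lead times) of how a sysadmin might compute the schedule of messages to broadcast, e.g. via wall(1), ahead of a planned 17:00 shutdown:

```python
from datetime import datetime, timedelta

def warning_schedule(shutdown_at, lead_times_minutes=(60, 30, 10, 5)):
    """Return (when, message) pairs to broadcast before a scheduled
    shutdown. Lead times are illustrative, not prescriptive."""
    notices = []
    for minutes in sorted(lead_times_minutes, reverse=True):
        when = shutdown_at - timedelta(minutes=minutes)
        msg = (f"SCHEDULED MAINTENANCE: system goes down at "
               f"{shutdown_at:%H:%M}. Save your work now "
               f"({minutes} min remaining).")
        notices.append((when, msg))
    return notices

# Example: shutdown at 17:00 on the Friday the site went dark
schedule = warning_schedule(datetime(2009, 1, 9, 17, 0))
for when, msg in schedule:
    print(f"{when:%H:%M}  {msg}")
```

Each printed line would be piped to wall(1) (or emailed, or splashed on the login banner) at the computed time; the point is that none of this is hard.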
If the outage was planned, then even in the absence of failover/handover systems to ensure continuous service, the following trivial steps should have been taken, and they were not:
- A large notice on the parliamentary home page informing the public that the system would be off-line, so they could plan to make submissions at other times;
- A redirection of all web requests directed at aph.gov.au to even just a static page on another computer, saying "Don’t panic. Parliament has not been attacked by terrorists. Scheduled system maintenance is underway and we’ll be back in action by Monday morning at the latest.";
- For those logged in and making submissions (like me):
- Extra warning when logging in about the forthcoming outage;
- Logging of all started sessions (your email address is your login account) and transactions, so that a follow-up could be sent: "Can you please resend what you were trying to upload, or use standard email to [the email of whichever committee you were uploading to]".
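The "don’t panic" fallback in the list above needs almost nothing: a single static page served with HTTP 503 and a Retry-After header, so both people and crawlers know the outage is temporary and deliberate. A minimal sketch using only Python’s standard library (port and wording are made up):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MAINTENANCE_PAGE = (b"<html><body><h1>Don't panic</h1>"
                    b"<p>Parliament has not been attacked by terrorists. "
                    b"Scheduled system maintenance is underway and we'll be "
                    b"back in action by Monday morning at the latest.</p>"
                    b"</body></html>")

class MaintenanceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 503 signals a temporary, deliberate outage (unlike a dead socket)
        self.send_response(503)
        self.send_header("Retry-After", "86400")  # suggest retrying in a day
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(MAINTENANCE_PAGE)))
        self.end_headers()
        self.wfile.write(MAINTENANCE_PAGE)

# To serve: HTTPServer(("", 8080), MaintenanceHandler).serve_forever()
```

Point DNS (or a front-end redirect) at a box running something like this for the duration of the outage, and the public at least knows what is going on and when to come back.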
No such email confirming my upload (as promised), nor any apology and request to resend, was sent to me, and it has been a couple of days since the system came back online. This implies there was no proper planning to capture information from the logs about people making submissions to parliamentary inquiries, meaning the committees are possibly denied information they should have when making decisions about recommending legislation or changes to draft bills.
I also bet they made darn sure that any processing of parliamentarians’ pay and reimbursements wasn’t corrupted mid-batch!
In essence, if Richard’s comment is true, then this is like knowing exactly when the "cleaner will trip over the power cord" and still having no plan to handle it. Things could only be worse if you don’t know when your cleaner will be disastrously unco-ordinated!
But as Robert Merkel noted elsewhere, and I’m in firm agreement:
In any case, I would have thought that the APH site was important enough that outages of that length are avoided.
Obviously the BCP (Business Continuity Plan) is completely inadequate. If a planned outage can pull down not only external access, but operations for internal customers (there is no excuse for the latter), then what are the consequences of an unplanned outage caused by meteorites, terrorist crackers, or more mundane terrorists with explosives? If there is no verified BCP, then what information would be lost? If losing such information isn’t important enough to have a BCP, then this implies the work of the parliament isn’t that important (you’d lose any paperwork too if there was a meteorite or something similar).
Again, the scheduled outage does not offer an excuse for the misbehaviour (incredibly slow response or session crashes) of the newish submissions system. Parliamentary IT staff still have to answer for that! (See description of the problem and comments by various people on causes here.)
So, if the given rationale is from a person-in-the-know, it is such a self-condemnation that it calls into question the other possibility (equally damning of IT security planning)… that the system was being cracked and the plug was pulled. This is only a minor possibility, but if cracking was the cause, the "scheduled shutdown" excuse can be understood, although there should then be a fairly detailed investigation by the DSD and lots of resources put into fixing the problem, given that we can only expect more criminal and state-sponsored cracking in the future.
It is worth remembering that parliamentary systems, from cleaning services to IT, are "independent" of the party in power, apart from the resources granted to them.
As noted elsewhere, the reworking of various senate systems is reported in the 2006/07 and 2007/08 reports from the Black Rod. Given that, given Howard’s attitude to comments about process by independent parliamentary staff, and given that enterprise architecture and cultural changes (which take into account expected funding) take roughly two years to filter into daily operations and capability, any blame should probably be directed at the previous federal government.
This of course doesn’t mean the current government should sit on its hands. Parliamentary IT services obviously need greater funding, and much greater oversight by DSD and AGIMO, two agencies that have demonstrated competence time and time again, but are too often sidelined.
I’ll be writing about my concerns to firstname.lastname@example.org, but I REALLY WANT YOUR THOUGHTS BEFORE I WRITE, whether you are a technologist, a service delivery manager, or a private citizen. So please comment on the most appropriate posts.
After all, whether all government systems have the same inadequacies, are exposed to the same untreated risks, or whether it is just the parliamentary IT system, it affects us all.
- Posts on recent events:
- "Oz Parliament website up and stumbling" (2009-01-12) which covers errors in the senate submissions system AFTER parliament came back up.
- "Implications of off-line parliamentary webserver" (2009-01-11) explored before the "scheduled power outage" comment
- "Oz Parliament Website Dead" (2009-01-10) broke the news of the outage
- "Disability/Human Rights Inquiry Closes Monday" (2009-01-10) includes my experience with a lost transaction
- Reference posts and documents
- "Senate IT Committee System Outage Woes" (2007-05-27) documents a similar problem a couple of years back
- COBIT (Wikipedia overview of the relevant control points) and ISACA COBIT source materials, which provide authoritative guidance for the acquisition, delivery, and continuous service of IT systems
- Val IT (Wikipedia overview – created by me) and ISACA Val IT source materials, which provide the framework and supporting publications addressing the governance of IT-enabled investments
- Capability Maturity Model is the framework of competence for different processes within an organization
- Capability Immaturity Model provides a framework for describing organizational dysfunction
- Pointy-Haired Boss and suits and management articles describing the sort of persons who should be taken out and shot.