Australian Lefty on Politics, Governance, Science and Info Management

Low bar for overblocking in the net censorship test

Posted by Dave Bath on 2009-12-15

Rather than go on about the offensiveness and cost-ineffectiveness of the KRudd/Conroy net censorship effort, and discussion of better approaches, I thought I’d look at one particular aspect – the design and results of the overblocking test.

Bottom line: Low bar, and even then, the results weren’t great.  Lots of evil stuff still got through, and too much innocent stuff got blocked.

Overblocking, the prevention of access to legitimate content, can be a problem with blocking based on the URL (the requested address), and is always a problem with content-based filtering – even when automagically deciding to block based only on text content rather than trying to analyze images, sound or video.

The bar for overblocking was set extraordinarily low ("mostly below MA15+" according to the fragmented report at Conroy’s propaganda page, which is performing like a dog, though at least you can get a PDF-rendered version mirrored at ZDNet), and even then nearly 3.4% of legitimate content was blocked.

Testing was also undertaken against a list of content, prepared by Enex, considered to be innocuous and which should not be blocked by a filter.  All participants experienced some level of over-blocking in this test (i.e. blocking of some legitimate URLs).  All filters blocked less than 3.4 percent of such content.

– Enex Report p3 ZDNet PDF Mirror

False positives designated as "inappropriate for children" would include discussions of "sperm whale"… so it would be awfully difficult to get stuff relating to important sociological, historical and medical literature.
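To make the "sperm whale" problem concrete, here is a minimal sketch of a naive word-based text filter. The blocklist, the function name and the sample pages are all my own assumptions for illustration; the Enex report does not describe how the tested filters actually decide what to block.

```python
import re

# Hypothetical blocklist, for illustration only (not from the Enex report).
BLOCKED_WORDS = {"sperm", "sex"}


def is_blocked(text: str) -> bool:
    """Return True if any blocklisted word appears as a whole word in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not BLOCKED_WORDS.isdisjoint(words)


# Hypothetical sample pages: the first two are legitimate but get blocked anyway.
pages = [
    "The sperm whale is the largest toothed predator.",
    "Safe sex and contraception advice for teenagers",
    "Pottery in ancient Greece",
]

for page in pages:
    print("BLOCKED" if is_blocked(page) else "allowed", "-", page)
```

A filter this crude cannot tell marine biology or health education from porn, which is exactly what overblocking means.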

I guess we don’t want kiddies to know about safe sex and contraception!  Let them get HIV instead!

Anyway, here are the results for the accuracy tests (p13 in the ZDNet PDF Mirror), which show that even when letting 20% of inappropriate material through, 3% of legitimate material was blocked.  I’d imagine that if we only let through 5% of inappropriate material, at least 20% of legitimate material involving biology would be blocked.  Say goodbye to so much of David Attenborough!

Results for accuracy tests – additional content
Accuracy testing results are as follows:

  • All filters participating in additional content filtering in the pilot blocked between 78.80 percent and 84.65 percent of inappropriate material.
  • All filters participating in additional content filtering in the pilot blocked less than 3.37 percent of innocuous content.

Let’s express the same figures more clearly: ALL participants allowed AT LEAST 15% of inappropriate material through and blocked AT LEAST 2.4% of innocuous content.

Here’s the more detailed data (with "Pn" replacing "Participant n", and adding an extra row to show how much evil stuff got through):

ISPs participating in filtering additional content

Service                         P2      P5      P6      P7      P8      P9
Inappropriate Content Blocked   80.72%  84.65%  80.97%  80.72%  78.80%  82.03%
Inappropriate Content Allowed   19.28%  15.35%  19.03%  19.28%  21.20%  17.97%
Innocuous Content Blocked       2.76%   3.17%   2.78%   2.44%   2.87%   3.37%

OK, so without digging around for traffic loads so I can do a weighted average, I can only do a simple average, and round to the nearest percent:

  • Inappropriate Content Blocked: 81%
  • Inappropriate Content Allowed Through: 19%
  • Innocuous Content Blocked: 3%
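For anyone who wants to check the arithmetic, the simple averages and the "at least" figures can be reproduced with a few lines of Python. The percentages are the ones from the Enex table; the variable names are mine, and this is an unweighted mean because I don't have traffic loads per participant.

```python
from statistics import mean

# Percentages per participant, as reported in the Enex table.
blocked_inappropriate = {
    "P2": 80.72, "P5": 84.65, "P6": 80.97,
    "P7": 80.72, "P8": 78.80, "P9": 82.03,
}
blocked_innocuous = {
    "P2": 2.76, "P5": 3.17, "P6": 2.78,
    "P7": 2.44, "P8": 2.87, "P9": 3.37,
}

# Unweighted (simple) averages across the six participants.
avg_blocked = mean(list(blocked_inappropriate.values()))
avg_allowed = 100 - avg_blocked
avg_overblocked = mean(list(blocked_innocuous.values()))

# Best case across participants: least inappropriate content allowed,
# and least innocuous content blocked.
min_allowed = 100 - max(blocked_inappropriate.values())
min_overblocked = min(blocked_innocuous.values())

print(f"Inappropriate blocked (avg): {avg_blocked:.0f}%")
print(f"Inappropriate allowed (avg): {avg_allowed:.0f}%")
print(f"Innocuous blocked (avg):     {avg_overblocked:.0f}%")
print(f"Best participant still allowed {min_allowed:.2f}% "
      f"and best overblocking was {min_overblocked:.2f}%")
```

A weighted average would need each ISP's share of traffic, which the report extract here doesn't give.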

It’s also worth noting that the circumvention techniques discussed only involved URL-based blocking rather than content obfuscation (such as using a private codec for streaming video, or even a simple rotation of the bytes in an image, the equivalent of ROT13 for text, so that it looks like raw data).
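Here is a rough sketch of how trivial the byte-rotation trick is. Everything in it is an assumption for illustration: the function names are mine, the "filter" is just a check for the standard PNG file signature, and I'm not claiming any filter in the pilot inspects files this way. A shift of 128 is used because, like ROT13 on a 26-letter alphabet, applying it twice gets you back where you started.

```python
# Standard 8-byte signature at the start of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def rotate_bytes(data: bytes, shift: int = 128) -> bytes:
    """Add `shift` to every byte, wrapping at 256 (self-inverse when shift is 128)."""
    return bytes((b + shift) % 256 for b in data)


def looks_like_png(data: bytes) -> bool:
    """Hypothetical content check: does the payload start with the PNG signature?"""
    return data.startswith(PNG_SIGNATURE)


# Stand-in payload; a real image would have actual chunk data after the signature.
original = PNG_SIGNATURE + b"...image bytes..."
scrambled = rotate_bytes(original)   # what travels over the wire
restored = rotate_bytes(scrambled)   # what the recipient reconstructs

print("original recognised as PNG: ", looks_like_png(original))
print("scrambled recognised as PNG:", looks_like_png(scrambled))
print("round trip intact:          ", restored == original)
```

Anything that relies on recognising the file type, let alone the picture, sees only noise until the recipient rotates the bytes back.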

So… given that the amount of evil stuff flying around the net is huge, the 20% of evil stuff that gets through is still a lot.  So the kiddies aren’t going to be protected.

A real test, if the aim truly was to keep the kiddies safe, would have required at least 95% blockage of inappropriate content… but then the amount of blocked legitimate content would have left Conroy with a very red face.

Now, let’s start revving up the fundies and alarmist techo-illiterates with "Conroy still wants to let 20% of the evil stuff on the net through to your kiddies" and get them to force Conroy to tighten the test so that inappropriate content is stemmed to the level the fundies will accept (probably much less than 1%), then see what happens to the overblockage rate.

And that’s without even worrying about network performance degradation.

Of course, I suspect that this whole exercise has nothing to do with actually protecting the kiddies, and nothing to do with actually catching evildoers, because if it did, better and cheaper approaches would be employed.

So… who benefits from this?  Not the government hoping to get political mileage from protecting kiddies, because the amount of stuff getting through will turn into egg on the faces of politicians.  Is it merely a way of shovelling taxpayer funds to service providers?  Maybe.  But even that is a pretty roundabout way of doing it and provides no new useful system capability.  My guess: it’s an excuse to get more Echelon capabilities which, like the Victorian Government release of data to friendly major contractors, will delve into the legitimate but politically inconvenient conversations of law-abiding citizens.

See Also:

  • "Something Conroy might block: Pottery and Plato" (2008-12-21), let alone the type of violence in Shakespeare’s Titus Andronicus, the stories of Zeus’ cavortings as a swan (Leda), as a bull (Europa), or, as Pavlov’s Cat points out in an LP comment, the story of Pasiphaë.
  • And if we are sniffing at content, Conroy’s telemedicine plans might be very difficult!
  • "If Conroy wants to control the worst web porn" (2009-07-23) asks why he doesn’t do the simple thing, like getting the Feds to have a blog and follow the bouncing ball by looking through the spam filters, which have some pretty horrible text chunks… indeed… how will we be able to go through our spam filters and un-spam real content if the pages containing spammed comments are blocked based on the words in them?

And a few images I’ve gimped up for previous posts (although you may need to look up what HTTP 403 means).

Earth: 403: Forbidden

I can't see a problem. Thanks, Conroy!

