I love watching CSI: Crime Scene Investigation, but there’s one thing that makes me want to throw things at the television. Here’s how it goes. The investigators have found a security camera at the scene and have taken the tape back to the lab. They roll forward to the point where the crime is taking place, and look for clues about the perpetrator: maybe a badge, a signet ring, or a tattoo. “Zoom in on the ring,” says the person in charge, and the grainy, highly pixelated image of the ring fills the screen. “Now enhance that section.” Magically, the 50 or so pixels that show the ring vanish and a clear, crisp image of the ring appears in its place.
This just doesn’t work, as you may have noticed when working with images from your own digital camera. Here’s why. Let’s assume for simplicity that the security camera is digital and designed to let guards watch, remotely, what’s going on on a TV screen. The resolution of the camera (how detailed the image is) will be matched to the television, which in this case means fewer than half a million pixels (depending on the format, around 700×500) to capture the entire room, face, or vista. If you’ve bought a digital still camera lately, you’ll know that this resolution is very poor. Even cell-phone cameras are likely to have a couple of million pixels these days.
So, what you see is what you get: when the investigators zoom in on the ring in the picture, all they will see is the very same pixels (blocks of color), blown up larger. There’s simply no more information there. Try taking a small JPEG from a website and blowing it up, and you’ll get the same effect. Of course, the security companies could adopt a much higher-resolution video system, high-definition or better, but at some point the investigators would still run out of extra detail. This is a problem with any detector that is inherently pixelated: it has only so many light receptors, and so can capture only so much detail at once.
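A minimal sketch makes the point concrete. What “zoom in” really does to a digital image is something like nearest-neighbour upscaling: each pixel is simply repeated. The tiny 2×2 “ring” below is hypothetical data; enlarging it by any factor produces a bigger grid containing exactly the same values, and no new information.

```python
def zoom(image, factor):
    """Enlarge a 2-D grid of pixel values by repeating each pixel
    `factor` times horizontally and vertically (nearest-neighbour)."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]

# A hypothetical 2x2 patch of grey levels standing in for the ring.
ring = [
    [10, 200],
    [200, 10],
]

big = zoom(ring, 3)  # a 6x6 image: the same blocks of color, larger

# Every value in the enlarged image already existed in the original.
assert {v for row in big for v in row} == {v for row in ring for v in row}
```

However cleverly you interpolate between those repeated values, you are smoothing what is already there, not recovering detail the camera never recorded.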
There is another option, too: sophisticated image processing to remove optical distortion. But this requires doing experiments on the camera itself, not just on the images it produces.
In the 1990s Nicholas Negroponte, head of the Massachusetts Institute of Technology Media Lab, was absolutely right in his book Being Digital, at least when it comes to his main argument: media (providing information to humans) is a killer application for digital technology. This is because, with media, engineers have a very clear idea of exactly how much information they need to capture to make an experience sufficiently ‘real’ for people to appreciate. They know how big the viewing screen will be, how finely the eye can resolve one point from another (why display what the eye can’t see?), and at what frame rate individual images blend into motion thanks to the persistence of vision. Likewise, compact discs record sound based on what we know about the human hearing system: frequencies above or below what we can hear are neither captured nor stored, because that would be a waste of resources.
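The CD case can be made precise with a little arithmetic. By the Nyquist–Shannon sampling theorem, a signal sampled at a given rate can only represent frequencies up to half that rate. CDs sample 44,100 times per second, which comfortably covers the roughly 20,000 Hz upper limit of human hearing (the constant names below are my own, for illustration):

```python
# Why CD audio can safely discard inaudible frequencies:
# the Nyquist limit is half the sampling rate.
CD_SAMPLE_RATE_HZ = 44_100
HEARING_LIMIT_HZ = 20_000  # approximate upper bound for a young adult

nyquist = CD_SAMPLE_RATE_HZ / 2
print(f"Highest representable frequency: {nyquist:.0f} Hz")  # 22050 Hz

# Everything audible fits below the Nyquist limit; anything above it
# is simply never stored on the disc.
assert nyquist > HEARING_LIMIT_HZ
```

The small margin above 20 kHz is deliberate engineering headroom, not waste: it leaves room for the filters that remove ultrasonic content before sampling.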
This is fine if you’re just trying to record things to watch on TV or listen to on the stereo. But not if your cameras are your ‘eyes’ on a scene and you don’t know what’s going to be important. In that case you want to throw away as little information as possible. That’s not what digital is best at.
Originally posted on Books on Brains and Machines.