Experienced Points

Just How Does the Oculus Rift Work?


One of the most annoying things about being a fan of this VR revolution is that a lot of people see no reason to be impressed, because they thought we were able to do this years ago. We’ve been showing it in movies and talking about it for ages. Heck, Nintendo even had that Virtual Boy thing in the 90’s! Why are people acting like this virtual stuff is new?

And there’s some truth to that. We have indeed been building VR prototypes since at least the late 80’s, and it’s always seemed like it was right around the corner. If you heard about a VR prototype in the early 90’s and you’re not an avid follower of VR news, it would be perfectly natural and reasonable to assume that it all worked out and VR was invented years ago. That’s certainly how most technologies work.

But the story of VR has been sort of strange, because it’s much less about machines and much more about learning how our bodies work. We’ve discovered that our eyes and our vestibular system are strongly linked, and there are limits on how far you can trick one before you piss off the other.

A great way to illustrate the complexity of the problem is to just go over what the Oculus Rift has to do to bring you something as simple as a pair of images. Note that I’m a software guy, not a hardware guy, so a lot of this is simplified because I don’t feel qualified to discuss the finer points of accelerometers and the like. Still, this should give you a good broad-strokes overview of what this thing does when it’s strapped to your face.

We begin with an OLED display. The old CRT screens (you remember those heavy old things, right?) were too heavy to use for VR. Aside from the neck strain of having one strapped to the front of your head, there’s the problem that those screens flickered very, very fast. It was tolerable at normal viewing distances, but it would have been seizure-inducing in VR. An LCD screen is better, but the pixels can’t turn on and off quickly enough to keep up in VR. So we need OLED screens. (OLED stands for “organic light-emitting diode”, which sounds like Star Trek technobabble, but is a real thing.)

Now we render our virtual world to the screen. We’ve got to do it quick, which means we need a really good computer. Cutting-edge games struggle to hit 60 frames a second, but in VR that’s the bare minimum. (Anything less feels awful.) 75 fps is much better. (And we probably need something like 120 fps to completely trick the eyes seamlessly.) Oh, not only do you need to hit this high frame rate target, you need to do it while rendering the whole scene twice – once for each eyeball.
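To put those targets in perspective, here’s a back-of-the-envelope sketch in Python. The function name and the even per-eye split are my own illustration; real engines share some work between the two eye views rather than splitting the budget cleanly in half.

```python
# Rough per-frame time budgets at the frame rates discussed above.
# The budget is just 1000 ms divided by the frame rate, and a stereo
# renderer has to fit BOTH eye views inside it.
def frame_budget_ms(fps):
    """Total milliseconds available to render one frame."""
    return 1000.0 / fps

for fps in (60, 75, 120):
    total = frame_budget_ms(fps)
    per_eye = total / 2  # naive split between the two eye views
    print(f"{fps} fps -> {total:.1f} ms/frame, ~{per_eye:.1f} ms per eye")
```

At 120 fps you’d have barely four milliseconds per eye under this naive split, which is why the frame rate targets are such a big deal.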

Now we have a split screen with an image for each eye. But we need this little six-inch screen to envelop the viewer. (Note how you don’t feel “enveloped” if you just mash your face into your smartphone.) The way to do this is to set the screen a couple of inches from the eye, and then use lenses to bend the image. But these lenses also distort the image. They “pinch” everything towards the center.

This distortion is unnatural and would ruin the illusion. The solution is to run a shader on the image. This is a special full-screen effect (like full-screen anti-aliasing or depth of field, if you’ve heard those buzzwords before) that distorts the image by stretching it out in the center, perfectly negating the lens distortion. So not only are we rendering a video game at an insane frame rate (twice!) but we’re also doing this while applying complex full-screen effects.
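The math behind that “un-pinch” can be sketched in a few lines of Python. This is a simplified model of radial distortion correction: each pixel gets pushed outward from the lens center by a polynomial function of its distance. The coefficients k1 and k2 here are made-up illustrative values, not real Rift constants.

```python
# Minimal sketch of the pre-warp: the lens pinches the image toward the
# center, so the software stretches each pixel radially outward first,
# and the two distortions cancel. (x, y) is measured from the lens
# center; k1 and k2 are invented example coefficients.
def prewarp(x, y, k1=0.22, k2=0.24):
    """Push a point outward by a polynomial function of its squared radius."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

print(prewarp(0.0, 0.0))  # the center doesn't move: (0.0, 0.0)
print(prewarp(0.5, 0.0))  # a point halfway out moves outward to (0.535, 0.0)
```

The real shader does this per pixel, per frame, for both eyes, which is part of why the hardware requirements are so steep.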

And it gets worse, because the lenses don’t bend all wavelengths of light equally. If you’ve ever used a magnifying glass to focus sunlight, you’ll notice that the edges of the light (let’s assume you’re not just focusing the sunlight down to a pinpoint) have a kind of rainbow effect, because different wavelengths of light end up bent at different angles by the lens. This is called chromatic aberration. So that shader we were using to “un-pinch” the image must un-pinch different colors in different shapes, or the viewer will see goofy color distortions. Again, this takes more computing power, which makes it even harder to hit our frame rate targets, and so on.
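Conceptually, the fix is just the same radial pre-warp with a slightly different strength per color channel. Here’s a hedged sketch building on the earlier idea; the per-channel factors are invented for illustration and don’t correspond to any real optics.

```python
# Sketch of chromatic-aberration correction: the lens bends red, green,
# and blue by slightly different amounts, so the pre-warp scales each
# channel slightly differently. All coefficients are illustrative.
def prewarp_rgb(x, y, k1=0.22, k2=0.24, chroma=(0.996, 1.0, 1.014)):
    r2 = x * x + y * y
    base = 1.0 + k1 * r2 + k2 * r2 * r2
    # One warped position per channel; the shader samples the source
    # image at three slightly different spots and recombines them.
    return [(x * base * c, y * base * c) for c in chroma]

red, green, blue = prewarp_rgb(0.5, 0.0)
# In this toy model, blue is bent most strongly, so it's pushed farthest out.
```

In other words: three texture lookups per pixel instead of one, at 75 frames per second, twice over.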

So now the user has a convincing 3D image. We’ve successfully fooled the eyes. Now we have to deal with the vestibular system.

If you’re wearing screens on your eyes and your brain is convinced by what you’re seeing, then when you turn your head the brain will expect your view to turn as well. If it doesn’t, the experience is very uncomfortable. At best, the illusion is ruined. More likely, you’ll begin to feel some degree of VR sickness.

So we add some gyroscopes to keep track of how the device (and thus your head) is turned. But gyroscopes don’t really tell you which way you’re pointed. All they tell you is how fast you’re turning. So we take these gyro readings and add them up over time to make a solid guess about how the head is tilted. These estimates “drift” over time, slowly building up errors.

To picture how this works: Imagine that I tell you to close your eyes and turn ninety degrees left. Then I tell you to turn around 180 degrees. Then right 90 degrees. Those turns are easy and you’ll probably have a pretty good picture in your head of which way you’re facing. But the longer you do this, the more the little errors in your movements will add up and pretty soon you’ll have no idea which way your body is really facing. It works the same way for the Rift. It senses all the little turns, but the data is imperfect and over time it loses its place.
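You can see the same thing happen in a simulation. This little sketch integrates one minute of gyro readings while the “head” stays perfectly still; the bias and noise figures are invented round numbers, not real sensor specs.

```python
import random

# Why gyro-only tracking drifts: we can only add up noisy angular-rate
# readings, and the tiny errors accumulate. All numbers are illustrative.
random.seed(1)

estimated_heading = 0.0   # degrees; the true heading stays at 0 throughout
dt = 1.0 / 1000.0         # 1000 gyro samples per second
bias = 0.5                # a constant sensor bias, in degrees per second

for _ in range(60 * 1000):  # one minute of samples
    measured_rate = 0.0 + bias + random.gauss(0.0, 0.1)  # true rate is 0
    estimated_heading += measured_rate * dt

# After a minute of "standing still", the estimate has wandered off by
# roughly bias * 60 seconds = about 30 degrees.
print(f"drift after 60 s: {estimated_heading:.1f} degrees")
```

A half-degree-per-second bias sounds tiny, but after a minute the headset would think you’re facing a completely different part of the room.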

We can add accelerometers to correct for your side-to-side head tilt, and a magnetometer to correct your heading, which (mostly) solves the drift problem and gives the Oculus a stable way to keep track of which way you’re looking. It’s not perfect, but it’s good enough to fool your eyes. So now when you look to the left, the simulation can adjust the camera to show things off to the left. Suddenly the simulation is that much more convincing to the brain.
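One common way to do that blending is a complementary filter: trust the gyro for fast changes, and lean on the absolute reference (gravity from the accelerometer, north from the magnetometer) just enough to cancel long-term drift. I don’t know exactly what filter the Rift uses internally, so treat this as a generic sketch with an illustrative blend factor.

```python
# Complementary filter sketch: blend the fast-but-drifting integrated
# gyro angle with a slow-but-absolute reference angle. alpha close to 1
# means "mostly trust the gyro". All values are illustrative.
def complementary_filter(angle, gyro_rate, reference_angle, dt, alpha=0.98):
    integrated = angle + gyro_rate * dt          # fast, drifts over time
    return alpha * integrated + (1.0 - alpha) * reference_angle

# Same scenario as before: standing still with a biased gyro (0.5 deg/s).
# This time the estimate stays pinned near the reference.
angle = 0.0
for _ in range(60 * 100):  # one minute at 100 Hz
    angle = complementary_filter(angle, gyro_rate=0.5,
                                 reference_angle=0.0, dt=0.01)
print(f"estimate after 60 s: {angle:.3f} degrees")
```

Instead of drifting off by thirty degrees, the estimate settles at a fraction of a degree, which is the difference between a usable headset and a vomit machine.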

However, we have a new challenge: We need speed. In the real world your eyes and head are always moving together. In VR, the gyroscope has to take a measurement, then send its data through the operating system and driver layers to the application. The application has to crunch the numbers and figure out what the new position is, and then you’ll see your head-turn reflected the next time a frame is drawn. Smartphones have had gyros and accelerometers for years, but they were never built for speed. If you’re turning your phone over in your hand, it’s totally fine if it takes the device half a second (or even a couple of seconds) to figure out what’s going on. So these systems were not originally designed with low latency in mind. Now we need to go over all of these devices and drivers and cut out all the half-ass bottlenecks that were there because speed didn’t matter until now.
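Every one of those stages adds to the gap between “your head moves” and “your eyes see it move”, sometimes called motion-to-photon latency. The stage times below are invented round numbers for illustration, not measurements of any real device, but they show how quickly the milliseconds pile up.

```python
# Illustrative motion-to-photon budget: every stage between "head turns"
# and "pixels change" adds latency. Stage times are made-up examples.
pipeline_ms = {
    "gyro sample + transfer": 1.0,
    "OS / driver layers":     2.0,
    "app simulation update":  3.0,
    "render (both eyes)":     8.0,
    "display scan-out":       5.0,
}
total = sum(pipeline_ms.values())
print(f"motion-to-photon: {total:.0f} ms")  # 19 ms in this example
```

Shaving a millisecond off any single stage barely matters; shaving a millisecond off every stage is how the whole thing becomes comfortable.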

We’re not done yet. Once someone is in a VR world, they start believing in it. They look up, right, left, down, and their brain starts buying into the idea that they’re really in this strange place. Then the user leans forward to examine some object more closely. The gyros can tell their head is tilted down, but not that it’s moving forward. So the scene doesn’t reflect this. They leaned forward but didn’t move any closer to the thing they’re looking at. To the user it feels like the entire world has tipped forward and away from them. Once again: VR Sickness.

So we need the simulation to track the user’s head movement. In the case of the Oculus, this is done with some infrared LEDs and an infrared camera. This is basically the same technology the Wii uses with the Wii remote. On the Wii, that sensor bar on top of the TV has an LED on either side. That black panel on the front of the Wii remote has a sensor behind it. It sees the two lights and uses them to figure out where it is relative to the television. The Oculus does the same thing to keep track of your head, except the parts are reversed: The lights are on the headset and the sensor is a camera on top of your monitor.
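The geometry behind this kind of tracking is surprisingly simple in principle. With a basic pinhole-camera model, two lights a known distance apart appear closer together in the image the farther away they are, which lets you solve for depth. The focal length and LED spacing below are illustrative values I picked, not real hardware specs (and the real Rift uses many LEDs, not two).

```python
# Sketch of depth-from-LEDs under a pinhole camera model:
#   pixel_separation = focal_px * led_spacing_m / depth
# so we can rearrange to solve for depth. Values are illustrative.
def depth_from_leds(pixel_separation, led_spacing_m=0.1, focal_px=700.0):
    """Distance to the headset, from the apparent gap between two LEDs."""
    return focal_px * led_spacing_m / pixel_separation

print(depth_from_leds(100.0))  # LEDs 100 px apart -> 0.7 m away
print(depth_from_leds(50.0))   # half the pixel gap -> twice as far: 1.4 m
```

Track how that apparent spacing (and the pattern’s position in the frame) changes over time, and you know how the head is moving through space, not just how it’s turning.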

Once again, we have speed and latency problems to worry about. We’ve got to pull in these images from the sensor, scan them for the lights (just to be clear, these lights are infrared and thus invisible to the eye), do a bunch of math to figure out where the headset is relative to the camera based on the lights, use this information blended with previous updates to figure out how the head is moving, and send that data off to the application. Most webcams have a bit of latency and nobody cares, but now every millisecond counts, because the bigger the gap between the time your head moves and the time when your eyeballs see the movement, the more likely you are to find it confusing or disorienting.

There’s still work to do. In a typical OLED display, it takes the pixels a tiny amount of time to shut off. Even if the display is updating at 75 fps, when a pixel on the screen goes from light to dark it actually fades out gradually instead of snapping off instantly. Nobody cared before because this effect was too subtle to notice. But when you’re wearing a VR headset this results in bright objects leaving glowing “trails”. So now we need better displays with more responsive pixels.

This is where we are today. As far as I can tell, we’re done inventing new stuff and now it all comes down to refinement: More pixels. Faster displays. Smaller electronics. Oh, and we’re powering all of this stuff off of USB ports, which vary in capability, are in limited supply, and aren’t guaranteed to be close to each other. Cordless headsets would help with this, I suppose, which is probably why they’re working on them.

We’ve been trying to invent this for over a quarter of a century. It’s been a long road with a lot of setbacks. But it’s finally happening, and a lot of the pieces have fallen into place in just the last two years.

Shamus Young is a programmer, critic, comic, and crank. You can read more of his work here.
