Google Cardboard

Note: This is a blog post I started writing almost a year ago, which I finally managed to finish today. That's why it mentions a bit about my first semester as a senior student. By now I'm already a constructive member of society, working a full-time job.

After a year of internship, I finally went back to CUHK to finish my final year. A lot had happened. For starters, I went insane and took four elective courses. One of them was CSCI3290, a computational photography course. It's a nice course with great materials, but a terrible class. And I knew I was doomed when I found out that all my fellow classmates, except me, had actually taken a course on the principles of computer graphics (CSCI3260) before coming here. I had no idea what I was doing during the exam, but the assignments were quite interesting. And for the final project, I had to make an Android app to

  • Generate stereoscopic images using a single image & depth map
  • Add a 3D logo to a video
  • Add 3D subtitles to a video
  • Panorama image viewing


I was given a sample app, which was quite nice. It had already handled the basic image-showing and video-playing functions for me. The main focus was to make the images and videos. Should be easy. So I went over the specification document, which is a real piece of work. I only understood the part where they explained how to assemble a Google Cardboard, which seemed so 'hard'. So I jotted down what I did over those few days, cut all the crap I didn't need, and put it here for my own reference and to save other lost souls like me. To assure you I'm the real deal, the boss, the man: I got the highest score in the class on this project.

Generate stereoscopic images using a single image & depth map

I had to use an algorithm called Depth Image-Based Rendering (DIBR) to make 3D images. Basically, given a single image and a depth map, I can generate one more image, so that when the original image and the generated image are put together side by side, I can see some 3D effect through the Cardboard.

I was really confused by all that technical stuff. Then I remembered I was also given two sample images, so I put them on my phone and had a look, and they did look quite 3D!

 
Figure 1: The left and right sample images.

So, what's next? After reading the specification over and over again (and some Googling) for almost the whole evening, I finally understood what was going on.

Imagine you are in a room staring at some objects; obviously, what you see when standing at one corner is different from what you see standing at another corner, right? Now, take a look at the figure below and try to imagine what the pictures taken from the two cameras (labelled L and R) would look like. They should look similar to what the figure illustrates. We define the shift of an object between the two pictures as its disparity. Since parallax makes objects appear to move faster when they are close to the viewer, slower when they are farther away, and not at all at 'infinity', these two images can give the desired 3D virtual view by creating the feeling of depth.

Figure 2

If we use the left image as the reference and jot down the disparity of every pixel, we eventually get a 2-D matrix, which we call the Disparity Map. It may go without saying that if we are given the left image and the disparity map, we can do some calculation to generate the right image.

To do this, let's call the left image matrix I and the disparity map d. Our goal is to warp I into the resulting image J (the right image).

For each pixel in I, we know I_{i,j} needs to be shifted to position (i, j + d_{i,j}), that is

J_{i, j + d_{i,j}} = I_{i,j}

Visually speaking, we want to do something like this:
Figure 3

But in this case, we will only shift the pixels horizontally.
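
Just to make the shifting concrete, here's a tiny MATLAB sketch of this naive forward warp. The file names and the disparity scaling are placeholders I made up, not the actual assignment data:

```matlab
% Naive forward warp: copy each pixel of the left image to the column
% shifted by its disparity. File names and disparity scaling are
% made-up placeholders.
I = im2double(imread('left.png'));       % left (reference) image
d = im2double(imread('disp.png')) * 32;  % disparity map, in pixels
d = d(:, :, 1);                          % keep one channel if it's RGB

[h, w, c] = size(I);
J = zeros(h, w, c);                      % the right image, to be filled in
for i = 1:h
    for j = 1:w
        jj = round(j + d(i, j));         % target column, rounded to an integer
        if jj >= 1 && jj <= w
            J(i, jj, :) = I(i, j, :);    % J(i, j + d(i,j)) = I(i, j)
        end
    end
end
```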

Everything is going fine, but there's a catch: the target position (j + d_{i,j}) may not be an integer. This is bad, really bad. But we can fix this with backward warping. In MATLAB, this is when interp2() comes into play, since it can interpolate values at non-integer positions. Done and done.
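
Here's roughly what that backward warp could look like. One simplification to be honest about: this samples the left image at (X + d, Y), using the disparity directly on the target grid, which is only an approximation when the map is defined on the left image. File names and scaling are placeholders again:

```matlab
% Backward warping with interp2(): for every pixel of the target image,
% sample the left image at the (possibly non-integer) shifted position.
I = im2double(imread('left.png'));       % left (reference) image
d = im2double(imread('disp.png')) * 32;  % disparity map, in pixels
d = d(:, :, 1);

[h, w, c] = size(I);
[X, Y] = meshgrid(1:w, 1:h);             % integer pixel grid of the target

J = zeros(h, w, c);
for k = 1:c
    % interp2 samples channel k of I at the non-integer positions
    % (X + d, Y); anything falling outside the image comes back as NaN.
    J(:, :, k) = interp2(X, Y, I(:, :, k), X + d, Y, 'linear');
end
J(isnan(J)) = 0;                         % fill the disocclusion holes with black
imwrite(J, 'right.png');
```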

However, sometimes you may not be given a Disparity Map; instead, what you get is something called a Depth Map. A depth map encapsulates how far an object is from the viewer. Intuitively, the real depth T can be determined from a depth map D using this formula:

T_{i,j} = αD_{i,j} + β, where α and β are values of your choice

Now we have the 'real' depth, but how can we use it to make the disparity map?

Consider the following figure, where

  • at the bottom, we have two cameras, so b is the distance between the two cameras
  • if you look higher, you can see two horizontal lines, which represent the left and right image planes, so f is the focal length of the two cameras
  • we define X_L and X_R as the x-coordinates of the projected spots on the left and right images

Figure 4

So, the first question is: how do we find the disparity d?

Going back to Figure 2, we can deduce that the disparity is

d = X*_L - X*_R

where X*_L and X*_R are the x-coordinates of the two spots measured in the same coordinate frame. Since the right camera is offset from the left one by b, we have

X*_L = X_L,
X*_R = X_R - b,

and with the properties of similar triangles, we have the following:

d = X_L - (X_R - b), i.e. X_R - X_L = b - d, and
z/(z - f) = b/(X_R - X_L)

Substituting and taking the reciprocal of both sides:

z/(z - f) = b/(b - d)
(z - f)/z = (b - d)/b
1 - f/z = 1 - d/b
z = f * b/d

where z is the depth we can find from the Depth Map. And with this, we can find d for every pixel and, consequently, we have our Disparity Map.
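
Putting the two formulas together, converting a Depth Map into a Disparity Map is just a couple of element-wise operations. Here's a sketch where α, β, f and b are all made-up numbers you'd tune by eye:

```matlab
% Depth map -> disparity map, using T = alpha*D + beta and d = f*b/z.
% Every constant here is a placeholder, not a value from the assignment.
D = im2double(imread('depth.png'));  % depth map, assumed grayscale in [0, 1]
D = D(:, :, 1);

alpha = 10; beta = 1;                % 'real' depth: T = alpha * D + beta
f = 1; b = 0.5;                      % focal length and camera baseline

T = alpha * D + beta;                % per-pixel depth z
d = (f * b) ./ T;                    % per-pixel disparity, d = f * b / z
```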

I figured the whole thing out by abusing my Google-fu and reading numerous papers that seemed written to be incomprehensible, but here are some of the better resources you can have a look at:

  • An assignment from Tampere University of Technology which details pretty much what I described, but in a more difficult way; the specification of my assignment pretty much rips it off: link
  • Stereo Vision Basics, a blog post by Chris Walker: link
  • Chapter 11 from the book "Intelligent Robotic Vision" which talks about Stereo Vision: link
And here are some results:

 

 


Add a 3D logo to a video & Add 3D subtitles to a video

These two parts are actually very easy. To embed a 3D logo into a video, I simply conjured up a depth map for the CUHK logo (as the left image) myself, filled it with an arbitrary constant value, and shifted the pixels accordingly with the same algorithm to make the right image. As soon as I found the value that made the logo look '3D' enough, I embedded the two images onto every frame of the video (that means I generated two video files), and I was done.
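
Since a constant depth map boils down to a single horizontal offset, the overlay can be as simple as this sketch (the file names, position and disparity value are all placeholders, and the logo is assumed to have the same number of channels as the frames):

```matlab
% Overlay a logo at a constant disparity on a pair of stereo frames.
% File names, position and the disparity value are made-up placeholders.
logo  = im2double(imread('cuhk_logo.png'));
left  = im2double(imread('frame_left.png'));
right = im2double(imread('frame_right.png'));

shift = 12;              % constant disparity in pixels; tune until it 'pops'
r = 20; c = 20;          % top-left corner of the logo in the left view
[h, w, ~] = size(logo);

% Paste the same patch into both views, displaced horizontally by the
% disparity; the constant offset makes the logo float at a single depth.
left(r:r+h-1, c:c+w-1, :)              = logo;
right(r:r+h-1, c+shift:c+shift+w-1, :) = logo;
```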

 


3D subtitles are pretty much the same. The additional step was converting the subtitles into images before repeating all the steps I mentioned above.
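
If you're wondering about that conversion, insertText from MATLAB's Computer Vision Toolbox is one way to do it (the string, position and offset below are made up): drawing the same text into both views with a horizontal offset makes it float at a constant depth, just like the logo.

```matlab
% Burn a subtitle into both views at a constant disparity, using
% insertText (Computer Vision Toolbox). All values are placeholders.
left  = im2double(imread('frame_left.png'));
right = im2double(imread('frame_right.png'));

shift = 8;                          % subtitle disparity, in pixels
pos   = [100 400];                  % [x y] anchor of the text in the left view
txt   = 'Hello, Cardboard!';

left  = insertText(left,  pos,              txt, 'FontSize', 24);
right = insertText(right, pos + [shift, 0], txt, 'FontSize', 24);
```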

Panorama image viewing

This part is not worth spending a long time explaining. The given code had already implemented the orientation tracking (so you can move the device around and see different things) using the accelerometer and the magnetic sensor (i.e. the Geomagnetic Rotation Vector sensor, which is recommended by Google: https://developer.android.com/guide/topics/sensors/sensors_position.html#sensors-pos-geomrot).

All I needed to do was adjust the size of the panorama image to be large enough for 'panoramic' viewing. Everything was working great when I moved my phone around, until I put it into the Cardboard, which had a goddamn magnet in it that messed up the readings of the magnetometer. Pretty much nothing worked anymore. So I rewrote the code to use the gyroscope instead. Period.

Source code


Installation

Just build the app using Android Studio and copy the CardBoard folder onto your device. But if your device is running Android 6.0, you'll have to enable the Storage and Location permissions manually after installation.




Kev