Image Manipulation in MATLAB

Image Manipulation in MATLAB
Due 11/2 at 11:59 PM
1 Introduction
A couple of your friends in the business school have a concept for a great new picture editing application
for mobile devices, but they need help on the technical side of things. They’ve asked you
to come on board and help write some of the code necessary to manipulate the images. It just
so happens that you have been studying matrix manipulations in your differential equations class,
and this knowledge along with some basic programming skills will allow you to make a significant
contribution to the application.
Digital images are just matrices of pixels, and any type of matrix operation can be applied to a
matrix containing image data. In this project, you will explore some ways to manipulate images using
MATLAB. We’ll start off with transformation matrices and then move on to image compression.
2 Basic MATLAB commands
MATLAB has some nice built in functions for reading and writing image files—the first command
we’ll be using is imread, which reads in an image file saved in the current directory. imread works
with most popular image file formats. To view an image in a matlab figure, use imagesc. imagesc
is similar to image, but for our purposes will work more consistently. The following code will read
in an image with file name photo1.jpg, save it as the variable X, and display the image in a matlab
figure window. Make sure you run the code from the same folder that contains the image.
After reading in an image like this, X(:,:,1) is a 2-D matrix with intensity values for the red
channel, X(:,:,2) for the green and X(:,:,3) for the blue.
When images are read in using imread, MATLAB stores the data as 8-bit integers, or integers
that can range from 0 to 255. If we want to perform mathematical operations on the image data
using floating point numbers, the integers must be converted to floats as well. If you just read in
an image as X, you can use X double = double(X); to perform the floating point conversion.
If you want to write image data to an image file, you can use imwrite. Note that if you converted
the image data to the double format, you’ll need to convert back to integer values. The
command to do this is uint8. The following code shows how to read in an image, convert to double,
and write to a .jpg file. You’ll want to build your image manipulation code around this template
if you wish to write your output to an image file.
1
Note that if you want to view the image after you’ve converted the image data to double format,
you may need to convert back to uint8 just as you did to use imwrite. In other words, the command
you’ll need to view an image once you’ve converted it to double format is imagesc(uint8(X double)).
For much of this lab, we’ll be working with grayscale versions of photos to keep the matrix manipulations
simpler. To make a grayscale image out of our color image, we want to combine the matrices
that store the information for the red, green, and blue colors into one matrix whose (i, j)-th entry
tells the intensity level of the pixel at that position in the image, with 0 being black and 255 being
white. Since we want the information from each of the matrices storing the red, green, and blue
values for the image to contribute to the intensity of each pixel, we’ll create the matrix for the
grayscale image by making a linear combination of each of X double(:,:,1), X double(:,:,2),
and X double(:,:,3). The following lines of code will convert the image to grayscale and display
this converted image in a MATLAB figure:
X gray = .3*X double(:,:,1) + .3*X double(:,:,2) + .4*X double(:,:,3);
imagesc(uint8(X gray))
colormap(’gray’)
3 Image Manipulation
Matrix multiplication allows us to transform a vector in many ways. The following matrix takes
the entries of a vector and shifts them down one position, cycling the last entry around to the top.
?
?
?
?
?
?
0 0 0 0 1
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
?
?
?
?
?
?
?
?
?
?
?
?
1
2
3
4
5
?
?
?
?
?
?
=
?
?
?
?
?
?
5
1
2
3
4
?
?
?
?
?
?
Matrices like the one on the left above, which we’ll call P, can be helpfully visualized with the
MATLAB command spy(). If P is defined in the MATLAB workspace, then typing spy(P) gives
the figure shown below. spy() is especially useful for visualizing sparse* matrices that have large
dimensions.
*Matrices with relatively few nonzero entries.
2
Figure 1: The figure shows the location of the nonzero entries of the matrix
P. The axes give the row (y-axis) and column (x-axis) indices for the
nonzero entries.
Notice that if you start with the identity matrix and interchange rows until you get the matrix on
the left above, multiplying a vector by that new matrix applies the same row interchanges to the
vector. Each of the rows were shifted down one, and the last row cycled around to the top. This
is called a permutation matrix because it is obtained by permuting the rows and columns of the
identity matrix. In this case, row 1 turns into row 6, row 2 turns into row 3 etc. This transformation
matrix works on matrices, too.
?
?
?
?
?
?
0 0 0 0 1
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
?
?
?
?
?
?
?
?
?
?
?
?
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
?
?
?
?
?
?
=
?
?
?
?
?
?
5 10 15 20 25
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
?
?
?
?
?
?
Notice how the matrix multiplication cycled the rows around in the matrix the same way it did in
the vector. Since an image is just a matrix, we can transform them using a linear transformation.
The following image was transformed using a 256 ×256 version of the transformation matrix above,
shifting the image down by 50 pixels.
3
Here’s the code that produced the image above.
Row vectors can be transformed too, just by multiplying by the transformation matrix on the right
side. As an example:

1 2 3 4 5
?
?
?
?
?
?
0 0 0 0 1
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
?
?
?
?
?
?
=

2 3 4 5 1
However, the reordering is not the same as before—the transpose of the transformation matrix must
be used to shift the elements so that the last element is first.

1 2 3 4 5
?
?
?
?
?
?
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1 0 0 0 0
?
?
?
?
?
?
=

5 1 2 3 4
4 Image Compression
4.1 The Discrete Sine Transform
The Discrete Sine Transform (DST) is a technique for decomposing a vector into a linear combination
of sine functions with different frequencies. The idea is similar to that of a Taylor series, except
instead of using polynomials to approximate a function, we are using sine functions. If the data in a
vector are smooth, then the low frequency components will dominate the linear combination. If the
data are not smooth (discontinuous, jagged, rapidly increasing or decreasing), then the coefficients
on the higher frequency components will have greater magnitude.
In practice, transforming a vector with the DST means multiplying the vector by a special matrix
called (unsurprisingly) the DST matrix. There are several ways to define the DST matrix; for
4
this assignment, use:
Si,j =
r
2
n
sin
p(i –
1
2
)(j –
1
2
)
n
!
Where S is an n × n matrix, and Si,j is the entry of S in the i-th row and j-th column. There are
several ways to construct this matrix, but the simplest way is to use nested for loops. If you get
stuck, you may find the second half of Worksheet 5 on the APPM 2460 web page helpful.
To apply this transform matrix to a vector, just multiply. So if y is the transformed version of x, we
would obtain it by computing y = Sx. Since S is square, the 1-D DST (i.e. the DST that operates
on vectors) is an operation that takes in a vector of length n and returns another vector of length
n. For 1-D data (vectors), the output is a vector containing weights for the different frequency
components; the higher the weight, the more important that frequency is.
However, we cannot use S to transform our data because our image data is a matrix, which is
two dimensional. So, we will need the 2-D transform. Thankfully, it’s very easy to compute the 2-D
DST using the 1-D transform matrix. Let Xg be the grayscale version of the image data†
. Then
the 2-D DST for the image Xg is:
Y = SXgS
T
Intuitively, you can think of SXg as applying applying the 1-D DST to the columns of Xg, and
XgS
T as applying the 1-D DST to the rows of Xg. So SXgS
T applies the 1-D transform to both
the rows and the columns of Xg. Our DST matrix has the special property that it is symmetric, or
equal to its transpose. So for our DST matrix S,
S
T = S
Now we can define the 2-D transformed image as:
Y = SXgS
If we want to get our original image back from the DST, we’ll need to know the inverse of S. Our
matrix S also has the property that it is its own inverse. So we have,
S
-1 = S
This is useful, since inverses are often difficult to compute.
†Xg is a matrix whose (i, j) entry represents the grayscale level at pixel position (i, j). In our case, the values
range from 0 to 255, with 0 being black and 255 being white.
5
Figure 2: DST coefficients for an image (values in the matrix Y ). Values in
the upper left are weights on low frequency sine components while values
in the lower right are weights on high frequency sine components. Since
the values in the upper left are significantly larger than those in the lower
right, we can see that low frequencies dominate the overall image.
4.2 Compression
JPEG is a type of “lossy compression,” which means that the compressed file contains less information
than the original. Since human eyes are better at seeing lower frequency components, we can
afford to toss out the highest frequency components of our transformed image. The more uniform
an image, the more data we can throw away without causing a noticeable loss in quality. More
complicated images can still be compressed, but heavy compression is more noticeable. Thankfully,
the DST helps us sort out which components of the image are represented by low frequencies, which
are the ones we need to keep.
The information corresponding to the highest frequencies is stored in the lower right of the transformed
matrix, while the information for the lowest frequencies is stored in the upper left. Therefore,
we want to save data in the upper left, and not store data from the remaining entries. We will simulate
this effect by zeroing out the high frequency components. The following code will zero out
the matrix below the off diagonal:
Adjusting the value of p moves the diagonal up and down the matrix, affecting how much data are
6
retained. This illustration shows how the off diagonal moves with changes in p.
After deleting the high frequency data, the inverse 2-D DST must be applied to return the transformed
image back to normal space (right now it will look nothing like the original photograph).
Since none of the zeros need to be stored, this process could allow for a significant reduction in file
size. The compression we perform here is a simplified version of the compression involved when
storing an image as a JPEG file, which uses a Discrete Cosine Transform on 8 × 8 blocks of the
image data.
Figure 3: The left figure was obtained by performing the DST on the
image Y = SXS, zeroing out the bottom right “half” of the matrix Y
(using the above algorithm with p=0.5), and then applying the inverse
DST. The right figure was obtained by zeroing out the top left “half” of Y
and applying the DST. This shows that the values in the bottom right half
of the matrix store the image data that corresponds to quickly oscillating
pixel intensities, or fine textures in the image. The values in the top left
of Y correspond to image data with coarser, smoother texture. We also
clearly see that the values in the top left contain the vast majority of the
information in the image.
5 Questions
Your friends have asked you to write code capable of shifting, cropping, and compressing images
(image compression can also double as a blurring effect because removing high frequency information
smooths sharp edges). To do so, follow the steps laid out in the questions below. In your report, you
should include examples of images that you have processed with your code, and you should describe
the techniques you used to achieve your results. For this project, submit all your code in an
appendix at the end of your report. Make sure to comment your code clearly, so it’s
very obvious which problem the code was for. Output not supported by code in the
appendix will not be counted.
7
5.1 Image Translation
1. Write a MATLAB function to read in the files photo1.jpg and photo2.jpg, which can be
found on the APPM 2460 web page, and store them as matrices of doubles. Convert the color
arrays into grayscale matrices and include these grayscale images in your writeup. These tasks
can be completed by piecing together the blocks of code given in Section 2.
2. When a grayscale image is stored as a matrix, it is a simple task to alter the exposure of
the image. Remembering that the values of the matrix represent the pixel intensity, increase
the exposure in photo2.jpg so that the resulting image appears more “whited out” than the
original. Include this increased-exposure image in your report and place the original alongside
it so your friends can clearly see the difference between the two.
3. Given an image that is 4 pixels x 4 pixels, so that the grayscale representation of the image
is a 4 x 4 matrix called A, what matrix E would you multiply A by in order to switch the
left-most column of pixels with the right-most column of pixels? Should you multiply A by E
on the right or the left (i.e. would AE, EA, or both give the desired result)?
4. The matrix from photo1.jpg is not square. We can, however, still apply matrices to shift the
pixels. Find a matrix that will perform a horizontal shift of 240 pixels to photo1.jpg and
include the shifted image in your write up.
Hint: We saw with n × n matrices that to perform a horizontal shift we multiply our matrix
by a transformation matrix on the right. The transformation matrix on the right was obtained
by transforming the columns of the n × n identity in the same way we wanted the columns
of the image matrix to be transformed. For a non-square matrix X, we can take the same
approach, but we have to start with the correct identity matrix. Think about the dimensions
of the matrix you want to transform and find the matrix IR such that XIR = X. Manipulate
the columns of IR to obtain the transformation matrix. Display your matrix using spy().
5. How could you perform a horizontal and vertical shift? That is, what matrix operations would
need to be applied to to get an image to wrap around both horizontally and vertically? Apply
transformations to the original matrix from photo1.jpg that result in both a horizontal and
vertical shift. This matrix isn’t square, so think about the dimensions for the appropriate
transformation matrices. Shift the image 240 pixels horizontally and 100 pixels vertically.
Display your transformation matrix/matrices using spy().
6. Using what you learned about transformation matrices, determine what matrix would be
required to flip an image upside down. Using that transformation, flip photo2.jpg upside
down. Use spy() once more to display your transformation matrix.
7. What should transposing an image matrix do? Try it with photo2.jpg. Does it look the way
you expected?
8. Use matrix operations to “fix” (shift the image around so that it matches a portion of the
photo1.jpg image) the distorted mount2.jpg, and reconnect it to mount1.jpg. You can
combine the pictures by creating a new matrix that contains the values for both images. When
you are done editing and combining the photos, you should have reconstructed a grayscale
version of photo1.jpg. Hint: When viewing the image using imagesc, the axis labels are the
pixel numbers.
8
5.2 Image Compression
9. Verify that, for the 5 × 5 matrix S, S = S
-1
. That is, show that SS = I5 (this can be done
in MATLAB).
10. Determine what steps need to be taken to undo the 2-D DST. Remember that our DST is
defined by Y = SXgS, and also the special properties of S. You can easily check to see if your
inverse transform works by applying it to Y and viewing it with imagesc.
11. Perform our simplified JPEG-type compression on the image of Albus the cat, photo2.jpg
• Read an image into MATLAB and store as a matrix of doubles
• Convert the 3-D RGB matrix to a 2-D grayscale matrix
• Perform the 2-D discrete sine transform on the grayscale image data
• Delete some of the less important values in the transformed matrix using the included
algorithm
• Perform the inverse discrete sine transform
• View the image or write to a file
Compress the image with several different values of p. Include sample images for compression
values that don’t cause an obvious drop in quality, as well as some that do.
12. You should be able to make p pretty small before noticing a significant loss in quality. Explain
why you think this might be the case. The point of image compression is to reduce storage
requirements; compare the number of non-zero entries in the transformed image matrix (Y ,
not Xg) to a qualitative impression of compressed image quality. What value of p do you
think provides a good balance? (no correct answer, just explain)
13. The compression ratio is defined as the ratio between the uncompressed file size and compressed
size
CR =
uncompressed size
compressed size .
Determine CR for p = 0.5, 0.3, 0.1, using the number of nonzero entries in your transformed
image matrix Y as a substitute for file size. In your view, what p value (any value of p, not just
p = 0.5, 0.3, 0.1) gives the best (largest) CR while still maintaining reasonable image quality?
9