Image Manipulation in MATLAB

Due 11/2 at 11:59 PM

1 Introduction

A couple of your friends in the business school have a concept for a great new picture editing application

for mobile devices, but they need help on the technical side of things. They’ve asked you

to come on board and help write some of the code necessary to manipulate the images. It just

so happens that you have been studying matrix manipulations in your differential equations class,

and this knowledge along with some basic programming skills will allow you to make a significant

contribution to the application.

Digital images are just matrices of pixels, and any type of matrix operation can be applied to a

matrix containing image data. In this project, you will explore some ways to manipulate images using

MATLAB. We’ll start off with transformation matrices and then move on to image compression.

2 Basic MATLAB commands

MATLAB has some nice built in functions for reading and writing image files—the first command

we’ll be using is imread, which reads in an image file saved in the current directory. imread works

with most popular image file formats. To view an image in a matlab figure, use imagesc. imagesc

is similar to image, but for our purposes will work more consistently. The following code will read

in an image with file name photo1.jpg, save it as the variable X, and display the image in a matlab

figure window. Make sure you run the code from the same folder that contains the image.

After reading in an image like this, X(:,:,1) is a 2-D matrix with intensity values for the red

channel, X(:,:,2) for the green and X(:,:,3) for the blue.

When images are read in using imread, MATLAB stores the data as 8-bit integers, or integers

that can range from 0 to 255. If we want to perform mathematical operations on the image data

using floating point numbers, the integers must be converted to floats as well. If you just read in

an image as X, you can use X double = double(X); to perform the floating point conversion.

If you want to write image data to an image file, you can use imwrite. Note that if you converted

the image data to the double format, you’ll need to convert back to integer values. The

command to do this is uint8. The following code shows how to read in an image, convert to double,

and write to a .jpg file. You’ll want to build your image manipulation code around this template

if you wish to write your output to an image file.

1

Note that if you want to view the image after you’ve converted the image data to double format,

you may need to convert back to uint8 just as you did to use imwrite. In other words, the command

you’ll need to view an image once you’ve converted it to double format is imagesc(uint8(X double)).

For much of this lab, we’ll be working with grayscale versions of photos to keep the matrix manipulations

simpler. To make a grayscale image out of our color image, we want to combine the matrices

that store the information for the red, green, and blue colors into one matrix whose (i, j)-th entry

tells the intensity level of the pixel at that position in the image, with 0 being black and 255 being

white. Since we want the information from each of the matrices storing the red, green, and blue

values for the image to contribute to the intensity of each pixel, we’ll create the matrix for the

grayscale image by making a linear combination of each of X double(:,:,1), X double(:,:,2),

and X double(:,:,3). The following lines of code will convert the image to grayscale and display

this converted image in a MATLAB figure:

X gray = .3*X double(:,:,1) + .3*X double(:,:,2) + .4*X double(:,:,3);

imagesc(uint8(X gray))

colormap(’gray’)

3 Image Manipulation

Matrix multiplication allows us to transform a vector in many ways. The following matrix takes

the entries of a vector and shifts them down one position, cycling the last entry around to the top.

?

?

?

?

?

?

0 0 0 0 1

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

?

?

?

?

?

?

?

?

?

?

?

?

1

2

3

4

5

?

?

?

?

?

?

=

?

?

?

?

?

?

5

1

2

3

4

?

?

?

?

?

?

Matrices like the one on the left above, which we’ll call P, can be helpfully visualized with the

MATLAB command spy(). If P is defined in the MATLAB workspace, then typing spy(P) gives

the figure shown below. spy() is especially useful for visualizing sparse* matrices that have large

dimensions.

*Matrices with relatively few nonzero entries.

2

Figure 1: The figure shows the location of the nonzero entries of the matrix

P. The axes give the row (y-axis) and column (x-axis) indices for the

nonzero entries.

Notice that if you start with the identity matrix and interchange rows until you get the matrix on

the left above, multiplying a vector by that new matrix applies the same row interchanges to the

vector. Each of the rows were shifted down one, and the last row cycled around to the top. This

is called a permutation matrix because it is obtained by permuting the rows and columns of the

identity matrix. In this case, row 1 turns into row 6, row 2 turns into row 3 etc. This transformation

matrix works on matrices, too.

?

?

?

?

?

?

0 0 0 0 1

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

?

?

?

?

?

?

?

?

?

?

?

?

1 6 11 16 21

2 7 12 17 22

3 8 13 18 23

4 9 14 19 24

5 10 15 20 25

?

?

?

?

?

?

=

?

?

?

?

?

?

5 10 15 20 25

1 6 11 16 21

2 7 12 17 22

3 8 13 18 23

4 9 14 19 24

?

?

?

?

?

?

Notice how the matrix multiplication cycled the rows around in the matrix the same way it did in

the vector. Since an image is just a matrix, we can transform them using a linear transformation.

The following image was transformed using a 256 ×256 version of the transformation matrix above,

shifting the image down by 50 pixels.

3

Here’s the code that produced the image above.

Row vectors can be transformed too, just by multiplying by the transformation matrix on the right

side. As an example:

1 2 3 4 5

?

?

?

?

?

?

0 0 0 0 1

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

?

?

?

?

?

?

=

2 3 4 5 1

However, the reordering is not the same as before—the transpose of the transformation matrix must

be used to shift the elements so that the last element is first.

1 2 3 4 5

?

?

?

?

?

?

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

0 0 0 0 1

1 0 0 0 0

?

?

?

?

?

?

=

5 1 2 3 4

4 Image Compression

4.1 The Discrete Sine Transform

The Discrete Sine Transform (DST) is a technique for decomposing a vector into a linear combination

of sine functions with different frequencies. The idea is similar to that of a Taylor series, except

instead of using polynomials to approximate a function, we are using sine functions. If the data in a

vector are smooth, then the low frequency components will dominate the linear combination. If the

data are not smooth (discontinuous, jagged, rapidly increasing or decreasing), then the coefficients

on the higher frequency components will have greater magnitude.

In practice, transforming a vector with the DST means multiplying the vector by a special matrix

called (unsurprisingly) the DST matrix. There are several ways to define the DST matrix; for

4

this assignment, use:

Si,j =

r

2

n

sin

p(i –

1

2

)(j –

1

2

)

n

!

Where S is an n × n matrix, and Si,j is the entry of S in the i-th row and j-th column. There are

several ways to construct this matrix, but the simplest way is to use nested for loops. If you get

stuck, you may find the second half of Worksheet 5 on the APPM 2460 web page helpful.

To apply this transform matrix to a vector, just multiply. So if y is the transformed version of x, we

would obtain it by computing y = Sx. Since S is square, the 1-D DST (i.e. the DST that operates

on vectors) is an operation that takes in a vector of length n and returns another vector of length

n. For 1-D data (vectors), the output is a vector containing weights for the different frequency

components; the higher the weight, the more important that frequency is.

However, we cannot use S to transform our data because our image data is a matrix, which is

two dimensional. So, we will need the 2-D transform. Thankfully, it’s very easy to compute the 2-D

DST using the 1-D transform matrix. Let Xg be the grayscale version of the image data†

. Then

the 2-D DST for the image Xg is:

Y = SXgS

T

Intuitively, you can think of SXg as applying applying the 1-D DST to the columns of Xg, and

XgS

T as applying the 1-D DST to the rows of Xg. So SXgS

T applies the 1-D transform to both

the rows and the columns of Xg. Our DST matrix has the special property that it is symmetric, or

equal to its transpose. So for our DST matrix S,

S

T = S

Now we can define the 2-D transformed image as:

Y = SXgS

If we want to get our original image back from the DST, we’ll need to know the inverse of S. Our

matrix S also has the property that it is its own inverse. So we have,

S

-1 = S

This is useful, since inverses are often difficult to compute.

†Xg is a matrix whose (i, j) entry represents the grayscale level at pixel position (i, j). In our case, the values

range from 0 to 255, with 0 being black and 255 being white.

5

Figure 2: DST coefficients for an image (values in the matrix Y ). Values in

the upper left are weights on low frequency sine components while values

in the lower right are weights on high frequency sine components. Since

the values in the upper left are significantly larger than those in the lower

right, we can see that low frequencies dominate the overall image.

4.2 Compression

JPEG is a type of “lossy compression,” which means that the compressed file contains less information

than the original. Since human eyes are better at seeing lower frequency components, we can

afford to toss out the highest frequency components of our transformed image. The more uniform

an image, the more data we can throw away without causing a noticeable loss in quality. More

complicated images can still be compressed, but heavy compression is more noticeable. Thankfully,

the DST helps us sort out which components of the image are represented by low frequencies, which

are the ones we need to keep.

The information corresponding to the highest frequencies is stored in the lower right of the transformed

matrix, while the information for the lowest frequencies is stored in the upper left. Therefore,

we want to save data in the upper left, and not store data from the remaining entries. We will simulate

this effect by zeroing out the high frequency components. The following code will zero out

the matrix below the off diagonal:

Adjusting the value of p moves the diagonal up and down the matrix, affecting how much data are

6

retained. This illustration shows how the off diagonal moves with changes in p.

After deleting the high frequency data, the inverse 2-D DST must be applied to return the transformed

image back to normal space (right now it will look nothing like the original photograph).

Since none of the zeros need to be stored, this process could allow for a significant reduction in file

size. The compression we perform here is a simplified version of the compression involved when

storing an image as a JPEG file, which uses a Discrete Cosine Transform on 8 × 8 blocks of the

image data.

Figure 3: The left figure was obtained by performing the DST on the

image Y = SXS, zeroing out the bottom right “half” of the matrix Y

(using the above algorithm with p=0.5), and then applying the inverse

DST. The right figure was obtained by zeroing out the top left “half” of Y

and applying the DST. This shows that the values in the bottom right half

of the matrix store the image data that corresponds to quickly oscillating

pixel intensities, or fine textures in the image. The values in the top left

of Y correspond to image data with coarser, smoother texture. We also

clearly see that the values in the top left contain the vast majority of the

information in the image.

5 Questions

Your friends have asked you to write code capable of shifting, cropping, and compressing images

(image compression can also double as a blurring effect because removing high frequency information

smooths sharp edges). To do so, follow the steps laid out in the questions below. In your report, you

should include examples of images that you have processed with your code, and you should describe

the techniques you used to achieve your results. For this project, submit all your code in an

appendix at the end of your report. Make sure to comment your code clearly, so it’s

very obvious which problem the code was for. Output not supported by code in the

appendix will not be counted.

7

5.1 Image Translation

1. Write a MATLAB function to read in the files photo1.jpg and photo2.jpg, which can be

found on the APPM 2460 web page, and store them as matrices of doubles. Convert the color

arrays into grayscale matrices and include these grayscale images in your writeup. These tasks

can be completed by piecing together the blocks of code given in Section 2.

2. When a grayscale image is stored as a matrix, it is a simple task to alter the exposure of

the image. Remembering that the values of the matrix represent the pixel intensity, increase

the exposure in photo2.jpg so that the resulting image appears more “whited out” than the

original. Include this increased-exposure image in your report and place the original alongside

it so your friends can clearly see the difference between the two.

3. Given an image that is 4 pixels x 4 pixels, so that the grayscale representation of the image

is a 4 x 4 matrix called A, what matrix E would you multiply A by in order to switch the

left-most column of pixels with the right-most column of pixels? Should you multiply A by E

on the right or the left (i.e. would AE, EA, or both give the desired result)?

4. The matrix from photo1.jpg is not square. We can, however, still apply matrices to shift the

pixels. Find a matrix that will perform a horizontal shift of 240 pixels to photo1.jpg and

include the shifted image in your write up.

Hint: We saw with n × n matrices that to perform a horizontal shift we multiply our matrix

by a transformation matrix on the right. The transformation matrix on the right was obtained

by transforming the columns of the n × n identity in the same way we wanted the columns

of the image matrix to be transformed. For a non-square matrix X, we can take the same

approach, but we have to start with the correct identity matrix. Think about the dimensions

of the matrix you want to transform and find the matrix IR such that XIR = X. Manipulate

the columns of IR to obtain the transformation matrix. Display your matrix using spy().

5. How could you perform a horizontal and vertical shift? That is, what matrix operations would

need to be applied to to get an image to wrap around both horizontally and vertically? Apply

transformations to the original matrix from photo1.jpg that result in both a horizontal and

vertical shift. This matrix isn’t square, so think about the dimensions for the appropriate

transformation matrices. Shift the image 240 pixels horizontally and 100 pixels vertically.

Display your transformation matrix/matrices using spy().

6. Using what you learned about transformation matrices, determine what matrix would be

required to flip an image upside down. Using that transformation, flip photo2.jpg upside

down. Use spy() once more to display your transformation matrix.

7. What should transposing an image matrix do? Try it with photo2.jpg. Does it look the way

you expected?

8. Use matrix operations to “fix” (shift the image around so that it matches a portion of the

photo1.jpg image) the distorted mount2.jpg, and reconnect it to mount1.jpg. You can

combine the pictures by creating a new matrix that contains the values for both images. When

you are done editing and combining the photos, you should have reconstructed a grayscale

version of photo1.jpg. Hint: When viewing the image using imagesc, the axis labels are the

pixel numbers.

8

5.2 Image Compression

9. Verify that, for the 5 × 5 matrix S, S = S

-1

. That is, show that SS = I5 (this can be done

in MATLAB).

10. Determine what steps need to be taken to undo the 2-D DST. Remember that our DST is

defined by Y = SXgS, and also the special properties of S. You can easily check to see if your

inverse transform works by applying it to Y and viewing it with imagesc.

11. Perform our simplified JPEG-type compression on the image of Albus the cat, photo2.jpg

• Read an image into MATLAB and store as a matrix of doubles

• Convert the 3-D RGB matrix to a 2-D grayscale matrix

• Perform the 2-D discrete sine transform on the grayscale image data

• Delete some of the less important values in the transformed matrix using the included

algorithm

• Perform the inverse discrete sine transform

• View the image or write to a file

Compress the image with several different values of p. Include sample images for compression

values that don’t cause an obvious drop in quality, as well as some that do.

12. You should be able to make p pretty small before noticing a significant loss in quality. Explain

why you think this might be the case. The point of image compression is to reduce storage

requirements; compare the number of non-zero entries in the transformed image matrix (Y ,

not Xg) to a qualitative impression of compressed image quality. What value of p do you

think provides a good balance? (no correct answer, just explain)

13. The compression ratio is defined as the ratio between the uncompressed file size and compressed

size

CR =

uncompressed size

compressed size .

Determine CR for p = 0.5, 0.3, 0.1, using the number of nonzero entries in your transformed

image matrix Y as a substitute for file size. In your view, what p value (any value of p, not just

p = 0.5, 0.3, 0.1) gives the best (largest) CR while still maintaining reasonable image quality?

9