Sunday, June 19, 2011

Week 12 Sunday

~ Sunday ~


OpenCV on Mac OS X

Installed OpenCV on Mac. Thankfully, it's about the same process as in Linux; the differences are installing the Xcode package instead of using yum in Fedora, and installing pkg-config manually:

Download pkg-config for Mac OS X: http://mac.softpedia.com/get/Developer-Tools/pkg-config.shtml
OR the direct link:
http://pkgconfig.freedesktop.org/releases/pkg-config-0.23.tar.gz

Unzip it anywhere, then go into that directory:
$ cd <pkg-config unzipped dir>
$ ./configure
$ make
$ sudo make install

Then set the environment variables just as in Linux:

$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
$ export LD_LIBRARY_PATH=/usr

Check the install by running these two commands; you should see something similar, pointing into /usr/local/...:

$ pkg-config --cflags opencv
-I/usr/local/include/opencv -I/usr/local/include

$ pkg-config --libs opencv
-L/usr/local/lib -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_ml -lopencv_video -lopencv_features2d -lopencv_calib3d -lopencv_objdetect -lopencv_contrib -lopencv_legacy -lopencv_flann

For the compile lines to use in Makefile, see the Linux installation guide posted before.


Xcode is so big... a 4 GB download that took 3+ hours, and a 10 GB install... O_O They should have optional packages so we could choose to download only the UNIX development package...

Some of the built-in tests in test_cv.sh failed though, with errors FAIL(Bad accuracy) and FAIL(Invalid test data). Not sure why; the same process in Linux didn't give me any errors... test_cxcore.sh and test_ml.sh both passed, so I think I'll just ignore the ones that failed in test_cv.sh.

Subversion

Trying to check out, on a Mac, the repository I committed from Linux. Had a bunch of problems with Subversion. They seemed to be caused by files with the same name but different capitalization, which Linux allows because its file system is case-sensitive, but other systems don't. So SVN kept giving errors as I fixed and re-updated, fixed and re-updated; it seemed like a cycle of 3 files.

The initial error looks like:

svn: In directory '.'
svn: Can't open file '.svn/tmp/text-base/foodRecog.cpp.svn-base': No such file or directory


That means there's a file with the same name but different capitalization. Just delete the problematic file from the repository with:
$ svn rm http://repositoryURL/theProblemFile

Run update again. Most likely it will refuse, saying:

svn update
svn: Working copy '.' locked
svn: run 'svn cleanup' to remove locks (type 'svn help cleanup' for details)


Then run
$ svn cleanup
If that refuses to run too, then do
$ rm .svn/log

Then svn cleanup should run, and then svn update should work.
Repeat for any other files that give the same error. You may have to just delete the local copy, make a new folder, and check out again.
Otherwise your update may keep cycling through this type of error for several files, even though there is nothing wrong with those files:


svn: Failed to add file 'myResize.sh': an unversioned file of the same name already exists


OR


svn: In directory '.'
svn: Can't move source to dest
svn: Can't move '.svn/prop-base/tempfile.7.tmp' to '.svn/prop-base/Makefile.svn-base': No such file or directory
(This error also indicates the same capitalization problem. But if it is not the initial error, the named file might be fine; you just need to delete the entire folder, make a new one, and do a clean checkout after resolving the problems above with the other files. This was the case for me.)


Turns out the way to solve it is: after removing the offending file, don't update in the same directory. Delete the directory, then do a clean checkout in a new one. That solved it :) Geez. Now hopefully it'll run on the Mac and we can start expanding our database.

Amazon Mechanical Turk

Thought about it. Looked at it. I think I'll still do some labeling myself and won't use it until I can't keep up. We'll see how fast I am; if I'm just not fast enough, I might start using it. Not too sure how to set up the labeling tasks yet, though. I should do a few tasks myself to see how it works.

Friday, June 10, 2011

Week 11 Friday - Quarter Final Report

~ Friday ~

Final report for the course is here http://acsweb.ucsd.edu/~mezhang/cse190/MabelZhang_cse190_FinalReport.pdf

(Temporary location, might get wiped after we graduate in a few days... Will move to Google sites later.)

Friday, June 3, 2011

Week 10 Friday - The Plan

~ Friday ~

In order to submit to a workshop, we must expand the depth in all directions, including:
  • Getting more cuisines and more dishes for the database
  • Looking closer at the different SVM kernels and how well they can work
  • Different low-level feature descriptors (currently using EMD; there's also RGB histogram and other features that preserve color information, and then SIFT and others that discard color information. It might serve well to compare preserving vs. discarding color)
  • Maybe the attribute (ingredient) types we chose and how effective they are (this requires relabeling if changed)
  • Any other part of the pipeline
The top concern is expanding the database, because labeling takes a lot of time, especially as we expand to more cuisines and more dishes. We might even consider Amazon's Mechanical Turk. Hmm, I haven't used it before, so I'll have to look at how to set it up, etc.

So we already have code to grab images from Flickr, but we've found that Google Images results are much higher quality and clearer than Flickr images, so it would be desirable to have batch-download code / software for Google Images.

Resources I've found:

// Seems like it could only do the first page
Google Image Ripper - online service
http://www.dearcomputer.nl/gir/
Linked from
http://labnol.blogspot.com/2006/07/how-to-leech-pictures-from-flickr-or.html
"This online service extracts the full size images [no thumbnails] from Google index and displays them in one page. You can then save the full page with attachments to build your offline gallery of Google Images."

// Hmm didn't work, probably old software
MultiImageDownloader (freeware, looks like this download is Windows only?)
http://www.addictivetips.com/windows-tips/download-google-images-in-bulk/
Download page
http://www.freewarefiles.com/MultiImageDownloader_program_55357.html
Its Developer's page
http://chesterway.co.uk/

// Trial version only allows 30 images to be downloaded...
WebImageGrab (formerly Googlegrab, Windows & Mac OS)
Review 

Batch Image Downloader - Firefox Addon on Google Code, Downloads selected images on a webpage
http://code.google.com/p/batch-images-downloader/

Review articles about various tools to batch download images from the web
http://www.ghacks.net/tag/download-images/


My second concern is that once the database is expanded, the algorithm won't work anymore! Hahaha... yeah. We'll see.

The other directions listed above require some additions to the code, such as code that calculates other feature descriptors, reruns with different SVM parameters, and other experiments.

Wednesday, June 1, 2011

Week 10 Wednesday - The Results!

~ Wednesday ~

I think it works!!!!!

Hold on.

Layer 1: SVM, RBF kernel
Layer 2: SVM, 3rd degree Polynomial

Final result (i.e. layer 2 cuisine classification) with cross validation:
Fold    Accuracy
0       60.000000
1       86.666664
2       95.555557
3       86.666664
4       85.416664
mean    82.861110
var     143.854935
stdev   11.993954

Q1: How do I know whether it's overfitting? Can I check somehow? If I use a 5th degree polynomial, the accuracy is lower:
Layer 2: SVM, 5th degree Polynomial
Fold    Accuracy
0       51.111111%
1       77.777779%
2       84.444443%
3       77.777779%
4       83.333336%
mean    74.888890%
var     148.938279
stdev   12.204027
When I used an RBF kernel for layer 2, it just didn't work at all; almost everything got classified as a single class, so the accuracy was around 33.33%.

Q2: The variance looks really large. Fold 0 for some reason doesn't work well.

Wednesday, May 25, 2011

Week 9 Wednesday - Full Run-down of the Process

~ Wednesday ~

By the way, final report rough draft here http://acsweb.ucsd.edu/~mezhang/cse190/MabelZhang_cse190_FinalReportRoughDraft.pdf

And... eh... oh! Right. Found some mistakes, one in pure code, one in the way I built the training matrix for the layer 1 attribute classifiers. Yeah, that was dumb: I only gave it positive data and never any negative data, so the predictions were weird. But now, it's so much better!

The matrix is sparse, since there are 16 ingredient categories and each dish usually contains only 3 or 4. Regardless, comparing the ground truth with the prediction results, I can see that where it should be 0, it got 0, and where it shouldn't be, it predicted a percentage that's reasonably close! (It's regression, so I can't measure exact accuracy by hard comparison; I'd probably need a threshold to automate the accuracy count. Also, because the matrix is sparse, the accuracy rate is mostly inflated: many entries (at least ~80% by eyeballing) are 0s.)

Pictures! Okay I finally have a pictorial description of what I did.

______________________________________________________________________

First, the original input images (228 total):

(* there are some extra images in the directory, so in case you go count the pictures in the screenshot, no it doesn't add up to 228.)

http://acsweb.ucsd.edu/~mezhang/cse190/input_bibimbap.png
http://acsweb.ucsd.edu/~mezhang/cse190/input_bulgogi.png
http://acsweb.ucsd.edu/~mezhang/cse190/input_pasta.png
http://acsweb.ucsd.edu/~mezhang/cse190/input_pizza.png
http://acsweb.ucsd.edu/~mezhang/cse190/input_sashimi.png
http://acsweb.ucsd.edu/~mezhang/cse190/input_sushi.png

 ______________________________________________________________________
    
Images clustered into ingredient regions by color textons (228 total):

Each "grayness" is 1 cluster. It may be discontinuous in the image, depending on where the ingredient is.

(* there are some extra images in the directory, so in case you go count the pictures in the screenshot, no it doesn't add up to 228.)

http://acsweb.ucsd.edu/~mezhang/cse190/textonClusters_all_s.png


______________________________________________________________________

Illustration of the classification process, from raw image to prediction, using one typical image as an example:

1. Original image. A typical image.

http://acsweb.ucsd.edu/~mezhang/cse190/bibimbap01_s.jpg


2. Clustered grayscale image by ingredient region, using Texton-izer. Each grayness is interpreted as one ingredient.
Note that there's a list of small numbers on the upper-left corner. The grayness of the number corresponds to the grayness of the cluster. This number is used as the row number in the label file for the convenience of labeling.

http://acsweb.ucsd.edu/~mezhang/cse190/bibimbap01_s_textonMap.jpg


3. Manually label each gray cluster with an ingredient ID, from the list of a total of 16 ingredients we defined (below). Each ingredient cluster is assigned one ingredient ID.

0 pasta
1 tomato
2 greens
3 red fish, as in tuna
4 seaweed, as in sushi
5 carrots, orange or pink veggies
6 meat, brown bread
7 orange fish, shrimp, fish eggs, fried food
8 white veggies, rice
9 dark veggies, eel-colored ingredients
10 egg yolk, yellow green veggies, cooked
     onions, light white fish
11 chili sauce, kimchi, red peppers, red veggies
12 whitefish, pinkish raw fish
13 cheese
14 flour (roasted, yellow), burnt cheese
15 pepperoni, sausage

This image's label file would be the following (row # corresponds to the cluster #, so 0 1 2 3 4 5):
-1
10
8
5
-1
-1

(* the last row is just discarded when read into the program; it's an extra row from when I thought I'd cluster into 6 clusters.)


4. Read back the manual label file. Two things to do:

i. Output an area label file to later help build the attribute vector for attribute-based classification (this is NOT the attribute vector YET). Area is measured by the number of pixels in a cluster. The vector is an n-by-1 vector, where n = number of clusters.

This image's area distribution is (row # is the cluster #):
1754
29016
11196
7476
6558

Later, we will use the ratio of (an ingredient's area / total food area) to label the attribute vector for layer 1 training of the attribute-based classification.

ii. Output EMD data. For each cluster, use all the RGB points in that cluster to calculate its Earth Mover's Distance to every other cluster in this image. The ground distance between two points that we gave to EMD is just the standard Euclidean distance, square root of (R1-R2)^2 + (G1-G2)^2 + (B1-B2)^2.

A constant NONEXISTENT = -1 is used to indicate ingredients that aren't in the image.

Clusters that don't have an ingredient are discarded. Their EMD vector is automatically filled with a row of [-1, ..., -1] before training.

This image's EMD:

-1.000000 -1.000000 -1.000000 -1.000000 -1.000000 145.025467 -1.000000 -1.000000 138.852463 -1.000000 0.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
-1.000000 -1.000000 -1.000000 -1.000000 -1.000000 44.827419 -1.000000 -1.000000 0.000000 -1.000000 138.755737 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
-1.000000 -1.000000 -1.000000 -1.000000 -1.000000 0.000000 -1.000000 -1.000000 54.944424 -1.000000 104.543594 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000

EMD with itself is 0, so by looking at which column is 0, you can tell the cluster's ingredient ID. For this image, it's ingredients 10, 8, 5, which matches our manual labels, so that's a sanity check.


5. Output the attribute vector. Using the manual label file and the output area file, we can derive which ingredients exist in the image and their corresponding areas. We discard the clusters labeled -1 in the manual file (recall that -1 means there is no ingredient there), and add up the areas of all clusters with an ingredient => total food area. Then for each ingredient, (ingredient area / total food area) produces a fraction, which goes into the attribute vector.

The attribute vector is an n-by-1 vector, where n = number of ingredients, in this case 16. Each row is the area ratio that the ingredient occupies on the plate.

In English, this means the vector tells us which ingredients exist on the plate and how much of each there is relative to the others, i.e. what portion of the dish each ingredient makes up.

For this image, it happens to be:

0.000000
0.000000
0.000000
0.000000
0.000000
0.156769
0.000000
0.000000
0.234776
0.000000
0.608455
0.000000
0.000000
0.000000
0.000000
0.000000

All the non-zero rows should add up to 1, which they do.


6. Train an attribute classifier (layer 1 of the attribute-based classification) for each ingredient (i.e. each "attribute").

For each ingredient, all the images containing that ingredient form the positive data (area count non-zero), and all the remaining images (which do not contain that ingredient) form the negative data (area count 0).

For each image, the training sample is this ingredient's EMD vector in this image (EMD, or an RGB color histogram, or another low-level feature descriptor). The ground truth is this ingredient's area fraction.

So one row of the training matrix looks like:
| EMD data | Ground truth: area fraction |


For the example image, its rows in the training data for ingredients 10, 8, 5 are filled; its row in every other ingredient's classifier is the [-1, ..., -1] fill with ground truth 0 (meaning area 0; those ingredients don't appear in the image).

For ingredient 10:

-1.000000 -1.000000 -1.000000 -1.000000 -1.000000 145.025467 -1.000000 -1.000000 138.852463 -1.000000 0.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
Ground truth: 0.608455


Ingredient 8:
-1.000000 -1.000000 -1.000000 -1.000000 -1.000000 44.827419 -1.000000 -1.000000 0.000000 -1.000000 138.755737 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
Ground truth: 0.234776


Ingredient 5:
-1.000000 -1.000000 -1.000000 -1.000000 -1.000000 0.000000 -1.000000 -1.000000 54.944424 -1.000000 104.543594 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
Ground truth: 0.156769

For all other ingredients:
-1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
Ground truth: 0.000000

Training is done with SVM RBF kernel regression.


7. Test on test fold, output prediction results.


8. Train the layer 2 classifier, the final cuisine classifier. (* in progress)

Training data is the attribute vector.
Ground truth is the cuisine ID, 0 for Italian, 1 for Japanese, 2 for Korean.


9. Test on test fold. (* in progress)

Testing data is a vector extracted from the prediction results output in step 7 above.
Ground truth is the cuisine ID.

Saturday, May 21, 2011

Week 8 Saturday

~ Saturday ~

Footnote: Just realized that OpenCV actually has an EMD implementation... O_O Wow. Wow. Wow, okay. Well, okay, whatever.

Found a stupid bug that resulted in all the points used for EMD being the same point. Results from last time are certainly flawed.

Turned on randomly choosing the points to pass to EMD, instead of taking the first 100. The effect is that the distance between two clusters is now somewhat randomized, so it's a bit different every time, and the reversed distance is no longer exactly the same, e.g. emd(cluster1, cluster2) is not equal to emd(cluster2, cluster1), whereas before, it was equal.

Friday, May 20, 2011

Week 8 Friday

~ Friday ~

Finally got the attribute-based classifier code fully running, but the results don't look quite right... I wonder if it's because there are too few data samples for each attribute. The regression predictions look almost identical for every fold, and the accuracy is 0%. The range is roughly right, but there must be something wrong either with the way I'm constructing the matrix or with the training.

Ran the RGB color histogram and EMD on the complete data set again, since it is 4 times larger than before. The color histogram did surprisingly better than last round:

fold#   accuracy%
0       46.666668
1       55.555557
2       57.777779
3       51.111111
4       56.250000
mean    53.472223
var     15.564670993
stdev   3.945208612

EMD, on the other hand, dropped a lot:
fold #    accuracy%
0        9.615385
1        17.391304
2
3
4