Sunday, April 10, 2011

Week 2, Sunday

~ Sunday ~

Well, it's really lovely out today... but I'm sick... - -

I ran textonizer (from Google Code), and it seems to work quite well on the test image [5]. As far as I can tell, it does its own segmentation, calculates the textons, and uses k-means to cluster the segments. Its output is an integer array whose values indicate which cluster each pixel belongs to. For example, if the number of clusters is 4, the output values are 0, 1, 2, 3, and 255, where 255 is the background cluster.
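To make the output format concrete, here is a minimal sketch of reading such a label array. It assumes the array has already been loaded into a plain Python list of rows; how textonizer actually hands over its output is not shown here, and `cluster_sizes` and `BACKGROUND` are names made up for the example.

```python
from collections import Counter

BACKGROUND = 255  # value that marks the background cluster


def cluster_sizes(labels):
    """Count pixels per cluster, skipping background pixels."""
    counts = Counter(value for row in labels for value in row)
    counts.pop(BACKGROUND, None)
    return dict(counts)


# Toy 3x4 label image for a run with 4 clusters.
labels = [
    [255, 0, 0, 1],
    [255, 2, 3, 1],
    [255, 2, 3, 3],
]
sizes = cluster_sizes(labels)
print(sizes)
```

The per-cluster pixel counts are a quick sanity check that every cluster actually got some pixels.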

Parameters that the user can tune include the number of clusters, a pixel that represents the pixels to be clustered as background (output value 255), and the texton size (default 30). The system comes with a nice .doc file that specifies the parameters and explains the algorithm like a paper.

Now that similar-looking ingredients are clustered into the same group, we have better data than raw segments. We probably don't need JSEG anymore.

Problems:

NUM_CLUSTER is specified by the user. How do I get the optimal number of clusters? Ideally, it should equal the number of ingredients in the dish, but how does the program know how many ingredients are in the dish before clustering occurs? Two ways:
1. Specify ONE constant number and hope it works well enough for all images.
2. Use some kind of guessing algorithm to get a good number. But how do we guess?
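One possible way to guess, sketched below, is an "elbow" heuristic: run k-means for increasing k and stop when adding another cluster no longer cuts the error much. This is not something textonizer does; the tiny k-means, the `guess_num_clusters` name, and the `min_gain` threshold of 0.5 are all assumptions for illustration, and the points are synthetic 2-D blobs rather than real textons.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, iters=20):
    """Run a basic k-means; return the mean squared distance to the nearest center."""
    # Farthest-point initialisation: deterministic and good enough for a sketch.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # New center = mean of its group; an empty group keeps its old center.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def guess_num_clusters(points, max_k=8, min_gain=0.5):
    """Return the first k where going to k + 1 cuts the error by less than min_gain."""
    prev = kmeans_inertia(points, 1)
    for k in range(1, max_k):
        nxt = kmeans_inertia(points, k + 1)
        if nxt > (1.0 - min_gain) * prev:
            return k
        prev = nxt
    return max_k


# Three well-separated synthetic blobs standing in for three ingredients.
random.seed(0)
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5))
    for cx, cy in blobs
    for _ in range(30)
]
best_k = guess_num_clusters(points)
```

On real dishes the clusters will not be this cleanly separated, so the threshold would need tuning, and the guess should be treated as a starting point rather than an answer.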

Background: The background and the plate are correctly clustered into clusters separate from the ingredients, which is good. However, that's 2 clusters, not 1, because the background does not look the same as the plate. How do we prune the plate out of the data? Two ways:
1. Edit the images so that there is no table in the image, only the plate and the food. This is good because the table can be any color and texture; once it is removed, we'd have only 1 background cluster (the plate), which is easy to rule out in the algorithm. However, this is a lot of manual work.
2. In the algorithm, figure out which cluster is the plate and rule it out, along with the background cluster. But how do we figure out which cluster is the plate...?
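For option 2, one untested idea is that the plate surrounds the food, so it should be the cluster that touches the background the most. The sketch below assumes the table ends up in the background cluster (255) and the plate in one of the numbered clusters; `guess_plate_cluster` is a hypothetical helper, not part of textonizer.

```python
from collections import Counter

BACKGROUND = 255


def guess_plate_cluster(labels):
    """Return the non-background cluster with the most pixels bordering the background."""
    height, width = len(labels), len(labels[0])
    contact = Counter()
    for y in range(height):
        for x in range(width):
            cluster = labels[y][x]
            if cluster == BACKGROUND:
                continue
            # Count this pixel once if any 4-connected neighbour is background.
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < height and 0 <= nx < width and labels[ny][nx] == BACKGROUND:
                    contact[cluster] += 1
                    break
    return contact.most_common(1)[0][0] if contact else None


# Toy image: cluster 3 forms a ring (the plate) around clusters 0 and 1 (the food).
labels = [
    [255, 255, 255, 255, 255],
    [255,   3,   3,   3, 255],
    [255,   3,   0,   3, 255],
    [255,   3,   1,   3, 255],
    [255,   3,   3,   3, 255],
    [255, 255, 255, 255, 255],
]
plate = guess_plate_cluster(labels)
food_clusters = {v for row in labels for v in row} - {BACKGROUND, plate}
```

This would fail when food hangs over the edge of the plate or the plate is cropped out of the frame, so it needs checking against the real images.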

[Figure: original and clustered images according to textons]

Now that we have the clusters, we need to label them with ingredient categories and send them to training.

First, I need to get the matrix from the textonizer's internal data.

Each matrix is of size (number of pixels) x 15, for a texton size of 30. Each pixel corresponds to a cluster (an ingredient). Each image has 4 clusters (ingredients). The cluster values are 0, 1, 2, 3, and 255 (the background cluster).

Each pixel has its cluster as the label, and each cluster corresponds to an ingredient. Cluster numbers are not consistent across images: the same number may refer to different ingredients, depending on the image.

Pick 10 images from the dataset and output each image's texton vectors into a file *_textons.txt. Each row in the file is a 1x15 row vector representing the texton of that pixel.

Output each image's cluster labels (the answer given by k-means) into a file *_labels.txt.
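A rough sketch of the two output files, assuming a simple text layout that I have not settled on yet: one pixel per line, with the 15 texton values separated by spaces in *_textons.txt and a single integer per line in *_labels.txt. The function names and the `dish01` stem are placeholders.

```python
import os
import tempfile


def write_image_data(stem, textons, labels, out_dir="."):
    """Write <stem>_textons.txt and <stem>_labels.txt, one pixel per line."""
    if len(textons) != len(labels):
        raise ValueError("need exactly one label per texton row")
    textons_path = os.path.join(out_dir, stem + "_textons.txt")
    labels_path = os.path.join(out_dir, stem + "_labels.txt")
    with open(textons_path, "w") as f:
        for row in textons:
            f.write(" ".join(repr(float(v)) for v in row) + "\n")
    with open(labels_path, "w") as f:
        for label in labels:
            f.write(str(int(label)) + "\n")
    return textons_path, labels_path


def read_image_data(textons_path, labels_path):
    """Read the two files back into a list of rows and a list of labels."""
    with open(textons_path) as f:
        textons = [[float(v) for v in line.split()] for line in f if line.strip()]
    with open(labels_path) as f:
        labels = [int(line) for line in f if line.strip()]
    return textons, labels


# Three fake pixels with 15-value textons, written and read back from a temp dir.
textons = [[float(i + j) for j in range(15)] for i in range(3)]
labels = [0, 3, 255]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_image_data("dish01", textons, labels, out_dir=tmp)
    loaded_textons, loaded_labels = read_image_data(*paths)
```

Keeping the two files row-aligned means line N of the labels file always describes line N of the textons file, which makes the hand-editing step less error-prone.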

TODO:
Hand-replace the cluster numbers in each of the 10 *_labels.txt files with a number from 0-7 (assuming we have 8 ingredients in total across all 10 images).
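Instead of editing every line by hand, the replacement could be driven by a small per-image lookup table that I fill in after looking at each clustered image. A sketch, with made-up mapping values, and assuming background pixels (255) are simply dropped from the training data:

```python
BACKGROUND = 255


def relabel(textons, labels, cluster_to_ingredient):
    """Map per-image cluster numbers to global ingredient IDs, dropping background."""
    kept_textons, kept_labels = [], []
    for row, cluster in zip(textons, labels):
        if cluster == BACKGROUND:
            continue
        kept_textons.append(row)
        # A KeyError here means a cluster was left out of the hand-made mapping.
        kept_labels.append(cluster_to_ingredient[cluster])
    return kept_textons, kept_labels


# Hypothetical mapping for one image: local cluster number -> ingredient ID (0-7).
mapping = {0: 5, 1: 2, 2: 7, 3: 0}
textons = [[float(i)] * 15 for i in range(5)]
labels = [0, 1, 255, 3, 2]
new_textons, new_labels = relabel(textons, labels, mapping)
```

Each image gets its own mapping, since cluster 0 in one image need not be the same ingredient as cluster 0 in another.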

We can actually set the right number of clusters for each image, and train the ingredients on very accurate ingredient clusters.

Now we have 8 labeled ingredients. Feed them into an SVM for training and save the trained model.

For testing, do much the same thing, except there won't be labels; feed the textons to the SVM to get the predicted labels.
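A sketch of the train, save, and predict steps, assuming scikit-learn's `SVC` as the SVM (I haven't chosen an SVM library yet) and a linear kernel picked arbitrarily. The texton rows here are synthetic stand-ins generated by a made-up `fake_textons` helper, not real data.

```python
import pickle
import random

from sklearn.svm import SVC

random.seed(0)
NUM_INGREDIENTS = 8
TEXTON_DIM = 15


def fake_textons(ingredient, n):
    """Synthetic stand-in for real texton rows: noise around a per-ingredient mean."""
    return [
        [ingredient * 5.0 + random.gauss(0, 0.5) for _ in range(TEXTON_DIM)]
        for _ in range(n)
    ]


# Training set: 20 rows per ingredient, labeled 0-7.
train_x, train_y = [], []
for ingredient in range(NUM_INGREDIENTS):
    rows = fake_textons(ingredient, 20)
    train_x.extend(rows)
    train_y.extend([ingredient] * len(rows))

model = SVC(kernel="linear")  # kernel choice is an assumption
model.fit(train_x, train_y)

# "Save" and reload the model; pickle.dump / pickle.load would do the same to a file.
saved = pickle.dumps(model)
restored = pickle.loads(saved)

# Testing: unlabeled rows go in, predicted ingredient IDs come out.
test_x = fake_textons(3, 5) + fake_textons(6, 5)
predicted = restored.predict(test_x).tolist()
```

Real textons will overlap far more than these synthetic ones, so the kernel and its parameters would have to be chosen by validation on held-out images.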

[5] Texton-izer: An Irregular Textonizer. http://code.google.com/p/texton-izer/
