The goal of this project was to recognize unseen image classes using the attributes of seen classes.

This project was built for the ZJU AI Challenge Zero-Shot Learning Competition 2018.

The contest attracted a total of 3224 teams from all over the world.

My team “Parsian” ranked 14th at the end of the competition.

The open-source code base is available here:


  •  GAN and Data Generation:

Because no data is available for the unseen labels, the state-of-the-art approach for this project was to train a Generative Adversarial Network (GAN) to generate a dataset for the unseen classes.

According to the GAN architecture (figure below), the generator network produces a random image with a fake label `ZJULFAKE`.

Both the discriminator and the generator networks are trained on the discriminator's response to the augmented dataset, which includes both real and fake data.

Training of both networks continues until the discriminator can no longer distinguish fake data from real data, which indicates that the generator is producing realistic samples. At this point, the trained generator network was used in the main architecture of the learning system.
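The stopping condition described above can be illustrated with the standard GAN losses. This is only a minimal sketch, not the project's training code: the function names and the sample discriminator outputs are hypothetical, and binary cross-entropy is assumed as the loss. When the discriminator outputs 0.5 for every sample, its loss settles at 2·ln 2 (about 1.386).

```python
import math

def bce(prediction, target):
    """Binary cross-entropy for one discriminator output in (0, 1)."""
    eps = 1e-12  # guard against log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Mean loss for labelling real samples 1 and generated samples 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are scored as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs early in training (confident and correct).
early = discriminator_loss([0.9, 0.8], [0.1, 0.2])

# Hypothetical outputs once real and fake can no longer be told apart.
balanced = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

In a real training loop these two losses would be minimized alternately by gradient descent on the discriminator and generator parameters.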

The problem is now reduced to classic classification, since the network has enough data for all labels. The main CNN is trained on the augmented dataset, which combines the Train Data and the Generated Data (produced by the GAN).

The trained CNN is then applied to the test data, and the result is a Label Probability Vector for each image.
(This vector gives the membership probability of the image for each label.)
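A label probability vector of this kind is commonly obtained by applying a softmax to the CNN's raw output scores. The sketch below assumes that setup; the label names and scores are made up for illustration and are not taken from the competition data.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical labels and raw CNN scores for a single image.
label_names = ["horse", "zebra", "snake"]
logits = [2.0, 1.0, 0.1]

probabilities = softmax(logits)
label_probability = dict(zip(label_names, probabilities))
```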

In the next step, a weighted average of attributes is computed for each image, where the membership probability of each label serves as the weight of that label's attributes. The resulting attribute vector is the weighted average of each attribute across all labels.
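The weighted average can be sketched as follows. The attribute table, the attribute meanings, and the probabilities are hypothetical; only the weighting scheme follows the description above.

```python
def weighted_attributes(probabilities, class_attributes):
    """Average the per-label attribute vectors, weighted by label probability."""
    total = sum(probabilities)  # normalizes in case the weights do not sum to 1
    n_attributes = len(class_attributes[0])
    return [
        sum(p * attrs[j] for p, attrs in zip(probabilities, class_attributes)) / total
        for j in range(n_attributes)
    ]

# Hypothetical attribute table: one row per label, columns are (striped, furry).
class_attributes = [
    [0.0, 1.0],
    [1.0, 1.0],
    [1.0, 0.0],
]

# Hypothetical label probability vector for one image.
probabilities = [0.5, 0.25, 0.25]

image_attributes = weighted_attributes(probabilities, class_attributes)
# image_attributes == [0.5, 0.75]
```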

Each image now has its own attribute vector. The matching label can therefore be found by measuring the distance from that vector to each class on a manifold. Finally, the class nearest to the image's vector is selected as the class of that image.
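The final matching step can be sketched as a nearest-neighbour lookup. Plain Euclidean distance is used here as a stand-in, since the exact distance measured on the manifold is not specified; the class names and attribute values are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_class(image_attributes, class_attributes):
    """Return the label whose attribute vector is closest to the image's."""
    return min(
        class_attributes,
        key=lambda label: euclidean(image_attributes, class_attributes[label]),
    )

# Hypothetical candidate classes with (striped, furry) attributes.
candidates = {
    "horse": [0.0, 1.0],
    "zebra": [1.0, 1.0],
    "snake": [1.0, 0.0],
}

predicted = nearest_class([0.9, 0.8], candidates)
# predicted == "zebra"
```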

  • CNN – RESNET152

To mitigate the vanishing gradient problem, a RESNET152 was used. Here is the RESNET152 structure.
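As a rough illustration of why residual (skip) connections help with vanishing gradients, the toy below compares the gradient through a stack of plain scalar layers with the gradient through a stack of residual layers. It is a simplified scalar model with an assumed per-layer derivative, not a description of the actual network.

```python
def plain_gradient(local_derivatives):
    """Gradient through stacked layers y = f(x): the product of f'(x)."""
    grad = 1.0
    for d in local_derivatives:
        grad *= d
    return grad

def residual_gradient(local_derivatives):
    """Gradient through stacked layers y = x + f(x): the product of 1 + f'(x)."""
    grad = 1.0
    for d in local_derivatives:
        grad *= 1.0 + d
    return grad

depth = 50
small = [0.01] * depth  # assumed small derivative at every layer

plain = plain_gradient(small)        # collapses toward zero
residual = residual_gradient(small)  # stays on the order of 1
```

The identity path contributes the constant 1 in each factor, so the product does not shrink even when every layer's own derivative is small.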

  • Manifold

The use of the manifold and clustering to find the best-matching label after computing the attribute values is depicted in the figure below.

In this project, an image processing technique was used to detect a red marker, track it, and draw its route. The resulting route was used to control an omnidirectional robot.
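A minimal sketch of the detect-and-track idea, assuming simple RGB thresholding and a centroid per frame. The thresholds, function names, and tiny test frames are all hypothetical; the project's actual detection method is not described here.

```python
def is_red(pixel, min_red=150, max_other=90):
    """Assumed rule: strong red channel, weak green and blue channels."""
    r, g, b = pixel
    return r >= min_red and g <= max_other and b <= max_other

def marker_centroid(frame):
    """Return the (row, col) centroid of red pixels, or None if none are found."""
    row_sum = col_sum = count = 0
    for i, row in enumerate(frame):
        for j, pixel in enumerate(row):
            if is_red(pixel):
                row_sum += i
                col_sum += j
                count += 1
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)

def track_route(frames):
    """Collect the marker position from every frame in which it is visible."""
    route = []
    for frame in frames:
        position = marker_centroid(frame)
        if position is not None:
            route.append(position)
    return route

BLACK, RED = (0, 0, 0), (255, 0, 0)

# Three tiny 2x3 frames; the marker is missing in the second one.
frames = [
    [[RED, BLACK, BLACK], [BLACK, BLACK, BLACK]],
    [[BLACK, BLACK, BLACK], [BLACK, BLACK, BLACK]],
    [[BLACK, BLACK, BLACK], [BLACK, BLACK, RED]],
]

route = track_route(frames)
# route == [(0.0, 0.0), (1.0, 2.0)]
```

The resulting list of positions is the kind of route that could then be handed to a robot controller as a sequence of waypoints.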

Here is the link to the open-source code on GitHub:

Here is a short video of this project.