Fine-grained Flower Classification by Deep Learning




The deep learning model used for recognition is GoogLeNet with Batch Normalization. We used Soumith Chintala's Torch implementation almost as-is.



Each image is represented by a 1,000-dimensional vector at a later stage of the model. The Flower Map is obtained by converting these vectors into two-dimensional vectors with a method called t-SNE.


Since Hinton's group won the ILSVRC 2012 competition with their epoch-making deep learning system, SuperVision, applications of deep learning to various visual tasks have been quite active. We at STAIR Lab. developed an AI engine that can recognize flower species using this technology.

It is said that there are hundreds of thousands of flower species in the world. Our AI engine can recognize only several hundred of them, mostly flowers found in our neighborhood.
Please see the following for the entire list of recognizable flowers.

Deep Learning Model
The task here is multi-class classification; in particular, it is an example of fine-grained recognition.
The number of classes is 406, which is determined by the availability of photos (see the Dataset section below).
The model used is GoogLeNet with batch normalization. We used the code written in Torch by Dr. Soumith Chintala; only the last layer was modified to match the number of classes, 406 (see below).
We also experimented with the model pre-trained on the ImageNet dataset, but the result was not much different from that of the model trained from scratch.

Dataset
Images of flowers were collected from ImageNet. There are many flower nodes in ImageNet, but some don't have a sufficient number of images. The list of the 406 adopted nodes (WNIDs) is here. For each of the 406 nodes, we collected at least 700 images.

Attention: some images stored in ImageNet are incorrectly labeled. So if you want to use them for your own deep learning experiments, you first need to clean the data by removing the incorrectly labeled images. You may also need to add images to compensate for the removed ones. This is what we did, and it was indeed the hardest part of this experiment.

Flower Map
Each image is transformed into a 1,000-dimensional vector by the deep convolutional neural network. These vectors can be further reduced to two-dimensional vectors by the well-known dimension-reduction method t-SNE. Flower Map plots these two-dimensional vectors: each image is placed at its t-SNE 2D coordinates. To avoid too much overlap, we assign each image a random z-coordinate, so in practice Flower Map is a 3D map. A well-trained category looks like an isolated island, while some islands merge into a large continent, which implies that those categories are difficult to distinguish from one another.
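The pipeline above (1,000-dimensional features → t-SNE → random z-coordinate) can be sketched as follows, using scikit-learn's t-SNE implementation on random stand-in vectors rather than our actual features:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for the real data: one 1,000-dimensional feature vector per image,
# taken from a late layer of the network (here just random vectors).
features = rng.normal(size=(120, 1000)).astype(np.float32)

# Reduce to 2D with t-SNE; perplexity must be smaller than the sample count.
coords_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

# Append a random z-coordinate so overlapping images spread out in 3D.
z = rng.uniform(-1.0, 1.0, size=(len(coords_2d), 1))
coords_3d = np.hstack([coords_2d, z])

print(coords_3d.shape)  # (120, 3)
```

With real features, images of the same well-learned species end up clustered together in the 2D plane, which is what produces the "islands" in the map.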

Using a test dataset with 50 images per node, the model achieves the following:
top1 accuracy = 77.5%
top5 accuracy = 96.5%
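For reference, top-k accuracy can be computed from a matrix of per-class scores as follows. This is a generic numpy sketch on toy data; `topk_accuracy` is a hypothetical helper, not the evaluation script used in the experiment.

```python
import numpy as np

def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # Indices of the k largest scores per row (order within the top k doesn't matter).
    topk = np.argpartition(scores, -k, axis=1)[:, -k:]
    return (topk == labels[:, None]).any(axis=1).mean()

# Toy example: 3 samples, 5 classes (no ties).
scores = np.array([[0.10, 0.50, 0.20, 0.10, 0.10],
                   [0.30, 0.20, 0.10, 0.25, 0.15],
                   [0.05, 0.10, 0.20, 0.05, 0.60]])
labels = np.array([1, 3, 0])

print(topk_accuracy(scores, labels, 1))  # only the first sample is right at top-1
print(topk_accuracy(scores, labels, 5))  # with 5 classes, top-5 is always 1.0
```

Top-5 accuracy counts a prediction as correct if the true species appears anywhere in the model's five best guesses, which is why it is so much higher than top-1 on a 406-way fine-grained task.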

Related Members