CIFAR-10 and CIFAR-100 Dataset in PyTorch
In the previous topic, we learned how to use the MNIST dataset to recognize images of handwritten digits. MNIST is an introductory dataset for deep learning because of its simplicity. It is the "hello world" of deep learning.
The CIFAR-10 dataset (CIFAR stands for Canadian Institute for Advanced Research) is harder to classify and brings new obstacles that we will need to overcome. It is a collection of images commonly used to train machine learning and computer vision algorithms. CIFAR-10 contains 50000 training images and 10000 test images, which we use for validation, and each image belongs to exactly one of 10 classes.
The CIFAR-10 dataset consists of 60000 color images of 32 by 32 pixels in 10 classes, which means 6000 images per class. The dataset is divided into five training batches and one test batch, each containing 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, so an individual training batch may contain more images from one class than another. Taken together, the training batches contain exactly 5000 images from each class.
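The counts in this paragraph can be checked with a few lines of arithmetic. This is only a sanity check on the numbers stated above; the constant names are illustrative.

```python
# Numbers taken directly from the description of CIFAR-10 above.
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
IMAGES_PER_BATCH = 10000
TRAIN_BATCHES = 5
TEST_BATCHES = 1
TEST_IMAGES_PER_CLASS = 1000

total_images = NUM_CLASSES * IMAGES_PER_CLASS      # whole dataset
train_images = TRAIN_BATCHES * IMAGES_PER_BATCH    # five training batches
test_images = TEST_BATCHES * IMAGES_PER_BATCH      # one test batch

# Whatever is not in the test batch is left for training, per class.
train_images_per_class = IMAGES_PER_CLASS - TEST_IMAGES_PER_CLASS

# The two splits must account for every image.
assert train_images + test_images == total_images
assert train_images_per_class * NUM_CLASSES == train_images
```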
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks: the automobile class includes sedans, SUVs, and similar vehicles, while the truck class includes only big trucks. Neither class includes pickup trucks. As opposed to the MNIST dataset, the objects within these classes are much more complex and extremely varied. If we look through the CIFAR dataset, we realize that there is not just one type of bird or cat. The bird and cat classes contain many different types of birds and cats, varying in size, color, magnification, angle, and pose.
With the MNIST dataset, although there are many ways to write the digits one and two, the images are simply not as varied, and on top of that, MNIST images are grayscale. The CIFAR dataset contains larger 32 by 32 color images, each with three color channels. Now our biggest question is: will the LeNet model, which performed so well on the MNIST dataset, be enough to classify the CIFAR dataset?
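To make the difference in input shape concrete, the sketch below pushes random tensors, not real images, through a single convolution layer. It assumes MNIST's standard 28 by 28 single-channel images. The layer sizes (20 output channels, 5 by 5 kernel) are illustrative assumptions and not necessarily those of the LeNet model used earlier.

```python
import torch
import torch.nn as nn

# Dummy batches of 4 images with the shapes discussed above (random values).
mnist_batch = torch.rand(4, 1, 28, 28)   # grayscale: 1 channel, 28x28
cifar_batch = torch.rand(4, 3, 32, 32)   # color: 3 channels, 32x32

# Hypothetical first layers; only in_channels has to differ.
conv_gray = nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5)
conv_rgb = nn.Conv2d(in_channels=3, out_channels=20, kernel_size=5)

# A 5x5 kernel without padding shrinks each side by 4 pixels.
mnist_features = conv_gray(mnist_batch)  # shape (4, 20, 24, 24)
cifar_features = conv_rgb(cifar_batch)   # shape (4, 20, 28, 28)

# Number of input values the network sees per image.
values_per_mnist_image = mnist_batch[0].numel()  # 1 * 28 * 28
values_per_cifar_image = cifar_batch[0].numel()  # 3 * 32 * 32
```

A first layer built for one input channel would reject the three-channel CIFAR batch, so at minimum the first convolution and the size of the first fully connected layer have to change before the model can be reused.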
CIFAR-100 Dataset
The CIFAR-100 dataset is just like the CIFAR-10 dataset. The only difference is that it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. These 100 classes are grouped into 20 superclasses, and each image comes with a “coarse” label (the superclass to which it belongs) and a “fine” label (the class to which it belongs).
The CIFAR-100 dataset contains the following classes:
S. No | Superclass | Classes |
---|---|---|
1. | aquatic mammals | beaver, dolphin, otter, seal, whale |
2. | flowers | orchids, poppies, roses, sunflowers, tulips |
3. | fish | aquarium fish, flatfish, ray, shark, trout |
4. | food containers | bottles, bowls, cans, cups, plates |
5. | household electrical devices | clock, computer keyboard, lamp, telephone, television |
6. | fruit and vegetables | apples, mushrooms, oranges, pears, sweet peppers |
7. | household furniture | bed, chair, couch, table, wardrobe |
8. | large carnivores | bear, leopard, lion, tiger, wolf |
9. | insects | bee, beetle, butterfly, caterpillar, cockroach |
10. | large man-made outdoor things | bridge, castle, house, road, skyscraper |
11. | large natural outdoor scenes | cloud, forest, mountain, plain, sea |
12. | medium-sized mammals | fox, porcupine, possum, raccoon, skunk |
13. | large omnivores and herbivores | camel, cattle, chimpanzee, elephant, kangaroo |
14. | non-insect invertebrates | crab, lobster, snail, spider, worm |
15. | reptiles | crocodile, dinosaur, lizard, snake, turtle |
16. | people | baby, boy, girl, man, woman |
17. | trees | maple, oak, palm, pine, willow |
18. | small mammals | hamster, mouse, rabbit, shrew, squirrel |
19. | vehicles 1 | bicycle, bus, motorcycle, pickup truck, train |
20. | vehicles 2 | lawn-mower, rocket, streetcar, tank, tractor |
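
The relationship between coarse and fine labels can be sketched as a simple lookup. The dictionary below copies only a few rows from the table, and the names `SUPERCLASSES`, `COARSE_OF`, and `labels_for` are hypothetical helpers introduced for illustration. The class names follow the table, and the label strings stored in the distributed dataset files may be spelled differently.

```python
# A few rows copied from the table above (superclass -> classes).
SUPERCLASSES = {
    "aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "insects": ["bee", "beetle", "butterfly", "caterpillar", "cockroach"],
    "trees": ["maple", "oak", "palm", "pine", "willow"],
    "vehicles 1": ["bicycle", "bus", "motorcycle", "pickup truck", "train"],
}

# Invert the mapping so each fine label points to its coarse label.
COARSE_OF = {
    fine: coarse
    for coarse, fines in SUPERCLASSES.items()
    for fine in fines
}


def labels_for(fine_label):
    """Return the (coarse, fine) label pair for a class name from the table."""
    return COARSE_OF[fine_label], fine_label


# Dataset-wide counts stated in the text.
NUM_SUPERCLASSES = 20
CLASSES_PER_SUPERCLASS = 5
IMAGES_PER_CLASS = 600
num_classes = NUM_SUPERCLASSES * CLASSES_PER_SUPERCLASS
total_images = num_classes * IMAGES_PER_CLASS
```

For example, `labels_for("dolphin")` returns the pair `("aquatic mammals", "dolphin")`, which mirrors the coarse and fine labels attached to each image.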