I am struggling to train a CNN model to identify dog breeds. I intend to train a ResNet architecture on the Stanford Dogs Dataset. I downloaded the dataset from http://vision.stanford.edu/aditya86/ImageNetDogs/ into a Google Colab notebook and extracted the images. I get a folder structure like this: (image: folder_structure). I know I need a folder structure with train and test subfolders, each containing further subfolders that hold the dog images for the corresponding breed. How do I go about doing that?
You don’t strictly need to create separate folders for train and test. You can use the function tf.keras.utils.image_dataset_from_directory from TensorFlow. It loads your all-in-one-folder dataset and applies the train/test split while loading. This is how:
    import tensorflow as tf

    train_ds = tf.keras.utils.image_dataset_from_directory(
        "/images/",            # path to your data folder
        validation_split=0.2,  # fraction of the data reserved for test
        subset="training",     # this dataset is for training
        seed=1024,             # must be the same for both train and test
    )

    test_ds = tf.keras.utils.image_dataset_from_directory(
        "/images/",
        validation_split=0.2,
        subset="validation",   # this dataset is the held-out 20%
        seed=1024,
    )
Both calls return a tf.data.Dataset object. The validation_split argument specifies the fraction of the data to reserve for validation (test, in your case). In the example above I chose 80% for training and 20% for validation.
The seed argument must be the same for both train_ds and test_ds. It ensures that the files are shuffled in the same order in both calls, so the two subsets do not overlap and no image ends up in both your train and test split.
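To see why a shared seed matters, here is a minimal pure-Python sketch of the idea: shuffle the file list deterministically, then slice off one side. The helper split_files and the file names are hypothetical and only illustrate the principle. This is not TensorFlow's actual implementation.

```python
import random


def split_files(paths, validation_split, subset, seed):
    """Shuffle deterministically, then return one side of the split."""
    shuffled = sorted(paths)                # start from a fixed order
    random.Random(seed).shuffle(shuffled)   # same seed -> same permutation
    num_val = int(validation_split * len(shuffled))
    cut = len(shuffled) - num_val
    if subset == "training":
        return shuffled[:cut]
    if subset == "validation":
        return shuffled[cut:]
    raise ValueError("subset must be 'training' or 'validation'")


# Hypothetical file names standing in for the images on disk.
paths = [f"img_{i:03d}.jpg" for i in range(100)]

train = split_files(paths, 0.2, "training", seed=1024)
val = split_files(paths, 0.2, "validation", seed=1024)
overlap_same_seed = set(train) & set(val)

# With a different seed the permutation differs, so the "validation"
# slice is cut from a different ordering and can share files with train.
val_other = split_files(paths, 0.2, "validation", seed=7)
overlap_diff_seed = set(train) & set(val_other)

print(len(train), len(val), len(overlap_same_seed))  # 80 20 0
print(len(overlap_diff_seed))  # usually non-zero: leakage between splits
```

With matching seeds the two slices come from the same permutation, so together they cover every file exactly once. With mismatched seeds that guarantee is lost.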
Answered By – ClaudiaR