Issue
I am planning to create a CNN to predict mushroom types and have collected over 2500 photos from the internet. The dataset has 156 classes (different types of mushrooms). I trained it using ImageDataGenerator on TensorFlow 2 and Keras.
Here is the Image Generator:
image_gen = ImageDataGenerator(rotation_range=20,
                               width_shift_range=0.12,
                               height_shift_range=0.12,
                               shear_range=0.1,
                               zoom_range=0.06,
                               horizontal_flip=True,
                               fill_mode='nearest',
                               rescale=1./255)
and here is the model:
model = Sequential()
model.add(Conv2D(filters=32,kernel_size=(3,3),input_shape=image_shape,activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Conv2D(filters=64,kernel_size=(3,3),input_shape=image_shape,activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Conv2D(filters=64,kernel_size=(3,3),input_shape=image_shape,activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(2,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(156,activation='softmax'))
model.compile(loss = 'categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
using early stopping as follows:
early_stop = EarlyStopping(monitor='val_loss',patience=2)
The model starts at acc=0.006 and stops after 20 epochs at around acc=0.2.
When I predict an image from the test set, I get these absurd results (I see a '1.' in the array, but the test image should correspond to the last element of the array):
[Image: random image prediction after compiling the model]
Without early stopping (I think it is overfitting?), 2000 epochs resulted in 0.8 accuracy, but the predictions are still wrong.
First Question
What is the reason behind the low accuracy? Is it because I have too few samples?
I have read that class_num/100 (so 156/100 in my case) might be a good accuracy, but when I predict a photo from the test files, it never finds the corresponding mushroom type.
I tried
Using a larger dataset with 7000+ photos and only 9 classes; accuracy came out at 0.23.
But in the test case,
model.predict(my_image_arr).round(3)
returned the following, for whatever photo I feed in:
array([[0.063, 0.11 , 0.153, 0.123, 0.064, 0.059, 0.208, 0.162, 0.059]],
dtype=float32)
I would be very grateful if anyone can help me with what I am doing wrong.
Solution
For a classification problem with a large number of classes, I do not think your model is sufficiently complex. At a minimum, change the Dense layer that has 2 neurons to something like 256 neurons; a minimal sketch of that change, keeping the rest of your architecture as-is, is below.
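model.add(Flatten())
model.add(Dense(256, activation='relu')) # was Dense(2): far too narrow a bottleneck for 156 classes
model.add(Dropout(0.5))
model.add(Dense(156, activation='softmax'))
Frankly, I recommend that you consider using transfer learning instead. Below is the code to use the MobileNetV2 model for that purpose.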
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras import regularizers
from tensorflow.keras.optimizers import Adamax

height=224
width=224
img_shape=(height, width, 3)
dropout=.3
lr=.001
class_count=156 # number of classes
base_model=tf.keras.applications.MobileNetV2(include_top=False, input_shape=img_shape,
                                             pooling='max', weights='imagenet')
x=base_model.output
x=keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(x)
x=Dense(512, kernel_regularizer=regularizers.l2(0.016), activity_regularizer=regularizers.l1(0.006),
        bias_regularizer=regularizers.l1(0.006), activation='relu',
        kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123))(x)
x=Dropout(rate=dropout, seed=123)(x)
output=Dense(class_count, activation='softmax',
             kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123))(x)
model=Model(inputs=base_model.input, outputs=output)
model.compile(Adamax(learning_rate=lr), loss='categorical_crossentropy', metrics=['accuracy'])
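Note that MobileNetV2 as configured above expects 224 x 224 x 3 inputs, so your generators must deliver images at that size. A minimal sketch of the wiring, reusing your augmentation settings (the directory paths here are placeholders for your own):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen=ImageDataGenerator(rescale=1./255, rotation_range=20, width_shift_range=0.12,
                             height_shift_range=0.12, shear_range=0.1, zoom_range=0.06,
                             horizontal_flip=True, fill_mode='nearest')
valid_gen=ImageDataGenerator(rescale=1./255) # no augmentation on validation data
# hypothetical paths - replace with your own train and valid directories
train_flow=train_gen.flow_from_directory(r'c:\mushrooms\train', target_size=(height, width),
                                         batch_size=32, class_mode='categorical')
valid_flow=valid_gen.flow_from_directory(r'c:\mushrooms\valid', target_size=(height, width),
                                         batch_size=32, class_mode='categorical')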
No matter which model you use, results will improve if you use an adjustable learning rate. So, in addition to the early stopping callback, add the ReduceLROnPlateau callback. My suggested code for that is below.
rlronp=tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=1,
                                            verbose=1, mode='auto', min_delta=0.0001,
                                            cooldown=0, min_lr=0)
callbacks=[rlronp, early_stop] # early_stop is your early stopping callback
In your early stopping callback, make sure the patience value is at least 4.
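Putting this together, a sketch of the training call (reusing the hypothetical generators from above; epochs=40 is just an arbitrary upper bound, since early stopping will usually end training sooner):
early_stop=tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=4,
                                            restore_best_weights=True)
callbacks=[rlronp, early_stop]
history=model.fit(train_flow, validation_data=valid_flow, epochs=40, callbacks=callbacks)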
I do not know the structure of your data set, but if it is not balanced, i.e. the ratio between the most samples in a class and the least samples in a class is greater than 2, then you might want to consider using the class_weight parameter in model.fit. This is a dictionary of weights based on the number of samples in each class. Below is the code for a function that iterates through your training images directory and calculates the class weight dictionary. Note: it is essential that your dir parameter is the training subdirectory and contains only class subdirectories holding the images.
import os

def get_weight_dict(dir):
    most_samples=0
    class_weight={}
    class_list=sorted(os.listdir(dir)) # sorted so indices match the generator's alphabetical class indices
    for c in class_list: # iterate through class directories and find the class with the most samples
        c_path=os.path.join(dir, c)
        if os.path.isdir(c_path):
            length=len(os.listdir(c_path)) # number of samples in the class directory
            if length>most_samples:
                most_samples=length
    for i, c in enumerate(class_list): # weight each class by most_samples divided by its sample count
        c_path=os.path.join(dir, c)
        if os.path.isdir(c_path):
            length=len(os.listdir(c_path))
            class_weight[i]=most_samples/length
    return class_weight
For example, if your main directory is c:\mushrooms and your training directory is called train, then use
train_dir=r'c:\mushrooms\train'
class_weight=get_weight_dict(train_dir)
and in model.fit include class_weight=class_weight.
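Continuing the hypothetical training call from above, that looks like:
history=model.fit(train_flow, validation_data=valid_flow, epochs=40,
                  callbacks=callbacks, class_weight=class_weight) # weights the loss by class frequency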
Additionally, I find you should have a minimum of 50 images per class. If you have fewer than that, there may not be a sufficient number of samples even with image augmentation as you have done. So the choice here is to either get more images for under-represented classes or remove those classes. Finally, and this is not a pleasant task, but depending on your images you may want to consider cropping them so that the region of interest (ROI), namely the mushroom, occupies the majority of the pixels. You might want to go through your images, select those not meeting this criterion, and crop them as needed. I have produced several high-quality data sets that typically achieve a 98% or better test accuracy by using this approach.
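If you end up cropping many images, a small helper can speed up the work. This is only a sketch: it does a fixed center crop (the keep fraction is a made-up parameter), whereas a real crop should be guided by where the mushroom actually sits in each image.
from PIL import Image

def center_crop(img_path, out_path, keep=0.8):
    # crop the central keep-fraction of the image; a stand-in for a manual ROI crop
    img=Image.open(img_path)
    w, h=img.size
    new_w, new_h=int(w*keep), int(h*keep)
    left=(w-new_w)//2
    top=(h-new_h)//2
    img.crop((left, top, left+new_w, top+new_h)).save(out_path)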
Answered By – Gerry P
This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.