Issue
I’m working on a multi-class classification model with more than 4000 classes, i.e. 4000 folders, one per class, which take up around 30 GB. To get the data into the folder structure the classifier expects, I copy the images into separate train and validation folders for each class, which takes another 30 GB of disk space and a lot of time just to read the data.
I’m using the ImageDataGenerator API from Keras to load the data and feed it to the model for training, as below:
train_generator = train_datagen.flow_from_directory(Training_DIR, batch_size=batch_size, class_mode='categorical', target_size=(img_height, img_width))
validation_generator = validation_datagen.flow_from_directory(VALIDATION_DIR, batch_size=batch_size, class_mode='categorical', target_size=(img_height, img_width))
Then I pass these generators to the model.fit_generator function as below:
model.fit_generator(train_generator, validation_data=validation_generator)
Is there a faster way to load the data directly from the main folder containing the class subfolders, instead of creating new directories and copying images into them, which doubles the required disk space? I haven’t dealt with datasets this large before, and it is taking up all my drive space.
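For reference, ImageDataGenerator can split a single directory of class subfolders into training and validation subsets via its validation_split argument, which avoids the duplicate copy entirely. A minimal sketch reusing the variable names above (this may or may not be what @GerryP’s answer proposed):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# One generator object; 20% of the images in each class folder become the validation subset
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_generator = datagen.flow_from_directory(
    Training_DIR,
    batch_size=batch_size,
    class_mode='categorical',
    target_size=(img_height, img_width),
    subset='training')

validation_generator = datagen.flow_from_directory(
    Training_DIR,
    batch_size=batch_size,
    class_mode='categorical',
    target_size=(img_height, img_width),
    subset='validation')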
Update:
I tried the solution given by @GerryP, but I ended up getting the error below:
Epoch 1/50
2021-09-06 17:22:12.576079: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8101
2021-09-06 17:22:13.251294: W tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.19GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-09-06 17:22:16.096112: E tensorflow/stream_executor/cuda/cuda_driver.cc:1010] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-09-06 17:22:16.096299: E tensorflow/stream_executor/gpu/gpu_timer.cc:55] Internal: Error destroying CUDA event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-09-06 17:22:16.097126: E tensorflow/stream_executor/gpu/gpu_timer.cc:60] Internal: Error destroying CUDA event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-09-06 17:22:16.097682: I tensorflow/stream_executor/cuda/cuda_driver.cc:732] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-09-06 17:22:16.097935: E tensorflow/stream_executor/stream.cc:4508] Internal: Failed to enqueue async memset operation: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-09-06 17:22:16.098312: W tensorflow/core/kernels/gpu_utils.cc:69] Failed to check cudnn convolutions for out-of-bounds reads and writes with an error message: 'Failed to load in-memory CUBIN: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated'; skipping this check. This only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2021-09-06 17:22:16.098676: I tensorflow/stream_executor/cuda/cuda_driver.cc:732] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-09-06 17:22:16.099006: E tensorflow/stream_executor/stream.cc:4508] Internal: Failed to enqueue async memset operation: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-09-06 17:22:16.099369: F tensorflow/stream_executor/cuda/cuda_dnn.cc:216] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0) Failed to set cuDNN stream.
Solution
Even though the answer from @Gerry P is, in my opinion, correct and answers what the OP asked for, here is another answer, motivated by the discussion in the comments, which tries to prevent the unnecessary bottleneck caused by I/O operations during training when using .flow_from_directory() or .flow_from_dataframe().
Disclaimer: this solution works only if all the images are of the same shape.
I suggest using the .flow() method of ImageDataGenerator in combination with numpy.memmap. You can create one memmap for each subset of the data, i.e. the train, validation, and test sets. I have created a Google Colab notebook in which I compare the methods using the MNIST dataset. Here is the most important code from that notebook:
import os
import numpy as np
from skimage.io import imread  # assumption: any image reader that returns a NumPy array works here
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Initiating a memmap on disk with the specified shape and dtype
# Note: .flow() expects rank-4 data, so the shape should be (num_images, height, width, channels)
mmap = np.memmap('mnist.mmap', dtype='uint8', mode='w+', shape=x_test.shape)

# Filling the memmap with data, one image file at a time
# If hard-disk space is a problem, we can delete the source image files on the go
for i, fpath in enumerate(fpaths):
    mmap[i][:] = np.expand_dims(imread(fpath)[:], -1)  # add the channel dimension
    # deleting the file if desired
    # os.remove(fpath)

# Loading the memmap from disk (does not load all the data into RAM); the shape must be specified
mmap = np.memmap('mnist.mmap', dtype='uint8', mode='r', shape=x_test.shape)

DataGen = ImageDataGenerator(rescale=1./255)
gen = DataGen.flow(mmap, y=None, batch_size=batch_size, shuffle=True, seed=10)
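For training, labels can be passed alongside the memmap in the same .flow() call. A minimal sketch, assuming a hypothetical one-hot label array y_train aligned with the rows of the memmap (not part of the notebook above):

# Assumption: y_train is a (num_images, num_classes) one-hot NumPy array aligned with mmap rows
train_gen = DataGen.flow(mmap, y=y_train, batch_size=batch_size, shuffle=True, seed=10)

# A validation generator can be built the same way from a separate validation memmap;
# the resulting generators are passed to model.fit just like those from flow_from_directory()
model.fit(train_gen, epochs=epochs)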
Below are the results of measuring how much time it took to generate 2 epochs from the MNIST test set (10k grayscale images of 28×28 pixels).
Method | Source | Batch size | Time* |
---|---|---|---|
.flow() | np.array | 1 | 584 ms |
 | | 8 | 293 ms |
 | | 32 | 285 ms |
 | | 256 | 280 ms |
.flow() | memmap | 1 | 574 ms |
 | | 8 | 296 ms |
 | | 32 | 278 ms |
 | | 256 | 274 ms |
flow_from_directory() | files | 1 | 3.96 s |
 | | 8 | 3.50 s |
 | | 32 | 3.39 s |
 | | 256 | 3.41 s |
flow_from_dataframe() | files | 1 | 3.97 s |
 | | 8 | 3.50 s |
 | | 32 | 3.41 s |
 | | 256 | 3.39 s |

\* Time to generate 2 epochs (i.e. 2× the whole test set).
Readers of this question may also find this blog interesting: it suggests using tf.data instead of the aforementioned ImageDataGenerator. I have not tested it myself, though.
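For completeness, here is a minimal, untested sketch of reading the images lazily with tf.data straight from the class subfolders, using tf.keras.utils.image_dataset_from_directory (available in recent TensorFlow versions); the directory and parameter names mirror those in the question:

import tensorflow as tf

# Build training/validation datasets directly from the class subfolders (no copying)
train_ds = tf.keras.utils.image_dataset_from_directory(
    Training_DIR,
    validation_split=0.2,
    subset='training',
    seed=10,
    label_mode='categorical',
    image_size=(img_height, img_width),
    batch_size=batch_size)

val_ds = tf.keras.utils.image_dataset_from_directory(
    Training_DIR,
    validation_split=0.2,
    subset='validation',
    seed=10,
    label_mode='categorical',
    image_size=(img_height, img_width),
    batch_size=batch_size)

# Prefetch to overlap file reads with training on the GPU
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.prefetch(tf.data.AUTOTUNE)

Prefetching addresses the same I/O bottleneck discussed above without duplicating any data on disk.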
Answered By – Paloha
This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.