I’m playing around with small neural networks on my GTX 1070, and I have noticed very large host RAM usage (not GPU memory usage) when using CUDA through Keras (and PyTorch). Consider the following Keras program:
import keras
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
# A small LeNet-style CNN; MNIST images are 28x28 grayscale, channels last
input_shape = (28, 28, 1)
num_classes = 10
model = Sequential()
model.add(Conv2D(10, kernel_size=(5, 5), input_shape=input_shape, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(20, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
batch_size = 64
epochs = 10
img_rows, img_cols = 28, 28
(x_train, y_train), (x_test, y_test) = mnist.load_data()
#x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
#x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
#input_shape = (1, img_rows, img_cols)
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
I’m running it while monitoring the host virtual memory usage. First, running it without CUDA:
CUDA_VISIBLE_DEVICES="" python3 mnist.py
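(The same thing can be done from inside the script, assuming the environment variable is set before Keras/TensorFlow is imported:)
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''  # hide all GPUs; must run before the backend initializes

import keras  # now runs on the CPU only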
Virtual memory usage peaks at about 3.1 GB, well below the 16 GB of physical memory on my box. But when running it with CUDA enabled:
python3 mnist.py
virtual memory usage hits 23.9 GB.
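For reference, this is roughly how I watch the process's virtual memory size (a minimal sketch assuming psutil is installed; the script name and one-second polling interval are my own choices):
# monitor_vm.py: poll the virtual memory size of a process once per second.
# Usage: python3 monitor_vm.py <pid-of-training-process>
import sys
import time

import psutil

proc = psutil.Process(int(sys.argv[1]))
peak = 0
while True:
    try:
        vms = proc.memory_info().vms  # virtual memory size, in bytes
    except psutil.NoSuchProcess:
        break  # the training process has exited
    peak = max(peak, vms)
    print('current: %.1f GiB, peak: %.1f GiB' % (vms / 2**30, peak / 2**30))
    time.sleep(1)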
I saw similar behavior when running under PyTorch, so it is unlikely to be a Keras issue.
When I tried to run ResNet50, virtual memory usage climbed above 50 GB and my box froze. Is this normal? Is there any way to reduce the amount of RAM used? Should I just accept that this is what is needed and buy another 16 GB of RAM?
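The only knobs I know of control GPU memory rather than host RAM; a minimal sketch, assuming the TensorFlow backend (I don't know whether it also shrinks the host virtual-memory reservation):
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grab GPU memory on demand instead of all up front
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # alternatively, cap the fraction
K.set_session(tf.Session(config=config))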
Here is the output of nvidia-smi during training:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25 Driver Version: 390.25 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 On | 00000000:01:00.0 On | N/A |
| 0% 39C P2 70W / 180W | 7892MiB / 8119MiB | 50% E. Process |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3640 G /usr/libexec/Xorg 41MiB |
| 0 10928 C python3 7839MiB |
+-----------------------------------------------------------------------------+
Any help and/or explanations would be very much appreciated!
My system is running Fedora 27:
Linux groovy 4.15.6-300.fc27.x86_64 #1 SMP Mon Feb 26 18:43:03 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux