Execute a DIGITS-trained TensorFlow model on TX2 using Python
Hi!

I'm using DIGITS to train my TensorFlow models, currently a LeNet network with gray [28x28] input, using my own classified images.
I prepared a dataset with two labels, 0 and 1, which stand for:
- 0 => not a ball (~ 6000 images)
- 1 => a ball (~ 1000 images)
When I train it using DIGITS, I get a model with an accuracy of ~94% and a loss of 0.27.
When I classify one image using DIGITS, it classifies it well, as you can see below:
[Image: http://vps166675.ovh.net/in-digits.png]

Very well, so now I want to use this model in one of my Python scripts. So I define the model, derived from the network.py provided with DIGITS:

import tensorflow as tf

slim = tf.contrib.slim


class LeNetModel():

    # A placeholder version, allowing an image to be loaded from a numpy array (OpenCV in my case)
    def placeholder_gray28(self, nclasses):
        x = tf.placeholder(tf.float32, shape=[28, 28, 1], name="x")
        return x, self.gray28(x, nclasses)

    def gray28(self, x, nclasses, is_training=False):
        rs = tf.reshape(x, shape=[-1, 28, 28, 1])
        # scale (divide by MNIST std)
        rs = rs * 0.0125
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                            weights_initializer=tf.contrib.layers.xavier_initializer(),
                            weights_regularizer=slim.l2_regularizer(0.0005)):
            model = slim.conv2d(rs, 20, [5, 5], padding='VALID', scope='conv1')
            model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='pool1')
            model = slim.conv2d(model, 50, [5, 5], padding='VALID', scope='conv2')
            model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='pool2')
            model = slim.flatten(model)
            model = slim.fully_connected(model, 500, scope='fc1')
            model = slim.dropout(model, 0.5, is_training=is_training, scope='do1')
            model = slim.fully_connected(model, nclasses, activation_fn=None, scope='fc2')

        # I only append this softmax; it changes the output tensor values but not the classification
        model = tf.nn.softmax(model)

        return model

Except for the final softmax, this is the same network as the one trained by DIGITS.
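
As a side note, here is a minimal sketch (not part of my original script) of how the placeholder variant could be fed from an OpenCV image, assuming the checkpoint is restored the same way as in the script below:

import cv2
import numpy as np
import tensorflow as tf

# Load and preprocess with OpenCV to match the 28x28 grayscale input
img = cv2.imread("img-535.jpg", cv2.IMREAD_GRAYSCALE)           # HxW uint8
img = cv2.resize(img, (28, 28), interpolation=cv2.INTER_CUBIC)  # squash-style resize
img = img.astype(np.float32)[:, :, np.newaxis] / 255.0          # [28, 28, 1] in [0, 1]

x, inference_op = LeNetModel().placeholder_gray28(nclasses=2)
with tf.Session() as sess:
    # ... restore the checkpoint here, as in the script below ...
    probs = sess.run(inference_op, feed_dict={x: img})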

I use this model by shaping and feeding a tensor obtained from the same JPG image that I use in DIGITS:

import tensorflow as tf

slim = tf.contrib.slim

# LeNetModel is the class defined above.


def name_in_checkpoint(var):
    # The checkpoint stores variables under a 'model/' prefix, so map
    # the local graph names onto the checkpoint names
    return 'model/' + var.op.name


TF_INTRA_OP_THREADS = 0
TF_INTER_OP_THREADS = 0
MIN_LOGS_PER_TRAIN_EPOCH = 8  # torch default: 8

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            """Whether to log device placement.""")
tf.app.flags.DEFINE_boolean('serving_export', False,
                            """Flag for exporting a TensorFlow Serving model""")

if __name__ == '__main__':

    # Read the test image through a one-file input queue
    filename_queue = tf.train.string_input_producer(
        tf.train.match_filenames_once("img-535.jpg"))

    image_reader = tf.WholeFileReader()
    key, image_file = image_reader.read(filename_queue)

    # Decode, squash to 28x28, convert to grayscale and scale to [0, 1]
    ball = tf.image.decode_jpeg(image_file)
    ball = tf.to_float(ball)
    ball = tf.image.resize_bicubic([ball], (28, 28))
    ball = tf.image.rgb_to_grayscale([ball])
    ball = tf.divide(ball, 255)

    single_batch = [key, ball]

    inference_op = LeNetModel().gray28(ball, 2, False)

    sess = tf.Session(config=tf.ConfigProto(
        allow_soft_placement=True,
        inter_op_parallelism_threads=TF_INTER_OP_THREADS,
        intra_op_parallelism_threads=TF_INTRA_OP_THREADS,
        log_device_placement=FLAGS.log_device_placement))

    variables_to_restore = slim.get_variables_to_restore(exclude=["is_training"])
    variables_to_restore = {name_in_checkpoint(var): var for var in variables_to_restore}
    saver = tf.train.Saver(variables_to_restore)

    # Initialize variables
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())
    sess.run(init_op)

    saver.restore(sess, "snapshot_30.ckpt")

    tf.train.start_queue_runners(sess)
    print(sess.run(inference_op * 100))

    sess.close()
    exit(0)


But I am not able to reproduce the same results. Executing this script, I get this result:
[[ 82.83679962  17.16320229]]


Neither the score nor the classification is right, and I can't understand what I am doing wrong. I took a look at the DIGITS source code and can't find any significant differences from my code. Has anybody encountered this problem?

You can download the full use case here: http://vps166675.ovh.net/digits-issue.tar.gz

Thank you in advance.
Damien.

#1
Posted 12/04/2017 08:06 PM   
Hi,

My guess is that there is something different in the image preprocessing.
Could you check whether your workflow is identical to the DIGITS inference here:
https://github.com/NVIDIA/DIGITS/blob/master/digits/model/tasks/tensorflow_train.py#L513

Thanks.

#2
Posted 12/07/2017 03:18 AM   
Thank you.

The processing steps that I can identify in the DIGITS source code are:

https://github.com/NVIDIA/DIGITS/blob/master/digits/model/tasks/tensorflow_train.py#L539
_float_array_feature(image.flatten())

This is a no-op from the image's point of view.

https://github.com/NVIDIA/DIGITS/blob/master/digits/tools/tensorflow/main.py#L510
with tf.name_scope(digits.STAGE_INF) as stage_scope:
    inf_model = Model(digits.STAGE_INF, FLAGS.croplen, nclasses)
    ...

The data loader is instantiated. It is a TFRecordsLoader:
https://github.com/NVIDIA/DIGITS/blob/master/digits/tools/tensorflow/tf_data.py#L310

The interesting parameters are:
https://github.com/NVIDIA/DIGITS/blob/master/digits/tools/tensorflow/tf_data.py#L634
self.float_data = False  # For now only strings
self.unencoded_data_format = 'hwc'
self.unencoded_channel_scheme = 'rgb'
self.image_dtype = tf.uint8


So it does:
https://github.com/NVIDIA/DIGITS/blob/master/digits/tools/tensorflow/tf_data.py#L278
Then:
https://github.com/NVIDIA/DIGITS/blob/master/digits/tools/tensorflow/tf_data.py#L693

It returns a FixedLenFeature:
https://github.com/NVIDIA/DIGITS/blob/master/digits/tools/tensorflow/tf_data.py#L704
tf.FixedLenFeature([self.height, self.width, self.channels], tf.float32)
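
For context, here is a minimal sketch of how such a feature spec is typically consumed when parsing a TFRecord (the file name and feature keys are assumptions, not the exact DIGITS names):

import tensorflow as tf

filename_queue = tf.train.string_input_producer(["train.tfrecords"])  # hypothetical file
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)

# Feature keys are hypothetical; DIGITS may use different names
feature_spec = {
    'data':  tf.FixedLenFeature([28, 28, 1], tf.float32),
    'label': tf.FixedLenFeature([], tf.int64),
}
features = tf.parse_single_example(serialized_example, features=feature_spec)
image = features['data']  # float32 tensor shaped [28, 28, 1]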


Then:
https://github.com/NVIDIA/DIGITS/blob/master/digits/tools/tensorflow/tf_data.py#L310
which does:
data = tf.image.decode_jpeg(data, name='image_decoder')
...
data = tf.to_float(data)


It adds:
https://github.com/NVIDIA/DIGITS/blob/master/digits/tools/tensorflow/tf_data.py#L334
single_data = tf.image.resize_image_with_crop_or_pad(single_data, self.croplen, self.croplen)


And to finish, it creates a batch and launches it:
single_batch = [single_key, single_data]
...
batch = tf.train.batch(
    single_batch,
    batch_size=self.batch_size,
    dynamic_pad=True,  # Allows us to not supply fixed shape a priori
    enqueue_many=False,  # Each tensor is a single example
    # set number of threads to 1 for tfrecords (used for inference)
    num_threads=NUM_THREADS_DATA_LOADER if not self.is_inference else 1,
    capacity=max_queue_capacity,  # Max amount that will be loaded and queued
    allow_smaller_final_batch=True,  # Happens if total%batch_size!=0
    name='batcher')


So there are some differences:
- The image doesn't seem to be converted to grayscale by the DIGITS inference tool
- The resize is done by resize_image_with_crop_or_pad, whereas I use resize_bicubic

So my questions are:
- Where is the image converted to grayscale? Does DIGITS take only the first channel during a reshape?
- The resize using resize_image_with_crop_or_pad only crops/pads (according to https://www.tensorflow.org/api_docs/python/tf/image/resize_image_with_crop_or_pad), but I trained my model using the Squash resize transformation, so I guess it is more a bicubic resize than a crop/pad; see the sketch after this list.
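
To make the difference concrete, here is a minimal sketch contrasting the two resize modes (the 49x49 source size is an assumption based on my images):

import tensorflow as tf

img = tf.zeros([49, 49, 3])  # stand-in for the decoded JPEG

# "Squash": bicubic resample of the whole frame down to 28x28
squashed = tf.image.resize_bicubic([img], (28, 28))[0]

# Crop/pad: keep the central 28x28 window and discard the border
cropped = tf.image.resize_image_with_crop_or_pad(img, 28, 28)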

I will adapt my sample to use reshape and crop_or_pad and give it a try, but it seems counterintuitive to me. So I guess I am missing a step done by DIGITS...

#3
Posted 12/07/2017 09:10 AM   
I tried with this image pre-processing:

ball = tf.image.decode_jpeg(image_file)
ball = tf.to_float(ball)
# ball = tf.image.resize_bicubic([ball],(28,28))
# ball = tf.image.rgb_to_grayscale([ball])
# ball = tf.reshape(ball,(49,49,1))
ball = tf.image.resize_image_with_crop_or_pad(ball, 28, 28)
ball = tf.divide(ball, 255)


The result is:
[[ 82.9730835   17.02691078]
 [ 82.78138733  17.21861267]
 [ 82.86641693  17.13358116]]


So it didn't change anything...

#4
Posted 12/07/2017 12:15 PM   
Given the DIGITS pre-processing I analyzed, it tries to classify this image:
[Image: http://vps166675.ovh.net/crop_and_pad.png]
(This image is the output of resize_image_with_crop_or_pad.)

Instead of:
[Image: http://vps166675.ovh.net/img-535.jpg]

The DIGITS pre-processing steps are not really obvious to me...

#5
Posted 12/08/2017 07:07 AM   
Hi,

Here is where the crop function is controlled:
https://github.com/NVIDIA/DIGITS/blob/master/digits/model/tasks/tensorflow_train.py#L571

Thanks.

#6
Posted 12/08/2017 08:48 AM   
Thank you.

But what if I choose not to crop images, but to squash them? I can't find where DIGITS does the "squash" in the "classify one image" case.

In the inference.py tool, I can see that the resize is done by:
https://github.com/NVIDIA/DIGITS/blob/master/digits/tools/inference.py#L131
image = utils.image.resize_image(
    image,
    height,
    width,
    channels=channels,
    resize_mode=resize_mode)


That leads to:
https://github.com/NVIDIA/DIGITS/blob/master/digits/utils/image.py#L223
scipy.misc.imresize(image, (height, width), interp=interp)

(The doc doesn't explain which resize algorithm is used: https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.imresize.html)
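
From reading the scipy source, imresize appears to wrap PIL's Image.resize; here is a hedged sketch of that mapping (the default interp is 'bilinear'):

import numpy as np
from PIL import Image

def imresize_sketch(image, size, interp='bilinear'):
    # Roughly what scipy.misc.imresize does: convert to a PIL image,
    # resize with the chosen filter, and convert back to a numpy array
    filters = {'nearest': Image.NEAREST, 'bilinear': Image.BILINEAR,
               'bicubic': Image.BICUBIC, 'lanczos': Image.LANCZOS}
    pil_img = Image.fromarray(np.uint8(image))
    return np.array(pil_img.resize((size[1], size[0]), filters[interp]))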


But it seems that this tool is used by the REST API, not by "classify one image".
I will try with the REST API to see the result in that case.

#7
Posted 12/08/2017 10:56 AM   