Keras: What is model.inputs in VGG16























I recently started playing with Keras and VGG16, and I am using keras.applications.vgg16.



I have a question about what model.inputs is. I saw others using it in https://github.com/keras-team/keras/blob/master/examples/conv_filter_visualization.py, although that script does not seem to initialize it:



...
input_img = model.input
...
layer_output = layer_dict[layer_name].output
if K.image_data_format() == 'channels_first':
    loss = K.mean(layer_output[:, filter_index, :, :])
else:
    loss = K.mean(layer_output[:, :, :, filter_index])

# we compute the gradient of the input picture wrt this loss
grads = K.gradients(loss, input_img)[0]


I checked the Keras site, but it only says that it is an input tensor with shape (1,224,224,3). I still don't understand what exactly that is. Is it an image from ImageNet? Or a default image that Keras provides for its models?



I am sorry if I don't understand deep learning well enough, but can someone please explain it to me? Thanks.


































  • These are dimensions of your image. A shape of (1,224,224,3) means you have 1 image (I might be wrong on this one), with both height and width of 224 pixels, and 3 channels (RGB).
    – Xiaoyu Lu
    Nov 20 at 21:26






















python tensorflow keras














edited Nov 22 at 4:53

























asked Nov 20 at 14:39









cwl6

83




























1 Answer













The four dimensions of (1, 224, 224, 3) are batch_size, image_height, image_width and image_channels, respectively. (1, 224, 224, 3) describes a batch of one 224x224 image with three channels (RGB). Note that the Keras model itself leaves the batch dimension unspecified (model.input reports (None, 224, 224, 3)), so it also accepts larger batches.



For more information on what a batch and therefore a batch size is, you can check this Cross Validated question.



Returning to VGG16, the input of the architecture is (1, 224, 224, 3). What does this mean? In order to input an image into the network, you will need to:




  1. Preprocess it to reach a shape of (224, 224) and 3 channels (RGB)

  2. Convert this to an actual matrix of shape (224, 224, 3)

  3. Group the images into a batch of the size the network requires (in this case the batch size is 1, but you still need to add a dimension to the matrix in order to obtain (1, 224, 224, 3))


After doing this, you can input the image to the model.
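
The three steps above can be traced with plain NumPy shapes. This is only a sketch: the zero-filled array stands in for a real decoded image, and the variable names are illustrative rather than taken from Keras.

```python
import numpy as np

# Stand-in for one decoded RGB image laid out as (height, width, channels).
# A real image would come from an image loader; zeros are used here only
# to make the shapes visible.
single_image = np.zeros((224, 224, 3), dtype=np.float32)

# Step 3 for a single image: add a leading batch dimension.
batch_of_one = np.expand_dims(single_image, axis=0)
print(batch_of_one.shape)  # (1, 224, 224, 3)

# Several images of the same size can be stacked into a larger batch.
batch_of_four = np.stack([single_image] * 4, axis=0)
print(batch_of_four.shape)  # (4, 224, 224, 3)
```

The leading dimension is what distinguishes "one image" from "a batch containing one image"; the model expects the latter.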



Keras offers a few utility functions to do these tasks. Below I present a modified version of the code snippet shown in Extract features with VGG16 from Usage examples for image classification models in the documentation.



To make it actually work, you need a JPEG of any size named elephant.jpg. You can obtain one by running this bash command:



wget https://upload.wikimedia.org/wikipedia/commons/f/f9/Zoorashia_elephant.jpg -O elephant.jpg   


I will split the code into image preprocessing and model prediction for clarity:



Load the image



import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)


You can add prints along the way to see what's going on, but here is a brief summary:





  1. image.load_img() loads a PIL image, already in RGB, and resizes it to (224, 224)


  2. image.img_to_array() converts this image into a matrix of shape (224, 224, 3). If you access x[0, 0, 0], you will get the red component of the first pixel as a number between 0 and 255


  3. np.expand_dims(x, axis=0) adds the first dimension. After this, x has shape (1, 224, 224, 3)


  4. preprocess_input does extra preprocessing required for ImageNet-trained architectures. From its docstring (run help(preprocess_input)) you can see that it:


    will convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling





This seems to be the standard input format for networks trained on ImageNet.
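
To make the conversion quoted above concrete, here is a NumPy-only sketch of the same idea. It is not the Keras implementation: the function name preprocess_sketch is made up for illustration, and the per-channel mean values are an assumption of this sketch (the commonly used ImageNet means in BGR order).

```python
import numpy as np

# Per-channel ImageNet means in BGR order. Treat these constants as an
# assumption of this sketch rather than something read from the library.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def preprocess_sketch(batch_rgb):
    """Approximate the described preprocessing for a channels-last RGB batch."""
    batch = batch_rgb.astype(np.float32)  # astype copies, so the caller's array is untouched
    batch = batch[..., ::-1]              # reverse the last axis: RGB -> BGR
    return batch - IMAGENET_MEAN_BGR      # zero-center each channel, no scaling


# A batch holding a single pure-red pixel, shape (1, 1, 1, 3).
pixel = np.array([[[[255.0, 0.0, 0.0]]]])
out = preprocess_sketch(pixel)
print(out.shape)
print(out[0, 0, 0])  # blue and green become negative, red stays positive
```

After this step the values are no longer in the 0 to 255 range, which is why the array should not be displayed as an image without undoing the transformation.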



That's it for the preprocessing. Now you can just feed the image to the pretrained model and get a prediction.



Predict



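from keras.applications.vgg16 import VGG16
base_model = VGG16(weights='imagenet')  # full classifier pretrained on ImageNet
y_hat = base_model.predict(x)
print(y_hat.shape)  # (1, 1000)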


y_hat contains the probabilities for each of the 1000 imagenet classes the model assigned to this image.



In order to obtain the class names and a readable output, Keras provides a utility function too:



from keras.applications.vgg16 import decode_predictions
decode_predictions(y_hat)


Outputs, for the Zoorashia_elephant.jpg image I downloaded before:



[[('n02504013', 'Indian_elephant', 0.48041093),
('n02504458', 'African_elephant', 0.47474155),
('n01871265', 'tusker', 0.03912963),
('n02437312', 'Arabian_camel', 0.0038948185),
('n01704323', 'triceratops', 0.00062475674)]]


Which seems pretty good!
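
For intuition, the ranking part of decode_predictions can be approximated with NumPy alone. This is a rough sketch under stated assumptions: the helper top_k and the four toy labels are hypothetical, and the real function additionally maps indices to ImageNet class IDs and names.

```python
import numpy as np


def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    # argsort is ascending, so reverse it and keep the first k indices.
    order = np.argsort(probabilities)[::-1][:k]
    return [(labels[i], float(probabilities[i])) for i in order]


# Toy stand-in for one row of y_hat with only four classes.
labels = ["cat", "dog", "elephant", "camel"]
probs = np.array([0.05, 0.10, 0.80, 0.05])

result = top_k(probs, labels, k=2)
print(result)  # [('elephant', 0.8), ('dog', 0.1)]
```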





























  • I am sorry I didn't make the question clear enough. I am looking at the code from github.com/keras-team/keras/blob/master/examples/…. It visualizes some filters, but it uses model.inputs to get a gradient: grads = K.gradients(loss, input_img)[0]. But it does not initialize the model input, so I am confused: is there a default input for the Keras VGG16 model?
    – cwl6
    Nov 22 at 3:41












  • Hi @cwl6. I think this is a pretty different question from the previous one, and my answer addressed the previous one extensively. You can always open a new question and the community will be glad to help! Regarding this question, you can see on line 91 that it's actually initializing a random noise image with the line input_img_data = np.random.random((1, 3, img_width, img_height)), with shape (1, 224, 224, 3), which fits my explanation. This input is plugged into the model in line 98, using the definition of iterate done in line 84.
    – Julian Peller
    Nov 22 at 4:05












  • Thank you for your reply. I know it takes your time to answer my question. But in line 78, it used the model input to create a function. Anyway, I will open a new question. Thanks
    – cwl6
    Nov 22 at 4:20










  • input_img is a placeholder for the model.input. This is "plugged" to input_img_data using iterate on line 84, where iterate is a K.function which maps inputs (input_img_data) to outputs (the model gradients and losses). The connection between input_img_data and input_img happens there, saying to iterate to take input_img_data as input to obtain the gradients and losses of the model, it's implicitly saying to plug the input_img_data to input_img and, in turn, to model.inputs. I'm not super confident with k
    – Julian Peller
    Nov 22 at 4:27












  • Now, I'm already researching :P ... I'm pretty noob with Keras too. The idea is the following: basically, you are just defining the structure of the model, with model.input as the input, until you execute a line saying: "ok, this is the input". This line is line 98. This line (loss_value, grads_value = iterate([input_img_data])) says: compute loss and grads by running the model defined before with input_img_data as the input. iterate is the connection between both.
    – Julian Peller
    Nov 22 at 4:36
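
The placeholder idea from these comments can be illustrated without Keras. The following is a NumPy-only analogy, not the Keras API and not the code of the linked script: the "model" is a single fixed weight array, and iterate is a plain Python function that only mimics the role of the K.function in the script. All names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed "model": the activation of one filter is a weighted sum of the input.
# No image exists yet; only the structure of the computation is defined.
weights = rng.normal(size=(8, 8, 3))


def iterate(input_img_data):
    """Map concrete input data to (loss, grads), like the script's iterate."""
    loss = float(np.mean(weights * input_img_data))
    # d(loss)/d(input) for this linear toy model, shaped like the input batch.
    grads = weights[np.newaxis] / input_img_data.size
    return loss, grads


# Data enters the picture only here: a random noise image with a batch axis.
input_img_data = rng.random((1, 8, 8, 3))
first_loss, _ = iterate(input_img_data)

# Gradient ascent on the input, as in filter visualization.
for _ in range(20):
    loss_value, grads_value = iterate(input_img_data)
    input_img_data = input_img_data + 1.0 * grads_value

final_loss, _ = iterate(input_img_data)
print(final_loss > first_loss)  # True
```

In the same way, model.input holds no image by itself; it only marks where data will be fed once a function built on it is called with an actual array.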













Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53395427%2fkeras-what-is-model-inputs-in-vgg16%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
3
down vote













The 4 dimensions of (1,224,224,3) are the batch_size, image_width, image_height and image_channels respectively. (1,224,224,3) means that the VGG16 model accepts a batch size of 1 (one image at a time) of shape 224x224 and three channels (RGB).



For more information on what a batch and therefore a batch size is, you can check this Cross Validated question.



Returning to VGG16, the input of the architecture is (1, 224, 224, 3). What does this mean? That in order to input a image into the network, you will need to:




  1. Preprocess it to reach a shape of (224, 224) and 3 channels (RGB)

  2. Convert this to an actual matrix of shape (224, 224, 3)

  3. Group together various images in a batch of the size that requires the network (in this case, the batch size is 1, but you need to add a dimension to the matrix, in order to obtain the (1, 224, 224, 3)


After doing this, you can input the image to the model.



Keras offers few utilitary functions to do these tasks. Below I present a modified version of the code snippet shown in Extract features with VGG16 from Usage examples for image classification models in the documentation.



In order to have it actually working, you need a jpg of any size named elephant.jpg. You can obtain it running this bash command:



wget https://upload.wikimedia.org/wikipedia/commons/f/f9/Zoorashia_elephant.jpg -O elephant.jpg   


I will split the code in the image preprocesing and the model prediction for clarity:



Load the image



import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)


You can add prints along the way to see what's going on, but here is a brief summary:





  1. image.load_img() load a PIL image, already in RGB and already reshaping it to (224, 224)


  2. image.img_to_array() is translating this image into a matrix of shape (224, 224, 3). If you access, x[0, 0, 0] you will get the red component of the first pixels as a number between 0 and 255


  3. np.expand_dims(x, axis=0) is adding the first dimension. x after is has shape (1, 224, 224, 3)


  4. preprocess_input is doing an extra preprocessing required for imagenet-trained architectures. From its docstring (run help(preprocess_input)) you can see that it:


    will convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling





This seems to be the standard input for ImageNet training set.



That's it for the preprocessing, now you can just input the image in the pretrained model and get a prediction



Predict



y_hat = base_model.predict(x)
print(y_hat.shape) # res.shape (1, 1000)


y_hat contains the probabilities for each of the 1000 imagenet classes the model assigned to this image.



In order to obtain the class names and a readable output, keras provided an utility function too:



from keras.applications.vgg16 import decode_predictions
decode_predictions(y_hat)


Outputs, for the Zoorashia_elephant.jpg image I downloaded before:



[[('n02504013', 'Indian_elephant', 0.48041093),
('n02504458', 'African_elephant', 0.47474155),
('n01871265', 'tusker', 0.03912963),
('n02437312', 'Arabian_camel', 0.0038948185),
('n01704323', 'triceratops', 0.00062475674)]]


Which seems pretty good!






share|improve this answer























  • I am sorry I didn't make the question clear enough. I am looking at the code from github.com/keras-team/keras/blob/master/examples/…. It visualizes some filters, but it uses model.inputs to get a gradient: grads = K.gradients(loss, input_img)[0] But it does not initialize the model input, therefore I am confused is there any default input of keras VGG16 model
    – cwl6
    Nov 22 at 3:41












  • Hi @cwl6. I think this is a pretty different question from the previous one and my answer addressed the previous one extensively. You can always open a new question and the community will be glad to help!. Regarding this question, you can see on line 91 that it's actually initalizing a random noise image with the line input_img_data = np.random.random((1, 3, img_width, img_height))., with shape (1, 224, 224, 3), which fits my explanation. This input is plugged to the model in line 98 using the definition of iterate done in 84.
    – Julian Peller
    Nov 22 at 4:05












  • Thank you for your reply. I know it takes your times to answer my question. But in line 78 , it used the model input to create a function. Anyway I will open another new question. Thanks
    – cwl6
    Nov 22 at 4:20










  • input_img is a placeholder for the model.input. This is "plugged" to input_img_data using iterate on line 84, where iterate is a K.function which maps inputs (input_img_data) to outputs (the model gradients and losses). The connection between input_img_data and input_img happens there, saying to iterate to take input_img_data as input to obtain the gradients and losses of the model, it's implicitly saying to plug the input_img_data to input_img and, in turn, to model.inputs. I'm not super confident with k
    – Julian Peller
    Nov 22 at 4:27












  • Now, I'm already researching :P ... I'm pretty noob with Keras too. The idea is the following: basically, you are just defining the structure of the model with model.input as the input until you execute a line saying: "ok, this is the input". This line is line 98. This line (loss_value, grads_value = iterate([input_img_data])) says: compute loss and grads running the model defined before with input_imag_data as the input. iterate is the connection between both.
    – Julian Peller
    Nov 22 at 4:36

















up vote
3
down vote













The 4 dimensions of (1,224,224,3) are the batch_size, image_width, image_height and image_channels respectively. (1,224,224,3) means that the VGG16 model accepts a batch size of 1 (one image at a time) of shape 224x224 and three channels (RGB).



For more information on what a batch and therefore a batch size is, you can check this Cross Validated question.



Returning to VGG16, the input of the architecture is (1, 224, 224, 3). What does this mean? That in order to input a image into the network, you will need to:




  1. Preprocess it to reach a shape of (224, 224) and 3 channels (RGB)

  2. Convert this to an actual matrix of shape (224, 224, 3)

  3. Group together various images in a batch of the size that requires the network (in this case, the batch size is 1, but you need to add a dimension to the matrix, in order to obtain the (1, 224, 224, 3)


After doing this, you can input the image to the model.



Keras offers few utilitary functions to do these tasks. Below I present a modified version of the code snippet shown in Extract features with VGG16 from Usage examples for image classification models in the documentation.



In order to have it actually working, you need a jpg of any size named elephant.jpg. You can obtain it running this bash command:



wget https://upload.wikimedia.org/wikipedia/commons/f/f9/Zoorashia_elephant.jpg -O elephant.jpg   


I will split the code in the image preprocesing and the model prediction for clarity:



Load the image



import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)


You can add prints along the way to see what's going on, but here is a brief summary:





  1. image.load_img() load a PIL image, already in RGB and already reshaping it to (224, 224)


  2. image.img_to_array() is translating this image into a matrix of shape (224, 224, 3). If you access, x[0, 0, 0] you will get the red component of the first pixels as a number between 0 and 255


  3. np.expand_dims(x, axis=0) is adding the first dimension. x after is has shape (1, 224, 224, 3)


  4. preprocess_input is doing an extra preprocessing required for imagenet-trained architectures. From its docstring (run help(preprocess_input)) you can see that it:


    will convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling





This seems to be the standard input for ImageNet training set.



That's it for the preprocessing, now you can just input the image in the pretrained model and get a prediction



Predict



y_hat = base_model.predict(x)
print(y_hat.shape) # res.shape (1, 1000)


y_hat contains the probabilities for each of the 1000 imagenet classes the model assigned to this image.



In order to obtain the class names and a readable output, keras provided an utility function too:



from keras.applications.vgg16 import decode_predictions
decode_predictions(y_hat)


Outputs, for the Zoorashia_elephant.jpg image I downloaded before:



[[('n02504013', 'Indian_elephant', 0.48041093),
('n02504458', 'African_elephant', 0.47474155),
('n01871265', 'tusker', 0.03912963),
('n02437312', 'Arabian_camel', 0.0038948185),
('n01704323', 'triceratops', 0.00062475674)]]


Which seems pretty good!






share|improve this answer























  • I am sorry I didn't make the question clear enough. I am looking at the code from github.com/keras-team/keras/blob/master/examples/…. It visualizes some filters, but it uses model.inputs to get a gradient: grads = K.gradients(loss, input_img)[0] But it does not initialize the model input, therefore I am confused is there any default input of keras VGG16 model
    – cwl6
    Nov 22 at 3:41












  • Hi @cwl6. I think this is a pretty different question from the previous one and my answer addressed the previous one extensively. You can always open a new question and the community will be glad to help!. Regarding this question, you can see on line 91 that it's actually initalizing a random noise image with the line input_img_data = np.random.random((1, 3, img_width, img_height))., with shape (1, 224, 224, 3), which fits my explanation. This input is plugged to the model in line 98 using the definition of iterate done in 84.
    – Julian Peller
    Nov 22 at 4:05












  • Thank you for your reply. I know it takes your times to answer my question. But in line 78 , it used the model input to create a function. Anyway I will open another new question. Thanks
    – cwl6
    Nov 22 at 4:20










  • input_img is a placeholder for the model.input. This is "plugged" to input_img_data using iterate on line 84, where iterate is a K.function which maps inputs (input_img_data) to outputs (the model gradients and losses). The connection between input_img_data and input_img happens there, saying to iterate to take input_img_data as input to obtain the gradients and losses of the model, it's implicitly saying to plug the input_img_data to input_img and, in turn, to model.inputs. I'm not super confident with k
    – Julian Peller
    Nov 22 at 4:27












  • Now, I'm already researching :P ... I'm pretty noob with Keras too. The idea is the following: basically, you are just defining the structure of the model with model.input as the input until you execute a line saying: "ok, this is the input". This line is line 98. This line (loss_value, grads_value = iterate([input_img_data])) says: compute loss and grads running the model defined before with input_imag_data as the input. iterate is the connection between both.
    – Julian Peller
    Nov 22 at 4:36















up vote
3
down vote










up vote
3
down vote









The 4 dimensions of (1,224,224,3) are the batch_size, image_width, image_height and image_channels respectively. (1,224,224,3) means that the VGG16 model accepts a batch size of 1 (one image at a time) of shape 224x224 and three channels (RGB).



For more information on what a batch and therefore a batch size is, you can check this Cross Validated question.



Returning to VGG16, the input of the architecture is (1, 224, 224, 3). What does this mean? That in order to input a image into the network, you will need to:




  1. Preprocess it to reach a shape of (224, 224) and 3 channels (RGB)

  2. Convert this to an actual matrix of shape (224, 224, 3)

  3. Group together various images in a batch of the size that requires the network (in this case, the batch size is 1, but you need to add a dimension to the matrix, in order to obtain the (1, 224, 224, 3)


After doing this, you can input the image to the model.



Keras offers few utilitary functions to do these tasks. Below I present a modified version of the code snippet shown in Extract features with VGG16 from Usage examples for image classification models in the documentation.



In order to have it actually working, you need a jpg of any size named elephant.jpg. You can obtain it running this bash command:



wget https://upload.wikimedia.org/wikipedia/commons/f/f9/Zoorashia_elephant.jpg -O elephant.jpg   


I will split the code in the image preprocesing and the model prediction for clarity:



Load the image



import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)


You can add prints along the way to see what's going on, but here is a brief summary:





  1. image.load_img() load a PIL image, already in RGB and already reshaping it to (224, 224)


  2. image.img_to_array() is translating this image into a matrix of shape (224, 224, 3). If you access, x[0, 0, 0] you will get the red component of the first pixels as a number between 0 and 255


  3. np.expand_dims(x, axis=0) is adding the first dimension. x after is has shape (1, 224, 224, 3)


  4. preprocess_input is doing an extra preprocessing required for imagenet-trained architectures. From its docstring (run help(preprocess_input)) you can see that it:


    will convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling





This seems to be the standard input for ImageNet training set.



That's it for the preprocessing, now you can just input the image in the pretrained model and get a prediction



Predict



y_hat = base_model.predict(x)
print(y_hat.shape) # res.shape (1, 1000)


y_hat contains the probabilities for each of the 1000 imagenet classes the model assigned to this image.



In order to obtain the class names and a readable output, keras provided an utility function too:



from keras.applications.vgg16 import decode_predictions
decode_predictions(y_hat)


Outputs, for the Zoorashia_elephant.jpg image I downloaded before:



[[('n02504013', 'Indian_elephant', 0.48041093),
('n02504458', 'African_elephant', 0.47474155),
('n01871265', 'tusker', 0.03912963),
('n02437312', 'Arabian_camel', 0.0038948185),
('n01704323', 'triceratops', 0.00062475674)]]


Which seems pretty good!






share|improve this answer














The 4 dimensions of (1,224,224,3) are the batch_size, image_width, image_height and image_channels respectively. (1,224,224,3) means that the VGG16 model accepts a batch size of 1 (one image at a time) of shape 224x224 and three channels (RGB).



For more information on what a batch and therefore a batch size is, you can check this Cross Validated question.



Returning to VGG16, the input of the architecture is (1, 224, 224, 3). What does this mean? That in order to input a image into the network, you will need to:




  1. Preprocess it to reach a shape of (224, 224) and 3 channels (RGB)

  2. Convert this to an actual matrix of shape (224, 224, 3)

  3. Group together various images in a batch of the size that requires the network (in this case, the batch size is 1, but you need to add a dimension to the matrix, in order to obtain the (1, 224, 224, 3)


After doing this, you can input the image to the model.



Keras offers few utilitary functions to do these tasks. Below I present a modified version of the code snippet shown in Extract features with VGG16 from Usage examples for image classification models in the documentation.



In order to have it actually working, you need a jpg of any size named elephant.jpg. You can obtain it running this bash command:



wget https://upload.wikimedia.org/wikipedia/commons/f/f9/Zoorashia_elephant.jpg -O elephant.jpg   


I will split the code in the image preprocesing and the model prediction for clarity:



Load the image



import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)


You can add prints along the way to see what's going on, but here is a brief summary:





  1. image.load_img() load a PIL image, already in RGB and already reshaping it to (224, 224)


  2. image.img_to_array() is translating this image into a matrix of shape (224, 224, 3). If you access, x[0, 0, 0] you will get the red component of the first pixels as a number between 0 and 255


  3. np.expand_dims(x, axis=0) is adding the first dimension. x after is has shape (1, 224, 224, 3)


  4. preprocess_input is doing an extra preprocessing required for imagenet-trained architectures. From its docstring (run help(preprocess_input)) you can see that it:


    will convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling





This seems to be the standard input for ImageNet training set.



That's it for the preprocessing, now you can just input the image in the pretrained model and get a prediction



Predict



y_hat = base_model.predict(x)
print(y_hat.shape) # res.shape (1, 1000)


y_hat contains the probabilities for each of the 1000 imagenet classes the model assigned to this image.



To obtain the class names and a readable output, Keras provides a utility function too:



from keras.applications.vgg16 import decode_predictions
decode_predictions(y_hat)


For the Zoorashia_elephant.jpg image I downloaded before, this outputs:



[[('n02504013', 'Indian_elephant', 0.48041093),
('n02504458', 'African_elephant', 0.47474155),
('n01871265', 'tusker', 0.03912963),
('n02437312', 'Arabian_camel', 0.0038948185),
('n01704323', 'triceratops', 0.00062475674)]]


Which seems pretty good!















edited Nov 23 at 22:13

























answered Nov 20 at 22:45









Julian Peller

844511
















  • I am sorry I didn't make the question clear enough. I am looking at the code from github.com/keras-team/keras/blob/master/examples/…. It visualizes some filters, but it uses model.inputs to get a gradient: grads = K.gradients(loss, input_img)[0]. But it does not initialize the model input, so I am confused: is there a default input for the Keras VGG16 model?
    – cwl6
    Nov 22 at 3:41












  • Hi @cwl6. I think this is a pretty different question from the previous one, and my answer addressed the previous one extensively. You can always open a new question and the community will be glad to help! Regarding this question, you can see on line 91 that it's actually initializing a random noise image with the line input_img_data = np.random.random((1, 3, img_width, img_height)). That is the channels-first layout; the channels-last equivalent, (1, img_width, img_height, 3), has the same layout as the (1, 224, 224, 3) shape in my explanation. This input is plugged into the model on line 98 using the definition of iterate on line 84.
    – Julian Peller
    Nov 22 at 4:05












  • Thank you for your reply. I know it takes your time to answer my question. But in line 78, it uses the model input to create a function. Anyway, I will open a new question. Thanks
    – cwl6
    Nov 22 at 4:20










  • input_img is model.input, which is a placeholder. It is "plugged" to input_img_data through iterate, defined on line 84, where iterate is a K.function that maps inputs (input_img_data) to outputs (the model's gradients and loss). The connection between input_img_data and input_img happens there: telling iterate to take input_img_data as input to obtain the gradients and loss implicitly plugs input_img_data into input_img and, in turn, into model.inputs. I'm not super confident with k
    – Julian Peller
    Nov 22 at 4:27












  • Now I'm already researching :P ... I'm pretty noob with Keras too. The idea is the following: basically, you are just defining the structure of the model, with model.input as the input, until you execute a line saying "ok, this is the input". That line is line 98: loss_value, grads_value = iterate([input_img_data]) says to compute the loss and grads by running the model defined before with input_img_data as the input. iterate is the connection between both.
    – Julian Peller
    Nov 22 at 4:36
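
As a loose analogy only (plain NumPy, not the Keras backend API), the placeholder idea from the comments above can be sketched like this. build_iterate and weights are made-up names, and the "model" is a toy whose loss is just the mean of input times weights; the point is that nothing is computed until iterate is called with actual data.

```python
import numpy as np

def build_iterate(weights):
    """Return a function of the input, loosely like K.function([input_img], [loss, grads]).

    Nothing is computed here: input_img is only a parameter name (the
    "placeholder") until the returned function is called with real data.
    """
    def iterate(input_img):
        loss = float(np.mean(input_img * weights))  # toy "filter activation"
        grads = weights / weights.size              # d(loss)/d(input_img) for this toy loss
        return loss, grads
    return iterate

rng = np.random.RandomState(0)
weights = rng.randn(1, 8, 8, 3)      # hypothetical stand-in for the network
iterate = build_iterate(weights)     # structure defined, no input supplied yet

input_img_data = rng.random_sample((1, 8, 8, 3))  # random-noise start image
step = 10.0
losses = []
for _ in range(20):                  # gradient ascent on the input itself
    loss_value, grads_value = iterate(input_img_data)
    losses.append(loss_value)
    input_img_data += grads_value * step

print(losses[0], losses[-1])
```

Here the loss goes up at every step, because the input is nudged along its own gradient each time iterate is called.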







































