How to import pre-downloaded MNIST dataset from a specific directory or folder?

I have downloaded the MNIST dataset from LeCun's site. I want to write Python code that extracts the gzip files and reads the dataset directly from a local directory, so that I no longer have to download from or access the MNIST site.

Desired process:
Access folder/directory --> extract gzip --> read dataset (one-hot encoding)

How can I do this? Almost all tutorials access either the LeCun or the TensorFlow site to download and read the dataset. Thanks in advance!

asked Jan 15 at 5:13

Joshua

114118

1

You should extract the gzip locally onto your computer and then use scipy.misc.imread or opencv to read images to Python.
– yuji
Jan 15 at 5:17

Have you tried anything?
– Vivek Kumar
Jan 15 at 9:32

Yes, I tried removing the line 'from tensorflow.examples.tutorials.mnist import input_data', but it still downloads the dataset from the site. I'm still figuring out why it accesses the site and downloads the dataset even when only the line "mnist = input_data.read_data_sets('mnist_data/', one_hot=True)" is left.
– Joshua
Jan 15 at 13:42

python tensorflow machine-learning deep-learning mnist

3 Answers

This TensorFlow call

from tensorflow.examples.tutorials.mnist import input_data
input_data.read_data_sets('my/directory')

... won't download anything if the four MNIST files are already in that directory.
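If the call still downloads, one plausible cause (an assumption, since the question does not show the directory contents) is that the folder does not hold all four archives under the names used in the code further down this answer. A minimal sketch that lists the missing ones; `missing_mnist_files` and `MNIST_FILES` are hypothetical names introduced here:

```python
import os

# Archive names as they appear in this answer's open() calls.
MNIST_FILES = (
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
)

def missing_mnist_files(directory):
    """Return the expected archive names that are not present in `directory`."""
    return [name for name in MNIST_FILES
            if not os.path.isfile(os.path.join(directory, name))]
```

Calling `missing_mnist_files('my/directory')` returns an empty list when all four archives are in place.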

But if for some reason you wish to unzip it yourself, here's how you do it:

from tensorflow.contrib.learn.python.learn.datasets.mnist import extract_images, extract_labels

# extract_images / extract_labels decompress the gzip stream and return NumPy arrays
with open('my/directory/train-images-idx3-ubyte.gz', 'rb') as f:
    train_images = extract_images(f)
with open('my/directory/train-labels-idx1-ubyte.gz', 'rb') as f:
    train_labels = extract_labels(f)

with open('my/directory/t10k-images-idx3-ubyte.gz', 'rb') as f:
    test_images = extract_images(f)
with open('my/directory/t10k-labels-idx1-ubyte.gz', 'rb') as f:
    test_labels = extract_labels(f)
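Because `tensorflow.contrib` was removed in TensorFlow 2.x, the import above only works on 1.x installs. The same step can be sketched with just the standard library and NumPy. `read_idx_gz` is a hypothetical helper name, and the sketch assumes the archives use the IDX layout of the MNIST files (a 4-byte magic number, one big-endian 32-bit size per dimension, then unsigned bytes):

```python
import gzip
import struct

import numpy as np

def read_idx_gz(path):
    """Read one gzip-compressed IDX file (images or labels) into a uint8 array."""
    with gzip.open(path, "rb") as f:
        # Magic number: two zero bytes, a type code, the number of dimensions.
        _zero, type_code, ndim = struct.unpack(">HBB", f.read(4))
        if type_code != 0x08:
            raise ValueError("expected unsigned-byte data, got type 0x%02x" % type_code)
        # One big-endian uint32 per dimension, e.g. (60000, 28, 28) for training images.
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)
```

For example, `read_idx_gz('my/directory/train-labels-idx1-ubyte.gz')` would give a 1-D array of labels, and the image archives a 3-D array indexed as [image, row, column]. The returned arrays are read-only views of the decompressed bytes; call `.copy()` or `.astype(...)` before modifying them.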

edited Nov 22 at 18:25

binke ou

answered Jan 15 at 14:01

Maxim

29.9k2174121

Thank you! It works!
– Joshua
Jan 17 at 3:03

I will show how to load it from scratch (for better understanding), using the pickled mnist.pkl.gz version of the dataset, and how to display a digit image from it with matplotlib.pyplot.

import pickle  # Python 3 replacement for cPickle
import gzip
import numpy as np
import matplotlib.pyplot as plt

def load_data():
    path = '../../data/mnist.pkl.gz'
    # encoding='latin1' is needed in Python 3 if the pickle was written by Python 2
    with gzip.open(path, 'rb') as f:
        training_data, validation_data, test_data = pickle.load(f, encoding='latin1')

    X_train, y_train = training_data[0], training_data[1]
    print(X_train.shape, y_train.shape)
    # (50000, 784) (50000,)

    # get the first image and its label
    img1_arr, img1_label = X_train[0], y_train[0]
    print(img1_arr.shape, img1_label)
    # (784,) 5

    # reshape the first image (1D vector) into a 2D image
    img1_2d = np.reshape(img1_arr, (28, 28))
    # show it
    plt.subplot(111)
    plt.imshow(img1_2d, cmap=plt.get_cmap('gray'))
    plt.show()


You can also convert a label to a 10-dimensional one-hot column vector with this sample function:

def vectorized_result(label):
    e = np.zeros((10, 1))
    e[label] = 1.0
    return e

Vectorizing the label from above:

print(vectorized_result(img1_label))  # img1_label is the label read in load_data() (here 5)

# output as below:
[[0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [1.]
 [0.]
 [0.]
 [0.]
 [0.]]
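For a whole label array at once, the same encoding can be sketched by indexing an identity matrix. `one_hot` is a hypothetical helper, and `num_classes=10` assumes the ten MNIST digits. Note that it returns rows of shape (N, 10) rather than the (10, 1) columns produced by `vectorized_result`:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Map integer labels of shape (N,) to a float array of shape (N, num_classes)."""
    labels = np.asarray(labels, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes)[labels]
```

If the column layout is needed, `one_hot(y_train)[:, :, None]` adds the trailing axis to give shape (N, 10, 1).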

To turn the data into CNN input, reshape it like this:

import pickle  # Python 3 replacement for cPickle

def load_data_v2():
    path = '../../data/mnist.pkl.gz'
    # encoding='latin1' is needed in Python 3 if the pickle was written by Python 2
    with gzip.open(path, 'rb') as f:
        training_data, validation_data, test_data = pickle.load(f, encoding='latin1')

    X_train, y_train = training_data[0], training_data[1]
    print(X_train.shape, y_train.shape)
    # (50000, 784) (50000,)

    # each flat 784-vector becomes a 28x28 image, each label a one-hot column
    X_train = np.array([np.reshape(item, (28, 28)) for item in X_train])
    y_train = np.array([vectorized_result(item) for item in y_train])

    print(X_train.shape, y_train.shape)
    # (50000, 28, 28) (50000, 10, 1)
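Many convolutional layers also expect an explicit channel axis. A minimal sketch of the reshape without a Python loop, assuming a channels-last layout (N, height, width, channels); frameworks that use channels-first would need (N, 1, 28, 28) instead. `to_cnn_input` is a hypothetical helper name:

```python
import numpy as np

def to_cnn_input(flat_images, side=28):
    """Reshape (N, side*side) rows to (N, side, side, 1) in one vectorized call."""
    flat_images = np.asarray(flat_images)
    # -1 lets NumPy infer N; the trailing 1 is the single grayscale channel.
    return flat_images.reshape(-1, side, side, 1)
```

With the arrays from this answer, `to_cnn_input(X_train)` would have shape (50000, 28, 28, 1).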

answered Jul 14 at 4:11

Jayhello

97811018


If you have the MNIST data extracted, then you can load it low-level with NumPy directly:

import numpy as np

def loadMNIST( prefix, folder ):
    # IDX headers store big-endian 32-bit integers
    intType = np.dtype( 'int32' ).newbyteorder( '>' )
    nMetaDataBytes = 4 * intType.itemsize

    # image file: 16-byte header (magic number, image count, rows, columns), then raw pixels
    data = np.fromfile( folder + "/" + prefix + '-images-idx3-ubyte', dtype = 'ubyte' )
    magicBytes, nImages, width, height = np.frombuffer( data[:nMetaDataBytes].tobytes(), intType )
    data = data[nMetaDataBytes:].astype( dtype = 'float32' ).reshape( [ nImages, width, height ] )

    # label file: 8-byte header (magic number, label count), then one byte per label
    labels = np.fromfile( folder + "/" + prefix + '-labels-idx1-ubyte',
                          dtype = 'ubyte' )[2 * intType.itemsize:]

    return data, labels

trainingImages, trainingLabels = loadMNIST( "train", "../datasets/mnist/" )
testImages, testLabels = loadMNIST( "t10k", "../datasets/mnist/" )
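The loader above expects uncompressed files. If only the .gz archives are on disk, they could be unpacked first with the standard library. This is a sketch under the assumption that every `.gz` file in the folder should be extracted next to its archive; `gunzip_all` is a hypothetical helper name:

```python
import gzip
import os
import shutil

def gunzip_all(folder):
    """Decompress every *.gz file in `folder` next to the original; return the new paths."""
    extracted = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".gz"):
            continue
        src = os.path.join(folder, name)
        dst = src[:-len(".gz")]  # train-images-idx3-ubyte.gz -> train-images-idx3-ubyte
        with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)  # streams in chunks, no full read into memory
        extracted.append(dst)
    return extracted
```

Running `gunzip_all("../datasets/mnist/")` once would leave the four uncompressed files that `loadMNIST` reads.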

And to convert to one-hot encoding:

def toHotEncoding( classification ):
    # emulates the functionality of tf.keras.utils.to_categorical( y )
    hotEncoding = np.zeros( [ len( classification ),
                              np.max( classification ) + 1 ] )
    hotEncoding[ np.arange( len( hotEncoding ) ), classification ] = 1
    return hotEncoding

trainingLabels = toHotEncoding( trainingLabels )
testLabels = toHotEncoding( testLabels )

edited Dec 16 at 11:00

answered Nov 9 at 12:53

mxmlnkn

893914

