How to import pre-downloaded MNIST dataset from a specific directory or folder?












I have downloaded the MNIST dataset from the LeCun site. I want to write Python code that extracts the gzip files and reads the dataset directly from a local directory, so that I no longer have to download it from, or access, the MNIST site.

Desired process:
access folder/directory --> extract gzip --> read dataset (one-hot encoding)

How can I do this? Almost all tutorials access either the LeCun or the TensorFlow site to download and read the dataset. Thanks in advance!

python tensorflow machine-learning deep-learning mnist

asked Jan 15 at 5:13 by Joshua

  • You should extract the gzip locally onto your computer and then use scipy.misc.imread or OpenCV to read the images into Python.
    – yuji
    Jan 15 at 5:17

  • Have you tried anything?
    – Vivek Kumar
    Jan 15 at 9:32

  • Yes, I tried removing the line 'from tensorflow.examples.tutorials.mnist import input_data', but it still downloads the dataset from the site. I am still figuring out why, even with only the line "mnist = input_data.read_data_sets('mnist_data/', one_hot=True)" left, it still accesses the site and downloads the dataset.
    – Joshua
    Jan 15 at 13:42

3 Answers

This TensorFlow call

    from tensorflow.examples.tutorials.mnist import input_data
    input_data.read_data_sets('my/directory')

... won't download anything if you already have the files there.

But if for some reason you wish to unzip it yourself, here's how you do it:

    # Note: tensorflow.contrib exists only in TensorFlow 1.x.
    from tensorflow.contrib.learn.python.learn.datasets.mnist import extract_images, extract_labels

    # Each helper takes an open binary file object for the .gz file and returns a NumPy array.
    with open('my/directory/train-images-idx3-ubyte.gz', 'rb') as f:
        train_images = extract_images(f)
    with open('my/directory/train-labels-idx1-ubyte.gz', 'rb') as f:
        train_labels = extract_labels(f)

    with open('my/directory/t10k-images-idx3-ubyte.gz', 'rb') as f:
        test_images = extract_images(f)
    with open('my/directory/t10k-labels-idx1-ubyte.gz', 'rb') as f:
        test_labels = extract_labels(f)
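
Where the TensorFlow 1.x helpers above are not available, the same four files can probably be read with just the standard library and NumPy. This is only a minimal sketch, not TensorFlow's implementation: the names `read_idx_gz` and `load_local_mnist` are made up for illustration, and it assumes the files keep their original names and use the unsigned-byte IDX layout.

```python
import gzip
import os
import struct

import numpy as np


def read_idx_gz(path):
    """Read one gzip-compressed IDX file into a NumPy array (hypothetical helper)."""
    with gzip.open(path, "rb") as f:
        raw = f.read()
    # Header: two zero bytes, a type code (0x08 = unsigned byte), number of dimensions.
    zeros, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zeros != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file: " + path)
    # Each dimension size is stored as a big-endian 32-bit unsigned integer.
    shape = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    # Everything after the header is the raw pixel or label data.
    return np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * ndim).reshape(shape)


def load_local_mnist(directory):
    """Load the four standard MNIST files from `directory` without network access."""
    def load(name):
        return read_idx_gz(os.path.join(directory, name))

    return (load("train-images-idx3-ubyte.gz"), load("train-labels-idx1-ubyte.gz"),
            load("t10k-images-idx3-ubyte.gz"), load("t10k-labels-idx1-ubyte.gz"))
```

Usage would look like `train_x, train_y, test_x, test_y = load_local_mnist('my/directory')`. Nothing here touches the network, so a missing file surfaces as a `FileNotFoundError` instead of triggering a download.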






answered Jan 15 at 14:01 by Maxim, edited Nov 22 at 18:25 by binke ou

  • Thank you! It works!
    – Joshua
    Jan 17 at 3:03

I will show how to load it from scratch (for better understanding) and how to display a digit image from it with matplotlib.pyplot. Note that this reads the pickled mnist.pkl.gz version of the dataset, not the raw IDX files.

    import gzip
    import pickle

    import matplotlib.pyplot as plt
    import numpy as np

    def load_data():
        path = '../../data/mnist.pkl.gz'
        # encoding='latin1' lets Python 3 read a pickle that was written by Python 2
        with gzip.open(path, 'rb') as f:
            training_data, validation_data, test_data = pickle.load(f, encoding='latin1')

        X_train, y_train = training_data[0], training_data[1]
        print(X_train.shape, y_train.shape)
        # (50000, 784) (50000,)

        # get the first image and its label
        img1_arr, img1_label = X_train[0], y_train[0]
        print(img1_arr.shape, img1_label)
        # (784,) 5

        # reshape the first image (1D vector) into a 2D image
        img1_2d = np.reshape(img1_arr, (28, 28))
        # show it
        plt.subplot(111)
        plt.imshow(img1_2d, cmap=plt.get_cmap('gray'))
        plt.show()

You can also vectorize a label into a 10-dimensional unit vector with this sample function:

    def vectorized_result(label):
        e = np.zeros((10, 1))
        e[label] = 1.0
        return e

Vectorize the above label:

    # img1_label is the first label from load_data(), i.e. 5
    print(vectorized_result(img1_label))
    # output as below:
    # [[0.]
    #  [0.]
    #  [0.]
    #  [0.]
    #  [0.]
    #  [1.]
    #  [0.]
    #  [0.]
    #  [0.]
    #  [0.]]

If you want to turn it into CNN input, you can reshape it like this:

    def load_data_v2():
        path = '../../data/mnist.pkl.gz'
        with gzip.open(path, 'rb') as f:
            training_data, validation_data, test_data = pickle.load(f, encoding='latin1')

        X_train, y_train = training_data[0], training_data[1]
        print(X_train.shape, y_train.shape)
        # (50000, 784) (50000,)

        X_train = np.array([np.reshape(item, (28, 28)) for item in X_train])
        y_train = np.array([vectorized_result(item) for item in y_train])

        print(X_train.shape, y_train.shape)
        # (50000, 28, 28) (50000, 10, 1)
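
The per-item loops above can likely be replaced by whole-array operations. The sketch below is one way to do it, under a few assumptions that are not in the answer: the helper name `to_cnn_arrays` is invented, the images arrive as an (N, 784) array, and the target framework wants a channels-last layout with labels shaped (N, 10) rather than (N, 10, 1).

```python
import numpy as np


def to_cnn_arrays(flat_images, labels, num_classes=10):
    """Turn (N, 784) images and (N,) integer labels into CNN-style arrays."""
    # -1 lets NumPy infer N; the trailing 1 is the single grayscale channel.
    images = np.asarray(flat_images, dtype=np.float32).reshape(-1, 28, 28, 1)
    # Row k of the identity matrix is the one-hot vector for class k,
    # so indexing it with the label array encodes every label at once.
    one_hot = np.eye(num_classes, dtype=np.float32)[np.asarray(labels, dtype=np.int64)]
    return images, one_hot
```

Frameworks that expect channels-first input would need `reshape(-1, 1, 28, 28)` instead.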






answered Jul 14 at 4:11 by Jayhello

If you have the MNIST data extracted, you can load it at a low level with NumPy directly:

    import numpy as np

    def loadMNIST( prefix, folder ):
        # IDX headers store big-endian 32-bit integers
        intType = np.dtype( 'int32' ).newbyteorder( '>' )
        nMetaDataBytes = 4 * intType.itemsize

        data = np.fromfile( folder + "/" + prefix + '-images-idx3-ubyte', dtype = 'ubyte' )
        # image header: magic number, image count, rows (height), columns (width)
        magicBytes, nImages, height, width = np.frombuffer( data[:nMetaDataBytes].tobytes(), intType )
        data = data[nMetaDataBytes:].astype( dtype = 'float32' ).reshape( [ nImages, height, width ] )

        # label header: magic number and label count (2 integers), then one byte per label
        labels = np.fromfile( folder + "/" + prefix + '-labels-idx1-ubyte',
                              dtype = 'ubyte' )[2 * intType.itemsize:]

        return data, labels

    trainingImages, trainingLabels = loadMNIST( "train", "../datasets/mnist/" )
    testImages, testLabels = loadMNIST( "t10k", "../datasets/mnist/" )

And to convert to one-hot encoding:

    def toHotEncoding( classification ):
        # emulates the functionality of tf.keras.utils.to_categorical( y )
        hotEncoding = np.zeros( [ len( classification ),
                                  np.max( classification ) + 1 ] )
        hotEncoding[ np.arange( len( hotEncoding ) ), classification ] = 1
        return hotEncoding

    trainingLabels = toHotEncoding( trainingLabels )
    testLabels = toHotEncoding( testLabels )
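
One thing to watch with the encoder above: it takes the number of columns from the largest label present, so a subset of the data that happens to contain no 9s would come out narrower than 10 columns. A possible variant with a fixed class count is sketched below. The names `one_hot` and `from_one_hot` are hypothetical, and `num_classes=10` is an assumption that fits MNIST digits.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Encode integer labels as rows of length `num_classes`."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


def from_one_hot(encoded):
    """Recover the integer labels: the position of the 1 in each row."""
    return np.argmax(encoded, axis=1)
```

With this version `one_hot([0, 3, 3])` has shape (3, 10) even though no label exceeds 3, and `from_one_hot` gives the original labels back.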






answered Nov 9 at 12:53 by mxmlnkn, edited Dec 16 at 11:00