How to deal with huge data? #12
Comments
You can use the input queues to read from the filesystem instead of loading everything into memory; this is discussed in this tutorial: https://ischlag.github.io/2016/06/19/tensorflow-input-pipeline-example/
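A minimal sketch of the queue-based pattern that tutorial describes (TF 1.x API). The file paths, labels, and image size below are placeholders, not the asker's actual data:

```python
import tensorflow as tf

# Hypothetical file list and labels; in practice these would come from a CSV
# or be built with os.listdir rather than hard-coded.
image_paths = ["data/img_0001.jpg", "data/img_0002.jpg"]
labels = [0, 1]

# slice_input_producer hands out one (path, label) pair at a time, so the
# images themselves are never all held in memory.
path, label = tf.train.slice_input_producer(
    [tf.constant(image_paths), tf.constant(labels)], shuffle=True)

# Read and decode a single image only when it is needed.
raw = tf.read_file(path)
image = tf.image.decode_jpeg(raw, channels=3)
image = tf.image.resize_images(image, [224, 224])  # fixed shape so batching works

image_batch, label_batch = tf.train.batch(
    [image, label], batch_size=8, num_threads=2, capacity=64)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    imgs, lbls = sess.run([image_batch, label_batch])
    coord.request_stop()
    coord.join(threads)
```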
Thanks for the response @aditbiswas1. If my current dataset directory structure is like this: should I make the CSV file containing the paths to the datasets first, or is there another way to deal with that kind of problem?
Hmm, yeah, using a CSV sounds like a reasonable way to solve the problem 👍
Is there any tutorial you know of for creating a CSV out of it? Sorry, I'm new at this @aditbiswas1
Hi @aditbiswas1, I've learned how to do things from the tutorial you gave, but I was wondering: if I wanted to resize my images first, in which step can I do it? This is the tutorial, by the way: https://ischlag.github.io/2016/06/19/tensorflow-input-pipeline-example/
Oh hi, sorry, I didn't see the previous notification. You can create a CSV in multiple ways; I would normally build a Python list of all the file names and then dump it into a CSV using the built-in csv module. You'll most likely just want to write some loops to do this, and you can get a list of the files present in a directory with os.listdir("directory"). Regarding resizing of the images: this can be part of your preprocessing, right after you've decoded the image file. An example of this can be found in the docs here.
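In case it helps, here is a rough sketch of that CSV-building loop, assuming a layout of one sub-directory per class (the directory and file names are placeholders):

```python
import csv
import os

dataset_dir = "dataset"  # hypothetical root folder, one sub-directory per class
rows = []
for label, class_name in enumerate(sorted(os.listdir(dataset_dir))):
    class_dir = os.path.join(dataset_dir, class_name)
    if not os.path.isdir(class_dir):
        continue
    for filename in sorted(os.listdir(class_dir)):
        rows.append([os.path.join(class_dir, filename), label])

with open("train_labels.csv", "w") as f:
    csv.writer(f).writerows(rows)
```

And the resize would slot in right after the decode step of the pipeline, for example:

```python
image = tf.image.decode_jpeg(raw, channels=3)
image = tf.image.resize_images(image, [128, 128])  # target size is an arbitrary choice
```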
Since you also wanted to do things like grouping by labels, you probably want to look at something like pandas, which is super useful for manipulating datasets: http://pandas.pydata.org/pandas-docs/version/0.13.1/index.html
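For the grouping, a tiny pandas sketch; the column names are placeholders and assume the headerless CSV from the loop above:

```python
import pandas as pd

df = pd.read_csv("train_labels.csv", names=["path", "label"])
print(df.groupby("label").size())            # number of images per class
class0_paths = df[df["label"] == 0]["path"]  # all paths for one class
```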
Hi @aditbiswas1, I've managed to get to the training step, but at a certain epoch the training stops. It says there is an Out of Range error because of the FIFO batch. Here is the error:

OutOfRangeError (see above for traceback): FIFOQueue '_3_batch/fifo_queue' is closed and has insufficient elements (requested 8, current size 5)

I'm following the tutorial you gave, but I uncommented the num_threads line in tf.train.batch.
Hmm, looks like you are trying to read 8 elements but the batch size has been declared at 5. Are you getting this after some iterations, or does the error come immediately? If it comes immediately then some variable got configured incorrectly, but if you managed to run a few iterations it means the last iteration caused the issue because the number of training examples is not a multiple of 8. I think you could declare the batch size as None and possibly solve the issue; will look into it.
Yes, I managed to reach 600-ish iterations before I got the error. I set the batch size to 8, and I also suspect that the last batch is not divisible by 8, but I've already used this line of code: allow_smaller_final_batch=True
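For reference, a sketch of how those options sit together in tf.train.batch (TF 1.x). The tensor names come from the earlier sketch and are hypothetical, and whether allow_smaller_final_batch alone resolves the error depends on why the queue is closing (for example a num_epochs limit on the input producer), so treat this as something to verify rather than a confirmed fix:

```python
image_batch, label_batch = tf.train.batch(
    [image, label],                    # single-example tensors from the pipeline
    batch_size=8,
    num_threads=2,                     # the num_threads line from the tutorial
    capacity=64,
    allow_smaller_final_batch=True)    # last batch may hold fewer than 8 examples

# If the input producer was built with a num_epochs limit, the queue closes after
# that many passes over the data, and draining it raises OutOfRangeError; the usual
# pattern is to treat that as the end-of-data signal (coord, sess, and train_op
# here are the objects from the standard queue-runner training loop):
try:
    while not coord.should_stop():
        sess.run(train_op)
except tf.errors.OutOfRangeError:
    coord.request_stop()
```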
I'm currently trying to generate a dataset based on your tutorial, but I was wondering how to deal with huge data (10 GB of images). My laptop can't handle that amount of data (because the tutorial says we need to store the data in an array variable first). Is there any way to handle this? Thanks