How to deal with huge data? #12
Comments
You can use the input queues to read from the filesystem instead of loading everything into memory; this is discussed in this tutorial: https://ischlag.github.io/2016/06/19/tensorflow-input-pipeline-example/
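A minimal sketch of the queue-based pattern that tutorial describes (TF 1.x API). The file paths, labels, and image size below are placeholders, not the asker's actual data:

```python
import tensorflow as tf

# Hypothetical file list and labels; in practice these would come from a CSV
# or be built with os.listdir rather than hard-coded.
image_paths = ["data/img_0001.jpg", "data/img_0002.jpg"]
labels = [0, 1]

# slice_input_producer hands out one (path, label) pair at a time, so the
# images themselves are never all held in memory.
path, label = tf.train.slice_input_producer(
    [tf.constant(image_paths), tf.constant(labels)], shuffle=True)

# Read and decode a single image only when it is needed.
raw = tf.read_file(path)
image = tf.image.decode_jpeg(raw, channels=3)
image = tf.image.resize_images(image, [224, 224])  # fixed shape so batching works

image_batch, label_batch = tf.train.batch(
    [image, label], batch_size=8, num_threads=2, capacity=64)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    imgs, lbls = sess.run([image_batch, label_batch])
    coord.request_stop()
    coord.join(threads)
```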
Thanks for the response @aditbiswas1. If my current dataset directory structure is like this: should I make the CSV file containing the paths to the datasets first, or is there another way to deal with that kind of problem?
Hmm, yeah, using a CSV sounds like a reasonable way to solve the problem 👍
Is there any tutorial you know of for creating a CSV out of it? Sorry, I'm new at this @aditbiswas1
Hi @aditbiswas1, I've learned how to do things from the tutorial you gave, but I was wondering: if I wanted to resize my images first, in which step can I do it? This is the tutorial, by the way: https://ischlag.github.io/2016/06/19/tensorflow-input-pipeline-example/
Oh hi, sorry, I didn't see the previous notification. You can create a CSV in multiple ways; I would normally build a Python list of all the file names and then dump it into a CSV using the built-in csv module. You'll most likely just want to write some loops to do this, and you can get a list of the files present in a directory with os.listdir("directory"). Regarding resizing of the images: this can be part of your preprocessing, right after you've decoded the image file. An example of this can be found in the docs here.
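In case it helps, here is a rough sketch of that CSV-building loop, assuming a layout of one sub-directory per class (the directory and file names are placeholders):

```python
import csv
import os

dataset_dir = "dataset"  # hypothetical root folder, one sub-directory per class
rows = []
for label, class_name in enumerate(sorted(os.listdir(dataset_dir))):
    class_dir = os.path.join(dataset_dir, class_name)
    if not os.path.isdir(class_dir):
        continue
    for filename in sorted(os.listdir(class_dir)):
        rows.append([os.path.join(class_dir, filename), label])

with open("train_labels.csv", "w") as f:
    csv.writer(f).writerows(rows)
```

And the resize would slot in right after the decode step of the pipeline, for example:

```python
image = tf.image.decode_jpeg(raw, channels=3)
image = tf.image.resize_images(image, [128, 128])  # target size is an arbitrary choice
```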
Since you also wanted to do things like grouping by labels, you probably want to look at something like pandas, which is super useful for manipulating datasets: http://pandas.pydata.org/pandas-docs/version/0.13.1/index.html
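For the grouping, a tiny pandas sketch; the column names are placeholders and assume the headerless CSV from the loop above:

```python
import pandas as pd

df = pd.read_csv("train_labels.csv", names=["path", "label"])
print(df.groupby("label").size())            # number of images per class
class0_paths = df[df["label"] == 0]["path"]  # all paths for one class
```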
Hi @aditbiswas1, I've managed to get to the training step, but at a certain epoch the training stops. It says there is an Out of Range error because of the FIFO batch. Here is the error:

OutOfRangeError (see above for traceback): FIFOQueue '_3_batch/fifo_queue' is closed and has insufficient elements (requested 8, current size 5)

I'm following the tutorial you gave, but I uncommented the num_threads line in tf.train.batch.
Hmm, looks like you are trying to read 8 elements but the batch size has been declared at 5. Are you getting this after some iterations, or does the error come immediately? If it comes immediately then some variable got configured incorrectly, but if you managed to run a few iterations it means the last iteration caused the issue because the number of training examples is not a multiple of 8. I think you could declare the batch size as None and possibly solve the issue; will look into it.
Yes, I managed to reach 600-ish iterations before I got the error. I set the batch size to 8, and I also suspect that the last batch is not divisible by 8, but I've already used this line of code: allow_smaller_final_batch=True
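For reference, a sketch of how those options sit together in tf.train.batch (TF 1.x). The tensor names come from the earlier sketch and are hypothetical, and whether allow_smaller_final_batch alone resolves the error depends on why the queue is closing (for example a num_epochs limit on the input producer), so treat this as something to verify rather than a confirmed fix:

```python
image_batch, label_batch = tf.train.batch(
    [image, label],                    # single-example tensors from the pipeline
    batch_size=8,
    num_threads=2,                     # the num_threads line from the tutorial
    capacity=64,
    allow_smaller_final_batch=True)    # last batch may hold fewer than 8 examples

# If the input producer was built with a num_epochs limit, the queue closes after
# that many passes over the data, and draining it raises OutOfRangeError; the usual
# pattern is to treat that as the end-of-data signal (coord, sess, and train_op
# here are the objects from the standard queue-runner training loop):
try:
    while not coord.should_stop():
        sess.run(train_op)
except tf.errors.OutOfRangeError:
    coord.request_stop()
```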
I'm currently trying to generate a dataset based on your tutorial, but I was wondering how to deal with huge data (10 GB of images). My laptop can't handle that amount of data (because the tutorial says we need to store the data in an array variable first). Is there any way to handle this? Thanks