Using my own dataset to run prediction #3

Closed
ohmycaptainnemo opened this issue Nov 26, 2020 · 8 comments

Comments

@ohmycaptainnemo

Hi,

Do I need to convert my own images to .hdf5 format before I can make a prediction on them?
What structure should my data have?

@him4318
Owner

him4318 commented Nov 26, 2020

Hi @nimanamjouyan

It's up to you, but you can follow the pre-processing steps (normalizing, removing cursive style) used for the images in the dataset, as they help the model learn better. The HDF5 format was only used to keep the images in one place; you can store yours as flat files and write a custom data loader in PyTorch that performs all the pre-processing steps on each image as it is yielded.
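
A minimal sketch of the flat-file approach described above, assuming a hypothetical `FlatImageDataset` class and a caller-supplied `preprocess` callable (neither is part of this repository). Any object with `__len__` and `__getitem__` can be passed to `torch.utils.data.DataLoader` as a map-style dataset, so the class itself needs no PyTorch import:

```python
from pathlib import Path
from typing import Any, Callable, List, Tuple

# Hypothetical set of extensions; adjust to match your files.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


class FlatImageDataset:
    """Map-style dataset over image files stored directly in one folder.

    `preprocess` is an assumption: it stands in for whatever function applies
    the same pre-processing the model was trained with.
    """

    def __init__(self, folder: str, preprocess: Callable[[str], Any]) -> None:
        self.paths: List[Path] = sorted(
            p for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        self.preprocess = preprocess

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Tuple[Any, str]:
        path = self.paths[index]
        # Pre-processing happens lazily, one image at a time, when indexed.
        return self.preprocess(str(path)), path.name
```

It could then be wrapped as `torch.utils.data.DataLoader(FlatImageDataset(folder, preprocess), batch_size=1, shuffle=False)`, provided `preprocess` returns something the default collate function can batch, such as a tensor or NumPy array.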

@ohmycaptainnemo
Author

ohmycaptainnemo commented Nov 26, 2020

@him4318

Thank you for that.

I have stored my image files inside a folder and slightly changed one of the cells in your notebook to use my images for prediction:
I changed this cell:

test_loader = torch.utils.data.DataLoader(DataGenerator(source_path,charset_base,max_text_length,'test',transform), batch_size=1, shuffle=False, num_workers=2)

predicts, gt, imgs = test(model, test_loader, max_text_length)

predicts = list(map(lambda x : x.replace('SOS','').replace('EOS',''),predicts))
gt = list(map(lambda x : x.replace('SOS','').replace('EOS',''),gt))

to this:

import torch
import torchvision.transforms as T
from torchvision import datasets

device = torch.device("cuda")
transform = T.Compose([
    # T.ToPILImage(),
    T.Resize((1024, 128)),
    T.ToTensor()])

# my images are inside a folder called data inside the mydata folder
test_dataset = datasets.ImageFolder('/content/mydata/', transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=2)
predicts, gt, imgs = test(model, test_loader, max_text_length)

predicts = list(map(lambda x: x.replace('SOS', '').replace('EOS', ''), predicts))
gt = list(map(lambda x: x.replace('SOS', '').replace('EOS', ''), gt))

However, not only do my images show up as all black, I also do not get any useful predictions. Am I doing something wrong here?

@him4318
Owner

him4318 commented Nov 26, 2020

If you are using the trained model I provided, the images should be in the same format, i.e. the same pre-processing I used must be applied to them to get accurate results.

You can use the CLI provided in the code to run a prediction for an image. If you go through the code, you will see the steps needed to convert an image to the required format.

@ohmycaptainnemo
Author

Thank you Himanshu,

That code was extremely helpful.
Based on the code you pointed me to, I ended up running the following segment in the notebook (using your pretrained weights):

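import os
import string

import cv2
import numpy as np
import torch
import torchvision.transforms as T
from google.colab.patches import cv2_imshow
from data import preproc as pp

# Tokenizer, make_model, single_image_inference and device are assumed
# to be defined in earlier notebook cells.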

input_size = (1024, 128, 1)
max_text_length = 256
charset_base = string.printable[:95]
tokenizer = Tokenizer(chars=charset_base, max_text_length=max_text_length)

path_2_im = '/content/data/3.PNG'
target_path = '/content/Transformer_ocr/src/resnet_best.pt'


img = pp.preprocess(path_2_im, input_size=input_size)


# making the image compatible with ResNet (grayscale -> 3 channels)
img = np.repeat(img[..., np.newaxis], 3, -1)
x_test = pp.normalization(img)


# model = make_model(tokenizer.vocab_size, hidden_dim=256, nheads=4,
#           num_encoder_layers=4, num_decoder_layers=4)
# device = torch.device(device)

model = make_model(vocab_len=100)
_=model.to(device)

transform = T.Compose([T.ToTensor()])

if os.path.exists(target_path):
    model.load_state_dict(torch.load(target_path))
else:
    print('No model checkpoint found')

prediction = single_image_inference(model, x_test, tokenizer, transform, device)

print("\n####################################")
print("predicted text is: {}".format(prediction))
cv2_imshow(cv2.imread(path_2_im))
print("\n####################################")
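
For reference, the `charset_base = string.printable[:95]` line in the cell above selects the 95 printable ASCII characters up to and including the space, leaving out tab, newline, and the other control whitespace characters. A quick check using only the standard library:

```python
import string

charset_base = string.printable[:95]

# string.printable is digits + ascii_letters + punctuation + whitespace,
# and string.whitespace starts with a plain space.
expected = string.digits + string.ascii_letters + string.punctuation + " "

print(len(charset_base))            # 95
print(charset_base == expected)     # True
print(repr(string.printable[95:]))  # the excluded control whitespace characters
```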

I used one of your images and I got this:

[image: Capture]

The outcome is very different from yours in your notebook.
More importantly, I noticed something interesting. The output of

img = pp.preprocess(path_2_im, input_size=input_size)

looks like this:

[image]

which is strange. It seems the image has been rotated to a vertical orientation for some reason.
I also tried a number of other images and still had no luck.

@him4318
Owner

him4318 commented Nov 27, 2020

Hi @nimanamjouyan

Please check the model path in model.load_state_dict(torch.load(target_path)), as you are getting just random output. I checked on my end and it is working fine.
The image only looks like this after pre-processing. That is nothing to worry about, as we only extract features from it with the ResNet.
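
To illustrate why the pre-processed image looks rotated, here is a minimal NumPy sketch. It assumes (this is not confirmed by the code shown in this thread) that `pp.preprocess` returns the image in width-by-height order to match `input_size = (1024, 128, 1)`, which amounts to a transpose of the usual height-by-width array. The dummy array stands in for a real text line image:

```python
import numpy as np

# Dummy grayscale text-line image in the usual (height, width) layout.
height, width = 128, 1024
line_image = np.zeros((height, width), dtype=np.uint8)
line_image[:, :10] = 255  # bright stripe along the left edge

# Assumed pre-processing layout: (width, height), i.e. a transpose.
transposed = line_image.T
print(transposed.shape)  # (1024, 128)

# The stripe that ran down the left edge now runs along the top rows,
# so an image viewer shows the text line standing on its side.
assert (transposed[:10, :] == 255).all()

# Same channel expansion as in the earlier cell: grayscale -> 3 channels.
three_channel = np.repeat(transposed[..., np.newaxis], 3, axis=-1)
print(three_channel.shape)  # (1024, 128, 3)
```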

@ohmycaptainnemo
Author

Hi @him4318

Thank you.
The path is definitely correct and the model exists there; otherwise this if statement would tell me that it does not:

if os.path.exists(target_path):
    model.load_state_dict(torch.load(target_path))            
else:            
    print('No model checkpoint found')

Thank you for clarifying the preprocessing functions.
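
Note that `os.path.exists` only confirms that something exists at `target_path`; it does not show that the file is a complete checkpoint. As a hedged sketch, a hypothetical helper (`describe_file`, not part of this repository) can report the file size and SHA-256 digest so they can be compared against the original download:

```python
import hashlib
import os


def describe_file(path: str, chunk_size: int = 1 << 20) -> dict:
    """Return the size in bytes and SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {"size_bytes": os.path.getsize(path), "sha256": digest.hexdigest()}
```

For example, if `describe_file(target_path)` reported a size of only a few hundred bytes, that would suggest a placeholder or truncated file rather than real weights.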

@him4318
Owner

him4318 commented Nov 27, 2020

Hi @nimanamjouyan

I tried the same steps in the notebook and the result is fine.

[image]

him4318 pinned this issue Nov 28, 2020

@ohmycaptainnemo
Author

Hi @him4318

I get the same result as you with that image. That is interesting.

Thank you.

him4318 closed this as completed Dec 28, 2020
2 participants