[Feature Request] Manual select #4

Open
rohan-gt opened this issue Sep 9, 2020 · 10 comments

Comments

@rohan-gt

rohan-gt commented Sep 9, 2020

Hi, is it possible to add an option to manually select an issue if it's not scraped automatically? I'd rather have Comixology info on all my books than ComicVine info.

@SenorSmartyPants
Owner

There's no UI to manually select an issue. But you can edit the notes in ComicRack (or directly in ComicInfo.xml) and add the issue ID from comixology.com

https://www.comixology.com/Fantastic-Four-2018-1/digital-comic/687096

Copy the numbers at the end of the URL as shown above, and add them to the notes in this format:

[CMXDB687096]

Run the scraper again and it will use the id from the notes.
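
As a rough illustration of the notes format described above, the ID can be pulled from the end of a Comixology URL and wrapped in the tag. This is only a sketch: the function names `cmxdb_tag_from_url` and `cmxdb_id_from_notes` are hypothetical and are not part of the scraper, and it assumes the ID is the run of digits at the very end of the URL.

```python
import re


def cmxdb_tag_from_url(url):
    """Build a notes tag such as [CMXDB687096] from a Comixology issue URL."""
    # Assumption: the issue ID is the run of digits at the very end of the URL.
    match = re.search(r"/(\d+)/?$", url)
    if match is None:
        raise ValueError("no numeric ID at the end of URL: %s" % url)
    return "[CMXDB%s]" % match.group(1)


def cmxdb_id_from_notes(notes):
    """Return the ID from a [CMXDB...] tag in the notes, or None if absent."""
    match = re.search(r"\[CMXDB(\d+)\]", notes)
    return match.group(1) if match else None


url = "https://www.comixology.com/Fantastic-Four-2018-1/digital-comic/687096"
tag = cmxdb_tag_from_url(url)
print(tag)
```

The second helper shows how a tag could be read back out of free-form notes text; how the scraper itself reads the notes is not shown in this thread.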

If you run ComicRack with this option:
"C:\Program Files\ComicRack\ComicRack.exe" -ssc

It will display the script console. I'd be curious to see the output from a couple comics that aren't getting matched.

@rohan-gt
Author

@SenorSmartyPants how do I contact you? I have a few ideas and would like to contribute to this project if I can

@SenorSmartyPants
Owner

You're doing it.

@rohan-gt
Author

rohan-gt commented Sep 10, 2020

Okay @SenorSmartyPants some suggestions:

  1. It would be useful to download the entire Comixology metadata into a local database, with some filters like publisher to limit the data, and then simply fetch the data from it to populate the comic info
  2. I don't know how the fuzzy matching is done at the moment but I believe it is possible to improve the logic and match comics to a very high degree since we are only matching digital releases and they usually have clean names as opposed to scans
  3. It would be useful to have some kind of matching between collected editions and single issues. I believe this info is already available in Comixology. This info can be then used to detect duplicates, missing issues etc.

@SenorSmartyPants
Owner

Each one of these items should have been a separate issue. But here goes:

  1. I'm not going to scrape the entire comixology site. There is no API provided by comixology to download all the metadata for everything. Select your issues and download for each of them.
  2. If you have specific examples of issues not being found (with filenames provided) I would be interested to see them. Getting good search results is something I am having issues with, but mostly because of google's bot detection.
  3. This is not a library management tool. Try ComicRack for finding duplicates. If collected edition information is ever scraped, where would it be stored?

@rohan-gt
Author

  1. Ah, okay I thought there was an API similar to ComicVine
  2. I'll try to get some examples out
  3. By duplicates I meant that if you have both the trade paperback and the single issues it collects as separate files, it would be useful to point those out. Comixology actually has the single issue links under the TPB page, so if it's possible to store the IDs of the single issues within the TPB XML, you can reference them easily

@rohan-gt
Author

rohan-gt commented Sep 15, 2020

@SenorSmartyPants So I have files named:
The Books of Magic (1993) (Digital).cbr
Aquaman (2011-2016) Vol. 1 The Trench.cbr
which aren't scraped
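
For reference, here is a minimal sketch of how file names like the two above could be split into a title and a year before searching. It is not ComicRack's parser or the scraper's; `split_title_and_year` is a hypothetical helper, and it assumes the year appears in parentheses as `(1993)` or as a range like `(2011-2016)`.

```python
import os
import re


def split_title_and_year(filename):
    """Split a file name into (title, year); year is None if none is found."""
    stem = os.path.splitext(filename)[0]
    # Assumption: the first parenthesised group that starts with a 4-digit
    # year is the series year, e.g. "(1993)" or "(2011-2016)".
    match = re.search(r"\((\d{4})(?:-\d{4})?\)", stem)
    year = int(match.group(1)) if match else None
    # Drop every parenthesised group, then collapse the leftover whitespace.
    title = re.sub(r"\([^)]*\)", " ", stem)
    title = re.sub(r"\s+", " ", title).strip()
    return title, year


for name in ("The Books of Magic (1993) (Digital).cbr",
             "Aquaman (2011-2016) Vol. 1 The Trench.cbr"):
    print(split_title_and_year(name))
```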

@SenorSmartyPants
Owner

These look like graphic novels or trade paperbacks. The search is currently pretty specific to single issues. I'll see what I can do (assuming I'm not blocked by Google).

Are you scraping in comicrack or with the mylar version?

What's in the comicinfo.xml? If you are in ComicRack you can select a book and right click 'copy data' to get that info.

@rohan-gt
Author

rohan-gt commented Sep 16, 2020

@SenorSmartyPants yes they are TPBs. I'm using ComicRack. There's no info generated since I get a message saying 0 comics scraped, 1 skipped. It seems easy to implement. You just need to fuzzy match the name with a high percentage score (95%) along with the year if it is provided
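
The matching rule suggested above (a name similarity of at least 95%, plus the year when it is provided) could look roughly like this. This is a sketch under assumptions, not the scraper's current search logic: `is_match` is a hypothetical function, and `difflib.SequenceMatcher` from the Python standard library is just one possible way to score similarity.

```python
from difflib import SequenceMatcher


def is_match(local_title, candidate_title,
             local_year=None, candidate_year=None, threshold=0.95):
    """Return True if the titles are similar enough and the years agree."""
    # Case-insensitive similarity ratio between 0.0 and 1.0.
    score = SequenceMatcher(
        None, local_title.lower(), candidate_title.lower()
    ).ratio()
    if score < threshold:
        return False
    # Only compare years when both sides actually have one.
    if local_year is not None and candidate_year is not None:
        return local_year == candidate_year
    return True


print(is_match("The Books of Magic", "The Books of Magic", 1993, 1993))
print(is_match("The Books of Magic", "The Books of Magic", 1993, 1994))
```

A strict threshold like this only helps once the search returns candidate titles at all, which is the part the owner reports trouble with.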

@SenorSmartyPants
Owner

SenorSmartyPants commented Sep 16, 2020

ComicRack will parse the file name, so there are probably proposed values for these books at least. So I'd still like a copy data output. And console output, which you can get if you run CR with a shortcut like this:
"C:\Program Files\ComicRack\ComicRack.exe" -ssc

You're welcome to submit a pull request as well. But I won't merge it until I can test it not working (which is why I want the data I'm asking for).


2 participants