add the ability to load a pretrained net? #3

Open
ianni67 opened this issue Feb 18, 2017 · 1 comment

Comments


ianni67 commented Feb 18, 2017

Though I fully trust Dmitry and believe in his claim that a random cnn is as good as a pretrained net in detecting and extracting texture features (the "style"), I would really appreciate the possibility of testing some pretrained net for extracting the "content" features.
While experimenting with this lovely software, I found that its ability to discriminate the content structure in "content" sound files does not appear as accurate as in the examples provided elsewhere for the image style transfer case. In particular, it seems that too much of the style remains in the content, which is perhaps the cause of the high dominance of some audio files when combined with others.
I noted that the best combinations (i.e., where the "content" audio imposes only its structure and the "style" audio contributes its own texture) are produced when the spectra of the two audios share most of their frequencies but the "style" has less structure or, in other words, less evident "beats". This would correspond, in images, to the "style" image having mostly the same spectrum as the "content" one but featuring weaker and shorter edges. The output audio, in this case, resembles an "envelope" taken from the "content" audio modulating the amplitude of the "style" audio.
On the other hand, when the "style" audio lies in a mostly different region of the frequency spectrum (e.g., higher frequencies) than the "content", the two audios get mixed (their spectra appear to be merged) and both are almost equally present in the output, producing very confusing results in most cases.
I can provide some examples, but I guess anyone can figure out what I'm trying to explain, by testing on the available audio samples.
Looking at the results produced by applying style transfer to images, I would expect a different behavior, where the style (i.e., the texture) of the "style" image almost completely substitutes for the texture of the "content" image. I suspect that some more investigation might be needed into the selection of the most suitable net for content feature extraction, and I would therefore love some hints about how to load and use a pretrained network.
Sorry for the long message.
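The frequency-overlap observation above can be made concrete with a small numeric sketch. This is purely illustrative and assumes nothing about the repo's code: it uses NumPy with toy sine tones standing in for real audio, builds a coarse band-energy profile of each signal's magnitude spectrum, and compares profiles by cosine similarity. High overlap corresponds to the "shared frequencies" case described above; low overlap to the "different region of the spectrum" case.

```python
import numpy as np

def band_profile(x, n_bands=32):
    """Coarse spectral profile: |rFFT| magnitudes summed over equal-width bands,
    normalized to unit length."""
    mag = np.abs(np.fft.rfft(x))
    bands = np.array([chunk.sum() for chunk in np.array_split(mag, n_bands)])
    return bands / np.linalg.norm(bands)

def spectral_overlap(a, b):
    """Cosine similarity between the coarse spectral profiles of two signals."""
    return float(np.dot(band_profile(a), band_profile(b)))

sr = 16000
t = np.arange(sr) / sr                       # one second of "audio"
content    = np.sin(2 * np.pi * 220 * t)     # low-frequency "content" tone
style_near = np.sin(2 * np.pi * 230 * t)     # "style" in a nearby band
style_far  = np.sin(2 * np.pi * 4000 * t)    # "style" in a distant band

print(spectral_overlap(content, style_near))  # high: spectra share bands
print(spectral_overlap(content, style_far))   # low: disjoint bands
```

With real recordings one would replace the toy sines with loaded waveforms (and probably use a windowed STFT rather than a single FFT), but the overlap score already separates the two regimes described in the comment.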

@DmitryUlyanov
Owner

Hi Ianni, I am not an expert in TF, but the Torch code does offer the possibility of loading a pretrained model. I am sure it is easy to change the current TF code to work with pretrained models as well.
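A minimal sketch of what such a change could look like, kept framework-agnostic in NumPy rather than the repo's actual TensorFlow graph: the idea is simply to branch the kernel initialization between loading saved weights and the existing random draw. The weights file name, the "kernel" array key, and the kernel shape below are all assumptions for illustration, not part of the repository.

```python
import os
import tempfile
import numpy as np

def make_kernel(shape, weights_path=None, seed=0):
    """Build a conv kernel: load pretrained weights if a file is given,
    otherwise fall back to a random initialization (as in the random-CNN setup)."""
    if weights_path is not None:
        # Assumed format: an .npz archive holding the kernel under the key "kernel".
        return np.load(weights_path)["kernel"].astype(np.float32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape).astype(np.float32)

# Demo: save some "pretrained" weights, then reload them through the same path.
shape = (3, 257, 64)  # (filter width, freq bins, channels) -- illustrative only
pretrained = np.ones(shape, dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "pretrained_kernel.npz")
np.savez(path, kernel=pretrained)

k = make_kernel(shape, weights_path=path)
```

In the actual TF code the loaded array would be fed into the kernel variable (e.g., as its initial value) instead of the random draw, leaving the rest of the graph unchanged.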
