Hello,
I am looking for some guidance on how to replicate, using the Python API, the Indigo fingerprint binary string representation that the KNIME Indigo fingerprint component produces. I took a SMILES string for Bisphenol A (from https://comptox.epa.gov/dashboard/chemical/details/DTXSID7020182) and created a similarity fingerprint with 'similarity part size in qwords' set to 8, which is the default option in the KNIME component. Using the Expand BitVector node, I could generate the corresponding matrix with 512 columns, matching the bit vector length of 512 (8 qwords x 64 bits). If I count the 1s, the number of 'on bits' for my substance is 26. So far so good.
What I really want to do, however, is replicate this using the Python API. Here I instantiated Indigo() and set my option as indigo.setOption('fp-sim-qwords', 8), on the assumption that this is the same setting as in the corresponding KNIME component. If I generate the fingerprint using mol1.fingerprint('sim') for the same molecule, fp1.countBits() reports 32 on bits, which is obviously different from the 26 found in KNIME. To understand why, I looked at how I could derive the corresponding fingerprint as a binary array. The hex string generated by fp1.toString() seemed like my best option, but I observed that it had a different length than expected. I assumed that the length should have been 512 bits based on the KNIME node, so I truncated the hex string like this: binary_representation = format(int(hex_string[:128], 16), '0512b'), which produced a bitstring closer to what I was expecting. This gives the same number of on bits as fp1.countBits(), but it still does not agree with what the KNIME node produced.
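For reference, the hex-to-bitstring step described above can be written without hard-coding the 512-bit length. This is only a minimal sketch in plain Python: the helper names `hex_to_bits` and `count_on_bits` and the sample hex value are hypothetical, and it assumes the fingerprint is available as a plain hex string (for example the output of fp1.toString()). It does not explain the 26 vs 32 discrepancy; it only makes the conversion reproducible so the two bitstrings can be compared position by position.

```python
def hex_to_bits(hex_string: str) -> str:
    """Expand a hex string into a bit string, 4 bits per hex digit.

    Leading zeros are preserved, so the result always has
    4 * len(hex_string) characters.
    """
    hex_string = hex_string.strip()
    if not hex_string:
        return ""
    n_bits = 4 * len(hex_string)
    return format(int(hex_string, 16), f"0{n_bits}b")


def count_on_bits(hex_string: str) -> int:
    """Count the '1' bits in the expanded bit string."""
    return hex_to_bits(hex_string).count("1")


# Hypothetical sample value, NOT a real Indigo fingerprint.
example_hex = "00ff10a3"
bits = hex_to_bits(example_hex)
print(len(bits), count_on_bits(example_hex))
```

Printing len(bits) for the real hex string shows how many bits the string actually encodes, which is worth checking before truncating to 128 hex digits. Note also that the order of bits within each byte may differ between tools; a different ordering would shift the positions of the on bits but would not change their count.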
Can anyone help clarify how to translate the settings of the KNIME Indigo fingerprint component so that I can reproduce the same bitstring representation using the API? I have an application where we want to calculate pairwise similarities but also need to store the fingerprints. Any pointers would be really helpful. I have been reading through the documentation and searching for related issues/discussions but have not found any guidance so far. In case it helps, here is the csv file I produced from KNIME together with the SMILES strings for the compounds I was looking at.
Bisphenol A-Similar Compounds.csv
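To illustrate the storage use case mentioned above, here is a minimal sketch of computing a Tanimoto similarity directly from two stored hex strings. The function name `tanimoto_from_hex` is hypothetical, the inputs are made-up values rather than real Indigo output, and the sketch assumes both fingerprints were generated with the same settings so that their hex strings have equal length.

```python
def tanimoto_from_hex(hex_a: str, hex_b: str) -> float:
    """Tanimoto coefficient of two equal-length hex-encoded fingerprints.

    Tanimoto = |A AND B| / |A OR B|, counted over the on bits.
    Returns 0.0 when both fingerprints have no on bits.
    """
    if len(hex_a) != len(hex_b):
        raise ValueError("fingerprints must have the same length")
    a = int(hex_a, 16)
    b = int(hex_b, 16)
    common = bin(a & b).count("1")  # bits set in both fingerprints
    union = bin(a | b).count("1")   # bits set in at least one fingerprint
    return common / union if union else 0.0


# Made-up example values, NOT real fingerprints.
print(tanimoto_from_hex("0f", "03"))
```

Because the coefficient depends only on which bit positions are shared, the result is unaffected by bit ordering as long as both stored fingerprints use the same convention.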