Incremental indexing #3

Open
junkblocker opened this issue Feb 3, 2017 · 2 comments

Comments

@junkblocker
Owner

From @cloudspeech on October 29, 2015 14:10

Great to discover today that there's an actively maintained fork of codesearch!

I am using codesearch already, and noticed that with lots of files reindexing is slow.

It would be great if one could tell the indexer to (re-)index a few files only and merge that efficiently into the existing index. Cursory inspection of the code tells me this should be doable.

A strong plus would be to read the file names - 1 per line - from a (named or regular) pipe, or else a regular file, and index those as soon as a new line becomes available.

Maybe an option --reindex-using <pipeOrFile>?

Copied from original issue: junkblocker/codesearch-pre-github#8

@junkblocker
Owner Author

Hi @cloudspeech, all of this, including filesystem / FIFO notification driven indexing, has been on my wishlist forever, but I currently do not have the kind of free time it would take to implement. I saw your request upstream and was hoping somebody would find the time.

@junkblocker
Owner Author

From @abingham on January 12, 2017 12:57

Incremental updating is my biggest missing feature for codesearch, and I'd love to see some support for it. I've looked over the code and - taking into account that I know almost nothing about Go - it looks like there might be a relatively simple way to support incremental indexing.

First, I noticed that cindex will happily take a path to a file as an argument. It will index that file and merge the results into the existing index. The only downside I can see is that this also adds the file to the list of "indexed paths" stored in the index file. This in itself isn't terrible, but it's redundant if you're just trying to incrementally index something that's already covered by another indexed path.

However, it appears that the two operations - indexing a file and adding it to the list of indexed paths - are effectively independent operations. That is, we could index a single file without adding that path to the list of indexed paths.

If I'm right, we could add a flag to cindex which says "index the provided path, but don't add it to the list of paths". Then, for incremental indexing, you could just pass that flag and the path to your file.

Is this workable? Is it wrong-headed?
