A tool for finding identifiers in files. Supports recursive globbing and arbitrarily complex boolean queries.
- Find all files that contain `cats`, `dogs`, or both:

  `$ idgrep -- 'cats | dogs'`

- Find all files that contain either `fish` but neither `cats` nor `dogs`, or `cats` or `dogs` but not `fish`:

  `$ idgrep -- 'fish ^ (cats | dogs)'`

- List the identifier count for every small file (< 200 bytes):

  `$ idgrep -l 200 --file-id-count`

- List all identifiers beginning with `a` or `A`, sorted by size:

  `$ idgrep -i --sort-by-size -- 'a*'`

- Search all Python (`.py`) files in the current directory, without descending into subdirectories, for `cat`:

  `$ idgrep -p '*.py' -- cat`

- Search all Python (`.py`) files in the current directory and its subdirectories for anything other than `cat`:

  `$ idgrep -p '**/*.py' -- '~cat'`

- Find files with identifiers containing `data`, grouped by identifier:

  `$ idgrep --group-by-id -- '*data*'`

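Note that in many shells the following characters need escaping: `!`, `(`, `)`, `&`, `|`, `*`, `?`. The easiest option is to put the entire query within single (`'`) or double (`"`) quotation marks. Single quotes are the safer choice, since some shells (for example interactive bash) still perform `!` history expansion inside double quotes.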
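When the query is assembled by a script rather than typed by hand, Python's standard `shlex.quote` can add the single quotes automatically. The sketch below is a hypothetical helper usage that only builds and re-splits the command string; it does not run idgrep.

```python
import shlex

# Hypothetical example: build an idgrep command line from Python.
query = "fish ^ (cats | dogs)"
command = f"idgrep -- {shlex.quote(query)}"
print(command)               # idgrep -- 'fish ^ (cats | dogs)'

# Splitting it the way a POSIX shell would yields a single query argument.
print(shlex.split(command))  # ['idgrep', '--', 'fish ^ (cats | dogs)']
```

If the command is instead run with `subprocess.run` and an argument list rather than a shell string, no shell quoting is needed at all.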
usage: idgrep [-p FILE_PATTERN] [-l LIMIT_SIZE] [-d LIMIT_IDENTIFIERS] [-i]
[--help | --file-id-count] [--group-by-id]
[--sort-by-name | --sort-by-count | --sort-by-size]
[--ascending | --descending] [paths ...] -- query
positional arguments:
[paths] -- query
options:
-p FILE_PATTERN, --file-pattern FILE_PATTERN
-l LIMIT_SIZE, --limit-size LIMIT_SIZE
-d LIMIT_IDENTIFIERS, --limit-identifiers LIMIT_IDENTIFIERS
-i
--help
--file-id-count
--group-by-id
--sort-by-name
--sort-by-count
--sort-by-size
--ascending, --asc
--descending, --desc
`-p`, `--file-pattern`: The file pattern is processed using the `glob` method of pathlib's `Path`.¹
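The snippet below illustrates plain `Path.glob` behaviour for the two patterns used in the examples; it is not idgrep's own code, and `find_files` is a hypothetical helper. `*.py` matches only the top directory, while `**/*.py` also descends into subdirectories.

```python
import tempfile
from pathlib import Path


def find_files(root: Path, pattern: str) -> list[str]:
    """Return files matched by Path.glob, relative to root, in POSIX form."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())


# Build a small throwaway tree to show the difference between the two patterns.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    for name in ("top.py", "pkg/inner.py", "notes.txt"):
        (root / name).write_text("cat = 1\n")

    print(find_files(root, "*.py"))     # ['top.py']
    print(find_files(root, "**/*.py"))  # ['pkg/inner.py', 'top.py']
```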
`-l`, `--limit-size`: Sets a maximum file size; larger files are skipped. This limit is enabled by default and set to one megabyte (`1M`, i.e. 2²⁰ bytes).
The following suffixes are recognized:
| Suffix | Size |
|---|---|
| `k` | 2¹⁰ bytes |
| `m` | 2²⁰ bytes |
| `g` | 2³⁰ bytes |
| `t` | 2⁴⁰ bytes |
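As a minimal sketch of how the size suffixes in the table map to byte counts, here is a hypothetical `parse_size` helper. It assumes suffixes are case-insensitive (the default is written `1M` while the table lists `m`); idgrep's actual parser may differ.

```python
# Binary multipliers from the table above.
SIZE_SUFFIXES = {"k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}


def parse_size(text: str) -> int:
    """Convert a size such as '200', '4k' or '1M' to a number of bytes."""
    text = text.strip().lower()  # assumption: suffixes are case-insensitive
    if text and text[-1] in SIZE_SUFFIXES:
        return int(text[:-1]) * SIZE_SUFFIXES[text[-1]]
    return int(text)


print(parse_size("200"))  # 200
print(parse_size("4k"))   # 4096
print(parse_size("1M"))   # 1048576
```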
`-d`, `--limit-identifiers`: Sets the maximum number of identifiers a file may contain; files with more identifiers are skipped. This limit is enabled by default and set to 1k (1,000 identifiers).
The following suffixes are recognized:
| Suffix | Count |
|---|---|
| `k` | 10³ identifiers |
| `m` | 10⁶ identifiers |
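Unlike the size suffixes, the count suffixes are decimal. A hypothetical `parse_count` helper mirroring this table might look like the following, again assuming case-insensitive suffixes.

```python
# Decimal multipliers, unlike the binary ones accepted by --limit-size.
COUNT_SUFFIXES = {"k": 10**3, "m": 10**6}


def parse_count(text: str) -> int:
    """Convert a count such as '500', '1k' or '2m' to a number of identifiers."""
    text = text.strip().lower()  # assumption: suffixes are case-insensitive
    if text and text[-1] in COUNT_SUFFIXES:
        return int(text[:-1]) * COUNT_SUFFIXES[text[-1]]
    return int(text)


print(parse_count("1k"))  # 1000 (the default limit)
print(parse_count("2m"))  # 2000000
```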
`-i`: Ignore case when matching.
`--help`: Show the help output reproduced above.
`--file-id-count`: Count the number of identifiers in each matching file. Any query given is ignored.
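This README does not define what counts as an identifier or whether repeated names are counted once. The sketch below assumes a C/Python-style word (`[A-Za-z_][A-Za-z0-9_]*`) and counts distinct names; both choices are assumptions, and `identifier_count` is a hypothetical helper.

```python
import re

# Assumption: an identifier is a C/Python-style word; idgrep's definition may differ.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifier_count(text: str) -> int:
    """Number of distinct identifiers in text (keywords are not excluded)."""
    return len(set(IDENTIFIER.findall(text)))


source = "def add(a, b):\n    return a + b\n"
print(identifier_count(source))  # 5 (def, add, a, b, return)
```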
`--group-by-id`: Group output by identifier. A file that matches multiple identifiers is listed once per identifier.
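Grouping by identifier amounts to inverting a mapping from files to matched identifiers. The minimal sketch below uses hypothetical match results for the query `*data*`; it shows why `load.py` appears under two identifiers. The sorted output order is an assumption.

```python
from collections import defaultdict


def group_by_identifier(matches: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert {file: matched identifiers} into {identifier: files}."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, identifiers in matches.items():
        for ident in identifiers:
            groups[ident].append(path)
    # Sort identifiers and files for stable, readable output.
    return {ident: sorted(paths) for ident, paths in sorted(groups.items())}


# Hypothetical match results for the query '*data*'.
matches = {
    "load.py": {"data", "raw_data"},
    "save.py": {"data"},
}
for ident, paths in group_by_identifier(matches).items():
    print(ident, paths)
# data ['load.py', 'save.py']
# raw_data ['load.py']
```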
`--sort-by-name`, `--sort-by-count`, `--sort-by-size`: Select the sort key for the output.
`--ascending`/`--asc`, `--descending`/`--desc`: Select the sort order.
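The sort key and sort order are independent choices, which maps naturally onto Python's `sorted` with a `key` function and `reverse` flag. In this sketch, `Result`, `sort_results`, and the name/ascending defaults are hypothetical. idgrep's internal representation and defaults are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Result:   # hypothetical output record
    name: str   # file path
    count: int  # number of identifiers
    size: int   # file size in bytes


SORT_KEYS = {
    "name": lambda r: r.name,
    "count": lambda r: r.count,
    "size": lambda r: r.size,
}


def sort_results(results: list[Result], key: str = "name", descending: bool = False) -> list[Result]:
    """Sort by the selected key; descending maps to sorted(..., reverse=True)."""
    return sorted(results, key=SORT_KEYS[key], reverse=descending)


results = [Result("b.py", 12, 900), Result("a.py", 40, 120), Result("c.py", 5, 300)]
print([r.name for r in sort_results(results, "size")])                    # ['a.py', 'c.py', 'b.py']
print([r.name for r in sort_results(results, "count", descending=True)])  # ['a.py', 'b.py', 'c.py']
```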
`paths`: The paths to search in. Excluding paths is not supported yet but is planned for a later version.
`query`: The query comes last, after a double dash (`--`) that separates the path arguments from the query. To pass a file literally named `--`, write it as `./--`.
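A minimal sketch of the separation rule, assuming only the first `--` acts as the separator: `split_at_double_dash` is a hypothetical helper, not idgrep's parser. It shows why `./--` works as a file name, since it is an ordinary path that never equals `--`.

```python
def split_at_double_dash(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split argv at the first '--': paths/options before it, query after it."""
    if "--" not in argv:
        return argv, []
    i = argv.index("--")
    return argv[:i], argv[i + 1:]


# A file literally named '--' is written as './--' so it is not taken as the separator.
print(split_at_double_dash(["src", "./--", "--", "cat | dog"]))
# (['src', './--'], ['cat | dog'])
print(split_at_double_dash(["-p", "*.py", "--", "cat"]))
# (['-p', '*.py'], ['cat'])
```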
- `~identifier`: Do not match this identifier (logical not).
- `!identifier`: Do not match this identifier (logical not); equivalent to `~identifier`.
- `expr_1 & expr_2`: Match both `expr_1` and `expr_2` (and).
- `expr_1 | expr_2`: Match `expr_1`, `expr_2`, or both (inclusive or).
- `expr_1 ^ expr_2`: Match either `expr_1` or `expr_2`, but not both (exclusive or).
- `(subexpression)`: Use parentheses to define subexpressions. This is useful for queries such as `(fish | birds) & (chips | bees)`.
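The operators above can be evaluated with a small recursive-descent parser. The sketch below is an illustration under stated assumptions, not idgrep's implementation. Operator precedence is not documented here, so it assumes Python-style binding: `~`/`!` binds tightest, then `&`, then `^`, then `|`. It also assumes that an identifier pattern matches a file if *any* of the file's identifiers matches it, with `*` and `?` as `fnmatch`-style wildcards. `tokenize` and `evaluate` are hypothetical names.

```python
import re
from fnmatch import fnmatchcase

# An operator, a parenthesis, or an identifier pattern (letters, digits, _, * and ?).
TOKEN = re.compile(r"\s*(?:([~!&|^()])|([A-Za-z0-9_*?]+))")

# Binary operators from loosest to tightest binding (assumed, Python-style).
BINARY_OPS = [
    ("|", lambda a, b: a or b),   # inclusive or
    ("^", lambda a, b: a != b),   # exclusive or
    ("&", lambda a, b: a and b),  # and
]


def tokenize(query: str) -> list[str]:
    tokens, pos, query = [], 0, query.rstrip()
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}: {query[pos:]!r}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


def evaluate(query: str, identifiers: set[str], ignore_case: bool = False) -> bool:
    """Return True if a file containing `identifiers` satisfies `query`."""
    tokens = tokenize(query)
    pos = 0
    if ignore_case:
        identifiers = {ident.lower() for ident in identifiers}

    def next_token() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def unary() -> bool:
        tok = next_token()
        if tok in ("~", "!"):  # logical not
            return not unary()
        if tok == "(":         # parenthesised subexpression
            value = binary(0)
            if next_token() != ")":
                raise ValueError("expected ')'")
            return value
        if tok in ("&", "|", "^", ")"):
            raise ValueError(f"unexpected {tok!r}")
        # Assumption: a pattern matches if any identifier in the file matches it.
        pattern = tok.lower() if ignore_case else tok
        return any(fnmatchcase(ident, pattern) for ident in identifiers)

    def binary(level: int) -> bool:
        if level == len(BINARY_OPS):
            return unary()
        symbol, combine = BINARY_OPS[level]
        value = binary(level + 1)
        while pos < len(tokens) and tokens[pos] == symbol:
            next_token()
            value = combine(value, binary(level + 1))
        return value

    result = binary(0)
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result


print(evaluate("cats | dogs", {"cats", "mice"}))                        # True
print(evaluate("fish ^ (cats | dogs)", {"fish", "dogs"}))               # False
print(evaluate("~cat", {"dog"}))                                        # True
print(evaluate("a*", {"Apple"}, ignore_case=True))                      # True
print(evaluate("(fish | birds) & (chips | bees)", {"birds", "chips"}))  # True
```

Both operands of each binary operator are always parsed before being combined, so the parser never skips tokens the way Python's short-circuiting `or`/`and` would if applied directly to unparsed input.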
Footnotes

1. See https://docs.python.org/3/library/pathlib.html?highlight=path#pathlib.Path.glob for more information.