Skip to content

xlisp/ai-any-text-clusterer

Repository files navigation

AI any text clusterer & sorting

  • Why: When your log files are very large and there are so many of them, and you don't know where to start, you can use ai-any-text-clusterer to classify your logs or files, git logs and other texts, so that you can clearly see where the relevant information is.

Feature

  • File clustering (matching rules)
  • Filter markdown todos or other pattern
  • Git log clustering
  • Log clustering
  • File sorting
  • Visualization operate & process
  • Support GPT ask for search or sorting

Init

  • Setup python env
conda create -n ai-any-text-clusterer python=3.11
conda activate ai-any-text-clusterer
poetry install
ollama run nomic-embed-text

Usage

  • command
# function name: find_files_with_chinese_names, get_todo_items, get_git_log, get_pattern_items ...
# When there are a lot of content or files, it is recommended that the n_clusters value is larger, such as 20. When there are fewer files, the n_clusters value is recommended to be 5
python ai_any_text_clusterer.py <function_name> <index_file_name> <n_clusters> <work_path>
  • run filter markdown
$ prunp ai_any_text_clusterer.py get_pattern_items  get_pattern_items.index 20 /Users/clojure/Documents/my_markdown_notes "^.*(?:Breakthrough|Revolution).*$"
  • run git log clusterer
$ ai-any-text-clusterer  main @ poetry run python ai_any_text_clusterer.py get_git_log get_git_log.index 5 /Users/clojure/Desktop/ai-any-text-clusterer
Loading embeddings from FAISS index...
Group 1:
  [f9816067d95d20ba18cf3e8238ca3c0be252866e] Add git log
  [9712fbd2c4611230b666dd3a02ee0bd76d832c68] https://ollama.com/ install embed model
  [b519b481cf68833a116eb79e250a22b8ca02c2e5] Filter markdown todos
  [338e72e4e42eef9900eba7749dc88dee302d735e] Add visualization.gif

Group 2:
  [c23c23bfa72af712153fdd0596f19472acfb7f2b] Add Usage
  [99e49f6f0b165e5c6058ca9b7828170e9006ac73] Add why

Group 3:
  [631d8cd8c50a2989100a951c06e3bfb5cc6d8fea] Add function_name and index_file_name
  [7b26ecf07645522e40375779fd1e720a9f293a9a] add find_files_with_chinese_names
  [b1b515937cccfa5dff9b482e79bfa6dd14f96078] Add files classifier

Group 4:
  [c5bd5863e0ade7a558b1bf8eb8367416e7305df3] rename
  [31bef97ccda56f8b1989bc555ca4fd7a9d537f53] rename
  [55d4057cbd775beb910363e6902dbab534a65000] rename

Group 5:
  [c9f8e2351e0c456feb7c30754d261ba42505ec87] Add setup, use poetry

Visualization