Explore using stack graphs for better code search / navigation / context / repo map / etc #742

0xdevalias · 2024-04-05T01:44:47Z

Not really a feature request per se, but just wanted to share some info/references/libs that I've been collating recently RE: stack graphs, and how they can be used for better 'code navigation awareness'/etc; may be useful/interesting for this project:

Stack Graphs (an evolution of Scope Graphs) sound like they could be really interesting/useful with regards to code navigation, symbol mapping, etc. Perhaps we could use them for module identification, or variable/function identifier naming stabilisation or similar?

https://github.blog/changelog/2024-03-14-precise-code-navigation-for-typescript-projects/

Precise code navigation is now available for all TypeScript repositories.
Precise code navigation gives more accurate results by only considering the set of classes, functions, and imported definitions that are visible at a given point in your code.

Precise code navigation is powered by the stack graphs framework.
You can read about how we use stack graphs for code navigation and visit the stack graphs definition for TypeScript to learn more.

https://github.blog/2021-12-09-introducing-stack-graphs/

Introducing stack graphs

Precise code navigation is powered by stack graphs, a new open source framework we’ve created that lets you define the name binding rules for a programming language using a declarative, domain-specific language (DSL). With stack graphs, we can generate code navigation data for a repository without requiring any configuration from the repository owner, and without tapping into a build process or other CI job.

LOTS of interesting stuff in this post..

As part of developing stack graphs, we’ve added a new graph construction language to Tree-sitter, which lets you construct arbitrary graph structures (including but not limited to stack graphs) from parsed CSTs. You use stanzas to define the gadget of graph nodes and edges that should be created for each occurrence of a Tree-sitter query, and how the newly created nodes and edges should connect to graph content that you’ve already created elsewhere.

https://github.com/tree-sitter/tree-sitter-graph

tree-sitter-graph
The tree-sitter-graph library defines a DSL for constructing arbitrary graph structures from source code that has been parsed using tree-sitter.

https://marketplace.visualstudio.com/items?itemName=tree-sitter.tree-sitter-graph

tree-sitter-graph support for VS Code
This language extension for VS Code provides syntax support for tree-sitter-graph files.

Why aren’t we using the Language Server Protocol (LSP) or Language Server Index Format (LSIF)?

To dig even deeper and learn more, I encourage you to check out my Strange Loop talk and the stack-graphs crate: our open source Rust implementation of these ideas.

https://github.com/github/stack-graphs

Stack graphs
The crates in this repository provide a Rust implementation of stack graphs, which allow you to define the name resolution rules for an arbitrary programming language in a way that is efficient, incremental, and does not need to tap into existing build or program analysis tools.

https://docs.rs/stack-graphs/latest/stack_graphs/

https://github.com/github/stack-graphs/tree/main/languages

This directory contains stack graphs definitions for specific languages.

https://github.com/github/stack-graphs/tree/main/languages/tree-sitter-stack-graphs-javascript

tree-sitter-stack-graphs definition for JavaScript
This project defines tree-sitter-stack-graphs rules for JavaScript using the tree-sitter-javascript grammar.

The command-line program for tree-sitter-stack-graphs-javascript lets you do stack graph based analysis and lookup from the command line.

cargo install --features cli tree-sitter-stack-graphs-javascript

tree-sitter-stack-graphs-javascript index SOURCE_DIR

tree-sitter-stack-graphs-javascript status SOURCE_DIR

tree-sitter-stack-graphs-javascript query definition SOURCE_PATH:LINE:COLUMN

https://github.com/github/stack-graphs/tree/main/languages/tree-sitter-stack-graphs-typescript

tree-sitter-stack-graphs definition for TypeScript
This project defines tree-sitter-stack-graphs rules for TypeScript using the tree-sitter-typescript grammar.

The command-line program for tree-sitter-stack-graphs-typescript lets you do stack graph based analysis and lookup from the command line.

https://dcreager.net/talks/2021-strange-loop/

Redirects to https://dcreager.net/talks/stack-graphs/

Incremental, zero-config Code Navigation using stack graphs.

In this talk I’ll describe stack graphs, which use a graphical notation to define the name binding rules for a programming language. They work equally well for dynamic languages like Python and JavaScript, and for static languages like Go and Java. Our solution is fast — processing most commits within seconds of us receiving your push. It does not require setting up a CI job, or tapping into a project-specific build process. And it is open-source, building on the tree-sitter project’s existing ecosystem of language tools.

https://www.youtube.com/watch?v=l2R1PTGcwrE

"Incremental, zero-config Code Nav using stack graphs" by Douglas Creager

https://media.dcreager.net/dcreager-strange-loop-2021-slides.pdf

https://media.dcreager.net/dcreager-2022-ucsc-lsd-slides.pdf

https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github

GitHub has developed two code navigation approaches based on the open source tree-sitter and stack-graphs library:

Search-based - searches all definitions and references across a repository to find entities with a given name

Precise - resolves definitions and references based on the set of classes, functions, and imported definitions at a given point in your code

To learn more about these approaches, see "Precise and search-based navigation."

https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation

Precise and search-based navigation
Certain languages supported by GitHub have access to precise code navigation, which uses an algorithm (based on the open source stack-graphs library) that resolves definitions and references based on the set of classes, functions, and imported definitions that are visible at any given point in your code. Other languages use search-based code navigation, which searches all definitions and references across a repository to find entities with a given name. Both strategies are effective at finding results and both make sure to avoid inappropriate results such as comments, but precise code navigation can give more accurate results, especially when a repository contains multiple methods or functions with the same name.

https://pl.ewi.tudelft.nl/research/projects/scope-graphs/

Scope Graphs | A Theory of Name Resolution

Scope graphs provide a new approach to defining the name binding rules of programming languages. A scope graph represents the name binding facts of a program using the basic concepts of declarations and reference associated with scopes that are connected by edges. Name resolution is defined by searching for paths from references to declarations in a scope graph. Scope graph diagrams provide an illuminating visual notation for explaining the bindings in programs.

Originally posted by @0xdevalias in 0xdevalias/chatgpt-source-watch#11

The text was updated successfully, but these errors were encountered:

foragerr · 2024-04-05T02:22:08Z

Would you know how Stack Graphs compare to Lossless Semantic Trees ?

0xdevalias · 2024-04-05T03:04:32Z

Would you know how Stack Graphs compare to Lossless Semantic Trees ?

@foragerr Not deeply/immediately.. but from skimming that page, some things that popped up in my mind while reading it:

It talks about LSTs being 'format-preserving' compared to ASTs (eg. not losing whitespace/etc). The stack-graphs lib above is implemented on top of tree-sitter, which creates Concrete Syntax Trees (CSTs) rather than ASTs, which also capture the full data in a 'lossless' way like these LSTs seem to:

https://tree-sitter.github.io/tree-sitter/
- Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited.
https://en.wikipedia.org/wiki/Parse_tree
https://en.wikipedia.org/wiki/Abstract_syntax_tree

That said, I don't believe the stack-graphs themselves are aiming to be CSTs, I think they are more just the higher level 'map' between the files/dependencies/symbols/etc.

Where it talks about 'type-attributed', it seems to basically be saying that instead of just a symbol name, there is a way to know where that symbol comes from/etc; that is something stack graphs inherently handle as well.

OpenRewrite's LST lifecycle suggests that a new LST will be created for each run, as nothing is stored between recipe runs. For a small repo, it may be quick enough to do this, but for a larger repo this may introduce a time penalty. One aspect that the stack-graphs implementation specifically had a goal of was to avoid needing to re-parse the entire repo whenever any file changed; and is designed in a way that only the relevant parts need to be re-parsed/etc.

Hopefully that's helpful. I haven't dug super deep into stack-graphs myself yet in a 'practically using them' sense, but so far they have seemed like they will be a super interesting/useful tool; and the fact that these underlying libs are backed by GitHub and used for their code search/etc feels like they should be pretty 'battle tested' at scale/across a large number of languages/etc.

github-actions · 2024-05-08T01:40:01Z

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

0xdevalias · 2024-05-11T02:23:45Z

Not sure stale is relevant to this?

0xdevalias · 2024-05-30T02:53:30Z

This seems to be aligned to how some other agents have chosen to go, eg.

aider.chat/2023/10/22/repomap.html

Building a better repository map with tree sitter

Where they saw it as an improvement on their older method:

aider.chat/2023/05/25/ctags.html

Improving GPT-4's codebase understanding with ctags

I understand it not being a current priority; but to discount the concept entirely (particularly without reasoning beyond seemingly personal opinion) seems counterintuitive to getting the best agent/outcome here.

Further to this, aider just set a new SOTA and topped the SWE-bench lite leaderboard with 26.3%. While all of that performance gain can't be attributed to just their smart code search/repo map'; I would happily bet that it helped it achieve it:

https://aider.chat/2024/05/22/swe-bench-lite.html

Aider scored 26.3% on the SWE Bench Lite benchmark, achieving a state-of-the-art result. The current top leaderboard entry is 20.3% from Amazon Q Developer Agent. The best result reported elsewhere seems to be 25% from OpenDevin.

SWE Bench Lite results for aider swe-bench/experiments#7

https://www.swebench.com/

It will be interesting to see if they end up exploring stack graphs directly, and if that improves their performance further again:

Explore whether stack graphs may be useful in this tool Aider-AI/aider#534

Originally posted by @0xdevalias in princeton-nlp/SWE-agent#38 (comment)

See also:

enyst · 2024-07-01T17:42:27Z

Personally I still think this would be relevant to explore; unless the thought it just to continue to leverage code context via '3rd party' integrations (eg. aider)

I agree, and I believe we're working on different approaches to improve this both with and without additional tools.

That said, it's not fully clear to me how actionable is this issue because it seems very generic, wouldn't it be better as one issue per proposed approach?

0xdevalias · 2024-07-02T04:28:12Z

it's not fully clear to me how actionable is this issue because it seems very generic, wouldn't it be better as one issue per proposed approach?

@enyst I guess the perspective I opened it for is to highlight a single tool/library (stack graphs) that gives that sort of code context/navigation/etc aspect of things:

https://github.com/github/stack-graphs

I don't know the OpenDevin project enough to know exactly where it would fit in/how to do so/etc. I was hoping to generate some discussion that might then lead to someone with more of that 'other side' of the knowledge figuring how it might better fit in; particularly given that I keep seeing places where it seems like the same sorts of tools are being re-created/etc.

In my original post I included a bunch of other relevant context/links that explain it a bit more/how it works/is used by GitHub's code navigation/etc.

github-actions · 2024-08-02T01:50:45Z

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions · 2024-09-20T01:57:22Z

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions · 2024-09-27T01:58:58Z

This issue was closed because it has been stalled for over 30 days with no activity.

0xdevalias added the enhancement New feature or request label Apr 5, 2024

0xdevalias mentioned this issue Apr 5, 2024

Explore stack graphs / scope graphs 0xdevalias/chatgpt-source-watch#11

Open

rbren added architecture Related to architecture, including frontend and backend agent framework Strategies for prompting, agent, etc severity:low Minor issues, affecting single user labels Apr 7, 2024

rbren mentioned this issue Apr 10, 2024

Wishlist. #993

Closed

github-actions bot added the Stale Inactive for 30 days label May 8, 2024

github-actions bot removed the Stale Inactive for 30 days label May 12, 2024

0xdevalias mentioned this issue May 30, 2024

Create Aider Agent #120

Closed

This was referenced Jun 23, 2024

Use Language Server Protocol (LSP) to re-implement all code editing #1934

Open

Implement new agent using AutoCodeRover's approach #942

Closed

github-actions bot added the Stale Inactive for 30 days label Jun 30, 2024

0xdevalias mentioned this issue Jul 1, 2024

[Feature]: Aider-inspired RepoMap #2185

Open

3 tasks

github-actions bot removed the Stale Inactive for 30 days label Jul 1, 2024

0xdevalias mentioned this issue Jul 1, 2024

Lightweight 'OmniscientAgent' which the head agent can query for a broader view of the codebase #2363

Open

0xdevalias changed the title ~~Explore whether stack graphs may be useful in this tool~~ Explore using stack graphs for better code search / context / repo map / etc Jul 1, 2024

0xdevalias changed the title ~~Explore using stack graphs for better code search / context / repo map / etc~~ Explore using stack graphs for better code search / navigation / context / repo map / etc Jul 1, 2024

github-actions bot added the Stale Inactive for 30 days label Aug 2, 2024

mamoodi removed the Stale Inactive for 30 days label Aug 14, 2024

github-actions bot added the Stale Inactive for 30 days label Sep 20, 2024

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Explore using stack graphs for better code search / navigation / context / repo map / etc #742

Explore using stack graphs for better code search / navigation / context / repo map / etc #742

0xdevalias commented Apr 5, 2024

foragerr commented Apr 5, 2024

0xdevalias commented Apr 5, 2024 •

edited

Loading

github-actions bot commented May 8, 2024

0xdevalias commented May 11, 2024

0xdevalias commented May 30, 2024 •

edited

Loading

github-actions bot commented Jun 30, 2024

0xdevalias commented Jun 30, 2024 •

edited

Loading

enyst commented Jul 1, 2024

0xdevalias commented Jul 2, 2024

github-actions bot commented Aug 2, 2024

github-actions bot commented Sep 20, 2024

github-actions bot commented Sep 27, 2024

Explore using stack graphs for better code search / navigation / context / repo map / etc #742

Explore using stack graphs for better code search / navigation / context / repo map / etc #742

Comments

0xdevalias commented Apr 5, 2024

foragerr commented Apr 5, 2024

0xdevalias commented Apr 5, 2024 • edited Loading

github-actions bot commented May 8, 2024

0xdevalias commented May 11, 2024

0xdevalias commented May 30, 2024 • edited Loading

github-actions bot commented Jun 30, 2024

0xdevalias commented Jun 30, 2024 • edited Loading

enyst commented Jul 1, 2024

0xdevalias commented Jul 2, 2024

github-actions bot commented Aug 2, 2024

github-actions bot commented Sep 20, 2024

github-actions bot commented Sep 27, 2024

0xdevalias commented Apr 5, 2024 •

edited

Loading

0xdevalias commented May 30, 2024 •

edited

Loading

0xdevalias commented Jun 30, 2024 •

edited

Loading