Code Graph Analysis Pipeline

This repository contains a fully automated pipeline for analyzing code as a graph. It was initially designed for Java using jQAssistant, now also supports TypeScript, and can be extended to further programming languages. The graph database Neo4j stores the graph, which captures the structure of the analyzed code, and makes it queryable. Neo4j's Graph Data Science library adds algorithms such as community detection for analyzing the code structure. The generated reports range from simple query results in CSV files to more elaborate Jupyter Notebooks converted to Markdown or PDF.

✨ Features

Analyze static code structure as a graph
Supports Java Code Analysis
🌟New🌟: Supports TypeScript Code Analysis (experimental)
Fully automated pipeline for Java from tool installation to report generation
Fully automated pipeline for TypeScript from tool installation to report generation
Fully automated local run
More than 130 CSV reports for dependencies, metrics, cycles, annotations, algorithms and many more
Jupyter notebook reports for dependencies, metrics, visibility and many more
Graph structure visualization
Automated reference document generation
Runtime and library independent automation using shell scripts
Tested on macOS (zsh), Linux (bash) and Windows (Git Bash)
Comprehensive list of Cypher queries
Example analysis for AxonFramework

📖 Jupyter Notebook Reports

Here is an overview of reports made with Jupyter Notebooks. For a detailed reference see Jupyter Notebook Report Reference.

External Dependencies contains detailed information about external library usage (Notebook).
Internal Dependencies is based on "Analyze java package metrics in a graph database" and also includes cyclic dependencies (Notebook).
Method Metrics shows how the effective number of lines of code and the cyclomatic complexity are distributed across the methods in the code (Notebook).
Node Embeddings shows how to generate node embeddings and to further reduce their dimensionality to be able to visualize them in a 2D plot (Notebook).
Object Oriented Design Quality Metrics is based on OO Design Quality Metrics by Robert Martin (Notebook).
Overview contains overall statistics and details about methods and their complexity (Notebook).
Visibility Metrics (Notebook).
Wordcloud contains a visual representation of package and class names (Notebook).

📖 Graph Data Science Reports

Here are some reports that use Neo4j's Graph Data Science Library. For a detailed reference of all CSV reports see CSV Cypher Query Report Reference.

📖 Blog Articles

🛠️ Prerequisites

Java 17 is required for Neo4j (Neo4j 5.x requirement).
On Windows it is recommended to use the Git Bash provided by Git for Windows.
jq, the "lightweight and flexible command-line JSON processor", needs to be installed. Latest releases: https://github.com/jqlang/jq/releases/latest. Check using jq --version.
Set environment variable NEO4J_INITIAL_PASSWORD to a password of your choice. For example:
```
export NEO4J_INITIAL_PASSWORD=neo4j_password_of_my_choice
```
To run Jupyter notebooks, create an .env file in the folder from where you open the notebook containing for example: NEO4J_INITIAL_PASSWORD=neo4j_password_of_my_choice
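
The prerequisites above can be checked from the shell before the first run. The following is a minimal sketch and not part of this repository: the function names require_command and write_notebook_env_file are hypothetical, and it assumes that java and jq are expected on the PATH and that the .env file only needs the single NEO4J_INITIAL_PASSWORD entry described above.

```shell
# Hypothetical helper (not part of this repository):
# succeeds if the given command is available on the PATH.
require_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Hypothetical helper (not part of this repository):
# writes a .env file with the Neo4j password into the given directory.
write_notebook_env_file() {
    target_directory="${1:-.}"
    if [ -z "${NEO4J_INITIAL_PASSWORD:-}" ]; then
        echo "NEO4J_INITIAL_PASSWORD is not set" >&2
        return 1
    fi
    printf 'NEO4J_INITIAL_PASSWORD=%s\n' "${NEO4J_INITIAL_PASSWORD}" > "${target_directory}/.env"
}
```

For example, require_command java && require_command jq reports whether both tools are installed, and write_notebook_env_file . creates the .env file in the current directory.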

Additional Prerequisites for Python and Jupyter Notebooks

Python is required for Jupyter Notebook reports.
A conda package manager like Miniconda or Anaconda (recommended for Windows) is required for Jupyter Notebook reports.
Chromium will automatically be downloaded if needed for Jupyter Notebook PDF report generation.

Additional Prerequisites for Graph Visualization

These tools are needed to run the graph visualization scripts in the graph-visualization directory:

Node.js
npm

Additional Prerequisites for Windows

Add this line to your ~/.bashrc file if you are using Anaconda3: /c/ProgramData/Anaconda3/etc/profile.d/conda.sh. Try to find a similar script for other conda package managers or versions.
Run conda init in Git Bash opened as administrator. Running it in normal mode usually leads to an error message.

Additional Prerequisites for analyzing TypeScript

Follow the instructions at https://github.com/jqassistant-plugin/jqassistant-typescript-plugin to create a JSON file with the static code information of your TypeScript project.
This could be as simple as running the following command in your TypeScript project:
```
npx --yes @jqassistant/ts-lce
```
The cloned repository or source project needs to be copied into the directory called source within the analysis workspace, so that it is also picked up during the scan by resetAndScan.sh and, optionally, importGit.sh.
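
The copy step could look like the following minimal sketch. It is not part of this repository: the function name copy_project_into_source and both directory arguments are hypothetical, and it assumes that the project should end up as a subdirectory of source (for example source/my-project).

```shell
# Hypothetical helper (not part of this repository):
# copies a project directory into the "source" directory of an analysis workspace.
copy_project_into_source() {
    project_directory="$1"
    workspace_directory="${2:-.}"
    if [ ! -d "${project_directory}" ]; then
        echo "Project directory not found: ${project_directory}" >&2
        return 1
    fi
    mkdir -p "${workspace_directory}/source"
    # Without a trailing slash the directory itself is copied, e.g. to source/my-project.
    cp -R "${project_directory}" "${workspace_directory}/source/"
}
```

For example, copy_project_into_source ../my-project . copies the project into the source directory of the current analysis workspace.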

🚀 Getting Started

See GETTING_STARTED.md on how to get started on your local machine.

🏗️ Pipeline and Tools

The Code Structure Analysis Pipeline utilizes GitHub Actions to automate the whole analysis process:

Use GitHub Actions Linux Runner
Check out Git repository
Set up Java
Set up Python with Conda package manager Mambaforge
Download artifacts and optionally source code that contain the code to be analyzed (scripts/downloader)
Set up Neo4j Graph Database (analyze.sh)
Set up jQAssistant for Java and TypeScript analysis (analyze.sh)
Start Neo4j Graph Database (analyze.sh)
Generate CSV reports (scripts/reports) using the command line JSON parser jq
Generate Jupyter Notebook reports using these libraries specified in environment.yml:
- Python
- jupyter
- matplotlib
- nbconvert
- numpy
- pandas
- pip
- monotonic
- Neo4j Python Driver
- openTSNE
- wordcloud
Graph Visualization uses node.js and the dependencies listed in package.json.
Check links in markdown documentation (GitHub workflow) uses markdown-link-check.

Big shout-out 📣 to all the creators and contributors of these great libraries 👍. Projects like this wouldn't be possible without them. Feel free to create an issue if something is missing or wrong in the list.

🏃 Command Reference

COMMANDS.md contains further details on commands and how to do a manual setup.

📃 CSV Cypher Query Report Reference

CSV_REPORTS.md lists all CSV Cypher query result reports inside the results directory. It can be generated as described in Generate CSV Report Reference.

📃 Jupyter Notebook Report Reference

JUPYTER_REPORTS.md lists all Jupyter Notebook reports inside the results directory. It can be generated as described in Generate Jupyter Notebook Report Reference.

📷 Image Reference

IMAGES.md lists all PNG images inside the results directory. It can be generated as described in Generate Image Reference.

⚙️ Script Reference

SCRIPTS.md lists all shell scripts of this repository including their first comment line as a description. It can be generated as described in Generate Script Reference.

🔍 Cypher Query Reference

CYPHER.md lists all Cypher queries of this repository including their first comment line as a description. It can be generated as described in Generate Cypher Reference.

Cypher is Neo4j’s graph query language that lets you retrieve data from the graph.

🌐 Environment Variable Reference

ENVIRONMENT_VARIABLES.md contains all environment variables that are supported by the scripts including default values and description. It can be generated as described in Generate Environment Variable Reference.

🤔 Questions & Answers

How can I run an analysis locally?
👉 Check the prerequisites.
👉 See Start an analysis in the Command Reference.
👉 To get started from scratch see GETTING_STARTED.md.
How can I explore the graph manually?
👉 After the analysis, start Neo4j and open the Neo4j Web UI (http://localhost:7474/browser).
How can I add a CSV report to the pipeline?
👉 Put your new Cypher query into the cypher directory or a suitable (new) subdirectory.
👉 Create a new CSV report script in the scripts/reports directory. Take for example OverviewCsv.sh as a reference.
👉 The script will automatically be included because of its directory and its name ending with "Csv.sh".
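
To check which scripts match this naming convention, a small lookup can help. This is a minimal sketch and not the pipeline's actual discovery code: the function name list_csv_report_scripts is hypothetical, and it only assumes that report scripts are found by their directory and the "Csv.sh" name suffix as described above.

```shell
# Hypothetical helper (not part of this repository):
# lists all scripts in a reports directory whose names end with "Csv.sh".
list_csv_report_scripts() {
    reports_directory="${1:-scripts/reports}"
    find "${reports_directory}" -type f -name "*Csv.sh" | sort
}
```

Calling list_csv_report_scripts scripts/reports from the repository root should then include your new script if its name follows the convention.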
How can I add a Jupyter Notebook report to the pipeline?
👉 Put your new notebook into the jupyter directory.
👉 The file will then automatically be picked up by executeJupyterNotebookReport.sh.
How can I analyze a different code base automatically?
👉 Create a new download script like the ones in the scripts/downloader directory. Take for example downloadAxonFramework.sh as a reference for Java projects and downloadReactRouter.sh as a reference for TypeScript projects.
👉 After downloading, run analyze.sh. You can also find these steps in the pipeline as a reference.
How can I trigger a full re-scan of all artifacts?
👉 Delete the file artifactsChangeDetectionHash.txt in the artifacts directory.
👉 Delete the file typescriptFileChangeDetectionHashFile.txt in the source directory to additionally re-scan TypeScript projects.
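
Both deletions can be combined into one step. The following is a minimal sketch and not part of this repository: the function name force_full_rescan is hypothetical, and it assumes that the artifacts and source directories are located directly inside the analysis workspace passed as the first argument.

```shell
# Hypothetical helper (not part of this repository):
# removes both change detection files so that the next analysis re-scans everything.
force_full_rescan() {
    workspace_directory="${1:-.}"
    # "rm -f" does not fail if a file does not exist.
    rm -f "${workspace_directory}/artifacts/artifactsChangeDetectionHash.txt"
    rm -f "${workspace_directory}/source/typescriptFileChangeDetectionHashFile.txt"
}
```

For example, force_full_rescan . resets change detection for the analysis workspace in the current directory.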
How can I enable PDF generation for Jupyter Notebooks (depends on Chromium, takes more time)?
👉 Set environment variable ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION to anything except an empty string. Example:
```
export ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION="true"
```
👉 Alternatively prepend your command with ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION="true" like:
```
ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION=true ./../../scripts/analysis/analyze.sh
```
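
Because the rule is "anything except an empty string", a value like "false" would also enable PDF generation if the scripts implement the rule exactly as described. The following minimal sketch illustrates such a check. It is an assumption about how the check could look and not the repository's actual code; the function name is_pdf_generation_enabled is hypothetical.

```shell
# Hypothetical helper (not part of this repository):
# succeeds if the variable is set to anything except an empty string.
is_pdf_generation_enabled() {
    [ -n "${ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION:-}" ]
}

if is_pdf_generation_enabled; then
    echo "PDF generation enabled"
else
    echo "PDF generation disabled"
fi
```

To disable PDF generation again under this assumption, unset the variable or set it to an empty string.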
How can I disable git log data import?
👉 Set environment variable IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT to none. Example:
```
export IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none"
```
👉 Alternatively prepend your command with IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none":
```
IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none" ./../../scripts/analysis/analyze.sh
```
👉 An in-between option would be to only import monthly aggregated changes using IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="aggregated":
```
IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="aggregated" ./../../scripts/analysis/analyze.sh
```
Why are some Jupyter Notebook reports skipped?
👉 The custom Jupyter Notebook metadata property code_graph_analysis_pipeline_data_validation can be set to choose a query from cypher/Validation that will be executed before the notebook. If the query returns at least one result, the validation succeeds and the notebook will be run. If the query returns no result, the notebook will be skipped. For more details see Data Availability Validation.
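
The decision rule described above can be sketched as follows. This only illustrates the rule and is not the pipeline's actual implementation: the function name notebook_validation_decision is hypothetical, and it assumes the number of validation query results is already known.

```shell
# Hypothetical helper (not part of this repository):
# prints "run" if the validation query returned at least one result, otherwise "skip".
notebook_validation_decision() {
    validation_result_count="${1:-0}"
    if [ "${validation_result_count}" -ge 1 ]; then
        echo "run"
    else
        echo "skip"
    fi
}
```

For example, notebook_validation_decision 0 corresponds to a skipped notebook, while any count of one or more corresponds to a notebook that is run.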
How can I increase the heap memory when scanning large TypeScript projects?
👉 Use the environment variable TYPESCRIPT_SCAN_HEAP_MEMORY in megabytes (default = 4096):
```
TYPESCRIPT_SCAN_HEAP_MEMORY=16384 ./../../scripts/analysis/analyze.sh
```
How can I continue on errors when scanning TypeScript projects instead of cancelling the whole analysis?
👉 Use the profile Neo4jv5-continue-on-scan-errors (default = Neo4jv5):
```
./../../scripts/analysis/analyze.sh --profile Neo4jv5-continue-on-scan-errors
```
