This is a replication package and online appendix for the ICSA2025 paper "Network Centrality as a New Perspective on Microservice Architecture"
This repository contains the following:
- INSTALL: Detailed installation instructions for each used tool
- Appendix:
- Table I: Data on the 24 studied projects
- Table II: List of all gathered metrics
- Project information: Tables of all projects gathered from all datasets and their metadata
- Projects: a folder containing source code of all studied projects
- Figures: a folder containing all figures used in the paper
- Sankey diagram: Sankey diagram of project selection process
- BMP process: Diagram of the BMP process for the data collection
- Size Heat Map: Heat map of Spearman's Rho for the correlation of size metrics with centrality
- Size Heat Map (Stat. Sig.): Heat map of Spearman's Rho for the correlation of size metrics with centrality (stat. sig. only)
- Complexity Heat Map: Heat map of Spearman's Rho for the correlation of complexity metrics with centrality
- Complexity Heat Map (Stat. Sig.): Heat map of Spearman's Rho for the correlation of complexity metrics with centrality (stat. sig. only)
- Quality Heat Map: Heat map of Spearman's Rho for the correlation of quality metrics with centrality
- Quality Heat Map (Stat. Sig.): Heat map of Spearman's Rho for the correlation of quality metrics with centrality (stat. sig. only)
- Centrality Heat Map (Stat. Sig.): Heat map of Spearman's Rho for the correlation of centrality metrics with each other (stat. sig. only)
- Raw data: a folder containing all raw data extracted from different tools
- code2dfd: Raw output of Code2DFD
- graph: Graphs extracted from Code2DFD output
- understand: Raw data from Understand
- jasome: Raw data from Jasome
- package_map.json: Mapping of Java packages to microservices
- Metrics: metrics extracted from raw data
- metrics_centrality.csv: All the centrality metrics for all microservices
- metrics_understand.csv: All the Understand metrics for all microservices
- metrics_jasome_package.csv: All the Jasome metrics for all microservices on package level
- metrics_jasome_class.csv: All the Jasome metrics for all microservices on class level
- metrics_jasome_method.csv: All the Jasome metrics for all microservices on method level
- metrics_sonarqube.csv: All the SonarQube metrics for all microservices
- metrics_merged.csv: All the metrics for all microservices
- metrics_statsig.csv: List of all metrics that have a statistically significant correlation with centrality
- Results: Data files containing the analyzed results to answer the Research Questions
- NormalityAndersonDarling: Results of testing normality of each metric distribution with Anderson-Darling
- RQ1: Does centrality correlate with size metrics?
- metrics_size.csv: All the size metrics that have a statistically significant correlation with centrality
- SizeSpearmanRho: Spearman Rho correlation between centrality and size metrics
- RQ2: Does centrality correlate with complexity metrics?
- metrics_complexity.csv: All the complexity metrics that have a statistically significant correlation with centrality
- ComplexitySpearmanRho: Spearman Rho correlation between centrality and complexity metrics
- RQ3: Does centrality correlate with quality metrics?
- metrics_quality.csv: All the quality metrics that have a statistically significant correlation with centrality
- QualitySpearmanRho: Spearman Rho correlation between centrality and quality metrics
- Scripts: a folder containing all the scripts
- extract_graphs.py: processes the Code2DFD output into standard graph json files
- metrics_centrality.py: computes centrality scores with NetworkX
- metrics_jasome.py: executes the Jasome tool and saves raw data
- merge_jasome.py: merges the raw Jasome data into csv files
- metrics_understand.py: executes the Understand tool and saves raw data
- merge_understand.py: merges the raw Understand data into a csv file
- metrics_sonarqube.py: executes the SonarQube analysis
- merge_sonarqube.py: puts the SonarQube data into a csv file
- merge_all.py: merges all metrics into a single csv file
- metrics_results.py: keeps only the statistically significantly correlated metrics
All generated data is provided under Creative Commons 4.0 Attribution License.
All scripts are provided under the MIT License.
All the analysed projects must be used in accordance with their respective licenses (shared in each project when applicable).
Follow the instructions in INSTALL to install and configure all used tools.
All DFDs are reconstructed with the c65b4a version of the Code2DFD tool.
The raw output for each PROJECT is copied here to a PROJECT-code2dfd folder.
The script process_graphs.py converts the json files of the Code2DFD output into network files suitable for NetworkX.
For each PROJECT, it creates 2 files in a PROJECT-graph folder:
- PROJECT-gwcc.json: The Greatest Weakly Connected Component (GWCC) of the reconstructed architecture graph
- PROJECT-gwcc_noDB.json: The GWCC with all databases that are only connected to one service removed
The script metrics_centrality.py loads the PROJECT-gwcc_noDB.json files and computes the following centrality metrics with NetworkX:
- Degree centrality
- In-degree centrality
- Out-degree centrality
- Eigenvector centrality
- Closeness centrality
- Betweenness centrality
- Load centrality
- Harmonic centrality
- Information centrality
- Current flow centrality
- Subgraph centrality
The centrality metrics for each system and service are saved into the metrics_centrality.csv file.
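A minimal sketch of how a subset of these metrics can be computed with NetworkX; the graph below is a hypothetical toy service-dependency graph, not one of the studied projects, and metrics such as information or subgraph centrality require an undirected view of the graph.

```python
import networkx as nx

# Hypothetical service-dependency graph (placeholder names).
G = nx.DiGraph([
    ("gateway", "orders"),
    ("gateway", "users"),
    ("orders", "users"),
])

# A subset of the centrality metrics listed above, each returning
# a dict mapping service name -> centrality score.
metrics = {
    "degree": nx.degree_centrality(G),
    "in_degree": nx.in_degree_centrality(G),
    "out_degree": nx.out_degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "harmonic": nx.harmonic_centrality(G),
    # Subgraph centrality is defined on undirected graphs.
    "subgraph": nx.subgraph_centrality(G.to_undirected()),
}
```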
The Jasome tool can be downloaded from its GitHub page.
The script metrics_jasome.py executes the Jasome tool for each PROJECT.
Change the variable JASOME_PATH in the script to point to the Jasome binary on your system.
For each PROJECT, the script saves to the folder PROJECT-jasome the raw xml output from Jasome for each src folder in the project.
The script merge_jasome.py merges the data from all the raw xml files into the following csv files:
- metrics_jasome_package.csv: metrics calculated for each package
- metrics_jasome_class.csv: metrics calculated for each class
- metrics_jasome_method.csv: metrics calculated for each method
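The merge step boils down to flattening per-element metric values out of the XML reports. The sketch below illustrates the idea; the element and attribute names (Package/Class/Method nodes with nested Metric elements carrying name and value attributes) are assumptions for illustration, so check the actual Jasome output schema before relying on them.

```python
import xml.etree.ElementTree as ET

def extract_metrics(xml_text, element_tag):
    """Collect per-element metric values from a Jasome-style XML report.

    element_tag would be "Package", "Class", or "Method" for the three
    csv files above. The schema used here is an assumption.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter(element_tag):
        row = {"name": elem.get("name")}
        for metric in elem.iter("Metric"):
            row[metric.get("name")] = metric.get("value")
        rows.append(row)
    return rows
```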
Download the Understand tool and acquire its license on the official website.
The script metrics_understand.py executes the Understand tool for each PROJECT.
Change the variable UND_PATH in the script to point to the und CLI tool on your system.
For each PROJECT, the script saves to the folder PROJECT-und the raw csv output from Understand.
The script merge_understand.py takes only the metrics calculated on Package level for all PROJECTS and saves them to the metrics_understand.csv file.
Deploy a SonarQube instance using the instructions from the official website.
Generate a Global Analysis Token and a User token.
Download the SonarScanner application from the official website.
The script metrics_sonarqube.py sets up a SonarQube project for each PROJECT in the repository and executes the analysis with SonarScanner.
Change the SONAR_PATH variable to the location of the sonar-scanner binary.
Change the TOKEN variable to the Global Analysis Token generated in SonarQube.
Additionally, if SonarQube is not deployed on localhost:9000, change the -Dsonar.host.url parameter in the run command.
After executing the script, you should see all projects analyzed in the SonarQube dashboard.
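A sketch of how such an invocation could be assembled; the project-key naming is a placeholder and the exact properties the real metrics_sonarqube.py passes may differ (e.g. older SonarQube versions use -Dsonar.login instead of -Dsonar.token).

```python
def build_scanner_cmd(sonar_path, project_key, source_dir, token,
                      host_url="http://localhost:9000"):
    """Build a sonar-scanner command line for a single project.

    sonar_path corresponds to SONAR_PATH and token to the Global
    Analysis Token described above.
    """
    return [
        sonar_path,
        f"-Dsonar.projectKey={project_key}",
        f"-Dsonar.sources={source_dir}",
        f"-Dsonar.host.url={host_url}",
        f"-Dsonar.token={token}",
    ]
```

The resulting list can be passed to subprocess.run(..., check=True) once per project.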
The script merge_sonarqube.py queries data for each PROJECT.
Change the variable USER_TOKEN to the User token generated in SonarQube.
The script queries the SonarQube metrics on directory level, infers the package name from the directory path, and saves the metrics for each PROJECT and each package in metrics_sonarqube.csv.
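The package-name inference step can be sketched as below, assuming the standard Maven/Gradle source layout (src/main/java/...); the real merge_sonarqube.py may use different heuristics.

```python
def package_from_directory(path):
    """Infer a Java package name from a directory path.

    Everything after a src-root marker ("java" or "kotlin") is treated
    as the package; otherwise the whole path is dotted. This heuristic
    is an assumption for illustration.
    """
    parts = path.strip("/").split("/")
    for marker in ("java", "kotlin"):
        if marker in parts:
            return ".".join(parts[parts.index(marker) + 1:])
    return ".".join(parts)
```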
The file package_map.json contains the mapping of Java packages to the microservices.
The script merge_data.py takes the metrics_centrality.csv, metrics_understand.csv, metrics_jasome_package.csv, metrics_jasome_class.csv, metrics_jasome_method.csv, and metrics_sonarqube.csv files, maps the packages to microservices, and creates a unified csv file metrics_merged.csv with the microservices that have all possible metrics.
Metrics are aggregated from packages using sum, mean, and max wherever suitable.
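The package-to-service aggregation can be sketched as a small helper; the function name and the row representation (a list of per-package metric dicts for one microservice) are placeholders for illustration, not the actual interface of merge_data.py.

```python
from statistics import mean

def aggregate(package_rows, metric, how):
    """Aggregate one metric over all packages of a microservice.

    how is "sum", "mean", or "max", mirroring the aggregation
    strategies described above.
    """
    values = [row[metric] for row in package_rows if metric in row]
    return {"sum": sum, "mean": mean, "max": max}[how](values)
```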
The file metrics_statsig.csv contains a list of metrics that have a statistically significant (p<0.01) correlation with at least one centrality score, and their category of either size, complexity, or quality.
The script metrics_filter_statsig.py takes only such metrics from metrics_merged.csv and saves them, together with the centrality metrics, to three respective csv files: metrics_size.csv, metrics_complexity.csv, and metrics_quality.csv.
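For reference, Spearman's Rho is Pearson's correlation computed on ranks (with ties receiving average ranks). The dependency-free sketch below illustrates the statistic itself; the analysis scripts presumably rely on a library routine such as scipy.stats.spearmanr, which also provides the p-values used for the p<0.01 filter.

```python
def _ranks(values):
    """Average 1-based ranks, with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```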