#22: Create JSON parser for data files
maxime-bfsquall committed Sep 30, 2024
1 parent eaa2a42 commit edd150f
Showing 19 changed files with 497 additions and 98 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -4,4 +4,5 @@ __pycache__
*.mps
*.sol
.tox/*
.vscode/*
.vscode/*
output/*
93 changes: 71 additions & 22 deletions README.md
@@ -7,87 +7,136 @@ A Python program to build and solve the CCM-MILP problems, with a set of example
* Python 3.x

## Execution
from the folder `cd src/`

### Generate and solve a problem:
*From the folder `src`*

### Generate and solve a problem (and apply permutation on Synthetic Blocks example):
Either in interactive mode:

`python ccm_milp_full.py`
```shell
python ccm_milp_full.py
```

or in direct mode with a configuration file in YAML format:

`python ccm_milp_full.py -c <configuration file name>`
```shell
python ccm_milp_full.py -c "<configuration file name>"
```

or with a specific solver installed on the machine:

`python ccm_milp_full.py -s <solver name: COIN_CMD, GLPK_CMD, PULP_CBC_CMD>`
```shell
python ccm_milp_full.py -s "<solver name: COIN_CMD, GLPK_CMD, PULP_CBC_CMD>"
```

By default the solver used is **PULP_CBC_CMD**

### Generate problem files (.lp and .mps):
Either in interactive mode:

`python ccm_milp_problem.py`
```shell
python ccm_milp_problem.py
```

or in direct mode with a configuration file in YAML format:

`python ccm_milp_problem.py -c <configuration file name>`
```shell
python ccm_milp_problem.py -c "<configuration file name>"
```

### Solve a problem file (.mps):
Solve the previously generated `problem.mps` file:

`python ccm_milp_solver.py`
```shell
python ccm_milp_solver.py
```

Solve a specific .mps problem file:

`python ccm_milp_solver.py -p <problem file .mps>`
```shell
python ccm_milp_solver.py -p "<problem file .mps>"
```

Solve with a specific solver installed on the machine:

`python ccm_milp_solver.py -s <solver name: COIN_CMD, GLPK_CMD, PULP_CBC_CMD>`
```shell
python ccm_milp_solver.py -s "<solver name: COIN_CMD, GLPK_CMD, PULP_CBC_CMD>"
```


### Permutation for JSON data files using a permutation file:
*The separator for `--input-json-files` files is a space*

Permute JSON data and create output files:

```shell
python ccm_milp_permute_json.py --permutation-file="/abs/path/permutation.json" --input-json-files="/abs/path/data.0.json /abs/path/data.1.json /abs/path/data.2.json ..."
```

Permute JSON data and create output files, using a prefix for the output file names:

```shell
python ccm_milp_permute_json.py --output-file-prefix="permuted_" --permutation-file="/abs/path/permutation.json" --input-json-files="/abs/path/data.0.json /abs/path/data.1.json /abs/path/data.2.json ..."
```

### Parse JSON
*The separator for files is a space*

Parse JSON data and print the result to the terminal:

```shell
python ccm_milp_parse_json.py --input-json-files="/abs/path/data.0.json /abs/path/data.1.json /abs/path/data.2.json ..."
```

## What to Expect:
A text output in the terminal and, when a problem is solved, a `.sol` file with the results of the optimization process, along with `.lp` and `.mps` files containing the generated linear program.

For `SyntheticBlock`, after running `ccm_milp_full.py`, the `output` folder also contains the permuted JSON data files and the permutation file that was generated and used to produce them.

## Expected Results:
By default the configuration is:

`alpha = 1, beta = 0, gamma = 0, delta= 0, bounded_memory = False, preserve_clusters = False`
```YAML
alpha: 1
beta: 0
gamma: 0
delta: 0
bounded_memory: false
preserve_clusters: false
```
The example tested is `SyntheticBlock` with these different configurations:
* Load only
  * Configuration: `is_fwmp = False`
  * Configuration: `is_fwmp: false`
  * Optimal objective value `2.00000000`

* Load only and-cluster
  * Configuration: `is_fwmp = False, preserve_clusters = True`
  * Configuration: `is_fwmp: false, preserve_clusters: true`
  * Optimal objective value `2.50000000`

* Load only and-memory-bound
  * Configuration: `is_fwmp = False, bounded_memory: True`
  * Configuration: `is_fwmp: false, bounded_memory: true`
  * Optimal objective value `2.00000000`

* FWMP with alpha
  * Configuration: `is_fwmp = True`
  * Configuration: `is_fwmp: true`
  * Optimal objective value `2.00000000`

* FWMP with alpha-beta
  * Configuration: `is_fwmp = True, beta = 1`
  * Configuration: `is_fwmp: true, beta: 1`
  * Optimal objective value `4.00000000`

* Null case
  * Configuration: `is_fwmp = True, alpha = 0, preserve_clusters = True`
  * Configuration: `is_fwmp: true, alpha: 0, preserve_clusters: true`
  * Optimal objective value `0.00000000`

* Off node communication-only
  * Configuration: `is_fwmp = True, alpha = 0, beta = 1`
  * Configuration: `is_fwmp: true, alpha: 0, beta: 1`
  * Optimal objective value `0.00000000`

* Load no memory homing (delta: 0.1)
  * Configuration: `is_fwmp = True, delta = 0.1, preserve_clusters = True`
  * Configuration: `is_fwmp: true, delta: 0.1, preserve_clusters: true`
  * Optimal objective value `2.50000000`

* Load no memory homing (delta: 0.3)
  * Configuration: `is_fwmp = True, delta = 0.3, preserve_clusters = True`
  * Optimal objective value `4.00000000`
  * Configuration: `is_fwmp: true, delta: 0.3, preserve_clusters: true`
  * Optimal objective value `4.00000000`
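
Each tested case above differs from the default configuration only by the keys it lists. A minimal sketch of that layering, assuming a plain dictionary merge: the `DEFAULTS` dict and the `make_config` helper are hypothetical names used for illustration and are not part of the repository. The README states no default for `is_fwmp`, so each case supplies it.

```python
# Hypothetical helper, not part of the repository: layers per-case overrides
# on top of the default values listed in the README.
DEFAULTS = {
    "alpha": 1,
    "beta": 0,
    "gamma": 0,
    "delta": 0,
    "bounded_memory": False,
    "preserve_clusters": False,
}


def make_config(**overrides):
    """Return a new dict: defaults first, then the case-specific keys."""
    return {**DEFAULTS, **overrides}


# "FWMP with alpha-beta" case from the list above
fwmp_alpha_beta = make_config(is_fwmp=True, beta=1)
print(fwmp_alpha_beta)
```

Because `make_config` builds a new dictionary on every call, the defaults stay untouched between cases.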
1 change: 0 additions & 1 deletion data/synthetic-blocks-permutation.json

This file was deleted.

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
16 changes: 15 additions & 1 deletion examples/configuration.py
@@ -1,6 +1,6 @@
# DARMA Toolkit v. 1.0.0
#
# Copyright 2024 National Technology & Engineering Solutions of Sandia, LLC
# Copyright 2019-2024 National Technology & Engineering Solutions of Sandia, LLC
# (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S.
# Government retains certain rights in this software.
#
@@ -33,27 +33,35 @@
# Questions? Contact darma@sandia.gov
#

import os

class ExampleConfig:
    """Examples Configs"""
    def __init__(
        self,
        filename: str = None,
        classname: str = None,
        json: list = [],
        test: bool = False,
        test_configs: any = None,
        test_regexp: dict = None
    ):
        self.filename = filename
        self.classname = classname
        self.json = json
        self.test = test
        self.test_configs = test_configs
        self.test_regexp = test_regexp

class Examples:
    """Examples"""

    @staticmethod
    def list():
        """Examples list"""
        # Get data dir
        data_dir = os.path.join(os.path.dirname(__file__), "..", "data")

        # Available CCM-MILP examples regexp_test [PULP_CBC_CMD & COIN_CMD, GLPK_CMD]
        return [
            ExampleConfig(
@@ -63,6 +71,12 @@ def list():
            ExampleConfig(
                filename = 'synthetic_blocks',
                classname = 'SyntheticBlocks',
                json = [
                    os.path.join(data_dir, "synthetic_blocks", "data.0.json"),
                    os.path.join(data_dir, "synthetic_blocks", "data.1.json"),
                    os.path.join(data_dir, "synthetic_blocks", "data.2.json"),
                    os.path.join(data_dir, "synthetic_blocks", "data.3.json")
                ],
                test = True,
                test_configs = [
                    {
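
One detail in `ExampleConfig.__init__` may deserve a second look: `json: list = []` is a mutable default, which Python evaluates once when the function is defined, so instances created without a `json` argument share the same list object. A minimal sketch of the effect and of the usual `None`-sentinel alternative follows; the class names `Shared` and `Isolated` are hypothetical and exist only for this illustration.

```python
class Shared:
    """Mirrors the pattern in the diff: a list literal as default value."""
    def __init__(self, json: list = []):  # one list object reused by every call
        self.json = json


class Isolated:
    """Common alternative: use None as sentinel and build a fresh list."""
    def __init__(self, json: list = None):
        self.json = [] if json is None else json


a, b = Shared(), Shared()
a.json.append("data.0.json")
print(b.json)  # the append made through `a` is visible through `b`

c, d = Isolated(), Isolated()
c.json.append("data.0.json")
print(d.json)  # unaffected by the append on `c`
```

Whether this matters here depends on whether any caller mutates `json` in place; the sketch only shows the general behavior.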
2 changes: 1 addition & 1 deletion src/ccm_milp/configuration.py
@@ -1,6 +1,6 @@
# DARMA Toolkit v. 1.0.0
#
# Copyright 2024 National Technology & Engineering Solutions of Sandia, LLC
# Copyright 2019-2024 National Technology & Engineering Solutions of Sandia, LLC
# (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S.
# Government retains certain rights in this software.
#
164 changes: 164 additions & 0 deletions src/ccm_milp/data.py
@@ -0,0 +1,164 @@
# DARMA Toolkit v. 1.5.0
#
# Copyright 2019-2024 National Technology & Engineering Solutions of Sandia, LLC
# (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S.
# Government retains certain rights in this software.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from this
# software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# Questions? Contact darma@sandia.gov
#

import json

class Data:

    """Data file class used after parsing json data files"""

    def __init__(self):
        self.mems = 0
        self.rank_mems = None
        self.rank_working_bytes = None
        self.task_loads = None
        self.task_working_bytes = None
        self.task_footprint_bytes = None
        self.task_rank = None
        self.task_id = None
        self.memory_blocks = None
        self.memory_block_home = None
        self.task_memory_block_mapping = None
        self.task_communications = None

    def parse_json(self, data_files: list):

        """Parse JSON data files"""
        tasks = []
        tasks_working_bytes = []
        tasks_footprint_bytes = []
        task_indices = []
        task_rank_obj_id = []
        tasks_rank = []
        task_shared_id_map = {}
        shared_id_map = {}
        shared_id_home = {}
        task_id = 0
        ranks = {}
        comunications = []
        total_load = 0.0

        for data_file in data_files:
            # Get rank from filename (/some/path/data.{rank}.json)
            rank: int = -1
            if len(data_file.split('.')) > 1:
                split_data_filename = data_file.split('.')
                rank = int(split_data_filename[len(split_data_filename) - 2])

            # Get file content
            data_json = None
            with open(data_file, 'r', encoding="UTF-8") as f:
                # Load data
                data_json = json.load(f)

            # Manage Tasks data
            if "tasks" in data_json["phases"][0]:
                # Even if the rank has no tasks, register it with a value of 0
                if len(data_json["phases"][0]["tasks"]) < 1:
                    ranks[rank] = 0

                # For each task
                for task in data_json["phases"][0]["tasks"]:
                    # Get data
                    time = task["time"]
                    index = task.get("entity").get("index")
                    obj_id = task.get("entity").get("id")
                    shared_id = task.get("user_defined").get("shared_block_id")
                    shared_bytes = task.get("user_defined").get("shared_bytes")
                    task_working_bytes = task.get("user_defined").get("task_working_bytes", 0)
                    task_footprint_bytes = task.get("user_defined").get("task_footprint_bytes", 0)
                    rank_working_bytes = task.get("user_defined").get("rank_working_bytes", 0)

                    # Set data
                    ranks[rank] = rank_working_bytes
                    shared_id_map[shared_id] = shared_bytes
                    shared_id_home[shared_id] = rank
                    tasks.append(time)
                    tasks_footprint_bytes.append(task_footprint_bytes)
                    tasks_working_bytes.append(task_working_bytes)
                    task_indices.append(index)
                    tasks_rank.append(rank)
                    task_rank_obj_id.append(obj_id)
                    if shared_id not in task_shared_id_map:
                        task_shared_id_map[shared_id] = []
                    task_shared_id_map[shared_id].append(task_id)
                    total_load += time

                    # Manage counter
                    task_id += 1

            # Manage Communications data
            if "communications" in data_json["phases"][0]:
                for com in data_json["phases"][0]["communications"]:
                    comunications.append([
                        com["from"]["id"],
                        com["to"]["id"],
                        com["bytes"]
                    ])

        # Set data
        self.rank_mems = []
        for _ in range(0, len(ranks)):
            self.rank_mems.append(self.mems)

        self.rank_working_bytes = list(ranks.values())
        self.task_loads = tasks
        self.task_working_bytes = tasks_working_bytes
        self.task_footprint_bytes = tasks_footprint_bytes
        self.task_rank = tasks_rank
        self.task_id = task_rank_obj_id
        self.task_id.sort()

        self.memory_blocks = list(shared_id_map.values())
        self.memory_blocks.sort()

        self.memory_block_home = list(shared_id_home.values())
        self.memory_block_home.sort()

        self.task_memory_block_mapping = list(task_shared_id_map.values())
        self.task_memory_block_mapping.sort()

        self.task_communications = comunications

        # Print data object
        print(f"rank_mems: {self.rank_mems}")
        print(f"rank_working_bytes: {self.rank_working_bytes}")
        print(f"task_loads: {self.task_loads}")
        print(f"task_working_bytes: {self.task_working_bytes}")
        print(f"task_footprint_bytes: {self.task_footprint_bytes}")
        print(f"task_rank: {self.task_rank}")
        print(f"task_id: {self.task_id}")
        print(f"memory_blocks: {self.memory_blocks}")
        print(f"memory_block_home: {self.memory_block_home}")
        print(f"task_memory_block_mapping: {self.task_memory_block_mapping}")
        print(f"task_communications: {self.task_communications}")
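
As a rough illustration of what `parse_json` does with each file, the sketch below reproduces two of its steps in isolation: reading the rank from a `data.{rank}.json` file name and grouping task ids by `shared_block_id`. The helper names `rank_from_filename` and `group_tasks_by_block` and the sample task values are assumptions made for this example; only the JSON keys come from the code above. Unlike the committed code, the rank helper looks at the base name only and returns `-1` when no rank field is present, and the grouping helper numbers tasks within a single list rather than across all files.

```python
import os


def rank_from_filename(path: str) -> int:
    """Return the rank encoded in '.../data.{rank}.json', or -1 if absent."""
    parts = os.path.basename(path).split(".")
    # The rank is the second-to-last dot-separated field of the file name
    return int(parts[-2]) if len(parts) > 2 else -1


def group_tasks_by_block(tasks: list) -> dict:
    """Map each shared_block_id to the positions of the tasks that use it."""
    mapping = {}
    for task_id, task in enumerate(tasks):
        block = task["user_defined"]["shared_block_id"]
        mapping.setdefault(block, []).append(task_id)
    return mapping


# Made-up sample data using the same keys as the parser
sample_tasks = [
    {"time": 1.0, "user_defined": {"shared_block_id": 0}},
    {"time": 0.5, "user_defined": {"shared_block_id": 1}},
    {"time": 2.0, "user_defined": {"shared_block_id": 0}},
]

print(rank_from_filename("/some/path/data.2.json"))
print(group_tasks_by_block(sample_tasks))
```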