#22: Create JSON parser for data files

DARMA-tasking · Sep 30, 2024 · edd150f
1 parent eaa2a42
commit edd150f

Showing 19 changed files with 497 additions and 98 deletions.
diff --git a/.gitignore b/.gitignore
@@ -4,4 +4,5 @@ __pycache__
 *.mps
 *.sol
 .tox/*
-.vscode/*
+.vscode/*
+output/*
diff --git a/README.md b/README.md
@@ -7,87 +7,136 @@ A Python program to build and solve the CCM-MILP problems, with a set of example
 * Python 3.x
 
 ## Execution
-from the folder `cd src/`
-
-### Generate and solve a problem:
+*From the folder `src`*
 
+### Generate and solve a problem (and apply a permutation to the Synthetic Blocks example):
 Either in interactive mode:
 
-`python ccm_milp_full.py`
+```shell
+python ccm_milp_full.py
+```
 
 or in direct mode with a configuration file in YAML format:
 
-`python ccm_milp_full.py -c <configuration file name>`
+```shell
+python ccm_milp_full.py -c "<configuration file name>"
+```
 
 or with a specific solver installed on the machine:
 
-`python ccm_milp_full.py -s <solver name: COIN_CMD, GLPK_CMD, PULP_CBC_CMD>`
+```shell
+python ccm_milp_full.py -s "<solver name: COIN_CMD, GLPK_CMD, PULP_CBC_CMD>"
+```
 
 By default the solver used is **PULP_CBC_CMD**
 
 ### Generate problem files (.lp and .mps):
 Either in interactive mode:
 
-`python ccm_milp_problem.py`
+```shell
+python ccm_milp_problem.py
+```
 
 or in direct mode with a configuration file in YAML format:
 
-`python ccm_milp_problem.py -c <configuration file name>`
+```shell
+python ccm_milp_problem.py -c "<configuration file name>"
+```
 
 ### Solve a problem file (.mps):
 Solve the previously generated `problem.mps` file:
 
-`python ccm_milp_solver.py`
+```shell
+python ccm_milp_solver.py
+```
 
 Solve a specific .mps problem file:
 
-`python ccm_milp_solver.py -p <problem file .mps>`
+```shell
+python ccm_milp_solver.py -p "<problem file .mps>"
+```
 
 Solve with a specific solver installed on the machine:
 
-`python ccm_milp_solver.py -s <solver name: COIN_CMD, GLPK_CMD, PULP_CBC_CMD>`
+```shell
+python ccm_milp_solver.py -s "<solver name: COIN_CMD, GLPK_CMD, PULP_CBC_CMD>"
+```
+
+
+### Permutation for JSON data files using a permutation file:
+*The separator between files in `--input-json-files` is a space*
+
+Permute JSON data and create output files:
+
+```shell
+python ccm_milp_permute_json.py --permutation-file="/abs/path/permutation.json" --input-json-files="/abs/path/data.0.json /abs/path/data.1.json /abs/path/data.2.json ..."
+```
+
+Permute JSON data and create output files with a prefix added to their names:
+
+```shell
+python ccm_milp_permute_json.py --output-file-prefix="permuted_" --permutation-file="/abs/path/permutation.json" --input-json-files="/abs/path/data.0.json /abs/path/data.1.json /abs/path/data.2.json ..."
+```
+
+### Parse JSON
+*The separator between files is a space*
+
+Parse JSON data and print the result to the terminal:
+
+```shell
+python ccm_milp_parse_json.py --input-json-files="/abs/path/data.0.json /abs/path/data.1.json /abs/path/data.2.json ..."
+```
 
 ## What to Expect:
 Text output in the terminal and, when a problem is solved, a `.sol` file with the results of the optimization process, along with `.lp` and `.mps` files containing the generated linear program.
 
+For `SyntheticBlock`, running `ccm_milp_full.py` also writes the permuted JSON data files, together with the permutation file generated and used to produce them, to the `output` folder.
+
 ## Expected Results:
 By default the configuration is:
 
-`alpha = 1, beta = 0, gamma = 0, delta= 0, bounded_memory = False, preserve_clusters = False`
+```YAML
+alpha: 1
+beta:  0
+gamma: 0
+delta: 0
+bounded_memory: false
+preserve_clusters: false
+```
 
 The example tested is `SyntheticBlock` with these different configurations:
 * Load only
-    * Configuration: `is_fwmp = False`
+    * Configuration: `is_fwmp: false`
     * Optimal objective value `2.00000000`
 
 * Load only and-cluster
-    * Configuration: `is_fwmp = False, preserve_clusters = True`
+    * Configuration: `is_fwmp: false, preserve_clusters: true`
     * Optimal objective value `2.50000000`
 
 * Load only and-memory-bound
-    * Configuration: `is_fwmp = False, bounded_memory: True`
+    * Configuration: `is_fwmp: false, bounded_memory: true`
     * Optimal objective value `2.00000000`
 
 * FWMP with alpha
-    * Configuration: `is_fwmp = True`
+    * Configuration: `is_fwmp: true`
     * Optimal objective value `2.00000000`
 
 * FWMP with alpha-beta
-    * Configuration: `is_fwmp = True, beta = 1`
+    * Configuration: `is_fwmp: true, beta: 1`
     * Optimal objective value `4.00000000`
 
 * Null case
-    * Configuration: `is_fwmp = True, alpha = 0, preserve_clusters = True`
+    * Configuration: `is_fwmp: true, alpha: 0, preserve_clusters: true`
     * Optimal objective value `0.00000000`
 
 * Off node communication-only
-    * Configuration: `is_fwmp = True, alpha = 0, beta = 1`
+    * Configuration: `is_fwmp: true, alpha: 0, beta: 1`
     * Optimal objective value `0.00000000`
 
 * Load no memory homing (delta: 0.1)
-    * Configuration: `is_fwmp = True, delta = 0.1, preserve_clusters = True`
+    * Configuration: `is_fwmp: true, delta: 0.1, preserve_clusters: true`
     * Optimal objective value `2.50000000`
 
 * Load no memory homing (delta: 0.3)
-    * Configuration: `is_fwmp = True, delta = 0.3, preserve_clusters = True`
-    * Optimal objective value `4.00000000`
+    * Configuration: `is_fwmp: true, delta: 0.3, preserve_clusters: true`
+    * Optimal objective value `4.00000000`
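
The README states that `--input-json-files` takes several paths joined by spaces inside one quoted argument. The scripts that read this flag are not part of this excerpt, so the following is only a minimal sketch of one way such an argument could be parsed; `parse_cli` is a hypothetical helper, and the assumption that the value is split on whitespace is mine, not confirmed by the source.

```python
import argparse


def parse_cli(argv: list) -> dict:
    """Hypothetical stand-in for the JSON command-line handling."""
    parser = argparse.ArgumentParser(description="Sketch of a space-separated file list")
    parser.add_argument("--input-json-files", required=True,
                        help="JSON data files, separated by spaces")
    parser.add_argument("--output-file-prefix", default="",
                        help="optional prefix for generated files")
    args = parser.parse_args(argv)

    # Assumption: the single quoted value is split on whitespace,
    # so individual paths must not contain spaces
    return {
        "files": args.input_json_files.split(),
        "prefix": args.output_file_prefix,
    }


if __name__ == "__main__":
    print(parse_cli(["--input-json-files=/abs/path/data.0.json /abs/path/data.1.json"]))
```

A consequence of this design, if it matches the real scripts, is that paths containing spaces cannot be passed; `nargs="+"` would be the usual argparse alternative.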
diff --git a/data/synthetic-blocks-permutation.json b/data/synthetic-blocks-permutation.json
diff --git a/data/synthetic-dataset-blocks.0.json → data/synthetic_blocks/data.0.json b/data/synthetic-dataset-blocks.0.json → data/synthetic_blocks/data.0.json
diff --git a/data/synthetic-dataset-blocks.1.json → data/synthetic_blocks/data.1.json b/data/synthetic-dataset-blocks.1.json → data/synthetic_blocks/data.1.json
diff --git a/data/synthetic-dataset-blocks.2.json → data/synthetic_blocks/data.2.json b/data/synthetic-dataset-blocks.2.json → data/synthetic_blocks/data.2.json
diff --git a/data/synthetic-dataset-blocks.3.json → data/synthetic_blocks/data.3.json b/data/synthetic-dataset-blocks.3.json → data/synthetic_blocks/data.3.json
diff --git a/examples/configuration.py b/examples/configuration.py
@@ -1,6 +1,6 @@
 #                           DARMA Toolkit v. 1.0.0
 #
-# Copyright 2024 National Technology & Engineering Solutions of Sandia, LLC
+# Copyright 2019-2024 National Technology & Engineering Solutions of Sandia, LLC
 # (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S.
 # Government retains certain rights in this software.
 #
@@ -33,27 +33,35 @@
 # Questions? Contact darma@sandia.gov
 #
 
+import os
+
 class ExampleConfig:
     """Examples Configs"""
     def __init__(
         self,
         filename: str = None,
         classname: str = None,
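+        json: list = None,
         test: bool = False,
         test_configs: any = None,
         test_regexp: dict = None
     ):
         self.filename = filename
         self.classname = classname
+        # Use a fresh list per instance rather than a shared mutable default
+        self.json = json if json is not None else []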
         self.test = test
         self.test_configs = test_configs
         self.test_regexp = test_regexp
 
 class Examples:
     """Examples"""
+
     @staticmethod
     def list():
         """Examples list"""
+        # Get data dir
+        data_dir = os.path.join(os.path.dirname(__file__), "..", "data")
+
         # Available CCM-MILP examples regexp_test [PULP_CBC_CMD & COIN_CMD, GLPK_CMD]
         return [
             ExampleConfig(
@@ -63,6 +71,12 @@ def list():
             ExampleConfig(
                 filename = 'synthetic_blocks',
                 classname = 'SyntheticBlocks',
+                json = [
+                    os.path.join(data_dir, "synthetic_blocks", "data.0.json"),
+                    os.path.join(data_dir, "synthetic_blocks", "data.1.json"),
+                    os.path.join(data_dir, "synthetic_blocks", "data.2.json"),
+                    os.path.join(data_dir, "synthetic_blocks", "data.3.json")
+                ],
                 test = True,
                 test_configs = [
                     {

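A note on list-typed parameters such as `json` in `ExampleConfig`: Python evaluates a default value once, when the function is defined, so a literal `[]` default is shared by every call that omits the argument, whereas a `None` sentinel gives each instance its own list. The sketch below illustrates the difference with two hypothetical classes (`SharedDefault` and `SentinelDefault` are not part of this commit).

```python
class SharedDefault:
    """Hypothetical class whose list default is created once and reused."""

    def __init__(self, json: list = []):
        self.json = json


class SentinelDefault:
    """Hypothetical class that builds a new list for each instance."""

    def __init__(self, json: list = None):
        self.json = [] if json is None else json


a, b = SharedDefault(), SharedDefault()
a.json.append("data.0.json")
# a and b hold the very same list object, so b sees the appended path
print(a.json is b.json, b.json)

c, d = SentinelDefault(), SentinelDefault()
c.json.append("data.0.json")
# c and d hold separate lists, so d stays empty
print(c.json is d.json, d.json)
```

The shared default only matters if some caller mutates the list in place; instances that are only read, as in `Examples.list()`, would behave the same either way.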
diff --git a/src/ccm_milp/configuration.py b/src/ccm_milp/configuration.py
@@ -1,6 +1,6 @@
 #                           DARMA Toolkit v. 1.0.0
 #
-# Copyright 2024 National Technology & Engineering Solutions of Sandia, LLC
+# Copyright 2019-2024 National Technology & Engineering Solutions of Sandia, LLC
 # (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S.
 # Government retains certain rights in this software.
 #

diff --git a/src/ccm_milp/data.py b/src/ccm_milp/data.py
@@ -0,0 +1,164 @@
+#                           DARMA Toolkit v. 1.5.0
+#
+# Copyright 2019-2024 National Technology & Engineering Solutions of Sandia, LLC
+# (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S.
+# Government retains certain rights in this software.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice,
+#   this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright notice,
+#   this list of conditions and the following disclaimer in the documentation
+#   and/or other materials provided with the distribution.
+#
+# * Neither the name of the copyright holder nor the names of its
+#   contributors may be used to endorse or promote products derived from this
+#   software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+#
+# Questions? Contact darma@sandia.gov
+#
+
+import json
+
+class Data:
+    """Data file class used after parsing json data files"""
+
+    def __init__(self):
+        self.mems = 0
+        self.rank_mems = None
+        self.rank_working_bytes = None
+        self.task_loads = None
+        self.task_working_bytes = None
+        self.task_footprint_bytes = None
+        self.task_rank = None
+        self.task_id = None
+        self.memory_blocks = None
+        self.memory_block_home = None
+        self.task_memory_block_mapping = None
+        self.task_communications = None
+
+    def parse_json(self, data_files: list):
+        """Parse JSON data files"""
+        tasks = []
+        tasks_working_bytes = []
+        tasks_footprint_bytes = []
+        task_indices = []
+        task_rank_obj_id = []
+        tasks_rank = []
+        task_shared_id_map = {}
+        shared_id_map = {}
+        shared_id_home = {}
+        task_id = 0
+        ranks = {}
+        comunications = []
+        total_load = 0.0
+
+        for data_file in data_files:
+            # Get rank from filename (/some/path/data.{rank}.json)
+            rank: int = -1
+            if len(data_file.split('.')) > 1:
+                split_data_filename = data_file.split('.')
+                rank = int(split_data_filename[-2])
+
+            # Get file content
+            data_json = None
+            with open(data_file, 'r', encoding="UTF-8") as f:
+                # Load data
+                data_json = json.load(f)
+
+                # Manage Tasks data
+                if "tasks" in data_json["phases"][0]:
+                    # Even if the rank has no tasks, register it with 0 working bytes
+                    if len(data_json["phases"][0]["tasks"]) < 1:
+                        ranks[rank] = 0
+
+                    # For each task
+                    for task in data_json["phases"][0]["tasks"]:
+                        # Get data
+                        time = task["time"]
+                        index = task.get("entity").get("index")
+                        obj_id = task.get("entity").get("id")
+                        shared_id = task.get("user_defined").get("shared_block_id")
+                        shared_bytes = task.get("user_defined").get("shared_bytes")
+                        task_working_bytes = task.get("user_defined").get("task_working_bytes", 0)
+                        task_footprint_bytes = task.get("user_defined").get("task_footprint_bytes", 0)
+                        rank_working_bytes = task.get("user_defined").get("rank_working_bytes", 0)
+
+                        # Set data
+                        ranks[rank] = rank_working_bytes
+                        shared_id_map[shared_id] = shared_bytes
+                        shared_id_home[shared_id] = rank
+                        tasks.append(time)
+                        tasks_footprint_bytes.append(task_footprint_bytes)
+                        tasks_working_bytes.append(task_working_bytes)
+                        task_indices.append(index)
+                        tasks_rank.append(rank)
+                        task_rank_obj_id.append(obj_id)
+                        if shared_id not in task_shared_id_map:
+                            task_shared_id_map[shared_id] = []
+                        task_shared_id_map[shared_id].append(task_id)
+                        total_load += time
+
+                        # Manage counter
+                        task_id += 1
+
+                # Manage Communications data
+                if "communications" in data_json["phases"][0]:
+                    for com in data_json["phases"][0]["communications"]:
+                        comunications.append([
+                            com["from"]["id"],
+                            com["to"]["id"],
+                            com["bytes"]
+                        ])
+
+        # Set data
+        self.rank_mems = [self.mems] * len(ranks)
+
+        self.rank_working_bytes = list(ranks.values())
+        self.task_loads = tasks
+        self.task_working_bytes = tasks_working_bytes
+        self.task_footprint_bytes = tasks_footprint_bytes
+        self.task_rank = tasks_rank
+        self.task_id = task_rank_obj_id
+        self.task_id.sort()
+
+        self.memory_blocks = list(shared_id_map.values())
+        self.memory_blocks.sort()
+
+        self.memory_block_home = list(shared_id_home.values())
+        self.memory_block_home.sort()
+
+        self.task_memory_block_mapping = list(task_shared_id_map.values())
+        self.task_memory_block_mapping.sort()
+
+        self.task_communications = comunications
+
+        # Print data object
+        print(f"rank_mems:                 {self.rank_mems}")
+        print(f"rank_working_bytes:        {self.rank_working_bytes}")
+        print(f"task_loads:                {self.task_loads}")
+        print(f"task_working_bytes:        {self.task_working_bytes}")
+        print(f"task_footprint_bytes:      {self.task_footprint_bytes}")
+        print(f"task_rank:                 {self.task_rank}")
+        print(f"task_id:                   {self.task_id}")
+        print(f"memory_blocks:             {self.memory_blocks}")
+        print(f"memory_block_home:         {self.memory_block_home}")
+        print(f"task_memory_block_mapping: {self.task_memory_block_mapping}")
+        print(f"task_communications:       {self.task_communications}")
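
The input layout that `parse_json` expects can be inferred from the keys it reads: a `phases` list whose first entry holds `tasks` (each with `time`, `entity`, and `user_defined`) and `communications` (each with `from`, `to`, and `bytes`). The following is a minimal standalone sketch under that reading. `rank_from_filename`, `summarize`, and `SAMPLE` are hypothetical names with made-up values, and the rank helper is a stricter variant of the commit's logic: it looks only at the base name and returns `-1` when no numeric rank is present.

```python
import json
import os
import tempfile


def rank_from_filename(path: str) -> int:
    """Return the rank in a name like data.<rank>.json, or -1 if there is none."""
    parts = os.path.basename(path).split(".")
    # Require <stem>.<rank>.<ext>; shorter or non-numeric names carry no rank
    if len(parts) < 3 or not parts[-2].isdigit():
        return -1
    return int(parts[-2])


def summarize(path: str) -> dict:
    """Read one data file and report its rank, total task time, and edge count."""
    with open(path, "r", encoding="UTF-8") as f:
        phase = json.load(f)["phases"][0]
    return {
        "rank": rank_from_filename(path),
        "total_load": sum(task["time"] for task in phase.get("tasks", [])),
        "n_communications": len(phase.get("communications", [])),
    }


# Hypothetical single-task file containing only the keys parse_json reads
SAMPLE = {
    "phases": [{
        "tasks": [{
            "time": 1.5,
            "entity": {"index": 0, "id": 7},
            "user_defined": {
                "shared_block_id": 0,
                "shared_bytes": 9000.0,
                "task_working_bytes": 110.0,
                "task_footprint_bytes": 12.0,
                "rank_working_bytes": 80.0,
            },
        }],
        "communications": [{"from": {"id": 7}, "to": {"id": 7}, "bytes": 3.0}],
    }]
}

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "data.3.json")
    with open(sample_path, "w", encoding="UTF-8") as f:
        json.dump(SAMPLE, f)
    print(summarize(sample_path))
```

Unlike the commit's version, this helper does not raise on a name such as `data.json`, where the token before the extension is not a number.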