From eef86ded5dd1f3077aba271c7bf06dbdf7229291 Mon Sep 17 00:00:00 2001 From: Lu Chueng <97081990+noobotdj@users.noreply.github.com> Date: Tue, 18 Jan 2022 22:12:13 +0800 Subject: [PATCH 1/5] Create 0052-FamilySeer.md --- rfcs/0052-FamilySeer.md | 202 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 202 insertions(+) create mode 100644 rfcs/0052-FamilySeer.md diff --git a/rfcs/0052-FamilySeer.md b/rfcs/0052-FamilySeer.md new file mode 100644 index 00000000..2bccb428 --- /dev/null +++ b/rfcs/0052-FamilySeer.md @@ -0,0 +1,202 @@ +- Feature Name: (FamilySeer: A new search method for Auto-scheduler) +- Start Date: (2021-01-07) +- RFC PR: [apache/tvm-rfcs#25](https://github.com/apache/tvm-rfcs/pull/52) +- GitHub Issue: [apache/tvm#9875](https://github.com/apache/tvm/pull/9875) + +# Summary +[summary]: #summary + +We propose FamilySeer, a new search method that optimized search effienecy and search quality of the Auto-scheduler. We introduce several features: + +- FamilySeer exploits the subgraph similarity to form a collection of subgraph families, and construct cost models at subgraph family basis to improve cost model accuracy. +- We enable the subgraphs within each family to share the search results within each tuning iteration, avoiding costly code measurements on real hardware and thus accelerating the search process to converge to optimal results. +- We also make some general optimizations like enabling parallel measurement on single node with multiple GPUs and training the cost model on GPU. + +# Motivation +[motivation]: #motivation + +Auto-scheduler (Ansor) uses code sketch and optimization rules to generate a large search space. The search space defined by Ansor has shown great opportunities and therefore the search quality and the search efficiency are determined by how we search the space. + +Ansor utilizes improved cost model and task scheduler to help explore the search space. 
The cost model analyzes and finds high-performance code transformations in the search space and the task scheduler allocates the time budget to different computation graphs. However, we find several drawbacks to this approach:
+
+The accuracy of the cost model determines the search quality, but Ansor uses a monolithic cost model to predict different computation graphs (subgraphs), resulting in an accuracy loss during tuning.
+
+The task scheduler allocates most of the time budget to subgraphs with the most improving potential (i.e., those with the highest latency). This approach works well at the beginning of the autotuning. However, as the potential subgraph gradually reaches its peak performance with adequate time budget, other subgraphs have little time budget to reach their peak performance.
+
+The search process will in the end take a dozen hours. This motivates us to find a better way to explore the search space.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+We intergrate our search method into Auto-scheduler, therefore users only need to change some of the parameters to enable our search method.
+
+We use the code below in [Auto-scheduling a Neural Network for NVIDIA GPU](https://tvm.apache.org/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html#begin-tuning) as an example:
+
+```python
+#...
+
+# load all tasks into the tuner
+tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
+
+# define tuning options for the tuner
+tune_option = auto_scheduler.TuningOptions(
+    num_measure_trials=200, # change this to 20000 to achieve the best performance
+    runner=measure_ctx.runner,
+    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
+)
+
+# start tuning
+#tuner.tune(tune_option) #add new parameter to tune function
+tuner.tune(tune_option,search_policy="sketch.xgb.family_op")
+
+```
+
+When we begin tuning, the `tuner` load the `tune_option` into the `tune` function. 
There are several parameters in the `tune` function (Refer to class [Taskscheduler](https://tvm.apache.org/docs/reference/api/python/auto_scheduler.html?highlight=taskscheduler#tvm.auto_scheduler.TaskScheduler)). Users can enable our method by changing the `search_policy` parameter to `sketch.xgb.family_`. We currently provide two family algorithm as an option: `op` refers to classfiying subgraphs based on the core operation and `hash` refers to classfiying subgraphs based on operation sequence. We recommend using `op` to achieve better performance. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +Our search method consists of three steps: + +1. Identifying similar subgraphs +```python +def make_family_group( + tasks, + search_policy, +): + if search_policy == "default": + search_policy = "sketch.xgb" + + if isinstance(search_policy, str): + policy = search_policy.split(".") + if len(policy) == 2: + return {} + elif len(policy) == 3: + policy_type, model_type, model_group = policy + _, class_type = model_group.split("_") + else: + raise ValueError("Invalid search policy: " + search_policy) + + + family_group = {} + # Identifying Similar Subgraphs based on core operation + if class_type == "op": + for idx, task in enumerate(tasks): + task_layers=task.desc.split('_') + if task_layers[1] not in family_group: + family_group[task_layers[1]] = [] + family_group[task_layers[1]].append(idx) + else: + family_group[task_layers[1]].append(idx) + + # Identifying Similar Subgraphs based on hash + elif class_type == "hash": + for idx, task in enumerate(tasks): + task_hash=task.workload_key[2:34] + if task_hash not in family_group: + family_group[task_hash] = [] + family_group[task_hash].append(idx) + else: + family_group[task_hash].append(idx) + + if family_group is not None: + for key,value in family_group.items(): + print("family group :", key, "---", value) + + return family_group + +``` + +We use static analyzing, which 
classifies the subgraphs(tasks) based on +their attributes. + +2. Constructing family cost model +```python +elif "family" in model_group: + + # generate cost model for each family + cost_model_pool = [] + for _,group in family_group.items(): + if model_type == "xgb": + cost_model = XGBModel( + num_warmup_sample=len(group) * num_measures_per_round, + model_file=load_model_file, + adapative_training=adapative_training, + ) + if load_model_file and os.path.isfile(load_model_file): + logger.info("TaskScheduler: Load pretrained model...") + cost_model.load(load_model_file) + elif load_log_file: + logger.info("TaskScheduler: Reload measured states and train the model...") + cost_model.update_from_file(load_log_file) + elif model_type == "random": + cost_model = RandomModel() + else: + raise ValueError("Invalid search policy: " + search_policy) + cost_model_pool.append(cost_model) + + #bind each subgraph(task) with its family cost model + search_policies = [] + for task_idx,task in enumerate(tasks): + for group_idx,group in enumerate(family_group.values()): + if task_idx in group: + search_policies.append( + SketchPolicy( + task, + cost_model_pool[group_idx], + params=search_policy_params, + verbose=verbose, + init_search_callbacks=init_search_callbacks, + ) + ) + +``` + +After identifying similar subgraphs, We return a list of subgraph families (subgraph group) and build a cost model for each subgraph families. + +3.Foresee tuning + +```python +def _tune_family_task(self, task_idx_groups,skip_measures_per_round): + """Tune the select family task for one round. 
+ """ + for task_idx in task_idx_groups: + # Run pre-tune callbacks + for callback in self.callbacks: + callback.pre_tune(self, task_idx) + + measure_inputs, measure_results = self.search_policies[task_idx].continue_search_one_round( + skip_measures_per_round, self.measurer + ) + + …… +``` + +The foresee tuning takes `task_idx_groups` (A list of subgraph families) and `skip_measures_per_round` as inputs and tunes all the subgraphs inside the list. + +# Drawbacks +[drawbacks]: #drawbacks + +FamilySeer currently relys on static analysis to identify subgraphs, which might result in misjudgements on some of the subgraphs. We are looking for alternative method to identify subgraphs dynamically while maintain the same time budget. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +Auto-Scheduler generates a large enough search space so searching the space efficiently is important. With FamilySeer, Users can search for the same optimal code under less time budget. We hope that our search method can be an alternative option for those who expect to obtain better optimal code under limited time budget. + +# Prior art +[prior-art]: #prior-art + +Please refer to [this paper](https://arxiv.org/abs/2201.00194). + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +Our search method is up for [discussion](https://discuss.tvm.apache.org/t/rfc-familyseer-a-new-search-method-for-auto-scheduler/11877). After the conversation on the discussion forum, We will focus on identifying subgraph dynamically. + +# Future possibilities +[future-possibilities]: #future-possibilities + +1. Advanced Foresee tuning + +Auto-tuning is the process of looking for the only best optimal code for Deep learning network. To build a cost model, many less performed code has to be evaluated. 
If we can analyze the subgraph similarity accurately, we can draw an relationship map between each subgraph and focus on building highly accurate cost model for the most related subgraph. Once an accurate cost model has been built, We can predict optimal code for other subgraphs instead of searching iteratively. \ No newline at end of file From cba5f2cd316999e839c8a8557553dab05ea9b055 Mon Sep 17 00:00:00 2001 From: Lu Chueng <97081990+noobotdj@users.noreply.github.com> Date: Wed, 19 Jan 2022 21:47:57 +0800 Subject: [PATCH 2/5] Update 0052-FamilySeer.md --- rfcs/0052-FamilySeer.md | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/rfcs/0052-FamilySeer.md b/rfcs/0052-FamilySeer.md index 2bccb428..cdda61dd 100644 --- a/rfcs/0052-FamilySeer.md +++ b/rfcs/0052-FamilySeer.md @@ -6,10 +6,10 @@ # Summary [summary]: #summary -We propose FamilySeer, a new search method that optimized search effienecy and search quality of the Auto-scheduler. We introduce several features: +We propose FamilySeer, a new search method that optimizes search efficiency and search quality of the Auto-scheduler. We introduce several features: -- FamilySeer exploits the subgraph similarity to form a collection of subgraph families, and construct cost models at subgraph family basis to improve cost model accuracy. -- We enable the subgraphs within each family to share the search results within each tuning iteration, avoiding costly code measurements on real hardware and thus accelerating the search process to converge to optimal results. +- FamilySeer exploits the subgraph similarity to form a collection of subgraph families and constructs cost models at subgraph family basis to improve cost model accuracy. +- We enable subgraphs within each family to share the search results within each tuning iteration, avoiding costly code measurements on real hardware and thus accelerating the search process to converge to optimal results. 
- We also make some general optimizations like enabling parallel measurement on single node with multiple GPUs and training the cost model on GPU. # Motivation @@ -28,7 +28,7 @@ The search process will at the end take a dozen of hours. This motivates us to f # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -We intergrate our search method into Auto-scheduler, therefore users only need to change some of the parameters to enable our search method. +We integrate our search method into Auto-scheduler. Therefore, users only need to change some of the parameters to enable our search method. We use the code below in [Auto-scheduling a Neural Network for NVIDIA GPU](https://tvm.apache.org/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html#begin-tuning) as an example: @@ -51,7 +51,7 @@ tuner.tune(tune_option,search_policy="sketch.xgb.family_op") ``` -When we begin tuning, the `tuner` load the `tune_option` into the `tune` function. There are several parameters in the `tune` function (Refer to class [Taskscheduler](https://tvm.apache.org/docs/reference/api/python/auto_scheduler.html?highlight=taskscheduler#tvm.auto_scheduler.TaskScheduler)). Users can enable our method by changing the `search_policy` parameter to `sketch.xgb.family_`. We currently provide two family algorithm as an option: `op` refers to classfiying subgraphs based on the core operation and `hash` refers to classfiying subgraphs based on operation sequence. We recommend using `op` to achieve better performance. +The `tuner` loads the `tune_option` into the `tune` function. There are several parameters in the `tune` function (Refer to class [Taskscheduler](https://tvm.apache.org/docs/reference/api/python/auto_scheduler.html?highlight=taskscheduler#tvm.auto_scheduler.TaskScheduler)). Users can enable our method by changing the `search_policy` parameter to `sketch.xgb.family_`. 
We currently provide two family algorithms as an option: `op` refers to classifying subgraphs based on the core operation, and `hash` refers to classifying subgraphs based on the operation sequence. We recommend using `op` to achieve better performance.

 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation

@@ -107,8 +107,7 @@ def make_family_group(

 ```

-We use static analyzing, which classifies the subgraphs(tasks) based on
-their attributes.
+We use static analysis to classify the subgraphs(tasks) based on their attributes.

 2. Constructing family cost model
 ```python
@@ -152,7 +151,7 @@ elif "family" in model_group:

 ```

-After identifying similar subgraphs, We return a list of subgraph families (subgraph group) and build a cost model for each subgraph families.
+After identifying similar subgraphs, we return `family_group` (a list of subgraph families) and build a cost model for each subgraph family.

 3.Foresee tuning

@@ -177,12 +176,12 @@ The foresee tuning takes `task_idx_groups` (A list of subgraph families) and `sk

 # Drawbacks
 [drawbacks]: #drawbacks

-FamilySeer currently relys on static analysis to identify subgraphs, which might result in misjudgements on some of the subgraphs. We are looking for alternative method to identify subgraphs dynamically while maintain the same time budget.
+When searching on a larger search space (such as with a larger batch size), FamilySeer performs similarly or sometimes worse than Auto-scheduler. This is because a larger search space requires more time before the cost model can provide an accurate prediction. Deploying an inaccurate cost model on Foresee tuning may result in spending the time budget on non-improving code transformations.

 # Rationale and alternatives
 [rationale-and-alternatives]: #rationale-and-alternatives

-Auto-Scheduler generates a large enough search space so searching the space efficiently is important. 
With FamilySeer, Users can search for the same optimal code under less time budget. We hope that our search method can be an alternative option for those who expect to obtain better optimal code under limited time budget.
+Auto-Scheduler generates a large enough search space, so searching the space efficiently is important. With FamilySeer, users can search for the same optimal code with less time budget. We hope that our search method can be an alternative option for those who expect to obtain better code under a limited time budget.

 # Prior art
 [prior-art]: #prior-art

@@ -192,11 +191,15 @@ Please refer to [this paper](https://arxiv.org/abs/2201.00194).

 # Unresolved questions
 [unresolved-questions]: #unresolved-questions

-Our search method is up for [discussion](https://discuss.tvm.apache.org/t/rfc-familyseer-a-new-search-method-for-auto-scheduler/11877). After the conversation on the discussion forum, We will focus on identifying subgraph dynamically.
+Our search method is up for [discussion](https://discuss.tvm.apache.org/t/rfc-familyseer-a-new-search-method-for-auto-scheduler/11877).

 # Future possibilities
 [future-possibilities]: #future-possibilities

-1. Advanced Foresee tuning
+1. Dynamic subgraph family analysis

-Auto-tuning is the process of looking for the only best optimal code for Deep learning network. To build a cost model, many less performed code has to be evaluated. If we can analyze the subgraph similarity accurately, we can draw an relationship map between each subgraph and focus on building highly accurate cost model for the most related subgraph. Once an accurate cost model has been built, We can predict optimal code for other subgraphs instead of searching iteratively.
\ No newline at end of file
+FamilySeer currently relies on static analysis to identify subgraphs, which might result in misjudgments on some of the subgraphs. 
We are looking for an alternative method to identify subgraphs dynamically while maintaining the same time budget.
+
+2. Advanced Foresee tuning
+
+Auto-tuning is the procedure of searching for the single best code for a deep learning network. Many poorly performing code candidates have to be evaluated to build an accurate cost model. By accurately analyzing the subgraph similarity, we can draw a relationship map between subgraphs and focus on building a highly accurate cost model for the most related subgraphs. Once an accurate cost model has been built, we can predict optimal code for other subgraphs instead of searching iteratively.
\ No newline at end of file

From 5fbbc3ec5699c1053a98de4615f0e59264162ce3 Mon Sep 17 00:00:00 2001
From: Lu Chueng <97081990+noobotdj@users.noreply.github.com>
Date: Fri, 18 Feb 2022 19:41:12 +0800
Subject: [PATCH 3/5] Update 0052-FamilySeer.md

---
 rfcs/0052-FamilySeer.md | 73 ++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/rfcs/0052-FamilySeer.md b/rfcs/0052-FamilySeer.md
index cdda61dd..a9d8f677 100644
--- a/rfcs/0052-FamilySeer.md
+++ b/rfcs/0052-FamilySeer.md
@@ -6,7 +6,7 @@
 # Summary
 [summary]: #summary

-We propose FamilySeer, a new search method that optimizes search efficiency and search quality of the Auto-scheduler. We introduce several features:
+We propose FamilySeer, a new search method that optimizes search efficiency and quality of the Auto-scheduler. We introduce several features:

 - FamilySeer exploits the subgraph similarity to form a collection of subgraph families and constructs cost models at subgraph family basis to improve cost model accuracy.
 - We enable subgraphs within each family to share the search results within each tuning iteration, avoiding costly code measurements on real hardware and thus accelerating the search process to converge to optimal results.
@@ -28,7 +28,7 @@ The search process will at the end take a dozen of hours. 
This motivates us to f # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -We integrate our search method into Auto-scheduler. Therefore, users only need to change some of the parameters to enable our search method. +We integrate our search method into Auto-Scheduler. Therefore, users only need to change some of the parameters to enable our search method. We use the code below in [Auto-scheduling a Neural Network for NVIDIA GPU](https://tvm.apache.org/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html#begin-tuning) as an example: @@ -64,47 +64,55 @@ def make_family_group( tasks, search_policy, ): + """identify each subgraphs and group them into subgraph family. + """ if search_policy == "default": - search_policy = "sketch.xgb" + search_policy = "sketch.xgb" if isinstance(search_policy, str): - policy = search_policy.split(".") - if len(policy) == 2: - return {} - elif len(policy) == 3: - policy_type, model_type, model_group = policy - _, class_type = model_group.split("_") - else: - raise ValueError("Invalid search policy: " + search_policy) + policy = search_policy.split(".") + if len(policy) == 2: + return {} + elif len(policy) == 3: + _, _, model_group = policy + _, class_type = model_group.split("_") + else: + raise ValueError("Invalid search policy: " + search_policy) - family_group = {} - # Identifying Similar Subgraphs based on core operation if class_type == "op": - for idx, task in enumerate(tasks): - task_layers=task.desc.split('_') - if task_layers[1] not in family_group: - family_group[task_layers[1]] = [] - family_group[task_layers[1]].append(idx) - else: - family_group[task_layers[1]].append(idx) + for idx, task in enumerate(tasks): + task_layers = task.desc.split('_') + if task_layers[1] not in family_group: + family_group[task_layers[1]] = [] + family_group[task_layers[1]].append(idx) + else: + family_group[task_layers[1]].append(idx) - # Identifying Similar Subgraphs based on hash elif class_type == "hash": - for 
idx, task in enumerate(tasks): - task_hash=task.workload_key[2:34] - if task_hash not in family_group: - family_group[task_hash] = [] - family_group[task_hash].append(idx) - else: - family_group[task_hash].append(idx) - + for idx, task in enumerate(tasks): + first = task.workload_key.find("[\"") + 2 + end = task.workload_key.find("\",") + task_hash = task.workload_key[first:end] + if task_hash not in family_group: + family_group[task_hash] = [] + family_group[task_hash].append(idx) + else: + family_group[task_hash].append(idx) + + elif class_type == "ind": + for idx, task in enumerate(tasks): + if task.workload_key not in family_group: + family_group[task.workload_key] = [] + family_group[task.workload_key].append(idx) + else: + family_group[task.workload_key].append(idx) + if family_group is not None: - for key,value in family_group.items(): - print("family group :", key, "---", value) + for key, value in family_group.items(): + print("family group :", key, "---", value) return family_group - ``` We use static analyzing to classify the subgraphs(tasks) based on their attributes. @@ -112,7 +120,6 @@ We use static analyzing to classify the subgraphs(tasks) based on their attribut 2. 
Constructing family cost model ```python elif "family" in model_group: - # generate cost model for each family cost_model_pool = [] for _,group in family_group.items(): From 41ed311ff1a8e4565ed69416e58acc5b4d64d32f Mon Sep 17 00:00:00 2001 From: Lu Chueng <97081990+noobotdj@users.noreply.github.com> Date: Fri, 18 Feb 2022 19:42:09 +0800 Subject: [PATCH 4/5] rename --- rfcs/{0052-FamilySeer.md => 0057-FamilySeer.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename rfcs/{0052-FamilySeer.md => 0057-FamilySeer.md} (100%) diff --git a/rfcs/0052-FamilySeer.md b/rfcs/0057-FamilySeer.md similarity index 100% rename from rfcs/0052-FamilySeer.md rename to rfcs/0057-FamilySeer.md From 54a4ba7d44e79e91c7cc6d4fb1e57bef624b3f49 Mon Sep 17 00:00:00 2001 From: Lu Chueng <97081990+noobotdj@users.noreply.github.com> Date: Fri, 18 Feb 2022 19:51:24 +0800 Subject: [PATCH 5/5] Update 0057-FamilySeer.md --- rfcs/0057-FamilySeer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0057-FamilySeer.md b/rfcs/0057-FamilySeer.md index a9d8f677..546ca5ce 100644 --- a/rfcs/0057-FamilySeer.md +++ b/rfcs/0057-FamilySeer.md @@ -1,6 +1,6 @@ - Feature Name: (FamilySeer: A new search method for Auto-scheduler) - Start Date: (2021-01-07) -- RFC PR: [apache/tvm-rfcs#25](https://github.com/apache/tvm-rfcs/pull/52) +- RFC PR: [apache/tvm-rfcs#57](https://github.com/apache/tvm-rfcs/pull/57) - GitHub Issue: [apache/tvm#9875](https://github.com/apache/tvm/pull/9875) # Summary
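
Stepping back from the diffs: the family-grouping idea at the heart of this patch series can be illustrated with a small standalone sketch. The snippet below is a simplified illustration only — the task descriptions are hypothetical stand-ins for Auto-scheduler's `task.desc` strings, and it mirrors just the `op` classification mode, not TVM's actual implementation:

```python
from collections import defaultdict

def group_by_core_op(task_descs):
    """Group task indices into families keyed by core operation.

    Mirrors the `op` classification mode sketched in the patches above:
    the core operation is taken to be the second '_'-separated token
    of each task description.
    """
    family_group = defaultdict(list)
    for idx, desc in enumerate(task_descs):
        core_op = desc.split("_")[1]
        family_group[core_op].append(idx)
    return dict(family_group)

# Hypothetical descriptions shaped like Auto-scheduler task descriptions
task_descs = [
    "vm_conv2d_layer0",
    "vm_conv2d_layer1",
    "vm_dense_layer0",
    "vm_softmax_layer0",
    "vm_dense_layer1",
]

groups = group_by_core_op(task_descs)
print(groups)  # {'conv2d': [0, 1], 'dense': [2, 4], 'softmax': [3]}
```

Each value list indexes into the original task list, so a per-family cost model can then be bound to every task in its group, which is what the cost-model-pool code in the patches does with the `family_group` returned by `make_family_group`.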