Improve compilation times #3295

Open
charleskawczynski opened this issue Sep 12, 2024 · 7 comments

@charleskawczynski
Member

charleskawczynski commented Sep 12, 2024

Compilation times are pretty long, and we should probably see what low-hanging fruit there is.

I used SnoopCompile to see what inference looks like when we run the driver (which took over 20 minutes to reach the end of the first call to step!):

# julia --project=examples

using SnoopCompileCore
tinf = @snoop_inference begin
	empty!(ARGS);
	push!(ARGS, "--config_file", "config/model_configs/diagnostic_edmfx_trmm_stretched_box.yml");
	push!(ARGS, "--job_id", "diagnostic_edmfx_trmm_stretched_box");
	include("examples/hybrid/driver.jl")
end;

using SnoopCompile
# staleinstances(tinf) # quite a few invalidations
# using AbstractTrees
# print_tree(tinf) # prints way too much info

using FlameGraphs
fg = flamegraph(tinf)
using ProfileView
ProfileView.view(fg)
[Screenshot: flame graph shown by ProfileView, 2024-09-12 1:11:31 PM]

The two large blocks are build_cache (left side), which mostly spends time in set_precomputed_quantities!, and step_u! (right side).
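Since `print_tree` prints too much, a text summary of the same inference data may be easier to scan than the flame graph. Below is a minimal sketch, assuming SnoopCompile's `flatten` and `accumulate_by_source` helpers; `toy_workload` is a hypothetical stand-in for the driver `include`, not anything from ClimaAtmos.

```julia
using SnoopCompileCore

# Hypothetical stand-in workload; in this issue it would be the driver include.
toy_workload(x) = sum(abs2, x) + length(string(x))

# Record type-inference activity for the first call.
tinf = @snoop_inference toy_workload(rand(3))

# Load the analysis tools only after the data has been collected.
using SnoopCompile

# One entry per inferred frame, with its inference timing.
frames = flatten(tinf)

# Aggregate inference time per method, so methods that are inferred
# for many different argument types stand out.
by_method = accumulate_by_source(frames)

# Print the last (most expensive) entries of the sorted summary.
foreach(println, last(by_method, 10))
```

The same two calls could be applied to the `tinf` collected from the driver above to check whether `set_precomputed_quantities!` and `step_u!` dominate in the numbers as well as in the picture.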

@charleskawczynski
Member Author

Looks like there's still room for improvement:

[Screenshot: flame graph, 2024-09-12 7:49:09 PM]

@Sbozzolo
Member

Wow, the big orange rectangle disappearing with a three line change!

@charleskawczynski
Member Author

> Wow, the big orange rectangle disappearing with a three line change!

Yeah, but it's a bit suspicious to me. I don't know why the other large platforms appeared. The results could be non-deterministic 🙁.

@charleskawczynski
Member Author

From Slack:

So far I've boiled the catastrophic case down to:

### Buildkite job `rcemipii_box_diagnostic_edmfx`
# julia --project=examples
# using Revise; include("../ca_compile_times.jl")
# using Revise; @time include("../ca_compile_times.jl")
using Revise;
import Thermodynamics as TD
TD.print_warning() = false
import ClimaAtmos as CA;
import ClimaAtmos.InitialConditions as ICs;
import ClimaCore.Spaces

config_dict = Dict();
config_dict["moist"] = "equil";
config_dict["rayleigh_sponge"] = true;
config_dict["edmfx_entr_model"] = "Generalized";
config_dict["implicit_diffusion"] = true;
config_dict["approximate_linear_solve_iters"] = 2;
config_dict["edmfx_sgs_mass_flux"] = true;
config_dict["edmfx_upwinding"] = "first_order";
config_dict["prognostic_tke"] = true;
config_dict["surface_setup"] = "DefaultMoninObukhov";
config_dict["override_τ_precip"] = false;
config_dict["dt"] = "30secs";
config_dict["netcdf_output_at_levels"] = true;
config_dict["t_end"] = "3600secs";
config_dict["insolation"] = "rcemipii";
config_dict["edmfx_sgs_diffusive_flux"] = true;
config_dict["turbconv"] = "diagnostic_edmfx";
config_dict["dt_save_state_to_disk"] = "12hours";
config_dict["ode_algo"] = "ARS343";
config_dict["config"] = "box";
config_dict["netcdf_interpolation_num_points"] = [8, 8, 60];
config_dict["edmfx_nh_pressure"] = true;
config_dict["precip_model"] = "0M";
config_dict["toml"] = ["toml/rcemipii_diagnostic_edmfx_0M.toml"];
config_dict["diagnostics"] = Dict{Any, Any}[Dict("short_name" => ["ts", "ta", "thetaa", "ha", "pfull", "rhoa", "ua", "va", "wa", "hur", "hus", "cl", "clw", "cli", "hussfc", "evspsbl", "pr"], "period" => "5mins"), Dict("short_name" => ["arup", "waup", "taup", "thetaaup", "haup", "husup", "hurup", "clwup", "cliup", "waen", "tke", "lmix"], "period" => "5mins")];
config_dict["surface_temperature"] = "RCEMIPII";
config_dict["edmfx_detr_model"] = "Generalized";
config_dict["rad"] = "allskywithclear";

config = CA.AtmosConfig(config_dict);

# include("examples/hybrid/driver.jl")
import ClimaParams as CP
import YAML
params = CA.create_parameter_set(config)
atmos = CA.get_atmos(config, params)

sim_info = CA.get_sim_info(config)
job_id = sim_info.job_id
output_dir = sim_info.output_dir
@info "Simulation info" job_id output_dir

CP.log_parameter_information(
    config.toml_dict,
    joinpath(output_dir, "$(job_id)_parameters.toml"),
    strict = true,
)
YAML.write_file(joinpath(output_dir, "$job_id.yml"), config.parsed_args)

spaces = CA.get_spaces(config.parsed_args, params, config.comms_ctx)

initial_condition = CA.get_initial_condition(config.parsed_args)
surface_setup = CA.get_surface_setup(config.parsed_args)

Y = ICs.atmos_state(
    initial_condition(params),
    atmos,
    spaces.center_space,
    spaces.face_space,
)
t_start = Spaces.undertype(axes(Y.c))(0)

tracers = CA.get_tracers(config.parsed_args)
@info "About to call build_cache"

@time CA.build_cache(
    Y,
    atmos,
    params,
    surface_setup,
    sim_info,
    tracers.aerosol_names,
)

and it seems like compilation in the first call to set_precomputed_quantities! is very expensive.
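A rough way to separate compilation cost from run time is to time the same call twice with the same argument types. This is only a sketch using Base; `toy_kernel` is a hypothetical placeholder for the expensive call (here that would be `CA.build_cache`), and the difference is an estimate, not an exact compile time.

```julia
# Hypothetical toy function standing in for an expensive-to-compile call.
toy_kernel(x) = sum(sin.(x) .^ 2 .+ cos.(x) .^ 2)

x = rand(1000)

# First call: includes type inference and native code generation.
t_first = @elapsed toy_kernel(x)

# Second call with the same argument types: reuses the compiled code.
t_second = @elapsed toy_kernel(x)

# The difference is a rough estimate of the compilation overhead.
@info "Timing" t_first t_second compile_estimate = t_first - t_second
```

On recent Julia versions `@time` also reports the percentage of time spent compiling, so the `@time CA.build_cache(...)` call above should already show that split for the first call.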

I think

@charleskawczynski
Member Author

xref: #3242 (comment)

@charleskawczynski
Member Author

#3357 had a big impact here. I suppose we could close, but I think there's still room for improvement.

@Sbozzolo
Member

Sbozzolo commented Oct 4, 2024

Yeah, my restart tests take a long time and it's all compilation time :(
