fix access violation crash #2137

Open
wants to merge 1 commit into
base: develop

likeuclinux

fix access violation crash

TYPE: bug fix

KEYWORDS: crash, access violation error

SOURCE: Charlie Li, software developer from lakes environmental, Canada

DESCRIPTION OF CHANGES:
Problem:
WRF frequently crashes with an access violation (see #1991 for details) when namelist.input has
sf_urban_physics = 0

In that case the condition in inc/allocs_5.F
IF(okay_to_alloc.AND.in_use_for_config(id,'dlg_bep').AND.(.NOT.grid%is_intermediate))THEN
is false, so execution takes the following branch:

ELSE
ALLOCATE(grid%dlg_bep(1,1,1),STAT=ierr)
if (ierr.ne.0) then
CALL wrf_error_fatal ( &
'frame/module_domain.f: Failed to allocate grid%dlg_bep(1,1,1). ')
endif
ENDIF

This allocates only a (1,1,1) placeholder for grid%dlg_bep, which then triggers the crash in phys/module_bl_ysu.F, where the array is indexed over the full tile:

if(present(a_u_bep) .and. present(a_v_bep) .and. present(a_t_bep) .and. &
present(a_q_bep) .and. present(a_e_bep) .and. present(b_u_bep) .and. &
present(b_v_bep) .and. present(b_t_bep) .and. present(b_q_bep) .and. &
present(b_e_bep) .and. present(dlg_bep) .and. present(dl_u_bep) .and. &
present(sf_bep) .and. present(vl_bep) .and. present(frc_urb2d)) then

    do k = kts, kte
       do i = its,ite
          a_u_hv(i,k)  = a_u_bep(i,k,j)
          a_v_hv(i,k)  = a_v_bep(i,k,j)
          a_t_hv(i,k)  = a_t_bep(i,k,j)
          a_q_hv(i,k)  = a_q_bep(i,k,j)
          a_e_hv(i,k)  = a_e_bep(i,k,j)
          b_u_hv(i,k)  = b_u_bep(i,k,j)
          b_v_hv(i,k)  = b_v_bep(i,k,j)
          b_t_hv(i,k)  = b_t_bep(i,k,j)
          b_q_hv(i,k)  = b_q_bep(i,k,j)
          b_e_hv(i,k)  = b_e_bep(i,k,j)
          dlg_hv(i,k)  = dlg_bep(i,k,j)
          dl_u_hv(i,k) = dl_u_bep(i,k,j)
          vlk_hv(i,k) = vl_bep(i,k,j)
          sfk_hv(i,k)  = sf_bep(i,k,j)
       enddo
    enddo
    do i = its, ite
       frcurb_hv(i) = frc_urb2d(i,j)
    enddo

endif

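The present() checks in this code do not help, because the upper-level caller in
dyn_em/module_first_rk_step_part1.F

always calls pbl_driver with DLG_BEP=grid%dlg_bep, and the argument is then passed down through module_pbl_driver to module_bl_ysu. The optional arguments are therefore always present, even when they hold only the (1,1,1) placeholder.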
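
The present() behavior described above can be reproduced outside WRF. The following is a minimal stand-alone sketch, not WRF source: the module, the routines inner and driver, and the variable placeholder are hypothetical names that stand in for module_bl_ysu, pbl_driver, and grid%dlg_bep. It uses assumed-shape dummies so that the size of the actual argument is visible, which may differ from how the real routines declare their arguments.

```fortran
! Stand-alone illustration (hypothetical names, not WRF source).
module present_demo
  implicit none
contains
  ! Stands in for the innermost routine that tests present().
  subroutine inner(dlg_bep, is_present, n_elems)
    real, optional, intent(in) :: dlg_bep(:,:,:)
    logical, intent(out) :: is_present
    integer, intent(out) :: n_elems
    is_present = present(dlg_bep)
    n_elems = 0
    if (is_present) n_elems = size(dlg_bep)
  end subroutine inner

  ! Stands in for the driver: forwards its own optional dummy unchanged.
  subroutine driver(dlg_bep, is_present, n_elems)
    real, optional, intent(in) :: dlg_bep(:,:,:)
    logical, intent(out) :: is_present
    integer, intent(out) :: n_elems
    call inner(dlg_bep=dlg_bep, is_present=is_present, n_elems=n_elems)
  end subroutine driver
end module present_demo

program main
  use present_demo
  implicit none
  real, allocatable :: placeholder(:,:,:)
  logical :: is_present
  integer :: n_elems

  ! Same shape as the ELSE-branch allocation: a single element.
  allocate(placeholder(1,1,1))
  placeholder = 0.0

  ! Caller always supplies the argument, as the upper-level code does.
  call driver(dlg_bep=placeholder, is_present=is_present, n_elems=n_elems)
  print *, 'placeholder passed: present =', is_present, ' size =', n_elems

  ! Only when the caller omits the argument does present() become false.
  call driver(is_present=is_present, n_elems=n_elems)
  print *, 'argument omitted:   present =', is_present, ' size =', n_elems
end program main
```

In the first call present() is true although the array holds a single element, so a present() test alone cannot tell a fully allocated array from the placeholder.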

Solution:

The fix restores the v4.5 logic, which additionally guards the copy with flag_bep:

if(present(a_u_bep) .and. present(a_v_bep) .and. present(a_t_bep) .and. &
present(a_q_bep) .and. present(a_e_bep) .and. present(b_u_bep) .and. &
present(b_v_bep) .and. present(b_t_bep) .and. present(b_q_bep) .and. &
present(b_e_bep) .and. present(dlg_bep) .and. present(dl_u_bep) .and. &
present(sf_bep) .and. present(vl_bep) .and. present(frc_urb2d)) then

 ! following v4.5 logic to fix access violation
 if(flag_bep) then

    do k = kts, kte
       do i = its,ite
          a_u_hv(i,k)  = a_u_bep(i,k,j)
          a_v_hv(i,k)  = a_v_bep(i,k,j)
          a_t_hv(i,k)  = a_t_bep(i,k,j)
          a_q_hv(i,k)  = a_q_bep(i,k,j)
          a_e_hv(i,k)  = a_e_bep(i,k,j)
          b_u_hv(i,k)  = b_u_bep(i,k,j)
          b_v_hv(i,k)  = b_v_bep(i,k,j)
          b_t_hv(i,k)  = b_t_bep(i,k,j)
          b_q_hv(i,k)  = b_q_bep(i,k,j)
          b_e_hv(i,k)  = b_e_bep(i,k,j)
          dlg_hv(i,k)  = dlg_bep(i,k,j)
          dl_u_hv(i,k) = dl_u_bep(i,k,j)
          vlk_hv(i,k) = vl_bep(i,k,j)
          sfk_hv(i,k)  = sf_bep(i,k,j)
       enddo
    enddo
    do i = its, ite
       frcurb_hv(i) = frc_urb2d(i,j)
    enddo

 endif

endif

flag_bep is set as follows:

SELECT CASE(sf_urban_physics)
CASE (BEPSCHEME)
flag_bep=.true.
CASE (BEP_BEMSCHEME)
flag_bep=.true.
CASE DEFAULT
flag_bep=.false.
END SELECT

When namelist.input has sf_urban_physics = 0, flag_bep is false,
so the code that accesses the arrays allocated as (1,1,1) is not executed.
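
As a rough illustration of the guard, here is a self-contained sketch that is not taken from the WRF source. The numeric values of BEPSCHEME and BEP_BEMSCHEME, the tile bounds, and the way the placeholder allocation is tied to flag_bep are all assumptions made for the example; in WRF the allocation is decided by in_use_for_config, not by flag_bep.

```fortran
program flag_bep_demo
  implicit none
  ! Assumed values for illustration only.
  integer, parameter :: BEPSCHEME = 2, BEP_BEMSCHEME = 3
  ! Hypothetical tile bounds.
  integer, parameter :: its = 1, ite = 4, kts = 1, kte = 3, j = 1
  real, allocatable :: dlg_bep(:,:,:)
  real :: dlg_hv(its:ite, kts:kte)
  integer :: sf_urban_physics, i, k
  logical :: flag_bep

  sf_urban_physics = 0          ! as in the failing namelist.input

  select case (sf_urban_physics)
  case (BEPSCHEME, BEP_BEMSCHEME)
     flag_bep = .true.
  case default
     flag_bep = .false.
  end select

  ! Mimic the allocation outcome: full size only when BEP is active,
  ! otherwise the (1,1,1) placeholder.
  if (flag_bep) then
     allocate(dlg_bep(its:ite, kts:kte, j:j))
  else
     allocate(dlg_bep(1,1,1))
  end if
  dlg_bep = 1.0
  dlg_hv  = 0.0

  ! The guard: the copy loop runs only when the array is fully allocated.
  if (flag_bep) then
     do k = kts, kte
        do i = its, ite
           dlg_hv(i,k) = dlg_bep(i,k,j)
        end do
     end do
  end if

  print *, 'flag_bep =', flag_bep, ' sum(dlg_hv) =', sum(dlg_hv)
end program flag_bep_demo
```

With sf_urban_physics = 0 the loop is skipped and dlg_hv keeps its initial zeros; without the guard the loop would read dlg_bep outside its (1,1,1) bounds.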

The changes in wrf_timeseries.F and start_em.F address memory leaks detected when building with the PGI options:
-g -O0 -traceback -Mchkptr -Mbounds -Ktrap=fp -Msave -tp=px
With these options the run fails with: "0: ALLOCATE: array already allocated"
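
For context, the "array already allocated" condition can be shown with a small generic sketch. It is not the actual change made in wrf_timeseries.F or start_em.F, and the array name buf is hypothetical. It only demonstrates that a second ALLOCATE on an allocated array is an error, and one common way to guard against it.

```fortran
program alloc_guard_demo
  implicit none
  real, allocatable :: buf(:)
  integer :: ierr

  allocate(buf(10), stat=ierr)

  ! Allocating an already allocated array is an error. With STAT= the
  ! error is reported through ierr instead of stopping the program.
  allocate(buf(20), stat=ierr)
  print *, 'second allocate stat (nonzero means failure):', ierr

  ! Guarded version: release the old storage before allocating again.
  if (allocated(buf)) deallocate(buf)
  allocate(buf(20), stat=ierr)
  print *, 'guarded allocate stat:', ierr, ' size =', size(buf)
end program alloc_guard_demo
```

For pointer arrays the corresponding test would be associated() rather than allocated().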

ISSUE: For use when this PR closes an issue.
Fixes #123

LIST OF MODIFIED FILES: list of changed files (use git diff --name-status master to get formatted list)
phys/module_bl_ysu.F
share/wrf_timeseries.F
dyn_em/start_em.F

TESTS CONDUCTED:

  1. Do mods fix problem? How can that be demonstrated, and was that test conducted?
  2. Are the Jenkins tests all passing?

RELEASE NOTE: Include a stand-alone message suitable for the inclusion in the minor and annual releases. A publication citation is appropriate.

@likeuclinux likeuclinux requested review from a team as code owners December 3, 2024 00:22
@weiwangncar
Collaborator

The regression test results:

Test Type              | Expected  | Received |  Failed
= = = = = = = = = = = = = = = = = = = = = = = =  = = = =
Number of Tests        : 23           24
Number of Builds       : 60           57
Number of Simulations  : 158          150          0
Number of Comparisons  : 95           86           0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

@islas islas changed the base branch from master to develop December 3, 2024 18:01
@islas
Collaborator

islas commented Dec 3, 2024

@likeuclinux Could you separate out the edits in wrf_timeseries.F and start_em.F into their own PR? As it sounds like they are a separate issue from the not fully allocated dlg_bep array, it would help the review process if each PR was limited in scope to the exact issue being resolved.

@likeuclinux
Author

Can I just create another PR for both wrf_timeseries.F and start_em.F changes?

@likeuclinux
Author

Great, I will do that now.

@islas
Collaborator

islas commented Dec 5, 2024

Yes, sorry. That is what I meant. The edits to those two files appear to be related to improper deallocations, which could be their own single PR.

@likeuclinux
Author

likeuclinux commented Dec 5, 2024

I just created this PR for the memory leak issue:
only memory leak related #2139

@likeuclinux
Author

I don't need to re-create the current PR #2137 for the single-file change to phys/module_bl_ysu.F, do I?

3 participants