Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Out_of_memory with Dns_resolver_mirage #298

Open
dinosaure opened this issue Mar 2, 2022 · 10 comments
Open

Out_of_memory with Dns_resolver_mirage #298

dinosaure opened this issue Mar 2, 2022 · 10 comments

Comments

@dinosaure
Copy link
Member

After ~2 weeks, I systematically get an Out_of_memory with this trace:

console 2022-03-02T13:17:44-00:00: Fatal error: exception Out of memory
console 2022-03-02T13:17:44-00:00: Raised by primitive operation at Stdlib__Hashtbl.resize in file "hashtbl.ml", line 148, characters 16-38
console 2022-03-02T13:17:44-00:00: Called from Dns_resolver_mirage.Make.resolver.maybe_tcp.(fun) in file "duniverse/ocaml-dns/mirage/resolver/dns_resolver_mirage.ml", line 85, characters 8-53
console 2022-03-02T13:17:44-00:00: Called from Lwt.Miscellaneous.apply in file "duniverse/lwt/src/core/lwt.ml", line 3074, characters 22-25
console 2022-03-02T13:17:44-00:00: Solo5: solo5_exit(2) called

It seems about an internal Hashtbl. The unikernel is launched with 32MB. I will try to introspect further to understand where is the error.

@reynir
Copy link
Member

reynir commented Mar 2, 2022

Do you know what version of ocaml-dns was used?

@dinosaure
Copy link
Member Author

dinosaure commented Mar 2, 2022

The version is 6.1.4.

@hannesm
Copy link
Member

hannesm commented Mar 3, 2022

Thanks for your report. I've no insight which Hashtbl this may be... here in DNS we're using mostly Map data structures. Are you able to reproduce this behaviour in a shorter period of time by doing a lot of requests? Is the resolver serving UDP or TCP or both protocols? Is there a ~size: argument provided to Dns_resolver.create (or is the default of 10000 used)?

@dinosaure
Copy link
Member Author

Currently, my unikernel is available here: https://github.com/mirage/dns-resolver I will relaunch it with -l debug and give you a more precise trace then 👍.

@hannesm
Copy link
Member

hannesm commented Mar 3, 2022

That would be good -- though I'd expect more useful data if you add some metrics (or periodic log messages) about memory consumption and cache size (since the LRU of the cache is not measured in actual used bytes, but items (where the weight function is the size of the resource record map (and each resource record may be different in size)).

@reynir
Copy link
Member

reynir commented Mar 3, 2022

The reason I asked is because I can't make sense of the line number Dns_resolver_mirage.Make.resolver.maybe_tcp.(fun) in file "duniverse/ocaml-dns/mirage/resolver/dns_resolver_mirage.ml", line 85, characters 8-53 with any of the recent releases so I was curious if you were using some other version :-)

@hannesm
Copy link
Member

hannesm commented Mar 3, 2022

I opened #299 that should help to figure out whether the cache is causing (as created) the OOM. Is there a way to have a over-approximation of the memory used by an OCaml value (AFAICT not, esp. not if using bigarrays) -- Obj.size isn't the thing to look for (Obj.size (Obj.magic (Cstruct.create 1024)) is 3).

@dinosaure
Copy link
Member Author

Obj.size (Obj.magic (Cstruct.create 1024)) is 3

It's probably due to that Obj.size does not count malloc-ed objects.

@dinosaure
Copy link
Member Author

This is my last logs just before the Out_of_memory with -l debug:

console 2022-03-12T13:21:44-00:00: EDNS EDNS rcode 0 version 0 dnssec_ok false payload_size 4096 extensions  TSIG no
console 2022-03-12T13:21:44-00:00: 2022-03-12 13:21:34 -00:00: DBG [application] scrubbing (bailiwick facebook.com) data header A60B (response) operation Query rcode 
console 2022-03-12T13:21:44-00:00: no error flags: authoritative question graph.facebook.com A?
console 2022-03-12T13:21:44-00:00: data answer graph.facebook.com.	3600	CNAME	api.facebook.com. authority 
console 2022-03-12T13:21:44-00:00: additional 
console 2022-03-12T13:21:44-00:00: EDNS EDNS rcode 0 version 0 dnssec_ok false payload_size 4096 extensions  TSIG no
console 2022-03-12T13:21:44-00:00: 2022-03-12 13:21:34 -00:00: DBG [application] Dns_cache.set CNAME graph.facebook.com entry CNAME
console 2022-03-12T13:21:44-00:00: 2022-03-12 13:21:34 -00:00: DBG [application] set: graph.facebook.com (CNAME) found rank authoritative answer data insert rank authoritative answer data: 0
console 2022-03-12T13:21:44-00:00: 2022-03-12 13:21:34 -00:00: DBG [application] now querying graph.facebook.com (A)
console 2022-03-12T13:21:44-00:00: 2022-03-12 13:21:34 -00:00: DBG [application] alias while looking up graph.facebook.com (A)
console 2022-03-12T13:21:44-00:00: 2022-03-12 13:21:34 -00:00: DBG [application] follow_cname: cache miss, need to query api.facebook.com
console 2022-03-12T13:21:44-00:00: 2022-03-12 13:21:34 -00:00: DBG [application] go api.facebook.com
console 2022-03-12T13:21:44-00:00: 2022-03-12 13:21:34 -00:00: DBG [application] resolve returned zone facebook.com query api.facebook.com (A), ip 185.89.219.12
console 2022-03-12T13:21:44-00:00: 2022-03-12 13:21:34 -00:00: DBG [application] have to query (zone facebook.com) api.facebook.com (A) using ip 185.89.219.12
console 2022-03-12T13:21:44-00:00: 2022-03-12 13:21:34 -00:00: DBG [application] maybe_query: query 185.89.219.12 api.facebook.com (A)
console 2022-03-12T13:21:44-00:00: Fatal error: exception Out of memory
console 2022-03-12T13:21:44-00:00: Raised by primitive operation at Stdlib__Hashtbl.resize in file "hashtbl.ml", line 148, characters 16-38
console 2022-03-12T13:21:44-00:00: Called from Dns_resolver_mirage.Make.resolver.maybe_tcp.(fun) in file "duniverse/ocaml-dns/mirage/resolver/dns_resolver_mirage.ml", line 85, characters 8-53
console 2022-03-12T13:21:44-00:00: Called from Lwt.Miscellaneous.apply in file "duniverse/lwt/src/core/lwt.ml", line 3074, characters 22-25
console 2022-03-12T13:21:44-00:00: Solo5: solo5_exit(2) called

@palainp
Copy link
Member

palainp commented Mar 25, 2022

Not sure if that can help as it seems to take a long time to trigger the memory fault, but I tried to print memory usage, my modifications can be seen https://github.com/palainp/dns-resolver/tree/print-mem-usage.

It displays every 10" 3 memory sizes: the memory size available above top of the heap (including the stack) this can be used by malloc to growth the ocaml memory, the real free memory (above the heap (including the stack) + free memory inside the heap), and total memory available (the --mem arg minus the kernel size).
Normally the first should be less than or equal to the second (which indicates that there is some unused memory in the heap, something that may have been taken and may be freed by the GC at a point, malloc will use this before increasing the heap size):

$ opam pin
mirage-solo5.0.7.0          git  git+https://github.com/palainp/mirage-solo5#footprint-alt
ocaml-freestanding.0.7.0    git  git+https://github.com/palainp/ocaml-freestanding#footprint-alt
$ solo5-spt --net:service=tap100 --mem=32 dist/resolver.spt -l debug
            |      ___|
  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Bindings version v0.7.1
Solo5: Memory map: 32 MB addressable:
Solo5:   reserved @ (0x0 - 0xfffff)
Solo5:       text @ (0x100000 - 0x291fff)
Solo5:     rodata @ (0x292000 - 0x2e8fff)
Solo5:       data @ (0x2e9000 - 0x436fff)
Solo5:       heap >= 0x437000 < stack < 0x2000000
2022-03-25 16:06:04 -00:00: INF [netif] Plugging into service with mac 66:53:d1:de:e0:ad mtu 1500
2022-03-25 16:06:04 -00:00: INF [ethernet] Connected Ethernet interface 66:53:d1:de:e0:ad
2022-03-25 16:06:04 -00:00: INF [ARP] Sending gratuitous ARP for 10.0.0.2 (66:53:d1:de:e0:ad)
2022-03-25 16:06:04 -00:00: INF [ipv6] IP6: Starting
2022-03-25 16:06:04 -00:00: DBG [ndpc6] ND6: Sending RS
...
2022-03-25 16:06:08 -00:00: INF [ipv4] IP.output: no route to 202.12.27.33 (no default gateway is configured)
2022-03-25 16:06:08 -00:00: WRN [ipv4] Write to 202.12.27.33 would require an external route, which was not provided
2022-03-25 16:06:14 -00:00: INF [memory] Memory usage: Above top_heap is 23MiB / free memory 23MiB / total memory 28MiB
2022-03-25 16:06:14 -00:00: INF [ipv4] IP.output: no route to 202.12.27.33 (no default gateway is configured)
2022-03-25 16:06:14 -00:00: WRN [ipv4] Write to 202.12.27.33 would require an external route, which was not provided
2022-03-25 16:06:24 -00:00: INF [memory] Memory usage: Above top_heap is 23MiB / free memory 23MiB / total memory 28MiB
2022-03-25 16:06:26 -00:00: INF [ipv4] IP.output: no route to 202.12.27.33 (no default gateway is configured)
2022-03-25 16:06:26 -00:00: WRN [ipv4] Write to 202.12.27.33 would require an external route, which was not provided
2022-03-25 16:06:34 -00:00: INF [memory] Memory usage: Above top_heap is 22MiB / free memory 23MiB / total memory 28MiB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants