How to understand long docs generation #1565

zverok · 2016-10-04T10:19:36Z

Hey. It's a question, probably, not a bug.

For one project (a really huge one, as I've mentioned before), the yard command on current master takes about 7 min. That's pretty reasonable for us. On a branch with no huge changes (small refactorings to fix Rubocop offences, but across a lot of files), it suddenly takes about 27 min. When I switch between branches, the effect is reproducible (7 min on master, 27 min on "fixes"). How can I debug the cause of this slowness? Nothing "easily guessable" (like complicated metaprogramming) has changed.

Thanks.

Answered by lsegal

Oct 6, 2016


zverok (Author) · 2016-10-04T13:27:22Z

Managed to profile it down to the fact that YARD::RegistryResolver#lookup_by_path is called ~10 times more often on the problematic branch (~40k times on master vs ~400k times on fixes). Then I made a log of those calls and found that the problematic branch does 200k additional lookups for the "ActiveSupport" string and 100k for "Concern". What could that mean?..
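The thread does not say how the call log was produced. One way such a tally could be collected, as a sketch only, is to prepend a counting module in front of the method being profiled. ToyResolver below is a hypothetical stand-in and not YARD's class; with YARD loaded, the same module would presumably be prepended to YARD::RegistryResolver, which is an assumption not tested here.

```ruby
# Hypothetical stand-in for the resolver; not YARD's real class.
class ToyResolver
  def lookup_by_path(path, opts = {})
    path.to_s
  end
end

# Counts how often each path is looked up, then delegates to the original.
module LookupCounter
  COUNTS = Hash.new(0)

  def lookup_by_path(path, opts = {})
    COUNTS[path.to_s] += 1
    super
  end
end

ToyResolver.prepend(LookupCounter)

resolver = ToyResolver.new
%w[ActiveSupport Concern ActiveSupport Object ActiveSupport].each do |path|
  resolver.lookup_by_path(path, :namespace => :root)
end

# Print the most frequently looked-up paths first.
LookupCounter::COUNTS.sort_by { |_, n| -n }.each do |path, n|
  puts "#{n}\t#{path}"
end
```

Adding `caller.first` to the hash key would also record where each lookup comes from.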

zverok (Author) · 2016-10-04T14:57:46Z

OK, I think I've tracked down the problematic line, but I still don't understand the cause.
We have some monkey-patches, and previously they were applied this way (an artifact of old times, when include was private?):

Bignum.send :include, OurNamespace::BugnumPatch

Then Rubocop asked us to do it the "normal" way:

Bignum.include(OurNamespace::BugnumPatch)

So, I assume, YARD now "sees" the inclusion and tries to resolve it.
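For context, the two spellings do the same thing at runtime; they differ only in the syntax that a static parser gets to look at. A small sketch with hypothetical module and class names (nothing here is from the project in question):

```ruby
# Hypothetical names for illustration only.
module PatchA
  def patched_a?
    true
  end
end

module PatchB
  def patched_b?
    true
  end
end

class Target; end

# Old style: goes through send, so it works even where include is private.
Target.send(:include, PatchA)
# Direct call: Module#include has been public since Ruby 2.1.
Target.include(PatchB)

# Both modules end up in the ancestor chain in the same way.
p Target.ancestors.first(3)
p Target.new.patched_a? && Target.new.patched_b?
```

If the assumption above is right, only the second form looks like an include statement to YARD, which would explain why the extra resolution work appeared after the Rubocop fix.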

After that, if I trust the stats I gathered, the following happens:

Each of our patches somehow goes 1245 (???) times through YARD::CodeObjects::ModuleObject#inheritance_tree;
But one of them somehow goes through it 1248 times (3 more), and it is included into ActiveSupport::Concern;
After that, YARD::RegistryResolver#lookup_by_path is called 300k times in total for ActiveSupport and Concern.

What should I look for now to resolve the issue?..

lsegal (Maintainer) · 2016-10-06T17:06:45Z

I assume you're already using yard --debug?

Without seeing the code or a reproduction case it's hard to know if what's happening is unexpected behavior or not. Can you provide code or reproduction?

zverok (Author) · 2016-10-06T17:10:42Z

OK, it took me only two days to invent the solution.

What do you think about replacing this: https://github.com/lsegal/yard/blob/master/lib/yard/registry.rb#L303

With something like this (it is utterly naive, just to show an idea):

      def resolve(namespace, name, inheritance = false, proxy_fallback = false, type = nil)
        # Naive memoization: cache each result under the full argument tuple.
        # Note that nothing ever invalidates this cache.
        @@resolved ||= Hash.new { |h, (ns, n, i, pf, t)|
          h[[ns, n, i, pf, t]] =
            thread_local_resolver.lookup_by_path(n,
              :namespace => ns, :inheritance => i,
              :proxy_fallback => pf, :type => t)
        }
        @@resolved[[namespace, name, inheritance, proxy_fallback, type]]
      end

It is just a simple memoization trick to avoid resolving the same object over and over again.

Results for me:

really large codebase: 30 min => ~4 min
not-so-large gem codebase (infoboxer): 28 sec => 8 sec

Am I missing something about it? Is it a good thing to do?

zverok (Author) · 2016-10-06T17:19:46Z

@lsegal I hadn't seen your comment while writing mine, sorry.
The thing is, I've played a bit with the YARD codebase, and I'm now sure that YARD behaves "normally" given how it currently works; we were just unlucky enough to uncover one more way to screw things up.

A simple example, this file:

module M
end

class A
end

class B < A
  include M
end

Array.include(M)

Being the only file in the "project", it leads to this list of RegistryResolver#lookup_by_path calls:

lookup_by_path: "A".
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "B"
lookup_by_path: "Object"
lookup_by_path: "A"
lookup_by_path: "M"
lookup_by_path: "M"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "M"
lookup_by_path: "M"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: :Object
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: :Object
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"
lookup_by_path: "Object"

Considering that in large registries lookup_by_path is NOT cheap at all (from performance point of view), ....

lsegal (Maintainer) · 2016-10-06T18:17:37Z

That's not necessarily an accurate description of lookup's performance behavior. Your output groups all calls together and omits data about where they are called. It also is completely lacking in timing information (to actually show how expensive it is).

Some of those lookup calls might be grouped together, but I imagine most of those are distinct calls. If so, it's not so much that lookup is expensive as that it's a heavily used call. Can you add output to show where these are getting called? My guess is we will always need that many lookup calls, though we could optimize performance through caching.

zverok (Author) · 2016-10-06T18:47:04Z

Your output groups all calls together and omits data about where they are called.

Yes, I've investigated it much more deeply; here I just wanted to show the point (constant re-lookups for the same things).

It also is completely lacking in timing information (to actually show how expensive it is).

On this toy example it is hard to show.

But here's what RubyProf says about our actual codebase:

(It is NOT the entire codebase, just the lib/ folder, which is really moderately sized compared to other important folders.)

If you are not familiar with ruby-prof, here is the meaning of the rows:

the bold one is the current method
above it are its callers
below it are its callees

Here is the meaning of the columns:

percent of time spent in this method and its callees (i.e. almost all the time of code parsing and docs generation; you see why I call it pricey?)
percent of time spent in the method itself (excluding callees)
seconds spent in the method and its callees
seconds spent in the method itself
wait (IDK what it means)
seconds spent in callees
number of calls (380k for lookup_by_path, do you love it? As I've said, a naive "log what we are looking up here" shows 200k of them are looking for ActiveSupport and 100k for Concern)
name of the method

So, what am I missing here? As far as I can see, YARD never tries to memoize anything, but to the time of docs generation we know everything we have, and do not need to constantly re-lookup the things.

lsegal (Maintainer) · 2016-10-06T19:36:21Z

As far as I can see, YARD never tries to memoize anything,

That's correct. YARD doesn't do caching in the resolver lookup. Caching is hard. Caching requires sound (formally provable) invalidation stories. Unfortunately there's no sound invalidation story here.

but to the time of docs generation we know everything we have, and do not need to constantly re-lookup the things.

This part isn't true. There is no distinction between lookups occurring during parsing and during HTML generation. Lookups can (and do) occur during parsing. Every P() call (or Proxy object creation) involves a lookup if the proxy is resolved for a specific reason (e.g. registering a method on a proxy namespace-- the namespace must be resolved for this to work). Modifications to the tree happen often during parsing, so there's plenty of chance for incorrect caching here.

More importantly, though, modification can still happen after parsing, so although YARD could make an assumption that the tree is locked after parsing, that's not currently a requirement. That would be a breaking change. Currently, plugins can modify the registry tree after YARD parses, and possibly even during HTML generation if they wanted to. It's probably bad form to do this, but it's currently allowed by the API.

The hard part about this in either case (lookups during, or modifications after) is dealing with invalidation. The invalidation story for newly registered or removed objects is fairly straightforward (every create / delete invalidates cache), but this is not the only way to modify the object tree, especially when it comes to inherited lookups.

Here's an invalidation story we could catch:

require "yard"
include YARD; include YARD::CodeObjects
b_class = ClassObject.new(:root, "B")
a_class = ClassObject.new(:root, "A")
obj = Registry.resolve(a_class, "B") # this is b_class

ab_class = ClassObject.new(a_class, "B") # we can invalidate here
obj = Registry.resolve(a_class, "B") # this is ab_class
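The catchable case can be modeled without YARD at all. The following is a minimal sketch under stated assumptions: ToyRegistry, its string paths, and its "namespace first, then root" rule are all hypothetical simplifications, not YARD's Registry. It only shows that when every create or delete clears the cache, a newly registered A::B is picked up correctly.

```ruby
# Toy model with hypothetical names; not YARD's Registry.
class ToyRegistry
  def initialize
    @objects = {}
    @cache = {}
  end

  # Every create or delete throws the whole cache away.
  def register(path)
    @cache.clear
    @objects[path] = path
  end

  def delete(path)
    @cache.clear
    @objects.delete(path)
  end

  # Look for `name` inside `namespace` first, then at the root.
  def resolve(namespace, name)
    key = [namespace, name]
    return @cache[key] if @cache.key?(key)
    @cache[key] = @objects["#{namespace}::#{name}"] || @objects[name]
  end
end

registry = ToyRegistry.new
registry.register("B")
registry.register("A")
before = registry.resolve("A", "B")   # finds the root-level B

registry.register("A::B")             # registration clears the cache
after = registry.resolve("A", "B")    # finds A::B
puts "before=#{before} after=#{after}"
```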

But here's a very simple example of an un-invalidateable case:

require "yard"
include YARD; include YARD::CodeObjects
foo_mod = ModuleObject.new(:root, "Foo")
MethodObject.new(foo_mod, "some_method")

a_class = ClassObject.new(:root, "A")
a_class.mixins(:instance) << foo_mod

# we get Foo#some_method here
obj = Registry.resolve(a_class, "#some_method")

bar_mod = ModuleObject.new(:root, "Bar")
MethodObject.new(bar_mod, "some_method")

# Mutable array manipulation. No (good) way to invalidate this.
a_class.mixins(:instance) << bar_mod

# we would still get Foo#some_method here-- INCORRECT
obj = Registry.resolve(a_class, "#some_method")

Calling a_class.mixins(:instance) << mod isn't the only way to modify the inheritance tree. Manually appending children to a namespace is another way to do it.
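The same failure can be reproduced in a toy model that does not depend on YARD. Everything below is hypothetical (StaleRegistry, ToyModule, ToyClass), including the rule that the most recently added mixin wins; it is only meant to show how a plain Array mutation slips past a cache that nobody told to invalidate.

```ruby
# Toy model, not YARD's API: all names here are hypothetical.
ToyModule = Struct.new(:name, :meths)
ToyClass  = Struct.new(:name, :mixins)

class StaleRegistry
  def initialize
    @cache = {}
  end

  # Uncached lookup. Toy rule: the most recently added mixin
  # that defines the method wins.
  def lookup(klass, meth)
    owner = klass.mixins.reverse.find { |m| m.meths.include?(meth) }
    owner && "#{owner.name}##{meth}"
  end

  # Memoized lookup, keyed on class name and method name only.
  def cached_lookup(klass, meth)
    key = [klass.name, meth]
    @cache.fetch(key) { @cache[key] = lookup(klass, meth) }
  end
end

registry = StaleRegistry.new
foo = ToyModule.new("Foo", ["some_method"])
bar = ToyModule.new("Bar", ["some_method"])
a   = ToyClass.new("A", [foo])

first = registry.cached_lookup(a, "some_method")

# Plain Array mutation: the registry is never told about it.
a.mixins << bar

stale = registry.cached_lookup(a, "some_method")
fresh = registry.lookup(a, "some_method")
puts "first=#{first} cached=#{stale} uncached=#{fresh}"
```

In this toy the cached call keeps answering Foo#some_method while the uncached call switches to Bar#some_method.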

These aren't all impossible to solve, but it describes the complexity involved with cache invalidation. Caching is very likely to cause weird edge case caching issues that will be hard to debug. Furthermore, "invalidating the cache" itself is a hard problem. Even if we know to invalidate the cache-- how to effectively invalidate the cache is an equally hard problem. The naive solution would be to just smash the entire cache every time we invalidate. Trying to be "smart" about invalidation is likely to yield just as many edge case bugs. For example, if I add a method to a module, I have to invalidate every lookup that used the module in its lookup chain, so I have to look at all cache objects that contain module or a module/class that includes module at any level of indirection. That's an extremely complex resolution.
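To give a feel for the "any level of indirection" part, here is a rough sketch of what a "smart" invalidation would have to compute before it could drop the right cache entries. The inclusion graph and the affected_by helper are hypothetical, not part of YARD; the graph maps each name to the modules or classes it includes or inherits from.

```ruby
require "set"

# Toy inclusion graph (hypothetical names):
# key => modules/classes it includes or inherits from.
INCLUDES = {
  "Concern"   => [],
  "Patch"     => ["Concern"],
  "Base"      => ["Patch"],
  "Child"     => ["Base"],
  "Unrelated" => []
}

# Everything whose cached lookups may be stale after `changed` is modified:
# `changed` itself plus anything that reaches it through any chain of includes.
def affected_by(changed, graph)
  affected = Set.new([changed])
  loop do
    added = graph.select { |name, deps|
      !affected.include?(name) && deps.any? { |d| affected.include?(d) }
    }.keys
    break if added.empty?
    affected.merge(added)
  end
  affected
end

p affected_by("Concern", INCLUDES).to_a.sort
```

Even in this tiny graph, touching one module marks three other entries as suspect, and the computation has to be repeated on every modification.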

This level of code complexity is not worth the ~20 second speedup on the "average" codebase size (I would argue that the average gem is significantly smaller than infoboxer). I agree that 30 min to 4 min is a huge improvement, but "extremely large codebase" is an edge case. It's also an edge case of an edge case, since it's only come up due to a specific formatting you happen to be using (it may even only be specific to projects with heavy module inclusion across lots of files).

If Object is the bulk of your lookups, we may want to optimize that special case (since it's used a lot in the templates), which should handle most of it. If your lookups are heavy on all sorts of module inclusion, I'd be more wary about adding cache logic for that. It's also unclear if the lookups on "Object" are equally expensive to arbitrary paths, or if your time is spent just dealing with the sheer size of modules you are including across namespaces. Without knowing what your data shows, I can't really provide more insight.

This may end up being a YARD plugin idea until there's a better long-term solution for this.

zverok (Author) · 2016-10-07T11:37:04Z

OK, I see your points.
What I'm currently thinking about (as a task for a fork/plugin) is that full registry resolution after parsing could be appropriate for most situations. But of course, if the amount of code we have is a rare edge case, nobody should care about speeding it up by several seconds or less.

Thanks for the clarifications.
