Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query doesn't correctly handle vectors inside of vectors #672

Open
dgr opened this issue Sep 16, 2024 · 9 comments · May be fixed by #673
Open

query doesn't correctly handle vectors inside of vectors #672

dgr opened this issue Sep 16, 2024 · 9 comments · May be fixed by #673

Comments

@dgr
Copy link
Contributor

dgr commented Sep 16, 2024

Version
1.1.42

Platform
All

Symptom
The query function accepts a number of query types (e.g., keyword, string, map) that indirectly specify a WebDriver query. query also allows for a vector argument which can contain multiple of these other query types, allowing the programmer to specify a rough path through the DOM to the desired element.

Etaoin's documentation (both doc strings and User Guide) does not speak to whether a vector can contain other vectors and if allowed what that behavior should be.

Actual behavior
Today, the behavior is a bit random. Here is some data from the REPL:

etaoin.api> (query driver {:class :foo})
"node-6D0603A0-0A6B-4BE4-AE6F-3BCA02ADEBF8"    ;; good
etaoin.api> (query driver [{:class :foo}])
"node-6D0603A0-0A6B-4BE4-AE6F-3BCA02ADEBF8"    ;; wrapping a single item in a vector is just the same as the item
etaoin.api> (query driver [[{:class :foo}]])
"node-6D0603A0-0A6B-4BE4-AE6F-3BCA02ADEBF8"    ;; putting a single vector in a vector also works !?!?
etaoin.api> (query driver [{:class :foo} {:class :target}])
"node-70F1272F-6EC3-4D32-AF00-84E60EB67927"    ;; good. just searching for a vector of things.
etaoin.api> (query driver [[{:class :foo}] [{:class :target}]])
Execution error at etaoin.impl.util/error (util.clj:38).  ;; but doing two sub-vectors blows up with an exception
Wrong query: [{:class :target}]                           ;; but oddly it blows up on the SECOND sub-vector

Expected behavior
The behavior should be consistent. I believe there are two possible choices:

  1. Vectors inside of vectors are disallowed. This should throw an exception.
  2. Vectors inside of vectors are allowed and the behavior is as if the top-level vector was flatten-ed. That is, the sub-vectors just specify sub-paths through the DOM.

Diagnosis
The code in query just needs to be updated.

Action
I can supply a PR for either choice.

@lread
Copy link
Collaborator

lread commented Sep 16, 2024

Choice 3. Garbage in means undefined behaviour. This would require no change.

Thoughts?

@dgr
Copy link
Contributor Author

dgr commented Sep 16, 2024

Yes, GIGO (garbage in, garbage out) is also an option. I like it less than the other two because I think it puts the user in the bad position of having to debug a not-very-well-specified system.

One of the consistently listed issues for the Clojure language is that error messages are cryptic. This occurs because typing is dynamic and unconstrained inputs are passed to functions deep in a call stack where they blow up. It would be better to either say "this is supported," or "this is not supported" and get an accurate error message.

BTW, I see either option 1 or 2 as being non-breaking. Today, it doesn't work, so making it work (option 2) wouldn't break anything. Today, in some cases, it will throw an exception, so with option 1, it will still throw an exception, but a better one and one thrown for the right reasons.

@lread
Copy link
Collaborator

lread commented Sep 16, 2024

Option 4. Introduce spec or malli and let folks turn on instrumentation if they want it.

If you feel strongly about doing something here, and don't like option 4, option 1 seems ok to me. Option 2 is too much magic for me.

@dgr
Copy link
Contributor Author

dgr commented Sep 16, 2024

Option 4 is a much heavier lift. IMO, more libraries should start using Malli/spec to constrain things, but it's harder for an existing library like Etaoin to retrofit everything. In this case, I'm suggesting tightening things up because query is such a key API within Etaoin. It's called from many other Etaoin functions and many are the doc strings that read, "See [[query]] for details on q." So, if there's one API that you want to make sure is pretty tight, well specified, and with good error messages when the user asks for something unsupported, I think it's query. When I first read the doc string for query, I wondered whether vectors were allowed in vectors or not. When I tried it, I got the behavior above, which is a bit of a mess.

@dgr
Copy link
Contributor Author

dgr commented Sep 16, 2024

Option 5: document that it's not supported and that GIGO may result if attempted. That's a quick, 1-line or thereabouts, doc change with no affect on code at all. But I think that's weak sauce because it's relatively easy to do either Option 1 or Option 2.

@lread
Copy link
Collaborator

lread commented Sep 16, 2024

Yep option 4 is big (and perhaps a maintenance burden), but it is an option.

I personally would not bother, but if you feel strongly about it go for option 1 (or 5).
Not a fan of option 2.

@dgr
Copy link
Contributor Author

dgr commented Sep 16, 2024

OK, I hear you on option 1 or 5. But just FYI, option 2, flattening, is actually easier than option 1, in terms of code. You just need to call flatten in the right place and then pass the result to the existing vector processing. For option 1, you need to look at every item in the vector and throw if it's another vector. That's still easy, but a slightly bigger change.

@lread
Copy link
Collaborator

lread commented Sep 16, 2024

Yeah but option 2 is automatically "helping" the user recover from a mistake they made.
I personally don't like to be "helped" this way.
This is why I described it as too much magic.

@dgr
Copy link
Contributor Author

dgr commented Sep 16, 2024

I don't necessarily agree that it's a user mistake. Again, it might be the easiest way to generate the DOM path. Think about how Hiccup works. Ultimately, it's serialized and flattened, but you manipulate it as vectors of vectors, which allows for generator functions to create sub-paths and then those just get wrapped in a collection. But it's also easy for the user to call flatten on it before the pass it to query. So, I think the most important thing is to decide and then document the expected behavior.

So, I'll submit a PR for option 1.

@dgr dgr linked a pull request Sep 23, 2024 that will close this issue
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants