Cannot connect to externally running webdriver #519

Closed
DonCziken opened this issue Feb 1, 2023 · 9 comments

DonCziken commented Feb 1, 2023

Version
[etaoin "1.0.39"]

Platform
Operating System: Ubuntu 18.04.6 LTS
Clojure version: org.clojure/clojure "1.9.0"
JDK vendor and version: OpenJDK Runtime Environment (build 1.8.0_352-8u352-ga-1~18.04-b08)

Browser vendor: chrome
WebDriver version: https://github.com/SeleniumHQ/docker-selenium / selenium/standalone-chrome:4.8.0-20230131

Symptom
Not able to connect to running webdriver via:
:start (e/chrome {:webdriver-url "http://localhost:4444" :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}})

exiting with following stacktrace:

Exception in thread "main" java.lang.RuntimeException: could not start [#'ct.a.driver/driver] due to
	at mount.core$up$fn__13907.invoke(core.cljc:92)
	at mount.core$up.invokeStatic(core.cljc:92)
	at mount.core$up.invoke(core.cljc:90)
	at mount.core$bring.invokeStatic(core.cljc:242)
	at mount.core$bring.invoke(core.cljc:234)
	at mount.core$start.invokeStatic(core.cljc:284)
	at mount.core$start.doInvoke(core.cljc:276)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at ct.cli$run_program_BANG_.invokeStatic(cli.clj:37)
	at ct.cli$run_program_BANG_.invoke(cli.clj:29)
	at ct.cli$_main.invokeStatic(cli.clj:176)
	at ct.cli$_main.doInvoke(cli.clj:167)
	at clojure.lang.RestFn.invoke(RestFn.java:425)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.RestFn.applyTo(RestFn.java:132)
	at ct.cli.main(Unknown Source)
Caused by: clojure.lang.ExceptionInfo: throw+: {:type :etaoin/http-ex, :driver {:type :chrome, :host "127.0.0.1", :port 33319, :url "http://127.0.0.1:33319", :locator "xpath", :webdriver-url "http://localhost:4444", :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}}, :webdriver-url "http://localhost:4444", :host "127.0.0.1", :port 33319, :method :post, :path "session", :payload {:desiredCapabilities {"chromeOptions" {"args" ["--no-sandbox"]}}}} {:type :etaoin/http-ex, :driver {:type :chrome, :host "127.0.0.1", :port 33319, :url "http://127.0.0.1:33319", :locator "xpath", :webdriver-url "http://localhost:4444", :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}}, :webdriver-url "http://localhost:4444", :host "127.0.0.1", :port 33319, :method :post, :path "session", :payload {:desiredCapabilities {"chromeOptions" {"args" ["--no-sandbox"]}}}}
	at slingshot.support$stack_trace.invoke(support.clj:201)
	at etaoin.impl.client$call.invokeStatic(client.cljc:131)
	at etaoin.impl.client$call.invoke(client.cljc:95)
	at etaoin.api$execute.invokeStatic(api.clj:235)
	at etaoin.api$execute.invoke(api.clj:209)
	at etaoin.api$create_session.invokeStatic(api.clj:262)
	at etaoin.api$create_session.doInvoke(api.clj:252)
	at clojure.lang.RestFn.invoke(RestFn.java:423)
	at etaoin.api$_connect_driver.invokeStatic(api.clj:3607)
	at etaoin.api$_connect_driver.doInvoke(api.clj:3520)
	at clojure.lang.RestFn.invoke(RestFn.java:423)
	at etaoin.api$boot_driver.invokeStatic(api.clj:3665)
	at etaoin.api$boot_driver.invoke(api.clj:3640)
	at clojure.core$partial$fn__5561.invoke(core.clj:2616)
	at ct.a.driver$fn__14100.invokeStatic(driver.clj:6)
	at ct.a.driver$fn__14100.invoke(driver.clj:5)
	at mount.core$record_BANG_.invokeStatic(core.cljc:86)
	at mount.core$record_BANG_.invoke(core.cljc:85)
	at mount.core$up$fn__13907.invoke(core.cljc:93)
	... 15 more
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at clj_http.core$request.invokeStatic(core.clj:496)
	at clj_http.core$request.invoke(core.clj:427)
	at clj_http.core$request.invokeStatic(core.clj:428)
	at clj_http.core$request.invoke(core.clj:427)
	at clojure.lang.Var.invoke(Var.java:381)
	at clj_http.client$wrap_request_timing$fn__8963.invoke(client.clj:1078)
	at clj_http.client$wrap_async_pooling$fn__8971.invoke(client.clj:1111)
	at clj_http.headers$wrap_header_map$fn__6609.invoke(headers.clj:147)
	at clj_http.client$wrap_query_params$fn__8853.invoke(client.clj:812)
	at clj_http.client$wrap_basic_auth$fn__8859.invoke(client.clj:835)
	at clj_http.client$wrap_oauth$fn__8864.invoke(client.clj:852)
	at clj_http.client$wrap_user_info$fn__8873.invoke(client.clj:872)
	at clj_http.client$wrap_url$fn__8945.invoke(client.clj:1030)
	at clj_http.client$wrap_decompression$fn__8662.invoke(client.clj:416)
	at clj_http.client$wrap_input_coercion$fn__8777.invoke(client.clj:632)
	at clj_http.client$wrap_additional_header_parsing$fn__8802.invoke(client.clj:687)
	at clj_http.client$wrap_output_coercion$fn__8764.invoke(client.clj:576)
	at clj_http.client$wrap_exceptions$fn__8615.invoke(client.clj:250)
	at clj_http.client$wrap_accept$fn__8817.invoke(client.clj:730)
	at clj_http.client$wrap_accept_encoding$fn__8824.invoke(client.clj:752)
	at clj_http.client$wrap_content_type$fn__8811.invoke(client.clj:713)
	at clj_http.client$wrap_form_params$fn__8910.invoke(client.clj:954)
	at clj_http.client$wrap_nested_params$fn__8931.invoke(client.clj:988)
	at clj_http.client$wrap_flatten_nested_params$fn__8940.invoke(client.clj:1012)
	at clj_http.client$wrap_method$fn__8878.invoke(client.clj:888)
	at clj_http.cookies$wrap_cookies$fn__5828.invoke(cookies.clj:131)
	at clj_http.links$wrap_links$fn__6922.invoke(links.clj:63)
	at clj_http.client$wrap_unknown_host$fn__8948.invoke(client.clj:1041)
	at etaoin.impl.client$http_request.invokeStatic(client.cljc:89)
	at etaoin.impl.client$http_request.invoke(client.cljc:86)
	at etaoin.impl.client$call$fn__11997.invoke(client.cljc:127)
	at etaoin.impl.client$call.invokeStatic(client.cljc:127)
	... 32 more

Reproduction
run:
docker run -d -p 4444:4444 -p 7900:7900 --shm-size="2g" selenium/standalone-chrome:4.8.0-20230131
then

(defstate driver
  :start (e/chrome {:webdriver-url "http://localhost:4444" :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}})
  :stop  (e/quit driver))

Expected behavior
To connect to the webdriver (when used without params it works like a charm and spawns a new local webdriver process).

Diagnosis
Not sure. The only clue I have is that maybe there is an issue between the JDK and the HTTP client (as it's quite old), but it's a component that has been there much longer and worked OK...

Action
Looking for help how to approach that.

DISCLAIMER
I am not proficient with Clojure. This is an old piece of code which we use to do some recurring task. It was developed by a different developer who is no longer with us, and I am just maintaining it. Recently I upgraded the etaoin lib, as there were some issues with execution due to UI changes. That solved the problem, but now I have decided to try converting from a 'local' chromedriver to an externally running one, which would greatly help us in using the tool.

I could try bumping the JDK or Clojure version. However, I expect this would mean quite a bit of rewriting and fixing of dependencies, so before I do that (the old versions may be the cause) I wanted to check whether the above issue is strictly related to the etaoin lib.

Note that I've tested webdriver with following python code and it worked ok:

#!/bin/python3

import time
from selenium import webdriver

driver = webdriver.Remote('http://localhost:4444', webdriver.DesiredCapabilities.CHROME)  # Optional argument, if not specified will search path.
driver.get('http://www.google.com/');
time.sleep(5)
driver.quit()

lread commented Feb 2, 2023

Hi @DonCziken, thanks for raising an issue, and also thanks for being so detailed.
I'll take a peek sometime soon and follow up here.


lread commented Feb 2, 2023

Ok, here's me trying this out.

I happen to be on Pop!_OS, which is based on Ubuntu, so similar to your OS.

Personally, I use sdkman, so I'll use it to switch to a Java version that matches yours:

> sdk use java 8.0.352-tem

I'll fire up that docker image, like you did:

> docker run -d -p 4444:4444 -p 7900:7900 --shm-size="2g" selenium/standalone-chrome:4.8.0-20230131

From a new empty directory, I'll create a deps.edn to mimic your setup:

{:deps {org.clojure/clojure {:mvn/version "1.9.0"}
        etaoin/etaoin {:mvn/version "1.0.39"}}}

Now I'll fire up a REPL and test.
I'm not sure what defstate is (probably from mount?) so I'll leave that out.

❯ clj
Clojure 1.9.0
user=> (require '[etaoin.api :as e])
nil
user=> (def driver (e/chrome {:webdriver-url "http://localhost:4444" :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}}))
#'user/driver
user=> (e/go driver "https://en.wikipedia.org/")
{:state "success", :sessionId nil, :class "org.openqa.selenium.remote.Response", :value nil, :status 0}
user=> (e/get-url driver)
"https://en.wikipedia.org/wiki/Main_Page"
user=> (e/get-title driver)
"Wikipedia, the free encyclopedia"
user=> (e/quit driver)
{:type :chrome, :host "127.0.0.1", :port 42615, :url "http://127.0.0.1:42615", :locator "xpath", :webdriver-url "http://localhost:4444", :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}}
user=>

So from an initial look-see, seems to be ok.
Can you try the above and see if that works for you too?


lread commented Feb 2, 2023

I could try bumping JDK or clojure version. However I expect this would mean quite a bit of rewriting and fixing of dependencies - so before I do that (which maybe the cause) I wanted to check if the above issue isn't related strictly with etaoin lib.

I think this is a good approach.
But: you might not be aware that Clojure is typically extremely forward-compatible, so bumping Clojure is typically safe and painless.
So once you are sorted out I'd recommend you bump both Clojure and the JDK.

DonCziken commented

@lread

I tested as you suggested using the REPL and it worked like a charm (same as for you).

However, to try it in a REPL I had to install Clojure locally following https://clojure.org/guides/install_clojure. The reason is that the project's build is based on Boot, and it seems I didn't need a local Clojure install to build with it. I guess that's something to replace as well, as it seems the tool hasn't been developed since mid 2019 :[.

Anyway, my next guess was that maybe it's an issue with Boot after all. However, you asked:

I'm not sure what defstate is (probably from mount?) so I'll leave that out.

and indeed it was the right question to ask, as the following is how the driver is managed (I just learned that mount is a global state manager):

(ns coretime-sync.sage.driver
  (:require [mount.core :refer [defstate]]
            [etaoin.api :as e]))

(defstate driver
  :start (e/chrome {:webdriver-url "http://localhost:4444" :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}})
  :stop  (e/quit driver))
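
For anyone translating a namespace declaration like the one above into REPL input, here is a minimal sketch of the two forms. It uses `clojure.string` and `clojure.set` instead of mount and etaoin (an assumption made purely so that it runs without extra dependencies). Inside `ns`, `:require` is a keyword clause; at the REPL, `require` is a function that takes quoted vectors.

```clojure
;; In a source file, dependencies are declared inside the ns form
;; with a :require keyword clause:
;;
;;   (ns my.app                          ; hypothetical namespace name
;;     (:require [clojure.string :as str]))
;;
;; At the REPL, the equivalent is the `require` function with quoted vectors:
(require '[clojure.string :as str]
         '[clojure.set :refer [union]])

;; Typing the ns-style clause at the REPL does not fail loudly:
;; it is evaluated as a keyword lookup on a vector and returns nil
;; without loading anything.
(def keyword-form-result (:require '[clojure.string :as str]))

(println (str/upper-case "etaoin"))
(println (union #{1 2} #{2 3}))
(println "keyword form returned:" (pr-str keyword-form-result))
```

The silent `nil` from the keyword form is what makes this mix-up hard to spot.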

So I decided to keep playing with the REPL and bring in mount, as maybe it was the problem. I added the following to deps, with the same version as in the project:

{:deps {org.clojure/clojure {:mvn/version "1.9.0"}
        etaoin/etaoin {:mvn/version "1.0.39"}
        mount/mount {:mvn/version "0.1.12"}}}

and ran the following in the REPL (it took me some time, as I was doing :require instead of require... and couldn't figure out why it didn't work ;)):

Clojure 1.9.0
user=> (require '[mount.core :refer [defstate]] '[etaoin.api :as e] '[mount.core :as m])
nil
user=> (defstate driver :start (e/chrome {:webdriver-url "http://localhost:4444" :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}}) :stop  (e/quit driver))
#'user/driver
user=> (m/start)
SocketTimeoutException Read timed out  java.net.SocketInputStream.socketRead0 (SocketInputStream.java:-2)

So as you can see, boom: when using mount I get the socket exception. But I checked and there is a newer version of mount, so I updated the deps for the REPL:

{:deps {org.clojure/clojure {:mvn/version "1.9.0"}
        etaoin/etaoin {:mvn/version "1.0.39"}
        mount/mount {:mvn/version "0.1.17"}}}

and BINGO, it worked!

Clojure 1.9.0
user=> (require '[mount.core :refer [defstate]] '[etaoin.api :as e] '[mount.core :as m])
nil
user=> (defstate driver :start (e/chrome {:webdriver-url "http://localhost:4444" :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}}) :stop  (e/quit driver))
#'user/driver
user=>  (m/start)
{:started ["#'user/driver"]}
user=> (e/go driver "https://en.wikipedia.org/")
{:state "success", :sessionId nil, :class "org.openqa.selenium.remote.Response", :value nil, :status 0}

The final test was with the project itself. I bumped mount to the latest version and did indeed manage to connect; however, I found a further issue: #520

However, with regard to this thread there is one more bit. I tested connecting to the webdriver without the :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}} part, and it turned out that without it the connection does not work either and ends up with the following:

[..]
mount/mount {:mvn/version "0.1.17"}}}
cziken@cziken-ThinkPad-T540p:~/Documents/repositories/iterative/clj-test$ clj
Clojure 1.9.0
user=> (require '[mount.core :refer [defstate]] '[etaoin.api :as e] '[mount.core :as m])
nil
user=>  (defstate driver :start (e/chrome {:webdriver-url "http://localhost:4444"}) :stop  (e/quit driver))
#'user/driver
user=> (m/start)
SocketTimeoutException Read timed out  java.net.SocketInputStream.socketRead0 (SocketInputStream.java:-2)

lread added the question label Feb 3, 2023

lread commented Feb 3, 2023

Well good for you @DonCziken, it seems you have solved some mysteries!

The reason is that the build for the project is based on boot

Yes, these days boot is not a popular choice.
Folks will typically use either the clojure cli or leiningen.

I don't see an Etaoin issue here, so I'm going to close this one and move on to #520.

lread closed this as not planned Feb 3, 2023

lread commented Feb 3, 2023

Just noticed:

However in regards to this thread there is one more bit. I've tested connecting to webdriver without providing :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}

Different WebDrivers have different options.
Can't remember, but I think if you are running as root, the --no-sandbox option for ChromeDriver gets things working. There are apparently security caveats to using this option though, so Etaoin does not enable it by default for chromedriver.

DonCziken commented

Ok, but I would say it is not at all intuitive that without that option the lib fails with a socket timeout exception. I would consider some improvement in error handling, especially since the Python library (from my first message) connected without any issue (I only specified that it's a Chrome driver, as I wanted to do here).
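
As an illustration of the kind of handling meant here, a caller could inspect the cause chain itself. This is a minimal sketch, not part of Etaoin: `root-cause`, `session-timeout?` and `explain-connect-failure` are hypothetical helper names, and the simulated exception only mimics the shape of the stack trace from the first message (an `ExceptionInfo` caused by a `java.net.SocketTimeoutException`).

```clojure
(defn root-cause
  "Walks the cause chain and returns the innermost Throwable."
  [^Throwable t]
  (if-let [cause (.getCause t)]
    (recur cause)
    t))

(defn session-timeout?
  "True when the innermost cause is a socket read timeout."
  [^Throwable t]
  (instance? java.net.SocketTimeoutException (root-cause t)))

(defn explain-connect-failure
  "Returns a friendlier message for a read timeout, else the original message."
  [^Throwable t]
  (if (session-timeout? t)
    (str "Timed out waiting for a session; check that :capabilities "
         "names a browser the remote WebDriver can provide.")
    (.getMessage t)))

;; Simulated failure with the same shape as the reported stack trace.
(def simulated
  (ex-info "throw+: {:type :etaoin/http-ex}"
           {:type :etaoin/http-ex}
           (java.net.SocketTimeoutException. "Read timed out")))

(println (explain-connect-failure simulated))
```

In real use, the driver creation call would sit inside a `try`, with the caught `Throwable` passed to `explain-connect-failure`.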


lread commented Feb 4, 2023

Ok, but I would say it is not at all intuitive

Agreed, but Etaoin is just a very thin wrapper over WebDriver implementations.

The Python example uses Selenium which is a thicker wrapper.

But.. like you, I am kinda curious how the python example worked without the extra args.
I might look into that. I'll re-open this issue so I do not forget.

lread reopened this Feb 4, 2023

lread commented Feb 4, 2023

Ok. Sooo... here's what I've learned.

To get juicy details of what is going on when interacting with the Selenium Grid docker container, I start it like so in its own terminal window:

>  docker run -it -p 4444:4444 -p 7900:7900 --shm-size="2g" selenium/standalone-chrome:4.8.0-20230131

This allows me to see what is being logged by the container.

Now when I try your python script, I see the following logged:

16:31:36.518 INFO [LocalDistributor.newSession] - Session request received by the Distributor: 
 [Capabilities {browserName: chrome}]

Now I'll plunk our REPL session into a test.clj file so that I can easily rerun it:

(require '[etaoin.api :as e])

(def driver (e/chrome {:webdriver-url "http://localhost:4444"
                       :capabilities {"chromeOptions" {"args" ["--no-sandbox"]}}}))

(e/go driver "https://en.wikipedia.org/")
(println "url:" (e/get-url driver))
(println "title:" (e/get-title driver))
(e/quit driver)

Let's run that:

❯ clojure -M test.clj
url: https://en.wikipedia.org/wiki/Main_Page
title: Wikipedia, the free encyclopedia

From the container terminal window I see:

16:36:32.977 INFO [LocalDistributor.newSession] - Session request received by the Distributor: 
 [Capabilities {chromeOptions: {args: [--no-sandbox]}}]

Ok. So the fact that we are passing chromeOptions is helping the Selenium Grid to infer that we want a chrome browser session.

Let's edit our test.clj :capabilities to mimic what the python version seems to be doing under the hood:

(require '[etaoin.api :as e])

(def driver (e/chrome {:webdriver-url "http://localhost:4444"
                       :capabilities {:browserName "chrome"}}))

(e/go driver "https://en.wikipedia.org/")
(println "url:" (e/get-url driver))
(println "title:" (e/get-title driver))
(e/quit driver)

When I rerun, it works as above, and we see the following in the container window:

16:40:30.971 INFO [LocalDistributor.newSession] - Session request received by the Distributor: 
 [Capabilities {browserName: chrome}]

Conclusion/guesses:

  • Selenium Grid will fire up a browser session for the requested browser
  • It will wait for a requested browser session to become available and then eventually time out if no session becomes available. This is true even when the requested browser makes no sense (or is absent)
  • we don't need "chromeOptions" {"args" ["--no-sandbox"]} for Selenium Grid, but chromeOptions helped Selenium Grid to infer we wanted a chrome browser session
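
Building on these guesses, here is a minimal sketch of making the browser request explicit rather than relying on inference. `with-browser-name` is a hypothetical helper, not an Etaoin function. It only prepares the options map, and whether Selenium Grid accepts every combination of extra capabilities alongside `:browserName` is an assumption that was not tested in this thread.

```clojure
(defn with-browser-name
  "Returns opts with :browserName set under :capabilities,
  unless the caller has already set one."
  [opts browser-name]
  (update opts :capabilities #(merge {:browserName browser-name} %)))

;; No capabilities given: :browserName is added.
(def remote-chrome-opts
  (with-browser-name {:webdriver-url "http://localhost:4444"} "chrome"))

;; Existing capabilities are preserved, and an explicit :browserName wins.
(def preset-opts
  (with-browser-name {:webdriver-url "http://localhost:4444"
                      :capabilities {:browserName "firefox"}}
                     "chrome"))

(println remote-chrome-opts)
(println preset-opts)
```

The resulting map could then be passed to `e/chrome` in place of the literal options map used in test.clj above.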

This is all perhaps interesting info for #378.

Gonna re-close this one, can re-open if we want to explore more.

lread closed this as not planned Feb 4, 2023