New content analyzis #233

Merged
merged 6 commits into from
Sep 28, 2021
Member

@JalilArfaoui JalilArfaoui commented Aug 22, 2021

TL;DR: The code is not final; this is more a proof of concept showing that we can find the currently displayed Amazon product title.

The objective was to use the accessibility service to extract the current DOM from Chrome, so that XPath rules could be applied to live content.

After many attempts and a lot of reading, I failed to meet this main goal.

What I did manage to do is analyze the DOM in depth, but only as it is exposed through a tree of AccessibilityNodeInfo. Tag names and classes are not exposed (every tag is exposed as an Android component: View and so on …), but we seem to have the full hierarchy, with ids and text contents.

So I managed to build a PoC/feature that finds the product title whenever a product page is loaded on Amazon (or when the app becomes visible), irrespective of the URL:

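    // Locate the WebView node inside the active window's accessibility tree.
    val webview = findWebview(root)
    if (webview != null) {
      // On Amazon product pages, the title sits under the element with this id.
      val titleExpanderContent = findById(webview, "titleExpanderContent")
      if (titleExpanderContent != null) {
        val titleView = findHeading(titleExpanderContent)
        // Fall back to the content description when the node has no text.
        val title = titleView?.text ?: titleView?.contentDescription
        Log.d(TAG, "Found Amazon page title : $title")
      }
    }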

gives

D/Accessibility: Event : TYPE_WINDOW_CONTENT_CHANGED, Package: com.google.android.apps.nexuslauncher, Source: null
D/Accessibility: Active window packageName : com.android.chrome, className: android.widget.FrameLayout
D/Accessibility: Found Amazon page title : 6S Casque Bluetooth sans Fil, écouteurs stéréo sans Fil stéréo Pliables Hi-FI Écouteurs avec Microphone intégré, Micro SD/TF, FM pour iPhone/Samsung/iPad/PC (Or Noir)
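
The helpers called in the snippet above (findWebview, findById, findHeading) are not shown in this description. Here is a minimal sketch of how such lookups could work as a depth-first search over the tree. The `Node` class is a hypothetical stand-in for AccessibilityNodeInfo so that the example runs outside Android; its fields, the exact-match rule on ids, and the WebView class name are assumptions, not the code of this PR.

```kotlin
// Hypothetical stand-in for AccessibilityNodeInfo, so the sketch runs without Android.
data class Node(
    val className: String,
    val id: String? = null,        // assumed: the id exposed for the HTML element
    val text: String? = null,
    val isHeading: Boolean = false, // assumed: set for h1, h2, ... nodes
    val children: List<Node> = emptyList()
)

// Depth-first search: returns the first node (root included) matching the predicate.
fun find(root: Node, predicate: (Node) -> Boolean): Node? {
    if (predicate(root)) return root
    for (child in root.children) {
        find(child, predicate)?.let { return it }
    }
    return null
}

// Assumption: the web content root reports itself as android.webkit.WebView.
fun findWebview(root: Node): Node? = find(root) { it.className == "android.webkit.WebView" }

fun findById(root: Node, id: String): Node? = find(root) { it.id == id }

fun findHeading(root: Node): Node? = find(root) { it.isHeading }

fun main() {
    val tree = Node("android.widget.FrameLayout", children = listOf(
        Node("android.webkit.WebView", children = listOf(
            Node("android.view.View", id = "titleExpanderContent", children = listOf(
                Node("android.view.View", text = "Example product title", isHeading = true)
            ))
        ))
    ))
    val title = findWebview(tree)
        ?.let { findById(it, "titleExpanderContent") }
        ?.let { findHeading(it) }
        ?.text
    println("Found title: $title")
}
```

On a real device the same traversal would presumably go through AccessibilityNodeInfo's getChildCount() and getChild(i), reading viewIdResourceName and isHeading instead of the stand-in fields.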

Now we need to discuss the best strategy to replace our full XPath strategy:
 - Keep the current strategy (fetch the current URL's document in the background and apply XPath to it)
 - Replace it with a mechanism that finds more specific information on more specific websites
 - Both
 - …

Documentation and references used:
 - https://stuff.mit.edu/afs/sipb/project/android/docs/guide/topics/ui/accessibility/services.html
 - https://medium.com/nerd-for-tech/track-web-browser-usage-in-android-using-accessibility-service-800bfa2745d2
 - https://stackoverflow.com/questions/33318083/how-to-get-webview-from-accessibilitynodeinfo
 - https://groups.google.com/a/chromium.org/g/chromium-dev/c/2VC16XswAaI
 - https://stackoverflow.com/questions/7282789/is-there-any-way-to-get-access-to-dom-structure-in-androids-webview
 - https://stackoverflow.com/questions/40522043/how-to-access-html-content-of-accessibilitynodeinfo-of-a-webview-element-using-a
 - https://stackoverflow.com/questions/65326148/why-accessibilityservice-failed-to-retrieve-content-of-a-webview-but-works-prop
 - https://github.com/google/talkback
 - https://github.com/chromium/chromium/blob/master/content/public/android/java/src/org/chromium/content/browser/accessibility/WebContentsAccessibilityImpl.java
 - https://www.py4u.net/discuss/630533
 - https://stackoverflow.com/questions/10634908/accessibility-and-android-webview
 - https://stackoverflow.com/questions/36793154/accessibilityservice-not-returning-view-ids

@JalilArfaoui JalilArfaoui added feature New feature or request question Further information is requested labels Aug 22, 2021
@JalilArfaoui JalilArfaoui self-assigned this Aug 22, 2021
@zhinu

zhinu commented Aug 24, 2021

Rules (of new system):
Hierarchy is understood
node ids can be found
node names cannot be found (except for h1 h2 ...)

@JalilArfaoui
Member Author

I’ve just pushed a big refactoring to follow the direction decided in #234 and dis-moi/backend#448.

Technical context:

In the current master version (simplified):

  • We use accessibility services, on the Kotlin side, to detect the URL, and we send all URLs to a React Native HeadlessTask
  • In the RN HeadlessTask, we first fetch all matching contexts; if the current URL matches and the context has an XPath condition, we send it back to a Kotlin XPath module
  • In the Kotlin XPath module, we fetch the current URL, try to apply the XPath to the fetched HTML, and send the response back to the RN HeadlessTask <- Here we no longer have access to accessibility information
  • In the RN HeadlessTask, we run some last checks (the notice hasn’t been deleted, …) and then call a Kotlin UI service, which manages all the floating UI, to display the notices found

Version proposed here

The main work I did here (in 8ac3716 in particular) was to make it possible to analyze page content given a list of matching contexts to check.

The thing is, because the current content analysis (XPath) happens later and is only given a URL, I could not change the strategy in a drop-in manner …

I had to bring most of the context matching process back together, in the accessibility service, so that we can push screen content analysis further when needed.

So now the BackgroundService (which deserves a better name) does the following:

  • Fetch all matching contexts (only once for now; this should later happen regularly)
  • Every time the screen changes (well, with a 500 ms debounce), detect the application and, if supported, the current URL
  • Find all corresponding matching contexts
  • <-- If any of them has a content condition, we’re ready to analyze it here
  • Then send all ids to the Floating UI service, which I adapted for this

I no longer use the RN HeadlessTask, but it’s still there because I have not ported all the functionality yet. The same goes for some other files, such as FloatingModule, FloatingCoordinator, XPathModule …
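
The matching steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `MatchingContext` shape, the idea that a content condition is an element id that must be present on screen, and all function names are hypothetical and not taken from this PR.

```kotlin
// Hypothetical shape of a matching context; the real model is not shown in this description.
data class MatchingContext(
    val noticeId: Int,
    val urlRegex: Regex,
    val contentId: String? = null // assumed content condition: an element id expected on screen
)

// Keep the contexts whose URL pattern matches the current URL.
fun matchUrl(contexts: List<MatchingContext>, url: String): List<MatchingContext> =
    contexts.filter { it.urlRegex.containsMatchIn(url) }

// Contexts with a content condition are kept only if the screen satisfies it;
// contexts without one pass through unchanged.
fun matchContent(candidates: List<MatchingContext>, idsOnScreen: Set<String>): List<MatchingContext> =
    candidates.filter { context ->
        val requiredId = context.contentId
        requiredId == null || requiredId in idsOnScreen
    }

// The notice ids that would be handed over to the floating UI.
fun noticeIdsFor(contexts: List<MatchingContext>, url: String, idsOnScreen: Set<String>): List<Int> =
    matchContent(matchUrl(contexts, url), idsOnScreen).map { it.noticeId }

fun main() {
    val contexts = listOf(
        MatchingContext(1, Regex("""example\.com""")),
        MatchingContext(2, Regex("""example\.com"""), contentId = "titleExpanderContent"),
        MatchingContext(3, Regex("""other\.org"""))
    )
    // Without the expected element on screen, only the URL-only context matches.
    println(noticeIdsFor(contexts, "https://example.com/product", emptySet()))
    // With the element present, the content-conditioned context matches too.
    println(noticeIdsFor(contexts, "https://example.com/product", setOf("titleExpanderContent")))
}
```

Doing the URL match first keeps the more expensive screen analysis limited to the contexts that actually carry a content condition.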

Functional

What this version (also) brings

  • Firefox support
  • Background matching context fetch
  • Clean bubble show/hide/update on home / app switch / tab switch / keyboard type

What this version misses

A lot of things still need to be “plugged back in”:

  • cannot dismiss the bubble notification
  • cannot delete a message
  • cannot close the message window
  • does not take subscribed contributors into account
  • links in messages do not work
  • the message window does not hide on screen change

Technical debt

I’ve left a lot of TODO comments …

  • Fetching of matching contexts and notices should be moved to a model/repository
  • Avoid observeForever
  • Use proper data types
  • Fix typefaces
  • Sentry could be added

@JalilArfaoui JalilArfaoui changed the title poc/introduce_webview_analyzing New webview analyzing Sep 21, 2021
@JalilArfaoui JalilArfaoui changed the title New webview analyzing New content analyzis Sep 21, 2021
@lutangar
Member

@JalilArfaoui You're describing interesting domain logic and mentioning resources in the PR description, both of which would deserve to appear somewhere in a Markdown file!

@JalilArfaoui
Member Author

@JalilArfaoui You're describing interesting domain logic and mentioning resources in the PR description, both of which would deserve to appear somewhere in a Markdown file!

You’re right @lutangar … I’ve added docs/Architecture.md, which tries to sum it up
