How to count words. #1980
-
Hi, I would like to count the words in the editor content.
wordCount is 1 even though there are 3 words. When the content is indented, it all becomes one word. Is there a better way to count the words?
Replies: 3 comments 4 replies
-
You could try looking at Intl.Segmenter for this. The issue with checking for spaces is that it won't work for many use cases, such as languages that are written without spaces between words (Chinese, Japanese, Thai).
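A minimal sketch of what counting with Intl.Segmenter could look like. The helper name countWords and the default locale are assumptions made for illustration; they are not part of Lexical or of this thread.

```javascript
// Hypothetical helper: counts word-like segments in a string.
// Intl.Segmenter with granularity "word" splits text into words,
// whitespace and punctuation; only word-like segments are counted.
function countWords(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let count = 0;
  for (const segment of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation segments.
    if (segment.isWordLike) {
      count += 1;
    }
  }
  return count;
}

console.log(countWords("one\ttwo\nthree"));
console.log(countWords("Hello, world!"));
```

Because segmentation depends on the locale, passing the document's language (for example "ja" for Japanese text) should give more sensible results than splitting on spaces, though the exact counts depend on the runtime's ICU data.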
-
To expand on @trueadm's answer: the immediate issue you're pointing out is a bug on our side (#1993). But even after we land the fix, keep in mind that nodes can take many shapes. For example, both TextNodes and DecoratorNodes have content (even LineBreakNode does). A DecoratorNode may be an image with an a11y label as its text. Do you want to count this accessibility label toward the number of words in your document? This comes on top of the Intl.Segmenter point about the text content itself. That said, there are various solutions, from less to more flexible:
Also, keep in mind that solutions 2 and 3 are more expensive, since they have to iterate over the whole tree. Solution 1 is O(1) because it's an exception: we build a cache during the reconciliation process, as it's a common use case.
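To make the cost and the DecoratorNode question concrete, here is a rough sketch of a tree-walking count. It uses a hypothetical plain-object node shape (type, text, children) and a hypothetical includeDecorators option; none of this is Lexical's actual node API, and it is not meant to reproduce any of the numbered solutions.

```javascript
// Hypothetical node shape, for illustration only (not Lexical's classes):
//   { type: "element" | "text" | "decorator" | "linebreak",
//     text?: string, children?: node[] }
function countWordsInTree(root, { includeDecorators = false, locale = "en" } = {}) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let count = 0;
  const stack = [root];
  // Every node is visited once, so the cost grows with the size of the tree.
  while (stack.length > 0) {
    const node = stack.pop();
    const isCounted =
      node.type === "text" || (includeDecorators && node.type === "decorator");
    if (isCounted && typeof node.text === "string") {
      // Count per node, so text from separate blocks is never glued together.
      for (const segment of segmenter.segment(node.text)) {
        if (segment.isWordLike) count += 1;
      }
    }
    for (const child of node.children ?? []) {
      stack.push(child);
    }
  }
  return count;
}

// Example tree: three one-word paragraphs plus an image with an a11y label.
const exampleRoot = {
  type: "element",
  children: [
    { type: "element", children: [{ type: "text", text: "one" }] },
    { type: "element", children: [{ type: "text", text: "two" }] },
    { type: "element", children: [{ type: "text", text: "three" }] },
    { type: "decorator", text: "a red bicycle" },
  ],
};

console.log(countWordsInTree(exampleRoot));
console.log(countWordsInTree(exampleRoot, { includeDecorators: true }));
```

Counting node by node sidesteps the "everything becomes one word" problem, but it has its own limitation: a single word split across two adjacent text nodes (for example, partly bold) would be counted twice.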
-
@trueadm @zurfyx
Thank you very much for your help. ✨
Yes, I would like to roughly count the words. It would be nice if getTextContent returned content that includes a space for the indent. I will check for the update with the fix in #1993
and will try option 2 or 3 if needed.