An HTML parser made in V, for V.
This repository is now archived, since all of its code has been moved into the official vlib.
If the description below isn't enough, see the test files.
Responsible for reading HTML from full strings or split strings. It returns either all Tag objects of that HTML or a DocumentObjectModel, which tries to reconstruct the HTML tree.
This function is the main function called by the parse method to parse your HTML fragment by fragment.
This function is called with either a filename or a complete HTML data string.
This function adds a tag whose content the parser should skip. For example, if your HTML or XML contains a tag such as <script>, calling add_code_tag('script') makes the parser jump over the content of every script tag: you still get that content, but its > or < characters will not confuse the parser.
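To illustrate what skipping a code tag's content means, here is a minimal tokenizer sketch in Go. It does not reproduce this library's internals: `Token`, `tokenize` and the `codeTags` set are hypothetical names, and the set simply plays the role that add_code_tag('script') plays above.

```go
package main

import "strings"

// Token is a hypothetical token for this sketch: a tag such as "<p>" or
// "</p>", or a run of text between tags.
type Token struct {
	IsTag bool
	Text  string
}

// tokenize splits html into tag and text tokens. For every tag name in
// codeTags (the role add_code_tag plays), everything up to the matching
// close tag is kept as one raw text token, so '<' and '>' inside it are
// not read as markup. Tag names are matched case-sensitively here.
func tokenize(html string, codeTags map[string]bool) []Token {
	var tokens []Token
	i := 0
	for i < len(html) {
		if html[i] != '<' {
			// Text runs until the next '<' or the end of the input.
			n := strings.IndexByte(html[i:], '<')
			if n < 0 {
				n = len(html) - i
			}
			tokens = append(tokens, Token{IsTag: false, Text: html[i : i+n]})
			i += n
			continue
		}
		n := strings.IndexByte(html[i:], '>')
		if n < 0 {
			// Unterminated tag: keep the remainder as text.
			tokens = append(tokens, Token{IsTag: false, Text: html[i:]})
			break
		}
		tag := html[i : i+n+1]
		tokens = append(tokens, Token{IsTag: true, Text: tag})
		i += n + 1

		// "<script type=x>" -> "script", "</p>" -> "p", "<br/>" -> "br".
		name := strings.TrimPrefix(strings.Trim(tag, "<>"), "/")
		if end := strings.IndexAny(name, " \t\n/"); end >= 0 {
			name = name[:end]
		}
		isOpen := !strings.HasPrefix(tag, "</") && !strings.HasSuffix(tag, "/>")
		if isOpen && codeTags[name] {
			// Jump straight to the matching close tag.
			stop := strings.Index(html[i:], "</"+name)
			if stop < 0 {
				stop = len(html) - i
			}
			if stop > 0 {
				tokens = append(tokens, Token{IsTag: false, Text: html[i : i+stop]})
			}
			i += stop
		}
	}
	return tokens
}
```

With `codeTags` set to `{"script": true}`, the input `<script>if (a < b && c > d) {}</script>` yields the script body as a single text token. With no code tags, the text stops at `if (a ` and `< b && c >` is misread as a tag.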
When using the split_parse method, you must call this function to finish parsing completely.
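Why a final call is needed can be seen in a small sketch of chunked parsing. This Go version assumes one plausible approach, not the library's code: a chunk may end in the middle of a tag or a text run, so that fragment is buffered until the next chunk arrives, and only an explicit final call can flush it. `SplitParser`, `SplitParse` and `Finalize` are hypothetical names.

```go
package main

import "strings"

// SplitParser is a hypothetical sketch of chunked ("split") parsing: each
// chunk is appended to a buffer, complete tags and text runs are emitted,
// and any trailing fragment is held back until more input arrives.
type SplitParser struct {
	buf    string
	Tokens []string
}

// SplitParse consumes one chunk. A '<' without a matching '>' stays in the
// buffer, because the rest of that tag may arrive in the next chunk.
func (p *SplitParser) SplitParse(chunk string) {
	p.buf += chunk
	for len(p.buf) > 0 {
		if p.buf[0] == '<' {
			end := strings.IndexByte(p.buf, '>')
			if end < 0 {
				return // incomplete tag: wait for more data
			}
			p.Tokens = append(p.Tokens, p.buf[:end+1])
			p.buf = p.buf[end+1:]
			continue
		}
		next := strings.IndexByte(p.buf, '<')
		if next < 0 {
			return // the text run may continue in the next chunk
		}
		p.Tokens = append(p.Tokens, p.buf[:next])
		p.buf = p.buf[next:]
	}
}

// Finalize flushes whatever is still buffered. Without it, the last text
// run (or a truncated tag) would never be emitted.
func (p *SplitParser) Finalize() {
	if p.buf != "" {
		p.Tokens = append(p.Tokens, p.buf)
		p.buf = ""
	}
}
```

Feeding `<p cla`, `ss="x">hel` and `lo</p>tail` produces `<p class="x">`, `hello` and `</p>`; the trailing `tail` only appears after `Finalize`.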
This function returns an array with all tags and their content.
Returns the DocumentObjectModel for the currently parsed tags.
If you want to reuse the parser object to parse another HTML document, call initialize_all() first.
A DOM object that makes it easier to access tags and search them.
This function returns a Tag array with all tags in the document that have an attribute with the given name and value.
This function returns a Tag array with all tags in the document whose name matches the given value.
This function returns a Tag array with all tags in the document that have an attribute with the given name.
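The three searches can be sketched as follows, assuming tags are indexed by name and by attribute name once, when the DOM is built. That indexing is an assumed design, not necessarily how this library does it, and `Tag`, `DOM`, `NewDOM` and the `TagsBy...` methods are hypothetical Go names rather than the library's API.

```go
package main

// Tag is a simplified, hypothetical stand-in for the library's Tag object.
type Tag struct {
	Name       string
	Attributes map[string]string
}

// DOM indexes tags once so later searches by name or attribute do not have
// to rescan the whole document.
type DOM struct {
	tags   []Tag
	byName map[string][]int // tag name -> indexes into tags
	byAttr map[string][]int // attribute name -> indexes into tags
}

// NewDOM builds both indexes in a single pass over the tags.
func NewDOM(tags []Tag) *DOM {
	d := &DOM{tags: tags, byName: map[string][]int{}, byAttr: map[string][]int{}}
	for i, t := range tags {
		d.byName[t.Name] = append(d.byName[t.Name], i)
		for attr := range t.Attributes {
			d.byAttr[attr] = append(d.byAttr[attr], i)
		}
	}
	return d
}

// collect turns a list of indexes back into tags, in document order.
func (d *DOM) collect(idx []int) []Tag {
	out := make([]Tag, 0, len(idx))
	for _, i := range idx {
		out = append(out, d.tags[i])
	}
	return out
}

// TagsByName returns all tags with the given name.
func (d *DOM) TagsByName(name string) []Tag { return d.collect(d.byName[name]) }

// TagsByAttribute returns all tags that have the attribute, whatever its value.
func (d *DOM) TagsByAttribute(attr string) []Tag { return d.collect(d.byAttr[attr]) }

// TagsByAttributeValue narrows TagsByAttribute to tags whose value matches.
func (d *DOM) TagsByAttributeValue(attr, value string) []Tag {
	var out []Tag
	for _, t := range d.TagsByAttribute(attr) {
		if t.Attributes[attr] == value {
			out = append(out, t)
		}
	}
	return out
}
```

Indexing trades a little memory at build time for searches that only touch matching tags; a plain linear scan over all tags would give the same results.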
This function returns the root Tag
This function returns all tags except close tags.
This function returns an XPath based on its internal tree.
Returns a Tag array based on the query string given to the function (it searches the elements in the DOM and its btree).
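The query string format is not specified here, so this Go sketch assumes a simplified, XPath-like syntax: `/`-separated tag names starting at the root, with `*` matching any name. `Node` and `Query` are hypothetical, and real XPath supports far more (axes, predicates, attribute tests).

```go
package main

import "strings"

// Node is a hypothetical tree node for this sketch.
type Node struct {
	Name     string
	Children []*Node
}

// Query resolves a simplified, XPath-like path such as "/html/body/p".
// The first step must match the root itself; each later step keeps the
// children of the current matches whose name equals the step, and "*"
// matches any name.
func Query(root *Node, path string) []*Node {
	steps := strings.Split(strings.Trim(path, "/"), "/")
	if steps[0] != "*" && steps[0] != root.Name {
		return nil
	}
	current := []*Node{root}
	for _, step := range steps[1:] {
		var next []*Node
		for _, n := range current {
			for _, c := range n.Children {
				if step == "*" || c.Name == step {
					next = append(next, c)
				}
			}
		}
		current = next
	}
	return current
}
```

For `<html><body><p/><div><p/></div></body></html>`, `/html/body/p` returns only the first `p`, while `html/body/*/p` returns only the `p` inside the `div`.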
An object that holds tag information, such as its name, attributes and children.
Returns all children as an array
Returns the parent of current tag
Returns tag name
Returns tag content
Returns all attributes and it value
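One way to back these accessors without pointers between tags is to keep every tag in one slice and store parent and children as indexes, in line with the FAQ answer explaining why child tag arrays ([]&Tag) were dropped. The Go types `Tag` and `Document` and their methods are hypothetical.

```go
package main

// Tag is a hypothetical sketch of the Tag object. Parent and children are
// stored as indexes into Document.tags (-1 means "no parent") instead of
// pointers.
type Tag struct {
	name       string
	content    string
	attributes map[string]string
	parent     int
	children   []int
}

// Document owns every tag, so the indexes stored in a Tag can be resolved.
type Document struct {
	tags []Tag
}

// Children returns the children of tag i as a slice.
func (d *Document) Children(i int) []*Tag {
	out := make([]*Tag, 0, len(d.tags[i].children))
	for _, c := range d.tags[i].children {
		out = append(out, &d.tags[c])
	}
	return out
}

// Parent returns the parent of tag i, or nil for the root.
func (d *Document) Parent(i int) *Tag {
	if p := d.tags[i].parent; p >= 0 {
		return &d.tags[p]
	}
	return nil
}

// Name, Content and Attributes mirror the remaining accessors.
func (t *Tag) Name() string                  { return t.name }
func (t *Tag) Content() string               { return t.content }
func (t *Tag) Attributes() map[string]string { return t.attributes }
```

The cost of this layout is that a tag cannot reach its relatives on its own; every lookup goes through the owning `Document`.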
Returns the content of the tag and of all tags inside it. Any <br> tag is converted into \n.
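The behavior just described can be sketched as a depth-first walk that concatenates text nodes in document order and writes a newline for each `<br>` element. This Go version assumes a hypothetical `Node` type in which an empty `Name` marks a text node; it is not the library's implementation.

```go
package main

import "strings"

// Node is a hypothetical tree node: an empty Name marks a text node whose
// characters are in Text; any other node is an element with Children.
type Node struct {
	Name     string
	Text     string
	Children []*Node
}

// TextOf returns the text of n and of everything nested inside it, in
// document order, writing "\n" for each <br> element.
func TextOf(n *Node) string {
	var b strings.Builder
	writeText(&b, n)
	return b.String()
}

func writeText(b *strings.Builder, n *Node) {
	switch {
	case n.Name == "":
		b.WriteString(n.Text)
	case strings.EqualFold(n.Name, "br"):
		b.WriteString("\n")
	default:
		for _, c := range n.Children {
			writeText(b, c)
		}
	}
}
```

For `<p>Hello<br><b>world</b>!</p>` built as nodes, `TextOf` returns `"Hello\nworld!"`.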
A: In early stages of the project, strings.Builder was used, but because of a bug somewhere, it was necessary to use string directly. Using strings.Builder again is planned for later.
A: For some reason, using != and == on strings directly did not work, so this method is a workaround.
A: For debugging purposes.
A: It is a workaround: it was the quickest way to finish without having to deal with pointer and address manipulation. In the future, child tag arrays ([]&Tag) may be added again to make the API easier to use.
A: Like XPath, yes. Exactly equal to it, no.
- Parser
- <!-- Comments --> detection
- Open Generic tags detection
- Close Generic tags detection
- verify string detection
- tag attributes detection
- attributes values detection
- tag text (on Tag it is declared as content; this may change to text in the future)
- text file parsing support (open local files for parsing)
- open_code verification
- split parse (use '\n' as delimiter)
- DocumentObjectModel
- push elements that have a close tag onto the stack
- remove elements from the stack
- add info about who is the parent of the current node
- create a new document root if there is a syntax error (deleted)
- search tags in DOM by attributes
- search tags in DOM by tag type
- finish DOM test
- XPath
- receive search string and identify what to search and when
- start search by root
- start search by tag name
- start search by attribute name
- get all tags from document
- Finish XPath search
- Profile

```shell
sudo v -profile debug.profile -o main.out . && ./main.out
```
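The DocumentObjectModel roadmap items (pushing elements that have a close tag onto a stack, removing them, and recording each node's parent) can be sketched as follows. This Go version is an illustration under assumptions: input is a pre-tokenized list of tag names where a leading `/` marks a close tag, `voidTags` is a small hand-picked subset of elements without close tags, and `Element`/`BuildTree` are hypothetical names.

```go
package main

import "strings"

// voidTags lists elements that never have a close tag, so they are never
// pushed onto the stack (a small, assumed subset of HTML's void elements).
var voidTags = map[string]bool{"br": true, "hr": true, "img": true, "input": true, "link": true, "meta": true}

// Element is a hypothetical output record: the tag name and the index of
// its parent in the result slice (-1 for a top-level element).
type Element struct {
	Name   string
	Parent int
}

// BuildTree turns tag names such as "div", "/div", "br" into elements with
// parent links. Open tags are pushed onto a stack; a close tag pops back to
// its matching open tag; each new element's parent is the stack top.
func BuildTree(tokens []string) []Element {
	var elems []Element
	var stack []int // indexes into elems of the currently open tags
	for _, tok := range tokens {
		if strings.HasPrefix(tok, "/") {
			name := tok[1:]
			// Pop everything above and including the matching open tag.
			// A close tag with no matching open tag is ignored.
			for j := len(stack) - 1; j >= 0; j-- {
				if elems[stack[j]].Name == name {
					stack = stack[:j]
					break
				}
			}
			continue
		}
		parent := -1
		if len(stack) > 0 {
			parent = stack[len(stack)-1]
		}
		elems = append(elems, Element{Name: tok, Parent: parent})
		if !voidTags[tok] {
			stack = append(stack, len(elems)-1)
		}
	}
	return elems
}
```

Because a close tag pops everything above its matching open tag, `["div", "span", "/div"]` implicitly closes the `span`, so a `p` that follows becomes a top-level element again.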