HTML Types
HTMLForge defines a type hierarchy for representing HTML documents.
Type Hierarchy
HTMLNode (abstract)
├── HTMLElement{T} — an element with tag T (e.g. HTMLElement{:div})
├── HTMLText — text content
└── NullNode — sentinel for missing parents
HTMLDocument
The type returned by parsehtml. It has two fields:
doctype::AbstractString is the doctype of the parsed document (an empty string if none)
root::HTMLElement is the root element of the document tree
doc = parsehtml("<html><body><p>Hi</p></body></html>")
doc.doctype # ""
doc.root # HTMLElement{:HTML}
HTMLElement{T}
The core type, parametrized by a Symbol representing its tag:
mutable struct HTMLElement{T} <: HTMLNode
    children::Vector{HTMLNode}
    parent::HTMLNode
    attributes::Dict{AbstractString, AbstractString}
end
Constructors
# Empty element
HTMLElement(:div)
# Element with a single child
HTMLElement(:div, HTMLText("hello"))
# Element with children and attributes
HTMLElement(:p, [HTMLText("text")], Dict("class" => "intro"))
# Element with children and keyword attributes (underscores become hyphens)
HTMLElement(:div, HTMLNode[]; data_id="42", class="box")
Indexing
HTMLElement supports both integer indexing (children) and string/symbol indexing (attributes):
el = HTMLElement(:div, HTMLText("hello")) # element with one child, so el[1] is valid
el["class"] = "wide" # set attribute
el[:class] # get attribute → "wide"
el[1] # first child
HTMLText
Represents text content within an HTML document:
mutable struct HTMLText <: HTMLNode
    parent::HTMLNode
    text::AbstractString
end
HTMLText("Example text")
Constructed text nodes have a NullNode parent by default.
NullNode
A sentinel type used as the parent of root elements and detached nodes:
struct NullNode <: HTMLNode end
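To show how the NullNode sentinel and the tag parameter can be used together, here is a minimal standalone sketch. It redeclares stand-in types with the same fields as the definitions above rather than loading HTMLForge, and the helpers detached, attach!, rootof and tagof are hypothetical names introduced only for illustration; they are not assumed to be part of the package.

```julia
# Stand-in types mirroring the field layouts documented above (not HTMLForge's own code).
abstract type HTMLNode end

struct NullNode <: HTMLNode end

mutable struct HTMLElement{T} <: HTMLNode
    children::Vector{HTMLNode}
    parent::HTMLNode
    attributes::Dict{AbstractString, AbstractString}
end

mutable struct HTMLText <: HTMLNode
    parent::HTMLNode
    text::AbstractString
end

# Hypothetical helper: a detached element with no children and no attributes.
detached(tag::Symbol) =
    HTMLElement{tag}(HTMLNode[], NullNode(), Dict{AbstractString, AbstractString}())

# Hypothetical helper: append a child and point its parent field back at `el`.
function attach!(el::HTMLElement, child::Union{HTMLElement, HTMLText})
    push!(el.children, child)
    child.parent = el
    return el
end

# Hypothetical helper: follow parent links until the sentinel is reached.
function rootof(node::Union{HTMLElement, HTMLText})
    while !(node.parent isa NullNode)
        node = node.parent
    end
    return node
end

# Hypothetical helper: recover the tag from the type parameter via dispatch.
tagof(::HTMLElement{T}) where {T} = T

box  = detached(:div)
para = detached(:p)
txt  = HTMLText(NullNode(), "hello")
attach!(box, para)
attach!(para, txt)
```

In this sketch rootof(txt) walks up through para to box, whose parent is still NullNode, and tagof(para) returns the symbol stored in the type parameter. Using a concrete sentinel type keeps the parent field typed as HTMLNode, so the "no parent" case is handled by an isa check or by dispatch instead of a separate nothing value.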