Version: 5.1
HTML: Parsing Library
The html library provides functions to read HTML documents and structures to represent them.
read-xhtml and read-html each read (X)HTML from a port, producing an html instance.
(read-html-comments v): If v is not #f, then comments are read and returned. Defaults to #f.
(use-html-spec v): If v is not #f, then the HTML must respect the HTML specification
with regard to which elements are allowed to be the children of
other elements. For example, the top-level "<html>"
element may only contain a "<head>" and a "<body>"
element. Defaults to #f.
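The two settings above can be made explicit at the call site. The following is a minimal sketch, assuming they are the parameters read-html-comments and use-html-spec exported by the html library, so that parameterize applies to them; parse-string and comments-in are hypothetical helper names introduced only for illustration.

```racket
#lang racket
(require (prefix-in h: html)
         (prefix-in x: xml))

;; parse-string : string [#:comments? any] [#:spec? any] -> html
;; Hypothetical helper: parses str with both reader settings spelled out.
;; Assumes read-html-comments and use-html-spec are boolean parameters.
(define (parse-string str #:comments? [comments? #f] #:spec? [spec? #f])
  (parameterize ([h:read-html-comments (and comments? #t)]
                 [h:use-html-spec (and spec? #t)])
    (h:read-xhtml (open-input-string str))))

;; comments-in : html-content -> (listof string)
;; Collects the text of every comment node found in the parsed tree.
(define (comments-in some-content)
  (cond [(x:comment? some-content)
         (list (x:comment-text some-content))]
        [(h:html-full? some-content)
         (append-map comments-in (h:html-full-content some-content))]
        [else '()]))

(define doc
  (parse-string
   (string-append
    "<html><head><title>T</title></head>"
    "<body><!-- a note --><p>Hi</p></body></html>")
   #:comments? #t))

(comments-in doc)
```

Using parameterize keeps the change local to one read, so other code that reads HTML in the same program still sees the defaults.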
1 Example
    (module html-example racket

      ; Some of the symbols in html and xml conflict with
      ; each other and with racket/base language, so we prefix
      ; to avoid namespace conflict.
      (require (prefix-in h: html)
               (prefix-in x: xml))

      (define an-html
        (h:read-xhtml
         (open-input-string
          (string-append
           "<html><head><title>My title</title></head><body>"
           "<p>Hello world</p><p><b>Testing</b>!</p>"
           "</body></html>"))))

      ; extract-pcdata: html-content -> (listof string)
      ; Pulls out the pcdata strings from some-content.
      (define (extract-pcdata some-content)
        (cond [(x:pcdata? some-content)
               (list (x:pcdata-string some-content))]
              [(x:entity? some-content)
               (list)]
              [else
               (extract-pcdata-from-element some-content)]))

      ; extract-pcdata-from-element: html-element -> (listof string)
      ; Pulls out the pcdata strings from an-html-element.
      (define (extract-pcdata-from-element an-html-element)
        (match an-html-element
          [(struct h:html-full (attributes content))
           (apply append (map extract-pcdata content))]

          [(struct h:html-element (attributes))
           '()]))

      (printf "~s\n" (extract-pcdata an-html)))

    > (require 'html-example)
    ("My title" "Hello world" "Testing" "!")
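A variation on the example above groups the text by paragraph instead of returning one flat list. This is only a sketch: it assumes the library exports a p structure (and so the predicate p?) for "<p>" elements, and the helper names text-of and paragraphs are hypothetical.

```racket
#lang racket
(require (prefix-in h: html)
         (prefix-in x: xml))

(define doc
  (h:read-xhtml
   (open-input-string
    (string-append
     "<html><head><title>My title</title></head><body>"
     "<p>Hello world</p><p><b>Testing</b>!</p>"
     "</body></html>"))))

;; text-of : html-content -> string
;; Concatenates every pcdata string found beneath some-content.
;; Entities and content-free elements contribute the empty string.
(define (text-of some-content)
  (cond [(x:pcdata? some-content)
         (x:pcdata-string some-content)]
        [(h:html-full? some-content)
         (string-append* (map text-of (h:html-full-content some-content)))]
        [else ""]))

;; paragraphs : html-content -> (listof string)
;; Returns the text of each p element, in document order.
(define (paragraphs some-content)
  (cond [(h:p? some-content)
         (list (text-of some-content))]
        [(h:html-full? some-content)
         (append-map paragraphs (h:html-full-content some-content))]
        [else '()]))

(paragraphs doc)
```

For the document above, the result should contain one string per paragraph, with the text of the nested "<b>" element folded into the second one.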
2 HTML Structures
pcdata, entity, and attribute are defined in the xml documentation.
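An html-content is either an html-element, a pcdata, or an entity.
An html-element has one field:
attributes : (listof attribute)
An html-full is an html-element with one additional field:
content : (listof html-content)
Any html tag that may include content also inherits from
html-full without adding any additional fields.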
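Because every tag structure carries an attributes field and content-bearing tags also carry content, one traversal can work for any element. The sketch below collects link targets; it assumes the library exports an a structure for "<a>" elements and that the reader produces attribute names as lowercase symbols. attribute-ref and links are hypothetical helpers.

```racket
#lang racket
(require (prefix-in h: html)
         (prefix-in x: xml))

;; attribute-ref : html-element symbol -> (or/c string #f)
;; Returns the value of the first attribute called name, or #f if absent.
(define (attribute-ref elem name)
  (for/first ([attr (in-list (h:html-element-attributes elem))]
              #:when (eq? (x:attribute-name attr) name))
    (x:attribute-value attr)))

;; links : html-content -> (listof string)
;; Collects the href value of every a element, in document order.
(define (links some-content)
  (define here
    (cond [(and (h:a? some-content)
                (attribute-ref some-content 'href))
           => list]
          [else '()]))
  (define below
    (if (h:html-full? some-content)
        (append-map links (h:html-full-content some-content))
        '()))
  (append here below))

(define doc
  (h:read-xhtml
   (open-input-string
    (string-append
     "<html><head><title>Links</title></head><body>"
     "<p><a href=\"http://example.com/\">one</a> and <a>no target</a></p>"
     "</body></html>"))))

(links doc)
```

Checking html-full? before reading content keeps the traversal safe for pcdata, entities, and elements that cannot have children.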
A Contents-of-html is either
A Contents-of-head is either
A Contents-of-tr is either
A Contents-of-table is either
A Contents-of-fieldset is either
A Contents-of-select is either
A Contents-of-dl is either
A Contents-of-pre is either
A Contents-of-object-applet is either
A Map is
(make-map (listof attribute) (listof Contents-of-map))
A Contents-of-map is either
A Contents-of-a is either
A Contents-of-address is either
A Contents-of-body is either
A G12 is either
A G11 is either
A G10 is either
A G9 is either
A G8 is either
A G7 is either
A G6 is either
A G5 is either
A G4 is either
A G3 is either
A G2 is either