


Parsing baby steps

Still just not getting the parsing. I am looking at the manual and at the makedoc2 source code (a parse-fest if there ever was one), and trying to concoct a simple example to see what in my head might be blocking my understanding. The comments in the example below explain what I am trying to do. I understand the principle: when I hit the ===, I copy what follows to the end of the line, trim it, and surround it with h1 tags; when I hit a blank line, I copy up to the next blank line and surround what I get with p tags. But I just don't see how to express that in PARSE, and I wonder if anyone can offer guidance.
    
Thank you.
    
REBOL []
    
;; [------------------------------------------------------------------------]
;; [ Demo for the purpose of trying to understand parsing.                  ]
;; [                                                                        ]
;; [ This demo will transform simple text with one markup item into HTML.   ]
;; [ The one markup item is a === at the start of a line, which marks a     ]
;; [ heading at the h1 level. The rest of that line should be trimmed and   ]
;; [ surrounded by the h1 tags. The remaining text should be split into     ]
;; [ paragraphs at each blank line, and each paragraph surrounded by the    ]
;; [ "p" tags.                                                              ]
;; [                                                                        ]
;; [ So text like this:                                                     ]
;; [                                                                        ]
;; [ ===Heading 1                                                           ]
;; [                                                                        ]
;; [ Paragraph 1-1                                                          ]
;; [                                                                        ]
;; [ ===Heading 2                                                           ]
;; [                                                                        ]
;; [ Paragraph 2-1                                                          ]
;; [                                                                        ]
;; [ Should be transformed to this:                                         ]
;; [                                                                        ]
;; [ <h1>Heading 1</h1>                                                     ]
;; [ <p>Paragraph 1-1</p>                                                   ]
;; [ <h1>Heading 2</h1>                                                     ]
;; [ <p>Paragraph 2-1</p>                                                   ]
;; [                                                                        ]
;; [ ...or something equivalent.                                            ]
;; [------------------------------------------------------------------------]
    
;; -- This is the sample input data that we will parse.
IN-TEXT: {
===Heading one

This is a paragraph of text under heading one.
We would want it surrounded by the "p" tags.

This is a second paragraph
that should have its own set of "p" tags.

===A second heading

The above heading would be emitted with the "h1"
tags.

And here is a second paragraph under the second
heading just to show things are working
}
    
;; -- This will be the parsed input data with its html tags.
HTML-OUT: copy ""
    
;; -- Parse IN-TEXT, mark it up, and append it to HTML-OUT.
    
;; ???
    
;; -- Display the output and halt for probing.
print HTML-OUT
halt
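The heading half of the plan described above (copy what follows ===, trim it, wrap it in h1 tags) can be tried on a single line before tackling the whole document. This is a minimal sketch assuming Rebol 2 PARSE; TEXT-LINE and HEADING-HTML are illustrative names, not part of the original script, and `to end` stands in for `to newline` only because the input here is a single line:

```rebol
REBOL []

;; Sketch of the heading step on its own (assumes Rebol 2 PARSE;
;; /all keeps spaces significant). TEXT-LINE and HEADING-HTML are
;; illustrative names only.
text-line: "===  Heading one  "
heading-html: none

parse/all text-line [
    "===" copy heading to end (
        ;; TRIM strips the leading and trailing spaces in place.
        ;; Plain strings are used for the tags: REJOIN takes the type
        ;; of its first value, so starting with a tag! would give <h1...>.
        heading-html: rejoin ["<h1>" trim heading "</h1>"]
    )
]

print heading-html
```

In a multi-line document the same rule would use `to newline` instead of `to end`, so that the copy stops at the end of the heading line.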
    


posted by:   Steven White       10-Sep-2018/13:13:06-7:00



A simplistic approach would be to use BITSET! to positively identify content portions. This is roughly how MakeDoc works, but with a few more subtle rules.
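To see what "positively identifying content" means, it may help to try such a bitset on its own first. A small sketch, assuming Rebol 2 PARSE (where PARSE/ALL returns true only if the rule consumes the whole input); SINGLE-LINE and SPLIT-LINE are illustrative names:

```rebol
REBOL []

;; A bitset matching any character except newline.
content: complement charset "^/"

single-line: parse/all "A Paragraph" [some content]        ; whole input matched
split-line:  parse/all "A Paragraph^/More" [some content]  ; stops at the newline

print [single-line split-line]   ; prints: true false
```

Because the bitset can never match a newline, SOME CONTENT always stops at the end of a line, which is what lets the rules below decide what to do with line breaks.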
    
; anything but newlines
content: complement charset "^/"
    
scan-doc: func [text [string!]][
     ; parse/all for rebol 2
     collect [
         parse/all text [
             any [
                 newline
                 | "===" opt " " copy part some content (
                     keep 'heading
                     keep part
                 )
                 | copy part [some content any [newline some content] (
                     keep 'para
                     keep part
                 )
             ]
         ]
     ]
]
    
probe scan-doc {

=== A Header

A Paragraph

Another Paragraph

}
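scan-doc stops at a flat block of markers and text; turning that into the HTML the question asks for could be a separate pass. A minimal sketch, assuming Rebol 2: EMIT-HTML is a hypothetical helper, and DOC is typed in by hand in the shape we assume scan-doc returns for the sample above once the paragraph line's missing bracket is fixed:

```rebol
REBOL []

;; DOC is written out by hand, in the [kind text ...] shape we assume
;; SCAN-DOC collects (KEEP 'heading / KEEP 'para, then KEEP part).
doc: [heading "A Header" para "A Paragraph" para "Another Paragraph"]

;; Hypothetical helper: walk the block two values at a time and
;; append the matching HTML for each kind.
emit-html: func [doc [block!] /local out] [
    out: copy ""
    foreach [kind text] doc [
        append out switch/default kind [
            heading [rejoin ["<h1>" text "</h1>" newline]]
            para    [rejoin ["<p>" text "</p>" newline]]
        ][""]   ; unknown kinds produce nothing
    ]
    out
]

print emit-html doc
```

Keeping scanning and emitting apart means the parse rules only have to recognize structure, and the output format can change without touching them.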

posted by:   Chris       10-Sep-2018/13:30:33-7:00



Correction: this line was missing a closing bracket:
    
| copy part [some content any [newline some content]] (
    
Note that this lets you identify multiline paragraphs.
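The multiline behaviour of the corrected rule can be checked in isolation. A small sketch, assuming Rebol 2 PARSE; PARA-RULE, TWO-LINES, TWO-PARAS and FIRST-PARA are illustrative names:

```rebol
REBOL []

;; The corrected paragraph rule: a run of content, then any number of
;; (single newline + more content). A blank line is two newlines in a
;; row, so the inner block fails there and the paragraph ends.
content: complement charset "^/"
para-rule: [some content any [newline some content]]

two-lines: parse/all "line one^/line two" para-rule     ; one paragraph, two lines
two-paras: parse/all "para one^/^/para two" para-rule   ; stops at the blank line

;; COPY shows exactly where the paragraph ends.
first-para: none
parse/all "para one^/^/para two" [copy first-para para-rule to end]

print [two-lines two-paras]   ; prints: true false
probe first-para              ; prints: "para one"
```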

posted by:   Chris       10-Sep-2018/13:33:45-7:00



While this seems an easy task, a few matters make it more difficult than it looks with historical PARSE.
    
One of the not-so-easy aspects is that TO and THRU don't allow you to use complex rules, which complicates your paragraph termination conditions. This decision was reversed in Red (and will be in Ren-C too, when time permits).
    
You can try this in Red. There are likely workarounds for it in Rebol2 and R3-Alpha, but I'd rather consider the fact that this doesn't work as-is a bug than figure out what they would be:
    
     heading-rule: [
         "===" copy heading to "^/" (
             append html-out reduce [
                 <h1> heading </h1> newline
             ]
         )
     ]
    
     paragraph-rule: [
         copy paragraph to ["^/^/" | "^/" end | end] (
             append html-out reduce [
                 <p> paragraph </p> newline
             ]
         )
     ]
    
     parse in-text [
         (html-out: copy {})
         some [newline | heading-rule | paragraph-rule]
     ]
    
     print mold html-out

posted by:   Fork       10-Sep-2018/15:17:01-7:00