loiterer2 wrote:
> Hi,
> I would like to --well, at least, I am hoping I can-- do 2 things:
> First, write a DTD parser. Then, concoct a DTD parser FAQ from a
> hands-on point of view.
> I have written a number of parsers, but they were all easy. SGML as
> used in DTDs is proving to be much harder.
That is called Declaration Syntax (as opposed to Document Syntax).
It's not hard, per se, just different.
> The fact that there's hardly any usable information on the Web does
> not make things any easier. By 'usable', I mean stuff you can look up
> and it tells you what is what in a language you can understand. From
> this POV, books I have seen (not many) have been way over my head.
ISO 8879 (the standard document) is a commercial product of the ISO.
You have to buy it, or buy Goldfarb's _SGML Handbook_.
> All I need is simple, example-oriented explanations. And, no, 'read the
> code, stupid' does not help either.
The best guide to writing DTDs is "SGML DTDs" by Maler and El Andaloussi.
> So, I decided to ask here --hoping people would help create such
> documentation.
> While still hoping, I'll list a few examples for 'ENTITY' definition:
> <!ENTITY % head.misc "SCRIPT|STYLE|META|LINK|OBJECT" -- repeatable
> head elements -->
> <!ENTITY % heading "H1|H2|H3|H4|H5|H6">
> <!ENTITY % HTML.Frameset "IGNORE">
> <!ENTITY % list "UL | OL">
> <!ENTITY % MediaDesc "CDATA" -- single or comma-separated list of
> media descriptors -->
> <!ENTITY % preformatted "PRE">
> Now, it's obvious that what follows '<!' is what we are defining. In
> this case an 'ENTITY'.
Yes, it's called the MDO (Markup Declaration Open).
> Then.. we have a '%' sign...
That is the Parameter Entity Reference Open (pero) delimiter.
> I am assuming that it tells us that we are about to find a name
> string.
Not quite. It marks the name being declared as a PE (Parameter
Entity -- one that can be used only in replacements in the DTD) as
opposed to a General Entity (which is used in the actual document).
> I don't remember seeing any 'ENTITY' definitions that did not have '%'
> as the next non-whitespace char.
That's because the only ones you have seen are PEs. Here are some
General Entities:
<!ENTITY IBM CDATA "International Business Machines">
<!ENTITY foobar SYSTEM "chapter1.sgm">
You use them in the text to refer to &IBM; or to include &foobar;
> So, I am assuming that '%' must be present.
> Is that a correct assumption?
No. See pp 394-401 of Goldfarb, especially Productions 101-104.
> If not, what else can there be, and what do they mean?
The pero is only used for PEs. GEs don't have a symbol there, but they
may use the reserved string #DEFAULT (production 103).
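
To make the PE/GE distinction concrete for someone writing a parser, here
is a rough Python sketch of how the head of an ENTITY declaration might be
split. It is only an illustration, not Goldfarb's grammar: the function
name classify_entity and the regular expression are my own inventions, it
assumes the reference concrete syntax delimiters ("<!" and "%"), and it
does not handle embedded comments containing ">" or nested references.

```python
import re

# Hypothetical pattern (not from the standard): matches the head of one
# ENTITY declaration using the reference concrete syntax delimiters.
ENTITY_HEAD = re.compile(
    r"""<!ENTITY\s+            # MDO followed by the ENTITY keyword
        (?:(?P<pero>%)\s+)?    # optional pero, set off by whitespace
        (?P<name>\#DEFAULT|[A-Za-z][A-Za-z0-9.\-]*)  # name or reserved string
        \s+(?P<rest>.*?)\s*>$  # entity text up to the closing delimiter
    """,
    re.VERBOSE | re.DOTALL | re.IGNORECASE,
)


def classify_entity(decl):
    """Return (kind, name, rest) for one ENTITY declaration string."""
    m = ENTITY_HEAD.match(decl.strip())
    if m is None:
        raise ValueError("not an ENTITY declaration: %r" % decl)
    if m.group("pero"):
        kind = "parameter"   # a '%' before the name declares a PE
    elif m.group("name").upper() == "#DEFAULT":
        kind = "default"     # the default general entity
    else:
        kind = "general"     # no symbol before the name: a GE
    return kind, m.group("name"), m.group("rest")


if __name__ == "__main__":
    print(classify_entity('<!ENTITY % heading "H1|H2|H3|H4|H5|H6">'))
    print(classify_entity('<!ENTITY IBM CDATA "International Business Machines">'))
```

Note that the sketch requires whitespace after the '%': written directly
against a name, '%' would open a parameter entity reference instead.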
> After the '%' char, next, we have a piece of non-whitespace string.
The entity name.
> I am assuming it means 'name' of the 'ENTITY' we are defining.
Yep.
> Is that assumption correct, or could there be something else meaning
> something else?
Nope.
> And, is it case-sensitive --I believe it isn't but I might as well
> have it confirmed.
This is defined in the SGML Declaration for the specific DTD (the
NAMECASE parameter). Entity names can be made case-sensitive or
case-insensitive; in the reference concrete syntax they are
case-sensitive.
> Then, we have all sorts of gobbledygook..
This is the entity text. In the case of PEs, this is usually a content
model fragment, consisting of element type names in the form used in
element declarations, allowing the parameter entity reference to be used
in constructing complex content models. But it can also be a parameter
literal and a bunch of other things (like the HTML.Frameset value, used
in switching features on and off).
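
As a hedged illustration of what "used in constructing complex content
models" means for a parser, the Python sketch below substitutes PE
references of the form %name; into a string. The function expand_pe_refs,
the "block" entity and the depth limit are assumptions made for this
example; it also assumes every reference ends with ";", which SGML does
not strictly require.

```python
import re

# Hypothetical pattern: a parameter entity reference written as %name;
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.\-]*);")


def expand_pe_refs(text, entities, depth=0):
    """Replace each %name; in text with its entity text, recursively."""
    if depth > 20:
        # Arbitrary guard against entities that refer to themselves.
        raise ValueError("parameter entity references nested too deeply")

    def substitute(match):
        name = match.group(1)
        # An undeclared entity raises KeyError here.
        return expand_pe_refs(entities[name], entities, depth + 1)

    return PE_REF.sub(substitute, text)


if __name__ == "__main__":
    entities = {
        "heading": "H1|H2|H3|H4|H5|H6",
        "list": "UL | OL",
        # "block" is an invented entity, built from the two above.
        "block": "P | %heading; | %list;",
    }
    print(expand_pe_refs("(%block;)+", entities))
```

The entity text is pasted in verbatim, which is why a PE can hold a
fragment of a content model rather than a complete one.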
> I am assuming these to be the value(s) that ENTITY can have.
No, you will have to read the standard to find out. It's 650pp.
> I don't have much problem with those that are explicitly listed, but
> what does "IGNORE", "CDATA" mean?
Too much to explain here. Read Eve Maler's book.
> What other stuff can be there apart from "IGNORE", "CDATA", and what
> do they mean?
Lots and lots.
> Could you help clarify these please.
Could you please go and read the documentation first, then ask about
what more you need to know?
///Peter