* The same executor for the tree ops as on the HTML side (nsHtml5TreeOpExecutor, eventually to be named mozilla::parser::TreeOpExecutor)
===Character encodings===
expat has built-in capability to decode US-ASCII, ISO-8859-1, UTF-8 and UTF-16 and has an API for plugging in support for other decoders. So why bother putting the byte-to-UTF-16 conversion in mozilla::parser::xml::StreamParser, outside expat?
Encoding sniffing should be handled the [https://bugzilla.mozilla.org/attachment.cgi?id=524615&action=diff same way nsHtml5StreamParser handles it in the XML View Source mode]: mozilla::parser::xml::StreamParser itself should handle UTF-8 and UTF-16 BOM sniffing. If there's no BOM, an instance of expat itself should be used for extracting the encoding name from the XML declaration.
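The BOM check described above can be sketched roughly as follows. This is only an illustration, not the nsHtml5StreamParser code: <code>BomEncoding</code>, <code>BomResult</code> and <code>SniffBom</code> are hypothetical names, and the sketch assumes the caller has already buffered the first few bytes of the stream.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result types for illustration; these are not Gecko types.
enum class BomEncoding { None, Utf8, Utf16LE, Utf16BE };

struct BomResult {
  BomEncoding encoding;
  std::size_t bomLength;  // Number of leading bytes to skip before decoding.
};

// Inspects the start of the stream for a UTF-8 or UTF-16 byte order mark.
// Returns BomEncoding::None when no BOM is present, in which case the
// encoding still has to be taken from the XML declaration.
BomResult SniffBom(const unsigned char* data, std::size_t length) {
  assert(data != nullptr || length == 0);
  if (length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
    return {BomEncoding::Utf8, 3};
  }
  if (length >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
    return {BomEncoding::Utf16LE, 2};
  }
  if (length >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
    return {BomEncoding::Utf16BE, 2};
  }
  return {BomEncoding::None, 0};
}
```

When the result is <code>BomEncoding::None</code>, the stream parser would fall back to the second step in the text and let an expat instance read the encoding name from the XML declaration.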
===Connecting handlers to expat===
Looking at the existing sinks, there seems to be no real value in having an abstraction between expat and the code that does the actual work in response to expat's callbacks. If we switched away from expat today, we'd have to change the current abstraction layer anyway. That is, I think it doesn't make sense to have a single class (like the old nsExpatDriver) that provides a set of expat callbacks and then provides another abstraction for concrete handler classes that do the real work. I propose we make the concrete handler classes set themselves as expat callbacks directly. That is, mozilla::parser::xml::TreeOpGenerator should know how to register itself as the handler of various expat callbacks.
};
===Dealing with stream data off the main thread===
mozilla::parser::xml::StreamParser should implement nsIStreamListener on the main thread and copy data over to the parser thread the way nsHtml5StreamParser does.
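As a rough illustration of the copy step only, here is a minimal sketch using standard C++ instead of Gecko's threading primitives. <code>ByteChunkQueue</code> and its methods are hypothetical names, and the sketch assumes that the buffer handed to the listener is only valid for the duration of the call, which is why the bytes are copied before being queued.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical hand-off queue for illustration; not a Gecko class.
class ByteChunkQueue {
 public:
  // Called on the main thread. Copies the bytes into an owned buffer
  // (assumption: the caller's buffer may be reused after this returns).
  void Push(const unsigned char* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    std::vector<unsigned char> copy(data, data + length);
    std::lock_guard<std::mutex> lock(mMutex);
    mChunks.push_back(std::move(copy));
  }

  // Called on the parser thread. Moves the oldest chunk into `out`.
  // Returns false if no data is queued.
  bool Pop(std::vector<unsigned char>& out) {
    std::lock_guard<std::mutex> lock(mMutex);
    if (mChunks.empty()) {
      return false;
    }
    out = std::move(mChunks.front());
    mChunks.pop_front();
    return true;
  }

 private:
  std::mutex mMutex;
  std::deque<std::vector<unsigned char>> mChunks;
};
```

The real implementation would presumably dispatch work to the parser thread rather than have it poll a queue; the point of the sketch is only that the main thread owns the incoming buffer and the parser thread only ever sees a copy.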
===Dealing with entity references off the main thread===
Currently, we map a small set of magic public ids to a DTD file that we actually feed to expat so that it gets parsed every time the user loads a document that references one of the magic public ids, such as the public ids for the XHTML 1.0 DTDs. This way, entities defined in the XHTML 1.0 DTDs are available to documents.
Instead of parsing a special file in this case, expat should be hacked in such a way that its internal entity tables can be mutated to a state that's equivalent to the state they'd end up in by parsing the special DTD, without actually parsing anything.
===Lack of actual speculation===
In the HTML case, the only thing that can cause a speculation failure is document.write. Since XML has no document.write, the off-the-main-thread XML parser can parse its input to completion and doesn't need to support stream rewinding. All the tree ops can be queued up and they just need to be executed in chunks that end at a script execution op so that the world experienced by scripts looks as though the parts of the document after the current script didn't exist yet.
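The chunking rule can be sketched as follows. This is a simplified illustration, not the executor's actual code: <code>TreeOp</code> and <code>TakeNextChunk</code> are hypothetical, and real tree ops carry far more state than a kind and a string.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified tree op for illustration.
struct TreeOp {
  enum Kind { AppendElement, AppendText, RunScript } kind;
  std::string payload;
};

// Removes and returns the next chunk from the queue: every op up to and
// including the first RunScript op, or all remaining ops if there is none.
std::vector<TreeOp> TakeNextChunk(std::deque<TreeOp>& queue) {
  std::vector<TreeOp> chunk;
  while (!queue.empty()) {
    TreeOp op = std::move(queue.front());
    queue.pop_front();
    const bool isScript = (op.kind == TreeOp::RunScript);
    chunk.push_back(std::move(op));
    if (isScript) {
      break;  // Later ops stay queued, so the script cannot observe them.
    }
  }
  // If ops remain queued, the chunk must have ended at a script op.
  assert(chunk.empty() || queue.empty() ||
         chunk.back().kind == TreeOp::RunScript);
  return chunk;
}
```

The main thread would apply one chunk, let the script at its end run, and only then take the next chunk, so the nodes that follow the script are not in the document while it executes.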