(Add the mail I sent to resourceful students who found me) |
(Added references §.) |
||
Line 8: | Line 8: | ||
We propose to implement a new MediaWiki parser using proper parsing techniques: generating a parse tree, manipulating it, and then outputting (at least initially) HTML. Erik Rose has already done some research toward this: see the continually developing [http://github.com/erikrose/mediawiki-parser/blob/master/README.rst design document] and some [http://github.com/erikrose/mediawiki-parser/blob/master/lexer.py initial] [http://github.com/erikrose/mediawiki-parser/blob/master/parser.py code]. | We propose to implement a new MediaWiki parser using proper parsing techniques: generating a parse tree, manipulating it, and then outputting (at least initially) HTML. Erik Rose has already done some research toward this: see the continually developing [http://github.com/erikrose/mediawiki-parser/blob/master/README.rst design document] and some [http://github.com/erikrose/mediawiki-parser/blob/master/lexer.py initial] [http://github.com/erikrose/mediawiki-parser/blob/master/parser.py code]. | ||
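To make the three stages named above concrete (build a parse tree, manipulate it, output HTML), here is a minimal sketch in Python. It is not taken from the linked lexer.py or parser.py and does not reflect their design. The names <code>Node</code>, <code>parse</code>, <code>prune</code>, and <code>render</code> are hypothetical, and the only syntax it assumes is a toy subset: <code><nowiki>'''bold'''</nowiki></code>, <code><nowiki>''italic''</nowiki></code>, and plain text. Misnested or unclosed quote runs, which real MediaWiki has elaborate rules for, are deliberately not handled.

```python
import re
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """One parse-tree node (hypothetical shape, for illustration only)."""
    kind: str                      # "doc", "bold", "italic", or "text"
    text: str = ""                 # used only by "text" nodes
    children: list = field(default_factory=list)


# Longest alternative first, so ''' is not read as '' followed by '.
TOKEN = re.compile(r"'''|''|[^']+|'")
MARKERS = {"'''": "bold", "''": "italic"}
TAGS = {"bold": "b", "italic": "i"}


def parse(source):
    """Stage 1: turn the toy markup into a tree."""
    root = Node("doc")
    stack = [root]
    for tok in TOKEN.findall(source):
        kind = MARKERS.get(tok)
        if kind is None:
            stack[-1].children.append(Node("text", tok))
        elif stack[-1].kind == kind:
            stack.pop()            # marker closes the innermost open node
        else:
            node = Node(kind)      # marker opens a new formatting node
            stack[-1].children.append(node)
            stack.append(node)
    return root


def prune(node):
    """Stage 2: an example tree manipulation, dropping empty formatting nodes."""
    kept = []
    for child in node.children:
        prune(child)               # bottom-up, so emptied parents are caught
        if child.kind == "text" or child.children:
            kept.append(child)
    node.children = kept
    return node


def render(node):
    """Stage 3: serialize the tree as HTML."""
    if node.kind == "text":
        return escape(node.text, quote=False)
    inner = "".join(render(child) for child in node.children)
    tag = TAGS.get(node.kind)
    return f"<{tag}>{inner}</{tag}>" if tag else inner


print(render(prune(parse("'''bold''' and ''it''"))))
# prints: <b>bold</b> and <i>it</i>
```

Keeping the tree as a separate intermediate step is what lets a pass like <code>prune</code> exist at all: the transformation works on structure rather than on strings, and a different <code>render</code> could emit something other than HTML later.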
== | == Intro Email == | ||
''If you are clueful but didn't mail me, act as if I sent this to you, too!'' | ''Here's a mail I sent to a few clueful souls who mailed me. If you are clueful but didn't mail me, act as if I sent this to you, too!'' | ||
Hi, all! You're receiving this because you have inquired about a Google Summer of Code project I'm mentoring: to create a more maintainable, extensible MediaWiki-syntax parser in Python. This is a bit of a "research" project in that I'm not entirely sure it's possible to create a comprehensible parser that still handles all the subtleties of the ill-designed MediaWiki language, but the plan is to at least have fun and learn something trying! | Hi, all! You're receiving this because you have inquired about a Google Summer of Code project I'm mentoring: to create a more maintainable, extensible MediaWiki-syntax parser in Python. This is a bit of a "research" project in that I'm not entirely sure it's possible to create a comprehensible parser that still handles all the subtleties of the ill-designed MediaWiki language, but the plan is to at least have fun and learn something trying! | ||
Line 36: | Line 36: | ||
Python Wrangler<br> | Python Wrangler<br> | ||
Mozilla | Mozilla | ||
== References == | |||
To get a sense of the project or start immersing yourself in its problem domain, you can start by reading this stuff: | |||
* The grammar horrors and half-baked implementation attempts at http://www.mediawiki.org/wiki/Markup_spec. Of these, http://www.mediawiki.org/wiki/Markup_spec/BNF seems to be the most complete and helpful. | |||
* A nice short review of various parsing techniques: http://tratt.net/laurie/tech_articles/articles/parsing_the_solved_problem_that_isnt | |||
* Good material on Earley parsers is still wanted; Wikipedia's article could be better. If you find something, please note it here. | |||