htmLawed HTML filter/purifier
drpatnaik | 7 comments | Tuesday, April 8, 2008
htmLawed is a new GPL'ed PHP script that filters text to make it secure and standards-compliant. The easily customized script can also remove admin-specified HTML elements and attributes, control spam, and so on. You can read the documentation and test the script at the htmLawed website.


Comments
However, now that I'm looking at the docs, I love that it is non-OOP (I'm still a procedural guy), although I already wrote a wrapper function for HTMLPurifier.
4.6 also has a section for comparison... perhaps I will give this a shot and see how I like it :)
http://htmlpurifier.org/comparison.html
http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/htmLawed_README.htm
HTML Purifier is much more accurate and powerful. It's like comparing a regex-based parser (htmLawed) with a full HTML DOM parser (HTML Purifier).
1. HTML Purifier ensures that output is standards-compliant. htmLawed does not. Patnaik has told me himself that this is not htmLawed's purpose.
2. HTML Purifier is a lot slower, bigger and requires more memory than htmLawed.
3. HTML Purifier does not support as many tags as htmLawed. In the comparison, I talk about why HTML Purifier's tagset is "better" than htmLawed's, and most security experts will say that form and iframe should certainly not be allowed for untrusted users.
4. htmLawed is not safe by default. You must read the documentation to use it effectively. This is something not lost upon other security researchers: http://seclists.org/bugtraq/2008/Apr/0028.html
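For illustration, here is a minimal sketch of what point 4 means in practice. It assumes htmLawed.php sits next to the script and that the installed version supports the documented `safe` parameter; the sample input and variable names are made up for this example.

```php
<?php
// Assumption: htmLawed.php has been downloaded into the same directory.
require_once 'htmLawed.php';

// Hypothetical untrusted input, e.g. a submitted comment.
$untrusted = '<p onclick="alert(1)">Hello</p><script>alert(2)</script>';

// Called without a $config argument, htmLawed runs with its permissive defaults.
// 'safe' => 1 is the documented switch for a more restrictive, XSS-oriented setup.
$config = array('safe' => 1);

$clean = htmLawed($untrusted, $config);
echo $clean;
```

Anything beyond that (which elements, attributes, or URL schemes to allow) still has to be decided and set by hand, so the documentation remains required reading.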
My view on the matter is this: if you need speed and compact code and don't mind spending some time configuring your software and reading documentation, use htmLawed. If you need validation, but don't need to protect against XSS, use Tidy. If you need full validation, use HTML Purifier.
Issue 2 is probably the biggest obstacle to getting HTML Purifier adopted by projects that currently use kses. (There is already a kses wrapper for HTML Purifier.) I don't really have a good solution, except to use caching. Perhaps eventually we'll get a C/C++ wrapper for HTML Purifier. :-)
Been using HTMLPurifier for almost a year on multiple projects and I am still amazed at how I can drop it in, instantiate, configure quickly and forget it even exists. htmLawed requires far more configuration, and its output is neither secure nor standards-compliant by default.
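The "drop it in, instantiate, configure" workflow looks roughly like this. It is a sketch only, assuming the HTML Purifier distribution is unpacked under a `library/` directory next to the script; the path and the sample input are assumptions.

```php
<?php
// Assumption: adjust the path to wherever HTML Purifier is installed.
require_once 'library/HTMLPurifier.auto.php';

// Hypothetical untrusted input.
$untrusted = '<p onclick="alert(1)">Hello</p><script>alert(2)</script>';

// Start from the library defaults; no further settings are changed here.
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);

$clean = $purifier->purify($untrusted);
echo $clean;
```

The same `$purifier` object can be reused for every piece of input in a request, which keeps the setup cost to a single instantiation.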
HTMLPurifier is released under the venerable Lesser GPL (LGPL), so I can integrate it into proprietary apps. htmLawed is released under the GPL, which effectively means it cannot be used in proprietary applications unless you release that application as open source.
HTMLPurifier has until now been the only active project of its type. There is still a cooling-off period ahead to see just how effective and widely adopted htmLawed becomes.
htmLawed has several documented security issues. HTMLPurifier does not. That alone is enough for me not to recommend htmLawed: it suggests someone neglected to run a test suite covering a comprehensive collection of commonly known XSS exploits.
Overall, memory and performance issues aside (I honestly couldn't care less about performance compared to important factors like reliability and maintenance record), HTMLPurifier remains the only library I'd recommend.
My sole complaint about HTMLPurifier is that its news page doesn't have an RSS feed.
Disclaimer: I'm not an HTMLPurifier developer. I do interact with the author on the Devnetwork forums.