-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitizing SVG tag contents against XXS/XXE #409
Comments
I think the answer to both of those questions are: it depends on AngleSharp. AngleSharp does the actual markup parsing, so if it can handle XML that isn't XHTML, I'd imagine it wouldn't take much to wrangle it into parsing SVG. In fact, it does look like AngleSharp supports SVG processing: AngleSharp/AngleSharp#886 So something like this example might work for you: https://github.com/mganss/HtmlSanitizer/wiki/Examples#ex3-replacing-the-default-formatter - this assumes you want parse the SVG separately from anything else. |
AFAICT this use case is already possible as discussed in #287. @ImrePyhvel Do you think there currently is potential for malicious markup to slip through in the default configuration? Do you have examples? Re presets: Are there known allow lists for SVG elements? How do other sanitizers handle this? |
Yes, #287 hints the use case is possible, though the path of blindly trust the entire SVG tag content is not ok for me. Explicitly configuring @mganss AFAIK the current default HtmlSanitizer configuration seems perfectly safe as it is stripping it out everything from svg root. this also prevents hopping into SVG XML-like behavior inside HTML. I'm not too familiar with SVG or SVG-based attacks, which is why I'm reluctant to build an SVG whitelist myself and would prefer to find some community-verified whitelist. Also, I have a feeling the possible HTML-in-XML-in-HTML mess can cause all kinds of funky cases that a simple tag/attribute whitelist may require cutting big SVG features just to be on the safe side. The key would be to get a comprehensive list of threat examples to sanitize against. Some things to consider:
Regarding existing tools and whitelists, I found surprisingly little. More promising leads:
|
Did some testing.. Testfile
I could not trigger the The codeSanitizing code using whitelist from https://github.com/darylldoyle/svg-sanitizer:
The resultI removed manually a lot of empty whitespace left behind for brevity:
This works ok-ish, but it treats the input as html and hence:
I could remove Also saw that Anglesharp.XML can also parse simpler SVG, but for some reason it choked with parser error when doctype was included. Anyway, it should not matter as there does not seem to be a way to reuse the already parsed To sum it upit kinda works and checking with some real world samples some files will be hit by stripped features implemented as xml extensions in separate xmlns, including
Whitelisting some tags/attributes from xmlns extensions is trickier as xmlns could be set up with any alias name + it is difficult to assess if/which extension tags could be considered unsafe in their reader. The scope would become a mess easily. |
SVG images could be embedded inside html and include XSS similar to html, ex:
.. and many other nuisances (ex, see: The Image that called me.pdf, slide 11. Then there are also XXE attacks via <!ENTITY> declarations ,ex "Billion Laughs". Just like by spec HTML, while some SVG content may be harmful, it can be a legitimate part of HTML content, if safe. In my case, a required feature.
I'd imagine HtmlSanitizer would be able to traverse and clean up the SVG tag content just as well as HTML, but would require a significantly different configuration regarding safe elements/attributes compared to the default conf. Could HtmlSanitizer be used for SVG? Are there any known blockers?
Since SVG contains a ton of safe tags and attributes, it would be cool to have (optional) by-SVG-standard configuration preset for SVG (standalone or inside HTML)? A shared implementation would be more secure compared to each rolling their own.
This seems to chime in with the talk on presets in #350 .
The text was updated successfully, but these errors were encountered: