Browser configuration
You have to configure your web browser to use WebCleaner as a proxy.
Netscape/Mozilla
Select Edit -> Preferences -> Advanced -> Proxies.
Activate Manual proxy configuration.
Under HTTP Proxy enter localhost, the Port is
8080.
Under HTTPS Proxy enter localhost, the Port is
8080.
Under No Proxy for enter localhost, 127.0.0.1.
Click Ok to use your new settings.
Internet Explorer
Select Tools -> Internet Options -> Connections.
Click on LAN Settings. If you have a dialup connection to the
internet, select your dialup connection and click on Settings.
Activate Use a proxy server.
If activated, deactivate Bypass proxy server for local addresses.
Click on Advanced.
Under HTTP enter localhost, the Port is
8080.
Under Secure enter localhost, the Port is
8080.
Click Ok to use your new settings.
Opera 6
Select File -> Preferences -> Network -> Proxy servers.
Activate HTTP and enter localhost, the Port is
8080.
Activate HTTPS and enter localhost, the Port is
8080.
Activate Do not use Proxy on the addresses below and enter
localhost, 127.0.0.1.
Click Ok to use your new settings.
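The same proxy settings can also be tried from a script, which is a quick way to check that the proxy is reachable. The following is a minimal Python sketch, assuming WebCleaner listens on localhost port 8080 as in the browser settings above. The function name build_proxy_opener is only illustrative and is not part of WebCleaner.

```python
import urllib.request

# Assumption: the proxy listens on localhost:8080, as in the browser settings.
PROXY = "http://localhost:8080"


def build_proxy_opener(proxy=PROXY):
    """Return a urllib opener that sends HTTP and HTTPS requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


# Example use (needs a running proxy, so it is left commented out):
# opener = build_proxy_opener()
# with opener.open("http://example.com/", timeout=10) as response:
#     print(response.status, response.headers.get("Content-Type"))
```

If the commented request fails with a connection error, the proxy is most likely not running or is listening on a different port.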
Proxy filter modules
WebCleaner uses a modular filter design that allows
a lot of flexibility for different uses.
Each module has a list of MIME types and a list of the
parts of the request or response it applies to. Each module can
be further customized by separate rules in the filter configuration.
BinaryCharFilter
Description
Replace illegal binary characters in HTML code like the quote chars often found in Microsoft pages.
Applies to
Response content data of text/html mime type.
Filter configuration rules
None
Blocker
Description
Block or allow specific sites by URL name. Before a URL is matched, its hostname and path are unquoted to avoid spoofing attacks.
Applies to
Request url of all mime types.
Filter configuration rules
Block, Allow
Compress
Description
Compression of documents with good compression ratio like HTML, WAV, etc.
Applies to
Response content data of mime types text/*, application/postscript, application/pdf, application/x-dvi, audio/basic, audio/midi, audio/x-wav, image/x-portable-*map, x-world/x-vrml.
Filter configuration rules
None
GifImage
Description
Deanimates GIFs and removes all unwanted GIF image extensions (for example GIF comments).
Applies to
Response content data of mime type image/gif.
Filter configuration rules
None
Header
Description
Add, modify and delete HTTP headers of request and response.
Applies to
Request and response headers of all mime types.
Filter configuration rules
Header
ImageReducer
Description
Convert images to low quality JPEG files to reduce bandwidth.
Applies to
Response content data of all image types supported by the Python Imaging Library: jpeg, png, gif, bmp, pcx, tiff, xbm, xpm.
Filter configuration rules
None
ImageSize
Description
Remove images with certain width and/or height.
Applies to
Response content data of all image types supported by the Python Imaging Library: jpeg, png, gif, bmp, pcx, tiff, xbm, xpm.
Filter configuration rules
Image
RatingHeader
Description
Parse and evaluate content rating system header values.
Applies to
Response headers of all mime types.
Filter configuration rules
Rating
Replacer
Description
Replace regular expressions in data streams.
Applies to
Response data of html and javascript
Filter configuration rules
Replace
HtmlRewriter
Description
Parse HTML code and rewrite single tags, attributes and values. Execute and filter JavaScript. Parse and filter content rated pages. Filter HTML comments.
Applies to
Response data of html pages
Filter configuration rules
Javascript, Nocomments, Rating, Htmlrewrite
XmlRewriter
Description
Parse XML code and rewrite single tags, attributes and values. It can also filter embedded HTML content, which often occurs in RSS feeds.
Applies to
Response data of html pages
Filter configuration rules
Htmlrewrite, Xmlrewrite
VirusFilter
Description
Scan all data with the ClamAv virus scanner (which must be installed on the proxy host). For performance reasons there is a maximum size of 4 MB. If an object exceeds that size the proxy gives an error.
Applies to
Response data of html pages
Filter configuration rules
Antivirus
Filter configuration rules
Htmlrewrite
Matching
A HTML rewrite rule applies to one specified HTML tag and
can replace (or delete if the replacement data is empty) parts of or the
complete tag. The tag name is a case insensitive string.
If attributes are given, they must match too before the rule applies.
Action
If there is no replacement given the specified tag
part will be removed, else it will be replaced.
Back references to matched subgroups can be specified in the replacement
string with a backslash and the subgroup number (for example \1, \2, ...).
What it does when the replacement is foo:
replace part | before | after |
---|---|---|
tag | <blink>text</blink> | footextfoo |
tagname | <blink>text</blink> | <foo>text</foo> |
enclosed | <blink>text</blink> | <blink>foo</blink> |
attr | <a href="bla">..</a> | <a foo>..</a> |
attrval | <a href="bla">..</a> | <a href="foo">..</a> |
complete | <a href="bla">..</a> | foo |
If you specify zero or more than one attribute to match, 'attr' and 'attrval' replace the first occurring or first matching attribute, or nothing.
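The replace parts can be illustrated with a small Python sketch. It is a rough approximation built on regular expressions, not the parser WebCleaner uses. It covers only the tag, tagname, enclosed and complete parts, and it assumes simple, non-nested tags. The function name rewrite_tag is hypothetical.

```python
import re


def rewrite_tag(html, tag, part, replacement):
    """Apply one replace part to every simple <tag>...</tag> in html.

    Illustrative only: handles non-nested tags and four of the replace parts.
    """
    pattern = re.compile(
        r"<(?P<name>%s)\b(?P<attrs>[^>]*)>(?P<body>.*?)</(?P=name)>" % re.escape(tag),
        re.IGNORECASE | re.DOTALL,  # tag names are case insensitive
    )

    def substitute(match):
        name = match.group("name")
        attrs = match.group("attrs")
        body = match.group("body")
        if part == "tag":
            # Start and end tag are each replaced, the content stays.
            return replacement + body + replacement
        if part == "tagname":
            return "<%s%s>%s</%s>" % (replacement, attrs, body, replacement)
        if part == "enclosed":
            return "<%s%s>%s</%s>" % (name, attrs, replacement, name)
        if part == "complete":
            return replacement
        raise ValueError("unsupported replace part: %r" % part)

    return pattern.sub(substitute, html)


print(rewrite_tag("<blink>text</blink>", "blink", "tag", "foo"))
print(rewrite_tag("<blink>text</blink>", "blink", "tagname", "foo"))
print(rewrite_tag("<blink>text</blink>", "blink", "enclosed", "foo"))
print(rewrite_tag('<a href="bla">..</a>', "a", "complete", "foo"))
```

An empty replacement string deletes the selected part, which matches the rule that a missing replacement removes it.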
Xmlrewrite
Selector
An XML rewrite rule applies to one specific XML tag
and can replace (or delete) parts of or the complete tag.
The selector is a simplified XPath expression of the form (/tag)+
where a tag is of the form name([attr=val(,attr=val)*])?.
Tag names, attributes and values are case sensitive.
Example: /rss/channel/item/description selects the <description> XML tag in an RSS news feed.
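To make the selector grammar concrete, here is a minimal Python sketch that splits a selector into its tag steps. It is not WebCleaner's own parser. The name parse_selector is hypothetical, and the sketch assumes that tag names and attribute values contain no slash, comma or square bracket characters.

```python
import re

# One step of the selector: a name, optionally followed by [attr=val,...].
TAG_RE = re.compile(r"^(?P<name>[^\[\]/]+)(?:\[(?P<attrs>[^\[\]]*)\])?$")


def parse_selector(selector):
    """Split a selector like /a/b[x=1] into a list of (name, attrs) steps."""
    if not selector.startswith("/"):
        raise ValueError("selector must start with '/'")
    steps = []
    for part in selector[1:].split("/"):
        match = TAG_RE.match(part)
        if match is None:
            raise ValueError("invalid tag step: %r" % part)
        attrs = {}
        if match.group("attrs") is not None:
            for pair in match.group("attrs").split(","):
                key, sep, value = pair.partition("=")
                if not sep or not key:
                    raise ValueError("invalid attribute: %r" % pair)
                attrs[key] = value
        # Names, attributes and values keep their case (they are case sensitive).
        steps.append((match.group("name"), attrs))
    return steps


print(parse_selector("/rss/channel/item/description"))
```

A rule engine could then walk the open XML tags and compare them step by step against this list.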
Action
Defined replacement types:
replace type | replace value | action |
---|---|---|
rsshtml | unused | Assumes all text content inside the XML tag is HTML. Only allows certain HTML tags, and filters the HTML data with the Htmlrewrite rules. |
remove | unused | Removes the complete selected XML tag and its content. |
Replace
Replace regular expressions in HTML or JavaScript pages.
Block
A block rule specifies regular expressions for URLs
which must be blocked.
The replacement URL specifies the URL to show when the block rule matches. If none
is given, a default block message is shown.
Back references to matched subgroups can be specified in the replacement
URL with a backslash and the subgroup number (for example \1, \2, ...).
Allow
An allow rule specifies regular expressions for URLs which must be allowed, even if a matching block rule exists.
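How block and allow rules interact can be sketched in Python as follows. This only illustrates the behavior described above (unquoting before matching, allow rules taking precedence, back references in the replacement URL) and is not WebCleaner's implementation. The names check_url and DEFAULT_BLOCK, and the rule format as (pattern, replacement) pairs, are assumptions made for the example.

```python
import re
from urllib.parse import unquote, urlsplit

# Hypothetical marker standing in for the default block message.
DEFAULT_BLOCK = "default block message"


def check_url(url, block_rules, allow_patterns):
    """Return None if url is allowed, else the replacement URL or DEFAULT_BLOCK.

    block_rules is a list of (pattern, replacement) pairs; an empty
    replacement selects the default block message.
    """
    parts = urlsplit(url)
    # Unquote hostname and path first so %-escapes cannot hide a match.
    normalized = "%s://%s%s" % (parts.scheme, unquote(parts.netloc), unquote(parts.path))
    if parts.query:
        normalized += "?" + parts.query
    # An allow rule wins even if a block rule also matches.
    for pattern in allow_patterns:
        if re.search(pattern, normalized):
            return None
    for pattern, replacement in block_rules:
        match = re.search(pattern, normalized)
        if match:
            # expand() resolves back references such as \1 in the replacement.
            return match.expand(replacement) if replacement else DEFAULT_BLOCK
    return None


rules = [
    (r"^http://ads\.(\w+)\.example/", r"http://localhost/blocked/\1"),
    (r"banner", ""),
]
print(check_url("http://ads.foo.example/x.gif", rules, []))
print(check_url("http://www.example.org/b%61nner.gif", rules, []))
```

The second call shows why unquoting matters: the escaped path b%61nner.gif still matches the banner pattern.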
Header
Modify HTTP headers. If the replacement value is empty, the header is deleted, else it gets replaced or added if it did not exist before.
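The delete, replace and add behavior of a header rule can be sketched in a few lines of Python. The function apply_header_rule and the use of a plain dictionary for the headers are assumptions for illustration. Header names are compared case insensitively, as HTTP requires.

```python
def apply_header_rule(headers, name, value):
    """Return a copy of headers with one rule applied.

    An empty value deletes the header; otherwise the header is
    replaced, or added if it did not exist before.
    """
    result = {k: v for k, v in headers.items() if k.lower() != name.lower()}
    if value:
        result[name] = value
    return result


request_headers = {"User-Agent": "ExampleBrowser/1.0", "Accept": "*/*"}
print(apply_header_rule(request_headers, "User-Agent", ""))
print(apply_header_rule(request_headers, "Referer", "http://localhost/"))
```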
Image
Block images with a certain size by replacing them with a transparent 1x1 image.
Javascript
Execute and filter JavaScript (JS) in HTML pages using the integrated SpiderMonkey JS engine. The filter deletes popups and places dynamic content emitted with document.write() into the HTML file.
Nocomments
Remove comments from HTML source. Comments inside <script> or <style> tags are not removed.
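The exception for <script> and <style> can be shown with a short Python sketch. It uses a single regular expression and is an approximation only, since a real filter would work on parsed HTML. The name strip_comments is hypothetical.

```python
import re

# Match either a whole <script>/<style> element (kept) or a comment (dropped).
COMMENT_OR_RAW = re.compile(
    r"(<(script|style)\b.*?</\2\s*>)|(<!--.*?-->)",
    re.IGNORECASE | re.DOTALL,
)


def strip_comments(html):
    """Remove HTML comments, leaving <script> and <style> content untouched."""
    return COMMENT_OR_RAW.sub(lambda m: m.group(1) or "", html)


print(strip_comments("<p>a<!-- x -->b</p><script><!-- keep --></script>"))
```

Matching the script and style elements first means any comment inside them is consumed as part of the element and copied through unchanged.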
Rating
One activated Rating rule enables the content rating system in WebCleaner. Several distinct content rating services including the one defined by WebCleaner itself can be configured.