Jakobud Guest
|
Posted: Fri Oct 31, 2003 Post subject: Why isn't everyone here compiling the ultimate filter list? |
|
|
Why don't we do this? We can post all of our filters or even our filter files so everyone can download them and import them into Adblock. |
|
Org
Joined: 23 Oct 2003 Posts: 349
|
Posted: Fri Oct 31, 2003 Post subject: |
|
|
Why not indeed? This is what I'm using right now (works well, but it's still evolving slowly):
Code: | [Adblock]
*.ad.*
*.atwola.com/*
*/?DC=*
*/AD_Banner/*
*/Banner/*
*/ad/*
*/adclick/*
*/adimage.php*
*/adimages/*
*/ads/*
*/ads?*
*/adserve/*
*/banner.cgi*
*/banner/*
*/bannerfarm/*
*/bannerit/*
*/bannerlink.*
*/banners/*
*/bnr/*
*/cgi-bin/bd.m?*
*/onlineads/*
*adcontent.*
*adsdk.com/*
*advertising.com/*
*bizrate.com/*
*doubleclick.net*
*fastclick.net/*
*resellerratings.com/*
*spinbox.net*
*tradedoubler.com*
http://adimages.*
http://ads.*
http://banner.*
http://rcm-images.*
http://view.atdmt.com/*
http://www.theregister.co.uk/media/*
|
(List was created with the export command in Adblock preferences.) |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Fri Oct 31, 2003 Post subject: |
|
|
Here's a list from early August.
.
It's pretty good for macintosh / major news-sites; duplicates were entered for testing.
.
Oh, and prepended exclamation-marks disable stuff. |
|
Aaron Spuler QA Testing
Joined: 23 Oct 2003 Posts: 19
|
Posted: Fri Oct 31, 2003 Post subject: |
|
|
http://mozthemes.tk -- go to adblock --> filters. that's the list i've compiled since using adblock and it blocks basically everything for me. check it out and see what you think  |
|
Henrik Owner

Joined: 22 Oct 2003 Posts: 33 Location: Copenhagen, Denmark
|
Posted: Sat Nov 01, 2003 Post subject: Warning |
|
|
The idea of compiling the ultimate filter list is appealing, but it may present problems. Adblock works by matching the address of every blockable object against every filter in the list. In algorithmic terms, this scales as O(M*N), with M blockable objects and N filters.
This is not a very efficient way of doing it, but it's necessary because Adblock uses filters instead of just blacklisting certain servers, and as long as the list is kept small most of us will never notice any delays.
What I'm trying to say is that compiling a huge list of every ad-directory on every server on the planet will scale extremely badly. The idea of creating a "beginner's list" is very good, though it should perhaps be limited to very generic filters (*/ads/*, *doubleclick*, etc.). At the moment, every filter list should be adapted to the needs of the individual user and his surfing habits.
It should be noted that one of Rue's "near future" plans is (correct me if I say something wrong here, Rue!) to make an auto-pruning feature for unused filters. When this arrives, a huge list will be a good idea, as it will slowly adapt itself to the user's needs as it's used. |
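Henrik's O(M*N) point can be sketched in a few lines. This is a minimal illustration, not Adblock's actual code: the `filters` and `urls` arrays and the `isBlocked` helper are hypothetical, and plain regular expressions stand in for Adblock's filter matching.

```javascript
// Hypothetical sample data: N = 3 filters, M = 4 blockable objects.
const filters = [/\/ads\//i, /doubleclick/i, /\/banners?\//i];
const urls = [
  "http://example.com/index.html",
  "http://example.com/ads/top.gif",
  "http://ad.doubleclick.net/adj/foo",
  "http://example.com/images/logo.png",
];

let comparisons = 0;

// Test one address against the list, stopping at the first filter that hits.
function isBlocked(url) {
  for (const filter of filters) {
    comparisons += 1;
    if (filter.test(url)) return true;
  }
  return false; // an unblocked object has paid for the whole list
}

const blocked = urls.filter(isBlocked);
console.log(blocked.length); // 2
console.log(comparisons);    // 9 here; the worst case is M * N = 12
```

With these samples the two unblocked URLs each cost a full pass over the list, which is why a long list hurts most on pages full of legitimate objects.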
|
Lanny Chambers Guest
|
Posted: Sun Nov 02, 2003 Post subject: |
|
|
We're all running BannerBlind, right? BB takes a large load off the Adblock filter list.
Now, if only there were a way to block all Flash movies smaller than a certain pixel size... |
|
Org
Joined: 23 Oct 2003 Posts: 349
|
Posted: Sun Nov 02, 2003 Post subject: |
|
|
I used to run Bannerblind in Mozilla before I changed to Firebird. Now with FB and Adblock I see absolutely no need for Bannerblind. Adblock simply blocks everything unwanted. |
|
Guest
|
Posted: Thu Nov 06, 2003 Post subject: |
|
|
my general filters:
Code: |
/[^a-zA-Z]ad[^a-zA-Z]/
/[^a-zA-Z]adcycle*
/[^a-zA-Z]adrotate*
/[^a-zA-Z]ads[^a-zA-Z]/
/[^a-zA-Z]adserv*
/[^a-zA-Z]adv*
|
[^a-zA-Z] means everything but the letters a-z, so /[^a-zA-Z]ad[^a-zA-Z]/ would block objects containing /ad.jpg and print.html?ad=true but won't block dad.jpg or mad.gif etc |
|
McMurmel
Joined: 13 Nov 2003 Posts: 19 Location: Germany
|
Posted: Thu Nov 13, 2003 Post subject: |
|
|
Don't kill me, but if you want to use regular expressions, mustn't the string begin and end with a / ?
So the correct list would look like /[^a-zA-Z]adcycle/ instead of /[^a-zA-Z]adcycle* and so on? _________________ The road goes ever on and on - down from the door that it began - now far ahead the road has gone and I must follow if I can... |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Thu Nov 13, 2003 Post subject: |
|
|
Guest:
You can't mix simple and RegExp syntax. Regular Expressions must begin and end with the forward-slash (/) and the wildcard is period+asterisk (.*). That list is mostly wrong.
McMurmel:
Ignore the expressions above you. The example you were correcting should read: /[^a-zA-Z]adcycle.*/ |
|
McMurmel
Joined: 13 Nov 2003 Posts: 19 Location: Germany
|
Posted: Thu Nov 13, 2003 Post subject: |
|
|
Sorry to correct you but what is the difference between these three regular expressions?:
1. /[^a-zA-Z]adcycle/
2. /[^a-zA-Z]adcycle.*/
3. /.*[^a-zA-Z]adcycle.*/
As long as we don't use multi-line strings there's no difference (and we don't, because URLs are single-line strings). They all deliver the same result. .* means any character except the newline character (\n), zero or more times, so the first is the simplest of the three and should be preferred.
P.S.: That's the reason I suggested a change in entering simple filters. |
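McMurmel's claim is easy to check, since a filter only needs a yes/no answer and a regular expression search is unanchored by default. The sketch below uses made-up example.com URLs; only the three patterns come from the post.

```javascript
const variants = [
  /[^a-zA-Z]adcycle/,
  /[^a-zA-Z]adcycle.*/,
  /.*[^a-zA-Z]adcycle.*/,
];

// Hypothetical sample addresses (single-line strings, like any URL).
const samples = [
  "http://example.com/adcycle/banner.gif",
  "http://example.com/cgi-bin/adcycle.cgi?x=1",
  "http://example.com/roadcycle/map.png",
  "http://example.com/index.html",
];

// One row per URL, one column per pattern.
const results = samples.map(url => variants.map(re => re.test(url)));

for (let i = 0; i < samples.length; i++) {
  console.log(samples[i], results[i]);
}
```

Every row holds three identical values: the leading and trailing .* change how much text is consumed, not whether a match exists.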
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Thu Nov 13, 2003 Post subject: |
|
|
McMurmel:
Yea- I knew that, but in the interest of preserving the original author's intent, I converted his wildcard. I didn't realize this nick was yours (Stefan), so I tried to be as unconfusing as possible :P |
|
McMurmel
Joined: 13 Nov 2003 Posts: 19 Location: Germany
|
Posted: Fri Nov 14, 2003 Post subject: |
|
|
Quote: |
Yea- I knew that, but in the interest of preserving the original author's intent, I converted his wildcard. I didn't realize this nick was yours (Stefan), so I tried to be as unconfusing as possible :P
|
... well, that is the reason you're the board admin and I am but a normal user. End-user support was never one of my strong skills, so I said no to Henrik when he suggested that I should admin the project because he was too busy studying - but that's another story... |
|
Weber
Joined: 16 Nov 2003 Posts: 1
|
Posted: Sun Nov 16, 2003 Post subject: |
|
|
This filter list gets rid of everything; I haven't seen an ad in weeks (after upgrading to the newest Adblock... good work)
[AdBlock]
*/ad/*
*/ads/*
*/click/*
*/fastclick/*
*ADV*
*adlog*
*ads*
*adsdk*
*adserver*
*adtech*
*advertising*
*annone*
*atdmt*
*banners*
*click.*
*doubleclick*
*image.ugo.com*
*m2*
*sob*
*spinbox*
*viewad* |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Sun Nov 16, 2003 Post subject: |
|
|
The fourth pass of build 17 (posted last evening) allows unanchored simple-expressions.
.
This means you can replace *spinbox* with spinbox -- and it works just fine. |
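One way to picture why *spinbox* and spinbox behave the same is to translate a simple filter into an unanchored regular expression. The `simpleFilterToRegExp` helper below is hypothetical and only an assumption about how such a translation could work; it is not taken from Adblock's source.

```javascript
// Hypothetical translation: escape regex metacharacters in the literal parts,
// turn each * into .*, and leave the result unanchored and case-insensitive.
function simpleFilterToRegExp(filter) {
  const source = filter
    .split("*")
    .map(part => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(source, "i");
}

const anchored = simpleFilterToRegExp("*spinbox*");
const unanchored = simpleFilterToRegExp("spinbox");

const hit = "http://spinbox.versiontracker.com/x.gif";
const miss = "http://example.com/index.html";

console.log(anchored.test(hit), unanchored.test(hit));   // true true
console.log(anchored.test(miss), unanchored.test(miss)); // false false
```

Because the search is unanchored, a leading or trailing * adds nothing; the wildcard only matters between two literal parts.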
|
NJH
Joined: 13 Nov 2003 Posts: 183 Location: Hampshire, England
|
Posted: Mon Nov 17, 2003 Post subject: |
|
|
The idea of the ultimate filter list is not bad, but I do not think the wheel should be totally re-invented. This is not the only ad blocking software using regular expressions in its block file. Have a look at this thread I started as a guest. Rue very kindly converted the list into the correct format for AdBlock as the original used a slightly different dialect of regular expressions. It is a fairly large list and I have no idea what performance hit a PC would take when using it. I have not timed it. There may well be other lists already available on the web.
One thing which has to be watched is oversimplification of the block file as you possibly see on Weber's list. If my understanding is correct, *ads* would block www.dadsuk.co.uk, www.dadsaregreat.com and sites with things like madscientist or ADSL in the URL etc; *ADV* would block URLs with advice, advantage and so on.
Last edited by NJH on Thu Nov 27, 2003; edited 1 time in total |
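NJH's warning can be reproduced directly. The sketch assumes that *ads* and *ADV* amount to case-insensitive substring matches; the example.com URLs and the bounded `tighter` pattern are illustrative assumptions, not filters from anyone's list.

```javascript
// What *ads* and *ADV* amount to under case-insensitive substring matching.
const ads = /ads/i;
const adv = /ADV/i;

const innocent = [
  "http://www.dadsuk.co.uk/",
  "http://www.dadsaregreat.com/",
  "http://example.com/madscientist.html",
  "http://example.com/ADSL/faq.html",
  "http://example.com/advice/",
  "http://example.com/advantage.html",
];

const falsePositives = innocent.filter(url => ads.test(url) || adv.test(url));
console.log(falsePositives.length); // 6: every innocent URL is blocked

// Requiring a non-word character on both sides avoids all six.
const tighter = /\Wads?\W/i;
const stillBlocked = innocent.filter(url => tighter.test(url));
console.log(stillBlocked.length);                             // 0
console.log(tighter.test("http://example.com/ads/top.gif")); // true
```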
|
Org
Joined: 23 Oct 2003 Posts: 349
|
Posted: Mon Nov 17, 2003 Post subject: |
|
|
NJH wrote: | One thing which has to be watched is oversimplification of the block file as you possibly see on Weber's list. If my understanding is correct, *ads* would block www.dadsuk.co.uk, www.dadsaregreat.com and sites with things like madscientist or ADSL in the URL etc; *ADV* would block URLs with advice, advantage and so on. |
So very true. No matter what kind of list you make, be sure to take this point into consideration. (If you take a closer look at the filter list I posted earlier in this thread, you can see that I have tried to take care to avoid these kinds of false blockings.) |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Wed Dec 10, 2003 Post subject: |
|
|
I suppose it couldn't hurt -- some further refinements, lifted straight from my list:
/[\W\d]ad(server|s)?[\W\d]/ -- catches .ad., /ad., cgi?ads=19, etc.
/[\W\d]banner(s|id\=)[\W\d]/ -- general banner-stuff
/\D\d{2,3}x\d{2,3}\D/ -- catches banner-sizes in source: 192x74
http://babelfish.altavista.com/babelfish/urltrtop -- removes babelfish's annoying upper-frame |
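The banner-size pattern is worth trying on a few addresses before adopting it. The URLs below are made up for illustration; only the pattern itself comes from the post.

```javascript
const bannerSize = /\D\d{2,3}x\d{2,3}\D/;

// Hypothetical sample addresses.
const checks = [
  ["http://example.com/img/promo_468x60.gif", bannerSize.test("http://example.com/img/promo_468x60.gif")],
  ["http://example.com/img/192x74.jpg", bannerSize.test("http://example.com/img/192x74.jpg")],
  ["http://example.com/screens/640x480.png", bannerSize.test("http://example.com/screens/640x480.png")],
  ["http://example.com/wallpaper/1024x768.jpg", bannerSize.test("http://example.com/wallpaper/1024x768.jpg")],
  ["http://example.com/photo.jpg", bannerSize.test("http://example.com/photo.jpg")],
];

for (const [url, matched] of checks) {
  console.log(matched, url);
}
```

The first three match and the last two do not. The 640x480 screenshot shows the cost of the approach: any image whose name carries a two- or three-digit size is treated as a banner, while 1024x768 escapes only because its width has four digits.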
|
NJH
Joined: 13 Nov 2003 Posts: 183 Location: Hampshire, England
|
Posted: Wed Dec 10, 2003 Post subject: |
|
|
Try something like /[^a-z]ad(s?|v)[^a-z]/. It gets rid of the words ad, ads, and adv bounded by any non-letter which includes . / and any number. This means it also gets rid of things like ad0.
A fuller filter is /[^a-z]ad(s?|serv(er|e)?|v)[^a-z]/
For banners, I use 2 filters to pick up words beginning in banner or ending in banner(s):
/[^a-z]banner/
/banners?[^a-z]/
From previous correspondence, I think your [\W\d] is broadly equivalent to my [^a-z] |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Wed Dec 10, 2003 Post subject: |
|
|
NJH:
Using [\W\d] is not only easier on the eyes, but also independent of Adblock's case-insensitivity.
.
Also, you made some prior mistakes again: /[^a-z]ad(s?|serv(er|e)?|v)[^a-z]/
First, an OR-set shouldn't contain zero-match cases if anything follows the set. In other words, (s?|..)[^a-z] should be (s|..)?[^a-z]. Second, you've ordered the set such that it matches (s? before |serv. Recall, you have to order by decreasing complexity.
.
In the end, the pattern better reads: /[\W\d]ad(serv(er|e)?|s|v)?[\W\d]/ |
|
NJH
Joined: 13 Nov 2003 Posts: 183 Location: Hampshire, England
|
Posted: Wed Dec 10, 2003 Post subject: |
|
|
rue,
Interesting comments. I will change my filters to [\W\d]. I was going to have a go sometime about re-working my filter to get the ? of the s? outside the OR set, but what I posted I tested following our last correspondence. I do not necessarily understand why. My filter matches /ad/, /ads/, /ad0 and any other number, /adv/, /adserv/, /adserve/ and /adserver/. It also matches other leading and trailing characters.
If you can explain why it works when I could not get something similar to work in our previous correspondence I would be interested.
Nick |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Thu Dec 11, 2003 Post subject: |
|
|
NJH:
In the JavaScript Console, try these two cases:
alert("vx".match(/(s?|v)/));
alert("vx".match(/(s?|v)x/));
The first correctly reports an empty, yet positive match: s? matches for zero occurrences at the first character it encounters, and the set stops testing.
.
The second, however, catches the "v". Apparently sets continue testing if something follows the set. Whether this is a hidden-rule, or a bug, it's rather illogical. I wouldn't advise relying on it. |
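What the second case shows is ordinary backtracking rather than a bug: JavaScript tries the alternatives of a group from left to right, and when the rest of the pattern fails after one alternative it goes back and tries the next. That is also why NJH's filter works as he reports. The sketch below reruns both console cases and compares the two filters from this exchange on NJH's sample fragments; the `/i` flag is an assumption standing in for Adblock's case-insensitivity.

```javascript
const alone = "vx".match(/(s?|v)/);
const followed = "vx".match(/(s?|v)x/);

console.log(JSON.stringify(alone[0]));    // "" (s? matched zero characters)
console.log(JSON.stringify(followed[0])); // "vx" (empty s? could not be followed by x, so v was tried)

const njhFilter = /[^a-z]ad(s?|serv(er|e)?|v)[^a-z]/i;
const rueFilter = /[\W\d]ad(serv(er|e)?|s|v)?[\W\d]/i;

const fragments = ["/ad/", "/ads/", "/ad0/", "/adv/", "/adserv/", "/adserve/", "/adserver/"];

// One row per fragment: [fragment, NJH's result, rue's result].
const agreement = fragments.map(f => [f, njhFilter.test(f), rueFilter.test(f)]);
console.log(agreement);
```

Both filters match all seven fragments. The two are not identical in general, though: [^a-z] accepts an underscore and [\W\d] does not.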
|
Guest
|
Posted: Wed Dec 17, 2003 Post subject: |
|
|
Does anyone have a simple, but effective filter that WON'T kill http://www.pvponline.com ? |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Wed Dec 17, 2003 Post subject: |
|
|
Guest (tychoquad):
Scroll up, and copy the entries posted here.
.
Then, add these: /[\W\d](ad|dime|double|fast|value|click)(stream|s|thrutraffic|thru|xchange|click)[\W\d]/
http://pagead2.googlesyndication.com |
|
Guest
|
Posted: Wed Dec 17, 2003 Post subject: |
|
|
Thank you rue, this is exactly what I was after.
The only simple/effective list I have been able to find that didn't kill all images in existence was that Mac one you posted a while ago. All the others were so long, it lagged my whole computer every time Firebird loaded a page.
A big problem here is that I don't think anyone except you knows how to properly build a short and effective list. I have no idea what all this expression stuff is about, even after reading the stuff at adblock.mozdev.org |
|
Henrik Owner

Joined: 22 Oct 2003 Posts: 33 Location: Copenhagen, Denmark
|
Posted: Wed Dec 17, 2003 Post subject: |
|
|
Quote: | A big problem here is that I don't think anyone except you knows how to properly build a short and effective list. I have no idea what all this expression stuff is about, even after reading the stuff at adblock.mozdev.org |
Remember that using regular expressions is optional, and should only be used by people who are comfortable with them.
There is still the option of using the simpler wildcard (*) filters. |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Wed Dec 17, 2003 Post subject: |
|
|
Guest:
It just takes a little patience -- thinking things through.
.
Having access as we do "behind-the-scenes", Henrik and I are acutely aware of the overhead for larger lists. For elements that aren't blocked, the entire list must be tested against them; if you're not careful, the penalty can be significant.
.
Check out the RegExp tutorials we came up with for Bethrezen, here. You'll find many links to further tutorials, as well. |
|
TomChiverton
Joined: 11 Nov 2003 Posts: 3
|
Posted: Thu Dec 18, 2003 Post subject: |
|
|
Please don't put 'banner.*' in the generic block list - perfectly nice sites often call or serve their nav. banners like that :-) |
|
Guest
|
Posted: Tue Jan 06, 2004 Post subject: |
|
|
rue wrote: |
/[\W\d]ad(server|s)?[\W\d]/ -- catches .ad., /ad., cgi?ads=19, etc.
/[\W\d]banner(s|id\=)[\W\d]/ -- general banner-stuff
/\D\d{2,3}x\d{2,3}\D/ -- catches banner-sizes in source: 192x74 |
Those are interesting.
Would you mind posting your whole list? |
|
Guest
|
Posted: Wed Jan 07, 2004 Post subject: |
|
|
also how would i set up my filters so that they do not block http://winamp.com/images/home/winamp5ad.jpg
(i am using rue's filters posted earlier in this thread.)
thx |
|
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Thu Jan 08, 2004 Post subject: |
|
|
Change
/[\W\d]ad(server|s)?[\W\d]/
to
/\Wad(server|s)?[\W\d]/
( removing the first \d ) |
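The effect of dropping the leading \d can be checked against the Winamp image. The two patterns are the ones above; the example.com addresses are made up, and the `/i` flag is an assumption standing in for Adblock's case-insensitivity.

```javascript
const original = /[\W\d]ad(server|s)?[\W\d]/i;
const relaxed = /\Wad(server|s)?[\W\d]/i;

const winamp = "http://winamp.com/images/home/winamp5ad.jpg";

console.log(original.test(winamp)); // true: "5" before "ad" satisfies [\W\d]
console.log(relaxed.test(winamp));  // false: "5" is a word character, so \W fails

// Typical ad addresses are still caught by the relaxed form.
console.log(relaxed.test("http://example.com/ads/top.gif")); // true
console.log(relaxed.test("http://ad0.example.com/x.gif"));   // true
```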
|
Guest
|
Posted: Fri Jan 09, 2004 Post subject: |
|
|
thx |
|
Guest
|
Posted: Sat Jan 10, 2004 Post subject: |
|
|
Hi all,
I've met another false positive problem with this same rule, but you need to control the server side to reproduce it at will. Let's say :
my_site.com/something.php?PHPSESSID=70b25adf7aeb5
The "ad" inside the hex parameter will be filtered by the rule.
I have fixed this using :
/[\W\d]ad(server|s)?[\W\d](\/|\.)/
That is, specifying that this must be followed by a slash ( a folder) or by a dot (a file).
Now my problem is that by reading this thread, it seems that passing "ad=..." as a parameter is sometimes used. Or is it ?
Your comments ?
Another problem is that neither version of the rule filters content from www.smartadserver.com, which I don't really understand. |
|
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Sat Jan 10, 2004 Post subject: |
|
|
/[\W\d]ad(server|s)?[\W\d](\/|\.)/
The [\W\d] part _requires_ a non-alphabet character or a digit after the adserver part of the name _followed_ by a slash or a period.
www.smartadserver.com will not be a match, but say www.smartadserver7.com would.
A modified version which would filter both is
/[\W\d]ad(server|s)?[\W\d]?(\/|\.)/ |
|
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Sat Jan 10, 2004 Post subject: |
|
|
Sorry, brain fart.
A better variant would be:
/[\/\.]ad(server|s)?\d*[\/\.]/ |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Sun Jan 11, 2004 Post subject: |
|
|
Guest:
/[\W\d]ad(server|s)?[\W\d]/ doesn't match your proposed url: ..=70b25adf7..
.
It would match if the url contained another digit immediately after "ad". To avoid this, just continue this thread's earlier edit, removing the trailing digit-match:
/\Wad(server|s)?\W/ |
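Both statements can be verified against the session-ID address. The `variant` URL, with a digit right after "ad", and the example.com addresses are constructed for illustration; the `/i` flag is an assumption standing in for Adblock's case-insensitivity.

```javascript
const filter = /[\W\d]ad(server|s)?[\W\d]/i;
const strict = /\Wad(server|s)?\W/i;

const reported = "http://my_site.com/something.php?PHPSESSID=70b25adf7aeb5";
const variant = "http://my_site.com/something.php?PHPSESSID=70b25ad7aeb5";

console.log(filter.test(reported)); // false: "f" after "ad" is neither \W nor a digit
console.log(filter.test(variant));  // true: "5ad7" has a digit on both sides
console.log(strict.test(variant));  // false: \W rejects the digits

// The stricter form keeps ad= parameters but gives up numbered hosts.
console.log(strict.test("http://example.com/print.html?ad=true")); // true
console.log(strict.test("http://ad0.example.com/x.gif"));          // false
```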
|
NJH
Joined: 13 Nov 2003 Posts: 183 Location: Hampshire, England
|
Posted: Sun Jan 11, 2004 Post subject: |
|
|
kstahl wrote: | /[\W\d]ad(server|s)?[\W\d](\/|\.)/
The [\W\d] part _requires_ a non-alphabet character or a digit after the adserver part of the name _followed_ by a slash or a period.
www.smartadserver.com will not be a match, but say www.smartadserver7.com would.
A modified version which would filter both is
/[\W\d]ad(server|s)?[\W\d]?(\/|\.)/ |
kstahl wrote: | Sorry, brain fart.
A better variant would be:
/[\/\.]ad(server|s)?\d*[\/\.]/ |
I do not think any of these filters would match www.smartadserver.com or www.smartadserver7.com as all the filters require a "/" or "." or, for one of the filters, a number immediately before the "adserver" bit of smartadserver. The character before adserver is "t" so all tests will fail.
Also, from what I have seen, I would still allow a trailing digit match as I get advertisements like http://ad0. etc. (I think I have only ever seen a 0, but I am not sure) |
|
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Sun Jan 11, 2004 Post subject: |
|
|
NJH wrote: |
I do not think any of these filters would match www.smartadserver.com or www.smartadserver7.com as all the filters require a "/" or "." or, for one of the filters, a number immediately before the "adserver" bit of smartadserver. The character before adserver is "t" so all tests will fail.
|
Oops.
/[\/\.](smart)?ad(server|s)?\d*[\/\.]/
NJH wrote: |
Also, from what I have seen, I would still allow a trailing digit match as I get advertisements like http://ad0. etc. (I think I have only ever seen a 0, but I am not sure) |
Yep, that's what the \d* is for. "Match zero or more digits." |
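A quick comparison of the previous variant with the revised one, assuming case-insensitive matching (the `/i` flag) and using made-up example.com addresses alongside www.smartadserver.com:

```javascript
const earlier = /[\/\.]ad(server|s)?\d*[\/\.]/i;
const revised = /[\/\.](smart)?ad(server|s)?\d*[\/\.]/i;

const smart = "http://www.smartadserver.com/call/pubj/1";

console.log(earlier.test(smart)); // false: "ad" is preceded by "t", not "/" or "."
console.log(revised.test(smart)); // true: the optional (smart) bridges the gap

// \d* keeps numbered hosts such as ad0 covered in both forms.
console.log(earlier.test("http://ad0.example.com/banner.gif")); // true
console.log(revised.test("http://ad0.example.com/banner.gif")); // true

// A word that merely contains "ad" is left alone.
console.log(revised.test("http://example.com/download.html"));  // false
```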
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Mon Jan 12, 2004 Post subject: |
|
|
Earlier Guest:
You asked for my complete filter-list. I don't think it's necessary to post them all, but here's a few more, including some refinements of earlier postings:
http://pagead2.googlesyndication.com/ -- universally heinous -!
http://forums.mozillazine.org/images/avatars/5085478903f36e1443c9ee.jpg -- person i never want to see
yimg.com/*.js -- yahoo's ad-scripts..
us.yimg.com/a/ -- ..and ad-images
/\/buy_assets\//
/[\W\d](top|bottom|left|right)?banner(s|id=|\d)[\W\d]/ -- general banner-stuff (improved)
The improved breakup of NJH's super-filter (less unnecessary recursion):
/[\W\d](double|fast)click[\W\d]/
/[\W\d]click(stream|thrutraffic|thru|xchange)[\W\d]/
/[\W\d]value(stream|xchange|click)[\W\d]/
/[\W\d]dime(xchange|click)[\W\d]/
/[\W\d](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/
[Edited for line-length --rue]
Last edited by rue on Sat Jan 24, 2004; edited 3 times in total |
|
MW Guest
|
Posted: Mon Jan 12, 2004 Post subject: |
|
|
/\.((bfast|bluestreak|pointroll)|(fast|double|value)click)\.(com|net)/
/(\/|\.)a(d|ds|dServer|d-flow)(\.|\/)/
/\/servedby.advertising.com\//
.yimg.com/a/
those are the ones i use and i can't remember an ad i didn't want to see..... i also check them for false positives (don't remember the last time i got one)..... no doubt someone can probably "tighten" it up a bit as i know virtually nothing about regexp |
|
MW Guest
|
Posted: Mon Jan 12, 2004 Post subject: |
|
|
and this one too....
/\/pagead2.googlesyndication.com\// |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Mon Jan 12, 2004 Post subject: |
|
|
MW:
I know you wouldn't inherently know this, but simple-filters which include a complete host-name are matched faster than any other filter. This is because all such filters' hostnames are copied into a hash and key-matched against the element's host.
.
So http://pagead2.googlesyndication.com/ matches faster than /\/pagead2.googlesyndication.com\//.
Last edited by rue on Sun Jan 18, 2004; edited 1 time in total |
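The hash lookup rue describes might look roughly like this. It is a sketch under stated assumptions, not Adblock's implementation: the `hostFilters` set, the `patternFilters` array and the `matchKind` helper are hypothetical names, and each host filter is assumed to be a bare hostname.

```javascript
// Filters that name a complete host go into a set keyed by hostname.
const hostFilters = new Set([
  "pagead2.googlesyndication.com",
  "view.atdmt.com",
]);

// Everything else has to be tested one pattern at a time.
const patternFilters = [/\Wad(server|s)?\W/i];

function matchKind(url) {
  const host = new URL(url).hostname;
  if (hostFilters.has(host)) return "host"; // one lookup, however long the list
  for (const re of patternFilters) {
    if (re.test(url)) return "pattern";     // linear scan over the remaining filters
  }
  return null;
}

console.log(matchKind("http://pagead2.googlesyndication.com/pagead/ads?x=1")); // "host"
console.log(matchKind("http://example.com/ads/top.gif"));                      // "pattern"
console.log(matchKind("http://example.com/index.html"));                       // null
```

The design point is that the set lookup costs the same whether it holds two hosts or two thousand, while each extra pattern adds one more test for every object that is not blocked.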
|
ValkRaider Guest
|
Posted: Wed Jan 14, 2004 Post subject: How about Privoxy? |
|
|
Has anyone looked at the Privoxy default.actions file and tried to incorporate some of their rules into Adblock rules?
It seems like they could be pretty good, I had good luck with Privoxy on the computers I have used it on... (I like Adblock a bit better as a general style issue though).
Here are privoxy's "general" rules:
Code: | ######################################################################
#
# File : $Source: /cvsroot/ijbswa/current/default.action.master,v $
#
# $Id: default.action.master,v 1.1.2.26 2003/07/11 03:20:34 hal9 Exp $
#
# Purpose : Default actions file, see
# http://www.privoxy.org/user-manual/actions-file.html
#
# Copyright : Written by and Copyright
# Privoxy team. http://www.privoxy.org/
#
# Note: Updated versions of this file will be made available from time
# to time. Check http://sourceforge.net/project/showfiles.php?group_id=11118
# for updates and/or subscribe to the announce mailing list
# (http://lists.sourceforge.net/lists/listinfo/ijbswa-announce) if you
# wish to receive an email notice whenever updates are released. |
snip
Code: | #############################################################################
# Generic block patterns (the most effective!):
#############################################################################
{+block}
# By hostname:
#
ad*.
.*ads.
*banner*.
count*.
# By path:
#
/(.*/)?(ads(erver?|tream)?|.*?ads/|ad(images|cycle|rotate|mentor)?/|
adv(iew|ert(s|enties|is(ing|e?ments)?)?)?|(ad|all|nn|db|promo(tion)?)?
[-_]?banner(s|ads?|farm)?)
/(.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?/)
/.*(count|track|compteur|adframe)(er|run)?\.(pl|cgi|exe|dll|asp|php[34]?|cpt)
/.*promo.gif |
[Edited for line-length --rue] |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Fri Jan 16, 2004 Post subject: |
|
|
Look closely at that list -- in particular, here:
Code: | # By hostname:
#
ad*.
.*ads.
*banner*.
count*.
|
Those expressions aren't even "semi-regexp" -- they're just wrong. Easiest advice: stay away from 3rd-party lists. |
|
Guest
|
Posted: Mon Jan 19, 2004 Post subject: |
|
|
rue wrote: |
/[\W\d](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/
. |
/[\W\d](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|type|vertisements?|v|xchange)?)[\W\d]/
Added 'type' in there to catch some extra stuff. |
|
Markfive Guest
|
Posted: Fri Jan 23, 2004 Post subject: Amazon |
|
|
Here's an Amazon filter I created.. any ideas/suggestions?
/amazon\.com.*\W(promotions|marketing|merchants|stores|associates)\W/ |
|
Dan Guest
|
Posted: Sat Jan 24, 2004 Post subject: |
|
|
er, ok, having read the above, I'm none the wiser...
what would be suggested as the best basic adblock filter list then? |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Sat Jan 24, 2004 Post subject: |
|
|
Dan:
I'd say this:
rue wrote: | /\D\d{2,3}x\d{2,3}\D/ -- catches banner-sizes in source: 192x74 |
..and these. |
|
Guest
|
Posted: Sun Jan 25, 2004 Post subject: |
|
|
/[\W\d](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/
catches
http://winamp.com/images/home/winamp5ad.jpg
suggestions? |
|
NJH
Joined: 13 Nov 2003 Posts: 183 Location: Hampshire, England
|
Posted: Sun Jan 25, 2004 Post subject: |
|
|
Going on the observation that most or all ads I've seen do not start with a number, I no longer use /[\W\d] at the start of my filters. I normally use /\W. This rule is the one exception where I use /[\W_] at the beginning because I have seen ads of the form "something_ad/somethingelse" and this still picks them up. This should allow your URL.
HTH |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Mon Jan 26, 2004 Post subject: |
|
|
Guest:
Regardless of prefix-digits' commonality, winamp5ad is a very unique exception. If a legitimate picture was named adimage.jpg, it too would be filtered. And I'd offer no apology.
.
If the extreme-exceptions seem bothersome, Adblock likely isn't your cup of tea. |
|
Guest
|
Posted: Mon Jan 26, 2004 Post subject: |
|
|
no its ok i was just wondering if there was an easy way to fix it. guess not. well then maybe a whitelist could be developed? |
|
NJH
Joined: 13 Nov 2003 Posts: 183 Location: Hampshire, England
|
Posted: Mon Jan 26, 2004 Post subject: |
|
|
Guest, you wrote
Anonymous wrote: | no its ok i was just wondering if there was an easy way to fix it. guess not. well then maybe a whitelist could be developed? |
I responded
NJH wrote: | Going on the observation that most or all ads I've seen do not start with a number, I no longer use /[\W\d] at the start of my filters. I normally use /\W. This rule is the one exception where I use /[\W_] at the beginning because I have seen ads of the form "something_ad/somethingelse" and this still picks them up. This should allow your URL.
HTH |
Try this:
/\W(onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/
or this:
/[\W_](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/
as I suggested. Although rue does not favour these forms of his filter, they should work. |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Tue Jan 27, 2004 Post subject: |
|
|
NJH:
It's not that I dislike your variant. Rather, I just think it unnecessary.
.
I've seen a few false-positives with my filters. But the urls being caught in each case were absurdly close to what an ad might be. Rather than cripple the effectiveness toward catching real ads, I'm happy to entertain a few wrong matches.
.
ps: yes, whitelists are coming. hang in there.
Last edited by rue on Tue Jan 27, 2004; edited 1 time in total |
|
Guest
|
Posted: Tue Jan 27, 2004 Post subject: |
|
|
sorry. I missed your post. Thanks for the suggestions. I'll try that. |
|
jim Guest
|
Posted: Tue Jan 27, 2004 Post subject: |
|
|
/.{150,}/
only webbugs need that many characters. |
|
Guest
|
Posted: Wed Feb 11, 2004 Post subject: |
|
|
taken from MyIE2. i haven't seen a single ad with this list:
Code: | [Adblock]
*.ad-*
*.ad.*
*/ad.*
*/ad/*
*/adbot.*
*/adc_*
*/adclient.*
*/adcouncil/*
*/adframe.*
*/adgifs/*
*/adgraph/*
*/adimages/*
*/adinfo*
*/adlog.*
*/adlog/*
*/adrotator.*
*.ads-*
*.ads.*
*/ads.*
*/ads/*
*/advert*
*/adview.*
*/housead/*
*/liveads/*
*/phpads/*
*/softad/*
*/sponsor/*
*/sponsors/*
*/tj_bs
*/tracker/*
*_ad_*
*_borders/*
*_superad*
*a.p.f.qz.*
*a.r.tv.*
*a.tribalfusion.*
*-ad.cgi*
*ad_type*
*adbot*
*adclick*
*adclix*
*adclub*
*adcycle*
*adflight*
*ad-flow*
*adimage*
*adknowledge*
*adlink*
*admaximize*
*admex*
*admonitor*
*adpulse*
*adrunner*
*-ads/*
*adserv*
*adsoftware*
*adswap*
*af.lygo.*/*
*aureate*
*avenuea*
*banner*
*bilbo.counted.*
*bluestreak.*
*burstmedia*
*burstnet*
*clickxchange*
*counter*.bravenet.*
*doubleclick*
*focalink*
*hitbox*
*hitexchange*
*hitlist*
*hitsites*
*houseads_*
*i.imdb*
*i.us.rmi.yahoo.*
*imaginemedia*
*linkads*
*linkexchange*
*linkshare*
*linksynergy*
*media.fastclick*
*paycounter*
*radiate*
*realtracker.*
*secure.webconnect*
*servedby.advertising.*
*spinbox.versiontracker.*
*spylog*
*thecounter*
*trafic.ro/*
*us.a1.yimg.*
*us.f.yahoofs.*
*valueclick*
*view.atdmt*
*adtomi.*
*.linkbuddies.*
*.qksrv.*
*x.mycity.*
*z.about.*
*zdmcirc*
|
|
|
Guest
|
Posted: Wed Feb 11, 2004 Post subject: |
|
|
maybe I'm not understanding how adblock works but wouldn't that list slow down page load time? |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Wed Feb 11, 2004 Post subject: |
|
|
Guest:
Adblock automatically ignores head / end wildcards for Simple Filters.
.
What's more, using them (as above) provides an easy way to include the forward-slash (/) at both ends of the filter -- without making a regexp. |
|
Melgund Guest
|
Posted: Wed Feb 11, 2004 Post subject: Simple vs regular expressions |
|
|
Is a longish list of simple expressions slower than a much shorter set of regular expressions that block the same stuff?  |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Wed Feb 11, 2004 Post subject: |
|
|
Melgund:
Regexp can compress "patterns" for speed. But, for unique terms, it wouldn't do much.
Ex: ad, adstuff, adworld -> slower than -> /ad(stuff|world)/
:
Ex: un, related, terms -> nearly same -> /un|related|terms/ |
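The compression rue describes can be shown as an equivalence check. The sketch makes no speed measurement; it only confirms that one alternation blocks the same addresses as the separate terms. The example.com URLs and the `matchesAny` helper are made up, and rue's bare `ad` term is left out on purpose: covering it would need /ad(stuff|world)?/, which matches any address containing "ad".

```javascript
const separate = [/adstuff/i, /adworld/i]; // two filters, two tests per unblocked object
const combined = [/ad(stuff|world)/i];     // one filter, the shared "ad" prefix tested once

function matchesAny(filterList, url) {
  return filterList.some(re => re.test(url));
}

// Hypothetical sample addresses.
const samples = [
  "http://example.com/adstuff/1.gif",
  "http://adworld.example.com/",
  "http://example.com/admin/",
  "http://example.com/index.html",
];

const comparison = samples.map(url => [matchesAny(separate, url), matchesAny(combined, url)]);
console.log(comparison);
```

For unrelated terms such as /un|related|terms/ there is no shared prefix to factor out, which fits rue's "nearly same" verdict.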
|
Guest
|
Posted: Fri Feb 13, 2004 Post subject: |
|
|
Anonymous wrote: | maybe I'm not understanding how adblock works but wouldn't that list slow down page load time? |
yes it would |
|
Guest
|
Posted: Fri Feb 13, 2004 Post subject: |
|
|
heres my list
Code: | [Adblock]
*/ad/*
*/adimages/*
*/ads/*
*/adserver/*
*/advertisement.gif
*/advertising/*
*/banner/*
*/banners/*
*/fastclick/*
*/offsite-banners/*
*/phpAdsNew/*
*/promo/*
*/servlet/*
*/smartserve/*
*/sponsor/*
*/sponsors/*
*/viewad/*
/BannerSource/
/adframe/
/banners.php/
/banners/
/clickserve/
/doubleclick/
/hitbox.com/
/linkexchange.com/
http://*.adserver.com/*
http://205.180.85.40/*
http://69.57.136.40/*
http://a.as-us.falkag.net/*
http://a12.g.akamai.net/*
http://a619.g.akamai.net/*
http://ad.linkexchange.com/*
http://adfarm.mediaplex.com/*
http://adimg.cnet.com/*
http://adlog.com.com/*
http://ads.*
http://adserv.bravenet.com/*
http://ar.atwola.com/*
http://as1.falkag.de/*
http://bannerimages.*
http://banners.*
http://banners.dot.tk/*
http://bitzi.com/*
http://c1.zedo.com/*
http://cdn.valueclick.com/*
http://creativeby.viewpoint.com/*
http://cserver.mii.instacontent.net/*
http://stuff-access.com/*
http://gfx.dvlabs.com/klipmart/*
http://http.content.ru4.com/*
http://image.weather.com/creatives/ONDCP/*
http://image.weather.com/creatives/match/*
http://image.weather.com/creatives/wanderlodge/*
http://itxt.vibrantmedia.com/*
http://klipmart.dvlabs.com/*
http://lisa.belointeractive.com/*
http://mads.zdnet.com/mac-ad?*
http://majorgeeks.com/rm/*
http://media.fastclick.net/*
http://media.pointroll.com/*
http://mirror.pointroll.com/*
http://pagead.googlesyndication.com/*
http://pagead2.googlesyndication.com/pagead/ads?*
http://partners.ditto.com/*
http://s0b.bluestreak.com/*
http://servedby.advertising.com/*
http://spd.atdmt.com/*
http://us.a1.yimg.com/*
http://us.yimg.com/a/ya/*
http://view.atdmt.com/*
http://warp.crystalad.com/*
http://www.eyeblaster-bs.com/BurstingPipe/BannerSource.asp?*
http://www.flowgo.com/*
http://www.neowin.net/phpAdsNew/*
http://www.qksrv.net/*
http://www.resellerratings.com/*
http://www.tech-critic.com/dogger%5B1%5D.gif
http://www.tutorialcentral.com/php/adv/*
http://www3.bannerspace.com/*
http://www.fileplanet.com/flash/tickerstats.swf
http://rcm.amazon.com/e/cm?t=httpwwwwincuc-20&p=13&o=1&l=ez&f=ifr
http://www.dscripting.com/forums/style_images/mohaa-974/logo4.swf
http://xslt.alexa.com/site_stats/js/s/a?url=www.neowin.net
http://srv01.addaddy.net/*
http://ad2.ip.ro/*
http://pagead2.googlesyndication.com/*
http://storage.trafic.ro/js/*
http://log.trafic.ro/cgi-bin/*
http://www.neowin.net/images/buttons/survey-120x60.gif
http://213.158.116.18/template/sideads/Sky600_120_1.jpg
http://64.65.56.27/~fosi/casino.gif
http://64.65.56.27/~fosi/viagraworld.jpg
http://64.65.56.27/~fosi/hcmoviestation_200x200_01.gif
http://sc.communities.msn.com/themes/pby/img/promotelogo.gif
http://ad.linksynergy.com/*
http://service.bfast.com/*
http://www.dgm2.com/m/i_defaultB.asp?contid=2015&rand=[TIMESTAMP]
http://forums.overclockers.co.uk/ocukimages/nvidia.jpg
http://forums.overclockers.co.uk/ocukimages/creative.gif
http://netshelter.adtrix.com/*
http://a1964.g.akamaitech.net/*
http://rcm.amazon.com/*
http://adserver.ign.com/*
http://a.tribalfusion.com/*
http://www.tech-critic.com/klip/tcklip.bmp
http://www.zend.com/ads*
http://webpdp.gator.com/*
http://213.158.116.18/template/sideads/gigahosting_small.gif
http://66.79.191.80/~modchip/newbanneragain.gif
http://213.158.116.15/template/sideads/Sky600_120_1.jpg
http://213.158.116.15/template/sideads/gigahosting_small.gif
http://213.158.116.15/template/sideads/steadfast_small.gif
http://home.tiscali.nl/maple/template/sideads/Sky600_120_1.jpg
http://home.tiscali.nl/maple/template/sideads/gigahosting_small.gif
http://mediamgr.ugo.com/*
http://www.ugo.com/*
http://secure-us.imrworldwide.com/*
http://usads.vibrantmedia.com/* |
|
|
|
 |
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Sat Feb 14, 2004 Post subject: |
|
|
Anonymous wrote: | Anonymous wrote: | maybe I'm not understanding how adblock works but wouldn't that list slow down page load time? |
yes it would |
Guest:
Actually it wouldn't. As I explained, the latest builds ignore beginning / ending wildcards in Simple Filters. The list is fine. |
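A rough sketch of the idea rue describes, not Adblock's actual code: if a Simple Filter is treated as a substring pattern, leading and trailing `*` add nothing and can be dropped before matching. The function name `simpleFilterMatches`, the conversion to a regular expression, and the sample addresses are all assumptions made for illustration.

```javascript
// Hypothetical helper, not taken from the Adblock source.
// Treats a Simple Filter as a case-insensitive substring pattern
// in which "*" stands for any run of characters.
function simpleFilterMatches(filter, url) {
  // Leading/trailing wildcards are redundant for a substring match.
  const trimmed = filter.replace(/^\*+|\*+$/g, "");
  // Escape regexp metacharacters in each literal piece, then
  // rejoin the pieces with ".*" for the inner wildcards.
  const source = trimmed
    .split("*")
    .map((piece) => piece.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(source, "i").test(url);
}

// Invented sample addresses.
console.log(simpleFilterMatches("*/ads/*", "http://example.com/ads/top.gif"));
console.log(simpleFilterMatches("/ads/", "http://example.com/ads/top.gif"));
console.log(simpleFilterMatches("http://*.adserver.com/*", "http://www.adserver.com/x.gif"));
console.log(simpleFilterMatches("*/ads/*", "http://example.com/uploads/a.gif"));
```

Under this model `*/ads/*` and `/ads/` give the same verdict, which is the sense in which the wildcards at either end cost nothing.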
|
|
 |
Guest
|
Posted: Sat Feb 14, 2004 Post subject: |
|
|
Hi. I use the Agnitum Outpost firewall and can see filtered adverts in its log. Here are the statistics for 2 days' usage (~2.5 hours):
the firewall log wrote: | Code: |
CLICK 212
BANNER 92
COUNT. 86
88 x 31 49
ADVERT 46
TOP100. 40
SPYLOG. 31
/AD. 16
STAT. 16
/ADV. 15
100 x 100 15
.COUNT 13
468 x 60 11
HOTLOG. 11
DOUBLECLICK.NET 9
/TOP100/ 7
TOPCTO. 7
/STAT/ 6
|
| Code: |
RB2.DESIGN.RU/CGI-BIN 6
/ADS/ 5
.STAT 4
BANNERS. 4
/ADV/ 3
BIGMIR.NET/?CL 3
/BB.CGI 2
/CNT. 2
/REKLAMA/ 2
BS.YANDEX.RU 2
LINKEXCHANGE.RU 2
/SPONSORS. 1
/SPONSORS/ 1
?AD= 1
125 x 125 1
234 x 60 1
HTTP://BANNER. 1
|
|
|
So I tried to make a list that matches that log a bit:
Code: | [Adblock]
/(?:hot|spy)log/
/[\W\d_](?:php|page)?ad(?:_id|click|frame|ima?ge?|log|s|server|space|url|v|vert)?[\W\d_]/
/[\W_]b(?:an|nr)s?[\W_]/
/[\W_]jump[\W_]/
/[\W_]redir(?:ect|s)?[\W_]/
/[\W_]stat[\W_]/
/\D\d{2,3}x\d{2,3}\D/
/\W(?:cy|r)?c(?:ou)?nt(?:er|ed)?\W/
/akamai/
/banner/
/bb\.cgi/
/by\.banclk/
/cl(?:ic)?k/
/partner/
/ping\.cgi/
/promotion/
/reklama/
/sponsor/
/spymagic/
/top(?:100|cto)/
http://www.bigmir.net/?cl= |
Don't tell me about false positives here (/banner/, for example, since many unrelated strings contain 'banner'). I use /banner/ and other simple patterns without major problems. As for your advanced filter http://pagead2.googlesyndication.com/ -- my hosts file has:
Code: | 127.0.0.1 pagead.googlesyndication.com
127.0.0.1 pagead1.googlesyndication.com
127.0.0.1 pagead2.googlesyndication.com
127.0.0.1 pagead3.googlesyndication.com |
..and many more, 24368 hand-picked entries in all. I once had about 500000 entries with almost all sex sites blocked, but then I removed many that were old and discontinued (I don't visit porn sites anyway).
Also I don't want to be spied on, so I use something like: Code: | /\W(?:cy|r)?c(?:ou)?nt(?:er|ed)?\W/
/[\W_]stat[\W_]/ |
..etc. But, with the latter, be careful as it's not optimised.
I once had the fairly heavy /[\W_](?:double|fast|js|show)?cl(?:ic)?ks?[\W_]/
..but gave up on it and now just use /cl(?:ic)?k/
rue, does it make any difference in Adblock whether I use (?:blah_blah) or (blah_blah)?
How did you program it? |
|
|
 |
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Sat Feb 14, 2004 Post subject: |
|
|
Guest:
Excellent post. I'd never seen anything but pagead2.google.. -- the filter's now generalized to the principal domain.
.
Adblock filters with the JavaScript test() method, which only returns true or false, so captured groups (backrefs) are never used. Suppressing them with (?:...) just adds characters to store in and retrieve from prefs. |
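As a small illustration of that point (plain JavaScript, not Adblock's own code): `RegExp.prototype.test()` only reports whether the pattern matched, so a capturing group and a non-capturing group give the same verdict. The sample URLs are invented.

```javascript
// test() returns only true/false, so captured groups are never read.
const capturing = /cl(ic)?k/i;
const nonCapturing = /cl(?:ic)?k/i;

// Invented sample URLs for illustration.
const urls = [
  "http://example.com/click.php?id=1",
  "http://example.com/by.banclk/x",
  "http://example.com/images/photo.jpg",
];

const results = urls.map((url) => ({
  url,
  capturing: capturing.test(url),
  nonCapturing: nonCapturing.test(url),
}));

for (const r of results) {
  console.log(r.url, r.capturing, r.nonCapturing);
}
```

Both columns agree for every URL; the only difference between the two spellings is three extra characters per group in the stored filter.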
|
|
 |
Tristan2

Joined: 08 Jan 2004 Posts: 1 Location: Germany, Osnabrueck
|
|
|
 |
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Sat Feb 14, 2004 Post subject: |
|
|
Tristan:
To clarify: "Advanced filters" is just a label chosen for the extracted list -- to set it apart.
.
Per your filter: "advertisement" has a trailing underscore. Just copy what you did at the beginning. /[\W\d_]ad(server|s|vertisement|vertising|vert|river)?[\W_]/ |
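A hedged illustration of why the trailing class matters. Tristan's own filter is not shown in the thread, so the `\W`-only variant below is an assumed stand-in, and the sample address is invented.

```javascript
// rue's suggested filter: the trailing class [\W_] accepts an underscore.
const suggested = /[\W\d_]ad(server|s|vertisement|vertising|vert|river)?[\W_]/i;
// Assumed stand-in for a filter whose trailing class is \W only.
const withoutUnderscore = /[\W\d_]ad(server|s|vertisement|vertising|vert|river)?\W/i;

// Invented sample address with an underscore after "advertisement".
const url = "http://example.com/img/advertisement_468.gif";

console.log(suggested.test(url));          // true
console.log(withoutUnderscore.test(url));  // false
```

The underscore counts as a word character, so `\W` alone cannot match it; adding `_` to the class is what lets "advertisement_" through.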
|
|
 |
Guest
|
Posted: Sun Feb 22, 2004 Post subject: |
|
|
I have done some snooping through the page source for Geocities sites and I've come up with the following filter to get rid of the Geocities ads: .geocities.com/js_source
I don't understand why it wasn't added before but there is probably a reason and I'm going to look like an idiot |
|
|
 |
MW Guest
|
Posted: Sun Feb 22, 2004 Post subject: |
|
|
That might interfere with some of the page functionality... the filter I use for Geocities is:
Code: | /\.geocities.com/js_source/(ygNSLib9|pu5geo).js/ |
|
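A side note on the filter above: the dots in "geocities.com" and ".js" are unescaped, so each one matches any character. The sketch below compares the posted pattern with a stricter variant in which those dots are escaped. The stricter pattern and both sample addresses are assumptions for illustration, not something posted in the thread. The patterns are built with `new RegExp` from strings so the inner slashes need no escaping.

```javascript
// The pattern body as posted (the text between the outer slashes).
const posted = new RegExp("\\.geocities.com/js_source/(ygNSLib9|pu5geo).js", "i");
// Assumed stricter variant with the literal dots escaped.
const strict = new RegExp("\\.geocities\\.com/js_source/(ygNSLib9|pu5geo)\\.js", "i");

// Invented addresses for illustration.
const adScript = "http://us.geocities.com/js_source/pu5geo.js";
const lookalike = "http://us.geocitiesXcom/js_source/pu5geoXjs";

console.log(posted.test(adScript), strict.test(adScript));
console.log(posted.test(lookalike), strict.test(lookalike));
```

In practice the looser pattern is unlikely to cause false positives, since few real addresses differ only at those positions; escaping is mostly a matter of precision.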
|
|
 |
Guest Guest
|
Posted: Mon Feb 23, 2004 Post subject: |
|
|
It seems this filter:
Quote: | /[\W_](b(an|nr)s?|jump|redir(ect|s)?|stat)[\W_]/ |
makes you lose some of the real content on IMDB (www.imdb.com), like the movie posters and such. Still can't figure out exactly why. |
|
|
 |
Guest
|
Posted: Wed Mar 03, 2004 Post subject: |
|
|
I use Org's filters plus a couple of site-specific ones. Works great.
Org wrote: | [Adblock]
.yimg.com/a/
/(ad-flow|adsdk|advertising|bizrate|prohosting|resellerratings|tradedoubler|\.atwola|\.atdmt|valueclick)\.com/
/(doubleclick|fastclick|spinbox)\.net/
/[.\/]adcontent[.\/]/
/[\/_]banner([sz]?\/|it\/|\.cgi|\.pl|farm\/|link\.|_pysty\.gif|.?\d*\.)/
/\/(page|online)ad[sz]?\//
/\/\?DC=/
/\/ad(_banner|[zs\?]|click|image.php|image[zs]?|server?)?\//
/\/ad(js.php|size=)/
/aslframe.html
/bd.m?
/\/bnr\//
/_ad[sz]?\.(js|gif|jpg|swf)/
/http:\/\/(mainos|rcm-images)\./
/http:\/\/ad([sz]?|images?|img|serv(er?)?)\./
/http:\/\/banner[sz]?\./
/imdb\.com\/(google\/|.*\.swf)/
http://www.theregister.co.uk/media/
http://www.nvnews.net/images/advertising/
http://mediamgr.ugo.com/
http://falk.speedera.net/ |
Hope that helps :) |
|
|
 |
Guest
|
Posted: Thu Mar 04, 2004 Post subject: |
|
|
Just FYI, Google has changed its ad scheme. Not sure how it's done now... I HATE THOSE ADS! |
|
|
 |
adave Guest
|
Posted: Fri Mar 26, 2004 Post subject: combining multiple /[^a-zA-Z]ad[^a-zA-Z]/ |
|
|
Anonymous (Thu Nov 06, 2003) wrote: | my general filters:
Code: |
/[^a-zA-Z]ad[^a-zA-Z]/
/[^a-zA-Z]ads[^a-zA-Z]/
etc...
|
[^a-zA-Z] means everything but the letters a-z, so /[^a-zA-Z]ad[^a-zA-Z]/ would block objects containing /ad.jpg and print.html?ad=true but won't block dad.jpg or mad.gif etc |
Hi, I came to this forum through a google search...
adblock regular expressions examples
...looking for examples of regexps.
I like the idea of this method and have just combined many into one reg exp:
Code: |
/[^a-zA-Z]([Aa]ds|adlog|adserver|advertise|advertising|adverts|fastclick|googlesyndication|serveredby)[^a-zA-Z]/
|
|
|
|
 |
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Fri Mar 26, 2004 Post subject: |
|
|
You could use [\W\d_] instead of [^a-zA-Z], it looks much cleaner. Also, you don't need [Aa], Adblock isn't case-sensitive. |
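A quick check of both suggestions, sketched in plain JavaScript. It assumes Adblock's case-insensitive matching can be modelled with the `i` flag, and it only covers printable ASCII characters.

```javascript
// Every printable ASCII character, space (32) through tilde (126).
const ascii = Array.from({ length: 95 }, (_, i) => String.fromCharCode(32 + i));

const longForm = /[^a-zA-Z]/;
const shortForm = /[\W\d_]/;

// Characters on which the two classes disagree (expected: none).
const disagreements = ascii.filter((ch) => longForm.test(ch) !== shortForm.test(ch));
console.log(disagreements.length);

// With the "i" flag, "ads" also covers "Ads" and "ADS", so [Aa] is redundant.
const caseInsensitive = /[\W\d_]ads[\W\d_]/i;
console.log(caseInsensitive.test("/Ads/"), caseInsensitive.test("/ADS/"));
```

`\W` excludes letters, digits and underscore, so adding `\d` and `_` back leaves exactly "anything but a letter".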
|
|
 |
asd Guest
|
Posted: Wed Mar 31, 2004 Post subject: |
|
|
Why doesn't this /[\W\d](top|bottom|left|right)?banner(s|it|forum|id=|\d)[\W\d]/ remove the banner from this code: Code: | <a href="/cgi/goto.php?banner_id=25"><img src="/dyn/banner/25/1.gif?1080616225" width=468 height=60 alt="" border=1> |
|
|
|
 |
Org
Joined: 23 Oct 2003 Posts: 349
|
Posted: Wed Mar 31, 2004 Post subject: |
|
|
The string "banner_id" inside "<a href>" doesn't match, because your regexp doesn't match the "_".
The string "banner" inside "<img>" doesn't match, because your regexp requires it to be followed by one of the strings "s", "it", "forum", "id=" or a single digit.
Try this one:
/[\W\d](top|bottom|left|right)?banner(s|it|forum|_?id=)?[\W\d]/ |
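The explanation above can be checked directly against the two addresses in asd's snippet. This is a plain JavaScript sketch; the `i` flag stands in for Adblock's case-insensitive matching.

```javascript
// The two addresses from asd's HTML snippet.
const href = "/cgi/goto.php?banner_id=25";
const src = "/dyn/banner/25/1.gif?1080616225";

// asd's original filter and Org's suggested replacement.
const original = /[\W\d](top|bottom|left|right)?banner(s|it|forum|id=|\d)[\W\d]/i;
const suggested = /[\W\d](top|bottom|left|right)?banner(s|it|forum|_?id=)?[\W\d]/i;

console.log(original.test(href), original.test(src));   // false false
console.log(suggested.test(href), suggested.test(src)); // true true
```

The two changes that matter are `_?` in front of `id=` and the `?` that makes the whole suffix group optional.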
|
|
 |
asd Guest
|
Posted: Wed Mar 31, 2004 Post subject: |
|
|
OK, I have now made some general block rules. Please tell me if something is not correct:
/\W(top|right|bottom|left|side)?ad(forum|click|s|server)\W/
So the ? makes it optional for the ad-word to have one of those top/right/... words in front? But is the \W also optional, so that it would also block things like sadforum? And how do I make it also block plain ad-words like /ad/? Should I put \W inside those brackets too? |
|
|
 |
Guest
|
Posted: Wed Mar 31, 2004 Post subject: |
|
|
asd wrote: | OK, I have now made some general block rules. Please tell me if something is not correct:
/\W(top|right|bottom|left|side)?ad(forum|click|s|server)\W/
So the ? makes it optional for the ad-word to have one of those top/right/... words in front? But is the \W also optional, so that it would also block things like sadforum? And how do I make it also block plain ad-words like /ad/? Should I put \W inside those brackets too? |
What your filter matches is any string which
1. Starts with a non-word character. That means any character except letters, digits, or _ (underscore). This is what the \W means.
2. That character is possibly followed by one of the words in the parentheses. The "possibly" part comes from the ? after the parentheses; the | separating the words inside is an OR operator.
3. Then come the letters "ad".
4. They must be followed by one of the words in the last parentheses, because you have no ? after them.
5. Finally the string must end with another non-word character.
So, answering your questions: no, it will not block "sadforum" because there is no room for that starting s. To make it block /ad/ you need to put a ? after the last parentheses to make their contents optional.
Brackets are also an OR operator, meaning "one of the (single) characters inside". Thus [abc] means "a OR b OR c".
To improve on your filter I would use this one:
/[\W_](top|right|bottom|left|side)?ad(forum|click|s|server)?[\W\d_]/
It means that the match can also start and end with an underscore, like /banner_ad/ or /big_ad.jpg etc. It can also end in a digit (\d) like in /adbanner6.gif (where the "/adbanner6" part would be matched by the filter).
There are very good regexp guides available on the net. Check out the stickies at the top of the forum, or read the earlier posts in this thread. You'd also get bazillions of hits on Google. |
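A small sketch for trying both filters on a handful of strings, using plain JavaScript with the `i` flag standing in for Adblock's case-insensitive matching. The sample strings are the ones discussed in this exchange plus "/adforum/".

```javascript
// asd's filter and the improved version from the post above.
const original = /\W(top|right|bottom|left|side)?ad(forum|click|s|server)\W/i;
const improved = /[\W_](top|right|bottom|left|side)?ad(forum|click|s|server)?[\W\d_]/i;

const samples = [
  "/sadforum/",
  "/adforum/",
  "/ad/",
  "/banner_ad/",
  "/big_ad.jpg",
  "/adbanner6.gif",
];

// One row per sample: [sample, matched by original, matched by improved].
const rows = samples.map((s) => [s, original.test(s), improved.test(s)]);
for (const row of rows) console.log(row.join("  "));
```

Neither pattern matches "/adbanner6.gif": after "ad" the improved filter needs one of the listed suffixes or a non-letter, and "b" is neither.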
|
|
 |
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Wed Mar 31, 2004 Post subject: |
|
|
(Yes, it was me that posted the above.)
Here's a decent page on regexps:
http://www.regular-expressions.info/reference.html
These are some of my own filters:
Code: |
/[\W_]ad(click|cycle|image|js|server|s|tech|trix|vert(ising)?|words)?[\W\d_]/
/[\W_]banner(s|id=)?[\W\d_]/
/\?clickTAG\d?=/
/\D(588|468|234|120)x600?\D/
/\W(affiliates|annons(er)?|associates|marketing|promos)\W/
/\W(double|fast|value)click[\W\d]/
/\W(ned|one)stat(basic)?\W/
/\W(page|side|value)ads\W/
/\Wat(dmt|wola)\W/
/\Wgoogle(adservices|syndication)\W/
|
|
|
|
 |
hri Guest
|
Posted: Wed Mar 31, 2004 Post subject: Try this |
|
|
Works great for me:
My strategy is to get as many ads as possible through generic extensions on ad, banner, promo & click. For the rest I rely on domain or directory specific filters.
[Adblock]
/(new|double|fast|value)click./
/[\/.]ban(ner|nerfarm|neri|image|source)s?[\/.]/
/[\/._-](|dhtm|i)ad(banner||mentor|s|s2|sdk|sv3|server|trix|image|img|log|vt|bureau|counter|v|vert|vertising|vertisement)s?[\/._?-]/
/[\/._]promo(|tion)s?[\/._]/
/[\/](associates|affiliates|us.yimg.com\/a)[\/]/
/[\W\d\/.](203.199.70.2|2o7|atdmt|atwola|bfast|bluestreak|coremetrics|dgm2|falkag|hitbox|marketbanker|qksrv|ru4|tribalfusion|zedo)[\W\d\/.]/ |
|
|
 |
asd Guest
|
Posted: Wed Mar 31, 2004 Post subject: |
|
|
Thank you kstahl. You answered my questions perfectly. |
|
|
 |
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Wed Mar 31, 2004 Post subject: Re: Try this |
|
|
asd: You're welcome. I just noticed I messed up though. My example /adbanner6.gif would of course not get blocked, as the "banner" part is missing in the filter. D'oh!
hri wrote: |
/[\/.]ban(ner|nerfarm|neri|image|source)s?[\/.]/
|
hri: Those brackets look strange to me. You need to prefix the . with a \ otherwise it's parsed as a special regexp character meaning "any character", and I doubt that's what you want? You could as well remove the bracket altogether then, as it matches anything.  |
|
|
 |
Org
Joined: 23 Oct 2003 Posts: 349
|
Posted: Wed Mar 31, 2004 Post subject: Re: Try this |
|
|
kstahl wrote: | hri: Those brackets look strange to me. You need to prefix the . with a \ otherwise it's parsed as a special regexp character meaning "any character" |
Actually, no. You don't have to quote . inside a character class. I think it was rue who had tested this and posted about it. |
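This is easy to confirm in plain JavaScript. The test strings below are invented.

```javascript
const dotOutside = /a.c/;   // "." outside a class matches any single character
const dotInside = /a[.]c/;  // "." inside [...] is a literal dot
const slashOrDot = /[\/.]banner[\/.]/; // the shape of class used in hri's filter

console.log(dotOutside.test("abc"), dotInside.test("abc")); // true false
console.log(dotInside.test("a.c"));                         // true
console.log(slashOrDot.test("/banner."), slashOrDot.test("xbannerx")); // true false
```

So `[\/.]` means "a slash or a dot", not "anything", and hri's brackets do real work.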
|
|
 |
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Thu Apr 01, 2004 Post subject: |
|
|
Thanks, I must've missed that. I guess it makes sense when you think about it. |
|
|
 |
hri Guest
|
Posted: Thu Apr 01, 2004 Post subject: |
|
|
Thanks Org.
kstahl, I like your "/\D(588|468|234|120)x600?\D/". Thanks.
I might add sizes to my list to handle the ads I haven't been able to get rid of. |
|
|
 |
Guest
|
Posted: Thu Apr 01, 2004 Post subject: Problem with filter |
|
|
I noticed the filter ( /\D\d{2,3}x\d{2,3}\D/ ) tends to block a lot of site logos that are not ads at all. In most, if not all, cases these logos actually have the word "logo" in the file name. Could someone propose a filter that would work the same but exclude images with the word logo in them? |
|
|
 |
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Thu Apr 01, 2004 Post subject: Re: Problem with filter |
|
|
Anonymous wrote: | I noticed the filter ( /\D\d{2,3}x\d{2,3}\D/ ) tends to block a lot of site logos that are not ads at all. In most, if not all, cases these logos actually have the word "logo" in the file name. Could someone propose a filter that would work the same but exclude images with the word logo in them? |
That's why I use specific sizes in my filter. The generic approach gives too many false positives. |
|
|
 |
Guest
|
Posted: Thu Apr 01, 2004 Post subject: Re: Problem with filter |
|
|
kstahl wrote: | Anonymous wrote: | I noticed the filter ( /\D\d{2,3}x\d{2,3}\D/ ) tends to block a lot of site logos that are not ads at all. In most, if not all, cases these logos actually have the word "logo" in the file name. Could someone propose a filter that would work the same but exclude images with the word logo in them? |
That's why I use specific sizes in my filter. The generic approach gives too many false positives. |
Could you explain your filter to me? I'm looking at it, and I know that ? means optional, but I'm not positive what the ? in your script is making optional. Is it making the whole statement "x600?" optional? If this is the case, what is your reasoning in making this optional?
Also, what are your numbers based on? They don't look like they come from the firewall log above. |
|
|
 |
Guest
|
Posted: Thu Apr 01, 2004 Post subject: |
|
|
I've extended your filter a little to include the examples from the firewall log above. It seems to be working nicely.
Where did you get your size related numbers from, just personal experience? |
|
|
 |
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Thu Apr 01, 2004 Post subject: |
|
|
Yes, the sizes are just from personal experience and the web pages I regularly visit.
The ? is making the second 0 in 600 optional so it matches 60 too. |
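A short sketch of that in plain JavaScript. The sample paths are invented.

```javascript
// "600?" reads as "60" followed by an optional "0".
const sizes = /\D(588|468|234|120)x600?\D/;

console.log(sizes.test("/ads/468x60.gif"));  // true  (optional 0 absent)
console.log(sizes.test("/ads/120x600.gif")); // true  (optional 0 present)
console.log(sizes.test("/ads/468x6.gif"));   // false ("60" itself is required)
console.log(sizes.test("/img/150x75.png"));  // false (width not in the list)
```

The `?` applies only to the single character before it, so `x60` is always required and only the final `0` is optional.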
|
|
 |
Guest
|
Posted: Thu Apr 01, 2004 Post subject: |
|
|
What do you think of /\D([^l][^o][^g][^o]).*\d{2,3}x\d{2,3}\D/
I THINK this will block anything in the form (number)x(number) unless it has the word logo in it? Let me know if this will work as I expect it. |
|
|
 |
Guest
|
Posted: Thu Apr 01, 2004 Post subject: |
|
|
Actually, that doesn't work at all. Isn't there any way to negate something so that the filter will fail if it is in it? like !(logo) or something? I'm looking at the regular expression definitions at http://www.regular-expressions.info/reference.html and I don't see anything. It doesn't make a lot of sense to not have a not in regular expressions. |
|
Back to top |
|
 |
Guest
|
Posted: Thu Apr 01, 2004 Post subject: |
|
|
Ok, I think I have it now:
/\D(?<!logo).*\d{2,3}x\d{2,3}.*(?!logo)\D/
I THINK this will block anything in the form numberxnumber (only 2 to 3 digit numbers) unless it contains "logo" before or after it. It seems to work. Could anyone find an ad in the form numberxnumber for me so I can test it out? |
|
|
 |
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
|
|
 |
Guest
|
Posted: Fri Apr 02, 2004 Post subject: |
|
|
Ok, my reg exp doesn't seem to be working right. It's not blocking those flash ads. It's advanced, so I wouldn't be surprised if Adblock doesn't parse it as I expect it to. I've broken down and used a more specific filter:
/\D(((588|468|234|120)x600?)|88x31|100x100|125x125)\D/ |
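On the earlier question about excluding "logo": JavaScript regular expressions of that era had no lookbehind `(?<!...)`, which may be why the attempt above did not behave as expected, but they do have negative lookahead `(?!...)`. The sketch below assumes the pattern is tested against the whole address, as with `test()`. The sample URLs are invented, and this was not verified against any Adblock build.

```javascript
// Match NNxNN / NNNxNNN sizes unless "logo" occurs anywhere in the address.
// ^(?!.*logo) rejects the string up front if "logo" appears anywhere in it.
const sizeButNotLogo = /^(?!.*logo).*\D\d{2,3}x\d{2,3}\D/i;

console.log(sizeButNotLogo.test("http://example.com/ads/468x60.gif"));
console.log(sizeButNotLogo.test("http://example.com/img/logo_150x75.png"));
console.log(sizeButNotLogo.test("http://example.com/150x75/logo.png"));
console.log(sizeButNotLogo.test("http://example.com/index.html"));
```

Only the first address is matched. Anchoring the lookahead at `^` is what makes "logo" count whether it comes before or after the size.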
|
|
 |
Erigion Guest
|
Posted: Sun Apr 04, 2004 Post subject: |
|
|
Ok I imported the guest's adblock filter list, who used his firewall to determine what to block.
Code: | [Adblock]
/(?:hot|spy)log/
/[\W\d_](?:php|page)?ad(?:_id|click|frame|ima?ge?|log|s|server|space|url|v|vert)?[\W\d_]/
/[\W_]b(?:an|nr)s?[\W_]/
/[\W_]jump[\W_]/
/[\W_]redir(?:ect|s)?[\W_]/
/[\W_]stat[\W_]/
/\D\d{2,3}x\d{2,3}\D/
/\W(?:cy|r)?c(?:ou)?nt(?:er|ed)?\W/
/akamai/
/banner/
/bb\.cgi/
/by\.banclk/
/cl(?:ic)?k/
/partner/
/ping\.cgi/
/promotion/
/reklama/
/sponsor/
/spymagic/
/top(?:100|cto)/
http://www.bigmir.net/?cl= |
The only problem is that this blocks quite a few previews on the Fx theme website on Texturizer, including all of Aaron Spuler's themes.
URLs for a few of the blocked previews:
http://www.cs.txstate.edu/~as1130/themes/previews/texturizer/smoke-fb-150x75.png
http://themes.mozdev.org/images/walnut_150x75.png
http://home.twcny.rr.com/jhalme/mozilla/previews/mz150x75.png
Anyone know what filter is causing the problem and what can be done to allow those previews? |
|
|
 |
NJH
Joined: 13 Nov 2003 Posts: 183 Location: Hampshire, England
|
Posted: Sun Apr 04, 2004 Post subject: |
|
|
Quote: | /\D\d{2,3}x\d{2,3}\D/ |
This is the problem. It blocks anything with 2 or 3 digits followed by an "x" followed by another 2 or 3 digits. I also have problems with this filter so I do not use it. There are too many false positives. I use very explicit filters for these types of images.
You can see which filters are blocking particular images by clicking "Adblock" on the status bar. Blocked images are red. If you select the blocked image then the filter blocking it appears in the New Filter box. |
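The same check can be done by hand outside the browser. This is a sketch in plain JavaScript, not part of Adblock: it takes a few of the filters from the imported list and reports which ones match each blocked preview URL. The helper name `matchingFilters` is made up for illustration.

```javascript
// A few of the filters from the imported list above.
const filters = [
  /[\W_]stat[\W_]/i,
  /\D\d{2,3}x\d{2,3}\D/i,
  /akamai/i,
  /banner/i,
  /cl(?:ic)?k/i,
];

// Hypothetical helper: source text of every filter that matches the address.
function matchingFilters(url) {
  return filters.filter((re) => re.test(url)).map((re) => re.source);
}

// The blocked preview URLs from Erigion's post.
const previews = [
  "http://www.cs.txstate.edu/~as1130/themes/previews/texturizer/smoke-fb-150x75.png",
  "http://themes.mozdev.org/images/walnut_150x75.png",
  "http://home.twcny.rr.com/jhalme/mozilla/previews/mz150x75.png",
];

for (const url of previews) console.log(url, matchingFilters(url));
```

For all three previews the only match among these five is the size filter, which agrees with the diagnosis above.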
|
|
 |
|