The Adblock Project

Why isn't everyone here compiling the ultimate filter list?
The Adblock Project Forum Index -> Main
Jakobud
Guest





Posted: Fri Oct 31, 2003    Post subject: Why isn't everyone here compiling the ultimate filter list?

Why don't we do this? We can post all of our filters or even our filter files so everyone can download them and import them into Adblock.
Org



Joined: 23 Oct 2003
Posts: 349

Posted: Fri Oct 31, 2003

Why not indeed? This is what I'm using right now (works well, but it's still evolving slowly):

Code:
[Adblock]
*.ad.*
*.atwola.com/*
*/?DC=*
*/AD_Banner/*
*/Banner/*
*/ad/*
*/adclick/*
*/adimage.php*
*/adimages/*
*/ads/*
*/ads?*
*/adserve/*
*/banner.cgi*
*/banner/*
*/bannerfarm/*
*/bannerit/*
*/bannerlink.*
*/banners/*
*/bnr/*
*/cgi-bin/bd.m?*
*/onlineads/*
*adcontent.*
*adsdk.com/*
*advertising.com/*
*bizrate.com/*
*doubleclick.net*
*fastclick.net/*
*resellerratings.com/*
*spinbox.net*
*tradedoubler.com*
http://adimages.*
http://ads.*
http://banner.*
http://rcm-images.*
http://view.atdmt.com/*
http://www.theregister.co.uk/media/*

(List was created with the export command in Adblock preferences.)
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Fri Oct 31, 2003

Here's a list from early August.
.
It's pretty good for Macintosh / major news sites; duplicates were entered for testing.
.
Oh, and prepending an exclamation mark to a filter disables it.
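A minimal sketch of reading an exported list like the ones in this thread. The parsing rules are assumptions, not taken from Adblock's source: the "[Adblock]" header is skipped, a leading "!" disables an entry (as described above), and an entry that begins and ends with "/" is treated as a regular expression. parseFilterList is a hypothetical name.

```javascript
// Hypothetical reader for an exported Adblock filter list (a sketch,
// not Adblock's own code). Assumptions:
//  - a "[Adblock]" header line (any capitalisation) is skipped,
//  - blank lines are ignored,
//  - a leading "!" marks a disabled filter,
//  - an entry that starts and ends with "/" is a regular expression.
function parseFilterList(text) {
  const active = [];
  const disabled = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || /^\[adblock\]$/i.test(line)) continue;
    if (line.startsWith("!")) {
      disabled.push(line.slice(1)); // keep the filter text, minus the "!"
      continue;
    }
    const isRegExp = line.length > 1 && line.startsWith("/") && line.endsWith("/");
    active.push({ filter: line, isRegExp });
  }
  return { active, disabled };
}

const sample = [
  "[Adblock]",
  "*/ads/*",
  "!*doubleclick.net*",
  "/[\\W\\d]ads?[\\W\\d]/",
  "",
].join("\n");

const parsed = parseFilterList(sample);
console.log(parsed.active.length, parsed.disabled);
```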
Aaron Spuler
QA Testing


Joined: 23 Oct 2003
Posts: 19

Posted: Fri Oct 31, 2003

http://mozthemes.tk -- go to Adblock --> Filters. That's the list I've compiled since I started using Adblock, and it blocks basically everything for me. Check it out and see what you think :)
Henrik
Owner


Joined: 22 Oct 2003
Posts: 33
Location: Copenhagen, Denmark

Posted: Sat Nov 01, 2003    Post subject: Warning

The idea of compiling the ultimate filter list is appealing, but it may present problems. The reason is that Adblock works by matching the address of every blockable object against every filter in the list. In algorithmic terms, for M blockable objects on a page and N filters, this scales as O(M*N).

This is actually not a very efficient way of doing it, but it's necessary because we use filters instead of just blacklisting certain servers, and as long as the list is kept small, most of us will never notice any delays.

What I'm trying to say is that compiling a huge list of every ad directory on every server on the planet will scale extremely badly. The idea of creating a "beginner's list" is a good one, though, but it should perhaps be limited to very generic filters (*/ads/*, *doubleclick*, etc.). At the moment, every filter list should be adapted to the needs of the individual user and his surfing habits.

It should be noted that one of Rue's near-future plans (correct me if I'm wrong here, Rue!) is an auto-pruning feature for unused filters. When this arrives, a huge list will be a good idea, as it will slowly adapt itself to the user's needs as it's used.
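To make the O(M*N) point concrete, here is a rough sketch (not Adblock's actual code) that counts filter tests. An address that no filter matches costs all N tests, so a page full of unblocked objects costs M*N in the worst case. The filters and URLs are made up for illustration.

```javascript
// Brute-force matching as described above: each of the M addresses is
// tested against the N filters until one matches. Plain RegExp objects
// stand in for Adblock's filters here.
function blockAll(addresses, filters) {
  let tests = 0;
  const blocked = [];
  for (const address of addresses) {
    for (const filter of filters) {
      tests++;
      if (filter.test(address)) {
        blocked.push(address);
        break; // already blocked: no need to try the remaining filters
      }
    }
  }
  return { blocked, tests };
}

const filters = [/\/ads\//, /doubleclick/, /\/banners\//];
const addresses = [
  "http://example.com/logo.gif",    // no match: costs all 3 tests
  "http://example.com/ads/top.gif", // first filter matches: 1 test
  "http://example.com/style.css",   // no match: costs all 3 tests
];

const result = blockAll(addresses, filters);
console.log(result.blocked, result.tests);
```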
Lanny Chambers
Guest





Posted: Sun Nov 02, 2003

We're all running BannerBlind, right? BB takes a large load off the Adblock filter list.

Now, if only there were a way to block all Flash movies smaller than a certain pixel size...
Org



Joined: 23 Oct 2003
Posts: 349

Posted: Sun Nov 02, 2003

I used to run Bannerblind in Mozilla before I changed to Firebird. Now with FB and Adblock I see absolutely no need for Bannerblind. Adblock simply blocks everything unwanted.
Guest






Posted: Thu Nov 06, 2003

my general filters:
Code:

/[^a-zA-Z]ad[^a-zA-Z]/
/[^a-zA-Z]adcycle*
/[^a-zA-Z]adrotate*
/[^a-zA-Z]ads[^a-zA-Z]/
/[^a-zA-Z]adserv*
/[^a-zA-Z]adv*

[^a-zA-Z] means any character except the letters a-z and A-Z, so /[^a-zA-Z]ad[^a-zA-Z]/ would block objects whose addresses contain /ad.jpg or print.html?ad=true, but won't block dad.jpg, mad.gif, etc.
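A quick way to check the claim using a plain JavaScript RegExp (assumed here to behave like Adblock's regexp filters); the example URLs are made up:

```javascript
// The guest's bounded "ad" filter as a JavaScript RegExp.
const adWord = /[^a-zA-Z]ad[^a-zA-Z]/;

const examples = [
  "http://example.com/images/ad.jpg",      // "/ad." : blocked
  "http://example.com/print.html?ad=true", // "?ad=" : blocked
  "http://example.com/images/dad.jpg",     // letter before "ad" : allowed
  "http://example.com/images/mad.gif",     // letter before "ad" : allowed
];

for (const url of examples) {
  console.log(adWord.test(url) ? "blocked" : "allowed", url);
}
```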
McMurmel



Joined: 13 Nov 2003
Posts: 19
Location: Germany

Posted: Thu Nov 13, 2003

Don't kill me, but if you want to use regular expressions, doesn't the string have to begin and end with a /?

So the correct list would look like /[^a-zA-Z]adcycle/ instead of /[^a-zA-Z]adcycle*, and so on?
_________________
The road goes ever on and on - down from the door that it began - now far ahead the road has gone and I must follow if I can...
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Thu Nov 13, 2003

Guest:
You can't mix simple and RegExp syntax. Regular expressions must begin and end with a forward slash (/), and their wildcard is period+asterisk (.*). That list is mostly wrong.

McMurmel:
Ignore the expressions above you. The example you were correcting should read: /[^a-zA-Z]adcycle.*/
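Why the mix fails: in a regular expression "*" only repeats the character in front of it, so /[^a-zA-Z]adcycle*/ means "adcycl" followed by any number of "e"s, while ".*" means "anything". A small check with made-up URLs:

```javascript
// "*" repeats only the preceding character: "adcycle*" = "adcycl" + zero or more "e".
const guestPattern = /[^a-zA-Z]adcycle*/;
// ".*" is the regexp wildcard: "adcycle" followed by anything at all.
const ruePattern = /[^a-zA-Z]adcycle.*/;

const full = "http://example.com/adcycle/banner.gif";
const truncated = "http://example.com/adcycl.gif"; // no final "e"

console.log(guestPattern.test(full), ruePattern.test(full));           // true true
console.log(guestPattern.test(truncated), ruePattern.test(truncated)); // true false
```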
McMurmel



Joined: 13 Nov 2003
Posts: 19
Location: Germany

Posted: Thu Nov 13, 2003

Sorry to correct you, but what is the difference between these three regular expressions?

1. /[^a-zA-Z]adcycle/
2. /[^a-zA-Z]adcycle.*/
3. /.*[^a-zA-Z]adcycle.*/

As long as we don't use multiline strings there's no difference (and we don't, because URLs are single-line strings): all three match exactly the same URLs. .* means any character except the newline character (\n), zero or more times, so the first is the simplest of the three and should be preferred.
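A quick check of this point with made-up URLs: the three patterns give the same yes/no answer for each address; only the amount of text counted as the match differs, which doesn't matter for blocking.

```javascript
const patterns = [
  /[^a-zA-Z]adcycle/,
  /[^a-zA-Z]adcycle.*/,
  /.*[^a-zA-Z]adcycle.*/,
];

const urls = [
  "http://example.com/adcycle/banner.gif",
  "http://example.com/cgi-bin/show?adcycle=1",
  "http://example.com/madcycle.gif",
];

// Same yes/no answer from all three patterns for every URL.
for (const url of urls) {
  console.log(url, patterns.map((p) => p.test(url)));
}

// Only the matched text differs.
console.log(patterns.map((p) => urls[0].match(p)[0]));
```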

P.S.: That's the reason I suggested a change in how simple filters are entered.
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Thu Nov 13, 2003

McMurmel:
Yea- I knew that, but in the interest of preserving the original author's intent, I converted his wildcard. I didn't realize this nick was yours (Stefan), so I tried to be as unconfusing as possible :P
McMurmel



Joined: 13 Nov 2003
Posts: 19
Location: Germany

Posted: Fri Nov 14, 2003

Quote:

Yea- I knew that, but in the interest of preserving the original author's intent, I converted his wildcard. I didn't realize this nick was yours (Stefan), so I tried to be as unconfusing as possible :P


... well, that is the reason you're the board admin and I am but a normal user. End-user support was never one of my strong skills, so I said no to Henrik when he suggested that I should admin the project because he was too busy studying, but that's another story...
Weber



Joined: 16 Nov 2003
Posts: 1

Posted: Sun Nov 16, 2003

This filter list gets rid of everything; I haven't seen an ad in weeks (after upgrading to the newest Adblock... good work).

[AdBlock]
*/ad/*
*/ads/*
*/click/*
*/fastclick/*
*ADV*
*adlog*
*ads*
*adsdk*
*adserver*
*adtech*
*advertising*
*annone*
*atdmt*
*banners*
*click.*
*doubleclick*
*image.ugo.com*
*m2*
*sob*
*spinbox*
*viewad*
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Sun Nov 16, 2003

The fourth pass of build 17 (posted last evening) allows unanchored simple-expressions.
.
This means you can replace *spinbox* with spinbox -- and it works just fine.
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Mon Nov 17, 2003

The idea of the ultimate filter list is not bad, but I do not think the wheel should be totally re-invented. This is not the only ad blocking software using regular expressions in its block file. Have a look at this thread I started as a guest. Rue very kindly converted the list into the correct format for AdBlock as the original used a slightly different dialect of regular expressions. It is a fairly large list and I have no idea what performance hit a PC would take when using it. I have not timed it. There may well be other lists already available on the web.

One thing which has to be watched is oversimplification of the block file as you possibly see on Weber's list. If my understanding is correct, *ads* would block www.dadsuk.co.uk, www.dadsaregreat.com and sites with things like madscientist or ADSL in the URL etc; *ADV* would block URLs with advice, advantage and so on.
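A sketch of this warning, assuming an unanchored simple filter like *ads* behaves like a case-insensitive substring test. The bounded pattern is a simplified version of the [\W\d] filters posted later in this thread, and the URLs are made up:

```javascript
// Assumed meaning of the unanchored simple filters *ads* and *ADV*:
// a case-insensitive "contains" test.
const looseAds = /ads/i;
const looseAdv = /adv/i;
// Simplified bounded version: "ad" or "ads" with a non-word character
// or a digit on each side.
const boundedAds = /[\W\d]ads?[\W\d]/i;

const urls = [
  "http://www.dadsuk.co.uk/",
  "http://example.com/ADSL/offers.html",
  "http://example.com/advice.html",
  "http://example.com/ads/top.gif",
];

for (const url of urls) {
  const loose = looseAds.test(url) || looseAdv.test(url);
  console.log(url, "loose:", loose, "bounded:", boundedAds.test(url));
}
```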


Last edited by NJH on Thu Nov 27, 2003; edited 1 time in total
Org



Joined: 23 Oct 2003
Posts: 349

Posted: Mon Nov 17, 2003

NJH wrote:
One thing which has to be watched is oversimplification of the block file as you possibly see on Weber's list. If my understanding is correct, *ads* would block www.dadsuk.co.uk, www.dadsaregreat.com and sites with things like madscientist or ADSL in the URL etc; *ADV* would block URLs with advice, advantage and so on.

So very true. No matter what kind of list you make, be sure to take this point into consideration. (If you take a closer look at the filter list I posted earlier in this thread, you can see that I have tried to avoid this kind of false blocking.)
Guest






Posted: Wed Dec 10, 2003    Post subject: one more

/http://ad\d*\./

blocks things like

http://ad.server.com/blabla
http://ad1.server.com/blabla
http://ad42.server.com/blabla

and so on.
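A sketch of this filter in JavaScript. Written as a JavaScript regex literal, the slashes inside http:// would have to be escaped, so this builds the same pattern from a string with the RegExp constructor. The last two URLs are made-up non-matches:

```javascript
// Pattern text taken from between the outer slashes of the filter. In a
// string literal "\\d" and "\\." become the regexp escapes \d and \.
const adHost = new RegExp("http://ad\\d*\\.");

const urls = [
  "http://ad.server.com/blabla",   // matches
  "http://ad1.server.com/blabla",  // matches
  "http://ad42.server.com/blabla", // matches
  "http://adobe.com/",             // no dot right after "ad" + digits
  "http://www.ad.com/",            // "ad" does not follow "http://" directly
];

for (const url of urls) {
  console.log(adHost.test(url), url);
}
```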
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Wed Dec 10, 2003

I suppose it couldn't hurt -- some further refinements, lifted straight from my list:

/[\W\d]ad(server|s)?[\W\d]/ -- catches .ad., /ad., cgi?ads=19, etc.
/[\W\d]banner(s|id\=)[\W\d]/ -- general banner-stuff
/\D\d{2,3}x\d{2,3}\D/ -- catches banner-sizes in source: 192x74
http://babelfish.altavista.com/babelfish/urltrtop -- removes babelfish's annoying upper-frame
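A quick test of the first three patterns against made-up URLs. Note that the banner pattern, as written, requires "s" or "id=" after banner, so a plain /banner.gif is not caught. The "\=" is simply an escaped "=":

```javascript
const adServer = /[\W\d]ad(server|s)?[\W\d]/;      // .ad., /ad., cgi?ads=19
const bannerStuff = /[\W\d]banner(s|id\=)[\W\d]/;  // "\=" is just "="
const bannerSize = /\D\d{2,3}x\d{2,3}\D/;          // sizes such as 468x60

const checks = [
  [adServer, "http://www.ad.example.com/"],              // true
  [adServer, "http://example.com/cgi?ads=19"],           // true
  [adServer, "http://example.com/header.gif"],           // false
  [bannerStuff, "http://example.com/banners/top.gif"],   // true
  [bannerStuff, "http://example.com/show?bannerid=42"],  // true
  [bannerStuff, "http://example.com/banner.gif"],        // false: needs "s" or "id="
  [bannerSize, "http://example.com/img/468x60.gif"],     // true
  [bannerSize, "http://example.com/img/logo.gif"],       // false
];

for (const [pattern, url] of checks) {
  console.log(pattern.test(url), url);
}
```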
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Wed Dec 10, 2003

Try something like /[^a-z]ad(s?|v)[^a-z]/. It gets rid of the words ad, ads and adv bounded by any non-letter, which includes ".", "/" and any digit. This means it also gets rid of things like ad0.
A fuller filter is /[^a-z]ad(s?|serv(er|e)?|v)[^a-z]/

For banners, I use two filters to pick up words beginning with banner or ending with banner(s):
/[^a-z]banner/
/banners?[^a-z]/

From previous correspondence, I think your [\W\d] is broadly equivalent to my [^a-z].
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Wed Dec 10, 2003

NJH:
Using [\W\d] is not only easier on the eyes, but also independent of Adblock's case-insensitivity.
.
Also, you've repeated some earlier mistakes: /[^a-z]ad(s?|serv(er|e)?|v)[^a-z]/
First, an OR-set shouldn't contain zero-match cases if anything follows the set. In other words, (s?|..)[^a-z] should be (s|..)?[^a-z]. Second, you've ordered the set so that s? is tried before serv. Recall that you have to order alternatives by decreasing complexity.
.
In the end, the pattern better reads: /[\W\d]ad(serv(er|e)?|s|v)?[\W\d]/
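Both versions can be compared directly on the cases NJH lists. This sketch adds the "i" flag to imitate Adblock's case-insensitive matching. One real difference shows up with underscores: "_" counts as a word character, so it is matched by [^a-z] but not by [\W\d]:

```javascript
// NJH's filter and rue's reordered version, with the "i" flag added to
// imitate Adblock's case-insensitive matching.
const njh = /[^a-z]ad(s?|serv(er|e)?|v)[^a-z]/i;
const rue = /[\W\d]ad(serv(er|e)?|s|v)?[\W\d]/i;

const cases = [
  "/ad/", "/ads/", "/ad0", "/adv/", "/adserv/", "/adserve/", "/adserver/", // NJH's list
  "/adobe.gif", "/badge.gif", // should not match
  "/ad_banner.gif",           // "_" is a word character
];

for (const s of cases) {
  console.log(s, njh.test(s), rue.test(s));
}
```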
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Wed Dec 10, 2003

rue,

Interesting comments. I will change my filters to [\W\d]. I was going to have a go sometime at re-working my filter to get the ? of the s? outside the OR set, but I did test what I posted, following our last correspondence, even though I do not necessarily understand why it works. My filter matches /ad/, /ads/, /ad0 (or any other digit), /adv/, /adserv/, /adserve/ and /adserver/. It also matches other leading and trailing characters.

If you can explain why it works when I could not get something similar to work in our previous correspondence I would be interested.

Nick
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Thu Dec 11, 2003

NJH:
In the JavaScript Console, try these two cases:
alert("vx".match(/(s?|v)/));
alert("vx".match(/(s?|v)x/));
The first correctly reports an empty, yet positive match: s? succeeds with zero occurrences at the first character it encounters, so the set stops testing.
.
The second, however, catches the "v". Apparently sets continue testing if something follows the set. Whether this is a hidden-rule, or a bug, it's rather illogical. I wouldn't advise relying on it.
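For reference, the second result is ordinary backtracking, which is how the ECMAScript specification defines alternation: the empty s? alternative is tried first, but when the following "x" then fails against "v", the engine goes back and tries the next alternative, v. In the first pattern nothing follows the group, so the empty match is accepted at once. The same two cases, using console.log instead of alert:

```javascript
const first = "vx".match(/(s?|v)/);
const second = "vx".match(/(s?|v)x/);

console.log(JSON.stringify(first));  // ["",""]
console.log(JSON.stringify(second)); // ["vx","v"]
```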
Guest






Posted: Wed Dec 17, 2003

Does anyone have a simple but effective filter list that WON'T kill http://www.pvponline.com ?
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Wed Dec 17, 2003

Guest (tychoquad):
Scroll up, and copy the entries posted here.
.
Then, add these:
/[\W\d](ad|dime|double|fast|value|click)(stream|s|thrutraffic|thru|xchange|click)[\W\d]/
http://pagead2.googlesyndication.com
Guest






Posted: Wed Dec 17, 2003

Thank you rue, this is exactly what I was after.

The only simple/effective list I have been able to find that didn't kill every image in existence was that Mac one you posted a while ago. All the others were so long that they lagged my whole computer every time Firebird loaded a page.

A big problem here is that I don't think anyone except you knows how to properly build a short and effective list. I have no idea what all this expression stuff is about, even after reading the stuff at adblock.mozdev.org
Henrik
Owner


Joined: 22 Oct 2003
Posts: 33
Location: Copenhagen, Denmark

Posted: Wed Dec 17, 2003

Quote:
A big problem here is that I don't think anyone except you knows how to properly build a short and effective list. I have no idea what all this expression stuff is about, even after reading the stuff at adblock.mozdev.org


Remember that regular expressions are optional, and should only be used by people who are comfortable with them.

There is still the option of using the simpler wildcard (*) filters.
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Wed Dec 17, 2003

Guest:
It just takes a little patience -- thinking things through.
.
Having access as we do "behind-the-scenes", Henrik and I are acutely aware of the overhead for larger lists. For elements that aren't blocked, the entire list must be tested against them; if you're not careful, the penalty can be significant.
.
Check out the RegExp tutorials we came up with for Bethrezen, here. You'll find many links to further tutorials, as well.
TomChiverton



Joined: 11 Nov 2003
Posts: 3

Posted: Thu Dec 18, 2003

Please don't put 'banner.*' in the generic block list; perfectly nice sites often name or serve their navigation banners like that :-)
Guest






Posted: Tue Jan 06, 2004

rue wrote:

/[\W\d]ad(server|s)?[\W\d]/ -- catches .ad., /ad., cgi?ads=19, etc.
/[\W\d]banner(s|id\=)[\W\d]/ -- general banner-stuff
/\D\d{2,3}x\d{2,3}\D/ -- catches banner-sizes in source: 192x74


Those are interesting.
Would you mind posting your whole list?
Guest






Posted: Wed Jan 07, 2004

Also, how would I set up my filters so that they don't block this image: http://winamp.com/images/home/winamp5ad.jpg

(I am using rue's filters posted earlier in this thread.)

thx
kstahl
Support


Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Thu Jan 08, 2004

Change
/[\W\d]ad(server|s)?[\W\d]/
to
/\Wad(server|s)?[\W\d]/

(removing the first \d)
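A check of this change on the winamp image and a few made-up ad URLs: with \W in front, a digit directly before "ad" no longer counts as a boundary, while the trailing [\W\d] still catches hosts like ad0.

```javascript
const original = /[\W\d]ad(server|s)?[\W\d]/;
const changed = /\Wad(server|s)?[\W\d]/; // leading \d removed

const winamp = "http://winamp.com/images/home/winamp5ad.jpg";
const ads = [
  "http://example.com/ad.gif",
  "http://ad0.example.com/banner.gif",
  "http://example.com/cgi?ads=19",
];

console.log(original.test(winamp), changed.test(winamp));
console.log(ads.map((url) => changed.test(url)));
```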
Guest






Posted: Fri Jan 09, 2004

thx
Guest






Posted: Sat Jan 10, 2004

Hi all,

I've run into another false-positive problem with this same rule, though you need to control the server side to reproduce it at will. Say:
my_site.com/something.php?PHPSESSID=70b25adf7aeb5

The "ad" inside the hex parameter will be filtered by the rule.

I have fixed this using:
/[\W\d]ad(server|s)?[\W\d](\/|\.)/
That is, specifying that the match must be followed by a slash (a folder) or by a dot (a file).
Now my problem is that, from reading this thread, it seems that "ad=..." is sometimes passed as a parameter. Or is it?

Your comments?

Another problem is that neither version of the rule filters content from www.smartadserver.com, which I don't really understand.
kstahl
Support


Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Sat Jan 10, 2004

/[\W\d]ad(server|s)?[\W\d](\/|\.)/

The [\W\d] part _requires_ a non-alphabet character or a digit after the adserver part of the name _followed_ by a slash or a period.

www.smartadserver.com will not be a match, but say www.smartadserver7.com would.

A modified version which would filter both is
/[\W\d]ad(server|s)?[\W\d]?(\/|\.)/
kstahl
Support


Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Sat Jan 10, 2004

Sorry, brain fart.

A better variant would be:
/[\/\.]ad(server|s)?\d*[\/\.]/
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Sun Jan 11, 2004

Guest:
/[\W\d]ad(server|s)?[\W\d]/ doesn't match your proposed url: ..=70b25adf7..
.
It would match if the url contained another digit immediately after "ad". To avoid this, just continue this thread's earlier edit, removing the trailing digit-match:
/\Wad(server|s)?\W/
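Checking this reading of the session-ID case. The second session ID is a made-up variant with a digit right after "ad"; the last line shows that dropping the trailing \d also stops catching hosts like ad0, which is the case NJH raises in the next post:

```javascript
const rule = /[\W\d]ad(server|s)?[\W\d]/;
const ruesFix = /\Wad(server|s)?\W/; // trailing \d removed as well

const sessionId = "my_site.com/something.php?PHPSESSID=70b25adf7aeb5";
// Made-up variant with a digit right after "ad":
const digitAfterAd = "my_site.com/something.php?PHPSESSID=70b25ad7faeb5";

console.log(rule.test(sessionId), rule.test(digitAfterAd));       // false true
console.log(ruesFix.test(sessionId), ruesFix.test(digitAfterAd)); // false false
console.log(ruesFix.test("http://ad0.example.com/"));              // false
```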
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Sun Jan 11, 2004

kstahl wrote:
/[\W\d]ad(server|s)?[\W\d](\/|\.)/

The [\W\d] part _requires_ a non-alphabet character or a digit after the adserver part of the name _followed_ by a slash or a period.

www.smartadserver.com will not be a match, but say www.smartadserver7.com would.

A modified version which would filter both is
/[\W\d]ad(server|s)?[\W\d]?(\/|\.)/


kstahl wrote:
Sorry, brain fart.

A better variant would be:
/[\/\.]ad(server|s)?\d*[\/\.]/


I do not think any of these filters would match www.smartadserver.com or www.smartadserver7.com as all the filters require a "/" or "." or, for one of the filters, a number immediately before the "adserver" bit of smartadserver. The character before adserver is "t" so all tests will fail.

Also, from what I have seen, I would still allow a trailing digit match as I get advertisements like http://ad0. etc. (I think I have only ever seen a 0, but I am not sure)
kstahl
Support


Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Sun Jan 11, 2004

NJH wrote:

I do not think any of these filters would match www.smartadserver.com or www.smartadserver7.com as all the filters require a "/" or "." or, for one of the filters, a number immediately before the "adserver" bit of smartadserver. The character before adserver is "t" so all tests will fail.


Oops.

/[\/\.](smart)?ad(server|s)?\d*[\/\.]/

NJH wrote:

Also, from what I have seen, I would still allow a trailing digit match as I get advertisements like http://ad0. etc. (I think I have only ever seen a 0, but I am not sure)


Yep, that's what the \d* is for. "Match zero or more digits."
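Testing NJH's observation and the final variant against made-up URLs on those hosts: the earlier variant misses both smartadserver hosts because a "t" sits in front of "ad", while the last one catches them through the optional (smart) group.

```javascript
const earlier = /[\W\d]ad(server|s)?[\W\d]?(\/|\.)/;
const latest = /[\/\.](smart)?ad(server|s)?\d*[\/\.]/;

const urls = [
  "http://www.smartadserver.com/call.js",
  "http://www.smartadserver7.com/call.js",
  "http://ad0.example.com/banner.gif",
  "http://example.com/ads/top.gif",
];

for (const url of urls) {
  console.log(url, earlier.test(url), latest.test(url));
}
```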
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Mon Jan 12, 2004

Earlier Guest:
You asked for my complete filter list. I don't think it's necessary to post it all, but here are a few more, including some refinements of earlier postings:
http://pagead2.googlesyndication.com/ -- universally heinous -!
http://forums.mozillazine.org/images/avatars/5085478903f36e1443c9ee.jpg -- person i never want to see
yimg.com/*.js -- yahoo's ad-scripts..
us.yimg.com/a/ -- ..and ad-images
/\/buy_assets\//
/[\W\d](top|bottom|left|right)?banner(s|id=|\d)[\W\d]/ -- general banner-stuff (improved)
The improved breakup of NJH's super-filter (less unnecessary recursion)
/[\W\d](double|fast)click[\W\d]/
/[\W\d]click(stream|thrutraffic|thru|xchange)[\W\d]/
/[\W\d]value(stream|xchange|click)[\W\d]/
/[\W\d]dime(xchange|click)[\W\d]/
/[\W\d](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/

[Edited for line-length --rue]
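A quick test of the last, combined filter with made-up URLs. Note that it also catches http://winamp.com/images/home/winamp5ad.jpg, because the leading [\W\d] accepts the digit 5:

```javascript
const rueBig = /[\W\d](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/;

const urls = [
  "http://example.com/adframe.js",                // caught
  "http://example.com/onlineads/x.gif",           // caught
  "http://example.com/advertisements/x.gif",      // caught
  "http://example.com/adimage.gif",               // caught
  "http://winamp.com/images/home/winamp5ad.jpg",  // caught: "5" counts as a boundary
  "http://example.com/adobe.pdf",                 // not caught
  "http://example.com/header.gif",                // not caught
];

for (const url of urls) {
  console.log(rueBig.test(url), url);
}
```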


Last edited by rue on Sat Jan 24, 2004; edited 3 times in total
MW
Guest





Posted: Mon Jan 12, 2004

/\.((bfast|bluestreak|pointroll)|(fast|double|value)click)\.(com|net)/
/(\/|\.)a(d|ds|dServer|d-flow)(\.|\/)/
/\/servedby.advertising.com\//
.yimg.com/a/


Those are the ones I use, and I can't remember an ad I didn't want to see..... I also check them for false positives (don't remember the last time I got one)..... No doubt someone can "tighten" them up a bit, as I know virtually nothing about regexp.
MW
Guest





Posted: Mon Jan 12, 2004

and this one too....

/\/pagead2.googlesyndication.com\//
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Mon Jan 12, 2004

MW:
I know you wouldn't necessarily know this, but simple filters that include a complete host name are matched faster than any other filter. This is because the host names of all such filters are copied into a hash and key-matched against the element's host.
.
So http://pagead2.googlesyndication.com/ matches faster than /\/pagead2.googlesyndication.com\//.
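A minimal sketch of the idea, assuming only filters of the form http://host/ (no wildcards, no path) go into the hash; a JavaScript Set stands in for it, and buildHostSet and isBlockedHost are hypothetical names, not Adblock's. Each lookup is a single hash probe instead of a pattern test per filter:

```javascript
// Hypothetical helpers illustrating host-name lookup (not Adblock's code).
// Only filters of the form "http://host" or "http://host/" are hashed;
// anything with a wildcard or a path is left for the slower pattern tests.
function buildHostSet(simpleFilters) {
  const hosts = new Set();
  for (const filter of simpleFilters) {
    const m = /^https?:\/\/([^\/*]+)\/?$/i.exec(filter);
    if (m) hosts.add(m[1].toLowerCase());
  }
  return hosts;
}

// One Set lookup per element, however many host filters there are.
function isBlockedHost(address, hosts) {
  return hosts.has(new URL(address).hostname);
}

const hosts = buildHostSet([
  "http://pagead2.googlesyndication.com/",
  "http://ads.example.net",
  "*/ads/*", // wildcard filter: not hashed
  "http://forums.mozillazine.org/images/avatars/5085478903f36e1443c9ee.jpg", // has a path: not hashed
]);

console.log(hosts.size);
console.log(isBlockedHost("http://pagead2.googlesyndication.com/pagead/show_ads.js", hosts));
console.log(isBlockedHost("http://www.example.com/index.html", hosts));
```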


Last edited by rue on Sun Jan 18, 2004; edited 1 time in total
MW
Guest





Posted: Mon Jan 12, 2004

Thanks for the tip on the filter http://pagead2.googlesyndication.com ... I didn't realize that.

BTW, what's up with http://forums.mozillazine.org/images/avatars/5085478903f36e1443c9ee.jpg
Did she break your heart??? :( :(
ValkRaider
Guest





Posted: Wed Jan 14, 2004    Post subject: How about Privoxy?

Has anyone looked at the Privoxy default.action file and tried to incorporate some of its rules into Adblock rules?

It seems like they could be pretty good, I had good luck with Privoxy on the computers I have used it on... (I like Adblock a bit better as a general style issue though).

Here are privoxy's "general" rules:
Code:
######################################################################
#
#  File        :  $Source: /cvsroot/ijbswa/current/default.action.master,v $
#
#  $Id: default.action.master,v 1.1.2.26 2003/07/11 03:20:34 hal9 Exp $
#
#  Purpose     :  Default actions file, see
#                 http://www.privoxy.org/user-manual/actions-file.html
#
#  Copyright   :  Written by and Copyright
#                 Privoxy team. http://www.privoxy.org/
#
# Note: Updated versions of this file will be made available from time
#       to time. Check http://sourceforge.net/project/showfiles.php?group_id=11118
#       for updates and/or subscribe to the announce mailing list
#       (http://lists.sourceforge.net/lists/listinfo/ijbswa-announce) if you
#       wish to receive an email notice whenever updates are released.


snip

Code:
#############################################################################
# Generic block patterns (the most effective!):
#############################################################################
{+block}

# By hostname:
#
ad*.
.*ads.
*banner*.
count*.

# By path:
#
/(.*/)?(ads(erver?|tream)?|.*?ads/|ad(images|cycle|rotate|mentor)?/|
      adv(iew|ert(s|enties|is(ing|e?ments)?)?)?|(ad|all|nn|db|promo(tion)?)?
      [-_]?banner(s|ads?|farm)?)
/(.*/)?(publicite|werbung|rekla(ma|me|am)|annonse|maino(kset|nta|s)?/)
/.*(count|track|compteur|adframe)(er|run)?\.(pl|cgi|exe|dll|asp|php[34]?|cpt)
/.*promo.gif


[Edited for line-length --rue]
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Fri Jan 16, 2004

Look closely at that list -- in particular, here:
Code:
# By hostname:
#
ad*.
.*ads.
*banner*.
count*.

Those expressions aren't even "semi-regexp" -- they're just wrong. Easiest advice: stay away from 3rd-party lists.
Guest






Posted: Mon Jan 19, 2004

rue wrote:

/[\W\d](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/
.

/[\W\d](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|type|vertisements?|v|xchange)?)[\W\d]/

Added 'type' in there to catch some extra stuff.
Markfive
Guest





Posted: Fri Jan 23, 2004    Post subject: Amazon

Here's an Amazon filter I created. Any ideas/suggestions?

/amazon\.com.*\W(promotions|marketing|merchants|stores|associates)\W/
Dan
Guest





Posted: Sat Jan 24, 2004

Er, OK, having read the above, I'm none the wiser...

What would you suggest as the best basic Adblock filter list, then?
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Sat Jan 24, 2004

Dan:
I'd say this:
    rue wrote:
    /\D\d{2,3}x\d{2,3}\D/ -- catches banner-sizes in source: 192x74
..and these.
Guest






Posted: Sun Jan 25, 2004

/[\W\d](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/

catches

http://winamp.com/images/home/winamp5ad.jpg

suggestions?
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Sun Jan 25, 2004

Going on the observation that most or all ads I've seen do not start with a number, I no longer use /[\W\d] at the start of my filters. I normally use /\W. This rule is the one exception where I use /[\W_] at the beginning because I have seen ads of the form "something_ad/somethingelse" and this still picks them up. This should allow your URL.

HTH
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Mon Jan 26, 2004

Guest:
Regardless of prefix-digits' commonality, winamp5ad is a very unique exception. If a legitimate picture was named adimage.jpg, it too would be filtered. And I'd offer no apology.
.
If the extreme-exceptions seem bothersome, Adblock likely isn't your cup of tea.
Guest






Posted: Mon Jan 26, 2004

No, it's OK, I was just wondering if there was an easy way to fix it. Guess not. Well then, maybe a whitelist could be developed?
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Mon Jan 26, 2004

Guest, you wrote

Anonymous wrote:
No, it's OK, I was just wondering if there was an easy way to fix it. Guess not. Well then, maybe a whitelist could be developed?


I responded

NJH wrote:
Going on the observation that most or all ads I've seen do not start with a number, I no longer use /[\W\d] at the start of my filters. I normally use /\W. This rule is the one exception where I use /[\W_] at the beginning because I have seen ads of the form "something_ad/somethingelse" and this still picks them up. This should allow your URL.

HTH


Try this:

/\W(onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/

or this:

/[\W_](onlineads?|ad(banner|click|frame|images?|js|log|serv(er|e)?|stream|_string|s|trix|vertisements?|v|xchange)?)[\W\d]/

as I suggested. Although rue does not favour these forms of his filter, they should work.
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Tue Jan 27, 2004

NJH:
It's not that I dislike your variant. Rather, I just think it unnecessary.
.
I've seen a few false-positives with my filters. But the urls being caught in each case were absurdly close to what an ad might be. Rather than cripple the effectiveness toward catching real ads, I'm happy to entertain a few wrong matches.

.
ps: yes, whitelists are coming. hang in there.


Last edited by rue on Tue Jan 27, 2004; edited 1 time in total
Guest






Posted: Tue Jan 27, 2004

sorry. I missed your post. Thanks for the suggestions. I'll try that.
jim
Guest





Posted: Tue Jan 27, 2004

/.{150,}/

Only web bugs need that many characters.
Guest






Posted: Wed Feb 11, 2004

Taken from MyIE2. I haven't seen a single ad with this list:

Code:
[Adblock]
*.ad-*
*.ad.*
*/ad.*
*/ad/*
*/adbot.*
*/adc_*
*/adclient.*
*/adcouncil/*
*/adframe.*
*/adgifs/*
*/adgraph/*
*/adimages/*
*/adinfo*
*/adlog.*
*/adlog/*
*/adrotator.*
*.ads-*
*.ads.*
*/ads.*
*/ads/*
*/advert*
*/adview.*
*/housead/*
*/liveads/*
*/phpads/*
*/softad/*
*/sponsor/*
*/sponsors/*
*/tj_bs
*/tracker/*
*_ad_*
*_borders/*
*_superad*
*a.p.f.qz.*
*a.r.tv.*
*a.tribalfusion.*
*-ad.cgi*
*ad_type*
*adbot*
*adclick*
*adclix*
*adclub*
*adcycle*
*adflight*
*ad-flow*
*adimage*
*adknowledge*
*adlink*
*admaximize*
*admex*
*admonitor*
*adpulse*
*adrunner*
*-ads/*
*adserv*
*adsoftware*
*adswap*
*af.lygo.*/*
*aureate*
*avenuea*
*banner*
*bilbo.counted.*
*bluestreak.*
*burstmedia*
*burstnet*
*clickxchange*
*counter*.bravenet.*
*doubleclick*
*focalink*
*hitbox*
*hitexchange*
*hitlist*
*hitsites*
*houseads_*
*i.imdb*
*i.us.rmi.yahoo.*
*imaginemedia*
*linkads*
*linkexchange*
*linkshare*
*linksynergy*
*media.fastclick*
*paycounter*
*radiate*
*realtracker.*
*secure.webconnect*
*servedby.advertising.*
*spinbox.versiontracker.*
*spylog*
*thecounter*
*trafic.ro/*
*us.a1.yimg.*
*us.f.yahoofs.*
*valueclick*
*view.atdmt*
*adtomi.*
*.linkbuddies.*
*.qksrv.*
*x.mycity.*
*z.about.*
*zdmcirc*

Guest

Posted: Wed Feb 11, 2004

Maybe I'm not understanding how Adblock works, but wouldn't that list slow down page load time?

rue
Developer
Joined: 22 Oct 2003
Posts: 752

Posted: Wed Feb 11, 2004

Guest:
Adblock automatically ignores head / end wildcards for Simple Filters.
.
What's more, using them (as above) provides an easy way to include the forward-slash (/) at both ends of the filter -- without making a regexp.
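To see why head and tail wildcards cost nothing, here is a hypothetical sketch of converting a Simple Filter into a regexp. It is not Adblock's actual code, and the function name `simple_to_regex` is made up. Filters are searched for anywhere in the URL, so a leading or trailing `*` adds nothing and can be dropped. Only inner `*` need to become `.*`.

```python
import re

def simple_to_regex(simple: str):
    """Hypothetical Simple Filter -> regexp conversion (illustration only).

    Leading/trailing '*' are dropped, because an unanchored search already
    allows anything around the match; the rest is escaped and inner '*'
    become '.*'. re.IGNORECASE mimics Adblock's case-insensitivity.
    """
    core = simple.strip("*")
    return re.compile(".*".join(re.escape(part) for part in core.split("*")), re.IGNORECASE)

ads_dir = simple_to_regex("*/ads/*")
print(ads_dir.pattern)                                             # /ads/
print(bool(ads_dir.search("http://example.com/ads/banner.gif")))   # True
print(bool(ads_dir.search("http://example.com/loads/page.html")))  # False

counter = simple_to_regex("*counter*.bravenet.*")
print(counter.pattern)                                             # counter.*\.bravenet\.
```

This also shows the trick rue mentions: `*/ads/*` reduces to the plain substring `/ads/`, with a slash at both ends, and no regexp has to be written by hand.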

Melgund
Guest

Posted: Wed Feb 11, 2004    Post subject: Simple vs regular expressions

Is a longish list of simple expressions slower than a much shorter set of regular expressions that block the same stuff?

rue
Developer
Joined: 22 Oct 2003
Posts: 752

Posted: Wed Feb 11, 2004

Melgund:
A regexp can compress shared "patterns" for speed, but for unrelated terms it doesn't gain much.

Ex: ad, adstuff, adworld -> slower than -> /ad(stuff|world)/
:
Ex: un, related, terms -> nearly same -> /un|related|terms/
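Here is a small check of the idea, using made-up URLs. Folding adstuff and adworld into /ad(stuff|world)/ blocks exactly the same URLs as listing the two terms separately, and it needs one search per URL instead of one per term. A bare `ad` entry is left out, because on its own it would already match both longer terms. The check covers equivalence only; it doesn't measure speed.

```python
import re

urls = [
    "http://example.com/adstuff/1.gif",
    "http://example.com/adworld/2.gif",
    "http://example.com/news/today.html",
]

# Two separate filters, each searched over the URL on its own...
separate = [re.compile(term, re.IGNORECASE) for term in ("adstuff", "adworld")]
# ...versus one filter that shares the common "ad" prefix.
combined = re.compile(r"ad(stuff|world)", re.IGNORECASE)

def blocked_by_any(url: str) -> bool:
    return any(p.search(url) for p in separate)

for url in urls:
    # Same verdict either way.
    assert blocked_by_any(url) == bool(combined.search(url))
    print(blocked_by_any(url), url)
```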

Guest

Posted: Fri Feb 13, 2004

Anonymous wrote:
Maybe I'm not understanding how Adblock works, but wouldn't that list slow down page load time?


Yes, it would.

Guest

Posted: Fri Feb 13, 2004

Here's my list:

Code:
[Adblock]
*/ad/*
*/adimages/*
*/ads/*
*/adserver/*
*/advertisement.gif
*/advertising/*
*/banner/*
*/banners/*
*/fastclick/*
*/offsite-banners/*
*/phpAdsNew/*
*/promo/*
*/servlet/*
*/smartserve/*
*/sponsor/*
*/sponsors/*
*/viewad/*
/BannerSource/
/adframe/
/banners.php/
/banners/
/clickserve/
/doubleclick/
/hitbox.com/
/linkexchange.com/
http://*.adserver.com/*
http://205.180.85.40/*
http://69.57.136.40/*
http://a.as-us.falkag.net/*
http://a12.g.akamai.net/*
http://a619.g.akamai.net/*
http://ad.linkexchange.com/*
http://adfarm.mediaplex.com/*
http://adimg.cnet.com/*
http://adlog.com.com/*
http://ads.*
http://adserv.bravenet.com/*
http://ar.atwola.com/*
http://as1.falkag.de/*
http://bannerimages.*
http://banners.*
http://banners.dot.tk/*
http://bitzi.com/*
http://c1.zedo.com/*
http://cdn.valueclick.com/*
http://creativeby.viewpoint.com/*
http://cserver.mii.instacontent.net/*
http://stuff-access.com/*
http://gfx.dvlabs.com/klipmart/*
http://http.content.ru4.com/*
http://image.weather.com/creatives/ONDCP/*
http://image.weather.com/creatives/match/*
http://image.weather.com/creatives/wanderlodge/*
http://itxt.vibrantmedia.com/*
http://klipmart.dvlabs.com/*
http://lisa.belointeractive.com/*
http://mads.zdnet.com/mac-ad?*
http://majorgeeks.com/rm/*
http://media.fastclick.net/*
http://media.pointroll.com/*
http://mirror.pointroll.com/*
http://pagead.googlesyndication.com/*
http://pagead2.googlesyndication.com/pagead/ads?*
http://partners.ditto.com/*
http://s0b.bluestreak.com/*
http://servedby.advertising.com/*
http://spd.atdmt.com/*
http://us.a1.yimg.com/*
http://us.yimg.com/a/ya/*
http://view.atdmt.com/*
http://warp.crystalad.com/*
http://www.eyeblaster-bs.com/BurstingPipe/BannerSource.asp?*
http://www.flowgo.com/*
http://www.neowin.net/phpAdsNew/*
http://www.qksrv.net/*
http://www.resellerratings.com/*
http://www.tech-critic.com/dogger%5B1%5D.gif
http://www.tutorialcentral.com/php/adv/*
http://www3.bannerspace.com/*
http://www.fileplanet.com/flash/tickerstats.swf
http://rcm.amazon.com/e/cm?t=httpwwwwincuc-20&p=13&o=1&l=ez&f=ifr
http://www.dscripting.com/forums/style_images/mohaa-974/logo4.swf
http://xslt.alexa.com/site_stats/js/s/a?url=www.neowin.net
http://srv01.addaddy.net/*
http://ad2.ip.ro/*
http://pagead2.googlesyndication.com/*
http://storage.trafic.ro/js/*
http://log.trafic.ro/cgi-bin/*
http://www.neowin.net/images/buttons/survey-120x60.gif
http://213.158.116.18/template/sideads/Sky600_120_1.jpg
http://64.65.56.27/~fosi/casino.gif
http://64.65.56.27/~fosi/viagraworld.jpg
http://64.65.56.27/~fosi/hcmoviestation_200x200_01.gif
http://sc.communities.msn.com/themes/pby/img/promotelogo.gif
http://ad.linksynergy.com/*
http://service.bfast.com/*
http://www.dgm2.com/m/i_defaultB.asp?contid=2015&rand=[TIMESTAMP]
http://forums.overclockers.co.uk/ocukimages/nvidia.jpg
http://forums.overclockers.co.uk/ocukimages/creative.gif
http://netshelter.adtrix.com/*
http://a1964.g.akamaitech.net/*
http://rcm.amazon.com/*
http://adserver.ign.com/*
http://a.tribalfusion.com/*
http://www.tech-critic.com/klip/tcklip.bmp
http://www.zend.com/ads*
http://webpdp.gator.com/*
http://213.158.116.18/template/sideads/gigahosting_small.gif
http://66.79.191.80/~modchip/newbanneragain.gif
http://213.158.116.15/template/sideads/Sky600_120_1.jpg
http://213.158.116.15/template/sideads/gigahosting_small.gif
http://213.158.116.15/template/sideads/steadfast_small.gif
http://home.tiscali.nl/maple/template/sideads/Sky600_120_1.jpg
http://home.tiscali.nl/maple/template/sideads/gigahosting_small.gif
http://mediamgr.ugo.com/*
http://www.ugo.com/*
http://secure-us.imrworldwide.com/*
http://usads.vibrantmedia.com/*

rue
Developer
Joined: 22 Oct 2003
Posts: 752

Posted: Sat Feb 14, 2004

Anonymous wrote:
Anonymous wrote:
Maybe I'm not understanding how Adblock works, but wouldn't that list slow down page load time?


Yes, it would.

Guest:
Actually it wouldn't. As I explained, the latest builds ignore beginning / ending wildcards in Simple Filters. The list is fine.

Guest

Posted: Sat Feb 14, 2004

Hi. I use the Agnitum Outpost firewall and can see filtered adverts in its log. Here are the statistics for two days' usage (~2.5 hours):
the firewall log wrote:
Code:

CLICK                 212
BANNER                 92
COUNT.                 86
88 x 31                49
ADVERT                 46
TOP100.                40
SPYLOG.                31
/AD.                   16
STAT.                  16
/ADV.                  15
100 x 100              15
.COUNT                 13
468 x 60               11
HOTLOG.                11
DOUBLECLICK.NET         9
/TOP100/                7
TOPCTO.                 7
/STAT/                  6
RB2.DESIGN.RU/CGI-BIN   6
/ADS/                   5
.STAT                   4
BANNERS.                4
/ADV/                   3
BIGMIR.NET/?CL          3
/BB.CGI                 2
/CNT.                   2
/REKLAMA/               2
BS.YANDEX.RU            2
LINKEXCHANGE.RU         2
/SPONSORS.              1
/SPONSORS/              1
?AD=                    1
125 x 125               1
234 x 60                1
HTTP://BANNER.          1
                                   

So I tried to make a list that roughly follows that log:
Code:
[Adblock]
/(?:hot|spy)log/
/[\W\d_](?:php|page)?ad(?:_id|click|frame|ima?ge?|log|s|server|space|url|v|vert)?[\W\d_]/
/[\W_]b(?:an|nr)s?[\W_]/
/[\W_]jump[\W_]/
/[\W_]redir(?:ect|s)?[\W_]/
/[\W_]stat[\W_]/
/\D\d{2,3}x\d{2,3}\D/
/\W(?:cy|r)?c(?:ou)?nt(?:er|ed)?\W/
/akamai/
/banner/
/bb\.cgi/
/by\.banclk/
/cl(?:ic)?k/
/partner/
/ping\.cgi/
/promotion/
/reklama/
/sponsor/
/spymagic/
/top(?:100|cto)/
http://www.bigmir.net/?cl=

Don't bring up false positives here (with /banner/, for example, which matches many different strings containing 'banner'): I use /banner/ and other simple patterns without major problems. As for the http://pagead2.googlesyndication.com/ entry in your advanced filters, my hosts file has:
Code:
127.0.0.1   pagead.googlesyndication.com
127.0.0.1   pagead1.googlesyndication.com
127.0.0.1   pagead2.googlesyndication.com
127.0.0.1   pagead3.googlesyndication.com

...and much more: 24,368 hand-picked entries in total. At one point I had about 500,000 entries blocking almost every porn site, but I then removed many that were old and defunct (I don't visit porn sites anyway).

Also I don't want to be spied on, so I use something like:
Code:
/\W(?:cy|r)?c(?:ou)?nt(?:er|ed)?\W/
/[\W_]stat[\W_]/

...and so on. Be careful with the latter, though, as it isn't optimised.

I once used the rather heavy /[\W_](?:double|fast|js|show)?cl(?:ic)?ks?[\W_]/
...but gave up on it and now just use /cl(?:ic)?k/

rue, does it make any difference in Adblock whether I use (?:blah_blah) or (blah_blah)?
How did you implement the matching?

rue
Developer
Joined: 22 Oct 2003
Posts: 752

Posted: Sat Feb 14, 2004

Guest:
Excellent post. I'd never seen anything but pagead2.google.. -- the filter's now generalized to the principal domain.
.
Adblock filters with the JavaScript test() method, which only reports whether a pattern matches, so captured groups (backrefs) are never used. Suppressing capture with (?:...) just adds characters, and so overhead, to pref storage and retrieval.
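rue's point can be checked directly. A yes/no test gives the same answer for `(hot|spy)log` and `(?:hot|spy)log`. That is what JavaScript's `RegExp.prototype.test()` returns, and `bool(pattern.search(url))` is the Python analogue used here. The only difference between the two forms is whether the group's text is recorded, and a test() call never reads it. The URLs are made up.

```python
import re

capturing = re.compile(r"(hot|spy)log", re.IGNORECASE)
non_capturing = re.compile(r"(?:hot|spy)log", re.IGNORECASE)

urls = [
    "http://counter.example.com/spylog/c.gif",
    "http://counter.example.com/hotlog/c.gif",
    "http://www.example.com/blog/index.html",
]

for url in urls:
    # Same match / no-match verdict for both forms.
    assert bool(capturing.search(url)) == bool(non_capturing.search(url))

# Only the capturing form remembers which alternative matched.
print(capturing.search(urls[0]).groups())      # ('spy',)
print(non_capturing.search(urls[0]).groups())  # ()
```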

Tristan2
Joined: 08 Jan 2004
Posts: 1
Location: Germany, Osnabrueck

Posted: Sat Feb 14, 2004

I need help.
Why does this advanced filter

/[\W\d_]ad(server|s|vertisement|vertising|vert|river)?\W/

not match this:

http://img.onvista.de/gen/advertisement_de.gif

Tristan

rue
Developer
Joined: 22 Oct 2003
Posts: 752

Posted: Sat Feb 14, 2004

Tristan:
To clarify: "Advanced filters" is just a label chosen for the extracted list -- to set it apart.
.
As for your filter: "advertisement" is followed by an underscore, and \W doesn't match underscores. Just copy what you did at the beginning:
/[\W\d_]ad(server|s|vertisement|vertising|vert|river)?[\W_]/
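Running both versions against Tristan's URL shows the difference. The check uses Python's `re`, which treats `\W` and `_` the same way JavaScript does for ASCII text. `\W` excludes the underscore, so only the `[\W_]` version matches.

```python
import re

url = "http://img.onvista.de/gen/advertisement_de.gif"

original = re.compile(r"[\W\d_]ad(server|s|vertisement|vertising|vert|river)?\W", re.IGNORECASE)
fixed = re.compile(r"[\W\d_]ad(server|s|vertisement|vertising|vert|river)?[\W_]", re.IGNORECASE)

# "advertisement" is followed by "_", a word character, so \W rejects it.
print(bool(original.search(url)))  # False
# [\W_] accepts the underscore as well.
print(bool(fixed.search(url)))     # True
```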

Guest

Posted: Sun Feb 22, 2004

I did some snooping through the page source of Geocities sites and came up with the following filter to get rid of the Geocities ads: .geocities.com/js_source

I don't understand why it wasn't added before, but there's probably a reason and I'm going to look like an idiot.

MW
Guest

Posted: Sun Feb 22, 2004

That might interfere with some of the page functionality... The filter I use for Geocities is:
Code:
/\.geocities\.com/js_source/(ygNSLib9|pu5geo)\.js/

Guest
Guest

Posted: Mon Feb 23, 2004

It seems this filter:

Quote:
/[\W_](b(an|nr)s?|jump|redir(ect|s)?|stat)[\W_]/


makes you lose some of the real content on IMDB (www.imdb.com), like the movie posters. I still can't figure out exactly why.

Guest

Posted: Wed Mar 03, 2004

I use Org's filters plus a couple of site-specific ones. Works great.

Org wrote:
[Adblock]
.yimg.com/a/
/(ad-flow|adsdk|advertising|bizrate|prohosting|resellerratings|tradedoubler|\.atwola|\.atdmt|valueclick)\.com/
/(doubleclick|fastclick|spinbox)\.net/
/[.\/]adcontent[.\/]/
/[\/_]banner([sz]?\/|it\/|\.cgi|\.pl|farm\/|link\.|_pysty\.gif|.?\d*\.)/
/\/(page|online)ad[sz]?\//
/\/\?DC=/
/\/ad(_banner|[zs\?]|click|image.php|image[zs]?|server?)?\//
/\/ad(js.php|size=)/
/aslframe.html
/bd.m?
/\/bnr\//
/_ad[sz]?\.(js|gif|jpg|swf)/
/http:\/\/(mainos|rcm-images)\./
/http:\/\/ad([sz]?|images?|img|serv(er?)?)\./
/http:\/\/banner[sz]?\./
/imdb\.com\/(google\/|.*\.swf)/
http://www.theregister.co.uk/media/
http://www.nvnews.net/images/advertising/
http://mediamgr.ugo.com/
http://falk.speedera.net/

Hope that helps :)

Guest

Posted: Thu Mar 04, 2004

Just FYI, Google has changed its ad scheme. Not sure how it's done now... I HATE THOSE ADS!

adave
Guest

Posted: Fri Mar 26, 2004    Post subject: combining multiple /[^a-zA-Z]ad[^a-zA-Z]/

Anonymous (Thu Nov 06, 2003) wrote:
my general filters:
Code:

/[^a-zA-Z]ad[^a-zA-Z]/
/[^a-zA-Z]ads[^a-zA-Z]/
etc...

[^a-zA-Z] means anything except the letters a-z and A-Z, so /[^a-zA-Z]ad[^a-zA-Z]/ would block objects containing /ad.jpg and print.html?ad=true, but won't block dad.jpg, mad.gif, etc.

Hi, I came to this forum through a Google search for "adblock regular expressions examples", looking for examples of regexps.

I like the idea of this method and have just combined many into one reg exp:
Code:

/[^a-zA-Z]([Aa]ds|adlog|adserver|advertise|advertising|adverts|fastclick|googlesyndication|servedby)[^a-zA-Z]/

kstahl
Support
Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Fri Mar 26, 2004

You could use [\W\d_] instead of [^a-zA-Z]; it looks much cleaner. Also, you don't need [Aa], since Adblock isn't case-sensitive.
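Both of kstahl's remarks can be checked mechanically. The sketch compares the two classes over all 128 ASCII characters; `re.ASCII` makes `\W` and `\d` ASCII-only, as they are in JavaScript. It then uses `re.IGNORECASE` to stand in for Adblock's case-insensitive matching. The filter is a shortened version of adave's, and the URLs are made up.

```python
import re

ascii_chars = [chr(code) for code in range(128)]

# ASCII characters matched by each class.
not_a_letter = {c for c in ascii_chars if re.fullmatch(r"[^a-zA-Z]", c)}
non_word_digit_or_underscore = {c for c in ascii_chars if re.fullmatch(r"[\W\d_]", c, re.ASCII)}

# Identical sets: both mean "anything except an ASCII letter".
assert not_a_letter == non_word_digit_or_underscore

# With case-insensitive matching, "ads" also covers "Ads" and "ADS",
# so a separate [Aa] is unnecessary.
short_filter = re.compile(r"[\W\d_](ads|adlog|adserver|fastclick)[\W\d_]", re.IGNORECASE | re.ASCII)
print(bool(short_filter.search("http://example.com/Ads/top.gif")))     # True
print(bool(short_filter.search("http://example.com/dads/home.html")))  # False
```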

asd
Guest

Posted: Wed Mar 31, 2004

Why doesn't this filter, /[\W\d](top|bottom|left|right)?banner(s|it|forum|id=|\d)[\W\d]/, remove the banner in this code:
Code:
<a href="/cgi/goto.php?banner_id=25"><img src="/dyn/banner/25/1.gif?1080616225" width=468 height=60 alt="" border=1>

Org
Joined: 23 Oct 2003
Posts: 349

Posted: Wed Mar 31, 2004

The string "banner_id" inside "<a href>" doesn't match, because your regexp doesn't match the "_".

The string "banner" inside "<img>" doesn't match, because your regexp requires it to be followed by one of the strings "s", "it", "forum", "id=" or a single digit.

Try this one:

/[\W\d](top|bottom|left|right)?banner(s|it|forum|_?id=)?[\W\d]/
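Org's explanation can be confirmed by running both patterns against the two paths from asd's snippet, as they appear in the HTML. The check uses Python's `re`, whose syntax matches JavaScript's for these patterns.

```python
import re

paths = [
    "/cgi/goto.php?banner_id=25",        # from the <a href>
    "/dyn/banner/25/1.gif?1080616225",   # from the <img src>
]

original = re.compile(r"[\W\d](top|bottom|left|right)?banner(s|it|forum|id=|\d)[\W\d]", re.IGNORECASE)
suggested = re.compile(r"[\W\d](top|bottom|left|right)?banner(s|it|forum|_?id=)?[\W\d]", re.IGNORECASE)

for path in paths:
    # The original needs a suffix right after "banner" and has no room for "_";
    # the suggestion makes the suffix optional and allows "_id=".
    print(path, bool(original.search(path)), bool(suggested.search(path)))
```

The original pattern matches neither path; the suggested one matches both.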

asd
Guest

Posted: Wed Mar 31, 2004

OK, I have now made some general blocking rules. Please tell me if something is not correct:

/\W(top|right|bottom|left|side)?ad(forum|click|s|server)\W/

So the ?-symbol makes the top, right, etc. words in front of "ad" optional? But is the \W also optional, so that it would also block things like sadforum? And how do I make it also block a bare "ad", like /ad/? Should I put \W inside those brackets too?

Guest

Posted: Wed Mar 31, 2004

asd wrote:
OK, I have now made some general blocking rules. Please tell me if something is not correct:

/\W(top|right|bottom|left|side)?ad(forum|click|s|server)\W/

So the ?-symbol makes the top, right, etc. words in front of "ad" optional? But is the \W also optional, so that it would also block things like sadforum? And how do I make it also block a bare "ad", like /ad/? Should I put \W inside those brackets too?


What your filter matches is any string which

1. Starts with a non-word character. That means any character except letters, digits, or _ (underscore). This is what the \W means.

2. That character is possibly followed by one of the words in the parentheses. The "possibly" part comes from the ? after the parentheses; the | separating the words inside is an OR operator.

3. Then comes the letters "ad".

4. They must be followed by one of the words in the last parentheses, because there is no ? after it.

5. Finally the string must end with another non-word character.

So, to answer your questions: no, it will not block "sadforum", because there is no room for that leading s. To make it block /ad/, you need to put a ? after the last parentheses to make their contents optional.

Brackets are also an OR operator, meaning "one of the (single) characters inside". Thus [abc] means "a OR b OR c".

To improve on your filter I would use this one:
/[\W_](top|right|bottom|left|side)?ad(forum|click|s|server)?[\W\d_]/

It means that the match can also start and end with an underscore, as in /banner_ad/ or /big_ad.jpg. It can also end in a digit (\d), as in /adbanner6.gif (where the "/adbanner6" part would be matched by the filter).

There are very good regexp guides available on the net. Check out the stickies at the top of the forum, or read the earlier posts in this thread. You'd also get bazillions of hits on Google.
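The points above can be checked against a handful of made-up URLs using Python's `re`. Each entry records whether asd's filter and the improved filter are expected to match. The last entry, /adbanner6.gif, matches neither, because nothing in the improved filter allows "banner" to follow "ad". kstahl points this out himself further down the thread.

```python
import re

asd_filter = re.compile(r"\W(top|right|bottom|left|side)?ad(forum|click|s|server)\W", re.IGNORECASE)
improved = re.compile(r"[\W_](top|right|bottom|left|side)?ad(forum|click|s|server)?[\W\d_]", re.IGNORECASE)

# URL -> (expected match for asd_filter, expected match for improved)
cases = {
    "http://example.com/sadforum/": (False, False),    # no room for the leading "s"
    "http://example.com/ad/": (False, True),           # bare "ad" needs the optional group
    "http://example.com/topadserver/": (True, True),
    "http://example.com/banner_ad/": (False, True),    # "_" before "ad"
    "http://example.com/big_ad.jpg": (False, True),
    "http://example.com/adbanner6.gif": (False, False),
}

for url, expected in cases.items():
    actual = (bool(asd_filter.search(url)), bool(improved.search(url)))
    assert actual == expected, url
    print(url, actual)
```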

kstahl
Support
Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Wed Mar 31, 2004

(Yes, it was me that posted the above.)

Here's a decent page on regexps:
http://www.regular-expressions.info/reference.html

These are some of my own filters:

Code:

/[\W_]ad(click|cycle|image|js|server|s|tech|trix|vert(ising)?|words)?[\W\d_]/
/[\W_]banner(s|id=)?[\W\d_]/
/\?clickTAG\d?=/
/\D(588|468|234|120)x600?\D/
/\W(affiliates|annons(er)?|associates|marketing|promos)\W/
/\W(double|fast|value)click[\W\d]/
/\W(ned|one)stat(basic)?\W/
/\W(page|side|value)ads\W/
/\Wat(dmt|wola)\W/
/\Wgoogle(adservices|syndication)\W/

hri
Guest

Posted: Wed Mar 31, 2004    Post subject: Try this

Works great for me.
My strategy is to catch as many ads as possible with generic patterns built around ad, banner, promo and click. For the rest I rely on domain- or directory-specific filters.


[Adblock]
/(new|double|fast|value)click./
/[\/.]ban(ner|nerfarm|neri|image|source)s?[\/.]/
/[\/._-](|dhtm|i)ad(banner||mentor|s|s2|sdk|sv3|server|trix|image|img|log|vt|bureau|counter|v|vert|vertising|vertisement)s?[\/._?-]/
/[\/._]promo(|tion)s?[\/._]/
/[\/](associates|affiliates|us.yimg.com\/a)[\/]/
/[\W\d\/.](203.199.70.2|2o7|atdmt|atwola|bfast|bluestreak|coremetrics|dgm2|falkag|hitbox|marketbanker|qksrv|ru4|tribalfusion|zedo)[\W\d\/.]/

asd
Guest

Posted: Wed Mar 31, 2004

Thank you kstahl. You answered my questions perfectly.

kstahl
Support
Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Wed Mar 31, 2004    Post subject: Re: Try this

asd: You're welcome. I just noticed I messed up though. My example /adbanner6.gif would of course not get blocked, as the "banner" part is missing in the filter. D'oh!


hri wrote:

/[\/.]ban(ner|nerfarm|neri|image|source)s?[\/.]/



hri: Those brackets look strange to me. You need to prefix the . with a \, otherwise it's parsed as a special regexp character meaning "any character", and I doubt that's what you want. You might as well remove the brackets altogether then, as they'd match anything.

Org
Joined: 23 Oct 2003
Posts: 349

Posted: Wed Mar 31, 2004    Post subject: Re: Try this

kstahl wrote:
hri: Those brackets look strange to me. You need to prefix the . with a \ otherwise it's parsed as a special regexp character meaning "any character"

Actually, no. You don't have to quote . inside a character class. I think it was rue who had tested this and posted about it.
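Org is right, and it's easy to confirm. Inside a character class `.` is an ordinary character, so `[\/.]` means "a slash or a literal dot". A bare `.` outside a class matches any character. The check below uses Python's `re` on made-up URLs; JavaScript behaves the same way here.

```python
import re

# Inside a class the dot is literal...
assert re.fullmatch(r"[.]", ".")
assert re.fullmatch(r"[.]", "x") is None
# ...outside a class it matches any character.
assert re.fullmatch(r".", "x")

hri_banner = re.compile(r"[\/.]ban(ner|nerfarm|neri|image|source)s?[\/.]", re.IGNORECASE)
print(bool(hri_banner.search("http://example.com/banners/1.gif")))   # True
print(bool(hri_banner.search("http://example.com/xbannerx/1.gif")))  # False
```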

kstahl
Support
Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Thu Apr 01, 2004

Thanks, I must've missed that. I guess it makes sense when you think about it.

hri
Guest

Posted: Thu Apr 01, 2004

Thanks Org.

kstahl, I like your /\D(588|468|234|120)x600?\D/. Thanks.

I might add sizes to my list to handle the ads I haven't been able to get rid of.

Guest

Posted: Thu Apr 01, 2004    Post subject: Problem with filter

I noticed the filter /\D\d{2,3}x\d{2,3}\D/ tends to block a lot of site logos that are not ads at all. In most, if not all, cases these logos actually have the word "logo" in the file name. Could someone propose a filter that works the same way but excludes images with the word "logo" in them?

kstahl
Support
Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Thu Apr 01, 2004    Post subject: Re: Problem with filter

Anonymous wrote:
I noticed the filter /\D\d{2,3}x\d{2,3}\D/ tends to block a lot of site logos that are not ads at all. In most, if not all, cases these logos actually have the word "logo" in the file name. Could someone propose a filter that works the same way but excludes images with the word "logo" in them?


That's why I use specific sizes in my filter. The generic approach gives too many false positives.

Guest

Posted: Thu Apr 01, 2004    Post subject: Re: Problem with filter

kstahl wrote:
Anonymous wrote:
I noticed the filter /\D\d{2,3}x\d{2,3}\D/ tends to block a lot of site logos that are not ads at all. In most, if not all, cases these logos actually have the word "logo" in the file name. Could someone propose a filter that works the same way but excludes images with the word "logo" in them?


That's why I use specific sizes in my filter. The generic approach gives too many false positives.


Could you explain your filter to me? I'm looking at it, and I know that ? means optional, but I'm not positive what the ? in your script is making optional. Is it making the whole statement "x600?" optional? If this is the case, what is your reasoning in making this optional?

Also, what are your numbers based on? They don't look like they come from the firewall log above.

Guest

Posted: Thu Apr 01, 2004

I've extended your filter a little to include the examples from the firewall log above. It seems to be working nicely.
Where did you get your size related numbers from, just personal experience?

kstahl
Support
Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Thu Apr 01, 2004

Yes, the sizes are just from personal experience and the web pages I regularly visit.

The ? is making the second 0 in 600 optional so it matches 60 too.
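Here is a quick check of the optional 0 using made-up URLs. `600?` matches "60" or "600". The surrounding `\D` requires a non-digit on each side, so 468x605 is not mistaken for 468x60.

```python
import re

sizes = re.compile(r"\D(588|468|234|120)x600?\D", re.IGNORECASE)

cases = {
    "http://example.com/ads/468x60.gif": True,       # "60": optional 0 absent
    "http://example.com/sky_120x600.swf": True,      # "600": optional 0 present
    "http://example.com/walnut_150x75.png": False,   # 150 is not in the list
    "http://example.com/468x605.gif": False,         # trailing \D rejects the 5
}

for url, expected in cases.items():
    assert bool(sizes.search(url)) == expected, url
    print(bool(sizes.search(url)), url)
```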

Guest

Posted: Thu Apr 01, 2004

What do you think of /\D([^l][^o][^g][^o]).*\d{2,3}x\d{2,3}\D/
I THINK this will block anything in the form (number)x(number) unless it has the word logo in it? Let me know if this will work as I expect it.

Guest

Posted: Thu Apr 01, 2004

Actually, that doesn't work at all. Isn't there any way to negate something so that the filter will fail if it is in it? like !(logo) or something? I'm looking at the regular expression definitions at http://www.regular-expressions.info/reference.html and I don't see anything. It doesn't make a lot of sense to not have a not in regular expressions.
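There is a "not" of sorts: the negative lookahead, `(?!...)`. The sketch below uses Python's `re` and made-up URLs. It first shows why `[^l][^o][^g][^o]` doesn't help: that class sequence only demands four characters that are individually not l, o, g, o, and almost any URL has those. It then anchors a negative lookahead at the start, so the whole URL is rejected if "logo" appears anywhere in it. This is a suggested alternative, not something tested in Adblock here. JavaScript has supported lookahead `(?!...)` since ECMAScript 3, but lookbehind `(?<!...)` only arrived in ECMAScript 2018.

```python
import re

logo_url = "http://example.com/images/logo_150x75.png"
ad_url = "http://example.com/ads/banner_468x60.gif"

# The [^l][^o][^g][^o] attempt: any four characters that are individually
# not l, o, g, o will do, so the logo URL still matches.
attempt = re.compile(r"\D([^l][^o][^g][^o]).*\d{2,3}x\d{2,3}\D", re.IGNORECASE)

# A negative lookahead anchored at the start of the URL: if "logo" occurs
# anywhere, the lookahead fails and so does the whole filter.
no_logo = re.compile(r"^(?!.*logo).*\D\d{2,3}x\d{2,3}\D", re.IGNORECASE)

print(bool(attempt.search(logo_url)))  # True  (not what was wanted)
print(bool(no_logo.search(logo_url)))  # False
print(bool(no_logo.search(ad_url)))    # True
```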

Guest

Posted: Thu Apr 01, 2004

Ok, I think I have it now:
/\D(?<!logo).*\d{2,3}x\d{2,3}.*(?!logo)\D/
I THINK this will block anything in the form (number)x(number) (2- or 3-digit numbers only) unless it contains "logo" before or after it. It seems to work. Could anyone find an ad in the form (number)x(number) for me so I can test it out?

kstahl
Support
Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Fri Apr 02, 2004

http://www.aintitcoolnews.com has several flash banners of that type on the front page.

Guest

Posted: Fri Apr 02, 2004

OK, my regexp doesn't seem to be working right. It's not blocking those flash ads. It's advanced, so I wouldn't be surprised if Adblock doesn't parse it as I expect. I've given in and used a more specific filter:
/\D(((588|468|234|120)x600?)|88x31|100x100|125x125)\D/

Erigion
Guest

Posted: Sun Apr 04, 2004

OK, I imported the filter list from the guest who used his firewall log to decide what to block.

Code:
[Adblock]
/(?:hot|spy)log/
/[\W\d_](?:php|page)?ad(?:_id|click|frame|ima?ge?|log|s|server|space|url|v|vert)?[\W\d_]/
/[\W_]b(?:an|nr)s?[\W_]/
/[\W_]jump[\W_]/
/[\W_]redir(?:ect|s)?[\W_]/
/[\W_]stat[\W_]/
/\D\d{2,3}x\d{2,3}\D/
/\W(?:cy|r)?c(?:ou)?nt(?:er|ed)?\W/
/akamai/
/banner/
/bb\.cgi/
/by\.banclk/
/cl(?:ic)?k/
/partner/
/ping\.cgi/
/promotion/
/reklama/
/sponsor/
/spymagic/
/top(?:100|cto)/
http://www.bigmir.net/?cl=


The only problem is that this blocks quite a few previews on the Fx theme website on Texturizer, including all of Aaron Spuler's themes.

URLs for a few of the blocked previews:

http://www.cs.txstate.edu/~as1130/themes/previews/texturizer/smoke-fb-150x75.png
http://themes.mozdev.org/images/walnut_150x75.png
http://home.twcny.rr.com/jhalme/mozilla/previews/mz150x75.png

Does anyone know which filter is causing the problem and what can be done to allow those previews?

NJH
Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Sun Apr 04, 2004

Quote:
/\D\d{2,3}x\d{2,3}\D/


This is the problem. It blocks anything with two or three digits, followed by an "x", followed by another two or three digits. I also have problems with this filter, so I don't use it; there are too many false positives. I use very explicit filters for these types of images.

You can see which filters are blocking particular images by clicking "Adblock" on the status bar. Blocked images are red. If you select the blocked image then the filter blocking it appears in the New Filter box.
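To illustrate NJH's point, the sketch runs the generic size filter against Erigion's three preview URLs, plus one made-up banner URL. It does the same with the more explicit size list posted earlier in the thread. The generic pattern catches every preview, since each one contains 150x75. The explicit list only catches the sizes it names.

```python
import re

generic = re.compile(r"\D\d{2,3}x\d{2,3}\D", re.IGNORECASE)
explicit = re.compile(r"\D(((588|468|234|120)x600?)|88x31|100x100|125x125)\D", re.IGNORECASE)

previews = [
    "http://www.cs.txstate.edu/~as1130/themes/previews/texturizer/smoke-fb-150x75.png",
    "http://themes.mozdev.org/images/walnut_150x75.png",
    "http://home.twcny.rr.com/jhalme/mozilla/previews/mz150x75.png",
]
banner = "http://example.com/ads/468x60.gif"  # made-up ad URL

for url in previews:
    # Every preview is 150x75, which the generic filter catches...
    assert generic.search(url)
    # ...but 150x75 is not one of the explicitly listed sizes.
    assert explicit.search(url) is None

# Both filters still catch a standard 468x60 banner.
assert generic.search(banner) and explicit.search(banner)
```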

All times are GMT + 1 Hour
Page 1 of 5