 |
The Adblock Project Pull up a seat ...stay a while.
bene Guest
|
Posted: Tue Aug 17, 2004 Post subject: Between /\D\d{2,3}x\d{2,3}\D/ and specific sizes |
|
|
The following filter has seen a fair bit of discussion because it generates too many false positives:
/\D\d{2,3}x\d{2,3}\D/
A few attempts have been made to use the following to prevent blocking URLs with <text>:
(.(?!<text>))*
The use of (.(?!<text>))* was discussed in a different context:
http://aasted.org/adblock/viewtopic.php?t=143
That thread refers to the filter:
/wilderssecurity\.com\/yabbimages\/(.(?!bg))*\.jpg/
The aim is to avoid blocking URLs with "bg.jpg" at the end. The leading . is there because the Regex guide says "x(?!y)" matches "x" only if it is not followed by "y".
http://devedge.netscape.com/library/manuals/2000/javascript/1.5/guide/regexp.html#1010689
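A minimal sketch of how that filter behaves, assuming Adblock hands the filter text to a plain JavaScript RegExp and tests it against the full URL. Only the filter comes from the thread; the three URLs are invented for illustration.

```javascript
// Filter quoted from the linked thread, as a JavaScript regex literal.
const filter = /wilderssecurity\.com\/yabbimages\/(.(?!bg))*\.jpg/;

// Hypothetical URLs, made up for this sketch.
const urls = [
  "http://www.wilderssecurity.com/yabbimages/banner.jpg",
  "http://www.wilderssecurity.com/yabbimages/topbg.jpg",
  "http://www.wilderssecurity.com/yabbimages/bg.jpg",
];

// true means the filter matches, i.e. the image would be blocked.
const results = urls.map((url) => filter.test(url));
console.log(results); // true, false, true
```

The last result is the catch. In "bg.jpg" no character inside the group is followed by "bg", because the "/" in front of it is consumed by the literal part of the filter, so a file named exactly bg.jpg still matches.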
I have been trying to get a filter working using this technique, for example:
/(.(?!thumbs))*\D\d{2,3}x\d{2,3}\D/
Or
/(.(?!thumbs))+\D\d{2,3}x\d{2,3}\D/
Replacing the "0 or more" with "1 or more" is meant to require that something other than "thumbs" appear before the character before the digits. My test case is http://tex.f3d.com, a friend's blog that is intermittently available. The URLs look like:
http://tex.f3d.com/photos/thumbs/160x120/DSC00159.JPG
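A quick check of both attempts against that URL, as a sketch that assumes the filters are evaluated as ordinary unanchored JavaScript regular expressions. It prints the substring each filter actually matched.

```javascript
const url = "http://tex.f3d.com/photos/thumbs/160x120/DSC00159.JPG";

// The two attempts from the post.
const star = /(.(?!thumbs))*\D\d{2,3}x\d{2,3}\D/;
const plus = /(.(?!thumbs))+\D\d{2,3}x\d{2,3}\D/;

// match() returns the matched substring in element 0.
const starHit = url.match(star)[0];
const plusHit = url.match(plus)[0];
console.log(starHit); // thumbs/160x120/
console.log(plusHit); // thumbs/160x120/
```

Both filters still match, so the thumbnail is still blocked. Because the regex is unanchored, the engine is free to start the match at the "t" of "thumbs": that "t" is not followed by "thumbs", so the lookahead never gets a chance to reject the URL.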
Has anyone been able to add exclusions (or any sort of extension) to this beautiful filter? I'd like to see something along the lines of (.(?!(thumb(nail)?s?|logos?)))* in there.
Additional filter work to exclude tracking images and things that just don't seem right:
/\.(swf|gif)\?(.+=.+)+/
//.\?(.+=.+)+/
And an example of cleaner domain exclusions:
/(2o7|alexa|paypal|amazon|advertising|qksrv|linkexchange)\.(com|net)/
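A small sketch of what the first tracking filter and the domain filter match, assuming plain JavaScript RegExp semantics. The sample URLs are hypothetical. The second tracking filter is left out because its escaping is unclear as posted.

```javascript
// Filters copied from the post.
const cgiImage = /\.(swf|gif)\?(.+=.+)+/;
const domains = /(2o7|alexa|paypal|amazon|advertising|qksrv|linkexchange)\.(com|net)/;

// Hypothetical URLs, made up for this sketch.
const samples = [
  "http://ads.example.com/banner.gif?id=42",
  "http://www.example.com/images/logo.gif",
  "http://www.qksrv.net/image-123",
  "http://www.example.org/amazon/index.html",
];

// Each row is [matches cgiImage, matches domains].
const table = samples.map((u) => [cgiImage.test(u), domains.test(u)]);
console.log(table);
```

Only the first URL matches the CGI-parameter filter and only the third matches the domain filter. The last URL shows that "amazon" in a path is not enough, since the filter requires ".com" or ".net" right after the name.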
That's me for the moment.
/bene. |
 |
kstahl Support
Joined: 02 Jan 2004 Posts: 1202 Location: Stockholm, Sweden
|
Posted: Tue Aug 17, 2004 Post subject: |
|
|
I suggest waiting for whitelist functionality. Negating filters makes my head hurt.
_________________
Adblock 0.5.3.042
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1. Gecko/20051111 Firefox/1.5 |
 |
bene

Joined: 17 Aug 2004 Posts: 123 Location: Home, I think
|
Posted: Tue Aug 17, 2004 Post subject: WhiteLists ain't for me, yet. |
|
|
Waiting for whitelist doesn't really turn my crank in a 100% enjoyable way - I'm the "fire and forget" type, as can be seen with the "don't show me any gifs with CGI parameters." type rules, and my non-interest in explicit sizing exclusions.
I'm off to test the actual function of the wilderssecurity.com filter. We'll see if that was a real-world example.
/bene. |
 |
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Tue Aug 17, 2004 Post subject: |
|
|
bene:
Negating via lookbehind isn't completely possible with Mozilla. It's definitely not possible via standard parenthetical syntax. It's half-possible with incremental accounting for all conditions:
.
Goal: /(?<!dont)match/
.
a. d(?!ont)
b. do(?!nt)
c. don(?!t)
d. [^d]?o(?!nt)
e. [^d]?[^o]?n(?!t)
f. *incomplete non-match condition: ^
.
/(d(?!ont)|do(?!nt)|don(?!t)|[^d]?o(?!nt)|[^d]?[^o]?n(?!t)|^)match/
That's some convoluted syntax :P
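To see what "half-possible" means in practice, here is a sketch that compares the workaround with the real lookbehind. It assumes an engine that supports lookbehind (ES2018 or later), which the Mozilla builds of the time did not. The input strings are invented for illustration.

```javascript
// The workaround, copied from the post.
const workaround = /(d(?!ont)|do(?!nt)|don(?!t)|[^d]?o(?!nt)|[^d]?[^o]?n(?!t)|^)match/;
// The stated goal, written with real lookbehind syntax.
const goal = /(?<!dont)match/;

// Hypothetical inputs, made up for this sketch.
const inputs = ["match", "domatch", "donmatch", "dontmatch", "xmatch"];

const comparison = inputs.map((s) => ({
  input: s,
  goal: goal.test(s),
  workaround: workaround.test(s),
}));
console.table(comparison);
```

The two agree on the first four inputs: "match", "domatch" and "donmatch" match, "dontmatch" does not. They disagree on "xmatch", which the lookbehind accepts and the workaround rejects, because the workaround only accepts "match" at the start of the string or directly after a "d", "o" or "n".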
.
Lastly, the wilderssecurity solution worked, circa its entry-date.
Last edited by rue on Tue Aug 17, 2004; edited 1 time in total |
 |
bene

Joined: 17 Aug 2004 Posts: 123 Location: Home, I think
|
Posted: Tue Aug 17, 2004 Post subject: (.(?!<text>))* |
|
|
I've done more work with this filter, and I don't know that it'll actually work. For example, if I set a simple filter:
/(.(?!comics))*/
And visit Sluggy Freelance (http://www.sluggy.com), I want the following URL not to be blocked:
http://pics.sluggy.com/comics/040817a.gif
Problem is, the filter matches elsewhere - right from the start, "ht" is any character not followed by "comics", so there's a match, and it's excluded. So I get to thinking:
/(.(?!comics))+/[^/]*/
But somehow, this, too, blocks the URL, and I think because the filter matches right at the start again - "http:/" is any character not followed by "comics" one or more times, then followed by a / and then by zero or more non-/ characters. So next iteration:
/(.(?!comics))/[^/]+$/
Working backwards is easier on this one. The intent is to not block URLs that contain "comics" before a / that is then followed by one or more non-/ characters that lead to the end of the line. "http:/" doesn't match this one - it is followed by more than one slash. My only thought is that "omics/040817a.gif" matches - here is a URL that is any character not followed by "comics" and that matches the additional criteria. So next filter:
/(/(?!comics))/[^/]+$/
Looks good for a moment, then I realize that nothing is being blocked by this filter anymore - it's no longer matching URLs like:
http://www.sluggy.com/images/vcr0.gif
I've since messed around with wildcards a bit, but can't get to a situation where all but the matching URLs are blocked. Any suggestions?
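The four attempts above can be checked side by side. This is a sketch that assumes plain JavaScript RegExp semantics, with the inner slashes escaped so the filters are valid regex literals. The last pattern is not from the thread: it is a common workaround that anchors the lookahead at the start of the URL, and whether Adblock 0.5 accepted it is not confirmed here.

```javascript
const keep = "http://pics.sluggy.com/comics/040817a.gif"; // should not be blocked
const block = "http://www.sluggy.com/images/vcr0.gif"; // should be blocked

// The four filters from the post, in order.
const attempts = [
  /(.(?!comics))*/,
  /(.(?!comics))+\/[^\/]*/,
  /(.(?!comics))\/[^\/]+$/,
  /(\/(?!comics))\/[^\/]+$/,
];

// Each row is [matches keep, matches block]; the target is [false, true].
const outcome = attempts.map((re) => [re.test(keep), re.test(block)]);
console.log(outcome);

// Assumption, not from the thread: reject any URL that contains "comics"
// anywhere by running one lookahead from the start of the string.
const anchored = /^(?!.*comics)/;
console.log(anchored.test(keep), anchored.test(block)); // false true
```

The first three filters match both URLs and the fourth matches neither, which is what the post describes. The anchored form works because the lookahead is evaluated only once, at position 0, so the engine cannot slide the match to a position where "comics" is no longer ahead.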
/bene. |
 |
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Tue Aug 17, 2004 Post subject: |
|
|
bene:
Scroll back and read my reply.
.
You can't use negative-lookahead alone to any consequence, because it only works for characters which precede the negative-match. In other words: /.(?!s)/ would match true for "xs", because the 's' character is also tested, and the 's' itself is not followed by another 's'.
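The "xs" case can be reproduced directly. A minimal sketch, assuming standard JavaScript RegExp behaviour:

```javascript
// The engine first tries "x", which fails because it is followed by "s".
// It then retries one position later, where "s" is followed by nothing.
const hit = "xs".match(/.(?!s)/);
console.log(hit[0], hit.index); // s 1

// Pinning the character makes the lookahead meaningful again.
const pinned = /x(?!s)/.test("xs");
console.log(pinned); // false
```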
 |
bene

Joined: 17 Aug 2004 Posts: 123 Location: Home, I think
|
Posted: Tue Aug 17, 2004 Post subject: matches. |
|
|
rue:
Thanks for the pointers. It's definitely not the most elegant solution, but work-arounds tend to be kludgy. And I did notice the wilderssecurity site had gone through a serious redesign when I dropped by looking to test, which is why I had to pick a different testing site.
And you've gone and edited your post!!! Changed the order around a bit. I don't quite understand the use of "^" to indicate incomplete non-match condition - I've got JavaScript regex in my head, I guess, 'cause "^" indicates start of line unless it's in square brackets, when it's negation (as you've used for [^d] etc.).
Aren't filters b. and c. redundant? Filter a. will match any URL that has a "d" that is not followed by "ont". That covers all URLs that have a "do" that is not followed by an "nt", and all URLs that have a "don" that is not followed by a "t". With words:
dachsund a
donut a,b,c
don't a,b,c
dont
eldorado a,b
Is the "^" going to catch any incidental characters?
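The word table above can be reproduced with a few lines, assuming a, b and c are each tested on their own as standalone JavaScript regexes (not yet combined with "match"):

```javascript
// Sub-patterns a, b and c from rue's list, tested in isolation.
const parts = { a: /d(?!ont)/, b: /do(?!nt)/, c: /don(?!t)/ };

// The five words from the table, spelled as in the post.
const words = ["dachsund", "donut", "don't", "dont", "eldorado"];

// For each word, list which sub-patterns match it.
const hits = words.map((w) =>
  Object.keys(parts).filter((k) => parts[k].test(w)).join(",")
);
console.log(hits); // 'a', 'a,b,c', 'a,b,c', '', 'a,b'
```

Taken in isolation, every word matched by b or c is also matched by a, which is the basis of the redundancy question.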
I'm going to have to get my hands dirty with real world examples. I hadn't noticed your previous post when I posted my previous...
/bene. |
 |
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Tue Aug 17, 2004 Post subject: |
|
|
bene:
No, b and c aren't redundant. You have to realize that we're architecting a whitelist of possibilities against the negative match.
.
"do" is allowed. "don" is also allowed.
.
Lastly, we also want to match if there are no characters matching our negative-condition at all. That's where we fail. RegExp is forward-searching only, so we can't really know if there's a "do" preceding an "nt" once the search gets to the 'n'. Effectively, this nullifies our attempt. A weak work-around is to hope we have a string actually starting with whatever character we're checking, matching the "^" anchor. I moved this last, to allow for any possible matching to occur prior -- parentheticals stop searching once a child matches true.
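A short sketch of why a does not stand in for b and c once each alternative sits directly in front of "match". It assumes plain JavaScript RegExp semantics, and the input strings are invented for illustration. Each alternative consumes exactly the characters it names, so "d" followed by "match" is a different requirement from "do" followed by "match".

```javascript
// Each alternative on its own, placed directly in front of "match".
const onlyA = /d(?!ont)match/;
const onlyB = /do(?!nt)match/;
const onlyC = /don(?!t)match/;

const aOnDo = onlyA.test("domatch"); // "d" is followed by "o", not "match"
const bOnDo = onlyB.test("domatch");
const aOnDon = onlyA.test("donmatch");
const cOnDon = onlyC.test("donmatch");
console.log(aOnDo, bOnDo); // false true
console.log(aOnDon, cOnDon); // false true

// The "^" fallback only helps when the string starts with "match".
const caretPlain = /^match/.test("match");
const caretPrefixed = /^match/.test("xmatch");
console.log(caretPlain, caretPrefixed); // true false
```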
 |