The Adblock Project Forum Index -> Main

Between /\D\d{2,3}x\d{2,3}\D/ and specific sizes

 
bene
Guest

Posted: Tue Aug 17, 2004    Post subject: Between /\D\d{2,3}x\d{2,3}\D/ and specific sizes

The following filter has seen a fair bit of discussion because it generates too many false positives:

/\D\d{2,3}x\d{2,3}\D/
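A minimal TypeScript sketch of how the filter behaves. It assumes Adblock tests each regular-expression filter as an unanchored search against the full URL, the way JavaScript's `RegExp.prototype.test` works. Apart from the thumbnail URL quoted later in this post, the sample URLs are hypothetical.

```typescript
// The size filter under discussion: a non-digit, 2-3 digits, "x", 2-3 digits, a non-digit.
const sizeFilter: RegExp = /\D\d{2,3}x\d{2,3}\D/;

// Sample URLs. Only the thumbnail URL comes from this thread; the rest are hypothetical.
const sampleUrls: string[] = [
  "http://ads.example.com/banners/468x60/offer.gif", // typical banner size
  "http://ads.example.com/img/120x600.jpg", // typical skyscraper size
  "http://www.example.com/images/logo.gif", // no WxH pattern at all
  "http://tex.f3d.com/photos/thumbs/160x120/DSC00159.JPG", // a photo thumbnail
];

// test() searches anywhere in the string, so any WxH-shaped run counts.
const blockedUrls: string[] = sampleUrls.filter((url) => sizeFilter.test(url));

blockedUrls.forEach((url) => console.log("blocked:", url));
```

The thumbnail is caught for the same reason the banners are: "/160x120/" has exactly the shape the filter looks for, which is the kind of false positive this thread is about.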

A few attempts have been made to use the following construct to avoid blocking URLs that contain <text>:

(.(?!<text>))*

The (.(?!<text>))* construct was discussed in a different context:

http://aasted.org/adblock/viewtopic.php?t=143

That thread refers to the filter:

/wilderssecurity\.com\/yabbimages\/(.(?!bg))*\.jpg/

It is meant to avoid blocking URLs with "bg.jpg" at the end. The leading . is needed because, according to the regex guide, "x(?!y)" matches "x" only if it is not followed by "y".

http://devedge.netscape.com/library/manuals/2000/javascript/1.5/guide/regexp.html#1010689
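One detail is worth seeing concretely: in (.(?!bg))* the lookahead runs after each character is consumed, so it checks the text that follows that character and never the first character of the group itself. A small TypeScript sketch, using hypothetical image names under yabbimages/ (not taken from the actual site):

```typescript
// The wilderssecurity.com filter quoted above.
const wilders: RegExp = /wilderssecurity\.com\/yabbimages\/(.(?!bg))*\.jpg/;

// Hypothetical file names under the same directory.
const wildersBase: string = "http://www.wilderssecurity.com/yabbimages/";
const fileNames: string[] = ["logo.jpg", "topbg.jpg", "bg.jpg"];

fileNames.forEach((name) => {
  // true means the filter matches, i.e. the image would be blocked.
  console.log(name, wilders.test(wildersBase + name));
});
```

With these names, logo.jpg and bg.jpg match while topbg.jpg does not: the "p" in topbg.jpg fails its lookahead because "bg" follows it, but the "b" at the start of bg.jpg is consumed before anything checks for "bg".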

I have been trying to get a filter working using this technique, for example:

/(.(?!thumbs))*\D\d{2,3}x\d{2,3}\D/

Or

/(.(?!thumbs))+\D\d{2,3}x\d{2,3}\D/

The second version replaces "0 or more" with "1 or more", to require that something other than "thumbs" appears before the character preceding the digits. My test case is http://tex.f3d.com, a friend's blog that is intermittently available, with URLs like:

http://tex.f3d.com/photos/thumbs/160x120/DSC00159.JPG
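A quick TypeScript check suggests why neither version spares this thumbnail, assuming the filters are applied as unanchored RegExp searches against the full URL. With "*" the group may match zero characters, and with "+" the match can simply begin at the "t" of "thumbs", just past the "/" where the lookahead would fail.

```typescript
const thumbUrl: string = "http://tex.f3d.com/photos/thumbs/160x120/DSC00159.JPG";

// The two attempts from this post, verbatim.
const starVariant: RegExp = /(.(?!thumbs))*\D\d{2,3}x\d{2,3}\D/;
const plusVariant: RegExp = /(.(?!thumbs))+\D\d{2,3}x\d{2,3}\D/;

// Both still match, so the thumbnail would still be blocked.
console.log(starVariant.test(thumbUrl), plusVariant.test(thumbUrl));

// Where does the "+" variant's match begin, and what does it cover?
const plusMatch: RegExpExecArray | null = plusVariant.exec(thumbUrl);
if (plusMatch !== null) {
  console.log(plusMatch.index, plusMatch[0]);
}
```

The "+" variant's match is "thumbs/160x120/": the word the filter was meant to avoid ends up inside the match.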

Has anyone been able to add exclusions (or any sort of extension) to this beautiful filter? I'd like to see something along the lines of (.(?!(thumb(nail)?s?|logos?)))* in there.
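Here is a sketch of a different technique from the (.(?!...))* construct: a single negative lookahead placed at the very start and anchored with ^, so that it inspects the whole URL once before the size pattern is tried. JavaScript 1.5 already supports (?!...) (see the guide linked above). Two things are assumptions here: that Adblock tests filters against the complete URL string, and the sample URLs other than the thumbnail, which are hypothetical.

```typescript
// Size filter guarded by the exclusion list from this post. The lookahead runs
// once, at position 0, and fails the whole match if a listed word appears
// anywhere in the URL.
const guarded: RegExp = /^(?!.*(thumb(nail)?s?|logos?)).*\D\d{2,3}x\d{2,3}\D/;

// [url, should the filter match (i.e. block) it?]
const cases: Array<[string, boolean]> = [
  ["http://tex.f3d.com/photos/thumbs/160x120/DSC00159.JPG", false],
  ["http://ads.example.com/banners/468x60/offer.gif", true],
  ["http://www.example.com/logos/120x60.png", false],
  ["http://ads.example.com/468x60.gif?alt=thumbnail", false], // excluded word after the size
];

cases.forEach(([url, expected]) => {
  const actual: boolean = guarded.test(url);
  console.log(actual === expected ? "ok  " : "FAIL", url);
});
```

Nothing after the alternation is anchored, so inside this lookahead thumb(nail)?s? behaves the same as plain thumb, and logos? the same as logo. A bare "logo" will also exclude unrelated URLs that merely contain those letters.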

Additional filter work to exclude tracking images and things that just don't seem right:

/\.(swf|gif)\?(.+=.+)+/
//.\?(.+=.+)+/
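A small TypeScript check of the first of these filters, using hypothetical URLs. It also shows a side observation: because the pattern is not anchored at the end, one key=value pair already satisfies (.+=.+)+, so the trailing + on the group does not change which URLs match.

```typescript
// The GIF/SWF-with-query-string filter from the post.
const trackingFilter: RegExp = /\.(swf|gif)\?(.+=.+)+/;
// The same filter without the outer repetition, for comparison.
const simplified: RegExp = /\.(swf|gif)\?.+=.+/;

// Hypothetical URLs.
const trackingSamples: string[] = [
  "http://stats.example.com/pixel.gif?id=42&page=home", // GIF with parameters
  "http://www.example.com/intro.swf?lang=en", // Flash movie with a parameter
  "http://www.example.com/logo.gif", // plain GIF, no query string
  "http://www.example.com/photo.jpg?w=200", // query string, but not GIF/SWF
];

trackingSamples.forEach((url) => {
  console.log(trackingFilter.test(url), simplified.test(url), url);
});
```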

And an example of cleaner domain exclusions:

/(2o7|alexa|paypal|amazon|advertising|qksrv|linkexchange)\.(com|net)/
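Because this pattern is not tied to the host name, it matches the listed names anywhere in the URL. A TypeScript sketch with hypothetical URLs:

```typescript
// The domain filter from the post.
const domainFilter: RegExp = /(2o7|alexa|paypal|amazon|advertising|qksrv|linkexchange)\.(com|net)/;

// Hypothetical URLs.
const domainSamples: string[] = [
  "http://images.amazon.com/images/cover.jpg", // listed domain in the host
  "http://www.example.com/review?via=amazon.com", // listed domain only in the query string
  "http://www.example.com/amazon/cover.jpg", // the word alone, without .com/.net
  "http://www.notamazon.com/", // an unrelated host that merely ends in "amazon.com"
];

domainSamples.forEach((url) => console.log(domainFilter.test(url), url));
```

The second and fourth samples match too, which may or may not be what the filter author intends.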

That's me for the moment.

/bene.
kstahl
Support


Joined: 02 Jan 2004
Posts: 1202
Location: Stockholm, Sweden

Posted: Tue Aug 17, 2004    Post subject:

I suggest waiting for whitelist functionality. Negating filters makes my head hurt.
_________________
Adblock 0.5.3.042
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
bene



Joined: 17 Aug 2004
Posts: 123
Location: Home, I think

Posted: Tue Aug 17, 2004    Post subject: WhiteLists ain't for me, yet.

Waiting for whitelists doesn't really turn my crank in a 100% enjoyable way - I'm the "fire and forget" type, as you can see from the "don't show me any GIFs with CGI parameters" type of rules, and from my lack of interest in explicit size exclusions.

I'm off to test whether the wilderssecurity.com filter actually works. We'll see if it holds up in the real world.

/bene.
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Tue Aug 17, 2004    Post subject:

bene:
Negating via lookbehind isn't completely possible with Mozilla. It's definitely not possible with the standard parenthetical syntax, (?<!...). It's half-possible by incrementally accounting for every condition:
Goal: /(?<!dont)match/
.
a. d(?!ont)
b. do(?!nt)
c. don(?!t)
d. [^d]?o(?!nt)
e. [^d]?[^o]?n(?!t)
f. *incomplete non-match condition: ^

.
/(d(?!ont)|do(?!nt)|don(?!t)|[^d]?o(?!nt)|[^d]?[^o]?n(?!t)|^)match/
That's some convoluted syntax :P
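One way to probe an emulation like this is to compare it against a real negative lookbehind. JavaScript gained lookbehind in ES2018, long after this thread, so the comparison only works on a modern engine. In the TypeScript sketch below, the `fullEmulation` pattern is not from the thread: it is one hypothetical lookbehind-free alternative that inspects the up-to-four characters before "match". The probe strings are also illustrative.

```typescript
// rue's lookbehind emulation, verbatim from the post.
const rueEmulation: RegExp =
  /(d(?!ont)|do(?!nt)|don(?!t)|[^d]?o(?!nt)|[^d]?[^o]?n(?!t)|^)match/;

// The goal itself, as a real negative lookbehind. Built from a string so the
// source compiles for any TypeScript target; running it needs an ES2018+ engine.
const realLookbehind: RegExp = new RegExp("(?<!dont)match");

// Hypothetical lookbehind-free alternative: accept "match" at the start, or
// when the (up to four) characters right before it are not "dont".
const fullEmulation: RegExp = /(^|^t|^nt|^ont|[^t]|[^n]t|[^o]nt|[^d]ont)match/;

const probes: string[] = ["match", "dmatch", "dontmatch", "xtmatch", "ontmatch", "a-match"];

probes.forEach((s) => {
  console.log(s, rueEmulation.test(s), realLookbehind.test(s), fullEmulation.test(s));
});
```

Since "match" always follows each alternative, the lookaheads in the verbatim pattern are always satisfied, and it accepts "match" only directly after a d, o or n, or at the start of the string. That is why it rejects "xtmatch", "ontmatch" and "a-match", which the real lookbehind accepts.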
.
Lastly, the wilderssecurity solution worked as of the date it was posted.


Last edited by rue on Tue Aug 17, 2004; edited 1 time in total
bene



Joined: 17 Aug 2004
Posts: 123
Location: Home, I think

Posted: Tue Aug 17, 2004    Post subject: (.(?!<text>))*

I've done more work with this filter, and I'm not sure it will actually work. For example, if I set a simple filter:

/(.(?!comics))*/

And visit Sluggy Freelance (http://www.sluggy.com), I want the following URL not to be blocked:

http://pics.sluggy.com/comics/040817a.gif

The problem is that the filter matches elsewhere: right from the start, "h" is a character not followed by "comics" (and so is "t", and so on), so there's a match and the URL gets blocked. So I get to thinking:
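This is easy to confirm in TypeScript, assuming the filter is run as an unanchored RegExp search. The * lets the group match zero characters, so the pattern matches any string at all. On the comic URL, the greedy group simply stops just before the "/" that precedes "comics".

```typescript
const comicUrl: string = "http://pics.sluggy.com/comics/040817a.gif";
const firstTry: RegExp = /(.(?!comics))*/;

// Matches even the empty string, because "*" allows zero repetitions.
console.log(firstTry.test(""));

// On the comic URL the match starts at index 0 and runs up to, but not
// including, the "/" that is followed by "comics".
const firstMatch: RegExpExecArray | null = firstTry.exec(comicUrl);
if (firstMatch !== null) {
  console.log(firstMatch.index, firstMatch[0]);
}
```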

/(.(?!comics))+/[^/]*/

But somehow this, too, blocks the URL. I think it's because the filter again matches right at the start: "http:" is one or more characters, each not followed by "comics", then comes a / and then zero or more non-/ characters. So, next iteration:

/(.(?!comics))/[^/]+$/

Working backwards is easier on this one. The intent is not to block URLs that contain "comics" right before a / that is followed by one or more non-/ characters running to the end of the line. "http:/" can't match this one, because more slashes follow it. My only thought is that the tail of the URL matches: the "s" at the end of "comics" is a single character not followed by "comics", and "/040817a.gif" meets the rest of the criteria. So, next filter:

/(/(?!comics))/[^/]+$/

Looks good for a moment, then I realize that nothing is being blocked by this filter anymore - it's no longer matching URLs like:

http://www.sluggy.com/images/vcr0.gif
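Running the three later attempts against both Sluggy URLs makes each failure visible. This TypeScript sketch assumes Adblock drops the outer /.../ delimiters and compiles the rest with the RegExp constructor, so the inner / needs no escaping.

```typescript
const sluggyComic: string = "http://pics.sluggy.com/comics/040817a.gif"; // should NOT be blocked
const sluggyOther: string = "http://www.sluggy.com/images/vcr0.gif"; // should be blocked

// Attempts 2-4 from this post, with the outer /.../ delimiters removed.
const attempts: string[] = [
  "(.(?!comics))+/[^/]*",
  "(.(?!comics))/[^/]+$",
  "(/(?!comics))/[^/]+$",
];

attempts.forEach((source) => {
  const re: RegExp = new RegExp(source);
  const hit: RegExpExecArray | null = re.exec(sluggyComic);
  console.log(source, "| comic:", hit !== null ? hit[0] : "no match", "| other:", re.test(sluggyOther));
});
```

Attempt 2 matches "http://pics.sluggy.com/comics" from the very start of the comic URL. Attempt 3 matches "s/040817a.gif", the single "s" at the end of "comics" plus the file name. Attempt 4 can only match where two slashes in a row are followed by a slash-free tail running to the end, which neither URL has.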

I've since messed around with wildcards a bit, but I can't get to a point where everything except the comics URLs is blocked. Any suggestions?

/bene.
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Tue Aug 17, 2004    Post subject:

bene:
Scroll back and read my reply.
.
You can't use a negative lookahead on its own to much effect, because it only constrains the character that precedes the negative match. In other words, /.(?!s)/ matches "xs", because the 's' character is tested too: it is itself a character that is not followed by an 's'.
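Below, rue's /.(?!s)/ example is checked in TypeScript. It is followed by one common way around the problem that is not from this thread: a negative lookahead anchored with ^, which runs once against the whole string instead of after each consumed character. The /comics/ exclusion for the Sluggy URLs is a hypothetical illustration, and it assumes filters are tested against the full URL.

```typescript
// rue's example: the lookahead rejects "x" (it is followed by "s"),
// but the "s" itself is followed by nothing, so it matches.
const toy: RegExp = /.(?!s)/;
const toyMatch: RegExpExecArray | null = toy.exec("xs");
if (toyMatch !== null) {
  console.log(toyMatch.index, toyMatch[0]); // where the match landed
}

// Anchored alternative: match every Sluggy URL except the comic strips.
const notComics: RegExp = /^(?!.*\/comics\/)http:\/\/[^\/]*sluggy\.com\//;
console.log(notComics.test("http://pics.sluggy.com/comics/040817a.gif"));
console.log(notComics.test("http://www.sluggy.com/images/vcr0.gif"));
```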
bene



Joined: 17 Aug 2004
Posts: 123
Location: Home, I think

Posted: Tue Aug 17, 2004    Post subject: matches.

rue:

Thanks for the pointers. It's definitely not the most elegant solution, but work-arounds tend to be kludgy. And I did notice the wilderssecurity site had gone through a serious redesign when I dropped by looking to test, which is why I had to pick a different testing site.

And you've gone and edited your post!!! Changed the order around a bit. I don't quite understand the use of "^" to indicate the incomplete non-match condition. I guess I've got JavaScript regex in my head, 'cause "^" means the start of the string unless it comes first inside square brackets, where it means negation (as you've used it in [^d] etc.).

Aren't filters b. and c. redundant? Filter a. will match any URL that has a "d" that is not followed by "ont". That covers all URLs that have a "do" that is not followed by an "nt", and all URLs that have a "don" that is not followed by a "t". With words:

dachsund a
donut a,b,c
don't a,b,c
dont
eldorado a,b
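The table can be checked mechanically. This TypeScript sketch runs rue's alternatives a, b and c as standalone patterns, without the trailing "match", against the same words. Tested that way, it reproduces the table above.

```typescript
// rue's alternatives a, b and c, tested on their own (no trailing "match").
const alternatives: Array<[string, RegExp]> = [
  ["a", /d(?!ont)/],
  ["b", /do(?!nt)/],
  ["c", /don(?!t)/],
];

const words: string[] = ["dachsund", "donut", "don't", "dont", "eldorado"];

// For each word, list the names of the alternatives that match it.
const table: string[] = words.map((word) => {
  const hits: string[] = alternatives
    .filter(([, re]) => re.test(word))
    .map(([name]) => name);
  return word + " " + hits.join(",");
});

table.forEach((row) => console.log(row));
```

On their own, b and c never match a word that a misses. In the full pattern each alternative must be followed by "match", so there a, b and c accept different strings (dmatch, domatch, donmatch).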

Is the "^" going to catch any incidental characters?

I'm going to have to get my hands dirty with real-world examples. I hadn't noticed your earlier post when I wrote my last one...

/bene.
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Tue Aug 17, 2004    Post subject:

bene:
No, b and c aren't redundant. You have to realize that we're architecting a whitelist of possibilities against the negative match.
.
"do" is allowed. "don" is also allowed.
.
Lastly, we also want to match if there are no characters matching our negative condition at all. That's where we fail. RegExp searches forward only, so once the search reaches the 'n' we can't really know whether a "do" preceded the "nt". Effectively, this nullifies our attempt. A weak workaround is to hope the string actually starts with whatever character we're checking, so that the "^" anchor matches. I moved this alternative last so that every other possible match is tried first; a parenthesized alternation stops trying alternatives once one of them matches.
All times are GMT + 1 Hour