The Adblock Project Forum

What is wrong with this filter?

 
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Sat Nov 29, 2003    Post subject: What is wrong with this filter?

The filter /[^a-z](ad|dime|double|fast|value)*click(s|stream|thru|thrutraffic|xchange)*[^a-z]/ works successfully with valueclick and doubleclick but not with clickstream, unless you remove the second *, in which case it stops working for doubleclick and valueclick. See http://www.exchangeandmart.co.uk/ for an example of clickstream and http://www.findit.co.uk/ for valueclick.

Can anyone help me, please?
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Sun Nov 30, 2003

NJH:
The asterisk is a wildcard for Simple Filters, but it has a different function in RegExp.
.
/(stuff)*/ catches zero or more occurrences of "stuff". The asterisk is directly attached to whatever precedes it. You can either use .* to get the Simple Filter behaviour of asterisks, or use {n,x} to catch something no fewer than 'n' times and no more than 'x'.
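
The three forms described above can be tried in the JavaScript console. This is only a minimal sketch; the sample strings are made up for illustration.

```javascript
// "*" applies to the group directly before it: zero or more repetitions.
const zeroOrMore = /^(stuff)*$/;
const emptyOk = zeroOrMore.test("");            // zero occurrences
const twiceOk = zeroOrMore.test("stuffstuff");  // two occurrences

// ".*" behaves like the Simple Filter wildcard: any run of characters.
const wildcard = /ad.*click/;
const wildcardOk = wildcard.test("ad-server/click");

// "{n,x}" bounds the count: here at least 1 and at most 2 repetitions.
const bounded = /^(stuff){1,2}$/;
const twoAllowed = bounded.test("stuffstuff");
const threeAllowed = bounded.test("stuffstuffstuff");

console.log(emptyOk, twiceOk, wildcardOk, twoAllowed, threeAllowed);
// true true true true false
```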
Back to top
View user's profile Send private message
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

PostPosted: Sun Nov 30, 2003    Post subject: Reply with quote

rue,

What you said is my understanding, just about, as well. I was trying to use the * as 0 or 1 occurrences of the preceding bracketed text (I understand ? will only ever work on a single character and never on strings).

From what you said, when my filter is compared to clickstream, the (ad|dime|double|fast|value)* part should find no match, which is OK; click should match the first five characters of clickstream; and (s|stream|thru|thrutraffic|xchange)* should match the rest of clickstream once. Unfortunately it does not work. If the second part is changed to (s?|stream|thru|thrutraffic|xchange), the whole expression works. This has me confused. It indicates to me that the first * worked as 0+ occurrences but the second * did not. As a counter-argument, when compared to doubleclick, the second * appears to work, allowing no matches of the bracketed text, and the first * allows one match of the bracketed text.

I tried the section on REs you pointed us to here, but its examples are just about all of single characters rather than bracketed expressions. I've tried a Google search for more detail, but the documents I've looked at so far have also been very concise, so I have not been able to follow them. If you know of better texts I would appreciate a pointer; otherwise I will keep searching.

[Edit]

Another document I have found says ? does match expressions as well as single characters, in which case my expression above, technically, would be better with ?'s and not *'s as I am looking for 0 or 1 matches.
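
The difference between ? and * on a bracketed expression can be checked with a small sketch like the one below. The patterns are cut down to a single prefix and anchored with ^ and $ purely for illustration; they are not the filters from this thread.

```javascript
// "?" after a group: the whole group may appear 0 or 1 times.
const optionalGroup = /^(double)?click$/;
// "*" after a group: the whole group may appear 0 or more times.
const repeatedGroup = /^(double)*click$/;

const samples = ["click", "doubleclick", "doubledoubleclick"];
const withQuestionMark = samples.map(s => optionalGroup.test(s));
const withStar = samples.map(s => repeatedGroup.test(s));

console.log(withQuestionMark); // [ true, true, false ]
console.log(withStar);         // [ true, true, true ]
```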

Nick
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Sun Nov 30, 2003

Nick:
Since the pattern following "click" was given an option of not matching (?), it matched only "click". Because of this, the very last pattern ([^a-z]) was matching false against "stream" in "clickstream". Also, there was no need for "click" to stand alone.
.
So, here's a functional re-working of that expression:
/[\W](ad|dime|double|fast|value|click)?(s|stream|thru|thrutraffic|xchange)[\W]/
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Sun Nov 30, 2003

rue,

In the example I gave from the website http://www.exchangeandmart.co.uk/, my test is trying to match against http://clickstream.ibnetplc.etc. With the filter /[^a-z](ad|dime|double|fast|value)*click(s|stream|thru|thrutraffic|xchange)*[^a-z]/ I was expecting the first [^a-z] to match the second /, (ad|dime|double|fast|value)* to match nothing but this is ok, click to match click, (s|stream|thru|thrutraffic|xchange)* to match stream and the second [^a-z] to match the first . giving a match on /clickstream.. This did not work.

I am generally using [^a-z] as it allows matching on numbers as well and I use it in other filters, rather than [\W].

I have tried your filter and it works on \clickstream. but not on .valueclick. in http://www.findit.co.uk/. I don't think it would work on /click. either but I do not off hand know of an example I can test against.
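
These observations can be reproduced with a short sketch. It assumes a current JavaScript engine and uses only the short fragments quoted in this post as test strings (with a forward slash in front of clickstream), not the full URLs from the sites.

```javascript
// rue's reworked expression, copied from the earlier reply.
const reworked = /[\W](ad|dime|double|fast|value|click)?(s|stream|thru|thrutraffic|xchange)[\W]/;

const reworkedResults = {
  clickstream: reworked.test("/clickstream."), // "click" + "stream": matches
  valueclick: reworked.test(".valueclick."),   // nothing in the second set follows "value"
  click: reworked.test("/click."),             // nothing in the second set follows "click"
};

console.log(reworkedResults);
// { clickstream: true, valueclick: false, click: false }
```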

The filter I was trying to construct was meant to find matches on the following bounded by non-alphabetic characters:

adclick
dimeclick
doubleclick
fastclick
valueclick
click
clicks
clickstream
clickthru
clickthrutraffic
clickxchange

I know it would also match on valueclickthru but I was not worried. I did try to be even more clever with the thru|thrutraffic bit doing a thru(traffic)?, but as I had nothing to test it against I dropped it.

This is why I have ended up with the filter /[^a-z](ad|dime|double|fast|value)click(s?|stream|thru|thrutraffic|xchange)[^a-z]/. I can see why this works. I still just don't understand why my first filter did not work.
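
A word list like the one above lends itself to a small test loop. The sketch below is an illustration only: it assumes a current JavaScript engine, wraps each word in "/" and "." as stand-in non-alphabetic boundaries, and uses the form of the filter with a ? after the first set of brackets, as quoted later in this thread.

```javascript
const targets = [
  "adclick", "dimeclick", "doubleclick", "fastclick", "valueclick",
  "click", "clicks", "clickstream", "clickthru", "clickthrutraffic",
  "clickxchange",
];

// Filter form with "?" after the first set (an assumption, see above).
const listFilter = /[^a-z](ad|dime|double|fast|value)?click(s?|stream|thru|thrutraffic|xchange)[^a-z]/;

// Collect every target the filter fails to catch.
const missed = targets.filter(word => !listFilter.test("/" + word + "."));

// The combination mentioned above is caught as well.
const comboMatched = listFilter.test("/valueclickthru.");

console.log(missed, comboMatched); // [] true
```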
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Sun Nov 30, 2003

NJH:
To understand why your version fails, you have to consider the pattern in pieces.
.
This portion prefers to match true on nothing:
(s|stream|thru|thrutraffic|xchange)*
.
Because of this, only the word "click" has matched in your string so far. The next portion is then tested:
[^a-z] ..and, since the part of the string it's testing begins "stream", it fails.
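
Reading the pattern in pieces can be approximated by dropping the final [^a-z] and looking at what the rest consumes on its first attempt. This is a sketch assuming a current JavaScript engine (the 2003 engine may have behaved differently), and the helper name firstAttempt is hypothetical. In such an engine the starred set takes the leading "s" of "stream" rather than nothing, but either way the next character is a lowercase letter, so the first try at [^a-z] fails. The sketch does not show any retries a backtracking engine would make afterwards.

```javascript
// Hypothetical helper: run the pattern without its trailing [^a-z] and
// report what was consumed plus the character [^a-z] would be tested on.
function firstAttempt(prefixPattern, url) {
  const m = prefixPattern.exec(url);
  if (m === null) return null;
  return { consumed: m[0], next: url.charAt(m.index + m[0].length) };
}

const attempt = firstAttempt(
  /[^a-z](ad|dime|double|fast|value)*click(s|stream|thru|thrutraffic|xchange)*/,
  "http://clickstream.ibnetplc.etc"
);

console.log(attempt); // { consumed: '/clicks', next: 't' }
```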
.
Alright- so here's the filter, modified to catch the trailing "click" and any bounding digits:
/[\W\d](ad|dime|double|fast|value|click)(s|stream|thru|thrutraffic|xchange|click)[\W\d]/
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Sun Nov 30, 2003

rue,

I sort of understand where you are coming from but then I don't understand why my alternative form of the filter /[^a-z](ad|dime|double|fast|value)?click(s?|stream|thru|thrutraffic|xchange)[^a-z]/ works unless perhaps it is because the second ? is within the brackets which then forces the expression within the brackets to evaluate.

By the way, I'm not sure, but does your expression give true for \click.?

As a secondary question, is it any quicker or slower to use one complex expression like I am trying, or many simple expressions?
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Sun Nov 30, 2003

NJH:
Comparison:
(s?|stream|thru|thrutraffic|xchange) <-- has no trailing multiplier
(s|stream|thru|thrutraffic|xchange)* <-- trailing multiplier allows "zero" matching
The version I gave you matches false for "\.click" and "/click." (which I presume you meant.) It requires one term from each OR-set -- meaning none can stand alone.
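
The behaviour of the two-set version can be confirmed with a few short strings. A minimal sketch, assuming a current JavaScript engine; the digit-bounded sample is made up for illustration.

```javascript
// rue's version with "click" moved into both OR-sets.
const twoSetFilter = /[\W\d](ad|dime|double|fast|value|click)(s|stream|thru|thrutraffic|xchange|click)[\W\d]/;

const bareClick = twoSetFilter.test("/click.");           // only one term present
const valueClick = twoSetFilter.test(".valueclick.");     // "value" + "click"
const clickStream = twoSetFilter.test("/clickstream.");   // "click" + "stream"
const digitBounded = twoSetFilter.test("1doubleclick2");  // digits as boundaries

console.log(bareClick, valueClick, clickStream, digitBounded);
// false true true true
```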
.
The question of speed really depends on how much recursion a pattern generates. Using unnecessary wildcards, multipliers, and OR-statements can cause a string to be iterated a very high number of times.
.
Btw, to easily test expressions, open Tools > Web Development > JavaScript Console. Paste this line in the entry-field, replacing 'RegExp' and 'string' with whatever you'd like to test for: alert(/RegExp/.test("string"));
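
When several strings need checking, the same .test() call can be wrapped in a loop. The sketch below is one way to do that; the helper name checkFilter, the sample filter, and the sample strings are all hypothetical, and console.log stands in for alert.

```javascript
// Hypothetical helper: test one filter against a list of sample strings.
// The filter should not carry the "g" flag, since .test() would then
// remember its position between calls.
function checkFilter(filter, samples) {
  return samples.map(sample => ({ sample, matched: filter.test(sample) }));
}

const report = checkFilter(/[^a-z]doubleclick[^a-z]/, [
  ".doubleclick.",   // bounded by non-letters
  "/doubleclicked/", // followed by a letter
  "doubleclick",     // no bounding characters at all
]);

for (const row of report) {
  console.log(row.matched ? "match   " : "no match", row.sample);
}
```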
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Sun Nov 30, 2003

rue,

I'm understanding slowly. Now perhaps I need to understand, using the same rationale, why .valueclick. from www.findit.co.uk was matched in my test, as that also had a trailing multiplier outside the first set of brackets.

I like your tip for testing filters!

Nick
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Sun Nov 30, 2003

NJH:
Originally, you had a pattern-match for "click" situated between the two OR-sets.
.
It was matching before the problematic second-set:
/[^a-z](ad|dime|double|fast|value)*click(s|stream|thru|thrutraffic|xchange)*[^a-z]/
.
Btw, I don't think you really want the (s|.. at the start of your second-set. It would also match "doubles", "values", "dimes" and "clicks".
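
The side effect of a leading s| depends on whether a literal "click" sits between the two sets. A small comparison, assuming a current JavaScript engine and using "/" and "." as stand-in boundaries:

```javascript
// Two-set version from earlier in the thread (no literal "click" between sets).
const twoSets = /[\W\d](ad|dime|double|fast|value|click)(s|stream|thru|thrutraffic|xchange|click)[\W\d]/;
// Original filter, with the literal "click" between the sets.
const withLiteralClick = /[^a-z](ad|dime|double|fast|value)*click(s|stream|thru|thrutraffic|xchange)*[^a-z]/;

const plurals = ["doubles", "values", "dimes", "clicks"];
const comparison = plurals.map(word => ({
  word,
  twoSets: twoSets.test("/" + word + "."),
  withLiteralClick: withLiteralClick.test("/" + word + "."),
}));

console.log(comparison);
// twoSets matches all four; withLiteralClick matches only "clicks".
```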
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Mon Dec 01, 2003

rue,

Thanks for your patience in all this.

rue wrote:
NJH:
It was matching before the problematic second-set:
/[^a-z](ad|dime|double|fast|value)*click(s|stream|thru|thrutraffic|xchange)*[^a-z]/


The relevance of the last question is that to me, if the second OR-set is problematic, the first OR-set has the same format and therefore should also be problematic but it works.

I thought my expression /[^a-z](ad|dime|double|fast|value)*click(s|stream|thru|thrutraffic|xchange)*[^a-z]/
broke down to:

[^a-z] - must match
(ad|dime|double|fast|value)* - could match anything in brackets or nothing
click - must match
(s|stream|thru|thrutraffic|xchange)* - could match anything in brackets or nothing. As the discussion has gone, you have explained why this didn't work.
[^a-z] - must match

If that is so, I would be happy for a match on /clicks. or /click.. I would not be happy with a match on /doubles. or any other word not containing click. If it can match /doubles. then I am missing even more than I thought in my understanding of REs.

Going back to an earlier point from last night which I did not make too well,

rue wrote:

(s?|stream|thru|thrutraffic|xchange) <-- has no trailing multiplier
(s|stream|thru|thrutraffic|xchange)* <-- trailing multiplier allows "zero" matching


I see that the second expression allows "zero" matching, but doesn't the first as well as s? will give a zero match?
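
Whether each form can match nothing at all is easy to check by anchoring it with ^ and $ and testing the empty string. A minimal sketch; the anchors and the third variant are added only for the comparison.

```javascript
const innerOptional = /^(s?|stream|thru|thrutraffic|xchange)$/; // "?" inside the set
const trailingStar = /^(s|stream|thru|thrutraffic|xchange)*$/;  // "*" after the set
const neither = /^(s|stream|thru|thrutraffic|xchange)$/;        // no multiplier at all

const canBeEmpty = {
  innerOptional: innerOptional.test(""),
  trailingStar: trailingStar.test(""),
  neither: neither.test(""),
};

console.log(canBeEmpty);
// { innerOptional: true, trailingStar: true, neither: false }
```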

I am happy to continue this thread with you, but it is turning very much into a private tutorial for me, which is great but probably a big waste of time for you. If you want me to terminate this thread, just let me know.
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Mon Dec 01, 2003

NJH:
Hey- sorry it took so long, but I suddenly realized the real reason your second-set was failing:
.
(s|stream.. prefers to match true for just the "s" on stream. Testing occurs in sequence, so moving the s| to follow stream| fixes this. The same goes for thru| and thrutraffic. In other words: you should order sets of alphabetical-similarity by decreasing-complexity.
.
So, here's your original pattern, sans earlier optimizations -- yet functional:
/[^a-z](ad|dime|double|fast|value)*click(stream|s|thrutraffic|thru|xchange)*[^a-z]/
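
The effect of ordering is easiest to see with the trailing [^a-z] removed, so that nothing forces a retry. This is a sketch assuming a current JavaScript engine. With the trailing [^a-z] in place, such an engine goes back and tries the later alternatives, so there both orderings report a match; the results reported in this thread may reflect the engine or filter handling of the time.

```javascript
const shortFirst = /click(s|stream|thru|thrutraffic|xchange)*/;
const longFirst = /click(stream|s|thrutraffic|thru|xchange)*/;

// Alternatives are tried left to right; the first one that fits is kept.
const firstChoice = {
  shortFirstStream: shortFirst.exec("clickstream")[0],    // "clicks"
  longFirstStream: longFirst.exec("clickstream")[0],      // "clickstream"
  shortFirstThru: shortFirst.exec("clickthrutraffic")[0], // "clickthru"
  longFirstThru: longFirst.exec("clickthrutraffic")[0],   // "clickthrutraffic"
};

// Full filters, both orderings, in a backtracking engine.
const fullShortFirst = /[^a-z](ad|dime|double|fast|value)*click(s|stream|thru|thrutraffic|xchange)*[^a-z]/;
const fullLongFirst = /[^a-z](ad|dime|double|fast|value)*click(stream|s|thrutraffic|thru|xchange)*[^a-z]/;
const fullResults = [
  fullShortFirst.test("/clickstream."),
  fullLongFirst.test("/clickstream."),
];

console.log(firstChoice, fullResults);
```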
Guest






Posted: Mon Dec 01, 2003

rue,

I like that explanation. I also had not been able to test thru|thrutraffic, and yes, now you've shown me how to test, it does not find a match with /clickthrutraffic. It explains the importance of the order of the OR'd bits.

If I think it through carefully, I believe it also explains why the second OR bit works when (s?|stream|thrutraffic|thru|xchange) is used instead of (s?|stream|thrutraffic|thru|xchange)*, the only difference being the *.

It also suggests to me that I should not try to be quite so smart in trying to create a single large filter and that I should break it down into a few more simple ones.
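
One possible split into simpler filters is sketched below. The grouping into three patterns and the helper name matchesAny are assumptions made for illustration, and the split is not equivalent to the single large filter in every case (it would not catch valueclickthru, for example).

```javascript
// Three simpler filters, each covering one family of names (hypothetical split).
const simpleFilters = [
  /[^a-z](ad|dime|double|fast|value)click[^a-z]/,
  /[^a-z]click(s|stream|xchange)?[^a-z]/,
  /[^a-z]clickthru(traffic)?[^a-z]/,
];

// A URL is blocked if any one of the simple filters matches it.
function matchesAny(filters, url) {
  return filters.some(filter => filter.test(url));
}

const splitResults = {
  valueclick: matchesAny(simpleFilters, ".valueclick."),
  clickstream: matchesAny(simpleFilters, "/clickstream."),
  clickthrutraffic: matchesAny(simpleFilters, "/clickthrutraffic."),
  clicker: matchesAny(simpleFilters, "/clicker/"),
};

console.log(splitResults);
// { valueclick: true, clickstream: true, clickthrutraffic: true, clicker: false }
```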

Thanks for your help.

Nick
NJH



Joined: 13 Nov 2003
Posts: 183
Location: Hampshire, England

Posted: Mon Dec 01, 2003

rue,

I have problems getting Firebird to auto-login. The last post was from me.

Thanks again,

Nick
All times are GMT + 1 Hour