The Adblock Project Forum Index The Adblock Project
Pull up a seat ...stay a while.
 

any interest in automatic learning of filters?

 
DrEasy



Joined: 02 Mar 2004
Posts: 16

Posted: Tue Mar 02, 2004    Post subject: any interest in automatic learning of filters?

Hi,

I just installed the AdBlock extension, and first I'd like to thank the developers for such a useful tool (and at such a low price ;) ).

I read with interest the thread on compiling the best possible filter set. The consensus seemed to be that a list that is too long would slow the browser down, so the best approach is still a personalized filter set based on each person's browsing habits.

My question is: has anybody considered training such filter sets? In the same way that Bayesian learning is used for filtering spam, we could come up with something that learns what each particular user considers an annoying ad (for example, I am happy to see Google ads because they're not flashy, but I hate Doubleclick stuff) and generalizes to ads that haven't been seen yet.
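A minimal sketch of what the Bayesian approach could look like when applied to ad URLs, in the spirit of a spam filter. Everything here is an assumption made for illustration: the `tokenize` helper, the `UrlNaiveBayes` class, the "ad"/"ok" labels and the sample URLs are all hypothetical, and none of it is AdBlock code.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (host and path parts)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class UrlNaiveBayes:
    """Tiny multinomial naive Bayes over URL tokens with add-one smoothing."""

    LABELS = ("ad", "ok")  # hypothetical labels: blocked vs. tolerated

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.url_counts = Counter()

    def train(self, url, label):
        """Record one URL the user marked as 'ad' or 'ok'."""
        self.url_counts[label] += 1
        self.token_counts[label].update(tokenize(url))

    def score(self, url, label):
        """Log of prior times token likelihoods for the given label."""
        vocab = set(self.token_counts["ad"]) | set(self.token_counts["ok"])
        total_tokens = sum(self.token_counts[label].values())
        total_urls = sum(self.url_counts.values())
        # Smoothed prior over the two labels.
        logp = math.log((self.url_counts[label] + 1) / (total_urls + 2))
        for tok in tokenize(url):
            # Add-one smoothing; the extra +1 leaves room for unseen tokens.
            count = self.token_counts[label][tok]
            logp += math.log((count + 1) / (total_tokens + len(vocab) + 1))
        return logp

    def is_ad(self, url):
        return self.score(url, "ad") > self.score(url, "ok")


# Hypothetical training data standing in for one user's choices.
clf = UrlNaiveBayes()
clf.train("http://ad.doubleclick.net/banner/123.gif", "ad")
clf.train("http://ads.example.com/banners/flash.swf", "ad")
clf.train("http://www.example.org/images/logo.gif", "ok")
clf.train("http://news.example.org/articles/photo.jpg", "ok")
print(clf.is_ad("http://ad.doubleclick.net/banner/999.gif"))
```

The classifier only looks at the URL text, so it stays cheap, but it yields a yes/no decision per URL rather than a readable filter pattern.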

I understand that the technique used for spam may not carry over directly to ads, but as far as I know there are techniques out there that can learn regular expressions and grammars from samples, albeit with moderate success. I can think of Angluin's grammatical inference methods, and I'm sure better algorithms have been developed since. With a bit of luck, there might even be a Java implementation somewhere...
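To give a flavour of learning an automaton from samples, here is a rough sketch of a prefix tree acceptor, the structure that state-merging inference methods commonly start from. It is not Angluin's algorithm and it does not generalize by itself: it accepts exactly the samples it was given, and a real inference method would then merge states. The function names and the tokenized sample URLs are hypothetical.

```python
def build_pta(samples):
    """Build a prefix tree acceptor from sequences of tokens.

    States are integers and state 0 is the start state.
    Returns (transitions, accepting), where transitions[state][symbol]
    gives the next state.
    """
    transitions = {0: {}}
    accepting = set()
    for sample in samples:
        state = 0
        for symbol in sample:
            if symbol not in transitions[state]:
                # Allocate a fresh state for an unseen prefix.
                new_state = len(transitions)
                transitions[state][symbol] = new_state
                transitions[new_state] = {}
            state = transitions[state][symbol]
        accepting.add(state)
    return transitions, accepting


def accepts(pta, tokens):
    """Return True if the token sequence ends in an accepting state."""
    transitions, accepting = pta
    state = 0
    for symbol in tokens:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return state in accepting


# Hypothetical ad URLs, already split into tokens.
ads = [
    ("ad", "doubleclick", "net", "banner"),
    ("ad", "doubleclick", "net", "popup"),
]
pta = build_pta(ads)
print(accepts(pta, ("ad", "doubleclick", "net", "popup")))
```

The two samples share the states for their common prefix, which is what gives a later merging step something to work with.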

Again, I'm new here, so maybe this topic has been beaten to death already, in which case I apologize... Have the developers considered such a possibility? Are there other users interested in working on such a thing? Is there any way I can help?

DrEasy.
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Tue Mar 02, 2004

DrEasy:
This is our last (and really, only) round on the subject.
.
It remains an area of strong interest, but what's happened in the meantime was unexpected: major bugs have crept in and plagued things relentlessly. We're resolving them one by one, and only once they're all satisfactorily fixed can radically new code be deployed.
.
Actually, writing new code (e.g. Bayesian / learning) shouldn't be hard at all. It's the debugging process -- for setups I don't have access to -- which seems to take forever.
DrEasy



Joined: 02 Mar 2004
Posts: 16

Posted: Tue Mar 02, 2004

Thanks for the prompt reply!

I went and read the thread you pointed me to, and it was indeed interesting. I happen to agree with the point of view you expressed there: analyzing the image itself would consume too many browser resources, whereas simply analyzing the paths seems more promising, if shallow.

I believe there are algorithms better suited to generating regular expressions and grammars than Bayesian learning. As I mentioned in my previous post, one of them is grammatical inference, which for some reason (maybe its many limitations?) is not very well known.

I understand that exploring obscure machine learning algorithms might not be a top priority for the development of AdBlock, so I am just asking whether there is any interest in pursuing this as a longer-term plan and, if so, whether I can help. Is there a Wiki site for this project where we could brainstorm?
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Wed Mar 03, 2004

DrEasy:
No, there's no Wiki. I guess I'm on the wrong end of development to discern, but wikis don't seem useful for coding. I can't think of anything they might offer which this forum and mozillaZine's don't already -- presentation-wise, that is. Feel free to fire back, since there are probably benefits I'm unaware of.
.
I'm also largely unread on learning algorithms, so now I'm curious: how familiar are you with them -- and on which end: math or implementation?
DrEasy



Joined: 02 Mar 2004
Posts: 16

Posted: Thu Mar 04, 2004

Quote:
No, there's no Wiki. I guess I'm on the wrong end of development to discern, but wikis don't seem useful for coding. I can't think of anything they might offer which this forum and mozillaZine's don't already -- presentation-wise, that is. Feel free to fire back, since there are probably benefits I'm unaware of.


I find Wikis useful for documentation purposes. For example, FAQs are very easy to maintain this way, since everybody can contribute to them. Usually there is a CVS-like capability to roll back changes, just in case... Automatic linking makes it easy to organize your thoughts and navigate the site, much more so than in forums. As such they're a nice tool for collaborative work (have you seen Wikipedia?). They are not a substitute for forums, but rather a nice complement.

There is a Wiki for Mozilla-related projects. In fact, I found the link on the AdBlock web site!

Quote:
I'm also largely unread on learning algorithms, so now I'm curious: how familiar are you with them -- and on which end: math or implementation?


I have a bit of experience using simple machine learning algorithms (ID3, version spaces) and I've tried to implement a few of my own (with mixed success). I don't know much about learnability theory, but I can find, read and understand scientific papers that describe algorithms.

My interest here is that I am curious about what kind of algorithm would do the best job learning ad filters, and my intuition tells me that Grammatical Inference (algorithms that learn tokens, regular expressions and grammars) could do better than Bayesian learning in this particular case. I am not aware of existing open-source implementations of GI algorithms, but the publications are quite clear as to how to DIY.
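As a very rough illustration of learning a pattern rather than a probability, the sketch below generalizes a handful of blocked URLs into a single wildcard filter by keeping their shared prefix and suffix. This is far simpler than any real grammatical inference algorithm and is only meant to show the kind of output such a learner would aim for; the `wildcard_filter` name and the sample URLs are hypothetical.

```python
import os


def wildcard_filter(urls):
    """Generalize sample URLs to 'prefix*suffix' from their shared ends."""
    prefix = os.path.commonprefix(urls)
    # Strip the prefix first so that prefix and suffix cannot overlap.
    rests = [u[len(prefix):] for u in urls]
    # The common suffix is the common prefix of the reversed remainders.
    suffix = os.path.commonprefix([r[::-1] for r in rests])[::-1]
    return prefix + "*" + suffix


# Hypothetical URLs the user has blocked by hand.
blocked = [
    "http://ad.doubleclick.net/banner/123.gif",
    "http://ad.doubleclick.net/banner/4567.gif",
]
print(wildcard_filter(blocked))
```

Unlike a Bayesian score, the result is a single readable pattern, but with no negative samples nothing stops it from over-generalizing when the inputs have little in common.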

On the other hand, I have zero experience with Mozilla code or extensions, but I am not averse to learning how it works to some extent. If it requires Java or a scripting language, no problem, but no C++ for me please! ;)

If you want I can go ahead and start a Wiki for this, and throw a few relevant links, and then we can decide if it's worth pursuing!
All times are GMT + 1 Hour