The Adblock Project
Pull up a seat ...stay a while.
 

AdBlock roadmap question / Bayesian filtering?

 
DrEasy



Joined: 02 Mar 2004
Posts: 16

Posted: Mon Aug 02, 2004    Post subject: AdBlock roadmap question / Bayesian filtering?

Hi,

I saw on the Adblock web site that the devs are considering using Bayesian filtering in the future: is this something that is being actively worked on? Will it be in the 0.6 release? Would developer help be welcome?

Thanks!
wonkothesane
The Other Developer


Joined: 22 May 2004
Posts: 210

Posted: Mon Aug 02, 2004

First, a disclaimer: I don't have the final word in the direction Adblock takes -- rue does -- so my word is not law.

That said, filtering based on Bayesian reasoning is not going to be in 0.6.

My personal opinion is that it's unlikely that Bayesian filtering will ever appear in Adblock. It's certainly an attractive vision, where the entirety of user input is "Block this item" or "Unblock this item", but it doesn't seem very feasible, due to the different situation of ad classification vs. spam classification.

First, time is at a much higher premium for Adblock than for, say, Thunderbird. The user expects pages to load at least as quickly with Adblock as without; that's usually 10 to 30 yes/no decisions that Adblock has to make in what is usually about half a second. With regular expressions, a decision takes considerably less than one millisecond. With Thunderbird, it seems to me that the noticeable delay from marking 10 to 30 messages indicates that it spends several dozen to several hundred milliseconds per message; such a delay is unacceptable for Adblock.
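
To make the per-decision cost concrete, here is a minimal sketch of a regexp-based yes/no decision and a crude way to time it. It is written in Python for brevity rather than in Adblock's JavaScript, and the patterns, URLs, and function names are all hypothetical; none of them come from Adblock itself. Timings will vary by machine, so no figure is quoted.

```python
import re
import time

# Hypothetical filter patterns, for illustration only.
FILTERS = [re.compile(p) for p in (r"/ads?/", r"banner", r"doubleclick\.net")]


def should_block(url: str) -> bool:
    """Return True if any filter pattern matches somewhere in the URL."""
    return any(f.search(url) for f in FILTERS)


def average_decision_ms(urls, repeats=1000):
    """Average wall-clock time per block/allow decision, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for url in urls:
            should_block(url)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (repeats * len(urls))


# Made-up URLs standing in for the items on a single page.
urls = [
    "http://example.com/ads/top.gif",
    "http://example.com/images/logo.png",
    "http://ad.doubleclick.net/x",
]

if __name__ == "__main__":
    for url in urls:
        print(url, "->", "block" if should_block(url) else "allow")
    print("average ms per decision:", average_decision_ms(urls))
```

Any classifier proposed as a replacement would have to fit the same budget: 10 to 30 such calls inside roughly half a second of page load.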

There are a number of other issues as well, including the much-more-limited number of tokens available to feed the Bayesian engine for URLs versus email messages, the agility/responsiveness needed when a user overrides the Bayesian decision...

In addition, there's the issue of out-of-the-box training. With the current system, it's trivially easy to do, but a Bayesian system would pretty much require the raw data of a user's decisions over all the media they see. rue's stated goal of having a Bayesian system serve as a mediator between the user and a regexp-based backend poses its own set of problems, the most looming one being the difficulty of having the computer build its own regular expressions without any human intervention.



------------------------------------------------------------------------------



It's certainly possible that in the far future, Adblock will use Bayesian filtering in some form or another, but the essential fact is that human-made regular expressions still have quite a lot of life in them, and since there's already considerable architecture in place to allow Adblock to use regexps, we'll hopefully milk them for all they're worth.
FlashBanG
Guest





Posted: Mon Aug 02, 2004

Thanks for the information, wonkothesane. I'm sure now, with more than one developer, Adblock will become considerably more feature-packed, in whatever way that may be.

By the way, would you be able to estimate how close you guys are to releasing version 0.6? Is it going to be sometime in the near future (1 month), or longer than that?

Thanks,
wonkothesane
The Other Developer


Joined: 22 May 2004
Posts: 210

Posted: Mon Aug 02, 2004

"Hopefully not-too-distant future" is really the best I can give you.

rue has been busy with WindowQ for the last week, and so I haven't actually talked to him for more than a few minutes in the last few days.

My schedule looks like:
Tomorrow: Doom 3 on sale.
This Friday: In NY till Sunday.
...
End of August: School starts.

As for Adblock... well, it's getting fairly close to code freeze. rue and I need to make a few final design decisions (and implement them) and then we'll call it a beta and start testing -- that'll be about two or three weeks from now, ideally. Testing will take two weeks to a month or more (better sooner than later), and then... 0.6.
FlashBang
Guest





Posted: Tue Aug 03, 2004

Can't wait!

Thanks for the update :D
rue
Developer


Joined: 22 Oct 2003
Posts: 752

Posted: Mon Aug 09, 2004

Whoa- hold-up.
.
DrEasy posted before, and has some expertise in machine learning. Yes, we're interested in this. Version 0.6 is as far away as development takes us, and can include whatever we want. Active development was moving in this direction, then scaled back a bit due to performance concerns.
--
DrEasy:
The hardest part would be the actual implementation of any algorithms / analysis in JS. Since they can be developed independently of Adblock, you have wide latitude to assist.
.
For now, assume the only metadata you'll have is the target URL; on the training side, if you have the URL, it's blocked. There are two approaches worth considering: calculating the probability on the fly (advertisement vs. not), OR periodically "rebuilding" statistical regular expressions from stored metadata (training, of sorts).
.
Speed is a present-moment consideration, but overarchingly, as both the browser-code and computers in general become faster, it will cease to matter. We're looking ahead, with these features, not back.
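
As a rough illustration of the first approach (calculating the probability on the fly from the target URL alone), here is a minimal naive Bayes sketch over URL tokens. It is Python rather than the JS that Adblock would need, and every name in it (`UrlBayes`, `tokenize`, the "ad"/"ok" labels, the sample URLs) is hypothetical. It also assumes that some known-good URLs are available for training, which goes beyond the "if you have the URL, it's blocked" constraint stated above; with blocked URLs only, a two-class model like this cannot be trained as written.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class UrlBayes:
    """Two-class naive Bayes over URL tokens (hypothetical sketch)."""

    def __init__(self):
        self.tokens = {"ad": Counter(), "ok": Counter()}
        self.docs = {"ad": 0, "ok": 0}

    def train(self, url, label):
        """Record one URL as an example of `label` ("ad" or "ok")."""
        self.docs[label] += 1
        self.tokens[label].update(tokenize(url))

    def ad_probability(self, url):
        """Estimate P(ad | url tokens) with Laplace smoothing."""
        vocab = len(set(self.tokens["ad"]) | set(self.tokens["ok"])) or 1
        total_docs = sum(self.docs.values())
        log_score = {}
        for label in ("ad", "ok"):
            # Smoothed class prior, then smoothed per-token likelihoods,
            # summed in log space to avoid floating-point underflow.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            n = sum(self.tokens[label].values())
            for tok in tokenize(url):
                score += math.log((self.tokens[label][tok] + 1) / (n + vocab))
            log_score[label] = score
        # Normalise the two log scores into a probability.
        top = max(log_score.values())
        ad = math.exp(log_score["ad"] - top)
        ok = math.exp(log_score["ok"] - top)
        return ad / (ad + ok)


# Made-up training data: two blocked URLs, two assumed-good URLs.
model = UrlBayes()
model.train("http://ads.example.com/banner/468x60.gif", "ad")
model.train("http://example.com/adserver/banner.js", "ad")
model.train("http://example.com/images/logo.png", "ok")
model.train("http://example.com/css/site.css", "ok")

p_banner = model.ad_probability("http://other.example.net/banner/top.gif")
p_logo = model.ad_probability("http://example.org/css/logo.png")
```

With this toy data the unseen banner URL scores above 0.5 and the unseen stylesheet-like URL scores below it. A URL yields only a handful of tokens, so the estimate rests on very little evidence per decision. The second approach, rebuilding regular expressions from stored data, is not sketched here.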
All times are GMT + 1 Hour