The Adblock Project Pull up a seat ...stay a while.
Author |
Message |
DrEasy

Joined: 02 Mar 2004 Posts: 16
|
Posted: Mon Aug 02, 2004 Post subject: AdBlock roadmap question / Bayesian filtering? |
|
|
Hi,
I saw on the Adblock web site that the devs are considering using Bayesian filtering in the future. Is this being actively worked on? Will it be in the 0.6 release? Would developer help be welcome?
Thanks! |
|
wonkothesane The Other Developer
Joined: 22 May 2004 Posts: 210
|
Posted: Mon Aug 02, 2004 Post subject: |
|
|
First, a disclaimer: I don't have the final word in the direction Adblock takes -- rue does -- so my word is not law.
That said, filtering based on Bayesian reasoning is not going to be in 0.6.
My personal opinion is that it's unlikely that Bayesian filtering will ever appear in Adblock. It's certainly an attractive vision, where the entirety of user input is "Block this item" or "Unblock this item", but it doesn't seem very feasible, due to the different situation of ad classification vs. spam classification.
First, time is at a much higher premium for Adblock than for, say, Thunderbird. The user expects pages to load as quickly or more quickly with Adblock than without; that usually means 10 to 30 yes/no decisions that Adblock has to make in about half a second. With regular expressions, a decision takes considerably less than one millisecond. With Thunderbird, the noticeable delay when marking 10 to 30 messages suggests to me that it spends several dozen to several hundred milliseconds per message; such a delay is unacceptable for Adblock.
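To make the timing argument above concrete, here is a minimal sketch of a regexp-based yes/no decision run over about 30 URLs. The filter patterns, the `shouldBlock` name, and the example URLs are made up for illustration and are not Adblock's actual code or defaults; actual timings will depend on the machine and the filter list.

```javascript
// Hypothetical filter list; these patterns are illustrative only.
const filters = [/\/ads?\//i, /banner/i, /doubleclick\.net/i];

// One yes/no decision: block if any filter matches the URL.
function shouldBlock(url) {
  return filters.some((re) => re.test(url));
}

// Build 30 sample URLs, the upper end of the per-page estimate above.
const urls = [];
for (let i = 0; i < 30; i++) {
  urls.push(i % 3 === 0
    ? "http://example.com/ads/img" + i + ".gif"
    : "http://example.com/content/img" + i + ".gif");
}

// Time the whole batch of decisions (millisecond resolution only).
const start = Date.now();
const decisions = urls.map(shouldBlock);
const elapsedMs = Date.now() - start;

console.log(decisions.filter(Boolean).length + " of " + urls.length +
  " blocked in about " + elapsedMs + " ms");
```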
There are a number of other issues as well, including the much smaller number of tokens available to feed the Bayesian engine for URLs versus email messages, the agility/responsiveness needed when a user overrides the Bayesian decision...
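As a rough illustration of the token problem, this sketch splits a URL into tokens. The `tokenize` helper and its splitting rule (break on anything that is not a letter or digit) are assumptions for the example, not something Adblock defines.

```javascript
// Hypothetical tokenizer: lowercase the URL and split on non-alphanumerics.
function tokenize(url) {
  return url.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

const tokens = tokenize("http://ads.example.com/banners/468x60.gif?id=12");
console.log(tokens.length + " tokens: " + tokens.join(", "));
// prints "9 tokens: http, ads, example, com, banners, 468x60, gif, id, 12"
```

A single email usually offers far more tokens than this, and several of the ones here (`http`, `com`, `gif`) say little about whether the item is an ad.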
In addition, there's the issue of out-of-the-box training. With the current system, it's trivially easy to do, but a Bayesian system would pretty much require the raw data of a user's decisions over all the media they see. rue's stated goal of having a Bayesian system serve as a mediator between the user and a regexp-based backend poses its own set of problems, the biggest one being the difficulty of having the computer build its own regular expressions without any human intervention.
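To show why machine-built regular expressions are hard, here is a deliberately naive sketch. It assumes stored samples of both blocked and allowed URLs are available, which nothing in this thread guarantees. `buildPattern`, `tokenize`, and the sample URLs are hypothetical names and data for illustration only.

```javascript
// Hypothetical tokenizer: lowercase the URL and split on non-alphanumerics.
function tokenize(url) {
  return url.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Build one alternation regexp from tokens that appear in at least
// `minCount` blocked URLs and in no allowed URL. Returns null if none qualify.
function buildPattern(blockedUrls, allowedUrls, minCount) {
  const allowedTokens = new Set(allowedUrls.flatMap(tokenize));
  const counts = new Map();
  for (const url of blockedUrls) {
    // Count each token once per URL.
    for (const token of new Set(tokenize(url))) {
      if (!allowedTokens.has(token)) {
        counts.set(token, (counts.get(token) || 0) + 1);
      }
    }
  }
  const frequent = [...counts.entries()]
    .filter(([, n]) => n >= minCount)
    .map(([token]) => token)
    .sort();
  if (frequent.length === 0) return null;
  // Tokens are purely alphanumeric, so no regexp escaping is needed.
  // The surrounding groups require the token to stand alone in the URL.
  return new RegExp(
    "(?:^|[^a-z0-9])(?:" + frequent.join("|") + ")(?=[^a-z0-9]|$)", "i");
}

const blocked = [
  "http://ads.example.com/banners/top.gif",
  "http://ads.example.net/banners/side.gif",
  "http://cdn.example.org/ads/popup.js",
];
const allowed = [
  "http://www.example.com/images/logo.gif",
  "http://www.example.org/scripts/menu.js",
];

const pattern = buildPattern(blocked, allowed, 2);
console.log(pattern.test("http://ads.other.com/x.png"));               // true
console.log(pattern.test("http://www.example.com/downloads/file.zip")); // false
```

Without the allowed samples, common tokens such as `http` or `gif` would end up in the pattern and block nearly everything, which is one face of the difficulty described above.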
------------------------------------------------------------------------------
It's certainly possible that in the far future, Adblock will use Bayesian filtering in some form or another, but the essential fact is that human-made regular expressions still have quite a lot of life in them, and since there's already considerable architecture in place to allow Adblock to use regexps, we'll hopefully milk them for all they're worth. |
|
FlashBanG Guest
|
Posted: Mon Aug 02, 2004 Post subject: |
|
|
Thanks for the information, wonkothesane. I'm sure now, with more than one developer, Adblock will become considerably more feature-packed, in whatever way that may be.
By the way, would you be able to estimate how close you guys are to releasing version .6? Is it going to be sometime in the near future (1 month), or longer than that?
Thanks, |
|
wonkothesane The Other Developer
Joined: 22 May 2004 Posts: 210
|
Posted: Mon Aug 02, 2004 Post subject: |
|
|
"Hopefully not-too-distant future" is really the best I can give you.
rue has been busy with WindowQ for the last week, and so I haven't actually talked to him for more than a few minutes in the last few days.
My schedule looks like:
Tomorrow: Doom 3 on sale.
This Friday: In NY till Sunday.
...
End of August: School starts.
As for Adblock... well, it's getting fairly close to code freeze. rue and I need to make a few final design decisions (and implement them) and then we'll call it a beta and start testing -- that'll be about two or three weeks from now, ideally. Testing will take two weeks to a month or more (better sooner than later), and then... 0.6. |
|
FlashBang Guest
|
Posted: Tue Aug 03, 2004 Post subject: |
|
|
Can't wait!
Thanks for the update  |
|
rue Developer
Joined: 22 Oct 2003 Posts: 752
|
Posted: Mon Aug 09, 2004 Post subject: |
|
|
Whoa- hold-up.
.
DrEasy has posted before, and has some expertise in machine learning. Yes, we're interested in this. Version .6 is as far away as development takes us, and can include whatever we want. Active development was moving in this direction, then scaled back a bit due to performance concerns.
--
DrEasy:
The hardest part would be the actual implementation of any algorithms / analysis in JS. Since they can be developed independently of Adblock, you have wide latitude to assist.
.
For now, assume the only metadata you'll have is the target URL; on the training side, if you have the URL, it's blocked. There are two approaches worth considering: the calculation of probability on-the-fly (advertisement vs. not), OR the periodic "rebuilding" of statistical regular expressions from stored metadata (training, of sorts).
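A minimal sketch of the first approach, on-the-fly probability, using a naive Bayes classifier over URL tokens with add-one smoothing. This swaps in a textbook technique and is not Adblock code. It also assumes examples of non-ad URLs are available for training, which goes beyond the "if you have the URL, it's blocked" data described above. `UrlClassifier`, `tokenize`, and the sample URLs are hypothetical.

```javascript
// Hypothetical tokenizer: lowercase the URL and split on non-alphanumerics.
function tokenize(url) {
  return url.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

class UrlClassifier {
  constructor() {
    this.counts = { ad: new Map(), ok: new Map() }; // token counts per class
    this.totals = { ad: 0, ok: 0 };                 // total tokens per class
    this.docs = { ad: 0, ok: 0 };                   // URLs seen per class
    this.vocab = new Set();                         // all tokens seen
  }

  // Record one labelled URL; label is "ad" or "ok".
  train(url, label) {
    this.docs[label] += 1;
    for (const token of tokenize(url)) {
      const classCounts = this.counts[label];
      classCounts.set(token, (classCounts.get(token) || 0) + 1);
      this.totals[label] += 1;
      this.vocab.add(token);
    }
  }

  // log P(label) + sum of log P(token | label), with add-one smoothing.
  // The extra +1 in the denominator reserves a slot for unseen tokens.
  logScore(url, label) {
    const totalDocs = this.docs.ad + this.docs.ok;
    let score = Math.log((this.docs[label] + 1) / (totalDocs + 2));
    const denom = this.totals[label] + this.vocab.size + 1;
    for (const token of tokenize(url)) {
      const n = this.counts[label].get(token) || 0;
      score += Math.log((n + 1) / denom);
    }
    return score;
  }

  // Posterior probability that the URL is an advertisement.
  probabilityAd(url) {
    const diff = this.logScore(url, "ok") - this.logScore(url, "ad");
    return 1 / (1 + Math.exp(diff));
  }
}

const classifier = new UrlClassifier();
classifier.train("http://ads.example.com/banners/top.gif", "ad");
classifier.train("http://ads.example.net/banners/side.gif", "ad");
classifier.train("http://www.example.com/images/logo.gif", "ok");
classifier.train("http://www.example.org/scripts/menu.js", "ok");

const pAd = classifier.probabilityAd("http://ads.example.org/banners/new.gif");
const pOk = classifier.probabilityAd("http://www.example.net/images/photo.gif");
console.log(pAd > 0.5, pOk < 0.5); // prints "true true"
```

Each decision costs a few map lookups per token, so the per-URL work is small; whether that holds up inside the browser is exactly the performance question raised in this thread.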
.
Speed is a consideration right now, but in the long run, as both the browser code and computers in general become faster, it will cease to matter. We're looking ahead with these features, not back. |
|