johann_p
Joined: 14 Aug 2004 Posts: 4
Posted: Sat Aug 14, 2004    Post subject: Performance issues?
Is there any data or experience available on how the number of patterns affects performance? I would assume that dozens or even hundreds of regular expressions would have quite an impact on loading a page with many images or iframes.
From a quick glance at the source code, it appears the extension simply loops through all patterns for each URL - is this correct?
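For illustration, that per-pattern loop might look roughly like the following JavaScript sketch (the pattern list and function name here are hypothetical, not Adblock's actual code):

```javascript
// Hypothetical sketch of the naive approach: test every filter
// pattern against every URL, one regex at a time.
const filterPatterns = [
  /ads?\./,        // "ad." or "ads." anywhere in the URL
  /\/banners?\//,  // paths containing /banner/ or /banners/
  /doubleclick/,   // a well-known ad server
];

function isBlocked(url) {
  // Cost grows linearly with the number of patterns, per URL checked.
  return filterPatterns.some((re) => re.test(url));
}

isBlocked("http://ads.example.com/img.gif"); // true
isBlocked("http://example.com/page.html");   // false
```

Every URL on the page pays one regex test per filter, which is where the worry about lists of hundreds of patterns comes from.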
The most efficient method would probably be to compile all the patterns into a single finite matching automaton at program startup, or whenever the pattern list changes, and use that automaton for each URL - especially if these functions were implemented in C++ and accessed from JavaScript via XPCOM.
Since generic, efficient matching against a long list of patterns could be useful for the base application and for many extensions, this might be a worthwhile addition.
Just thinking aloud here - Adblock is a fine extension as it is, and I am glad it exists.
johann_p
Joined: 14 Aug 2004 Posts: 4
Posted: Mon Aug 16, 2004
Do developers read this forum, or is this the wrong place to ask such things?
Does anyone else have experience with the performance impact of using a list of, say, 100 patterns?
wonkothesane The Other Developer
Joined: 22 May 2004 Posts: 210
Posted: Mon Aug 16, 2004
Essentially, what we have now is Fast Enough. It could certainly be made faster, but one of the key ideas with having regular expressions is that you don't need 100 different filters to be able to accurately target undesired content. Performance is also going to be variable across the variety of machines on which Adblock is being run.
Regarding finite matching automata (aka finite automata, aka finite state machine): it's been done for us -- the core engine of most, if not all, regular expression implementations is essentially a form of FSM. Languages like Python require you to explicitly compile your regular expressions from the text pattern into a usable regular expression; JavaScript handles compilation for you on object creation.
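To make the JavaScript side of that concrete, here is a small illustrative sketch (assumed examples, not Adblock code): the RegExp object itself is the compiled form, so keeping one object around amortizes the compilation cost, much as Python's re.compile does explicitly.

```javascript
// Compilation happens when the RegExp object is created.
const adPattern = new RegExp("ads?\\.|/banners?/"); // compiled once, here

adPattern.test("http://ads.example.com/x.gif"); // true -- no recompilation

// By contrast, building a fresh RegExp inside a loop pays the
// compilation cost on every iteration:
function slowCount(urls) {
  return urls.filter((u) => new RegExp("ads?\\.").test(u)).length;
}
```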
A framework for generic matching is a good idea in theory; The Right Way to do it would be a compiled XPCOM module, for small size & speed. But that has its own set of issues -- it's not terribly easy to get started compiling XPCOM components (plus my C++ is rather rusty), I don't know what would happen if two different extensions registered the same component... and to top it off, I can't really think of any extensions that would actually make much use of such a generic framework.
As Knuth said, "premature optimization is the root of all evil."
Reinventing the wheel comes next.
johann_p
Joined: 14 Aug 2004 Posts: 4
Posted: Tue Aug 17, 2004
Thank you for the answer. I was just curious how lists of hundreds of regular expressions would impact performance on pages with many links, since each expression needs to be compiled and matched against each link.
I know that regexps are handled internally by being compiled into finite automata, but the idea was to compile the whole list of regexps into a single matcher once, at startup or whenever the list changes. I do not know whether it is possible to explicitly precompile regexps in JavaScript, but if so, this would be equivalent to converting the list into one large or-connected expression, compiling it, and subsequently using only that single expression for each match.
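That or-combining idea could be sketched like this (a hypothetical helper, assuming the filters are already regex source strings):

```javascript
// Hypothetical: fold all filter patterns into one alternation and
// compile it once, at startup or whenever the filter list changes.
function buildCombinedMatcher(patternSources) {
  // Non-capturing groups keep a "|" inside one pattern from bleeding
  // into the neighbouring alternatives.
  const combined = patternSources.map((p) => `(?:${p})`).join("|");
  return new RegExp(combined);
}

const matcher = buildCombinedMatcher(["ads?\\.", "/banners?/", "doubleclick"]);
matcher.test("http://partner.doubleclick.net/x"); // true -- one test per URL
matcher.test("http://example.com/index.html");    // false
```

Whether this actually wins depends on the engine: JavaScript regex implementations are typically backtracking matchers rather than true DFAs, so a single big alternation would still need measuring rather than assuming.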
Again, this is not to be taken as criticism - I am just interested in your opinion.
rue Developer
Joined: 22 Oct 2003 Posts: 752
Posted: Tue Aug 17, 2004
johann_p:
wonko's reply strayed slightly off-topic.
Yes, that's a potential optimization, and we had considered it. Naturally, we'll need to run tests to weigh the performance gain against the maintenance burden.