|
|
|
|
|
|
| Author |
Message |
bearophileHUGS@lycos.com *nix forums Guru Wannabe
Joined: 19 Feb 2005
Posts: 198
|
Posted: Tue Jul 18, 2006 8:11 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
Dan Bishop:
| Quote: | xrange already has __contains__. The problem is, it's implemented by a
highly-inefficient sequential search. Why not modify it to merely
check the bounds and (value - start) % step == 0?
|
I think this is a nice idea.
Bye,
bearophile |
|
| Back to top |
|
 |
Simon Forman *nix forums addict
Joined: 22 Jun 2006
Posts: 83
|
Posted: Tue Jul 18, 2006 8:28 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
K.S.Sreeram wrote:
| Quote: | Simon Forman wrote:
Nick Craig-Wood wrote:
Sets are pretty fast too, and have the advantage of flexibility in
that you can put any numbers in you like
I know this is self-evident to most of the people reading this, but I
thought it worth pointing out that this is a great way to test
membership in range(lo, hi, step) without doing "the necessary
algebra".
i.e. n in set(xrange(0, 10000, 23)) ...
This is very very misleading... here are some timings :
|
Yes it is. I'm sorry about that.
| Quote: | python -mtimeit "n=5000" "n in set(xrange(0,10000))"
1000 loops, best of 3: 1.32 msec per loop
python -mtimeit "n=5000" "n in xrange(0,10000)"
1000 loops, best of 3: 455 usec per loop
python -mtimeit "n=5000" "0 <= n < 10000"
1000000 loops, best of 3: 0.217 usec per loop
sets are fast only if you create them *once* and use them again and
again. even in that case, the sets use up O(n) memory.
|
That's what I meant. But I didn't state it clearly.
One of the things I like most about python is that it allows you to
specify the problem that you want to solve without a great deal of
difficulty as to *how* to specify it. To me, and perhaps others, "T =
set(xrange(0, 10000, 23))" and "n in T" are somewhat easier to read
and write than "not n % 23 and 0 <= n < 10000", YMMV.
In the given case a set of ~(10000 / 23) ints would not usually be too
burdensome on ram, and the code runs close to the same speed as
compared to the direct calculation:
from timeit import Timer
times = 100000
Max = 10000
n = 5000
T = set(xrange(0, Max, 23))
s1 = 'n in T'
s2 = 'not n %% 23 and 0 <= n < %s' % Max
setup = 'from __main__ import n, T'
S1 = Timer(s1, setup).repeat(number=times)
S2 = Timer(s2, setup).repeat(number=times)
print "%.3f usec/pass" % (1000000 * min(S1) / times)
print "%.3f usec/pass" % (1000000 * min(S2) / times)
On my machine this printed:
0.476 usec/pass
0.552 usec/pass
| Quote: |
with comparison operators, you don't need extra memory *and* there is no
pre-computation required.
|
When I set Max = 100000000 in the above test code there was serious
disk thrashing... ;-)
FWIW, in production code I would certainly use the comparison
operators. A kilobyte saved is a kilobyte earned.
Peace,
~Simon |
|
| Back to top |
|
 |
Summercoolness@gmail.com *nix forums beginner
Joined: 01 Jul 2006
Posts: 5
|
Posted: Tue Jul 18, 2006 9:25 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
Summercoolness@gmail.com wrote:
| Quote: | it seems that range() can be really slow:
if i in range (0, 10000):
|
My original use was like this:
if i in range (iStart, iEnd):
listData.append(a)
in which iStart is 1000 and iEnd is 1008
so in that case, the program ran fine...
but later on, i wanted to include all data, so I relaxed the range by
setting iStart to 0 and iEnd to 9999 and later on i found that the
program was slow due to this.
So looks like the usage of
if sDay in ("Tue", "Wed", "Thu"):
is more like good use of "in a list" but in range(0,10000) will be a
big search in a list. |
|
| Back to top |
|
 |
tactics40@gmail.com *nix forums beginner
Joined: 13 Jun 2006
Posts: 33
|
Posted: Tue Jul 18, 2006 9:34 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
Simon Forman wrote:
| Quote: | To me, and perhaps others, "T =
set(xrange(0, 10000, 23))" and "n in T" are somewhat easier to read
and write than "not n % 23 and 0 <= n < 10000", YMMV.
|
Eh? How is the first easier to read than the second?? You have a nested
function call in the first!
Regardless, testing if a member is part of a ranged set is always going
to be slower. It's the nature of what you're doing. Building a set and
then searching it takes much longer than a single modulus and
subtraction (which is all an integer comparison is). |
|
| Back to top |
|
 |
John Machin *nix forums Guru
Joined: 19 Feb 2005
Posts: 608
|
Posted: Tue Jul 18, 2006 9:45 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
On 19/07/2006 1:05 AM, Dan Bishop wrote:
| Quote: | Paul Boddie wrote:
Yes, he wants range to return an iterator, just like xrange more or
less does now. Given that xrange objects support __getitem__, unlike a
lot of other iterators (and, of course, generators), adding
__contains__ wouldn't be much of a hardship. Certainly, compared to
other notational conveniences bounced around on the various development
lists, this one would probably provide an order of magnitude
improvement on the usual bang per buck development ratio.
xrange already has __contains__.
|
As pointed out previously, xrange is a function and one would not expect
it to have a __contains__ method.
The objects returned by xrange do not (according to my reading of the
2.4.3 version of Objects/rangeobject.c) have a __contains__ method.
I find it difficult to believe that an inefficient __contains__ has been
implemented since.
Perhaps you are unaware that the mere fact that an object supports the
"in" operation does not mean that this support is provided by a
__contains__ method. The following section of the manual may help:
"""
The membership test operators (in and not in) are normally implemented
as an iteration through a sequence. However, container objects can
supply the following special method with a more efficient
implementation, which also does not require the object be a sequence.
__contains__( self, item)
Called to implement membership test operators. Should return true
if item is in self, false otherwise. For mapping objects, this should
consider the keys of the mapping rather than the values or the key-item
pairs.
""" |
|
| Back to top |
|
 |
Paul Boddie *nix forums Guru Wannabe
Joined: 20 Feb 2005
Posts: 222
|
Posted: Tue Jul 18, 2006 10:35 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
John Machin wrote:
| Quote: | On 19/07/2006 1:05 AM, Dan Bishop wrote:
xrange already has __contains__.
As pointed out previously, xrange is a function and one would not expect
it to have a __contains__ method.
|
Well, you pointed out that range is a function, but xrange seems to be
a type...
| Quote: | xrange
type 'xrange'
dir(xrange)
['__class__', '__delattr__', '__doc__', '__getattribute__', |
'__getitem__', '__hash__', '__init__', '__iter__', '__len__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__',
'__setattr__', '__str__']
No __contains__ method, though, at least in 2.4.1.
| Quote: | The objects returned by xrange do not (according to my reading of the
2.4.3 version of Objects/rangeobject.c) have a __contains__ method.
|
As confirmed by the above evidence.
| Quote: | I find it difficult to believe that an inefficient __contains__ has been
implemented since.
|
So do I. As you go on to say, the usual sequence traversal mechanisms
are probably used to support the "in" operator. Whether it's a pressing
matter to add support for a more efficient mechanism depends on how
often people want to use ranges in the way described. Perhaps I'll
write a patch - who knows? ;-)
Paul |
|
| Back to top |
|
 |
Simon Forman *nix forums addict
Joined: 22 Jun 2006
Posts: 83
|
Posted: Wed Jul 19, 2006 4:50 am Post subject:
Re: range() is not the best way to check range?
|
|
|
tac-tics wrote:
| Quote: | Simon Forman wrote:
To me, and perhaps others, "T =
set(xrange(0, 10000, 23))" and "n in T" are somewhat easier to read
and write than "not n % 23 and 0 <= n < 10000", YMMV.
Eh? How is the first easier to read than the second?? You have a nested
function call in the first!
|
I find the first form more immediately comprehensible than the latter.
I know what xrange() does, and I know what set() does, and "nested
function calls" give me no trouble, whereas the latter form with a
modulus, negation, and comparisons would take me a bit longer both to
compose and/or understand.
If this is not the case for you then by all means please disregard my
posting. YMMV.
| Quote: |
Regardless, testing if a member is part of a ranged set is always going
to be slower.
|
Yes. Roughly 0.0000001 seconds slower on my five year old computer.
I'm not worried.
| Quote: | It's the nature of what you're doing. Building a set and
then searching it takes much longer than a single modulus and
subtraction (which is all an integer comparison is).
|
Building the set, yes, but searching the set is very close to the same
speed, even for rather large sets. If you were performing the search
30000 times (like in the OP) it would only take about three thousandths
of a second longer, and that's on my old slow computer.
If I were doing this a thousand times more often, or on a range of a
million or more, or in production code, or with ranges that changed
often, then I would certainly take the time to write out the latter
form.
Peace,
~Simon |
|
| Back to top |
|
 |
Dan Bishop *nix forums addict
Joined: 24 Feb 2005
Posts: 91
|
Posted: Wed Jul 19, 2006 5:33 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
Paul Boddie wrote:
| Quote: | John Machin wrote:
On 19/07/2006 1:05 AM, Dan Bishop wrote:
xrange already has __contains__.
As pointed out previously, xrange is a function and one would not expect
it to have a __contains__ method.
Well, you pointed out that range is a function, but xrange seems to be
a type...
xrange
type 'xrange'
dir(xrange)
['__class__', '__delattr__', '__doc__', '__getattribute__',
'__getitem__', '__hash__', '__init__', '__iter__', '__len__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__',
'__setattr__', '__str__']
No __contains__ method, though, at least in 2.4.1.
The objects returned by xrange do not (according to my reading of the
2.4.3 version of Objects/rangeobject.c) have a __contains__ method.
As confirmed by the above evidence.
I find it difficult to believe that an inefficient __contains__ has been
implemented since.
So do I. As you go on to say, the usual sequence traversal mechanisms
are probably used to support the "in" operator. Whether it's a pressing
matter to add support for a more efficient mechanism depends on how
often people want to use ranges in the way described. Perhaps I'll
write a patch - who knows?
|
My mistake. I should have looked at dir(xrange) before posting.
But the point remains that xrange's "implicit __contains__" runs in
linear time when a constant-time algorithm exists. |
|
| Back to top |
|
 |
Alex Martelli *nix forums Guru
Joined: 19 Feb 2005
Posts: 955
|
Posted: Thu Jul 20, 2006 3:14 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
Paul Boddie <paul@boddie.org.uk> wrote:
| Quote: | John Machin wrote:
range() and xrange() are functions. You are suggesting that 2
*functions* should acquire a __contains__ method each? I trust not.
Well, range is a function in the current implementation, although its
usage is similar to that one would get if it were a class, particularly
a subclass of list or one providing a list-style interface. With such a
class, you could provide a __contains__ method which could answer the
question of what the range contains based on the semantics guaranteed
by a range (in contrast to a normal list).
|
You'd also have to override just about every mutating method to switch
back to a "normal" __contains__ (or change self's type on the fly) -- a
pretty heavy price to pay.
I have often noticed that subclassing list, dict and maybe set has this
kind of issue: the need to track every possible change to the object.
Maybe a good mechanism to have for the purpose would be to add to
mutable types a "hook" method, say __mutator__, which gets called either
right before or right after any mutating method (there are different
tradeoffs for before-calls and after-calls), presumably passing along
the *a and **k for generality (although it might be faster for the base
case to avoid that); the base types would have a no-op implementation,
but subtypes could easily override just the hook to facilitate their
task of maintaining extra state (could be as little as a per-instance
flag recording whether the object is guaranteed to be still "pristine").
At C level, that might be an extra slot tp_mutator, left NULL in base
types to indicate "no mutator-hook method implemented here".
Like any other addition of, or change to, functionality, this would of
course be a proposal for 2.6, since 2.5 is feature-frozen now.
Alex |
|
| Back to top |
|
 |
Paul Boddie *nix forums Guru Wannabe
Joined: 20 Feb 2005
Posts: 222
|
Posted: Thu Jul 20, 2006 3:25 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
Alex Martelli wrote:
| Quote: | Paul Boddie <paul@boddie.org.uk> wrote:
Well, range is a function in the current implementation, although its
usage is similar to that one would get if it were a class, particularly
a subclass of list or one providing a list-style interface. With such a
class, you could provide a __contains__ method which could answer the
question of what the range contains based on the semantics guaranteed
by a range (in contrast to a normal list).
You'd also have to override just about every mutating method to switch
back to a "normal" __contains__ (or change self's type on the fly) -- a
pretty heavy price to pay.
|
A subclass of list is probably a bad idea in hindsight, due to various
probable requirements of it actually needing to be a list with all its
contents, whereas we wanted to avoid having anything like a list around
until the contents of this "lazy list" were required by the program. If
we really wanted to subclass something, we could consider subclassing
the slice class/type, but that isn't subclassable in today's Python for
some reason, and it doesn't really provide anything substantial,
anyway. However, Python being the language it is, an appropriately
behaving class is quite easily written from scratch.
Paul |
|
| Back to top |
|
 |
Alex Martelli *nix forums Guru
Joined: 19 Feb 2005
Posts: 955
|
Posted: Thu Jul 20, 2006 3:41 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
Paul Boddie <paul@boddie.org.uk> wrote:
| Quote: | Alex Martelli wrote:
Paul Boddie <paul@boddie.org.uk> wrote:
Well, range is a function in the current implementation, although its
usage is similar to that one would get if it were a class, particularly
a subclass of list or one providing a list-style interface. With such a
class, you could provide a __contains__ method which could answer the
question of what the range contains based on the semantics guaranteed
by a range (in contrast to a normal list).
You'd also have to override just about every mutating method to switch
back to a "normal" __contains__ (or change self's type on the fly) -- a
pretty heavy price to pay.
A subclass of list is probably a bad idea in hindsight, due to various
probable requirements of it actually needing to be a list with all its
contents, whereas we wanted to avoid having anything like a list around
until the contents of this "lazy list" were required by the program. If
we really wanted to subclass something, we could consider subclassing
the slice class/type, but that isn't subclassable in today's Python for
some reason, and it doesn't really provide anything substantial,
anyway. However, Python being the language it is, an appropriately
behaving class is quite easily written from scratch.
|
Nevertheless, that class will still need to implement every single
method of the list type; making it a subclass of list has some advantage
in that every such implementation of a method can basically fill the
real list, self.__class__=list, and leave all the rest, forevermore
(explicitly here, implicitly in the future), to class list. Performance
should be much better than by working off semi-deprecated UserList.
A "hook method" __mutator__ (ideally called _before_ in this case), as I
was proposing (for 2.6 or later), would make such approaches way easier
and handier (and would help with most use cases I can think of for
subclassing list, dict or set).
Alex |
|
| Back to top |
|
 |
Antoon Pardon *nix forums Guru
Joined: 21 Feb 2005
Posts: 483
|
Posted: Fri Jul 21, 2006 12:14 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
On 2006-07-20, Paul Boddie <paul@boddie.org.uk> wrote:
| Quote: | Alex Martelli wrote:
Paul Boddie <paul@boddie.org.uk> wrote:
Well, range is a function in the current implementation, although its
usage is similar to that one would get if it were a class, particularly
a subclass of list or one providing a list-style interface. With such a
class, you could provide a __contains__ method which could answer the
question of what the range contains based on the semantics guaranteed
by a range (in contrast to a normal list).
You'd also have to override just about every mutating method to switch
back to a "normal" __contains__ (or change self's type on the fly) -- a
pretty heavy price to pay.
A subclass of list is probably a bad idea in hindsight, due to various
probable requirements of it actually needing to be a list with all its
contents, whereas we wanted to avoid having anything like a list around
until the contents of this "lazy list" were required by the program. If
we really wanted to subclass something, we could consider subclassing
the slice class/type, but that isn't subclassable in today's Python for
some reason, and it doesn't really provide anything substantial,
anyway. However, Python being the language it is, an appropriately
behaving class is quite easily written from scratch.
|
Except that if you write your own class from scratch, you can't use
it as a slice. For a language that is supposed to be about duck typing
I find it strange that if I make my own class with a start, stop and
step attribute, that python barfs on it when I want to use it as a
slice.
| Quote: | class sl(object):
.... def __init__(self, start = None, stop = None, step = None): |
.... self.start = start
.... self.stop = stop
.... self.step = step
....
| Quote: | lst = range(20)
s1 = slice(3,13)
s2 = sl(3,13)
s1.start
3
s2.start
3
s1.stop
13
s2.stop
13
s1.step
s2.step
lst[s1]
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
lst[s2]
Traceback (most recent call last): |
File "<stdin>", line 1, in ?
TypeError: list indices must be integers
--
Antoon Pardon |
|
| Back to top |
|
 |
Paul Boddie *nix forums Guru Wannabe
Joined: 20 Feb 2005
Posts: 222
|
Posted: Fri Jul 21, 2006 1:22 pm Post subject:
Re: range() is not the best way to check range?
|
|
|
Antoon Pardon wrote:
[Subclasses of list or slice for ranges]
| Quote: | Except that if you write your own class from scratch, you can't use
it as a slice. For a language that is supposed to be about duck typing
I find it strange that if I make my own class with a start, stop and
step attribute, that python barfs on it when I want to use it as a
slice.
|
[s2 has start, stop, step...]
| Quote: | lst[s2]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: list indices must be integers
|
In fact, the duck typing seems only to really work outside the
interpreter and the various accompanying built-in classes. Consider a
class similar to the one you defined with the name myslice (for
clarity) and with the apparently necessary indices method; consider it
used as follows:
| Quote: | range(0, 10)[myslice(1, 5, 2)]
Traceback (most recent call last): |
File "<stdin>", line 1, in ?
TypeError: list indices must be integers
Here, we start to believe that only traditional index or slice notation
will work. However, we did come across the built-in slice class before;
consider this:
| Quote: | range(0, 10)[slice(1, 5, 2)]
[1, 3] |
Regardless of whether myslice inherits from object or not, there's no
persuading the interpreter that it is a genuine slice, and remember
that we can't subclass slice (for some reason unknown). So, it would
appear that the interpreter really wants instances from some specific
set of types (presumably discoverable by looking at list_subscript in
listobject.c) rather than some objects conforming to some interface or
protocol, and perhaps it is implemented this way for performance
reasons.
In any case, in the core of Python some types/classes are more equal
than others, and for whatever reason the duck typing breaks down - a
case of "malbik endar" [1] that you just have to be aware of, I
suppose.
Paul
[1] http://www.sciamanna.com/island/pop/2004_07_13/280_malbik_endar.htm |
|
| Back to top |
|
 |
Google
|
|
| Back to top |
|
 |
|
|
The time now is Sat Nov 22, 2008 8:22 pm | All times are GMT
|
|
Unique MySpace Layouts | Die Casting Company | Loans | Debt Consolidation | Actress
|
|
Copyright © 2004-2005 DeniX Solutions SRL
|
|
|
|
Other DeniX Solutions sites:
Unix/Linux blog |
electronics forum |
medicine forum |
science forum |
|
|
Privacy Policy
|
Powered by phpBB © 2001, 2005 phpBB Group
|
|