niXforums Forum Index
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   PreferencesPreferences   Log in to check your private messagesLog in to check your private messages   Log inLog in 
·  nixdoc.net ·  man pages ·  Linux HOWTOs ·  FreeBSD Tips ·  Forums
navigation Forum index » Databases » PostgreSQL
Index size
Post new topic   Reply to topic Page 1 of 2 [19 Posts] View previous topic :: View next topic
Goto page:  1, 2 Next
Author Message
Tatsuo Ishii
*nix forums beginner


Joined: 01 Mar 2005
Posts: 48

PostPosted: Tue Mar 01, 2005 2:51 pm    Post subject: Re: Index size Reply with quote

Quote:
I have created a btree index on a 'int4' attribute of a table.

After i have inserted 1,000,000 raws in my table, i can see that my index
size is 2745 Blocks (8KB each) from pg_class. That means about 21,960 KB
size.

I try to understand hows is this number generated, because thought that
for each new entry in table, there is a new entry in index and that each
entry of the index is:

4 Bytes for the int4 attribute
and
40 Bytes for oid

So 44 * 1,000,000 ~ 42,969 KB

Can anybody inform me where I do the mistake?

There's no oid in index tuples. There is an 8-byte long header for
each index tuple. Since you are inserting 4-byte long user data, you
index tuples are 12-byte each. Each index tuple needs a "pointer" in a
block, which is called "item pointer" and that is 4-byte long. Each
block can hold up to floor((8192-24(page header)-16(special
data))/(12+4)) = 509 tuples. ceil(1,000,000/509) = 1965 is the blocks
you need for your index. In addition to this, you need a "meta page"
and a "root page". So it becomes 1965+1+1 = 1967. Also you need
"internal pages", whose numer is hard to guess since it depends on the
actual index tree structure(for example, tree height). From my limited
experience, for 1,000,000 tuples, you will need at least 7 internal
pages. Now the number becomes 1967+7 = 1974. Still it's different from
2745. If you don't have deleted tuples, the difference probably comes
from the fact that a btree index can never be 100% occupied. IMO
1974/2745 = 0.71 seems not so bad.
--
Tatsuo Ishii

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
Back to top
Ioannis Theoharis
*nix forums beginner


Joined: 01 Mar 2005
Posts: 10

PostPosted: Tue Mar 01, 2005 3:31 pm    Post subject: Re: Index size Reply with quote

Thanks a lot.

An other question:

Is there any way to prevent duplicates on btree index attribute,
PERMITTING them on table?




On Tue, 1 Mar 2005, Tatsuo Ishii wrote:

Quote:
I have created a btree index on a 'int4' attribute of a table.

After i have inserted 1,000,000 raws in my table, i can see that my index
size is 2745 Blocks (8KB each) from pg_class. That means about 21,960 KB
size.

I try to understand hows is this number generated, because thought that
for each new entry in table, there is a new entry in index and that each
entry of the index is:

4 Bytes for the int4 attribute
and
40 Bytes for oid

So 44 * 1,000,000 ~ 42,969 KB

Can anybody inform me where I do the mistake?

There's no oid in index tuples. There is an 8-byte long header for
each index tuple. Since you are inserting 4-byte long user data, you
index tuples are 12-byte each. Each index tuple needs a "pointer" in a
block, which is called "item pointer" and that is 4-byte long. Each
block can hold up to floor((8192-24(page header)-16(special
data))/(12+4)) = 509 tuples. ceil(1,000,000/509) = 1965 is the blocks
you need for your index. In addition to this, you need a "meta page"
and a "root page". So it becomes 1965+1+1 = 1967. Also you need
"internal pages", whose numer is hard to guess since it depends on the
actual index tree structure(for example, tree height). From my limited
experience, for 1,000,000 tuples, you will need at least 7 internal
pages. Now the number becomes 1967+7 = 1974. Still it's different from
2745. If you don't have deleted tuples, the difference probably comes
from the fact that a btree index can never be 100% occupied. IMO
1974/2745 = 0.71 seems not so bad.
--
Tatsuo Ishii

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
Back to top
Tom Lane
*nix forums Guru


Joined: 24 Mar 2005
Posts: 2070

PostPosted: Tue Mar 01, 2005 5:22 pm    Post subject: Re: Index size Reply with quote

Tatsuo Ishii <t-ishii@sra.co.jp> writes:
Quote:
... Now the number becomes 1967+7 = 1974. Still it's different from
2745. If you don't have deleted tuples, the difference probably comes
from the fact that a btree index can never be 100% occupied. IMO
1974/2745 = 0.71 seems not so bad.

In fact the traditional figure for the steady-state load factor of a
btree index is 2/3rds; that is, after a long sequence of inserts and
deletes you can expect about one-third of each page to be empty space.

If Ioannis' number was taken immediately after a CREATE INDEX operation,
then his index size isn't reflective of any settling to a steady-state
load factor; rather it happens because the CREATE INDEX command
deliberately loads the index leaf pages only 2/3rds full, to avoid a
disproportionate amount of page splitting when normal inserts commence.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Back to top
Tom Lane
*nix forums Guru


Joined: 24 Mar 2005
Posts: 2070

PostPosted: Wed Mar 02, 2005 12:38 am    Post subject: Re: Index size Reply with quote

Tatsuo Ishii <t-ishii@sra.co.jp> writes:
Quote:
... rather it happens because the CREATE INDEX command
deliberately loads the index leaf pages only 2/3rds full, to avoid a
disproportionate amount of page splitting when normal inserts commence.

Interesting. Right after CREATE INDEX for a int4 column using pgbench
-s 10(1,000,000 tuples), I got 2184 leaf pages. From my caliculation
the number of leaf pages is expected to 1965, which is 100% full case
assumption of course. So 1965/2184 = 0.8997 = 90% is actually used?

Shoulda read the code rather than going by memory Wink. What nbtsort.c
actually says is

* It is not wise to pack the pages entirely full, since then *any*
* insertion would cause a split (and not only of the leaf page; the need
* for a split would cascade right up the tree). The steady-state load
* factor for btrees is usually estimated at 70%. We choose to pack leaf
* pages to 90% and upper pages to 70%. This gives us reasonable density
* (there aren't many upper pages if the keys are reasonable-size) without
* incurring a lot of cascading splits during early insertions.

and indeed the code seems to do that:

/* set "full" threshold based on level. See notes at head of file. */
if (level > 0)
state->btps_full = (PageGetPageSize(state->btps_page) * 3) / 10;
else
state->btps_full = PageGetPageSize(state->btps_page) / 10;

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Back to top
Tom Lane
*nix forums Guru


Joined: 24 Mar 2005
Posts: 2070

PostPosted: Wed Mar 02, 2005 1:26 am    Post subject: Re: Index size Reply with quote

Tatsuo Ishii <t-ishii@sra.co.jp> writes:
Quote:
So it seems Ioannis' number was not taken immediately after a CREATE
INDEX operation?

I would guess not, but it's up to him to say. If it is a number derived
after some period of normal operation, then his result agrees with the
theory that says 70% is the steady-state figure ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
Back to top
Tatsuo Ishii
*nix forums beginner


Joined: 01 Mar 2005
Posts: 48

PostPosted: Wed Mar 02, 2005 1:30 am    Post subject: Re: Index size Reply with quote

Quote:
An other question:

Is there any way to prevent duplicates on btree index attribute,
PERMITTING them on table?

I can't think of any usefull usage for such an index. Can you explain
why you need it?
--
Tatsuo Ishii

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Back to top
Tatsuo Ishii
*nix forums beginner


Joined: 01 Mar 2005
Posts: 48

PostPosted: Wed Mar 02, 2005 1:30 am    Post subject: Re: Index size Reply with quote

Quote:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
... Now the number becomes 1967+7 = 1974. Still it's different from
2745. If you don't have deleted tuples, the difference probably comes
from the fact that a btree index can never be 100% occupied. IMO
1974/2745 = 0.71 seems not so bad.

In fact the traditional figure for the steady-state load factor of a
btree index is 2/3rds; that is, after a long sequence of inserts and
deletes you can expect about one-third of each page to be empty space.

If Ioannis' number was taken immediately after a CREATE INDEX operation,
then his index size isn't reflective of any settling to a steady-state
load factor; rather it happens because the CREATE INDEX command
deliberately loads the index leaf pages only 2/3rds full, to avoid a
disproportionate amount of page splitting when normal inserts commence.

Interesting. Right after CREATE INDEX for a int4 column using pgbench
-s 10(1,000,000 tuples), I got 2184 leaf pages. From my caliculation
the number of leaf pages is expected to 1965, which is 100% full case
assumption of course. So 1965/2184 = 0.8997 = 90% is actually used?
--
Tatsuo Ishii

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Back to top
Tatsuo Ishii
*nix forums beginner


Joined: 01 Mar 2005
Posts: 48

PostPosted: Wed Mar 02, 2005 2:02 am    Post subject: Re: Index size Reply with quote

Quote:
Interesting. Right after CREATE INDEX for a int4 column using pgbench
-s 10(1,000,000 tuples), I got 2184 leaf pages. From my caliculation
the number of leaf pages is expected to 1965, which is 100% full case
assumption of course. So 1965/2184 = 0.8997 = 90% is actually used?

Shoulda read the code rather than going by memory Wink. What nbtsort.c
actually says is

* It is not wise to pack the pages entirely full, since then *any*
* insertion would cause a split (and not only of the leaf page; the need
* for a split would cascade right up the tree). The steady-state load
* factor for btrees is usually estimated at 70%. We choose to pack leaf
* pages to 90% and upper pages to 70%. This gives us reasonable density
* (there aren't many upper pages if the keys are reasonable-size) without
* incurring a lot of cascading splits during early insertions.

and indeed the code seems to do that:

/* set "full" threshold based on level. See notes at head of file. */
if (level > 0)
state->btps_full = (PageGetPageSize(state->btps_page) * 3) / 10;
else
state->btps_full = PageGetPageSize(state->btps_page) / 10;


Thanks for the explanation.

So it seems Ioannis' number was not taken immediately after a CREATE
INDEX operation?
--
Tatsuo Ishii

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Back to top
Tom Lane
*nix forums Guru


Joined: 24 Mar 2005
Posts: 2070

PostPosted: Wed Mar 02, 2005 7:08 pm    Post subject: Re: Index size Reply with quote

Ioannis Theoharis <theohari@ics.forth.gr> writes:
Quote:
Where can i find a documentation with technical analysis for all (if
possible) of components of postgres?

Read the source code.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
Back to top
Ioannis Theoharis
*nix forums beginner


Joined: 01 Mar 2005
Posts: 10

PostPosted: Wed Mar 02, 2005 7:49 pm    Post subject: Re: Index size Reply with quote

On Tue, 1 Mar 2005, Tom Lane wrote:

Quote:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
So it seems Ioannis' number was not taken immediately after a CREATE
INDEX operation?

I would guess not, but it's up to him to say. If it is a number derived
after some period of normal operation, then his result agrees with the
theory that says 70% is the steady-state figure ...

yes, my number was taken after a large amount of inserts.

Your comments about the block usage in case of b-tree indexes are
absolutely interesting.
Where can i find a documentation with technical analysis for all (if
possible) of components of postgres?

All documentations that i have found are very general and refer to simple
users.


Quote:

regards, tom lane


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
Back to top
Martijn van Oosterhout
*nix forums Guru


Joined: 02 Mar 2005
Posts: 674

PostPosted: Wed Mar 02, 2005 7:55 pm    Post subject: Re: Index size Reply with quote

On Wed, Mar 02, 2005 at 10:08:58PM +0200, Ioannis Theoharis wrote:
Quote:
I have a relation like this: (att0 varchar(1000), att1 int4)

i create a b-tree index on att1 ()
i cluster my raltion according to index

now i have a query
select *
form tc2000000000
where att1<=900000000 and att1>=0 ;

As far as i can see from explain analyze an index scan is used:
Index Scan using inst_id_idx on tc2000000000
Index Cond: ((att1 <= 900000000) AND (att1 >= 0))

If for each entry in table, an entry in index is beeing held, then the
index size is populated too fast.

I guess, that postgres uses index to find the first entry satisfying the
index conition, after find the last one and then do a sequential scan on
the appropriate fraction of the table (to take advantage of physical
clustering).

What makes you think that? Clustering is nice, but postgresql needs to
get the right answer and that the table in clustered is not something
postgresql can rely on. It uses the index to find *every* row you're
looking for, there's no shortcut here.

Quote:
In my case, discrete values on att1 are orders of magnitude less than
number of table raws.

Thus, the big index size is useless for me. I want to avoid the overhead
of scanning such a big index, just permitting ONLY the discrete values to
entry in index. In such a way the whole scenario i presented before for
how i guess, that postgres evaluates my query, is still in use.

There's no special relationship between two rows with the same att1.
Either you find the rows by using an index for each row, or scanning
the whole table. There's no inbetween. The only thing clustering
acheives is that due to values being together, the chance that
succeeding indexes entries will already have been loaded is higher,
thus reducing the overall cost.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Quote:
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
Back to top
Ioannis Theoharis
*nix forums beginner


Joined: 01 Mar 2005
Posts: 10

PostPosted: Wed Mar 02, 2005 8:08 pm    Post subject: Re: Index size Reply with quote

On Wed, 2 Mar 2005, Tatsuo Ishii wrote:

Quote:
An other question:

Is there any way to prevent duplicates on btree index attribute,
PERMITTING them on table?

I can't think of any usefull usage for such an index. Can you explain
why you need it?

I have a relation like this: (att0 varchar(1000), att1 int4)

i create a b-tree index on att1 ()
i cluster my raltion according to index

now i have a query
select *
form tc2000000000
where att1<=900000000 and att1>=0 ;

As far as i can see from explain analyze an index scan is used:
Index Scan using inst_id_idx on tc2000000000
Index Cond: ((att1 <= 900000000) AND (att1 >= 0))

If for each entry in table, an entry in index is beeing held, then the
index size is populated too fast.

I guess, that postgres uses index to find the first entry satisfying the
index conition, after find the last one and then do a sequential scan on
the appropriate fraction of the table (to take advantage of physical
clustering).

In my case, discrete values on att1 are orders of magnitude less than
number of table raws.

Thus, the big index size is useless for me. I want to avoid the overhead
of scanning such a big index, just permitting ONLY the discrete values to
entry in index. In such a way the whole scenario i presented before for
how i guess, that postgres evaluates my query, is still in use.

I think there must be a way to change the way of index_usage to alter it
to what i 'm looking for.


Quote:
--
Tatsuo Ishii


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
Back to top
Martijn van Oosterhout
*nix forums Guru


Joined: 02 Mar 2005
Posts: 674

PostPosted: Wed Mar 02, 2005 8:59 pm    Post subject: Re: Index size Reply with quote

On Wed, Mar 02, 2005 at 11:30:58PM +0200, Ioannis Theoharis wrote:
Quote:
On Wed, 2 Mar 2005, Martijn van Oosterhout wrote:
What makes you think that? Clustering is nice, but postgresql needs to
get the right answer and that the table in clustered is not something
postgresql can rely on.

If postgresql doesn't rely on it, it' s postgresql's technical decision
(and i don't know the reason) and not a default decision between rdbms's.

But if you know exactly the reason, it would be a great help for me to
know it.

Easy, if you CLUSTER a table, it's CLUSTERed then. But it doesn't stay
that way. As soon as you insert a new row, or update an old one, it
gets added to the end (the only place with space) and now it's not
clustered anymore. It's almost clustered and from a caching point of
view it's fine. But postgresql can't assume at any point a table will
stay clustered, an insert could happen in the middle of your
processing.

Logically you can't magically add space in the middle of a file, you
have to move everything else up. If you know an efficient way to keep a
table clustered while handling arbitrary inserts and updates, I'd be
curious to know...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Quote:
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
Back to top
Ioannis Theoharis
*nix forums beginner


Joined: 01 Mar 2005
Posts: 10

PostPosted: Thu Mar 03, 2005 11:05 am    Post subject: Re: Index size Reply with quote

All you said are wright. But it 's not so difficult for postgresql to hold
on a bit attribute attached to each table the information, whether there
is done an insertion/deletion/update to a clustered table or not.

And i guess, postgresql would already implement this simply alternative.


Quote:
Easy, if you CLUSTER a table, it's CLUSTERed then. But it doesn't stay
that way. As soon as you insert a new row, or update an old one, it
gets added to the end (the only place with space) and now it's not
clustered anymore. It's almost clustered and from a caching point of
view it's fine. But postgresql can't assume at any point a table will
stay clustered, an insert could happen in the middle of your
processing.

Logically you can't magically add space in the middle of a file, you
have to move everything else up. If you know an efficient way to keep a
table clustered while handling arbitrary inserts and updates, I'd be
curious to know...

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
Back to top
Guest






PostPosted: Tue Mar 08, 2005 6:12 pm    Post subject: SRF, JDBC and result info Reply with quote

Hi everybody!

I have an SRF which is called from a JAVA app with JDBC. Everything works
fine
and I want now to be able to pass some result-related info to my app. It is
not about the format of the results (ResultSetMetaData) or something like
that.

Is it possible to return some string (or other type of)info together with
the result tuples (even if it requiers some hacking i.e. there is no
provision for something like that)? Any ideas?

Regards,
Ntinos Katsaros


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
Back to top
Google

Back to top
Display posts from previous:   
Post new topic   Reply to topic Page 1 of 2 [19 Posts] Goto page:  1, 2 Next
View previous topic :: View next topic
The time now is Thu Jan 08, 2009 9:41 am | All times are GMT
navigation Forum index » Databases » PostgreSQL
Jump to:  

Similar Topics
Topic Author Forum Replies Last Post
No new posts container for insert/delete + fast index Neal Becker C++ 1 Fri Jul 21, 2006 12:57 pm
No new posts (secondary) index-only scans Ryan Berkeley DB 1 Thu Jul 20, 2006 8:32 pm
No new posts Create index hanging Claire McLister PostgreSQL 7 Thu Jul 20, 2006 8:11 pm
No new posts ORA-31167: XML nodes over 64K in size cannot be inserted Lars Tetzlaff Server 0 Thu Jul 20, 2006 10:43 am
No new posts Cursor size Ron Maxey Mandrake 10 Thu Jul 20, 2006 5:50 am

Bankruptcy Certification | Credit Cards | Magazine Subscriptions | Mortgage | vShare YouTube Clone
Copyright © 2004-2005 DeniX Solutions SRL
 
Other DeniX Solutions sites: Unix/Linux blog |  electronics forum |  medicine forum |  science forum | 
Privacy Policy


Powered by phpBB © 2001, 2005 phpBB Group
[ Time: 0.2047s ][ Queries: 16 (0.0558s) ][ GZIP on - Debug on ]