niXforums Forum Index
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   PreferencesPreferences   Log in to check your private messagesLog in to check your private messages   Log inLog in 
·  nixdoc.net ·  man pages ·  Linux HOWTOs ·  FreeBSD Tips ·  Forums
navigation Forum index » Databases » PostgreSQL
Antw: Performance problem with query
Post new topic   Reply to topic Page 1 of 1 [11 Posts] View previous topic :: View next topic
Author Message
Christian Rengstl
*nix forums beginner


Joined: 27 Jun 2006
Posts: 19

PostPosted: Tue Jul 18, 2006 6:24 pm    Post subject: Antw: Performance problem with query Reply with quote

now finally after a long time i have the query plan for the whole filled table. I hope somebody can explain me why it takes so much longer...
QUERY PLAN

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
-----
Result (cost=0.09..254643.09 rows=8044000 width=39) (actual time=0..043..0.043
rows=0 loops=1)
One-Time Filter: ((((split_part(($1)::text, '_'::text, 2))::smallint = 1) IS
NOT TRUE) AND (((split_part(($2)::text, '_'::text, 2))::smallint = 22) IS NOT TR
UE))
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width10) (never executed)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.006...0.006 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width10) (actual time=0.004..0.004 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (nev
er executed)
Total runtime: 0.238 ms

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0..056..655772
..273 rows=8044000 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.003...0.003 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width10) (actual time=0.002..0.002 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.006...0.007 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width10) (actual time=0.004..0.004 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (act
ual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual time=1..245..1.245
rows=0 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 22)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width10) (never executed)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.013...0.014 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width10) (actual time=0.009..0.009 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (nev
er executed)
Total runtime: 22.270 ms
(31 Zeilen)



Quote:
"Christian Rengstl" <Christian.Rengstl@klinik.uni-regensburg.de> 13.07.06 8.37 Uhr
Good morning list,


the following query takes about 15 to 20 minutes for around 2 million lines in the file myfile.txt, but with 8 million lines it takes around 5 hours and i just don't understand why there is such a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query, because i wanted to use the COPY command to load huge files into the db, but i don't want to load all the columns contained in the file in only one table but copy some of them into one table and some in a second table. As i found out with logging, the data is loaded into temp_table within 15 minutes, but to transfer it from the temp_table toagain only something like 10 minutes. Can it be that the cast takes up so much more time than when reading and transferring 2 million lines?

Thanks for any advice!

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Back to top
Merlin Moncure
*nix forums addict


Joined: 19 Oct 2005
Posts: 87

PostPosted: Tue Jul 18, 2006 7:39 pm    Post subject: Re: Performance problem with query Reply with quote

On 7/18/06, Christian Rengstl
<Christian.Rengstl@klinik.uni-regensburg.de> wrote:
Quote:
now finally after a long time i have the query plan for the whole filled table. I hope somebody can explain me why it takes so much longer...
QUERY PLAN

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
-----
Result (cost=0.09..254643.09 rows=8044000 width=39) (actual time=0.043..0.043
rows=0 loops=1)
One-Time Filter: ((((split_part(($1)::text, '_'::text, 2))::smallint = 1) IS
NOT TRUE) AND (((split_part(($2)::text, '_'::text, 2))::smallint = 22) IS NOT TR
UE))
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
10) (never executed)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.006 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
10) (actual time=0.004..0.004 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (nev
er executed)
Total runtime: 0.238 ms

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772
.273 rows=8044000 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
10) (actual time=0.002..0.002 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
10) (actual time=0.004..0.004 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (act
ual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual time=1.245..1.245
rows=0 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 22)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
10) (never executed)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.013..0.014 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
10) (actual time=0.009..0.009 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (nev
er executed)
Total runtime: 22.270 ms
(31 Zeilen)



"Christian Rengstl" <Christian.Rengstl@klinik.uni-regensburg.de> 13.07.06 8.37 Uhr
Good morning list,

the following query takes about 15 to 20 minutes for around 2 million lines in the file myfile.txt, but with 8 million lines it takes around 5 hours and i just don't understand why there is such a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;


what is this phrase doing exactly?
CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2)

it looks fishy.
merlin

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Back to top
Q
*nix forums beginner


Joined: 17 Feb 2006
Posts: 11

PostPosted: Wed Jul 19, 2006 2:37 am    Post subject: Re: Antw: Performance problem with query Reply with quote

On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:

Quote:
now finally after a long time i have the query plan for the whole
filled table. I hope somebody can explain me why it takes so much
longer...


These explain analyze results don't appear to be from the queries you
posted previously. For these results to mean anything you need to
include the EXACT queries you used to generate them.

Quote:
Result (cost=0.06..254643.06 rows=8044000 width=39) (actual
time=0.056..655772
.273 rows=8044000 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text,
2))::smallint = 1)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.003..0.003 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width=
10) (actual time=0.002..0.002 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.006..0.007 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width=
10) (actual time=0.004..0.004 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000
width=39) (act
ual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms


This is the query you want to be interested in, the others took no
time at all.

As a guess I would say the query is an INSERT INTO ... FROM
SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
The majority of the time appears to be taken up on the actual INSERT
and not the SELECT part.

How many rows are in the target table and what indexes does it have?
Does it have any triggers, check constraints, or rules applied to it?
All these things can make the insert take longer as the number of
rows you have already in the table increases.

More than likely you have a target table with a LOT of rows and a
bunch of indexes on it and your disks are being thrashed because the
indexes are not able to stay cached in RAM. At this point you should
ensure your machine is not swapping do disk, and at the very least
you should go through one of the many tuning guidelines available and
ensure you have allocated the appropriate amount of memory to
postgresql for your needs.

You may also want to consider dropping the indexes before you do the
INSERT and recreate them afterwards.



Quote:
"Christian Rengstl" <Christian.Rengstl@klinik.uni-regensburg.de
13.07.06 8.37 Uhr
Good morning list,

the following query takes about 15 to 20 minutes for around 2
million lines in the file myfile.txt, but with 8 million lines it
takes around 5 hours and i just don't understand why there is such
a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT
chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query, because i wanted to use the COPY command
to load huge files into the db, but i don't want to load all the
columns contained in the file in only one table but copy some of
them into one table and some in a second table. As i found out with
logging, the data is loaded into temp_table within 15 minutes, but
to transfer it from the temp_table toagain only something like 10
minutes. Can it be that the cast takes up so much more time than
when reading and transferring 2 million lines?


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Back to top
Christian Rengstl
*nix forums beginner


Joined: 27 Jun 2006
Posts: 19

PostPosted: Wed Jul 19, 2006 8:29 am    Post subject: Re: Performance problem with query Reply with quote

The thing is that in the text file there is a column that is something like xyz_12 and in the table i just need the integer part of it that's what the query is used for. The problem though is not really in the select part, because running the select part on 8 million lines takes about 3 minutes, but i don't know why the insert is taking so long.

"Merlin Moncure" <mmoncure@gmail.com> wrote on 07/18/06 9:39 pm:
Quote:
On 7/18/06, Christian Rengstl
Christian.Rengstl@klinik.uni-regensburg.de> wrote:
now finally after a long time i have the query plan for the whole filled table. I hope somebody can explain me why it takes so much longer...
QUERY PLAN

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
-----
Result (cost=0.09..254643.09 rows=8044000 width=39) (actual time=0.043..0.043
rows=0 loops=1)
One-Time Filter: ((((split_part(($1)::text, '_'::text, 2))::smallint = 1) IS
NOT TRUE) AND (((split_part(($2)::text, '_'::text, 2))::smallint = 22) IS NOT TR
UE))
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width>> 10) (never executed)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.006 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width>> 10) (actual time=0.004..0.004 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width>> 10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (nev
er executed)
Total runtime: 0.238 ms

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772
.273 rows=8044000 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width>> 10) (actual time=0.002..0.002 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width>> 10) (actual time=0.004..0.004 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (act
ual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual time=1.245..1.245
rows=0 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 22)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width>> 10) (never executed)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.013..0.014 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width>> 10) (actual time=0.009..0.009 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (nev
er executed)
Total runtime: 22.270 ms
(31 Zeilen)



"Christian Rengstl" <Christian.Rengstl@klinik.uni-regensburg.de> 13.07.06 8.37 Uhr
Good morning list,

the following query takes about 15 to 20 minutes for around 2 million lines in the file myfile.txt, but with 8 million lines it takes around 5 hours and i just don't understand why there is such a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;


what is this phrase doing exactly?
CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2)

it looks fishy.
merlin

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Back to top
Christian Rengstl
*nix forums beginner


Joined: 27 Jun 2006
Posts: 19

PostPosted: Wed Jul 19, 2006 8:32 am    Post subject: Re: Performance problem with query Reply with quote

The analyze is from the exact query and i dropped the indexes before the insert as well without imrpvement. The target table is as well completely empty and the insert is supposed to write, in this case, more or less 8 million lines in the table. There is a rule though, because i have inheritance table structure with one master table and around 20 child tables.

Q <qdolan@gmail.com> wrote on 07/19/06 4:37 am:
Quote:

On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:

now finally after a long time i have the query plan for the whole
filled table. I hope somebody can explain me why it takes so much
longer...


These explain analyze results don't appear to be from the queries you
posted previously. For these results to mean anything you need to
include the EXACT queries you used to generate them.

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual
time=0.056..655772
.273 rows=8044000 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text,
2))::smallint = 1)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.003..0.003 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>> 10) (actual time=0.002..0.002 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.006..0.007 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>> 10) (actual time=0.004..0.004 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000
width=39) (act
ual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms


This is the query you want to be interested in, the others took no
time at all.

As a guess I would say the query is an INSERT INTO ... FROM
SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
The majority of the time appears to be taken up on the actual INSERT
and not the SELECT part.

How many rows are in the target table and what indexes does it have?
Does it have any triggers, check constraints, or rules applied to it?
All these things can make the insert take longer as the number of
rows you have already in the table increases.

More than likely you have a target table with a LOT of rows and a
bunch of indexes on it and your disks are being thrashed because the
indexes are not able to stay cached in RAM. At this point you should
ensure your machine is not swapping do disk, and at the very least
you should go through one of the many tuning guidelines available and
ensure you have allocated the appropriate amount of memory to
postgresql for your needs.

You may also want to consider dropping the indexes before you do the
INSERT and recreate them afterwards.



"Christian Rengstl" <Christian.Rengstl@klinik.uni-regensburg.de
13.07.06 8.37 Uhr
Good morning list,

the following query takes about 15 to 20 minutes for around 2
million lines in the file myfile.txt, but with 8 million lines it
takes around 5 hours and i just don't understand why there is such
a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT
chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query, because i wanted to use the COPY command
to load huge files into the db, but i don't want to load all the
columns contained in the file in only one table but copy some of
them into one table and some in a second table. As i found out with
logging, the data is loaded into temp_table within 15 minutes, but
to transfer it from the temp_table toagain only something like 10
minutes. Can it be that the cast takes up so much more time than
when reading and transferring 2 million lines?


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Back to top
Q
*nix forums beginner


Joined: 17 Feb 2006
Posts: 11

PostPosted: Wed Jul 19, 2006 9:54 am    Post subject: Re: Performance problem with query Reply with quote

On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:

Quote:
The analyze is from the exact query and i dropped the indexes
before the insert as well without imrpvement. The target table is
as well completely empty and the insert is supposed to write, in
this case, more or less 8 million lines in the table. There is a
rule though, because i have inheritance table structure with one
master table and around 20 child tables.

I would say the problem is in the rule. Try doing the insert into a
duplicate table with no rules or inheritance and see how long it takes.

Perhaps you should provide the actual schema of tables and rules that
are involved in the query in question.

Quote:
Q <qdolan@gmail.com> wrote on 07/19/06 4:37 am:

On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:

now finally after a long time i have the query plan for the whole
filled table. I hope somebody can explain me why it takes so much
longer...


These explain analyze results don't appear to be from the queries you
posted previously. For these results to mean anything you need to
include the EXACT queries you used to generate them.

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual
time=0.056..655772
.273 rows=8044000 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text,
2))::smallint = 1)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.003..0.003 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>>> 10) (actual time=0.002..0.002 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.006..0.007 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>>> 10) (actual time=0.004..0.004 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000
width=39) (act
ual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms


This is the query you want to be interested in, the others took no
time at all.

As a guess I would say the query is an INSERT INTO ... FROM
SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
The majority of the time appears to be taken up on the actual INSERT
and not the SELECT part.

How many rows are in the target table and what indexes does it have?
Does it have any triggers, check constraints, or rules applied to it?
All these things can make the insert take longer as the number of
rows you have already in the table increases.

More than likely you have a target table with a LOT of rows and a
bunch of indexes on it and your disks are being thrashed because the
indexes are not able to stay cached in RAM. At this point you should
ensure your machine is not swapping do disk, and at the very least
you should go through one of the many tuning guidelines available and
ensure you have allocated the appropriate amount of memory to
postgresql for your needs.

You may also want to consider dropping the indexes before you do the
INSERT and recreate them afterwards.



"Christian Rengstl" <Christian.Rengstl@klinik.uni-regensburg.de
13.07.06 8.37 Uhr
Good morning list,

the following query takes about 15 to 20 minutes for around 2
million lines in the file myfile.txt, but with 8 million lines it
takes around 5 hours and i just don't understand why there is such
a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT
chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query, because i wanted to use the COPY command
to load huge files into the db, but i don't want to load all the
columns contained in the file in only one table but copy some of
them into one table and some in a second table. As i found out with
logging, the data is loaded into temp_table within 15 minutes, but
to transfer it from the temp_table toagain only something like 10
minutes. Can it be that the cast takes up so much more time than
when reading and transferring 2 million lines?


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Back to top
Christian Rengstl
*nix forums beginner


Joined: 27 Jun 2006
Posts: 19

PostPosted: Wed Jul 19, 2006 9:58 am    Post subject: Re: Performance problem with query Reply with quote

Well, i'll try that, but honestly i don't think it's the rule as the rule is really simple: it's just one simple integer comparison...

Q <qdolan@gmail.com> wrote on 07/19/06 11:54 am:
Quote:
On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:

The analyze is from the exact query and i dropped the indexes
before the insert as well without imrpvement. The target table is
as well completely empty and the insert is supposed to write, in
this case, more or less 8 million lines in the table. There is a
rule though, because i have inheritance table structure with one
master table and around 20 child tables.

I would say the problem is in the rule. Try doing the insert into a
duplicate table with no rules or inheritance and see how long it takes.

Perhaps you should provide the actual schema of tables and rules that
are involved in the query in question.

Q <qdolan@gmail.com> wrote on 07/19/06 4:37 am:

On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:

now finally after a long time i have the query plan for the whole
filled table. I hope somebody can explain me why it takes so much
longer...


These explain analyze results don't appear to be from the queries you
posted previously. For these results to mean anything you need to
include the EXACT queries you used to generate them.

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual
time=0.056..655772
.273 rows=8044000 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text,
2))::smallint = 1)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.003..0.003 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>>>> 10) (actual time=0.002..0.002 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.006..0.007 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>>>> 10) (actual time=0.004..0.004 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000
width=39) (act
ual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms


This is the query you want to be interested in, the others took no
time at all.

As a guess I would say the query is an INSERT INTO ... FROM
SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
The majority of the time appears to be taken up on the actual INSERT
and not the SELECT part.

How many rows are in the target table and what indexes does it have?
Does it have any triggers, check constraints, or rules applied to it?
All these things can make the insert take longer as the number of
rows you have already in the table increases.

More than likely you have a target table with a LOT of rows and a
bunch of indexes on it and your disks are being thrashed because the
indexes are not able to stay cached in RAM. At this point you should
ensure your machine is not swapping do disk, and at the very least
you should go through one of the many tuning guidelines available and
ensure you have allocated the appropriate amount of memory to
postgresql for your needs.

You may also want to consider dropping the indexes before you do the
INSERT and recreate them afterwards.



"Christian Rengstl" <Christian.Rengstl@klinik.uni-regensburg.de
13.07.06 8.37 Uhr
Good morning list,

the following query takes about 15 to 20 minutes for around 2
million lines in the file myfile.txt, but with 8 million lines it
takes around 5 hours and i just don't understand why there is such
a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT
chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query, because i wanted to use the COPY command
to load huge files into the db, but i don't want to load all the
columns contained in the file in only one table but copy some of
them into one table and some in a second table. As i found out with
logging, the data is loaded into temp_table within 15 minutes, but
to transfer it from the temp_table toagain only something like 10
minutes. Can it be that the cast takes up so much more time than
when reading and transferring 2 million lines?


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Back to top
Christian Rengstl
*nix forums beginner


Joined: 27 Jun 2006
Posts: 19

PostPosted: Wed Jul 19, 2006 10:49 am    Post subject: Re: Performance problem with query Reply with quote

Obviously it had something to do with the rule, because now everything finished within 20 minutes. the problem is just that i don't really want to give up the inheritance design. is there a way to maintain the inheritance that doesn't cause this huge performance problem?

Q <qdolan@gmail.com> wrote on 07/19/06 11:54 am:
Quote:
On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:

The analyze is from the exact query and i dropped the indexes
before the insert as well without imrpvement. The target table is
as well completely empty and the insert is supposed to write, in
this case, more or less 8 million lines in the table. There is a
rule though, because i have inheritance table structure with one
master table and around 20 child tables.

I would say the problem is in the rule. Try doing the insert into a
duplicate table with no rules or inheritance and see how long it takes.

Perhaps you should provide the actual schema of tables and rules that
are involved in the query in question.

Q <qdolan@gmail.com> wrote on 07/19/06 4:37 am:

On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:

now finally after a long time i have the query plan for the whole
filled table. I hope somebody can explain me why it takes so much
longer...


These explain analyze results don't appear to be from the queries you
posted previously. For these results to mean anything you need to
include the EXACT queries you used to generate them.

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual
time=0.056..655772
.273 rows=8044000 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text,
2))::smallint = 1)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.003..0.003 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>>>> 10) (actual time=0.002..0.002 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.006..0.007 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>>>> 10) (actual time=0.004..0.004 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000
width=39) (act
ual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms


This is the query you want to be interested in, the others took no
time at all.

As a guess I would say the query is an INSERT INTO ... FROM
SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
The majority of the time appears to be taken up on the actual INSERT
and not the SELECT part.

How many rows are in the target table and what indexes does it have?
Does it have any triggers, check constraints, or rules applied to it?
All these things can make the insert take longer as the number of
rows you have already in the table increases.

More than likely you have a target table with a LOT of rows and a
bunch of indexes on it and your disks are being thrashed because the
indexes are not able to stay cached in RAM. At this point you should
ensure your machine is not swapping do disk, and at the very least
you should go through one of the many tuning guidelines available and
ensure you have allocated the appropriate amount of memory to
postgresql for your needs.

You may also want to consider dropping the indexes before you do the
INSERT and recreate them afterwards.



"Christian Rengstl" <Christian.Rengstl@klinik.uni-regensburg.de
13.07.06 8.37 Uhr
Good morning list,

the following query takes about 15 to 20 minutes for around 2
million lines in the file myfile.txt, but with 8 million lines it
takes around 5 hours and i just don't understand why there is such
a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT
chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query, because i wanted to use the COPY command
to load huge files into the db, but i don't want to load all the
columns contained in the file in only one table but copy some of
them into one table and some in a second table. As i found out with
logging, the data is loaded into temp_table within 15 minutes, but
to transfer it from the temp_table toagain only something like 10
minutes. Can it be that the cast takes up so much more time than
when reading and transferring 2 million lines?


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Back to top
Q
*nix forums beginner


Joined: 17 Feb 2006
Posts: 11

PostPosted: Wed Jul 19, 2006 11:52 am    Post subject: Re: Performance problem with query Reply with quote

On 19/07/2006, at 8:49 PM, Christian Rengstl wrote:

Quote:
Obviously it had something to do with the rule, because now
everything finished within 20 minutes. the problem is just that i
don't really want to give up the inheritance design. is there a way
to maintain the inheritance that doesn't cause this huge
performance problem?

That is hard to say unless you post the rule and table schema you are
currently using.

Quote:
Q <qdolan@gmail.com> wrote on 07/19/06 11:54 am:
On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:

The analyze is from the exact query and i dropped the indexes
before the insert as well without imrpvement. The target table is
as well completely empty and the insert is supposed to write, in
this case, more or less 8 million lines in the table. There is a
rule though, because i have inheritance table structure with one
master table and around 20 child tables.

I would say the problem is in the rule. Try doing the insert into a
duplicate table with no rules or inheritance and see how long it
takes.

Perhaps you should provide the actual schema of tables and rules that
are involved in the query in question.

Q <qdolan@gmail.com> wrote on 07/19/06 4:37 am:

On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:

now finally after a long time i have the query plan for the whole
filled table. I hope somebody can explain me why it takes so much
longer...


These explain analyze results don't appear to be from the
queries you
posted previously. For these results to mean anything you need to
include the EXACT queries you used to generate them.

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual
time=0.056..655772
.273 rows=8044000 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text,
2))::smallint = 1)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.003..0.003 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>>>>> 10) (actual time=0.002..0.002 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.006..0.007 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>>>>> 10) (actual time=0.004..0.004 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000
width=39) (act
ual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms


This is the query you want to be interested in, the others took no
time at all.

As a guess I would say the query is an INSERT INTO ... FROM
SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
The majority of the time appears to be taken up on the actual
INSERT
and not the SELECT part.

How many rows are in the target table and what indexes does it
have?
Does it have any triggers, check constraints, or rules applied
to it?
All these things can make the insert take longer as the number of
rows you have already in the table increases.

More than likely you have a target table with a LOT of rows and a
bunch of indexes on it and your disks are being thrashed because
the
indexes are not able to stay cached in RAM. At this point you
should
ensure your machine is not swapping do disk, and at the very least
you should go through one of the many tuning guidelines
available and
ensure you have allocated the appropriate amount of memory to
postgresql for your needs.

You may also want to consider dropping the indexes before you do
the
INSERT and recreate them afterwards.



"Christian Rengstl" <Christian.Rengstl@klinik.uni-
regensburg.de
13.07.06 8.37 Uhr
Good morning list,

the following query takes about 15 to 20 minutes for around 2
million lines in the file myfile.txt, but with 8 million lines it
takes around 5 hours and i just don't understand why there is such
a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT
chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr
from
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query, because i wanted to use the COPY
command
to load huge files into the db, but i don't want to load all the
columns contained in the file in only one table but copy some of
them into one table and some in a second table. As i found out
with
logging, the data is loaded into temp_table within 15 minutes, but
to transfer it from the temp_table toagain only something like 10
minutes. Can it be that the cast takes up so much more time than
when reading and transferring 2 million lines?


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan -
qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Back to top
Christian Rengstl
*nix forums beginner


Joined: 27 Jun 2006
Posts: 19

PostPosted: Wed Jul 19, 2006 12:03 pm    Post subject: Re: Performance problem with query Reply with quote

So here's the master table including the rules:

entry_no int8 NOT NULL DEFAULT nextval('public.master_seq'::regclass),
pid varchar(15) NOT NULL,
val_1 varchar(1),
val_2 varchar(1),
chr int2 NOT NULL,
aendat timestamp DEFAULT now(),
aennam varchar(Cool,
CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no),
CONSTRAINT "UNIQUE_MASTER" UNIQUE (pid, entry_no)

CREATE OR REPLACE RULE "INSERT_INTO_1" AS
ON INSERT TO public.master
WHERE new.chr = 1 DO INSTEAD INSERT INTO public.table_1 (entry_no, pid, val_1, val_2, chr, aendat, aennam)
VALUES (new.entry_no, new.pid, new.val_1, new.val_2, new.chr, new.aendat, new.aennam);

Like this i have around 20 rules so far, but there might be more later on. The children tables are so far exactly as the master table.

Q <qdolan@gmail.com> wrote on 07/19/06 1:52 pm:
Quote:
On 19/07/2006, at 8:49 PM, Christian Rengstl wrote:

Obviously it had something to do with the rule, because now
everything finished within 20 minutes. the problem is just that i
don't really want to give up the inheritance design. is there a way
to maintain the inheritance that doesn't cause this huge
performance problem?

That is hard to say unless you post the rule and table schema you are
currently using.

Q <qdolan@gmail.com> wrote on 07/19/06 11:54 am:
On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:

The analyze is from the exact query and i dropped the indexes
before the insert as well without imrpvement. The target table is
as well completely empty and the insert is supposed to write, in
this case, more or less 8 million lines in the table. There is a
rule though, because i have inheritance table structure with one
master table and around 20 child tables.

I would say the problem is in the rule. Try doing the insert into a
duplicate table with no rules or inheritance and see how long it
takes.

Perhaps you should provide the actual schema of tables and rules that
are involved in the query in question.

Q <qdolan@gmail.com> wrote on 07/19/06 4:37 am:

On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:

now finally after a long time i have the query plan for the whole
filled table. I hope somebody can explain me why it takes so much
longer...


These explain analyze results don't appear to be from the
queries you
posted previously. For these results to mean anything you need to
include the EXACT queries you used to generate them.

Result (cost=0.06..254643.06 rows=8044000 width=39) (actual
time=0.056..655772
.273 rows=8044000 loops=1)
One-Time Filter: ((split_part(($1)::text, '_'::text,
2))::smallint = 1)
InitPlan
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.003..0.003 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>>>>>> 10) (actual time=0.002..0.002 rows=1 loops=1)
-> Limit (cost=0.00..0.02 rows=1 width=10) (actual
time=0.006..0.007 rows
=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00
rows=8044000 width>>>>>> 10) (actual time=0.004..0.004 rows=1 loops=1)
-> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000
width=39) (act
ual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms


This is the query you want to be interested in, the others took no
time at all.

As a guess I would say the query is an INSERT INTO ... FROM
SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
The majority of the time appears to be taken up on the actual
INSERT
and not the SELECT part.

How many rows are in the target table and what indexes does it
have?
Does it have any triggers, check constraints, or rules applied
to it?
All these things can make the insert take longer as the number of
rows you have already in the table increases.

More than likely you have a target table with a LOT of rows and a
bunch of indexes on it and your disks are being thrashed because
the
indexes are not able to stay cached in RAM. At this point you
should
ensure your machine is not swapping do disk, and at the very least
you should go through one of the many tuning guidelines
available and
ensure you have allocated the appropriate amount of memory to
postgresql for your needs.

You may also want to consider dropping the indexes before you do
the
INSERT and recreate them afterwards.



"Christian Rengstl" <Christian.Rengstl@klinik.uni-
regensburg.de
13.07.06 8.37 Uhr
Good morning list,

the following query takes about 15 to 20 minutes for around 2
million lines in the file myfile.txt, but with 8 million lines it
takes around 5 hours and i just don't understand why there is such
a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT
chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr
from
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query, because i wanted to use the COPY
command
to load huge files into the db, but i don't want to load all the
columns contained in the file in only one table but copy some of
them into one table and some in a second table. As i found out
with
logging, the data is loaded into temp_table within 15 minutes, but
to transfer it from the temp_table toagain only something like 10
minutes. Can it be that the cast takes up so much more time than
when reading and transferring 2 million lines?


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan -
qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Back to top
Q
*nix forums beginner


Joined: 17 Feb 2006
Posts: 11

PostPosted: Wed Jul 19, 2006 1:33 pm    Post subject: Re: Performance problem with query Reply with quote

On 19/07/2006, at 10:03 PM, Christian Rengstl wrote:

Quote:
So here's the master table including the rules:

entry_no int8 NOT NULL DEFAULT nextval('public.master_seq'::regclass),
pid varchar(15) NOT NULL,
val_1 varchar(1),
val_2 varchar(1),
chr int2 NOT NULL,
aendat timestamp DEFAULT now(),
aennam varchar(Cool,
CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no),
CONSTRAINT "UNIQUE_MASTER" UNIQUE (pid, entry_no)

CREATE OR REPLACE RULE "INSERT_INTO_1" AS
ON INSERT TO public.master
WHERE new.chr = 1 DO INSTEAD INSERT INTO public.table_1
(entry_no, pid, val_1, val_2, chr, aendat, aennam)
VALUES (new.entry_no, new.pid, new.val_1, new.val_2, new.chr,
new.aendat, new.aennam);

Like this i have around 20 rules so far, but there might be more
later on. The children tables are so far exactly as the master table.

What about the children? Do they have the same indexes?

You could try adding an 'ORDER BY chr' to your long running INSERT
INTO ... SELECT ... query.

Quote:
Obviously it had something to do with the rule, because now
everything finished within 20 minutes. the problem is just that i
don't really want to give up the inheritance design. is there a way
to maintain the inheritance that doesn't cause this huge
performance problem?

When you say "now everything finished within 20 minutes", what did
you actually do to achieve this?


--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\




---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Back to top
Google

Back to top
Display posts from previous:   
Post new topic   Reply to topic Page 1 of 1 [11 Posts] View previous topic :: View next topic
The time now is Fri Nov 21, 2008 12:03 am | All times are GMT
navigation Forum index » Databases » PostgreSQL
Jump to:  

Similar Topics
Topic Author Forum Replies Last Post
No new posts Unknown in header problem -SOLVED- Light Speed Postfix 0 Thu Jul 03, 2008 10:40 am
No new posts problem with sending mail nuxia Postfix 0 Mon Apr 21, 2008 3:58 am
No new posts Postfix 2.3.8 Virtual problem Blotto Postfix 0 Fri Apr 04, 2008 6:11 am
No new posts Postfix sending problem for local domain remote email monkey_magix Postfix 0 Mon Sep 10, 2007 10:17 am
No new posts bounce problem murkis Postfix 0 Sun Oct 08, 2006 3:45 pm

Debt Management | Mortgages | Homeowner Loans | Car Loan | Books
Copyright © 2004-2005 DeniX Solutions SRL
 
Other DeniX Solutions sites: Unix/Linux blog |  electronics forum |  medicine forum |  science forum | 
Privacy Policy


Powered by phpBB © 2001, 2005 phpBB Group
[ Time: 0.4026s ][ Queries: 16 (0.1664s) ][ GZIP on - Debug on ]