conor.robinson@gmail.com *nix forums beginner
Joined: 08 Apr 2006
Posts: 23
Posted: Mon Jul 17, 2006 6:36 pm Post subject:
pytables - best practices / mem leaks
I have an H5 file with one group (off the root) and two large main
tables, and I'm attempting to aggregate my data into 50+ new groups (off
the root) with two tables per group.
sys info:
PyTables version: 1.3.2
HDF5 version: 1.6.5
numarray version: 1.5.0
Zlib version: 1.2.3
BZIP2 version: 1.0.3 (15-Feb-2005)
Python version: 2.4.2 (#1, Jul 13 2006, 20:16:0
[GCC 4.0.1 (Apple Computer, Inc. build 5250)]
Platform: darwin-Power Macintosh (v10.4.7)
Byte-ordering: big
Ran all pytables tests included with the package and received an OK.
Using the following code I get one of three errors:
1. Illegal Instruction
2. Malloc(): trying to call free() twice
3. Bus Error
I believe all three stem from the same issue, a malloc() memory problem
in the PyTables C libraries. I also believe this may be due to how I've
written my sorting script.
The script executes fine and all goes well until I'm sorting about
group 20 to 30, and then I get one of the three errors above, depending
on how/when I flush() and close() the file. When I open the file after
the error using h5ls, all tables are in perfect order up to the crash,
and if I continue from that point everything runs fine until python
throws the same error again after another 10 sorts or so. The somewhat
random crashing is what leads me to believe I have a memory leak or my
method of doing this is incorrect.
Is there a better way to aggregate data using pytables/python? Is there
a better way to be doing this? This seems straightforward enough.
Thanks,
Conor
# function to agg state data from main neg/pos tables into neg/pos state
# tables
import string
import tables

def aggstate(state, h5file):
    print state

    class PosRecords(tables.IsDescription):
        sic = tables.IntCol(0, 1, 4, 0, None, 0)
        numsic = tables.IntCol(0, 1, 4, 0, None, 0)
        empsiz = tables.StringCol(1, '?', 1, None, 0)
        salvol = tables.StringCol(1, '?', 1, None, 0)
        popcod = tables.StringCol(1, '?', 1, None, 0)
        state = tables.StringCol(2, '?', 1, None, 0)
        zip = tables.IntCol(0, 1, 4, 0, None, 1)

    class NegRecords(tables.IsDescription):
        sic = tables.IntCol(0, 1, 4, 0, None, 0)
        numsic = tables.IntCol(0, 1, 4, 0, None, 0)
        empsiz = tables.StringCol(1, '?', 1, None, 0)
        salvol = tables.StringCol(1, '?', 1, None, 0)
        popcod = tables.StringCol(1, '?', 1, None, 0)
        state = tables.StringCol(2, '?', 1, None, 0)
        zip = tables.IntCol(0, 1, 4, 0, None, 1)

    group1 = h5file.createGroup("/", state+"_raw_records",
                                state+" raw records")
    table1 = h5file.createTable(group1, "pos_records", PosRecords,
                                state+" raw pos record table")
    table2 = h5file.createTable(group1, "neg_records", NegRecords,
                                state+" raw neg record table")

    table = h5file.root.raw_records.pos_records
    point = table1.row
    for x in table.iterrows():
        if x['state'] == state:
            point['sic'] = x['sic']
            point['numsic'] = x['numsic']
            point['empsiz'] = x['empsiz']
            point['salvol'] = x['salvol']
            point['popcod'] = x['popcod']
            point['state'] = x['state']
            point['zip'] = x['zip']
            point.append()
    h5file.flush()

    table = h5file.root.raw_records.neg_records
    point = table2.row
    for x in table.iterrows():
        if x['state'] == state:
            point['sic'] = x['sic']
            point['numsic'] = x['numsic']
            point['empsiz'] = x['empsiz']
            point['salvol'] = x['salvol']
            point['popcod'] = x['popcod']
            point['state'] = x['state']
            point['zip'] = x['zip']
            point.append()
    h5file.flush()

states = ['AL','AK','AZ','AR','CA','CO','CT','DC','DE','FL','GA','HI','ID',
          'IL','IN','IA','KS','KY','LA','ME','MD','MA','MI','MN','MS','MO',
          'MT','NE','NV','NH','NJ','NM','NY','NC','ND','OH','OK','OR','PA',
          'RI','SC','SD','TN','TX','UT','VT','VA','WA','WV','WI','WY']

h5file = tables.openFile("200309_data.h5", mode = 'a')
for i in xrange(len(states)):
    aggstate(states[i], h5file)
h5file.close()
conor.robinson@gmail.com *nix forums beginner
Joined: 08 Apr 2006
Posts: 23
Posted: Tue Jul 18, 2006 8:45 pm Post subject:
Re: pytables - best practices / mem leaks
The problem with my above posting is that h5file.flush() should be
table.flush() (flush the table, not the whole file object). Although
h5file.flush() is an actual method, I don't believe it correctly writes
to the tables; it caused all sorts of issues as time went on, and I
think it overlaps with .close(), causing more issues. I also now flush
table1 and table2 right after creating the new group and the two tables
on each iteration. Things are stable now, pytables is great.
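
For anyone finding this later, here is a minimal sketch of the change described above: each destination table is flushed right after it is created and again after its rows are appended, and the file handle is never flushed inside the loop. This is not the original script. It assumes a recent PyTables on Python 3, so it uses open_file / create_group / create_table rather than the 1.3.2 camelCase names, and Int32Col / StringCol(n) rather than the positional IntCol arguments. The Record class, FIELDS, copy_matching_rows and make_demo_file are names made up for the example, and the demo rows are invented.

```python
import tables


class Record(tables.IsDescription):
    """Row layout shared by the pos and neg tables (column types are assumptions)."""
    sic = tables.Int32Col(pos=0)
    numsic = tables.Int32Col(pos=1)
    empsiz = tables.StringCol(1, pos=2)
    salvol = tables.StringCol(1, pos=3)
    popcod = tables.StringCol(1, pos=4)
    state = tables.StringCol(2, pos=5)
    zip = tables.Int32Col(pos=6)


FIELDS = ("sic", "numsic", "empsiz", "salvol", "popcod", "state", "zip")


def copy_matching_rows(source, dest, state):
    """Append the rows of `source` for one state to `dest`, then flush `dest`."""
    wanted = state.encode("ascii")  # string columns read back as bytes on Python 3
    new_row = dest.row
    for row in source.iterrows():
        if row["state"] == wanted:
            for name in FIELDS:
                new_row[name] = row[name]
            new_row.append()
    dest.flush()  # flush the table that was written, not the file handle
    return int(dest.nrows)


def aggstate(state, h5file):
    """Create <state>_raw_records with pos/neg tables and fill them from /raw_records."""
    group = h5file.create_group("/", state + "_raw_records", state + " raw records")
    counts = {}
    for name in ("pos_records", "neg_records"):
        dest = h5file.create_table(group, name, Record, state + " raw " + name)
        dest.flush()  # flush right after creation, as described in the reply
        source = h5file.get_node("/raw_records", name)
        counts[name] = copy_matching_rows(source, dest, state)
    return counts


def make_demo_file(path):
    """Build a tiny stand-in for 200309_data.h5 (invented rows, illustration only)."""
    rows = [
        (7011, 1, b"A", b"B", b"C", b"CA", 94105),
        (5812, 2, b"B", b"C", b"D", b"NY", 10001),
        (7372, 3, b"C", b"D", b"E", b"CA", 90210),
    ]
    with tables.open_file(path, mode="w") as h5file:
        group = h5file.create_group("/", "raw_records", "raw records")
        for name in ("pos_records", "neg_records"):
            table = h5file.create_table(group, name, Record, name)
            new_row = table.row
            for values in rows:
                for field, value in zip(FIELDS, values):
                    new_row[field] = value
                new_row.append()
            table.flush()
```

To try it, call make_demo_file("demo.h5"), open the file with tables.open_file("demo.h5", mode="a"), and run aggstate(s, h5file) for each state code; closing the file once at the end is enough. Both tables share one description class here because the two classes in the original post were identical.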