Preallocate backing file for the Berkeley DB cache
bostic@sleepycat.com
Posted: Fri Mar 11, 2005 2:25 pm    Post subject: Re: Preallocate backing file for the Berkeley DB cache

Quote:
The file system is pretty full (over 70%), so no long runs
of contiguous blocks are available. It's much better than
the original version (with 8K increments).

OK, I'm convinced. :-) I've submitted code changes for Berkeley
DB to ensure we don't fragment when pre-allocating underlying
shared region files. This change will be part of the upcoming
DB 4.4 release, tracked in our Support Request #12125.

Thanks for finding this one!

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic@sleepycat.com
Sleepycat Software Inc. keithbosticim (Yahoo)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com

Florian Weimer
Posted: Tue Mar 08, 2005 6:11 pm    Post subject: Re: Preallocate backing file for the Berkeley DB cache

* Florian Weimer:

Quote:
By the way, with recent debugfs versions, you need a patch to print
the actual block numbers in most indirect blocks:

Or you can use the filefrag tool. *sigh*

It's much more straightforward to use. Do you need further
statistics? It seems that writing the file (with write(2)) could be
beneficial, but I would have to test this on a clean file system
(which I can't do right now).
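
A minimal sketch of the write(2) idea, assuming a POSIX system and Python's os module. The function name preallocate and the 1 MiB chunk size are illustrative choices, not anything Berkeley DB provides. Whether sequential writes really give a less fragmented file depends on the file system and its free-space layout, which is the part that still needs testing.

```python
import os

def preallocate(path, size, chunk=1 << 20):
    """Fill `path` with `size` zero bytes using sequential write(2) calls."""
    zeros = bytes(chunk)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o640)
    try:
        remaining = size
        while remaining > 0:
            # os.write may write fewer bytes than requested, so track the result.
            written = os.write(fd, zeros[:min(chunk, remaining)])
            remaining -= written
        os.fsync(fd)  # push the data out so the blocks are really allocated
    finally:
        os.close(fd)
```

Calling preallocate("cache.bin", 256 * 1024 * 1024) would produce a fully written file of roughly the size seen in this thread; the file name is only an example.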

Florian Weimer
Posted: Tue Mar 08, 2005 4:04 pm    Post subject: Re: Preallocate backing file for the Berkeley DB cache

Quote:
However, OS_VMPAGESIZE is set to 8192 unconditionally (see
dbinc/region.h), and DB_REGION_INIT touches pointers in
OS_VMPAGESIZE increments. Many systems have a page size of
4096, so it actually makes things worse because it
practically *guarantees* fragmentation of the underlying file.

Have you actually seen this happen anywhere?
If so, on what operating system/filesystem combination?

On Linux 2.6 (x86, 4K page size) with ext3fs (4K block size), the file
is created with holes:

Inode: 6963441 Type: regular Mode: 0640 Flags: 0x0 Generation: 2984990891
User: 1000 Group: 1000 Size: 262152192
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 258032
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x422dc37f -- Tue Mar 8 16:23:43 2005
atime: 0x422dc36e -- Tue Mar 8 16:23:26 2005
mtime: 0x422dc36e -- Tue Mar 8 16:23:26 2005
BLOCKS:
(0):13944376, (2):13944377, (4):13944378, (6):13944379, (8):13944380,
(10):13944381, (IND):13944382, (12):13944383, (14):13944384,
(16):13944392, (18):13944393, (20):13944394, (22):13944395,
(24):13944396, (26):13944397, (28):13944398, (30):13944399,
(32):13944400, (34):13944401, (36):13944402, (38):13944403,
(40):13944404, (42):13944405, (44):13944406, (46):13944407,
(48):13944408, (50):13944409, (52):13944624, (54):13944625,
(56):13944626, (58):13944627, (60):13944628, (62):13944629, (64):13944630,
(66):13944631, (68):13944632, (70):13944633, (72):13944634,
(74):13944635, (76):13944636, (78):13944637, (80):13944638,
(82):13944639, (84):13944640, (86):13944648, (88):13944649,
(90):13944650, (92):13944651, (94):1394 [...]

Notice that only even-numbered blocks are backed with file system
storage.
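
The even-only pattern follows from simple arithmetic. The sketch below is a model, not Berkeley DB code: it assumes the region initialisation touches one byte every `stride` bytes, and the function name touched_blocks is made up for illustration.

```python
def touched_blocks(region_size, stride, block_size):
    """Return the sorted logical block numbers hit when one byte is
    touched every `stride` bytes across `region_size` bytes."""
    return sorted({offset // block_size
                   for offset in range(0, region_size, stride)})

# 8 KiB stride over 4 KiB blocks: every other block stays a hole.
print(touched_blocks(64 * 1024, 8192, 4096))       # [0, 2, 4, 6, 8, 10, 12, 14]
# 4 KiB stride over 4 KiB blocks: all 16 blocks are touched.
print(len(touched_blocks(64 * 1024, 4096, 4096)))  # 16
```

With a stride of twice the block size, the odd-numbered blocks are never written during initialisation, so the file system only allocates them later, wherever free space happens to be.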

After using the database for a while, part of the cache has not yet
been touched. Blocks 1, 3, 5, and so on are still not allocated. Yet
towards the end of the file, all blocks are allocated:

Inode: 6963441 Type: regular Mode: 0640 Flags: 0x0 Generation: 2984990891
User: 1000 Group: 1000 Size: 262152192
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 258032
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x422dc37f -- Tue Mar 8 16:23:43 2005
atime: 0x422dc36e -- Tue Mar 8 16:23:26 2005
mtime: 0x422dc36e -- Tue Mar 8 16:23:26 2005
BLOCKS:
(0):13944376, (2):13944377, (4):13944378, (6):13944379, (8):13944380,
(10):13944381, (IND):13944382, (12):13944383, (14):13944384,
(16):13944392, (18):13944393, (20):13944394, (22):13944395,
(24):13944396, (26):13944397, (28):13944398, (30):13944399,
(32):13944400, (34):13944401, (36):13944402, (38):13944403,
(40):13944404, (42):13944405, (44):13944406, (46):13944407,
(48):13944408, (50):13944409, (52):13944624, (54):13944625,
(56):13944626, (58):13944627, (60):13944628, (62):13944629,
(64):13944630, (66):13944631, (68):13944632, (70):13944633,
(72):13944634, (74):13944635, (76):13944636, (78):13944637,
(80):13944638, (82):13944639, (84):13944640, (86):13944648,
(88):13944649, (90):13944650, (92):13944651, (94):13944652,
(96):13944653, (98):13944654, (100):13944655, (102):13944656,
(104):13944657, (106):13944658, (108):13944659, (110):13944660,
(112):13944661, (114):13944662, (116):13944663, (118):13944664,
(120):13944665, (122):13944666, (124):13944667, (126):13944668,
(128):13944669,
[...]
(63932):13968103, (63933):13978950, (63934):13968104, (63935):13977853,
(63936):13968105, (63937):13978951, (63938):13968106,
(63939):13977854, (63940):13968107, (63941):13979009,
(63942):13968108, (63943):13977855, (63944):13968109,
(63945):13979010, (63946):13968110, (63947):13977856,
(63948):13968111, (63949):13979011, (63950):13968112,
(63951):13977864, (63952):13968113, (63953):13979012,
(63954):13968114, (63955):13977872, (63956):13968115,
(63957):13979013, (63958):13968116, (63959):13977880,
(63960):13968117, (63961):13979014, (63962):13968118,
(63963):13977888, (63964):13968119, (63965):13979015, (63966):13968120,
(63967):13977896, (63968):13968121, (63969):13979073,
(63970):13968122, (63971):13977904, (63972):13968123,
(63973):13979074, (63974):13968124, (63975):13977857,
(63976):13968125, (63977):13979075, (63978):13968126,
(63979):13977858, (63980):13968127, (63981):13979076,
(63982):13968128, (63983):13977859, (63984):13968136,
(63985):13979077, (63986):13968137, (63987):13977860,
(63988):13968138, (63989):13979078, (63990):13968139,
(63991):13977861, (63992):13968140, (63993):13979079,
(63994):13968141, (63995):13977862, (63996):13968142,
(63997):13979137, (63998):13968143, (63999):13977863,
(64000-64001):13943684-13943685

As you can see, the physical block numbers (after the colons) are in
pretty random order.
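
To put a number on "pretty random", one can count how often the physical block number fails to follow its predecessor when walking the file in logical order. The sketch below parses the `(logical):physical` notation shown in these listings; the regular expression and the helper names parse_blocks and count_fragments are my own, and (IND)/(DIND) entries are simply skipped.

```python
import re

# Matches "(12):345" as well as range entries such as "(0-11):100-111".
ENTRY = re.compile(r"\((\d+)(?:-(\d+))?\):(\d+)(?:-(\d+))?")

def parse_blocks(text):
    """Yield (logical, physical) pairs from a debugfs-style block list."""
    for m in ENTRY.finditer(text):
        first, last, physical = int(m.group(1)), m.group(2), int(m.group(3))
        last = int(last) if last else first
        for i in range(last - first + 1):
            yield first + i, physical + i

def count_fragments(pairs):
    """Count runs of physically consecutive blocks, in logical order."""
    fragments = 0
    previous = None
    for _, physical in sorted(pairs):
        if previous is None or physical != previous + 1:
            fragments += 1
        previous = physical
    return fragments

tail = "(63932):13968103, (63933):13978950, (63934):13968104"
print(count_fragments(parse_blocks(tail)))    # 3

ranges = "(0-11):13944385-13944396, (IND):13944397, (12-23):13944398-13944409"
print(count_fragments(parse_blocks(ranges)))  # 2
```

Three blocks in three fragments is the worst case; 24 blocks in two fragments is what a mostly contiguous file looks like.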

Quote:
Does changing OS_VMPAGESIZE to 4KB make a difference on that
system?

I hope I correctly made this change.

Inode: 6963441 Type: regular Mode: 0640 Flags: 0x0 Generation: 2985030886
User: 1000 Group: 1000 Size: 262148096
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 512520
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x422dce4f -- Tue Mar 8 17:09:51 2005
atime: 0x422dd774 -- Tue Mar 8 17:48:52 2005
mtime: 0x422dce4f -- Tue Mar 8 17:09:51 2005
BLOCKS:
(0-11):13944385-13944396, (12-23):13944398-13944409,
(24-69):13944624-13944669, (70):13945011, (71-75):13945015-13945019,
(76-217):13945175-13945316, (218-293):13945352-13945427,
(294-296):13945441-13945443, (297-757):13945467-13945927,
(758-876):13945930-13946048, (877-1035):13946051-13946209,
(IND):13944397, (1036-1056):13946211-13946231,
(1057-1068):13946237-13946248, (1069-1090):13946254-13946275,
(1091-1115):13946280-13946304, (1116-1172):13946312-13946368,
(1173-1229):13946376-13946432, (1230-1271):13946440-13946481,
(1272-1280):13946488-13946496, (1281-1337):13946504-13946560,
(1338-1394):13946568-13946624, (1395-1451):13946632-13946688,
(1452-1508):13946696-13946752, (1509-1553):13946760-13946804,
(1554):13947191, (1555-1563):13947256-13947264,
(1564-1620):13947272-13947328, (1621-1629):13947336-13947344,
(1630-1642):13947349-13947361, (1643-1663):13947372-13947392,
(1664-1682):13947400-13947418, (1683):13947426, (1684):13947434,
(1685):13947438, (1686-1687):13947440-13947441,
[...]
(60428-60431):3190676-3190679, (60432-60438):3190681-3190687,
(60439-60441):3190693-3190695, (60442-60448):3190697-3190703,
(60449-60450):3190710-3190711, (60451-60457):3190713-3190719,
(60458-60463):3191178-3191183, (60464-60470):3191185-3191191,
(60471-60472):3191286-3191287, (60473-60479):3191289-3191295,
(60480-60486):3191617-3191623, (60487-60493):3191625-3191631,
(60494-60500):3191633-3191639, (60501-60505):3191643-3191647,
(60506-60512):3191649-3191655, (60513-60519):3191657-3191663,
(60520):3199666, (60521):3199669, (60522-60730):3199792-3200000,
(60731-60736):3200002-3200007, (60737-60743):3200065-3200071,
(60744-60750):3200129-3200135, (60751):3200192,
(60752-60764):3200194-3200206, (60765-60769):3200211-3200215,
(60770-60776):3200257-3200263, (60777-60783):3200321-3200327,
(60784-60790):3200385-3200391, (60791-60797):3200449-3200455,
(60798-60804):3200513-3200519, (60805-60811):3200577-3200583,
(60812-60817):3200594-3200599, (60818-60824):3200641-3200647,
(60825-61451):3201240-3201866, (IND):3190675,
(61452-61639):3201868-3202055, (61640-61647):3202624-3202631,
(61648-61654):3202689-3202695, (61655-62475):3203184-3204004,
(IND):3201867, (62476-63141):3204006-3204671,
(63142-63499):3204680-3205037, (IND):3204005,
(63500-63581):3205038-3205119, (63582-63999):3205152-3205569,
(64000):13944379, (IND):13944378, (DIND):13944377
TOTAL: 64065

The file system is pretty full (over 70%), so no long runs of
contiguous blocks are available. It's much better than the
original version (with 8K increments). Actually, a non-sparse copy of
the same file looks much the same.

By the way, with recent debugfs versions, you need a patch to print
the actual block numbers in most indirect blocks:

--- e2fsprogs-1.36.orig/debugfs/debugfs.c 2004-12-06 23:45:50.000000000 +0100
+++ e2fsprogs-1.36/debugfs/debugfs.c 2005-03-08 18:00:45.000000000 +0100
@@ -411,7 +411,7 @@
 	lb.first_block = 0;
 	lb.f = f;
 	lb.first = 1;
-	ext2fs_block_iterate2(current_fs, inode, 0, NULL,
+	ext2fs_block_iterate2(current_fs, inode, BLOCK_FLAG_DEPTH_TRAVERSE, NULL,
 			     list_blocks_proc, (void *)&lb);
 	finish_range(&lb);
 	if (lb.total)

bostic@sleepycat.com
Posted: Tue Mar 08, 2005 2:03 pm    Post subject: Re: Preallocate backing file for the Berkeley DB cache

Quote:
However, OS_VMPAGESIZE is set to 8192 unconditionally (see
dbinc/region.h), and DB_REGION_INIT touches pointers in
OS_VMPAGESIZE increments. Many systems have a page size of
4096, so it actually makes things worse because it
practically *guarantees* fragmentation of the underlying file.

Have you actually seen this happen anywhere? If so, on what
operating system/filesystem combination?

Does changing OS_VMPAGESIZE to 4KB make a difference on that
system?

Regards,
--keith


Florian Weimer
Posted: Sun Mar 06, 2005 5:20 pm    Post subject: Re: Preallocate backing file for the Berkeley DB cache

* Philip Guenther:

Quote:
If the process that creates the environment sets the DB_REGION_INIT flag
before the DB_ENV->open() call, then the open will preallocate all the
region files, including the memory pool.

Ah, I missed that one. Thanks.

However, OS_VMPAGESIZE is set to 8192 unconditionally (see
dbinc/region.h), and DB_REGION_INIT touches pointers in OS_VMPAGESIZE
increments. Many systems have a page size of 4096, so it actually
makes things worse because it practically *guarantees* fragmentation
of the underlying file. 8-(

Philip Guenther
Posted: Sun Mar 06, 2005 4:50 pm    Post subject: Re: Preallocate backing file for the Berkeley DB cache

Florian Weimer <fw@deneb.enyo.de> writes:
Quote:
Currently, the file backing the Berkeley DB cache is not preallocated
when it's created. Only a sparse file is created.
....
I think this could be avoided if the backing file is preallocated and
not just created as a sparse file.

If the process that creates the environment sets the DB_REGION_INIT flag
before the DB_ENV->open() call, then the open will preallocate all the
region files, including the memory pool.

(Don't forget that you can make that change via a DB_CONFIG file...)
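
As a rough illustration of the DB_CONFIG route: the DB_CONFIG file sits in the environment home directory and takes one directive per line, and `set_flags DB_REGION_INIT` is the line in question. The Python helper below only adds that line if it is missing; its name and the append-once behaviour are my own choices, and it assumes the line is in place before the environment is created.

```python
import os

def enable_region_init(env_home):
    """Ensure DB_CONFIG in `env_home` contains 'set_flags DB_REGION_INIT'."""
    config_path = os.path.join(env_home, "DB_CONFIG")
    line = "set_flags DB_REGION_INIT"
    content = ""
    if os.path.exists(config_path):
        with open(config_path) as f:
            content = f.read()
    if line not in (existing.strip() for existing in content.splitlines()):
        with open(config_path, "a") as f:
            # Keep the new directive on its own line.
            if content and not content.endswith("\n"):
                f.write("\n")
            f.write(line + "\n")
    return config_path
```

Calling enable_region_init on the environment directory a second time leaves the file unchanged.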


Philip Guenther

Florian Weimer
Posted: Sat Mar 05, 2005 9:01 pm    Post subject: Preallocate backing file for the Berkeley DB cache

Currently, the file backing the Berkeley DB cache is not preallocated
when it's created. Only a sparse file is created. This means that
most file systems create a heavily fragmented backing file over time,
as more and more data is actually written to disk. When the
application that uses Berkeley DB terminates, recent Linux 2.6
versions immediately start to write the backing file out. Because of
its heavy fragmentation, this write operation is rather slow.

I think this could be avoided if the backing file is preallocated and
not just created as a sparse file. (I still have to run a simulation
to check whether this is really the case, though.)
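
For readers who have not met sparse files, here is a small sketch, assuming a POSIX system where st_blocks is reported in 512-byte units. The helper names create_sparse and allocated_bytes are invented for this example. It shows how a file can have a large apparent size while the file system has allocated little or no storage for it.

```python
import os

def create_sparse(path, size):
    """Create a file of `size` bytes without writing any data to it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o640)
    try:
        os.ftruncate(fd, size)  # extends the file; most file systems leave a hole
    finally:
        os.close(fd)

def allocated_bytes(path):
    """Bytes of storage actually backing the file (POSIX st_blocks * 512)."""
    return os.stat(path).st_blocks * 512

create_sparse("sparse-demo.bin", 1 << 20)
print(os.path.getsize("sparse-demo.bin"))   # apparent size: 1048576
print(allocated_bytes("sparse-demo.bin"))   # typically far smaller on a sparse file
os.remove("sparse-demo.bin")
```

The gap between the two numbers is the space the file system still has to find later, block by block, as the cache is written.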