niXforums Forum Index
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   PreferencesPreferences   Log in to check your private messagesLog in to check your private messages   Log inLog in 
·  nixdoc.net ·  man pages ·  Linux HOWTOs ·  FreeBSD Tips ·  Forums
navigation Forum index » Databases » Berkeley DB
real data vs. db file
Post new topic   Reply to topic Page 1 of 1 [12 Posts] View previous topic :: View next topic
Author Message
likun.navipal@gmail.com
*nix forums beginner


Joined: 08 May 2006
Posts: 15

PostPosted: Fri Jun 02, 2006 2:15 pm    Post subject: Re: real data vs. db file Reply with quote

I use subdatabases for such reasons:

The program will save many devices's output data. At first, i want to
create one subdatabase for each device, but the device number will up
to 50,000! So i decide to create one subdatabase for several devices.
If i put all devices' data into one subdatabase, i think it is not
effecient.
Maybe this decision is wrong.

The source code i put above is just a sample :)

In my program, the device's output data is just 4 or 8 bytes long, but
there are caches, each cache is divided into several blocks, the block
is 1024 bytes. Most devices' data will be compressed, that means the
block may be compressed to tens of bytes and then put into Berkeley DB.
So, what i concerned is: data size is 1024 bytes at most, tens of bytes
usually. Because each subdatabase holds several devices' data, the
data put into Berkeley DB is not ordered.

I use db_stat -d -s to find the statistic data for a subdatabase, and i
find that, for no-order data, the fill-factor for data size from 16, 32
to 512, are 45% -- 60%. I try to set page size as 4K and 8K, it only
changes when data size is 1024.
It is a little sad to see that the fill-factors are not very high.

Thanks for your reply.

Philip Guenther 写道:

Quote:
likun.navipal@gmail.com wrote:
please take a look at my source code, 1024*128 items have been
inserted. if there is only one item in the db, why the db file is so large?

Oops, my mistake: you're using subdatabases. So the single 'item'
shown by db_stat -d is the subdatabase in which all your data is
actually held. Unless you have a particular reason for using
subdatabases (bundling of multiple indexes, etc) I would recommend
passing NULL as the 'database' argument to DB->open(). Subdatabases
add a couple pages of overhead of their own, plus the extra complexity.

For now, you can get the stats for the _real_ B-tree using both the -d
option and the -s option. The later should be supplied the name of the
subdatabase, as stored in the 'table_name' variable in your code.
Since you're using a real environment you'll also want to specify the
path to the environment home using the -h option, ala:

db_stat -h db_home -d table_file_name -s table_name


Btw, I hope your code was intended as a sample only. It's obviously
incomplete---several variable declarations are missing---but if you
really did leave out the DB->close() and DBENV->close() calls then
you're going to have all sorts of problems. The concurrent data store
version enabled by the DB_INIT_CDB does _not_ guarantee recoverability
after an unclean close.

Anyway, when I take your code, add the missing bits, run it with the
default data size of 1024, and then do the db_stat above on the result,
I get:

Fri Jun 2 00:09:45 2006 Local time
53162 Btree magic number
9 Btree version number
Little-endian Byte order
multiple-databases Flags
2 Minimum keys per-page
4096 Underlying database page size
3 Number of levels in the tree
131072 Number of unique keys in the tree
131072 Number of data items in the tree
30 Number of tree internal pages
56764 Number of bytes free in tree internal pages (53% ff)
2157 Number of tree leaf pages
4064398 Number of bytes free in tree leaf pages (53% ff)
0 Number of tree duplicate pages
0 Number of bytes free in tree duplicate pages (0% ff)
131072 Number of tree overflow pages
399M Number of bytes free in tree overflow pages (25% ff)
0 Number of empty pages
0 Number of pages on the free list


So, a 53% fill-factor for the tree itself and 25% for the overflow
pages. That makes sense: the low fill-factor for the tree is caused by
the out-of-order insertion, while the low fill-factor for the overflow
pages is a direct result of the 4kB page size with 1kB items. Indeed,
for 4kB pages, data items of 1kB are pretty much pessimal: if they were
smaller they wouldn't be put on overflow pages and they could grow to
almost 4kB in size without using any additional file space.

We can confirm the "out-of-order causes 53% ff" deduction easily enough
by inserting the entries in order by changing the sprintf() format to
"aaaa_key_%06d". The average key length will actually increase with
that, but when we run it again and check db_stat, the relevant lines
now show:

10 Number of tree internal pages
5368 Number of bytes free in tree internal pages (86% ff)
1171 Number of tree leaf pages
47378 Number of bytes free in tree leaf pages (99% ff)

Yep, 1/3 the internal pages and 1/2 the leaf pages compared to the
out-of-order insertion, despite the larger average key size.


Now, what to do about the overflow pages bit. Well, increasing the
page size used would permit all the values to go on the primary pages
where they would be packed together. So, let's increase the page size
from 4kB to 16kB using DB->set_pagesize() and see what happens:

Fri Jun 2 01:15:59 2006 Local time
53162 Btree magic number
9 Btree version number
Little-endian Byte order
multiple-databases Flags
2 Minimum keys per-page
16384 Underlying database page size
3 Number of levels in the tree
131072 Number of unique keys in the tree
131072 Number of data items in the tree
19 Number of tree internal pages
29476 Number of bytes free in tree internal pages (90% ff)
9363 Number of tree leaf pages
15M Number of bytes free in tree leaf pages (90% ff)
0 Number of tree duplicate pages
0 Number of bytes free in tree duplicate pages (0% ff)
0 Number of tree overflow pages
0 Number of bytes free in tree overflow pages (0% ff)
0 Number of empty pages
0 Number of pages on the free list


Poof! No more overflow pages with their low fill-factor and the file
was only 146.64MB compared to the 516.625MB file of the original 4kB
page, out-of-order insertion version.


Philip Guenther
Back to top
guenther@gmail.com
*nix forums beginner


Joined: 29 Jan 2006
Posts: 8

PostPosted: Fri Jun 02, 2006 7:22 am    Post subject: Re: real data vs. db file Reply with quote

likun.navipal@gmail.com wrote:
Quote:
please take a look at my source code, 1024*128 items have been
inserted. if there is only one item in the db, why the db file is so large?

Oops, my mistake: you're using subdatabases. So the single 'item'
shown by db_stat -d is the subdatabase in which all your data is
actually held. Unless you have a particular reason for using
subdatabases (bundling of multiple indexes, etc) I would recommend
passing NULL as the 'database' argument to DB->open(). Subdatabases
add a couple pages of overhead of their own, plus the extra complexity.

For now, you can get the stats for the _real_ B-tree using both the -d
option and the -s option. The later should be supplied the name of the
subdatabase, as stored in the 'table_name' variable in your code.
Since you're using a real environment you'll also want to specify the
path to the environment home using the -h option, ala:

db_stat -h db_home -d table_file_name -s table_name


Btw, I hope your code was intended as a sample only. It's obviously
incomplete---several variable declarations are missing---but if you
really did leave out the DB->close() and DBENV->close() calls then
you're going to have all sorts of problems. The concurrent data store
version enabled by the DB_INIT_CDB does _not_ guarantee recoverability
after an unclean close.

Anyway, when I take your code, add the missing bits, run it with the
default data size of 1024, and then do the db_stat above on the result,
I get:

Fri Jun 2 00:09:45 2006 Local time
53162 Btree magic number
9 Btree version number
Little-endian Byte order
multiple-databases Flags
2 Minimum keys per-page
4096 Underlying database page size
3 Number of levels in the tree
131072 Number of unique keys in the tree
131072 Number of data items in the tree
30 Number of tree internal pages
56764 Number of bytes free in tree internal pages (53% ff)
2157 Number of tree leaf pages
4064398 Number of bytes free in tree leaf pages (53% ff)
0 Number of tree duplicate pages
0 Number of bytes free in tree duplicate pages (0% ff)
131072 Number of tree overflow pages
399M Number of bytes free in tree overflow pages (25% ff)
0 Number of empty pages
0 Number of pages on the free list


So, a 53% fill-factor for the tree itself and 25% for the overflow
pages. That makes sense: the low fill-factor for the tree is caused by
the out-of-order insertion, while the low fill-factor for the overflow
pages is a direct result of the 4kB page size with 1kB items. Indeed,
for 4kB pages, data items of 1kB are pretty much pessimal: if they were
smaller they wouldn't be put on overflow pages and they could grow to
almost 4kB in size without using any additional file space.

We can confirm the "out-of-order causes 53% ff" deduction easily enough
by inserting the entries in order by changing the sprintf() format to
"aaaa_key_%06d". The average key length will actually increase with
that, but when we run it again and check db_stat, the relevant lines
now show:

10 Number of tree internal pages
5368 Number of bytes free in tree internal pages (86% ff)
1171 Number of tree leaf pages
47378 Number of bytes free in tree leaf pages (99% ff)

Yep, 1/3 the internal pages and 1/2 the leaf pages compared to the
out-of-order insertion, despite the larger average key size.


Now, what to do about the overflow pages bit. Well, increasing the
page size used would permit all the values to go on the primary pages
where they would be packed together. So, let's increase the page size
from 4kB to 16kB using DB->set_pagesize() and see what happens:

Fri Jun 2 01:15:59 2006 Local time
53162 Btree magic number
9 Btree version number
Little-endian Byte order
multiple-databases Flags
2 Minimum keys per-page
16384 Underlying database page size
3 Number of levels in the tree
131072 Number of unique keys in the tree
131072 Number of data items in the tree
19 Number of tree internal pages
29476 Number of bytes free in tree internal pages (90% ff)
9363 Number of tree leaf pages
15M Number of bytes free in tree leaf pages (90% ff)
0 Number of tree duplicate pages
0 Number of bytes free in tree duplicate pages (0% ff)
0 Number of tree overflow pages
0 Number of bytes free in tree overflow pages (0% ff)
0 Number of empty pages
0 Number of pages on the free list


Poof! No more overflow pages with their low fill-factor and the file
was only 146.64MB compared to the 516.625MB file of the original 4kB
page, out-of-order insertion version.


Philip Guenther
Back to top
likun.navipal@gmail.com
*nix forums beginner


Joined: 08 May 2006
Posts: 15

PostPosted: Fri Jun 02, 2006 3:27 am    Post subject: Re: real data vs. db file Reply with quote

please take a look at my source code, 1024*128 items have been
inserted.

if there is only one item in the db, why the db file is so large?

Philip Guenther 写道:

Quote:
likun.navipal@gmail.com wrote:
db_stat -d file-1024.db (data size is 1024), the result is :
...
4096 Underlying database page size
1 Number of levels in the tree
1 Number of unique keys in the tree
1 Number of data items in the tree

Umm, you only have a single item in the table; of course it'll be
inefficient space-wise at that point: you don't even have enough data
to fill a single page! You should build a database of a size that
would/will be 'typical' and snag the stats on it before you start
worrying about fill-factors or overhead.


Philip Guenther
Back to top
guenther@gmail.com
*nix forums beginner


Joined: 29 Jan 2006
Posts: 8

PostPosted: Fri Jun 02, 2006 2:56 am    Post subject: Re: real data vs. db file Reply with quote

likun.navipal@gmail.com wrote:
Quote:
db_stat -d file-1024.db (data size is 1024), the result is :
....
4096 Underlying database page size
1 Number of levels in the tree
1 Number of unique keys in the tree
1 Number of data items in the tree

Umm, you only have a single item in the table; of course it'll be
inefficient space-wise at that point: you don't even have enough data
to fill a single page! You should build a database of a size that
would/will be 'typical' and snag the stats on it before you start
worrying about fill-factors or overhead.


Philip Guenther
Back to top
likun.navipal@gmail.com
*nix forums beginner


Joined: 08 May 2006
Posts: 15

PostPosted: Fri Jun 02, 2006 1:16 am    Post subject: Re: real data vs. db file Reply with quote

ok

db_stat -d file-1024.db (data size is 1024), the result is :

Fri Jun 2 09:12:11 2006 Local time
53162 Btree magic number
9 Btree version number
Little-endian Byte order
multiple-databases Flags
2 Minimum keys per-page
4096 Underlying database page size
1 Number of levels in the tree
1 Number of unique keys in the tree
1 Number of data items in the tree
0 Number of tree internal pages
0 Number of bytes free in tree internal pages (0% ff)
1 Number of tree leaf pages
4046 Number of bytes free in tree leaf pages (1% ff)
0 Number of tree duplicate pages
0 Number of bytes free in tree duplicate pages (0% ff)
0 Number of tree overflow pages
0 Number of bytes free in tree overflow pages (0% ff)
0 Number of empty pages
0 Number of pages on the free list

and this is for the db_stat -d file-512.db (data size is 512) :

Fri Jun 2 09:19:06 2006 Local time
53162 Btree magic number
9 Btree version number
Little-endian Byte order
multiple-databases Flags
2 Minimum keys per-page
4096 Underlying database page size
1 Number of levels in the tree
1 Number of unique keys in the tree
1 Number of data items in the tree
0 Number of tree internal pages
0 Number of bytes free in tree internal pages (0% ff)
1 Number of tree leaf pages
4046 Number of bytes free in tree leaf pages (1% ff)
0 Number of tree duplicate pages
0 Number of bytes free in tree duplicate pages (0% ff)
0 Number of tree overflow pages
0 Number of bytes free in tree overflow pages (0% ff)
0 Number of empty pages
0 Number of pages on the free list

Philip Guenther 写道:

Quote:
likun.navipal@gmail.com wrote:
I really used db_stat to try to find something, what i want to know is
why the db file/real data is from 1.3 to 4. The db_stat for the db
which db file/real data is 4 will put at the end.

Could you run that again, this time with the -d option and the name of
the .db file instead of the -e option? The -e option dumps the
environment statistics, while the -d options dumps the table statistics
for the given table.

Philip Guenther
Back to top
guenther@gmail.com
*nix forums beginner


Joined: 29 Jan 2006
Posts: 8

PostPosted: Thu Jun 01, 2006 10:01 pm    Post subject: Re: real data vs. db file Reply with quote

likun.navipal@gmail.com wrote:
Quote:
I really used db_stat to try to find something, what i want to know is
why the db file/real data is from 1.3 to 4. The db_stat for the db
which db file/real data is 4 will put at the end.

Could you run that again, this time with the -d option and the name of
the .db file instead of the -e option? The -e option dumps the
environment statistics, while the -d options dumps the table statistics
for the given table.

Philip Guenther
Back to top
likun.navipal@gmail.com
*nix forums beginner


Joined: 08 May 2006
Posts: 15

PostPosted: Thu Jun 01, 2006 2:36 pm    Post subject: Re: real data vs. db file Reply with quote

thans for your opinion.

I really used db_stat to try to find something, what i want to know is
why the db file/real data is from 1.3 to 4. The db_stat for the db
which db file/real data is 4 will put at the end.

I use berkeley db to store devices' historical data, which has some
characters:
1. every device will issuing data out continuously every 1 or several
seconds, and the device number may be large
2. several devices' data will be put into one berkeley db's database
3. normal access: write the data; query a device's data by time range;
nearly never change the data stored in berkeley db

The insert and query speed are met my requirement, but the space is not
now.

db_stat -e result for the database which has a data size of 1024

Thu Jun 1 22:38:39 2006 Local time
0x120897 Magic number
0 Panic value
4.4.16 Environment version
Thu Jun 1 16:20:26 2006 Creation time
0xd32ead0e Environment ID
1 Primary region allocation and reference count mutex [0/2 0% !Own]
1 References
Thread status blocks:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-272KB Mutex region size
0 The number of region locks that required waiting (0%)
4 Mutex alignment
1 Mutex test-and-set spins
2352 Mutex total count
2242 Mutex free count
110 Mutex in-use count
115 Mutex maximum in-use count
Mutex counts
2352 Unallocated
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-4 Last allocated locker ID
0x7fffffff Current maximum unused locker ID
5 Number of lock modes
1000 Maximum number of locks possible
1000 Maximum number of lockers possible
1000 Maximum number of lock objects possible
0 Number of current locks
3 Maximum number of locks at any one time
0 Number of current lockers
3 Maximum number of lockers at any one time
0 Number of current lock objects
3 Maximum number of lock objects at any one time
131082 Total number of locks requested
131082 Total number of locks released
1 Total number of locks upgraded
3 Total number of locks downgraded
0 Lock requests not available due to conflicts, for which we waited
0 Lock requests not available due to conflicts, for which we did not
wait
0 Number of deadlocks
0 Lock timeout value
0 Number of locks that have timed out
0 Transaction timeout value
0 Number of transactions that have timed out
344KB The size of the lock region
0 The number of region locks that required waiting (0%)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-257KB 868B Total cache size
1 Number of caches
264KB Pool individual cache size
0 Maximum memory-mapped file size
0 Maximum open file descriptors
0 Maximum sequential buffer writes
0 Sleep after writing maximum sequential buffers
0 Requested pages mapped into the process' address space
518367 Requested pages found in the cache (99%)
2 Requested pages not found in the cache
132395 Pages created in the cache
2 Pages read into the cache
132399 Pages written from the cache to the backing file
2 Clean pages forced from the cache
132330 Dirty pages forced from the cache
0 Dirty pages written by trickle-sync thread
65 Current total page count
65 Current clean page count
0 Current dirty page count
37 Number of hash buckets used for page location
650766 Total number of times hash chains searched for a page
5 The longest hash chain searched for a page
1439523 Total number of hash chain entries checked for page
0 The number of hash bucket locks that required waiting (0%)
0 The maximum number of times any hash bucket lock was waited for
0 The number of region locks that required waiting (0%)
132402 The number of page allocations
265370 The number of hash buckets examined during allocations
4 The maximum number of hash buckets examined for an allocation
132332 The number of pages examined during allocations
1 The max number of pages examined for an allocation
Pool File: aa_table_file.db
4096 Page size
0 Requested pages mapped into the process' address space
518367 Requested pages found in the cache (99%)
2 Requested pages not found in the cache
132395 Pages created in the cache
2 Pages read into the cache
132399 Pages written from the cache to the backing file


Philip Guenther 写道:

Quote:
likun.navipal@gmail.com wrote:
1. althouth it has some internal data structures in berkeley db, the db
file/real data be high to 4 is hard to understand.

What about it is hard to understand? Did you actually use db_stat and
do the math on what it returned to see how much overhead was caused by
each factor I mentioned?


2. you mentioned about "fill-factor", which means the precentage of
total data space that's actually holding information. can i control the
"fill-factor", e.g., set to 0.8?

No. In B-trees the fill-factor is a consequence of the sizes of the
keys and values and the insertion order and not something you can
configure.


3. you said that, If a key or value would take up more than a quarter
of the space on a
page, then it'll get placed on an 'overflow' page instead. why is a
quarter? is it a waste? how can i change it?

It's a quarter because the default for the bt_minkeys parameter is 2
and a key/value pair takes 2 slots. For information on how to change
that parameter and its effects, please read section 2.5.3 of the
reference guide.

As for whether that's a waste, that depends on what you're trying to
save. Space? Speed of searches? Speed of inserts and deletes?
Programming complexity? Since you haven't described the limits that
your application needs to run under nor your reasons for using Berkeley
DB to begin with, it's impossible to say whether the default
key-per-page value or even the factor of four growth is a good tradeoff
or a poor one. If the size of the data on disk is the overriding
concern than you should consider alternative storage formats.
Depending on your tradeoffs, you may be better off completely tossing
BDB and hand-coding the storage layer of your application. You're the
only one who knows what tradeoffs you're willing to make.


Philip Guenther
Back to top
guenther@gmail.com
*nix forums beginner


Joined: 29 Jan 2006
Posts: 8

PostPosted: Wed May 31, 2006 10:42 pm    Post subject: Re: real data vs. db file Reply with quote

likun.navipal@gmail.com wrote:
Quote:
1. althouth it has some internal data structures in berkeley db, the db
file/real data be high to 4 is hard to understand.

What about it is hard to understand? Did you actually use db_stat and
do the math on what it returned to see how much overhead was caused by
each factor I mentioned?


Quote:
2. you mentioned about "fill-factor", which means the precentage of
total data space that's actually holding information. can i control the
"fill-factor", e.g., set to 0.8?

No. In B-trees the fill-factor is a consequence of the sizes of the
keys and values and the insertion order and not something you can
configure.


Quote:
3. you said that, If a key or value would take up more than a quarter
of the space on a
page, then it'll get placed on an 'overflow' page instead. why is a
quarter? is it a waste? how can i change it?

It's a quarter because the default for the bt_minkeys parameter is 2
and a key/value pair takes 2 slots. For information on how to change
that parameter and its effects, please read section 2.5.3 of the
reference guide.

As for whether that's a waste, that depends on what you're trying to
save. Space? Speed of searches? Speed of inserts and deletes?
Programming complexity? Since you haven't described the limits that
your application needs to run under nor your reasons for using Berkeley
DB to begin with, it's impossible to say whether the default
key-per-page value or even the factor of four growth is a good tradeoff
or a poor one. If the size of the data on disk is the overriding
concern than you should consider alternative storage formats.
Depending on your tradeoffs, you may be better off completely tossing
BDB and hand-coding the storage layer of your application. You're the
only one who knows what tradeoffs you're willing to make.


Philip Guenther
Back to top
likun.navipal@gmail.com
*nix forums beginner


Joined: 08 May 2006
Posts: 15

PostPosted: Wed May 31, 2006 7:43 am    Post subject: Re: real data vs. db file Reply with quote

but i still have some problems:

1. althouth it has some internal data structures in berkeley db, the db
file/real data be high to 4 is hard to understand.

2. you mentioned about "fill-factor", which means the precentage of
total data space that's actually holding information. can i control the
"fill-factor", e.g., set to 0.8?

3. you said that, If a key or value would take up more than a quarter
of the space on a
page, then it'll get placed on an 'overflow' page instead. why is a
quarter? is it a waste? how can i change it?

many thanks to you!

guenther@gmail.com 写道:

Quote:
likun.navipal@gmail.com wrote:
...
for (int i=0; i<1024*128; i++)
{
sprintf(sz_key, "aaaa_key_%d", i);
key.data = sz_key;
key.size = strlen(sz_key) + 1;
...
Let's calculate the real data, average key size is 12, so real data is
(12+x)*1024*128 Bytes :

Actually, the average key size is a bit over 15 bytes; of the 131072
keys, 90,000 are 15 bytes in length.


data size db files real data db files/real data
16 12M 3.5M 3.43
32 16M 5.5M 2.91
64 24M 9.5M 2.53
128 40M 17.5M 2.29
256 70M 33.5M 2.90
512 120M 65.5M 1.83
1024 522M 129.5M 4.03
2048 522M 257.5M 2.03
4096 1.1G 513.5M 2.14
...
The questiongs are: first, why db files/real data are so large?

The DB format adds overhead in a number of places:
- space is needed to track the size of each key and value and the
'type' of each entry, and possibly a byte or two to align the key
and/or value (6 to 8 bytes per entry)
- B-trees are page based, so there will be some "internal
fragmentation" that occurs whenever a new page has to
allocated because the previous page didn't have enough free-space
for the next key/value pair. This is often expressed as the
'fill-factor', the precentage of total data space that's actually
holding information
- each page has a header that tracks various things, such as the
amount of free space on the page, the page-numbers of the pages
before and after it in the tree, and the LSNs for transaction
recovery (26 bytes)
- the B-tree has internal pages that track the locations of the pages
below them
- the first page of the B-tree holds metadata for the entire tree (1
page)


Also, your code inserts the keys out of sorted order (the key
"aaaa_key_13" sorts _before_ "aaaa_key_4") which will result more
internal fragmentation (a lower fill-factor) than if you inserted them
in sorted order.


second, look at the line when data size is 512 and 1024, the smallest and
biggest db files/real data. why?

If a key or value would take up more than a quarter of the space on a
page, then it'll get placed on an 'overflow' page instead and a
reference to that page goes on the normal B-tree page.


Much of the above is described or discussed in the DB Reference Guide
in the DB source tree and at
http://www.sleepycat.com/docs/ref/toc.html

The db_stat utility can be used to get many statistics about a DB file,
such as the fill-factor and the number of overflow pages.


Philip Guenther
Back to top
likun.navipal@gmail.com
*nix forums beginner


Joined: 08 May 2006
Posts: 15

PostPosted: Tue May 30, 2006 8:30 am    Post subject: Re: real data vs. db file Reply with quote

thank you very much!

guenther@gmail.com 写道:

Quote:
likun.navipal@gmail.com wrote:
...
for (int i=0; i<1024*128; i++)
{
sprintf(sz_key, "aaaa_key_%d", i);
key.data = sz_key;
key.size = strlen(sz_key) + 1;
...
Let's calculate the real data, average key size is 12, so real data is
(12+x)*1024*128 Bytes :

Actually, the average key size is a bit over 15 bytes; of the 131072
keys, 90,000 are 15 bytes in length.


data size db files real data db files/real data
16 12M 3.5M 3.43
32 16M 5.5M 2.91
64 24M 9.5M 2.53
128 40M 17.5M 2.29
256 70M 33.5M 2.90
512 120M 65.5M 1.83
1024 522M 129.5M 4.03
2048 522M 257.5M 2.03
4096 1.1G 513.5M 2.14
...
The questiongs are: first, why db files/real data are so large?

The DB format adds overhead in a number of places:
- space is needed to track the size of each key and value and the
'type' of each entry, and possibly a byte or two to align the key
and/or value (6 to 8 bytes per entry)
- B-trees are page based, so there will be some "internal
fragmentation" that occurs whenever a new page has to
allocated because the previous page didn't have enough free-space
for the next key/value pair. This is often expressed as the
'fill-factor', the precentage of total data space that's actually
holding information
- each page has a header that tracks various things, such as the
amount of free space on the page, the page-numbers of the pages
before and after it in the tree, and the LSNs for transaction
recovery (26 bytes)
- the B-tree has internal pages that track the locations of the pages
below them
- the first page of the B-tree holds metadata for the entire tree (1
page)


Also, your code inserts the keys out of sorted order (the key
"aaaa_key_13" sorts _before_ "aaaa_key_4") which will result more
internal fragmentation (a lower fill-factor) than if you inserted them
in sorted order.


second, look at the line when data size is 512 and 1024, the smallest and
biggest db files/real data. why?

If a key or value would take up more than a quarter of the space on a
page, then it'll get placed on an 'overflow' page instead and a
reference to that page goes on the normal B-tree page.


Much of the above is described or discussed in the DB Reference Guide
in the DB source tree and at
http://www.sleepycat.com/docs/ref/toc.html

The db_stat utility can be used to get many statistics about a DB file,
such as the fill-factor and the number of overflow pages.


Philip Guenther
Back to top
guenther@gmail.com
*nix forums beginner


Joined: 29 Jan 2006
Posts: 8

PostPosted: Tue May 30, 2006 3:26 am    Post subject: Re: real data vs. db file Reply with quote

likun.navipal@gmail.com wrote:
....
Quote:
for (int i=0; i<1024*128; i++)
{
sprintf(sz_key, "aaaa_key_%d", i);
key.data = sz_key;
key.size = strlen(sz_key) + 1;
....
Let's calculate the real data, average key size is 12, so real data is
(12+x)*1024*128 Bytes :

Actually, the average key size is a bit over 15 bytes; of the 131072
keys, 90,000 are 15 bytes in length.


Quote:
data size db files real data db files/real data
16 12M 3.5M 3.43
32 16M 5.5M 2.91
64 24M 9.5M 2.53
128 40M 17.5M 2.29
256 70M 33.5M 2.90
512 120M 65.5M 1.83
1024 522M 129.5M 4.03
2048 522M 257.5M 2.03
4096 1.1G 513.5M 2.14
....
The questiongs are: first, why db files/real data are so large?

The DB format adds overhead in a number of places:
- space is needed to track the size of each key and value and the
'type' of each entry, and possibly a byte or two to align the key
and/or value (6 to 8 bytes per entry)
- B-trees are page based, so there will be some "internal
fragmentation" that occurs whenever a new page has to
allocated because the previous page didn't have enough free-space
for the next key/value pair. This is often expressed as the
'fill-factor', the precentage of total data space that's actually
holding information
- each page has a header that tracks various things, such as the
amount of free space on the page, the page-numbers of the pages
before and after it in the tree, and the LSNs for transaction
recovery (26 bytes)
- the B-tree has internal pages that track the locations of the pages
below them
- the first page of the B-tree holds metadata for the entire tree (1
page)


Also, your code inserts the keys out of sorted order (the key
"aaaa_key_13" sorts _before_ "aaaa_key_4") which will result more
internal fragmentation (a lower fill-factor) than if you inserted them
in sorted order.


Quote:
second, look at the line when data size is 512 and 1024, the smallest and
biggest db files/real data. why?

If a key or value would take up more than a quarter of the space on a
page, then it'll get placed on an 'overflow' page instead and a
reference to that page goes on the normal B-tree page.


Much of the above is described or discussed in the DB Reference Guide
in the DB source tree and at
http://www.sleepycat.com/docs/ref/toc.html

The db_stat utility can be used to get many statistics about a DB file,
such as the fill-factor and the number of overflow pages.


Philip Guenther
Back to top
likun.navipal@gmail.com
*nix forums beginner


Joined: 08 May 2006
Posts: 15

PostPosted: Tue May 30, 2006 1:40 am    Post subject: real data vs. db file Reply with quote

I write a program that creates one database, and put lots of data into
it, then i calculate the real data and the pysical db files, i find
that the db files are so large.

the follows is the code:

int main(int argc, char **argv)
int rc;
DB_ENV* dbenv;
rc = db_env_create(&dbenv, 0);
dbenv->set_errfile(dbenv, stderr);

DBT key, data;
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));

char sz_key[64];
char *sz_data;
int size;

unsigned int flags = DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL |
DB_THREAD;
int mode = 0;
rc = dbenv->open(dbenv, db_home, flags, mode);
if (0 == rc)
{
DB* pdb;
rc = db_create(&pdb, dbenv, 0);
flags = DB_CREATE | DB_EXCL | DB_THREAD;
mode = 0;
rc = pdb->open(pdb, NULL,
table_file_name, table_name,
DB_BTREE, flags, mode);

if (argc > 1)
{
size = atoi(argv[1]); // size is 16, 32, 64, ... , 1024,
2048, 4096
}
else
{
size = 1024;
}

sz_data = (char*)malloc(size);
memset(sz_data, 'k', size);
data.data = sz_data;
data.size = size;

for (int i=0; i<1024*128; i++)
{
sprintf(sz_key, "aaaa_key_%d", i);
key.data = sz_key;
key.size = strlen(sz_key) + 1;

rc = pdb->put(pdb, NULL, &key, &data, 0);
}

free(sz_data);
}

return 0;
}

for each data size from 16 to 4096, db files are too large. Let's
calculate the real data, average key size is 12, so real data is
(12+x)*1024*128 Bytes :
data size db files real data db files/real data
16 12M 3.5M 3.43
32 16M 5.5M 2.91
64 24M 9.5M 2.53
128 40M 17.5M 2.29
256 70M 33.5M 2.90
512 120M 65.5M 1.83
1024 522M 129.5M 4.03
2048 522M 257.5M 2.03
4096 1.1G 513.5M 2.14

All configurations are default, and the OS's page size is 4 KB.

The questiongs are: first, why db files/real data are so large? second,
look at the line when data size is 512 and 1024, the smallest and
biggest db files/real data. why?
Back to top
Google

Back to top
Display posts from previous:   
Post new topic   Reply to topic Page 1 of 1 [12 Posts] View previous topic :: View next topic
The time now is Fri Jan 09, 2009 4:39 am | All times are GMT
navigation Forum index » Databases » Berkeley DB
Jump to:  

Similar Topics
Topic Author Forum Replies Last Post
No new posts Running php file everyday on scheduled time sachin PHP 1 Fri Jul 21, 2006 12:49 pm
No new posts Bug#379103: ITP: complearn-gui -- 3D drag-and-drop interf... Rudi Cilibrasi devel 0 Fri Jul 21, 2006 11:00 am
No new posts Regarding thesaurus iso file Srikanth modules 0 Fri Jul 21, 2006 10:42 am
No new posts how can i get a file descriptor not used? mars system 0 Fri Jul 21, 2006 7:41 am
No new posts Bug#379087: ITP: libcomplearn -- data-compression based i... Rudi Cilibrasi devel 0 Fri Jul 21, 2006 7:40 am

Credit Cards | Bankruptcy | Free File Hosting | Actress | MPAA
Copyright © 2004-2005 DeniX Solutions SRL
 
Other DeniX Solutions sites: Unix/Linux blog |  electronics forum |  medicine forum |  science forum | 
Privacy Policy


Powered by phpBB © 2001, 2005 phpBB Group
[ Time: 0.3531s ][ Queries: 20 (0.1717s) ][ GZIP on - Debug on ]