Performance and Consistency ??
likun.navipal@gmail.com
*nix forums beginner


Joined: 08 May 2006
Posts: 15

Posted: Fri Jul 21, 2006 4:24 am    Post subject: Performance and Consistency ??

I use Berkeley DB in an application to store a large volume of data. The
data arrives very frequently, nearly 50 thousand items per second, so I
don't want to use transactions. But without transactions, the db files
would be corrupted in a disaster such as a power failure. Losing the
data still in memory is acceptable, but the db as a whole must remain
usable for subsequent access.

So I ask all of you for help:

If I use transactions, how do I configure them for high performance?
If not, how do I recover from a disaster?
Michael Cahill
*nix forums Guru Wannabe


Joined: 26 May 2005
Posts: 219

Posted: Fri Jul 21, 2006 5:23 am    Post subject: Re: Performance and Consistency ??

Hi,

Quote:
I use Berkeley DB in an application to store a large volume of data. The
data arrives very frequently, nearly 50 thousand items per second,

What are these events coming in at 50,000 / second? Is this a
sustained or peak rate?

Quote:
so I don't want to use transactions. But without transactions, the db
files would be corrupted in a disaster such as a power failure. Losing
the data still in memory is acceptable, but the db as a whole must
remain usable for subsequent access.

I think you'll want transactions, but with the DB_TXN_WRITE_NOSYNC
flag, so that the transaction doesn't have to wait for disk I/O at
commit time. With that flag, you are guaranteed to be able to recover
databases to some consistent point in time, but some of the most recent
updates before a crash may be lost.

That said, if you need to achieve a sustained 50,000 updates per
second, you will need to think carefully about structuring your data
for locality and eliminating contention (are these updates
single-threaded?). Any I/O or lock contention would make it difficult
or impossible to maintain that sort of throughput.

Regards,
Michael.
likun.navipal@gmail.com
Posted: Fri Jul 21, 2006 5:45 am    Post subject: Re: Performance and Consistency ??

50,000 / second is a sustained rate.

And the data input may last for months or even years :( If I use
transactions and logging, can I control the log files' size?

What about db_dump? Can I use it to recover data after a disaster
without transactions?


Michael Cahill
Posted: Fri Jul 21, 2006 7:57 am    Post subject: Re: Performance and Consistency ??

Quote:
50,000 / second is a sustained rate.

And the data input may last for months or even years :( If I use
transactions and logging, can I control the log files' size?

Sure, you can control the size of individual log files (with
DB_ENV->set_lg_max), and you can control how many log files Berkeley DB
needs to keep by varying the rate of your checkpoints.
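
To get a feel for the log volume involved, here is a rough back-of-the-envelope sketch in Python. The 100 bytes of log per update and the 10 MB log file size are purely illustrative assumptions, not figures from this thread; substitute your own measurements.

```python
def log_files_per_hour(updates_per_sec, bytes_per_log_record, lg_max_bytes):
    """Estimate how many log files of size lg_max_bytes fill up in one hour."""
    bytes_per_hour = updates_per_sec * bytes_per_log_record * 3600
    # Ceiling division: a partially filled file still counts as a file.
    return -(-bytes_per_hour // lg_max_bytes)


# Assumed, illustrative values: 100 bytes of log per update, 10 MB log files.
print(log_files_per_hour(50_000, 100, 10 * 1024 * 1024))
```

Under those assumptions the log grows by about 18 GB per hour, which is why the checkpoint rate (and with it the number of log files that still have to be kept) matters for a feed that runs for months.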

There is no inherent reason why you can't execute 50,000 transactions /
second, but you may also want to consider grouping multiple updates
into a single transaction to reduce some overhead (in addition to the
DB_TXN_WRITE_NOSYNC flag).
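
The grouping idea can be sketched independently of any particular binding. In the Python sketch below, `begin`, `put` and `commit` are hypothetical callbacks standing in for the real transaction calls, and the dict is only an in-memory stand-in for the database; the point is simply that 2,500 updates cost 3 commits instead of 2,500.

```python
from itertools import islice


def batches(items, batch_size):
    """Yield successive lists of up to batch_size items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def store_all(items, batch_size, begin, put, commit):
    """Write (key, value) pairs using one transaction per batch.

    begin/put/commit are hypothetical callbacks, not a real database API.
    Returns the number of commits performed.
    """
    commits = 0
    for batch in batches(items, batch_size):
        txn = begin()
        for key, value in batch:
            put(txn, key, value)
        commit(txn)
        commits += 1
    return commits


# In-memory stand-in: a "transaction" is a list of pending writes.
store = {}
n = store_all(((i, i * i) for i in range(2500)), 1000,
              begin=lambda: [],
              put=lambda txn, k, v: txn.append((k, v)),
              commit=lambda txn: store.update(txn))
print(n, len(store))
```

The batch size is a trade-off: larger batches amortize more per-commit overhead, but more updates are lost together if a batch does not commit before a crash.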

Another issue you will need to consider is the appropriate access
method. If you use keys that are allocated sequentially, you should be
able to get good cache locality with a btree, recno or queue database.
Only queue will perform well if there are concurrent updates to the
head or tail, though. I'm going to assume that these are
single-threaded inserts.

What do your queries look like? Can you partition the data based on
when it arrives? That is, can you have a separate table for each hour,
for example? Otherwise, another issue you will face is that as the
database gets bigger and bigger, you will need to walk down more levels
to get to the leaf nodes (assuming we are talking about a btree or
recno database). That will also have an impact on performance.
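
As a minimal sketch of the per-hour partitioning idea, the Python below maps a timestamp to a table name and lists the tables a time-span query would have to visit. The `events` prefix, the file naming scheme and the use of UTC Unix timestamps are assumptions made for illustration.

```python
from datetime import datetime, timezone


def hourly_table_name(unix_ts, prefix="events"):
    """Map a Unix timestamp to the name of the per-hour table it belongs in."""
    t = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return f"{prefix}-{t:%Y%m%d-%H}.db"


def tables_for_span(start_ts, end_ts, prefix="events"):
    """List the per-hour tables that overlap the span [start_ts, end_ts]."""
    first_hour = int(start_ts) // 3600
    last_hour = int(end_ts) // 3600
    return [hourly_table_name(h * 3600, prefix)
            for h in range(first_hour, last_hour + 1)]


print(hourly_table_name(3599))
print(tables_for_span(3500, 7300))
```

Each table then stays small enough that the tree depth is bounded, and old hours can be archived or dropped as whole files.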

Quote:
What about db_dump? Can I use it to recover data after a disaster
without transactions?

You may be able to salvage some data this way, but there are no
guarantees unless you use transactions.

Michael.
likun.navipal@gmail.com
Posted: Fri Jul 21, 2006 8:50 am    Post subject: Re: Performance and Consistency ??

Thanks for your help.

Michael Cahill wrote:
Quote:
50,000 / second is a sustained rate.

And the data input may last for months or even years :( If I use
transactions and logging, can I control the log files' size?

Sure, you can control the size of individual log files (with
DB_ENV->set_lg_max), and you can control how many log files Berkeley DB
needs to keep by varying the rate of your checkpoints.

There is no inherent reason why you can't execute 50,000 transactions /
second, but you may also want to consider grouping multiple updates
into a single transaction to reduce some overhead (in addition to the
DB_TXN_WRITE_NOSYNC flag).

Yes, this is a good idea. I should group multiple updates together
before putting them into the db.

Quote:
Another issue you will need to consider is the appropriate access
method. If you use keys that are allocated sequentially, you should be
able to get good cache locality with a btree, recno or queue database.
Only queue will perform well if there are concurrent updates to the
head or tail, though. I'm going to assume that these are
single-threaded inserts.

The app collects data from a large number of sources. Each source
emits data every one or two seconds. The key is composed of the source
id and a timestamp.

Quote:
What do your queries look like? Can you partition the data based on
when it arrives? That is, can you have a separate table for each hour,
for example? Otherwise, another issue you will face is that as the
database gets bigger and bigger, you will need to walk down more levels
to get to the leaf nodes (assuming we are talking about a btree or
recno database). That will also have an impact on performance.

Usually, a client will ask the app to retrieve one source's historical
data over a time span.
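
One way a (source id, timestamp) key could be laid out for that kind of query is sketched below in Python. The field widths (32-bit source id, 64-bit timestamp) are assumptions, and the sorted list is only a stand-in for a btree that compares keys byte by byte, which is Berkeley DB's default btree ordering. Packing both fields big-endian makes the byte order match (source, time) order, so one source's time span is a contiguous key range.

```python
import struct
from bisect import bisect_left, bisect_right

KEY_FORMAT = ">IQ"  # assumed layout: big-endian uint32 source id, uint64 timestamp


def make_key(source_id, unix_ts):
    """Pack (source id, timestamp) so byte-wise order equals (source, time) order."""
    return struct.pack(KEY_FORMAT, source_id, unix_ts)


def span_bounds(source_id, start_ts, end_ts):
    """Return the lowest and highest keys for one source within a time span."""
    return make_key(source_id, start_ts), make_key(source_id, end_ts)


# A sorted list of keys stands in for the btree's key order.
keys = sorted(make_key(s, t) for s in (1, 2, 300) for t in (10, 20, 30, 70000))
lo, hi = span_bounds(2, 15, 30)
hits = keys[bisect_left(keys, lo):bisect_right(keys, hi)]
print([struct.unpack(KEY_FORMAT, k) for k in hits])
```

Note the trade-off with the earlier point about sequentially allocated keys: with the source id first, reads for one source are contiguous, but concurrent inserts from many sources land all over the tree rather than at its tail.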

Quote:
What about db_dump? Can I use it to recover data after a disaster
without transactions?

You may be able to salvage some data this way, but there are no
guarantees unless you use transactions.

No guarantee? Does that mean the data in the db files may be lost entirely?

