| Author |
Message |
likun.navipal@gmail.com *nix forums beginner
Joined: 08 May 2006
Posts: 15
|
Posted: Fri Jul 21, 2006 4:24 am Post subject:
Performance and Consistency ??
|
|
|
I use Berkeley DB in an application to store a large volume of data. The data
arrives very frequently, at nearly 50 thousand items per second, so I
don't want to use transactions. But without transactions, the
db files would be corrupted in a disaster such as a power failure. Losing
data that is still in memory is acceptable, but the db as a whole must remain
intact for subsequent access.
So I ask all of you for help:
If I use transactions, how do I configure them for high performance?
If not, how do I recover from a disaster? |
|
Michael Cahill *nix forums Guru Wannabe
Joined: 26 May 2005
Posts: 219
|
Posted: Fri Jul 21, 2006 5:23 am Post subject:
Re: Performance and Consistency ??
|
|
|
Hi,
| Quote: | I use Berkeley DB in an application to store a large volume of data. The data
arrives very frequently, at nearly 50 thousand items per second.
|
What are these events coming in at 50,000 / second? Is this a
sustained or peak rate?
| Quote: | So I
don't want to use transactions. But without transactions, the
db files would be corrupted in a disaster such as a power failure. Losing
data that is still in memory is acceptable, but the db as a whole must remain
intact for subsequent access.
|
I think you'll want transactions, but with the DB_TXN_WRITE_NOSYNC
flag, so that the transaction doesn't have to wait for disk I/O at
commit time. With that flag, you are guaranteed to be able to recover
databases to some consistent point in time, but some of the most recent
updates before a crash may be lost.
That said, if you need to achieve a sustained 50,000 updates per
second, you will need to think carefully about structuring your data
for locality and eliminating contention (are these updates
single-threaded?). Any I/O or lock contention would make it difficult
or impossible to maintain that sort of throughput.
Regards,
Michael. |
|
likun.navipal@gmail.com *nix forums beginner
Joined: 08 May 2006
Posts: 15
|
Posted: Fri Jul 21, 2006 5:45 am Post subject:
Re: Performance and Consistency ??
|
|
|
50,000 / second is a sustained rate.
The data input may last for months or even years. If I use
transactions and logging, can I control the log files' size?
What about db_dump? Can I use it to recover data after a disaster
without transactions?
|
|
Michael Cahill *nix forums Guru Wannabe
Joined: 26 May 2005
Posts: 219
|
Posted: Fri Jul 21, 2006 7:57 am Post subject:
Re: Performance and Consistency ??
|
|
|
| Quote: | 50,000 / second is a sustained rate.
The data input may last for months or even years. If I use
transactions and logging, can I control the log files' size?
|
Sure, you can control the size of individual log files (with
DB_ENV->set_lg_max), and you can control how many log files Berkeley DB
needs to keep by varying the rate of your checkpoints.
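To get a feel for how log file size and checkpoint rate interact, here is a rough back-of-the-envelope sketch. It assumes the log grows by a fixed number of bytes per update, which is a simplification, and the function name and all parameters are made up for illustration rather than taken from Berkeley DB.

```python
import math

def log_files_between_checkpoints(updates_per_sec, log_bytes_per_update,
                                  checkpoint_interval_sec, log_file_bytes):
    """Rough count of log files written between two checkpoints.

    log_bytes_per_update is an assumed average; real log records vary
    in size, so treat the result as an order-of-magnitude estimate.
    """
    written = updates_per_sec * log_bytes_per_update * checkpoint_interval_sec
    return math.ceil(written / log_file_bytes)

# Example: 50,000 updates/s, an assumed 100 log bytes per update,
# a checkpoint every 60 s, and 10 MiB log files.
print(log_files_between_checkpoints(50_000, 100, 60, 10 * 1024 * 1024))
```

Under those assumed numbers, checkpointing five times less often means roughly five times as many log files accumulate between checkpoints.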
There is no inherent reason why you can't execute 50,000 transactions /
second, but you may also want to consider grouping multiple updates
into a single transaction to reduce some overhead (in addition to the
DB_TXN_WRITE_NOSYNC flag).
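A minimal sketch of the grouping idea, independent of any database API: incoming items are cut into fixed-size batches, and each batch is handed to a callback once. The `commit_group` callback is hypothetical; in a real application it would begin a transaction, put every item in the batch, and commit.

```python
from itertools import islice

def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        group = list(islice(it, size))
        if not group:
            return
        yield group

def write_grouped(items, size, commit_group):
    """Call commit_group once per batch; return the number of commits made."""
    commits = 0
    for group in batches(items, size):
        commit_group(group)  # hypothetical: begin txn, put each item, commit
        commits += 1
    return commits
```

With a batch size of 1,000, one second of input at 50,000 items turns into 50 commits instead of 50,000. The batch size is a tuning knob: larger batches mean fewer commits, but more updates share the fate of any single transaction.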
Another issue you will need to consider is the appropriate access
method. If you use keys that are allocated sequentially, you should be
able to get good cache locality with a btree, recno or queue database.
Only queue will perform well if there are concurrent updates to the
head or tail, though. I'm going to assume that these are
single-threaded inserts.
What do your queries look like? Can you partition the data based on
when it arrives? That is, can you have a separate table for each hour,
for example? Otherwise, another issue you will face is that as the
database gets bigger and bigger, you will need to walk down more levels
to get to the leaf nodes (assuming we are talking about a btree or
recno database). That will also have an impact on performance.
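As an illustration of the table-per-hour idea, the sketch below maps a timestamp to an hourly table name and lists the tables a time-span query would have to open. The `events_` prefix, the `.db` suffix, and the use of UTC are assumptions made for the example, not anything Berkeley DB requires.

```python
from datetime import datetime, timedelta, timezone

def hourly_table_name(ts, prefix="events"):
    """Map a Unix timestamp (seconds) to the name of its hourly table."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"{prefix}_{t:%Y%m%d%H}.db"

def tables_for_span(start_ts, end_ts, prefix="events"):
    """List the hourly tables a query over [start_ts, end_ts] must open."""
    if end_ts < start_ts:
        raise ValueError("end_ts must not precede start_ts")
    # Round the start down to the top of its hour, then step hour by hour.
    t = datetime.fromtimestamp(start_ts, tz=timezone.utc).replace(
        minute=0, second=0, microsecond=0)
    end = datetime.fromtimestamp(end_ts, tz=timezone.utc)
    names = []
    while t <= end:
        names.append(f"{prefix}_{t:%Y%m%d%H}.db")
        t += timedelta(hours=1)
    return names
```

Each hourly table stays small, so its tree stays shallow, at the cost of a query having to visit one table per hour in the requested span.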
| Quote: | What about db_dump? Can I use it to recover data after a disaster
without transactions?
|
You may be able to salvage some data this way, but there are no
guarantees unless you use transactions.
Michael. |
|
likun.navipal@gmail.com *nix forums beginner
Joined: 08 May 2006
Posts: 15
|
Posted: Fri Jul 21, 2006 8:50 am Post subject:
Re: Performance and Consistency ??
|
|
|
Thanks for your help.
Michael Cahill wrote:
| Quote: | > 50,000 / second is a sustained rate
> And the data input maybe last for months or even years If i use
> transaction and log, can i control the log file's size?
Sure, you can control the size of individual log files (with
DB_ENV->set_lg_max), and you can control how many Berkeley DB needs to
keep by varying the rate of your checkpoints.
There is no inherent reason why you can't execute 50,000 transactions /
second, but you may also want to consider grouping multiple updates
into a single transaction to reduce some overhead (in addition to the
DB_TXN_WRITE_NOSYNC flag).
|
Yes, this is a good idea. I should group multiple updates together
before putting them into the db.
| Quote: | Another issue you will need to consider is the appropriate access
method. If you use keys that are allocated sequentially, you should be
able to get good cache locality with a btree, recno or queue database.
Only queue will perform well if there are concurrent updates to the
head or tail, though. I'm going to assume that these are
single-threaded inserts.
|
The app collects data from a large number of sources. Each source
emits data every one or two seconds. The key is composed of the source id
and the timestamp.
| Quote: | What do your queries look like? Can you partition the data based on
when it arrives? That is, can you have a separate table for each hour,
for example? Otherwise, another issue you will face is that as the
database gets bigger and bigger, you will need to walk down more levels
to get to the leaf nodes (assuming we are talking about a btree or
recno database). That will also have an impact on performance.
|
Usually, a client asks the app to retrieve one source's historical
data over a time span.
| Quote: | > What about db_dump? Can i use it to recovery data after disaster
> without transaction?
You may be able to salvage some data this way, but there are no
guarantees unless you use transactions.
|
No guarantees? Does that mean the data in the db files may be lost entirely? |
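For a key made of source id plus timestamp and queries for one source over a time span, one possible key layout is sketched below. A btree without a custom comparison function orders keys by lexical byte comparison, so packing both fields big-endian makes byte order match (source id, timestamp) order. The 32-bit and 64-bit field widths are assumptions, and the sorted list with `bisect` only stands in for a cursor range scan; it is not Berkeley DB code.

```python
import bisect
import struct

# Assumed layout: 32-bit source id, then 64-bit timestamp, both big-endian,
# so comparing the raw bytes compares (source_id, timestamp) tuples.
KEY = struct.Struct(">IQ")

def make_key(source_id, timestamp):
    """Pack (source_id, timestamp) into a 12-byte, order-preserving key."""
    return KEY.pack(source_id, timestamp)

def span(sorted_keys, source_id, t_start, t_end):
    """Return (source_id, timestamp) pairs for one source within
    [t_start, t_end], scanning a contiguous run of the sorted keys."""
    lo = bisect.bisect_left(sorted_keys, make_key(source_id, t_start))
    hi = bisect.bisect_right(sorted_keys, make_key(source_id, t_end))
    return [KEY.unpack(k) for k in sorted_keys[lo:hi]]

keys = sorted(make_key(s, t) for s in (1, 2, 300) for t in (10, 20, 30))
print(span(keys, 2, 15, 30))
```

Because one source's records sit next to each other in key order, a time-span query touches a contiguous range. The trade-off is that inserts from many sources land all over the tree rather than at its tail, which matters for cache locality at this insert rate.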
|