niXforums Forum Index
Suggestion needed on data storage format in text file
chernyshevsky@hotmail.com
*nix forums Guru


Joined: 09 Mar 2005
Posts: 871

Posted: Fri Jul 21, 2006 2:00 pm    Post subject: Re: Suggestion needed on data storage format in text file

Manish wrote:
Quote:
Sure. We will also recommend using a database.

Keep in mind that using a "database" doesn't necessarily imply a
full-blown, standalone RDBMS. An embedded database like SQLite or
Sleepycat would work very well in these types of situations.
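As an illustration of the embedded-database route, here is a minimal sketch using SQLite through PHP's PDO extension. It assumes the pdo_sqlite driver is installed and uses an in-memory database so the example is self-contained; the table and column names are hypothetical, loosely based on the fields discussed in this thread.

```php
<?php
// Minimal sketch: a per-user message index in an embedded SQLite database
// via PDO. Table/column names are hypothetical, loosely based on this thread.
$db = new PDO('sqlite::memory:'); // in practice, a per-user file path
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE message_index (
    id         INTEGER PRIMARY KEY,
    subject    TEXT,
    from_email TEXT,
    to_emails  TEXT,
    is_read    INTEGER NOT NULL DEFAULT 0,
    mbx_offset INTEGER NOT NULL
)');

// Prepared statements: no separator or escaping scheme to design.
$insert = $db->prepare(
    'INSERT INTO message_index (subject, from_email, to_emails, mbx_offset)
     VALUES (?, ?, ?, ?)'
);
$insert->execute(['Hi, How are you', 'you@example.com', 'me@example.com', 0]);
$insert->execute(['Subject with #|# in it', 'x@example.com', 'me@example.com', 4096]);

// Marking message 1 as read updates just that row.
$db->prepare('UPDATE message_index SET is_read = 1 WHERE id = ?')->execute([1]);

$unread = (int) $db->query('SELECT COUNT(*) FROM message_index WHERE is_read = 0')
                   ->fetchColumn();
echo "Unread messages: $unread\n";
```

Pointing the DSN at a file instead of `sqlite::memory:` gives a persistent per-user index, and SQLite then takes care of locking between concurrent requests itself.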
manish
*nix forums addict


Joined: 26 May 2005
Posts: 68

Posted: Fri Jul 21, 2006 6:56 am    Post subject: Re: Suggestion needed on data storage format in text file

Quote:
I don't wish to sound offensive, but if you can't correctly write to an
XML file without errors, why do you think you'll be able to do it to a
flat file using functions/methods you don't know?

Also, bear in mind if you use a database it will also handle locking from
multiple processes easily, which you will have to handle yourself in this situation.

Don't think "we'll only have one user accessing their account through a
single web instance so we won't have concurrency issues" - people these
days may use browser tabs to work on their mail concurrently.

And you really do run the risk of data loss/corruption if you don't
correctly lock access to the file.

It's definitely a serious issue. Opening the same files concurrently from
each browser tab and then updating the content of the index file will be
less efficient.

For example, there can be more than 1000 messages, of which, say, 2 are
unread. When the user reads one of them, we have to update a single byte
position for that message to change its status from unread to read. This
is critical from a performance (response to the user) point of view. If we
do it in a database, it will be much faster.
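The in-place update described above can be sketched as follows, assuming a hypothetical fixed-width index in which every record is `RECORD_LEN` bytes long and the read flag is a single `'0'`/`'1'` character at byte `FLAG_POS` of the record. Only that one byte is rewritten; the layout and constants are illustrative, not the project's actual format.

```php
<?php
// Hypothetical fixed-width layout: 4-digit id, 1-char read flag,
// 10-char padded subject, newline = 16 bytes per record.
const RECORD_LEN = 16;
const FLAG_POS   = 4;

function setReadFlag($fh, int $recordNo, bool $read): void
{
    // Jump straight to the flag byte; the rest of the file is untouched.
    fseek($fh, $recordNo * RECORD_LEN + FLAG_POS);
    fwrite($fh, $read ? '1' : '0');
}

// Build a small demo index in a temporary file.
$path = tempnam(sys_get_temp_dir(), 'idx');
$fh = fopen($path, 'r+b');
foreach ([[1, 'Hello'], [2, 'Meeting']] as [$id, $subject]) {
    fwrite($fh, sprintf("%04d%s%-10.10s\n", $id, '0', $subject));
}

flock($fh, LOCK_EX);        // hold an exclusive lock while writing
setReadFlag($fh, 1, true);  // mark the second record (index 1) as read
fflush($fh);
flock($fh, LOCK_UN);
fclose($fh);

echo file_get_contents($path);
```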

Quote:
If the problem is speed, a flat file isn't going to help you that much more. You'll
still have to encode and decode the data, no matter which format you use. And
even if it's faster now, all you're doing is delaying the inevitable. You definitely
need a database.

If it were me, I'd go back to them and explain why they need a database.
But I'm only a consultant...

Sure. We will also recommend using a database.

Quote:
XML and any text format are very inefficient when updates/deletions are
frequent, as you have to rewrite the file every time. For a mailbox,
that's unacceptable since the file size will likely be fairly large. A
suitable format requires a directory of sorts storing the offsets of
records, so you can quickly seek to them and modify them in place.

The mailbox file (.mbx) will still be there. We will parse it and store
only some of the details (including the mailbox file offset for each
message) in the index file (.idx or .xml; surely the best will be a database).
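A minimal sketch of that parsing step: scan the mailbox once and remember the byte offset at which each message begins, which is the value the index would store. It assumes the .mbx file uses the classic mbox layout Manish describes elsewhere in the thread (each message starts with a "From " line, and body lines starting with "From " are escaped as ">From "); other mailbox formats would need a different scanner. `mboxOffsets()` is a hypothetical name.

```php
<?php
// Record the byte offset at which each message starts in an mbox-style
// file (assumed layout: messages begin with a "From " line).
function mboxOffsets(string $path): array
{
    $offsets = [];
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (true) {
        $pos  = ftell($fh);        // offset of the line about to be read
        $line = fgets($fh);
        if ($line === false) {
            break;                 // end of file
        }
        if (strncmp($line, 'From ', 5) === 0) {
            $offsets[] = $pos;     // this is what the index would store
        }
    }
    fclose($fh);
    return $offsets;
}

// Demo with a two-message mailbox written to a temporary file.
$mbox = "From alice@example.com Fri Jul 21 10:00:00 2006\n"
      . "Subject: one\n\n>From the desk of Alice\n\n"
      . "From bob@example.com Fri Jul 21 11:00:00 2006\n"
      . "Subject: two\n\nbody\n";
$path = tempnam(sys_get_temp_dir(), 'mbx');
file_put_contents($path, $mbox);

$offsets = mboxOffsets($path);
print_r($offsets);
```

Reading message n later is then an `fseek()` to `$offsets[n]` followed by reading up to the next offset.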
ImOk
*nix forums beginner


Joined: 07 Jul 2006
Posts: 29

Posted: Fri Jul 21, 2006 1:51 am    Post subject: Re: Suggestion needed on data storage format in text file

Agreed.

But I believe there are database engines whose native format is XML.
Their internal storage is probably fixed length, though.

Chung Leong wrote:
Quote:
ImOk wrote:
My suggestion is to use XML. PHP and JavaScript both have DOM classes that
support this format very well. It's also easily extensible. And best of
all, it's a text file.

XML and any text format are very inefficient when updates/deletions are
frequent, as you have to rewrite the file every time. For a mailbox,
that's unacceptable since the file size will likely be fairly large. A
suitable format requires a directory of sorts storing the offsets of
records, so you can quickly seek to them and modify them in place.
Whatever you come up with it'll end up resembling a database. So why
not just use what's there already?
chernyshevsky@hotmail.com
*nix forums Guru


Joined: 09 Mar 2005
Posts: 871

Posted: Thu Jul 20, 2006 8:58 pm    Post subject: Re: Suggestion needed on data storage format in text file

ImOk wrote:
Quote:
My suggestion is to use XML. PHP and JavaScript both have DOM classes that
support this format very well. It's also easily extensible. And best of
all, it's a text file.

XML and any text format are very inefficient when updates/deletions are
frequent, as you have to rewrite the file every time. For a mailbox,
that's unacceptable since the file size will likely be fairly large. A
suitable format requires a directory of sorts storing the offsets of
records, so you can quickly seek to them and modify them in place.
Whatever you come up with it'll end up resembling a database. So why
not just use what's there already?
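A minimal sketch of such a "directory of offsets", assuming variable-length records appended to one data file and a directory held in a PHP array that maps each message id to its record's offset and length (in a real system the directory would itself have to be persisted and locked). The function names are hypothetical. It also shows how quickly this grows database-like machinery: an update fits in place only if the new record is no longer than the old one; otherwise the record is appended and the old bytes become garbage needing later compaction.

```php
<?php
// $dir maps message id => [byte offset, byte length] in the data file.
function appendRecord($fh, array &$dir, int $id, string $record): void
{
    fseek($fh, 0, SEEK_END);
    $dir[$id] = [ftell($fh), strlen($record)];
    fwrite($fh, $record);
}

function readRecord($fh, array $dir, int $id): string
{
    [$offset, $length] = $dir[$id];
    fseek($fh, $offset);            // jump directly, no scan of earlier records
    return fread($fh, $length);
}

function updateRecord($fh, array &$dir, int $id, string $record): void
{
    [$offset, $length] = $dir[$id];
    if (strlen($record) <= $length) {
        // Fits in the old slot: overwrite in place, padding with spaces.
        fseek($fh, $offset);
        fwrite($fh, str_pad($record, $length));
    } else {
        // Too long: append a new copy and repoint the directory entry.
        appendRecord($fh, $dir, $id, $record);
    }
}

$fh  = fopen('php://temp', 'w+b');
$dir = [];
appendRecord($fh, $dir, 1324, 'Hi, How are you|read=0');
appendRecord($fh, $dir, 1325, 'Lunch?|read=0');
updateRecord($fh, $dir, 1325, 'Lunch?|read=1');                            // in place
updateRecord($fh, $dir, 1324, 'Hi, How are you (longer subject)|read=1');  // appended
echo readRecord($fh, $dir, 1325), "\n", readRecord($fh, $dir, 1324), "\n";
```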
Jerry Stuckle
*nix forums Guru


Joined: 24 Feb 2005
Posts: 1515

Posted: Thu Jul 20, 2006 12:27 pm    Post subject: Re: Suggestion needed on data storage format in text file

Manish wrote:
Quote:
Hi Jerry Stuckle, the project specifies not to use a database; otherwise
it would definitely have been much easier. I have to store all the
information in the file itself. Thanks for bringing to my attention that
whatever separator with the least probability of occurrence is chosen, it
can still occur in the subject line. Maybe we should use some escape
character for it, as is done in the mailbox file: every new mail starts
with "From ", but if that appears in the message itself, it's replaced by
">From ". I will also look into the CSV format for storing the data.


Hi Andy Jeffries, we are using PHP 5, so sprintf/fprintf can be used. I
haven't come across using pointers in PHP. I will definitely try to
learn it.


Hi ImOk, our initial data structure was in XML format itself
(an individual XML file for every user). As there can be thousands of
emails, the file will grow larger and reading/writing may be slow/error
prone. So it was suggested to use a text file.

-----------------------------------------------------------------------------------------------------------------------------
This is how the datastructure is
-----------------------------------------------------------------------------------------------------------------------------
snip
-----------------------------------------------------------------------------------------------------------------------------

But the other settings will still be in an XML file.

We are using SimpleXML functions (to get and update values) and DOM
(to insert). The delete functionality is still not working. We are
thinking of implementing it with preg_replace().

Thanks.

Manish


Manish,

If the problem is speed, a flat file isn't going to help you that much
more. You'll still have to encode and decode the data, no matter which
format you use. And even if it's faster now, all you're doing is
delaying the inevitable. You definitely need a database.

If it were me, I'd go back to them and explain why they need a database.
But I'm only a consultant...

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
Andy Jeffries
*nix forums Guru Wannabe


Joined: 15 Apr 2005
Posts: 208

Posted: Thu Jul 20, 2006 10:10 am    Post subject: Re: Suggestion needed on data storage format in text file

On Wed, 19 Jul 2006 21:07:06 -0700, Manish wrote:
Quote:
Hi Andy Jeffries, we are using PHP 5, so sprintf/fprintf can be used. I
haven't come across using pointers in PHP. I will definitely try to learn
it.

It's not pointers but string parsing (extracting a section of a string,
and formatting a string so that each field has an exact length).

Quote:
Hi ImOk, our initial data structure was in XML format itself
(an individual XML file for every user). As there can be thousands of
emails, the file will grow larger and reading/writing may be slow/error
prone. So it was suggested to use a text file.

I don't wish to sound offensive, but if you can't correctly write to an
XML file without errors, why do you think you'll be able to do it to a
flat file using functions/methods you don't know?

Also, bear in mind if you use a database it will also handle locking from
multiple processes easily, which you will have to handle yourself in this
situation.

Don't think "we'll only have one user accessing their account through a
single web instance so we won't have concurrency issues" - people these
days may use browser tabs to work on their mail concurrently.

And you really do run the risk of data loss/corruption if you don't
correctly lock access to the file.
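A minimal sketch of that locking, using PHP's `flock()` around a read-modify-write of the whole index file so two requests (say, two browser tabs) cannot interleave their updates. `flock()` is advisory, so it only protects the file if every script touching it uses it; `updateIndex()` and the `id|status` line format are hypothetical.

```php
<?php
// Serialise read-modify-write cycles on a shared index file.
function updateIndex(string $path, callable $modify): void
{
    $fh = fopen($path, 'c+b');     // create if missing, don't truncate
    if ($fh === false || !flock($fh, LOCK_EX)) {
        throw new RuntimeException("Cannot lock $path");
    }
    try {
        $data = stream_get_contents($fh);  // read while holding the lock
        $data = $modify($data);
        ftruncate($fh, 0);                 // replace the whole contents
        rewind($fh);
        fwrite($fh, $data);
        fflush($fh);                       // push data out before unlocking
    } finally {
        flock($fh, LOCK_UN);
        fclose($fh);
    }
}

$path = tempnam(sys_get_temp_dir(), 'idx');
file_put_contents($path, "1324|unread\n1325|unread\n");

updateIndex($path, function (string $d): string {
    return str_replace('1325|unread', '1325|read', $d);
});
echo file_get_contents($path);
```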

Cheers,



Andy



--
Andy Jeffries MBCS CITP ZCE | gPHPEdit Lead Developer
http://www.gphpedit.org | PHP editor for Gnome 2
http://www.andyjeffries.co.uk | Personal site and photos
manish
*nix forums addict


Joined: 26 May 2005
Posts: 68

Posted: Thu Jul 20, 2006 4:07 am    Post subject: Re: Suggestion needed on data storage format in text file

Hi Jerry Stuckle, the project specifies not to use a database; otherwise
it would definitely have been much easier. I have to store all the
information in the file itself. Thanks for bringing to my attention that
whatever separator with the least probability of occurrence is chosen, it
can still occur in the subject line. Maybe we should use some escape
character for it, as is done in the mailbox file: every new mail starts
with "From ", but if that appears in the message itself, it's replaced by
">From ". I will also look into the CSV format for storing the data.
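For the separator problem, a minimal sketch using PHP's built-in `fputcsv()`/`fgetcsv()`: fields containing commas or quotes are quoted automatically, so no home-made separator or escape scheme is needed. Passing an empty `$escape` argument selects plain RFC 4180 style quote doubling (supported since PHP 7.4); the sample rows are made up.

```php
<?php
// Let the CSV functions do the quoting; the rows are illustrative.
$rows = [
    [1324, 'Hi, How are you', 'me@example.com', 'you@example.com', 0],
    [1325, 'He said "#|#" twice', 'a@example.com', 'b@example.com', 1],
];

$fh = fopen('php://temp', 'w+b');
foreach ($rows as $row) {
    // delimiter ',', enclosure '"', escape '' (quotes are doubled instead)
    fputcsv($fh, $row, ',', '"', '');
}

rewind($fh);
$readBack = [];
while (($fields = fgetcsv($fh, 0, ',', '"', '')) !== false) {
    $readBack[] = $fields;   // every field comes back as a string
}
fclose($fh);

print_r($readBack);
```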


Hi Andy Jeffries, we are using PHP 5, so sprintf/fprintf can be used. I
haven't come across using pointers in PHP. I will definitely try to
learn it.


Hi ImOk, our initial data structure was in XML format itself
(an individual XML file for every user). As there can be thousands of
emails, the file will grow larger and reading/writing may be slow/error
prone. So it was suggested to use a text file.

-----------------------------------------------------------------------------------------------------------------------------
This is how the datastructure is
-----------------------------------------------------------------------------------------------------------------------------

<mails>
<details id="">
<!-- Mail type (incoming, outgoing) -->
<mailtype></mailtype>
<!-- Whether the message is saved as template (Yes: 1, No: 0) -->
<istemplate></istemplate>
<!-- The mailbox id in which the mail resides (id for Inbox, Personal
Folders, Trash ...) -->
<mailboxid></mailboxid>
<!-- Message Priority (Normal:1, High Priority: 2) -->
<priority></priority>
<!-- Is message starred (Yes: 1, No: 0) -->
<isstarred></isstarred>
<!-- Is message read (Yes: 1, No: 0) -->
<isread></isread>
<!-- Is message replied back to sender (Yes: 1, No: 0) -->
<isreplied></isreplied>
<!-- Is message forwarded to any email (Yes: 1, No: 0) -->
<isforwarded></isforwarded>

<!-- Does message have attachment (Yes: 1, No: 0) -->
<hasattachment></hasattachment>
<!-- Attachment details -->
<attachments>
<attdetails id="">
<!-- Attachment file name -->
<filename></filename>
<!-- Attachment file size -->
<filesize></filesize>
</attdetails>
</attachments>


<!-- Sender name -->
<fromname></fromname>
<!-- Sender email -->
<fromemail></fromemail>
<!-- Total email conversation (1, 2, ... ) -->
<totalconversation></totalconversation>
<!-- Main Email detail id (sno), from which the conversation started
-->
<mainemailsno></mainemailsno>
<!-- Emails in To field -->
<toemails></toemails>
<!-- Emails in CC field -->
<ccemails></ccemails>

<!-- Mail content in HTML format -->
<htmlcontent></htmlcontent>
<!-- Mail content in Text format -->
<textcontent></textcontent>
<!-- Date time when the message was sent -->
<sentdatetime></sentdatetime>
<!-- Message size in KB -->
<messagesize></messagesize>

<!-- Offset in mbx file -->
<offsetinmbx></offsetinmbx>

<!-- Extra details for incoming/outgoing type emails -->
<incomingdetails>
</incomingdetails>
<outgoingdetails>
<!-- Emails in BCC field -->
<bccemails></bccemails>
<!-- Message Status (sent, pending) -->
<msgstatus></msgstatus>
<!-- Id of the signature to be appended to the message -->
<signatureid></signatureid>
<!-- Scheduled date time (24 hour format) for sending the mail to
recipients (MM/DD/YYYY hh:mm) -->
<scheduledtime></scheduledtime>
<!-- Whether to request a return receipt (Yes: 1, No: 0) -->
<requestreceipt></requestreceipt>
<!-- Message send status (pending, sent) -->
<sendstatus></sendstatus>
</outgoingdetails>


</details>

</mails>

-----------------------------------------------------------------------------------------------------------------------------

But the other settings will still be in an XML file.

We are using SimpleXML functions (to get and update values) and DOM
(to insert). The delete functionality is still not working. We are
thinking of implementing it with preg_replace().
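Rather than running `preg_replace()` over the raw XML, a node can be removed through DOM. A minimal sketch, assuming the `<mails>`/`<details id="...">` structure posted above and numeric ids; `deleteMail()` is a hypothetical helper and the sample document is a cut-down stand-in. Saving the result still rewrites the whole file.

```php
<?php
// Remove one <details> element by id using DOM + XPath.
function deleteMail(DOMDocument $doc, int $id): bool
{
    $xpath = new DOMXPath($doc);
    // %d keeps the id numeric, so nothing odd can be injected into the query.
    $nodes = $xpath->query(sprintf('/mails/details[@id="%d"]', $id));
    if ($nodes === false || $nodes->length === 0) {
        return false;                       // no such message
    }
    $node = $nodes->item(0);
    $node->parentNode->removeChild($node);  // detach the whole <details> subtree
    return true;
}

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML(
    '<mails>'
    . '<details id="1"><isread>1</isread></details>'
    . '<details id="2"><isread>0</isread></details>'
    . '</mails>'
);

deleteMail($doc, 1);
echo $doc->saveXML();   // with a file: $doc->load($path); ...; $doc->save($path);
```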

Thanks.

Manish
ImOk
*nix forums beginner


Joined: 07 Jul 2006
Posts: 29

Posted: Thu Jul 20, 2006 1:25 am    Post subject: Re: Suggestion needed on data storage format in text file

My suggestion is to use XML. PHP and JavaScript both have DOM classes that
support this format very well. It's also easily extensible. And best of
all, it's a text file.

Sample:

<mailboxes>
<mailbox name="some user">
<email>
<id>1234</id>
<subject>Send me the check</subject>
<to>nospam@nospam.com</to>
<from>someone@someone.com</from>
<message><![CDATA[blah blah blah blah blah
blah blah blah]]></message>
<attach>path to attach 1</attach>
<attach>path to attach 2</attach>
</email>
<email>
<id>5678</id>
<subject>Send me the check</subject>
<to>nospam@nospam.com</to>
<from>someone@someone.com</from>
<message><![CDATA[blah blah blah ]]></message>
<attach>path to attach 1</attach>
<attach>path to attach 2</attach>
</email>
<!-- ...etc... -->
</mailbox>
<mailbox name="some other user">
<!-- ... -->
</mailbox>
</mailboxes>
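A minimal sketch of reading this kind of file with SimpleXML. It wraps the mailboxes in a single `<mailboxes>` root element, since a well-formed XML document may have only one root; that wrapper name is an assumption, while the other element and attribute names follow the sample.

```php
<?php
$xml = <<<XML
<mailboxes>
  <mailbox name="some user">
    <email>
      <id>1234</id>
      <subject>Send me the check</subject>
      <message><![CDATA[blah <b>blah</b> blah]]></message>
      <attach>path to attach 1</attach>
      <attach>path to attach 2</attach>
    </email>
  </mailbox>
</mailboxes>
XML;

$root = simplexml_load_string($xml);
foreach ($root->mailbox as $mailbox) {
    foreach ($mailbox->email as $email) {
        // count() on a repeated child element gives the number of siblings
        printf("%s: %s (%d attachments)\n",
            $mailbox['name'], $email->subject, count($email->attach));
    }
}
```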

Chung Leong wrote:
Quote:
Manish wrote:
The project I am developing doesn't involve a database. I want to parse
the mailbox file (.mbx) and store the summary in a text file for fast
retrieval and display of information on the Inbox page.

The suggested formats are:

#1

ID [4 bytes]: Subject [100 bytes]: To Address [100 bytes]: From
Address [100 bytes]...etc...

#2

Instead of preassigning a fixed size to each variable (as the actual data
may be much shorter, or may grow longer), we can store the values
contiguously, separated by some unique separator (#|#, *#*, ...)

1324#|#Hi, How are you#|#me@google.com#|#you@google.com#|# ... and so
on


Which of these will be the more efficient one (as there will be frequent
inserts/deletes/updates of individual information, e.g. set message as
read ..., delete message ..., new message ...)?

Also please suggest how to determine the variable size (100 bytes as
in #1), assign that size to the variable accordingly, and read it back
(differentiating multiple variables) when required.

Thanks.

Manish

That's the kind of project that SQLite was designed for. It's worth
looking into.
chernyshevsky@hotmail.com
*nix forums Guru


Joined: 09 Mar 2005
Posts: 871

Posted: Wed Jul 19, 2006 8:33 pm    Post subject: Re: Suggestion needed on data storage format in text file

Manish wrote:
Quote:
The project I am developing doesn't involve a database. I want to parse
the mailbox file (.mbx) and store the summary in a text file for fast
retrieval and display of information on the Inbox page.

The suggested formats are:

#1

ID [4 bytes]: Subject [100 bytes]: To Address [100 bytes]: From
Address [100 bytes]...etc...

#2

Instead of preassigning a fixed size to each variable (as the actual data
may be much shorter, or may grow longer), we can store the values
contiguously, separated by some unique separator (#|#, *#*, ...)

1324#|#Hi, How are you#|#me@google.com#|#you@google.com#|# ... and so
on


Which of these will be the more efficient one (as there will be frequent
inserts/deletes/updates of individual information, e.g. set message as
read ..., delete message ..., new message ...)?

Also please suggest how to determine the variable size (100 bytes as
in #1), assign that size to the variable accordingly, and read it back
(differentiating multiple variables) when required.

Thanks.

Manish

That's the kind of project that SQLite was designed for. It's worth
looking into.
Andy Jeffries
*nix forums Guru Wannabe


Joined: 15 Apr 2005
Posts: 208

Posted: Wed Jul 19, 2006 11:07 am    Post subject: Re: Suggestion needed on data storage format in text file

On Tue, 18 Jul 2006 21:28:13 -0700, Manish wrote:
Quote:
#1
ID [4 bytes]: Subject [100 bytes]: To Address [100 bytes]: From Address [100
bytes]...etc...

#2
1324#|#Hi, How are you#|#me@google.com#|#you@google.com#|# ... and so on


Which of these will be the more efficient one (as there will be frequent
inserts/deletes/updates of individual information, e.g. set message as
read ..., delete message ..., new message ...)?

The first one will be more efficient from a search/replace point of view,
the second will be more efficient from a space usage point of view.
Efficiency is subjective.

Quote:
Also please suggest how to determine the variable size (100 bytes as
in #1), assign that size to the variable accordingly, and read it back
(differentiating multiple variables) when required.

substr would be used to cut out various portions of the string (e.g. 100
characters starting at position 4), and sprintf to build the fixed-width
string when writing (or fprintf, if you're using PHP5, to write it straight
to the file and save a step).
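A minimal sketch of the `substr()`/`sprintf()` approach for format #1, using the field widths from the original post (4-byte id, 100-byte subject/to/from). The field names and helper functions are hypothetical; over-long values are silently truncated, and since widths are counted in bytes, a multibyte UTF-8 subject could be cut mid-character. `fprintf($fh, ...)` with the same format string would write a field straight to an open file.

```php
<?php
// Field widths in bytes, as proposed in format #1.
const FIELDS = ['id' => 4, 'subject' => 100, 'to' => 100, 'from' => 100];

function packRecord(array $values): string
{
    $record = '';
    foreach (FIELDS as $name => $width) {
        // "%-W.Ws": left-justify, pad with spaces to W, truncate at W.
        $record .= sprintf("%-{$width}.{$width}s", $values[$name] ?? '');
    }
    return $record . "\n";
}

function unpackRecord(string $record): array
{
    $values = [];
    $pos = 0;
    foreach (FIELDS as $name => $width) {
        // Cut the field out by position, then drop the padding.
        $values[$name] = rtrim(substr($record, $pos, $width));
        $pos += $width;
    }
    return $values;
}

$line = packRecord(['id' => '1324', 'subject' => 'Hi, How are you',
                    'to' => 'me@example.com', 'from' => 'you@example.com']);
$back = unpackRecord($line);
print_r($back);
```

Note that `rtrim()` would also strip trailing spaces that were genuinely part of a value.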

If you need more than a pointer to the right functions, then it's starting
to sound like a homework assignment and I wish you luck with it...

Cheers,


Andy

Jerry Stuckle
*nix forums Guru


Joined: 24 Feb 2005
Posts: 1515

Posted: Wed Jul 19, 2006 11:03 am    Post subject: Re: Suggestion needed on data storage format in text file

Manish wrote:
Quote:
The project I am developing doesn't involve a database. I want to parse
the mailbox file (.mbx) and store the summary in a text file for fast
retrieval and display of information on the Inbox page.

The suggested formats are:

#1

ID [4 bytes]: Subject [100 bytes]: To Address [100 bytes]: From
Address [100 bytes]...etc...

#2

Instead of preassigning a fixed size to each variable (as the actual data
may be much shorter, or may grow longer), we can store the values
contiguously, separated by some unique separator (#|#, *#*, ...)

1324#|#Hi, How are you#|#me@google.com#|#you@google.com#|# ... and so
on


Which of these will be the more efficient one (as there will be frequent
inserts/deletes/updates of individual information, e.g. set message as
read ..., delete message ..., new message ...)?

Also please suggest how to determine the variable size (100 bytes as
in #1), assign that size to the variable accordingly, and read it back
(differentiating multiple variables) when required.

Thanks.

Manish


Personally, I'd use a database. I wouldn't even try a flat file for
this. Too much work trying to keep things straight.

But you asked about the formats. The fixed-length fields will waste
space any time the data is shorter than the amount reserved. Then you
run into the problem of someone who gets very verbose with their subject
line and exceeds the 100 characters. And 4 bytes of digits allow only
up to 9999 IDs. Is that enough? Or are you going to try to read/write
binary (not easy in PHP)?
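For what it's worth, PHP does ship `pack()`/`unpack()` for binary fields. A minimal sketch, assuming a hypothetical 5-byte record header: the id as a 32-bit unsigned big-endian integer (format code `N`, up to 4294967295 instead of 9999) followed by a one-byte read flag (`C`). Files holding such data should be opened in binary mode (`'b'`).

```php
<?php
// Hypothetical 5-byte header: N = 32-bit unsigned big-endian, C = unsigned byte.
function packHeader(int $id, bool $read): string
{
    return pack('NC', $id, $read ? 1 : 0);
}

function unpackHeader(string $bytes): array
{
    // Named fields: returns ['id' => ..., 'read' => ...]
    return unpack('Nid/Cread', $bytes);
}

$bin = packHeader(123456, true);
$hdr = unpackHeader($bin);
printf("%d bytes, id=%d, read=%d\n", strlen($bin), $hdr['id'], $hdr['read']);
```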

The second one is problematic because the user may include your
separator in their Subject: line (or even name/address if you pick the
wrong character).

There are two other ways: use CSV format, which is well documented and
supported by PHP and other programs, or add a length field at the
beginning of each field, specifying how many characters are in the
following field.
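A minimal sketch of the length-prefix idea: each field is written as its byte length, a colon, then the raw bytes, so a subject may contain any character, including a would-be separator such as `#|#`. The `length:value` notation and the function names are illustrative choices, not a standard format.

```php
<?php
function encodeFields(array $fields): string
{
    $out = '';
    foreach ($fields as $value) {
        $value = (string) $value;
        $out .= strlen($value) . ':' . $value;   // e.g. "4:1324"
    }
    return $out;
}

function decodeFields(string $data): array
{
    $fields = [];
    $pos = 0;
    $len = strlen($data);
    while ($pos < $len) {
        $colon = strpos($data, ':', $pos);
        if ($colon === false) {
            throw new UnexpectedValueException("Malformed record at byte $pos");
        }
        $size = (int) substr($data, $pos, $colon - $pos);
        if ($colon + 1 + $size > $len) {
            throw new UnexpectedValueException("Truncated field at byte $pos");
        }
        // Take exactly $size bytes; colons or separators inside are just data.
        $fields[] = substr($data, $colon + 1, $size);
        $pos = $colon + 1 + $size;
    }
    return $fields;
}

$encoded = encodeFields([1324, 'Hi: #|# how are you', 'me@example.com']);
echo $encoded, "\n";
print_r(decodeFields($encoded));
```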

But I'd still use a database.

manish
*nix forums addict


Joined: 26 May 2005
Posts: 68

Posted: Wed Jul 19, 2006 4:28 am    Post subject: Suggestion needed on data storage format in text file

The project I am developing doesn't involve a database. I want to parse
the mailbox file (.mbx) and store the summary in a text file for fast
retrieval and display of information on the Inbox page.

The suggested formats are:

#1

ID [4 bytes]: Subject [100 bytes]: To Address [100 bytes]: From
Address [100 bytes]...etc...

#2

Instead of preassigning a fixed size to each variable (as the actual data
may be much shorter, or may grow longer), we can store the values
contiguously, separated by some unique separator (#|#, *#*, ...)

1324#|#Hi, How are you#|#me@google.com#|#you@google.com#|# ... and so
on


Which of these will be the more efficient one (as there will be frequent
inserts/deletes/updates of individual information, e.g. set message as
read ..., delete message ..., new message ...)?

Also please suggest how to determine the variable size (100 bytes as
in #1), assign that size to the variable accordingly, and read it back
(differentiating multiple variables) when required.

Thanks.

Manish
The time now is Fri Nov 21, 2008 7:51 pm | All times are GMT
Copyright © 2004-2005 DeniX Solutions SRL