Splitting a string into an array words
Alex Vinokur
*nix forums Guru Wannabe


Joined: 23 Feb 2005
Posts: 160

Posted: Fri Jul 21, 2006 5:53 am    Post subject: Re: Splitting a string into an array words

Simon wrote:
Quote:
Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings.
[snip]


See "Splitting string into vector of vectors":
http://groups.google.com/group/sources/msg/77993fb8841382c8
http://groups.google.com/group/perfo/msg/9d49a1be3a5c6335
http://groups.google.com/group/perfo/msg/f3c775cf7e3cdcf0


Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn
Back to top
Daniel T.
*nix forums Guru


Joined: 09 Mar 2005
Posts: 583

Posted: Fri Jul 21, 2006 4:54 am    Post subject: Re: Splitting a string into an array words

In article <VOVvg.128148$H71.111533@newssvr13.news.prodigy.com>,
Mark P <usenet@fall2005REMOVE.fastmailCAPS.fm> wrote:

Quote:

2) Returning out.size() isn't very useful since the caller can find out
what out.size() equals without the function's help.

True. In my case, I pulled this function out of some actual code where
the return value is sometimes used as a check. E.g., when parsing a
particular file format, I expect a certain number of tokens per line.
It saves the calling function a line of code by having the size of out
returned automatically (and of course this fcn is called in multiple
places).

Here you go, now it returns the size. :-)

template <typename OutIt>
int tokenize( const string& str, OutIt os, const string& delims = " ",
              char comment = '\0' )
{
    int result = 0; // number of tokens written to os
    string::size_type start = str.find_first_not_of( delims );
    // start is an index into str, so the first character of the token is str[start]
    while ( start != string::npos && str[start] != comment ) {
        string::size_type end = str.find_first_of( delims, start );
        *os++ = str.substr( start, end - start );
        ++result;
        start = str.find_first_not_of( delims, end );
    }
    return result;
}
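
As a rough sketch of the per-line check Mark describes, the returned count could be compared against the number of tokens a record is supposed to have. The helper name parse_record, the " \t" delimiters and the '#' comment character are assumptions made for illustration; none of them come from the thread.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Counting tokenizer as posted above, with std:: spelled out.
template <typename OutIt>
int tokenize(const std::string& str, OutIt os, const std::string& delims = " ",
             char comment = '\0')
{
    int result = 0;
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos && str[start] != comment) {
        std::string::size_type end = str.find_first_of(delims, start);
        *os++ = str.substr(start, end - start);
        ++result;
        start = str.find_first_not_of(delims, end);
    }
    return result;
}

// Hypothetical helper (not from the thread): accepts a line only if it
// contains exactly `expected` tokens before any '#' comment.
bool parse_record(const std::string& line, int expected,
                  std::vector<std::string>& fields)
{
    fields.clear();
    return tokenize(line, std::back_inserter(fields), " \t", '#') == expected;
}
```

Unlike returning out.size(), this count covers only the tokens written by the current call, so it stays meaningful even if the destination container already holds elements.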
Back to top
Daniel T.
*nix forums Guru


Joined: 09 Mar 2005
Posts: 583

Posted: Fri Jul 21, 2006 4:35 am    Post subject: Re: Splitting a string into an array words

In article <VOVvg.128148$H71.111533@newssvr13.news.prodigy.com>,
Mark P <usenet@fall2005REMOVE.fastmailCAPS.fm> wrote:

Quote:
template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ")
{
    string::size_type start = str.find_first_not_of( delims );
    while ( start != string::npos ) {
        string::size_type end = str.find_first_of( delims, start );
        *os++ = str.substr( start, end - start );
        start = str.find_first_not_of( delims, end );
    }
}

Looks good. In my case it was a bit more complicated because I also
have an additional parameter for a comment character. When a comment
character is encountered at the beginning of a token, that token is
discarded and the loop breaks. (So in my original implementation there
were multiple breakpoints out of the loop, although I hastily trimmed
these before I posted my code, thereby leaving some unattractive vestiges.)

In any event, I appreciate your comments and don't mean to simply make
excuses and argue all of your points.

No problem. Your code was rather good in general, I only saw a few nits
to pick at.

Quote:
The only significant hitch to my
adopting your cleaner implementation is that I really do need support
for the comment character break. Luckily this is just a bit of a little
file parser I use for testing, so I don't stress too much about these
details, but feel free to propose a svelte implementation that supports
a comment char. :)

If I understand what you mean then:

template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ",
               char comment = '\0' )
{
    string::size_type start = str.find_first_not_of( delims );
    // start is an index into str, so the first character of the token is str[start]
    while ( start != string::npos && str[start] != comment ) {
        string::size_type end = str.find_first_of( delims, start );
        *os++ = str.substr( start, end - start );
        start = str.find_first_not_of( delims, end );
    }
}

Of course you should probably change the defaults to whatever is most
common in your code...
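
To make the comment-character behaviour concrete, here is a self-contained sketch of the same function with std:: spelled out. The wrapper name tokens_before_comment and the choice of '#' are assumptions for illustration only.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Stops at the first token that begins with `comment`.
template <typename OutIt>
void tokenize(const std::string& str, OutIt os, const std::string& delims = " ",
              char comment = '\0')
{
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos && str[start] != comment) {
        std::string::size_type end = str.find_first_of(delims, start);
        *os++ = str.substr(start, end - start);
        start = str.find_first_not_of(delims, end);
    }
}

// Hypothetical wrapper (not from the thread): space-separated tokens,
// with '#' treated as the comment character.
std::vector<std::string> tokens_before_comment(const std::string& line)
{
    std::vector<std::string> out;
    tokenize(line, std::back_inserter(out), " ", '#');
    return out;
}
```

Note that only the first character of each token is tested, so a line such as "a#b c" still yields "a#b" and "c". A '#' ends the scan only when it starts a token.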
Back to top
Mark P
*nix forums Guru Wannabe


Joined: 24 Mar 2005
Posts: 296

Posted: Fri Jul 21, 2006 1:09 am    Post subject: Re: Splitting a string into an array words

Daniel T. wrote:
Quote:
In article <Jfwvg.10246$2v.1690@newssvr25.news.prodigy.net>,
Mark P <usenet@fall2005REMOVE.fastmailCAPS.fm> wrote:

Here's a little tokenizer fcn I've used before. Not necessarily the
most elegant or compact way to do this (and criticisms are welcomed):

Well since criticisms are welcomed... :-)

// Populates "out" with delimited substrings of "in".
int tokenize (const string& in, vector<string>& out, const char* delims)
{
    string::size_type wordStart = 0; // current word start position
    string::size_type wordEnd = 0; // last word end position

    while (true)
    {
        wordStart = in.find_first_not_of(delims,wordEnd);
        if (wordStart == in.npos)
            break;
        wordEnd = in.find_first_of(delims,wordStart);
        if (wordEnd == in.npos)
            wordEnd = in.size();
        out.push_back(in.substr(wordStart,wordEnd - wordStart));
    }
    return out.size();
}

From least important to most important:

1) The while true and break is not a style I prefer.

Fair enough-- I'm not a fan either, but see my comment to item 4.

Quote:

2) Returning out.size() isn't very useful since the caller can find out
what out.size() equals without the function's help.

True. In my case, I pulled this function out of some actual code where
the return value is sometimes used as a check. E.g., when parsing a
particular file format, I expect a certain number of tokens per line.
It saves the calling function a line of code by having the size of out
returned automatically (and of course this fcn is called in multiple
places).

Quote:

3) It only works for vectors, I'd write something that works for deques
and lists as well.

Agreed, I very much prefer your templated approach that takes any Output
Iterator. In my case, using a known type allowed me to return the
container size (cf. item 2), but this is just my own particular
situation and at times excessive code parsimony.

Quote:

4) A cyclomatic complexity of 4 seems a tad excessive for what is
supposed to be such a simple job. You can drop that to 3 by removing
the unnecessary "if (wordEnd == in.npos)" logic. Heeding item (1)
above can reduce the complexity to 2.

Here's how I would write it:

template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ")
{
    string::size_type start = str.find_first_not_of( delims );
    while ( start != string::npos ) {
        string::size_type end = str.find_first_of( delims, start );
        *os++ = str.substr( start, end - start );
        start = str.find_first_not_of( delims, end );
    }
}

Looks good. In my case it was a bit more complicated because I also
have an additional parameter for a comment character. When a comment
character is encountered at the beginning of a token, that token is
discarded and the loop breaks. (So in my original implementation there
were multiple breakpoints out of the loop, although I hastily trimmed
these before I posted my code, thereby leaving some unattractive vestiges.)

In any event, I appreciate your comments and don't mean to simply make
excuses and argue all of your points. The only significant hitch to my
adopting your cleaner implementation is that I really do need support
for the comment character break. Luckily this is just a bit of a little
file parser I use for testing, so I don't stress too much about these
details, but feel free to propose a svelte implementation that supports
a comment char. :)

Mark
Back to top
Daniel T.
*nix forums Guru


Joined: 09 Mar 2005
Posts: 583

Posted: Thu Jul 20, 2006 12:45 am    Post subject: Re: Splitting a string into an array words

In article <Jfwvg.10246$2v.1690@newssvr25.news.prodigy.net>,
Mark P <usenet@fall2005REMOVE.fastmailCAPS.fm> wrote:

Quote:
Here's a little tokenizer fcn I've used before. Not necessarily the
most elegant or compact way to do this (and criticisms are welcomed):

Well since criticisms are welcomed... :-)

Quote:
// Populates "out" with delimited substrings of "in".
int tokenize (const string& in, vector<string>& out, const char* delims)
{
    string::size_type wordStart = 0; // current word start position
    string::size_type wordEnd = 0; // last word end position

    while (true)
    {
        wordStart = in.find_first_not_of(delims,wordEnd);
        if (wordStart == in.npos)
            break;
        wordEnd = in.find_first_of(delims,wordStart);
        if (wordEnd == in.npos)
            wordEnd = in.size();
        out.push_back(in.substr(wordStart,wordEnd - wordStart));
    }
    return out.size();
}

From least important to most important:

1) The while true and break is not a style I prefer.

2) Returning out.size() isn't very useful since the caller can find out
what out.size() equals without the function's help.

3) It only works for vectors, I'd write something that works for deques
and lists as well.

4) A cyclomatic complexity of 4 seems a tad excessive for what is
supposed to be such a simple job. You can drop that to 3 by removing
the unnecessary "if (wordEnd == in.npos)" logic. Heeding item (1)
above can reduce the complexity to 2.

Here's how I would write it:

template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ")
{
    string::size_type start = str.find_first_not_of( delims );
    while ( start != string::npos ) {
        string::size_type end = str.find_first_of( delims, start );
        *os++ = str.substr( start, end - start );
        start = str.find_first_not_of( delims, end );
    }
}
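
Points 3 and 4 can be illustrated with a self-contained sketch of the same function. The comments spell out why the "if (wordEnd == in.npos)" check is not needed, and the wrapper split_into is a hypothetical convenience (not from the thread) that shows the output iterator working with any container that has push_back.

```cpp
#include <deque>    // included only so the wrapper can be tried with a deque
#include <iterator>
#include <list>     // likewise, for std::list
#include <string>
#include <vector>   // likewise, for std::vector

template <typename OutIt>
void tokenize(const std::string& str, OutIt os, const std::string& delims = " ")
{
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos) {
        // For the last token end may be npos. substr() clamps the count
        // to the end of the string, so no special case is required.
        std::string::size_type end = str.find_first_of(delims, start);
        *os++ = str.substr(start, end - start);
        // Searching from npos finds nothing and returns npos,
        // which terminates the loop.
        start = str.find_first_not_of(delims, end);
    }
}

// Hypothetical wrapper: collects the tokens into a container of the
// caller's choice.
template <typename Container>
Container split_into(const std::string& str, const std::string& delims = " ")
{
    Container out;
    tokenize(str, std::back_inserter(out), delims);
    return out;
}
```

For instance, split_into<std::list<std::string> >("a,b,,c", ",") should produce the three tokens a, b and c, because runs of delimiters are skipped rather than treated as empty fields.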
Back to top
Mark P
*nix forums Guru Wannabe


Joined: 24 Mar 2005
Posts: 296

Posted: Wed Jul 19, 2006 8:04 pm    Post subject: Re: Splitting a string into an array words

Simon wrote:
Quote:
Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings. For example: "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

Thanks in advance,
Simon


Here's a little tokenizer fcn I've used before. Not necessarily the
most elegant or compact way to do this (and criticisms are welcomed):

// Populates "out" with delimited substrings of "in".
int tokenize (const string& in, vector<string>& out, const char* delims)
{
    string::size_type wordStart = 0; // current word start position
    string::size_type wordEnd = 0; // last word end position

    while (true)
    {
        wordStart = in.find_first_not_of(delims,wordEnd);
        if (wordStart == in.npos)
            break;
        wordEnd = in.find_first_of(delims,wordStart);
        if (wordEnd == in.npos)
            wordEnd = in.size();
        out.push_back(in.substr(wordStart,wordEnd - wordStart));
    }
    return out.size();
}

Mark
Back to top
Rolf Magnus
*nix forums Guru


Joined: 21 Feb 2005
Posts: 1236

Posted: Wed Jul 19, 2006 5:27 pm    Post subject: Re: Splitting a string into an array words

Simon wrote:

Quote:
Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings.

What do you mean by "flexible", and which separators do you want to use?

Quote:
For example: "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

In this case, I'd use a stringstream and operator>>.
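
One possible reading of that suggestion, as a sketch only: wrap the line in a std::istringstream and let std::istream_iterator apply operator>> repeatedly. The function name split_words is an assumption for illustration.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Splits on whitespace by reading words with operator>> until the
// stream is exhausted.
std::vector<std::string> split_words(const std::string& line)
{
    std::istringstream in(line);
    std::istream_iterator<std::string> first(in), last; // last is the end-of-stream iterator
    return std::vector<std::string>(first, last);
}
```

This handles only whitespace separators (spaces, tabs, newlines). Other delimiters need a different approach.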
Back to top
Daniel T.
*nix forums Guru


Joined: 09 Mar 2005
Posts: 583

Posted: Wed Jul 19, 2006 5:27 pm    Post subject: Re: Splitting a string into an array words

In article <1153325861.909560.208950@m79g2000cwm.googlegroups.com>,
"Simon" <SimonGoldring@gmail.com> wrote:

Quote:
Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings. For example: "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

#include <cassert>
#include <vector>
#include <string>
#include <iostream>
#include <iterator>
// other includes as necessary

template < typename OutIt >
void split( const std::string& in, OutIt result )
{
    // add code here...
}

int main() {
    std::string seed = "step1";
    std::vector<std::string> result;
    split( seed, std::back_inserter( result ) );
    assert( result.size() == 1 );
    assert( result[0] == "step1" );
    std::cout << "You did it! Good job!\n";
}

Run the above program. Make changes to the part labeled "add code here"
until the program compiles and prints out "You did it! Good job!".

When it does, post back here with the code and I'll help you with the
next step.
Back to top
Marcus Kwok
*nix forums Guru


Joined: 15 Sep 2005
Posts: 379

Posted: Wed Jul 19, 2006 5:25 pm    Post subject: Re: Splitting a string into an array words

Simon <SimonGoldring@gmail.com> wrote:
Quote:
Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings. For example: "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

If you are splitting the words by whitespace, you could create a
std::istringstream and push them into a std::vector<std::string>.

Something like: (untested and uncompiled)

std::istringstream line(mostrecentline);
std::vector<std::string> words;
std::string temp;

while (line >> temp) {
    words.push_back(temp);
}

You will need to #include <sstream>, <string>, and <vector> for this
method.
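
For completeness, here is the same loop wrapped into functions with the required headers, plus a small printer that produces the "item N: word" listing from the question. The function names split_on_whitespace and print_items are assumptions for illustration, and this is a sketch rather than tested production code.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// The istringstream loop from above, packaged as a function.
std::vector<std::string> split_on_whitespace(const std::string& mostrecentline)
{
    std::istringstream line(mostrecentline);
    std::vector<std::string> words;
    std::string temp;

    while (line >> temp) {
        words.push_back(temp);
    }
    return words;
}

// Writes one "item N: word" line per word to the given stream.
void print_items(const std::vector<std::string>& words, std::ostream& out)
{
    for (std::size_t i = 0; i < words.size(); ++i) {
        out << "item " << i << ": " << words[i] << '\n';
    }
}
```

After getline(cin, mostrecentline), calling print_items(split_on_whitespace(mostrecentline), std::cout) would list the words of that line (std::cout needs <iostream>).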

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Back to top
Simon
*nix forums Guru Wannabe


Joined: 20 Feb 2005
Posts: 180

Posted: Wed Jul 19, 2006 4:17 pm    Post subject: Splitting a string into an array words

Well, the title's pretty descriptive; how would I be able to take a
line of input like this:

getline(cin,mostrecentline);

And split into an (flexible) array of strings. For example: "do this
action"
would go to:

item 0: do
item 1: this
item 2: action

Thanks in advance,
Simon
Back to top