readline tokenizer newline sticky wicket
Arthur
Posted: Tue Feb 07, 2006 2:55 am    Post subject: readline tokenizer newline sticky wicket

Given a "linemess.py" file with inconsistent line endings:

line 1\r\n
line 2\n

tokenized as per:

import tokenize

# Feed the file to the tokenizer line by line and print every token
f = open('linemess.py', 'r')
tokens = tokenize.generate_tokens(f.readline)
for t in tokens:
    print t

get output as follows:

(1, 'line', (1, 0), (1, 4), 'line 1\r\n')
(2, '1', (1, 5), (1, 6), 'line 1\r\n')
(4, '\r\n', (1, 6), (1, 8), 'line 1\r\n')
(1, 'line', (2, 0), (2, 4), 'line 2\n')
(2, '2', (2, 5), (2, 6), 'line 2\n')
(4, '\n', (2, 6), (2, 7), 'line 2\n')
(0, u'', (3, 0), (3, 0), u'')

So the Windows \r\n comes through as a single literal '\r\n' NEWLINE token,
rather than as '\n' as the universal newline convention would suggest.

Isn't this a problem?
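
For comparison, here is a minimal sketch of what I would expect under universal newlines. It assumes a current Python 3 interpreter rather than the one that produced the output above, and the helper name tokenize_normalized is made up for illustration. The bytes are read through io.TextIOWrapper with newline=None, so the tokenizer only ever sees '\n':

```python
import io
import tokenize


def tokenize_normalized(source_bytes):
    """Tokenize source after universal-newline translation (hypothetical helper)."""
    # newline=None turns on universal newlines: '\r\n' and lone '\r'
    # are handed to the tokenizer as '\n'.
    text = io.TextIOWrapper(io.BytesIO(source_bytes), encoding="utf-8", newline=None)
    return list(tokenize.generate_tokens(text.readline))


# Same content as linemess.py: one Windows ending, one Unix ending.
tokens = tokenize_normalized(b"line 1\r\nline 2\n")
newline_strings = [t.string for t in tokens if t.type == tokenize.NEWLINE]
print(newline_strings)
```

With the endings translated first, both NEWLINE tokens should carry the same string, which is the behavior I was expecting from the tokenizer in the first place.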

I think this must have been at the root of the issue I ran into when a
file with messy, inconsistent line endings, which nonetheless compiled and
ran without a problem, was rejected by tokenizer.py as having an indent issue.
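
A quick way to check whether a file is in this state is to count the line-ending styles in its raw bytes. This is only a diagnostic sketch (Python 3 assumed, and count_line_endings is a name I made up), not anything the tokenize module provides:

```python
import re
from collections import Counter


def count_line_endings(data):
    """Count '\\r\\n', lone '\\r' and lone '\\n' endings in raw bytes (hypothetical helper)."""
    # '\r\n' is listed first so a Windows ending is not counted
    # as a separate '\r' plus '\n'.
    return Counter(re.findall(rb"\r\n|\r|\n", data))


counts = count_line_endings(b"line 1\r\nline 2\n")
mixed = len(counts) > 1  # more than one style present means inconsistent endings
print(counts, mixed)
```

For a real file, the bytes would come from open(path, 'rb').read(), so that no newline translation happens before counting.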

On the theory that if the tokenizer needs to fail when crap is thrown at it,
it should do so more gracefully - is this a reportable bug?

Art