qrdn

quite random domain name

Feb 13, 2015

debugging ANTLR4 Lexer

grun MyLexer tokens -tokens < testfile

invokes the TestRig on the lexer and prints every token it recognized in testfile (read from stdin). For a pure lexer grammar, tokens is the start rule name to pass. Example stdout:

[@0,0:9='google.com',<4>,1:0]
[@1,10:10='\n',<2>,1:10]
[@2,11:11='\t',<1>,2:0]

Format of this output: one token per line, each in the form

[@tokenIndex,startIndex:endIndex='spelling',<tokenId>,lineNo:columnNo]

or, if the token is not on the default channel:

[@tokenIndex,startIndex:endIndex='spelling',<tokenId>,channel=channelId,lineNo:columnNo]
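  • tokenIndex - position of the token in the token stream, starting at 0
  • startIndex,endIndex - character (not byte) offsets into the input, starting at 0; endIndex is inclusive (google.com spans 0:9)
  • spelling - the literal text matched; newlines, carriage returns and tabs are printed escaped as \n, \r and \t
  • tokenId - the token type number; its name can be looked up in the generated .tokens file
  • channelId - the channel number; the field only appears when it is not 0 (the default channel), e.g. channel=1 for the hidden channel
  • lineNo,columnNo - position of the token start; lines count from 1, columns from 0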
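To make the fields above concrete, here is a minimal Python sketch that parses one line of -tokens output and resolves the type number through the contents of a .tokens file (lines like NAME=4 or 'literal'=4). It assumes the numeric <tokenId> form shown in the example output. The names Tok, parse_token_line and read_tokens_file, and the token names TAB, NEWLINE and DOMAIN, are hypothetical and only serve as illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# One line of `grun <Grammar> tokens -tokens` output, e.g.
#   [@0,0:9='google.com',<4>,1:0]
#   [@5,20:20=' ',<3>,channel=1,2:5]   (token on a non-default channel)
# The text group is greedy and backtracks to the last "',<" that lets the
# rest of the line match, so quotes or commas inside the text are kept.
# Assumes numeric token types, as in the example output.
TOKEN_LINE = re.compile(
    r"^\[@(?P<index>\d+),(?P<start>-?\d+):(?P<stop>-?\d+)="
    r"'(?P<text>.*)',<(?P<type>-?\d+)>"
    r"(?:,channel=(?P<channel>\d+))?"
    r",(?P<line>\d+):(?P<column>\d+)\]$"
)


@dataclass
class Tok:  # hypothetical helper type
    index: int    # position in the token stream, from 0
    start: int    # first character offset in the input, from 0
    stop: int     # last character offset, inclusive
    text: str     # as printed: \n, \r, \t stay escaped
    type: int     # token type number, see the .tokens file
    channel: int  # 0 (default channel) when the field is absent
    line: int     # 1-based
    column: int   # 0-based


def parse_token_line(s: str) -> Optional[Tok]:
    """Parse one token line; return None for anything else."""
    m = TOKEN_LINE.match(s.strip())
    if m is None:
        return None
    g = m.groupdict()
    return Tok(int(g["index"]), int(g["start"]), int(g["stop"]), g["text"],
               int(g["type"]), int(g["channel"] or 0),
               int(g["line"]), int(g["column"]))


def read_tokens_file(content: str) -> Dict[int, List[str]]:
    """Map type number -> names from .tokens content (NAME=4 or 'x'=4)."""
    names: Dict[int, List[str]] = {}
    for raw in content.splitlines():
        # rpartition so a literal like '=' (line "'='=5") still splits correctly
        name, sep, number = raw.strip().rpartition("=")
        if sep and name and number.lstrip("-").isdigit():
            names.setdefault(int(number), []).append(name)
    return names


# Usage with hypothetical .tokens content (TAB=1, NEWLINE=2, DOMAIN=4):
vocab = read_tokens_file("TAB=1\nNEWLINE=2\nDOMAIN=4\n")
tok = parse_token_line("[@0,0:9='google.com',<4>,1:0]")
print(tok.text, vocab[tok.type])  # google.com ['DOMAIN']
print(tok.stop - tok.start + 1)   # 10, because endIndex is inclusive
```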

Tip: append | column -t -s, | less to split each line at the commas into aligned columns and page through the result. Note that column also splits token text that itself contains a comma.
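When token texts contain commas, a small filter that only splits around the quoted text keeps the columns intact. The following is a minimal Python sketch of that idea, assuming numeric token types as in the output above; it leaves an empty channel column for default-channel tokens so all rows line up. The script name tokens_table.py and its functions are hypothetical.

```python
import sys
from typing import List


def split_token_line(line: str) -> List[str]:
    """Split one -tokens line into 6 columns; commas inside the text are kept."""
    body = line.strip()[1:-1]              # drop the surrounding [ and ]
    head, _, rest = body.partition("='")   # head: "@index,start:stop"
    text, _, tail = rest.rpartition("',")  # tail: "<type>[,channel=N],line:col"
    tail_parts = tail.split(",")
    if len(tail_parts) == 2:               # default channel: no channel= field
        tail_parts.insert(1, "")
    return head.split(",") + ["'" + text + "'"] + tail_parts


def align(rows: List[List[str]]) -> List[str]:
    """Pad every column to its widest cell, two spaces between columns."""
    if not rows:
        return []
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
            for row in rows]


def main(paths: List[str]) -> None:
    rows: List[List[str]] = []
    for path in paths:  # pass - to read from stdin
        src = sys.stdin if path == "-" else open(path, encoding="utf-8")
        with src:
            rows += [split_token_line(l) for l in src if l.startswith("[@")]
    print("\n".join(align(rows)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: grun MyLexer tokens -tokens < testfile | python3 tokens_table.py - | less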

This only lists the tokens the lexer actually emits, not the "sub-tokens" they are assembled from (e.g. fragment rules).