Parsing Binary Files with ANTLR 4

1 Binary files

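Parsing a binary file differs from parsing a text file in only one respect: the "characters" are 8-bit unsigned bytes rather than the 16-bit unsigned values Java uses for char.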

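Bytes can be matched with Unicode escapes whose high byte is set to 0, so only the low byte carries the value. For example, a two-byte marker 0xCA 0xFE can be matched like this: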

MARKER : '\u00CA' '\u00FE' ;
grammar IP;

file : ip+ (MARKER ip)* ;

ip : BYTE BYTE BYTE BYTE ;

MARKER : '\u00CA' '\u00FE' ;
BYTE : '\u0000'..'\u00FF' ;
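
To make the longest-match behavior of these two lexer rules concrete, here is a plain-Java sketch that splits a byte array the same way: 0xCA followed by 0xFE becomes one MARKER, and any other byte becomes a BYTE. It is only an illustration; `ByteTokenSketch` and `tokenize` are hypothetical names, not code that ANTLR generates.

```java
import java.util.ArrayList;
import java.util.List;

class ByteTokenSketch {
    /** Labels each token as the two lexer rules would, preferring the longer MARKER match. */
    static List<String> tokenize(byte[] input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length) {
            int b = input[i] & 0xFF; // treat the byte as unsigned, 0..255
            boolean marker = b == 0xCA
                    && i + 1 < input.length
                    && (input[i + 1] & 0xFF) == 0xFE;
            if (marker) {
                tokens.add("MARKER");
                i += 2;
            } else {
                tokens.add("BYTE(" + b + ")");
                i += 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 172, 0, 0, 1, (byte) 0xCA, (byte) 0xFE, 10, 10, 10, 1 };
        // prints [BYTE(172), BYTE(0), BYTE(0), BYTE(1), MARKER, BYTE(10), BYTE(10), BYTE(10), BYTE(1)]
        System.out.println(tokenize(data));
    }
}
```

One consequence worth keeping in mind: an address that happens to contain the adjacent octets 202 and 254 would be tokenized as a MARKER, both in this sketch and by the grammar above.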

2 Binary streams

The lexer does not care whether the input "characters" are bytes or Unicode characters.

The following example writes a binary file named ips:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class WriteBinaryFile {
    // Three IPv4 addresses separated by the two-byte marker 0xCA 0xFE.
    public static final byte[] bytes = {
        (byte)172, 0, 0, 1, (byte)0xCA, (byte)0xFE,
        (byte)10, 10, 10, 1, (byte)0xCA, (byte)0xFE,
        (byte)10, 10, 10, 99
    };

    public static void main(String[] args) throws IOException {
        Files.write(new File("/tmp/ips").toPath(), bytes);
    }
}
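
Before handing the file to a lexer, it can help to confirm what was actually written. The following is a minimal sketch that writes a few of the bytes to a temporary file (an assumption made for portability, instead of /tmp/ips) and prints them back as hex; `HexDumpSketch` and `toHex` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class HexDumpSketch {
    /** Formats bytes as space-separated two-digit uppercase hex values. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = { (byte) 172, 0, 0, 1, (byte) 0xCA, (byte) 0xFE, 10, 10, 10, 1 };
        Path tmp = Files.createTempFile("ips", ".bin"); // temp file, not /tmp/ips
        Files.write(tmp, bytes);
        // prints AC 00 00 01 CA FE 0A 0A 0A 01
        System.out.println(toHex(Files.readAllBytes(tmp)));
        Files.delete(tmp);
    }
}
```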

Create a stream that decodes the file as 8-bit Latin-1 (ISO-8859-1), so that each byte becomes exactly one character:

CharStream bytesAsChar = CharStreams.fromFileName("/tmp/ips", StandardCharsets.ISO_8859_1);
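
The reason ISO-8859-1 is the right charset here is that it maps every byte value 0..255 to the character with the same code point, so no byte is lost or merged during decoding. The sketch below checks that property using only the JDK; `Latin1Sketch` and `decode` are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;

class Latin1Sketch {
    /** Decodes bytes as ISO-8859-1, producing exactly one char per byte. */
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String s = decode(all);
        boolean ok = s.length() == 256;
        for (int i = 0; i < 256 && ok; i++) {
            ok = s.charAt(i) == i; // char value equals the original unsigned byte
        }
        System.out.println(ok ? "one char per byte, same value" : "mismatch");
    }
}
```

A multi-byte encoding such as UTF-8 would not have this property: bytes of 0x80 and above would be combined into multi-byte sequences or treated as malformed input.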

Parse the stream:

//ANTLRFileStream bytesAsChar = new ANTLRFileStream("/tmp/ips", "ISO-8859-1"); DEPRECATED in 4.7
CharStream bytesAsChar = CharStreams.fromFileName("/tmp/ips", StandardCharsets.ISO_8859_1);
IPLexer lexer = new IPLexer(bytesAsChar);
CommonTokenStream tokens = new CommonTokenStream(lexer);
IPParser parser = new IPParser(tokens);
ParseTree tree = parser.file();
IPBaseListener listener = new MyIPListener();
ParseTreeWalker.DEFAULT.walk(listener, tree);

The listener:

import java.util.Arrays;
import java.util.List;
import org.antlr.v4.runtime.tree.TerminalNode;

class MyIPListener extends IPBaseListener {
    @Override
    public void exitIp(IPParser.IpContext ctx) {
        List<TerminalNode> octets = ctx.BYTE();
        short[] ip = new short[4];
        for (int i = 0; i < octets.size(); i++) {
            // Each BYTE token's text is a one-character string holding the octet.
            String oneCharStringHoldingOctet = octets.get(i).getText();
            ip[i] = (short)oneCharStringHoldingOctet.charAt(0);
        }
        System.out.println(Arrays.toString(ip));
    }
}
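
Because each BYTE token's text is a single Latin-1 character, its char value is already the unsigned octet, with no `& 0xFF` masking needed as there would be for a Java byte. As a small illustration, the sketch below turns four such characters into the usual dotted-decimal form; `DottedQuadSketch` and `format` are hypothetical helpers, not part of the listener above.

```java
class DottedQuadSketch {
    /** Joins four Latin-1 decoded characters into dotted-decimal form. */
    static String format(String fourChars) {
        if (fourChars.length() != 4) {
            throw new IllegalArgumentException("expected 4 octets, got " + fourChars.length());
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append((int) fourChars.charAt(i)); // char is unsigned, so this is 0..255
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String octets = new String(new char[] { 172, 0, 0, 1 });
        System.out.println(format(octets)); // prints 172.0.0.1
    }
}
```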

3 Custom streams

Note that ANTLRFileStream has been marked deprecated since version 4.7; the example in this section still relies on it.

Example: changing the text reported for tokens

/** make a stream treating file as full of single unsigned byte characters */
class BinaryANTLRFileStream extends ANTLRFileStream {
    public BinaryANTLRFileStream(String fileName) throws IOException {
        super(fileName, "ISO-8859-1");
    }

    /** Print the decimal value rather than treat as char */
    @Override
    public String getText(Interval interval) {
        StringBuilder buf = new StringBuilder();
        int start = interval.a;
        int stop = interval.b;
        if (stop >= this.n) {
            stop = this.n - 1;
        }

        for (int i = start; i <= stop; i++) {
            int v = data[i];
            buf.append(v);
        }
        return buf.toString();
    }
}

Usage:

ANTLRFileStream bytesAsChar = new BinaryANTLRFileStream("/tmp/ips");
IPLexer lexer = new IPLexer(bytesAsChar);
...

Listener example:

class MyIPListenerCustomStream extends IPBaseListener {
    @Override
    public void exitIp(IPParser.IpContext ctx) {
        List<TerminalNode> octets = ctx.BYTE();
        System.out.println(octets);
    }
}

Sample output:

[172(0xAC), 0(0x0), 0(0x0), 1(0x1)]
[10(0xA), 10(0xA), 10(0xA), 1(0x1)]
[10(0xA), 10(0xA), 10(0xA), 99(0x63)]
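
The getText override shown earlier appends only the decimal value, while the sample output labels each octet with both decimal and hexadecimal. One way to produce labels in that shape is sketched below. This is an assumption about the format only, not the code that generated the output above; `OctetLabelSketch` and `label` are hypothetical names.

```java
class OctetLabelSketch {
    /** Formats a value as decimal followed by uppercase hex, e.g. 172(0xAC). */
    static String label(int v) {
        return String.format("%d(0x%X)", v, v);
    }

    public static void main(String[] args) {
        int[] ip = { 172, 0, 0, 1 };
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < ip.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(label(ip[i]));
        }
        sb.append(']');
        System.out.println(sb); // prints [172(0xAC), 0(0x0), 0(0x0), 1(0x1)]
    }
}
```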

4 Error handling in binary files

In the following example, the first IP address is missing a 0:

public static final byte[] bytes = {
    (byte)172, 0, 1, (byte)0xCA, (byte)0xFE, // OOOPS
    (byte)10, 10, 10, 1, (byte)0xCA, (byte)0xFE,
    (byte)10, 10, 10, 99
};

The output is:

line 1:4 extraneous input '.' expecting BYTE
line 1:6 mismatched input 'Êþ' expecting '.'
[172, 0, 1, 0]
[10, 10, 10, 1]
[10, 10, 10, 99]

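Here, Êþ is how the bytes (byte)0xCA, (byte)0xFE appear in the message: decoded as Latin-1, 0xCA is the character Ê and 0xFE is the character þ.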
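
Parser error messages like the ones above report token positions, which can be hard to map back to a record in the file. As a complementary check, a plain-Java pass over the raw bytes can locate the malformed record directly. The sketch below assumes a stricter layout than the grammar requires, namely exactly one 4-byte address between markers, as in the sample data; `RecordCheckSketch` and `firstBadRecord` are hypothetical names.

```java
class RecordCheckSketch {
    /**
     * Returns the zero-based index of the first record that is not exactly
     * 4 bytes long, or -1 if every record is well formed.
     * Assumes records are separated by the two-byte marker 0xCA 0xFE.
     */
    static int firstBadRecord(byte[] data) {
        int record = 0; // index of the record currently being counted
        int count = 0;  // bytes seen so far in that record
        int i = 0;
        while (i < data.length) {
            boolean marker = (data[i] & 0xFF) == 0xCA
                    && i + 1 < data.length
                    && (data[i + 1] & 0xFF) == 0xFE;
            if (marker) {
                if (count != 4) {
                    return record;
                }
                record++;
                count = 0;
                i += 2;
            } else {
                count++;
                i++;
            }
        }
        return count == 4 ? -1 : record;
    }

    public static void main(String[] args) {
        byte[] bad = {
            (byte) 172, 0, 1, (byte) 0xCA, (byte) 0xFE, // first address is one byte short
            10, 10, 10, 1, (byte) 0xCA, (byte) 0xFE,
            10, 10, 10, 99
        };
        System.out.println(firstBadRecord(bad)); // prints 0
    }
}
```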

References