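Parser Generators

Concepts

A grammar file serves as a DSL that drives construction of the parser: a parser generator reads the grammar file and generates the parser from it. To change the parser, update the grammar and regenerate.

As the name suggests, mature parser generators exist that produce the parser code for us; our job is to write the grammar file.

The examples below use an ANTLR grammar (ANTLR 3 syntax and runtime classes); they are meant only to convey the concepts.

ANTLR grammar example

Text to parse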

greetings.txt...
hello Rebecca
hello Neal
hello Ola

Define the grammar

Greetings.g...
grammar Greetings;

@header {
package helloAntlr;
}

@lexer::header {
package helloAntlr;
}

script : greeting* EOF;
greeting : 'hello' Name;

Name : ('a'..'z' | 'A'..'Z')+;

WS : (' ' |'\t' | '\r' | '\n')+ {skip();} ;
COMMENT : '#'(~'\n')* {skip();} ;
ILLEGAL : .;
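
To make concrete what the two parser rules describe, here is a hand-written sketch in plain Java with no ANTLR dependency. It is not what ANTLR generates; the class name `GreetingsSketch` and its methods are hypothetical, and the tokenizer is deliberately simplified (it splits on whitespace instead of matching characters one at a time, so it does not reproduce the ILLEGAL rule).

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration of the Greetings grammar (hypothetical, not ANTLR output).
class GreetingsSketch {

    // Lexer step: discard '#' comments and whitespace, roughly like the COMMENT and WS rules.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String line : text.split("\n")) {
            int hash = line.indexOf('#');
            String code = hash >= 0 ? line.substring(0, hash) : line;
            for (String word : code.trim().split("\\s+")) {
                if (!word.isEmpty()) tokens.add(word);
            }
        }
        return tokens;
    }

    // Parser step: script : greeting* EOF;  greeting : 'hello' Name;
    static List<String> parse(String text) {
        List<String> tokens = tokenize(text);
        List<String> guests = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i += 2) {
            String keyword = tokens.get(i);
            if (!keyword.equals("hello")) {
                throw new IllegalArgumentException("expected 'hello' but found '" + keyword + "'");
            }
            if (i + 1 >= tokens.size()) {
                throw new IllegalArgumentException("missing name after 'hello'");
            }
            String name = tokens.get(i + 1);
            // The keyword itself is not a valid Name, and a Name is letters only.
            if (name.equals("hello") || !name.matches("[a-zA-Z]+")) {
                throw new IllegalArgumentException("invalid name '" + name + "'");
            }
            guests.add(name);
        }
        return guests;
    }
}
```

Calling `GreetingsSketch.parse("hello Rebecca\nhello Neal\nhello Ola")` returns the three names in order. Every change to the language means editing this code by hand, which is the work a parser generator takes over.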

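Build the parser

This step generates the Java source files for the lexer and the parser from the grammar.

The Ant build script below does this; it is shown for reference only.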

<property name="dir.src" value="src"/>
<property name="dir.gen" value="gen"/>
<property name="dir.lib" value="lib"/>
<path id="path.antlr">
  <fileset dir="${dir.lib}">
    <include name="antlr*.jar"/>
    <include name="stringtemplate*.jar"/>
  </fileset>
</path>
<target name="gen">
  <mkdir dir="${dir.gen}/helloAntlr"/>
  <java classname="org.antlr.Tool" classpathref="path.antlr" fork="true" failonerror="true">
    <arg value="-fo"/>
    <arg value="${dir.gen}/helloAntlr"/>
    <arg value="${dir.src}/helloAntlr/Greetings.g"/>
  </java>
</target>

Using the generated parser

  1. Code that uses the generated parser
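import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRReaderStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

class GreetingsLoader {
    private Reader input;
    // Stays empty for now: the grammar has no actions that record guests yet.
    private List<String> guests = new ArrayList<String>();

    public GreetingsLoader(Reader input) {
        this.input = input;
    }

    public List<String> run() {
        try {
            // The lexer turns characters into tokens; the parser consumes the token stream.
            GreetingsLexer lexer = new GreetingsLexer(new ANTLRReaderStream(input));
            GreetingsParser parser = new GreetingsParser(new CommonTokenStream(lexer));
            parser.script();
            return guests;
        } catch (IOException e) {
            throw new RuntimeException(e);
        } catch (RecognitionException e) {
            throw new RuntimeException(e);
        }
    }
}

// Method of a JUnit test class.
@Test
public void readsValidFile() throws Exception {
    Reader input = new FileReader("src/helloAntlr/greetings.txt");
    GreetingsLoader loader = new GreetingsLoader(input);
    loader.run();
}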
  2. Define a test input file that contains an invalid line
invalid.txt...
hello Rebecca
XXhello Neal
hello Ola

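Adding behavior code to the grammar

Override the default error-handling method reportError in the grammar and delegate the work to a helper object.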

Greetings.g...
@members {
    GreetingsLoader helper;
    public void reportError(RecognitionException e) {
        helper.reportError(e);
    }
}

class GreetingsLoader {
    // The input field and constructor are unchanged from the earlier listing.
    private List<RecognitionException> errors = new ArrayList<RecognitionException>();

    void reportError(RecognitionException e) {
        errors.add(e);
    }

    public boolean hasErrors() { return !isOk(); }
    public boolean isOk() { return errors.isEmpty(); }

    private String errorReport() {
        if (isOk()) return "OK";
        StringBuilder result = new StringBuilder();
        for (RecognitionException e : errors) result.append(e.toString()).append("\n");
        return result.toString();
    }

    public void run() {
        try {
            GreetingsLexer lexer = new GreetingsLexer(new ANTLRReaderStream(input));
            GreetingsParser parser = new GreetingsParser(new CommonTokenStream(lexer));
            // Hand the loader to the parser so that reportError can delegate to it.
            parser.helper = this;
            parser.script();
            if (hasErrors()) throw new RuntimeException("it all went pear-shaped\n" + errorReport());
        } catch (IOException e) {
            throw new RuntimeException(e);
        } catch (RecognitionException e) {
            throw new RuntimeException(e);
        }
    }
}
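
The delegation idea can be tried without ANTLR. The sketch below is a stand-alone illustration: `ErrorCollector` plays the role of the loader and `DelegatingChecker` stands in for the generated parser. Both names are hypothetical, and the line-based check is only an assumed simplification of real parsing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the loader: it owns the error list and the report.
class ErrorCollector {
    private final List<String> errors = new ArrayList<>();

    void reportError(String message) { errors.add(message); }

    boolean isOk() { return errors.isEmpty(); }

    String errorReport() {
        return isOk() ? "OK" : String.join("\n", errors);
    }
}

// Hypothetical stand-in for the generated parser: it only detects problems
// and forwards them to its helper instead of handling them itself.
class DelegatingChecker {
    ErrorCollector helper;

    void check(String text) {
        String[] lines = text.split("\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) continue;
            // A valid greeting is the keyword followed by a letters-only name.
            if (!line.matches("hello\\s+[a-zA-Z]+")) {
                helper.reportError("line " + (i + 1) + ": cannot parse '" + line + "'");
            }
        }
    }
}
```

Running the checker over the contents of invalid.txt records one error for the second line and none for the others, so the caller can inspect everything that went wrong after the run instead of stopping at the first problem.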

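Adding behavior code through hooks

That is, the grammar's actions call methods that are implemented in a handwritten superclass of the generated parser.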

Greetings.g...
grammar Greetings;
options {superClass = BaseGreetingsParser;}
@header {
package subclass;
}
@lexer::header {
package subclass;
}
script : greeting * EOF;
greeting : 'hello' n=Name {recordGuest($n);};
Name : ('a'..'z' | 'A'..'Z')+;
WS : (' ' |'\t' | '\r' | '\n')+ {skip();} ;
COMMENT : '#'(~'\n')* {skip();} ;
ILLEGAL : .;

The handwritten superclass

abstract public class BaseGreetingsParser extends Parser {
    public BaseGreetingsParser(TokenStream input) {
        super(input);
    }

    //---- helper methods
    // Called from the grammar action {recordGuest($n);}
    void recordGuest(Token t) { guests.add(t.getText()); }
    List<String> getGuests() { return guests; }
    private List<String> guests = new ArrayList<String>();

    //-------- Error Handling -------------------------------
    private List<RecognitionException> errors = new ArrayList<RecognitionException>();
    public void reportError(RecognitionException e) {
        errors.add(e);
    }
    public boolean hasErrors() { return !isOk(); }
    public boolean isOk() { return errors.isEmpty(); }
}
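
The split between generated code and a handwritten superclass can be sketched without ANTLR. In the hypothetical example below, `BaseGreeter` is the handwritten part that holds the behavior, and `GeneratedGreeter` stands in for generated code that only recognizes input and calls the hook. Neither class comes from ANTLR, and the recognition logic is an assumed simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Handwritten superclass: all behavior lives here and survives regeneration.
abstract class BaseGreeter {
    private final List<String> guests = new ArrayList<>();

    // Hook called by the generated subclass whenever it recognizes a greeting.
    void recordGuest(String name) { guests.add(name); }

    List<String> getGuests() { return guests; }
}

// Stand-in for generated code: it recognizes greetings and calls the hook,
// but contains no behavior of its own.
class GeneratedGreeter extends BaseGreeter {
    void script(String text) {
        for (String line : text.split("\n")) {
            String[] words = line.trim().split("\\s+");
            if (words.length == 2 && words[0].equals("hello")) {
                recordGuest(words[1]);
            }
        }
    }
}
```

With this arrangement the subclass can be regenerated at any time without touching the handwritten code, and the grammar file stays free of anything beyond short method calls.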