Suspicious characters in a regular expression

ID: go/suspicious-character-in-regex
Kind: path-problem
Security severity: 7.8
Severity: warning
Precision: high
Tags:
   - correctness
   - security
   - external/cwe/cwe-20
Query suites:
   - go-code-scanning.qls
   - go-security-extended.qls
   - go-security-and-quality.qls

Click to see the query in the CodeQL repository

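When a character in a string literal or regular expression literal is preceded by a backslash, it is interpreted as part of an escape sequence. For example, the escape sequence \n in a string literal corresponds to a single newline character, not to the two characters \ and n.

Two Go escape sequences can produce surprising results. First, regexp.Compile("\a") matches the bell character, whereas regexp.Compile("\\A") matches the start of the text. The pattern in regexp.Compile("\\a") is how Vim, but not Go, matches any alphabetic character (in Go's regular expression syntax, \a is again the bell character). Second, regexp.Compile("\b") matches a backspace character, whereas regexp.Compile("\\b") matches a word boundary. Confusing one for the other can make a regular expression pass or fail far more often than expected, with potential security consequences.

This is less of a problem in Go than in some other languages, because Go accepts only valid escape sequences, both in ordinary strings and in regular expressions. For example, s := "\k" will not compile because there is no such escape sequence, and regexp.MustCompile("\\k") will panic because \k does not refer to a character class or other special token in Go's regular expression grammar.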
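The difference between the two levels of escaping can be observed directly. The following is a minimal sketch, not part of the query itself; the helper names matches and compiles are hypothetical and introduced only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether the pattern, as the regexp package receives it,
// matches the input. It panics if the pattern is invalid.
// (Hypothetical helper, for illustration only.)
func matches(pattern, input string) bool {
	return regexp.MustCompile(pattern).MatchString(input)
}

// compiles reports whether the regexp package accepts the pattern.
// (Hypothetical helper, for illustration only.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// The compiler turns "\b" into one byte (0x08); "\\b" stays two characters.
	fmt.Println(len("\b"), len("\\b"))

	// "\b" is a literal backspace: it matches "\x08" but not an ordinary word.
	fmt.Println(matches("\b", "word"), matches("\b", "\x08"))

	// "\\b" is a word boundary: it matches at the start of "word",
	// but not in a string that contains no word characters.
	fmt.Println(matches("\\b", "word"), matches("\\b", "\x08"))

	// "\a" is the bell character (0x07); "\\A" anchors at the start of the text.
	fmt.Println(matches("\a", "abc"), matches("\\Aabc", "abc"))

	// "\\k" is not a valid escape in Go's regular expression syntax.
	fmt.Println(compiles("\\k"))
}
```

Because regexp.MustCompile panics on an invalid pattern, the sketch uses regexp.Compile wherever rejection is the expected outcome.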

Recommendation

Ensure that the correct number of backslashes is used when escaping characters in strings and regular expressions.

Example

The following example code fails to check for a forbidden word in an input string:

package main

import "regexp"

func broken(hostNames []byte) string {
	var hostRe = regexp.MustCompile("\bforbidden.host.org")
	if hostRe.Match(hostNames) {
		return "Must not target forbidden.host.org"
	} else {
		// This will be reached even if hostNames is exactly "forbidden.host.org",
		// because the literal backspace is not matched
		return ""
	}
}

The check does not work, but it can be fixed by escaping the backslash:

package main

import "regexp"

func fixed(hostNames []byte) string {
	var hostRe = regexp.MustCompile("\\bforbidden.host.org")
	if hostRe.Match(hostNames) {
		return "Must not target forbidden.host.org"
	} else {
		// hostNames definitely doesn't contain a word "forbidden.host.org", as "\\b"
		// is the word-boundary anchor, not a literal backspace.
		return ""
	}
}

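Alternatively, you can use backtick-delimited raw string literals. For example, the \b in regexp.Compile(`hello\bworld`) matches a word boundary, not a backspace character, because within backticks \b is not an escape sequence.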
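As a rough illustration of this equivalence, the sketch below compiles the host pattern from the example both ways and compares the results. The variable names escaped and raw are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Interpreted string literal: the backslash must be doubled.
	escaped = regexp.MustCompile("\\bforbidden.host.org")
	// Raw string literal: the backslash is passed through unchanged.
	raw = regexp.MustCompile(`\bforbidden.host.org`)
)

func main() {
	input := []byte("forbidden.host.org")

	// Both literals hand the same source text to the regexp package.
	fmt.Println(escaped.String() == raw.String())

	// Both therefore match the forbidden host name.
	fmt.Println(escaped.Match(input), raw.Match(input))
}
```

Raw string literals tend to be easier to read for patterns with many backslashes, although they cannot themselves contain a backtick.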

References

golang.org: Overview of the Regexp package.
Google: Syntax of regular expressions accepted by RE2.
Common Weakness Enumeration: CWE-20.