
Using Regex To Pass Syntax-valid C++ Declaration/initialization

This is for a syntax checker (yes, I know using regex is not ideal). The reader has already detected the int|float|char|bool part, and now it needs to check if the declara

Solution 1:

No regex will be powerful enough to parse C++ declarations in general, for the very simple reason that the grammar is severely context-sensitive (and, since template instantiation can perform arbitrary computation, is in all likelihood actually undecidable).

For example, using the IsPrime template defined here, you can write a declaration like

int a = foo<IsPrime<234799>>::typen<1>();

which is syntactically valid if and only if 234799 is prime.
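
A much smaller illustration of the same context sensitivity, as a sketch: the statement a * b; is a pointer declaration when a names a type, and a multiplication when a names an object. The names Probe, parsed_as_declaration and parsed_as_expression are made up for this example.

```cpp
// Hypothetical helper type: multiplying two Probes records that operator* ran.
struct Probe {
    int* calls;
};

void operator*(const Probe& lhs, const Probe&) {
    ++*lhs.calls;
}

// Here `a` names a type, so `a * b;` declares `b` as a pointer to `a`.
bool parsed_as_declaration() {
    struct a {};
    a * b;
    b = nullptr;
    return b == nullptr;
}

// Here `a` and `b` name objects, so the very same tokens are a multiplication.
int parsed_as_expression() {
    int calls = 0;
    Probe a{&calls};
    Probe b{&calls};
    a * b;
    return calls;  // 1: operator* was called exactly once
}
```

A pattern that only sees the characters a * b; cannot tell these two cases apart, because the answer depends on declarations that appear elsewhere in the program.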

Consider using a different approach to validate C++ (e.g. g++ -fsyntax-only).
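
If the checker is itself written in C++, one way to do that is to shell out to the compiler. This is only a rough sketch: it assumes g++ is on the PATH, that a POSIX shell is available, and that the file path contains no single quote. The function names are made up for illustration.

```cpp
#include <cstdlib>
#include <string>

// Builds the compiler command line. The path is wrapped in single quotes
// for a POSIX shell and is assumed not to contain a single quote itself.
std::string syntax_check_command(const std::string& path) {
    return "g++ -fsyntax-only '" + path + "'";
}

// Runs the command and reports whether the compiler exited with status 0.
// Assumes g++ is installed; a missing compiler also yields `false`.
bool has_valid_syntax(const std::string& path) {
    return std::system(syntax_check_command(path).c_str()) == 0;
}
```

Since the declaration fragment alone is not a complete translation unit, the checker would first have to write it into a small source file, with whatever surrounding context it needs.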


Solution 2:

As nneonneo mentioned, regex is not suitable for the task, but if you want to match the sample strings you have, you can use this:

^(?:\s*[A-Za-z_][A-Za-z0-9]*\s*(?:=\s*(?:[A-Za-z0-9]+(?:[+\/*-][A-Za-z0-9]+)?|"[^"]*"|'[^']*'))?\s*,)*\s*[A-Za-z_][A-Za-z0-9]*\s*(?:=\s*(?:[A-Za-z0-9]+(?:[+\/*-][A-Za-z0-9]+)?|"[^"]*"|'[^']*'))?\s*;

A couple of things I changed from your regex:

  • Changed [A-z] to [A-Za-z].

  • Put the =\s* 'outside' because it was quite repetitive.

  • Added square brackets to the bare 0-9. I believe it was meant to be a character class.

  • Added letters to the character class [0-9].

  • Changed all the [^] to [^"] and [^'] where appropriate. I'm not too sure what you were trying, but just in case.

  • Added the basic arithmetic operators, each followed by digits (or letters, for variables): (?:[+/*-][A-Za-z0-9]+)?.

  • Changed the * in the first character class after = to + to prevent an immediate , after =.
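
The first change matters because, in ASCII, the range A-z also covers the six characters that sit between Z and a: [ \ ] ^ _ and `. A quick way to see this with std::regex (class_accepts is just a helper name made up for this sketch, and it assumes the default "C" locale):

```cpp
#include <regex>
#include <string>

// True when `text` is exactly one character accepted by `char_class`,
// e.g. class_accepts("[A-Za-z]", "q").
bool class_accepts(const std::string& char_class, const std::string& text) {
    return std::regex_match(text, std::regex(char_class));
}
```

With this helper, class_accepts("[A-z]", "^") is true while class_accepts("[A-Za-z]", "^") is false, so the original class would have let characters such as ^ and [ start an identifier.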

regex101 demo.

EDIT:

^(?:\s*[A-Za-z_][A-Za-z0-9_]*\s*(?:=\s*(?:[A-Za-z0-9_]+(?:\s*[+\/*-]\s*[A-Za-z0-9_]+)*|[0-9]+(?:\.[0-9]+)?(?:\s*[+\/*-]\s*[0-9]+(?:\.[0-9]+)?)+|"[^"]*"|'[^']*'))?\s*,)*\s*[A-Za-z_][A-Za-z0-9_]*\s*(?:=\s*(?:[A-Za-z0-9_]+(?:\s*[+\/*-]\s*[A-Za-z0-9_]+)*|[0-9]+(?:\.[0-9]+)?(?:\s*[+\/*-]\s*[0-9]+(?:\.[0-9]+)?)+|"[^"]*"|'[^']*'))?\s*;$

This version allows more whitespace and allows underscores in variable names.
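
Because the pattern spells out the same declarator twice, it may be easier to maintain if it is assembled from pieces. The following is a sketch of that idea with std::regex (ECMAScript syntax). It is meant to build the same pattern as the edited one above; the piece names and the function names are my own, not part of the original regex.

```cpp
#include <regex>
#include <string>

// Assembles the edited pattern from named pieces (names are illustrative).
std::string declarator_list_pattern() {
    const std::string ident   = R"([A-Za-z_][A-Za-z0-9_]*)";
    const std::string operand = R"([A-Za-z0-9_]+)";
    const std::string number  = R"([0-9]+(?:\.[0-9]+)?)";
    const std::string op      = R"(\s*[+/*-]\s*)";
    const std::string quoted  = R"re("[^"]*"|'[^']*')re";

    // Initializer: operand arithmetic, numeric arithmetic, or a quoted literal.
    const std::string value = "(?:" + operand + "(?:" + op + operand + ")*|" +
                              number + "(?:" + op + number + ")+|" + quoted + ")";

    // One declarator: a name followed by an optional "= value".
    const std::string declarator =
        "\\s*" + ident + "\\s*(?:=\\s*" + value + ")?\\s*";

    // Zero or more "declarator," groups, a final declarator, then ";".
    return "^(?:" + declarator + ",)*" + declarator + ";$";
}

// Checks the text that follows the type keyword, e.g. " a = 5, b = c + 2;".
bool accepts_declarators(const std::string& text) {
    static const std::regex pattern(declarator_list_pattern());
    return std::regex_match(text, pattern);
}
```

On some made-up inputs, this accepts "x;", " a = 5, b = c + 2;" and "ratio = 3.14 * 2;", and rejects "x = ;", "9lives = 1;" and "x = 5" (no semicolon). Note that a lone float such as "x = 1.5;" is rejected by the pattern as written, because the numeric branch requires at least one operator. If that is not what you want, changing the trailing + on that branch to * should accept it.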

