The normal way of doing this would be to get the AST of your program and then simply search for the variable declarations you need. Grammars, as suggested, are a nice way of generating such an AST.
But if you need to analyse your program on the fly, you can't use this option, because your code might have parse errors. In this case I feel your pain...
Your only option is to scan the raw source text yourself, and regular expressions might help a bit.
First, I would begin with a regex similar to this:
(double|long|string|bool|object)\s*(\[\s*\])?\s+(YOUR_VARIABLE_TOKEN)
Note: YOUR_VARIABLE_TOKEN is left as a placeholder because each language has strict, well-defined rules for how a variable name can be formed.
I haven't tested this regex and it certainly isn't perfect. It is only meant to give you an idea.
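To make the idea concrete, here is a minimal Python sketch of that regex with the placeholder filled in. The identifier pattern (a letter or underscore followed by word characters) is my assumption, as is the leading `\b`, and `find_declarations` is just an illustrative helper name.

```python
import re

# Assumption: identifiers are a letter or underscore followed by word characters.
IDENTIFIER = r"[A-Za-z_]\w*"

# Same shape as the regex above, with a leading \b so "xbool a" is not matched.
DECLARATION = re.compile(
    r"\b(double|long|string|bool|object)\s*(\[\s*\])?\s+(" + IDENTIFIER + r")"
)

def find_declarations(source):
    """Return (type, is_array, name) for every candidate declaration in source."""
    return [
        (m.group(1), m.group(2) is not None, m.group(3))
        for m in DECLARATION.finditer(source)
    ]
```

These are only candidates: the pattern knows nothing about the context in which a match appears.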
Second, you would have to validate these matches with certain exception cases. For instance:
- The declaration might be inside a string literal:
"bool a;"
- The declaration might be inside a comment:
/* bool a; */
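One way to handle both cases is to blank out literals and comments before looking for declarations. The sketch below assumes C-style syntax (quoted literals with backslash escapes, `//` and `/* */` comments); `mask_literals_and_comments` is a made-up name, and language-specific forms such as verbatim strings are not covered.

```python
import re

# Assumption: C-style literals and comments; single-line literals only.
SKIP = re.compile(
    r'"(?:\\.|[^"\\\n])*"'      # double-quoted literal
    r"|'(?:\\.|[^'\\\n])*'"     # single-quoted literal
    r"|//[^\n]*"                # line comment
    r"|/\*.*?\*/",              # block comment, may span lines
    re.DOTALL,
)

def mask_literals_and_comments(source):
    """Replace literals and comments with spaces, keeping offsets and newlines."""
    def blank(match):
        return re.sub(r"[^\n]", " ", match.group(0))
    return SKIP.sub(blank, source)
```

Because the masked text has the same length as the input, match positions found in it still point at the right place in the original source. A literal or comment that is not closed yet (quite likely in code that is still being typed) is simply left unmasked here, so this remains a heuristic.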
This is not such a strange request, by the way. Eclipse does this kind of analysis too in some cases, such as indentation.
It is not an easy task though, especially finding those exception cases. Good luck.
-------------------
What you are attempting to do is very difficult, if not impossible with regular expressions, especially as you have support for string constructs. What happens if I do this:
a = 'b = 3;';
That is, in this case your regular expression would first have to recognise and skip over the string literal in order to work.
You really need to perform proper parsing of your code before you are going to be able to perform any meaningful analysis.
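As an analogy only: Python ships a parser for Python source in its standard library, so the difference is easy to show there. This is not a parser for your language, and `assigned_names` is a hypothetical helper, but it shows how an AST treats the string above as a single literal.

```python
import ast

def assigned_names(source):
    """Return the set of names bound by simple assignments in Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names
```

For the line `a = 'b = 3;';` (which happens to be valid Python as well) this reports only `a`, because `b = 3;` is just the text of a string constant as far as the parser is concerned.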
-------------------
I also doubt that regular expressions are suitable for this.
As Kragen has demonstrated, there are cases where regular expressions will match some piece of source code, but they will ignore the context in which that bit of source code appears. This can lead to errors. While it might be possible to write smarter regular expressions for such cases as Kragen showed, they will quickly become extremely complex and hard to read/maintain/understand, because they have to consider many different possible contexts.
I'd prefer writing a parser using a parser generator (such as Yacc or Bison). But depending on the language of your source code, that can also be quite tricky.
-------------------
What to find exactly?
Do you have to find only the literals (constants) or the whole declaration? It's OK to use regular expressions to find literals, but it's a little more complicated to parse the entire code.
Give grammars a chance
If you have to parse all the code... Do you know grammar analyzers? When I studied 'language theory' we used grammars for parsing code. You can define a basic lexical analyzer with regexes for the tokens (constants, reserved words, symbols, etc.) and use a grammar analyzer for the overall structure.
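The token half of that split could look roughly like this in Python. The token kinds and their patterns are assumptions picked for illustration (a tiny C#-like subset), not a complete lexer.

```python
import re

# Assumption: a tiny C#-like token set, chosen only for illustration.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r'"(?:\\.|[^"\\])*"'),
    ("KEYWORD", r"\b(?:double|long|string|bool|object)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("SYMBOL", r"[\[\]{}();=,+\-*/]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(source):
    """Yield (kind, text) pairs; characters no rule matches are reported as ERROR."""
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            yield ("ERROR", source[pos])
            pos += 1
            continue
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())
        pos = match.end()
```

Once the input is a token stream, a string literal is one STRING token, so whatever consumes the tokens never sees a `bool b;` hidden inside it.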
A Java option is JavaCC. There must be a .NET equivalent.
Basically a grammar analyzer can parse complex structures because it has 'memory'. Just as a finite-state automaton (FSA) is equivalent to a regex, an FSA with a stack (the stack is its memory), i.e. a pushdown automaton, is equivalent to a context-free grammar. It has more processing power.
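A small illustration of why the stack matters, as a sketch: checking that brackets nest correctly needs memory of what is currently open, which a pure regular expression in the formal sense does not have. `is_balanced` is an illustrative name and the code ignores string literals and comments.

```python
def is_balanced(text):
    """Check bracket nesting with an explicit stack."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)          # remember what was opened
        elif ch in pairs:
            # a closer must match the most recently opened bracket
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                  # nothing may be left open
```

A grammar-based parser does the same kind of bookkeeping for every nested construct in the language, not just for brackets.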
-------------------
Given that this is .NET, you could consider using CodeDOM to get it parsed properly.
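Use the pre-existing C# CodeDOM provider to get a structured representation of your source code by using the Parse method, then traverse it. In principle this lets you build a solution that works for pretty much ANY .NET language. Check first that the provider you pick actually implements Parse, though: as far as I know, the stock CSharpCodeProvider does not and throws NotImplementedException.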
Even though you said it had to be done before compilation, you might be able to use a temporary in-memory compilation, which you can then work with using reflection. The CodeDOM provider can also help you there.
-------------------
Don't use regular expressions. What you are doing is type deduction, and I'm guessing you're doing it for school. They will want you to learn another way, such as logic unification.
You're relying on all the types always being obviously different. What if 0 or 1 is assigned to a Boolean? Regular expressions aren't good at reducing input. Your program will, at best, result in a list of identifiers of each type. There are better approaches.
If you are in a commercial environment, your solution will be totally non-scalable, unmaintainable, unreliable, and slow to implement, as by your own admission, this isn't your strong suit.
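Unless you are merely doing homework, you should have access to a parser for this language. If not, you should start over with a parser generator such as Bison. If you are doing homework, hitting the books is the way to go.
EDIT:
I forgot to say what to do with Bison :vP. Keep a data structure for each variable. It should contain the set of possible types, say an
unsigned int
with one bit representing each type,
enum type_bits { double_bit = 1, long_bit = 2, string_bit = 4, … };
Start with all bits set to one, i.e.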
type_map = (type_bits) -1;
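A rough sketch of this bookkeeping, in Python rather than C, covering the all-bits-set start as well as the masking and precedence steps this answer describes. Only the three bits named in the enum are modelled; the `Variable` class, `constrain`, `resolve` and the precedence order are made up for illustration.

```python
from enum import IntFlag

class TypeBits(IntFlag):
    # Only the three bits named in the answer; the rest of the enum is elided there.
    DOUBLE = 1
    LONG = 2
    STRING = 4

ALL_TYPES = TypeBits.DOUBLE | TypeBits.LONG | TypeBits.STRING

# Hypothetical precedence: prefer LONG, then DOUBLE, then STRING.
PRECEDENCE = [TypeBits.LONG, TypeBits.DOUBLE, TypeBits.STRING]

class Variable:
    def __init__(self, name):
        self.name = name
        self.type_map = ALL_TYPES  # start with every type still possible

    def constrain(self, compatible):
        """Mask off every type that the observed operation rules out."""
        self.type_map &= compatible

    def resolve(self):
        """Pick the surviving type by precedence, or fail if none is left."""
        for candidate in PRECEDENCE:
            if self.type_map & candidate:
                return candidate
        raise TypeError(f"no type fits all uses of {self.name!r}")
```

For example, seeing `x = 3` might call `x.constrain(TypeBits.DOUBLE | TypeBits.LONG)`, and a later `x = x / 2.5` might call `x.constrain(TypeBits.DOUBLE)`, leaving DOUBLE as the only candidate. Which operations allow which types is a decision for your language's rules, not something this sketch settles.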
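As each operation is encountered, mask off the bits that are incompatible with it. When you are done, a few bits will be left set. Apply precedence rules if more than one is left, and generate an error if none is.
-------------------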
Have you looked at the source code of the Mono project's C# compiler? There may be some ideas in there that you'll find useful.
svn co svn://anonsvn.mono-project.com/source/trunk/mcs
Source
https://stackoverflow.com/questions/2005879