Artifacts for Analysis
- Source Code: The primary subject of analysis.
- Design Documentation: UML and other design diagrams.
- Issue Tracking Systems: Help explain bugs, fixes, and features.
- Additional Artifacts: For example, requirements documents and test cases.
Types of Code Analysis
- Static Program Analysis: Analyzing source code and related artifacts without executing the program.
- Dynamic Program Analysis: Evaluating the output or trace data produced while the program executes.
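To make the distinction concrete, here is a minimal Python sketch (the analyzed program and all function names are hypothetical). The static side parses the code with the standard `ast` module and never runs it, so it sees every function definition; the dynamic side executes the code under `sys.settrace` and records only the functions that were actually called.

```python
import ast
import sys

# Hypothetical program under analysis; names are illustrative only.
SOURCE = """
def used(x):
    return x * 2

def unused(x):
    return x + 1

result = used(21)
"""


def static_functions(source: str) -> list[str]:
    """Static analysis: parse the code and list every function it defines, without running it."""
    tree = ast.parse(source)
    return sorted(n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))


def dynamic_calls(source: str) -> list[str]:
    """Dynamic analysis: execute the code and record which functions are actually called."""
    called: set[str] = set()

    def tracer(frame, event, arg):
        code = frame.f_code
        # Record calls to functions of the analyzed program only, skipping the '<module>' frame.
        if event == "call" and code.co_filename == "<analyzed>" and not code.co_name.startswith("<"):
            called.add(code.co_name)
        return None  # no per-line tracing needed

    compiled = compile(source, "<analyzed>", "exec")
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(compiled, {})
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return sorted(called)


print(static_functions(SOURCE))  # ['unused', 'used']
print(dynamic_calls(SOURCE))     # ['used']
```

The static result includes `unused` because it is defined; the dynamic result reflects one concrete execution and therefore misses it.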
Benefits of Static Analysis
- Code Search & Query: Enables quick navigation of a codebase.
- Metrics Extraction: Provides insight into code quality, complexity, etc.
- Code Comprehension & Review: Helps developers understand and review code.
- Reverse Engineering: Recovering how the software works from its code.
- Program Transformation: Changing code structure while preserving behavior.
- Code Optimization: Making software run more efficiently.
- Ensuring Program Correctness: Helps detect bugs and increase confidence that the code is correct.
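As an illustration of metrics extraction, the sketch below computes a simplified, McCabe-style complexity score per function: 1 plus the number of decision points. Which node types count as decision points is an assumption made here for brevity; real metric tools define this more carefully.

```python
import ast

# Node types treated as decision points in this simplified metric (an assumption,
# loosely following McCabe's cyclomatic complexity).
DECISIONS = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def complexity_per_function(source: str) -> dict[str, int]:
    """Return 1 + number of decision points for each function in the source."""
    result = {}
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            inner = list(ast.walk(func))
            decisions = sum(isinstance(n, DECISIONS) for n in inner)
            # 'a and b and c' adds one extra path per additional operand.
            decisions += sum(len(n.values) - 1 for n in inner if isinstance(n, ast.BoolOp))
            result[func.name] = 1 + decisions
    return result


SOURCE = """
def simple(x):
    return x

def branchy(x):
    if x > 0 and x < 10:
        return 1
    for i in range(x):
        pass
    return 0
"""

print(complexity_per_function(SOURCE))  # {'simple': 1, 'branchy': 4}
```

Because `ast.walk` descends into nested functions, their decision points are also counted toward the enclosing function; a real tool would likely separate them.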
Source-Code Granularity/Levels
- Tokens: Smallest lexical units, such as identifiers, keywords, and operators
- Statements: Single instructions, such as an assignment or a call
- Methods/Functions: Collections of statements performing a specific task
- Classes: Encapsulations of data and the methods that operate on it
- Individual Files: Can contain multiple classes or methods
- Groups of Files: Collections of related files
- Complete Programs: The entire software system, consisting of all its files
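Several of these levels can be read off the same snippet. This Python sketch, using the standard `tokenize` and `ast` modules, counts tokens, statements, functions, and classes in one hypothetical class; the file and program levels would aggregate these counts over more input.

```python
import ast
import io
import tokenize

# Hypothetical snippet used only for illustration.
SOURCE = """class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count
"""

# Layout-only token types that we do not count as "real" tokens.
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
          tokenize.ENDMARKER, tokenize.COMMENT}


def levels(source: str) -> dict[str, int]:
    """Count elements of the snippet at several granularity levels."""
    tokens = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
              if t.type not in LAYOUT]
    nodes = list(ast.walk(ast.parse(source)))
    return {
        "tokens": len(tokens),
        "statements": sum(isinstance(n, ast.stmt) for n in nodes),
        "functions": sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes),
        "classes": sum(isinstance(n, ast.ClassDef) for n in nodes),
    }


print(levels(SOURCE))  # {'tokens': 29, 'statements': 6, 'functions': 2, 'classes': 1}
```

Note that class and function definitions are themselves statements in Python's grammar, which is why they appear in the statement count too.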
Code-Level Approaches
- Regular Expressions
  - Lexical view: "A program is a stream of tokens"
- Abstract Syntax Tree (AST)
  - Fully parsed syntax view
  - In practice an Abstract Syntax Graph (ASG): extra edges, e.g., from a name's use to its declaration, turn the tree into a graph
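The "graph" in ASG comes from edges that connect a name's use to its definition, which a pure tree does not have. The sketch below adds such def-use edges on top of Python's `ast`; it deliberately ignores scoping rules (every name is treated as module-global), so it is an illustration rather than a real name resolver. `def_use_edges` is a hypothetical helper.

```python
import ast

SOURCE = """def square(n):
    return n * n

total = square(3)
print(total)
"""


def def_use_edges(source: str) -> list[tuple[str, int, int]]:
    """Return (name, use line, definition line) edges; scoping is ignored on purpose."""
    tree = ast.parse(source)
    definitions: dict[str, int] = {}
    # Pass 1: record where functions, classes, and assigned names are defined.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            definitions[node.name] = node.lineno
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            definitions.setdefault(node.id, node.lineno)
    # Pass 2: link every read of a known name back to its definition.
    edges = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load) and node.id in definitions:
            edges.append((node.id, node.lineno, definitions[node.id]))
    return sorted(edges)


print(def_use_edges(SOURCE))  # [('square', 4, 1), ('total', 5, 4)]
```

Built-ins such as `print` and function parameters such as `n` get no edge here, since the sketch only knows about module-level definitions.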
Source Code is Messy
- Comments
- Literal values
- Preprocessor statements
- Code fragments
- Uncompilable code
- Incomplete set of files
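Uncompilable code affects the two code-level approaches differently. In this sketch (the file contents and helper names are hypothetical), a single missing colon makes `ast.parse` reject the whole file, while a line-oriented regular expression still finds both function definitions.

```python
import ast
import re
from typing import Optional

# Hypothetical file with one syntax error: the second 'def' lacks its colon.
BROKEN = '''def load(path):
    return open(path).read()

def save(path, data)
    open(path, "w").write(data)
'''

DEF_PATTERN = re.compile(r"^\s*def\s+(\w+)", re.MULTILINE)


def functions_by_regex(source: str) -> list[str]:
    """Lexical view: matches line by line, so it tolerates broken code."""
    return DEF_PATTERN.findall(source)


def functions_by_ast(source: str) -> Optional[list[str]]:
    """Parsed view: needs the whole file to be valid; returns None if it is not."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return None
    return [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]


print(functions_by_regex(BROKEN))  # ['load', 'save']
print(functions_by_ast(BROKEN))    # None
```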
Regular Expressions
- Example:
^((From|To)|Subject): ((?(2)\w+@\w+\.[a-z]+|.+))
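The example pattern uses a conditional group: `(?(2)...|...)` applies the first branch only if group 2 (`From` or `To`) matched. So `From`/`To` headers must carry an e-mail address, while `Subject` accepts any text. Python's `re` module supports this syntax, so the pattern can be tried directly; `header_value` is a hypothetical helper, and the e-mail sub-pattern is intentionally simplistic.

```python
import re

# The pattern from the example, unchanged.
HEADER = re.compile(r"^((From|To)|Subject): ((?(2)\w+@\w+\.[a-z]+|.+))")


def header_value(line: str):
    """Return (header name, value) if the line matches, else None."""
    m = HEADER.match(line)
    return (m.group(1), m.group(3)) if m else None


print(header_value("From: alice@example.com"))  # ('From', 'alice@example.com')
print(header_value("Subject: Weekly report"))   # ('Subject', 'Weekly report')
print(header_value("To: not an email"))         # None
```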
- Typical tool: grep
- Fast, faster, fastest
- APIs available in most languages
- Great for simple "parsing"
- Corresponds to lexical analysis (lexer): Characters into tokens
- Works with code of any kind
- Major disadvantage: No context, e.g., "if" is matched inside comments and string literals, not just as a keyword
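The "if" problem can be reproduced in a few lines. A plain regex such as `\bif\b` also matches the word inside comments and string literals, whereas Python's own lexer (the `tokenize` module) classifies those as `COMMENT` and `STRING` tokens, so only the real keyword remains. The sample source and function names are hypothetical.

```python
import io
import re
import tokenize

SOURCE = '''# if the list is empty, skip it
message = "if you see this, it worked"
if items:
    print(message)
'''


def regex_if_count(source: str) -> int:
    """Count 'if' as a whole word, with no knowledge of comments or strings."""
    return len(re.findall(r"\bif\b", source))


def keyword_if_count(source: str) -> int:
    """Count 'if' only where the lexer produces it as a NAME token (i.e., real code)."""
    return sum(1 for tok in tokenize.generate_tokens(io.StringIO(source).readline)
               if tok.type == tokenize.NAME and tok.string == "if")


print(regex_if_count(SOURCE))    # 3
print(keyword_if_count(SOURCE))  # 1
```

Even the lexer cannot tell an `if` statement from a conditional expression such as `a if b else c`; that distinction needs the parser.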
Abstract Syntax Tree

- Compiler view
- Better for more complex parsing
- Corresponds to the parser in compilers: Tokens into trees
- Understands syntax
- Answers questions that compilers need to ask
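One thing a parser knows that a lexer does not is operator precedence and associativity. This sketch parses an expression with Python's `ast` module and prints it fully parenthesized, which exposes the tree the parser built; `show_structure` and the `OPS` table are hypothetical names and cover only four arithmetic operators.

```python
import ast

# Operator symbols for the BinOp node types this sketch handles.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def parenthesize(node: ast.AST) -> str:
    """Render an expression tree with explicit parentheses around every operation."""
    if isinstance(node, ast.BinOp):
        return f"({parenthesize(node.left)} {OPS[type(node.op)]} {parenthesize(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(type(node).__name__)


def show_structure(expr: str) -> str:
    """Parse a single expression and show how the parser grouped it."""
    return parenthesize(ast.parse(expr, mode="eval").body)


print(show_structure("a + b * c"))    # (a + (b * c))
print(show_structure("a - b - c"))    # ((a - b) - c)
print(show_structure("2 * (x + 1)"))  # (2 * (x + 1))
```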
Abstract Syntax Tree: Disadvantages

- Compiler view
- Cannot parse code fragments
- Uni-preprocessor view: Sees only one preprocessor configuration (a single set of #ifdef choices)
- Cannot handle non-compilable code
- Takes a lot of space
- Slow, slow, slow
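Two of these disadvantages are easy to observe with Python's `ast` module (Python has no preprocessor, so the uni-preprocessor point is not shown). The sketch below checks whether a fragment parses on its own and compares the number of significant tokens with the number of AST nodes for a one-line statement; the helper names are hypothetical.

```python
import ast
import io
import tokenize


def parses(source: str) -> bool:
    """True if the source is a complete, syntactically valid Python module."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def size_comparison(source: str) -> tuple[int, int]:
    """Return (significant token count, AST node count) for the source."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
            tokenize.ENDMARKER, tokenize.COMMENT}
    tokens = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
              if t.type not in skip]
    nodes = list(ast.walk(ast.parse(source)))
    return len(tokens), len(nodes)


# A fragment copied out of a larger 'if' statement: valid in context, unparsable alone.
FRAGMENT = "else:\n    retry()\n"
print(parses(FRAGMENT))     # False
print(parses("retry()\n"))  # True
print(size_comparison("x = a + b * c\n"))  # more AST nodes than tokens
```

Even for `x = a + b * c`, the tree has more nodes than the statement has tokens, because operators and load/store contexts become nodes of their own.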