Code Analysis

Michael L. Collard, Ph.D.

Department of Computer Science, The University of Akron

Artifacts

  • UML and design documents
  • Issue-tracking
  • Others?
  • Code

Code Analysis

  • static program analysis
      • Analysis of software source code and other artifacts

  • dynamic program analysis
      • Analysis of output/trace data from running programs

Uses for Static Analysis

  • Search/Query
  • Metrics
  • Support Program Understanding/Comprehension, Code Review
  • Reverse Engineering
  • Program Transformation
  • Program Optimization
  • Program Correctness

Source Code Granularity/Levels

  • Example
  • Tokens
  • Statements
  • Methods/functions
  • Classes
  • Files
  • Set of files
  • Complete programs - all files

Code Level Approaches

  • regular expressions
      • lexical view, "Program is a stream of tokens"
  • Abstract Syntax Tree (AST)
      • fully-parsed syntax view
      • Really an Abstract Syntax Graph (ASG)
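
To make the lexical view concrete, here is a minimal sketch that turns one line of source into a stream of tokens. The slides do not name a language or tool; Python and its standard `tokenize` module are used purely as a convenient stand-in, and the sample line is made up for illustration.

```python
import io
import tokenize

# Hypothetical one-line program, used only for illustration.
source = "x = y + 1  # add one\n"

# The lexical view: the program is just a flat stream of (kind, text) tokens.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

for kind, text in tokens:
    print(kind, repr(text))
```

Note that the comment survives as a token of its own. The stream has no nesting, which is the information a fully-parsed syntax view adds.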

Source Code is Messy

  • Comments
  • Literal values
  • Preprocessor statements
  • Code fragments
  • Uncompilable code
  • Incomplete set of files

Regular Expressions

  • Example: ^((From|To)|Subject): ((?(2)\w+@\w+\.[a-z]+|.+))
  • grep
  • Fast, faster, fastest
  • APIs in most languages
  • Great for simple parsing
  • Corresponds to lexical analysis (lexer): Characters into tokens
  • Major disadvantage: No context, e.g., cannot tell whether "if" is a keyword or text inside a comment or string
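
The points above can be sketched with Python's standard `re` module, chosen here only for illustration. The first part applies the example expression from this slide to a few made-up header lines. The second part shows the context problem by searching a hypothetical C-like snippet for "if".

```python
import re

# The example expression: an address is required after From/To,
# but any text is accepted after Subject (conditional on group 2).
header = re.compile(r"^((From|To)|Subject): ((?(2)\w+@\w+\.[a-z]+|.+))")

from_match = header.match("From: alice@example.com")
subject_match = header.match("Subject: Hello world")
bad_match = header.match("To: not an email")  # None: no address after To

# The context problem: a regular expression cannot tell a keyword
# from the same text inside a comment or a string literal.
snippet = '''
// check if ready
if (ready) { puts("if only"); }
'''
if_hits = re.findall(r"\bif\b", snippet)
print(len(if_hits))  # three hits, but only one is an actual if-statement
```

Only the middle hit is a real if-statement. Telling the three apart needs knowledge of comments and string literals, which a single pattern does not have.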

Abstract Syntax Tree

  • Compiler view
  • Better for more complex parsing
  • Corresponds to the parser in compilers: Tokens into trees
  • Understands syntax
  • Answers questions that compilers need to ask
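
As a rough illustration of "understands syntax", the sketch below parses a small made-up Python program with the standard `ast` module and counts real if-statements. Python is only a stand-in here; the slides do not prescribe a parser or language.

```python
import ast

# Hypothetical program: "if" appears in a comment, as a keyword, and in a string.
program = '''
# check if ready
if ready:
    print("if only")
'''

tree = ast.parse(program)

# Walk the tree and keep only genuine if-statement nodes.
if_nodes = [node for node in ast.walk(tree) if isinstance(node, ast.If)]
print(len(if_nodes))  # the comment and the string literal are not counted
```

The text "if" occurs three times in the program, but the tree contains exactly one `If` node, because the parser knows which occurrence is syntax.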

Abstract Syntax Tree: Disadvantages

  • Compiler view
  • No code fragments
  • Uni-preprocessor view: sees only one preprocessor configuration
  • Cannot handle non-compilable code
  • Takes a lot of space
  • Slow, slow, slow
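
The "no code fragments" and "cannot handle non-compilable code" points can be seen in a small sketch, again using Python's standard `ast` module as an assumed stand-in parser. The helper name `can_parse` and the sample strings are hypothetical.

```python
import ast

def can_parse(code: str) -> bool:
    """Return True if the parser accepts the code as a complete program."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:  # also covers IndentationError, a subclass
        return False

complete = "if ready:\n    go()\n"
header_only = "if ready:\n"   # fragment: the statement body is missing
body_only = "    go()\n"      # fragment: an indented body with no statement

for sample in (complete, header_only, body_only):
    print(repr(sample), can_parse(sample))
```

Both fragments are meaningful to a human reader, yet the parser rejects them outright instead of producing a partial tree.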

Alternative: Document View