CPSC 421-020 Object-Oriented Programming (OOP) Spring 2025

Project 1: srcFacts Posted: Jan 23

One approach to understanding a system's source code is to count things, such as the number of statements, functions, classes, expressions, and LOC (Lines Of Code). The program srcfacts produces a report of these counts.

Writing code to parse C++ directly, i.e., to identify the syntactic parts of the code, is very difficult. So the srcfacts program does not parse the C++ source code itself. Instead, it relies on another tool, srcML, to convert the C++ source code into an XML format. The srcfacts program reads this XML, parses it while collecting counts of parts of the input code, and produces the report. The input must be in the srcML format, e.g., demo.xml; however, it can be compressed or part of an archive, e.g., demo.xml.zip, linux.xml.gz.
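To make the idea of "counting parts of the code from XML" concrete, here is a minimal sketch. It is not taken from srcfacts.cpp: the helper countStartTags and the hand-written, simplified srcML-like fragment sampleSrcML are assumptions for illustration only. The scan is deliberately naive and ignores XML comments, CDATA, and namespace prefixes.

```cpp
#include <cstddef>
#include <string_view>

// Hand-simplified, srcML-like fragment (an assumption, not real srcML output).
constexpr std::string_view sampleSrcML =
    "<unit language=\"C++\">"
    "<function><type><name>int</name></type> <name>f</name>"
    "<parameter_list>()</parameter_list>"
    "<block>{ <return>return <expr><literal type=\"number\">0</literal></expr>;</return> }</block>"
    "</function></unit>";

// Hypothetical helper: counts start tags whose element name is exactly `name`.
// End tags such as </return> are skipped because '/' follows the '<'.
std::size_t countStartTags(std::string_view xml, std::string_view name) {
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string_view::npos) {
        ++pos;  // step past '<'
        if (xml.compare(pos, name.size(), name) == 0) {
            const std::size_t end = pos + name.size();
            // The name must be followed by a delimiter, so "expr" does not match "expr_stmt".
            if (end < xml.size() &&
                (xml[end] == '>' || xml[end] == '/' || xml[end] == ' ' ||
                 xml[end] == '\t' || xml[end] == '\n')) {
                ++count;
            }
        }
    }
    return count;
}
```

For example, countStartTags(sampleSrcML, "function") counts the single function element in the fragment. The actual program works on a refilled buffer rather than one in-memory string, so treat this only as a picture of the counting idea.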

The srcfacts.cpp program is one large main() function that uses almost no design features. It is extremely fast; for example, it can produce a report on the entire Linux kernel (version 6.13, 60,392 source-code files, a 4.7 GB file in the srcML format) in under 8 seconds. However, in its current form, it is not easy to debug the XML parsing, add additional program element counts, or even understand what is going on. It includes code for parsing all parts of XML, but it would not be easy to adapt the XML parsing code for another purpose, e.g., another report. The only way to reuse this code is by copy/paste reuse. Overall, it has the following:

Design Characteristic    Level
Scalability              Very High
Performance              Very High
Portability              Medium
Usability                Medium
Modularity               Very Low
Extensibility            None
Reusability              None
Maintainability          None

Your task is to take the srcfacts code, improve the overall design, and extract the XML parsing part of the code. When completed, a set of functions in the files xml_parser.hpp and xml_parser.cpp will handle as much XML parsing as possible, while the main program, srcfacts.cpp, generates the report using the XML parser functions for parsing.

The steps (in order) with the associated (git) tag are:

Tag v1a: Without changing the design, add the code necessary for the report to include the number of return statements. First see how the other counts are collected, then use that as a guide. Do not make any other changes to the project. The heading in the table must be "Returns". Due Jan 28

Tag v1b: Move the refillContent() function into the separate refillContent.hpp and refillContent.cpp files. The build is already configured for these files. Due Jan 31

Tag v1c: Apply the specific design changes from the Coding Practices Task List (an issue in your repository). As you verify that the code follows the practices given, check them off. Once you have completed all of them, close the issue. Due Feb 4

Tag v1d: Extract a set of (free) functions to handle the low-level details of the XML parsing. Extract one function at a time, i.e., each commit can only include a single extracted function. Due TBA
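A sketch of what one extracted free function might look like follows. The name parseStartTagName, its signature, and the choice to pass the remaining input as a std::string_view are assumptions for illustration; the functions you extract should follow the structure of the existing code in srcfacts.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical extracted function. Precondition: `content` begins at the '<'
// of a start tag. Returns the element's local name (any namespace prefix
// removed) and advances `content` to just past the qualified name.
std::string_view parseStartTagName(std::string_view& content) {
    assert(!content.empty() && content.front() == '<');
    content.remove_prefix(1);  // skip '<'

    // The qualified name ends at whitespace, '/', or '>'.
    const std::size_t nameEnd = content.find_first_of(" \t\n/>");
    const std::string_view qName = content.substr(0, nameEnd);
    content.remove_prefix(qName.size());

    // Drop a prefix such as "cpp:" if one is present.
    const std::size_t colon = qName.find(':');
    return colon == std::string_view::npos ? qName : qName.substr(colon + 1);
}
```

Given the input `<cpp:include file="a">`, this sketch returns `include` and leaves ` file="a">` in content for the next parsing step. Keeping each low-level step in its own small function like this is what makes the one-function-per-commit rule practical.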

Tag v1e: Add the necessary code to count the number of line comments and literal strings. The headings in the table must be "Line Comments" and "Strings". Due TBA
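Unlike the other counts, these two depend on an attribute as well as the element name: in srcML, line comments and string literals are distinguished from other comments and literals by a type attribute. The following is a minimal sketch of that idea on an in-memory string. The helper countTypedElements and the fragment typedSample are hypothetical, the fragment is simplified by hand, and the scan assumes no '>' appears inside attribute values.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hand-simplified, srcML-like fragment (an assumption, not real srcML output).
constexpr std::string_view typedSample =
    "<unit>"
    "<comment type=\"line\">// first</comment>\n"
    "<comment type=\"block\">/* second */</comment>\n"
    "<expr><literal type=\"string\">\"hi\"</literal> "
    "<literal type=\"number\">1</literal></expr>\n"
    "<comment type=\"line\">// third</comment>"
    "</unit>";

// Hypothetical helper: counts start tags named `name` that carry type="<type>".
std::size_t countTypedElements(std::string_view xml, std::string_view name,
                               std::string_view type) {
    std::string open = "<";
    open += name;
    std::string attr = " type=\"";
    attr += type;
    attr += '"';

    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string_view::npos) {
        const std::size_t afterName = pos + open.size();
        const std::size_t close = xml.find('>', afterName);
        if (close == std::string_view::npos)
            break;  // incomplete tag at the end of the input
        const char next = xml[afterName];
        // Accept only an exact element name, then look for the attribute in the tag.
        if (next == ' ' || next == '>' || next == '/') {
            const std::string_view tag = xml.substr(afterName, close - afterName);
            if (tag.find(attr) != std::string_view::npos)
                ++count;
        }
        pos = close + 1;
    }
    return count;
}
```

In the fragment above, only comment elements with type "line" and literal elements with type "string" would contribute to the "Line Comments" and "Strings" columns; block comments and number literals are skipped.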

Your repository for this project is through GitHub Classroom, with a link on the Brightspace course page.

General Guidelines