User Guide for CCCCounter
CCCCounter is a tool for the analysis of source code in various languages (primarily C++), which generates a report in HTML format on various measurements of the code processed. Although the tool was originally implemented to process C++ and ANSI C, the present version is also able to process Java source files, and support has been present in earlier versions for Ada95. The name CCCCounter stands for 'C and C++ Code Counter'.
Measurements of source code of this kind are generally referred to as 'software metrics', or more precisely 'software product metrics' (as the term 'software metrics' also covers measurements of the software process, which are called 'software process metrics'). There is a reasonable consensus among modern opinion leaders in the software engineering field that measurement of some kind is probably a Good Thing, although there is less consensus on what is worth measuring and what the measurements mean.
CCCCounter has been developed as freeware, and is released in source code form. Users are encouraged to compile the program themselves, and to modify the source to reflect their preferences and interests.
The simplest way of using CCCCounter is just to run it with the names of a selection of files on the command line like this:
cccc my_types.h big.h small.h *.cc
Alternatively, for a complex hierarchy, the user could enter a command like this:
find . | cccc - (on Unix family platforms)
or
dir /b/s | cccc - (on DOS/Windows family platforms)
CCCCounter will process each of the files specified on the command line (using standard wildcard processing where appropriate), or, if the '-' option is specified, each of the files named on the standard input stream. For each file named, CCCCounter will examine the extension of the filename, and if the extension is recognized as indicating a supported language, the appropriate parser will run on the file. As each file is parsed, recognition of certain constructs will cause records to be written into an internal database. When all files have been processed, a report on the contents of the internal database will be generated in HTML format. By default the main summary HTML report is generated to the file cccc.html in a subdirectory called .cccc of the current working directory, with detailed reports on each module (i.e. C++ or Java class) identified by the analysis run.
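The extension-based choice of parser described above can be pictured as a simple table lookup. The following Python sketch is only an illustration, not CCCCounter's own code: the function name language_for is hypothetical, and the table holds just a handful of the extension/language pairs that appear in the default option dump shown later in this guide. The lookup is case-sensitive because that dump maps '.c' and '.C' to different languages.

```python
import os

# A few extension/language pairs copied from the default option dump;
# the full default mapping is much longer.
EXTENSION_LANGUAGE = {
    ".c": "c.ansi",
    ".C": "c++.ansi",
    ".cc": "c++.ansi",
    ".h": "c++.ansi",
    ".java": "java",
    ".ada": "ada.95",
}


def language_for(filename):
    """Return the language name for a file, or None if the extension is unknown."""
    _, extension = os.path.splitext(filename)
    return EXTENSION_LANGUAGE.get(extension)


if __name__ == "__main__":
    for name in ["my_types.h", "main.c", "Main.C", "README.txt"]:
        print(name, "->", language_for(name))
```

A file whose extension is not in the table gets no language, which corresponds to the file not being handed to any parser.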
In addition to the summary and detailed HTML reports, the run will cause generation of corresponding summary and detailed reports in XML format, and a further file called cccc.db to be created. cccc.db will contain a dump of the internal database of the program in a format delimited with the character '@' (chosen because it is one of the few characters which cannot legally appear in C/C++ non-comment source code).
The report contains a number of tables identifying the modules in the files submitted and covering:
measures of the procedural volume and complexity of each module and its functions;
measures of the number and type of the relationships each module is a party to either as a client or a supplier;
identification of any parts of the source code submitted which the program failed to parse; and
a summary report over the whole body of code processed of the measures identified above.
Some of the data presented in the report may be displayed in an emphasized form (either with a bold or italic font, or with a red or yellow background). These are items which have been identified as lying outside ranges which have been laid down as desirable for the particular items. A bold font or red background indicates a value which exceeds a threshold defined as being dangerous for that measure, while italic fonts and yellow backgrounds indicate values below the danger threshold but still above a second lower threshold which has been laid down to indicate cause for concern. The two thresholds are configurable by the user of the tool: see the section below on configuring metric treatment for more details.
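As a rough illustration of the two-threshold scheme, the Python sketch below classifies a value against a lower ('cause for concern') and an upper ('dangerous') threshold. The function name and the returned labels are hypothetical, and whether CCCCounter treats a value exactly equal to a threshold as exceeding it is an assumption here (the sketch uses strict comparisons). The example thresholds 30 and 100 are the defaults listed for LOCf in the option dump later in this guide.

```python
def emphasis(value, lower, upper):
    """Classify a metric value against the two thresholds.

    'danger'  corresponds to bold font / red background,
    'concern' corresponds to italic font / yellow background,
    'normal'  corresponds to no emphasis.
    Strict comparison at the boundaries is an assumption.
    """
    if value > upper:
        return "danger"
    if value > lower:
        return "concern"
    return "normal"


if __name__ == "__main__":
    # Default LOCf thresholds from the option dump: 30 and 100.
    for lines in (12, 45, 250):
        print(lines, emphasis(lines, 30.0, 100.0))
```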
The report generated by CCCCounter normally consists of six tables plus a table of contents at the beginning and some informational material about CCCCounter itself at the end.
Table name | Description
Project Summary | This table presents summary values of various measures over the body of source code submitted.
Procedural Summary | This table presents values of procedural measures summed for each module identified in the code submitted.
Procedural Details | This table presents values of the same procedural measures covered in the procedural summary report, but this time broken down within each module into the contributions of each member function of the module.
Structural Summary | This table presents counts of fan-in and fan-out relationships to each module identified, and a derived metric called the Henry/Kafura/Shepperd measure, which is calculated as the square of the product of the fan-in and fan-out counts.
Structural Details | This table presents lists of the modules contributing to the relationship counts reported in the structural summary.
Rejected Extents | This table presents a list of code regions which the analyser was unable to parse.
Tag | Metric Name | Description
LOC | Lines of Code | This metric counts the lines of non-blank, non-comment source code in a function (LOCf), module (LOCm), or project (LOCp). LOC was one of the earliest metrics to come into use (principally because it is straightforward to measure). It has an obvious relation to the size or complexity of a piece of code, and can be calibrated for use in prediction of maintenance effort, although concern has been expressed that use of this metric as a measure of programmer productivity may tend to encourage verbose programming practices and discourage desirable simplification.
MVG | McCabe's Cyclomatic Complexity | A measure of a body of code based on analysis of the cyclomatic complexity of the directed graph which represents the flow of control within each function. First proposed as a measure of the minimum number of test cases to ensure all parts of each function are exercised, it is now widely accepted as a measure for the detection of code which is likely to be error-prone and/or difficult to maintain.
COM | Comment Lines | A crude measure comparable to LOC of the extent of commenting within a region of code. Not very meaningful in isolation, but sometimes used in ratio with LOC or MVG to ensure that comments are distributed proportionately to the bulk or complexity of a region of code.
L_C,M_C | LOC/COM, MVG/COM | See above
FI,FIc,FIv,FO,FOc,FOv | Fan-in, Fan-out | For a given module A, the fan-out is the number of other modules which the module A uses, while the fan-in is the number of other modules which use A.
HKS, HKSv, HKSc | Henry-Kafura/Shepperd measure | This metric is derived by squaring the product of the fan-in and fan-out of each module. The original Henry-Kafura measure, which has been described as a measure of 'information flow complexity', includes a term for the length of the module under consideration, but CCCCounter uses the measure as modified by Shepperd, which omits this term on the basis that it debases the measure by combining two attributes which can and should be separately measured.
NOM | Number of modules | Number of modules identified in the project. See discussion below about what constitutes a module.
WMC | Weighted methods per class | This measure, proposed by Chidamber and Kemerer, is a count of the number of functions defined in a module multiplied by a weighting factor. The only weighting algorithm suggested in the original formulation is a uniform weighting of one unit per function.
REJ | Rejected lines | This is a measure of the number of non-blank non-comment lines of code which were not successfully analysed by the parser. This is more of a validity check on the report generated than a metric of the code submitted: if the amount of code rejected was more than a small fraction (say 10%) of the total code processed, the meaningfulness of the numbers generated by the run must be in doubt.
CCCCounter implements simple algorithms to calculate each of the measures presented. The algorithms are intended to present a useful approximation to the underlying quantities, rather than meticulously exact counting: in general, manual counts based on the same definitions should agree with CCCCounter to within 2-3%. If larger discrepancies are discovered, or if this level of agreement is not considered adequate, users are welcome to modify the source code to implement closer agreement, or to change the counting behaviour to reflect a desired basis of calculation. The basic definitions of each count are as follows:
Number of Modules (NOM)
CCCCounter defines modules in terms of a grouping of member functions: C++ classes and namespaces, Java classes and interfaces and Ada packages are all defined as modules. In addition to this, functions which are not members of one of these structures are treated as belonging to an anonymous module, which is counted if it has member functions.
Lines of Code (LOC)
This count follows the industry standard of counting non-blank, non-comment lines of source code. Preprocessor lines are treated as blank. Class and function declarations are counted, but declarations of global data are ignored. There may be some double counting of lines in class definitions as the algorithm treats the total over a module as the sum of lines belonging to the module itself and lines belonging to its member functions (the declarations and definitions of member functions in the body of the class definition will contribute to both counts).
Comment Lines (COM)
Any line which contains any part of a comment for the language concerned is treated as a comment by CCCCounter, and is allocated to the module and member of the next following 'real' token. This ensures that leading comments are treated as part of the function or class definition which follows them. There is one exception to this rule: the Rational Rose design tool is known to insert directives into source files disguised as C++ comments with the prefix '//##'. These are not counted as comments; all other comment structures are counted, even if there is no content within the comment delimiters, or the content is commented-out source code.
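The LOC and COM rules above can be approximated in a few lines of Python. This is a minimal sketch for C/C++-style comments only, not CCCCounter's parser: the function name is hypothetical, string literals containing comment markers are not handled, the allocation of lines to modules and functions is omitted, and it is an assumption of this sketch that a line holding both code and a comment contributes to both counts.

```python
def count_loc_and_com(source):
    """Return (loc, com) for a piece of C/C++-like source text.

    Assumptions of this sketch: preprocessor lines count as blank,
    '//##' directives are neither code nor comment, and a line with
    both code and a comment adds one to each count.
    """
    loc = com = 0
    in_block = False  # True while inside a /* ... */ comment
    for raw in source.splitlines():
        line = raw.strip()
        has_comment = in_block
        code = ""
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)  # comment runs past end of line
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("//##", i):
                break  # Rational Rose directive: ignore rest of line
            elif line.startswith("//", i):
                has_comment = True
                break
            elif line.startswith("/*", i):
                has_comment = True
                in_block = True
                i += 2
            else:
                code += line[i]
                i += 1
        code = code.strip()
        if has_comment:
            com += 1
        if code and not code.startswith("#"):
            loc += 1
    return loc, com
```

Applied to a short function with a leading comment, an include line, a trailing comment, a two-line block comment and a '//##' directive, the sketch counts only the real code lines towards LOC and every line touching a genuine comment towards COM.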
McCabe's Cyclomatic Complexity (MVG)
The formal definition of cyclomatic complexity is that it is the count of linearly independent paths through a flow of control graph derived from a subprogram. A pragmatic approximation to this can be found by counting language keywords and operators which introduce extra decision outcomes. This can be shown to be quite accurate in most cases. In the case of C++, the count is incremented for each of the following tokens: 'if','while','for','switch','break','&&','||'
Note that the boolean operations introduce extra paths through the code because the second operand may or may not be evaluated according to the value of the first operand. Note also that the treatment of switch statements is problematic: it is quite common for multiple 'case' labels to be attached to the same block of code, so counting these might overstate the value. Counting the 'break' tokens instead is better so long as there are no case labels in the middle of the block of code which the break terminates. The motive for counting the 'switch' token is to provide for the default case, which gives rise to a path whether or not the programmer defines a default label. Counting the break token in this way may distort the count where it is used in other contexts (for example, to exit from a loop).
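The token-counting approximation can be sketched in Python with a regular expression over the token list given above. This is an illustration under simplifying assumptions, not CCCCounter's implementation: the names are hypothetical, comments and string literals are not stripped first, and the sketch returns only the number of increments (the classic McCabe figure for a single function adds one to the number of decisions).

```python
import re

# The C++ tokens listed above: keywords matched as whole words,
# plus the two short-circuit boolean operators.
DECISION_TOKENS = re.compile(r"\b(?:if|while|for|switch|break)\b|&&|\|\|")


def count_decision_tokens(source):
    """Count occurrences of the decision-introducing tokens in source text."""
    return len(DECISION_TOKENS.findall(source))


if __name__ == "__main__":
    body = (
        "int f(int a, int b) { if (a && b) { while (a) { a--; } } "
        "switch (b) { case 1: break; default: ; } return 0; }"
    )
    print(count_decision_tokens(body))  # if, &&, while, switch, break
```

Note how the 'case' and 'default' labels are not counted, while 'switch' and 'break' are, matching the treatment of switch statements discussed above.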
Weighted methods per class (WMC)
This is a count of the member functions known to exist in a class. Knowledge of existence of a function is only gained from declarations or definitions directly contained in files processed by CCCCounter: files included by a preprocessor are ignored, and CCCCounter does not at present identify invocations of member or non-member functions within procedural code.
Fan-In, Fan-Out (FI,FIc,FIv,FO,FOc,FOv)
Traditionally, use relationships between modules were identified by counting function invocations or access to module data in procedural code. CCCCounter identifies relationships only through structures apparent in the definitions of the interfaces of C++ classes or Java classes or interfaces. The specific relationships which can be detected are inheritance of a supplier class by a client, containment of an instance of a supplier class in a client, and the existence of member functions of the client class which accept or return an instance of the supplier.
While these relationships may seem unrelated to the invocation and module data counts, they are likely to show a strong correlation because, in an object oriented environment, it is likely (but not inevitable) that the low-level use relationships of invocation and direct access to data structures require an object of the class of the supplier module to be available. This availability can be through instantiation of an instance of the supplier class within procedural code, but will often be due to the existence of one of the higher level relationships described above.
The counts of Fan-In and Fan-Out are regarded as a measure of the structural quality of a program, with high values of either (and particularly high values of both within the same module) indicating increased risk of changes required in one module requiring changes across other modules. CCCCounter chooses to define the relationship counts in such a way that each supplier or client module is counted only once, however many separate ways the relationship is detected. CCCCounter applies filtering to the relationships identified to distinguish between different kinds of uses which may carry with them different levels of structural risk. There are two filters: visibility and concreteness.
The visibility filter removes from consideration relationships which are known to be only accessible from the private interface of a module. Relationships which are defined in the visible part of the interface can be exploited by clients of the current module, thus forcing those clients also to be clients of the current module's supplier. Visible relationships also increase the range of operations available on an object, thus increasing the cognitive complexity of the interface from the point of view of a programmer required to use a module.
The concreteness filter removes from consideration relationships which do not create a dependency of the implementation of the client module on the implementation of the supplier class. Dependency-creating relationships increase risk because they may not be cyclical, and thus inhibit the creation of other relationships. They also inhibit the ability of modules to be built separately, requiring recompilation of the client module when the supplier changes. The test for this filter in C++ is whether a forward declaration of the supplier class is adequate to allow the client module definition to be compiled: containment and parameter passing where the supplier class is modified by a referential operator (pointer or reference) are allowed in this case, whereas containment by value, passing by value and inheritance are all dependency-creating. In Java, all relationships except inheritance are treated as non-dependency-creating.
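The counting rule (each supplier or client counted once, however many ways the relationship is detected) and the Henry-Kafura/Shepperd measure derived from it can be sketched as follows. This Python fragment is illustrative only: the function names and the (client, supplier) pair representation are hypothetical, and the visibility and concreteness filters are left out.

```python
from collections import defaultdict


def fan_counts(relationships):
    """Return {module: (fan_in, fan_out)} from (client, supplier) pairs.

    Sets are used so that a relationship detected several times
    (e.g. by inheritance and by a member function signature) is
    counted only once. Self-references are skipped because fan-in
    and fan-out count *other* modules.
    """
    suppliers = defaultdict(set)  # client -> modules it uses
    clients = defaultdict(set)    # supplier -> modules that use it
    for client, supplier in relationships:
        if client == supplier:
            continue
        suppliers[client].add(supplier)
        clients[supplier].add(client)
    modules = set(suppliers) | set(clients)
    return {m: (len(clients[m]), len(suppliers[m])) for m in modules}


def hks(fan_in, fan_out):
    """Henry-Kafura/Shepperd measure: square of (fan-in * fan-out)."""
    return (fan_in * fan_out) ** 2


if __name__ == "__main__":
    uses = [
        ("Shape", "Point"),
        ("Shape", "Point"),   # detected twice, counted once
        ("Canvas", "Shape"),
        ("Canvas", "Point"),
    ]
    for module, (fi, fo) in sorted(fan_counts(uses).items()):
        print(module, fi, fo, hks(fi, fo))
```

Because the measure is a product, a module with zero fan-in or zero fan-out scores zero however large the other count is.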
Number of Modules (NOM)
All instances of the following syntactic constructs are treated as modules: C++ classes and namespaces; Java classes and interfaces; Ada packages. There are contexts where the analyser detects something which may or may not belong to one of these categories. In this case, the name is treated as a module name if and only if member functions are identified for it. Functions which do not belong to a module of one of the categories defined above are treated as belonging to a single anonymous module: if any members are identified for this module it is also counted.
The command line flags supported by CCCCounter are defined in the file ccccmain.cc. A brief usage message can be generated on standard output by entering the command 'cccc --help'. The same message will be generated on standard error if an invalid command line is entered. As of version 3.pre57 the text generated by this command is as follows:
Usage:
cccc [options] file1.c ...
Process files listed on command line.
If the filenames include '-', read a list of files from standard input.
This program is work in progress and is not well documented.
Please be prepared to refer to the source code for the
meaning of some options.
Options:
--help * generate this help message
--outdir=<dname> * directory for generated files
(default=.cccc)
--html_outfile=<fname> * name of primary HTML report generated
(default=<outdir>/cccc.html)
--xml_outfile=<fname> * name of primary XML report generated
(default=<outdir>/cccc.xml)
--db_infile=<fname> * preload internal database from named file
(default=no initial content)
--db_outfile=<fname> * save internal database to named file
(default=<outdir>/cccc.db)
--opt_infile=<fname> * load options from named file
(default=use compiled-in option values,
refer to cccc_opt.cc for option information)
--opt_outfile=<fname> * save options to named file
(default=<outdir>/cccc.opt)
--lang=<string> * use language specified for files specified
after this option
languages supported are c,c++,ada,java
(default=use language/extension mapping
controlled by options)
--report_mask=<hex> * control report content
(refer to ccccmain.cc for mask values)
--debug_mask=<hex> * control debug output content
(refer to ccccmain.cc for mask values)
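For scripted runs it can be convenient to assemble the command line programmatically. The Python helper below is a hypothetical convenience wrapper, not part of CCCCounter; it uses only flags that appear in the usage message above, and it places --lang before the file names because that option applies to the files specified after it. The resulting list could be passed to subprocess.run on a system where cccc is installed.

```python
def build_cccc_command(files, outdir=None, opt_infile=None, lang=None):
    """Build an argument list for cccc using flags from the usage message."""
    command = ["cccc"]
    if outdir is not None:
        command.append("--outdir=" + outdir)
    if opt_infile is not None:
        command.append("--opt_infile=" + opt_infile)
    if lang is not None:
        # --lang affects the files that follow it on the command line.
        command.append("--lang=" + lang)
    command.extend(files)
    return command


if __name__ == "__main__":
    print(build_cccc_command(["big.h", "small.h"], outdir="report", lang="c++"))
```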
Much of the behaviour of CCCCounter can be controlled by a stream of configuration data. The file cccc_opt.cc contains the default value for this stream, which can be dumped using the --opt_outfile qualifier. The resulting file can then be edited to customize behaviour and loaded into a later run of cccc using the --opt_infile qualifier. As of version 3.pre57, the contents of the file dumped by the command cccc --opt_outfile=cccc.opt are as follows:
CCCC_FileExt@.ADA@ada.95@
CCCC_FileExt@.ADB@ada.95@
CCCC_FileExt@.ADS@ada.95@
CCCC_FileExt@.C@c++.ansi@
CCCC_FileExt@.CC@c++.ansi@
CCCC_FileExt@.CPP@c++.ansi@
CCCC_FileExt@.CXX@c++.ansi@
CCCC_FileExt@.H@c++.ansi@
CCCC_FileExt@.H++@c++.ansi@
CCCC_FileExt@.HH@c++.ansi@
CCCC_FileExt@.HPP@c++.ansi@
CCCC_FileExt@.HXX@c++.ansi@
CCCC_FileExt@.J@java@
CCCC_FileExt@.JAV@java@
CCCC_FileExt@.JAVA@java@
CCCC_FileExt@.ada@ada.95@
CCCC_FileExt@.adb@ada.95@
CCCC_FileExt@.ads@ada.95@
CCCC_FileExt@.c@c.ansi@
CCCC_FileExt@.c++@c++.ansi@
CCCC_FileExt@.cc@c++.ansi@
CCCC_FileExt@.cpp@c++.ansi@
CCCC_FileExt@.cxx@c++.ansi@
CCCC_FileExt@.h@c++.ansi@
CCCC_FileExt@.h++@c++.ansi@
CCCC_FileExt@.hh@c++.ansi@
CCCC_FileExt@.hpp@c++.ansi@
CCCC_FileExt@.hxx@c++.ansi@
CCCC_FileExt@.j@java@
CCCC_FileExt@.jav@java@
CCCC_FileExt@.java@java@
CCCC_MetTmnt@8.3@999999.000000@999999.000000@0@8@3@General format for fixed precision 3 d.p.@
CCCC_MetTmnt@CBO@12.000000@30.000000@0@6@0@Coupling between objects@
CCCC_MetTmnt@COM@999999.000000@999999.000000@0@6@0@Comment lines@
CCCC_MetTmnt@COMper@999999.000000@999999.000000@0@6@3@Comment lines (averaged)@
CCCC_MetTmnt@DIT@3.000000@6.000000@0@6@0@Depth of Inheritance Tree@
CCCC_MetTmnt@FI@12.000000@20.000000@0@6@0@Fan in (overall)@
CCCC_MetTmnt@FIc@6.000000@12.000000@0@6@0@Fan in (concrete uses only)@
CCCC_MetTmnt@FIv@6.000000@12.000000@0@6@0@Fan in (visible uses only)@
CCCC_MetTmnt@FO@12.000000@20.000000@0@6@0@Fan out (overall)@
CCCC_MetTmnt@FOc@6.000000@12.000000@0@6@0@Fan out (concrete uses only)@
CCCC_MetTmnt@FOv@6.000000@12.000000@0@6@0@Fan out (visible uses only)@
CCCC_MetTmnt@IF4@100.000000@1000.000000@0@6@0@Henry-Kafura/Shepperd measure (overall)@
CCCC_MetTmnt@IF4c@30.000000@100.000000@0@6@0@Henry-Kafura/Shepperd measure (concrete)@
CCCC_MetTmnt@IF4v@30.000000@100.000000@0@6@0@Henry-Kafura/Shepperd measure (visible)@
CCCC_MetTmnt@LOCf@30.000000@100.000000@0@6@0@Lines of code/function@
CCCC_MetTmnt@LOCm@500.000000@2000.000000@0@6@0@Lines of code/single module@
CCCC_MetTmnt@LOCp@999999.000000@999999.000000@0@6@0@Lines of code/project@
CCCC_MetTmnt@LOCper@500.000000@2000.000000@0@6@3@Lines of code/average module@
CCCC_MetTmnt@L_C@7.000000@30.000000@20@6@3@LOC/COM Lines of code/comment line@
CCCC_MetTmnt@MVGf@10.000000@30.000000@0@6@0@Cyclomatic complexity/function@
CCCC_MetTmnt@MVGm@200.000000@1000.000000@0@6@0@Cyclomatic complexity/single module@
CCCC_MetTmnt@MVGp@999999.000000@999999.000000@0@6@0@Cyclomatic complexity/project@
CCCC_MetTmnt@MVGper@200.000000@1000.000000@0@6@3@Cyclomatic complexity/average module@
CCCC_MetTmnt@M_C@5.000000@10.000000@5@6@3@MVG/COM McCabe/comment line@
CCCC_MetTmnt@NOC@4.000000@15.000000@0@6@0@Number of children@
CCCC_MetTmnt@WMC1@30.000000@100.000000@0@6@0@Weighting function=1 unit per method@
CCCC_MetTmnt@WMCv@10.000000@30.000000@0@6@0@Weighting function=1 unit per visible method@
CCCC_Dialect@c++.mfc@BEGIN_MESSAGE_MAP@start_skipping@
CCCC_Dialect@c++.mfc@END_MESSAGE_MAP@stop_skipping@
CCCC_Dialect@c++.stl@__STL_BEGIN_NAMESPACE@ignore@
CCCC_Dialect@c++.stl@__STL_END_NAMESPACE@ignore@
Records in the configuration stream of type CCCC_FileExt control the mapping of file extensions to languages.
Records in the configuration stream of type CCCC_MetTmnt control the treatment of values for each of the metrics defined by CCCCounter.
Records in the configuration stream of type CCCC_Dialect control the handling of dialect-specific pseudo keywords by the CCCCounter parsers.
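The three record types can be read back with a small amount of Python. The sketch below is not CCCCounter code and rests on stated assumptions: the function name is hypothetical, records are taken to be one per line with fields terminated by '@', and for CCCC_MetTmnt records the two fields after the tag are assumed to be the lower and upper thresholds while the last field is taken as the description. The remaining numeric fields are deliberately left uninterpreted; cccc_opt.cc is the place to check their meaning.

```python
def parse_option_records(text):
    """Split an option stream into extension map, metric treatments, dialects."""
    file_ext = {}     # extension -> language
    treatments = {}   # metric tag -> thresholds and description
    dialects = []     # (dialect, keyword, action) tuples
    for line in text.splitlines():
        fields = line.strip().split("@")
        if fields and fields[-1] == "":
            fields.pop()  # drop the empty string after the final '@'
        if len(fields) < 2:
            continue
        kind, values = fields[0], fields[1:]
        if kind == "CCCC_FileExt" and len(values) >= 2:
            file_ext[values[0]] = values[1]
        elif kind == "CCCC_MetTmnt" and len(values) >= 4:
            treatments[values[0]] = {
                "lower": float(values[1]),   # assumed: lower threshold
                "upper": float(values[2]),   # assumed: upper threshold
                "description": values[-1],
            }
        elif kind == "CCCC_Dialect" and len(values) >= 3:
            dialects.append((values[0], values[1], values[2]))
    return file_ext, treatments, dialects


if __name__ == "__main__":
    sample = (
        "CCCC_FileExt@.c@c.ansi@\n"
        "CCCC_MetTmnt@LOCf@30.000000@100.000000@0@6@0@Lines of code/function@\n"
        "CCCC_Dialect@c++.mfc@BEGIN_MESSAGE_MAP@start_skipping@\n"
    )
    print(parse_option_records(sample))
```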
While CCCCounter attempts to recover from parse failures, there are some language constructs which cause crashes, and others which result in code going unanalyzed. If CCCCounter does not report syntax errors and terminates normally, it is likely that all files have been analyzed. Otherwise, it is strongly recommended that the user perform some kind of independent check on the quantity of code ignored, rather than relying on CCCCounter's own report on 'Rejected Extents'.
The best place to look for information about CCCCounter is the CCCCounter home page.
The current version of the program is released under the GNU General Public License, giving users the right to work on the source code to address any specific issues they have.
The CCCCounter distribution needs a version of the Purdue Compiler Construction Tool Set (PCCTS) originally created by Terence Parr and coworkers at Purdue University. This program can be found here
The program will also require a C++98 compliant compiler to build. You can use gcc with option "-std=c++98" for this.