Where do I start with C++ documentation?
I am new to programming and am entirely self-taught. I have reached a point in my writing where a solid grasp of documentation standards would be greatly beneficial. My question is not how to add documentation, but when. And what to add. Doxygen seems to be the preferred method although I'm sure there are others, but all the tutorials and advice out there seem to describe how. It's usually more about the parser than anything else.
Some of what I have picked up over the past few months includes:
Self-documentation: a lot of C++ naming revolves around explicitly stating what is happening in the code base itself. Is this what I should strive for? Should good code be self-explanatory at every level?
Documentation requires maintenance too: I have read caveats that warn against excessive documentation, because when your software is updated, your documentation must be updated too.
I've also read that comments should explain why you do something in your code. Not what you do. I do understand that a comment and documentation aren't necessarily the same thing, so I am still uncertain when and what to document.
I don't want to be lazy and avoid good documentation, but I also don't want to weigh down my software with overwrought bogs of difficult to maintain and absurdly obvious descriptions of my code. Any sort of guidance here would be greatly appreciated. Even some examples of what you would consider good documentation could benefit me.
This post was sourced from https://writers.stackexchange.com/q/36232. It is licensed under CC BY-SA 3.0.
3 answers
Find a style guide and stick with it.
Style guides are more than just comments - they cover all parts of your code from how you name your variables to how you structure your code. Good style guides are designed to keep your code as maintainable as possible, with an emphasis on readability.
There are a number of style guides you can follow. Here is Google's C++ Style Guide. For the moment, you are mostly interested in the naming, commenting, and formatting sections, but you'll learn a lot from reading the other sections as well. Good style guides provide rules, explanations for why those rules exist, and examples.
The most important rule is to pick a style and stick with it. If you are consistent then both writing and reading your code become easier.
Quick and dirty rules of thumb
If you don't want to read through the style guide, my rule of thumb is that it should be possible to use a function without having to read the code. That means it should be completely comprehensible from the name and docstring. That usually means fully explaining what each of the arguments does and what the return value means. If your code expects particular units (meters, seconds, etc.), always describe what units you are using.
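As a rough illustration of that rule of thumb, here is a sketch of a Doxygen-style comment that lets a caller use a function without opening its body. The function `brakingDistanceMeters` and its parameters are invented for this example; the point is only that the arguments, units, return value, and failure case are all stated.

```cpp
#include <stdexcept>

/**
 * @brief Computes the distance needed to brake to a full stop.
 *
 * Assumes constant deceleration on a level surface.
 *
 * @param speedMetersPerSecond  Initial speed in m/s. Must be >= 0.
 * @param decelMetersPerSecond2 Braking deceleration in m/s^2. Must be > 0.
 * @return Stopping distance in meters (0 if the initial speed is 0).
 * @throws std::invalid_argument if either argument is out of range.
 */
double brakingDistanceMeters(double speedMetersPerSecond,
                             double decelMetersPerSecond2)
{
    if (speedMetersPerSecond < 0.0 || decelMetersPerSecond2 <= 0.0) {
        throw std::invalid_argument("speed must be >= 0 and deceleration > 0");
    }
    // Kinematics: v^2 = 2 * a * d, solved for d.
    return (speedMetersPerSecond * speedMetersPerSecond)
           / (2.0 * decelMetersPerSecond2);
}
```

Putting the unit in the parameter name as well as in the comment is one possible choice; it keeps the unit visible at every call site, even when nobody reads the docstring.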
If you return to a piece of code after a little while doing something else and it takes you more than 10 seconds to figure out what it is doing or why it is doing that, add a comment.
This post was sourced from https://writers.stackexchange.com/a/36233. It is licensed under CC BY-SA 3.0.
What you write depends on your audience. API reference documentation -- the output of tools like Doxygen -- is usually for the users of that API. Such externally-facing documentation focuses on the contracts of the API and how its various components fit together -- how are you supposed to use this collection of classes and functions to write your application? For this type of documentation, I'll quote from an answer I gave elsewhere (and I recommend you read the linked post):
Here are some key points to writing good API reference documentation:
- Document the contract, not the implementation. [1]
- Explain fuzzy verbs. What does "find" mean, versus "get"? Set users' expectations.
- Document restrictions on arguments or return values that aren't fully conveyed by the signature (like that an integer has to be positive, or in the range 1-100, etc).
- Cover failure and not just success. Can arguments be invalid? Can your code behave in abnormal ways even if the inputs are valid? How do you signal errors or other problems?
- Be thorough but not verbose. Don't repeat information that's clear from the signature.
[1] To do this you need to determine what the contract actually is -- what promises are you making to your users? This is a large software-design topic beyond the scope of this question.
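To make those points concrete, here is a minimal sketch of contract-style comments on a small class. `UserRegistry` and its members are hypothetical, invented only for illustration. The comments distinguish "find" from "get", state the argument restrictions the signature cannot express, and say how failure is signaled.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

/// Hypothetical in-memory registry, used only to illustrate contract-style docs.
class UserRegistry {
public:
    /**
     * @brief Registers a user.
     * @param id   Must be in the range 1-100.
     * @param name Must not be empty.
     * @throws std::invalid_argument if id or name violates the above.
     *         The registry is left unchanged in that case.
     * @note Adding an id that already exists replaces the stored name.
     */
    void add(int id, const std::string& name) {
        if (id < 1 || id > 100 || name.empty()) {
            throw std::invalid_argument("id must be 1-100 and name non-empty");
        }
        users_[id] = name;
    }

    /**
     * @brief Looks up a user who may or may not exist.
     * @return The name, or std::nullopt if no user has this id.
     *         "Not found" is a normal outcome here, not an error.
     */
    std::optional<std::string> find(int id) const {
        auto it = users_.find(id);
        if (it == users_.end()) return std::nullopt;
        return it->second;
    }

    /**
     * @brief Returns a user the caller expects to exist.
     * @throws std::out_of_range if no user has this id.
     */
    const std::string& get(int id) const {
        return users_.at(id);
    }

private:
    std::map<int, std::string> users_;
};
```

None of the comments mention that a `std::map` sits underneath; that is implementation, and it could change without breaking any promise made to the caller.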
You might also be writing documentation for internal consumers -- your teammates, yourself six months from now, and so on. This is where "why, not just what" becomes even more important -- they can (probably) already read the source code, but they can't read your mind about why you wrote that code and not some other code. Try to step back from your code and imagine that you're not intimately familiar with it already -- what parts aren't obvious to you? And what parts did you have to think about a lot during implementation? Write some comments about that. The closer the comments are to the code the more likely it is that they'll be maintained, so while you might also think about external artifacts like design documents, start with comments in the code. These comments can take a few forms (mix and match):
- A comment block at the top of the file that explains the overall organization and primary execution path(s).
- Comment blocks at the tops of functions that cover context (like preconditions), effects beyond the function (like that you're taking a global lock on the database), and anything interesting about the implementation.
- Inline comments near anything particularly tricky, unclear, or expensive.
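Here is a small sketch showing all three forms in one place. The file name, the function `nextDelayMs`, and the constants are assumptions made up for this example, not taken from any real code base.

```cpp
// retry_policy.cpp (hypothetical file)
//
// Overview: decides how long a client waits before retrying a failed request.
// Primary path: after each failed attempt the caller asks nextDelayMs() for
// the wait time, sleeps for that long, and tries again.

#include <algorithm>
#include <cstdint>

namespace {
constexpr std::uint32_t kBaseDelayMs = 100;    // delay before the first retry
constexpr std::uint32_t kMaxDelayMs = 30000;   // never wait longer than this
}

// Returns the delay, in milliseconds, before retry number `attempt` (0-based).
// Preconditions: none; any attempt count is accepted.
// Effects: none beyond the return value (no global state, no locks).
// Implementation note: the delay doubles per attempt and is capped at kMaxDelayMs.
std::uint32_t nextDelayMs(std::uint32_t attempt)
{
    // Clamp the exponent first: shifting a 32-bit value by 32 or more is
    // undefined behavior, and 100 ms * 2^9 already exceeds the cap anyway.
    const std::uint32_t exponent = std::min<std::uint32_t>(attempt, 9);
    const std::uint32_t delay = kBaseDelayMs << exponent;
    return std::min(delay, kMaxDelayMs);
}
```

The inline comment is the one most worth having: the clamp looks arbitrary unless you know why it is there.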
Now, I said at the beginning that Doxygen output is usually for users of an API. Here's why I said "usually": depending on how your code is structured, you can also use Doxygen to produce an internal build -- not an API reference, but an "our code guts" reference. My team does this. Our public-facing API is limited to one namespace and directory structure; we have one Doxygen build that emits just that, and we have another that's generated from all of our code, strictly for our own use. The separation in the code keeps internal documentation from leaking out to users while allowing us to write whatever code documentation we need for our own team. Sure, sometimes you don't need the Doxygen output because you can just look at the code, but we've found it helpful to have both.
I am a professor and PhD who has been coding for over 40 years. I'll restrict this comment to documenting code, which is different enough to warrant its own answer from a professional: I "grew up" (my first real programming job, 40 years ago) on IBM operating system code, written entirely in assembly language.
The assembly language of the time was rather cryptic: "MOV R1,R2" for example, or "ADD 1,R3". Variable names were restricted to eight characters. If you have looked at the modern assembly generated by your compiler, it can be even more cryptic, due to a variety of addressing modes and instructions present today that did not exist back then.
As a result, it was close to impossible to "read the code" and understand a single thing about what it was supposed to be doing, or what the programmer thought he was doing, with the instructions given.
A consequence of that was the IBM coding standard, which for all I know was around since the 1960's: block comments on every "function" (subroutine at the time) AND a comment on every assembly line.
The block comment provided an overview of what the code was doing and how, such as:
Parsing an interrupt from a controller; the interrupt number is in R0 and the controller code and sub-command are mapped into the upper and lower halves of R7. We find a function table for this interrupt, and call the routine indexed by the controller code, with the sub-command in R0.
The actual code had a comment on EVERY line, with no exceptions, explaining what the assembler code was doing. For example:
MOV R7, R1 # Make a copy of controller code and sub-command.
AND 0xFF, R1 # Isolate just sub-command, for later.
SHR $5,R7 # find offset into function address table.
And so on. This is also the origin (I am pretty sure) of the rule "don't repeat what you are doing in the comments." This is not helpful:
SHR $5,R7 # shift R7 5 bits right!
Presume I can read code; I don't need you to tell me that, I need you to tell me WHY you are shifting R7 5 bits right. Reading that early OS code, all the comments were aligned in column 40 (of an 80 column screen) and reading down the left half was WHAT you were doing, reading down the right half was WHY you were doing it.
I am not recommending this for C, C# or C++. To a small extent they can be self-documenting:
sec += uSec / 1000000.; // no comment necessary.
But if you can see why this pervasive commenting was necessary when thousands of coders were writing straight-up assembly for millions of lines of code with hundreds of devices, and any of them might quit any day, you can understand the spirit of how your modern coding commentary should be:
It should explain the code and what you are doing well enough that another programmer, of your skill but unfamiliar with the coding problem being solved, can get into the routine and the details and find a bug.
Without wondering WTF you are thinking or checking with:
if (x & 0xA013 || y & 0x03) var3+=7;
Your code should not be cryptic.
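One way to take the mystery out of a line like that is to name the magic numbers and the condition. In the sketch below the masks 0xA013 and 0x03 and the increment 7 are kept from the line above, but every name and every stated meaning is hypothetical, since the original gives no hint of what the bits stand for.

```cpp
#include <cstdint>

// All names and bit meanings here are invented for illustration.
constexpr std::uint32_t kDeviceFaultMask = 0xA013;  // any fault bit reported by the device
constexpr std::uint32_t kRetryPendingMask = 0x03;   // either retry queue is non-empty
constexpr int kRecoveryPenalty = 7;                 // extra cost charged when recovery is needed

// True if the device reports a fault or a retry is still pending.
bool needsRecovery(std::uint32_t deviceStatus, std::uint32_t queueStatus)
{
    return (deviceStatus & kDeviceFaultMask) != 0
        || (queueStatus & kRetryPendingMask) != 0;
}

// Returns the routing cost, raised by kRecoveryPenalty when recovery is needed.
int adjustedCost(int cost, std::uint32_t deviceStatus, std::uint32_t queueStatus)
{
    // A faulty device or a pending retry makes this route slower, so penalize it.
    if (needsRecovery(deviceStatus, queueStatus)) {
        cost += kRecoveryPenalty;
    }
    return cost;
}
```

The logic is the same as the one-liner; the names carry the WHAT, and the single remaining comment carries the WHY.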
It is difficult for comments on lines to get stale; if the line is fixed I fix the comment at the same time.
It is easy for block comments on functions (or whole files) to get stale; it takes an effort to fix a function, test it, and then go back and fix the block comment. So it is better to keep the block comment informative but not detailed. For example, you may do a table search, but you don't have to specify whether you are doing a binary search, a hash table search, or whatever. You search! Exactly how is in the code, and commentary can be found there if appropriate (and may not be needed if your code calls "binSearch(&Table, N, Key);" or something like that). You aren't commenting for novices, but for programmers.
In short, the block comment is an abstract of what the function does for the caller, kind of like how you would think of it when you are calling it in other code; its reason for being there. For example: "It initializes all the disk controller chips, using the table defined by the machine configuration, and leaves them in a ready state, or in a disabled state if the disk hardware fails its self-test."
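A sketch of that split, using the table-search case: the block comment tells the caller what the function does for them, and the choice of search lives next to the code that performs it. `Device` and `findDevice` are hypothetical names for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Device {
    int id;
    std::string name;
};

// Looks up a device by id in the configuration table.
// The table must be sorted by id. Returns nullptr if the id is not present.
const Device* findDevice(const std::vector<Device>& table, int id)
{
    // Binary search: std::lower_bound returns the first entry whose id is
    // not less than the key, so one equality check tells us if we found it.
    auto it = std::lower_bound(
        table.begin(), table.end(), id,
        [](const Device& d, int key) { return d.id < key; });
    if (it == table.end() || it->id != id) {
        return nullptr;
    }
    return &*it;
}
```

If the lookup later changes to a different search that still needs the sorted table, only the inline comment has to change; the block comment stays true.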
That said, modern code can contain a plethora of library or 3rd party package calls that even a programmer will not know, so if your library calls or methods (or their arguments) are complex or have cryptic names, it may be helpful (both to other programmers, and yourself in six months) to explain what a called routine is supposed to be doing, in particular if it does a lot of work and takes a ton of arguments (like a JavaScript library animating a chart on the user's screen, using several tables and levels of data).
In such cases, I tend to go multi-line on the function call, putting 1 or a few arguments on each line with commentary explaining them, as I see necessary.
ret = rend3D3LSv( // render chart, 3-D, 3 level, side-viewer,
cardata, nCar, nDim, // the car data we just selected, rows and columns,
pX, pY, lX, lY, // screen box chart window dimensions,
0.5, // Limited rotation to 180d,
And so on. Commentary should be maintained with changes, but it should not be HARD to maintain. Even in C, it should be designed to help debug or update the code when that happens a few years from now when you've written a hundred thousand other lines of code (everything changes sooner or later, from the trivial like how input data is formatted, to the major like which libraries or rendering packages your software must use).
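For a complete version of the multi-line call style described above, here is a sketch in C++. The function `rndCh3` is a made-up stand-in for a third-party call with a cryptic name; its signature, return value, and the pixel values passed to it are all assumptions for the example.

```cpp
#include <vector>

// Hypothetical stand-in for a library call with many positional arguments.
// Returns the number of cells drawn, or -1 if any argument is invalid.
int rndCh3(const double* data, int rows, int cols,
           int x, int y, int width, int height,
           double rotationLimit)
{
    if (data == nullptr || rows <= 0 || cols <= 0 || width <= 0 || height <= 0
        || rotationLimit < 0.0 || rotationLimit > 1.0) {
        return -1;
    }
    (void)x;  // the position is unused in this stand-in
    (void)y;
    return rows * cols;
}

int renderSelectedCars(const std::vector<double>& carData, int nCar, int nDim)
{
    return rndCh3(                   // render the chart,
        carData.data(), nCar, nDim,  // the car data just selected: rows, columns,
        40, 60,                      // top-left corner of the chart window, in pixels,
        640, 480,                    // width and height of the chart window, in pixels,
        0.5);                        // fraction of a full turn: limits rotation to 180 degrees.
}
```

Each line of the call answers the question a reader would otherwise have to look up: what is this argument, and why this value?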
Comments are a record of your analytic thought, so a future programmer (including yourself) is not stuck trying to read your mind-of-the-moment in order to use, debug, or update your code. To the extent it is truly obvious what you are doing in your code, you don't have to supply redundant comments. Determining what will BE truly obvious to another person or yourself in a year is just a skill you will have to learn; but it is better to err on the side of caution and comment. A few seconds now can save you an hour of frustration later. Certainly, any function call you had to look up to be able to call it, you should comment as you call it.
Like IBM, I am still in the habit of writing code/comment together, while the thought of "why" is in my head. I do not go back over written code and try to come up with the comments then. Likewise, my block comments on functions are written BEFORE the function (although I have usually written the prototype and args first and know what I intend to DO with the function).
However, I warrant there is leeway on complex calls used all the time in some field. If I have 200 calls to a linear algebra library in my code, I don't comment every time I call a matrix multiply, on the grounds that a programmer reading this code knows (or should know) what a DGEMM() looks like and doesn't need me holding their hand. Or my comment isn't about DGEMM(); it is something helpful like
// get first intermediate vector from partial right hand side.
This is quite different from writing user manuals and documentation. I've done that but have no particular insight or advice, so I will leave it for others to explain.