How should I document a database schema?
I am going to be writing some user-facing documentation for a database that visitors can query. That is, the people writing queries are not the ones who created the database; they can come in, look at what the database offers, and write their SQL queries. Think of something like SEDE.
I could just give people the database schema, but that only tells you so much. For example, if a value is an integer but it's an enumeration (1 means this, 2 means that, 3 means this other thing, etc), you can't get that from the schema alone. So I need to produce some documentation for the tables and columns in the database, which also provides me the opportunity to present the information in whatever organizational structure makes the most sense for our users.
My question is: are there best practices for going from a DB schema to documentation? If this were an API, for example, then in some languages I could write inline comments that could be automatically turned into reference documentation using tools like Javadoc or Doxygen. Is there something like that for DB schemas, or would I have to write my own transformation tools (which isn't worth the effort for the size of this project)? Or do people writing schemas just inspect the schema and write separate documentation? Are there conventions for doing this? I'm an experienced technical writer but I'm new to databases.
(In case it matters, I need to deliver HTML. This is a volunteer project (not actually open-source, but along those lines) and I have no budget for tools.)
3 answers
If this is user-facing documentation, then make up a data dictionary that describes the tables and columns with supplementary blurbs about the meaning of the data (e.g. the meanings of specific values in a column). This can be a straightforward HTML document with the supplementary descriptions as text.
If you need to produce E/R diagrams then Visio professional (version 2010 and earlier) has a passably good database diagramming feature and should be fairly widely available. There are other tools that will do this as well. Unfortunately Visio's SVG output is pretty crap and I'm not aware of any modelling tool that will produce good SVG diagrams.
Dia is an open-source diagramming tool and has a UML modeller which could be co-opted to produce usable E-R diagrams that will render in SVG. If you can live with bitmap illustrations of the data model then you can get away with taking screenshots from Visio. In this case, split the model into subsystems and make illustrations of the parts.
For user documentation this will be far more useful than any model-based approach.
This post was sourced from https://writers.stackexchange.com/a/12635. It is licensed under CC BY-SA 3.0.
0 comment threads
It depends on your audience.
Blessed Geek suggested UML enumeration, but this assumes that the users understand UML. Your question shows that you know technology well enough to know the capabilities of your users.
Writers often use Javadoc or Doxygen to generate API documentation because those tools easily generate new docs when the API itself changes. Updating the documentation as the code changes is useful when the code is changing a lot. For your database schema, however, that may not be a consideration. Schemas don't change very often, and when they do, they're usually added to. The technical impact of deleting or modifying fields in a database is large.
It's easy enough to print out field names and types from a database using its tools. From this printout, you can produce an HTML table that has a blank column where you can provide more documentation about the field. An image that's something like a simplified UML diagram can help readers understand how the tables of the database are related to each other. If you want some additional ideas, take a look at Contacts Provider in the Android documentation.
This post was sourced from https://writers.stackexchange.com/a/12629. It is licensed under CC BY-SA 3.0.
0 comment threads
While the other answers already mention tools to create graphical presentations of your tables / columns and ways to add short information to that, for example by creating an HTML table with an empty column that you can fill with additional information, it's important to remember what people are looking for when viewing that documentation. You already are concerned about people not knowing certain implicit restrictions in the data, such as the integer that is basically just a key in an enumeration as in your example.
Sometimes it might be better to create a dimension table that can be joined with the table containing such data to give the users a better idea about what the data is while directly viewing the results of their queries. But it might not always be possible to add more tables / columns to the database. Many databases allow comments to be added to columns, tables and other objects, which might be a good starting point for your documentation and for extracting the most relevant parts because these can simply be queried. Programmers are sometimes known not to take comments too seriously though, which means that you might need to come up with the descriptions yourself.
And to be honest, there is no right approach in such a situation. You mentioned that you need to produce HTML in the end so you should focus on creating webpages that make it easy to find what you are looking for by dividing it into appropriate chapters, such as a general introduction and more detailed pages for individual categories. The same could be said about a Microsoft Word document or something similar. The final representation might be different, but for such documentation that is directed at technical folks to understand how to write database queries you will mostly wind up with similar results - whether you have a big document in a central place such as SharePoint with lots of different levels of headings you can jump to or a webpage that you can view in your browser with an overview page from where you can go to each "chapter" doesn't really make a difference. A webpage has quite a view advantages though, especially with larger pictures and in cases where you want to show tables with lots of columns. If the database you are working on is similar to SEDE you might even want to link example queries that the user can just try and fork for themselves later.
While one part of the site could be dedicated to graphical presentations of the data and making it easy to see how the available tables belong together, another part might illustrate what tables are available, or example via a simple search. You could add a short "management summary" style comment to each table that mentions what the table is about and whether there is something to be aware of in case someone wants to look at the data. From there you can link to specific pages for each table which contains information about its columns, again with a short description. In cases where there are hidden "gotchas" such as the one you mention you should mark those tables and columns on their respective pages in a way that shows your readers that they should take a closer look and link to a detailed page describing the problems.
Furthermore you should have a page dedicated to such "gotchas" where you describe in prose what the problems with some of the data or datastructures are and link to the respective detail pages. This could for example be done in an order that shows how problematic something is. Make it easy to find and easy to see at first glance that there are some things that you need to be careful about when working with the database.
I once encountered a database that I couldn't change and needed to document some things for others to read up on the database design. Nobody wanted to touch the thing and I soon understood why. There were... a couple problems with the design. For example I couldn't create a userful graphical presentation of how the tables were connected from the database because while the database allowed people to use things such as Foreign Keys to link tables the designers of this specific application opted for not using such contrived things - instead there was a column in some of the tables that listed Tuples of Table Name, Column Name and Values. That needed a lot of prose to describe to others how they could hope to query the database with expansive examples for the most difficult and important things that you might want to know, as well as common errors that happened while creating queries.
Most of the time a picture is worth a thousand words - but sometimes you need a thousand words because nobody would understand the picture.
1 comment thread