Post History


Q&A What strategies are there to document data lineage and keep it updated with a minimum amount of maintenance?

I worked on a project that sounds similar to yours in which the ability to identify every process that touched every piece of data was vital in order to minimize the amount of code validation that ...

posted 7y ago by Mark Baker · last activity 5y ago by System

Answer

#4: Attribution notice removed by

System‭ · 2020-01-03T20:41:56Z (over 5 years ago)


Source: https://writers.stackexchange.com/a/33481
License name: CC BY-SA 3.0
License URL: https://creativecommons.org/licenses/by-sa/3.0/

#3: Attribution notice added by

System‭ · 2019-12-08T08:02:04Z (over 5 years ago)


Source: https://writers.stackexchange.com/a/33481
License name: CC BY-SA 3.0
License URL: https://creativecommons.org/licenses/by-sa/3.0/

#2: Initial revision by

System‭ · 2019-12-08T08:02:04Z (over 5 years ago)


I worked on a project that sounds similar to yours. On that project, the ability to identify every process that touched every piece of data was vital, because it minimized the amount of code validation that had to be redone whenever any other piece of code changed.

To show the flow of data through processes, we used a three-column vertical swim lane diagram in which the first column was the input, the middle column was the process, and the third column was the output.

Of course, the output could be the input to another process, in which case we would draw a flow line looping it back to the first column. These diagrams could get quite long, but the virtue was that you could follow any data back up an easily traced path through all the processes it passed through and identify any data that it was melded with by any process. And because the inputs, processes, and outputs were in separate columns, you could easily identify any input, output, or process you were interested in and trace its role in the system.
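As a rough sketch of the same idea in code (this is not something we built on that project, and the process and data names are made up for illustration), each row of the diagram can be recorded as a process with its inputs and outputs. Tracing a piece of data back is then a walk upstream through those rows:

```python
from collections import namedtuple

# One row of the three-column diagram: inputs | process | outputs.
Row = namedtuple("Row", ["inputs", "process", "outputs"])

# Hypothetical example rows; the names are illustrative only.
ROWS = [
    Row(["raw_orders"], "clean_orders", ["orders"]),
    Row(["raw_customers"], "clean_customers", ["customers"]),
    Row(["orders", "customers"], "join_orders", ["order_report"]),
]


def trace_back(data, rows):
    """Return (processes, sources) found upstream of a piece of data."""
    # Map each output to the row that produces it.
    producers = {out: row for row in rows for out in row.outputs}
    processes, sources, seen = [], set(), set()
    stack = [data]
    while stack:
        item = stack.pop()
        if item in seen:
            continue  # already visited; also guards against loops
        seen.add(item)
        row = producers.get(item)
        if row is None:
            sources.add(item)  # nothing produces it: an original input
            continue
        if row.process not in processes:
            processes.append(row.process)
        stack.extend(row.inputs)  # keep following the flow lines upstream
    return processes, sources


processes, sources = trace_back("order_report", ROWS)
```

Here `processes` collects every process the data passed through, and `sources` collects the original inputs it was built from, which is the same information you get by following the flow lines back up the diagram.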

Updating it was reasonably simple as well. If you added a new process, you added a new row to the diagram, inserted the process in the middle column, and then reconnected the data input and output lines. If you added a new data source, you added a new row, inserted the new source, and connected it to the relevant processes.
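If the rows are kept as data rather than drawn by hand, the update step becomes appending a record and regenerating the view. The following is only a minimal sketch under that assumption; the `render` function and the sample names are hypothetical, and the plain-text table stands in for the real diagram:

```python
def render(rows):
    """Render (inputs, process, outputs) rows as a three-column text table."""
    header = ("INPUT", "PROCESS", "OUTPUT")
    cells = [header] + [
        (", ".join(ins), proc, ", ".join(outs)) for ins, proc, outs in rows
    ]
    # Size each column to its widest cell so the columns line up.
    widths = [max(len(row[i]) for row in cells) for i in range(3)]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in cells
    )


# Hypothetical starting diagram with a single process.
rows = [
    (["raw_orders"], "clean_orders", ["orders"]),
]

# Adding a new process means adding a new row; its input "orders" is the
# output of the row above, which is the "loop back" line in the diagram.
rows.append((["orders"], "summarize", ["order_summary"]))

table = render(rows)
```

Printing `table` gives one line per process with the input, process, and output in separate columns, so the maintenance cost of a change stays at roughly one row.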

We also used shading to distinguish between internal and external inputs and processes.

#1: Imported from external source by

System‭ · 2018-02-03T13:49:58Z (over 7 years ago)


Original score: 3
