Post History

What strategies are there to document data lineage and keep it updated with a minimum amount of maintenance?

posted 7y ago by Mark Baker · last activity 5y ago by System

Answer
#4: Attribution notice removed by System · 2020-01-03T20:41:56Z (almost 5 years ago)
Source: https://writers.stackexchange.com/a/33481
License name: CC BY-SA 3.0
License URL: https://creativecommons.org/licenses/by-sa/3.0/
#3: Attribution notice added by System · 2019-12-08T08:02:04Z (almost 5 years ago)
#2: Initial revision by System · 2019-12-08T08:02:04Z (almost 5 years ago)
I worked on a project that sounds similar to yours in which the ability to identify every process that touched every piece of data was vital in order to minimize the amount of code validation that had to be redone whenever any other piece of code changed.

To show the flow of data through processes, we used a three-column vertical swim-lane diagram in which the first column was the input, the middle column was the process, and the third column was the output.
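If the diagram is kept as data rather than only drawn by hand, each row can be stored as a record of inputs, one process, and outputs. The following is a minimal Python sketch under that assumption, not the tooling used on the project: the `Row` class, the `render` function, and all data and process names are hypothetical, and plain-text columns stand in for the drawn swim lanes.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One row of the swim-lane diagram (hypothetical model): inputs -> process -> outputs."""
    inputs: list[str]
    process: str
    outputs: list[str]


def render(rows: list[Row], width: int = 20) -> str:
    """Render the rows as a plain-text table with input, process, and output columns."""
    lines = [f"{'INPUT':<{width}}| {'PROCESS':<{width}}| OUTPUT"]
    for row in rows:
        # A row is as tall as its longest data column; pad the shorter one with blanks.
        height = max(len(row.inputs), len(row.outputs), 1)
        ins = row.inputs + [""] * (height - len(row.inputs))
        outs = row.outputs + [""] * (height - len(row.outputs))
        for i in range(height):
            proc = row.process if i == 0 else ""  # name the process once per row
            lines.append(f"{ins[i]:<{width}}| {proc:<{width}}| {outs[i]}")
    return "\n".join(lines)


# Hypothetical example: the output of the first process is the input to the second.
rows = [
    Row(["raw_orders", "currency_rates"], "normalize_orders", ["orders_usd"]),
    Row(["orders_usd"], "monthly_report", ["report"]),
]
print(render(rows))
```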

Of course, the output of one process could be the input to another, in which case we drew a flow line looping it back to the first column. These diagrams could get quite long, but the virtue was that you could follow any piece of data back up an easily traced path through all the processes it passed through, and identify any other data that those processes had melded it with. And because the inputs, processes, and outputs were in separate columns, you could easily find any input, output, or process you were interested in and trace its role in the system.
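Following data back up the diagram amounts to walking the flow lines in reverse. Assuming the rows are stored as records as in the hypothetical `Row` model below, a breadth-first walk from one data item collects every process it passed through and every piece of data it was combined with along the way. This is an illustrative sketch; `trace_upstream` and the example names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Row:
    """One row of the swim-lane diagram (hypothetical model): inputs -> process -> outputs."""
    inputs: list[str]
    process: str
    outputs: list[str]


def trace_upstream(rows: list[Row], item: str) -> tuple[set[str], set[str]]:
    """Return (processes, data) that `item` depends on, directly or through earlier processes."""
    # Index which rows produce each data item, i.e. where each flow line comes from.
    producers: dict[str, list[Row]] = {}
    for row in rows:
        for out in row.outputs:
            producers.setdefault(out, []).append(row)

    processes: set[str] = set()
    data: set[str] = set()
    queue = deque([item])
    while queue:  # breadth-first walk back up the diagram
        current = queue.popleft()
        for row in producers.get(current, []):
            processes.add(row.process)
            # Every input of the process is in the lineage, including data
            # that was melded with the item being followed.
            for inp in row.inputs:
                if inp not in data:  # skip data already seen, so loops terminate
                    data.add(inp)
                    queue.append(inp)
    return processes, data


# Hypothetical example: "web_logs" and "sessionize" are unrelated to "report".
rows = [
    Row(["raw_orders", "currency_rates"], "normalize_orders", ["orders_usd"]),
    Row(["orders_usd", "customer_list"], "monthly_report", ["report"]),
    Row(["web_logs"], "sessionize", ["sessions"]),
]
procs, data = trace_upstream(rows, "report")
```

Because the walk only records each data item once, it also terminates when an output loops back as an input of an earlier process.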

Updating it was reasonably simple as well. If you added a new process, you added a new row to the diagram, inserted the process in the middle column, and then reconnected the data input and output lines. If you added a new data source, you added a new row, inserted the new source, and connected it to the relevant processes.
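The reconnecting step is the manual part of an update. One way to reduce it, assuming the rows are kept as data (an extension, not something described above), is to derive the flow lines by matching output names to input names, so adding a process only means appending a row. A minimal Python sketch; `Row`, `connections`, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One row of the swim-lane diagram (hypothetical model): inputs -> process -> outputs."""
    inputs: list[str]
    process: str
    outputs: list[str]


def connections(rows: list[Row]) -> list[tuple[str, str, str]]:
    """Derive the flow lines as (producing process, data item, consuming process)."""
    edges = []
    for producer in rows:
        for data in producer.outputs:
            for consumer in rows:
                if data in consumer.inputs:  # an output looped back as an input
                    edges.append((producer.process, data, consumer.process))
    return edges


rows = [
    Row(["raw_orders"], "normalize_orders", ["orders_usd"]),
    Row(["orders_usd"], "monthly_report", ["report"]),
]
before = connections(rows)

# Adding a process is just appending a row; no existing row is edited,
# and the new flow line is derived from the shared name "orders_usd".
rows.append(Row(["orders_usd"], "fraud_check", ["flagged_orders"]))
after = connections(rows)
```

The trade-off is that names must be used consistently: a typo in an input name silently drops a flow line instead of drawing a wrong one.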

We also used shading to distinguish between internal and external inputs and processes.

#1: Imported from external source by System · 2018-02-03T13:49:58Z (almost 7 years ago)
Original score: 3