What strategies are there to document data lineage and keep it updated with a minimum amount of maintenance?

Quickly communicating data lineage to other stakeholders in our organization has become increasingly difficult as we scale.

What are effective strategies to address this and keep it maintained?

An example would be customer data that is stored in a data warehouse, processed in various ways, and used for analysis and reporting. The audience would be members of the business intelligence and analytics teams as well as product managers. The data lineage can change as developers add or modify the code.

This post was sourced from https://writers.stackexchange.com/q/33479. It is licensed under CC BY-SA 3.0.

2 answers

As your question is fairly general, this answer is too.

I'd record the ETL process itself (what it reads, what it writes, and how) in the target database, alongside the data it produces. Since you have on-site developers, this will have to become part of your extraction development work. It's best to add a short architecture description and version information as well, pitched at the level of understanding of your consumers.
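One way to picture "putting the ETL process into the target database" is a small metadata table that each job writes to when it is deployed. The sketch below uses SQLite only so it is self-contained; the table name `etl_process`, its columns, and the helper functions are hypothetical illustrations, not a standard schema.

```python
import sqlite3

# Hypothetical metadata table: one row per version of each ETL process,
# stored in the same (target) database as the data the process produces.
SCHEMA = """
CREATE TABLE IF NOT EXISTS etl_process (
    process_name  TEXT NOT NULL,
    version       INTEGER NOT NULL,
    source_tables TEXT NOT NULL,   -- comma-separated upstream tables
    target_table  TEXT NOT NULL,
    description   TEXT,            -- short architecture note for consumers
    PRIMARY KEY (process_name, version)
)
"""


def register_process(conn, name, version, sources, target, description):
    """Record (or overwrite) one version of an ETL process."""
    conn.execute(
        "INSERT OR REPLACE INTO etl_process VALUES (?, ?, ?, ?, ?)",
        (name, version, ",".join(sources), target, description),
    )
    conn.commit()


def current_lineage(conn, target):
    """Return (process, version, sources) of the highest-versioned row feeding `target`."""
    row = conn.execute(
        "SELECT process_name, version, source_tables FROM etl_process "
        "WHERE target_table = ? ORDER BY version DESC LIMIT 1",
        (target,),
    ).fetchone()
    if row is None:
        return None
    name, version, sources = row
    return name, version, sources.split(",")
```

For example, if a hypothetical `load_customers` job is re-registered as version 2 after it starts joining `billing.accounts`, `current_lineage(conn, "dw.customer")` reports the new source list without anyone editing a separate document. This assumes developers call the registration step as part of deploying a changed job, which is what keeps maintenance low.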

While you're at it, I'd add a similar step to the extraction run itself: even a simple run date and a comments field can be worth their weight in gold when you try to analyze trends a year later.
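A run log of this kind can be a single table written at the end of each extraction run. This is a minimal sketch under the assumption of a SQLite target; `etl_run_log`, `log_run`, and `annotated_runs` are hypothetical names chosen for illustration.

```python
import sqlite3
from datetime import date

# Hypothetical run-log table: one row per execution of an extraction job.
SCHEMA = """
CREATE TABLE IF NOT EXISTS etl_run_log (
    run_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    process_name TEXT NOT NULL,
    run_date     TEXT NOT NULL,   -- ISO-8601 date, so text order = date order
    comment      TEXT             -- free-text note, e.g. "late source file"
)
"""


def log_run(conn, process_name, run_date, comment=None):
    """Append one run record; `run_date` is a datetime.date."""
    conn.execute(
        "INSERT INTO etl_run_log (process_name, run_date, comment) VALUES (?, ?, ?)",
        (process_name, run_date.isoformat(), comment),
    )
    conn.commit()


def annotated_runs(conn, process_name):
    """Commented runs, oldest first: context to check before reading a trend."""
    return conn.execute(
        "SELECT run_date, comment FROM etl_run_log "
        "WHERE process_name = ? AND comment IS NOT NULL ORDER BY run_date",
        (process_name,),
    ).fetchall()
```

Storing the date as ISO-8601 text keeps the table portable while still sorting chronologically; a native date type would work equally well in a database that has one.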

This post was sourced from https://writers.stackexchange.com/a/33480. It is licensed under CC BY-SA 3.0.

I worked on a project that sounds similar to yours in which the ability to identify every process that touched every piece of data was vital in order to minimize the amount of code validation that had to be redone whenever any other piece of code changed.

To show the flow of data through processes, we used a three-column vertical swim-lane diagram in which the first column was the input, the middle column the process, and the third column the output.
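If the diagram is kept as data rather than only as a drawing, each row is simply an (input, process, output) triple. The sketch below is an assumption about how one might store and print such rows as a plain-text three-column table; `LaneRow` and `render` are hypothetical names, not part of the approach described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneRow:
    """One row of the swim-lane diagram: input -> process -> output."""
    input: str
    process: str
    output: str


def render(rows):
    """Render rows as a plain-text table with Input | Process | Output columns."""
    header = ("Input", "Process", "Output")
    cells = [header] + [(r.input, r.process, r.output) for r in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(c[i]) for c in cells) for i in range(3)]
    lines = [" | ".join(c[i].ljust(widths[i]) for i in range(3)).rstrip() for c in cells]
    # Separator under the header row.
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)
```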

Of course, an output could also be the input to another process, in which case we drew a flow line looping it back to the first column. These diagrams could get quite long, but the virtue was that you could follow any piece of data back up an easily traced path through every process it passed through, and identify any other data that a process merged it with. And because the inputs, processes, and outputs were in separate columns, you could easily find any input, output, or process you were interested in and trace its role in the system.
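The "follow any data back up the path" step amounts to a backward traversal over the diagram's rows, where a looped-back output is just a dataset that appears in both the output and input columns. This is a minimal sketch assuming the rows are kept as (input, process, output) tuples; the dataset and process names are hypothetical.

```python
from collections import deque

# Each row of the three-column diagram as (input, process, output).
# All names below are hypothetical examples.
ROWS = [
    ("crm.customers",    "clean_customers", "stg.customers"),
    ("billing.accounts", "join_billing",    "dw.customer"),
    ("stg.customers",    "join_billing",    "dw.customer"),  # output looped back as input
    ("dw.customer",      "build_report",    "rpt.churn"),
]


def trace_upstream(rows, target):
    """Walk back from `target` through every process that produced it.

    Returns (datasets, processes): every upstream dataset, including data
    merged in by a shared process, and every process on the way.
    """
    datasets, processes = set(), set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for inp, proc, produced in rows:
            if produced == current:
                processes.add(proc)
                if inp not in datasets:  # visited check also guards against cycles
                    datasets.add(inp)
                    queue.append(inp)
    return datasets, processes
```

Tracing `"rpt.churn"` returns all four upstream datasets and all three processes, including `billing.accounts`, which only reaches the report because `join_billing` merges it with the customer data.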

Updating it was reasonably simple as well. If you added a new process, you added a new row to the diagram, inserted the process in the middle column, and then reconnected the data input and output lines. If you added a new data source, you added a new row, inserted the new source, and connected it to the relevant processes.

We also used shading to distinguish between internal and external inputs and processes.
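If the diagram is generated rather than hand-drawn, the internal/external shading can be one more attribute per node. The sketch below emits Graphviz DOT text, with `rankdir=LR` giving a left-to-right input, process, output flow as a rough stand-in for the swim-lane columns; the node catalogue, names, and colours are hypothetical choices for illustration.

```python
# Hypothetical node catalogue: name -> (kind, is_external).
NODES = {
    "crm.customers":   ("data", True),
    "clean_customers": ("process", False),
    "stg.customers":   ("data", False),
    "vendor_enrich":   ("process", True),
}
EDGES = [
    ("crm.customers", "clean_customers"),
    ("clean_customers", "stg.customers"),
    ("stg.customers", "vendor_enrich"),
]


def to_dot(nodes, edges):
    """Emit Graphviz DOT: processes are boxes, external nodes are shaded grey."""
    lines = ["digraph lineage {", "  rankdir=LR;"]
    for name, (kind, external) in nodes.items():
        shape = "box" if kind == "process" else "ellipse"
        fill = "lightgray" if external else "white"
        lines.append(f'  "{name}" [shape={shape}, style=filled, fillcolor={fill}];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```

Writing the output to a file and running `dot -Tsvg lineage.dot -o lineage.svg` would produce the picture, so the drawing can be regenerated whenever the node list changes.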
