Post History

Q&A How to create a useful diff of a markdown writing project iteration

posted 7y ago by Amadeus‭ · last activity 5y ago by System‭

Answer

#4: Attribution notice removed by

System‭ · 2019-12-19T22:13:16Z (over 5 years ago)

Source: https://writers.stackexchange.com/a/32239
License name: CC BY-SA 3.0
License URL: https://creativecommons.org/licenses/by-sa/3.0/

#3: Attribution notice added by

System‭ · 2019-12-08T07:36:39Z (over 5 years ago)

#2: Initial revision by

deleted user · 2019-12-08T07:36:39Z (over 5 years ago)

I presume you have already tried the flags --minimal and --ignore-all-space.

If line breaks caused by added, deleted, or changed words are the problem (diff reports reflowed lines as changes even though only the formatting moved), I would suggest pre-processing both files to produce two new files for comparison. You can program this yourself fairly easily.

The idea is to give diff ONE line to process per **_element_**. If it were just a story, I'd combine all the lines of each paragraph into a single line, even if it is thousands of characters long, so that diff compares each **paragraph** for changes, not each **line**.
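A minimal sketch of that paragraph-joining step in Python. It assumes paragraphs are separated by one or more blank lines, and the function name `join_paragraphs` is made up for illustration:

```python
def join_paragraphs(text: str) -> list[str]:
    """Collapse each blank-line-separated paragraph into a single line."""
    paragraphs = []
    words = []  # words of the paragraph currently being built
    for line in text.splitlines():
        if line.strip():
            # Splitting on whitespace also normalizes runs of spaces.
            words.extend(line.split())
        elif words:
            # A blank line ends the paragraph being built.
            paragraphs.append(" ".join(words))
            words = []
    if words:
        paragraphs.append(" ".join(words))
    return paragraphs


old = "It was a dark\nand stormy night.\n\nThe rain fell."
reflowed = "It was a dark and\nstormy night.\n\nThe rain fell."

# The same words with different line breaks produce identical output lines.
print(join_paragraphs(old) == join_paragraphs(reflowed))  # True
```

Writing `"\n".join(join_paragraphs(...))` to a new file for each version gives diff one line per paragraph to compare.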

(If you use diff's side-by-side output, **--side-by-side**, also pass the **--width=nnn** flag so that long lines are not truncated; I'd set **--width=16384** or so.)

You'll have to recognize, based on your formatting, where each type of **element** begins and ends. I am not familiar with play formatting, but in a screenplay I would want to compare slug lines (setting), exposition paragraphs, character dialogue labels, and character dialogue itself.

In a screenplay, I think these elements could probably be identified by counting leading spaces and/or checking for ALL CAPS or keywords (INT., EXT., etc.), and by using blank lines as signals (if the script is double-spaced, ignore a single blank line, but treat two in a row as the end of an element). I would write a simple state machine (each state being the type of element I am currently building for output, or that I am seeking the start of the next element), but the logic is pretty simple no matter how you code it.
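As a rough sketch of such a state machine in Python, under several assumptions that will not fit every script: the text is single-spaced, slug lines start with `INT.` or `EXT.`, a character label is an ALL CAPS line followed directly by that character's dialogue, and a blank line ends the current element. The names `parse_elements` and `SLUG` are hypothetical.

```python
import re

# Assumption: slug lines begin with INT. or EXT.
SLUG = re.compile(r"^(INT\.|EXT\.)")


def parse_elements(lines):
    """Return a list of (kind, text) pairs, one per screenplay element."""
    elements = []
    state = "seeking"  # one of: seeking, exposition, dialogue
    buf = []           # lines of the element currently being built

    def flush(kind):
        if buf:
            elements.append((kind, " ".join(buf)))
            buf.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes whatever element is being built.
            if state in ("exposition", "dialogue"):
                flush(state)
            state = "seeking"
        elif state == "seeking":
            if SLUG.match(line):
                elements.append(("slug", line))
            elif line.isupper():
                # ALL CAPS line: treat it as a character label.
                elements.append(("character", line))
                state = "dialogue"
            else:
                buf.append(line)
                state = "exposition"
        else:
            # Continuation of the exposition or dialogue being built.
            buf.append(line)
    if state in ("exposition", "dialogue"):
        flush(state)
    return elements


script = """EXT. PARK, NIGHT

Diane and Edgar jog along
the path.

DIANE
So I told my boss
I quit.

EDGAR
You did not.
""".splitlines()

for kind, text in parse_elements(script):
    print(kind, "|", text)
```

A real script would likely need more rules (parentheticals, transitions, double spacing), but each one is just another state or another test in the `seeking` branch.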

That way, five lines of dialogue become a single line in the output file. A new scene in the output might have line breaks like this:

> EXT. Park, NIGHT
> 
> [five lines of exposition combined here, setting up DIANE and EDGAR jogging, about to be mugged]
> 
> DIANE
> 
> [ten lines of dialogue combined on this line, a story about work]
> 
> EDGAR
> 
> [two or three lines combined here]

That should ignore any reflow changes made by the editor and identify only where the words themselves changed.

However, it becomes harder to identify where in the original files the text appeared. To solve that, I would add more to the output file, and use another flag of diff: **--ignore-matching-lines=RE**

RE here is a regular expression. The idea is that before (or after, your choice) you output any built-up element, you write a separate line reporting the line numbers from the original file and the modified file, prefixed with a marker that won't be found at the start of any line of the text, such as the hash sign '#', or a pair of them. Since the marker is always the first character of the 'notation' line, set your regular expression to **'^#'** and those lines will be ignored. A line in the output might look like

> # 1023 1027

meaning line 1023 in the first file, line 1027 in the second. If you prefer, make it more complex by counting pages too. I haven't tried it, but I think these should be output as part of the context when reporting a changed line.
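A small sketch of writing those notation lines in Python. It makes one simplifying assumption that departs from the two-number example above: each file is pre-processed on its own, so each marker carries only the line number from its own source file. The function name `with_markers` is made up.

```python
def with_markers(elements, marker="#"):
    """Interleave a notation line before each joined element.

    `elements` is an iterable of (source_line_number, joined_text) pairs,
    where the number is the line on which the element started in its own
    source file (an assumption of this sketch).
    """
    out = []
    for line_no, text in elements:
        # Notation line; matched by the regular expression '^#'.
        out.append(f"{marker} {line_no}")
        out.append(text)
    return out


lines = with_markers([(1023, "DIANE"), (1024, "So I told my boss I quit.")])
print("\n".join(lines))
```

The two pre-processed files would then be compared with `--ignore-matching-lines='^#'`. If the source is Markdown, headings also begin with '#', so a marker that cannot start a real line of text (for example `#@`, with the expression `'^#@'`) may be a safer choice.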

In side-by-side mode you can also tell diff to output only the changed lines, with **--suppress-common-lines**.
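To pull the flags mentioned in this answer together, here is a hedged sketch that only builds the command line for GNU diff in Python. The helper name `build_diff_command` and the two file names are hypothetical, and whether you want every flag at once depends on your output preferences.

```python
def build_diff_command(old_path, new_path, width=16384, marker_re="^#"):
    """Return the diff invocation as an argument list (nothing is executed)."""
    return [
        "diff",
        "--minimal",
        "--ignore-all-space",
        "--side-by-side",                        # two-column output
        f"--width={width}",                      # keep long joined lines from being cut off
        "--suppress-common-lines",               # show only the lines that changed
        f"--ignore-matching-lines={marker_re}",  # skip the notation lines
        old_path,
        new_path,
    ]


# Hypothetical names for the two pre-processed files.
cmd = build_diff_command("old.joined.txt", "new.joined.txt")
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run()` would execute it, assuming GNU diff is installed and both files exist.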

That may not be exactly what you want; but once you **identify** the changes, you might be able to highlight them somehow in the new script (color or bolding or whatever) as a practice script.

Since you are identifying elements and in particular wish to isolate changed dialogue, it would be easy in your code to do all of this but suppress the output (in the two files to compare) of anything BUT the dialogue of your one character. The '#' notation lines carrying the original file line numbers would stay the same.

If you make the character name a variable or an argument to the pre-processing program, you can make practice scripts for each character.
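A minimal sketch of that per-character filter in Python, assuming the pre-processor has already produced a list of `(kind, text)` pairs in which each `"character"` label is followed by that character's `"dialogue"`. The function name `dialogue_for` and the kind labels are illustrative, not a fixed format.

```python
def dialogue_for(elements, character):
    """Keep only the dialogue elements spoken by `character`."""
    kept = []
    speaker = None  # most recent character label seen
    for kind, text in elements:
        if kind == "character":
            speaker = text
        elif kind == "dialogue" and speaker == character:
            kept.append(text)
    return kept


elements = [
    ("slug", "EXT. PARK, NIGHT"),
    ("exposition", "Diane and Edgar jog along the path."),
    ("character", "DIANE"),
    ("dialogue", "So I told my boss I quit."),
    ("character", "EDGAR"),
    ("dialogue", "You did not."),
]

print(dialogue_for(elements, "DIANE"))
```

Reading the name from the command line (for example via `sys.argv[1]`) instead of hard-coding it would let one program produce a practice script per character.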

Have fun coding.

#1: Imported from external source by

System‭ · 2017-12-29T12:14:20Z (over 7 years ago)

Original score: 1
