What new data should we import from SE?
When we set up this site we imported from SE as of the December data dump (the latest we had at the time). We didn't have a way to get the delta; the import code didn't use the API.
We now have better data-import tools, and there's been a new data dump. We can, therefore, import stuff that was posted on SE between December and now -- through sometime in March via the data dump, and through the API for the rest. We can also now specify what we want via a SQL query, so we can make import decisions based on tags, question status, votes, whether the question has answers, and more.
This would cover new posts only; we can't integrate edits to posts we've already imported.
I think it's worth pulling in more posts, as some of our users here were still active there during this period. I also think it's worth being a little pickier than we were on the initial data load. I'm thinking about excluding anything that's downvoted on SE, and any question that's closed. Is that a good approach? If not, what should we do instead?
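To make the proposed filter concrete, here is a rough sketch, not the actual import code. It assumes the import tool can run a query against a `Posts` table shaped like the one in the public SE data dump (`PostTypeId`, `ParentId`, `CreationDate`, `Score`, `ClosedDate`). The function names, the cutoff date, and the sample rows are all made up for illustration. Dropping answers whose question is excluded is also an assumption on my part, not something already decided.

```python
import sqlite3

# Column names follow the public SE data dump's Posts table; whether the real
# import tool exposes them this way is an assumption.
# PostTypeId: 1 = question, 2 = answer.
SELECT_SQL = """
SELECT p.Id
FROM Posts AS p
LEFT JOIN Posts AS q ON q.Id = p.ParentId
WHERE p.CreationDate >= :cutoff          -- only posts newer than the last import
  AND p.Score >= 0                       -- skip anything downvoted
  AND (
        (p.PostTypeId = 1 AND p.ClosedDate IS NULL)   -- open questions
     OR (p.PostTypeId = 2 AND q.Score >= 0            -- answers whose question
                          AND q.ClosedDate IS NULL)   -- would itself qualify
  )
ORDER BY p.Id
"""


def select_posts(conn, cutoff):
    """Return the Ids of posts created on or after `cutoff` that pass the filter."""
    rows = conn.execute(SELECT_SQL, {"cutoff": cutoff}).fetchall()
    return [row[0] for row in rows]


def build_demo_db():
    """Build an in-memory table with invented rows, purely for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Posts (Id INTEGER PRIMARY KEY, PostTypeId INTEGER, "
        "ParentId INTEGER, CreationDate TEXT, Score INTEGER, ClosedDate TEXT)"
    )
    rows = [
        (1, 1, None, "2020-01-10", 3, None),           # open question: keep
        (2, 2, 1, "2020-01-11", 1, None),              # answer to 1: keep
        (3, 1, None, "2020-01-12", -2, None),          # downvoted question: skip
        (4, 2, 3, "2020-01-13", 5, None),              # answer to skipped question: skip
        (5, 1, None, "2020-02-01", 0, "2020-02-02"),   # closed question: skip
        (6, 1, None, "2019-11-01", 4, None),           # before cutoff: already imported
        (7, 2, 6, "2020-01-20", 2, None),              # new answer to old question: keep
        (8, 2, 1, "2020-01-21", -1, None),             # downvoted answer: skip
    ]
    conn.executemany("INSERT INTO Posts VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(select_posts(build_demo_db(), "2020-01-01"))  # prints [1, 2, 7]
```

Tag-based or answer-count-based rules would be extra conditions in the same `WHERE` clause, so it's easy to tighten or loosen this depending on what people here want.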
2 answers
I don't know. What this place needs is not more data but more users. Bringing over a few more months of questions from SE would give the few of us who still check in here something to chew on for a little while, but I don't see it changing the dynamic. Sooner or later this place has to learn to fly on its own. I don't know what is going to spark that, but I'm skeptical that another dump of secondhand questions is going to do it.
0 comment threads
I'm honestly inclined to agree that what Writing Codidact needs isn't really another data import. What it needs is for actual people to come here, read, contribute, and stay.
People who have posted on Writing SE are by no means legally prevented from posting that same content here, since it's still their content and SE only has a license to use it (for a wide variety of purposes, including redistributing it under a CC license). The SE terms of service clearly acknowledge this. Edits can be slightly trickier, but certainly the initial revision could be copied verbatim if the person who posted it wants to do so, and we can then tweak it here.
Also, there's a small amount of original content here that wasn't imported from SE at all. Another import from Writing SE seems likely to bury that content, which seems to me like absolutely the wrong thing to do when we're trying to get people to come here. We should be giving people a reason to come here specifically, and I don't see how importing more content from elsewhere does that.