
What new data should we import from SE?


When we set up this site, we imported content from SE as of the December data dump (the latest we had at the time). We had no way to get the delta after that, because the import code didn't use the API.

We now have better data-import tools, and there's been a new data dump. We can therefore import posts made on SE between December and now: through sometime in March via the data dump, and through the API for the rest. We can also now specify what we want via a SQL query, so we can make import decisions based on tags, question status, votes, whether the question has answers, and more.

This is for new posts; we can't integrate edits.

I think it's worth pulling in more posts, as some of our users here were still active there during that period. I also think it's worth being a little pickier than we were on the initial data load. I'm thinking about excluding anything that's downvoted on SE, and any question that's closed. Is that a good approach? If not, what should we do instead?
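
To make the proposed filter concrete, here is a rough sketch in Python. It is only an illustration: the `Post` fields, the `select_for_import` name, and the cutoff date are hypothetical and do not reflect the actual import tool or its SQL query. It also assumes that "downvoted" means a net score below zero, and that answers to a rejected question are skipped along with it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Post:
    # Hypothetical record shape; not the real data dump or API schema.
    post_id: int
    kind: str                        # "question" or "answer"
    score: int                       # net votes on SE
    created: date
    closed: bool = False             # only meaningful for questions
    parent_id: Optional[int] = None  # question id, set on answers


def select_for_import(posts: Iterable[Post], cutoff: date) -> List[Post]:
    """Pick posts newer than `cutoff` that are not downvoted or closed."""
    posts = list(posts)

    # Questions we would not want, whether they are old or new.
    # Assumption: "downvoted" means a net score below zero.
    rejected_questions = {
        p.post_id
        for p in posts
        if p.kind == "question" and (p.closed or p.score < 0)
    }

    selected = []
    for p in posts:
        if p.created <= cutoff:
            continue  # already covered by the initial import
        if p.score < 0:
            continue  # downvoted on SE
        if p.kind == "question" and p.closed:
            continue  # closed question
        if p.kind == "answer" and p.parent_id in rejected_questions:
            continue  # assumption: skip answers to rejected questions
        selected.append(p)
    return selected
```

Reading "downvoted" as a net-negative score keeps posts with mixed votes; a stricter rule (any downvote at all) would need the downvote count as a separate field.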



2 answers


I'm honestly inclined to agree that what Writing Codidact needs isn't really another data import. It's to get actual people to come here, read, contribute, and remain.

People who have posted on Writing SE are by no means legally prevented from posting that same content here, since it's still their content and SE only has a license to use it (for a wide variety of purposes, including redistributing it under a CC license). The SE terms of service clearly acknowledge this. Edits can be slightly trickier, but certainly the initial revision could be copied verbatim if the person who posted it wants to do so, and we can then tweak it here.

Also, there's a small amount of original content here that wasn't imported from SE at all. Making another import from Writing SE seems likely to bury that content, which seems to me like absolutely the wrong thing to do when we're trying to get people to come here. We should be giving people a reason to come here specifically, and I don't see how the way to do that would be to import more content from elsewhere.


Good point about burying the original content, which we need more of (and which needs to be more findable). And we should continue to prune stuff from the original import that is not helping us -- downvotes and flags are helpful there. Monica Cellio 4 months ago

Agreed. The only way this place thrives is if it becomes known as the best place to ask questions, and that only happens if it becomes known for having the best answers. Otherwise SE's first-mover advantage will be insurmountable. More vigorous curation could certainly do a lot to improve the quality of answers here. It may not be enough by itself, but without some distinguishing property in the model, I don't see how the mouse bests the elephant. Mark Baker 4 months ago


I don't know. What this place needs is not more data but more users. Bringing over a few more months of questions from SE would give the few of us who still check in here something to chew on for a little while, but I don't see it changing the dynamic. Sooner or later this place has to learn to fly on its own. I don't know what is going to spark that, but I'm skeptical that another dump of second-hand questions is going to do it.

