Stack Overflow Public Database Available (2017-06)

Need a huge SQL database for your data project? Here is one for you.

Nick Craver and the kind folks at Stack Overflow periodically publish a data export containing your questions, answers, comments, user info, and more. It’s available as an XML data dump. Brent Ozar took that data, imported it into SQL Server for teaching performance tuning, and generously published the result as a 16GB torrent.

Once downloaded, the torrent gives you a series of 7-Zip files that extract to a 118GB SQL Server 2008 database, which you can attach directly to any SQL Server from 2008 through 2016.

It’s a pretty big database, which makes it well suited to data analysis projects. The main tables and their sizes:

  • Badges – 23M rows, 1.1GB data
  • Comments – 58.2M rows, 18.5GB data
  • Posts – 36.1M rows, 90GB data, 15.5GB of which is off-row text data. This table holds both questions and answers, so the Body NVARCHAR(MAX) field can get pretty big.
  • PostLinks – 4.2M rows, 0.1GB
  • Users – 7.3M rows, 1GB
  • Votes – 128.4M rows, 4.5GB
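
To get a feel for how differently these tables are shaped, here is a minimal Python sketch that turns the figures listed above into an average number of bytes per row. The row counts and sizes come straight from the list; the function names are made up for illustration, and the 1GB = 1024³ bytes conversion is an assumption, since the list does not say which unit convention was used.

```python
# Row counts (in millions) and data sizes (in GB), copied from the list above.
TABLES = {
    "Badges": (23.0, 1.1),
    "Comments": (58.2, 18.5),
    "Posts": (36.1, 90.0),
    "PostLinks": (4.2, 0.1),
    "Users": (7.3, 1.0),
    "Votes": (128.4, 4.5),
}

# Assumption: 1 GB = 1024**3 bytes. The source does not state the convention.
BYTES_PER_GB = 1024 ** 3


def avg_bytes_per_row(rows_millions, size_gb):
    """Average data bytes per row for one table."""
    return size_gb * BYTES_PER_GB / (rows_millions * 1_000_000)


def summarize(tables=TABLES):
    """Map each table name to its average bytes per row."""
    return {name: avg_bytes_per_row(rows, gb) for name, (rows, gb) in tables.items()}


def totals(tables=TABLES):
    """Total rows (millions) and total data size (GB) across the listed tables."""
    total_rows = sum(rows for rows, _ in tables.values())
    total_gb = sum(gb for _, gb in tables.values())
    return total_rows, total_gb


if __name__ == "__main__":
    # Print the widest tables first.
    for name, avg in sorted(summarize().items(), key=lambda kv: -kv[1]):
        print(f"{name:<10} ~{avg:,.0f} bytes/row")
    rows, gb = totals()
    print(f"Total: {rows:.1f}M rows, {gb:.1f}GB of data")
```

Posts rows come out at a few kilobytes each on average, far wider than any other table, which fits the note about the Body field. Votes is the opposite case: the most rows by a wide margin, but only a few dozen bytes per row.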

/via Brent Ozar/
