At this year’s NICAR, I saw a lot of new tools and platforms demoed, but the one that stood out to me fills a gap in the data journalism ecosystem, and in web frameworks generally. Called Datasette, it lets users quickly launch a searchable, browsable web interface from a data set.
It’s a minimalist framework for rapid data presentation that assumes and recommends a particular workflow. User dashboards, authentication – more than half the stuff web frameworks are made of – aren’t here. But what’s included is just what you need for a particular kind of project.
By forgoing many of the assumptions and overhead of traditional frameworks and focusing on those who need a read-only web framework (data journalists, archivists and others who work on data, then publicly present it), Datasette can lean on proven technologies like CSV, SQLite and command-line Python utilities. The result is speed and simplicity for users with just enough know-how.
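To make that CSV-to-SQLite step concrete, here is a minimal sketch using only Python's standard library. It is a stand-in for illustration, not Datasette's own tooling: the function name, the table name and the sample rows are all hypothetical, and every column is stored as plain text.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text, conn, table):
    """Create `table` from CSV text, storing every column as TEXT.

    Hypothetical helper for illustration; not part of Datasette.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the column names
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # The remaining rows of the reader become the table's rows
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Made-up sample data, for illustration only
SAMPLE_CSV = "country,servings\nA,10\nB,25\n"

# An in-memory database keeps the sketch self-contained;
# a real project would connect to a file on disk instead.
conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(SAMPLE_CSV, conn, "drinks")
rows = conn.execute(
    "SELECT country, servings FROM drinks ORDER BY country"
).fetchall()
print(rows)
```

In a real project the connection would point at a file such as data.db, and that file is what gets handed to Datasette on the command line (for example, datasette data.db). Check the Datasette documentation for the current options.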
Datasette takes its name from the cassette tape drive sold for the Commodore 64. It is the product of programmer Simon Willison, co-creator of the Django framework. Willison worked as a data journalist with the Guardian in the UK, where he’s from originally, and has also worked in Silicon Valley at companies including Eventbrite. He’s currently a 2020 Knight Journalism Fellow at Stanford University. As he explains in this post, Willison’s inspiration for Datasette emerged from his on-the-ground experience as a newsroom developer at the Guardian, where he became familiar with the day-to-day needs of technologists working with reporters.
What’s possible with Datasette is best shown with an example, one set up by Willison for FiveThirtyEight. Their Datasette server includes dozens of small databases, including global alcohol consumption and episodes of Bob Ross’ “Joy of Painting.” Once on a Datasette server, they’re browsable and queryable in a predictable, accessible and even bookmarkable way.
Databases put online with Datasette include a bunch of things by default: the ability to query them, online, with SQL; a JSON API; shareable URLs that can return either HTML pages or JSON; and a browsable website. There are also options to customize the presentation and to create pre-filled queries. Datasette could be a way to quickly make a bunch of your data transparent to the public (for, say, civic or news organizations). Put behind a login, a Datasette server could be a private data trove for archiving and investigation within an organization.
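The "shareable URL" idea can be sketched in a few lines of Python. This assumes the URL pattern Datasette instances commonly use, where a database page accepts an sql query parameter and adding .json to the path returns JSON instead of HTML. The host, database name and helper function below are hypothetical, so check the pattern against the Datasette documentation before relying on it.

```python
from urllib.parse import urlencode


def datasette_query_url(base_url, database, sql, as_json=False):
    """Build a bookmarkable URL for a SQL query.

    Hypothetical helper; assumes the /<database>?sql=... and
    /<database>.json?sql=... URL patterns.
    """
    suffix = ".json" if as_json else ""
    query = urlencode({"sql": sql})  # percent-encodes the SQL text
    return f"{base_url.rstrip('/')}/{database}{suffix}?{query}"


# Placeholder host and database name, not a real server
BASE = "https://datasette.example.com"
SQL = "select * from drinks limit 5"

html_url = datasette_query_url(BASE, "mydata", SQL)
json_url = datasette_query_url(BASE, "mydata", SQL, as_json=True)
print(html_url)
print(json_url)
```

Because the whole query lives in the URL, the same link can be bookmarked, pasted into a story draft or fetched by a script.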
Datasette is free and open source, and available to use on your own server, with built-in launch processes for Heroku and Zeit. Find it on GitHub here.
For a deeper dive:
Watch Simon Willison present at PyCon 2019. The talk is directed at developers rather than journalists, so it goes deep into the technical details, but it gives a very good overview of Datasette’s capabilities.