Adding new sources of data to an existing system is always a challenge. For example, the format of the new data might be incompatible with the existing data. This is true for event information in particular since there isn’t any generally accepted format, although some work is being done in that direction, see for example schema.org, EventsML or CitySDK. Other difficulties can arise from incompatibilities at the API level - even though a machine-friendly API is provided, it isn’t necessarily cut out for all possible use cases that application developers might come up with.
Open data has become a priority for the public sector in many countries. The idea is enticing - make your data open and let people innovate with it, developing applications that the public sector itself does not have the resources to implement. New York has NYC BigApps, Helsinki loves developers and there is even an Apps4Pirkanmaa competition, to name just a few. The HelMet case is a good example of a public organization’s willingness to expose data for public use - information about events taking place in HelMet’s network of libraries is provided using a machine-friendly API, which we decided to leverage in the Linked Events portal.
The challenges of open dataDespite the easy-to-use API, we still encountered a few problems while integrating our new data source, problems that probably are rather typical for developers trying to leverage open data. In our case, several vital pieces of information were missing from the event data that the API provides - location information, translations and so on. All of these issues would certainly be possible to fix, but it would probably take a long time for the data provider, a large public sector organization, to go through the hoops of procurement, IT development outsourcing and so forth. Because of time constraints, we ended up scraping a website for the missing pieces of information, an unnecessarily brittle step in what could otherwise have been a very simple integration.
Developers of applications that utilize open data are usually individuals or small, agile companies. When the providers of the open data are large organizations that cannot respond quickly to developers’ needs, there is going to be friction. Here are some tips for organizations that want developers to be able to smoothly leverage their open data sets:
- Prefer open source components and universal data formats for developing APIs. This avoids provider lock-in and makes maintenance easier.
- Outsource carefully and have someone in your organization with sufficient technical know-how oversee the project. Facilitate knowledge transfer by bringing consultants in-house and having them collaborate with your own people instead of doing development off-site.
- Designate someone to be the “API maintainer”. This person should coordinate development, answer questions from developers, monitor API usage and watch out for potential problems. Provide this person’s contact information to developers.
- Open a public issue queue on for example GitHub. This provides a channel for developers to report issues, to discover what problems other developers have run into and and to see what progress has been made on resolving them.
- Transparency is key - provide documentation, examples, a list of known bugs and so on. Consider open-sourcing your API.
I think the push for making data open and accessible is an applaudable effort by the public sector. There have already been many beautiful applications built using open data - take, for example, BlindSquare or our own Linked Events aggregator. To make open data even more useful, data providers should seek to become more agile and responsive to developers’ needs. A public-facing API can’t be implemented and then forgotten about, it is something that needs to be supported and refined continuously.