With insights from Dan Klein, Former Global Chief of Data & AI

Over the past decade, open data has received a great deal of discussion and emphasis across the world, and in the UK in particular. The importance of reaching data maturity, and the best ways of doing so, are hot topics for public sector organisations looking to streamline operations, enhance service delivery, and boost performance.

Despite the ongoing discourse and efforts, the reality of open data often falls short of its idealistic vision. Data that may at first glance seem ‘open’ still isn’t.

This disparity came to the forefront as we explored Scotland’s transport network data. With Scotland investing significantly in its data infrastructure, we decided to look at the country’s frontline services and investigate how operational data across various settings could create meaningful efficiencies and service improvements. We started with the transport sector, and what unfolded was a revealing journey into the challenges that hinder the realisation of truly open, interoperable, mature data.

This article is the first in a two-part series presenting a comprehensive study of Scotland’s data infrastructure across diverse sectors. Below, we uncover some of the challenges to achieving data maturity that apply to organisations across the world. From accessibility and standardisation concerns to the broader impacts on service improvements, join us as we navigate the intricacies of the data maturity journey, using Scotland’s transport network as a lens to comprehend the broader challenges and opportunities.

Why is open data crucial for data maturity?

According to the Open Data Institute (ODI), a UK non-profit organisation committed to promoting and advancing trust in data, open data is ‘data that anyone can access, use, and share’.
Specifically, good open data should possess the following four key characteristics:

Available: data can be linked to so that it’s easy to share and discuss
Structured: data is available in a standard, structured format for easy processing
Consistent: data has guaranteed availability and consistency over time, ensuring reliability
Traceable: data is traceable right back to its origins so that others can determine whether or not to trust it

Defining open data clearly is crucial because of its implications for interoperability. If the term isn’t understood correctly, it can jeopardise the journey towards data maturity. Organisations will simply publish datasets that don’t work together effectively, hindering the ability to build large-scale, complex systems. This is precisely the issue we’ve encountered.

Main roadblocks to true data maturity and interoperability

During our exploration of the Scottish transport network’s data, we examined 14 datasets, including public transport access points, road network details, and train, ferry, and bus routes and timetables.

We found several issues with datasets that were publicly available but not easy to work with. Some datasets were outdated, and others had unstructured data or were missing information entirely, which is not in line with the ODI’s characterisation of good open data. In fact, of the 14 datasets we explored, only Network Rail’s open data was in good condition. It was easily accessible via an API and came with wiki documentation whose instructions were short, clear, and understandable.

The issues we encountered are real roadblocks to achieving data maturity and interoperability. They are also not unique to the Scottish transport network. So, if you recognise your organisation as we outline the top challenges below, it might be time to innovate your data infrastructure.

1. Outdated datasets

One of the main difficulties we encountered was outdated datasets.
Often, information that could have been very interesting and relevant had not been updated for ten years. This immediately raised the question: how can decision-making be effective if the decisions do not rely on the latest information?

Examples of outdated datasets:

NaPTAN: Great Britain’s dataset of all public transport access points had its schema last updated in 2014, requiring manual scraping to discover what each code meant. Besides this, the data itself was available either as one batch or individually for each local authority. So, to obtain data for Scotland, it was necessary to manually select all 32 local authorities and download them.
Geospatial train data: this also hasn’t been regularly updated and was only available as a result of a freedom of information request.

2. Unstructured data

Another major roadblock to data interoperability is data that is poorly structured or doesn’t follow a standard format, which makes it impossible to connect datasets seamlessly and requires time-consuming manual work. Our analysis revealed several datasets with unstructured data, including odd names, no instructions, and generally poor organisation.

Examples of unstructured datasets:

Traveline national dataset: Great Britain’s dataset containing public transport timetables for bus, light rail, tram, and ferry services. It had a 300-page schema that didn’t explain the dataset in words but instead used UML diagrams. Additionally, the schema was in zip folders that were not explicitly described, making it difficult to know where the relevant data was located.
Ferry data: the timetables and operator statistics are provided by individual companies and not centralised. As a result, everything is formatted differently. For example, CalMac had monthly statistics while Northlink had yearly ones. For ferry timetables, there was no open data; the information was only available in PDFs.

3. Missing information

Lastly, some datasets lacked the information they were intended to contain, presenting a challenge when trying to connect different systems and carry out an analysis. An example of this issue was in the Traveline national dataset outlined above. After working through its entire schema, we discovered that some timetables didn’t include the bus times at all, making the manual effort redundant.

Case in point: Challenges in rural Scotland’s public transport connectivity

So, what is the effect of these data issues on residents in Scotland?

After digitising Scotland’s transport schedule data internally, it became evident that residents in rural Scotland face significant challenges when attempting certain public transport journeys during the work week, particularly if their departure time is not precisely timed. This issue stems from the lack of accessible open data and established data practices, making it challenging to integrate information across different transport systems.

You can find the full list of journeys we investigated here. Below we explore a few key examples.

Edinburgh to Peterhead route

Distance by car: +/- 164 miles (the longest of the three compared)
Duration by car: +/- 3.25 hours

As we looked at the east side of Scotland, and the Edinburgh to Peterhead route in particular, transport scheduling didn’t appear to be an issue. The waiting time when changing via Aberdeen is relatively short, regardless of the day of the week or the time the traveller sets out. Plus, the average journey duration remains stable throughout the day and the week at around five hours.

Inverness to Stromness route

Distance by car: +/- 142 miles
Duration by car: +/- 5 hours (includes a ferry)

As we moved on to the Northwest of Scotland and the Inverness to Stromness route, which is shorter in mileage than the Edinburgh to Peterhead one, waiting time and average trip duration started to increase.
It would take at least 6.5 hours to make the trip via public transport, but only if the traveller sets out in the middle of the workday, at 14:00. Otherwise, the average trip duration fluctuates between 7.5 and 12.5 hours if the traveller sets out before 14:00, and skyrockets to over 20 hours if they leave after 14:00, regardless of the day of the week.

Tobermory to Glasgow route

Distance by car: +/- 130 miles
Duration by car: +/- 4 hours (includes a ferry)

Finally, we examined the West side of Scotland, from rural Tobermory to a large city like Glasgow, and discovered a further increase in scheduling complexity. The fastest trip of the week takes a little under 6.5 hours, but a traveller who wants to leave after 16:00, on any day of the week, won’t be able to reach Glasgow on the same day. Overall, waiting times between public transport connections throughout the week are quite high for this route, unless travellers set out at very specific times of day.

Can Scotland’s transport scheduling issue be resolved?

The discrepancies we encountered in Scotland’s transport data directly impact the efficiency of public transport journeys, particularly in rural areas. The lack of seamless integration between various transport systems results in extended waiting times, making it evident that data interoperability is not just a theoretical concept but a tangible necessity for enhancing the lives of individuals relying on these services.

To address these challenges and build a more efficient and connected transport system, a strategic approach to data engineering is imperative. Cleaning and organising datasets, ensuring they are updated regularly, and promoting standardised formats are crucial steps toward achieving true data maturity.

Of course, Scotland’s geography presents a natural challenge for reaching remote areas, particularly those in the West of the country.
However, rather than accepting the status quo, we should strive to enhance the transportation system, leveraging modern technologies and practices to overcome these obstacles.

Improving frontline services through data maturity

Our analysis of Scotland’s transport network data has shed light on the key barriers to achieving true data maturity and interoperability for organisations, not just within Scotland, but on a global scale. The challenges of outdated datasets, unstructured information, and missing data underline the critical need for a concerted effort to enhance open data practices.

As we move forward, it is essential for stakeholders, including government bodies and transport organisations, to recognise the transformative power of data interoperability. Investing in robust data engineering practices, fostering collaboration between various data providers, and promoting a culture of openness and transparency are key components in building Scotland’s data future.

Let’s work together towards a future that prioritises efficiency, connectivity, and improved service provision for all. Through strategic data management and collaboration, we can overcome the current barriers and unlock the full potential of open data, creating a transport network that truly serves the needs of the people.
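To make the journey analysis above more concrete, here is a minimal sketch of how waiting time at an interchange can be derived once timetables exist as structured data rather than PDFs. It is not the method used in the study: the timetables, the `plan_journey` function, and the ten-minute minimum transfer time are all assumptions invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical timetables, invented for illustration only (not real services).
# Each trip is a (departure, arrival) pair on a single day.
def t(hhmm):
    """Parse an HH:MM string into a datetime on an arbitrary common date."""
    return datetime.strptime(hhmm, "%H:%M")

LEG_ONE = [(t("07:00"), t("09:30")), (t("11:00"), t("13:30")), (t("15:00"), t("17:30"))]
LEG_TWO = [(t("10:00"), t("12:00")), (t("18:30"), t("20:30"))]

def plan_journey(leg_one_trip, leg_two, min_transfer=timedelta(minutes=10)):
    """Return waiting time and total duration for the first feasible
    same-day connection, or None if no connection exists.

    min_transfer is an assumed minimum time needed to change services.
    """
    departure, arrival = leg_one_trip
    earliest_boarding = arrival + min_transfer
    for next_departure, next_arrival in sorted(leg_two):
        if next_departure >= earliest_boarding:
            return {
                "wait": next_departure - arrival,
                "total": next_arrival - departure,
            }
    return None  # no same-day connection available

for trip in LEG_ONE:
    result = plan_journey(trip, LEG_TWO)
    print(trip[0].strftime("%H:%M"), result)
```

Even in this toy example, the total journey time depends heavily on when the traveller sets out, which is the pattern described for the Inverness to Stromness and Tobermory to Glasgow routes. A calculation like this is only possible when every operator's timetable is available in a consistent, machine-readable format.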
Contributors:

Charles Roadnight, Lead Data Engineer at Zühlke
Sherri Chuah, Professional Data Engineer at Zühlke
Tabitha Day, Professional Data Engineer at Zühlke
Neelesh Sonawane, Principal UX Consultant at Zühlke
Anna Ronco, Expert UX Designer at Zühlke

Acknowledgements:

Giuseppe Sollazzo, Head of Data Products and Services at the Department for Work and Pensions (former Head of Data at the Department for Transport)