Thanks for the reply. I would not (yet) describe myself as a developer in any meaningful sense! Really this is more about what computer-literate non-developers might do with resources such as collections of PDF files, which might be updated weekly, or perhaps only occasionally. Any particular update might replace or add just a small number of files. I’m not sure this is the best place to be raising this, as it is not about interacting with the API, but here goes anyway.
An example is the collection of spider maps at https://tfl.gov.uk/maps_/bus-spider-maps. These could be linked to from community websites, for example. A cursory inspection of the pages for individual boroughs shows that the naming convention is that there is, er, no naming convention! Possibly this is because different potential user needs pull in different directions:
- Some file names include a date, though not necessarily in the same format each time, and some do not. If they include a date, then any link set up from outside is going to fail when a spider map is replaced, because the file name will have changed.
- On the other hand, including the date in all file names as a matter of convention makes it easy for anyone looking at the page to see whether the file they want has changed without needing to download it. Then you get the geek community (guilty, m’lud) who may want to keep copies of old files as well as download new ones. Hence the interest in being able to spider the pages.
Of course, simply putting up a directory listing somewhere would enable new/changed files to be identified quickly, even if the file names are unchanged, as long as the true creation dates are shown.
Which appears not to be the case for the second example, the bus working timetables available via https://tfl.gov.uk/corporate/publications-and-reports/bus-schedules. These do keep the same names when replaced, unlike the spider maps, but for anyone who simply wants to know what is new, it is “needle in haystack” time. This resource is updated weekly and it appears (from downloading) that files always have the date of the latest update as the creation date; in other words, the entire resource appears to be recreated every week. The metadata for each PDF file do contain information that shows a file is new (the Title field under PDF Information under Properties), but to access this you have to download the file anyway! Any directory listing would have to include this piece of metadata to be of any use.
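For what it is worth, here is a rough sketch in Python of the low-tech workaround available to someone who does download everything each week: fingerprint each PDF and compare this week's folder with last week's. The function names and the folder layout are my own invention for illustration, not anything TfL provides. It also rests on a big assumption, namely that an unchanged file is byte-for-byte identical from one week to the next. If the whole resource really is regenerated every week with fresh dates embedded in each file, every file would show up as changed and this would not help.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each PDF file name in a (hypothetical) local folder to its digest."""
    return {p.name: sha256_of(p) for p in sorted(Path(folder).glob("*.pdf"))}


def compare_manifests(old, new):
    """List file names that were added, removed, or changed between two manifests."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(name for name in new if name in old and new[name] != old[name])
    return {"added": added, "removed": removed, "changed": changed}
```

Calling `compare_manifests(build_manifest("last_week"), build_manifest("this_week"))`, with those two folder names standing in for wherever the downloads are kept, would give the names of the files worth opening.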
I am not suggesting that there is a compelling business need to address either of these examples. But I do feel sometimes that the more old-fashioned ways of getting at and using data get overlooked as more and more sophisticated methods are developed. I’m a great enthusiast for the thousand flowers which open data have enabled to bloom but that can embrace low as well as high tech approaches.