Joining a BODS TfL TxC VehicleJourney to a SIRI Vehicle Activity

Hi there,

I have downloaded some SIRI from the Bus Open Data Service (BODS) TFLO dataset, as well as some related TFLO TxCs. I am attempting to identify the various VehicleJourney elements in the TxCs that some of the VehicleActivity elements, which are present in the SIRI, should be following.

I have done this before with other TxC/SIRI based operators and typically, but not always, one might achieve this by using the LineRef and JourneyCode fields in the SIRI, and joining it with the equivalent fields in the TxC.
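For illustration, here is a minimal sketch of that kind of join in PHP. It assumes the SIRI-VM LineRef equals the TxC LineName, and that the ticket-machine JourneyCode sits under Extensions in SIRI-VM and under Operational/TicketMachine in the TxC VehicleJourney. That is how many BODS operators appear to publish it, but it is worth checking per feed. The XML fragments, the values and the helper names (firstText, indexTxcJourneys, matchActivities) are all made up for the example.

```php
<?php
// Hypothetical, heavily trimmed TxC fragment: one line, two journeys.
$txcXml = <<<'XML'
<TransXChange xmlns="http://www.transxchange.org.uk/">
  <Services><Service><Lines>
    <Line id="L1"><LineName>42</LineName></Line>
  </Lines></Service></Services>
  <VehicleJourneys>
    <VehicleJourney>
      <Operational><TicketMachine><JourneyCode>1036</JourneyCode></TicketMachine></Operational>
      <VehicleJourneyCode>VJ1</VehicleJourneyCode>
      <LineRef>L1</LineRef>
      <JourneyPatternRef>JP1</JourneyPatternRef>
    </VehicleJourney>
    <VehicleJourney>
      <Operational><TicketMachine><JourneyCode>1104</JourneyCode></TicketMachine></Operational>
      <VehicleJourneyCode>VJ2</VehicleJourneyCode>
      <LineRef>L1</LineRef>
      <JourneyPatternRef>JP2</JourneyPatternRef>
    </VehicleJourney>
  </VehicleJourneys>
</TransXChange>
XML;

// Hypothetical SIRI-VM fragment for an operator that does supply JourneyCode.
$siriXml = <<<'XML'
<Siri xmlns="http://www.siri.org.uk/siri"><ServiceDelivery><VehicleMonitoringDelivery>
  <VehicleActivity>
    <MonitoredVehicleJourney><LineRef>42</LineRef><VehicleRef>BUS7</VehicleRef></MonitoredVehicleJourney>
    <Extensions><VehicleJourney><Operational><TicketMachine>
      <JourneyCode>1036</JourneyCode>
    </TicketMachine></Operational></VehicleJourney></Extensions>
  </VehicleActivity>
</VehicleMonitoringDelivery></ServiceDelivery></Siri>
XML;

/** Text of the first descendant element called $tag, or null if absent. */
function firstText(DOMElement $parent, string $tag): ?string
{
    $nodes = $parent->getElementsByTagName($tag);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

/** Index VehicleJourneyCodes by the join key "LineName|JourneyCode". */
function indexTxcJourneys(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    // VehicleJourney/LineRef points at a Line id, so resolve ids to names first.
    $lineNames = [];
    foreach ($dom->getElementsByTagName('Line') as $line) {
        $lineNames[$line->getAttribute('id')] = firstText($line, 'LineName');
    }

    $index = [];
    foreach ($dom->getElementsByTagName('VehicleJourney') as $vj) {
        $lineName = $lineNames[firstText($vj, 'LineRef') ?? ''] ?? null;
        $journeyCode = firstText($vj, 'JourneyCode');
        if ($lineName === null || $journeyCode === null) {
            continue; // this journey cannot be joined on this key
        }
        $index["$lineName|$journeyCode"][] = firstText($vj, 'VehicleJourneyCode');
    }
    return $index;
}

/** For each VehicleActivity, list the candidate VehicleJourneyCodes. */
function matchActivities(string $xml, array $index): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $matches = [];
    foreach ($dom->getElementsByTagName('VehicleActivity') as $activity) {
        $key = firstText($activity, 'LineRef') . '|' . firstText($activity, 'JourneyCode');
        $matches[firstText($activity, 'VehicleRef') ?? '?'] = $index[$key] ?? [];
    }
    return $matches;
}

$matches = matchActivities($siriXml, indexTxcJourneys($txcXml));
print_r($matches);
```

Keeping a list of candidates per key, rather than a single value, makes it obvious when the key is not one-to-one, for example if the same JourneyCode is reused on different operating days.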

I see that TFLO SIRI does not contain a JourneyCode extension. It does, however, contain a VehicleJourneyRef field. I have attempted to use this field to join to a VehicleJourney in a TxC but have been unsuccessful thus far.

My questions are: Is this a valid field in the SIRI to attempt to join to an equivalent TxC? If so, how might one achieve a one-to-one join with a VehicleJourney? If not, are there other fields I can use to achieve a one-to-one join with VehicleJourney?

Thanks very much in advance

Welcome @Brian_C

Can I run with the assumption you’re downloading https://tfl.gov.uk/tfl/syndication/feeds/journey-planner-timetables.zip ?

Could you please define TFLO and TxC?

The main difference I have found with the TfL bus dataset is that it has the DestinationDisplay defined (what is shown on the bus “blinds”), which others don’t.

Hi @briantist

Thank you for your reply, apologies, I should have been more specific and provided examples.

The SIRI feed I am referring to is the response retrievable from the BODS SIRI-VM API. Apparently this can be limited to TfL-related data by applying the operator reference TFLO as a parameter. An example response from this API can be seen using this URL: https://data.bus-data.dft.gov.uk/api/v1/datafeed?operatorRef=TFLO

By TxC I mean the TransXChange format for exchanging bus timetables. It is my understanding that TfL related TransXChange datasets recently became available on the bus open data platform. An example of TfL routes timetables in the TransXChange format can be seen by accessing this url Data set details

Usually, and it is the case with many TransXChange and SIRI-VM datasets available on the bus open data service, one can connect the vehicle in the SIRI-VM with the timetable it should be currently running by joining on certain fields available in both.

In the past I have used the JourneyCode field to achieve this join, but the SIRI-VM in that response does not contain a JourneyCode. My question is how should one connect a vehicle from the SIRI-VM response to the corresponding TransXChange timetable it is following.

It is possible I am misinterpreting how these datasets should be used, so any feedback/help is greatly appreciated.

@Brian_C

OK. The complicated bit is that there are both JourneyPatternSections and JourneyPatternSectionRefs, which add a level of (useful) abstraction but aren’t really used much because in London a bus route is a bus route and the worst that might happen is that it turns back early!

My code for decoding a file (from the ZIP file) therefore goes…

private function doExplorerSingleFile($filename)
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->loadXML($this->cachedGetTransxchangeitemByFilename($filename));
    $this->TransXChange = $dom->getElementsByTagName("TransXChange");
    $this->extractStopPoints($dom);
    $Mode = $this->getServiceTypeForThis();
    if ($Mode == $this->strMode) {
        $StartDate = $this->getStartDate();
        $EndDate = $this->getEndDate();
        $Today = date("Y-m-d");
        echo "? $StartDate <= $Today <= $EndDate ? ";
        // Only process files whose validity period covers today
        // (Y-m-d dates compare correctly as plain strings).
        if ($StartDate <= $Today && $Today <= $EndDate) {
            $LineName = $this->getLineNameForThis();
            $arrJourneyPatternSections = $this->extractJourneyPatternSections();
            $arrJourneyPatternSectionRefs = $this->extractJourneyPatternSectionsRefs();
            $this->extractVehicleJourneys($LineName, $arrJourneyPatternSections, $arrJourneyPatternSectionRefs);
            $this->extractNptgLocalities();
        }
    }
}

and the deep dive means you have to pull a section ref apart like this. It’s the JourneyPatternSectionRefs that point to the JourneyPatternSections …

// Builds a map of JourneyPattern id => list of the JourneyPatternSection ids it references.
private function extractJourneyPatternSectionsRefs()
{
    $arrJourneyPatternSectionRefs = [];
    foreach ($this->TransXChange as $item1) {
        foreach ($item1->getElementsByTagName("Services") as $Services) {
            foreach ($Services->getElementsByTagName('Service') as $Service) {
                foreach ($Service->getElementsByTagName('StandardService') as $StandardService) {
                    foreach ($StandardService->getElementsByTagName('JourneyPattern') as $JourneyPattern) {
                        $JourneyPatternID = ($JourneyPattern->getAttribute("id"));
                        foreach ($JourneyPattern->getElementsByTagName('JourneyPatternSectionRefs') as $JourneyPatternSectionRefs) {
                            if (!isset($arrJourneyPatternSectionRefs[$JourneyPatternID])) {
                                $arrJourneyPatternSectionRefs[$JourneyPatternID] = [];
                            }
                            $arrJourneyPatternSectionRefs[$JourneyPatternID][] = $JourneyPatternSectionRefs->nodeValue;
                        }
                    }
                }
            }
        }
    }
    return $arrJourneyPatternSectionRefs;
}

Does this help?

@briantist

Thanks again for replying, yes that was helpful. You are right, I’m essentially trying to find which JourneyPatternSection the bus should be following.

The only thing I am missing is a way to link a VehicleActivity, a.k.a. bus GPS coordinates from the SIRI-VM API response, to the JourneyPatternSection it should be following.

I will provide a screenshot below of an example of a non-TfL SIRI-VM vehicle activity.

In the screenshot you can see that this particular VehicleActivity contains a LineRef element and a JourneyCode element. Using these elements one can then identify, from the TransXChange, the VehicleJourney it is following, and following that, the JourneyPatternRef, and JourneyPatternSection, like what you have done above in your code snippet.
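Once a VehicleJourney has been identified, the rest of that chain can be walked roughly as follows. This is only a sketch under the assumption that the file uses the usual TxC layout (VehicleJourney/JourneyPatternRef, JourneyPattern/JourneyPatternSectionRefs, and JourneyPatternTimingLink elements with From and To stop refs). The XML fragment, the stop codes and the function names firstText and stopsForJourney are invented for the example.

```php
<?php
// Hypothetical TxC fragment: one journey pattern made of two sections.
$txcXml = <<<'XML'
<TransXChange xmlns="http://www.transxchange.org.uk/">
  <JourneyPatternSections>
    <JourneyPatternSection id="JPS1">
      <JourneyPatternTimingLink id="TL1">
        <From><StopPointRef>STOP_A</StopPointRef></From>
        <To><StopPointRef>STOP_B</StopPointRef></To>
      </JourneyPatternTimingLink>
    </JourneyPatternSection>
    <JourneyPatternSection id="JPS2">
      <JourneyPatternTimingLink id="TL2">
        <From><StopPointRef>STOP_B</StopPointRef></From>
        <To><StopPointRef>STOP_C</StopPointRef></To>
      </JourneyPatternTimingLink>
    </JourneyPatternSection>
  </JourneyPatternSections>
  <Services><Service><StandardService>
    <JourneyPattern id="JP1">
      <JourneyPatternSectionRefs>JPS1</JourneyPatternSectionRefs>
      <JourneyPatternSectionRefs>JPS2</JourneyPatternSectionRefs>
    </JourneyPattern>
  </StandardService></Service></Services>
  <VehicleJourneys>
    <VehicleJourney>
      <VehicleJourneyCode>VJ1</VehicleJourneyCode>
      <JourneyPatternRef>JP1</JourneyPatternRef>
    </VehicleJourney>
  </VehicleJourneys>
</TransXChange>
XML;

/** Text of the first descendant element called $tag, or null if absent. */
function firstText(DOMElement $parent, string $tag): ?string
{
    $nodes = $parent->getElementsByTagName($tag);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

/** Ordered stop refs for one VehicleJourneyCode, or [] if it cannot be resolved. */
function stopsForJourney(string $xml, string $vehicleJourneyCode): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    // 1. VehicleJourney -> JourneyPatternRef
    $patternRef = null;
    foreach ($dom->getElementsByTagName('VehicleJourney') as $vj) {
        if (firstText($vj, 'VehicleJourneyCode') === $vehicleJourneyCode) {
            $patternRef = firstText($vj, 'JourneyPatternRef');
            break;
        }
    }

    // 2. JourneyPattern -> ordered list of section ids
    $sectionIds = [];
    foreach ($dom->getElementsByTagName('JourneyPattern') as $jp) {
        if ($jp->getAttribute('id') === $patternRef) {
            foreach ($jp->getElementsByTagName('JourneyPatternSectionRefs') as $ref) {
                $sectionIds[] = trim($ref->textContent);
            }
        }
    }

    // 3. Index the sections by id so the refs can be resolved.
    $sections = [];
    foreach ($dom->getElementsByTagName('JourneyPatternSection') as $section) {
        $sections[$section->getAttribute('id')] = $section;
    }

    // 4. Walk the timing links in order; each link's To stop is normally the
    //    next link's From stop, so skip immediate repeats.
    $stops = [];
    foreach ($sectionIds as $id) {
        if (!isset($sections[$id])) {
            continue;
        }
        foreach ($sections[$id]->getElementsByTagName('StopPointRef') as $ref) {
            $stop = trim($ref->textContent);
            if (end($stops) !== $stop) {
                $stops[] = $stop;
            }
        }
    }
    return $stops;
}

print_r(stopsForJourney($txcXml, 'VJ1'));
```

An unknown VehicleJourneyCode simply yields an empty list, which is the situation the TfL feed currently leaves you in.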

However, the TfL SIRI-VM lacks a JourneyCode element, so I cannot achieve this link using this method. I will provide another screenshot of an example of the available fields

Is there a way of using any of those available fields to figure out the VehicleJourney / JourneyPatternSection / JourneyPatternSectionRef the bus plans on doing?

The presence of a VehicleJourneyRef suggests that this is the element one should use to achieve this link, but I have been unsuccessful in using that thus far.

Thanks again for your help

Brian C

Interesting extract. Not least because the Origin stop is the last stop northbound and the Destination stop is the first stop southbound. IIRC quoting the next stop (rather than the last stop) as Destination has been flagged up before but it does seem bizarre to show the bus between quite different journeys.

On the face of it, we also have:

  • a LineRef which has nothing to do with the actual route number (which is of course there as PublishedLineName), and
  • a VehicleJourneyRef which seems to bear no relationship to the VehicleJourneyCode on the Journey Planner XML files in the Datastore zip file (thus no way of using the Ref to get to the JourneyPattern details).

I also note that grid references are used whereas the JP files use easting and northing, though that is hardly insuperable.

This sort of data extraction is above my pay grade (not that I’m paid anyway) but there does seem to be a lot of wheel reinvention going on here. There may well be good reasons (the VehicleJourneyCode may not be acceptable to BODS, for example), but wouldn’t it be nice if knowing the Ref enabled you to work out the Code and vice versa?

I know the format of JP files is going to change to a different version of TransXChange but that’s nothing to do with the actual values used in fields.

@mjcarchive

Thanks for your comment, yes everything you have said there lines up with my attempts to link a vehicle activity in SIRI-VM to a TransXChange journey schedule. At this point I believe there might not be a definite way to join the two without making numerous assumptions.

But you are right, there may be various reasons for the data being represented this way. I believe BODS plan on introducing a SIRI-VM standard for data uploaded to their platform (they recently introduced a TxC standard in September that was very helpful), so perhaps I’ll wait until that comes in and re-evaluate then.

Regardless thank you @briantist and @mjcarchive for your help

It might be helpful if someone at TfL involved could explain why these different codes have been deployed and whether there is any relationship at all between them.

The easy answer in such situations is often “make a lookup table”. Easy enough for the route/line but for vehicle journeys it would be changing every week. Judging from the, ahem, updating schedule for the bus stops file (which links the 4900… code with the internal BP… or whatever code) it would rarely be up to date.

Another answer would be for the Datastore files to show the VehicleJourneyRef supplied for BODS as well as the VehicleJourneyCode. I say “as well as” because, while I have not checked this, I presume that the last part of VehicleJourneyCode relates to the trip number as recorded on the Working Timetables.

Obviously operational needs come first but in a world where apps have been built around open data it would be nice to know that the interests of those building such apps are playing a part, whether it be BODS, Journey Planner, Countdown, the WTTs…

I can’t help thinking that it must cause issues within TfL, or between TfL and operators, when different codes are used for what is essentially the same thing.

@mjcarchive @Brian_C

I guess, as I’m about to look at the national ones of these from the API service again this week, that there deliberately isn’t a link between the timetabled bus and the operational bus because, unlike trains, buses are allocated in bus depots on a first-come-first-served basis.

There is no need for a bus operator to pre-describe which bus (indexed with the reg number) will run a given route, as they can often be used interchangeably. Few places have decals on the outside of the buses showing a route number, like Brighton & Hove buses do.

image

Trains, however, have their unit numbers tightly controlled (some operators treat these allocations like some great secret!) and so a given unit will be kept overnight on a specific numbered “road”.

image

Of course this is 100% unhelpful for us people who want to tell the public where their bus is.

The upside outcome for the operators is it’s very hard to say which services are not keeping to time.

Next: I will write code to work out how accurate weather forecasts turned out to be.

@briantist

That’s a very interesting insight! I suppose it makes sense from an operational point-of-view when you put it like that.

It’s just a pity that it makes it quite difficult to match bus with schedule for research/analysis :’(

Interestingly, RTIG had a talk yesterday explaining the SIRI-VM standard they plan on introducing. I’ll link the video here for future reference: UK PTI SIRI VM & Data Matching Q&A - 20 December 2021 - YouTube

At around 10:35 in the video they start defining the link from SIRI-VM to TxC, and then at 11:15 we can see that they plan on making VehicleJourneyRef in SIRI-VM the join field with VehicleJourneyCode in the TxC.
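Before relying on that join, it may be worth measuring how far the two identifier sets actually overlap in the data you hold. A rough sketch, assuming the VehicleJourneyRef values from SIRI-VM and the VehicleJourneyCode values from the TxC files have already been pulled into plain arrays. The function name joinCoverage and the sample values are invented for illustration.

```php
<?php
/**
 * Classify each distinct SIRI-VM VehicleJourneyRef by how many TxC
 * VehicleJourneyCodes it equals: exactly one, more than one, or none.
 */
function joinCoverage(array $siriRefs, array $txcCodes): array
{
    // Count each code so duplicates can be told apart from unique matches.
    $codeCounts = array_count_values(array_map('trim', $txcCodes));

    $report = ['unique' => [], 'ambiguous' => [], 'unmatched' => []];
    foreach (array_unique(array_map('trim', $siriRefs)) as $ref) {
        $count = $codeCounts[$ref] ?? 0;
        if ($count === 1) {
            $report['unique'][] = $ref;
        } elseif ($count > 1) {
            $report['ambiguous'][] = $ref;
        } else {
            $report['unmatched'][] = $ref;
        }
    }
    return $report;
}

// Made-up identifiers, purely to show the three outcomes.
$report = joinCoverage(
    ['VJ_1', 'VJ_1', ' VJ_2 ', '884213'],
    ['VJ_1', 'VJ_2', 'VJ_2', 'VJ_3']
);
print_r($report);
```

A high share of "unmatched" refs would suggest the two feeds are still using unrelated codes, while "ambiguous" refs point to codes that are only unique per line or per operating day.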

So hopefully this issue might be solved whenever this standard comes in, fingers crossed anyways! :crossed_fingers:

@Brian_C

Thanks for that. I will look at it later today.

I note with interest that the current version of SIRI-VM used by the data.bus-data.dft.gov.uk system no longer supports the use of HTTP Digest (from RFC 2617), so you are only able to grab snapshots of the current state, and not maintain a connection for updates.

The original version I came across was the Ticketer one (https://siri.ticketer.org.uk/api/vm) and this worked as a maintained connection (like the NRE Darwin Push Port, which is STOMP).

@briantist

Yes, that is correct. With the current implementation on the bus open data service, one must do something like poll that API endpoint regularly and store the response to build up a sequence of vehicle activities.
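A minimal sketch of that polling loop, assuming each VehicleActivity carries RecordedAtTime, VehicleRef, VehicleJourneyRef and a VehicleLocation with Latitude and Longitude. The fetch step is passed in as a callable so the example runs against a stub rather than the live feed. The names firstText and pollVehicleActivities, and all sample values, are hypothetical.

```php
<?php
/** Text of the first descendant element called $tag, or null if absent. */
function firstText(DOMElement $parent, string $tag): ?string
{
    $nodes = $parent->getElementsByTagName($tag);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

/**
 * Call $fetch $polls times and keep one row per (vehicle, RecordedAtTime),
 * so a position repeated in consecutive snapshots is stored only once.
 * $fetch is any callable returning the response body as a string.
 */
function pollVehicleActivities(callable $fetch, int $polls, int $sleepSeconds): array
{
    libxml_use_internal_errors(true); // malformed XML should not emit warnings
    $history = [];
    for ($i = 0; $i < $polls; $i++) {
        $body = $fetch();
        $dom = new DOMDocument();
        if (is_string($body) && $body !== '' && $dom->loadXML($body)) {
            foreach ($dom->getElementsByTagName('VehicleActivity') as $activity) {
                $row = [
                    'vehicle'    => firstText($activity, 'VehicleRef'),
                    'recordedAt' => firstText($activity, 'RecordedAtTime'),
                    'journeyRef' => firstText($activity, 'VehicleJourneyRef'),
                    'lat'        => firstText($activity, 'Latitude'),
                    'lon'        => firstText($activity, 'Longitude'),
                ];
                $history[$row['vehicle'] . '|' . $row['recordedAt']] = $row;
            }
        }
        if ($i < $polls - 1) {
            sleep($sleepSeconds);
        }
    }
    return array_values($history);
}

// Stub feed standing in for the real endpoint: the second snapshot repeats
// the first position and adds a newer one.
$activity = fn(string $time, string $lat) =>
    "<VehicleActivity><RecordedAtTime>$time</RecordedAtTime><MonitoredVehicleJourney>"
    . "<VehicleRef>BUS7</VehicleRef><VehicleJourneyRef>123</VehicleJourneyRef>"
    . "<VehicleLocation><Longitude>-0.1</Longitude><Latitude>$lat</Latitude></VehicleLocation>"
    . "</MonitoredVehicleJourney></VehicleActivity>";
$first = $activity('2021-12-21T10:00:00+00:00', '51.50');
$second = $activity('2021-12-21T10:00:30+00:00', '51.51');
$snapshots = [
    "<Siri xmlns=\"http://www.siri.org.uk/siri\">$first</Siri>",
    "<Siri xmlns=\"http://www.siri.org.uk/siri\">$first$second</Siri>",
];
$rows = pollVehicleActivities(function () use (&$snapshots) {
    return array_shift($snapshots);
}, 2, 0);
echo count($rows), " distinct positions\n";
```

Against the live feed, the callable might wrap file_get_contents() or curl on the datafeed URL quoted earlier in the thread, plus whatever API key BODS expects. A poll interval of a few tens of seconds is probably a sensible starting point, though that is a guess rather than documented guidance.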

@Brian_C

It is evident from that webinar that an awful lot is riding upon the ability to match VehicleJourneyRef with VehicleJourneyCode, so it follows that at some stage the two will carry the same information, just with a different label attached.

Later on in the webinar it is clearly stated that the actual form of the data is not to be standardised - at least not yet - to make life easier for operators by enabling them to use the same code they already use for TxC. It would seem to follow that TfL could quite happily use its existing VJ… codes but has chosen not to. Perhaps early guidance from DfT was to move to something more standardised, who knows.

The implication is that whatever TfL use will have to be consistent between the two data flows. With that and block number the DfT team seem to think that data matching will be OK. The main benefit of block number appears to be making predictions involving the bus’s next journey (so, for example, you don’t issue timetable-based predictions when the incoming bus is half an hour behind schedule). Block number is not mandatory but may become so eventually.

One final point on the data that I picked up. The destination field is definitely meant to refer to the final stop on the journey, not the next stop, which seems to be what the examples extracted from the TfL input to BODS are showing.

There is a move to a later version of TxC coming up. Maybe any changes associated with making it BODS-compatible will be bundled up with that, though there would seem to be no logical necessity to do so. As always, advance warning of any detail would be helpful.

Even when it *****ing well should be. Why do I need to do this to decode the bus schedule days to a seven-bit bitmap?

class DayWordsToBitmap
{
const SINGLEWORDDECODER = [
    "monday" => ViewBusStopTimetables::MONDAY,
    "tuesday" => ViewBusStopTimetables::TUESDAY,
    "wednesday" => ViewBusStopTimetables::WEDNESDAY,
    "thursday" => ViewBusStopTimetables::THURSDAY,
    "friday" => ViewBusStopTimetables::FRIDAY,
    "saturday" => ViewBusStopTimetables::SATURDAY,
    "sunday" => ViewBusStopTimetables::SUNDAY];

public static function dayWordsToBitmap(string $id): int
{
    $intCalculated = self::dayWordsToBitmapUsingCommaSeperatedList($id);
    if ($intCalculated === 0) {
        $intCalculated = self::dayWordsToBitmapOriginal($id);
    }
    return $intCalculated;
}

public static function dayWordsToBitmapUsingCommaSeperatedList(string $id): int
{
    $intCalculated = 0;
    // Split on commas; any word that is not a plain day name aborts with 0
    // so that the caller falls back to the lookup table below.
    $arrWords = explode(",", $id);
    foreach ($arrWords as $word) {
        $word = trim(strtolower($word));
        if (!isset(self::SINGLEWORDDECODER[$word])) {
            return 0;
        }
        // 1 << n is the integer equivalent of pow(2, n).
        $intCalculated |= 1 << self::SINGLEWORDDECODER[$word];
    }
    return $intCalculated;
}

public static function dayWordsToBitmapOriginal(string $id)
{
    switch ($id) {
        // 1 (for Monday) through 7 (for Sunday)
        case "Monday":
        case "Sunday Night/Monday Morning":
        case "Bank Holidays":
            return pow(2, ViewBusStopTimetables::MONDAY);
        case "Tuesday":
            return pow(2, ViewBusStopTimetables::TUESDAY);
        case "Wednesday":
            return pow(2, ViewBusStopTimetables::WEDNESDAY);
        case "Thursday":
            return pow(2, ViewBusStopTimetables::THURSDAY);
        case "Friday":
        case "School Friday":
        case "Non-School Friday":
            return pow(2, ViewBusStopTimetables::FRIDAY);
        case "Friday Night/Saturday Morning":
        case "Saturday":
            return pow(2, ViewBusStopTimetables::SATURDAY);
        case "Saturday Night/Sunday Morning":
        case "Sunday":
            return pow(2, ViewBusStopTimetables::SUNDAY);
        case "Monday to Friday":
        case "Mon-Fri Schooldays":
        case "Summer Monday to Friday Non-Schooldays":
        case "Mon-Fri Non-Schooldays":
        case "Mo-Fr Night/Tu-Sat Morning":
        case "Monday - Friday":
        case "MondayToFriday":
        case "Monday,Tuesday,Wednesday,Thursday,Friday":
            return pow(2, ViewBusStopTimetables::MONDAY) | pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::THURSDAY) | pow(2, ViewBusStopTimetables::FRIDAY);
        case "Monday to Thursday":
        case "Mo-Th Nights/Tu-Fr Morning":
        case "Mon-Th Schooldays":
        case "Mon-Th Non-Schooldays":
        case "Monday -Thursday Non-Schooldays":
        case "Monday,Tuesday,Wednesday,Thursday":
            return pow(2, ViewBusStopTimetables::MONDAY) | pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::THURSDAY);
        case "Monday,Tuesday,Wednesday,Friday":
            return pow(2, ViewBusStopTimetables::MONDAY) | pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::FRIDAY);
        case "Monday,Tuesday,Thursday,Friday":
            return pow(2, ViewBusStopTimetables::MONDAY) | pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::THURSDAY) | pow(2, ViewBusStopTimetables::FRIDAY);
        case "Tuesday, Wednesday & Thursday":
        case "Tuesday,Wednesday,Thursday":
            return pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::THURSDAY);
        case "Tuesday,Wednesday,Thursday,Friday":
            return pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::THURSDAY) | pow(2, ViewBusStopTimetables::FRIDAY);
        case "Tuesday,Wednesday,Thursday,Friday,Saturday":
            return pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::THURSDAY) | pow(2, ViewBusStopTimetables::FRIDAY) | pow(2, ViewBusStopTimetables::SATURDAY);
        case "Monday,Tuesday,Wednesday,Thursday,Sunday":
            return pow(2, ViewBusStopTimetables::MONDAY) | pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::THURSDAY) | pow(2, ViewBusStopTimetables::SUNDAY);
        case "MondayToSunday":
            return pow(2, ViewBusStopTimetables::MONDAY) | pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::THURSDAY) | pow(2, ViewBusStopTimetables::FRIDAY) | pow(2, ViewBusStopTimetables::SATURDAY) | pow(2, ViewBusStopTimetables::SUNDAY);
        case "Monday,Tuesday,Wednesday,Thursday,Friday,Saturday":
            return pow(2, ViewBusStopTimetables::MONDAY) | pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::THURSDAY) | pow(2, ViewBusStopTimetables::FRIDAY) | pow(2, ViewBusStopTimetables::SATURDAY);
        case "Wednesday,Thursday,Friday":
            return pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::THURSDAY) | pow(2, ViewBusStopTimetables::FRIDAY);
        case "Wednesday,Friday":
            return pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::FRIDAY);
        case "Saturday,Sunday":
            return pow(2, ViewBusStopTimetables::SATURDAY) | pow(2, ViewBusStopTimetables::SUNDAY);
        case "Monday,Tuesday,Wednesday,Thursday,Friday,Sunday":
            return pow(2, ViewBusStopTimetables::MONDAY) | pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY) | pow(2, ViewBusStopTimetables::THURSDAY) | pow(2, ViewBusStopTimetables::FRIDAY) | pow(2, ViewBusStopTimetables::SUNDAY);
        case "Monday,Wednesday":
            return pow(2, ViewBusStopTimetables::MONDAY) | pow(2, ViewBusStopTimetables::WEDNESDAY);
        case "Tuesday,Thursday":
            return pow(2, ViewBusStopTimetables::TUESDAY) | pow(2, ViewBusStopTimetables::THURSDAY);
        case "":
            return 0;
        default:
            ReportAllErrorsToCloudwatch::dieToCloudWatch(__FUNCTION__ . " " . $id, __CLASS__);
    }
}

}
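Some of that table could probably be derived rather than enumerated. The following is a sketch, not a drop-in replacement for the class above: it assumes the same 1 (Monday) to 7 (Sunday) bit positions mentioned in the comment, handles comma lists and "XToY" ranges, and returns null for free-text labels such as "Mon-Fri Schooldays", which would still need a lookup table. The function name daysToBitmap is hypothetical.

```php
<?php
/**
 * Decode "Monday,Wednesday" or "MondayToFriday" into a bitmap with
 * bit 1 = Monday ... bit 7 = Sunday. Returns null if not understood.
 */
function daysToBitmap(string $text): ?int
{
    // Assumed bit positions, matching PHP's date('N') numbering.
    $dayNumbers = [
        'monday' => 1, 'tuesday' => 2, 'wednesday' => 3, 'thursday' => 4,
        'friday' => 5, 'saturday' => 6, 'sunday' => 7,
    ];

    if (trim($text) === '') {
        return 0; // no days specified
    }

    $bitmap = 0;
    foreach (explode(',', strtolower($text)) as $part) {
        $part = trim($part);
        if (preg_match('/^([a-z]+day)to([a-z]+day)$/', $part, $m)) {
            // A range such as "mondaytofriday" expands to every day in between.
            $from = $dayNumbers[$m[1]] ?? null;
            $to = $dayNumbers[$m[2]] ?? null;
            if ($from === null || $to === null || $from > $to) {
                return null;
            }
            for ($day = $from; $day <= $to; $day++) {
                $bitmap |= 1 << $day;
            }
        } elseif (isset($dayNumbers[$part])) {
            $bitmap |= 1 << $dayNumbers[$part];
        } else {
            return null; // free-text label: needs an explicit lookup entry
        }
    }
    return $bitmap;
}

var_dump(daysToBitmap('MondayToFriday'));
var_dump(daysToBitmap('Saturday,Sunday'));
var_dump(daysToBitmap('Mon-Fri Schooldays'));
```

Returning null rather than 0 for unrecognised text keeps "no days" distinguishable from "could not parse", which the fall-through on 0 in the class above cannot do.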

@mjcarchive

Yes you’re right, I completed watching the rest of the video after commenting and as you say there is still quite a bit more allowance in the standard than there ideally should be. At least for now anyways.

I agree with you in terms of using block id as the join field, it’s better than nothing I suppose and could allow one to interpolate journeys in some way, but still lacks the definite certainty of what pattern of stops the bus is following.

I suppose all we can do is wait and see how this plays out

Not standardising the vehicle journey ref is seen as removing a barrier to some small operators (or their agents) using the system, or at least leaving them feeling they have been listened to and thus happier bunnies. It may sound patronising but that sort of thing can be important in achieving change more quickly and with cleaner data. Small operators form a high percentage of the operator count (though, as with all business demography, they are responsible for a much smaller percentage of activity).

Similarly for block numbers, as the smallest operators may have no need for them, particularly if they only operate a few isolated local bus journeys. Don’t needlessly hack off your data suppliers if you don’t want GIGO!

TBH I can see VJ references becoming standardised more readily than I can see block numbers becoming mandatory, because a code is just a code and changing the format over time is fairly painless.
