Duplicate trips in TransXChange data


#1

Hi,

In the data available from http://data.tfl.gov.uk/tfl/syndication/feeds/journey-planner-timetables.zip, I have noticed that some trips seem to be duplicated, or to overlap, across the individual xml files.

A specific example from the most recent dataset (made available to us on 2017-09-06):

On the “Metropolitan” line, for the “Chesham - Wembley Park outbound” route, departing at 23:57:00, I can see two separate definitions, each in a separate xml file.

  • Except for the ids, the JourneyPatternSection of each is identical.
  • Each VehicleJourney operating profile specifies Wednesday as the day of week.
  • In one file, the Service operating period is 2017-09-13 until 2017-09-13.
  • In the other file, the Service operating period is 2017-09-02 until 2017-12-23.
  • This appears to create a duplicate trip on the date of 2017-09-13 for the trip starting at 23:57:00
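
The overlap described in the bullets above can be checked by expanding each operating period into concrete dates. This is only a minimal sketch in Python with the dates hard-coded from the example; `running_dates` is a hypothetical helper, not part of any TransXChange tooling, and it ignores everything in the operating profile except the day of week.

```python
from datetime import date, timedelta

def running_dates(start, end, weekdays):
    """Return the set of dates in [start, end] whose weekday is in `weekdays`.

    Weekdays follow Python's convention: Monday is 0, Sunday is 6.
    """
    days = (end - start).days + 1
    return {
        start + timedelta(days=i)
        for i in range(days)
        if (start + timedelta(days=i)).weekday() in weekdays
    }

WEDNESDAY = 2

# Service operating periods from the two files; both VehicleJourneys run on Wednesdays.
file_a = running_dates(date(2017, 9, 13), date(2017, 9, 13), {WEDNESDAY})
file_b = running_dates(date(2017, 9, 2), date(2017, 12, 23), {WEDNESDAY})

# Dates on which both files define the 23:57:00 departure.
overlap = sorted(file_a & file_b)
print(overlap)
```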

This is just a single example; the same thing is happening in many other places, and it has been like this for a long time in all of the previous datasets. Am I missing something that might explain this?

Here is an edited version of some of the xml highlighting the relevant parts of the two files for the above example:

tfl_1-MET-_-y05-1607900.xml (relevant lines only):

    <Route id="R_1-MET-_-y05-1607900-O-7">
      <Description>Chesham - Wembley Park</Description>
      ...
    </Route>

    <VehicleJourney>
      <PrivateCode>tfl-1-MET-_-y05-1607900-41-TA</PrivateCode>
      <OperatingProfile>
        <RegularDayType>
          <DaysOfWeek>
            <Wednesday />
          </DaysOfWeek>
        </RegularDayType>
      </OperatingProfile>
      <VehicleJourneyCode>VJ_1-MET-_-y05-1607900-41-TA</VehicleJourneyCode>
      ...
      <DepartureTime>23:57:00</DepartureTime>
    </VehicleJourney>

    <Service>
      <PrivateCode>1-MET-_-y05-1607900</PrivateCode>
      <OperatingPeriod>
        <StartDate>2017-09-13</StartDate>
        <EndDate>2017-09-13</EndDate>
      </OperatingPeriod>
      <OperatingProfile>
        <RegularDayType>
          <DaysOfWeek>
            <Wednesday />
            <Thursday />
          </DaysOfWeek>
        </RegularDayType>
      </OperatingProfile>
      ...
    </Service>

tfl_1-MET-_-y05-3400500.xml (relevant lines only):

    <Route id="R_1-MET-_-y05-3400500-O-8">
      <Description>Chesham - Wembley Park</Description>
      ...
    </Route>


    <VehicleJourney>
      <PrivateCode>tfl-1-MET-_-y05-3400500-236-UU</PrivateCode>
      <OperatingProfile>
        <RegularDayType>
          <DaysOfWeek>
            <Wednesday />
          </DaysOfWeek>
        </RegularDayType>
      </OperatingProfile>
      <VehicleJourneyCode>VJ_1-MET-_-y05-3400500-236-UU</VehicleJourneyCode>
      ...
      <DepartureTime>23:57:00</DepartureTime>
    </VehicleJourney>

    <Service>
      <PrivateCode>1-MET-_-y05-3400500</PrivateCode>
      <OperatingPeriod>
        <StartDate>2017-09-02</StartDate>
        <EndDate>2017-12-23</EndDate>
      </OperatingPeriod>
      <OperatingProfile>
        <RegularDayType>
          <DaysOfWeek>
            <MondayToSunday />
          </DaysOfWeek>
        </RegularDayType>
      </OperatingProfile>
      ...
    </Service>

Thanks,
Dean


#2

@jamesevans is this one you can ask the JP Team about?


#3

Hi Dean,

The Journey Planner team create a number of timetables for each line. For instance, the second one you mentioned, 3400500.xml, is what we refer to as the base timetable. This is the business-as-usual timetable. As you can see, it has the longer operating period.

The other timetable you mentioned is a supplementary timetable. The example you have given is only valid on the 13th September. In this case, it is the timetable that caters for Spurs playing their match at Wembley. The main change in this timetable is that the Metropolitan Line’s evening closures are not implemented on matchdays. This timetable therefore overrides the base timetable for that day, which is why the same train service can exist in both files.

Other timetables you will see in the dataset for the Met are most likely related to engineering works. The Journey Planner application and the Unified API use some logic based on the Operating Period field to determine which timetable is valid on the current day. I would advise that you treat the supplementary timetables as overriding the base timetable on their dates of operation.
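
As a rough illustration of that advice, here is a minimal Python sketch that picks one timetable per line for a given day. It rests on an assumption that is not confirmed anywhere in this thread: that a supplementary timetable can be recognised by having a shorter operating period than the base timetable. The `Timetable` class and `pick_timetable` function are hypothetical names, and this is not the logic used by Journey Planner or the Unified API.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Timetable:
    filename: str
    start: date
    end: date

    def covers(self, day):
        """True if `day` falls inside the operating period (inclusive)."""
        return self.start <= day <= self.end

    def span_days(self):
        """Length of the operating period in days."""
        return (self.end - self.start).days + 1

def pick_timetable(timetables, day):
    """Pick one timetable for `day`: the shortest covering period wins.

    Assumption: the shortest period is the supplementary timetable.
    Ties between equally long periods are not resolved in any
    meaningful way here.
    """
    candidates = [t for t in timetables if t.covers(day)]
    if not candidates:
        return None
    return min(candidates, key=Timetable.span_days)

# The two Metropolitan line files from the example above.
met = [
    Timetable("tfl_1-MET-_-y05-1607900.xml", date(2017, 9, 13), date(2017, 9, 13)),
    Timetable("tfl_1-MET-_-y05-3400500.xml", date(2017, 9, 2), date(2017, 12, 23)),
]

chosen_matchday = pick_timetable(met, date(2017, 9, 13))
chosen_normal = pick_timetable(met, date(2017, 9, 14))
print(chosen_matchday.filename)
print(chosen_normal.filename)
```

The list passed in should contain the timetables for a single line only, so that a short-period file for one line never overrides another line's base timetable.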

Thanks,
James
Technology Service Operations.


#4

Thanks for the info @jamesevans

I’ve had a crack at implementing the supplementary timetable overrides. But I’m having trouble reliably identifying which of the .xml files are supplementary and which are base timetables.

Or more specifically, how to group the files together in such a way that there will only be one base timetable and its supplementaries in each group.

What is the logic you are using internally to determine these groups?

All of the IDs seem to be different across xml files. The only consistent thing to link them together seems to be the Service -> LineName.

In particular, on the Central line, I have:

tfl_1-CEN-_-y05-690544.xml

		<Service>
			<ServiceCode>1-CEN-_-y05-690544</ServiceCode>
			<PrivateCode>1-CEN-_-y05-690544</PrivateCode>
			<Lines>
				<Line id="1-CEN-_-y05-690544">
					<LineName>Central</LineName>
				</Line>
			</Lines>
			<OperatingPeriod>
				<StartDate>2017-09-16</StartDate>
				<EndDate>2017-12-23</EndDate>
			</OperatingPeriod>
			<OperatingProfile>
				<RegularDayType>
					<DaysOfWeek>
						<MondayToSunday />
					</DaysOfWeek>
				</RegularDayType>
			</OperatingProfile>
			<RegisteredOperatorRef>OId_LUL</RegisteredOperatorRef>
			<Mode>underground</Mode>
			<Description>West Ruislip/Ealing Broadway - Liverpool Street - Hainault/Woodford/Epping</Description>
		</Service>

and also:

tfl_1-CEN-_-y05-690877.xml

		<Service>
			<ServiceCode>1-CEN-_-y05-690877</ServiceCode>
			<PrivateCode>1-CEN-_-y05-690877</PrivateCode>
			<Lines>
				<Line id="1-CEN-_-y05-690877">
					<LineName>Central</LineName>
				</Line>
			</Lines>
			<OperatingPeriod>
				<StartDate>2017-10-09</StartDate>
				<EndDate>2017-12-23</EndDate>
			</OperatingPeriod>
			<OperatingProfile>
				<RegularDayType>
					<DaysOfWeek>
						<MondayToSunday />
					</DaysOfWeek>
				</RegularDayType>
			</OperatingProfile>
			<RegisteredOperatorRef>OId_LUL</RegisteredOperatorRef>
			<Mode>underground</Mode>
			<Description>Ealing Broadway/West Ruislip - Liverpool Street - Epping/Hainault/Woodford</Description>
		</Service>

These two services overlap quite significantly, but neither seems to be a supplementary timetable?
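
A minimal Python sketch of grouping by LineName and listing the overlapping operating periods, using the two Central line services above. The XML strings are trimmed stand-ins for the real files, and the wrapper elements and namespace around `<Service>` are assumptions. The helper names are hypothetical, and the code assumes every Service has both a StartDate and an EndDate.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import date

def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def find_first(elem, name):
    """Return the first element (in document order) whose local name is `name`."""
    for node in elem.iter():
        if local(node.tag) == name:
            return node
    return None

def service_summaries(xml_text):
    """Yield (line_name, service_code, start, end) for each Service element."""
    root = ET.fromstring(xml_text)
    for svc in root.iter():
        if local(svc.tag) != "Service":
            continue
        # Look inside OperatingPeriod so dates elsewhere in the Service are ignored.
        period = find_first(svc, "OperatingPeriod")
        yield (
            find_first(svc, "LineName").text.strip(),
            find_first(svc, "ServiceCode").text.strip(),
            date.fromisoformat(find_first(period, "StartDate").text.strip()),
            date.fromisoformat(find_first(period, "EndDate").text.strip()),
        )

def group_by_line(documents):
    """Group (service_code, start, end) tuples by LineName."""
    groups = defaultdict(list)
    for xml_text in documents:
        for line_name, code, start, end in service_summaries(xml_text):
            groups[line_name].append((code, start, end))
    return dict(groups)

def overlapping_pairs(services):
    """Return pairs of service codes whose operating periods intersect."""
    pairs = []
    for i, (code_a, start_a, end_a) in enumerate(services):
        for code_b, start_b, end_b in services[i + 1:]:
            if start_a <= end_b and start_b <= end_a:
                pairs.append((code_a, code_b))
    return pairs

# Trimmed stand-ins for the two files; wrapper and namespace are assumed.
TEMPLATE = """<TransXChange xmlns="http://www.transxchange.org.uk/"><Services><Service>
<ServiceCode>{code}</ServiceCode>
<Lines><Line id="{code}"><LineName>Central</LineName></Line></Lines>
<OperatingPeriod><StartDate>{start}</StartDate><EndDate>{end}</EndDate></OperatingPeriod>
</Service></Services></TransXChange>"""

documents = [
    TEMPLATE.format(code="1-CEN-_-y05-690544", start="2017-09-16", end="2017-12-23"),
    TEMPLATE.format(code="1-CEN-_-y05-690877", start="2017-10-09", end="2017-12-23"),
]

groups = group_by_line(documents)
pairs = overlapping_pairs(groups["Central"])
print(pairs)
```

This only reports which services overlap within a line; it does not decide which of them is the base timetable.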


#5

Has there been any solution to this? I have found a similar issue on the Piccadilly line. The file tfl_1-PIC--y05-580800.xml has start and end dates of 2017-10-21 and 2017-12-23, but the data appears to be old. There is another file, tfl_1-PIC--y05-1035255.xml, with start and end dates of 2017-10-21 and 2017-12-22, and this one seems to match what is on the Unified API and the TfL Journey Planner. This is based on looking at trains leaving Heathrow Terminal 5: the start times are just a few minutes apart (5:58am in the bad file, 5:56am in the good one). What data should I use to determine the correct file without manually checking?