Page 1: Cátedras Abertis | Cátedras Abertis...La Cátedra Abertis de la Universidad Politécnica de Cataluña (UPC) promueve la realización de seminarios y conferencias y la investigación

Abertis International Research Award on Transport Infrastructure Management

13 - SPAIN -

Traffic parameters estimation from the analysis of connected car data

Meritxell Pacheco Paneque


Foreword

The Abertis Chairs international network, together with prestigious universities, once again announces the awards that recognise the best final-year projects, master's theses and doctoral theses on transport infrastructure management produced by university students in the countries where the Abertis Group operates.

Since the first Abertis Chair was created in 2003, the network's international presence has grown steadily, confirming the company's commitment to the academic world and contributing to research on the impact of major infrastructure works on the territory, which in turn helps improve the quality of life of its inhabitants.

The International Network of Abertis Chairs is present in Spain, France, Puerto Rico, Chile and Brazil, in collaboration with the following universities: Universitat Politècnica de Catalunya-BarcelonaTech (Barcelona, Spain); IFSTTAR, École des Ponts–ParisTech, Fondation des Ponts (Paris, France); Universidad de Puerto Rico (San Juan, Puerto Rico); Pontificia Universidad Católica de Chile (Santiago, Chile); and Universidad de São Paulo (São Paulo, Brazil).

This knowledge-management model stems from Abertis's firm determination to collaborate with universities, centres of excellence and the leading experts in each field in order to help generate and disseminate knowledge, placing it at the service of research and of society as a whole. The work distinguished by the Abertis Research Awards that you now hold in your hands is intended as a further example of this vocation of service to researchers, the educational community and professionals with responsibilities in the field of infrastructure management.

This vision, which forms part of the Abertis Group's social responsibility, also aspires to offer avenues for progress, collaboration, dialogue and interaction in all the territories where the Group is present, helping it to carry out its activities responsibly and sustainably.


Presentation

The Abertis Chair at the Universitat Politècnica de Catalunya (UPC) promotes seminars, conferences and research on transport infrastructure management, structured around the corporation's areas of activity: roads and motorways, traffic, road safety and intelligent transport systems.

In addition, to foster the interest of Spanish university students, the Abertis Chair annually awards the Abertis Prize for the best unpublished research work on transport management carried out by students in Spain. Similar Abertis Chairs exist in other countries, namely France, Puerto Rico, Chile and Brazil.

In the thirteenth edition, held in 2015, thirty-seven candidates took part, all of a high standard. In the master's thesis category, twenty contributions were submitted, covering: cost analysis of intermodal freight transport; rationalisation of investment in the Palencia-Santander high-speed rail line; access routes to Barcelona; delay recovery in airlines; geometric design of conventional roads; optimisation of school transport in Cantabria; delay propagation in the Spanish airport network; European funding and public-private partnership in transport infrastructure; public transport networks and geography; implementation of the European Electronic Toll Service; PPP in Indonesia; accident rates in Spanish road tunnels; pay-per-use road charging; runway management and configuration at BCN-El Prat airport; an app to improve driving; evaluation of resident subsidy schemes in air transport; traffic parameter estimation using cooperative vehicles; effects of dynamic speed limits on a Dutch motorway; analysis of the airport management system in Spain; and a multi-agent decision-support system for managing flows at urban intersections.

The winner of the XIII Abertis Prize 2015 is the master's thesis “Traffic parameters estimation from the analysis of connected car data” by Ms Meritxell Pacheco Paneque, who holds a degree in Mathematics from the Universitat Politècnica de Catalunya, BarcelonaTech. Traffic theory is grounded in the analysis of vehicles' time-space trajectories. Its practical application, however, has been limited by the information available, generally aggregated data on flow, speed and occupancy. Cooperative vehicles able to communicate with other vehicles and/or with the infrastructure represent a step forward, since they provide detailed spatiotemporal information on their own positions while also allowing the vehicles around them to be tracked. This thesis describes and analyses an experiment carried out in Barcelona with a small fleet of vehicles. The information they captured is used to define and calibrate a microsimulation model that generates large-scale data, which in turn make it possible to investigate and evaluate methodologies for estimating the fundamental traffic variables based on Edie's definitions.

Prof. Francesc Robusté

Director of the Abertis-UPC Chair


Universitat Politècnica de Catalunya

ETSEIB - ETSECCPB

Master's Thesis

Màster en Logística, Transport i Mobilitat

Traffic parameters estimation from the analysis of connected car data

REPORT

Author: Meritxell Pacheco Paneque

Supervisors: Jaume Barceló Bugeda, Lídia Montero Mercadé

Session: September 2015


Disclaimer

Publications about the content of this work require the written consent of Volkswagen AG.

The results, opinions and conclusions expressed in this thesis are not necessarily those of Volkswagen AG.


Abstract

Traffic flow theory is grounded on the theoretical analysis of time-space vehicle trajectories to describe the movement of individual vehicles and the relative movement of neighbouring vehicles. Nevertheless, this theoretical point of view has rarely been considered in the empirical analysis of traffic data, since the available information was typically the aggregated spot measurements of flows, speeds and occupancies provided by inductive loop detectors. This limitation is beginning to be overcome with the advances in ICT applications. In particular, the vehicles known as probe vehicles are able to acquire wide-ranging and spatiotemporally detailed information about their positions via GPS, but they cannot directly supply volume-related variables such as flow and density. Cooperative cars with communication capabilities with other vehicles and/or the infrastructure represent a step forward because they also make it possible to track the vehicles in their surroundings. In this project we describe and analyze the experiment conducted in Barcelona with a small fleet of cooperative vehicles equipped with a set of radars that identify the vehicles within their detection zone. The gathered data were used to build and calibrate the emulation of the functions of such vehicles in a microscopic simulation model, which made it possible to emulate fleet data on a large scale that goes far beyond what the reduced fleet of vehicles could capture. Finally, these data lead us to explore and evaluate methodological approaches for the estimation of the fundamental traffic variables, namely the flow, density and average speed, based on Edie’s definitions.

Resum

La teoria del trànsit està fonamentada en l’anàlisi teòrica de les trajectòries espai-temporals dels vehicles per tal de descriure el moviment dels vehicles individualment i el moviment relatiu dels vehicles veïns. Aquest punt de vista teòric, però, ha estat rarament considerat a l’anàlisi empírica de les dades relatives al trànsit, ja que la informació disponible normalment feia referència a les mesures puntuals agregades del flux, velocitat i ocupació proporcionades per detectors de bucle d’inducció. Aquesta limitació està començant a ésser superada gràcies als avenços en les aplicacions TIC. Concretament, els vehicles coneguts com a probe vehicles són capaços d’obtenir informació espai-temporal detallada de gran abast sobre les seves posicions via GPS, però no poden oferir variables relatives al volum com ara el flux i la densitat. Els vehicles cooperatius amb capacitats comunicatives amb altres vehicles i/o amb la infraestructura representen un pas endavant perquè també permeten el rastreig dels vehicles al seu voltant. En aquest projecte es descriu i s’analitza l’experiment realitzat a Barcelona amb una petita flota de vehicles cooperatius equipats amb uns certs radars que identifiquen els vehicles dintre de la seva àrea de detecció. La informació recollida es va emprar per a la definició i calibració de l’emulació de les funcions d’aquest tipus de vehicles en un model de microsimulació, cosa que ens va permetre l’emulació d’informació a gran escala que va més enllà del que la flota de petit tamany podia capturar. Finalment, aquesta informació ens porta a investigar i evaluar metodologies per a l’estimació de les variables fonamentals, concretament el flux, la densitat i la velocitat mitjana, en base a les definicions d’Edie.

Màster en Logística, Transport i Mobilitat


Contents

List of Figures

List of Tables

Glossary

1 Introduction

1.1 Motivation: Connected Car project

1.2 Objectives and project scope

1.3 Project overview

2 State of the Art

2.1 Traffic flow theory

2.1.1 Microscopic and macroscopic characteristics

2.1.2 Performance indicators

2.1.3 Traffic flow regimes and fundamental diagrams

2.2 Measurement issues

2.3 Probe vehicle data

2.3.1 Stakeholders and supply chain analysis

2.3.2 Probe vehicle data collection approaches

2.3.3 User privacy protection

2.3.4 Probe vehicle data deployments

2.4 Traffic simulation and goodness-of-fit measures

3 Preliminary analysis

3.1 Technical specifications

3.2 Data collection design

3.3 Exploratory data analysis

4 Methodology

4.1 Reference estimation method

4.1.1 Discussions on the method

4.1.2 Field experiment

4.2 Previous considerations

4.2.1 Preliminary adaptations

4.2.2 Data processing

4.3 Leader approach

4.4 Leader-Follower approach

4.5 Extended approach

5 Results

5.1 Simulation experiment

5.2 Leader approach analysis

5.2.1 Penetration rate 5% - Time window length 91 s

5.2.2 Penetration rate 10% - Time window length 182 s

5.2.3 Goodness-of-fit measures

5.3 Leader-Follower approach analysis

5.3.1 Penetration rate 5% - Time window length 364 s

5.3.2 Penetration rate 10% - Time window length 91 s

5.3.3 Goodness-of-fit measures

5.4 Extended approach analysis

5.4.1 Penetration rate 5% - Time window length 182 s

5.4.2 Penetration rate 10% - Time window length 364 s

5.4.3 Goodness-of-fit measures

6 Conclusions

Bibliography


List of Figures

2.1 Two consecutive vehicles in the same lane in a traffic stream. Source: (Maerivoet and Moor, 2005)

2.2 A time-space diagram showing two vehicle trajectories and both the space and time headway. Source: (Maerivoet and Moor, 2005)

2.3 A time-space diagram showing several vehicle trajectories and three measurement regions: $R_t$, $R_s$ and $R_{t,s}$. Source: (Maerivoet and Moor, 2005)

2.4 A fundamental diagram relating the density $k$ and the space-mean speed $v_s$.

2.5 A fundamental diagram relating the density $k$ to the flow $q$. Source: (Maerivoet and Moor, 2005)

2.6 A fundamental diagram relating the flow $q$ to the space-mean speed $v_s$. Source: (Maerivoet and Moor, 2005)

3.1 Radars location (grey circles) and surrounding awareness of a probe vehicle (in red)

3.2 Itinerary 2 on a street map. Source: Google Maps

3.3 Trajectory plot (Itinerary 2)

3.4 Speed plot (Itinerary 2)

3.5 Acceleration plot (Itinerary 2)

3.6 Heading plot (Itinerary 2)

3.7 Distribution of the speed of the equipped and observed vehicles per hour (Itinerary 2)

3.8 Distribution of the acceleration of the equipped and observed vehicles per hour (Itinerary 2)

4.1 Time-space region A, trajectories n = 5, n = 8 and n = 12 corresponding to probe vehicles. Source: (Seo et al., 2015)

4.2 Sketch of the discretisation of the targeted time-space region.

4.3 Sketch of a discrete time-space region

4.4 Sketch of a discrete time-space region

4.5 Case (0): complete intermediate missing data

4.6 Different cases for data availability for two consecutive timestamps

4.7 Equipped and leading vehicles during a motionless period

4.8 Trajectory plot at section 372 and time window 2

4.9 Case (0): examples for the complete process for missing data

4.10 Subcases for data availability in Case (1)

4.11 Subcases for data availability in Case (2)

4.12 Subcases for data availability in Case (3)

4.13 Subcases for data availability in Case (4)

4.14 Observed vehicles by 2 probe vehicles for certain section and timestamp

4.15 Equipped and observed for a certain timestamp t within a discrete time-space region, where red cars correspond to equipped vehicles

4.16 Trajectories for each occupied lane in Figure 4.15

4.17 Observed vehicles classified into the rear and front set

5.1 Flow visualization for the Leader approach with a penetration rate of 5% and a time window length of 91 s

5.2 Density visualization for the Leader approach with a penetration rate of 5% and a time window length of 91 s

5.3 Speed visualization for the Leader approach with a penetration rate of 5% and a time window length of 91 s

5.4 RMSNE heatmap for the Leader approach with a penetration rate of 5% and a time window length of 91 s

5.5 Flow visualization for the Leader approach with a penetration rate of 10% and a time window length of 182 s

5.6 Density visualization for the Leader approach with a penetration rate of 10% and a time window length of 182 s

5.7 Speed visualization for the Leader approach with a penetration rate of 10% and a time window length of 182 s

5.8 RMSNE heatmap for the Leader approach with a penetration rate of 10% and a time window length of 182 s

5.9 Flow visualization for the Leader-Follower approach with a penetration rate of 5% and a time window length of 364 s

5.10 Density visualization for the Leader-Follower approach with a penetration rate of 5% and a time window length of 364 s

5.11 Speed visualization for the Leader-Follower approach with a penetration rate of 5% and a time window length of 364 s

5.12 RMSNE heatmap for the Leader-Follower approach with a penetration rate of 5% and a time window length of 364 s

5.13 Flow visualization for the Leader-Follower approach with a penetration rate of 10% and a time window length of 91 s

5.14 Density visualization for the Leader-Follower approach with a penetration rate of 10% and a time window length of 91 s

5.15 Speed visualization for the Leader-Follower approach with a penetration rate of 10% and a time window length of 91 s

5.16 RMSNE heatmap for the Leader-Follower approach with a penetration rate of 10% and a time window length of 91 s

5.17 Flow visualization for the Extended approach with a penetration rate of 5% and a time window length of 182 s

5.18 Density visualization for the Extended approach with a penetration rate of 5% and a time window length of 182 s

5.19 Speed visualization for the Extended approach with a penetration rate of 5% and a time window length of 182 s

5.20 RMSNE heatmap for the Extended approach with a penetration rate of 5% and a time window length of 182 s

5.21 Flow visualization for the Extended approach with a penetration rate of 10% and a time window length of 364 s

5.22 Density visualization for the Extended approach with a penetration rate of 10% and a time window length of 364 s

5.23 Speed visualization for the Extended approach with a penetration rate of 10% and a time window length of 364 s

5.24 RMSNE heatmap for the Extended approach with a penetration rate of 10% and a time window length of 364 s


List of Tables

2.1 Table containing the definitions of Lighthill and Whitham and Wardrop formulated with the microscopic features. Source: (Edie, 1963)

2.2 Eulerian and Lagrangian measurements with their sub-types and some examples. Source: (Ou et al., 2011)

2.3 Value chain of probe data process with a short description of each stage and the responsible stakeholder. Source: (Bessler and Paulin, 2013)

2.4 Measures of goodness-of-fit with their formulae and some comments. Source: (Hollander and Liu, 2008)

3.1 Sensor technologies and their functions. Source: Volkswagen AG

5.1 RMSNE and Theil’s inequality coefficient for the Leader approach with a penetration rate of 5%

5.2 RMSNE and Theil’s inequality coefficient for the Leader approach with a penetration rate of 10%

5.3 RMSNE and Theil’s inequality coefficient for the Leader-Follower approach with a penetration rate of 5%

5.4 RMSNE and Theil’s inequality coefficient for the Leader-Follower approach with a penetration rate of 10%

5.5 RMSNE and Theil’s inequality coefficient for the Extended approach with a penetration rate of 5%

5.6 RMSNE and Theil’s inequality coefficient for the Extended approach with a penetration rate of 10%


Glossary

Keyword: Description

ADAS: Advanced Driver Assistance Systems

ATIS: Advanced Traveller Information Systems

approach: each of the defined traffic state estimation methods

connected car: equipped vehicle that shares its information with other equipped vehicles and benefits from their data

equipped vehicle: vehicle provided with a set of sensors that allows it to detect the surrounding objects

Eulerian measurement: measurement from a fixed control volume

fundamental diagram: representation of the bivariate functional relationships established between two traffic flow characteristics

fundamental relation of traffic flow theory: $q = k \, v_s$

fundamental variable: each of the macroscopic traffic characteristics involved in the fundamental relation of traffic flow theory

goodness-of-fit measures: statistical measures used to calibrate and validate a microsimulation model

ITS: Intelligent Transportation Systems

$k$: density (vehicles per unit of length)

Lagrangian measurement: measurement of the system along a particle trajectory

occupancy: fraction of time that the measurement location is occupied by a vehicle

probe data: vehicle sensor information that is transmitted to a land-based centre for processing

$q$: flow (vehicles per unit of time)

$R_s$: measurement region at a single fixed location ($dx$) during a certain time period $T$

$R_t$: measurement region at a single time instant ($dt$) over a certain road section of length $X$

$R_{t,s}$: general measurement region that can have any shape

space headway: distance between rear bumpers of consecutive vehicles

time headway: time to reach the current position of the leader vehicle plus the time needed to traverse the vehicle’s length

time-space diagram: graph that describes the relationship between the location and the time of vehicles in a traffic stream

traffic state: fundamental variables

urban sprawl: expansion of people away from dense areas into cities’ suburbs

urbanization: transition from a rural to a more urban society

$v_s$: space-mean speed (total distance traveled by all vehicles in $R_t$ divided by the total time spent in $R_t$)

$v_t$: time-mean speed (arithmetic average of vehicles’ spot speeds in a temporal measurement region $R_t$)
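To make the glossary quantities concrete, the following is a minimal sketch (not taken from the thesis) of Edie's generalised definitions applied to a rectangular time-space region of length X and duration T: flow is the total distance travelled divided by the region's area X·T, density is the total time spent divided by that area, and the space-mean speed is their ratio, so that q = k·v_s holds by construction. The `Traversal` record, the `edie_estimates` function and the sample values are hypothetical names and numbers introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Traversal:
    """Hypothetical record of one vehicle's passage through the region."""
    distance_m: float  # distance travelled inside the region (m)
    time_s: float      # time spent inside the region (s)


def edie_estimates(
    traversals: Iterable[Traversal], length_m: float, duration_s: float
) -> Tuple[float, float, float]:
    """Return (flow in veh/s, density in veh/m, space-mean speed in m/s)."""
    area = length_m * duration_s  # area of the time-space region (m*s)
    if area <= 0:
        raise ValueError("the time-space region must have positive area")
    items = list(traversals)
    total_distance = sum(t.distance_m for t in items)
    total_time = sum(t.time_s for t in items)
    flow = total_distance / area   # q: total distance over area
    density = total_time / area    # k: total time over area
    # v_s: total distance over total time; 0.0 for an empty region (assumption)
    speed = total_distance / total_time if total_time > 0 else 0.0
    return flow, density, speed


if __name__ == "__main__":
    # Illustrative values only: three vehicles in a 500 m by 60 s region.
    sample = [
        Traversal(500.0, 25.0),
        Traversal(500.0, 50.0),
        Traversal(300.0, 60.0),
    ]
    q, k, v_s = edie_estimates(sample, length_m=500.0, duration_s=60.0)
    print(f"q = {q * 3600:.0f} veh/h, k = {k * 1000:.1f} veh/km, "
          f"v_s = {v_s * 3.6:.1f} km/h")
```

The sketch assumes that the distance and time each vehicle spends inside the region are already known; with probe data only a subset of vehicles is observed, so the sums would have to be estimated rather than measured directly.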


Chapter 1

Introduction

The process of transition from a rural to a more urban society, known as urbanization, has been one of the dominant trends of the economic and social change of the 20th century, especially in the developing world. Since 1950, as a consequence of natural population growth and migration, the world’s urban population has more than doubled. Today, more than half of the 7 billion inhabitants of the planet live in urban areas (Castells-Quintana, 2015), a share which is expected to keep rising. In fact, as stated in (Rodrigue et al., 2009), by 2050 about 6.4 billion people, roughly two thirds of humanity, are likely to be urban residents.

Whereas in developed countries urbanization has been a long and slow process, in developing countries it is nowadays characterised by a very fast pace and an urban population that tends to concentrate in one or a few large metropolitan areas of disproportionate size. In any case, the general trend is defined by the growing size of cities and the increasing proportion of the urbanised population.

Urban mobility problems have grown in proportion to urbanization, since mobility demands are concentrated over a specific area. Both mobility and demographic growth have been shaped by the capacity and requirements of urban transport infrastructures. Cities have traditionally responded to growth in mobility by expanding the transportation supply, defining new transit lines and building new highways.

In an age of motorization and personal mobility, a growing number of cities are developing spatial structures that increase reliance on motorized transportation, in particular the privately owned automobile. The phenomenon of urban sprawl consists of the expansion of people away from dense areas into cities’ suburbs, typically with a higher dependence on private vehicles. This effect is taking place in many different types of cities and originates, among other factors, from the feasibility of daily commuting. Furthermore, it is closely connected to the trend of decoupling economic and residential activities.

In addition, the automobile’s unique performance and space consumption characteristics make this transport mode one of the most relevant in terms of spatial importance, according to (Rodrigue et al., 2009). A vehicle requires space to move around (roads), and also spends about 98% of its existence occupying a parking spot. Thus, a significant amount of urban space must be allocated to accommodate the automobile, especially when it does not move. For instance, in Western Europe roads account for between 15 and 20% of the urban surface, whereas for developing countries this percentage is about 10%.

All these features show that urban sprawl directly impacts traffic congestion, high oilconsumption and other transportation issues. For example, traffic congestion is accom-panied by negative impacts such as delays, increment of air pollution and carbon dioxideemissions, and a higher chance of collisions due to tight spacing. Therefore, an appropriatetraffic management is really important in order to mitigate these negative externalities asmuch as possible.

New technologies play an important role in traffic management, which seeks an optimalstrategy at network level, since the availability of wide-ranging real-time data allows theinvolved stakeholders (city councils, road operators, etc.) to be aware of traffic conditionsand, consequently, to make the decisions which best adapt to the situation. The research inthe technologies for gathering real-time data is a hot topic, because different systems anddevices are directly or indirectly involved, such as vehicles’ on-board sensors (providing, forexample, GPS position) and smartphones with mobile internet connection and some sensingcapabilities such as location, accelerometer, etc.

Several studies have been conducted from points of view different to traffic management,such as driver safety, continuous monitoring of vehicle state or other mobility services asguidance or parking assessment. However, a few researchers have already taken advantageof the mobile sensing platforms as means of collection, processing and analysis of sensor datain addition to their original purposes, with the aim of estimating traffic state, monitoringthe ride quality and supporting proactive traffic surveillance, among others.

1.1 Motivation: Connected Car project

One of the main goals of the automotive industry is to define a service offer that makes driving easier as a whole. Vehicle-collected data can thus be exploited for an end-user service in its own right, which may become a powerful tool to differentiate a manufacturer from its competitors.

From this idea emerged the collaboration between Volkswagen AG, the biggest automaker in both Germany and Europe, and inLab FIB, an innovation and research lab based in the Barcelona School of Informatics (UPC). The Connected Car project, which is set within the context of urban mobility assistance, focuses on research into new mobility services that support the driver in different urban scenarios.

In this framework, the Connected Car initiative addresses the simulation-based study of various such services to assess their feasibility, the value they produce and their effect on traffic conditions and infrastructure usage. For this assessment Volkswagen AG supplies a type of vehicle equipped with a set of radars that detect surrounding objects, which will be referred to from now on as an equipped vehicle. These equipped vehicles are connected to each other in the sense that they all share information and benefit from this data, so they are also known as connected cars.

Since this long-term project is made up of several stages, only those included in the workplan for the academic year 2014/2015 are described. The project started with an exploratory data analysis of a pilot test conducted by a single equipped vehicle driving a couple of routes in the surroundings of Braunschweig (Germany). The analysis of the gathered data provided knowledge of the available information and allowed the identification of outliers and systematic errors, either due to sensor malfunction or induced by the processing of the data from the on-board devices to the final XML files.

Then, a five-day experiment consisting of a fleet of three equipped vehicles driving in Barcelona was conducted. In order to get significant results, the penetration rate of such vehicles (their share with respect to all vehicles in the network) should lie, according to the literature, between 7 and 10%, a rate that three vehicles cannot reach in a real network, so a microscopic traffic simulation environment is required. Taking into account the outcomes of the Braunschweig test, the data collected in Barcelona was exhaustively analyzed to calibrate the simulation model and to define a library of functions emulating the data capture process of the equipped vehicles.

Afterwards, different simulation experiments were carried out in order to emulate basic mobility services, and the results were analyzed by comparing performance indicators for the experiments with and without the corresponding mobility service. The definition of advanced mobility services will follow, and their evaluation will be organized in the same way as for the basic ones.

Therefore, access to the data from simulation experiments with a considerable penetration rate, together with the motivation of evaluating this data from a traffic flow theory point of view, gives rise to the current project. This approach moves away from the mobility services' point of view, and constitutes an academic research work which may contribute to the definition of real applications for both traffic authorities and car manufacturers.

1.2 Objectives and project scope

This Master thesis is mainly aimed at estimating traffic variables, namely flow, density and speed, from data supplied by connected cars in order to characterize the traffic state in a particular time-space region. Other important objectives are to adapt to our particular case the procedure presented in (Seo et al., 2015), which estimates these variables from equipped vehicles that only observe the vehicle in front of them, and to analyze the existing trade-off between the penetration rate of the equipped vehicles and the quality of the estimation, among other factors.

As stated before, a microsimulation model for the Connected Car project is defined in order to deal with a significant share of equipped vehicles. The data used for the developed procedures, which belongs to both the equipped and the detected vehicles, is generated by Aimsun, the traffic modelling software employed. It is worth noting that this data is considered an input for the present work, so the implementation of the equipped vehicles' data capture functions and the calibration of the microsimulation model are excluded from the scope of the project.

However, a summary of the data analysis developed for the Barcelona experiment is included to provide a better understanding of the available information and of the behaviour of the equipped vehicles when they detect other vehicles in their surroundings. As mentioned above, this preliminary analysis enabled us to implement the detection process in Aimsun and to calibrate the model.

The reference estimation method considers data on the GPS position and the spacing measurement, the latter being provided by a camera located on the vehicle's dashboard. In our case, the distance between vehicles (spacing measurement) is also given, so this project intends to extend the reference method by considering not only the vehicle in front of the equipped vehicle but also the vehicles around it.

The final outcomes of this project are, firstly, the estimated values of the mentioned traffic variables for each of the implemented experiments and, secondly, an exhaustive analysis of the obtained results from different perspectives and with different tools. Technically, for the first output the estimation approaches are defined and implemented, and for the second a design of experiments is conducted. In both cases the statistical software R is employed.

1.3 Project overview

This thesis is organized in chapters, each corresponding to one of the stages that compose the project. These are further divided into sections and subsections in order to deal with the different aspects of each stage separately.

First, Chapter 2 reviews the state of the art of the areas related to this work. We first describe the fundamentals of traffic flow theory, and then we characterize the conventional measurement methods for data collection. This is followed by an exhaustive overview of probe vehicle data, the type of data available for this project. Finally, a general outlook on the main traffic simulation concepts and goodness-of-fit measures concludes the chapter.

Chapter 3 then summarizes the preliminary analysis developed for the Barcelona experiment. It first itemizes the technical specifications of both the equipped vehicles that performed the experiment and the data exploited for the assessment of mobility services. Next, the design of the data collection presents the main features of the Barcelona trial, such as the defined itineraries and the information provided to drivers. Since the reference estimation method is suitable for corridors, the exploratory data analysis included in this thesis refers to the itinerary that passes through the corridor selected for testing the defined procedures.

Once the available data is characterized, Chapter 4 covers all aspects related to the developed methodology. First of all, the reference estimation method of (Seo et al., 2015) and the corresponding field experiment are detailed. Secondly, we itemize the main adaptations with respect to this method and the functions defined for processing the emulated data. Lastly, the three implemented procedures for the estimation of traffic variables based on the reference method are explained in depth.

The results obtained with these procedures are analyzed in Chapter 5. For each method a couple of combinations of the design parameters are selected, and different visualizations and goodness-of-fit measures are considered to evaluate the quality of the estimated values and how well they reproduce the traffic phenomena.

Finally, Chapter 6 closes the thesis with the conclusions and future work, including the main findings and possible future research.

Chapter 2

State of the Art

Traffic engineering originated as a very practical discipline, with practitioners trying to solve particular traffic problems from a common-sense point of view. This changed in the early 1950s, when the discipline, now known as traffic flow theory, began to attract engineers from all fields. Of particular note is the contribution of J. G. Wardrop (Wardrop, 1952), who described traffic flows by using mathematical and statistical concepts.

This research focus established itself as a solid basis for theoretical analyses. Two contributions stand out: the fluid-dynamic model of Lighthill, Whitham and Richards for describing traffic flows, known as the LWR model for short ((Lighthill and Whitham, 1955) and (Richards, 1956)), and the generalized definitions introduced by Edie for the so-called fundamental variables.

Later on, personal computers allowed the field to evolve further. Nowadays, it is practically impossible to dissociate it from the industry, through what is known as intelligent transportation systems (ITS), since they are present in almost all aspects related to the transportation community and have become an important part of current research.

Because the models and approaches described by researchers should always be compared with reality, ITS research has led to the development of several systems and tools to gather data from the physical world, which allows the developed methods to be tested and the obtained results to be subsequently analyzed.

Traditionally, road operators have invested in expensive infrastructure for these purposes, such as inductive loops in the pavement, microwave, ultrasound and infrared sensors, gantries with video cameras and, more recently, roadside units that register the vehicles passing by. Nowadays, with the advances in ITS, the data collected by the vehicles themselves, also known as probe data, is presented as a reasonable complement to these data sources.

This data is used by researchers and practitioners to build traffic simulation models that replicate actual traffic conditions. In particular, since the parameters characterizing the model need to be adjusted, the comparison between the observed and the simulated measurements enables them to determine the parameter values that best fit traffic reality. This comparison is carried out by means of the so-called goodness-of-fit measures.

This chapter is organized as follows. First we include a general overview of the main concepts of traffic flow theory. Then, we provide a review of the traditional measurement procedures, together with an exhaustive overview of probe vehicle data. Finally, the last section introduces basic simulation ideas and the most commonly used goodness-of-fit measures.

2.1 Traffic flow theory

In this section we first deal with the microscopic and macroscopic characteristics that describe traffic streams. Then, we discuss the most common performance indicators employed to evaluate the quality of traffic conditions, and we conclude with the so-called traffic flow regimes and the transitions between them.

2.1.1 Microscopic and macroscopic characteristics

In order to characterize traffic flows we need to specify whether the point of view we are considering is microscopic or macroscopic. The former takes into account each vehicle in a traffic stream individually, in the sense that the road traffic flow is composed of a set of drivers associated with the individual vehicles they are driving. The latter, instead, considers many vehicles simultaneously by zooming out from the microscopic point of view to get a more aggregate level of information.

For the purposes of this project, the macroscopic features play an important role, since our final goal is to estimate their values and evaluate the quality of this estimation. Nevertheless, because of their use in the developed methods and their relation with the macroscopic characteristics, the main microscopic features and their relations are described hereafter.

If we take into account a single vehicle, denoted as i, its related variables include its length ($l_i$), longitudinal position ($x_i$), speed ($v_i = \frac{dx_i}{dt}$) and acceleration ($a_i = \frac{dv_i}{dt}$). It is worth noting that the position $x_i$ typically refers to the rear bumper of the vehicle.

Besides these features, when considering a vehicle i and its predecessor i+1, other traffic flow characteristics that relate both vehicles can be defined. In terms of space, vehicle i has a certain space headway $h_{s_i}$ with respect to i+1, which is composed of the space gap $g_{s_i}$ (the distance from the front bumper of vehicle i to the rear bumper of vehicle i+1) and $l_i$ (its own length): $h_{s_i} = g_{s_i} + l_i$. Figure 2.1 shows the length of vehicle i and the space gap and space headway between this vehicle and its predecessor in the same lane in a traffic stream.

Analogously, in terms of time there is a time headway $h_{t_i}$ between vehicles i and i+1, which comprises a time gap $g_{t_i}$ (the amount of time needed to reach the current position of the leading vehicle) and an occupancy time $\rho_i$ (the time needed to traverse a distance equal to the vehicle's length): $h_{t_i} = g_{t_i} + \rho_i$.
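As a minimal sketch of the headway relations above, the following Python snippet computes the space and time headways of a follower with respect to its leader. The `Vehicle` class, the function names and the numeric values are hypothetical, and the conversion from space to time quantities assumes that the follower travels at constant speed, which the text does not state.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    rear_position_m: float  # x_i, position of the rear bumper (m)
    length_m: float         # l_i, vehicle length (m)
    speed_mps: float        # v_i, speed (m/s), assumed constant here


def space_headway(follower: Vehicle, leader: Vehicle) -> tuple[float, float]:
    """Return (space gap g_s, space headway h_s) of the follower, in metres."""
    front_bumper = follower.rear_position_m + follower.length_m
    gap = leader.rear_position_m - front_bumper
    return gap, gap + follower.length_m  # h_s = g_s + l


def time_headway(follower: Vehicle, leader: Vehicle) -> tuple[float, float, float]:
    """Return (time gap g_t, occupancy time rho, time headway h_t), in seconds."""
    gap, _ = space_headway(follower, leader)
    time_gap = gap / follower.speed_mps                     # time to cover the space gap
    occupancy_time = follower.length_m / follower.speed_mps  # time to cover own length
    return time_gap, occupancy_time, time_gap + occupancy_time  # h_t = g_t + rho


# Hypothetical example: two 4 m cars at 10 m/s with rear bumpers 24 m apart.
follower = Vehicle(rear_position_m=0.0, length_m=4.0, speed_mps=10.0)
leader = Vehicle(rear_position_m=24.0, length_m=4.0, speed_mps=10.0)
gap_m, headway_m = space_headway(follower, leader)
time_gap_s, occupancy_s, headway_s = time_headway(follower, leader)
```

Under the constant-speed assumption the space headway equals the distance between the two rear bumpers, which is why positions are conventionally referred to the rear bumper.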

Figure 2.1: Two consecutive vehicles in the same lane in a traffic stream. Source: (Maerivoet and Moor, 2005)

Both space and time headway can be visualized in Figure 2.2, where vehicles' positions are plotted with respect to time by tracing out two vehicle trajectories, in what is known as a time-space diagram. In the space direction (vertical) the space headway $h_{s_i}$ and its components (the space gap $g_{s_i}$ and the vehicle length $l_i$) are shown. In the time direction (horizontal) the time headway $h_{t_i}$ and its components (the time gap $g_{t_i}$ and the occupancy time $\rho_i$) are also displayed.

Figure 2.2: A time-space diagram showing two vehicle trajectories and both the space and time headway. Source: (Maerivoet and Moor, 2005)

From the macroscopic point of view traffic flows are considered as a stream, i.e., a fluid continuum. Following this analogy, the main characteristics we may be interested in would be the properties of fluids themselves, that is, flow (quantity per unit of time), concentration (quantity per unit of space) and speed (space per unit of time), and their variation throughout time and space.

Since traffic is actually composed of discrete vehicles, rather than being a continuous fluid, methods are generally concerned with individual measurements like the space or the time headway. Historically, this resulted in a need to convert these discrete measurements into the targeted continuous characteristics, which in turn led to definitions based on two types of measurements: those made at a point in space and those made at an instant in time. In 1963 Edie presented a unified and generalised version of these definitions in (Edie, 1963), independent of the methods of measurement, which we discuss in detail below.

To define the different measurement regions we consider the time-space diagram in Figure 2.3. Because objects are commonly constrained to move along a one-dimensional guideway (e.g. a highway lane or a flight path), the relevant aspects of objects' motion are often described in Cartesian coordinates of time t and space x. Figure 2.3 illustrates the trajectories of some objects traversing a facility of length X during time interval T.

Figure 2.3: A time-space diagram showing several vehicle trajectories and three measurement regions: $R_t$, $R_s$ and $R_{t,s}$. Source: (Maerivoet and Moor, 2005)

This is a powerful tool that allows many vehicles to be considered at the same time by tracing their trajectories, and provides a complete picture of all the operations that are taking place (e.g. accelerations or decelerations). However, all the data needed to completely build the trajectories may not be available, which results in the use of only coarsely approximated data or hypothetical data from "thought experiments".

The time-space diagram in Figure 2.3 shows many vehicle trajectories and three rectangular measurement regions: $R_t$, $R_s$ and $R_{t,s}$. The region $R_t$ corresponds to measurements at a single fixed location (dx) during a certain time period (T), whereas $R_s$ refers to measurements at a single time instant (dt) over a certain road section of length X. The remaining one, $R_{t,s}$, is a general measurement region which can have any shape. For the sake of simplicity, a rectangular region in time and space is considered throughout the present chapter.

The adoption of these measurement regions depends on the systems available to collect the data. For instance, a single inductive loop embedded in the road provides data in an $R_t$ region, an aerial photograph in an $R_s$ region, and a sequence of images made by a video camera detector in a general measurement region $R_{t,s}$.

Returning to the macroscopic characteristics, and maintaining the parallel with fluids, the properties we are interested in are flow, concentration and speed. It is worth noting that this was the nomenclature employed in Edie's time, but nowadays we refer to concentration as the (vehicular) density and to speed as the mean or average speed. Thus, from now on we will use this terminology to refer to these macroscopic characteristics.

In general terms, the density gives an idea of how crowded a certain section of a road is, and it is typically expressed as the number of vehicles per kilometre. Whereas the density is usually a spatial measurement, the flow is considered a temporal measurement expressed as an hourly rate, i.e., in number of vehicles per hour.

A set of definitions for the macroscopic characteristics was introduced by Lighthill and Whitham, who define the flow q and the density k from measurements at a single point, that is, by considering an $R_t$ region, as follows:

$$q = \frac{n}{T}, \tag{2.1}$$

$$k = \frac{\sum_{i=1}^{n} dt_i}{T\,dx}, \tag{2.2}$$

where n is the number of vehicles crossing dx during time T, and $dt_i$ is the time taken by the i-th vehicle to cross dx. The mean speed of the stream was obtained from the condition that relates the three characteristics, $q = v_s k$, which had been previously proved by Wardrop (this condition is analyzed at the end of this subsection):

$$v_s = \frac{q}{k} = \frac{n\,dx}{\sum_{i=1}^{n} dt_i}. \tag{2.3}$$

As shown in Equation 2.3, the mean speed is computed as the total distance traveled by all vehicles in the measurement region divided by the total time spent in this region, which is known as the space-mean speed. We denote it as $v_s$ to distinguish it from $v_t$, the time-mean speed, which is defined later on.
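The point-measurement definitions of Equations 2.1 to 2.3 can be sketched in a few lines of Python. The function name, the SI units and the sample crossing times below are assumptions made for illustration only; they are not taken from the thesis data.

```python
def point_measurements(crossing_times_s, period_s, dx_m):
    """Flow, density and space-mean speed from measurements in an R_t region.

    crossing_times_s: time dt_i (s) each vehicle needs to cross the short
                      stretch dx; one entry per vehicle observed during T.
    period_s:         observation period T (s).
    dx_m:             length of the stretch dx (m).
    Returns (flow in veh/s, density in veh/m, space-mean speed in m/s).
    """
    n = len(crossing_times_s)
    flow = n / period_s                                   # Eq. 2.1
    density = sum(crossing_times_s) / (period_s * dx_m)   # Eq. 2.2
    space_mean_speed = flow / density                     # Eq. 2.3, from q = v_s k
    return flow, density, space_mean_speed


# Hypothetical example: 4 vehicles cross a 2 m stretch within one minute.
flow, density, speed = point_measurements([0.1, 0.2, 0.2, 0.1], period_s=60.0, dx_m=2.0)
```

The returned speed coincides with the closed form n dx / Σ dt_i of Equation 2.3, since the period T cancels in the ratio q / k.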

On the other hand, Wardrop showed another way to compute the average stream speed by considering a different set of measurements, specifically two aerial photographs of a long section of roadway X taken a short time apart (a pair of $R_s$ regions):

$$v_s = \frac{\sum_{i=1}^{n} dx_i}{n\,dt}, \tag{2.4}$$

where n is the number of vehicles on X at time t, and $dx_i$ is the distance traveled by the i-th vehicle on X during the interval dt between photographs.

The previous definitions of the space-mean speed are two different characterizations, one for the temporal region and one for the spatial region. If we rewrite them by starting from the mean speed definition (total distance traveled by all vehicles in the region divided by the total time spent by these vehicles in the region), and by including the vehicles' instantaneous speeds in the formulation, we obtain the following formulae:

$$
v_s = \frac{\sum_{i=1}^{n} dx_i}{\sum_{i=1}^{n} dt_i} =
\begin{cases}
\dfrac{\sum_{i=1}^{n} v_i\,dt}{n\,dt} = \dfrac{1}{n}\sum_{i=1}^{n} v_i & \text{(region } R_s\text{)} \\[2ex]
\dfrac{n\,dx}{\sum_{i=1}^{n} \dfrac{dx}{v_i}} = \dfrac{1}{\dfrac{1}{n}\sum_{i=1}^{n} \dfrac{1}{v_i}} & \text{(region } R_t\text{)}
\end{cases}
\tag{2.5}
$$

It is worth noting that the spatial measurement is based on an arithmetic average of the vehicles' instantaneous speeds (their speeds at dt), whereas the temporal measurement is based on the harmonic average of the vehicles' spot speeds (their instantaneous speeds at location dx).
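The difference between the two averages can be checked numerically. The sketch below uses Python's standard `statistics` module; the spot speeds are made-up values, and the remark that the arithmetic mean of spot speeds corresponds to the time-mean speed relies on the standard definition of that quantity rather than on anything stated so far in the text.

```python
from statistics import harmonic_mean, mean

# Hypothetical spot speeds (m/s) recorded at a fixed location dx (region R_t).
spot_speeds = [10.0, 20.0, 40.0]
dx_m = 2.0  # assumed length of the measurement stretch

# Space-mean speed from point measurements: harmonic mean (Eq. 2.5, region R_t).
space_mean_speed = harmonic_mean(spot_speeds)

# Same quantity from total distance over total time (Eq. 2.3).
crossing_times = [dx_m / v for v in spot_speeds]
distance_over_time = len(spot_speeds) * dx_m / sum(crossing_times)

# Arithmetic mean of the same spot speeds (commonly called time-mean speed).
arithmetic_mean_speed = mean(spot_speeds)
```

Slow vehicles spend more time on the stretch and therefore weigh more in the space-mean speed, so the harmonic mean never exceeds the arithmetic mean of the same spot speeds.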

Furthermore, Wardrop defined the flow and the density for the aerial observations as follows:

$$q = v_s\,k = \frac{\sum_{i=1}^{n} dx_i}{X\,dt}, \tag{2.6}$$

$$k = \frac{n}{X}. \tag{2.7}$$

The set of definitions provided by Wardrop (Equations 2.4, 2.6 and 2.7), together with the one supplied by Lighthill and Whitham (Equations 2.1, 2.2 and 2.3), results in two sets of definitions, each consistent with how the measurements were made.
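For comparison with the point measurements, Wardrop's aerial definitions (Equations 2.4, 2.6 and 2.7) can be sketched as follows. The function name, units and input values are hypothetical.

```python
def aerial_measurements(distances_m, section_length_m, dt_s):
    """Flow, density and space-mean speed from two aerial photographs (R_s regions).

    distances_m:      distance dx_i (m) traveled by each vehicle on the section
                      between the two photographs.
    section_length_m: length X of the photographed road section (m).
    dt_s:             time dt between photographs (s).
    Returns (flow in veh/s, density in veh/m, space-mean speed in m/s).
    """
    n = len(distances_m)
    density = n / section_length_m                        # Eq. 2.7
    flow = sum(distances_m) / (section_length_m * dt_s)   # Eq. 2.6
    space_mean_speed = sum(distances_m) / (n * dt_s)      # Eq. 2.4
    return flow, density, space_mean_speed


# Hypothetical example: 4 vehicles on a 1 km section, photographs 1 s apart.
flow, density, speed = aerial_measurements([10.0, 20.0, 30.0, 20.0], 1000.0, 1.0)
```

The three outputs satisfy q = v_s k by construction, which is the relation that ties this set of definitions to the one of Lighthill and Whitham.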

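In 1963, Edie expressed the previous definitions in terms of the related microscopic characteristics in order to make them easier to understand. According to (Edie, 1963), they can be formulated in terms of the vehicle speed $v_i$, the space headway $h_{s_i}$ and the time headway $h_{t_i}$. The resulting definitions are included in Table 2.1.

$$
\begin{array}{ll}
\textbf{Lighthill and Whitham} & \textbf{Wardrop} \\[1ex]
q = \dfrac{n}{T} = \dfrac{n}{\sum_{i=1}^{n} h_{t_i}} \quad (2.8) &
q = \dfrac{\sum_{i=1}^{n} dx_i}{X\,dt} = \dfrac{\sum_{i=1}^{n} v_i}{X} = \dfrac{\sum_{i=1}^{n} v_i}{\sum_{i=1}^{n} h_{s_i}} \quad (2.9) \\[3ex]
k = \dfrac{\sum_{i=1}^{n} dt_i}{T\,dx} = \dfrac{\sum_{i=1}^{n} \frac{1}{v_i}}{T} = \dfrac{\sum_{i=1}^{n} \frac{1}{v_i}}{\sum_{i=1}^{n} h_{t_i}} \quad (2.10) &
k = \dfrac{n}{X} = \dfrac{n}{\sum_{i=1}^{n} h_{s_i}} \quad (2.11) \\[3ex]
v_s = \dfrac{n\,dx}{\sum_{i=1}^{n} dt_i} = \dfrac{n}{\sum_{i=1}^{n} \frac{1}{v_i}} \quad (2.12) &
v_s = \dfrac{\sum_{i=1}^{n} dx_i}{n\,dt} = \dfrac{\sum_{i=1}^{n} v_i}{n} \quad (2.13)
\end{array}
$$

Table 2.1: Definitions of Lighthill and Whitham and of Wardrop formulated in terms of the microscopic features. Source: (Edie, 1963)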

It is easy to see that the flow, when calculated for an $R_t$ region, is the reciprocal of the arithmetic mean of the time headway, and the density in the $R_s$ region is the reciprocal of the arithmetic mean of the space headway. However, it was not clear how to average q over space at a time instant ($R_s$ region) or k over time at a given physical point ($R_t$ region).

Although the definitions originally given by Lighthill and Whitham and by Wardrop provided traffic engineers with a variety of forms of use, it was desirable to study longer sections of roadway for various periods of time. Edie highlighted the interest in the study of platoons of vehicles traveling in space and time. Thus, there appeared to be a need to extend the existing definitions to any $R_{t,s}$ region.

In order to extend these definitions, Edie first considered the Lighthill and Whitham equations, specifically Equation 2.1, and multiplied both numerator and denominator by dx, obtaining

$$q = \frac{n\,dx}{T\,dx}. \tag{2.14}$$

In Equation 2.14 the numerator is the aggregate distance traveled by all n vehicles on dx and the denominator represents the area of a rectangular time-space region of space length dx and time length T in which vehicles travel. Moreover, Equation 2.2 already gives the density as the total time spent on dx divided by the area of the respective time-space region.

The same can be applied to Wardrop's definitions, since the flow is already given as the total distance traveled by the n vehicles on X divided by the area of the $R_s$ region (Equation 2.6), and the expression of the density can be multiplied by dt in both numerator and denominator, which yields this quantity as the total time spent by all vehicles divided by the corresponding area.
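Written out for Wardrop's density (Equation 2.7), the step described above reads as follows; this is only a restatement of the text in the notation already introduced:

$$k = \frac{n}{X} = \frac{n\,dt}{X\,dt},$$

where the numerator $n\,dt$ is the total time spent by the n vehicles on X during the interval dt, and the denominator $X\,dt$ is the area of the corresponding $R_s$ region.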

Edie then deduced that by modifying the original definitions in the described way it was possible to end up with equivalent expressions for the two sets of definitions. Thus, he concluded that it was possible to combine both sets into a single one independent of the measurement methods. The resulting equations for the flow, density and speed of a traffic stream are itemized next:

• Flow (q): measured as the aggregate distance traveled by all vehicles passing through an $R_{t,s}$ region divided by its area

$$q = \frac{\sum_{i=1}^{n} x_i}{A} \quad [\text{veh/h}], \tag{2.15}$$

where $x_i$ is the distance traveled by vehicle i in the region $R_{t,s}$ and A is the area of this region.

• Density (k): defined as the aggregate time spent by all vehicles passing through an $R_{t,s}$ region divided by its area

$$k = \frac{\sum_{i=1}^{n} t_i}{A} \quad [\text{veh/km}], \tag{2.16}$$

where $t_i$ is the total time spent by vehicle i in the region $R_{t,s}$.

• Average speed ($v_s$): measured as the aggregate distance traveled by all vehicles passing through an $R_{t,s}$ region divided by the total time spent by all vehicles traversing it

$$v_s = \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} t_i} \quad [\text{km/h}]. \tag{2.17}$$

At first sight these definitions look unfamiliar compared with the general concepts of flow and density, but they are consistent with the previous definitions and independent of particular methods of measurement. In addition, they are explicit and reduce the sensitivity to random errors that result from measurements made at a single point or at a single instant.
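Edie's generalized definitions (Equations 2.15 to 2.17) can be illustrated with a short Python sketch for a rectangular time-space region. Everything in it is an assumption made for illustration: the function name, the representation of trajectories as piecewise-linear lists of (time, position) samples with strictly increasing times and non-decreasing positions, and the use of SI units instead of the veh/h, veh/km and km/h of the text. It is not the procedure implemented in the thesis.

```python
def edie_estimates(trajectories, t0, t1, x0, x1):
    """Edie's flow, density and speed in the rectangle [t0, t1] x [x0, x1].

    trajectories: one list of (time_s, position_m) samples per vehicle, sorted
                  by strictly increasing time, positions never decreasing.
    Returns (flow in veh/s, density in veh/m, mean speed in m/s).
    """
    total_distance = 0.0  # sum of x_i in Eq. 2.15
    total_time = 0.0      # sum of t_i in Eq. 2.16
    for samples in trajectories:
        for (ta, xa), (tb, xb) in zip(samples, samples[1:]):
            speed = (xb - xa) / (tb - ta)
            # Clip the linear segment to the time window of the region.
            start, end = max(ta, t0), min(tb, t1)
            if speed > 0:
                # Clip to the instants at which the vehicle is inside [x0, x1].
                start = max(start, ta + (x0 - xa) / speed)
                end = min(end, ta + (x1 - xa) / speed)
            elif not (x0 <= xa <= x1):
                continue  # stopped vehicle located outside the road section
            if end > start:
                total_time += end - start
                total_distance += speed * (end - start)
    area = (t1 - t0) * (x1 - x0)
    flow = total_distance / area      # Eq. 2.15
    density = total_time / area       # Eq. 2.16
    mean_speed = total_distance / total_time if total_time > 0 else float("nan")  # Eq. 2.17
    return flow, density, mean_speed


# Hypothetical example: two constant-speed vehicles (10 m/s and 5 m/s)
# observed on a 200 m section during 50 s.
trajectories = [
    [(0.0, 0.0), (100.0, 1000.0)],
    [(10.0, 0.0), (110.0, 500.0)],
]
flow, density, mean_speed = edie_estimates(trajectories, 0.0, 50.0, 0.0, 200.0)
```

In this example each vehicle travels 200 m inside the region, the first in 20 s and the second in 40 s, so the estimates also satisfy q = v_s k, as the definitions require.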

In addition, it is worth highlighting that these generalized definitions merely average the flows collected over all points and the densities collected at each instant within the region of interest. In fact, Edie showed that if the computed averages $q_i$ and $k_i$ for m regions with areas $a_i$ are provided, the weighted averages are calculated as follows:

$$
q = \frac{\sum_{i=1}^{m} a_i\,q_i}{\sum_{i=1}^{m} a_i}, \qquad
k = \frac{\sum_{i=1}^{m} a_i\,k_i}{\sum_{i=1}^{m} a_i}, \qquad
v = \frac{\sum_{i=1}^{m} a_i\,q_i}{\sum_{i=1}^{m} a_i\,k_i}. \tag{2.18}
$$
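The area-weighted combination of Equation 2.18 is sketched below. The function name and the sample values (two regions, flows in veh/h, densities in veh/km) are hypothetical.

```python
def combine_regions(areas, flows, densities):
    """Area-weighted flow, density and speed over several regions (Eq. 2.18)."""
    weighted_flow = sum(a * q for a, q in zip(areas, flows))
    weighted_density = sum(a * k for a, k in zip(areas, densities))
    total_area = sum(areas)
    q = weighted_flow / total_area
    k = weighted_density / total_area
    v = weighted_flow / weighted_density  # the total area cancels out
    return q, k, v


# Hypothetical example: a small free-flowing region and a larger, denser one.
q, k, v = combine_regions(areas=[1.0, 3.0], flows=[1000.0, 2000.0], densities=[10.0, 50.0])
```

Note that the combined speed is the ratio of the weighted flow to the weighted density, not the average of the regional speeds (100 km/h and 40 km/h in this example).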

Some comments on the macroscopic characteristics

• Flow

When the distance traveled by the i-th vehicle, $x_i$, is not available, but the flow q(t) is known for each timestamp t during a certain time period T in consecutive $R_s$ regions, we can use the temporal average of the flow:

$$
q =
\begin{cases}
\dfrac{1}{T}\displaystyle\int_{t=0}^{T} q(t)\,dt & \text{(continuous)} \\[2ex]
\dfrac{1}{T}\displaystyle\sum_{t=1}^{T} q(t) & \text{(discrete)}
\end{cases}
\tag{2.19}
$$

• Density

Since the density ignores the effects of traffic composition and vehicle lengths, when consid-ering heterogeneous traffic (composed by different types of vehicles), the notion of passengercar unit (PCU) can be employed. For example, if we take as a PCU an average passengercar, a truck in the same traffic stream may be considered as 2 PCUs. In this way we cannotice the spatial differences between different vehicle types.

As with the flow, travel times ti of the individual vehicles may not be available, and inmost cases even difficult to measure. We can use an approach similar to that of Equation2.19, which corresponds to the temporal average of the density, when at each timestamp tthe density k(t) for a certain time period T in consecutive Rs regions is known. In this case,the adaptation of Equation 2.19 becomes straightforward because we only need to changeq(t) for k(t).

Since density is essentially a spatial measurement, it is one of the most difficult characteristics to obtain. Due to the importance of this quantity, another macroscopic measurement is defined in order to estimate density. The occupancy, denoted by $\rho$, is defined as the fraction of time the measurement location (typically a detector station on the road) is occupied by a vehicle, and it is a commonly used property for describing highway traffic streams.

If we consider the trajectory of each vehicle as two parallel lines tracing the vehicle's front and rear (as seen by a detector), it becomes straightforward to define the generalized occupancy in the region of area $A$ in the same manner as Edie. It is defined in (Cassidy et al., 1996) as the fraction of the region's area covered by the strips marked out by the vehicles' fronts and rears. It follows that $\rho(A)$ and $k(A)$ are related by an average vehicle length. In fact, this value is the ratio of $\rho(A)$ to $k(A)$:

\[
\text{average vehicle length} = \frac{\rho(A)}{k(A)} = \frac{\text{area of the shaded strips}}{|A|} \cdot \frac{|A|}{t(A)}. \tag{2.20}
\]

It is important to note that in order to obtain the density by dividing the occupancy by the average vehicle length (Equation 2.20), it is necessary to assume that individual vehicle lengths and speeds are uncorrelated.

Since occupancy is usually measured by a loop detector, we can characterize its calculation by assuming that $X$ is the length of the road "visible" to the loop detector, the so-called detection zone, and $T$ is the interval over which the detector collects measurements. Then, if $n$ is the number of vehicles passing the detector during $T$, and $\tau_i$ is the time spent by the i-th vehicle on top of the detector, the occupancy is calculated as:

\[
\rho = \frac{\sum_{i=1}^{n} \tau_i}{T}. \tag{2.21}
\]
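A minimal sketch of Equations 2.20 and 2.21 is given below. It assumes that the length used to convert occupancy into density is an effective length, i.e. the average vehicle length plus the detection zone length $X$ (an assumption of this sketch, since a detector stays occupied while any part of the vehicle is over the zone). The function names and values are hypothetical.

```python
def occupancy(presence_times_s, interval_s):
    """Fraction of the interval T during which the detector is occupied (Eq. 2.21)."""
    return sum(presence_times_s) / interval_s


def density_from_occupancy(rho, effective_length_m):
    """Density in veh/km obtained by dividing occupancy by a length (Eq. 2.20).

    effective_length_m is assumed to be average vehicle length + detection zone.
    """
    return 1000.0 * rho / effective_length_m


# Hypothetical 30 s interval with three vehicles over the detector.
rho = occupancy([0.5, 0.4, 0.6], 30.0)
k = density_from_occupancy(rho, 4.5 + 2.0)  # 4.5 m vehicles, 2 m detection zone
print(rho, k)
```

As stated above, this conversion is only meaningful if vehicle lengths and speeds are uncorrelated.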

• Mean speed

The mean speed of a traffic stream should not be confused with velocity, since the latter implies a direction, whereas the former can be regarded as the norm of this vector.

As stated above, the mean speed can be calculated both spatially and temporally. The time-mean speed $v_t$ is the arithmetic average of the vehicles' spot speeds in a temporal measurement region $R_t$:

\[
v_t = \frac{1}{n} \sum_{i=1}^{n} v_i, \tag{2.22}
\]

where $n$ is the total number of vehicles in the $R_t$ region.


Regarding the space-mean speed $v_s$, Wardrop has shown that the following relation between both mean speeds holds:

\[
v_t = v_s + \frac{\sigma_s^2}{v_s}, \tag{2.23}
\]

where $\sigma_s^2$ is the statistical sample variance, defined as follows:

\[
\sigma_s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (v_i - v_s)^2, \tag{2.24}
\]

in which $v_i$ denotes the i-th vehicle's instantaneous speed.

It is worth noting that the time-mean speed always exceeds the space-mean speed, except when all vehicles' speeds are the same. Therefore, a stationary observer will most likely see more fast vehicles than slow ones passing by, whereas an aerial photograph will show more slow vehicles than fast ones.
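The relation in Equation 2.23 can be checked numerically with a small sketch. It assumes that the speeds are those of vehicles observed at one instant on a road section (a spatial sample) and that traffic is stationary, so that each vehicle passes a fixed point at a rate proportional to its speed. Under that assumption the time-mean speed equals $\sum v_i^2 / \sum v_i$, which matches Equation 2.23 exactly when the population variance (divisor $n$) is used; with the sample variance of Equation 2.24 (divisor $n-1$) the match is approximate. The function name and speeds are hypothetical.

```python
from statistics import mean, pvariance, variance


def time_mean_from_space_sample(speeds, use_sample_variance=False):
    """Time-mean speed implied by Eq. 2.23 for speeds observed in space."""
    v_s = mean(speeds)  # space-mean speed of the spatial sample
    var = variance(speeds) if use_sample_variance else pvariance(speeds)
    return v_s + var / v_s


speeds = [20.0, 30.0, 40.0]  # hypothetical instantaneous speeds in m/s
v_s = mean(speeds)
v_t_exact = sum(v * v for v in speeds) / sum(speeds)  # speed-weighted average
print(v_s, time_mean_from_space_sample(speeds), v_t_exact)
```

In this example the space-mean speed is 30 m/s, while the time-mean speed is slightly above 32 m/s, in line with the remark that the time-mean speed exceeds the space-mean speed whenever speeds differ.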

Although the practical difference between the two is negligible for free-flow traffic, under congested traffic conditions the two mean speeds differ substantially. In general, the space-mean speed is preferred to the time-mean speed, but when only the latter is available, care should be taken when interpreting the results.

Fundamental relation of traffic flow theory

As we have already introduced, there exists a unique relation, proved by Wardrop, between the three macroscopic characteristics considered, density $k$, flow $q$ and space-mean speed $v_s$:

\[
q = k\, v_s. \tag{2.25}
\]

This relation is also called the fundamental relation of traffic flow theory, since knowing two of the discussed macroscopic characteristics allows us to calculate the third one. By extension, the three macroscopic characteristics involved are called fundamental variables.

The family of definitions proposed by Edie is such that $q = k\, v_s$ holds as an identity (i.e. it is true by definition). However, this relation is only valid when the following two restrictions are satisfied:

• The variables are continuous or smooth approximations of them

• The traffic is composed of substreams (e.g. slow and fast vehicles) satisfying the following assumptions: homogeneous traffic (same type of vehicles in the traffic substream) and stationary traffic, which means that the traffic substream looks the same at different times and locations, that is, all vehicles' trajectories should be parallel and equidistant

Since density cannot always be easily measured, this relation provides an alternative way to calculate it: the variable can be derived directly from flow and space-mean speed measurements. Note that the time-mean speed does not satisfy this relation.
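As a short worked example of Equation 2.25, the sketch below derives the density from a flow and a space-mean speed. The function name and the numbers are hypothetical.

```python
def density_from_flow_and_speed(flow_veh_per_h, space_mean_speed_km_per_h):
    """Density in veh/km from the fundamental relation q = k * v_s."""
    if space_mean_speed_km_per_h <= 0:
        raise ValueError("space-mean speed must be positive")
    return flow_veh_per_h / space_mean_speed_km_per_h


k = density_from_flow_and_speed(1800.0, 90.0)
# Using a (larger) time-mean speed instead would underestimate the density.
k_biased = density_from_flow_and_speed(1800.0, 95.0)
print(k, k_biased)
```

With a flow of 1800 veh/h and a space-mean speed of 90 km/h the density is 20 veh/km; replacing the space-mean speed by a time-mean speed of 95 km/h yields a lower, biased value.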

2.1.2 Performance indicators

Besides the fundamental variables, there are some popular performance indicators used by traffic engineers when assessing the quality of traffic operations. We now discuss two of the most common ones: the travel time and its reliability, and the queue length and the consequent delays.

• Travel time

One of the most tangible aspects of a journey is the notion of expected travel time, since travellers like to know how long a particular journey will take. More specifically, when people are travelling to work, they are required to arrive on time at their destinations. Thus, travellers use the average travel time needed to reach the destination to decide on their departure time.

Furthermore, there is an increased interest in obtaining precise values for this indicator in the context of advanced traveller information systems (ATIS). It is also a measure of interest for traffic operators, since travel time is the primary input to route guidance. The accurate prediction of future travel times, together with other features such as incident detection, enables drivers to stay informed of the actual traffic conditions and, eventually, modify their journey.

The estimation methods for this quantity are used to determine vehicle travel time on a link or a road section during a certain time period using the data gathered in this period. Historically, travel time has been estimated from fixed sensors such as inductive loop detectors (see Section 2.2 for more details), but these estimations may not be sufficiently reliable because of the limited coverage area and the low data accuracy; moreover, the sensors have a high implementation cost. Consequently, current developments focus on using moving sensors, such as GPS positioning data, to provide highly accurate estimations. However, these data also contain errors, especially in urban areas, due to network coverage problems (disturbing the GPS functions) and the data transmission frequency. Therefore, additional techniques need to be applied to achieve precise estimations.

Although travel time is a very intuitive idea, different points of view are involved when defining this measure in a formal way. We can define it as the amount of time necessary to traverse a route between any two points of interest, which is called experienced dynamic travel time, starting at a certain timestamp $t_0$ over a road section of length $X$:

\[
T(t_0) = \int_{0}^{X} \frac{1}{v(t, x)}\, dx \qquad \forall t \ge t_0, \tag{2.26}
\]

for which it is assumed that all local instantaneous vehicle speeds $v(t, x)$ are known at all points along the route and at all time instants. However, if only a subset of the $v(t, x)$ is known (e.g. the values at the detector stations), travel time can be approximated using the recorded speeds at the beginning and the end of a road section.
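Equation 2.26 can be approximated numerically by following a vehicle through the speed field. The sketch below is one possible discretization, assuming that $t$ in $v(t, x)$ is the time at which the vehicle reaches position $x$; the function name, the step size and the speed field are hypothetical.

```python
def dynamic_travel_time(speed, t0, length_m, dx_m=1.0):
    """Approximate Eq. 2.26 by accumulating dt = dx / v(t, x) along the route.

    speed is a callable v(t, x) in m/s, assumed known everywhere.
    """
    t, x = t0, 0.0
    while x < length_m:
        step = min(dx_m, length_m - x)
        v = speed(t, x)
        if v <= 0:
            raise ValueError("speed must be positive along the trajectory")
        t += step / v
        x += step
    return t - t0


def example_speed_field(t, x):
    # Hypothetical field: 20 m/s during the first 30 s, 10 m/s afterwards.
    return 20.0 if t < 30.0 else 10.0


print(dynamic_travel_time(example_speed_field, 0.0, 1000.0))
```

In this example the vehicle covers 600 m in the first 30 s and needs about 40 s for the remaining 400 m, so the experienced travel time is close to 70 s.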

Since the experienced dynamic travel time requires the availability of local vehicle speeds at all time instants, a simplification can be used, resulting in the so-called experienced instantaneous travel time:

\[
T(t_0) = \int_{0}^{X} \frac{1}{v(t_0, x)}\, dx. \tag{2.27}
\]

In general, we can easily derive the travel time by dividing the distance $X$ traveled by the vehicles by their space-mean speed (for which we need an accurate estimation):

\[
T(t_0) = \frac{X}{v_s(t_0)}. \tag{2.28}
\]
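To illustrate Equations 2.27 and 2.28, the following sketch assumes that the speed at time $t_0$ is piecewise constant over a few road segments, so that the integral reduces to a sum. It also assumes that the space-mean speed over the section is the length-weighted harmonic mean of the segment speeds, in which case both equations give the same result. Names and values are hypothetical.

```python
def instantaneous_travel_time(segment_lengths_m, segment_speeds_m_s):
    """Eq. 2.27 for piecewise-constant speeds: sum of length / speed per segment."""
    return sum(l / v for l, v in zip(segment_lengths_m, segment_speeds_m_s))


def section_space_mean_speed(segment_lengths_m, segment_speeds_m_s):
    """Length-weighted harmonic mean of the segment speeds (assumed v_s)."""
    weighted_pace = sum(l / v for l, v in zip(segment_lengths_m, segment_speeds_m_s))
    return sum(segment_lengths_m) / weighted_pace


lengths = [500.0, 300.0, 200.0]   # hypothetical segment lengths in m
speeds = [25.0, 10.0, 20.0]       # hypothetical speeds at t0 in m/s
T_inst = instantaneous_travel_time(lengths, speeds)
T_from_vs = sum(lengths) / section_space_mean_speed(lengths, speeds)  # Eq. 2.28
print(T_inst, T_from_vs)
```

Both computations yield 60 s for this example, whereas dividing the section length by the arithmetic mean of the three speeds would give a shorter, biased travel time.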

Travellers reason about their expected travel time based on a built-in safety margin, and may know its value because of their familiarity with the associated trip or because an external source provides it. Furthermore, drivers typically accept (and sometimes expect) a small delay in their expected travel time.

The reliability of such a travel time is typically characterised by the standard deviation (square root of the variance) of a travel time distribution. The expected travel time is unreliable when the expected and experienced travel times differ sufficiently. Thus, travel time can be seen as a measure of service quality.

There has been some research into the analytic form of travel time distributions, such as the work in (Arroyo and Kornhauser, 2007), which concludes that the lognormal distribution is the most appropriate one. However, there exist significant differences between travel time distributions: a smaller standard deviation indicates better service quality and reliability, whereas a large standard deviation indicates chaotic behaviour of the traffic flow. Moreover, travel time distributions can have a long tail due to rare events (e.g. incidents), which can have significant repercussions on the quality of traffic operations.
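A minimal sketch of how the reliability of a set of observed travel times could be summarised is shown below. It reports the standard deviation and, assuming a lognormal model, the mean and standard deviation of the logarithms of the travel times as simple parameter estimates. The function name and the sample values are hypothetical, and this estimation approach is an assumption of the sketch rather than the method of the cited work.

```python
import math
from statistics import mean, stdev


def travel_time_reliability(travel_times_min):
    """Summary statistics for a sample of travel times (in minutes)."""
    logs = [math.log(t) for t in travel_times_min]
    return {
        "mean": mean(travel_times_min),
        "std": stdev(travel_times_min),
        "log_mu": mean(logs),       # lognormal location estimate
        "log_sigma": stdev(logs),   # lognormal scale estimate
    }


regular_day = travel_time_reliability([11.0, 12.0, 12.0, 13.0])
incident_day = travel_time_reliability([10.0, 11.0, 12.0, 13.0, 30.0])
print(regular_day["std"], incident_day["std"])
```

The single long trip in the second sample, which mimics a rare incident, inflates the standard deviation considerably and therefore signals a less reliable travel time.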


• Queue length

Traffic congestion almost always leads to the build-up of queues, introducing an increase in the experienced travel time, which is known as the delay. Basically, the congestion itself can originate from structural causes, that is, traffic demand exceeding the capacity (e.g. the morning or afternoon rush in urban areas), or from incidental causes, when an incident occurs (e.g. road works or a traffic accident).

In urban networks, queues at signalized intersections are the main cause of traffic delays and travel time variability. As with the travel time, an accurate and practical queue estimation method is essential for ITS to provide a better understanding of urban flow dynamics, to be utilized for traffic state estimation and to be integrated in a traffic signal control framework.

Therefore, queues can be seen as a loss in travel time with respect to some baseline reference, commonly the travel time under free-flow conditions (see Subsection 2.1.3 for more details) or the travel time under maximum flow. Although current systems and devices enable users to be informed about the extra travel time (the delay) and queue lengths, from a user's point of view it is more intuitive to report a temporal estimation (of the delay or the expected travel time under the corresponding traffic conditions) rather than a spatial estimation of the queue length at a particular road junction.

2.1.3 Traffic flow regimes and fundamental diagrams

After providing the main traffic flow characteristics and performance indicators, we describe the different traffic flow conditions in terms of the concepts introduced so far, and the transitions between them.

Traffic flow regimes

When considering a traffic flow stream we can distinguish different types of operational characteristics, called regimes (also called phases or states). Each of these regimes is characterized by a certain set of unique properties, and sometimes they are classified based on occupancy measurements or combinations of different macroscopic traffic flow characteristics.

We differentiate between three regimes, known as free-flow traffic, capacity-flow traffic and congested traffic. These regimes are, in fact, the commonly adopted ones when looking at traffic flows, but it is worth noting that there are competing theories, such as Kerner's three-phase traffic theory, which includes a regime called synchronized traffic. We do not discuss these other approaches in this thesis.

In free-flow traffic, vehicles are able to travel freely at their desired speed, taking into account the maximum allowed speed of the road (if one exists), as well as other characteristics related to the road, engine, weather and vehicle. The free-flow speed, denoted by $v_{ff}$, is the mean speed of the vehicles in this regime. This regime takes place only at low densities, which implies large average space headways between vehicles, and traffic flow is stable because small local disturbances have an insignificant effect.

When the traffic density increases, vehicles drive closer to each other. At a certain moment, the flow reaches its maximum value, which is called the capacity flow, denoted by $q_c$, $q_{cap}$ or even $q_{max}$. In capacity-flow traffic, the average time headway is minimal, indicating the formation of packed clusters of vehicles (i.e. platoons), which move at a certain capacity-flow speed $v_c$ (normally a bit lower than the free-flow speed).

When more vehicles are present, the density increases even further, allowing sufficiently large disturbances to take place. The resulting state of saturated traffic conditions is called congested traffic, and the moderately high density at this state is called the critical density, denoted by $k_c$ or $k_{crit}$. Higher values of density almost always indicate a worsening of the traffic conditions, resulting in stop-and-go traffic, in which vehicles are required to slow down severely or even stop completely. In the extreme case, when traffic becomes motionless, which is known as jammed traffic, there exists a maximum density at which traffic seems to turn into a "parking lot", called the jam density and denoted by $k_j$, $k_{jam}$ or $k_{max}$.

From a physical point of view, these systems can be described in the framework of statistical physics. Within this context, the changeover from one traffic regime to another can be regarded as a phase transition. Following the analogy between fluids and traffic flows, the comparison with gas-liquid transitions can help us to understand regime transitions.

Then, free-flow traffic corresponds to a gaseous phase, in which particles are evenly spread out in the system. At the transition, liquid droplets form, leading to a state where gaseous and liquid phases coexist. For even higher densities, particles are so close to each other that the only remaining state is the liquid phase.

Relations between traffic flow characteristics

Traffic engineers have historically dealt with fundamental diagrams to understand and analyze the relations between traffic flow characteristics by establishing bivariate functional relationships between them.

We now give an overview of some qualitative features of the different possible fundamental diagrams, representing the equilibrium between flow, density and space-mean speed. Note that for each of the diagrams we take one possible fundamental diagram as an example, as they can take on many shapes.

The fundamental diagram relating the density and the space-mean speed is the easiest to understand intuitively, since the relationship is linear with a negative slope, as shown in Figure 2.4. Its most notable features are the following:


• the density is restricted between 0 and the jam density $k_j$

• the space-mean speed is restricted between 0 and the free-flow speed $v_{ff}$

• as density increases, the space-mean speed monotonically decreases, and there exists a small range of low densities in which the space-mean speed remains unaffected (around the free-flow speed)

Figure 2.4: A fundamental diagram relating the density $k$ and the space-mean speed $v_s$.

The flow versus density diagram, displayed in Figure 2.5, is probably the most frequently encountered form of fundamental diagram. Its main characteristics are itemized next:

• for moderately low densities (below the critical density $k_c$), the flow increases more or less linearly (this part is called the free-flow branch of the fundamental diagram)

• near the critical density $k_c$, the fundamental diagram can bend slightly (due to faster vehicles being obstructed by slower vehicles)

• at the critical density $k_c$, the flow reaches a maximum, the so-called $q_{cap}$

• in the congested branch of the fundamental diagram, the flow starts to decrease with increasing density, until the jam density $k_j$ is reached, resulting in a zero flow

• the space-mean speed $v_s$ for any point on the fundamental diagram can be found as the slope of the line connecting that point and the origin

As we can see in Figure 2.6, and in contrast to the diagrams in Figures 2.4 and 2.5, the space-mean speed versus flow curve no longer embodies a function in the mathematical sense, since for each value of flow there are two different space-mean speed values.
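The qualitative features listed for the three diagrams can be reproduced with a small sketch. It assumes the simplest linear speed-density relation, $v_s = v_{ff}(1 - k/k_j)$, commonly attributed to Greenshields; the text above only states that the relation is linear, so this specific form, the function names and the parameter values are assumptions of the sketch.

```python
import math


def speed_from_density(k, v_ff, k_jam):
    """Linear speed-density relation (assumed Greenshields form)."""
    return v_ff * (1.0 - k / k_jam)


def flow_from_density(k, v_ff, k_jam):
    """Flow from the fundamental relation q = k * v_s."""
    return k * speed_from_density(k, v_ff, k_jam)


def states_for_flow(q, v_ff, k_jam):
    """Both (density, speed) states that carry the same flow q."""
    disc = k_jam ** 2 - 4.0 * q * k_jam / v_ff
    if disc < 0:
        raise ValueError("flow exceeds the capacity of this diagram")
    root = math.sqrt(disc)
    densities = ((k_jam - root) / 2.0, (k_jam + root) / 2.0)
    return [(k, speed_from_density(k, v_ff, k_jam)) for k in densities]


V_FF, K_JAM = 100.0, 100.0                           # hypothetical: km/h and veh/km
capacity = flow_from_density(K_JAM / 2.0, V_FF, K_JAM)  # flow at the critical density
print(capacity, states_for_flow(2100.0, V_FF, K_JAM))
```

Under this assumed model the critical density is half the jam density, and a flow of 2100 veh/h is carried both by a free-flow state (30 veh/km at 70 km/h) and by a congested state (70 veh/km at 30 km/h), which is why the speed versus flow curve is not a function.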


It is worth noting that the described relations are based on observations, which means that no direct causal relation is assumed between any two variables. Fundamental diagrams therefore sketch only possible correlations between variables. In fact, traffic is neither homogeneous nor stationary, so a large amount of scatter appears in the aforementioned diagrams. The presence of all this scatter in the data leads some traffic engineers to question the validity of the fundamental diagrams, in particular the behaviour in congested traffic. However, the majority of the community still regards fundamental diagrams as a fairly accurate description of the averaged behaviour of a traffic stream.

Figure 2.5: A fundamental diagram relating the density $k$ to the flow $q$. Source: (Maerivoet and Moor, 2005)

Figure 2.6: A fundamental diagram relating the flow $q$ to the space-mean speed $v_s$. Source: (Maerivoet and Moor, 2005)


2.2 Measurement issues

Though the measurement technology for obtaining traffic data has changed over the 60-year span of interest in traffic flow, most of the basic procedures remain largely the same. However, one technique has entered the picture in recent years: the so-called probe vehicles.

We first briefly discuss the measurement types and the basic measurement procedures. Afterwards, we provide an overview of probe vehicle data by defining the techniques, the stakeholders involved, the data collection approaches and some of the current deployments in the European Union.

Measurement types and procedures

Traditionally, according to the measurement range, a distinction is made between Eulerian and Lagrangian measurements. The former correspond to measurements from a fixed control volume (e.g. static sensors), whereas the latter refer to measurements of the system along a particle trajectory (the motion of a car). Some examples of mobile (or Lagrangian) sensors are RFID transponders, smartphones and GPS devices on board vehicles, which provide position and/or speed.

Eulerian measurements can be further classified into local, instant and time-space measurements, and Lagrangian measurements into complete and incomplete. Table 2.2 summarizes this classification and supplies an example for each sub-type.

Measurement type           Sub-types                  Examples

Eulerian measurements      time-space measurements    radar measurements
                           local measurements         data from loop detectors
                           instant measurements       traffic image

Lagrangian measurements    complete                   vehicle trajectories
                           incomplete                 travel times from camera

Table 2.2: Eulerian and Lagrangian measurements with their sub-types and some examples. Source: (Ou et al., 2011)

Basically, time-space measurements consider the whole time-space region, whereas in local measurements the space interval of the time-space region becomes extremely small (point observations on a road), and in instantaneous measurements it is the time interval that becomes extremely small (traffic conditions over a road section of a certain length $X$ at a certain time instant $t$).

In classical field theory, Lagrangian measurements track an individual fluid parcel (e.g. an individual vehicle) as it moves through time and space. These measurements make it possible to plot all the positions of an individual parcel through time. In contrast to Eulerian measurements, Lagrangian ones reflect how vehicles experience the traffic. Nevertheless, in many cases we cannot get a complete vehicle trajectory, and obtain instead the time a vehicle takes to travel from one location to another, resulting in incomplete Lagrangian measurements.

According to the measurement types described above, we can characterize the main measurement procedures that are employed to obtain traffic data. We briefly describe the four techniques traditionally implemented, without including probe vehicle data, which is described in Section 2.3. These four procedures are itemized next:

1. Measurement at a point (local measurements)

2. Measurement over a short section

3. Measurement over a length of road

4. The use of an observer moving in the traffic stream (moving observer)

1. Measurement at a point (local measurements)

This was the first procedure used for traffic data collection, either with hand tallies or pneumatic tubes. The method provides volume counts, and therefore flow can be directly calculated. Moreover, it can give time headways if arrival times are recorded. The technology for making measurements at a point on freeways changed over 40 years ago from pneumatic tubes placed across the roadway to point detectors. The most common are based on inductive loop technology, but other methods include microwave, radar, photocells, ultrasonic detectors and closed-circuit television cameras.

Loop detectors are the most common source of data for traffic state estimation. These detectors present the data as aggregated values for a time period ranging from 30 seconds to 10 minutes, and can supply flow measurements at the exact locations where they are installed. However, the error accumulation induced by these devices may affect the quality of the obtained estimates.

Since the speed is defined as the rate of change of an object's position ($v = dx/dt$), a $dx$, however small, is required to calculate it. Consequently, speeds at a point can only be obtained by radar or microwave detectors, because their frequencies of operation mean that a vehicle needs to move only about one centimeter during the speed measurement. In the absence of such instruments, a second observation location is necessary to obtain speeds (e.g. dual loop detectors).

2. Measurement over a short section

In order to obtain speeds, a second pneumatic tube is placed very close to the first (less than 10 m away). Later systems have used paired presence detectors, such as inductive loops spaced about five to six meters apart. With video camera technology, two detector "lines" placed close together provide the same capability for measuring speeds.
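The principle behind paired detectors can be sketched as follows: the speed is the detector spacing divided by the difference between the two activation times. The function name and the numerical values are hypothetical.

```python
def speed_from_paired_detectors(t_first_s, t_second_s, spacing_m):
    """Vehicle speed in m/s from the activation times of two closely spaced detectors."""
    dt = t_second_s - t_first_s
    if dt <= 0:
        raise ValueError("the second detector must be activated after the first")
    return spacing_m / dt


# Hypothetical example: loops 6 m apart, activated 0.24 s apart.
v = speed_from_paired_detectors(12.00, 12.24, 6.0)
print(v, v * 3.6)  # speed in m/s and in km/h
```

For this example the speed is about 25 m/s, i.e. roughly 90 km/h.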


It is worth noting that loop detectors can supply speed measurements only at certain points on a road (local speeds), but not over a road section. Since loop detectors normally aggregate the speed measurements over periods of 30 seconds to 10 minutes, which leads to over-representation of high speed measurements, the speed measurements deviate structurally from the ground-truth values. This deviation is relatively larger when the speed value is lower.
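The over-representation of high speeds in arithmetically aggregated spot speeds can be illustrated with a short sketch. It uses the harmonic mean of the spot speeds as an estimate of the space-mean speed, a common correction under stationary conditions that is not described in the text above and is therefore an assumption here; the function name and values are hypothetical.

```python
from statistics import harmonic_mean, mean


def aggregate_spot_speeds(spot_speeds):
    """Return (arithmetic mean, harmonic mean) of speeds measured at a point."""
    return mean(spot_speeds), harmonic_mean(spot_speeds)


# Hypothetical aggregation interval with one slow and one fast vehicle (km/h).
time_mean, space_mean_estimate = aggregate_spot_speeds([20.0, 60.0])
print(time_mean, space_mean_estimate)
```

Here the arithmetic mean is 40 km/h, while the harmonic mean is 30 km/h: the slow vehicle, which occupies the road for longer, receives more weight in the space-based estimate.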

These detectors thus give direct measurements of volume and time headways, as well as speeds. Most of the point detectors used, such as inductive detectors or microwave beams, take up space on the road, and are therefore short-section measurements. Thus, these detectors also measure occupancy, but because occupancy depends on the size of the detection zone of the instrument, the measured value may differ from site to site for identical traffic.

3. Measurement along a length of road

The source of these measurements consists basically of aerial photographs and cameras mounted on tall buildings or poles. Since a single frame gives no sense of time, only density can be measured. If several frames are available (from a video camera or from time-lapse photography over short time intervals), speeds and volumes can also be measured. One advantage of such measurements is that, with suitable computer vision algorithms, true travel times over a lengthy section of road can be obtained.

4. The moving observer method

Unlike the previous measurement procedures, which always assume a fixed measurement region, there exists another procedure based on a so-called moving observer. This method was used in some early studies, but is rarely used as the primary data collection method because of the prevalence of other sources. We distinguish between two approaches to this method.

The first one is a simple floating car procedure in which speeds and travel times are recorded as a function of time and location along the road. Although the floating car behaves as an average vehicle within the traffic stream, this approach cannot provide precise average speed data, but it is effective for obtaining general qualitative information about freeway operations.

The other approach was developed in (Wardrop and Charlesworth, 1954) for urban traffic measurements with the aim of obtaining both speed and volume measurements simultaneously. The idea is to have a survey vehicle drive in both directions of a traffic flow, each time recording the number of oncoming vehicles and the net number of vehicles it is overtaken by, as well as the times necessary to complete the two trips. It is worth mentioning that the round trip should be completed before traffic conditions change significantly (i.e. under stationary traffic).

With this method it is possible to derive the flow and density of the traffic stream in the direction of interest. Nevertheless, as with the first approach, a very large number of trips is required to obtain an acceptable level of accuracy on a road with a low flow.
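The moving observer estimates can be sketched as follows. The formulas are the commonly quoted form of the method and are not given in the text above, so they should be read as an assumption of this sketch: the flow is the sum of the oncoming count (trip against the stream) and the net overtaking count (trip with the stream) divided by the total trip time, and the mean travel time of the stream is the with-stream trip time corrected by the net overtaking count. All names and values are hypothetical.

```python
def moving_observer_estimates(oncoming_count, net_overtaking_count,
                              time_against_h, time_with_h, length_km):
    """Flow (veh/h), space-mean speed (km/h) and density (veh/km) of the stream.

    oncoming_count: vehicles met while driving against the stream of interest.
    net_overtaking_count: vehicles overtaking the observer minus vehicles it
        overtakes, while driving with the stream of interest.
    """
    flow = (oncoming_count + net_overtaking_count) / (time_against_h + time_with_h)
    mean_travel_time_h = time_with_h - net_overtaking_count / flow
    speed = length_km / mean_travel_time_h
    density = flow / speed
    return flow, speed, density


# Hypothetical survey over a 2 km section, 3 minutes (0.05 h) in each direction.
q, v_s, k = moving_observer_estimates(100, 20, 0.05, 0.05, 2.0)
print(q, v_s, k)
```

These counts are consistent with a stream of 1200 veh/h travelling at 60 km/h with a density of 20 veh/km, observed from a survey vehicle driving at 40 km/h.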

Some comments on the measurement procedures

Cassidy and Coifman pointed out in (Cassidy et al., 1996) the criteria that need to be met for the data from the described systems to satisfy the definitional requirements. These criteria can be reduced to ensuring that any vehicle entering the speed trap also clears it within the time interval for which the data are obtained (or that the number of vehicles in the time interval is quite large). However, there is no guarantee that this criterion will regularly be met, since the need for timely information means that some systems poll the loop detector controllers at least every minute. In that case, the volume count on a single freeway lane is not large enough to overcome the error introduced by missing one of the measurements being collected.

Furthermore, it is worth highlighting that both the location and the time intervals may affect the obtained results. For instance, with respect to bottlenecks, we may collect data describing different traffic situations upstream and downstream of where the bottleneck takes place. On the other hand, the adoption of arbitrarily chosen intervals is undesirable, since the data extracted over short measurement intervals are highly susceptible to the effects of statistical fluctuations, and the use of longer intervals may average out the features of interest.

2.3 Probe vehicle data

As we have already mentioned, this recent technique (the use of so-called floating cars or probe vehicles) consists of data collected by the vehicles themselves. It may be compared to the moving observer method, but in this case the vehicles are equipped with GPS and GSM(C)/GPRS devices that determine their locations and transmit this information to some operator.

Initially, this procedure enabled an agency, e.g. a parcel delivery service, to track its vehicles throughout a network based on their locations. Nowadays, the technique has evolved, resulting in several completed field tests whose main purpose was to estimate the traffic conditions based on a small number of probe vehicles. During field measurements, probe vehicles can adopt several types of behaviour, such as travelling at the traffic flow's mean speed or trying to travel at the road's speed limit.

According to ISO 22837, probe data is defined as "vehicle sensor information, formatted as probe data elements and/or probe messages, that is processed, formatted and transmitted to a land-based centre for processing to create a good understanding of the driving environment". Since the publication of the standard in 2008, the tracking possibilities using mobile and nomadic devices have increased substantially, requiring a broader definition of probe data, since any device in a vehicle that is able to determine at least the GPS position and communicate with some service application (such as tolling) can deliver probe data.


A probe vehicle data system has the advantages that it does not need an expensive infrastructure and can collect data all the way along the route (and not only transversally). It is important to note that "pure" probe vehicle data is formed by messages sent to a processing server. Therefore, they are not used for safety purposes, and do not need to be exchanged between vehicles, although this could be an extension of the technique.

Due to the large number of future sensors and applications, it is difficult to define a set of data elements composing a probe vehicle data message. The data may refer to the vehicle, the environment (road, weather, traffic) and the driver (driving style, etc.). A basic message typically contains the geo-location and timestamp of the measurement, followed by one or multiple type-value pairs, such as speed, heading, etc. This data has to be quickly available at the processing server (e.g. road operator), and should be accurate both spatially and temporally. The collection process must scale up to all vehicles and work for different types of data.
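The basic message layout described above (geo-location, timestamp and a list of type-value pairs) can be sketched as a small data structure. This is only an illustration: the class name `ProbeMessage`, its field names, the keys `speed_kmh` and `heading_deg`, and the sample coordinates are assumptions, not part of any standard or of the system used in this thesis.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProbeMessage:
    """Hypothetical probe vehicle data message (all names are assumptions)."""
    latitude: float                              # geo-location, decimal degrees
    longitude: float
    timestamp: datetime                          # time of the measurement
    values: dict = field(default_factory=dict)   # type-value pairs

    def to_record(self) -> dict:
        """Flatten the message into one dictionary for the processing server."""
        record = {
            "lat": self.latitude,
            "lon": self.longitude,
            "t": self.timestamp.isoformat(),
        }
        record.update(self.values)
        return record


# Illustrative values only; they are not taken from the collected data.
msg = ProbeMessage(
    latitude=41.3911,
    longitude=2.1640,
    timestamp=datetime(2014, 11, 3, 8, 30, 0, tzinfo=timezone.utc),
    values={"speed_kmh": 42.0, "heading_deg": 135.0},
)
record = msg.to_record()
```

Keeping the type-value pairs in an open dictionary reflects the point made in the text that the set of data elements is hard to fix in advance.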

Examples and experiments with probe vehicle data include the comparison of travel times calculated from probe data with measured travel times, which concluded that they correspond reasonably well, and the calculation of the minimum percentage of vehicles necessary to estimate traffic stream characteristics for certain traffic patterns. Thus, the use of probe vehicle data provides an effective way to gather accurate current travel times in a road network, which makes it possible to obtain good, up-to-date estimates of traffic conditions. The technique will continue to grow and evolve, notably by introducing personalised traffic information for drivers based on their location and the surrounding traffic conditions. Moreover, this development is stimulated by the fact that GSM market penetration is above 70% and still rising.

The following subsections address some issues related to probe vehicle data. First, we describe the stakeholders involved and the role they play in the supply chain that the process represents. Then, different approaches for probe vehicle data collection are discussed. Another important topic, user privacy, which mainly concerns the mobile operator that wants to track individual people’s units, is briefly discussed. Last, we present an overview of current deployments with probe data.

2.3.1 Stakeholders and supply chain analysis

Although probe vehicle data collection enables many services, it is not an end-user service itself. However, a supply chain is only meaningful when it is completely depicted, that is, when all the responsible stakeholders are involved. Table 2.3 illustrates an example of a value chain with the responsible stakeholders. We note that, depending on the system approach (cooperative ITS, mobile phone based, etc.), the responsible stakeholder may change accordingly.

| Stage | Description | Stakeholder |
|---|---|---|
| Configuration, management | Define sensor data type, parameters, time interval and road segment | Road operator, application provider |
| Data acquisition | Sample the sensor, with the location and time stamp according to configuration | Vehicle, on-board unit, nomadic device |
| In vehicle processing | Buffer the samples, filter according to threshold, sampling rate, prepare the data message, anonymize | Vehicle |
| Transmission | Transmit the message via WLAN, cellular, WiMAX | Road operator and/or network operator |
| Probe data processing | Aggregate probe data, map matching, storage | Road operator, service provider |
| Service application | Service applications: calculate travel time, speed, visualization, prediction, inform driver | All (road operator, driver) |

Table 2.3: Value chain of the probe data process with a short description of each stage and the responsible stakeholder. Source: (Bessler and Paulin, 2013)

The most important stakeholders are the road operator and the user (vehicle driver). The former is in charge of the management of traffic data and the control and inducement of traffic. However, the ideal probe vehicle data collection, covering the whole road and with low latency (time delay between data capture and its availability), cannot be achieved by a road operator because of the high cost. The latter is sensitive to the additional costs of services in general, and of probe vehicle data in particular, since users have no a priori information about the benefits derived from better traffic control. Basically, the main topics users are interested in are traffic, environment and hot spot information in any situation.

Car manufacturers are another stakeholder, since they can integrate on-board units to collect data from car sensors. Their final goal is to add more value to their cars by including navigation systems, but they should be careful with the price of these extras because of users’ price sensitivity.

2.3.2 Probe vehicle data collection approaches

A typical architecture of a probe vehicle data collection system consists of an on-board data collection system in the vehicle (either integrated or a nomadic device) that has access to car sensors, primarily a GPS receiver. The sensors are periodically sampled and the data is formatted into probe vehicle data messages, which are sent over a wireless data connection to a server of the service provider. There, messages from several vehicles are processed and stored in a database for further use (e.g. applications for traffic information and control or for environmental information).

The early probe vehicle data systems required a fleet composed of taxis or buses equipped with wireless data communication devices, and a server that processed the incoming data streams. Initial studies (Huber and Ogger, 1999) showed that between 1% and 5% of the cars driving in the area of interest need to be collecting data in order to provide useful information on urban traffic.

Although taxis and buses provide a major source of inner-city traffic information due to the extended periods of time they spend on the urban road network, they have limitations. For instance, if taxi drivers take steps to avoid congested areas (because of their knowledge of the local road network), information about the congestion may not be reported.

The sampling rate of the GPS position of vehicles in the fleet generally tends to be low because of airtime transmission costs. At the server, the samples have to be map-matched (the GPS position is mapped to a particular road), which is challenging if the samples are sparse, particularly in urban environments. The sparse sampling approach has other drawbacks as well: data collected with a relatively large sampling interval (e.g. 30 seconds) leads to low accuracy, and at low densities and low penetration rates it is necessary to wait for minutes to collect enough samples.
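To make the map-matching step more concrete, the following sketch assigns a position to the nearest road segment. It is a deliberately naive approach and not the procedure used by any of the systems cited here. It assumes positions already projected to planar coordinates in metres, and the function names, segment identifiers and coordinates are hypothetical.

```python
import math


def project_on_segment(p, a, b):
    """Return the closest point to p on segment a-b and the distance to it."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0                      # degenerate segment: a and b coincide
    else:
        # Parameter of the orthogonal projection, clamped to the segment
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return (cx, cy), math.hypot(px - cx, py - cy)


def match_to_road(p, segments):
    """Return (segment id, distance) of the segment nearest to p.

    segments maps a hypothetical segment id to its two end points.
    """
    best_id, best_dist = None, math.inf
    for seg_id, (a, b) in segments.items():
        _, dist = project_on_segment(p, a, b)
        if dist < best_dist:
            best_id, best_dist = seg_id, dist
    return best_id, best_dist


# Two parallel streets 20 m apart (illustrative geometry only)
segments = {
    "street_A": ((0.0, 0.0), (100.0, 0.0)),
    "street_B": ((0.0, 20.0), (100.0, 20.0)),
}
matched = match_to_road((50.0, 4.0), segments)
```

With a single isolated sample, a position error of a few metres can already flip the choice between two close parallel streets, which is one reason why sparse samples are hard to match in urban grids.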

In contrast to taxi fleets, which may offer rather poor statistics of the dynamic traffic situation, the high penetration rate of mobile phones has completely changed the situation in recent years. In fact, large map application providers such as Nokia, Google, TomTom or Apple have added GPS tracking to leverage their maps, navigation and traffic services software.

On the one hand, smartphones have more potential for data collection in the future, as they have a large number of sensors besides GPS (e.g. a compass that can determine the heading, or an accelerometer). On the other hand, nomadic devices (e.g. smartphones or TomTom devices) do not offer the same level of integration as an embedded on-board unit connected to the controller area network. Consequently, specialized probe vehicle data applications are still difficult to realize with mobile phones, and have a better chance in a cooperative ITS environment with open and standardized interfaces.

Cooperative ITS services derive their name from the coupling between on-board driver assistants and road infrastructure services in order to improve traffic control and vehicle safety, and to learn the traffic situation from vehicle sensors. The system architecture in a cooperative ITS is based on the ITS station. However, much remained to be done, since at the end of 2013 the most important messages exchanged between ITS stations were the cooperative awareness message (a "Hello" message) and the decentralized environment notification message.

2.3.3 User privacy protection

In the context of probe vehicle data, user and data privacy is a fundamental and difficult problem to solve. On the one hand, providing traffic features such as travel time, traffic situation or dangerous hot spots means disclosing a user mobility trace, which is processed by an untrusted third party. On the other hand, this trace needs to be indistinguishable from the other traces nearby in time and space, while accurate information must still be obtained.

A first approach consists of anonymizing the traces by removing all the individual measurements, or by hiding the relationship between a trip identifier and the individual measurements. However, this approach is not sufficient, because drivers can be re-identified through the high spatio-temporal correlation between successive points and by correlating anonymous location traces with data from other sources. Thus, protecting the privacy of this data is one of the challenges of probe vehicle data. For instance, mobile phone applications do not sufficiently guarantee privacy.

2.3.4 Probe vehicle data deployments

Finally, let us consider some of the studies and experiments with probe vehicle data developed in Europe so far. We briefly describe some research projects and field trials funded by the EU, nationally and regionally funded projects, activities of companies and real-time traffic tools run by governmental sites and cities.

Regarding the projects funded by the EU, we highlight DRIVE C2X and ROADIDEA. The former focuses on communication among vehicles (vehicle to vehicle) and between vehicles and a roadside and backend infrastructure system (vehicle to infrastructure). Previous projects have tested the feasibility of safety and traffic efficiency applications based on C2X (Car-to-X) communication, but DRIVE C2X goes beyond the proof of concept and addresses large-scale field trials under real-world conditions at multiple national test sites across Europe. Some of the ITS services use probe vehicle data to determine traffic and weather conditions.

The other project, ROADIDEA, studies the European transport service system and tries to analyse all available information sources, ranging from probe vehicle data and in-road sensors to weather information, and identifies how these can be merged and used for various applications. Furthermore, this project recommends that user-oriented services and business models be included in the overall ITS roadmap.

At a national level, the PRELUDE trial, in the Netherlands, consisted of 60 cars that communicated every 5 minutes a series of GPS positions gathered at 10-second intervals. The information was very useful for determining the departure time and changing the route during the trip, but less useful for choosing the transportation mode. Some of its applications are travel time calculation, travel time prediction, use in logistics to schedule deliveries by considering historical data with actual travel times, and travel strategy evaluation using probe vehicle data measurements before and after the study of changes.

In Denmark, the ITS Platform studies the feasibility of a full, running ITS system and investigates how to provide an open platform where researchers, public authorities and companies can deploy and test ITS-related applications. The system currently has 420 active vehicles which generate data and evaluate the developed applications, such as traffic statistics (e.g. for road authorities to calculate travel time, congestion levels, etc.) and customized traffic information.

Among companies, the example of Google comes to mind. In 2007, Google extended Google Maps by adding historical patterns to its maps for the visualisation of road traffic information and in its routing tool. In 2012 it extended this feature to be based on probe vehicle data originating from Android smartphones, which are equipped with GPS modules. The location is sent periodically to the Google server, where map-matching is performed. Google also supports applications such as Google Navigation (2011) to calculate alternative routes in real time.

Acquired by Google in 2013, the Israeli company Waze developed a free smartphone navigation application supported by the position information it receives from the users in the community (crowdsourcing). Waze generates traffic information from the available GPS positions and proposes better route alternatives to the users. Thus, the larger the number of users in the community, the more accurate the information. Both Google and Waze technologies, although successfully deployed, are prone to data security and user privacy issues.

For their part, car manufacturers have their own vision of connected car assistance systems. The exchange of data, which includes the collection of extensive floating car data, follows in most cases a proprietary protocol. The introduction of electric cars has intensified the app approach, since additional charging information has to be exchanged.

For instance, the Connected Drive system developed by BMW uses an integrated SIM in the car to provide navigation, entertainment and safety services to BMW drivers. Nevertheless, it is expected that in the field of probe data collection, the Car-2-Car Consortium will actively work towards an integrated approach that meets the interests of all stakeholders.

To conclude this subsection, we present some examples in which data is available online, in most cases without requiring any registration, and is mainly provided by national departments of transport. The traffic data is most often obtained from permanent count stations installed on major roads (generally freeways). Therefore, typical parameters are traffic flow and average speed. Further data such as occupancy rates and travel times (e.g. calculated from probe vehicle data) may also be collected.

In the case of Spain, the Dirección General de Tráfico (DGT) of the Ministerio del Interior has been supplying a large amount of real-time traffic data that is integrated in Google Maps. The user can then easily obtain real-time traffic flow and average speed from 4000 traffic sensors located over the Spanish road network. Furthermore, it is possible to query historical values, such as intensity, composition, occupancy rate and average speed.

In Italy, users who register on a similar platform can find online the real-time speed and number of vehicles on the Italian freeway network, as well as in the areas of major cities. Traffic data is provided by the largest probe vehicle fleet in Europe, which corresponds to hundreds of thousands of anonymous customers equipped with GPS. The dataset may afterwards contribute to route planning optimisation and be used by navigation systems.

2.4 Traffic simulation and goodness-of-fit measures

This section provides a general overview of traffic microsimulation, the associated calibration process and the goodness-of-fit measures employed to assess the effectiveness of the calibration. Furthermore, as will be seen in Chapter 5, these measures are also used to compare estimated values (obtained by the implementation of certain methods) with their corresponding ground-truth values.

Traffic microsimulation models are used by researchers and practitioners for a detailed analysis of the performance of transport systems. The effectiveness of such models in evaluating traffic management strategies lies in their ability to accurately replicate actual traffic conditions. Thus, these models require proper calibration of their parameters rather than the use of default values.

Calibration is the process in which the model parameters of the simulator are optimized to the extent possible in order to obtain a close match between the simulated and the actual traffic measurements. Generally, calibration is an iterative process in which the engineer adjusts the simulation model parameters until the results produced by the simulator match the field measurements. The comparison between simulation outputs and observed measurements is often referred to as validation.

A typical statistical procedure for comparing two sets of data for a close match is a hypothesis test such as the t-test. The null hypothesis in this context could be that the mean of the simulated traffic measurements is equal to that of the actual traffic measurements. However, applying the t-test to traffic measurements has a limitation: observations should be independent and identically distributed (i.i.d.), and simulated and actual traffic measurements may not be.

Table 2.4 summarizes the goodness-of-fit measures traditionally employed by researchers for the validation process. Some comments about these measures are itemized next:

• Most of the measures will identify a poor fit between the central tendencies of the compared samples, while only a few measures (especially Theil’s indicators) are sensitive to variance and covariance

• Some measures (PE, ME, MNE) let errors with a similar size but different sign balance each other

• Some measures (MAE, MANE) use the absolute value of the difference between the observed and simulated measurements (they give equal weights to all errors)

• Other measures (SE, RMSE, RMSNE) depend on the squared difference, and hence place a higher penalty on large errors

• The GEH statistic is an empirical formula used in traffic engineering which has proven useful for several traffic analysis purposes

Most measures involve a summation of errors over a series of pairs of simulated and observed values. However, it is not always obvious how to create these pairs. Generally, the trend is to consider the space of simulation outputs as one-dimensional, as only one index (denoted by i) is used for the series of measurements in all measures included in Table 2.4.

The most common dimension is time, but sometimes the measurements consist of values from different locations in the study network or different vehicles. It is therefore important not only to choose a goodness-of-fit measure suitable for the particular needs of each application, but also to use it in the appropriate dimension.

| Name | Measure | Comments |
|---|---|---|
| Percent error (PE) | $\frac{x_i - y_i}{y_i}$ | Applied either to a single pair of observed-simulated measurements or to aggregate networkwide measurements |
| Squared error (SE) | $\sum_{i=1}^{N} (x_i - y_i)^2$ | |
| Mean error (ME) | $\frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)$ | Indicates the existence of systematic bias. Useful when applied separately to measurements at each location |
| Mean normalized error (MNE) | $\frac{1}{N} \sum_{i=1}^{N} \frac{x_i - y_i}{y_i}$ | Indicates the existence of systematic bias. Useful when applied separately to measurements at each location |
| Mean absolute error (MAE) | $\frac{1}{N} \sum_{i=1}^{N} \lvert x_i - y_i \rvert$ | Not particularly sensitive to large errors |
| Mean absolute normalized error (MANE) | $\frac{1}{N} \sum_{i=1}^{N} \frac{\lvert x_i - y_i \rvert}{y_i}$ | Not particularly sensitive to large errors |
| Root mean squared error (RMSE) | $\sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2}$ | Large errors are heavily penalised. Sometimes appears as mean squared error, without the root sign |
| Root mean squared normalized error (RMSNE) | $\sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - y_i}{y_i} \right)^2}$ | Large errors are heavily penalised |
| GEH statistic | $\sqrt{\frac{2 (x_i - y_i)^2}{x_i + y_i}}$ | Applied to a single pair of observed-simulated measurements. GEH < 5 indicates a good fit |
| Correlation coefficient ($r$) | $\frac{1}{N-1} \sum_{i=1}^{N} \frac{(x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y}$ | |
| Theil’s bias proportion ($U_m$) | $\frac{N (\bar{y} - \bar{x})^2}{\sum_{i=1}^{N} (y_i - x_i)^2}$ | A high value implies the existence of systematic bias. $U_m = 0$ indicates a perfect fit, $U_m = 1$ indicates the worst fit |
| Theil’s variance proportion ($U_s$) | $\frac{N (\sigma_y - \sigma_x)^2}{\sum_{i=1}^{N} (y_i - x_i)^2}$ | A high value implies that the distribution of simulated measurements is significantly different from that of the observed data. $U_s = 0$ indicates a perfect fit, $U_s = 1$ indicates the worst fit |
| Theil’s covariance proportion ($U_c$) | $\frac{2 (1 - r) N \sigma_x \sigma_y}{\sum_{i=1}^{N} (y_i - x_i)^2}$ | A low value implies the existence of unsystematic error. $U_c = 1$ indicates a perfect fit, $U_c = 0$ indicates the worst fit. $r$ is the correlation coefficient |
| Theil’s inequality coefficient ($U$) | $\frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - x_i)^2}}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} y_i^2} + \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2}}$ | Combines the effect of all 3 Theil’s error proportions. $U = 0$ indicates a perfect fit, $U = 1$ indicates the worst fit |
| Kolmogorov-Smirnov test | $\max(\lvert F_x - F_y \rvert)$ | $F$ is the cumulative probability density function of $x$ or $y$ |

$x_i$ simulated measurement; $y_i$ observed measurement; $N$ number of measurements; $\bar{x}$, $\bar{y}$ sample average; $\sigma_x$, $\sigma_y$ sample standard deviation

Table 2.4: Measures of goodness-of-fit with their formulae and some comments. Source: (Hollander and Liu, 2008)
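As an illustration of how some of the measures in Table 2.4 can be evaluated, the following sketch implements RMSE, RMSNE, the GEH statistic and Theil’s inequality coefficient directly from their formulae, with x the simulated and y the observed series. The function names and the sample values are assumptions made for this example; they are not results from this thesis.

```python
import math


def rmse(x, y):
    """Root mean squared error between simulated x and observed y."""
    n = len(x)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n)


def rmsne(x, y):
    """Root mean squared normalized error (observed values must be non-zero)."""
    n = len(x)
    return math.sqrt(sum(((xi - yi) / yi) ** 2 for xi, yi in zip(x, y)) / n)


def geh(xi, yi):
    """GEH statistic for a single simulated-observed pair."""
    return math.sqrt(2.0 * (xi - yi) ** 2 / (xi + yi))


def theil_u(x, y):
    """Theil's inequality coefficient U (0 is a perfect fit, 1 the worst)."""
    n = len(x)
    num = math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / n)
    den = (math.sqrt(sum(yi ** 2 for yi in y) / n)
           + math.sqrt(sum(xi ** 2 for xi in x) / n))
    return num / den


# Made-up flows (veh/h) at three time intervals, for illustration only
simulated = [100.0, 220.0, 310.0]
observed = [110.0, 200.0, 300.0]

print(rmse(simulated, observed))
print(rmsne(simulated, observed))
print([geh(s, o) for s, o in zip(simulated, observed)])
print(theil_u(simulated, observed))
```

Here the single index i runs over time intervals; the same functions apply unchanged if the pairs are instead formed over locations or vehicles, which is the choice of dimension discussed above.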

Chapter 3

Preliminary analysis

Since performing a physical experiment with a significant share of vehicles is unfeasible due to the high associated costs, a microsimulation environment emulating the vehicles’ functions is required to test the development of mobility services. However, a physical trial with a reduced number of vehicles becomes necessary in order to model the vehicles’ behavior within the network and to calibrate the APIs modeling the data gathered by connected cars.

The microsimulation scenario, corresponding to Barcelona’s Central Business District (CBD), comprises 7.46 km² and more than 250,000 inhabitants. Its Aimsun model consists of 2,111 sections (links) and 1,227 nodes (intersections), and the study horizon was set to 30 minutes, accounting for a total of 20,700 vehicle trips corresponding to a morning peak period configuration.

This simulation model was suitably calibrated after the analysis of the data gathered in the physical trial, and a library of functions was developed in order to emulate the data capture process of the equipped vehicles. In particular, the probe car radar visibility polygon for observed cars is calculated by adapting a ray casting detection algorithm to obtain the observed objects according to the radar detection range, the location of the radars and the surrounding conditions. Radars are one of the developing technologies already available, but other devices, such as front cameras, have not yet been used.

Furthermore, the analysis of the gathered data makes it possible to identify unexpected vehicle behavior (with respect to the intended purposes), as well as semantically meaningful trends. Moreover, such an analysis provides a better understanding of the data when implementing the estimation methods.

In this chapter, we describe the technical specifications of the vehicles used for the physical experiment and the data structure. Then, the main features of the data collection in Barcelona are described and, although this data was exhaustively analyzed for the Connected Car project, only a summary of this analysis is included.

3.1 Technical specifications

As stated before, the physical experiment was performed with three vehicles, one Volkswagen Golf Generation 7 and two Audi A3, equipped with the basic sensors already available, which are GPS, front camera and front and side radars. However, camera data has not been exploited for the Connected Car project so far. Table 3.1 shows the functions associated with each of the considered sensors.

| Sensor technologies | Functions |
|---|---|
| GPS | navigation |
| Radars | front assistant, city emergency brake, adaptive cruise control, side assistant |
| Front camera | lane keeping assistant, traffic sign recognition, dynamic light assistant, adaptive cruise control |

Table 3.1: Sensor technologies and their functions. Source: Volkswagen AG

These vehicles are equipped with one front radar and two side radars, which are located almost at the rear of the vehicle. The front radar has a detection radius of 180 meters within a ±30 degree cone, whereas the range of the rear radars is lower and not easily determined, since the influence of both radars needs to be taken into account. They can simultaneously track multiple vehicles driving in front of or behind the probe vehicle, respectively. Nevertheless, if the line of sight to a vehicle is continuously obstructed (for instance by another vehicle), it will not be detected at all. Since the radars are located at the front and rear ends of the car, vehicles driving parallel to the equipped vehicle remain out of the scope of the radars and cannot be observed.
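The geometric part of the front radar coverage (180 m radius, ±30 degree cone) can be checked with a few lines of code. The sketch below is only an illustration under stated assumptions: the x axis is taken to point forward along the vehicle, the y axis is lateral, and the origin is at the front radar. The function name is hypothetical, and occlusion by other vehicles, which the text describes and which the emulation handles through ray casting, is deliberately ignored here.

```python
import math

FRONT_RANGE_M = 180.0          # detection radius stated in the text
FRONT_HALF_ANGLE_DEG = 30.0    # half-aperture of the cone stated in the text


def in_front_radar_cone(x_dist, y_dist,
                        max_range=FRONT_RANGE_M,
                        half_angle_deg=FRONT_HALF_ANGLE_DEG):
    """Return True if a target at (x_dist, y_dist) lies inside the front cone.

    Assumption: x points forward, y is lateral, both in metres,
    with the origin at the front radar. Line-of-sight is not checked.
    """
    distance = math.hypot(x_dist, y_dist)
    if distance == 0.0 or distance > max_range:
        return False
    # Angle between the forward axis and the direction to the target
    bearing = math.degrees(math.atan2(y_dist, x_dist))
    return abs(bearing) <= half_angle_deg


print(in_front_radar_cone(50.0, 10.0))    # ahead, slightly to one side
print(in_front_radar_cone(5.0, 40.0))     # almost parallel to the vehicle
print(in_front_radar_cone(200.0, 0.0))    # straight ahead but too far
```

A target almost alongside the vehicle has a bearing far above 30 degrees, which is consistent with the remark that vehicles driving in parallel cannot be observed.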

Figure 3.1 illustrates an equipped vehicle (in red), with the coordinate axes from which the distances are calculated superimposed. The radars are represented by grey circles, one at the front and two behind. The equipped vehicle observes seven vehicles in total, four in front and three behind. As stated in the preceding paragraph, parallel vehicles are not detected.

Current radar technology does not allow equipped vehicles to observe motionless vehicles, with a speed threshold of 3 km/h. It is important to take this into account, since motionless vehicles at a traffic light are not detected during (approximately) the red phase, but they may be detected before and after it. In this case, imputation of the missing data is required. Furthermore, when an equipped vehicle stops detecting a vehicle and then observes it again, a new identifier is assigned. This happens to vehicles at a red light, but also in other situations, such as when detection is interrupted during an overtaking manoeuvre.

Figure 3.1: Radar locations (grey circles) and surrounding awareness of a probe vehicle (in red)

For each timestamp, defined as every half second, a probe vehicle measures and stores the following data:

• Equipped vehicles: timestamp, longitude, latitude, speed and heading (orientation with respect to the physical north)

• Observed vehicles: timestamp of observation, longitude, latitude, speed, relative speed, x-distance and y-distance according to the coordinate system shown in Figure 3.1

Together with this data and the identifiers associated with the readings, other required information about the microsimulation model is provided by Aimsun. It is worth noting that, in real implementations, this information may be supplied by an integrated cartography and by later processing that also considers the collected data (such as map matching procedures):

• Equipped vehicles: section and lane of the Aimsun model, number of lanes in the section and distance to the section’s downstream node

• Observed vehicles: corresponding section of the Aimsun model

Other variables can then be derived from the ones presented so far. In the case of the gathered data, the acceleration can be calculated straightforwardly for both vehicle types, as well as the Euclidean distance between the equipped vehicle and any of the vehicles it observes.
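A minimal sketch of these two derived variables is given below. It assumes that speeds are recorded in km/h at consecutive half-second timestamps and that x-distance and y-distance are expressed in metres; these units, the function names and the sample values are assumptions made for illustration, and the finite-difference formula is one simple way to obtain the acceleration, not necessarily the one used in the project.

```python
import math

TIMESTEP_S = 0.5   # recording interval stated in the text


def accelerations(speeds_kmh, dt=TIMESTEP_S):
    """Finite-difference acceleration (m/s^2) between consecutive readings.

    Assumption: speeds_kmh holds speeds in km/h at consecutive timestamps.
    """
    speeds_ms = [v / 3.6 for v in speeds_kmh]          # km/h to m/s
    return [(v1 - v0) / dt for v0, v1 in zip(speeds_ms, speeds_ms[1:])]


def euclidean_distance(x_dist, y_dist):
    """Straight-line distance (m) from the equipped to the observed vehicle."""
    return math.hypot(x_dist, y_dist)


# Illustrative readings only
acc = accelerations([36.0, 39.6, 39.6])
dist = euclidean_distance(3.0, 4.0)
```

Note that a series of n speed readings yields n - 1 acceleration values, and that gaps in the detection of an observed vehicle (for example when a new identifier is assigned) break the assumption of consecutive timestamps.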

Additionally, other derived variables, which are explained in detail in Chapter 4, are required. Basically, these variables are the distance to the section’s upstream node, the lane of the observed vehicles and the type of vehicle according to its position with respect to the equipped vehicle in charge of the detection (in the same lane or in another).

From a data structure point of view, Barcelona’s data was loaded into a PostgreSQL database and extracted into R data.frames for the exploratory data analysis. A data.frame is a two-dimensional data structure containing as many rows as timestamps and as many columns as variables.

3.2 Data collection design

The experiment conducted in Barcelona consisted of the aforementioned probe vehicles driving eight hours a day during a 5-weekday period in November 2014. In order to capture the origin-destination behavior in the L’Eixample district, 12 itineraries covering the journeys of interest were defined and assigned to drivers, with the aim of covering all of them both at rush hour intervals (morning and afternoon) and at off-peak hours.

Each of the professional drivers was given a schedule with the itineraries to be performed each day, their corresponding time slot (with a length of one hour), during which the itinerary had to be repeated as many times as possible, and the corresponding driving type. The latter has two levels: normal, when the vehicle floats with the traffic, or aggressive, when the driver, while complying with traffic regulations, tries to overtake other vehicles, since the final purpose is to arrive as soon as possible.

The itineraries are defined by grouping the most common journeys provided by the Barcelona City Council website, which simplifies the field data collection. Since itineraries are performed without interruptions (except for the regulatory stops every two hours), it is worth noting that transitions between them are flexible, in the sense that the last lap of an itinerary can be left at the closest point from which to reach the following itinerary. Thus, an itinerary may be started at any physical point, and not necessarily at its defined start.

The itinerary we focus on is the second one, since it contains the corridor employed to test the different estimation methods. Itinerary 2, which is depicted in Figure 3.2, has a length of 10.6 km, and one lap takes between 25 and 28 minutes under off-peak conditions. It is composed of two one-way urban corridors in the characteristic square grid of L’Eixample: Carrer Aragó (highlighted in black) and Gran Via de les Corts Catalanes. The short stretches connecting both avenues belong to Carrer Tarragona (Plaça Espanya, the western one) and Carrer Cartagena (Plaça de les Glòries, the eastern one).

The Aragó corridor has been chosen for several reasons. First, it is a one-way multi-lane urban corridor with merging and diverging sections, which are some of the characteristics described in the reference estimation method. Furthermore, it has been one of the arterials least affected by roadworks. This matters because the Aimsun model was calibrated a few years ago and does not include the later changes in the infrastructure. However, in the case of the Aragó corridor, the model faithfully reproduces its current status.

Figure 3.2: Itinerary 2 on a street map. Source: Google Maps

As we can observe in Figure 3.2, there is a missing stretch around Carrer Tarragona, which corresponds to three sections of the model. This is due to the low demand induced by Aimsun at the final part of the corridor, which leads to an almost complete lack of the data required as input by the estimation method (see Chapter 4 for more details).

3.3 Exploratory data analysis

The exploratory analysis conducted for the Connected Car project consisted of two stages: in the first, the equipped data was exhaustively examined, and in the second, two important traffic variables, speed and acceleration, were compared for both equipped and observed data. It was performed at itinerary level and from a univariate point of view, which allows us to identify trends and atypical data for each of the examined variables.

Furthermore, a categorical variable was defined in order to evaluate the differences between the rush hour and the off-peak periods. It is called day hour and indicates the day and the hour at which the observation takes place. It has the form DX HYY, where X refers to the day (from 1 to 5) and YY to the hour (from 08 to 19).

Although the observations for both equipped and observed vehicles form a time series and are highly correlated, time series modelling is not considered in this analysis. Instead, data is considered jointly and filtered by itinerary, and every itinerary is analyzed in the same way, that is, by taking into account the same variables in a predetermined order and the same types of plot. Since the developed estimation methods are tested on the Aragó corridor, the results presented below belong to Itinerary 2.

First, the trajectory is analyzed by plotting the geographic coordinates (longitude and latitude) for all equipped vehicles performing the second itinerary. Then, the resulting shape is compared with the shape of the itinerary depicted on a street map. Figure 3.3 shows some differences with respect to the designed trajectory (Figure 3.2), which may be caused by the roadworks affecting some stretches of the itinerary at the time of the experiment, such as the major ones in Plaça de les Glòries.

Figure 3.3: Trajectory plot (Itinerary 2)

Figure 3.4: Speed plot (Itinerary 2)

The distribution of the speed by the day-hour factor is examined next. It is easy to identify in Figure 3.4 a rush hour period in which probe vehicles tended to reach lower speeds (during the second day around 17:00). Moreover, there are two motionless intervals of considerable length, which, as verified in a further analysis, correspond to the equipped vehicle being stopped in a parking spot.

The acceleration is almost symmetric around 0 m/s², with a range from −4 to 4 m/s². Nevertheless, Figure 3.5 presents some outliers exceeding these values. For instance, the highest value (6.51 m/s²) corresponds to an increase of the speed from 0 km/h to 45 km/h in a single timestamp (half a second), which is obviously due to a temporary malfunction of the measurement devices and, therefore, has to be filtered.

Both the speed and the acceleration were also analyzed by considering the associated driving type (normal or aggressive). Although for some itineraries high speed values and acceleration outliers were reached during aggressive driving, for others they were registered during normal driving, so we conclude that the driving type does not influence such atypical values.

Figure 3.5: Acceleration plot (Itinerary 2)

The heading, which represents the orientation with respect to the physical north, makes it possible to identify the patterns derived from the repetitions of the same itinerary, as well as the changes due to roadworks or other reasons. The irregularities detected when analyzing the trajectory can be seen in Figure 3.6 as deviations from the regular shape.

Figures 3.7 and 3.8 show the distribution of the variables to be compared (speed and acceleration) during the hours in which they were recorded; for each hour, a boxplot indicating the typical range is superimposed.

In the case of the speed, observed vehicles present a considerably wider range (Figure 3.7b) than equipped vehicles (Figure 3.7a). In fact, whereas the equipped speed is bounded by 50 km/h, observed vehicles can reach speeds higher than 80 km/h and, in contrast to the equipped ones, some negative values are registered. A deeper analysis revealed that a high share of the negative observed speeds were recorded when the equipped vehicle was motionless, and that speeds exceeding 80 km/h were unrealistic given the associated traffic conditions.

Figure 3.6: Heading plot (Itinerary 2)

(a) Speed plot for equipped vehicles (b) Speed plot for observed vehicles

Figure 3.7: Distribution of the speed of the equipped and observed vehicles per hour (Itinerary 2)

Although the observed vehicles present more extreme acceleration outliers than the equipped ones (in absolute value), the ranges for the equipped (Figure 3.8a) and the observed (Figure 3.8b) vehicles are much more similar than in the case of the speed. The aforementioned outliers were analyzed separately, and in some cases we inferred that they correspond to errors in the calculation of the speed from one timestamp to the next.

(a) Acceleration plot for equipped vehicles (b) Acceleration plot for observed vehicles

Figure 3.8: Distribution of the acceleration of the equipped and observed vehicles per hour (Itinerary 2)

In conclusion, this exploratory analysis has determined the ranges and the atypical values of the quantitative variables gathered by equipped vehicles. Moreover, it allowed us to identify some causes of unexpected situations, such as the drastic changes in some variables in a single timestamp when the equipped vehicle was motionless.

Chapter 4

Methodology

As stated in Chapter 2, traffic observation is one of the most essential elements for the planning, control and management of transportation systems. It comprises the acquisition of the fundamental variables, the object of the presented estimation methods, also referred to as traffic state by C. F. Daganzo.

A traffic state observation method is characterized by some properties, such as the simultaneous acquisition of the fundamental variables or the coverage of a wide spatial and/or temporal area. Since these features are difficult to satisfy at the same time, different observation methods have traditionally been applied to specific purposes based on their feasibility and the required information. For instance, signal control requires spatiotemporally detailed information at target road sections, whereas macroscopic measures need less specific information covering a wider range.

One of the most widespread methods uses detectors at a fixed point (e.g. loop detectors), which can observe flow and occupancy at their installed points. Then, density and speed may be estimated under weak assumptions, such as an average vehicle length and a steady speed in a short section. Nevertheless, such methods are limited to providing temporally detailed information in the vicinity of the devices, and a high number of such devices would be required in order to acquire spatially detailed and wide-ranging information simultaneously.
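As a rough illustration of how density and speed can be derived from fixed-point detector readings, the following Python sketch applies the usual occupancy relation. It is not part of the project code: the function name, the detector length and the sample figures are assumptions made for this example, and only the idea of assuming an average vehicle length and a steady speed comes from the text.

```python
def detector_density_speed(flow_veh_h, occupancy,
                           avg_vehicle_len_m=4.0, detector_len_m=2.0):
    """Estimate density (veh/km) and speed (km/h) from loop detector data.

    occupancy is the fraction of time (0..1) during which the detector is
    occupied. The sketch assumes that all vehicles share the same effective
    length (vehicle length + detector length, both hypothetical values here)
    and travel at a steady speed over the detector.
    """
    effective_len_km = (avg_vehicle_len_m + detector_len_m) / 1000.0
    density = occupancy / effective_len_km                           # veh/km
    speed = flow_veh_h / density if density > 0 else float("nan")    # km/h
    return density, speed


# Hypothetical reading: 1200 veh/h with the detector occupied 12% of the time.
density, speed = detector_density_speed(flow_veh_h=1200, occupancy=0.12)
print(f"density = {density:.1f} veh/km, speed = {speed:.1f} km/h")
```

The result is only valid at the detector location, which is the limitation the paragraph above points out.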

Probe vehicles offer a way to mitigate these drawbacks, because they are able to provide wide-ranging data in both space and time and overcome the cost and technical limitations of the methods employed so far. In most cases, although the final purpose of developing the on-board devices of such vehicles is to define and test new mobility services, the data they gather is a powerful source for the characterization of the traffic state.

The advances in ICT have popularized probe vehicles, and several academic studies have considered them. For instance, in (Herrera et al., 2010) detailed observation data from a large number of GPS-equipped probe vehicles on a highway was evaluated, and in (Yuan et al., 2012) a traffic state estimation method based on Lagrangian observation and a Lagrangian formulation of traffic flow was developed, which was validated by using both GPS-equipped probe vehicle data and a boundary fixed detector.

However, probe vehicles equipped only with GPS positioning devices do not provide volume-related information and, therefore, flow and density cannot be obtained without strong exogenous assumptions, such as a fundamental diagram or the probe vehicle penetration rate. These variables may be acquired with on-board devices able to supply spacing measurement data, such as the radars in the Connected Car project, which provide the distance to the vehicles observed in the probe vehicles’ surroundings.

In this chapter we first present a recent method for the estimation of the fundamental variables with spacing measurement equipment, which is taken as the starting point for the different implemented approaches. Then, after including some considerations regarding the preliminary adaptations and data processing, we fully describe the implemented procedures based on the reference estimation method, which will be referred to as approaches.

4.1 Reference estimation method

The estimation methodology defined in this project is based on the work of the researchers Toru Seo, Takahiko Kusakabe and Yasuo Asakura in (Seo et al., 2015), which proposes an estimation method for the fundamental variables with data observed by probe vehicles with spacing measurement equipment. Such technology is known as advanced driver assistance systems (ADAS).

These systems, such as adaptive cruise control, collision avoidance and autonomous vehicles, have recently been developed for driving comfort and efficiency purposes. Since ADAS-equipped vehicles must recognize their surroundings and, thus, the space headway, they are expected to be used as a source of volume-related data, which may be collected automatically while driving.

The main objectives of their study were to develop a new probe-vehicle-based traffic estimation method able to derive volume-related variables from probe vehicles observing their surrounding vehicles with on-board equipment, and to verify the proposed method under actual traffic conditions by conducting a field experiment on a single predetermined corridor.

The described method therefore estimates the fundamental variables in a corridor, which is assumed to carry one-way traffic and can include multi-lane, merging and diverging sections. The fundamental variables are determined by applying Edie’s generalized definitions, which, as stated in Section 2.1, are computed in a time-space region based on the trajectories of all vehicles in the region and the area of the region.

As inputs for the estimation method, the researchers use probe vehicle data consisting of the space headway between the probe vehicle and its leading vehicle in the same lane, which is provided by ADAS and allows the leader vehicles’ trajectories to be reconstructed, and the position of the probe vehicle. It is worth noting that these variables are continuously measured.

In order to simplify the procedure, two conditions are presumed. The first is an error-free assumption: the measurements have no error and the route of the probe vehicles is identified without error. The second is a random sampling assumption: probe vehicles are assumed to be randomly distributed in the traffic with an unknown penetration rate, and their driving behavior is assumed to be the same as that of the other vehicles along the corridor.

The authors divide their proposed estimation procedure into two main steps: a first step to discretise the targeted time-space region, and a second step to estimate the fundamental variables from probe vehicle data. Each step is detailed hereafter.

Step 1. The time-space region subject to the traffic state estimation is divided into multiple discrete time-space regions.

Figure 4.1: Time-space region A, trajectories n = 5, n = 8 and n = 12 corresponding to probe vehicles. Source: (Seo et al., 2015)

Any rule can be used to discretise the targeted time-space region. The simplest rule, which is the one employed in this method, consists of dividing it into Eulerian rectangles of identical size. In fact, this is a familiar discretisation in today’s traffic flow data, since fixed-point detectors are installed at a certain space resolution and the data is gathered for predetermined time intervals.

Figure 4.1 presents a time-space diagram and an example of a discrete time-space region A, where the thick solid lines are probe vehicles’ trajectories, the dashed lines are those of non-probe vehicles, and the horizontally hatched regions $s_n$ are the time-space regions between the n-th probe vehicle and its leader.

Step 2. The fundamental variables of every discrete time-space region are independently estimated based on the data collected by probe vehicles in the region.

As stated before, the formulation defined in this method is based on the generalized definitions of the fundamental variables (see Section 2.1 for more details). Given a time-space region A, the flow $q(A)$, the density $k(A)$ and the average speed $v(A)$ within the region, as defined by Edie, are calculated as follows:

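$$q(A) = \frac{d(A)}{|A|} = \frac{\sum_{n \in N(A)} d_n(A)}{\sum_{n \in N(A)} |a_n(A)|}, \tag{4.1}$$

$$k(A) = \frac{t(A)}{|A|} = \frac{\sum_{n \in N(A)} t_n(A)}{\sum_{n \in N(A)} |a_n(A)|}, \tag{4.2}$$

$$v(A) = \frac{d(A)}{t(A)} = \frac{\sum_{n \in N(A)} d_n(A)}{\sum_{n \in N(A)} t_n(A)}, \tag{4.3}$$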

where $d(A)$ is the total distance traveled by all vehicles in region A (veh · km), $t(A)$ is the total time spent by all vehicles in the region (veh · h) and $|A|$ is the time-space area of the region (km · h).

By setting $N(A)$ as the set of all vehicles in region A, these definitions can be expressed as sums over all vehicles $n \in N(A)$ of the distance traveled by each vehicle n in the region, $d_n(A)$, the time spent by each vehicle n in the region, $t_n(A)$, and the time-space region between each vehicle n and its corresponding leading vehicle, $a_n(A)$ (with $a_n(A) = A \cap s_n$, see Figure 4.1).
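The generalized definitions can be sketched in a few lines of Python. The sketch is only illustrative and assumes that the per-vehicle distances and times inside the region have already been extracted from the trajectories; the function name and the sample numbers are hypothetical.

```python
def edie_variables(distances_km, times_h, area_km_h):
    """Flow (veh/h), density (veh/km) and speed (km/h) in a region A.

    distances_km[n] and times_h[n] are the distance traveled and the time
    spent in A by vehicle n; area_km_h is the time-space area |A|.
    """
    d_total = sum(distances_km)      # d(A), veh*km
    t_total = sum(times_h)           # t(A), veh*h
    flow = d_total / area_km_h       # q(A)
    density = t_total / area_km_h    # k(A)
    speed = d_total / t_total        # v(A)
    return flow, density, speed


# Hypothetical region: 0.5 km long and 0.1 h wide, crossed by three
# vehicles that each cover the full 0.5 km in 0.01 h (50 km/h).
q, k, v = edie_variables([0.5, 0.5, 0.5], [0.01, 0.01, 0.01], 0.5 * 0.1)
print(q, k, v)
```

By construction, the three values satisfy q(A) = k(A) · v(A).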

The estimators based on probe vehicle data are defined by replacing $N(A)$ with the set of all probe vehicles in region A, $P(A) \subseteq N(A)$, as follows:

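$$\hat{q}(A) = \frac{\sum_{n \in P(A)} d_n(A)}{\sum_{n \in P(A)} |a_n(A)|}, \tag{4.4}$$

$$\hat{k}(A) = \frac{\sum_{n \in P(A)} t_n(A)}{\sum_{n \in P(A)} |a_n(A)|}, \tag{4.5}$$

$$\hat{v}(A) = \frac{\sum_{n \in P(A)} d_n(A)}{\sum_{n \in P(A)} t_n(A)}. \tag{4.6}$$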
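A minimal Python sketch of these estimators is given below, assuming that the distance, time and headway area of each probe vehicle in the region are already available. The record type, the function name and the sample values are hypothetical and are not taken from (Seo et al., 2015) or from the project scripts.

```python
from dataclasses import dataclass


@dataclass
class ProbeRecord:
    distance_km: float   # d_n(A): distance traveled by probe n in A
    time_h: float        # t_n(A): time spent by probe n in A
    area_km_h: float     # |a_n(A)|: area between probe n and its leader


def probe_estimators(records):
    """Return the estimated flow, density and speed for one region."""
    d = sum(r.distance_km for r in records)
    t = sum(r.time_h for r in records)
    a = sum(r.area_km_h for r in records)
    if a <= 0 or t <= 0:
        raise ValueError("no usable probe data in the region")
    return d / a, t / a, d / t


# Two hypothetical probes crossing a 0.5 km region with spacings of
# roughly 40 m and 48 m (area = spacing * time spent in the region).
records = [
    ProbeRecord(distance_km=0.5, time_h=0.0100, area_km_h=0.0004),
    ProbeRecord(distance_km=0.5, time_h=0.0125, area_km_h=0.0006),
]
q_hat, k_hat, v_hat = probe_estimators(records)
print(q_hat, k_hat, v_hat)
```

Raising an error for an empty region is a design choice of the sketch; regions without probe data simply have no estimate.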

Estimators (4.4) and (4.5) are biased in general, that is, their expected value differs from the true value of the parameter being estimated. The bias in the density and flow estimators can be approximated with the third-order Taylor expansion as (see (Seo et al., 2015) for the derivation of (4.7) and (4.8)):

$$E[\hat{k}(A)] \simeq k(A) + \frac{1}{|P(A)|} \frac{\mathrm{Var}[s_n(A)]}{E[s_n(A)]^3}, \tag{4.7}$$

$$E[\hat{q}(A)] \simeq q(A) + \frac{1}{|P(A)|} \frac{\mathrm{Var}[h_n(A)]}{E[h_n(A)]^3}. \tag{4.8}$$

According to these equations, the bias is inversely related to the number of probe vehicles in the region: the higher the number of probe vehicles in the region, the lower the bias of the estimators. Therefore, the bias tends to be mitigated as the size of the region is increased (i.e. the time/space resolution is lowered) or as the probe vehicle penetration rate is increased. Furthermore, the equations show that the bias also depends on the variance of the vehicles’ mean space headway (for the density) and mean time headway (for the flow), which depends on the traffic flow characteristics. Finally, there is no bias in the speed estimator.
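The way the bias shrinks with the number of probe vehicles can be checked numerically on a toy case. The Python sketch below is a simplification, not the derivation in (Seo et al., 2015): it assumes that each probe contributes an independent mean time headway drawn uniformly from a small set of values, weights all probes equally, and compares the exact expectation of the inverse sample mean with the approximation "true value + variance / (n · mean³)". All names and numbers are hypothetical.

```python
from itertools import product


def exact_expected_inverse_mean(values, n):
    """E[1 / mean(X_1..X_n)] for i.i.d. X uniform on `values`, by enumeration."""
    samples = list(product(values, repeat=n))
    return sum(n / sum(s) for s in samples) / len(samples)


def taylor_approximation(values, n):
    """1/mu + Var[X] / (n * mu^3), the form of the bias term in the text."""
    mu = sum(values) / len(values)
    var = sum((x - mu) ** 2 for x in values) / len(values)
    return 1 / mu + var / (n * mu ** 3)


headways_s = (1.0, 3.0)                                  # hypothetical headways (s)
true_flow = 1 / (sum(headways_s) / len(headways_s))      # 0.5 veh/s
for n in (2, 4, 8):
    exact = exact_expected_inverse_mean(headways_s, n)
    approx = taylor_approximation(headways_s, n)
    print(n, round(exact - true_flow, 5), round(approx - true_flow, 5))
```

In this toy case both the exact and the approximated bias are positive and decrease as n grows, in line with the discussion above.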

4.1.1 Discussions on the method

This method makes it possible to estimate the density, flow and average speed based only on probe vehicle data, which makes the acquisition of wide-ranging data easy. Moreover, the method does not presume a fundamental diagram, an assumption often used in existing studies to estimate the traffic state, so the jam density $k_j$ and the capacity flow $q_{max}$ do not need to be known.

Since this method is independent of a fundamental diagram, the obtained results are robust against unpredictable or uncertain factors of traffic flow, such as traffic incidents. In addition, the results can be used to estimate or calibrate traffic flow models (including fundamental diagrams).

As already noted, there is a trade-off between the resolution of the estimation (in time and space) and the estimation accuracy, as well as a positive relation between the accuracy and precision and the probe vehicle penetration rate. Then, depending on the requirements of the estimation and the number of available probe vehicles, analysts can choose an appropriate resolution in order to maintain the required accuracy and precision. For instance, if there is a sufficiently large number of probe vehicles, a high resolution can be set, which allows the identification of changes in traffic states. If, instead, the number of probe vehicles is small, a low resolution is still valuable, since it allows analysts to acquire some knowledge of the traffic, such as an hourly traffic volume.

However, the proposed method relies on assumptions that may not always be satisfied in the real world, especially the error-free and random sampling assumptions. The error-free assumption can be reasonable except in situations with very sparse traffic in which a leader vehicle is not detected. Although errors in the spacing measurement directly affect the calculation of $a_n(A)$ and the accuracy and precision of the method, if the length of the time-space region A is larger than the expected error in position measurement, errors in $d_n(A)$, $t_n(A)$ and $a_n(A)$ are negligible. In addition, errors in spacing measurement should be small, since ADAS technology has to be accurate enough to enable comfortable driving.

Furthermore, the random sampling assumption may not always be satisfied in the real world, since probe vehicles may have different driving behaviours compared with other vehicles. The estimation biases induced by biases in probe vehicles’ driving behaviour can be explained as follows. Let $r_1$ and $r_2$ be the (positive) coefficients describing the biases on speed and spacing respectively. Then, if the speed of a probe vehicle is $r_1$ times the average speed of all vehicles, the flow and the average speed will be estimated as $r_1$ times the actual state. In the case of the spacing, if it is $r_2$ times the average spacing of all vehicles, the flow and the density will be estimated as $1/r_2$ times the actual state.
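The effect of both coefficients can be reproduced directly on the estimator formulas. The Python sketch below is a simplified reading of the argument: it scales the traveled distance by r1 (faster probes) or the headway area by r2 (larger spacing) while holding the other quantities fixed, which is an assumption of this example. The baseline figures and the coefficient values are hypothetical.

```python
def estimators(d_km, t_h, a_km_h):
    """Flow, density and speed from summed distance, time and headway area."""
    return d_km / a_km_h, t_h / a_km_h, d_km / t_h


# Hypothetical unbiased totals over the probe vehicles of one region.
d, t, a = 1.0, 0.0225, 0.001
q, k, v = estimators(d, t, a)

r1, r2 = 1.10, 1.25                         # hypothetical bias coefficients

# Probes 10% faster than average: distances grow by r1.
q1, k1, v1 = estimators(r1 * d, t, a)

# Probes keeping 25% more spacing than average: headway areas grow by r2.
q2, k2, v2 = estimators(d, t, r2 * a)

print(q1 / q, v1 / v, k1 / k)   # flow and speed scale by r1, density does not
print(q2 / q, k2 / k, v2 / v)   # flow and density scale by 1/r2, speed does not
```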

4.1.2 Field experiment

The authors conducted a field experiment under actual traffic conditions in order to test the proposed estimation method. It took place on an urban expressway in Tokyo (Japan), specifically the Inner Circular Route of the Metropolitan Expressway in the counterclockwise direction, and twenty probe vehicles with spacing measurement devices were employed.

The Inner Circular Route is a ring-shaped urban expressway located in central Tokyo, with a loop length of 14.2 km and most sections with two lanes for cruising and passing. It is worth noting that the survey section was limited to an 11 km stretch because a series of tunnels disturbed the GPS functions. To verify the obtained results, data from a large number of dual supersonic traffic detectors was used. These detectors observe flow, spot speed and spot occupancy directly, from which the density and the average speed were calculated. The time resolution of the detector observations is 1 minute and the average space resolution is 250 m per lane.

Twenty standard-sized passenger vehicles with non-professional drivers were employed as probe vehicles. These vehicles were equipped with GPS loggers and mono-eye cameras on their dashboards to identify and record their position and the spacing to their leader vehicles. Most of the probe vehicles drove three laps on the Inner Circular Route, resulting in 59 laps during the experiment period (42.1 veh/h/lane). According to the detector data, the flow of all vehicles in the survey section was 1255.2 veh/h/lane, which corresponds to a probe vehicle penetration rate of around 3.5%.

Both position and spacing data were collected at intervals of 15 s. Vehicle positions were identified from GPS data, and a simple map-matching procedure was employed to match the geographical coordinates to the closest point on the links of a digital map. The spacing measurement method consisted of identifying the corresponding leading vehicle in the image, measuring the apparent size of its body width, and calculating the spacing from different factors, such as the apparent size, its assumed actual size and the camera’s angle of view.
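The text does not give the exact formula used by the authors, so the following Python sketch only illustrates the general idea with the standard pinhole camera model: the distance to an object of known width is proportional to the focal length divided by its apparent width in the image. The function name, the assumed vehicle width and the camera parameters are hypothetical.

```python
import math


def spacing_from_apparent_width(apparent_width_px, image_width_px,
                                horizontal_fov_deg,
                                assumed_vehicle_width_m=1.7):
    """Distance (m) to a leading vehicle under a pinhole camera model.

    The focal length in pixels follows from the image width and the
    horizontal angle of view; the actual vehicle width is an assumption.
    """
    half_fov = math.radians(horizontal_fov_deg) / 2
    focal_px = (image_width_px / 2) / math.tan(half_fov)
    return assumed_vehicle_width_m * focal_px / apparent_width_px


# Hypothetical camera: 1920 px wide image, 90 degree angle of view,
# leading vehicle body measured as 68 px wide.
spacing_m = spacing_from_apparent_width(68, 1920, 90)
print(round(spacing_m, 2))
```

Errors in the assumed width or in the angle of view propagate proportionally to the spacing, which matches the sensitivity to assumed variables discussed in the text.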

Although both procedures (positioning and spacing) have errors, they may not be significantly biased. Some small errors were found by the authors in the GPS measurement and map-matching procedure, but they do not significantly affect the estimation method (see Subsection 4.1.1 for more details). Regarding the spacing measurement, if the assumed or measured variables (camera angle of view, leading vehicle size, etc.) have errors, the spacing will consequently have errors. However, it is not possible to determine the size of these errors because the corresponding ground truth data is not available. This simple method was chosen because its implementation is relatively easy and it satisfies the requirements of the conducted experiment.

4.2 Previous considerations

Since the estimation method presented in Section 4.1 takes as input data the GPS position of probe vehicles and the space headway to their leading vehicle, and both are provided by the available equipped vehicles, it represents a starting point for the development of an estimation method for the characterization of the traffic state. Furthermore, Connected Car vehicles are able to detect not only their leading vehicles, but also the follower and other vehicles driving in the same or other lanes.

This section covers the factors to be adapted to our particular case and the data processing functions required to proceed with the implementation of the estimation method. As noted in the introduction of the current chapter, we will implement different approaches to deal with the traffic state characterization. These procedures are itemized in the upcoming sections.

4.2.1 Preliminary adaptations

Regarding the first step of the reference estimation method, the targeted time-space region is divided into Eulerian rectangles of identical size. In the case of an urban environment, the intrinsic limitations of the urban road network have to be considered, since its topology induces specific ways of discretising the time-space region.

In fact, given a corridor in an urban network, it seems reasonable to discretise it spatially by considering the stretches between consecutive junctions. Since the length of such stretches may differ, the regions are not equally sized. In addition, the interruptions caused by the traffic light system mean that the flow is not continuous in time, as opposed to freeways. Thus, the length of the time intervals employed to discretise the time horizon will depend on the cycle length. For instance, in L’Eixample (Barcelona’s CBD), the most common cycle length is 91 s and the average length of the sections is 75 m.
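This discretisation can be sketched as a lookup that maps an observation to its discrete time-space region. The Python example below is an illustration under stated assumptions: junctions are treated as points along the corridor, one time interval equals one signal cycle, and the junction positions are invented. Only the 91 s cycle length and the order of magnitude of the section lengths come from the text.

```python
from bisect import bisect_right


def region_index(position_m, time_s, junction_positions_m, cycle_length_s=91.0):
    """Return (section index, time interval index) of an observation.

    junction_positions_m is the sorted list of junction positions along the
    corridor; sections are the stretches between consecutive junctions.
    Returns None when the position lies outside the corridor.
    """
    if not junction_positions_m[0] <= position_m < junction_positions_m[-1]:
        return None
    section = bisect_right(junction_positions_m, position_m) - 1
    interval = int(time_s // cycle_length_s)
    return section, interval


# Hypothetical corridor with three sections of unequal length.
junctions = [0.0, 75.0, 160.0, 230.0]
print(region_index(100.0, 200.0, junctions))   # second section, third cycle
print(region_index(250.0, 10.0, junctions))    # beyond the last junction
```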

Figure 4.2 shows a sketch of the discretisation of the targeted time-space region, which corresponds to the outer square. Adopting Aimsun’s terminology, the corridor is composed of sections (the black blocks) and nodes (the grey ones).

Figure 4.2: Sketch of the discretisation of the targeted time-space region.

Physical intersections are ignored in this discretisation because they mix vehicles driving along the corridor with other vehicles. Moreover, equipped vehicles’ radars are not designed for the detection of vehicles driving perpendicularly. For instance, when the equipped vehicle is stopped at a red traffic light and vehicles in the crossing street are passing by, the probe vehicle may detect them and, therefore, the gathered data may present an unexpected behavior. For these reasons, and for the sake of simplicity, the estimation of the fundamental variables in a discrete time-space region only takes into account the observations made in the section within that region.

The fact that data is gathered every timestamp (half a second) imposes conditions on how to reconstruct vehicles’ trajectories and calculate the required measurements (traveled distance, total time spent and area of the region). The procedure consists of advancing through the timestamps composing the time window, considering at each iteration two consecutive timestamps and their corresponding data, as shown in Figure 4.3.

Figure 4.3: Sketch of a discrete time-space region

Figure 4.3 depicts one of the discrete time-space regions presented in Figure 4.2 and a short stretch of the trajectories of one equipped vehicle and one observed vehicle (its leader), from timestamp t to t + 3. For each pair of consecutive timestamps we can build a quadrilateral by joining the four points and calculate its area. Then, by adding all the calculated areas along the timestamps we obtain $a_n(A)$ for the n-th probe vehicle.
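The accumulation of quadrilateral areas can be sketched in Python as follows. The example assumes that the positions of the probe vehicle and of its leader along the section are known at every timestamp and uses the shoelace formula for the area of each quadrilateral; function names and sample trajectories are hypothetical, and the result is expressed in m · s.

```python
def shoelace_area(vertices):
    """Area of a simple polygon given its vertices in order."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def headway_area(times_s, probe_pos_m, leader_pos_m):
    """Sum of the quadrilaterals between probe and leader trajectories."""
    total = 0.0
    for i in range(len(times_s) - 1):
        quad = [
            (times_s[i], probe_pos_m[i]),
            (times_s[i + 1], probe_pos_m[i + 1]),
            (times_s[i + 1], leader_pos_m[i + 1]),
            (times_s[i], leader_pos_m[i]),
        ]
        total += shoelace_area(quad)
    return total  # m*s


# Hypothetical readings every half second, from timestamp t to t + 3.
times = [0.0, 0.5, 1.0, 1.5]
probe = [0.0, 5.0, 10.0, 15.0]
leader = [20.0, 25.0, 31.0, 37.0]
area_m_s = headway_area(times, probe, leader)
print(area_m_s, area_m_s / 1000 / 3600)   # m*s and km*h
```

Because both vehicles are observed at the same timestamps, each quadrilateral is a trapezoid and the shoelace result coincides with the time step multiplied by the mean of the two spacings.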

Furthermore, the total traveled distance and the total time spent in the region are readily calculated for every pair of consecutive timestamps, by taking the vertical distance between the probe vehicle’s points (represented as crosses in Figure 4.3) for the distance and a single timestamp (half a second) for the time.
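As a small illustration of this bookkeeping, the Python sketch below adds up the distance and the time over consecutive pairs of probe vehicle positions. It assumes positions measured along the section at a fixed half-second step; the function name and the sample values are hypothetical.

```python
def distance_and_time(probe_pos_m, timestep_s=0.5):
    """Distance (m) and time (s) accumulated by a probe vehicle in a region.

    probe_pos_m holds the position along the section at consecutive
    timestamps; every pair of consecutive readings adds one time step.
    """
    steps = list(zip(probe_pos_m, probe_pos_m[1:]))
    distance_m = sum(nxt - cur for cur, nxt in steps)
    time_s = timestep_s * len(steps)
    return distance_m, time_s


# Hypothetical probe positions from timestamp t to t + 3.
d_n, t_n = distance_and_time([0.0, 5.0, 10.0, 15.0])
print(d_n, t_n)
```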

Finally, according to the definition of space headway provided in Section 2.1, distances are calculated between vehicles’ rear bumpers. As shown in Figure 3.1, the origin of the coordinate axes from which the distances are calculated is not located exactly at the rear bumper, but some centimeters further. Thus, it is necessary to adjust the distances between equipped and observed vehicles.

Figure 4.4: Sketch of a discrete time-space region

In Figure 4.4, a simple example consisting of a probe vehicle observing a vehicle in front and a vehicle behind is illustrated. The easiest way to adjust the displayed distances consists of subtracting the Rear distance from the supplied equipped vehicle’s distance and the follower vehicle’s length from the given y-distance. In this way, we locate the equipped vehicle and the follower at their rear bumpers, where the leader is already placed (dashed lines in Figure 4.4). Since the length of the vehicles is not provided, we consider an average vehicle length of 4 meters for all approaches.

4.2.2 Data processing

Apart from the functions connecting the R environment to the database, the functions directly related to the implemented approaches are described next. Basically, they process the available data in order to obtain the information that is not directly supplied.

Initialization

The first stage consists of obtaining the dataframes for both the equipped and observed readings, which contain the data described in Section 3.1. In the script Initialize.R, the equipped dataframe is loaded and slightly adapted for uniformity purposes, for example in its variable names and in the format of the data. Then, the observed dataframe is loaded and some equipped information is added to it, so that each reading contains the data of the observed vehicle and of the corresponding equipped vehicle.

New Variables

The script called NewVars.R contains the functions that create the derived variables. It is worth noting that the Initialize.R script calls some of the functions presented, because the generated variables are required for both the equipped and the observed dataframes. Depending on the input data they need, these functions are run for the equipped dataframe (and subsequently merged with the observed one), or directly for the observed dataframe. The main functions are itemized next:

• Distance from node (distanceFromNode): Although the distance to the section’s downstream node is provided, the natural way to plot trajectories and calculate distances is from the section’s upstream node. We can straightforwardly convert the supplied distance into the distance from the upstream node by subtracting the former from the length of the section, which is obtained from Aimsun.

• Observed lane (estimate_obs_lane): Although Aimsun is able to supply the lane where a vehicle is driving, we implement a function to infer it for the observed vehicles under the assumption that the lane of the corresponding equipped vehicle is provided. The reason for the estimation is to avoid exogenous information when possible. The lane can then easily be calculated from the y−distance, which is provided by the radars, together with the equipped vehicle’s lane and the number and width of the lanes, which for this experiment are taken from Aimsun, but in real applications may come from an integrated cartography.

• Vehicle type (vehicle_type): In this case, the type of an observed vehicle is defined according to its position with respect to the corresponding equipped vehicle. As input variables we consider the observed lane, the equipped lane and the x−distance. If both vehicles are on the same lane, a negative x−distance means that the observed vehicle is behind, and a positive x−distance means that it is in front. Then, the observed vehicle immediately in front is labelled as leader and the one immediately behind as follower. All remaining vehicles, including those driving in different lanes (with respect to the equipped vehicle), are labelled as other.
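The first and third derived variables can be sketched as follows. This is an illustration in Python rather than the R of NewVars.R; the function and argument names are hypothetical, and the reading of "immediately in front/behind" as the nearest same-lane vehicle on each side is an assumption.

```python
from typing import List


def distance_from_node(dist_to_downstream: float, section_length: float) -> float:
    """Distance from the upstream node: section length minus distance to the downstream node."""
    return section_length - dist_to_downstream


def vehicle_types(equipped_lane: int, obs_lanes: List[int], x_dists: List[float]) -> List[str]:
    """Label each observed vehicle as 'leader', 'follower' or 'other'.

    Assumption: the leader is the same-lane vehicle with the smallest positive
    x-distance, the follower the same-lane vehicle with the negative x-distance
    closest to zero.
    """
    labels = ["other"] * len(obs_lanes)
    same_lane = [i for i, lane in enumerate(obs_lanes) if lane == equipped_lane]
    ahead = [i for i in same_lane if x_dists[i] > 0]
    behind = [i for i in same_lane if x_dists[i] < 0]
    if ahead:
        labels[min(ahead, key=lambda i: x_dists[i])] = "leader"
    if behind:
        labels[max(behind, key=lambda i: x_dists[i])] = "follower"
    return labels
```

For an equipped vehicle on lane 2, a second same-lane vehicle further ahead than the leader is labelled other, as is any vehicle on another lane.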

Input Variables

Once the equipped and the observed dataframes are properly completed, we can proceed to define a single dataframe that strictly contains the fields required for the implementation of the approaches. This dataframe is built in the input_variables.R script, and it is called input_vars, since it will serve as input for the approaches. It basically removes the unnecessary variables and consolidates the rest under a common notation.

Approaches

Each approach is implemented in a separate R script, which processes the corresponding input data in order to estimate the traffic state for each discrete time-space region. Thus, the output is a dataframe with the estimated fundamental variables (flow, density and average speed) for each time window and section.

In general terms, every approach iterates over the time windows and the sections, and checks whether there is available data. If not, it is not feasible to provide an estimation and we set the fundamental variables for this discrete time-space region to NA (Not Available). If so, it analyzes whether the available data enables the estimation of the traffic state.
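The common outer loop can be sketched as below, in Python for illustration only. The names are hypothetical. The formulas are also an assumption: the formulation of Section 4.1 is not reproduced here, so the sketch uses Edie's generalized definitions (flow = distance / area, density = time / area, speed = distance / time), with NaN standing in for R's NA.

```python
import math
from typing import Dict, List, Tuple


def estimate_regions(time_windows: List[int], sections: List[int],
                     totals: Dict[Tuple[int, int], Tuple[float, float, float]]) -> List[dict]:
    """One output row per (time window, section).

    `totals` maps a region to its accumulated (area, distance, time);
    regions absent from it have no data and get NaN.
    """
    rows = []
    for tw in time_windows:
        for s in sections:
            q = k = v = math.nan  # NA when no estimation is feasible
            if (tw, s) in totals:
                area, distance, time = totals[(tw, s)]
                if area > 0 and time > 0:
                    # Assumed Edie-style estimators, not taken from this section
                    q, k, v = distance / area, time / area, distance / time
            rows.append({"tw": tw, "section": s, "q": q, "k": k, "v": v})
    return rows
```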

Main script

In this R Markdown script, the above scripts and functions are called. Moreover, it loads the required R libraries, calculates the values of the goodness-of-fit measures and generates the plots for each of the implemented experiments.

4.3 Leader approach

The Leader approach consists of restricting the set of observed vehicles to the leader vehicles only. Although it is a direct implementation of the reference estimation method, we take into account the considerations described in Section 4.2, together with other factors related to data availability described next, for a correct adaptation of the method.

According to the reference method, for each discrete time-space region the trajectories of equipped and leading vehicles are characterized. Then, the total distance travelled by probe vehicles in the region, the total time spent and the area of the strips between probe and leading vehicles are calculated. With this information, the fundamental variables are estimated. In this case, as shown in Figure 4.3, for each equipped vehicle the trajectories are built every two consecutive timestamps, so the approach also iterates over the timestamps at which the equipped vehicle is passing through the region. For this purpose, we define three auxiliary variables initialized to zero for each equipped vehicle, namely area_n, distance_n and time_n, which are updated every timestamp with the calculated values.

As stated in Section 4.2, all procedures iterate over the time windows and the sections in order to estimate the fundamental variables for each discrete time-space region. Then, before the computation of the defined auxiliary variables, for each equipped vehicle within the region we verify that there are at least two readings; otherwise it is not possible to proceed with the estimation (with a single reading a strip cannot be depicted). It is important to note that equipped vehicles’ trajectories can be completely reconstructed, but the leader ones may not, since leader vehicles are not always detected.

Afterwards, for each timestamp we check whether a leader is observed, and store this information in a vector called available_data, whose length corresponds to the number of timestamps spent by the equipped vehicle in the discrete time-space region. Given a timestamp, the corresponding position of the vector is set to 1 if a leading vehicle is detected, and to 0 otherwise.
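A minimal Python sketch of this vector, assuming the leader readings of one equipped vehicle are given as a list with None where no leader was detected (this input layout is hypothetical):

```python
from typing import List, Optional


def available_data(leader_positions: List[Optional[float]]) -> List[int]:
    """1 where a leader was detected at that timestamp, 0 otherwise."""
    return [1 if p is not None else 0 for p in leader_positions]
```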

At this point, we perform some verification regarding the missing data. The preliminary analysis showed some instances where the equipped vehicle suddenly stops detecting its leader vehicle, probably due to some malfunction or precision issues, and data is missing for a single timestamp or a small set of consecutive timestamps. We will refer to this situation as Case (0), and we handle it by imputing an average value instead, as illustrated in Figure 4.5.

Figure 4.5: Case (0): complete intermediate missing data

In this example the equipped vehicle detects its leading vehicle at timestamps t and t + 2, but no data is gathered at t + 1. This missing value may be inferred as half of the distance traveled by the leading vehicle from timestamp t to t + 2. In order to simplify the implementation, this process is only performed for a single missing value between two provided ones.
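The Case (0) imputation can be illustrated with the following Python sketch. The names are hypothetical, and the leader's traveled space is assumed to be stored as a list with None for missing timestamps.

```python
from typing import List, Optional


def impute_single_gaps(positions: List[Optional[float]]) -> List[Optional[float]]:
    """Fill a lone missing value with the midpoint of its two neighbours.

    Neighbours are read from the original list, so runs of two or more
    missing values are left untouched, as in the simplification above.
    """
    filled = list(positions)
    for i in range(1, len(positions) - 1):
        prev, cur, nxt = positions[i - 1], positions[i], positions[i + 1]
        if cur is None and prev is not None and nxt is not None:
            filled[i] = prev + (nxt - prev) / 2.0
    return filled
```

A leader seen at 10 m and 14 m with one gap in between is placed at 12 m, whereas two consecutive gaps stay missing.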

It is worth taking into account that, apart from the first and the last timestamp of an equipped vehicle in a discrete time-space region, which are only used once for calculations, the rest of the timestamps are considered twice. Thus, when we complete the missing leader traveled space at a particular iteration, we have to check afterwards whether data was completed and, consequently, consider the inferred value for the associated calculations. The employed notation is P1 and P3 for the equipped points at timestamps t and t + 1 respectively, whereas for the leader vehicle these points are denoted by P2 and P4.

Hence, after verifying whether data was completed in the previous iteration, which means that P4 was inferred and is taken as P2 in the current iteration, we proceed to check whether the imputation of the leader vehicle distance is required by considering the two timestamps of the iteration and the upcoming one. If so, we complete the missing value (P4) and we update area_n, distance_n and time_n for this equipped vehicle.

If data is completed neither in the previous iteration nor in the current one, because it is not feasible or not required, four different possibilities are defined in terms of the availability of leader data. These cases are displayed in Figure 4.6, where the equipped points are represented by crosses and the leading points by filled circles. The unfilled circles refer to missing data.

In all cases, the main goal is to characterize at each iteration the four points enclosing the area of the strip between the equipped and the leading vehicles for the two consecutive timestamps, as shown in Figure 4.3. Each iteration considers certain timestamps t and t + 1, and for each timestamp the two-dimensional points (time, space) for both the equipped and the leader vehicle. If this characterization is not feasible, the auxiliary variables (area_n, distance_n and time_n) are not updated and the procedure continues, if applicable, with the next timestamps.
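As an illustration of how the four points may translate into one update of the auxiliary variables, the Python sketch below computes the strip area with the shoelace formula over the quadrilateral P1, P3, P4, P2. The names are hypothetical. Taking the distance and time contributions as the equipped vehicle's displacement and the timestamp length follows the description of Section 4.2, but the exact formulas are an assumption.

```python
from typing import Tuple

Point = Tuple[float, float]  # (time, space)


def strip_area(p1: Point, p2: Point, p3: Point, p4: Point) -> float:
    """Shoelace area of the quadrilateral P1 -> P3 -> P4 -> P2."""
    pts = [p1, p3, p4, p2]
    twice_area = 0.0
    for (t0, x0), (t1, x1) in zip(pts, pts[1:] + pts[:1]):
        twice_area += t0 * x1 - t1 * x0
    return abs(twice_area) / 2.0


def strip_contribution(p1: Point, p2: Point, p3: Point, p4: Point) -> Tuple[float, float, float]:
    """(area, distance, time) added to area_n, distance_n and time_n for one strip."""
    distance = p3[1] - p1[1]  # space covered by the equipped vehicle
    time = p3[0] - p1[0]      # one timestamp (half a second in this study)
    return strip_area(p1, p2, p3, p4), distance, time
```

With gaps of 10 m and 12 m at two timestamps half a second apart, the strip is a trapezoid of area 0.5 × (10 + 12) / 2 = 5.5.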

Of the cases shown in Figure 4.6, Case (1) already contains all the required data, since it has been gathered by the equipped vehicle. The rest of the cases, instead, require an additional step in order to check whether it is feasible to characterize the missing data and, consequently, update the auxiliary variables.

As already commented in Section 3.1, the equipped vehicles are not able to detect motionless vehicles. For instance, when the leading vehicle gets closer to a red traffic light, it reduces its speed until it fully stops, and at some instant the equipped vehicle stops detecting it. Although the literature considers a speed threshold of 3 km/h for a vehicle to be considered motionless, in this case we need to slightly increase this value in order to identify whether the observed vehicle is motionless. In fact, an observed vehicle that is reducing its speed may have a last gathered speed greater than 3 km/h, while its speed at the following timestamp is not gathered (the speed is under the threshold and, therefore, the vehicle is almost motionless). The condition that a vehicle is motionless when its speed is less than 3 km/h would then never be satisfied, because such values are not gathered. By increasing the threshold from 3 km/h to 5 km/h the condition may be satisfied for its last collected speed, which allows motionless leading vehicles to be taken into account.

It is worth noticing that the leader vehicle arrives first at the red traffic light and, consequently, starts reducing its speed and, afterwards, sets off again before the equipped vehicle. This behaviour can be easily identified in the time-space diagrams, since at the timestamp after the last detection the probe vehicle is still in motion, and at the first timestamp after the motionless period the equipped vehicle is still motionless.

Figure 4.6: Different cases for data availability for two consecutive timestamps. Panels: 4.6.1 Available data at timestamps t and t + 1; 4.6.2 Available data only at timestamp t; 4.6.3 No available data; 4.6.4 Available data only at timestamp t + 1. Each panel is a time-space diagram of one time window and section showing the equipped vehicle points (P1, P3) and the leading vehicle points (P2, P4) at timestamps t and t + 1.

This motionless situation corresponds to Cases (2), (3) and (4). When the speed of the leading vehicle is less than 5 km/h, it is considered to be motionless, and at some point it ceases to be detected. This situation is illustrated in Case (2) (Figure 4.6.2). This approach implements the easiest way to impute the missing value and obtain the four points: we keep the last gathered travelled space for the leader vehicle until it is detected again.

Then, in Case (3) the traveled distance is missing for timestamps t and t + 1, as shown in Figure 4.6.3. If the leading vehicle is motionless under the established criteria, we impute both missing distance values with the last collected value as in Case (2). Finally, since the leader vehicle sets in motion before the equipped vehicle does, in Case (4) (Figure 4.6.4) it is necessary to check whether the equipped vehicle is still motionless at timestamp t + 1. If so, we assume that the situation corresponds to the motionless period illustrated in Figure 4.7, and we infer the value for the missing leader observation. If not, it may be a different case, such as the equipped vehicle not observing a leading vehicle or a missing observation between two provided ones (process described above), and such an imputation would not be justified.
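The handling of Cases (1) to (4) can be summarized in the Python sketch below. It is not the code of fv_lead.R: the function and argument names are hypothetical, and using the last recorded leader position as the inferred P2 in Case (4) is an assumption consistent with the hold-last-value rule of Cases (2) and (3).

```python
from typing import Optional, Tuple

MOTIONLESS_KMH = 5.0  # raised from the 3 km/h literature value, as argued above


def leader_points(pos_t: Optional[float], pos_t1: Optional[float],
                  last_pos: Optional[float], last_speed_kmh: Optional[float],
                  equipped_stopped_t1: bool) -> Optional[Tuple[float, float]]:
    """Return the space coordinates of (P2, P4), or None if they cannot be characterized."""
    leader_stopped = last_speed_kmh is not None and last_speed_kmh < MOTIONLESS_KMH
    if pos_t is not None and pos_t1 is not None:   # Case (1): both gathered
        return pos_t, pos_t1
    if pos_t is not None:                          # Case (2): hold the value at t
        return (pos_t, pos_t) if leader_stopped else None
    if pos_t1 is None:                             # Case (3): hold the last value twice
        if leader_stopped and last_pos is not None:
            return last_pos, last_pos
        return None
    # Case (4): leader detected again at t + 1
    if equipped_stopped_t1 and last_pos is not None:
        return last_pos, pos_t1
    return None
```

When None is returned, the auxiliary variables are simply not updated for that pair of timestamps.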

Figure 4.7: Equipped and leading vehicles during a motionless period (time-space diagram of one time window and section showing the equipped and leader vehicles, annotated with Cases (1) to (4))

An overview of the motionless situation described above is depicted in Figure 4.7. In this example both vehicles are motionless during 6 timestamps. The last traveled space for the leader vehicle is imputed for the missing ones (the 5 following timestamps), provided that it is motionless (its last recorded speed is less than 5 km/h).

Once the procedure has iterated over all timestamps of the analyzed equipped vehicle in the discrete time-space region, the variables area_n, distance_n and time_n have been calculated. Then, they are added to the general variables for the discrete time-space region being processed, namely area, distance and time, and the approach continues with other equipped vehicles in the region, if applicable. Finally, after iterating over all vehicles, the variables area, distance and time have been updated accordingly, and the traffic state for the discrete region can be estimated by applying the formulation presented in Section 4.1.

The pseudocode of the described approach is presented next, and gives an overview of the corresponding R script fv_lead.R.

Data: input_vars
Result: data.frame with the fundamental variables for each time window and section

foreach tw in time windows do
    foreach s in sections do
        variables area, distance and time set to 0;
        if has_data(tw,s) then
            foreach e in equipped_vehicles(tw,s) do
                variables area_n, distance_n and time_n set to 0;
                if readings(e) ≥ 2 then
                    compute the available_data vector;
                    foreach t in timestamps(tw,s,e) do
                        obtain the equipped points P1 and P3;
                        if Case 0 then
                            obtain P2 and infer P4 with data from t + 2;
                        end
                        if Case 1 then
                            obtain P2 and P4;
                        end
                        if Case 2 then
                            obtain P2 and infer P4;
                        end
                        if Case 3 then
                            infer P2 and P4 from a previous iteration;
                        end
                        if Case 4 then
                            obtain P4 and infer P2;
                        end
                        if P1, P2, P3, P4 characterized then
                            update area_n, distance_n, time_n;
                        end
                    end
                end
                update area, distance, time;
            end
            estimate q, k, vs from area, distance, time;
        else
            set q, k, vs to NA;
        end
        store q, k, vs in the output data.frame;
    end
end

Algorithm 1: Pseudocode for the Leader approach


4.4 Leader-Follower approach

A simple preliminary trajectory analysis for each discrete time-space region, performed after the implementation of the Leader approach, showed some blanks along several leader trajectories, because the leading vehicle was not detected during some single timestamps and/or longer intervals.

Figure 4.8 depicts the trajectories of five equipped vehicles (orange points) in the discrete time-space region defined by section 372 and time window 2. For each equipped vehicle, the gathered leader data is also plotted (see blue points in Figure 4.8a), and it is clear that the missing data along the leader trajectories does not allow them to be completely rebuilt. In addition to the values missing because the leader vehicle was motionless (orange row for the fourth trajectory), other values are missing for unknown reasons (see the first trajectory).

(a) Leader data (b) Leader and follower data

Figure 4.8: Trajectory plot at section 372 and time window 2

By including the follower observations some of the mentioned blanks can be mitigated (see Figure 4.8b), such as the first trajectory of the region, which was not considered in the Leader approach due to the lack of data. This consideration leads to the definition of a new approach that takes into account the follower data when the leader data is not available, but always prioritizes the latter, in order to be as faithful as possible to the reference estimation method. The way to proceed is very similar to the Leader approach, but some extra checks are defined to properly consider the follower data.

Then, the Leader-Follower approach iterates over the time windows within the time horizon and the sections within the corridor, and for each equipped vehicle in the discrete time-space region the available_data vector is computed. Unlike the previous approach, data availability at each timestamp now has 4 possible levels: -1 when neither leader nor follower data are available, 0 for solely leader data, 1 for solely follower data and 2 when both values are available (the equipped vehicle detects both vehicles at the same timestamp).
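The four availability levels map directly onto two detection flags, as in this small Python sketch (the function name is hypothetical):

```python
def availability_level(has_leader: bool, has_follower: bool) -> int:
    """-1: no data, 0: leader only, 1: follower only, 2: both."""
    if has_leader and has_follower:
        return 2
    if has_follower:
        return 1
    if has_leader:
        return 0
    return -1
```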

Regarding the data imputation, when information is missing for a single timestamp but available for both the preceding and the following one (which was defined as Case (0) in the Leader approach), we proceed in the same way by imputing the traveled space between two provided leader observations or two follower observations. If the equipped vehicle observes its leading vehicle at timestamp t and its follower at timestamp t + 2 or vice versa, it is not feasible to complete the missing value. Figure 4.5 (Leader approach) represents a particular case with leader data for timestamps t and t + 2.

Figure 4.9 shows the subcases induced by the consideration of follower data in Case (0). If the follower data is available for timestamps t and t + 2 (Figure 4.9a), the missing one can be set to half of the distance traveled by the follower vehicle between these timestamps. In the case illustrated in Figure 4.9b, the leader data is provided at timestamp t and the follower data at timestamp t + 2, so the missing data cannot be inferred, since the observations are of different types (the same applies to the reverse situation). Finally, if both types of data are available at one or both timestamps (see Figure 4.9c), the vehicle type with 2 readings is the one employed for the completion process (leader data if the observations for both leader and follower vehicles are available).

(a) Follower data (b) Leader and follower data (c) Leader data

Figure 4.9: Case (0): examples for the complete process for missing data
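The choice of vehicle type for the Case (0) completion can be sketched in Python as follows. Representing the observations at t and t + 2 as sets of type labels is a hypothetical layout chosen for illustration.

```python
from typing import Optional, Set


def case0_type(types_t: Set[str], types_t2: Set[str]) -> Optional[str]:
    """Vehicle type observed at both t and t + 2, leader first; None if there is none."""
    for kind in ("leader", "follower"):
        if kind in types_t and kind in types_t2:
            return kind
    return None
```

The returned type decides which pair of observations is averaged, and None means that the value at t + 1 is left missing.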

The checks performed before the discussion of the different cases in terms of the available data are the same for this approach. First, if data was completed in the previous iteration we take the inferred intermediate value for the current one and, together with the other two-dimensional points, use it to update the equipped auxiliary variables (area_n, distance_n and time_n). Then, if it was not completed, we verify whether such an imputation is required and, if so, the auxiliary variables can be updated since the four required points have been determined.

As with the Leader approach, when data was neither completed in the previous iteration nor inferred in the current one, 4 cases are defined depending on the data availability for every pair of consecutive timestamps. In the Leader-Follower approach these situations are considered as general cases, and are divided into subcases according to the feasible combinations of leader and follower data, as explained next. Finally, when all equipped vehicles within the discrete time-space region have been examined and the auxiliary variables properly updated, the approach provides the estimation of the fundamental variables in the region.

Case (1): Available data at timestamps t and t+ 1

Case (1) corresponds to data being available at both timestamps, and for the Leader-Follower approach it is divided into three subcases. The one depicted in Figure 4.10a represents the availability of leader data at timestamps t and t + 1, possibly together with follower data. Since leader data is used wherever possible, it is used to update the variables area_n, distance_n and time_n and the follower values are ignored.

In subcase (b), instead, only follower data is gathered, as shown in Figure 4.10b, and in subcase (c), Figure 4.10c, both types of information are available at timestamp t, but only the follower data at t + 1 (the same applies when swapping t with t + 1). In both subcases the required points for the observed vehicle are computed with the follower values.

(a) Leader and follower (partial) data (b) Follower data (c) Leader (partial) and follower data

Figure 4.10: Subcases for data availability in Case (1)

It is important to note that, in contrast with Case (1) of the Leader approach, the presence of data does not guarantee the update of the auxiliary variables, since a single leader observation at timestamp t and a single follower observation at t + 1 do not give rise to a strip and, therefore, it is not possible to define an area.

Case (2): Available data at timestamp t and no data at t+ 1

The fact that no data is gathered at timestamp t + 1 is probably due to vehicles getting closer to a red traffic light, and consequently reducing their speed until they are no longer detected by the equipped vehicle. This situation comprises three different subcases, one with leader and follower data at timestamp t, and the other two with either one or the other, as shown in Figure 4.11.

(a) Leader data (b) Follower data (c) Leader and follower data

Figure 4.11: Subcases for data availability in Case (2)

In subcases (a) and (b) (Figures 4.11a and 4.11b respectively) the required imputation is analogous to that in Case (2) of the Leader approach. However, it is mandatory to check whether the observed vehicle was motionless; otherwise the interruption in the detection may be due to another reason and the inference would not be justified.

In subcase (c), when both observations are available (Figure 4.11c), the criterion to decide between the leader and the follower is based on their speeds. If the leading vehicle is motionless but the follower is not (assuming the 5 km/h threshold), the latter is considered more reliable and the missing point is calculated by using the follower vehicle. Instead, if both vehicles are motionless, priority is given to the leader data.
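The selection rule of Case (2) can be sketched in Python as below. The names are hypothetical, with None standing for a vehicle type that is not observed at timestamp t. The text does not state what happens in subcase (c) when the leader is still moving, so returning None (no imputation) for that combination is an assumption.

```python
from typing import Optional


def case2_source(leader_speed: Optional[float], follower_speed: Optional[float],
                 threshold_kmh: float = 5.0) -> Optional[str]:
    """Which vehicle's last position is held at t + 1, or None for no imputation."""
    leader_stopped = leader_speed is not None and leader_speed < threshold_kmh
    follower_stopped = follower_speed is not None and follower_speed < threshold_kmh
    if leader_speed is not None and follower_speed is None:   # subcase (a)
        return "leader" if leader_stopped else None
    if follower_speed is not None and leader_speed is None:   # subcase (b)
        return "follower" if follower_stopped else None
    if leader_speed is not None and follower_speed is not None:  # subcase (c)
        if leader_stopped and follower_stopped:
            return "leader"      # both motionless: leader has priority
        if leader_stopped:
            return "follower"    # follower still moving: considered more reliable
        return None              # leader moving: not covered by the text (assumption)
    return None
```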

As in the Leader approach, in all subcases the imputed value for the traveled distance at timestamp t + 1 is the last recorded position, that is, the one of P2. It is worth noting that, theoretically, if equipped vehicles were constantly detecting their leading and following vehicles, the last gathered data would belong to a follower, because it becomes motionless later than the leader. Since vehicles are not necessarily always detected (due to the distance with respect to the equipped vehicle, sudden interruptions, etc.), we cannot assume this behaviour and we need to check the type of the last observed vehicle.


Case (3): No data at timestamps t and t+ 1

This situation corresponds to Case (3) in the Leader approach, and refers to the situation where vehicles are motionless because of a red traffic light. However, if the equipped vehicle is not motionless we cannot assume such a situation, and therefore missing values are not imputed.

Figure 4.12 shows the 2 feasible subcases, depending on the last observed vehicle before the equipped vehicle becomes motionless, either a leader vehicle (Figure 4.12a) or a follower (Figure 4.12b). In either case, we infer the missing information with the last reported traveled space.

(a) Leader data (b) Follower data

Figure 4.12: Subcases for data availability in Case (3)

Case (4): No data at timestamp t and available data at t+ 1

The last case completes the motionless period with the detection of vehicles in motion again. As for the other cases, it is divided into 3 subcases based on the vehicle type available at timestamp t + 1. In subcases (a) and (b) the leader vehicle and the follower, respectively, are observed again (Figures 4.13a and 4.13b respectively). As commented before, the expected behaviour would be for the leader vehicle to be detected before the follower, since it is in front of the equipped vehicle and sets in motion first, but it is not possible to make such an assumption for the reasons already described. In subcase (c) (Figure 4.13c), both vehicles are observed together at timestamp t + 1.

Furthermore, we notice that the equipped vehicle is usually motionless when detecting the leader vehicle again, but it could already be in motion when detecting the follower, as illustrated in Figures 4.13a and 4.13b respectively. In all subcases it is necessary to check whether the vehicle type of the last gathered data matches the one at t + 1 (e.g. if the last collected data corresponds to a leader vehicle, the observation at t + 1 also has to be a leader). If not, the last observed vehicle can be considered as a lost vehicle and the procedure continues with further detected vehicles.

(a) Leader data (b) Follower data (c) Leader and follower data

Figure 4.13: Subcases for data availability in Case (4)

Since the four cases have been adapted from the Leader approach, Algorithm 1 is also appropriate for the Leader-Follower approach by taking into account the described subcases for this procedure. The corresponding R script is called fv_lead_fol.R.

4.5 Extended approach

Going further, the natural extension is to take into account all the vehicles observed in the equipped vehicles’ surroundings, which means that vehicles driving on other lanes and vehicles other than the leader and the follower are considered. The main idea of this approach, which is called the Extended approach, consists of conceiving the observed vehicles as probe vehicles in the region and, therefore, contributors to the estimation of the fundamental variables. Thus, this procedure requires an error-free identification of the observed vehicles in order to be able to completely track them along the corridor.

However, due to the original purpose of the radar technology’s design, this is not the case. As stated in Section 3.1, when the equipped vehicle stops detecting a vehicle and identifies it again after a certain time interval, a new identifier is assigned. This behaviour makes it impossible to fully track all observed vehicles, since several identifiers may correspond to the same vehicle. Since Aimsun supplies a unique identifier for each vehicle, even though the equipped vehicle may assign different identifiers along the vehicle’s detection, this limitation can be overcome for the present project.

In contrast to the previous approaches, a preliminary data processing step to remove duplicated observations is required. First, due to the presence of several equipped vehicles along the same sections that might track the same observed vehicles, more than one reading referring to the same vehicle may appear. For each timestamp, we can easily delete duplicated readings by detecting observations with the same Aimsun identifier, keeping one of them and deleting the rest.
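This first cleaning step can be sketched in Python as follows. The field names timestamp and aimsun_id are hypothetical, and keeping the first reading encountered is an arbitrary choice, since the text only states that one of them is kept.

```python
from typing import Dict, List


def drop_duplicate_readings(readings: List[Dict]) -> List[Dict]:
    """Keep one reading per (timestamp, Aimsun identifier), preserving input order."""
    seen = set()
    kept = []
    for r in readings:
        key = (r["timestamp"], r["aimsun_id"])
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept
```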

Moreover, when an equipped vehicle observes another equipped vehicle, there is no way for the equipped vehicle in charge of the detection to know the type of the vehicle being observed. In this case, both the radar technology and the simulator consider it as a regular observation, and assign an observed identifier, as shown in Figure 4.14. Then, together with the identifier attributed to the equipped vehicle, there are two different records for the same vehicle at a certain timestamp.

Figure 4.14: Observed vehicles by 2 probe vehicles for certain section and timestamp

In the example illustrated in Figure 4.14 there are two equipped vehicles driving in the same section (identifiers 1 and 2), and equipped vehicle 1 is detecting equipped vehicle 2 (indicated with a red arrow). Since it corresponds to an equipped vehicle’s observation at a certain timestamp, Aimsun associates an identifier to the observed vehicle, in this case 1450, which leads to two different readings for the same vehicle.

Thus, for each timestamp and for each equipped vehicle driving within the discrete time-space region at the given timestamp, we perform several checks to determine whether an observation actually corresponds to an equipped vehicle. For this purpose, the following items are evaluated for the observed vehicles at the analyzed timestamp to identify duplicated observations. Finally, if one or more duplicated readings are found, we keep the original information (equipped vehicle) and remove the rest.

• Equipped vehicle in charge of the detection: since an equipped vehicle cannot observe itself, we can ensure that its own observations do not duplicate its information, so these observed vehicles are eliminated from the set of candidates

• Geographical coordinates and lane: for the remaining set of observed vehicles, if the lane and the geographical position, with a tolerance of ±0.0001 for the latitude and ±0.00001 for the longitude, agree with those of the equipped vehicle, the observation is considered a duplicate and is deleted from the corresponding dataframe
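The two checks above can be sketched in Python as follows. The tolerances are the ones quoted in the text; the record layout, the field names (`id`, `detected_by`, `lane`, `lat`, `lon`) and the sample coordinates are hypothetical and only serve the illustration.

```python
LAT_TOL = 0.0001   # latitude tolerance quoted in the text
LON_TOL = 0.00001  # longitude tolerance quoted in the text


def is_duplicate_of_equipped(obs, equipped):
    """True if observation `obs` is a second record of `equipped`."""
    # Check 1: an equipped vehicle cannot observe itself, so its own
    # observations are never candidates.
    if obs["detected_by"] == equipped["id"]:
        return False
    # Check 2: same lane and same position within the tolerances.
    return (
        obs["lane"] == equipped["lane"]
        and abs(obs["lat"] - equipped["lat"]) <= LAT_TOL
        and abs(obs["lon"] - equipped["lon"]) <= LON_TOL
    )


def remove_equipped_duplicates(observations, equipped_vehicles):
    """Drop observations that duplicate an equipped vehicle (same timestamp)."""
    return [
        obs
        for obs in observations
        if not any(is_duplicate_of_equipped(obs, eq) for eq in equipped_vehicles)
    ]


# Situation of Figure 4.14: equipped 1 detects equipped 2 as observation 1450.
equipped = [
    {"id": 1, "lane": 1, "lat": 41.39000, "lon": 2.160000},
    {"id": 2, "lane": 2, "lat": 41.39050, "lon": 2.160500},
]
observations = [
    {"id": 1450, "detected_by": 1, "lane": 2, "lat": 41.39052, "lon": 2.160502},
    {"id": 1451, "detected_by": 1, "lane": 3, "lat": 41.39052, "lon": 2.160502},
]
kept = remove_equipped_duplicates(observations, equipped)
```

Observation 1450 matches equipped vehicle 2 in lane and position and is removed, whereas 1451 is kept because it is in a different lane.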

Once the data has been properly cleaned, the error-free condition may be assumed, subject to small errors in the above procedure, and the approach can be implemented. Although this procedure also iterates over the time windows and the sections to move along the discrete time-space regions, the natural way to extend the Leader and the Leader-Follower approaches consists of iterating over the occupied lanes of the region. In contrast to these approaches, the lanes to be analyzed are not restricted to the ones with equipped vehicles, but also include other lanes with only observed vehicles.

Figure 4.15: Equipped and observed vehicles for a certain timestamp t within a discrete time-space region, where red cars correspond to equipped vehicles

Let us consider the simple example depicted in Figure 4.15, where three equipped vehicles (identifiers 1, 4 and 6) and three observed vehicles (2, 3 and 5) are driving in a 4-lane section during the time interval between timestamps t and t + 3 (after this interval the configuration may change). Then, we can build the trajectories for all vehicles in each lane, as shown in Figure 4.16.

(a) Lane 1 (b) Lane 2 (c) Lane 4

Figure 4.16: Trajectories for each occupied lane in Figure 4.15

The situation in lane 1 is equivalent to the Leader approach, so the calculations employed there are also applicable to this case. It is worth noting that, since no vehicle is detected immediately in front of the leader vehicle, the distance between it and the equipped vehicle enables us to calculate the area of the strip between both vehicles (cross-hatched band in Figure 4.16a), but the traveled distance and the spent time of this leading vehicle do not contribute to the estimators (as occurs in the Leader approach).

Lane 3, instead, only contains a single vehicle (see Figure 4.16b), and therefore none of the previous variables can be calculated. Finally, Figure 4.16c displays the trajectories of three vehicles driving one after the other in lane 4, which define two strips (between vehicles 4 and 5, and between vehicles 5 and 6). Thus, in this lane the traveled distances and spent times of vehicles 4 and 5 contribute to the estimators, but those of vehicle 6 do not because there is no vehicle in front of it.

Regarding the implementation, for each discrete time-space region the Extended approach iterates over the timestamps defining the corresponding time window and, for each of these timestamps, it iterates over the occupied lanes of the section. As with the Leader and the Leader-Follower approaches, three auxiliary variables are defined to update the area between vehicles, the traveled distance and the spent time. In this case, they are updated at every timestamp with the information of all involved lanes, so we call them area_t, distance_t and time_t (initialized to 0 at the beginning of the timestamp iteration).

Then, for each lane it is necessary to check whether there are at least two vehicles in the lane; otherwise it is not possible to update the mentioned variables (see Figure 4.16b). If the number of vehicles in the lane is greater than or equal to 2, the procedure iterates over pairs of consecutive vehicles in space, starting from the one closest to the upstream node (see the example in Figure 4.16c). For the two considered vehicles, the points P1, P2, P3 and P4, defined in the same manner as in the previous approaches, are determined.
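The iteration over consecutive vehicles in a lane can be sketched as below. It is a Python illustration under assumptions: each vehicle is a dict with a hypothetical `distance` field (metres from the upstream node), and the distances in the example are invented; only the vehicle identifiers come from Figure 4.15.

```python
def consecutive_pairs(vehicles):
    """Return (v, w) pairs of consecutive vehicles, ordered from the one
    closest to the upstream node. Fewer than two vehicles yield no pairs."""
    ordered = sorted(vehicles, key=lambda veh: veh["distance"])
    return list(zip(ordered, ordered[1:]))


# Lane 4 of Figure 4.15: vehicles 4, 5 and 6 (distances are made up).
lane_4 = [
    {"id": 6, "distance": 60.0},
    {"id": 4, "distance": 10.0},
    {"id": 5, "distance": 35.0},
]
pairs_lane_4 = [(v["id"], w["id"]) for v, w in consecutive_pairs(lane_4)]

# A lane with a single vehicle defines no strip.
pairs_single = consecutive_pairs([{"id": 3, "distance": 20.0}])
```

In this sketch the rearmost vehicle of each pair plays the role of v and the one ahead the role of w, so the most downstream vehicle of a lane never appears as v, in line with vehicle 6 not contributing in the example.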

In contrast with the Leader and the Leader-Follower approaches, it is not possible to adjust all observed vehicles’ distances to the upstream node because the target point of the radar is unknown. In fact, apart from the leader and the follower vehicles, for which we know the physical point of detection (as the arrows illustrate in Figure 4.4), the point of detection for the rest of the observed vehicles cannot be assumed because it is not supplied.

Figure 4.4 showed how to transform the provided distances into rear distances for the case of a leader and a follower vehicle with respect to the equipped vehicle that detects them. The Extended approach also includes this adjustment for the equipped and follower vehicles (note that it is not necessary to adapt the distance for a leader vehicle, as explained in Subsection 4.2.2). However, for the rest of the vehicles it is not clear how to locate their rear bumper.

For each equipped vehicle we divide its observed vehicles labelled as other into front and rear vehicles, as depicted in Figure 4.17. An imaginary line that crosses the equipped vehicle approximately at its midpoint (orange line) physically separates the vehicles in the rear and the front sets (as indicated in the figure). Then, for the sake of simplicity we assume that vehicles in the rear set are detected from ahead, like the follower vehicle (green vehicle), so the provided distance is transformed into the rear bumper distance by subtracting the average length of the vehicle (with respect to the upstream node). Instead, vehicles in the front set are assumed to be observed like the leader vehicle, and therefore we assume that the rear bumper distance is directly supplied.

Figure 4.17: Observed vehicles classified into the rear and front set

The previous assumption is formulated as an inequality to be checked when calculating the 4 points. If the distance from the upstream node of the observed vehicle is less than the sum of the equipped vehicle’s rear distance and half of the average length (2 m), the observed vehicle is within the rear set and its supplied distance is adjusted as explained above. If not, the vehicle is in the front set.
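The inequality can be sketched in Python as follows. The sketch reads "half of the average length (2 m)" as an average vehicle length of 4 m, which is an interpretation of the text; the function and argument names are hypothetical.

```python
AVG_VEHICLE_LENGTH_M = 4.0  # assumption: half of it is the 2 m quoted in the text


def rear_bumper_distance(observed_distance, equipped_rear_distance,
                         avg_length=AVG_VEHICLE_LENGTH_M):
    """Distance of the observed vehicle's rear bumper from the upstream node.

    Rear set: the supplied distance is assumed to point at the front of the
    vehicle, so the average length is subtracted (as for a follower).
    Front set: the supplied distance is assumed to be the rear bumper already.
    """
    if observed_distance < equipped_rear_distance + avg_length / 2:
        return observed_distance - avg_length  # rear set
    return observed_distance  # front set


rear_case = rear_bumper_distance(95.0, equipped_rear_distance=100.0)
front_case = rear_bumper_distance(120.0, equipped_rear_distance=100.0)
```

With the equipped vehicle's rear bumper at 100 m, a vehicle reported at 95 m falls in the rear set and is moved to 91 m, while a vehicle reported at 120 m is in the front set and keeps its distance.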

Finally, once the approach has iterated over all occupied lanes for a given timestamp, the variables area_t, distance_t and time_t are properly updated and added to the final variables area, distance and time. Then, after iterating over all the timestamps within the discrete time-space region, the previous variables are used to estimate the flow, density and average speed for this region. The pseudocode of the whole procedure corresponds to Algorithm 2 and the associated R script is called fv_all.R.
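The last step, going from the accumulated area, distance and time to the fundamental variables, can be sketched as follows. The sketch assumes Edie's generalized definitions (flow = distance / area, density = time / area, speed = distance / time), which is how the method of Seo et al. (2015) combines these three quantities; the exact expressions used by the thesis are those given in its Chapter 4. The function name, the units and the sample values are assumptions.

```python
def fundamental_variables(area, distance, time):
    """Estimate (q, k, v) for one discrete time-space region.

    area     -- accumulated time-space area between vehicles [m*s]
    distance -- accumulated traveled distance [m]
    time     -- accumulated spent time [s]
    Returns flow [veh/h], density [veh/km] and speed [km/h], or a triple of
    None (the NA of the text) when the region has no usable data.
    """
    if area <= 0 or time <= 0:
        return None, None, None
    q = 3600.0 * distance / area  # veh/s -> veh/h
    k = 1000.0 * time / area      # veh/m -> veh/km
    v = 3.6 * distance / time     # m/s   -> km/h
    return q, k, v


q, k, v = fundamental_variables(area=1000.0, distance=500.0, time=50.0)
```

For these made-up inputs the sketch yields 1,800 veh/h, 50 veh/km and 36 km/h, which satisfy q = k · v as expected from the definitions.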


Data: input_vars
Result: data.frame with the fundamental variables for each time window and section

for each tw in time_windows do
  for s in sections do
    variables area, distance and time set to 0;
    if has_data(tw,s) then
      for each t in time_stamps(tw,s) do
        variables area_t, distance_t and time_t set to 0;
        for l in occupied_lanes(tw,s,t) do
          if amount_vehicles(tw,s,t,l) ≥ 2 then
            for v,w in consecutive_vehicles(tw,s,t,l) do
              for PX in {P1, P3} do
                if vehicle_type(v) in {equipped, leader, follower} then
                  adjust the provided distances as in the Leader and the Leader-Follower approach;
                else
                  if v is in the rear set then
                    modify the distance as a follower vehicle;
                  else
                    the vehicle is in the front set and the distance is not adapted;
                  end
                end
              end
              for PX in {P2, P4} do
                if vehicle_type(w) in {equipped, leader, follower} then
                  adjust the provided distances as in the Leader and the Leader-Follower approach;
                else
                  if w is in the rear set then
                    modify the distance as a follower vehicle;
                  else
                    the vehicle is in the front set and the distance is not adapted;
                  end
                end
              end
            end
            update variables area_t, distance_t and time_t;
          end
        end
        update variables area, distance and time;
      end
    else
      set the q, k, vs to NA
    end
    estimate q, k, vs from area, distance and time;
    store q, k, vs in the output data.frame;
  end
end

Algorithm 2: Pseudocode for the Extended approach


Chapter 5

Results

Once a complete overview of the implemented approaches has been provided, we proceed with the analysis of the results obtained for each of them separately. Another purpose of this chapter is to compare the different approaches in order to determine the improvements achieved as the procedure becomes more sophisticated.

As commented in Subsection 4.1.2, the reference estimation method was tested with data from dual supersonic traffic detectors installed for the physical experiment along the corridor. In this case, since the conducted field experiment was composed of a small fleet of vehicles with the goal of testing the technology under development and, subsequently, the in-progress mobility services, we resort to microsimulation to emulate the data gathered by Connected Car vehicles. The simulation output is considered from now on as the ground-truth data that allows us to evaluate the obtained results.

The researchers in (Seo et al., 2015) carried out a quantitative analysis by calculating some performance indicators for accuracy and precision with different time and space resolutions and penetration rates, and used different visualizations to examine how well the traffic phenomena are reproduced with their method. In addition to these tools, we also consider another goodness-of-fit measure in order to complete the analysis of the results.

Such an analysis also enables us to investigate the dependence between the employed combination of design parameters and the quality of the estimation, as well as how the considered discretization influences the obtained results. Since the number of combinations of the design parameters would give rise to a very extensive analysis, we only present in this thesis the most relevant results by considering two combinations of the design parameters for each approach.

This chapter is organized as follows. First, the main facts of the simulation experiment are itemized in order to understand both the emulation of the approaches’ input data and how the ground-truth data is obtained. Then, the three implemented approaches are analyzed by means of various visualizations and performance indicators.


5.1 Simulation experiment

As stated in Chapter 3, a library of functions has been developed in order to emulate vehicles’ performance. Since the storage of the emulated data is not required for the purposes mentioned in the introduction of this chapter, a specific experiment has been defined in order to test the implemented approaches by including a set of design parameters and storing the emulated data in a database.

In fact, two different experiments have been implemented according to the penetration rate of the equipped vehicles in the network. For this project, penetration rates of 5% and 10% are considered. Furthermore, we define various time resolutions, also known as time windows, of 91, 182 and 364 seconds, which correspond to multiples of the most common cycle length in L’Eixample (91 s).

Although the developed approaches are for a single corridor, the simulation experiments are run for the whole network of the model (L’Eixample). Then, the emulated data is filtered by considering the sections that compose the Arago corridor. Before that, a warm-up period of 30 minutes is included so that the network is in its corresponding state at the time the simulation experiment begins.

Since the data is gathered by the equipped vehicles at every time stamp (0.5 s), the size of the dataframes may grow considerably as the length of the experiment increases. In this case, with a simulation horizon of 30 minutes and a penetration rate of 5%, the equipped dataframe contains 36,274 observations and the observed dataframe 173,695 observations, which corresponds to 4.8 observed vehicles per time stamp and equipped vehicle on average. On the other hand, with a 10% penetration rate the equipped dataframe includes 75,921 records, whereas the observed dataframe comprises 355,559 records, which results in a similar average of observed vehicles (4.6 vehicles per time stamp and equipped vehicle).

Then, given a penetration rate and a time window length, the fundamental variables can be estimated by running one of the approaches described in Chapter 4. With the aim of testing the quality of the obtained results, we consider as ground-truth data the values supplied by Aimsun for the flow, density and average speed. These values are provided for each section within the corridor and for each time window length.

In the following sections the estimated values are compared against ground-truth data. For each approach, two of the six combinations of the design parameters (two levels of penetration rate and three time window lengths) are analyzed in depth, in such a way that every combination is evaluated by one of the approaches. Furthermore, two of the goodness-of-fit measures introduced in Section 2.4 are computed: the Root Mean Square Normalized Error (RMSNE), which is expressed as a percentage, and Theil’s inequality coefficient (U), whose values range from 0 (perfect fit) to 1 (worst fit). It is worth noting that the relation between both measures regarding the performed experiments is out of the scope of this analysis. In fact, these two indicators are considered in order to evaluate the estimated values from two different points of view.
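For reference, both measures can be computed as in the following Python sketch. It assumes the definitions commonly used in traffic simulation calibration (relative errors for the RMSNE, and Theil's U with the sum of both root mean squares in the denominator, which bounds it between 0 and 1); the definitions in Section 2.4 of the thesis remain the authoritative ones. Regions without an estimate (NA) are represented here by `None` and skipped, which is also an assumption.

```python
import math


def _valid_pairs(estimated, ground_truth):
    """Keep only the regions where both values are available."""
    return [(e, g) for e, g in zip(estimated, ground_truth)
            if e is not None and g is not None]


def rmsne(estimated, ground_truth):
    """Root Mean Square Normalized Error, in percent.

    Ground-truth values must be non-zero, since errors are relative to them.
    """
    pairs = _valid_pairs(estimated, ground_truth)
    mean_sq = sum(((e - g) / g) ** 2 for e, g in pairs) / len(pairs)
    return 100.0 * math.sqrt(mean_sq)


def theil_u(estimated, ground_truth):
    """Theil's inequality coefficient: 0 is a perfect fit, 1 the worst fit."""
    pairs = _valid_pairs(estimated, ground_truth)
    n = len(pairs)
    rmse = math.sqrt(sum((e - g) ** 2 for e, g in pairs) / n)
    rms_est = math.sqrt(sum(e ** 2 for e, _ in pairs) / n)
    rms_gt = math.sqrt(sum(g ** 2 for _, g in pairs) / n)
    return rmse / (rms_est + rms_gt)


estimated = [90.0, 110.0, None]      # made-up values; None marks an NA region
ground_truth = [100.0, 100.0, 100.0]
error_pct = rmsne(estimated, ground_truth)
u = theil_u(estimated, ground_truth)
```

The RMSNE penalizes relative deviations, so small ground-truth values weigh heavily, whereas U is normalized by the magnitudes of both series; this is one way to read the remark that the two indicators look at the estimates from different points of view.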


5.2 Leader approach analysis

The combinations of design parameters selected for the Leader approach are the 5% penetration rate with a time window length of 91 s and the 10% penetration rate with 182 s. For each combination two visualizations are considered. The first displays the estimated values in a time-space diagram in order to examine how well the traffic phenomena are reproduced, whereas the second shows the RMSNE range for each discrete time-space region. Finally, we include a table that summarizes the values of the two considered goodness-of-fit measures for all combinations of parameters.

5.2.1 Penetration rate 5% - Time window length 91 s

Figures 5.1, 5.2 and 5.3 present the visualization for both the estimated and the ground-truth data provided by Aimsun. Since the penetration rate is the lowest and the time resolution is the highest, there is a significant number of discrete time-space regions without enough data. This fact leads to NA estimations in the time-space diagram (grey cells), which makes it harder to assess how well the traffic phenomena are reproduced.

Figure 5.1: Flow visualization for the Leader approach with a penetration rate of 5% and a time window length of 91 s

In particular, the ground-truth data in Figure 5.1 shows that the flow in the sections at the end of the corridor is lower than in the previous ones, which may be due to several factors, such as the increase in the number of lanes or the fact that it corresponds to the final part of the corridor within the CBD. In any case, in these sections either there is no available data (or not enough) to allow the estimation of the fundamental variables, or the estimated values differ substantially from the ground-truth ones.

Regarding the density, although it may seem that the visualizations in Figure 5.2 are much more similar, the trend is to systematically underestimate this variable. In fact, the mean estimated density for this experiment is 52.7 veh/km, whereas the ground-truth mean is 88.7 veh/km. Furthermore, a similar behaviour is observed for the flow, albeit less clearly, since the average estimated flow is 1,564 veh/h and the ground-truth average is 1,926 veh/h.
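The size of the underestimation can be made explicit with a short calculation on the means quoted above. The Python sketch below only uses those four reported values; the helper name is hypothetical.

```python
def relative_bias_pct(estimated_mean, ground_truth_mean):
    """Signed relative difference of the estimated mean, in percent."""
    return 100.0 * (estimated_mean - ground_truth_mean) / ground_truth_mean


density_bias = relative_bias_pct(52.7, 88.7)     # veh/km
flow_bias = relative_bias_pct(1564.0, 1926.0)    # veh/h
```

The mean density is underestimated by roughly 41% and the mean flow by roughly 19%, which quantifies why the trend is less marked for the flow.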

Figure 5.2: Density visualization for the Leader approach with a penetration rate of 5% and a time window length of 91 s

Figure 5.3: Speed visualization for the Leader approach with a penetration rate of 5% and a time window length of 91 s

The speed, instead, does not present any pattern (e.g. underestimation, overestimation) at first sight (see Figure 5.3). Regarding the five-number summary, which includes the five most important sample percentiles, we observe that, apart from the minimum and the maximum, which are outliers originating in specific situations, the estimated values are distributed in the same way as the ground-truth values. It is worth noting that Aimsun calculates the average speed of a discrete time-space region by taking into account all vehicles that have traversed the section, that is, when the vehicles leave the section. Hence, vehicles waiting at a red traffic light and not exiting before the time window that defines the region ends are not considered for the calculation. In our case, for all the implemented approaches the data of such vehicles is also taken into account, which may be the reason for the presence of estimated speeds between 0 and 10 km/h.
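A five-number summary (minimum, first quartile, median, third quartile and maximum) can be obtained as in the Python sketch below. Quartile conventions differ between tools, so the values need not coincide exactly with those produced by the thesis' R scripts; skipping NA regions (represented by `None`) and the sample speeds are assumptions.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), ignoring missing (None) values.

    Requires at least two available values.
    """
    data = sorted(v for v in values if v is not None)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Made-up speeds in km/h, with one region lacking an estimate.
summary = five_number_summary([5, 1, 3, None, 2, 4])
```

Comparing the three central values of the estimated and ground-truth summaries, while treating the extremes with care, mirrors the reading of the summary made in the text.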

In terms of the RMSNE, Figure 5.4 illustrates the RMSNE percentage ranges for each discrete time-space region. Since the estimator for the speed is unbiased, one expects lower errors for this fundamental variable, which is consistent with the larger number of green cells in the time-space diagram. By contrast, the other fundamental variables include more cells with an RMSNE greater than 60%. For instance, the high RMSNE values at distance 1762.2 m and at the end of the corridor for these two variables do not necessarily imply high errors for the speed.

Figure 5.4: RMSNE heatmap for the Leader approach with a penetration rate of 5% and a time window length of 91 s

5.2.2 Penetration rate 10% - Time window length 182 s

As we can observe in Figures 5.5, 5.6 and 5.7, an increase in the penetration rate and in the time window length (i.e. the time resolution is lowered) notably reduces the number of NA cells in comparison to the experiment of Subsection 5.2.1. Furthermore, this resolution makes it possible to identify the general trends of the traffic state in the targeted time-space region.

The underestimated values for flow and density are visually recognizable in Figures 5.5 and 5.6. In general, the values obtained for the regions that belong to the stretch between the beginning of the corridor and the 2269.2 m distance are underestimated. At this distance, Aimsun values show that this section is affected by congestion throughout the time horizon, since both flow and density take high values. Nevertheless, the estimations do not reproduce such behaviour. From this section to the end of the corridor, the traffic situation changes into a less busy and less dense state, which results in underestimated and overestimated values depending on the data characteristics (i.e. sparse observations, fewer detected vehicles, etc.).

Figure 5.5: Flow visualization for the Leader approach with a penetration rate of 10% and a time window length of 182 s

Figure 5.6: Density visualization for the Leader approach with a penetration rate of 10% and a time window length of 182 s

In general terms, the five-number summary confirms the underestimation trend, since the quartiles for the estimated data are always below the ones from the ground-truth data. For example, the median and the third quartile for the density estimations are 46.4 and 59.9 veh/km respectively, whereas the ground-truth ones are 79.2 and 107.2 veh/km.

Figure 5.7 verifies that the speed estimator is the one that best reproduces the ground-truth phenomena. In fact, although the ranges of values of some regions may differ from the ground-truth ones, we can recognize the general speed patterns illustrated in the Aimsun time-space diagram. As with the previous variables, the congested situation at 2269.2 m leads to underestimated speed values for all time windows. However, from this point on the estimations are more consistent with the ground-truth ranges than the flow and density estimations are.

Figure 5.7: Speed visualization for the Leader approach with a penetration rate of 10% and a time window length of 182 s

Figure 5.8: RMSNE heatmap for the Leader approach with a penetration rate of 10% and a time window length of 182 s

Finally, the RMSNE heatmaps in Figure 5.8 confirm the estimation errors at the mentioned sections. For instance, at the 2269.2 m distance, most of the reported errors for flow and speed are greater than 60%, whereas the errors for the density are higher than 40%. Moreover, other sections along the corridor, such as those at 700.7 m and 1762.2 m and those from 2269.2 m to the very end of the corridor, are also affected by high errors, especially for the flow and the density.

We notice that at the final part of the corridor and during the second half of the time horizon the errors are not as high as expected from the visualization. The latter mainly displayed heterogeneous cells, which made it difficult to recognize similarities with respect to the layout of the ground-truth values.


5.2.3 Goodness-of-fit measures

Table 5.1 shows a decrease of the RMSNE percentage as the time resolution is lowered, except for the flow and the density with the 364 s time window length. However, we cannot conclude that this approach makes results worse for low time resolutions, since only a single instant has been evaluated. In the case of the speed, both the RMSNE and Theil’s inequality coefficient show an improvement in the results.

          RMSNE (%)                    Theil’s inequality coefficient (U)
TW (s)    Flow     Density   Speed     Flow     Density   Speed
91        67.10    65.87     39.69     0.29     0.41      0.18
182       58.47    59.68     35.60     0.28     0.39      0.17
364       60.02    60.81     30.77     0.28     0.39      0.16

Table 5.1: RMSNE and Theil’s inequality coefficient for the Leader approach with a penetration rate of 5%

          RMSNE (%)                    Theil’s inequality coefficient (U)
TW (s)    Flow     Density   Speed     Flow     Density   Speed
91        69.00    64.34     35.92     0.29     0.41      0.17
182       64.01    60.10     32.07     0.28     0.39      0.16
364       61.42    61.18     30.10     0.29     0.39      0.16

Table 5.2: RMSNE and Theil’s inequality coefficient for the Leader approach with a penetration rate of 10%

Unexpectedly, the experiments with a penetration rate of 10% present, in general, higher RMSNE errors and Theil’s inequality coefficients, as shown in Table 5.2. It is worth noting that the two implemented simulation experiments are independent, i.e., the set of vehicles labelled as equipped in the 5% experiment is not necessarily a subset of the equipped vehicles for the 10% experiment. Thus, the emulated data in this experiment is not an extension of the 5% data, but another dataset containing the data of certain equipped and observed vehicles.

If we compare the different time resolutions, the flow and speed RMSNE percentages improve as the time window length increases, whereas the density RMSNE slightly increases with a time window length of 364 s. In terms of U, the coefficients are very similar to those in the 5% experiment: for the speed it is the closest to 0 (perfect fit), for the flow it is around 0.3, and the highest values, around 0.4, correspond to the density (the hardest fundamental variable to estimate).
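The trends described for Tables 5.1 and 5.2 can be checked mechanically. The Python sketch below copies the RMSNE values from both tables (time windows of 91, 182 and 364 s, in that order) and tests, for each penetration rate and variable, whether the error decreases at every step; the dictionary layout and the helper name are choices made for this illustration.

```python
# RMSNE (%) from Tables 5.1 and 5.2, ordered by time window: 91, 182, 364 s.
RMSNE = {
    5: {
        "flow": [67.10, 58.47, 60.02],
        "density": [65.87, 59.68, 60.81],
        "speed": [39.69, 35.60, 30.77],
    },
    10: {
        "flow": [69.00, 64.01, 61.42],
        "density": [64.34, 60.10, 61.18],
        "speed": [35.92, 32.07, 30.10],
    },
}


def strictly_decreasing(values):
    """True if every value is lower than the previous one."""
    return all(b < a for a, b in zip(values, values[1:]))


trend = {
    rate: {var: strictly_decreasing(vals) for var, vals in table.items()}
    for rate, table in RMSNE.items()
}
```

The result reproduces the statements in the text: with 5% penetration only the speed error decreases at every step, while with 10% the flow and speed errors do and the density error rises again at 364 s.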


5.3 Leader-Follower approach analysis

In the case of the Leader-Follower approach another pair of combinations of design parameters is taken into account. The first one corresponds to a penetration rate of 5% and a time window length of 364 s, whereas the second consists of a 10% penetration rate with the shortest time window length (91 s).

5.3.1 Penetration rate 5% - Time window length 364 s

Although the penetration rate is the lower of the ones considered, the lowest time resolution (364 s) already leads to a smaller number of NA cells; in particular, the estimation cannot be provided for only 5 regions. Together with the size of the discrete time-space regions, which is the largest for the performed experiments, this combination of parameters enables us to analyze how the traffic phenomena are replicated.

Figure 5.9: Flow visualization for the Leader-Follower approach with a penetration rate of 5% and a time window length of 364 s

In general terms, we observe lighter cells for the flow estimation in Figure 5.9, which means that most of the regions’ values are underestimated, as we have already stated for the previous experiments. Additionally, the values for the final part of the corridor do not reproduce the low flow reported by Aimsun, since some of the cells overestimate the ground-truth flow. In fact, in the last part of the corridor the mean of the estimated flow values is 1,562.3 veh/h, while the mean for the ground-truth values is 1,252.3 veh/h. Again, the lack of data leads to poor-quality estimations, which are partially smoothed by the increase of the time window length, but not totally mitigated.

In the case of the density, the underestimation of the values is also visible in Figure 5.10, although it is less evident than for the flow. Furthermore, as with the Leader approach, the average range of the estimated values for the most congested section (at distance 2269.2 m) is approximately half of the ground-truth one, so the traffic behaviour at this point clearly affects the obtained estimations regardless of the employed time resolution.

Figure 5.10: Density visualization for the Leader-Follower approach with a penetration rate of 5% and a time window length of 364 s

Figure 5.11: Speed visualization for the Leader-Follower approach with a penetration rate of 5% and a time window length of 364 s

The speed, instead, reproduces part of the main block of speed values from 30 to 40 km/h in the lower half of the time-space diagram (see Figure 5.11). Although the ranges for the rest of the cells are not always accurately estimated, it is not clear that this corresponds to a trend of underestimated values; it may instead be due to the fact that different data (with respect to Aimsun) is taken into account when estimating the values for each cell. Nevertheless, of the three fundamental variables, this is again the one that best replicates traffic phenomena from a general point of view.

In relation to the RMSNE, the heatmaps in Figure 5.12 exhibit a concordance between flow and density red cells (the discrete time-space regions with high errors for these two variables coincide), but at first sight the density heatmap contains a high number of regions whose RMSNE percentage is greater than 60%. The speed presents several cells with an error lower than 40%, and for the whole time-space diagram the row associated with distance 2269.2 m is the only one with most of the RMSNE values larger than 60%.

Figure 5.12: RMSNE heatmap for the Leader-Follower approach with a penetration rate of 5% and a time window length of 364 s

5.3.2 Penetration rate 10% - Time window length 91 s

As with the experiment in Subsection 5.2.2, a penetration rate of 10% contributes to the mitigation of NA cells, since the number of equipped vehicles is higher and, consequently, so is the number of observed vehicles. However, there are still some blanks within the time-space diagram. Furthermore, this time resolution results in a visualization that is harder to analyze.

As in the experiment described in Subsection 5.2.1, several NA cells correspond to the less busy part of the corridor. For the remainder, the heterogeneity of the ranges of the estimations does not match the ground-truth distribution, since the estimated values are, in general, systematically underestimated.

Figures 5.13 and 5.14 illustrate this behaviour, especially for the density, whose estimated values’ ranges correspond to the lower half of the ground-truth ranges (the maximum values for the estimations are up to 200 veh/km and up to 440 veh/km for the ground-truth ones). In fact, the maximums achieved are 167.9 veh/km with the Leader-Follower approach and 435.8 veh/km for the ground-truth data.

Despite the high time resolution, the visualization for the speed in Figure 5.15 allows us to identify some of the main trends depicted in the ground-truth time-space diagram. However, this estimation agrees with the previous two for the final part of the corridor, because the low flow makes the estimations fluctuate sharply as data availability does. It is worth noting that the minimum value for the estimated speed is not around 0 km/h as in the Leader approach, but closer to the ground-truth one: 22.9 km/h in the Leader-Follower approach and 18.8 km/h for the Aimsun values.

Figure 5.13: Flow visualization for the Leader-Follower approach with a penetration rate of 10% and a time window length of 91 s

Figure 5.14: Density visualization for the Leader-Follower approach with a penetration rate of 10% and a time window length of 91 s

Regarding the resulting errors, the flow and the density present a high number of red cells in Figure 5.16, while for the speed most of the RMSNE values lie between 0 and 40%. As expected, the final sections of the corridor register several RMSNE values greater than 100%.


Figure 5.15: Speed visualization for the Leader-Follower approach with a penetration rate of 10% and a time window length of 91 s

Figure 5.16: RMSNE heatmap for the Leader-Follower approach with a penetration rate of 10% and a time window length of 91 s

5.3.3 Goodness-of-fit measures

To conclude the analysis of the Leader-Follower approach, Tables 5.3 and 5.4 summarize the RMSNE and the U for the performed experiments. In this case, the RMSNE percentage decreases for both penetration rates and for all three variables as the time resolution is lowered.

With respect to the Leader approach, these goodness-of-fit measures indicate an improvement in the obtained results. In addition, this approach also presents worse values for both the RMSNE and U (except for the speed) for the 10% penetration rate experiment, as shown in Table 5.4.


         RMSNE (%)                   Theil’s inequality coefficient (U)
TW (s)   Flow    Density  Speed      Flow    Density  Speed
91       66.51   63.76    38.34      0.28    0.39     0.18
182      58.20   58.34    32.99      0.26    0.37     0.16
364      55.74   54.76    27.34      0.26    0.35     0.14

Table 5.3: RMSNE and Theil’s inequality coefficient for the Leader-Follower approach with a penetration rate of 5%

         RMSNE (%)                   Theil’s inequality coefficient (U)
TW (s)   Flow    Density  Speed      Flow    Density  Speed
91       68.82   62.84    34.06      0.29    0.38     0.17
182      61.40   57.81    29.90      0.27    0.36     0.15
364      56.73   55.55    26.82      0.26    0.35     0.14

Table 5.4: RMSNE and Theil’s inequality coefficient for the Leader-Follower approach with a penetration rate of 10%

Regarding Theil’s inequality coefficient, the value for the density is always below 0.4, whereas for the other variables the coefficients are the same or lower, which also confirms that the estimations improve when follower data is considered whenever the leader data is not available or is less reliable in the sense described in Section 4.4.
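As a reminder of what the two goodness-of-fit measures in Tables 5.3 and 5.4 capture, the following minimal sketch computes them over a list of discrete time-space cells. It assumes the standard definitions of the RMSNE and of Theil's inequality coefficient used in the calibration literature; the function names, the convention of marking NA cells with `None` and the sample values are illustrative assumptions, not the code used in this thesis.

```python
import math


def rmsne(estimated, observed):
    """Root mean square normalized error, in percent.

    Cells without an estimate (None, i.e. NA cells) are skipped.
    """
    pairs = [(e, o) for e, o in zip(estimated, observed) if e is not None]
    squared = [((e - o) / o) ** 2 for e, o in pairs]
    return 100.0 * math.sqrt(sum(squared) / len(squared))


def theil_u(estimated, observed):
    """Theil's inequality coefficient: 0 is a perfect fit, 1 the worst case."""
    pairs = [(e, o) for e, o in zip(estimated, observed) if e is not None]
    n = len(pairs)
    rms_error = math.sqrt(sum((e - o) ** 2 for e, o in pairs) / n)
    rms_estimated = math.sqrt(sum(e ** 2 for e, _ in pairs) / n)
    rms_observed = math.sqrt(sum(o ** 2 for _, o in pairs) / n)
    return rms_error / (rms_estimated + rms_observed)


# Illustrative cell values (not taken from the thesis); the second cell is NA.
estimated = [50.0, None, 150.0]
observed = [100.0, 80.0, 100.0]
print(round(rmsne(estimated, observed), 2), round(theil_u(estimated, observed), 3))
```

Because the RMSNE divides each error by the ground-truth value, it penalizes errors in cells with low ground-truth values more heavily than U does, which is one reason the two measures can rank experiments differently.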

5.4 Extended approach analysis

Finally, in this section we examine the two remaining combinations of the design parameters. We first deal with the 5% penetration rate and the 182 s time window length, and then with the 10% penetration rate and the 364 s time window length.

5.4.1 Penetration rate 5% - Time window length 182 s

Figures 5.17 and 5.18 show that, in general, the values are again underestimated for both variables. Moreover, at first sight the values obtained at the sections previously discussed (700.7, 1762.2 and 2269.2 m) do not appear to be correctly estimated (as with the other approaches).

Furthermore, the number of NA cells is low enough to identify the general patterns of the traffic stream, especially in the case of the density, which partially reproduces, with underestimated values, some trends of the Aimsun time-space diagram of Figure 5.18. However, in terms of their average values, they are very similar to the ones obtained in the previous experiments: 53.7 veh/km for the Extended approach and 88.3 veh/km for the microsimulation experiment. The flow, instead, presents closer averages than expected from the visualization: 1,657.1 and 1,920.9 veh/h respectively.

Figure 5.17: Flow visualization for the Extended approach with a penetration rate of 5% and a time window length of 182 s

Figure 5.18: Density visualization for the Extended approach with a penetration rate of 5% and a time window length of 182 s

As in the other approaches, Figure 5.19 illustrates that the estimations for the speed mostly reproduce the general traffic behaviour. In fact, the obtained results seem more precise and accurate in the visualization. For instance, the mean values for the estimated and ground-truth data are the closest achieved so far: 32.6 and 34.8 km/h respectively.
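The mean values quoted for this experiment can be compared on a common scale by expressing each difference relative to the ground-truth mean. The short sketch below does this arithmetic with the figures given in the text; the helper name `relative_bias` is an illustrative assumption.

```python
# Mean values quoted in the text: (Extended approach, Aimsun ground truth).
MEANS = {
    "flow (veh/h)": (1657.1, 1920.9),
    "density (veh/km)": (53.7, 88.3),
    "speed (km/h)": (32.6, 34.8),
}


def relative_bias(estimated_mean, ground_truth_mean):
    """Signed relative difference; negative values indicate underestimation."""
    return (estimated_mean - ground_truth_mean) / ground_truth_mean


bias = {name: relative_bias(*pair) for name, pair in MEANS.items()}
closest = min(bias, key=lambda name: abs(bias[name]))

for name, value in bias.items():
    print(f"{name}: {100 * value:.1f}%")
print("closest to ground truth:", closest)
```

With these numbers the mean speed is underestimated by roughly 6%, the mean flow by roughly 14% and the mean density by roughly 39%, which agrees with the ordering described in the text.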

Figure 5.19: Speed visualization for the Extended approach with a penetration rate of 5% and a time window length of 182 s

The RMSNE heatmaps summarize some of the comments above. The flow presents some high errors (RMSNE greater than 100%) at the final part of the corridor. However, the sections previously specified are not depicted by red rows; instead, other sections, particularly those at distances 1636.9 and 2290.4 m, contain a higher number of red cells. This may be because these sections correspond to transitions between different traffic situations. The density still has a considerable number of cells with an RMSNE higher than 60%, and the speed heatmap is almost entirely green (RMSNE between 0 and 20%).

Figure 5.20: RMSNE heatmap for the Extended approach with a penetration rate of 5% and a time window length of 182 s
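The reading of the heatmaps (green cells, red cells, rows that are mostly red) can be made reproducible by binning the per-cell RMSNE values. The sketch below is only an illustration: the bin edges mirror the thresholds mentioned in the text (20, 40, 60 and 100%), but the exact colour scale of the figures, the function names and the sample row are assumptions.

```python
from bisect import bisect_right

# Assumed bin edges (percent); they mirror the thresholds quoted in the text,
# not necessarily the colour scale used in the figures.
BIN_EDGES = [20.0, 40.0, 60.0, 100.0]
BIN_LABELS = ["0-20", "20-40", "40-60", "60-100", ">=100"]


def classify_cell(rmsne_percent):
    """Return the error band of one time-space cell, or 'NA' if it has no estimate."""
    if rmsne_percent is None:
        return "NA"
    return BIN_LABELS[bisect_right(BIN_EDGES, rmsne_percent)]


def share_above(row, threshold=60.0):
    """Share of the estimated cells of one section (heatmap row) above a threshold."""
    values = [v for v in row if v is not None]
    if not values:
        return None
    return sum(v > threshold for v in values) / len(values)


# Hypothetical row of RMSNE values for one section across five time windows.
row = [12.5, 75.0, None, 130.0, 61.0]
print([classify_cell(v) for v in row], share_above(row))
```

Ranking the sections by `share_above` would give an objective counterpart to statements such as "the row with most of the RMSNE values larger than 60%".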

5.4.2 Penetration rate 10% - Time window length 364 s

The final combination to be analyzed corresponds to the higher penetration rate and the longer time window length. Although one could expect to obtain the best estimated values (based on the considered goodness-of-fit measures), because there are more equipped and observed vehicles and the time resolution is the lowest, the Leader and the Leader-Follower approaches have already shown that this is not the case. In fact, we need to keep in mind that in this thesis only a single sample is evaluated, so outliers and atypical values are not smoothed out as they would be if several instances were considered.

The first thing we observe in Figures 5.21, 5.22 and 5.23 is that there are no NA cells, that is, the approach was able to estimate the fundamental variables for all discrete time-space regions.

Regarding the flow and the density, the obtained results and their layout are similar to those of the previous experiment (Subsection 5.4.1). The traffic phenomena depicted in the ground-truth time-space diagram are not directly recognizable in the visualizations, but by zooming in one can identify a local correspondence of general patterns between the Aimsun data and the estimations (except for the final part of the corridor). The magnitudes, however, do not correspond, since the estimated values are underestimated.

Figure 5.21: Flow visualization for the Extended approach with a penetration rate of 10% and a time window length of 364 s

Therefore, the estimations may present a systematic error with respect to the ground-truth data, which results in an underestimation of these fundamental variables. However, for the end of the corridor a low time resolution reveals an overestimation of the values in most of the cells. In fact, if we consider the final part of the corridor as in the experiment of Subsection 5.3.1, the mean values of the flow and the density for the Extended approach are 1,867 veh/h and 55.1 veh/km respectively, whereas based on the ground-truth data they are 1,252.3 veh/h and 48.5 veh/km. Thus, an overestimation of these variables may appear in situations with low flow and density.

Let us consider the first quartile (Q1) and the third quartile (Q3) for both fundamental variables. In the case of the flow, for the Extended approach Q1 and Q3 are 1,451 and 1,814 veh/h respectively, whereas the ones provided by Aimsun are 1,543 and 2,324 veh/h. In the case of the density, the estimated Q1 and Q3 are 43.3 and 61.1 veh/km respectively, while the ground-truth quartiles are 53.0 and 107.4 veh/km. As with the previous approaches, the five-number summary shows that the lower the ground-truth value is, the closer the corresponding estimated value is. The considered quartiles are an example of this behaviour, since the two values for Q1 are closer to each other than the values for Q3 are.
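The quartile comparison can be checked directly from the values quoted above. The sketch below expresses, for each quartile, the shortfall of the estimate as a fraction of the ground-truth value; the helper name `relative_shortfall` is an illustrative assumption.

```python
# Quartiles quoted in the text: (Extended approach estimate, Aimsun ground truth).
QUARTILES = {
    "flow": {"Q1": (1451.0, 1543.0), "Q3": (1814.0, 2324.0)},   # veh/h
    "density": {"Q1": (43.3, 53.0), "Q3": (61.1, 107.4)},       # veh/km
}


def relative_shortfall(estimate, ground_truth):
    """Fraction of the ground-truth value that the estimate falls short by."""
    return (ground_truth - estimate) / ground_truth


shortfalls = {
    variable: {q: relative_shortfall(*pair) for q, pair in quartiles.items()}
    for variable, quartiles in QUARTILES.items()
}

for variable, by_quartile in shortfalls.items():
    print(variable, {q: f"{100 * s:.1f}%" for q, s in by_quartile.items()})
```

For both variables the shortfall at Q1 (about 6% for the flow and 18% for the density) is clearly smaller than at Q3 (about 22% and 43%), which is the behaviour described in the paragraph.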

Figure 5.22: Density visualization for the Extended approach with a penetration rate of 10% and a time window length of 364 s

Once again, the speed replicates the traffic phenomena quite well (see Figure 5.23). The main parts of the time-space diagram are correctly reproduced, but at certain sections the estimated values do not match the ground-truth data. Nevertheless, this behaviour may be related to the specific features of the corresponding section, so a further analysis of the characteristics of the sections would be required.

Figure 5.23: Speed visualization for the Extended approach with a penetration rate of 10% and a time window length of 364 s


The RMSNE heatmaps in Figure 5.24 present the same layout as those in Figure 5.20, with the difference that the final part of the corridor is affected by higher errors. Furthermore, the density shows almost completely red rows at three sections different from those mentioned before: 53.9, 1,363.9 and 2290.4 m, which may also correspond to transitions between traffic states. Again, the speed heatmap is almost entirely green.

Figure 5.24: RMSNE heatmap for the Extended approach with a penetration rate of 10% and a time window length of 364 s

5.4.3 Goodness-of-fit measures

Table 5.5 reports the lowest RMSNE and U values of all the implemented approaches. The RMSNE values for the flow and the density are lower, although the errors are very close to each other for the 91 and 182 s time window lengths (flow) and for the 182 and 364 s time window lengths (density). The RMSNE for the speed decreases from one time resolution to the next lower one, and is always below 30%.

In terms of Theil’s inequality coefficient, we highlight the values obtained for the flow and the speed, which are bounded above by 0.23 and 0.13 respectively. The density, instead, presents lower U values than for the Leader and the Leader-Follower approaches, but they are relatively high even for the lowest time resolution.

Although we already expected larger values for both goodness-of-fit measures for the 10% penetration rate, Table 5.6 registers two notably high RMSNE values for the 91 s time window length. The RMSNE values for the flow and the density are almost 100% (99.5 and 99.7% respectively), which may be caused by one or more extreme outliers among the estimated values that make the RMSNE increase considerably. However, since the estimator for the speed is unbiased, these outliers do not affect its RMSNE percentage, which is 26.2%. It is worth noting that Theil’s inequality coefficient is not considerably affected by these values. Compared with the coefficients for the other time window lengths, the flow presents a relatively high U, but for the density and the speed the U is only slightly higher (as expected).
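The different sensitivity of the two measures to a single extreme cell can be illustrated with a toy example. The sketch below assumes the standard definitions of the RMSNE and of Theil's inequality coefficient; all cell values are hypothetical and chosen only to show the mechanism, not to reproduce the thesis data.

```python
import math


def rmsne(estimated, observed):
    """Root mean square normalized error, in percent."""
    squared = [((e - o) / o) ** 2 for e, o in zip(estimated, observed)]
    return 100.0 * math.sqrt(sum(squared) / len(squared))


def theil_u(estimated, observed):
    """Theil's inequality coefficient (0 = perfect fit, 1 = worst case)."""
    n = len(observed)
    rms_error = math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)
    rms_estimated = math.sqrt(sum(e ** 2 for e in estimated) / n)
    rms_observed = math.sqrt(sum(o ** 2 for o in observed) / n)
    return rms_error / (rms_estimated + rms_observed)


# Hypothetical cells: nine dense cells (100 veh/km) underestimated by 40%.
observed = [100.0] * 9
estimated = [60.0] * 9
base = (rmsne(estimated, observed), theil_u(estimated, observed))

# Add one nearly empty cell (2 veh/km) whose estimate is 20 veh/km.
with_outlier = (
    rmsne(estimated + [20.0], observed + [2.0]),
    theil_u(estimated + [20.0], observed + [2.0]),
)

print("RMSNE (%):", round(base[0], 1), "->", round(with_outlier[0], 1))
print("U:", round(base[1], 3), "->", round(with_outlier[1], 3))
```

In this toy case a single cell with a very small ground-truth value multiplies the RMSNE several times over, while U changes only in the third decimal, because U is normalized by the overall magnitude of both series rather than cell by cell.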


         RMSNE (%)                   Theil’s inequality coefficient (U)
TW (s)   Flow    Density  Speed      Flow    Density  Speed
91       57.94   62.65    27.39      0.23    0.39     0.13
182      58.20   58.48    25.12      0.22    0.36     0.13
364      54.91   58.65    21.75      0.21    0.36     0.11

Table 5.5: RMSNE and Theil’s inequality coefficient for the Extended approach with a penetration rate of 5%

The remaining time resolutions report larger values than those for the 5% penetration rate. The RMSNE values for the flow and the density are around 60%, whereas the U coefficients are relatively low (at most 0.24 and 0.35 respectively). In contrast to these results, the speed RMSNE is below 26.7% and the U is at most 0.12, which are similar to the values obtained for the 5% penetration rate, and therefore good results.

         RMSNE (%)                   Theil’s inequality coefficient (U)
TW (s)   Flow    Density  Speed      Flow    Density  Speed
91       99.53   99.67    26.20      0.36    0.39     0.13
182      63.63   60.57    23.71      0.24    0.35     0.12
364      63.36   63.95    21.63      0.23    0.34     0.11

Table 5.6: RMSNE and Theil’s inequality coefficient for the Extended approach with a penetration rate of 10%


Chapter 6

Conclusions

This project presents three different approaches for the estimation of the traffic state in a corridor from data provided by probe vehicles with spacing measurement equipment. The implemented procedures are based on an existing estimation method, which has been adapted for an urban environment and extended because the on-board devices of the available probe vehicles allow the detection of more vehicles in their surroundings. This method does not rely on any exogenous assumptions on the traffic flow characteristics, such as a fundamental diagram, which makes it highly robust against unpredictable or uncertain traffic phenomena.
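To make the idea of an estimator without a fundamental diagram concrete, the following sketch assumes that the reference method applies Edie's generalized definitions (Edie, 1963) to the time-space area between each probe vehicle and its leader, as proposed in (Seo et al., 2015). The class name, the field names and the sample probe are illustrative assumptions and do not reproduce the implementation used in this thesis.

```python
from dataclasses import dataclass


@dataclass
class ProbeObservation:
    """Contribution of one probe vehicle to one discrete time-space region."""
    distance_m: float   # distance travelled by the probe inside the region
    time_s: float       # time spent by the probe inside the region
    area_m_s: float     # time-space area between the probe and its leader


def estimate_region(observations):
    """Return (flow veh/h, density veh/km, speed km/h), or None for an NA cell."""
    total_area = sum(o.area_m_s for o in observations)
    total_time = sum(o.time_s for o in observations)
    if total_area <= 0.0 or total_time <= 0.0:
        return None  # no usable probe data in this region
    total_distance = sum(o.distance_m for o in observations)
    flow = 3600.0 * total_distance / total_area    # veh/s -> veh/h
    density = 1000.0 * total_time / total_area     # veh/m -> veh/km
    speed = 3.6 * total_distance / total_time      # m/s -> km/h
    return flow, density, speed


# Hypothetical probe: 100 m in 10 s while keeping a constant 20 m spacing,
# so the swept time-space area is 20 m * 10 s = 200 m*s.
print(estimate_region([ProbeObservation(100.0, 10.0, 200.0)]))
```

In this example a constant spacing of 20 m corresponds to 50 veh/km, and the three quantities satisfy flow = density × speed by construction. A region that no probe vehicle traverses yields no estimate, which corresponds to the NA cells discussed in the results.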

It is worth noting that it is difficult to establish equivalences in terms of the obtained results between the reference estimation method and the approaches implemented in this thesis due to their different nature. The former was validated with a field experiment consisting of a set of probe vehicles driving exclusively along an urban expressway, whereas the latter were evaluated with a simulation experiment that considers the whole network (Barcelona’s CBD) and is reduced to the targeted corridor by filtering the sections that compose it. In fact, such an experiment requires a calibration of the traffic flow models, such as car-following models, which have a strong influence on the discussed approaches. Furthermore, although the evaluated experiments were run for this thesis, their design was based on the requirements of the Connected Car project, and not tailored to a specific evaluation as in the method proposed in (Seo et al., 2015).

In the case of the reference estimation method, the visualizations for the considered scenario (3.5% penetration rate, 5 min time resolution and 500 m space resolution) showed that the traffic phenomena observed in the traffic detector data could be recognised in the obtained results. For instance, the reproduced dynamic phenomena included the existence of a queue and its propagation. In our particular case, we could not easily identify the correspondence between the estimations and the ground-truth data, since for the flow and the density most of the values were underestimated in the busiest and densest part of the time-space diagram, and some were overestimated in the remaining part. However, since one of the simplifications assumed in the approaches consists of ignoring the intersections between sections, and the traffic does not flow continuously as on the urban expressway (due to traffic lights, pedestrians, etc.), such phenomena were not expected to be clearly visible in our experiments.

Regarding the shape of the discrete time-space regions, in the reference estimation method it is left to the preference of planners and analysts, which allows estimations to be obtained at different time and space resolutions, whereas in an urban network there is no such freedom because its intrinsic limitations need to be taken into account. As already mentioned, some extremely short sections (less than 10 m) were removed from the corridor because it was physically impossible to gather any data inside them, and it has been found that other sections, which were not deleted but whose length is relatively short, tend to present larger errors than longer sections. Thus, the length of the sections of the model seems to play an important role in the estimation errors. Furthermore, the average length of the sections considered in the corridor was 84.5 m, whereas the lowest space resolution in the experiments considered in the reference method was 500 m, which may help to smooth errors since more data may be involved.

In general terms, the estimations obtained in the reference method were more precise and accurate than ours, as the considered visualizations and goodness-of-fit measures showed, but it is important to highlight that both the employed technology and the design of experiments were the most appropriate for the proposed method and for its subsequent analysis. We adapted this methodology to our particular case, which is characterized by a technology under development and a space resolution induced by the microsimulation model. Hence, although the method itself can also be improved, addressing these weaknesses may also contribute to a higher accuracy and precision of the estimated values.

The three implemented approaches build on one another, each improving on the previous one: the visualizations better reproduce the main trends of the traffic stream and the goodness-of-fit measures improve. The Extended approach, the most sophisticated one, is the one reporting the best estimations, but the introduction of the follower data in the direct adaptation of the reference estimation method already yielded an improvement.

In relation to the design parameters, one of the main findings was already expected: a low time resolution yields better estimations. We also expected an improvement of the results with a 10% penetration rate, but the 5% experiment yields better estimated values. As previously stated, the two experiments are not related to each other, and since we consider a single experiment for each, we cannot draw conclusions about performance from a particular case. Therefore, future work may consist of designing more experiments while automating the analysis of the results as much as possible, since for each approach the fundamental variables need to be estimated and later evaluated in order to characterize the improvement achieved with a larger penetration rate. This could also help us to understand the underestimation and overestimation behaviour detected in the visualizations: the conditions that cause it, whether it corresponds to a systematic error, and the reason why values are closer when the ground-truth value is low and differ more when it is larger.
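One way to automate part of that analysis is to summarize each goodness-of-fit measure over several replications of the same design. The sketch below assumes that several replications per design (for example, simulations run with different random seeds) are available; the function names and all numerical values are placeholders, not results of this thesis.

```python
from statistics import mean, stdev


def summarize(values):
    """Mean and sample standard deviation of one measure over replications."""
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


def compare_designs(results):
    """Map each design (penetration rate, time window in s) to (mean, spread)."""
    return {design: summarize(values) for design, values in results.items()}


# Placeholder RMSNE values (%), one per hypothetical replication.
results = {
    (0.05, 364): [55.0, 57.0, 53.0],
    (0.10, 364): [63.0, 50.0, 52.0],
}
summary = compare_designs(results)

for design, (average, spread) in summary.items():
    print(design, f"mean={average:.1f}%", f"sd={spread:.1f}")
```

In this placeholder data both designs have the same mean RMSNE but very different spreads, which illustrates why a single replication per design is not enough to rank the penetration rates.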


Another line of future work consists of improving the Extended approach, for which a simplification in the calculation of space headways is assumed. Although current technology does not provide suitable information to calculate distances from rear bumper to rear bumper, an extensive case-by-case analysis may provide such distances more accurately. The other approaches, instead, already take into account all feasible situations, so it is better to focus on the Extended approach.

Finally, the authors of the reference estimation method propose several lines of future work, such as the improvement of the estimators, the development of a model that considers the interdependencies between consecutive discrete time-space regions by employing a traffic flow model whose parameters are estimated from the probe vehicle data, and the definition of an unbiased estimation method which can be applied to biased probe vehicle data. The first would result in a refinement of the Extended approach and the second could also be explored for our case, but the last requires a better knowledge of the bias in both the estimators and the data.


Bibliography

Arroyo, S. and A. Kornhauser. 2007. Modeling travel time distributions on a road network. In Proceedings of the 11th World Conference on Transport Research.

Bessler, S. and T. Paulin. 2013. Literature study on the state of the art of probe data systems in Europe.

Cassidy, M. and B. Coifman. 1996. The relation between average speed, flow and occupancy and the analogous relation between density and occupancy. Research report, Institute of Transportation Studies, University of California at Berkeley.

Castells-Quintana, D. 2015. Malthus living in a slum: urban concentration, infrastructures and economic growth. IREA Working Papers 201506, University of Barcelona, Research Institute of Applied Economics.

Ciuffo, B., V. Punzo, and M. Montanino. 2012. The Calibration of Traffic Simulation Models: Report on the Assessment of Different Goodness of Fit Measures and Optimization Algorithms. Publications Office.

Edie, L. C. 1963. Discussion of traffic stream measurements and definitions. Organisation for Economic Co-operation and Development.

Gartner, N., C. Messer, and A. Rathi. 2001. Traffic Flow Theory: A State-of-the-art Report. Committee on Traffic Flow Theory and Characteristics (AHB45).

Herrera, J., D. Work, R. Herring, X. Ban, Q. Jacobson, and A. Bayen. 2010. Evaluation of traffic data obtained via GPS-enabled mobile phones: The Mobile Century field experiment. Transportation Research Part C, 18(4):568–583.

Hollander, Y. and R. Liu. 2008. The principles of calibrating traffic microsimulation models. Transportation, 35(3):347–362.


Hourdakis, J., P. G. Michalopoulos, and J. Kottommannil. 2002. A practical procedure for calibrating microscopic traffic simulation models by.

Huber, W., L. M. and R. Ogger. 1999. Extended floating-car data for the acquisition of traffic information. Technical report, BMW Group.

Lighthill, M. J. and G. B. Whitham. 1955. On kinematic waves. II. A theory of traffic flow on long crowded roads. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 229(1178):317–345.

Maerivoet, S. and B. D. Moor. 2005. Traffic flow theory. Technical Report 05-154, Katholieke Universiteit Leuven.

Ou, Q. 2011. Fusing Heterogeneous Traffic Data: Parsimonious Approaches Using Data-data Consistency. TRAIL thesis series. Netherlands TRAIL Research School.

Richards, P. I. 1956. Shock waves on the highway. Operations Research, 4(1):42–51.

Rodrigue, J., C. Comtois, and B. Slack. 2009. The Geography of Transport Systems. Taylor & Francis.

Seo, T., T. Kusakabe, and Y. Asakura. 2015. Estimation of flow and density using probe vehicles with spacing measurement equipment. Transportation Research Part C: Emerging Technologies, 53(0):134–150.

Wardrop, J. G. 1952. Road paper. Some theoretical aspects of road traffic research. Proceedings of the Institution of Civil Engineers, 1(3):325–362.

Wardrop, J. G. and G. Charlesworth. 1954. A method of estimating speed and flow of traffic from a moving vehicle. Proceedings of the Institution of Civil Engineers, 3(1):158–171.

Yuan, Y., J. W. C. van Lint, R. E. Wilson, F. van Wageningen-Kessels, and S. P. Hoogendoorn. 2012. Real-time Lagrangian traffic state estimator for freeways. IEEE Transactions on Intelligent Transportation Systems, 13(1):59–70.
