Data Science

Data science, deep learning, and systems analytics for the planning and management of complex systems/networks.

Integrated infrastructure and environment/energy systems modeling and advanced decision methodologies (e.g., low-carbon and low-emission transportation systems; complex systems of coupled transportation, environment, and energy networks)

Intellectual Merits:

The analysis, design, and operation of complex systems rely extensively on the generation and processing of large datasets, systems analytics, and optimization. At the same time, the large datasets generated by real-time monitoring and model simulations can be analyzed using data analytics techniques to discover new knowledge in the form of useful relationships between design variables, sensitivities of results to different model parameters, etc. As the amount of data generated by these models continues to grow, it becomes clear that big data and high-performance computing must be integrated into the toolkits of systems science and engineering. This sub-area of CUTES research 1) uses data science and deep learning techniques for real-time size-resolved particle emissions modeling; and 2) develops novel systems analytics tools and modern optimization models (e.g., optimal learning, surrogate optimization, etc.) for transportation network design that proactively take environmental externalities into account. In our research in data science and deep learning for real-time emissions modeling, we address the significant modeling challenges (e.g., confounding factors such as weather, changing traffic conditions, and driver behavior that cannot be controlled in on-board emissions measurement; the influence of the aerosol residence time within the exhaust and dilution system, etc.) to develop robust predictive particulate matter (PM) number emissions models.
The intellectual contributions of this research include: (1) delineating the range of second-by-second operating conditions experienced during operations on real-world driving routes and establishing quantitative relationships between engine operating parameters and size-resolved particle emissions; (2) developing robust mobile emission models that quantify the relationships between particle number emissions and transportation variables such as facility type, road grade, and level of congestion; (3) establishing an understanding of the vehicle processes responsible for high-emitting episodes; and (4) developing novel statistical methodologies for mobile emissions modeling that account for multiple sources of variability, correlated measurements, particle size distribution, and nonlinear dynamics arising from inherent nonlinearity as well as variable time lags.

In our research on systems analytics and optimal learning for the management and optimization of complex infrastructure systems/networks, environmental goals are pursued proactively at the strategic transportation infrastructure investment planning stage. In particular, we advocate proactive integration of the emission reduction objective into transportation modeling and network design problems (NDPs). We first develop a comprehensive Bayesian Ranking and Selection (R&S) modeling framework for the single-objective Network Design Problem with Uncertainty (NDPU). This framework is then extended to a Bayesian R&S model for the Multi-Objective discrete Network Design Problem with Uncertainty (MONDPU), an emerging area in transportation planning driven by the need for sustainable transportation systems. We define a multi-objective version of the Knowledge Gradient policy with Correlated Beliefs, which uses a crowding distance metric to ensure the diversity of the Pareto optimal front. Results show that our multi-objective Bayesian R&S model is able to identify a very diverse set of high-quality solutions under a very limited budget, significantly outperforming the benchmark NSGA-II algorithm in both solution quality and practicality. The models bring an innovative statistical learning perspective to NDPU, which has mainly been studied as an optimization problem. The new formulation is intuitive and easily applicable to similar discrete optimization problems such as the Optimal Sensor Location problem, the Uncapacitated Fixed Charge Facility Location problem, etc. The global Bayesian belief structure and the sequential value-of-information sampling policies make the model especially efficient for black-box, gradient-free optimization problems in which the evaluation of each objective value takes up the majority of the computational burden.
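As a rough illustration of how a crowding distance metric can reward diversity along a Pareto front, the sketch below computes the crowding distance in the form commonly used by NSGA-II. This is an assumption: the exact form used in the multi-objective Knowledge Gradient policy is not specified here, and the function name crowding_distance and the example objective values are hypothetical.

```python
import math


def crowding_distance(points):
    """Crowding distance (NSGA-II-style form, assumed here) for each
    objective vector in a non-dominated set.

    points: list of equal-length tuples, one tuple of objective values
    per candidate design. Returns one distance per point; larger values
    mean the point sits in a less crowded region of the front.
    """
    n = len(points)
    if n == 0:
        return []
    num_objectives = len(points[0])
    dist = [0.0] * n
    for k in range(num_objectives):
        # Rank the points along objective k.
        order = sorted(range(n), key=lambda i: points[i][k])
        lo = points[order[0]][k]
        hi = points[order[-1]][k]
        # Boundary points are always kept, so they get infinite distance.
        dist[order[0]] = math.inf
        dist[order[-1]] = math.inf
        if hi == lo:
            continue  # all points tie on this objective; nothing to add
        for r in range(1, n - 1):
            gap = points[order[r + 1]][k] - points[order[r - 1]][k]
            dist[order[r]] += gap / (hi - lo)
    return dist


# Hypothetical two-objective front, e.g. (total travel time, emissions).
front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
distances = crowding_distance(front)
```

A sampling policy could use such distances to break ties between candidates with similar expected value of information, favoring designs in sparsely covered regions of the front.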
We believe that the models themselves, as well as this unique statistical perspective, are of great interest and value to transportation network modelers and simulation optimization practitioners. In a recent study on the sustainable design of transit systems in grid urban networks using the macroscopic fundamental diagram, we propose a continuum approximation model that optimizes the network structure (line spacing and stop spacing) and the operating characteristics (headway and fare) of the transit system by minimizing a linear combination of (1) the generalized cost that users experience in their trips, (2) the operating cost of the transit system for the agency, and (3) the external cost of emissions in the urban region. On this basis, the optimal design of the transit system can be derived by minimizing the total cost of the transportation system in three different network allocation scenarios: (i) mixed network (Bus), (ii) dedicated lanes (Bus Rapid Transit), and (iii) parallel network (Metro).
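One possible way to write the transit design objective described above is sketched below. The notation is assumed for illustration and is not taken from the study: s is the line spacing, d the stop spacing, H the headway, f the fare, C_U, C_A, and C_E are the user, agency, and emission cost terms, and w_U, w_A, w_E are nonnegative weights.

```latex
\min_{s,\,d,\,H,\,f}\; Z(s,d,H,f)
  = w_U\,C_U(s,d,H,f) + w_A\,C_A(s,d,H,f) + w_E\,C_E(s,d,H,f),
\qquad w_U,\, w_A,\, w_E \ge 0
```

Under this reading, the problem would be solved once per network allocation scenario (mixed network, dedicated lanes, parallel network), with the cost terms taking a different form in each. Their functional forms are not given in this summary.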