Projects

Automated Model Checking of Distributed Cyber Physical Systems Software

Developed an automated model translation and generation tool for validation and timing analysis of software components in distributed cyber physical systems, using Timed Automata and UPPAAL. The tool parses the Python source code of each software component into an AST-based representation and merges it with user-specified timing properties, supplied as specially formatted comments, to generate a composite network of timed automata processes. It preserves the communication patterns of the individual components using timed automata features such as channels and synchronization labels. The generated automata can then be loaded into the UPPAAL model checker and verified with formal queries on properties such as stimulus-to-response delay and buffer overflow. The tool was tested on a distributed data streaming and processing workflow under two different deployment architectures. This work is currently under review at the 2023 IEEE International Symposium on Real-Time Distributed Computing.
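As a rough illustration of the parsing step described above, the sketch below uses Python's standard `ast` and `tokenize` modules to pair each function in a component with a timing annotation written as a comment. The `# @timing key=value` format, the function names, and the example component are assumptions made for illustration only; they are not the tool's actual annotation syntax, and the step that generates timed automata is not shown.

```python
import ast
import io
import re
import tokenize

# Hypothetical annotation format (an assumption): "# @timing key=value key=value"
TIMING_RE = re.compile(r"#\s*@timing\s+(.*)")


def timing_comments(source):
    """Map line number -> dict of timing properties found in comments."""
    found = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            match = TIMING_RE.match(tok.string)
            if match:
                # Each item is expected to look like "key=value".
                pairs = (item.split("=", 1) for item in match.group(1).split())
                found[tok.start[0]] = {key: value for key, value in pairs}
    return found


def annotated_functions(source):
    """Pair each function with the timing comment on the line directly above its def."""
    comments = timing_comments(source)
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            result[node.name] = comments.get(node.lineno - 1, {})
    return result


# Made-up component source used only to exercise the sketch.
EXAMPLE = '''
# @timing wcet=5 period=100
def on_sensor(msg):
    publish(msg)

def helper():
    pass
'''

print(annotated_functions(EXAMPLE))
```

Comments are not part of the AST, which is why the sketch reads them through `tokenize` and joins the two views by line number.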

Resilient Deployment Solver for Distributed Software Components

Built an automated deployment framework for optimal allocation of distributed software components to remote edge computing hardware platforms using SMT constraints and linear optimization, subject to fault-tolerance requirements and resource constraints. The input to the solver was a set of user-supplied deployment specifications and hardware resource specifications written in a custom TextX-based language. Two optimization modes were implemented, one for maximizing redundancy and one for minimizing the deployment cost. A dedicated testing environment was created to rapidly deploy and test the solved configurations using Mininet, a virtual network emulator. The testing environment could be controlled to introduce artificial faults at specific network nodes or links at specific time instances using a behavior model script. The effect of the deployment configuration on the fault-tolerance capability of the system was demonstrated on a microgrid energy management system. It consisted of distributed loads, batteries, and electric vehicle chargers in a microgrid connected to a centralized aggregator, which calculated the optimal power allocation by running a convex optimization algorithm on demand predictions from LSTM neural networks. Our work was presented at the 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS).

Generating Sparse Communication Topologies to Improve the Performance and Fault-Tolerance of Distributed Peer-to-Peer Algorithms

Developed a tool to generate virtual communication graphs between distributed peers based on any user-specified topology, using Python and networkx. The topology could either be selected from a library of known types such as star or ring, or be custom built by describing the nodes and links in a custom input language. Distributed peer-to-peer algorithms mostly rely on fully connected communication, which can cause network bandwidth issues as they scale. Sparse communication, where only a subset of the available links is used, can be applied in such scenarios. However, it leads to less information being exchanged, which can degrade the performance of peer-to-peer algorithms. To strike a balance between the two, an algorithm called Bounded Path Dissemination was introduced that dynamically alters the communication topology so that complete information dissemination is achieved within a bounded hop distance in a sparsely connected graph. The algorithm also contained fault-tolerance protocols to reconfigure the graph in the event of node crashes. The work was part of an internship at Siemens and was presented at the 2022 IEEE International Conference on Computer Communications and Networks (ICCCN).
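The trade-off described above, between the number of links in use and the number of hops information needs to reach every peer, can be sketched in a few lines. This is not the Bounded Path Dissemination algorithm itself, only a minimal illustration of the quantity it bounds. It uses plain Python instead of networkx to stay dependency-free, and all function names are hypothetical.

```python
from collections import deque


def ring(n):
    """Adjacency sets for a ring of n peers labelled 0..n-1."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def fully_connected(n):
    """Adjacency sets where every peer is linked to every other peer."""
    return {i: set(range(n)) - {i} for i in range(n)}


def link_count(graph):
    """Number of undirected links (each link appears in two adjacency sets)."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2


def max_hops(graph):
    """Largest shortest-path hop count between any two peers (graph diameter)."""
    worst = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from this peer
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        if len(dist) < len(graph):
            raise ValueError("graph is disconnected")
        worst = max(worst, max(dist.values()))
    return worst


def add_chords(graph, step):
    """Copy of a 0..n-1 labelled graph with an extra link from peer i to peer i + step."""
    n = len(graph)
    denser = {node: set(nbrs) for node, nbrs in graph.items()}
    for i in range(n):
        j = (i + step) % n
        denser[i].add(j)
        denser[j].add(i)
    return denser


for name, graph in [
    ("ring", ring(8)),
    ("ring + chords", add_chords(ring(8), 4)),
    ("fully connected", fully_connected(8)),
]:
    print(name, link_count(graph), max_hops(graph))
# ring 8 4
# ring + chords 12 2
# fully connected 28 1
```

With eight peers, adding four chord links to the ring halves the worst-case hop distance while using fewer than half the links of the fully connected graph.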

Fault-tolerant Loadshedding using a Decentralized Software Platform

Worked with the open-source Resilient Information Architecture Platform for Smart Grids (RIAPS), an integrated communications and control framework for distributed component-based software development and deployment, to design fault-tolerant control algorithms. This work investigated the fault management architecture required across the physical, platform, and application layers to design fault-tolerant systems. It also examined the fault detection and mitigation services that the platform can provide to application developers, who can then combine them with the application logic. These features and services were then demonstrated on a practical loadshedding application, which was augmented with fault-tolerance features within the control logic. The setup was implemented on a modified IEEE 13 bus distribution system simulated using GridLAB-D, with a distributed priority-based load disconnection and reconnection algorithm executed on a set of 32 embedded boards, each controlling a load. The performance of the loadshedding logic was evaluated under various faults in the computing infrastructure. Our work was published in the Journal of Systems Architecture, Volume 109.