Executing State Machine Transitions with PyTransitions

Shubham Pahadia, Aug 10, 2023

This post covers how Kyte’s Surfer Engineering team built the backend for the Surfer App, a core component of the Kyte experience that changes transportation as we know it. For context: Kyte Surfers deliver and retrieve cars from our customers. Over the last year, the Surfer Engineering team has been hard at work improving the delivery experience using the Kyte Surfer App.

Understanding the three surfer workflows:

Once a customer has booked a car, Kyte Surfers perform numerous steps to ensure the customer gets the vehicle as smoothly as possible. Figure 1 details the three main workflows they perform. The delivery flow is the sequence of actions a Surfer performs to prepare and transport a car from the lot to the customer. Conversely, the return flow involves the Surfer bringing the car from the customer back to our lot. The swap is an amalgamation of both processes, where a Surfer delivers a car from our lot and exchanges it with the customer's current vehicle.

Figure 1: Surfer Workflows

During Kyte's initial growth phase, the Surfer workflow relied on communication between our backend and the Slack API. This approach served as an excellent proof of concept, allowing us to focus on entering new markets and enabling us to scale the initial experience faster. However, as we continued to expand and add new market-specific features, we encountered several scalability challenges with the Slack approach. In light of this, we made the strategic decision to develop a dedicated Surfer app and transition away from Slack.

The first step in this transition was a complete backend service refactor. This involved building a state machine and a service that is easy to interact with inside the existing monolith. To achieve this, Kyte leveraged the pytransitions library and applied the State Machine design pattern to overhaul the original Surfer flow implementation. This article dives into the details of how Kyte’s Surfer Engineering team achieved this improvement in our Surfer app development process.

Determining the technical requirements

There were four major requirements that we needed to address when refactoring the service.

Handling multiple integrations

This refactoring process represents the first step in launching a new Surfer app that will handle Surfer-side interactions, including earnings, trip acceptance, Surfer flow, and more. To ensure a smooth transition, the migration of Surfers to our new application was carried out in multiple steps. As a result, the refactor needed to handle interactions between the original Slack implementation and the upcoming new application that we’ve built.

Avoiding strong coupling

The Surfer flow is deeply interconnected with several Kyte services, including customer-facing notifications and our delivery/return status service. This strong coupling was not sustainable and could have led to significant challenges if there were major changes in the flow due to experimentation, market-specific feature requests, or external modifications in the services the Surfer interacts with during the flow. The best way to achieve operational reliability was to implement a facade layer. This allowed us to make unanticipated changes, add new steps, and create an abstraction without affecting other services or teams.

Covering all processes

As seen in Figure 1, the swap flow, which occurs periodically when vehicle maintenance issues arise, can be seen as a combination of the delivery and return processes. However, the first implementation treated swaps as a completely separate entity, so any change made to the delivery or return flow had to be duplicated in the swap flow as well. To improve our support for swaps, the refactor let us reuse specific delivery and return steps and build the desired functionality without additional overhead.

Building an audit trail

The final challenge to address was the absence of an audit trail. Given the operational nature of our business, we relied on our Customer Support team to track state changes manually. Additionally, the lack of sufficient data prevented us from identifying bottlenecks across Product and Ops. Better data would provide valuable insights into improving business operations and help us continue to grow. To rectify this and reduce our dependency on manual updates, we established a platform that captures the various leg transitions, which enabled us to generate event log data containing all relevant information collected from the Surfer flow.
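As a rough illustration of what transition event log data could look like, the sketch below records one event per leg transition and derives how long a leg spent in each step. The `LegTransitionEvent` fields, the `step_durations` helper, and the step names are all assumptions made for this example; the article does not show the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass(frozen=True)
class LegTransitionEvent:
    """One audit-trail row. All field names are hypothetical."""
    leg_uuid: str
    from_step: str
    to_step: str
    flow_type: str
    occurred_at: datetime


def step_durations(events: List[LegTransitionEvent]) -> Dict[str, timedelta]:
    """Time spent in each step, derived from consecutive events of one leg."""
    ordered = sorted(events, key=lambda e: e.occurred_at)
    return {
        earlier.to_step: later.occurred_at - earlier.occurred_at
        for earlier, later in zip(ordered, ordered[1:])
    }


# Two transitions of the same (made-up) delivery leg, 25 minutes apart.
start = datetime(2023, 8, 10, 9, 0, tzinfo=timezone.utc)
events = [
    LegTransitionEvent("leg-1", "prepare-car", "drive-to-customer", "delivery", start),
    LegTransitionEvent("leg-1", "drive-to-customer", "handover", "delivery",
                       start + timedelta(minutes=25)),
]
durations = step_durations(events)
```

Aggregating such durations per step across many legs is one way a log like this could surface bottlenecks.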

Implementing the state machine

Brief primer on State Design Pattern and pytransitions

Since Surfers follow a series of steps during the handover process for deliveries and returns, we opted to employ the state design pattern and build a finite state machine for the backend redesign. The fundamental concept is that at any given time, the program is in exactly one of a finite number of distinct states. Each state dictates its own behavior and actions, and once certain conditions are met, the machine transitions to the next possible state until it reaches the end. We used the pytransitions library to implement these workflows as state machines.

In the code snippet below, we define the state machine's states by either passing a list of strings to the Machine instance or defining each State object and adding it in. Once added, each state gets enter and exit callback functions that are called whenever the state machine enters or leaves that state, and these functions perform all the actions of the state. The add_transition function registers the possible next-state transitions and validations for a given state. The lightweight nature of the library, combined with its numerous extensions, was beneficial during the prototyping phase. Over time, however, we grew concerned about scalability and maintainability: a state was responsible both for transitioning and for performing actions, and this entanglement of responsibilities would lead to poor testability and reusability.

from transitions import Machine, State


def state_a_on_exit():
    # do something when leaving state-a
    pass


# States
states = [State(name="state-a", on_exit=[state_a_on_exit]), "state-b", "state-c"]

# Initialize the state machine
machine = Machine(states=states, initial="state-a")
machine.add_transition(trigger="next", source="state-a", dest="state-b")

Building an abstraction layer

To address these challenges with the pytransitions library, we constructed an abstraction layer above it and an additional facade layer over the state machine to interface with the rest of our systems. An abstraction layer is a streamlined interface that lets external services engage with lower-level components while hiding their implementation details. Its main purpose is to promote modularity and improve code maintainability.

As a company, we prioritize the ability to switch swiftly between libraries or services based on factors like cost, performance, and resources. An outdated third-party library can pose a security risk if it blocks the upgrade of other dependencies or has vulnerabilities of its own. To pivot gracefully and quickly if we are exposed to this risk, we wrap the third-party library within our own interface, centralized in one abstraction layer. This streamlines maintenance because updates to and integration with the third-party library are confined to the code that interfaces with it, so we can switch easily.
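A minimal sketch of the wrapping idea, under stated assumptions: `FlowEngine`, `DictFlowEngine`, `hand_over_car`, and the step names are hypothetical and not taken from Kyte's codebase. Callers depend only on the in-house interface, so the implementation behind it (here a plain dictionary, in practice something delegating to pytransitions) can be replaced without touching them.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set


class FlowEngine(ABC):
    """Hypothetical in-house interface; callers never import the library."""

    @property
    @abstractmethod
    def current_step(self) -> str:
        ...

    @abstractmethod
    def advance(self, dest: str) -> None:
        ...


class DictFlowEngine(FlowEngine):
    """Stand-in implementation backed by a dict of allowed moves."""

    def __init__(self, allowed: Dict[str, Set[str]], initial: str) -> None:
        self._allowed = allowed
        self._current = initial

    @property
    def current_step(self) -> str:
        return self._current

    def advance(self, dest: str) -> None:
        if dest not in self._allowed.get(self._current, set()):
            raise ValueError(f"cannot move from {self._current!r} to {dest!r}")
        self._current = dest


def hand_over_car(engine: FlowEngine) -> str:
    # Caller code is written against FlowEngine only.
    engine.advance("handover")
    return engine.current_step


engine = DictFlowEngine({"drive-to-customer": {"handover"}}, initial="drive-to-customer")
result = hand_over_car(engine)
```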

Another major benefit of an abstraction layer is the simplicity that comes from modular design. Abstraction layers are invaluable for testing: specific details can be replaced with alternatives, which isolates the area under test and makes it easy to create test doubles. This approach also improves reusability and code readability for other engineers on the project, as they only need to understand the abstraction layer rather than the entire system.
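The test-double idea can be sketched as follows. `RecordingFacade` and `ArrivalBehavior` are hypothetical names, and the `"surfer_arrived"` handler name is made up; only the `send_customer_notification` call shape is borrowed from the behavior snippet later in this article.

```python
class RecordingFacade:
    """Test double: records calls instead of sending real notifications."""

    def __init__(self):
        self.sent = []

    def send_customer_notification(self, leg_uuid, notification_handler_name):
        self.sent.append((leg_uuid, notification_handler_name))


class ArrivalBehavior:
    """Hypothetical behavior that only talks to the facade it is given."""

    def __init__(self, facade, leg_uuid):
        self.facade = facade
        self.leg_uuid = leg_uuid

    def on_enter_actions(self, **kwargs):
        self.facade.send_customer_notification(
            leg_uuid=self.leg_uuid,
            notification_handler_name="surfer_arrived",
        )


# The behavior is exercised in isolation: no state machine, no real services.
facade = RecordingFacade()
ArrivalBehavior(facade, "leg-123").on_enter_actions()
```

Because the behavior receives its facade from outside, the test can assert on `facade.sent` without any network or database access.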

For the Surfer flow, we built our abstraction layer by taking the definition of a state from pytransitions and breaking it down into two major components, separating the responsibility of transitioning between states from the actions performed within a state. We created an abstract BaseStep class that handles the transitions, while the BaseBehavior class is responsible for the actions performed. The flowchart in Figure 2 shows how the whole system operates in unison. Steps and behaviors inherit from the BaseStep and BaseBehavior abstract classes, respectively, and implement the required functions. In terms of the state machine and pytransitions, various actions are performed as the Surfer steps through our flow. Examples of these actions are sending notifications to the customer or the Surfer, or getting directions to the handover address and driving there. Behaviors are consistent within a given state, but they do not care about the next or previous state.

Figure 2: Kyte’s State Machine Implementation Flowchart
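The split between steps and behaviors might look roughly like the sketch below. The real BaseStep and BaseBehavior classes are not shown in this article, so the method names, the constructor, and the two concrete classes are assumptions chosen to match the snippets in the following sections.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional, Tuple

# A transition is a (destination step name, optional condition) pair.
Transition = Tuple[str, Optional[Callable[..., bool]]]


class BaseBehavior(ABC):
    """Actions performed within a state; unaware of neighbouring states."""

    def on_enter_actions(self, **kwargs) -> None:
        """Runs when the state is entered. No-op unless overridden."""

    def on_exit_actions(self, **kwargs) -> None:
        """Runs when the state is left. No-op unless overridden."""


class BaseStep(ABC):
    """Owns the transition logic and delegates all actions to a behavior."""

    def __init__(self, behavior: BaseBehavior) -> None:
        self.behavior = behavior

    @property
    @abstractmethod
    def name(self) -> str:
        ...

    @property
    @abstractmethod
    def transitions(self) -> List[Transition]:
        ...


class NotifyCustomerBehavior(BaseBehavior):
    def __init__(self) -> None:
        self.log: List[str] = []

    def on_enter_actions(self, **kwargs) -> None:
        self.log.append("customer notified")


class DriveToCustomerStep(BaseStep):
    @property
    def name(self) -> str:
        return "drive-to-customer"

    @property
    def transitions(self) -> List[Transition]:
        return [("handover", None)]


step = DriveToCustomerStep(NotifyCustomerBehavior())
step.behavior.on_enter_actions()
```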

What is a step?

Steps can be considered the actual states that are registered within a state machine using pytransitions. They are responsible for transitioning between states. Each step has a behavior registered to it based on parameters such as the market and the flow type. Pytransitions dynamically initializes the callback functions for a state, so we built the step class on top of it to add validations and improve testing without having to deal with the state definitions within pytransitions. Each step contains a transitions property that lists the next steps the state machine can proceed to, as seen in the code snippet below. When we call the state.next function, it finds the first step that meets the conditions for transitioning and then uses the pytransitions.next function to transition to that step.

class StepA(BaseStep):
    @property
    def name(self):
        return "step-a"

    @property
    def transitions(self):
        return [("step-b", None)]


class StepB(BaseStep):
    @property
    def name(self):
        return "step-b"

    @property
    def transitions(self):
        return [
            ("step-c", self.condition_1),
            ("step-d", None),
        ]

    def condition_1(self, **kwargs):
        # implement some condition to decide if it should go to step-c
        return False
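The "first step that meets the conditions" lookup could be sketched like this. `resolve_next_step` and `car_needs_cleaning` are hypothetical, and treating a `None` condition as "always allowed" is an assumption based on the snippet above.

```python
from typing import Callable, List, Optional, Tuple

Transition = Tuple[str, Optional[Callable[..., bool]]]


def resolve_next_step(transitions: List[Transition], **kwargs) -> Optional[str]:
    """Return the first destination whose condition passes, or None.

    A condition of None is treated as unconditional (an assumption).
    """
    for dest, condition in transitions:
        if condition is None or condition(**kwargs):
            return dest
    return None


def car_needs_cleaning(is_clean: bool = True, **kwargs) -> bool:
    # Made-up condition standing in for condition_1.
    return not is_clean


step_b_transitions = [("step-c", car_needs_cleaning), ("step-d", None)]

dirty_car_next = resolve_next_step(step_b_transitions, is_clean=False)
clean_car_next = resolve_next_step(step_b_transitions, is_clean=True)
```

Order matters in this design: the unconditional entry works as a fallback only because it is listed last.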

What is a behavior?

Behaviors are the meat of our state machine design and are where all the actual work takes place. A behavior can execute an action at two points within a state: when entering it or when exiting it. The on_enter_actions and on_exit_actions functions allow us to write unit tests for these actions, as they are now abstracted away from the generic state machine creation done by the pytransitions library. The sole responsibility of the pytransitions on_enter and on_exit callbacks is now to call these functions, which we can test, modify, and reuse. This way, we can repurpose a Behavior implementation by registering it with different Steps for different flows.

Besides state actions, there is an on_validation function that checks that all required actions have been completed before transitioning. In the code snippet below, if the contract remains unsigned, a DeliveryUnsignedContractException is raised to prevent the transition. The front end can use this exception to notify the Surfer that the customer needs to sign the contract before proceeding to the next step.

class DeliveryUnsignedContractBehavior(BaseLegFlowStepBehavior):
    def on_enter_actions(self, **kwargs) -> None:
        self.facade.send_customer_notification(
            leg_uuid=self.model.leg_uuid,
            notification_handler_name="delivery_unsigned_contract"
        )

    def on_validation(self, **kwargs) -> None:
        rental_agreement = self.facade.get_rental_agreement(self.leg.trip.uuid)
        if not rental_agreement or not rental_agreement.is_signed:
            raise DeliveryUnsignedContractException()

    def on_exit_actions(self, **kwargs) -> None:
        self.facade.notify_signed_contract(leg_uuid=self.leg.uuid, skip_trip_channel=self.is_app)
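One way a caller might use on_validation to gate a transition is sketched below. The `try_advance` helper, the `code` attribute, the response dictionary, and the simplified `ContractBehavior` are all assumptions; the exception class here is a local stand-in for the one named above.

```python
class DeliveryUnsignedContractException(Exception):
    """Local stand-in for the exception named in the article."""
    code = "delivery_unsigned_contract"  # hypothetical error code


class ContractBehavior:
    """Simplified behavior: validation depends only on a boolean flag."""

    def __init__(self, is_signed: bool) -> None:
        self.is_signed = is_signed
        self.exit_ran = False

    def on_validation(self, **kwargs) -> None:
        if not self.is_signed:
            raise DeliveryUnsignedContractException()

    def on_exit_actions(self, **kwargs) -> None:
        self.exit_ran = True


def try_advance(behavior, **kwargs) -> dict:
    """Validate first; run exit actions only if validation passes."""
    try:
        behavior.on_validation(**kwargs)
    except DeliveryUnsignedContractException as exc:
        # The caller (for example an API layer) can surface this to the app.
        return {"advanced": False, "error": exc.code}
    behavior.on_exit_actions(**kwargs)
    return {"advanced": True, "error": None}


unsigned = ContractBehavior(is_signed=False)
signed = ContractBehavior(is_signed=True)
unsigned_result = try_advance(unsigned)
signed_result = try_advance(signed)
```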

Putting it all together

Referencing Figure 2 once more, each step is registered with an individual behavior in the BehaviorFactory. Subsequently, the steps are registered to corresponding state machine flows (i.e. delivery flow, return flow, and swap flow) within the state machine factory. Upon receiving input of the current state and based on the transition parameters, the state machine transitions to the next step. Our implementation has now split the responsibilities of a state to simplify testing each unit of the system. Now we can effortlessly make market-specific changes and fulfill new feature requests within a behavior without needing to touch the state transition logic within steps by taking advantage of the abstraction layer we built.
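The registration described above could be sketched as a small lookup table. The `BehaviorFactory` name comes from the article, but its interface is not shown, so the `register` and `create` signatures, the fallback order, the market code, and the step lists below are assumptions for illustration.

```python
class DefaultHandoverBehavior:
    label = "default handover"


class MarketSpecificHandoverBehavior:
    label = "market-specific handover"


class BehaviorFactory:
    """Maps (step, flow type, market) to a behavior class, with fallbacks."""

    def __init__(self):
        self._registry = {}

    def register(self, step_name, behavior_cls, flow_type=None, market=None):
        self._registry[(step_name, flow_type, market)] = behavior_cls

    def create(self, step_name, flow_type, market):
        # Try the most specific registration first, then fall back.
        for key in (
            (step_name, flow_type, market),
            (step_name, flow_type, None),
            (step_name, None, None),
        ):
            if key in self._registry:
                return self._registry[key]()
        raise KeyError(f"no behavior registered for step {step_name!r}")


factory = BehaviorFactory()
factory.register("handover", DefaultHandoverBehavior)
factory.register("handover", MarketSpecificHandoverBehavior,
                 flow_type="delivery", market="market-x")

# Hypothetical flows: the swap flow reuses delivery and return steps.
DELIVERY_STEPS = ["prepare-car", "drive-to-customer", "handover"]
RETURN_STEPS = ["pickup", "drive-to-lot", "check-in"]
FLOWS = {
    "delivery": DELIVERY_STEPS,
    "return": RETURN_STEPS,
    "swap": DELIVERY_STEPS + RETURN_STEPS,
}

specific = factory.create("handover", "delivery", "market-x")
fallback = factory.create("handover", "swap", "market-y")
```

With a lookup like this, a market-specific change is one extra `register` call, and the step and transition definitions stay untouched.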

Conclusion

The endeavor to improve our surfer flow was a collaborative effort involving Engineering, Product, Communication Support, and Leadership. Rewriting an entire service is a delicate balance between prioritizing the values and components most crucial to the company and those that matter most to the customers. The pytransitions library laid a significant amount of groundwork to enable us to completely refactor the surfer flow service. Building upon this foundation, we followed the principles behind abstraction layers and state design patterns to build and encapsulate a robust service within the monolith. Although our approach enabled us to work efficiently on the app and provide a reliable and positive experience for our Surfers, there is still much work to be done to revolutionize the Surfer experience and transform transportation as we know it. As we expand into new markets and continue to grow, we anticipate encountering new challenges, identifying further improvements, and undertaking larger projects.

Shubham is a Backend Engineer at Kyte