July 25, 2022

Growing Pains: Scaling and Re-architecting Systems Under Fire!

Victoriya Kalmanovich
Software Engineering Manager

For the past couple of years, I’ve been leading software teams at a company called Aspectiva. Walmart acquired the company, and, as happens to many companies post-acquisition, we had to make changes to fit the requirements of our new world.

When Aspectiva first started, we designed and built an app that collected reviews from all over the web. Its purpose was to bring in volumes of data from which our NLP models could benefit. We built the app in startup mode: we needed to make things happen fast, so we used the most straightforward stack we could find, one that fit our first software engineer’s expertise. The objective of the app was to get reviews from external sources, a practice known as syndication. When building the app, we created one flow in which everything was coupled to everything else. For example, if you tried to save a review to the DB and the connection was lost before the save completed, all the processing done in that flow was lost with it.
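To make the coupling problem concrete, here is a minimal sketch of such a single flow. It is an illustration only, not our actual code: `FlakyDB`, `coupled_flow`, and `run_coupled` are hypothetical names, and the "processing" is reduced to trimming and lowercasing a string.

```python
class FlakyDB:
    """Hypothetical stand-in for a database whose connection can drop."""

    def __init__(self, fail=False):
        self.fail = fail
        self.rows = []

    def save(self, row):
        if self.fail:
            raise ConnectionError("DB connection lost")
        self.rows.append(row)


def coupled_flow(raw_reviews, db):
    """Process and save in one pass; results live only in local variables."""
    for raw in raw_reviews:
        cleaned = raw.strip().lower()  # the processing step
        db.save(cleaned)               # a failure here discards `cleaned`


def run_coupled(raw_reviews, db):
    """Return "ok" if the whole flow finished, "lost" if the DB failed."""
    try:
        coupled_flow(raw_reviews, db)
        return "ok"
    except ConnectionError:
        return "lost"
```

Because processing and saving share one code path, nothing holds on to the processed review when the save fails, so the work has to be redone from scratch.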

After Walmart acquired Aspectiva, we weren’t sure whether the syndication app would still be an asset worth maintaining. Walmart didn’t depend on us to collect reviews (you can learn more about it here), and we were not acquired for our syndication capabilities, so we weren’t sure the app would ever be used again. Eventually, we found a solid business justification and a meaningful use case where our app could make a huge impact. So not only did we need to keep the app going, we suddenly needed it to be very reliable, scalable, and maintainable.

We started by mapping out the main business units the app handled. The next step was to decouple huge components into smaller single-responsibility units within the app. Decoupling the components was essential to properly implement the separation of concerns (SoC) design principle. For example, isolating user handling in its own component helped us when the user handling flow had a critical bug: it was much easier to spot the bug in one specific flow than to hunt for it across a vast, complex app. To implement this separation of concerns, we needed to do considerable refactoring while the app was still required to be live and fully usable. We prioritized the refactoring for each business unit according to its complexity and its overall effect on other components. We also had to strike a balance between writing components from scratch and reusing existing business logic that only needed to be extracted into its own component. Last but not least came the detailed design and implementation of each business unit, which eventually resulted in a single app made up of decoupled business units.
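As a rough sketch of what single-responsibility units inside one app can look like, consider the example below. The class names (`Review`, `ReviewCleaner`, `ReviewStore`) and the retry buffer are assumptions made for illustration; they are not taken from our codebase.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    source: str
    text: str


class ReviewCleaner:
    """Single responsibility: normalize review text."""

    def clean(self, review: Review) -> Review:
        # Collapse runs of whitespace and trim the ends.
        return Review(review.source, " ".join(review.text.split()))


@dataclass
class ReviewStore:
    """Single responsibility: persist reviews (in memory, for this sketch)."""

    available: bool = True  # simulates whether the DB connection is up
    saved: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def save(self, review: Review) -> bool:
        if not self.available:
            self.pending.append(review)  # processed work is kept for a retry
            return False
        self.saved.append(review)
        return True

    def retry_pending(self) -> int:
        """Save everything that was buffered; return how many were saved."""
        if not self.available:
            return 0
        retried, self.pending = self.pending, []
        self.saved.extend(retried)
        return len(retried)
```

With the units separated, a persistence failure stays inside `ReviewStore`, and a bug in cleaning can be reproduced by exercising `ReviewCleaner` alone.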

After we completed this process, with our services neatly mapped out and extracted into separate components within the app, the app broke. Our code was amazing, but the scale we had worked with before the acquisition was completely different from the scale we needed to support now. No matter how beautiful our code was, it didn’t fit the new requirements. We quickly understood that some of our components needed more resources than others. The right way to allocate resources properly between our components was to extract them into microservices. This was relatively easy because our services were already decoupled within the app. Working with microservices in the new environment gave us better resource allocation per service, but it also introduced a new set of challenges from the world of complex distributed systems.

Once we overcame this new set of challenges, we had to face the fact that our data collection app, which had only ever collected reviews in the background, now had to change its face and become user-friendly. This challenge was unlike any we had seen before. By that time, we already knew how to scale our products, how to refactor our code efficiently, how to apply best practices for working with microservices, and how to define new business logic so that it stayed decoupled from all other business logic. But working with users? That’s one hell of a challenge. We started by mapping what users required and needed from the system, because they used the UI in an utterly different way than we did. We had to rethink our metrics and work out how we planned to provide proper support. Being user-facing meant ramping up our frontend abilities and reading user needs correctly, but it also introduced us to urgent user bugs, support SLAs, and the work of defining a good UX. Even though this challenge seemed different from what we had done before, we managed to use our strengths to overcome it. We knew our forte was collecting and understanding data, so understanding our data helped us understand user needs better. Our data-driven approach led us to look at user bugs as an opportunity. We clustered the issues based on similarities, identified which of our services contained the main pain points, and handled them appropriately.
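One simple way to cluster issues by similarity and find the most affected services is sketched below. This is an assumption about how such a triage could be done, not a description of our tooling: it uses the standard library’s `difflib.SequenceMatcher` for text similarity, and the function names, the 0.6 threshold, and the service names in the usage example are all made up for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def cluster_issues(issues, threshold=0.6):
    """Greedily group (service, summary) pairs whose summaries look alike."""
    clusters = []
    for issue in issues:
        for cluster in clusters:
            representative = cluster[0][1]  # first summary in the cluster
            ratio = SequenceMatcher(
                None, issue[1].lower(), representative.lower()
            ).ratio()
            if ratio >= threshold:
                cluster.append(issue)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([issue])
    return clusters


def pain_points(issues):
    """Count issues per service, most affected service first."""
    return Counter(service for service, _ in issues).most_common()
```

For example, given the hypothetical reports `("collector", "login page times out")`, `("collector", "login page timed out")`, and `("exporter", "export csv missing column")`, the first two land in one cluster and `collector` comes out as the main pain point.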

Our data collection app started as an internal tool built as a proof of concept, went through significant architecture changes driven by huge demand, and is now pivoting again towards being user-facing. The steps we took at every turn helped us understand our product in light of the new business needs and tame the changes. We managed to re-architect while staying in motion and delivering business value, and we can’t wait to see what new challenges the future holds for us!

Link for the Post on Medium