Common Architectural Questions
This is a list of questions and answers related to cloud-native architecture and application architecture.
Application Architecture
Questions
What are the benefits of a cloud-native architecture?
When should you not use the 12 factors?
With centralised logging, what are logging best practices and how do you differentiate logs from each application instance?
How do you differentiate traffic based on incoming geographic location?
What is your approach for capacity planning and how would you run a capacity planning roadmap workshop for a client?
What are the techniques you would use to optimise an application for performance?
What are the differences between continuous integration, continuous delivery and continuous deployment?
What does the acronym SOLID stand for in OO design?
Are you familiar with the principles of the Twelve-Factor App?
What is your approach for hardware sizing for an application?
Based on your experience, what are the challenges you have seen with software delivery?
What are the challenges in breaking up a monolithic application into microservices?
What are your recommended steps from development to production?
Answers
- Benefits of a cloud-native architecture?
Cloud-native applications are stable, reliable, scalable, fault tolerant, performant, monitored, documented and prepared for any catastrophe.
Regardless of the number of execution environments from development to production, deployment is simplified and largely automated because we use a single code base and because all dependencies for the application and its environment are formalised. Because deployments are easier, upgrades can be pushed out more frequently, and we can use continuous deployment for increased agility. Keeping the execution environments simple and formalising dependencies also lets new developers get up to speed faster. Packaging dependencies into a containerised environment provides portability across execution platforms and operating system flavours. Twelve-factor apps also gain these benefits by reading their configuration from environment variables.
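A minimal sketch of the environment-based configuration just described. The variable names `DATABASE_URL` and `PORT` are illustrative, not a standard:

```java
// Hypothetical twelve-factor style configuration: every setting comes from the
// environment, with a fallback for local development.
public class AppConfig {

    // Reads an environment variable, falling back to a default when it is
    // unset or empty.
    static String env(String name, String fallback) {
        String value = System.getenv(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        String dbUrl = env("DATABASE_URL", "jdbc:postgresql://localhost:5432/dev");
        int port = Integer.parseInt(env("PORT", "8080"));
        System.out.println("db=" + dbUrl + " port=" + port);
    }
}
```

Because nothing is hard-coded, the same build artefact can run unchanged in every environment; only the environment variables differ.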
Cloud-native applications are designed to run multiple instances concurrently by being stateless and treating backing services as exchangeable. This allows the execution platform to make the application elastic, scaling instances up as demand grows and down as demand drops. Because instances are stateless and expose health probes, the platform can also improve resilience by spinning up new instances when an existing instance has a problem.
The principle behind a microservices architecture is that teams working on independent microservices remain fairly autonomous and can make software decisions, such as refactoring, at the team level. A team size of 5 to 8 people is often cited, from an anthropological perspective, as ideal for human beings to work effectively as a team.
Exposing external interfaces through port binding provides a standardised way of interfacing with the application and allows layer-4 components such as load balancers and routers to connect to it. This lets the load balancer distribute the load, lets HTTPS endpoint encryption terminate on these routing systems, and lets the application run on different physical systems if the need arises.
- When do you not use the 12 factor app methodology?
If you have an existing legacy application that is not going to be running on the cloud or a similar execution platform.
If the objective for the application is heavily around persistence, like a database or a file sharing application, going cloud-native may not be the best choice.
- What are the best practices for logging?
We should log as much relevant, contextual data as possible when coding. Logs should identify the function or method where an error occurred, as well as the id of the running application thread. See answer 2 in Answers on Java for more details.
- Traffic from different geographic locations.
One method for routing traffic based on geolocation is a DNS service with geolocation routing capability. Geolocation works by mapping IP addresses to locations. However, some IP addresses aren’t mapped to geographic locations, so even if you create geolocation records that cover all seven continents, the DNS service will receive some queries from locations that it can’t identify. You can create a default record that handles both queries from IP addresses that aren’t mapped to any location and queries from locations for which you haven’t created geolocation records.
The other way to do this is via a proxy web server such as NGINX with a geoip module that translates the IP of the incoming request into a region. Based on geoip rules, we can then route the request to the right service.
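Once the geoip lookup (in DNS or in the proxy) has resolved a request to a country code, the routing itself can be a simple table lookup. A hypothetical sketch with made-up region-to-backend mappings, including a default backend for unmapped locations as described above:

```java
import java.util.Map;

// Hypothetical routing table: a geoip lookup upstream (e.g., in NGINX) has
// already resolved the client IP to an ISO country code.
public class GeoRouter {
    static final Map<String, String> BACKENDS = Map.of(
        "SG", "https://ap-southeast.example.com",
        "US", "https://us-east.example.com",
        "DE", "https://eu-central.example.com");

    // Unmapped locations fall back to a default backend, mirroring the
    // default DNS record for unidentifiable queries.
    static String backendFor(String countryCode) {
        return BACKENDS.getOrDefault(countryCode, "https://global.example.com");
    }

    public static void main(String[] args) {
        System.out.println(backendFor("SG"));
        System.out.println(backendFor("ZZ")); // unmapped -> default backend
    }
}
```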
- What is your approach to capacity planning and providing a roadmap on this to the client?
- 5a - What is capacity planning?
Let’s go with the assumption that the workload is entirely stateless and is able to scale horizontally without an external resource like the database being the bottleneck.
Using requests per second (RPS) for incoming traffic, we have to understand the steady and peak loads. We also need to know our microservice’s average response time. This is typically around 100 ms, which allows the microservice to make a few queries to the database and return the HTML or JSON response to the client. Thus, in one second, a single-threaded microservice instance can respond to 10 requests: its RPS is 10.
If the peak incoming RPS is 2000, then we would need 2000/10 = 200 microservice instances for peak load. If the microservice’s web server can fork these requests across 10 threads, then we only need 20 instances of the microservice. This is the basic calculation for deciding how many microservice instances are required.
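The calculation above can be sketched as a small helper (the numbers are the illustrative ones from this answer):

```java
// Instances needed = peak RPS / (RPS per thread * threads per instance),
// rounded up so a fractional remainder still gets an instance.
public class CapacityPlanner {
    static int instancesNeeded(int peakRps, int rpsPerThread, int threadsPerInstance) {
        int perInstance = rpsPerThread * threadsPerInstance;
        return (peakRps + perInstance - 1) / perInstance; // ceiling division
    }

    public static void main(String[] args) {
        // 100 ms average response time -> 10 requests/second per thread.
        System.out.println(instancesNeeded(2000, 10, 1));  // 200 single-threaded instances
        System.out.println(instancesNeeded(2000, 10, 10)); // 20 instances with 10 threads
    }
}
```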
Usually a Spring Boot web microservice requires about 0.5 to 1 CPU and 1GB of RAM. It’s best to test the microservice to find this out. If the microservice is not yet available, we would need to estimate this based on a similar microservice, or do a POC to determine it. So, if we need 20 instances of the microservice, we would need about 20 CPUs and 20GB of RAM. We probably need a server with about 24 CPUs and 24GB of RAM, or, because we need resiliency, 2 servers with 12 CPUs and 12GB of RAM each. From there we would work out the cost of the servers if on-premise, or the computing cost if on the cloud.
From a roadmap perspective, we would need to understand how many of these micro-services are required for the initial launch of the application. As each phase of the project goes by, we also need to determine the increase in the number of micro-services required, or the increase in CPU and RAM required for existing micro-services, as well as changes to the peak RPS. This would then determine how we present our estimates to the customer based on our approach.
We also need to have measures in place in case the load goes above our estimated peak of 2000 RPS, especially if the platform has hard limits on computing resources. We may have to look at the circuit breaker pattern, which marks a service as unavailable and diverts incoming requests to a “try again” web page.
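A minimal, hand-rolled sketch of the circuit breaker idea. Real projects would more likely use a library such as Resilience4j; this version only counts consecutive failures and has no half-open state:

```java
import java.util.function.Supplier;

// After `threshold` consecutive failures the breaker opens and callers get a
// fallback (e.g., a "try again" page) instead of hitting the failing service.
public class CircuitBreaker {
    private final int threshold;
    private int consecutiveFailures = 0;

    CircuitBreaker(int threshold) { this.threshold = threshold; }

    boolean isOpen() { return consecutiveFailures >= threshold; }

    String call(Supplier<String> service, String fallback) {
        if (isOpen()) return fallback;   // divert traffic while open
        try {
            String result = service.get();
            consecutiveFailures = 0;     // any success resets the count
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            return fallback;
        }
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3);
        String page = breaker.call(
            () -> { throw new RuntimeException("service overloaded"); },
            "Please try again later");
        System.out.println(page);
    }
}
```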
- 5b - Capacity planning roadmap workshop for the client.
Capacity planning has to go hand in hand with optimising the architecture of the application, rather than being approached from a capacity perspective alone.
- What are the differences between continuous integration, continuous delivery and continuous deployment?
CI focuses on building and validating changes so that preparing a release is easier. CD can mean continuous delivery or continuous deployment.
With CI, developers are merging their new changes back into the main version control branch. These changes are validated, a package is built and automated testing is carried out. CI’s focus is to check that the application is not broken when new code is introduced into the main branch. CI needs to pass before we go to either of the CDs.
Continuous delivery ensures that you can release new changes to your client quickly in a sustainable way. Hence, after automated testing, you have an automated release process where, with a manual trigger, you can deploy the application quickly.
Continuous deployment takes it one step further: every change that passes all stages of the production pipeline is released to your customer. There is no human intervention, and only a failed test will prevent a new change from being deployed to production.
- What does the acronym SOLID stand for in OO design?
S - Single-responsibility principle: A class should have one, and only one, responsibility.
O - Open-closed principle: Objects or entities should be open for extension, but closed for modification.
L - Liskov substitution principle: Objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program, i.e., the logic in the program should be generic enough to work on derived classes and thus support polymorphism.
I - Interface segregation principle: Do not add functionality to an existing interface in a way that changes its contract, e.g., by adding new methods. Instead, create a new interface and let your class implement multiple interfaces if needed.
D - Dependency inversion principle: High-level modules should not depend on low-level modules; both should depend on abstractions. In turn, abstractions should not depend on details; details should depend on abstractions.
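As a small illustration of the dependency inversion principle, a high-level service depends on an abstraction rather than on a concrete low-level module. All names here are hypothetical:

```java
// The abstraction both the high-level and low-level modules depend on.
interface Storage {
    void save(String data);
}

// A low-level detail: one possible Storage implementation.
class InMemoryStorage implements Storage {
    final java.util.List<String> items = new java.util.ArrayList<>();
    public void save(String data) { items.add(data); }
}

// The high-level module: depends only on Storage, so the concrete backing
// store (database, file, memory) can be swapped without changing it.
class ReportService {
    private final Storage storage;
    ReportService(Storage storage) { this.storage = storage; }
    void publish(String report) { storage.save(report); }
}

public class SolidDemo {
    public static void main(String[] args) {
        InMemoryStorage store = new InMemoryStorage();
        new ReportService(store).publish("q1-report");
        System.out.println(store.items);
    }
}
```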
- Based on your experience, what are the challenges you have seen with software delivery?
The common challenges I find in the Singapore context are that developers design their code to pass unit testing without any regard to how it works in the integrated system. This usually occurs because team leads do not have enough time to do proper code reviews and rectify these issues early on, during the design of the code.
In the same way, I also see problems in the design of the overall architecture, where things like performance and security are an afterthought. All the various aspects need to be factored in during the design phase: business requirements, user interface design, identifying the microservices and their interfaces, security considerations, remote access, high availability, redundancy and load balancing, performance, and database design. In short, we need to design software a lot like how the Japanese approach design and planning: run several iterations of the design and ensure it is fairly solid, have a good idea of what third-party products the application will use, and work through how the whole application behaves via use cases. The later we find issues with the application, the more expensive they become to change.
- Recommended steps from development to production?
Pre-commit code review, which can be part of the IDE or a standalone tool on the development environment; unit testing on your local development environment; integration testing on a test environment that mirrors production; documenting changes and functionality in the wiki; committing into centralised version control. CI/CD then takes over with automated source code audit, daily builds, deployment to a test server and comprehensive automated testing, followed by a canary release and a blue/green deployment to production.
Java
Questions
Can you explain your approach for a Java-based cloud-native architecture?
What are the best practices for logging in Java?
Answers on Java
- Best practices for logging in Java.
Objects should have a toString() method, if one does not already exist, which can be used in logging to make it easier to go through the logs. We could even use Java’s reflection capabilities to display attributes of the object.
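A sketch of such a reflection-based helper; `Order` is a made-up example class, and the helper assumes the fields can be made accessible:

```java
import java.lang.reflect.Field;

// Example class with no hand-written toString().
class Order {
    int id = 42;
    String status = "NEW";
}

// Uses reflection to dump an object's declared fields for logging.
public class LogUtil {
    static String describe(Object obj) {
        StringBuilder sb = new StringBuilder(obj.getClass().getSimpleName()).append('{');
        Field[] fields = obj.getClass().getDeclaredFields();
        for (int i = 0; i < fields.length; i++) {
            fields[i].setAccessible(true);
            if (i > 0) sb.append(", ");
            Object value;
            try {
                value = fields[i].get(obj);
            } catch (IllegalAccessException e) {
                value = "?"; // field not readable; log a placeholder instead
            }
            sb.append(fields[i].getName()).append('=').append(value);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Order()));
    }
}
```

Libraries such as Apache Commons Lang's `ToStringBuilder` provide a more robust version of the same idea.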
Log a transaction ID. This can come from the servlet context or the thread ID, and the logging pattern can contain the time and the thread ID. Log4j2 provides the Mapped Diagnostic Context (MDC), or Thread Context Map, for this purpose: we can store a unique id for each client request in it, and it keeps that context data scoped to the particular thread handling the request.
For each method, log the method name and input parameters at the start of the method. When we analyse the logs, we then have a good idea of which method was called before an error occurred.
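Log4j2's ThreadContext handles the per-thread context storage in practice; as a dependency-free sketch of the underlying idea, combining a per-request transaction id with method-entry logging (all names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Minimal imitation of what an MDC / Thread Context Map does: per-thread
// context data (here a transaction id) stamped onto every log line.
public class RequestLogger {
    static final ThreadLocal<Map<String, String>> CONTEXT =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }

    // Prefixes each message with the thread name and transaction id so that
    // interleaved log lines from concurrent requests can be told apart.
    static String format(String message) {
        return "[" + Thread.currentThread().getName() + "]"
             + "[txn=" + CONTEXT.get().getOrDefault("txnId", "-") + "] " + message;
    }

    public static void main(String[] args) {
        put("txnId", UUID.randomUUID().toString()); // one id per incoming request
        // Method-entry logging: method name plus input parameters.
        System.out.println(format("enter processOrder(orderId=42)"));
    }
}
```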