Everything we are doing wrong with AWS Lambda.
The cloud industry as a whole, and AWS even more so, is strongly pushing towards the serverless model. That isn’t surprising, given the plethora of benefits reaped from serverless containers, storage, and even databases that run independently of the underlying infrastructure.
At the core of AWS’s serverless landscape is AWS Lambda. It is a Function-as-a-Service (FaaS) offering, meaning it executes code packages in response to event-driven triggers such as API calls, Amazon Kinesis streams, S3 file operations, and, of course, scheduled cron jobs.
I am a strong proponent of AWS Lambda and have found it a perfect fit for many of my use cases. However, I’ve come across a few anti-patterns that we need to be conscious of before applying this cloud computing model across the board.
What is an anti-pattern?
The term anti-pattern was coined by Andrew Koenig, who describes it as follows:
“An antipattern is just like a pattern, except that instead of a solution it gives something that looks superficially like a solution but isn’t one.”
With that fancy definition out of the way, I would sum it up like this: an anti-pattern is something that, even with the right tool, creates a brand-new problem instead of fixing the existing one!!!
Things to avoid like your ex with AWS Lambda: Serverless Antipatterns
#1. Dreamt of Distributed Microservices, settled for Distributed Monoliths!!
I’ve frequently come across developers who assume that putting everything into a shared library means they will never have to worry about functions using an incorrect or outdated implementation, since all they need to do is update their dependency to its latest version.
Whenever you change behavior consistently across all of your functions by upgrading them all to the same new version of your library, you move toward tight coupling.
With this, you lose one major benefit of serverless architecture: loose coupling, the ability to have your functions evolve, and to manage them, independently of each other.
Eventually, you may end up with a system where a change in one function mandates a change in all of them.
If you think you won’t be able to keep your functions independent, maybe it’s time to consider a different architectural approach.
How do you identify a distributed monolith?
a. Does a change to one function often require a change to another function?
b. Does deploying one function require other functions to be deployed as well?
c. Are your functions overly chatty, communicating with each other too much?
If the answer to these questions is YES, well, my friend, you have a distributed monolith.
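Question b can be checked against your own release history. A minimal sketch, assuming you can export that history as a list of sets of function names shipped together in each change: it reports pairs of functions that are almost always deployed together, which hints at coupling rather than proving it. The function names and the 0.8 threshold are arbitrary assumptions.

```python
from collections import Counter
from itertools import combinations

def co_deployment_ratios(deployments):
    """For each pair of functions, the share of releases touching either
    function in which both were deployed together (a Jaccard ratio)."""
    solo = Counter()   # releases that included each function
    pairs = Counter()  # releases that included both functions of a pair
    for release in deployments:
        names = sorted(set(release))
        solo.update(names)
        pairs.update(combinations(names, 2))
    ratios = {}
    for (a, b), together in pairs.items():
        either = solo[a] + solo[b] - together
        ratios[(a, b)] = together / either
    return ratios

def suspicious_pairs(deployments, threshold=0.8):
    """Pairs co-deployed at least `threshold` of the time: coupling candidates."""
    return sorted(pair for pair, ratio in co_deployment_ratios(deployments).items()
                  if ratio >= threshold)


# Hypothetical history: each set is one release.
history = [
    {"orders", "billing"},
    {"orders", "billing"},
    {"orders", "billing", "email"},
    {"email"},
]
print(suspicious_pairs(history))
```

Here `orders` and `billing` have shipped together in every release that touched either of them, so they are flagged, while `email` evolves on its own.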
Solution?
One preventive measure to keep in mind is to separate out business functionality that isn’t pertinent to a function’s primary purpose. Functional separation is vital for the performance, agility, and scalability of your AWS Lambda application. Also, follow the DRY principle.
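One way to keep secondary business functionality out of a function is to have it announce what happened instead of calling the other functions directly. A minimal sketch, assuming Amazon EventBridge: `put_events` is the real boto3 EventBridge client method, while the `shop.orders` source, the `OrderPlaced` type and `RecordingEventsClient` (a local stand-in so the sketch runs without AWS) are hypothetical.

```python
import json

def place_order(order, events_client, bus_name="default"):
    """Do this function's one job, then announce the outcome as an event
    instead of calling email/billing functions directly."""
    # ... persist the order here: the primary responsibility of this function ...
    response = events_client.put_events(
        Entries=[{
            "Source": "shop.orders",      # hypothetical event source name
            "DetailType": "OrderPlaced",  # hypothetical event type
            "Detail": json.dumps({"orderId": order["id"]}),
            "EventBusName": bus_name,
        }]
    )
    if response.get("FailedEntryCount", 0):
        raise RuntimeError(f"event not published: {response['Entries']}")
    return {"orderId": order["id"], "status": "PLACED"}


class RecordingEventsClient:
    """Local stand-in for boto3.client("events") that records published entries."""
    def __init__(self):
        self.published = []

    def put_events(self, Entries):
        self.published.extend(Entries)
        return {"FailedEntryCount": 0,
                "Entries": [{"EventId": str(i)} for i, _ in enumerate(Entries)]}
```

In Lambda you would pass `boto3.client("events")`. Email, billing and any future consumers subscribe through their own EventBridge rules, so adding one does not require redeploying `place_order`.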
#2. Complex Carbs GOOD Complex Processing BAD.
Without a doubt, serverless is astounding when you need to execute small chunks of code. However, because of its inherent restrictions, executing a complex process isn’t the ideal use case. For instance, image processing can be done easily, but the same isn’t true of video processing.
The issue here isn’t whether the language can handle it; it’s the limited processing power available to a single Lambda function instance.
Processing resources are genuinely constrained (an AWS Lambda function gets at most 10,240 MB of memory, with CPU allocated in proportion to memory), so you need to know your serverless platform’s limits.
Solution?
- Restrict the amount of data a function needs to process by reducing the size of its payload.
- Profile how your functions use their allocated RAM, and choose the right data structures to avoid unnecessary allocations.
- In AWS Lambda, use the /tmp directory, which is non-persistent file storage the function can read and write.
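To make the first two points concrete, here is a minimal sketch of bounding a function’s memory by streaming its input in fixed-size chunks rather than loading it whole. SHA-256 hashing stands in for real per-chunk work, and the 1 MiB chunk size is an assumption. In Lambda, the stream could be, for example, the `Body` of a boto3 S3 `get_object` response, which supports `read(amt)`.

```python
import hashlib
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read (assumed; tune to your memory setting)

def digest_stream(stream, chunk_size=CHUNK_SIZE):
    """Hash a binary stream chunk by chunk, so peak memory stays near
    `chunk_size` instead of the full payload size."""
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()


if __name__ == "__main__":
    data = io.BytesIO(b"frame" * 500_000)  # ~2.5 MB of fake input
    print(digest_stream(data))
```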
Just as you would optimize your application at the component level, you’ll need to optimize at the function level in a serverless architecture.
Note: Most serverless platforms offer temporary read/write storage on a per-function basis. Treat it as erased once the function ends: in AWS Lambda, /tmp contents may survive between warm invocations of the same execution environment, but that is never guaranteed. Used effectively, it is handy for storing temporary results and for offloading memory-intensive work to disk.
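A minimal sketch of using /tmp for intermediate results, assuming they are JSON-serializable: reuse the file if a warm execution environment still has it, otherwise recompute and write it. `cached_intermediate` and the key naming are hypothetical, and the fallback to the OS temp directory only exists so the sketch also runs outside Lambda.

```python
import json
import os
import tempfile

# /tmp is Lambda's writable scratch space; fall back so the sketch runs locally.
SCRATCH_DIR = "/tmp" if os.path.isdir("/tmp") else tempfile.gettempdir()

def cached_intermediate(key, compute, scratch_dir=SCRATCH_DIR):
    """Return a stored intermediate result, or compute and store it.
    The file may survive warm invocations, but never rely on it being there."""
    path = os.path.join(scratch_dir, f"{key}.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    partial = path + ".part"
    with open(partial, "w") as f:
        json.dump(result, f)
    os.replace(partial, path)  # rename so readers never see a half-written file
    return result
```

Because the cached file can disappear at any time, `compute` must always be able to rebuild the result from the original source.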
#3. Sharing is caring?? Not with Serverless!!
Development teams are now expected to dissect business logic and build code blocks that are highly decoupled and are independently managed.
However, this expectation might not play out as planned, because you may come across scenarios where multiple functions require the same business or code logic.
But once you cross the boundaries between functions, even if everything looks the same, the functions are in explicitly different contexts, implemented by different code, and probably using different data stores.
Consider what happens when there is a major change in the shared libraries your Lambda functions depend on. You would have to change how dozens of your functions’ endpoints work with this shared core logic, and that isn’t something you have accounted for in your custom software development cycle.
Also, by sticking with a shared-logic mindset, you cross the isolation barrier, reducing the effectiveness of your serverless architecture and hampering its scalability.
Solution?
A common recommendation here is to adhere to the DRY (Don’t Repeat Yourself) principle, since copy and paste is bad. Going back to the roots, ‘The Pragmatic Programmer’ describes DRY as “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” Taken to the extreme, this leads to a nano-service approach, where every piece of logic gets its own function.
If you are dealing with just a handful of Lambdas, this might be the best approach. However, with more than a handful, it becomes a nightmare in practice.
What now? You may follow the Mama Bear approach (not too hot, not too cold) and define a limited set of Lambda functions that map logically to how client callers consume them. This still leaves the problem of shared logic; however, rather than solving it through application design, teams taking this approach handle it through their own development workflow.
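The Mama Bear middle ground can be sketched as one function per consumer-facing resource that routes internally, instead of one function per endpoint or a single function for everything. This assumes API Gateway’s REST proxy integration event shape (`httpMethod`, `resource`, `pathParameters`, `body`); the order handlers are hypothetical placeholders.

```python
import json

def _response(status, body):
    """Shape a response the way the API Gateway proxy integration expects it."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }

def list_orders(event):
    return _response(200, {"orders": []})  # placeholder: query your data store here

def get_order(event):
    return _response(200, {"id": event["pathParameters"]["id"]})

def create_order(event):
    payload = json.loads(event.get("body") or "{}")
    return _response(201, {"created": payload})

# One Lambda for the "orders" resource, routed on (method, resource template).
ROUTES = {
    ("GET", "/orders"): list_orders,
    ("GET", "/orders/{id}"): get_order,
    ("POST", "/orders"): create_order,
}

def handler(event, context):
    route = ROUTES.get((event.get("httpMethod"), event.get("resource")))
    if route is None:
        return _response(404, {"message": "no such route"})
    return route(event)
```

Logic shared by the order routes stays inside one deployable unit, while an unrelated resource such as payments would get its own function.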
#4. Forced Async Calls within Serverless
When Service A requires Service B to perform its own task, A makes a call to B. While Service B is working, its parent, Service A, is kept waiting. Since serverless architecture is billed on the basis of resources consumed, this is an anti-pattern.
This gets worse when you chain functions. For example, if your secondary function calls a physical database, or anything else that isn’t on the same platform or cloud, you risk a slow response, especially under heavy load. At that point there are two possible outcomes:
- Your function times out (the maximum AWS Lambda timeout is 900 seconds) and the task is terminated.
- Your costs increase significantly, because the parent function is billed for all the time it spends waiting.
We use AWS Lambda so that we don’t pay for idle time, but it isn’t that simple when it comes to asynchronous calls. Each function is allocated a specific amount of resources, and when it is invoked, you are billed for the time it is running. The gotcha: it doesn’t matter whether you are actually using those resources or not.
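The arithmetic behind paying for the wait can be sketched in GB-seconds (allocated memory in GB multiplied by billed duration in seconds), the unit in which Lambda compute is billed. This simplified sketch ignores per-request charges and millisecond rounding, and the memory sizes and durations are illustrative assumptions, not measurements.

```python
def gb_seconds(memory_mb, duration_ms):
    """Billed compute for one invocation: memory in GB times duration in seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)

def chained_call_cost(parent_mb, parent_work_ms, child_mb, child_ms):
    """The parent invokes the child synchronously, so it stays billed while it waits."""
    parent = gb_seconds(parent_mb, parent_work_ms + child_ms)
    child = gb_seconds(child_mb, child_ms)
    waiting = gb_seconds(parent_mb, child_ms)  # part of the parent's bill spent idle
    return {"parent": parent, "child": child, "parent_waiting": waiting}


cost = chained_call_cost(parent_mb=1024, parent_work_ms=200, child_mb=512, child_ms=10_000)
print(cost)
```

With these numbers the parent is billed about 10.2 GB-seconds, 10 of which are spent waiting: twice the 5 GB-seconds the child spends doing the actual work.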
With asynchronous calls, you don’t pay for idle time, but you do pay for the wait! If your functions sit waiting on asynchronous work instead of doing one focused, single-threaded job, you are using FaaS as a platform to build servers.
Solution?
One potential solution is to make sure your asynchronous requests resolve within the time your function stays active. If they take longer than that, you may be walking into an anti-pattern. The trusty async/await is always welcome here!
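One way to apply the async/await advice, sketched under assumptions: issue independent downstream calls concurrently so the wait is roughly the slowest call rather than the sum of all of them, and cap the total wait using the Lambda context’s `get_remaining_time_in_millis()`. The `call_downstream` coroutine, the 500 ms safety margin and the 504 fallback are hypothetical placeholders; `FakeContext` only exists so the sketch runs locally.

```python
import asyncio

SAFETY_MARGIN_MS = 500  # assumed headroom to return a response before the timeout

async def call_downstream(name, delay_s):
    """Placeholder for real async I/O (HTTP request, database query, ...)."""
    await asyncio.sleep(delay_s)
    return f"{name}: ok"

async def fan_out(calls, remaining_ms):
    """Await all calls concurrently, but never longer than the time left."""
    budget_s = max(remaining_ms - SAFETY_MARGIN_MS, 0) / 1000
    try:
        return await asyncio.wait_for(asyncio.gather(*calls), timeout=budget_s)
    except asyncio.TimeoutError:
        return None  # pending calls are cancelled; the caller decides what to do

def handler(event, context):
    results = asyncio.run(fan_out(
        [call_downstream("inventory", 0.1), call_downstream("pricing", 0.2)],
        context.get_remaining_time_in_millis(),
    ))
    if results is None:
        return {"statusCode": 504, "body": "downstream calls exceeded the time budget"}
    return {"statusCode": 200, "body": ", ".join(results)}

class FakeContext:
    """Local stand-in for the Lambda context object."""
    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms
```

`handler({}, FakeContext(5000))` finishes in about 0.2 seconds, the slowest call, rather than 0.3. With `FakeContext(600)` the budget is only 0.1 seconds, so the function gives up instead of waiting into a timeout.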
#5. Serverless Big Data ETL Pipeline
As we move towards the serverless architecture, the process of handling the data and its security is becoming a critical concern.
Because serverless functions are ephemeral and stateless, everything a function needs to process must be provided at runtime. Typically, the task payload is the primary mechanism for this; otherwise, the data is pulled in from a queue, database, or other data source.
Although serverless providers might allow you to pass and process huge chunks of data in a payload, it isn’t a wise thing to do. It not only reduces efficiency but also increases the surface area for data breaches. Less data means less storage and less transmission, which leads to more secure systems.
Moreover, when functions call other functions, message queues are often used to buffer the work between them. Architects need to be extremely aware of the level of recursion in such designs, as recursive loops are more prevalent than one might think.
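A common guard against runaway recursion, shown here as a hypothetical sketch rather than an AWS feature: carry a hop counter in every message and refuse to forward a message once it has passed through too many functions. `MAX_HOPS` and the message shape are assumptions.

```python
MAX_HOPS = 5  # hypothetical limit; pick one that fits your pipeline's depth

class RecursionLimitExceeded(Exception):
    """Raised instead of forwarding a message that has looped too many times."""

def next_message(incoming, payload):
    """Build the message for the next function, incrementing the hop counter."""
    hops = incoming.get("hops", 0) + 1
    if hops > MAX_HOPS:
        raise RecursionLimitExceeded(f"message would make hop {hops} (limit {MAX_HOPS})")
    return {"hops": hops, "payload": payload}


if __name__ == "__main__":
    # Simulate an accidental loop: the same message keeps getting re-queued.
    message = {}
    try:
        while True:
            message = next_message(message, {"step": "transform"})
    except RecursionLimitExceeded as err:
        print("stopped:", err)
```

In a real pipeline, the incoming message would be parsed from the queue record and the returned dict sent to the next queue; a message that trips the limit could go to a dead-letter queue for inspection.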
Solution?
When you’re dealing with a serverless architecture, you need to critically analyze what data is passed around, and how much of it. Functions should receive only the data needed for their execution: send the relevant part of the data, not the whole thing.
This practice might be sufficient if you are dealing with a small amount of data. However, when you’re dealing with large and/or unstructured data, it’s wiser to transmit data IDs (for example, object keys) rather than the data itself.
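Passing IDs instead of data is often called the claim-check pattern. A minimal sketch, assuming S3 as the store: the producer writes the records once and sends only the bucket and key. `put_object` and `get_object` are real boto3 S3 client methods; `send`, the `etl/` key layout and `InMemoryS3` (a local stand-in so the sketch runs without AWS) are hypothetical.

```python
import io
import json
import uuid

def send_by_reference(records, s3_client, bucket, send):
    """Store the bulky data once; pass only a small pointer downstream."""
    key = f"etl/{uuid.uuid4()}.json"  # hypothetical key layout
    s3_client.put_object(Bucket=bucket, Key=key, Body=json.dumps(records).encode("utf-8"))
    send({"bucket": bucket, "key": key, "count": len(records)})
    return key

def load_by_reference(message, s3_client):
    """Consumer side: fetch the data only when (and if) it is actually needed."""
    obj = s3_client.get_object(Bucket=message["bucket"], Key=message["key"])
    return json.loads(obj["Body"].read())

class InMemoryS3:
    """Local stand-in for boto3.client("s3"), covering only the two calls above."""
    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body
        return {}

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}
```

The message on the queue stays a few bytes regardless of how many records there are, and access to the data itself can be governed by bucket permissions rather than by whoever can read the queue.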
#6. Long-Distance Relationships and Long Processing Tasks? TRICKY.
Long-running tasks can significantly impact the overall performance of any app. Even with auto-scaling, these complex tasks can prevent other long tasks from completing within their stipulated time frames. That’s one reason why serverless functions have runtime limits.
In the case of AWS Lambda, the timeout limit is 900 seconds, while API Gateway times out after 29 seconds by default. Since our frontend calls an API and Lambda is integrated as the backend, any Lambda runtime beyond those 29 seconds is effectively useless for that request: the client has already received a timeout.
Our major goal here is to respond to the request as fast as possible and perform the long-running tasks in the background.
Solution?
The resource limits offered by most serverless platforms are sufficient for an application’s basic needs. However, if your needs are more advanced, you may opt for asynchronous processing. Amazon ECS or a good old server-based approach may also be a more suitable solution.
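Asynchronous processing for API-fronted work can be sketched as a submit/poll pair: the API handler enqueues the job and returns `202 Accepted` with a job ID well within API Gateway’s timeout, a worker performs the long task, and the client polls a status endpoint. The in-memory `JOBS` dict stands in for a durable store (for example, a DynamoDB table), `enqueue` stands in for sending to a queue such as SQS, and summing numbers is a placeholder for the real long-running task; all of these are assumptions.

```python
import json
import uuid

JOBS = {}  # stand-in for a durable job store

def submit_job(event, enqueue):
    """API handler: accept the request, enqueue the work, return immediately."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "PENDING"}
    enqueue({"jobId": job_id, "input": json.loads(event.get("body") or "{}")})
    return {"statusCode": 202, "body": json.dumps({"jobId": job_id})}

def worker(message):
    """Queue consumer: not bound by the API timeout (still by Lambda's 900 seconds)."""
    job = JOBS[message["jobId"]]
    job["status"] = "RUNNING"
    job["result"] = sum(message["input"].get("values", []))  # placeholder long task
    job["status"] = "DONE"

def job_status(job_id):
    """API handler the client polls for the outcome."""
    job = JOBS.get(job_id)
    if job is None:
        return {"statusCode": 404, "body": json.dumps({"message": "unknown job"})}
    return {"statusCode": 200, "body": json.dumps(job)}
```

If a job can outgrow even the 900-second Lambda limit, the same submit/poll contract still works with the worker moved to ECS.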
In conclusion, AWS Lambda is surely the tool of the hour, but with so many ways to use it wrong, we need to analyze how to use it more efficiently. This article also shows that AWS Lambda isn’t the best tool for every situation. There are, indeed, many ideal use cases for AWS Lambda, but that’s a story for another day. Stay tuned!!