Serverless functions are great for small tasks
Cloud-based computing using serverless functions is becoming more and more popular. The appeal is clear: you can use serverless functions to analyze incoming photos or process events from IoT devices, and it’s fast, simple, and scalable. You don’t need to allocate or maintain computing resources – just deploy your application code. Amazon, Microsoft, and Google all provide serverless functionality.
Serverless functions make sense for simple or ad-hoc applications. But are they suitable for complex workflows that read and update persisted, mission-critical data sets? Consider an airline that manages thousands of flights every day. It relies on a scalable NoSQL data store (such as Amazon DynamoDB or Azure Cosmos DB) to hold data about flights, passengers, baggage, gate assignments, pilot schedules, and more. Serverless functions can access these data stores to process events such as canceling a flight or rebooking a passenger – but is this the best way to implement the high-volume event processing that airlines rely on?
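As a minimal sketch of this style, a cancellation event might be handled by a Lambda function along the following lines. (This is illustrative Python using boto3; the table name "Flights" and the attribute names are assumptions, not details from the airline scenario.)

```python
# Hypothetical sketch of a serverless cancellation handler (Python/boto3).
# Table and attribute names are illustrative assumptions.
import boto3

flights = boto3.resource("dynamodb").Table("Flights")

def handler(event, context):
    """AWS Lambda entry point for a 'cancel flight' event."""
    flight_id = event["flight_id"]
    # Mark the flight canceled in the external data store; the function
    # itself holds no state between invocations.
    flights.update_item(
        Key={"flight_id": flight_id},
        UpdateExpression="SET flight_status = :s",
        ExpressionAttributeValues={":s": "CANCELED"},
    )
    return {"canceled": flight_id}
```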
Issues and limitations
The strength of serverless functions – the fact that they are serverless – creates built-in limitations. By their very nature, they incur overhead to allocate compute resources when they are invoked, and because they are stateless, they must retrieve data from external data stores, which slows them down further. You can’t leverage a local in-memory cache to avoid data movement; data must always flow over the cloud network to wherever your serverless function runs.
When building large-scale systems, serverless functions also do not provide a clear software architecture for implementing complex workflows. Developers must enforce a clean “separation of concerns” in the code that each function executes. A project that spawns many serverless functions can easily fall into the trap of duplicating functionality and evolving a complex, hard-to-manage code base. Serverless functions can also surface unusual exceptions, such as timeouts or quota limits, that must be handled in the application logic.
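For instance, a function that writes to DynamoDB may need its own retry logic for throttling errors, pushing environmental concerns into application code. Here is a hedged sketch (the table name and retry policy are illustrative):

```python
# Illustrative sketch: application code absorbing an environmental error
# (DynamoDB throttling) that a serverless function can surface.
import time
import boto3
from botocore.exceptions import ClientError

passengers = boto3.resource("dynamodb").Table("Passengers")  # assumed table

def save_passenger(item, max_retries=5):
    for attempt in range(max_retries):
        try:
            passengers.put_item(Item=item)
            return
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            # Back off and retry on throttling; a timeout would need similar
            # handling, eating into the function's own execution time limit.
            time.sleep(2 ** attempt * 0.1)
    raise RuntimeError("DynamoDB write kept throttling")
```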
Alternative: Move the code into the data
To get around the limitations of serverless functions, you can do the opposite: move the code into the data, using in-memory computing to run the code that serverless functions would otherwise implement. In-memory computing stores objects in primary memory distributed across a cluster of servers. Functions are invoked on these objects by sending them messages, and the objects can also retrieve data from, or save changes to, a data store such as a NoSQL database.
Instead of defining serverless functions that operate on remotely stored data, you can execute functions by simply sending messages to objects held in an in-memory computing platform. This approach eliminates the need to repeatedly access data stores, speeding up processing and reducing the amount of data flowing over the network. In-memory computing is highly scalable and can handle very large workloads with billions of objects. Highly available message processing also removes the need for application code to handle environmental exceptions.
Combined with the strengths of data-structure stores (such as Redis) and the actor model, in-memory computing offers important advantages for structuring the code that defines complex workflows. Unlike serverless functions, in-memory data grids restrict the processing of an object to the methods defined on its data type. This frees developers from deploying duplicate code across multiple serverless functions, and it also removes the need to implement object locking, which can be problematic with persistent data stores.
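The pattern might be sketched like this in plain Python (this is not the ScaleOut API; the grid is assumed to run each invocation on the node holding the object and to serialize invocations per object, which is why no locks appear):

```python
# Generic sketch of the in-memory data grid pattern (not the ScaleOut API).
# Each object owns its state, and the only way to change that state is
# through methods defined on its type; the platform is assumed to deliver
# invocations to a given object one at a time, so no explicit locking.
class GateAssignment:
    def __init__(self, flight_id, gate=None):
        self.flight_id = flight_id
        self.gate = gate

    def reassign(self, new_gate):
        # Runs where the object lives: no data shipped to the caller, and
        # no duplicate update logic spread across serverless functions.
        old, self.gate = self.gate, new_gate
        return old
```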
Benchmark Examples
To measure the performance difference between serverless functions and in-memory computing, we ran a simple workflow implemented both with AWS Lambda functions and with ScaleOut Digital Twins, a scalable in-memory computing architecture. This workflow represents the event processing an airline uses to cancel flights and rebook all passengers onto other flights. We used two data types – flight objects and passenger objects – and stored all instances in DynamoDB. The event controller triggered the cancellation of a group of flights, and we measured the time it took to complete all rebookings.
In the serverless implementation, the event controller triggered a Lambda function to cancel each flight, which in turn triggered a “passenger” Lambda for each affected passenger. Each “passenger” Lambda rebooked its passenger by selecting a different flight and updating the passenger’s information. It then triggered serverless functions to confirm the passenger’s removal from the original flight and addition to the new flight. These functions required locks to synchronize access to the DynamoDB objects.
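In outline, the benchmark driver works like this (a sketch with hypothetical interfaces: `cancel_flight` stands in for the event-dispatch hook and `all_rebooked` for the completion check, both of which depend on the platform under test):

```python
# Sketch of the benchmark driver: fire cancellation events for a group of
# flights, then time how long until every passenger has been rebooked.
# Both callables are hypothetical, platform-specific hooks.
import time

def run_benchmark(cancel_flight, all_rebooked, flight_ids):
    start = time.perf_counter()
    for flight_id in flight_ids:
        cancel_flight(flight_id)          # dispatch a cancellation event
    while not all_rebooked(flight_ids):   # poll until rebooking completes
        time.sleep(0.05)
    return time.perf_counter() - start
```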
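The article doesn’t show the exact locking scheme, but the synchronization the Lambda version needs might look like optimistic concurrency over DynamoDB items, along these lines (a hedged sketch; the `version` attribute and retry loop are illustrative assumptions):

```python
# Hedged sketch of the kind of synchronization the Lambda version needs:
# optimistic locking with a version attribute and a conditional write.
import boto3
from botocore.exceptions import ClientError

flights = boto3.resource("dynamodb").Table("Flights")  # assumed table

def add_passenger(flight_id, passenger_id):
    """Append a passenger to a flight, retrying on concurrent updates."""
    while True:
        flight = flights.get_item(Key={"flight_id": flight_id})["Item"]
        version = flight["version"]
        try:
            flights.update_item(
                Key={"flight_id": flight_id},
                UpdateExpression=(
                    "SET #v = :new, "
                    "passenger_ids = list_append(passenger_ids, :p)"
                ),
                # Fail if another Lambda updated the item concurrently.
                ConditionExpression="#v = :old",
                ExpressionAttributeNames={"#v": "version"},
                ExpressionAttributeValues={
                    ":new": version + 1,
                    ":old": version,
                    ":p": [passenger_id],
                },
            )
            return
        except ClientError as err:
            if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise
            # Lost the race with another function; reread and retry.
```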
In the digital twin implementation, in-memory objects for all flights and passengers were created dynamically as these objects were accessed from DynamoDB. Each flight object received a cancellation message from the event controller and sent a message to each of its passengers’ digital twin objects. Each passenger’s digital twin rebooked itself by selecting a different flight and sending messages to both the old and the new flight. No locks were required in the application code, and the in-memory platform automatically persisted the updates to DynamoDB.
Performance measurements showed that the digital twin implementation handled the cancellation of 25 flights with 100 passengers per flight more than 11 times faster than the serverless implementation. And while the serverless implementation could not be scaled to run the target workload of 250 canceled flights with 250 passengers each, the ScaleOut digital twins handled 500 flights – double the target workload – without issue.
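In outline, the twin-to-twin message flow looks like this (plain Python standing in for the ScaleOut messaging API; `send` is an assumed platform callback for messaging other objects, `pick_alternate_flight` is a hypothetical stub, and persistence back to DynamoDB is assumed to be handled by the platform):

```python
# Outline of the digital-twin message flow. Plain Python stands in for the
# platform API; messages to a given twin are assumed to arrive serially.
def pick_alternate_flight(canceled_flight_id):
    # Hypothetical stand-in for real flight-selection logic.
    return "ALT-" + canceled_flight_id

class FlightTwin:
    def __init__(self, flight_id, passenger_ids):
        self.flight_id = flight_id
        self.passenger_ids = list(passenger_ids)

    def on_message(self, msg, send):
        if msg["type"] == "cancel":
            # Fan out a rebooking request to every passenger twin.
            for pid in list(self.passenger_ids):
                send(pid, {"type": "rebook", "from_flight": self.flight_id})
        elif msg["type"] == "remove_passenger":
            self.passenger_ids.remove(msg["passenger_id"])
        elif msg["type"] == "add_passenger":
            self.passenger_ids.append(msg["passenger_id"])

class PassengerTwin:
    def __init__(self, passenger_id, flight_id):
        self.passenger_id = passenger_id
        self.flight_id = flight_id

    def on_message(self, msg, send):
        if msg["type"] == "rebook":
            # Rebook by messaging the old and new flight twins; no locks,
            # since each twin processes its messages one at a time.
            new_flight = pick_alternate_flight(msg["from_flight"])
            send(msg["from_flight"],
                 {"type": "remove_passenger", "passenger_id": self.passenger_id})
            send(new_flight,
                 {"type": "add_passenger", "passenger_id": self.passenger_id})
            self.flight_id = new_flight
```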
Summary
Serverless functions are well suited for small ad-hoc applications, but they may not be the best choice when building complex workflows that need to manage many data objects and scale to accommodate large workloads. Moving your code to the data using in-memory computing may be a better option, as it minimizes data movement, improves performance, provides high scalability, and simplifies application design by leveraging structured data access.
To learn more about ScaleOut Digital Twins and test this approach to managing data objects in complex workflows, visit https://www.scaleoutdigitaltwins.com/landing/scaleout-data-twins.