NewsCast — Using Sagas with Choreography and Orchestration Patterns — Kotlin example

The Saga design pattern brings real benefits in terms of resilience. We can implement it with the Eventuate libraries and Kotlin. Here is how!

1. Introduction

Sagas are an integration pattern often used in event-driven architectures. They are useful when resilience, load capacity, and performance are important. Telecom companies use them a lot to pass data around while keeping important metrics compliant with SLAs (Service Level Agreements): the duration of a request, the number of simultaneous requests possible, how responsive the application is, and how resilient it is. The pattern was first published in 1987 by Hector Garcia-Molina and Kenneth Salem, who came up with it to solve a problem related to LLTs (Long-Lived Transactions). These are transactions that are meant to be A.C.I.D. (Atomic, Consistent, Isolated, and Durable) but take an extended amount of time. Transactions that consume a lot of resources could last very long, even days or weeks, which causes both a latency problem and excessive resource consumption. Simply splitting such a transaction into smaller ones would make it impossible to comply with Isolation or Atomicity, and the risk of losing consistency would also be high. In order to turn LLTs into manageable transactions and release resources as soon as possible, they designed a system, with two variants, in which there is a defined plan of operations to perform in reaction to failure. In an LLT, this would be the rollback. In Sagas, these are processes capable of rolling back or bringing the system to the desired state, should any of the sub-processes fail. These sub-processes are still transactions, and they are strictly compliant with the A.C.I.D. paradigm. We will explore the example I’ve created on GitHub.
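
To make the idea of "sub-transactions plus a plan for failure" concrete, here is a minimal sketch in plain Kotlin. It is not taken from the example project or from Eventuate: the names SagaStep and runSaga are hypothetical, and everything runs in memory. Each step has a forward action and a compensation, and when a step fails, the compensations of the steps that already completed run in reverse order.

```kotlin
// A saga step pairs a forward action with the compensation that undoes it.
// All names here are hypothetical and exist only for this illustration.
class SagaStep(
    val name: String,
    val action: () -> Unit,
    val compensation: () -> Unit
)

// Runs the steps in order. If one fails, the compensations of the steps that
// already completed run in reverse order. Returns true when every step succeeded.
fun runSaga(steps: List<SagaStep>, log: MutableList<String>): Boolean {
    val completed = ArrayDeque<SagaStep>()
    for (step in steps) {
        try {
            step.action()
            log.add("done:${step.name}")
            completed.addFirst(step)
        } catch (e: Exception) {
            log.add("failed:${step.name}")
            // The most recently completed step is compensated first.
            completed.forEach {
                it.compensation()
                log.add("compensated:${it.name}")
            }
            return false
        }
    }
    return true
}
```

Calling runSaga with three steps where the third one throws leaves the log with the two completed steps, the failure, and then the two compensations in reverse order. A real saga would also persist its progress so that it can resume after a crash, which this sketch leaves out.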

2. Case

We want to store all the news from a particular news feed in our database. Then we want to allow users to comment on it, and we want to control all the error flows we may encounter. We also want to shorten response times as much as possible, provide high availability and resilience, and make sure that users get their comments through as fast as possible.

3. Project layout

For this project, we are going to use the Eventuate framework combined with the Spring framework:

Generic Flow Diagram

In the diagram, we can see two main sections. The first is composed of the Fetcher and the Mock Feed. These two provide raw data to our Saga architecture. By default, I’ve set the Fetcher to run every minute for a maximum of 30 seconds or until it gets 100 news messages. This means that it will either run for the full 30 seconds and collect fewer than 100 messages, or complete earlier with a total of 100 messages. Since this is a mock feed and not an online social media platform or a real news feed, you will always get 100 messages in this example. It is not the goal of this article to demonstrate how this works, but it is important to know that behind all of this there are three running threads. One is responsible for checking a queue for incoming messages. The second is responsible for making requests to the mock news feed. Finally, the third one stops the whole process once 30 seconds have elapsed. The fetcher process will continue indefinitely, but once it has picked up our data for the first time, we can continue exploring sagas. That is the second part of the diagram. Our saga implementations will consume payloads like the one in this example:

{
  "idPage": 1,
  "pageComment": "I love this",
  "idAuthor": 2,
  "authorComment": "This is my favourite author",
  "idMessage": 3,
  "messageComment": "I agree",
  "authorRequestId": 123,
  "pageRequestId": 456,
  "messageRequestId": 789
}

What’s important to know about this payload are the idPage, idAuthor, and idMessage fields. These IDs are not defined as foreign keys, so comments sent to be attached to pages, authors, and messages should match existing data. If they do not, the comment will be registered as not available. There is a hierarchy, which is Page->Author->Message. For example, if the Author does not exist, the Message comment will not be recorded, but the Page and Author comments will, and they will be marked as not available. This sort of case does not really exist in reality, of course. It is just a made-up case to show how a Saga can be used to our benefit.
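
The hierarchy rule can be written down as a small function. This is only a sketch of how I read the rule above, not code from the project: the names CommentStatus, CommentOutcome, and resolveOutcome are hypothetical, and I am assuming that the first missing ID stops the walk, that every comment handled up to that point is marked as not available, and that the remaining ones are never recorded.

```kotlin
// Outcome of one comment in the hypothetical model below.
enum class CommentStatus { RECORDED, NOT_AVAILABLE, NOT_RECORDED }

data class CommentOutcome(
    val page: CommentStatus,
    val author: CommentStatus,
    val message: CommentStatus
)

// Walks the Page -> Author -> Message hierarchy. The first missing id stops the
// walk: every comment handled so far is marked NOT_AVAILABLE and the remaining
// ones are never recorded. This is an assumed reading of the rule in the text.
fun resolveOutcome(pageExists: Boolean, authorExists: Boolean, messageExists: Boolean): CommentOutcome {
    val exists = listOf(pageExists, authorExists, messageExists)
    val firstMissing = exists.indexOf(false)
    val statuses = exists.indices.map { level ->
        when {
            firstMissing == -1 -> CommentStatus.RECORDED
            level <= firstMissing -> CommentStatus.NOT_AVAILABLE
            else -> CommentStatus.NOT_RECORDED
        }
    }
    return CommentOutcome(statuses[0], statuses[1], statuses[2])
}
```

With a missing Author, resolveOutcome(true, false, true) gives NOT_AVAILABLE for the Page and Author comments and NOT_RECORDED for the Message comment, which matches the example in the paragraph above.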

4. Saga in practice

Let’s focus in detail on the goal of this article. We want to see how sagas can work for us. Sagas are also a way to decouple the client request from the actual processing. The client makes a POST request and the Saga makes sure that the request gets to the database. If, however, something fails, then the Saga must know how to perform rollbacks or other predictable actions.

Saga focused Flow Diagram

From the outside, we can see that the decoupling is provided via a streaming engine. In our case, we will use Kafka, but it is not mandatory: we can use whatever streaming mechanism we want. Please check the eventuate.io website to find out more about other compatible mechanisms. A better way to visualize what happens behind the curtain is to look at this sequence diagram.

Sequence Diagram

In this diagram, we can see that when we make a request, using either of the two Saga types described, our request goes to a database. It gets persisted, and the only way to continue is to ship it to a stream. In our example, it goes to a Kafka stream. The Eventuate team has created a CDC (Change Data Capture) service that does this. However, for the purposes of this example, it proved quite complicated to manage, which is why I created my own mocked CDC version. Essentially, it picks the data from a table called message and sends the non-published rows exactly as they are to Kafka. We’ll see later in this article how this works in detail. Once the CDC picks up the messages and ships them to Kafka, our Saga code picks them up in another thread and continues executing the Saga. At this point, our user has already received a 200 OK, meaning that the message is being handled. Finally, if we check the database for comments, we will see the results according to what was sent. Maybe we’ll see comments marked as not available, or maybe we’ll see comments completely and correctly handled.
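
The "persist first, publish later" flow described above can be sketched without a database or Kafka. The following is an in-memory illustration under my own assumptions, not the project's CDC code: OutboxMessage, InMemoryOutbox, and publishPending are hypothetical names, and the send function stands in for the Kafka producer.

```kotlin
// One row of a simplified outbox table. `published` follows the convention used
// in this article: 0 means not yet sent, 1 means sent.
data class OutboxMessage(
    val id: String,
    val destination: String,
    val payload: String,
    val published: Long = 0
)

// Hypothetical in-memory stand-in for the message table.
class InMemoryOutbox {
    private val rows = LinkedHashMap<String, OutboxMessage>()
    fun save(message: OutboxMessage) { rows[message.id] = message }
    fun findUnpublished(): List<OutboxMessage> = rows.values.filter { it.published == 0L }
}

// Publishes every unpublished row through `send` and marks it as published only
// after the send returned without throwing. Returns how many rows were sent.
fun publishPending(outbox: InMemoryOutbox, send: (destination: String, payload: String) -> Unit): Int {
    val pending = outbox.findUnpublished()
    pending.forEach {
        send(it.destination, it.payload)
        outbox.save(it.copy(published = 1))
    }
    return pending.size
}
```

Because a row is only marked as published after the send succeeds, a failure in between means the row is sent again on the next poll. That is the usual trade-off of this design: messages are delivered at least once, never zero times.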

4.1. Eventuate CDC Service

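The implementation of the CDC service is nothing more than an implementation of the Kafka client. For that, we create a KafkaProducerFactory:

import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.Producer
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.common.serialization.LongSerializer
import org.apache.kafka.common.serialization.StringSerializer
import java.util.Properties

class KafkaProducerFactory {
    companion object {
        // Creates a producer with Long keys and String values for the given brokers.
        fun createProducer(brokers: String): Producer<Long?, String?> {
            val props = Properties()
            props[ProducerConfig.BOOTSTRAP_SERVERS_CONFIG] = brokers
            props[ProducerConfig.CLIENT_ID_CONFIG] = CdcConstants.CLIENT_ID
            props[ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG] = LongSerializer::class.java.name
            props[ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG] = StringSerializer::class.java.name
            return KafkaProducer(props)
        }
    }
}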

The message content and shape really depend on what is being registered in the database. For the Eventuate Saga implementation, we first need to create a database. The scripts for this database are available from the Eventuate project in several locations. I’ve summed them all up here:

-- from:
-- https://github.com/eventuate-tram/eventuate-tram-sagas/blob/master/postgres/tram-saga-schema.sql
CREATE SCHEMA IF NOT EXISTS eventuate;
DROP Table IF Exists eventuate.saga_instance_participants;
DROP Table IF Exists eventuate.saga_instance;
DROP Table IF Exists eventuate.saga_lock_table;
DROP Table IF Exists eventuate.saga_stash_table;
drop table if exists eventuate.message;
drop table if exists eventuate.received_messages;
drop table if exists eventuate.cdc_monitoring;
CREATE TABLE eventuate.saga_instance_participants
(
    saga_type   VARCHAR(255) NOT NULL,
    saga_id     VARCHAR(100) NOT NULL,
    destination VARCHAR(100) NOT NULL,
    resource    VARCHAR(100) NOT NULL,
    PRIMARY KEY (saga_type, saga_id, destination, resource)
);
CREATE TABLE eventuate.saga_instance
(
    saga_type       VARCHAR(255)  NOT NULL,
    saga_id         VARCHAR(100)  NOT NULL,
    state_name      VARCHAR(100)  NOT NULL,
    last_request_id VARCHAR(100),
    end_state       BOOLEAN,
    compensating    BOOLEAN,
    saga_data_type  VARCHAR(1000) NOT NULL,
    saga_data_json  VARCHAR(1000) NOT NULL,
    PRIMARY KEY (saga_type, saga_id)
);
create table eventuate.saga_lock_table
(
    target    VARCHAR(100) PRIMARY KEY,
    saga_type VARCHAR(255) NOT NULL,
    saga_Id   VARCHAR(100) NOT NULL
);
create table eventuate.saga_stash_table
(
    message_id      VARCHAR(100) PRIMARY KEY,
    target          VARCHAR(100)  NOT NULL,
    saga_type       VARCHAR(255)  NOT NULL,
    saga_id         VARCHAR(100)  NOT NULL,
    message_headers VARCHAR(1000) NOT NULL,
    message_payload VARCHAR(1000) NOT NULL
);
-- from
-- https://github.com/eventuate-tram/eventuate-tram-core/blob/master/eventuate-tram-in-memory/src/main/resources/eventuate-tram-embedded-schema.sql
CREATE TABLE eventuate.message
(
    ID            VARCHAR(1000) PRIMARY KEY,
    DESTINATION   VARCHAR(1000) NOT NULL,
    HEADERS       VARCHAR(1000) NOT NULL,
    PAYLOAD       VARCHAR(1000) NOT NULL,
    CREATION_TIME BIGINT,
    PUBLISHED     BIGINT
);
CREATE TABLE eventuate.received_messages
(
    CONSUMER_ID   VARCHAR(1000),
    MESSAGE_ID    VARCHAR(1000),
    CREATION_TIME BIGINT,
    PRIMARY KEY (CONSUMER_ID, MESSAGE_ID)
);
create table eventuate.cdc_monitoring
(
    reader_id VARCHAR(1000) PRIMARY KEY,
    last_time BIGINT
);

If we take a good look at the message table, we can see all the important fields for the CDC payload to Kafka. We need the ID, which is used internally by the Eventuate framework, the destination, the headers, and the payload, and we use published to determine whether the message has already been sent to Kafka. I chose 0 for not sent and 1 for sent. This way we can easily create our Kafka client using the Apache Kafka libraries:

@SpringBootApplication
@EnableScheduling
open class CdcProcessLauncer(
    private val messageRepository: MessageRepository,
    @Value("\${org.jesperancinha.newscast.host.kafka.brokers}")
    private val brokers: String
) {
    private val producer = KafkaProducerFactory.createProducer(brokers)
    @Scheduled(cron = "0/5 * * ? * *")
    fun fetchAndPublish() {
        messageRepository.findAllByPublishedIs(0).forEach {
            val objectMapper = ObjectMapper()
            val headers = objectMapper.readTree(it.headers)
            val command = KafkaCommand(it.payload, headers)
            val commandPayload = objectMapper.writeValueAsString(command)
            val record = ProducerRecord<Long?, String?>(it.destination, commandPayload)
            producer.send(record).get()
            messageRepository.save(it.copy(published = 1))
            println("Sent: $commandPayload")
        }
    }
    companion object {
        @JvmStatic
        fun main(args: Array<String>) {
            SpringApplication.run(CdcProcessLauncer::class.java, *args)
        }
    }
}
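
One consequence of the code above is that a message can be sent to Kafka and then fail to be marked as published, in which case it is sent again on the next run. As far as I understand it, the received_messages table, with its (consumer_id, message_id) primary key, exists so that consumers can detect such duplicates. The sketch below shows that idea in memory only. ReceivedMessages and IdempotentConsumer are hypothetical names, and this is an assumption about the technique, not Eventuate's implementation.

```kotlin
// Hypothetical in-memory stand-in for the received_messages table, whose primary
// key is the pair (consumer_id, message_id).
class ReceivedMessages {
    private val seen = HashSet<Pair<String, String>>()

    // Returns true only the first time a consumer sees a given message id.
    fun markIfNew(consumerId: String, messageId: String): Boolean = seen.add(consumerId to messageId)
}

// Wraps a handler so that a redelivered message is skipped instead of processed again.
class IdempotentConsumer(
    private val consumerId: String,
    private val received: ReceivedMessages,
    private val handler: (payload: String) -> Unit
) {
    // Returns true when the handler ran, false when the message was a duplicate.
    fun onMessage(messageId: String, payload: String): Boolean {
        if (!received.markIfNew(consumerId, messageId)) return false
        handler(payload)
        return true
    }
}
```

In a real system the duplicate check and the handler's own writes would share one database transaction, so that a crash cannot record the message as seen without also applying its effect.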

4.2. Saga Choreography

A Saga choreography depends heavily on events and event handlers. There is usually no single place that defines the whole flow; each handler reacts to an event and decides which event to publish next:

Saga Choreography Domain Flow Diagram

In the diagram above, we see that we have different events that all wrap the same type, NewsCastComments:

data class NewsCastComments(
    val idPage: Long? = null,
    val pageComment: String? = null,
    val idAuthor: Long? = null,
    val authorComment: String? = null,
    val idMessage: Long? = null,
    val messageComment: String? = null,
    var authorRequestId: Long? = null,
    var pageRequestId:Long? = null,
    var messageRequestId:Long? = null
)

If we want our chain to act as a Saga, every event needs to share the same payload. Think of this as a recipe, where you create an ingredient and let it flow through the recipe. It never leaves the recipe; it can be modified, but it will be there until the end. The whole diagram above can be summarized in the following code:

class NewsCastEventConsumer(
    private val domainEventPublisher: DomainEventPublisher,
    private val newsCasePageCommentService: NewsCastPageCommentService,
    private val newsCastAuthorCommentService: NewsCastAuthorCommentService,
    private val newsCastMessageCommentService: NewsCastMessageCommentService,
    private val pageService: PageService,
    private val authorService: AuthorService,
    private val messageService: MessageService,
    ) {
    private val logger = KotlinLogging.logger {}
    fun domainEventHandlers(): DomainEventHandlers {
        return DomainEventHandlersBuilder
            .forAggregateType("org.jesperancinha.newscast.saga.data.NewsCastComments")
            .onEvent(NewsCastEvent::class.java, ::handleCreateNewsCastCommentEvent)
            .onEvent(NewsCastPageCommentEvent::class.java, ::handleCreatePageCommentEvent)
            .onEvent(NewsCastPageRejectCommentEvent::class.java, ::handleRejectPageCommentEvent)
            .onEvent(NewsCastAuthorCommentEvent::class.java, ::handleCreateAuthorCommentEvent)
            .onEvent(NewsCastAuthorRejectCommentEvent::class.java, ::handleRejectAuthorCommentEvent)
            .onEvent(NewsCastMessageCommentEvent::class.java, ::handleCreateMessageCommentEvent)
            .onEvent(NewsCastMessageRejectCommentEvent::class.java, ::handleRejectMessageCommentEvent)
            .onEvent(NewsCastDoneEvent::class.java, ::handleDone)
            .build()
    }
...
}
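
Since the handler bodies are elided above, here is a separate, self-contained sketch of how a choreographed chain behaves. It does not use Eventuate and does not reproduce the project's handlers: the event names and the ChoreographedComments class are hypothetical, the chain is reduced to two levels (Page and Author), and a simple loop stands in for the message broker.

```kotlin
// Hypothetical, simplified payload and events. Every event carries the same payload.
data class Comments(val idPage: Long, val idAuthor: Long)

sealed class SagaEvent { abstract val comments: Comments }
data class Started(override val comments: Comments) : SagaEvent()
data class PageCommented(override val comments: Comments) : SagaEvent()
data class AuthorCommented(override val comments: Comments) : SagaEvent()
data class AuthorRejected(override val comments: Comments) : SagaEvent()
data class PageRejected(override val comments: Comments) : SagaEvent()
data class Done(override val comments: Comments) : SagaEvent()

// Each handler decides on its own which event comes next; there is no central plan.
class ChoreographedComments(private val pages: Set<Long>, private val authors: Set<Long>) {
    val trace = mutableListOf<String>()

    // Returns the follow-up event, or null when the chain ends.
    private fun handle(event: SagaEvent): SagaEvent? = when (event) {
        is Started ->
            if (event.comments.idPage in pages) PageCommented(event.comments) else PageRejected(event.comments)
        is PageCommented ->
            if (event.comments.idAuthor in authors) AuthorCommented(event.comments) else AuthorRejected(event.comments)
        is AuthorCommented -> Done(event.comments)
        is AuthorRejected -> PageRejected(event.comments) // walk back to the earlier step
        is PageRejected -> Done(event.comments)
        is Done -> null
    }

    // Stands in for the message broker: delivers events until none is left.
    fun publish(first: SagaEvent) {
        var next: SagaEvent? = first
        while (next != null) {
            trace.add(next.javaClass.simpleName)
            next = handle(next)
        }
    }
}
```

Publishing Started for an unknown author produces the trace Started, PageCommented, AuthorRejected, PageRejected, Done. Notice that the rollback path is not declared anywhere as a whole; it only emerges from what the individual handlers return.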

4.3. Saga Orchestration

A Saga orchestration has a very different shape, although its goal is the same as that of a Saga choreography. In the previous case, all events and handlers have to be very well choreographed with each other, which is essentially where the name comes from: a handler needs to know which event to send in each circumstance. In a Saga orchestration, there is a plan forward and a plan backward, both declared in one place, so there is no need for a complicated way to define rollbacks.

Saga Orchestration Domain Flow Diagram

In the diagram above, we see that as we move forward in processing our data using different participants, we go through different handlers. These handlers are also triggered, but by commands instead of events. A command plays almost the same role as an event, except that it is sent to a specific destination, such as the pageChannel in the code below.

class CreateCommentSaga : SimpleSaga<NewsCastComments> {
    private val logger = KotlinLogging.logger {}
    private val sagaDefinition = this.step()
        .invokeLocal(this::startSaga)
        .step()
        .invokeParticipant(this::recordPageComment)
        .onReply(PageComment::class.java, this::savedPageComment)
        .withCompensation(this::rejectPageComment)
        .onReply(PageComment::class.java, this::rejectedPageComment)
        .step()
        .invokeParticipant(this::recordAuthorComment)
        .onReply(AuthorComment::class.java, this::savedAuthorComment)
        .withCompensation(this::rejectAuthorComment)
        .onReply(AuthorComment::class.java, this::rejectedAuthorComment)
        .step()
        .invokeParticipant(this::recordMessageComment)
        .onReply(MessageComment::class.java, this::savedMessageComment)
        .withCompensation(this::rejectMessageComment)
        .onReply(MessageComment::class.java, this::rejectedMessageComment)
        .step()
        .invokeLocal(this::done)
        .build()
    private fun startSaga(newsCastComments: NewsCastComments) = logger.info("Saga has started: $newsCastComments")
    private fun recordPageComment(newsCastComments: NewsCastComments): CommandWithDestination =
        send(NewsCastPageCommand(
            idPage = newsCastComments.idPage,
            requestId = newsCastComments.pageRequestId,
            comment = newsCastComments.pageComment
        )).to("pageChannel").build()
...
}
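
To show what "a plan forward and a plan backward" means at runtime, here is a small state machine in plain Kotlin. It is a sketch under my own assumptions and not how Eventuate implements SimpleSaga: CommentOrchestrator and Reply are hypothetical names, and the three properties loosely mirror the state, compensating, and end_state columns of the saga_instance table shown earlier.

```kotlin
// Reply a participant sends back to the orchestrator.
enum class Reply { SUCCESS, FAILURE }

// Hypothetical orchestrator state, loosely mirroring the columns of saga_instance:
// the current step, whether the saga is compensating, and whether it has ended.
// Assumes at least one step.
class CommentOrchestrator(private val steps: List<String>) {
    var current = 0
        private set
    var compensating = false
        private set
    var endState = false
        private set
    val commands = mutableListOf<String>()

    // Sends the first forward command.
    fun start() = send()

    // Advances on success, turns around on the first failure, and keeps walking
    // back while compensating. The failed step itself is not compensated.
    fun onReply(reply: Reply) {
        if (endState) return
        if (!compensating && reply == Reply.FAILURE) compensating = true
        current += if (compensating) -1 else 1
        if (current < 0 || current >= steps.size) endState = true else send()
    }

    private fun send() {
        commands.add((if (compensating) "reject:" else "record:") + steps[current])
    }
}
```

With the steps page, author, and message, two successful replies followed by a failure make the orchestrator send reject:author and then reject:page. Unlike the choreography, the order of both directions comes from a single list, which is the main point of the orchestration variant.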

5. Running the example

To run this example and test how everything works, please run:

make docker-clean-build-start

This command will take a while: it builds the whole project, prepares the binaries for the Docker images, and starts all the necessary containers. For quick reference, this is the docker-compose file used:

networks:
  newscast:
services:
  news_cast_postgres:
    hostname: news_cast_postgres
    container_name: news_cast_postgres
    command: -c 'max_connections=400' -c 'shared_buffers=100MB'
    build:
      context: ./docker-files/docker-psql/.
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=admin
      - POSTGRES_MULTIPLE_DATABASES=ncexplorer,eventuate
    networks:
      - newscast
    deploy:
      resources:
        limits:
          memory: 200M
        reservations:
          memory: 200M
    healthcheck:
      test: [ "CMD", "pg_isready", "-U", "postgres" ]
      interval: 30s
      timeout: 30s
      retries: 10
      start_period: 0s
  news_cast_kafka:
    hostname: news_cast_kafka
    container_name: news_cast_kafka
    build:
      context: ./docker-files/kafka/.
    deploy:
      resources:
        limits:
          memory: 1000M
        reservations:
          memory: 1000M
    networks:
      - newscast
    depends_on:
      news_cast_postgres:
        condition: service_healthy
  news_cast_mock:
    hostname: news_cast_mock
    container_name: news_cast_mock
    build:
      context: news-cast-mock/.
    restart: on-failure
    networks:
      - newscast
    deploy:
      resources:
        limits:
          memory: 400M
        reservations:
          memory: 400M
    depends_on:
      news_cast_postgres:
        condition: service_healthy
  news_cast_cdc:
    hostname: news_cast_cdc
    container_name: news_cast_cdc
    build:
      context: news-cast-explorer-cdc/.
    restart: on-failure
    deploy:
      resources:
        limits:
          memory: 300M
        reservations:
          memory: 300M
    networks:
      - newscast
    depends_on:
      news_cast_postgres:
        condition: service_healthy
  news_cast_fetcher:
    hostname: news_cast_fetcher
    container_name: news_cast_fetcher
    build:
      context: news-cast-explorer-fetcher/.
    deploy:
      resources:
        limits:
          memory: 200M
        reservations:
          memory: 200M
    networks:
      - newscast
    depends_on:
      news_cast_postgres:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "--silent", "http://127.0.0.1:8080/api/newscast/fetcher/actuator"]
      interval: 5s
      timeout: 240s
      retries: 60
  news_cast_choreography:
    hostname: news_cast_choreography
    container_name: news_cast_choreography
    build:
      context: news-cast-explorer-saga-choreography/.
    restart: on-failure
    deploy:
      resources:
        limits:
          memory: 300M
        reservations:
          memory: 300M
    networks:
      - newscast
    depends_on:
      news_cast_postgres:
        condition: service_healthy
  news_cast_orchestration:
    hostname: news_cast_orchestration
    container_name: news_cast_orchestration
    build:
      context: news-cast-explorer-saga-orchestration/.
    restart: on-failure
    deploy:
      resources:
        limits:
          memory: 300M
        reservations:
          memory: 300M
    networks:
      - newscast
    depends_on:
      news_cast_postgres:
        condition: service_healthy
  news_cast_fe:
    hostname: news_cast_fe
    container_name: news_cast_fe
    build:
      context: docker-files/nginx/.
    restart: on-failure
    deploy:
      resources:
        limits:
          memory: 300M
        reservations:
          memory: 300M
    networks:
      - newscast
    depends_on:
      news_cast_fetcher:
        condition: service_healthy

Once everything has started, please go to http://localhost:9000. You will find a page like this:

Main Page

Once you’ve done this, you can test the different Saga types:

  • Orchestration, port 8082:
curl -X POST http://localhost:8082/api/saga/orchestration -H 'Content-Type: application/json' --data '{ "idPage": 1, "pageComment": "I love this", "idAuthor": 2, "authorComment": "This is my favourite author", "idMessage": 3, "messageComment": "I agree", "authorRequestId":123,"pageRequestId":456,"messageRequestId":789 }'
  • Choreography, port 8083:
curl -X POST http://localhost:8083/api/saga/choreography -H 'Content-Type: application/json' --data '{ "idPage": 1, "pageComment": "I love this", "idAuthor": 2, "authorComment": "This is my favourite author", "idMessage": 3, "messageComment": "I agree", "authorRequestId":123,"pageRequestId":456,"messageRequestId":789 }'

Connect to your PostgreSQL server on port 5432, open the eventuate database, and check the eventuate and public schemas for changes in the tables. Namely, we want to look at the tables message, saga_instance, and received_messages in the eventuate schema and all the comment tables in the public schema. Let’s try different ID combinations and see what happens. Just to provide an example, I will now send a request that I know will make the whole Saga fail:

curl -X POST http://localhost:8082/api/saga/orchestration -H 'Content-Type: application/json' --data '{ "idPage": 1, "pageComment": "I love this", "idAuthor": 2, "authorComment": "This is my favourite author", "idMessage": 999999, "messageComment": "I agree", "authorRequestId":200,"pageRequestId":500,"messageRequestId":800 }'

The reason for this is that I’m sending a comment for a message with ID 999999. With the current ID generation system, it would take an extended amount of time to get to this ID, so I’m very sure that we have no message with it. For Orchestration we’ll get:

Orchestration Result

And finally, we can see what happens to the comments tables when the flow is correct and when it’s not:

Comments Table Result

Received Messages Table Result

Page comments Table

Author comments Table

Message comments Table

We can also do the same for the choreography tables. The results are very similar, so I leave that for you to try.

6. Conclusion

As we have seen in this example, Sagas are a great way to manage transactions. They work in a decoupled fashion, and their sub-processes follow A.C.I.D. principles. They provide a solution for the latency and degrading performance of LLTs. We have seen advantages in both variants. For Choreography, there is no central workflow, which in turn may reduce performance overhead. There is also no need for an extra framework to support it. It is an event-driven form of implementation, which is a well-known way to achieve loose coupling between the different system elements. For Orchestration, we can keep process complexity under control because it is command-driven and not event-driven. In practical terms, this means that orchestration follows a built-in error-handling workflow. It gives better visibility of what is being done. Because it forces us to follow a particular standard, which is already tested and proven, it also prevents too many custom variations in the code, which are usually error-prone.

Thank you!

I hope you enjoyed this article as much as I did making it!
Please leave a review, comments, or any feedback you want to give on any of the socials in the links below.
I’m very grateful if you want to help me make this article better.
I have placed all the source code of this application on GitHub.
Thank you for reading!