# db-scheduler

[![build status](https://github.com/kagkarlsson/db-scheduler/actions/workflows/ci.yml/badge.svg)](https://github.com/kagkarlsson/db-scheduler/actions/workflows/ci.yml)
[![Maven Central](https://img.shields.io/maven-central/v/com.github.kagkarlsson/db-scheduler)](https://central.sonatype.com/artifact/com.github.kagkarlsson/db-scheduler)
[![codecov](https://codecov.io/gh/kagkarlsson/db-scheduler/graph/badge.svg)](https://codecov.io/gh/kagkarlsson/db-scheduler)
[![License](http://img.shields.io/:license-apache-brightgreen.svg)](http://www.apache.org/licenses/LICENSE-2.0.html)

Task-scheduler for Java that was inspired by the need for a clustered `java.util.concurrent.ScheduledExecutorService` simpler than Quartz.

As such, also appreciated by users ([cbarbosa2](https://github.com/kagkarlsson/db-scheduler/issues/115#issuecomment-649601944), [rafaelhofmann](https://github.com/kagkarlsson/db-scheduler/issues/140#issuecomment-704955500), [BukhariH](https://github.com/kagkarlsson/db-scheduler/pull/268#issue-1147378003)):

> Your lib rocks! I'm so glad I got rid of Quartz and replaced it by yours which is way easier to handle!
>
> [cbarbosa2](https://github.com/cbarbosa2)

Used in production by Digipost, Wise, TOMRA and [others](#who-uses-db-scheduler).

## Features

* **Cluster-friendly**. Guarantees execution by single scheduler instance.
* **Persistent** tasks. Requires a _single_ database-table for persistence.
* **Embeddable**. Built to be embedded in existing applications.
* **High throughput**. Tested to handle 2k - 10k executions / second. [Link](docs/performance.md).
* **Simple**.
* **Minimal dependencies**. (slf4j)

## Table of contents

[Requirements](#requirements) · [Getting started](#getting-started) · [Examples](#examples) · [Databases](#database-compatibility) · [Configuration](#configuration) · [Extensions](#third-party-extensions) · [Spring Boot](#spring-boot-usage) · [SchedulerClient](#interacting-with-scheduled-executions-using-the-schedulerclient) · [How it works](#how-it-works) · [Performance](docs/performance.md) · [Things to note](#things-to-note--gotchas) · [Upgrading](#versions--upgrading) · [Building](#building-the-source) · [Who uses it](#who-uses-db-scheduler) · [FAQ](#faq)

## Requirements

* **Java 17+** (since v16.x).
* A relational database and a single `scheduled_tasks` table. See [Database compatibility](#database-compatibility) for supported database engines and per-engine feature support.

## Getting started

1. Add maven dependency

   ```xml
   <dependency>
       <groupId>com.github.kagkarlsson</groupId>
       <artifactId>db-scheduler</artifactId>
       <version>16.11.0</version>
   </dependency>
   ```

   _Replace the version with the latest, shown in the Maven Central badge above._

2. Create the `scheduled_tasks` table in your database-schema. See [Database compatibility](#database-compatibility) for a list of supported databases and the table definitions (DDL).

3. Instantiate and start the scheduler, which then will start any defined recurring tasks.

   ```java
   RecurringTask<Void> hourlyTask = Tasks.recurring("my-hourly-task", FixedDelay.ofHours(1))
     .execute((inst, ctx) -> {
       System.out.println("Executed!");
     });

   final Scheduler scheduler = Scheduler
     .create(dataSource)
     .startTasks(hourlyTask)
     .build();

   // hourlyTask is automatically scheduled on startup if not already started (i.e. exists in the db)
   scheduler.start();
   ```

For more examples, continue reading. For details on the inner workings, see [How it works](#how-it-works). If you have a Spring Boot application, have a look at [Spring Boot Usage](#spring-boot-usage).

## Examples

See also [runnable examples](https://github.com/kagkarlsson/db-scheduler/tree/master/examples/features/src/main/java/com/github/kagkarlsson/examples).

### Recurring task (_static_)

Define a _recurring_ task and schedule the task's first execution on start-up using the `startTasks` builder-method. Upon completion, the task will be re-scheduled according to the defined schedule (see [pre-defined schedule-types](#schedules)).

```java
RecurringTask<Void> hourlyTask = Tasks.recurring("my-hourly-task", FixedDelay.ofHours(1))
  .execute((inst, ctx) -> {
    System.out.println("Executed!");
  });

final Scheduler scheduler = Scheduler
  .create(dataSource)
  .startTasks(hourlyTask)
  .threads(5)
  .registerShutdownHook()
  .build();

// hourlyTask is automatically scheduled on startup if not already started (i.e. exists in the db)
scheduler.start();
```

For recurring tasks with multiple instances and schedules, see example [RecurringTaskWithPersistentScheduleMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/RecurringTaskWithPersistentScheduleMain.java).

### One-time task

An instance of a _one-time_ task has a single execution-time some time in the future (i.e. non-recurring). The instance-id must be unique within this task, and may be used to encode some metadata (e.g. an id). For more complex state, custom serializable java objects are supported (as used in the example).

Define a _one-time_ task and start the scheduler:

```java
TaskDescriptor<MyTaskData> MY_TASK = TaskDescriptor.of("my-onetime-task", MyTaskData.class);

OneTimeTask<MyTaskData> myTaskImplementation = Tasks.oneTime(MY_TASK)
  .execute((inst, ctx) -> {
    System.out.println("Executed! Custom data, Id: " + inst.getData().id);
  });

final Scheduler scheduler = Scheduler
  .create(dataSource, myTaskImplementation)
  .registerShutdownHook()
  .build();

scheduler.start();
```

... and then at some point (at runtime), an execution is scheduled using the `SchedulerClient`:

```java
// Schedule the task for execution a certain time in the future and optionally provide custom data for the execution
scheduler.schedule(
  MY_TASK
    .instanceWithId("1045")
    .data(new MyTaskData(1001L))
    .scheduledTo(Instant.now().plusSeconds(5)));
```

... or schedule in batches using:

```java
Stream<TaskInstance<?>> taskInstances = Stream.of(
  MY_TASK.instance("my-task-1", 1),
  MY_TASK.instance("my-task-2", 2),
  MY_TASK.instance("my-task-3", 3));

scheduler.scheduleBatch(taskInstances, Instant.now());
```

### More examples

#### Plain Java

| Example | Description |
|---------|-------------|
| [EnableImmediateExecutionMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/EnableImmediateExecutionMain.java) | When scheduling executions to run `now()` or earlier, the local `Scheduler` will be hinted about this, and "wake up" to go check for new executions earlier than it normally would (as configured by `pollingInterval`). |
| [MaxRetriesMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/MaxRetriesMain.java) | How to set a limit on the number of retries an execution can have. |
| [ExponentialBackoffMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/ExponentialBackoffMain.java) | How to use exponential backoff as retry strategy instead of fixed delay as is default. |
| [ExponentialBackoffWithMaxRetriesMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/ExponentialBackoffWithMaxRetriesMain.java) | How to use exponential backoff as retry strategy **and** a hard limit on the maximum number of retries. |
| [TrackingProgressRecurringTaskMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/TrackingProgressRecurringTaskMain.java) | Recurring jobs may store `task_data` as a way of persisting state across executions. This example shows how. |
| [SpawningOtherTasksMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/SpawningOtherTasksMain.java) | Demonstrates one task scheduling instances of another by using the `executionContext.getSchedulerClient()`. |
| [SchedulerClientMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/SchedulerClientMain.java) | Demonstrates some of the `SchedulerClient`'s capabilities. Scheduling, fetching scheduled executions etc. |
| [RecurringTaskWithPersistentScheduleMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/RecurringTaskWithPersistentScheduleMain.java) | Multi-instance recurring jobs where the `Schedule` is stored as part of the `task_data`. For example suitable for multi-tenant applications where each tenant should have a recurring task. |
| [StatefulRecurringTaskWithPersistentScheduleMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/StatefulRecurringTaskWithPersistentScheduleMain.java) | Combines a dynamic recurring task (schedule persisted in `task_data`) with **stateful** execution: the returned data is updated and persisted after each run. |
| [JsonSerializerMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/JsonSerializerMain.java) | Overrides serialization of `task_data` from Java-serialization (default) to JSON. |
| [JobChainingUsingTaskDataMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/JobChainingUsingTaskDataMain.java) | Job chaining, i.e. "when this instance is done executing, schedule another task". |
| [JobChainingUsingSeparateTasksMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/JobChainingUsingSeparateTasksMain.java) | Job chaining, as above. |
| [InterceptorMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/InterceptorMain.java) | Using `ExecutionInterceptor` to inject logic before and after execution for all `ExecutionHandler`. |

#### Spring Boot

| Example | Description |
|---------|-------------|
| [BasicExamples](./examples/spring-boot-example/src/main/java/com/github/kagkarlsson/examples/boot/config/BasicExamplesConfiguration.java) | A basic one-time task and recurring task |
| [TransactionallyStagedJob](./examples/spring-boot-example/src/main/java/com/github/kagkarlsson/examples/boot/config/TransactionallyStagedJobConfiguration.java) | Example of [transactionally staging a job](https://brandur.org/job-drain), i.e. making sure the background job runs **iff** the transaction commits (along with other db-modifications). |
| [LongRunningJob](./examples/spring-boot-example/src/main/java/com/github/kagkarlsson/examples/boot/config/LongRunningJobConfiguration.java) | Long-running jobs need to **survive application restarts** and avoid restarting from the beginning. This example demonstrates how to **persist progress** on shutdown and additionally a technique for limiting the job to run nightly. |
| [RecurringStateTracking](./examples/spring-boot-example/src/main/java/com/github/kagkarlsson/examples/boot/config/RecurringStateTrackingConfiguration.java) | A recurring task with state that can be modified after each run. |
| [ParallelJobSpawner](./examples/spring-boot-example/src/main/java/com/github/kagkarlsson/examples/boot/config/ParallellJobConfiguration.java) | Demonstrates how to use a recurring job to spawn one-time jobs, e.g. for parallelization. |
| [JobChaining](./examples/spring-boot-example/src/main/java/com/github/kagkarlsson/examples/boot/config/JobChainingConfiguration.java) | A one-time job with **multiple steps**. The next step is scheduled after the previous one completes. |
| [MultiInstanceRecurring](./examples/spring-boot-example/src/main/java/com/github/kagkarlsson/examples/boot/config/MultiInstanceRecurringConfiguration.java) | Demonstrates how to achieve **multiple recurring jobs** of the same type, but potentially differing schedules and data. |

## Database compatibility

| Database (see link for DDL) | `fetch` | `lock-and-fetch` | Notes |
|-----------------------------|:-------:|:----------------:|-------|
| [PostgreSQL](db-scheduler/src/test/resources/postgresql_tables.sql) | ✅ | ✅ (single-statement) | |
| [SQL Server](db-scheduler/src/test/resources/mssql_tables.sql) | ✅ | ✅ (generic) | Always transfers timestamps in UTC. |
| [MySQL 8+](db-scheduler/src/test/resources/mysql_tables.sql) | ✅ | ✅ (generic) | Requires `.alwaysPersistTimestampInUTC()`. |
| [MariaDB](db-scheduler/src/test/resources/mariadb_tables.sql) | ✅ | — | Requires `.alwaysPersistTimestampInUTC()`. |
| [Oracle](db-scheduler/src/test/resources/oracle_tables.sql) | ✅ | — | Prefer default schema which uses `TIMESTAMPTZ`. |
| [MySQL 5.x](db-scheduler/src/test/resources/mysql_tables.sql) | ✅ | — | |
| [SQLite](db-scheduler/src/test/resources/sqlite_tables.sql) | ✅ | — | |
| [HSQLDB](db-scheduler/src/test/resources/hsql_tables.sql) | ✅ | — | Typically used for testing / in-memory. |

See [Polling strategy](#polling-strategy) for `fetch` vs `lock-and-fetch`, and [`.alwaysPersistTimestampInUTC()`](#consider-tuning) for timestamp-handling details.

**Other databases:** db-scheduler may still work on engines not listed here. You need to (1) create the `scheduled_tasks` table using DDL compatible with your engine, and (2) if any jdbc-customization is required, supply a custom [`JdbcCustomization`](#less-commonly-tuned) via `.jdbcCustomization(...)`. The default fallback assumes timestamps can be read and written via `get/setObject(OffsetDateTime)`; if not, also enable `.alwaysPersistTimestampInUTC()`.

## Configuration

### Scheduler configuration

The scheduler is created using the `Scheduler.create(...)` builder. The builder has sensible defaults, but the following options are configurable.

#### Consider tuning

:gear: `.threads(int)`
Number of threads. Default `10`.

:gear: `.pollingInterval(Duration)`
How often the scheduler checks the database for due executions. Default `10s`.

:gear: `.alwaysPersistTimestampInUTC()`
Always transfer timestamps using UTC zone. By default the Scheduler assumes that the underlying database-schema stores instants, i.e. somehow ties timestamps to zones. However, some databases have limited support for this or other quirks, requiring overriding how timestamps are transferred and stored. For such cases, use this setting to always transfer, store and retrieve Instants in UTC.

* **SQL Server** always persists in UTC regardless of this setting.
* **MySQL** and **MariaDB** use a zone-less `TIMESTAMP` type and must enable this setting. Upgrading an existing installation requires a controlled migration: stop all scheduler instances, migrate existing timestamps to UTC, then restart with `.alwaysPersistTimestampInUTC()` set.
* **Oracle** default schema uses a `TIMESTAMPTZ` type which preserves timezone; only use this override if for some reason using plain `TIMESTAMP` types.

**NB:** The default behavior for "unknown" databases is to assume that timestamps can be read and written reliably using `get/setObject(OffsetDateTime)`. For "known" databases, see the class `AutodetectJdbcCustomization`.

:gear: `.enableImmediateExecution()`
If this is enabled, the scheduler will attempt to hint to the local `Scheduler` that there are executions to be executed after they are scheduled to run `now()`, or a time in the past. **NB:** If the call to `schedule(..)`/`reschedule(..)` occurs from within a transaction, the scheduler might attempt to run it before the update is visible (transaction has not committed). It is still persisted though, so even if it is a miss, it will run before the next `polling-interval`. You may also programmatically trigger an early check for due executions using the Scheduler-method `scheduler.triggerCheckForDueExecutions()`. Default `false`.

:gear: `.registerShutdownHook()`
Registers a shutdown-hook that will call `Scheduler.stop()` on shutdown. Stop should always be called for a graceful shutdown and to avoid dead executions.

:gear: `.shutdownMaxWait(Duration)`
How long the scheduler will wait before interrupting executor-service threads. If you find yourself using this, consider if it is possible to instead regularly check `executionContext.getSchedulerState().isShuttingDown()` in the ExecutionHandler and abort the long-running task. Default `30min`.

:gear: `.enablePriority()`
It is possible to define a priority for executions which determines the order in which due executions are fetched from the database. An execution with a higher value for priority will run before an execution with a lower value (technically, the ordering will be `order by priority desc, execution_time asc`). Consider using priorities in the range 0-32000 as the field is defined as a `SMALLINT`. If you need a larger value, modify the schema. For now, this feature is **opt-in**, and column `priority` is only needed by users who choose to enable priority via this config setting.

Set the priority per instance using the `TaskInstance.Builder`:

```java
scheduler.schedule(
  MY_TASK
    .instance("1")
    .priority(100)
    .scheduledTo(Instant.now()));
```

You can also set the default priority for all tasks of a given type:

```java
Tasks.recurring("my-task", FixedDelay.ofSeconds(5))
  .defaultPriority(Priority.LOW)
  .execute(...);
```

**Note:**

* When enabling this feature, make sure you have the new necessary indexes defined. If you regularly have a state with large amounts of executions both due and future, it might be beneficial to add an index on `(execution_time asc, priority desc)` (replacing the old `execution_time asc`).
* This feature is not recommended for users of **MySQL** and **MariaDB** below version 8.x, as they do not support descending indexes.
* Value `null` for priority may be interpreted differently depending on database (low or high).

#### Polling strategy

If you are running >1000 executions/s you might want to use the `lock-and-fetch` polling-strategy for lower overhead and higher throughput ([read more](docs/performance.md#polling-strategy-lock-and-fetch)). If not, the default `fetch` will be fine.

:gear: `.pollUsingFetch(double, double)`
Use default polling strategy `fetch`. If the last fetch from the database was a full batch (`executionsPerBatchFractionOfThreads`), a new fetch will be triggered when the number of executions left is less than or equal to `lowerLimitFractionOfThreads * nr-of-threads`. Fetched executions are not locked/picked, so the scheduler will compete with other instances for the lock when it is executed. Supported by all databases. Defaults: `0.5, 3.0`

:gear: `.pollUsingLockAndFetch(double, double)`
Use polling strategy `lock-and-fetch` which uses `select for update .. skip locked` for less overhead. If the last fetch from the database was a full batch, a new fetch will be triggered when the number of executions left is less than or equal to `lowerLimitFractionOfThreads * nr-of-threads`. The number of executions fetched each time is equal to `(upperLimitFractionOfThreads * nr-of-threads) - nr-executions-left`. Fetched executions are already locked/picked for this scheduler-instance thus saving one `UPDATE` statement. For normal usage, set to for example `0.5, 1.0`. For high throughput (i.e. keep threads busy), set to for example `1.0, 4.0`. Currently heartbeats are not updated for picked executions in queue (applicable if `upperLimitFractionOfThreads > 1.0`). If they stay there for more than `4 * heartbeat-interval` (default `20m`), not starting execution, they will be detected as _dead_ and likely be unlocked again (determined by `DeadExecutionHandler`). Currently supported by PostgreSQL, SQL Server, MySQL v8+.

#### Less commonly tuned

:gear: `.heartbeatInterval(Duration)`
How often to update the heartbeat timestamp for running executions. Default `5m`.

:gear: `.missedHeartbeatsLimit(int)`
How many heartbeats may be missed before the execution is considered dead. Default `6`.

:gear: `.addExecutionInterceptor(ExecutionInterceptor)`
Adds an `ExecutionInterceptor` which may inject logic around executions. For Spring Boot, simply register a Bean of type `ExecutionInterceptor`.

:gear: `.addSchedulerListener(SchedulerListener)`
Adds a `SchedulerListener` which will receive Scheduler- and Execution-related events. For Spring Boot, simply register a Bean of type `SchedulerListener`.

:gear: `.schedulerName(SchedulerName)`
Name of this scheduler-instance. The name is stored in the database when an execution is picked by a scheduler. Default `<hostname>`.

:gear: `.tableName(String)`
Name of the table used to track task-executions. Change name in the table definitions accordingly when creating the table. Default `scheduled_tasks`.

:gear: `.serializer(Serializer)`
Serializer implementation to use when serializing task data. Defaults to standard Java serialization, but db-scheduler also bundles a `GsonSerializer` and `JacksonSerializer`. See examples for a [KotlinSerializer](https://github.com/kagkarlsson/db-scheduler/blob/master/examples/features/src/main/java/com/github/kagkarlsson/examples/kotlin/KotlinSerializer.kt). See also additional documentation under [Serializers](#serializers).

:gear: `.executorService(ExecutorService)`
If specified, use this externally managed executor service to run executions. Ideally, the number of threads it will use should still be supplied (for scheduler polling optimizations). Default `null`.

:gear: `.deleteUnresolvedAfter(Duration)`
The time after which executions with unresolved tasks are automatically deleted. These can typically be old recurring tasks that are not in use anymore. This is non-zero to prevent accidental removal of tasks through a configuration error (missing known-tasks) and problems during rolling upgrades. Default `14d`.

:gear: `.jdbcCustomization(JdbcCustomization)`
db-scheduler tries to auto-detect the database used to see if any jdbc-interactions need to be customized. This method is an escape-hatch to allow for setting `JdbcCustomizations` explicitly. Default auto-detect.

:gear: `.commitWhenAutocommitDisabled(boolean)`
By default no commit is issued on DataSource Connections. If auto-commit is disabled, it is assumed that transactions are handled by an external transaction-manager. Set this property to `true` to override this behavior and have the Scheduler always issue commits. Default `false`.

:gear: `.failureLogging(Level, boolean)`
Configures how to log task failures, i.e. `Throwable`s thrown from a task execution handler. Use log level `OFF` to disable this kind of logging completely. Default `WARN, true`.

### Task configuration

Tasks are created using one of the builder-classes in `Tasks`. The builders have sensible defaults, but the following options can be overridden.

| Option | Default | Description |
|--------|---------|-------------|
| `.onFailure(FailureHandler)` | see desc. | What to do when an `ExecutionHandler` throws an exception. By default, _Recurring tasks_ are rescheduled according to their `Schedule`. _One-time tasks_ are retried again in 5m. |
| `.onDeadExecution(DeadExecutionHandler)` | `ReviveDeadExecution` | What to do when a _dead execution_ is detected, i.e. an execution with a stale heartbeat timestamp. By default dead executions are rescheduled to `now()`. |
| `.initialData(T initialData)` | `null` | The data to use the first time a _recurring task_ is scheduled. |

### Schedules

The library contains a number of Schedule-implementations for recurring tasks. See class `Schedules`.

| Schedule | Description |
|----------|-------------|
| `.daily(LocalTime ...)` | Runs every day at specified times. Optionally a time zone can be specified. |
| `.fixedDelay(Duration)` | Next execution-time is `Duration` after last completed execution. **Note:** This `Schedule` schedules the initial execution to `Instant.now()` when used in `startTasks(...)` |
| `.cron(String)` | Spring-style cron-expression (v5.3+). The pattern `-` is interpreted as a [disabled schedule](#disabled-schedules). |

Another option to configure schedules is reading string patterns with `Schedules.parse(String)`.

The currently available patterns are:

| Pattern | Description |
|---------|-------------|
| `FIXED_DELAY\|Ns` | Same as `.fixedDelay(Duration)` with duration set to N seconds. |
| `DAILY\|12:30,15:30...(\|time_zone)` | Same as `.daily(LocalTime)` with optional time zone (e.g. Europe/Rome, UTC) |
| `-` | [Disabled schedule](#disabled-schedules) |

More details on the time zone formats can be found [here](https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#of-java.lang.String-).

### Disabled schedules

A `Schedule` can be marked as disabled. The scheduler will not schedule the initial executions for tasks with a disabled schedule, and it will remove any existing executions for that task.

### Serializers

A task-instance may have some associated data in the field `task_data`. The scheduler uses a `Serializer` to read and write this data to the database. By default, standard Java serialization is used, but a number of options are provided:

* `GsonSerializer`
* `JacksonSerializer`
* [KotlinSerializer](https://github.com/kagkarlsson/db-scheduler/blob/master/examples/features/src/main/java/com/github/kagkarlsson/examples/kotlin/KotlinSerializer.kt)

For Java serialization it is recommended to specify a `serialVersionUID` to be able to evolve the class representing the data. If not specified, and the class changes, deserialization will likely fail with an `InvalidClassException`. Should this happen, find and set the current auto-generated `serialVersionUID` explicitly. It will then be possible to do non-breaking changes to the class.

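
The `serialVersionUID` advice can be tried out with the JDK alone. The following is a minimal sketch, not db-scheduler code: the classes `PinnedTaskData` and `UnpinnedTaskData` and the helper methods are hypothetical names introduced for illustration, and the round-trip uses plain `java.io` serialization rather than the scheduler's own `Serializer`. It shows a pinned `serialVersionUID`, and one way to look up the value the JVM currently computes for a class that does not declare one (via `ObjectStreamClass`), so that it can be set explicitly afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerialVersionUidSketch {

  // Hypothetical task data with a pinned serialVersionUID, so compatible
  // changes (e.g. adding a field) do not break deserialization of old rows.
  public static class PinnedTaskData implements Serializable {
    private static final long serialVersionUID = 1L;
    public final long id;

    public PinnedTaskData(long id) {
      this.id = id;
    }
  }

  // Hypothetical task data without an explicit serialVersionUID:
  // the JVM derives one from the class structure.
  public static class UnpinnedTaskData implements Serializable {
    public final long id;

    public UnpinnedTaskData(long id) {
      this.id = id;
    }
  }

  // Returns the serialVersionUID in effect for the class (declared or computed).
  public static long currentSerialVersionUid(Class<? extends Serializable> type) {
    return ObjectStreamClass.lookup(type).getSerialVersionUID();
  }

  // Serializes and deserializes a value using standard Java serialization.
  public static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The declared value of the pinned class.
    System.out.println(currentSerialVersionUid(PinnedTaskData.class));
    // The computed value; copy it into the class to pin it before changing the class.
    System.out.println(currentSerialVersionUid(UnpinnedTaskData.class));
    // Round-trip keeps the data intact.
    System.out.println(roundTrip(new PinnedTaskData(1001L)).id);
  }
}
```

The computed value must be read from the class version that wrote the existing rows, since it is derived from the class structure. The JDK's `serialver` command-line tool reports the same value.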
If you need to migrate from Java serialization to a `GsonSerializer`, configure the scheduler to use a `SerializerWithFallbackDeserializers`:

```java
.serializer(new SerializerWithFallbackDeserializers(new GsonSerializer(), new JavaSerializer()))
```

## Third-party extensions

* [bekk/db-scheduler-ui](https://github.com/bekk/db-scheduler-ui) is an admin-ui for the scheduler. It shows scheduled executions and supplies simple admin-operations such as "rerun failed execution now" and "delete execution".
* [rocketbase-io/db-scheduler-log](https://github.com/rocketbase-io/db-scheduler-log) is an extension providing a history of executions, including failures and exceptions.
* [piemjean/db-scheduler-mongo](https://github.com/piemjean/db-scheduler-mongo) is an extension for running db-scheduler with a MongoDB database.
* [osoykan/db-scheduler-additions](https://github.com/osoykan/db-scheduler-additions) adds MongoDB & Couchbase support on top of Kotlin and Coroutines. It also provides a Ktor plugin for db-scheduler-ui.

## Spring Boot usage

For Spring Boot applications, there is a starter `db-scheduler-spring-boot-starter` making the scheduler-wiring very simple. (See [full example project](https://github.com/kagkarlsson/db-scheduler/tree/master/examples/spring-boot-example)).

### Prerequisites

- An existing Spring Boot application
- A working `DataSource` with schema initialized. (In the example HSQLDB is used and schema is automatically applied.)

### Getting started

1. Add the following Maven dependency

   ```xml
   <dependency>
       <groupId>com.github.kagkarlsson</groupId>
       <artifactId>db-scheduler-spring-boot-4-starter</artifactId>
       <version>16.11.0</version>
   </dependency>
   ```

   **NB:** For Spring Boot 3.x, use `db-scheduler-spring-boot-starter`

2. In your configuration, expose your `Task`s as Spring beans. If they are recurring, they will automatically be picked up and started.
3. If you want to expose `Scheduler` state into actuator health information you need to enable `db-scheduler` health indicator. [Spring Health Information.](https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html#production-ready-health)
4. Run the app.

### Configuration options

Configuration is mainly done via `application.properties`. Configuration of scheduler-name, serializer and executor-service is done by adding a bean of type `DbSchedulerCustomizer` to your Spring context.

```
# application.properties example showing default values
db-scheduler.enabled=true
db-scheduler.heartbeat-interval=5m
db-scheduler.missed-heartbeats-limit=6
db-scheduler.polling-interval=10s
db-scheduler.table-name=scheduled_tasks
db-scheduler.immediate-execution-enabled=false
db-scheduler.scheduler-name=
db-scheduler.threads=10
db-scheduler.priority-enabled=false

# Ignored if a custom DbSchedulerStarter bean is defined
db-scheduler.delay-startup-until-context-ready=false

db-scheduler.polling-strategy=fetch
db-scheduler.polling-strategy-lower-limit-fraction-of-threads=0.5
db-scheduler.polling-strategy-upper-limit-fraction-of-threads=3.0

db-scheduler.shutdown-max-wait=30m
```

## Interacting with scheduled executions using the SchedulerClient

It is possible to use the `Scheduler` to interact with the persisted future executions. For situations where a full `Scheduler`-instance is not needed, a simpler [SchedulerClient](./db-scheduler/src/main/java/com/github/kagkarlsson/scheduler/SchedulerClient.java) can be created using its builder:

```java
SchedulerClient.Builder.create(dataSource, taskDefinitions).build()
```

It will allow for operations such as:

* List scheduled executions
* Reschedule a specific execution
* Remove old executions that have been retrying for too long
* ...

## How it works

A single database table is used to track future task-executions. When a task-execution is due, db-scheduler picks it and executes it. When the execution is done, the `Task` is consulted to see what should be done. For example, a `RecurringTask` is typically rescheduled in the future based on its `Schedule`.

The scheduler uses optimistic locking or select-for-update (depending on polling strategy) to guarantee that one and only one scheduler-instance gets to pick and run a task-execution.

### Recurring tasks

The term _recurring task_ is used for tasks that should be run regularly, according to some schedule.

When the execution of a recurring task has finished, a `Schedule` is consulted to determine what the next time for execution should be, and a future task-execution is created for that time (i.e. it is _rescheduled_). The time chosen will be the nearest time according to the `Schedule`, but still in the future.

There are two types of recurring tasks, the regular _static_ recurring task, where the `Schedule` is defined statically in the code, and the _dynamic_ recurring tasks, where the `Schedule` is defined at runtime and persisted in the database (still requiring only a single table).

#### Static recurring task

The _static_ recurring task is the most common one and suitable for regular background jobs since the scheduler automatically schedules an instance of the task if it is not present and also updates the next execution-time if the `Schedule` is updated.

To create the initial execution for a static recurring task, the scheduler has a method `startTasks(...)` that takes a list of tasks that should be "started" if they do not already have an existing execution. The initial execution-time is determined by the `Schedule`. If the task already has a future execution (i.e. has been started at least once before), but an updated `Schedule` now indicates another execution-time, the existing execution will be rescheduled to the new execution-time (with the exception of _non-deterministic_ schedules such as `FixedDelay` where new execution-time is further into the future).

Create using `Tasks.recurring(..)`.

#### Dynamic recurring task

The _dynamic_ recurring task is a later addition to db-scheduler and was added to support use-cases where there is a need for multiple instances of the same type of task (i.e. same implementation) with different schedules. The `Schedule` is persisted in the `task_data` alongside any regular data.
Unlike the _static_ recurring task, the dynamic one will not automatically schedule instances of the task. It is up to the user to create instances and update the schedule for existing ones if necessary (using the `SchedulerClient` interface).
See the example [RecurringTaskWithPersistentScheduleMain.java](./examples/features/src/main/java/com/github/kagkarlsson/examples/RecurringTaskWithPersistentScheduleMain.java) for more details.

Create using `Tasks.recurringWithPersistentSchedule(..)`.

### One-time tasks

The term _one-time task_ is used for tasks that have a single execution-time.
In addition to encoding data into the `instanceId` of a task-execution, it is possible to store arbitrary binary data in a separate field for use at execution-time. By default, Java serialization is used to marshal/unmarshal the data.

Create using `Tasks.oneTime(..)`.

### Custom tasks

For tasks not fitting the above categories, it is possible to fully customize the behavior of the tasks using `Tasks.custom(..)`.

Use-cases might be:

* Tasks that should be either rescheduled or removed based on output from the actual execution
* ..

### Dead executions

During execution, the scheduler regularly updates a heartbeat-time for the task-execution. If an execution is marked as executing, but is not receiving updates to the heartbeat-time, it will be considered a _dead execution_ after time X. That may, for example, happen if the JVM running the scheduler suddenly exits.

When a dead execution is found, the `Task` is consulted to see what should be done. A dead `RecurringTask` is typically rescheduled to `now()`.

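
As a rough illustration of the dead-execution check, the sketch below assumes that "time X" is the heartbeat interval multiplied by the missed-heartbeats limit (the `db-scheduler.heartbeat-interval` and `db-scheduler.missed-heartbeats-limit` settings shown in the configuration example). That relationship is an assumption made for this example and is not stated in this section; `DeadExecutionSketch`, `maxAgeBeforeDead` and `isDead` are hypothetical names, not library API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of heartbeat-based dead-execution detection.
// Assumption: an execution is dead once its last heartbeat is older than
// heartbeatInterval * missedHeartbeatsLimit.
public class DeadExecutionSketch {

    static Duration maxAgeBeforeDead(Duration heartbeatInterval, int missedHeartbeatsLimit) {
        return heartbeatInterval.multipliedBy(missedHeartbeatsLimit);
    }

    // Only executions currently marked as executing (picked) can be dead.
    static boolean isDead(boolean picked, Instant lastHeartbeat, Instant now, Duration maxAge) {
        return picked && lastHeartbeat.isBefore(now.minus(maxAge));
    }

    public static void main(String[] args) {
        // Default values from the application.properties example: 5m and 6.
        Duration threshold = maxAgeBeforeDead(Duration.ofMinutes(5), 6);
        Instant now = Instant.parse("2024-01-01T12:00:00Z");

        System.out.println(threshold); // PT30M
        System.out.println(isDead(true, now.minus(Duration.ofMinutes(45)), now, threshold)); // true
        System.out.println(isDead(true, now.minus(Duration.ofMinutes(10)), now, threshold)); // false
    }
}
```

Under this assumption, a shorter heartbeat interval detects crashed instances sooner at the cost of more frequent heartbeat updates to the database.
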
### Unresolved tasks

If a task instance is found in the database but the corresponding task definition is not registered in the service (e.g., during due execution or dead execution housekeeping), it is treated as an unresolved task (see [`TaskResolver`](https://github.com/kagkarlsson/db-scheduler/blob/master/db-scheduler/src/main/java/com/github/kagkarlsson/scheduler/TaskResolver.java)).

Behavior of unresolved tasks:

* They are excluded from polling: the current instance will not attempt to pick or execute them.
* They remain in the database so other instances (e.g., newer versions in a rolling update or canary deployment) can pick and process them.
* They are **automatically removed** after a configured retention period `deleteUnresolvedAfter`, if they remain unresolved.

## Things to note / gotchas

* There are no guarantees that all instants in a schedule for a `RecurringTask` will be executed. The `Schedule` is consulted after the previous task-execution finishes, and the closest time in the future will be selected for next execution-time. A new type of task may be added in the future to provide such functionality.
* The methods on `SchedulerClient` (`schedule`, `cancel`, `reschedule`) will run using a new `Connection` from the `DataSource` provided. To have the action be a part of a transaction, it must be taken care of by the `DataSource` provided, for example using something like Spring's `TransactionAwareDataSourceProxy`.
* Currently, the precision of db-scheduler depends on the `pollingInterval` (default 10s), which specifies how often to look in the table for due executions. If you know what you are doing, the scheduler may be instructed at runtime to "look early" via `scheduler.triggerCheckForDueExecutions()`.
  (See also `enableImmediateExecution()` on the `Builder`)

## Versions / upgrading

See [UPGRADING.md](UPGRADING.md) for version-specific upgrade notes (schema changes etc.), and [releases](https://github.com/kagkarlsson/db-scheduler/releases) for full release-notes.

## Building the source

**Prerequisites**

* Java 17+
* Maven

Follow these steps:

1. Clone the repository.
   ```
   git clone https://github.com/kagkarlsson/db-scheduler
   cd db-scheduler
   ```
2. Build using Maven (skip tests by adding `-DskipTests=true`)
   ```
   mvn package
   ```

**Recommended spec**

Some users have experienced intermittent test failures when running on single-core VMs. Therefore, it is recommended to use a minimum of:

- 2 cores
- 2GB RAM

## Who uses db-scheduler?

List of organizations known to be running db-scheduler in production:

| Company | Description |
|---------|-------------|
| [Digipost](https://digipost.no) | Provider of digital mailboxes in Norway |
| [Vy Group](https://www.vy.no/en) | One of the largest transport groups in the Nordic countries. |
| [Wise](https://wise.com/) | A cheap, fast way to send money abroad. |
| Becker Professional Education | |
| [Monitoria](https://monitoria.ca) | Website monitoring service. |
| [Loadster](https://loadster.app) | Load testing for web applications. |
| [Statens vegvesen](https://www.vegvesen.no/) | The Norwegian Public Roads Administration |
| [Lightyear](https://lightyear.com/) | A simple and approachable way to invest your money globally. |
| [NAV](https://www.nav.no/) | The Norwegian Labour and Welfare Administration |
| [ModernLoop](https://modernloop.io/) | Scale with your company’s hiring needs by using ModernLoop to increase efficiency in interview scheduling, communication, and coordination. |
| [Diffia](https://www.diffia.com/) | Norwegian eHealth company |
| [Swan](https://www.swan.io/) | Swan helps developers to embed banking services easily into their product. |
| [TOMRA](https://www.tomra.com/) | TOMRA is a Norwegian multinational company that designs and manufactures reverse vending machines for recycling. |
| [Kartverket](https://kartverket.no/) | The Norwegian Mapping Authority. |

Feel free to open a PR to add your organization to the list.

## FAQ

#### Why `db-scheduler` when there is `Quartz`?

The goal of `db-scheduler` is to be non-invasive and simple to use, but still solve the persistence problem and the cluster-coordination problem. It was originally targeted at applications with modest database schemas, to which adding 11 tables would feel a bit overkill.

#### Why use an RDBMS for persistence and coordination?

KISS. It's the most common type of shared state applications have.

#### I am missing feature X?

Please create an issue with the feature request and we can discuss it there. If you are impatient (or feel like contributing), pull requests are most welcome :)

#### Is anybody using it?

Yes. It is used in production at a number of companies, and has so far run smoothly.
