A testing tool, written in Haskell, to perform rolling-restarts on Elasticsearch clusters

dblia/elastic-rolling-restart

Elasticsearch Rolling Restart

NOTE: Read the Caveats section first, before proceeding with the rest of the document.

This is a simple tool to perform a rolling restart on a given Elasticsearch cluster. A rolling restart takes nodes offline one at a time, so the whole cluster can be restarted while it stays online and operational, with no downtime for end users.

A full Elasticsearch cluster restart may be required in various cases. The most common ones are upgrading the Elasticsearch version of the cluster and performing maintenance on the Elasticsearch servers themselves (such as hardware or OS related tasks). Also, for Elasticsearch setups that rely on custom plugins to extend the core functionality (analyzers, custom scripts, etc.), a full cluster restart may be required even for a single plugin update.

Performing such tasks often, and for many different clusters, can become quite a pain in an administrator's day-to-day operations. This tool aims to automate a rolling cluster restart by implementing the common Elasticsearch rolling restart algorithm.

As described in the Caveats section, this is a testing tool that should be used with care. It performs actions on the given Elasticsearch nodes that include stopping the elasticsearch process on every node, one at a time, using the Elasticsearch Shutdown API. The process is then started again with the systemctl start <service> command, executed remotely on the host via SSH.
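The per-node sequence described above can be sketched as a dry run that only prints the commands involved. This is an illustration under assumptions, not the tool's actual implementation: the function name plan_node_restart is hypothetical, the shard allocation toggle and the health wait are taken from the common rolling restart procedure, and the _shutdown URL is the node shutdown endpoint as it existed in Elasticsearch 1.x (later versions removed it).

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for restarting ONE node; nothing is executed.
# Assumptions (not confirmed by the tool's source): the allocation toggle,
# the 1.x-style _shutdown endpoint, and the final health wait.

# plan_node_restart NODE [PORT] [SERVICE]
plan_node_restart() {
  node="$1"; port="${2:-9200}"; service="${3:-elasticsearch.service}"
  base="http://${node}:${port}"
  cat <<EOF
curl -X PUT '${base}/_cluster/settings' -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'
curl -X POST '${base}/_cluster/nodes/_local/_shutdown'
ssh ${node} sudo systemctl start ${service}
until curl -s '${base}' >/dev/null; do sleep 1; done
curl -X PUT '${base}/_cluster/settings' -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'
curl '${base}/_cluster/health?wait_for_status=green&timeout=60s'
EOF
}
```

Calling plan_node_restart es_node1.example.com prints six lines: disable allocation, shut the node down, start the service over SSH, wait for the HTTP port to answer, re-enable allocation, and wait for the cluster to report green.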

To perform the above actions, certain requirements must be met; otherwise the tool will fail to run and the cluster may be left in an inconsistent state. These requirements are listed in the Requirements section below.

Looking at the Getting Started section, we see that the master node is given separately from the rest of the cluster nodes. This is done so that each Elasticsearch cluster restart triggers only a single master re-election.
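One way to get a single re-election is to restart the current master last: if it were restarted midway, the newly elected master could itself be restarted later and trigger a second election. The sketch below illustrates that ordering only. It assumes a master-last policy, which the text above does not state explicitly, and the function name restart_order is hypothetical.

```shell
#!/bin/sh
# Sketch of a master-last restart order (an assumption, not the tool's code).

# restart_order MASTER NODE...
# Prints every node except MASTER, one per line, followed by MASTER.
restart_order() {
  master="$1"; shift
  for node in "$@"; do
    [ "$node" = "$master" ] || echo "$node"
  done
  echo "$master"
}
```

For example, restart_order es_node2 es_node1 es_node2 es_node3 prints es_node1, es_node3 and then es_node2. The current master can be looked up with the _cat/master endpoint, for instance curl es_node1.example.com:9200/_cat/master.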

Requirements

The following requirements must be met in order to use this tool:

  • SSH access to all nodes of the Elasticsearch cluster to be restarted
  • sudo access on all nodes too, for running systemd related commands
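Both requirements can be checked up front, before any node is touched. The following is a minimal sketch rather than part of the tool: the function name preflight and the SSH override variable are hypothetical, and it assumes passwordless (non-interactive) sudo. It uses sudo -n -l, which reports whether a command would be permitted without running it.

```shell
#!/bin/sh
# Preflight sketch (hypothetical helper, not shipped with the tool).

# preflight HOST [SERVICE]
# Succeeds if HOST is reachable over SSH without prompting and sudo would
# allow "systemctl start SERVICE" non-interactively. The service is NOT started.
# The SSH variable only exists so the ssh binary can be swapped out.
preflight() {
  host="$1"; service="${2:-elasticsearch.service}"
  ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" \
    "sudo -n -l systemctl start $service" >/dev/null 2>&1
}
```

A typical use would be to loop over all nodes and abort on the first failure, for example preflight es_node1.example.com || echo "es_node1 is not ready".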

Getting Started

To see details about the tool usage, run rolling-restart -h or rolling-restart --help:

$ rolling-restart --help

Usage: rolling-restart (-H|--host HOSTNAME) [-p|--port PORT] [-s|--service SVC]

  ElasticSearch cluster rolling restart tool.

Available options:
  -h,--help             Show this help text
  -H,--host HOSTNAME    The hostname or IP of an Elasticsearch node
  -p,--port PORT        The port of the Elasticsearch host (default: 9200)
  -s,--service SVC      The ES service name (default: elasticsearch.service)

To perform a rolling restart, run:

$ rolling-restart --host es_node1.example.com --port 9200 \
      --service elasticsearch.service

or simply,

$ rolling-restart -H es_node1.example.com -p 9200 -s elasticsearch.service

or even simpler,

$ rolling-restart -H es_node1.example.com

Caveats

  • This tool is more an excuse to write some Haskell code than a tool to actually depend on for restarting your cluster; use it with extra care.
  • If the tool fails or is interrupted while running, shard allocation may remain disabled. In that case you should re-enable it manually.
  • The Elasticsearch service name must be identical on all given nodes.
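
Re-enabling shard allocation manually can be done through the cluster settings API. The sketch below assumes allocation was disabled with the transient cluster.routing.allocation.enable setting; if it was set as a persistent setting instead, replace "transient" with "persistent". The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: re-enable shard allocation after an interrupted run.
# Assumption: allocation was disabled via the TRANSIENT
# cluster.routing.allocation.enable setting.

# allocation_body VALUE: prints the JSON body for the settings request.
allocation_body() {
  printf '{"transient":{"cluster.routing.allocation.enable":"%s"}}' "$1"
}

# enable_allocation HOST [PORT]: sends the request to one node of the cluster.
enable_allocation() {
  host="$1"; port="${2:-9200}"
  curl -sS -X PUT "http://${host}:${port}/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(allocation_body all)"
}
```

For example, enable_allocation es_node1.example.com 9200 sets the value back to "all". The current value can be inspected afterwards with curl es_node1.example.com:9200/_cluster/settings.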
