Getting Started with Elasticsearch Installation — The Complete Guide

A complete guide to standing up a working Elasticsearch 7.9 server from scratch: a secured, multi-node cluster exposed to the internet, loaded with sample data.

Introduction

Getting started with Elasticsearch is quite easy, as there are several brilliant pre-built solutions offering a quick and convenient way to get a working Elasticsearch instance up in minutes. However, if we would like to achieve something more, things can get really tricky quite quickly.

What we will focus on now is how to build an Elasticsearch platform with multiple nodes, exposed to the internet, with working snapshots and backups, secured with SSL and basic authentication.

Before getting started, let’s have a little bit of background.

My journey with Elasticsearch started with Algolia (not really Elastic, it's more of a search-as-a-service solution with a great API), then evolved to the Bitnami stack, which provides pre-built, pre-configured, out-of-the-box working Elastic VMs on AWS, Google Cloud and Azure, then continued with trialing Elastic Cloud services, and finally ended up with a completely self-managed approach: running Linux VMs on Google Cloud and Azure with the native Elasticsearch installers.

The reasoning behind ending up with the self-managed approach is that, despite the pros and great functionality of all the alternatives, it provides the highest flexibility in terms of configuration, upgrades and overall management.

All of the above led to playing around with self-managed, cloud-based options. While going from zero to an operational single-node ElasticSearch installation is quite simple following the installation guide provided by ElasticSearch at this link, I found installing and configuring a multi-node cluster to be more challenging.

The aim of this guide is to provide end-to-end, step-by-step instructions on how to install and configure an ElasticSearch server with one master and one data node, how to enable a shared snapshot repository that works across multiple nodes, how to secure this ElasticSearch instance with SSL, and finally how to publish it to the internet using Let's Encrypt, all starting from a fresh, default Azure Linux VM. What we will skip for now is Kibana.

We will cover the following steps:

1. Installing ElasticSearch on two VMs and merging them into one cluster
2. Configuring ElasticSearch snapshots over a shared network folder
3. Exposing ElasticSearch to the web through a reverse proxy
4. Securing ElasticSearch with SSL and basic authentication

We will end up with a simple architecture similar to below:

Fig. 1: Example ElasticSearch Architecture

In our configuration, the two VMs will host the ElasticSearch nodes, a Samba server and client will be responsible for file sharing between the ElasticSearch nodes, and NGINX will act as a reverse proxy serving clients over HTTPS, using an SSL certificate obtained via Let's Encrypt.

Important Side-Notes Before We Begin

Please note that while I tried to create as detailed and precise a guide as possible, this infrastructure is not a production-ready configuration and serves educational purposes only.

I came across ElasticSearch while experimenting with options for real-time data analytics and machine learning over a large dataset, and I am not wearing hats like “Search Engineer”, “DevOps Engineer” or “Infrastructure / Linux Engineer”. Any suggestions on the steps below are more than welcome in the comments.

Prerequisites

Before getting started, let's make sure that everything we need is in place: an Azure subscription where we can create VMs, a domain name whose DNS records we can edit (needed for the security section), and an SSH client on the local machine.

Getting the environments ready

Let's open the Azure portal and create two VMs. We will name the first VM “es-master” and the second one “es-data”; this is how the rest of the tutorial will reference them (but you can be creative with how you name them). What is important, however, is that both VMs end up in the same virtual network, so that they can reach each other over their internal IP addresses.

For the sake of simplicity, let's choose Password as the authentication type over the SSH public key.

Once both VMs have been provisioned (it only takes a few seconds), let's make sure that the following ports are open: 22 (for SSH), 80 (for HTTP requests), 443 (for HTTPS requests), 9200 (required by ElasticSearch, used for all API calls over HTTP) and 9300 (required by ElasticSearch, used for communication between nodes in a cluster).
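If you prefer the command line over the portal, the inbound rules can also be created with the Azure CLI. This is only a sketch: the resource group name “es-rg” is an assumption, and the commands have to be repeated for the “es-data” VM as well.

# Hypothetical resource group "es-rg"; adjust the names to your environment
az vm open-port --resource-group es-rg --name es-master --port 22 --priority 100
az vm open-port --resource-group es-rg --name es-master --port 80 --priority 110
az vm open-port --resource-group es-rg --name es-master --port 443 --priority 120
az vm open-port --resource-group es-rg --name es-master --port 9200 --priority 130
az vm open-port --resource-group es-rg --name es-master --port 9300 --priority 140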

The end result in my example case:

In the upcoming sections, wherever you see “10.0.0.6”, make sure you replace it with your “es-master” VM's internal IP address, and “10.0.0.7” with your “es-data” VM's internal IP address.

1. Installing ElasticSearch

The very first step of the installation is to connect to both VMs over SSH, using the external IPs of the VMs. For example, my “es-master” VM has the external IP 20.185.88.1.

# Run this from your local computer
# (log in with the username you configured when creating the VM)
ssh <your-vm-username>@20.185.88.1

The result will be something similar after entering the password:

Fig. 1.1: SSH to the VM

Following the official ElasticSearch installation guide available here, let's begin the installation process by executing these commands step by step on both VMs (no worries about the lines starting with #: you can copy-paste them into your terminal, they will be ignored when you execute them):

# Run these commands on both es-master and es-data
# Install necessary components
sudo apt-get install gnupg2
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
# Actual ElasticSearch Installation
sudo apt-get update && sudo apt-get install elasticsearch
# Reloading the list of services on the VM
sudo /bin/systemctl daemon-reload
# Configure to run ElasticSearch as a service
sudo /bin/systemctl enable elasticsearch.service
# Launch the ElasticSearch service
sudo systemctl start elasticsearch.service

By completing the last step in the process, we should have an up and running ElasticSearch server on both VMs.

As I mentioned previously, this is all it takes to have an ElasticSearch cluster (in our case, two of them) up and running.

Please note that by starting ElasticSearch on both VMs, we initialized both instances as standalone ElasticSearch clusters, which will cause us some challenges later on when we try to merge these two clusters into one, and it will take a couple of extra steps to overcome this. For now, there is nothing to worry about.

Let’s see if the ElasticSearch clusters are running:

# Run this command on both es-master and es-data VMs
curl -X GET "localhost:9200/"

The first proof of successful operation is shown in Fig. 1.2:

Fig. 1.2: Default ElasticSearch response

The next step in the process is to change the default configuration of both ElasticSearch instances and try to merge the two standalone clusters into one (spoiler: it won't succeed on the first attempt).

es-master VM

Open the ElasticSearch configuration file (“elasticsearch.yml”), located under the “/etc/elasticsearch” folder, with your favorite terminal-based text editor. If you don't have a favorite terminal-based text editor yet, nano is a great candidate, and here is a great tutorial on how to use it.

# Run this command on es-master
sudo nano /etc/elasticsearch/elasticsearch.yml

Delete all existing content and change the file to have the following (replace 10.0.0.6 with your internal IP for es-master VM):

cluster.name: es-cluster
node.name: es-master
network.host: 10.0.0.6
discovery.seed_hosts: ["es-master", "es-data"]
cluster.initial_master_nodes: ["es-master"]
node.master: true
node.data: true

If this is the first time you are using nano, you can save the file by pressing “Ctrl + X”, answering “Y” when asked whether to save the modified buffer, and confirming the file name with “Enter”.

The configuration above essentially configures this ES instance to act as both a master and a data node, and to look for the “es-data” node during discovery. These changes won't take effect until we restart the ElasticSearch node:

# Run these commands on es-master
# Restarting the ES Service
sudo systemctl restart elasticsearch.service
# Checking if the restart was successful
curl -X GET "10.0.0.6:9200/"

The last command is essentially only for verifying if our cluster is up and running — just like we did before.

Now let’s configure the “es-data” VM.

Similar to what we did on the “es-master” VM, let’s open the ElasticSearch configuration file of this node too:

# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml

and change it to have the following content (replace 10.0.0.7 with your internal IP for “es-data” VM):

cluster.name: es-cluster
node.name: es-data
network.host: 10.0.0.7
discovery.seed_hosts: ["es-master", "es-data"]
cluster.initial_master_nodes: ["es-master"]
node.master: false
node.data: true

The configuration above tells this ElasticSearch node that it should not act as a master, but only as a data node, and specifies which node should act as the master. To apply the changes, let's restart the ElasticSearch node:

# Run this command on es-data
sudo systemctl restart elasticsearch.service

At this point the service should restart smoothly and give us the impression that this ElasticSearch node is working fine and is connected to the same cluster as “es-master”. Even if we check the node's initial response with curl, we get promising results:

# Run this command on es-data
curl -X GET "10.0.0.7:9200/"

The result will be similar to Fig. 1.2, reassuring us that this node is working properly. However, if we do a couple of additional health checks, e.g. listing all the nodes of the cluster:

# Run this command on es-datacurl -X GET "10.0.0.7:9200/_cat/nodes?pretty"

we will run into an HTTP 503 error: a “master_not_discovered_exception”.

There could be a couple of reasons behind this exception, but what it means is that this data node wasn't able to discover the master node.

In such cases, the very first troubleshooting step is to check whether the es-data VM can access the es-master VM (e.g. the ports are properly open, the VMs are in the same virtual network, etc.), for example using “telnet”.

# Run this command on es-data
# Wait a couple of seconds until the connection is closed
telnet 10.0.0.6 9200

Fig. 1.3: Successful telnet

Let’s check the other ports and the es-master VM too:

# Run this on es-data VM, and wait a couple of seconds
telnet 10.0.0.6 9300
# Run these on es-master VM, and wait a couple of seconds
telnet 10.0.0.7 9200
telnet 10.0.0.7 9300

If in all cases we get results similar to Fig. 1.3, it means that the VMs can communicate with each other and the ports needed for ElasticSearch to operate are open.

But now let’s return to our ElasticSearch exception:

{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : null
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

The root cause of this error in our case, as I wrote at the beginning of the tutorial, is that we started both ElasticSearch nodes on the two VMs right away. Even though we changed the configuration files of both nodes to use the same cluster name (es-cluster), because each node was started on its own, it fired up an individual cluster on its VM (with a different cluster ID but the same name). To avoid possible data loss, ElasticSearch won't allow merging two nodes from two clusters into one. It's described in more detail here and here.
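You can see this for yourself by comparing the “cluster_uuid” field of the two nodes' root responses; if the values differ, the nodes formed two separate clusters despite sharing a cluster name:

# Run from either VM: compare the "cluster_uuid" field of both responses
curl -X GET "10.0.0.6:9200/?pretty"
curl -X GET "10.0.0.7:9200/?pretty"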

Since both ElasticSearch instances are empty, addressing the issue is pretty easy: we need to delete the data directory of both ElasticSearch nodes and restart both services.

On both es-master VM and es-data VM, run the following commands:

# Run these commands on both es-master and es-data

sudo systemctl stop elasticsearch.service
sudo rm -rf /var/lib/elasticsearch
sudo mkdir /var/lib/elasticsearch
sudo chown elasticsearch: /var/lib/elasticsearch
sudo systemctl start elasticsearch.service

The commands above remove the data directory, re-create it, and grant the proper permissions on the re-created data folder to the built-in “elasticsearch” user. We can verify whether the restart (and bridging the two nodes under one cluster) was successful with curl:

# Run this command on es-datacurl -X GET "10.0.0.6:9200/_cat/nodes?pretty"

The expectation is that we will see two nodes listed similar to Fig. 1.4:

Fig. 1.4: Two nodes connected to the same cluster

With the master and data nodes properly configured, the status of our ElasticSearch cluster immediately turns from yellow to green (more information about cluster health and statuses: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html).

Fig. 1.5: ElasticSearch Cluster Turning Into “green” Status
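If you would like to check the cluster health yourself, the status shown in Fig. 1.5 can be queried from either VM:

# Run this command on either es-master or es-data
# Look for the "status" field in the response ("green", "yellow" or "red")
curl -X GET "10.0.0.6:9200/_cluster/health?pretty"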

2. Configure ElasticSearch Snapshots

Running ElasticSearch without snapshots is like driving a car without seatbelts. Our setup consists of two nodes, which means that if node A is lost, we will still have our data replicated (by default) on node B. However, if both nodes are lost, our data is lost too.

To avoid this, it is recommended to create regular snapshots so that if anything happens to the cluster, we can restore it.

Setting up snapshots in a single-node cluster is straightforward and is very well described at the link below: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html

However, configuring it for a multi-node environment is a little more complicated and requires a couple of extra steps, e.g. setting up network file sharing using a commonly used file sharing solution: Samba.

We will install the Samba server on the “es-master” VM and share a folder over the network so that it is accessible from both the “es-master” and “es-data” VMs, and we will also install Samba clients on both VMs so that both can access the shared folder from the same path.

Run the following commands on the es-master VM:

# Run these commands on es-master
# Install Samba Server (hit "No" when prompted)
sudo apt-get install samba
# Install Samba Client and CIFS
sudo apt install samba-client cifs-utils
# Create a folder where the snapshots will be stored
sudo mkdir /usr/share/es-backups
sudo mkdir /usr/share/es-backups/es-repo
# Create a dedicated "sambauser" who will access the file system
# Replace "mysecretpassword" with a strong password; this will be
# the unix password of the "sambauser" unix user
sudo useradd -m sambauser -p mysecretpassword
# Create a samba password for the newly created "sambauser"
# Note that this is a different password than the unix password.
# When accessing the shared folder, we will use this one
sudo smbpasswd -a sambauser
# Grant the necessary folder permissions for "sambauser"
sudo usermod -a -G sambashare sambauser
sudo chgrp -R sambashare /usr/share/es-backups/es-repo
sudo chmod -R g+w /usr/share/es-backups/es-repo

We can test whether the newly created “sambauser”, who will be impersonated when accessing the shared directory, has the necessary permissions using the following commands (creating a temp directory, then deleting it):

# Run these commands on es-master
sudo -u sambauser mkdir /usr/share/es-backups/es-repo/todelete
sudo -u sambauser rm -rf /usr/share/es-backups/es-repo/todelete

The next step is to configure the Samba server and share the “/usr/share/es-backups/es-repo” folder over the network. To do this, let’s edit the Samba configuration file:

# Run this command on es-master
sudo nano /etc/samba/smb.conf

And let’s add these lines to the end of the file:

[es-share]
browseable = yes
read only = no
writable = yes
valid users = sambauser
path = /usr/share/es-backups/es-repo
comment = Elastic Shared Snapshot Repo

After we finish editing, we can save the file with “Ctrl + X”, answer “Y” when prompted, and confirm with “Enter”.

Now let’s restart the Samba server and check if the file sharing is working properly:

# Run these commands on es-master
# Restart Samba server
sudo systemctl restart smbd
# Create a folder where we will mount the shared folder
sudo mkdir /mnt/elastic-share
# Mount the shared folder
sudo mount -t cifs -o user=sambauser //10.0.0.6/es-share /mnt/elastic-share
# Unmount the shared folder
sudo umount /mnt/elastic-share

If no errors were thrown while executing the commands above, our file sharing is working. Now let's create a file to securely store the samba credentials we will use to mount this network share to our file system.

# Run this command on es-master
# Replace <loggedin-linux-username>
sudo nano /home/<loggedin-linux-username>/.smbcredentials

Add these two lines to the file (where “mysambapassword” is the password we created previously using the “smbpasswd” command), and then save it.

username=sambauser
password=mysambapassword

Now let’s secure the file, and update the “fstab” configuration:

# Run these commands on es-master
# Replace <loggedin-linux-username>
sudo chmod 600 /home/<loggedin-linux-username>/.smbcredentials
# Edit the fstab configuration
sudo nano /etc/fstab

Add this line to the end of the file (make sure you don’t remove anything from the original file and you replace the <loggedin-linux-username>):

//10.0.0.6/es-share /mnt/elastic-share cifs credentials=/home/<loggedin-linux-username>/.smbcredentials,users,rw,iocharset=utf8,file_mode=0777,dir_mode=0777 0 0

Now let’s test if our changes were successful:

# Run this command on es-master
# Mount shared file system
sudo mount -a -v

The command above should have a similar outcome to Fig. 2.1:

Fig. 2.1: Mounting the shared network folder

Now let's check whether the shared folder works as expected and is accessible to the dedicated “elasticsearch” user too. We can do this by creating a temp folder inside the shared folder and then removing it:

# Run these commands on es-master
sudo -u elasticsearch mkdir /mnt/elastic-share/todelete
sudo -u elasticsearch rm -rf /mnt/elastic-share/todelete

Since our “elasticsearch” user has read/write access to the shared folder, let's update our ElasticSearch configuration file (on the “es-master” VM) and add this path as “path.repo”.

# Run this command on es-master
sudo nano /etc/elasticsearch/elasticsearch.yml

Fig. 2.2: Updated ES Configuration for “es-master”

# Run this command on es-master
# Restart ElasticSearch
sudo systemctl restart elasticsearch.service
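For reference, the extra line added in Fig. 2.2 should look something like the following. This is only a sketch, assuming the mount point and repository path used in this guide; adjust it if yours differs:

# Added at the end of /etc/elasticsearch/elasticsearch.yml on es-master
path.repo: ["/mnt/elastic-share/es-backups"]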

Now we need to install the Samba client on the “es-data” VM, in a similar way to what we did for “es-master”.

# Run these commands on es-data
# Installing samba-client on the es-data VM
sudo apt install samba-client cifs-utils
# Create a folder where to mount the shared folder
sudo mkdir /mnt/elastic-share
# Test mount the shared folder
sudo mount -t cifs -o user=sambauser //10.0.0.6/es-share /mnt/elastic-share
# Unmount the test run
sudo umount /mnt/elastic-share
# Secure the credentials for "sambauser"
sudo nano /home/<loggedin-linux-username>/.smbcredentials

Update the file with the credentials from the previous steps (use the same user name / password we used on “es-master”):

username=sambauser
password=mysambapassword

Now let’s secure the file and update the “fstab” configuration on this VM too:

# Run these commands on es-data
sudo chmod 600 /home/<loggedin-linux-username>/.smbcredentials
sudo nano /etc/fstab

Add this line to the end of the file:

//10.0.0.6/es-share /mnt/elastic-share cifs credentials=/home/<loggedin-linux-username>/.smbcredentials,users,rw,iocharset=utf8,file_mode=0777,dir_mode=0777 0 0

And test if the changes were successful:

# Run this command on es-data
sudo mount -a -v

We expect results similar to those on the “es-master” VM, as shown in Fig. 2.3:

Fig. 2.3: Mounting the Shared Network folder on “es-data”

And lastly, let's test whether the “elasticsearch” user can properly access the shared folder:

# Run these commands on es-data
sudo -u elasticsearch mkdir /mnt/elastic-share/todelete
sudo -u elasticsearch rm -rf /mnt/elastic-share/todelete

If creating and removing the temporary folder while impersonating the “elasticsearch” user succeeds, we are ready to move forward and update the ElasticSearch configuration file of the “es-data” node too, so that it uses this network folder for storing the snapshots.

# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml

The updated configuration file should look like the following:

Fig. 2.4: Updated ES Configuration for “es-data”

Please note the extra line at the end of the file, with “path.repo” identical to the “path.repo” on “es-master”.

Finally, let’s restart the ElasticSearch on this VM too:

# Run this command on es-data
sudo systemctl restart elasticsearch.service

Registering the ElasticSearch Snapshot Repository

Once we have our snapshot repository location specified on both nodes, the final step to have working ElasticSearch snapshots is to tell the cluster to use the repository created previously to store them. We can do this easily from either VM using curl:

# Run this command on es-master
curl -X PUT "10.0.0.6:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/elastic-share/es-backups"
  }
}'

We should have the following result:

{
  "acknowledged": true
}

Let's verify one more time that the repository is registered properly via curl:

# Run this command on es-master
curl -X GET "10.0.0.6:9200/_snapshot/my_backup?pretty"

The command above should return a result similar to Fig. 2.5:

Fig. 2.5: Snapshot repository registered in ElasticSearch

Creating Our First ElasticSearch Snapshot

Now that we have a working ElasticSearch snapshot repository, let's use it and create our very first snapshot (named “snapshot_1”) via curl:

# Run this command on es-master
curl -X PUT "10.0.0.6:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true&pretty"

And the result should look similar to Fig. 2.6:

Fig. 2.6: Result of creating our first Snapshot

Bonus: Restoring Snapshot from Another ElasticSearch Cluster

This optional section covers an extra exercise with ElasticSearch snapshots: restoring snapshots stored in Git, taken from another ES cluster.

While Git is not the best option for storing your ElasticSearch snapshots, for the simplicity of this tutorial it comes in quite handy.

Completing this part of the tutorial will also help land some meaningful test data in our ElasticSearch cluster.

We will use a dataset called SuperStore, from the following GitHub repository: https://github.com/botond-kopacz/es-sample-data

The repository should look similar to Fig. 2.7:

Fig. 2.7: ES Repository Content Stored in Git

Let's install Git on the es-master VM and clone the repository above.

# Run these commands on es-master
# Installing Git
sudo apt-get install git
mkdir ~/git
cd ~/git
# Clone the repository above
sudo git clone https://github.com/botond-kopacz/es-sample-data.git
# Create the destination directory on the network share
sudo -u elasticsearch mkdir /mnt/elastic-share/es-sample-data
# Copy the snapshots to the shared drive
# Don't forget to replace <logged-in-linux-username>
cp -R /home/<logged-in-linux-username>/git/es-sample-data/* /mnt/elastic-share/es-sample-data
# Verify that the content appeared properly
ls /mnt/elastic-share/es-sample-data

We are expecting the following output:

Fig. 2.8: Content of Shared Network Folder

The next step is to register this newly created shared folder as a snapshot repository inside our ElasticSearch cluster (on both es-master and es-data nodes).

On es-master VM let’s run the following command:

# Run this command on es-master
sudo nano /etc/elasticsearch/elasticsearch.yml

And update the configuration file as per the following example:

Fig. 2.9: Updated ElasticSearch Configuration on es-master VM

Finally restart the ElasticSearch node:

# Run this command on es-master
sudo systemctl restart elasticsearch.service

Please note how the last line (path.repo) changed from containing a single shared directory to an array of directories, as sketched below.
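This is only a sketch of what that line might now look like, assuming the two repository locations used in this guide (the authoritative version is in Fig. 2.9):

# path.repo now lists both snapshot repository locations
path.repo: ["/mnt/elastic-share/es-backups", "/mnt/elastic-share/es-sample-data"]

And now let's repeat the steps on the es-data VM.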

# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml

Fig. 2.10: Updated ElasticSearch Configuration on es-data VM

And restart the “es-data” node as well:

# Run this command on es-data
sudo systemctl restart elasticsearch.service

After both configurations have been updated, we need to register a repository for this second directory as well, using curl from either of the VMs.

# Run this command on es-data
curl -X PUT "10.0.0.6:9200/_snapshot/sample_data?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/elastic-share/es-sample-data"
  }
}'

To validate our steps and list the snapshots available in the repository downloaded from GitHub, let's run the following curl command from either of the VMs:

# Run this command on es-data
curl -X GET "10.0.0.6:9200/_snapshot/sample_data/_all?pretty"

We will end up with the following response:

Fig. 2.11: Available Snapshots in the GitHub Repository

The example repository contains two snapshots: the first one is quite useless as it doesn't contain any indices, and the second one, named “sample-data-snapshot”, contains one index. We will restore this second snapshot to bring some data into our ElasticSearch cluster, using curl executed from either of the VMs:

# Run this command on es-data
curl -X POST "10.0.0.6:9200/_snapshot/sample_data/sample-data-snapshot/_restore?pretty"

To verify if the sample data was loaded into our ElasticSearch cluster, we will use curl again:

# Run this command on es-datacurl -X GET "10.0.0.6:9200/_cat/indices/?pretty"

The command above lists all available indices in our cluster, and as we can see, our sample “superstore_orders” index was loaded successfully with 4562 records. We can additionally take a quick look at how our data looks with curl:

# Run this command on es-datacurl -X GET "10.0.0.6:9200/superstore_orders/_search?pretty"

Recap & Summary

Before continuing our journey with ElasticSearch, let's recap what we achieved and where we are: we installed ElasticSearch on two VMs, merged the two nodes into a single cluster, configured a shared snapshot repository over Samba, and restored a sample dataset into the cluster.

At this point, our environment is solid, but it works only inside our virtual network. Whatever operation we would like to run on the ElasticSearch cluster, e.g. adding new data or performing a search, we must connect to one of the VMs in the virtual network via SSH and run curl commands from there, which is not the most convenient way to experiment with ElasticSearch.

Not to mention that our ElasticSearch is running without any security: anyone who has access to our VMs can run any command against our cluster (e.g. wipe out our indices and data).

3. Exposing ElasticSearch to the Web

When thinking about exposing the ElasticSearch cluster to the internet, we have several options, e.g. exposing the cluster “as-is” by opening its ports to the world, or putting it behind an HTTP reverse proxy.

Exposing the ElasticSearch cluster to the web “as-is” would require opening port 9200 (the port used by ES for handling all API calls) on the es-master VM to the world outside the virtual network.

When using the HTTP proxy approach, no additional ports need to be opened on the VM, as the HTTP traffic (and later the HTTPS traffic) carrying all our ElasticSearch API calls is redirected by the proxy from port 80 (and 443) to port 9200. This brings a great list of advantages: from load balancing, through an extra layer of security, to giving us room to manage the SSL certificates encrypting the HTTPS traffic for API calls separately from the internal node-to-node SSL management and communication.

We will skip the “as-is” option and move forward with the reverse proxy approach, using Nginx. A more comprehensive guide to Nginx can be found here: https://www.nginx.com/resources/wiki/start/ but all we need to move forward is covered in this tutorial.

Let’s connect to the es-master VM using SSH and run the following commands:

# Run these commands on es-master
# Installing Nginx
sudo apt-get install nginx
# Enabling and starting the Nginx server
sudo systemctl enable nginx
sudo systemctl start nginx

We can easily validate the installation with curl:

# Run this command on es-mastercurl -X GET "123.123.123.123/"

Where “123.123.123.123” is the external IP address of the es-master VM. The result should be a response similar to the Fig. 3.1:

Fig. 3.1: Default Nginx response

To enable the reverse proxy re-directing the traffic from port 80 to 9200, we need to edit the configuration of our Nginx server. We can do this by connecting to es-master VM and running the following command:

# Run this command on es-master
sudo nano /etc/nginx/sites-enabled/default

And changing the configuration file to look like Fig 3.2:

Fig. 3.2: Nginx Configuration in Reverse Proxy Mode

The configuration above redirects all server traffic on the “/elastic” path to our ElasticSearch server running locally on port 9200, e.g. “http://server-name/elastic” or “http://external-ip-address/elastic” will be redirected to “http://10.0.0.6:9200” inside the VM.
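Since Fig. 3.2 is an image, here is a minimal sketch of what the relevant part of the configuration might look like; the exact directives in the figure may differ:

server {
    listen 80 default_server;
    server_name _;

    # Forward everything under /elastic to the local ElasticSearch instance;
    # the trailing slashes make Nginx strip the /elastic prefix
    location /elastic/ {
        proxy_pass http://10.0.0.6:9200/;
    }
}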

As with ElasticSearch, after saving the new Nginx configuration file, we need to reload the service on the es-master VM:

# Run these commands on es-master
# Reload Nginx configuration
sudo nginx -s reload
# Replace "123.123.123.123" with the external IP of es-master
curl -X GET "123.123.123.123/elastic"

We can notice that curl returns the same ElasticSearch welcome message that we saw before:

Fig. 3.3: ElasticSearch Welcome Message

At this point, if we would like to run any ElasticSearch API request (e.g. search queries, adding new data or creating a new index), we no longer need to connect to the es-master VM via SSH; we can do it from anywhere, and we can use more modern REST API client tools than curl (e.g. Postman). However, I strongly advise against leaving the web server as-is, because our ElasticSearch cluster is still insecure and accepts requests from anywhere, from anybody. Although the test dataset we loaded previously is not sensitive at all, it's still not a good idea to expose it without any security.

4. Securing ElasticSearch

Domains and DNS Records

This part of the tutorial focuses on implementing a basic security shell around our platform that ensures encrypted communication between the nodes, basic authentication for all API calls, and HTTPS between the clients and our cluster.

At this point, the guide requires a domain name and the access level needed to modify the DNS records of that domain. We will use “example.com”, which has to be replaced with your domain name.

We also need to know the internal and external IP addresses of both the es-master and es-data VMs.

I will reference them in the examples as:

es-master: internal IP 10.0.0.6, plus the external IP assigned to your es-master VM

es-data: internal IP 10.0.0.7, plus the external IP assigned to your es-data VM

The very first step of this process is to update the DNS records of our domain to have a dedicated name for each VM, so we need to add two A records to our DNS, one for each external IP address. In my example (using Namecheap), the DNS management interface looks like the following:

Fig. 4.1: Adding a new “A” record for es-master (IP 20.185.88.252) | Namecheap.com

In the end, the new “A” records should look like this:

Fig. 4.2: DNS Records for the 2 VMs | Namecheap.com

Please note that the DNS management interface varies highly by domain provider (e.g. if you are using GoDaddy or Google Domains).

All providers have their pros and cons; if you haven't signed up with any provider yet, choose the one which best fits your needs.

What we achieved above is that rather than referencing our VMs by their IPs, we can now do so with domain names. If the external IP address of a VM changes, e.g. because of a restart, all we need to do is update these records with the new IP, and there will be no need to change the ElasticSearch API URLs in our clients.

The next step is to connect to both the es-master and es-data VMs and add this information to their hosts files.

# Run these commands on both es-master and es-data
# Backup the original configuration
sudo cp -f /etc/hosts /etc/hosts.bak
# Update the file
sudo nano /etc/hosts

The new content should look similar to Fig. 4.3.

Fig. 4.3: /etc/hosts content

Please make sure that the only change you make is adding the two last lines to the file, and nothing else. Also, don't forget to replace example.com with your domain name. Once we change the file, the changes take effect immediately; we don't need to restart any service. A sketch of the two lines follows below.
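As a sketch, assuming the internal IPs and the example domain used throughout this guide, the two added lines could look like this (the authoritative version is in Fig. 4.3):

10.0.0.6 es-master.es.example.com es-master
10.0.0.7 es-data.es.example.com es-data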

To test our changes, let’s see if we can ping these new entries. Execute these tests from either of the VMs:

# Replace example.com with your domain name
ping es-data
ping es-data.es.example.com

Now we should be able to access the VMs both internally and externally (outside the virtual network) using their fully qualified domain names.

Securing Communication Between Nodes

To secure the communication between our nodes, we will use SSL certificates generated with the tools provided by ElasticSearch. To proceed, let's connect to the es-master VM and run the following commands:

# Run these commands on es-master
# Create temp folders in which to store the generated certificates
mkdir ~/tmp
mkdir ~/tmp/es-certs
cd ~/tmp/es-certs
# Create a file named "instance.yml".
# The ElasticSearch tool we will use to generate the certificates
# will use this file to identify the nodes.
nano ~/tmp/es-certs/instance.yml

The content of the “instance.yml” file should look like Fig. 4.4 (replacing “example.com” with your domain name).

Fig. 4.4: instance.yml
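As a sketch, assuming the node names, internal IPs and example domain used throughout this guide, the file could look like this (the authoritative version is in Fig. 4.4):

instances:
  - name: "es-master"
    dns:
      - "es-master.es.example.com"
    ip:
      - "10.0.0.6"
  - name: "es-data"
    dns:
      - "es-data.es.example.com"
    ip:
      - "10.0.0.7"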

Once we save the file, let’s proceed with running the rest of the commands:

# Run these commands on es-master
# Generate the certificate files into our temporary directory
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --keep-ca-key --pem --in ~/tmp/es-certs/instance.yml --out ~/tmp/es-certs/certs.zip
# Install zip, if it's not installed yet
sudo apt-get install zip
# Unzip the generated certificates
sudo mkdir ~/tmp/es-certs/certs
sudo unzip ~/tmp/es-certs/certs.zip -d ~/tmp/es-certs/certs/
# Copy the certificates into their final location
sudo mkdir /etc/elasticsearch/certs
sudo cp ~/tmp/es-certs/certs/ca/* ~/tmp/es-certs/certs/es-master/* /etc/elasticsearch/certs/
# Validate that we have the ca.crt, ca.key, and the rest of the files
sudo dir /etc/elasticsearch/certs/
# Change the ElasticSearch configuration (note the changes in network.host)
sudo nano /etc/elasticsearch/elasticsearch.yml

The updated “elasticsearch.yml” for the es-master node should look similar to Fig. 4.5:

Fig. 4.5. “elasticsearch.yml” for es-master

What has changed: the “network.host” value, and the new “xpack” security entries at the end of the file (a sketch follows below).
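A sketch of those additions for the es-master node, assuming the certificate file names generated by certutil above; the exact values should match Fig. 4.5:

# Security settings added to /etc/elasticsearch/elasticsearch.yml (sketch)
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: certs/es-master.key
xpack.security.transport.ssl.certificate: certs/es-master.crt
xpack.security.transport.ssl.certificate_authorities: [ "certs/ca.crt" ]
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: certs/es-master.key
xpack.security.http.ssl.certificate: certs/es-master.crt
xpack.security.http.ssl.certificate_authorities: [ "certs/ca.crt" ]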

X-Pack is an Elastic Stack extension that provides, among many other things, security capabilities. With ElasticSearch 7.x versions, X-Pack is installed by default when we install ElasticSearch. In prior versions of ES it was a paid extension. It still has many features that need paid licensing (e.g. token-based authentication, row-level security, etc.), but with the “free” version we can enable basic authentication and create as many users / roles as we want.

If you are interested in more detail, a deeper guide is available at: https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-security.html

Let’s finish our activities on the es-master VM with the following commands:

# Run these commands on es-master
# Restart the ES instance on the node
sudo systemctl restart elasticsearch.service
# Copy the auto generated certificate files to our network share
# so that we can access these from es-data VM too
mkdir /mnt/elastic-share/certs-temp
cp -rf ~/tmp/es-certs/certs /mnt/elastic-share/certs-temp

Now let's continue with the installation of the certificates on the es-data VM:

# Run these commands on es-data
# Copy the certificates to their final location
sudo mkdir /etc/elasticsearch/certs
sudo cp /mnt/elastic-share/certs-temp/certs/ca/* /mnt/elastic-share/certs-temp/certs/es-data/* /etc/elasticsearch/certs/
# Edit the ElasticSearch configuration
sudo nano /etc/elasticsearch/elasticsearch.yml

The configuration file should look similar to Fig 4.6:

Fig. 4.6: “elasticsearch.yml” for es-data

The changes are very similar to the “es-master” configuration: we changed the “network.host” and added the “xpack” lines. Apply the changes by restarting the ES instance:

# Run these commands on es-data
sudo systemctl restart elasticsearch.service

The next step is to switch back to the es-master VM and set up the passwords for the built-in ElasticSearch users.

Although we will use only the “elastic” user, we need to set up the passwords for all of them.

# Run this command on es-master
# Note down these passwords as they are very important
# and quite tricky to recover if they are forgotten
# and quite tricky to recover if they are forgotten
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive

After running the command above, the only way to manage users and roles will be through the API (secured in our case with basic authentication, e.g. as the “elastic” user), as described in more detail here. We can check that our authentication works properly using curl (note how the curl command changed to include the -u switch followed by the “elastic” user):

# Run this command on either es-master or es-data
curl --insecure -u elastic "https://10.0.0.6:9200/_security/_authenticate?pretty"

The expected output should be somewhat similar to Fig. 4.7:

Fig. 4.7: Authentication Test Result
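For example, creating an additional user through the security API might look like the following sketch; the user name, password and role here are made up for illustration:

# Run this command on either VM; "john", the password and the role
# are hypothetical examples, pick your own values
curl --insecure -u elastic -X PUT "https://10.0.0.6:9200/_security/user/john?pretty" -H 'Content-Type: application/json' -d'
{
  "password" : "averystrongpassword",
  "roles" : [ "monitoring_user" ],
  "full_name" : "John Example"
}'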

To eliminate even the slightest risk of failing communication between the “es-master” and “es-data” nodes because of invalid SSL certificates, let’s update our “es-data” node configuration:

# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml

And update the file according to Fig. 4.8:

Fig. 4.8: ElasticSearch Configuration for es-data Node

The changed entries are highlighted in Fig. 4.8.

Finally let’s restart the ElasticSearch instance of our es-data node:

# Run this command on es-data
sudo systemctl restart elasticsearch.service

Securing the Reverse Proxy

The last and final step to have a working ElasticSearch cluster exposed to the internet through HTTPS is to install SSL certificates on our Nginx server, using free SSL/TLS certificates from Let's Encrypt, obtained in an automated way using Certbot.

Let's Encrypt is a non-profit Certificate Authority providing certificates for millions of websites, sponsored by companies like Mozilla, Cisco, Chrome, Facebook and many others. Certbot is a free, open-source software tool for automatically using Let's Encrypt certificates on manually administered websites to enable HTTPS, and it works with a variety of web servers (e.g. Nginx, Apache, HAProxy, etc.) and operating systems (Linux, macOS, Windows).

# Run these commands on es-master
# Install Certbot
sudo apt-get install certbot python-certbot-nginx
# Generate the certificates
# When prompted, enter "es-master.es.example.com" where
# you replace "example.com" with your domain
sudo certbot certonly --nginx
# Update the Nginx configuration
sudo nano /etc/nginx/sites-enabled/default

The new configuration should look very similar to Fig. 4.9:

Fig. 4.9: Nginx configuration with SSL

Pay attention to replacing the paths properly on the “ssl_certificate” and “ssl_certificate_key” lines, so that they point to the proper “fullchain.pem” and “privkey.pem” files (and not to the example.com directory, for instance).
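Again, since Fig. 4.9 is an image, here is a rough sketch of the HTTPS server block, assuming Certbot placed the certificates under the default Let's Encrypt directory for the “es-master.es.example.com” name:

server {
    listen 443 ssl;
    server_name es-master.es.example.com;

    ssl_certificate /etc/letsencrypt/live/es-master.es.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/es-master.es.example.com/privkey.pem;

    # Forward everything under /elastic to ElasticSearch, which now
    # itself speaks HTTPS on port 9200
    location /elastic/ {
        proxy_pass https://10.0.0.6:9200/;
    }
}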

As a final step, let's reload our Nginx configuration:

# Run these commands on es-master
# Reload configuration
sudo nginx -s reload
# Test the HTTP proxy
curl -u elastic "https://es-master.es.example.com/elastic/_security/_authenticate?pretty"

Please note how the curl command changed: it now uses HTTPS with the domain name instead of an IP address, goes through the “/elastic” path served by the reverse proxy, and no longer needs the “--insecure” flag, since the Let's Encrypt certificate is trusted.

Congratulations! At this point we have achieved all our goals: a multi-node ElasticSearch cluster with working snapshots, loaded with sample data, secured with SSL and basic authentication, and exposed to the internet through HTTPS.

From this point on, we can use any modern tool to perform our ElasticSearch experiments from any computer, such as Postman:

Fig. 4.10: Running ElasticSearch Requests using Postman

