Getting Started with Elasticsearch Installation — The Complete Guide
A complete guide to taking a working Elasticsearch 7.9 server from scratch to a secured, multi-node cluster exposed to the internet and loaded with sample data.
Introduction
Getting started with Elasticsearch is quite easy, as there are several brilliant pre-built solutions that give a quick and convenient way to land a working Elasticsearch instance in minutes. However, if we would like to achieve something more, it can get tricky quite quickly.
What we will focus on now is how to build an Elasticsearch platform with multiple nodes, exposed to the internet, with working snapshots and backups, secured with SSL and basic authentication.
Before getting started, let’s have a little bit of background.
My journey with Elasticsearch started with Algolia (not really Elastic, it's more of a search-as-a-service solution with a great API), then evolved to the Bitnami Stack, which provides pre-built, pre-configured, out-of-the-box working Elastic VMs on AWS, Google Cloud and Azure, later on to trialing the Elastic Cloud services, and finally ended up with a completely self-managed approach: running Linux VMs on Google Cloud and Azure with the native Elasticsearch installers.
The reasoning behind ending up with the self-managed approach is that, despite the pros and great functionality of all the alternatives, it provides the highest flexibility in terms of configuration, upgrades and overall management:
- Algolia: It is a search-as-a-service solution providing very fast (they claim 200x faster than ElasticSearch), typo-tolerant, prefix-matching search distributed over the world. It is very beneficial for search-as-you-type use cases as it comes with a great, easy-to-integrate API. I wasn’t able to find too much information about what’s under the hood of Algolia, but based on the format and structure of the data returned by their API, it seems to be somehow a Lucene-based engine. My takeaway and experience was that it is really fast, however it is also quite expensive. It is free for 1,000 search requests over 1,000 records (which is really not much) and they have different pricing options afterwards. Also, in terms of e.g. server-side aggregations it’s not comparable to ElasticSearch, as it is built for fast text search, not aggregations.
- Bitnami Stack: I used it for months for different proof-of-concept purposes and it worked great (running on the Google Cloud stack). You can fire up a working ElasticSearch instance in minutes, with a few clicks and without the need to connect to the VM via SSH. However, when I tried to upgrade ElasticSearch from 6.x to 7.x, or to play around with advanced user management, I ended up spending days of effort without getting closer to either upgrading my cluster or introducing advanced user management (advanced user management in my case = having a dedicated read-only ES user). I believe this wasn’t the fault of the Bitnami stack itself, rather the fact that they use a different configuration than the default ElasticSearch, and I found it very hard to adapt any ElasticSearch-related tutorial to Bitnami’s version.
- ElasticSearch Cloud Stack: This is one of the most convenient options in my opinion, as it requires zero infrastructure knowledge (you don’t even need to connect to the Google Cloud Console / Azure Console). Firing up a fully operational ElasticSearch with Kibana and Logstash takes only a few clicks. However, this approach has only a very limited two-week free trial, and after that the lowest option (Standard package) is $16/month + infrastructure (VM) costs. I calculated that this would cost somewhere between $30 and $40 / month just for “playing around” with ElasticSearch. It is still a great deal as it includes many extra modules (Elastic APM, App Search, Workplace Search, Security and Maps, and for an extra $6/month we can get advanced security features in place). However, since my main purpose is to run small POCs, I was looking for “free” options where I can leverage the one-year free credits from Google Cloud / Microsoft Azure / Amazon AWS.
All of the above led to playing around with self-managed, cloud-based options. While getting from zero to an operational “single-node” ElasticSearch installation is quite “simple” following the installation guide provided by ElasticSearch on this link, I found installing and configuring a multi-node cluster to be more challenging.
The aim of this guide is to provide end-to-end, step-by-step instructions on how to install and configure an ElasticSearch server with one master and one data node, how to enable a shared snapshot repository that works with multiple nodes, how to secure this ElasticSearch instance with SSL and, finally, how to publish it to the internet using Let’s Encrypt, starting from fresh, default Azure Linux VMs. What we will skip for now is Kibana.
We will cover the following steps:
- Installing ElasticSearch instances on the two VMs;
- Reconfiguring the two ElasticSearch instances so that they act as a master+data node and a data node;
- Configuring ElasticSearch snapshots so that they work in a multi-node environment;
- Landing some sample data by restoring indices from another ElasticSearch cluster using Git;
- Securing the communication between the nodes with SSL;
- Introducing user authentication and user management for accessing ElasticSearch;
- Exposing the ElasticSearch cluster to the internet using NGINX as an intermediate reverse proxy and Let’s Encrypt SSL.
We will end up with a simple architecture similar to below:

In our configuration, the two VMs will host the ElasticSearch nodes, the Samba server and client will be responsible for file sharing between the ElasticSearch nodes, while NGINX will act as a reverse proxy serving clients over HTTPS using an SSL certificate obtained via Let’s Encrypt.
Important Side Notes Before We Begin
Please note that while I tried to create as detailed and precise a guide as possible, this infrastructure is not a production-ready configuration and serves educational purposes only.
I came across ElasticSearch while experimenting with options for real-time data analytics and machine learning over a large dataset, and I am not wearing hats like “Search Engineer”, “DevOps Engineer” or “Infrastructure / Linux Engineer”. Any suggestions on the steps below are more than welcome in the comments.
Prerequisites
Before getting started, let’s make sure that all we need is in place:
- SSH client (and very basic knowledge of how to use it);
- Two new VMs with the following specs: 1 CPU, 4 GiB memory, 10 GB Premium SSD with the Debian 10 “Buster” — Gen1 operating system (any other Debian 9+ would do the job). The current tutorial was created on the Azure stack, but any other cloud provider would work;
- Ability to connect to both VMs via SSH;
- A dedicated domain, with access to DNS record management (this is an optional step required to surface ElasticSearch to the internet).
Getting the environments ready
Let’s open the Azure portal and create two VMs. For the first VM let’s use the name “es-master” and for the second “es-data”. This is how the rest of the tutorial will reference them (but you can be creative with how you name them). What is important, however, is that:
- both VMs should be in same Resource Group,
- both VMs should be in the same Virtual Network so that they can communicate with each other, and
- both have a public IP address (the default Azure setting).
For the sake of simplicity, let’s choose Password as the authentication type instead of SSH public key.
Once both VMs have been provisioned (it takes a few seconds), let’s make sure that the following ports are open: 22 (for SSH), 80 (for HTTP requests), 443 (for HTTPS requests), 9200 (required for ElasticSearch, used for all API calls over HTTP) and 9300 (required for ElasticSearch, used for communication between nodes in a cluster).
The end result in my example case:
- VM 1 (es-master) has an internal IP 10.0.0.6
- VM 2 (es-data) has an internal IP 10.0.0.7
In the upcoming sections, wherever you see “10.0.0.6”, make sure you replace it with your “es-master” VM’s internal IP address, and “10.0.0.7” with your “es-data” VM’s internal IP address.
1. Installing ElasticSearch
The very first step in the installation is to connect to both VMs via SSH using their external IPs. For example, my “es-master” VM has the external IP 20.185.88.1.
# Run this from your local computer
ssh 20.185.88.1
The result will be something similar after entering the password:

Following the official ElasticSearch installation guide available here, let’s begin our installation process by executing these commands step by step on both VMs (the lines starting with # are comments; no worries, you can copy-paste them into your terminal, they will simply be ignored when you execute them):
# Run these commands on both es-master and es-data
# Install necessary components
sudo apt-get install gnupg2
sudo wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
# Actual ElasticSearch Installation
sudo apt-get update && sudo apt-get install elasticsearch
# Reloading the list of services on the VM
sudo /bin/systemctl daemon-reload
# Configure to run ElasticSearch as a service
sudo /bin/systemctl enable elasticsearch.service
# Launch the ElasticSearch service
sudo systemctl start elasticsearch.service
By completing the last step in the process, we should have an up and running ElasticSearch server on both VMs.
As I mentioned previously, this is how simple it is to get an ElasticSearch cluster (in our case, two) up and running.
Please note that by starting ElasticSearch on both VMs, we initialized both instances as standalone ElasticSearch clusters, which will cause some challenges later on when we try to merge these two clusters into one and will require a couple of extra steps to overcome. For now, there is nothing to worry about.
Let’s see if the ElasticSearch clusters are running:
# Run these commands on both es-master and es-data VMs
curl -X GET "localhost:9200/"
The first proof of successful operation is shown in Fig. 1.2:

The next step in the process is to change the default configuration of both ElasticSearch instances and try to merge these two standalone clusters into one (spoiler: it won’t succeed on the first attempt).
es-master VM
Open the ElasticSearch configuration file (“elasticsearch.yml”), located under the “/etc/elasticsearch” folder, with your favorite terminal-based text editor. If you don’t have a favorite terminal-based text editor yet, nano is a great candidate and here is a great tutorial on how to use it.
# Run this command on es-master
sudo nano /etc/elasticsearch/elasticsearch.yml
Delete all existing content and change the file to have the following (replace 10.0.0.6 with your internal IP for es-master VM):
cluster.name: es-cluster
node.name: es-master
network.host: 10.0.0.6
discovery.seed_hosts: ["es-master", "es-data"]
cluster.initial_master_nodes: ["es-master"]
node.master: true
node.data: true
If this is the first time you are using nano, you can exit with “Ctrl + X”, answer “Y” when prompted to save the changes, and press Enter to confirm the file name.
The configuration above essentially configures this ES instance to act as both a master and a data node and to look for the “es-data” node during discovery. These changes won’t take effect until we restart the ElasticSearch node:
# Run these commands on es-master
# Restarting the ES Service
sudo systemctl restart elasticsearch.service
# Checking if the restart was successful
curl -X GET "10.0.0.6:9200/"
The last command is essentially only for verifying if our cluster is up and running — just like we did before.
Now let’s configure the “es-data” VM.
Similar to what we did on the “es-master” VM, let’s open the ElasticSearch configuration file of this node too:
# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml
and change it to have the following content (replace 10.0.0.7 with your internal IP for “es-data” VM):
cluster.name: es-cluster
node.name: es-data
network.host: 10.0.0.7
discovery.seed_hosts: ["es-master", "es-data"]
cluster.initial_master_nodes: ["es-master"]
node.master: false
node.data: true
The configuration above tells this ElasticSearch node that it should not act as a master, only as a data node, and specifies which node should act as the master. To apply the changes, let’s restart the ElasticSearch node:
# Run this command on es-data
sudo systemctl restart elasticsearch.service
At this point the service should restart smoothly and give us the impression that this ElasticSearch node is working fine and is connected to the same cluster as “es-master”. Even when we check the node’s initial response with curl, we get promising results:
# Run this command on es-data
curl -X GET "10.0.0.7:9200/"
The result will be similar to Fig. 1.3, reassuring us that this node is working properly. However, if we do a couple of additional health checks, e.g. listing all the nodes of the cluster:
# Run this command on es-data
curl -X GET "10.0.0.7:9200/_cat/nodes?pretty"
we will run into an HTTP 503 error: the “master not discovered” exception.
There could be a couple of reasons behind this exception, but what it does mean is that this data node wasn’t able to discover the master node.
In such cases, the very first troubleshooting step would be to check whether the es-data VM can access the es-master VM (e.g. ports are properly open, the VMs are in the same virtual network, etc.), for example using “telnet”.
# Run this command on es-data
# Wait a couple of seconds until the connection is closed
telnet 10.0.0.6 9200

Let’s check the other ports and the es-master VM too:
# Run this on es-data VM, and wait a couple of seconds
telnet 10.0.0.6 9300
# Run this on es-master VM, and wait a couple of seconds
telnet 10.0.0.7 9200
telnet 10.0.0.7 9300
If in all cases we get results similar to Fig. 1.3, it means that the VMs can communicate with each other and the ports needed for ElasticSearch to operate are open.
But now let’s return to our ElasticSearch exception:
{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : null
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
The root cause of this error in our case, as I wrote at the beginning of the tutorial, is that we started both ElasticSearch nodes on the two VMs right away. Even though we changed the configuration file of both nodes to use the same cluster name (es-cluster), because each ElasticSearch node had already been started, it had already formed two individual clusters on the two VMs (with different cluster IDs but the same name). To avoid possible data loss, ElasticSearch won’t allow merging two nodes from two clusters into one. It’s described in more detail here and here.
Since both ElasticSearch instances are empty, addressing the issue is pretty easy: we need to delete the data directory of both ElasticSearch nodes and restart both services.
On both es-master VM and es-data VM, run the following commands:
# Run these commands on both es-master and es-data
sudo systemctl stop elasticsearch.service
sudo rm -rf /var/lib/elasticsearch
sudo mkdir /var/lib/elasticsearch
sudo chown elasticsearch: /var/lib/elasticsearch
sudo systemctl start elasticsearch.service
The commands above remove the data directory, re-create it, and grant the proper permissions on the re-created data folder to the built-in “elasticsearch” user. We can verify whether the restart (and bridging the two nodes under one cluster) was successful with curl:
# Run this command on es-data
curl -X GET "10.0.0.6:9200/_cat/nodes?pretty"
The expectation is that we will see two nodes listed similar to Fig. 1.4:

Having the master and data nodes properly configured will immediately turn the status of our ElasticSearch cluster from yellow to green (more information about cluster health and statuses: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html)

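If you want to confirm the status yourself, the cluster health endpoint referenced above can be queried directly with curl; the field of interest in the response is “status”:
# Run this command on either es-master or es-data
curl -X GET "10.0.0.6:9200/_cluster/health?pretty"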
2. Configure ElasticSearch Snapshots
Running ElasticSearch without snapshots is like driving a car without a seatbelt. Our setup consists of two nodes, which means that if node A is lost, we will still have our data replicated (by default) on node B. However, if both nodes are lost, our data is lost too.
To avoid this, it is recommended to create regular snapshots so that if anything happens to the cluster, we can restore it.
Setting up snapshots in a single-node cluster is straightforward and it’s very well described at the link below: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html
However, configuring them for a multi-node environment is a little more complicated and requires a couple of extra steps, e.g. setting up network file sharing using a commonly used file sharing solution: Samba.
We will install a Samba server on the “es-master” VM and share a folder over the network so that it is accessible from both the “es-master” and “es-data” VMs, and we will also install Samba clients on both VMs so that both can access the shared folder from the same path.
Run the following commands on the es-master VM:
# Run these commands on es-master
# Install Samba Server (hit "No" when prompted)
sudo apt-get install samba
# Install Samba Client and CIFS
sudo apt install samba-client cifs-utils
# Create a folder where the snapshots will be stored
sudo mkdir /usr/share/es-backups
sudo mkdir /usr/share/es-backups/es-repo
# Create a dedicated "sambauser" who will access the file system
# Replace "mysecretpassword" with a strong password, this will be
# the unix password of the "sambauser" unix user
sudo useradd -m sambauser -p mysecretpassword
# Create a samba password for the newly created "sambauser"
# Note that this is a different password than the unix password.
# When accessing the shared folder, we will use this password
sudo smbpasswd -a sambauser
# Grant the necessary folder permissions for "sambauser"
sudo usermod -a -G sambashare sambauser
sudo chgrp -R sambashare /usr/share/es-backups/es-repo
sudo chmod -R g+w /usr/share/es-backups/es-repo
We can test whether the newly created “sambauser”, which will be impersonated when accessing the shared directory, has the necessary permissions using the following commands (creating a temp directory and then deleting it):
# Run these commands on es-master
sudo -u sambauser mkdir /usr/share/es-backups/es-repo/todelete
sudo -u sambauser rm -rf /usr/share/es-backups/es-repo/todelete
The next step is to configure the Samba server and share the “/usr/share/es-backups/es-repo” folder over the network. To do this, let’s edit the Samba configuration file:
# Run these commands on es-master
sudo nano /etc/samba/smb.conf
And let’s add these lines to the end of the file:
[es-share]
browseable = yes
read only = no
writable = yes
valid users = sambauser
path = /usr/share/es-backups/es-repo
comment = Elastic Shared Snapshot Repo
After we finish editing, we can save the file with “Ctrl + X”, answering “Y” when prompted.
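As an optional sanity check, Samba ships with a configuration checker called testparm that can confirm the new “[es-share]” section parses cleanly:
# Optional: validate the Samba configuration syntax
sudo testparm -s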
Now let’s restart the Samba server and check if the file sharing is working properly:
# Run these commands on es-master
# Restart Samba server
sudo systemctl restart smbd
# Create a folder where we will mount the shared folder
sudo mkdir /mnt/elastic-share
# Mount the shared folder
sudo mount -t cifs -o user=sambauser //10.0.0.6/es-share /mnt/elastic-share
# Unmount the shared folder
sudo umount /mnt/elastic-share
If no errors were thrown during the execution of the commands above, our file sharing is working. Now let’s create a file to securely store the Samba credentials we will use to mount this network share to our file system.
# Run this command on es-master
# Replace <loggedin-linux-username>
sudo nano /home/<loggedin-linux-username>/.smbcredentials
Add these two lines to the file (where “mysambapassword” is the password we created previously using the “smbpasswd” command), and then save it.
username=sambauser
password=mysambapassword
Now let’s secure the file, and update the “fstab” configuration:
# Run these commands on es-master
# Replace <loggedin-linux-username>
sudo chmod 600 /home/<loggedin-linux-username>/.smbcredentials
# Edit the fstab configuration
sudo nano /etc/fstab
Add this line to the end of the file (make sure you don’t remove anything from the original file and you replace the <loggedin-linux-username>):
//10.0.0.6/es-share /mnt/elastic-share cifs credentials=/home/<loggedin-linux-username>/.smbcredentials,users,rw,iocharset=utf8,file_mode=0777,dir_mode=0777 0 0
Now let’s test if our changes were successful:
# Run this command on es-master
# Mount shared file system
sudo mount -a -v
The command above should have a similar outcome to Fig. 2.1:

Now let’s check whether the shared folder is working as expected and is accessible to the dedicated “elasticsearch” user too, by creating a temp folder inside the shared folder and then removing it:
# Run these commands on es-master
sudo -u elasticsearch mkdir /mnt/elastic-share/todelete
sudo -u elasticsearch rm -rf /mnt/elastic-share/todelete
Since our “elasticsearch” user has read/write access to the shared folder, let’s update our ElasticSearch configuration file (on the “es-master” VM) and add this path as “path.repo”.
# Run this command on es-master
sudo nano /etc/elasticsearch/elasticsearch.yml

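For reference, a minimal sketch of what the added line might look like, assuming “path.repo” points at an “es-backups” folder inside the mounted share (the same location we will register as the snapshot repository later); the exact value in Fig. 2.2 may differ:
# Hypothetical addition at the end of /etc/elasticsearch/elasticsearch.yml on es-master
path.repo: "/mnt/elastic-share/es-backups"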
# Run this command on es-master
# Restart ElasticSearch
sudo systemctl restart elasticsearch.service
Now we need to install the Samba client on the “es-data” VM, similar to what we did on “es-master”.
# Run these commands on es-data
# Installing samba-client to es-data VM
sudo apt install samba-client cifs-utils
# Create a folder where to mount the shared folder
sudo mkdir /mnt/elastic-share
# Test mount the shared folder
sudo mount -t cifs -o user=sambauser //10.0.0.6/es-share /mnt/elastic-share
# Unmount the test run
sudo umount /mnt/elastic-share
# Secure the credentials for "sambauser"
sudo nano /home/<loggedin-linux-username>/.smbcredentials
Update the file with the credentials from the previous steps (use the same user name / password we used on the “es-master”):
username=sambauser
password=mysambapassword
Now let’s secure the file and update the “fstab” configuration on this VM too:
# Run these commands on es-data
sudo chmod 600 /home/<loggedin-linux-username>/.smbcredentials
sudo nano /etc/fstab
Add this line to the end of the file:
//10.0.0.6/es-share /mnt/elastic-share cifs credentials=/home/<loggedin-linux-username>/.smbcredentials,users,rw,iocharset=utf8,file_mode=0777,dir_mode=0777 0 0
And test if the changes were successful:
# Run this command on es-data
sudo mount -a -v
As on the “es-master” VM, we are expecting similar results, as shown in Fig. 2.3:

And lastly, let’s test whether the “elasticsearch” user can properly access the shared folder:
# Run these commands on es-data
sudo -u elasticsearch mkdir /mnt/elastic-share/todelete
sudo -u elasticsearch rm -rf /mnt/elastic-share/todelete
If creating and removing the temporary folder while impersonating the “elasticsearch” user succeeds, we are ready to move forward and update the ElasticSearch configuration file of the “es-data” node too, so that it uses this network folder for storing the snapshots.
# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml
The updated configuration file should look like following:

Please note the extra line at the end of the file with the “path.repo” being exactly identical to the “path.repo” on the “es-master”.
Finally, let’s restart the ElasticSearch on this VM too:
# Run this command on es-datasudo systemctl restart elasticsearch.service
Registering the ElasticSearch Snapshot Repository
Once we have our snapshot repository location specified on both nodes, the final step in order to have working ElasticSearch snapshots is to tell the ElasticSearch cluster to use the previously created repository to store the snapshots. We can do this easily from either VM using curl:
# Run this command on es-master
curl -X PUT "10.0.0.6:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d' {
"type": "fs",
"settings": {
"location": "/mnt/elastic-share/es-backups"
}
}'
We should have the following result:
{
"acknowledged": true
}
Let’s verify one more time that the snapshot repository has been registered properly via curl:
# Run this command on es-master
curl -X GET "10.0.0.6:9200/_snapshot/my_backup?pretty"
The command above should return a result similar to Fig. 2.5:

Creating Our First ElasticSearch Snapshot
Now that we have a working ElasticSearch repository, let’s use it and create our very first snapshot (named “snapshot_1”) via curl:
# Run this command on es-master
curl -X PUT "10.0.0.6:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true&pretty"
And the result should look something similar to Fig. 2.6:

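Manually created snapshots work, but ElasticSearch 7.x (on the free license) also ships a snapshot lifecycle management (SLM) API that can take them on a schedule. Below is a hedged sketch of a nightly policy against the “my_backup” repository; the policy name, schedule and retention values are example assumptions of my own:
# Optional: run this command on es-master to schedule nightly snapshots
curl -X PUT "10.0.0.6:9200/_slm/policy/nightly-snapshots?pretty" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_backup",
  "config": { "include_global_state": true },
  "retention": { "expire_after": "30d", "min_count": 5, "max_count": 50 }
}'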
Bonus: Restoring Snapshot from Another ElasticSearch Cluster
This optional section covers an extra exercise with ElasticSearch snapshots: restoring snapshots stored in Git, taken from another ES cluster.
While Git is not the best option for storing ElasticSearch snapshots, for the simplicity of this tutorial it comes in quite handy.
Completing this part of the tutorial will also help land some meaningful test data in our ElasticSearch cluster.
We will use a dataset called SuperStore, from the following GitHub repository: https://github.com/botond-kopacz/es-sample-data
The repository should look similar to Fig. 2.7:

Let’s install git on es-master VM and clone the repository above.
# Run these commands on es-master
# Installing GIT
sudo apt-get install git
mkdir ~/git
cd ~/git
# Clone repository above
sudo git clone https://github.com/botond-kopacz/es-sample-data.git
# Create destination directory on the network share
sudo -u elasticsearch mkdir /mnt/elastic-share/es-sample-data
# Copy the snapshots to shared drive
# Don't forget to replace <logged-in-linux-username>
cp -R /home/<logged-in-linux-username>/git/es-sample-data/* /mnt/elastic-share/es-sample-data
# Verify if the content appeared properly
ls /mnt/elastic-share/es-sample-data
We are expecting the following output:

The next step is to allow this newly created shared folder to be used as a snapshot repository by our ElasticSearch cluster, by adding it to the “path.repo” setting on both the es-master and es-data nodes.
On es-master VM let’s run the following command:
# Run this command on es-master
sudo nano /etc/elasticsearch/elasticsearch.yml
And update the configuration file as per the following example:

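A hedged sketch of how that last line might look after the change, assuming the single path from the previous section plus the new sample-data folder (the exact paths in Fig. 2.8 may differ):
# Hypothetical "path.repo" line after adding the second directory
path.repo: ["/mnt/elastic-share/es-backups", "/mnt/elastic-share/es-sample-data"]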
Finally restart the ElasticSearch node:
# Run this command on es-master
sudo systemctl restart elasticsearch.service
Please note how the last line (path.repo) changed from containing a single shared directory to an array of directories. Now let’s repeat the steps on the es-data VM.
# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml

And restart the “es-data” node as well:
# Run this command on es-data
sudo systemctl restart elasticsearch.service
After both configurations have been updated, we need to register a repository for this second directory as well, using curl from either of the VMs.
# Run this command on es-data
curl -X PUT "10.0.0.6:9200/_snapshot/sample_data?pretty" -H 'Content-Type: application/json' -d' { "type": "fs", "settings": { "location": "/mnt/elastic-share/es-sample-data" } } '
To validate our steps and list the available snapshots from the Git repository downloaded from GitHub, let’s run the following curl command from either of the VMs:
# Run this command on es-data
curl -X GET "10.0.0.6:9200/_snapshot/sample_data/_all?pretty"
We will end up with the following response:

The example repository contains two snapshots: the first one is quite useless as it doesn’t contain any indices, and a second one, named “sample-data-snapshot”, containing one index. We will restore this second snapshot to bring some data into our ElasticSearch cluster, using curl executed from either of the VMs:
# Run this command on es-data
curl -X POST "10.0.0.6:9200/_snapshot/sample_data/sample-data-snapshot/_restore?pretty"
To verify if the sample data was loaded into our ElasticSearch cluster, we will use curl again:
# Run this command on es-data
curl -X GET "10.0.0.6:9200/_cat/indices/?pretty"

The command above lists all available indices on our cluster, and as we can see, our sample “superstore_orders” index was loaded successfully with 4562 records. We can additionally take a quick look at what our data looks like with curl:
# Run this command on es-data
curl -X GET "10.0.0.6:9200/superstore_orders/_search?pretty"
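The same _search endpoint also accepts URI parameters, so as a quick, hedged example (relying only on standard query-string syntax, not on any specific field of the dataset), we can limit the output to a single document:
# Optional: return just one document from the sample index
curl -X GET "10.0.0.6:9200/superstore_orders/_search?q=*&size=1&pretty"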
Recap & Summary
Before continuing our journey with ElasticSearch, let’s recap what we achieved and where we are:
- Set up a running ElasticSearch cluster with a master and a data node
- Configured snapshots and backups
- Loaded the SuperStore sample dataset into ElasticSearch
- Ran our very first “search” query against the cluster
At this point, our environment is solid but only reachable from inside our virtual network. For whatever operation we would like to run on the ElasticSearch cluster, e.g. adding new data or performing a search, we must connect to one of the VMs in the virtual network via SSH and run curl commands from there, which is not the most convenient way to experiment with ElasticSearch.
Not to mention that our ElasticSearch is running without any security: anyone who has access to our VM can run any command against our cluster (e.g. wipe out our indices and data).
3. Exposing ElasticSearch to the Web
When we are thinking about exposing the ElasticSearch cluster to the internet, we have several options, e.g.:
- Expose ElasticSearch “as-is” or
- Expose it through an HTTP reverse proxy
Exposing the ElasticSearch cluster to the web “as-is” requires opening port 9200 (the port used by ES for handling all API calls) on the es-master VM to the world outside of the virtual network.
When using the HTTP proxy approach, no additional ports need to be opened on the VM, as the HTTP traffic (and later on the HTTPS traffic) with all our ElasticSearch API calls will be redirected by the proxy from port 80 (and 443) to port 9200. This brings a great list of advantages, from load balancing, through an extra layer of security, to giving us room to manage the SSL certificate encrypting HTTPS traffic for API calls separately from the internal node-to-node SSL management and communication.
We will skip the “as-is” option and move forward with the reverse proxy approach, using Nginx. A more comprehensive guide to Nginx can be found here: https://www.nginx.com/resources/wiki/start/ but all we need to move forward is covered in this tutorial.
Let’s connect to the es-master VM using SSH and run the following commands:
# Run these commands on es-master
# Installing Nginx
sudo apt-get install nginx
# Enabling and starting the Nginx server
sudo systemctl enable nginx
sudo systemctl start nginx
We can easily validate the installation with curl:
# Run this command on es-master
curl -X GET "123.123.123.123/"
Where “123.123.123.123” is the external IP address of the es-master VM. The result should be a response similar to Fig. 3.1:

To make the reverse proxy redirect traffic from port 80 to 9200, we need to edit the configuration of our Nginx server. We can do this by connecting to the es-master VM and running the following command:
# Run this command on es-master
sudo nano /etc/nginx/sites-enabled/default
And changing the configuration file to look like Fig 3.2:

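In case you want a starting point, here is a minimal sketch of the relevant server block, assuming we simply strip the “/elastic” prefix and pass everything on to ElasticSearch on port 9200; the exact contents of Fig. 3.2 may differ:
server {
    listen 80 default_server;
    server_name _;

    # Forward exactly /elastic to the ElasticSearch root
    location = /elastic {
        proxy_pass http://10.0.0.6:9200/;
    }

    # Forward /elastic/... to ElasticSearch with the prefix stripped
    location /elastic/ {
        proxy_pass http://10.0.0.6:9200/;
        proxy_set_header Host $host;
    }
}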
The configuration above will redirect all the server traffic on the “/elastic” path to our ElasticSearch server running on port 9200 locally. E.g. “http://server-name/elastic” or “http://external-ip-address/elastic” will be redirected to “http://10.0.0.6:9200” inside the VM.
As with ElasticSearch, after saving the new Nginx configuration file, we need to reload the service on the es-master VM:
# Run these commands on es-master
# Reload Nginx configuration
sudo nginx -s reload
# Replace "123.123.123.123" with the external IP of es-master
curl -X GET "123.123.123.123/elastic"
What we can notice is that curl returns the same ElasticSearch welcome message that we saw before:

At this point, if we would like to run any ElasticSearch API request (e.g. search queries, adding new data or creating a new index), we no longer need to connect to the es-master VM via SSH; we can do it from anywhere, and we can use more modern REST API client tools than curl (e.g. Postman). However, I strongly advise against leaving the web server as-is, because our ElasticSearch cluster is still insecure and accepts requests from anywhere, from anybody. Although the test dataset we loaded previously is not sensitive at all, it’s still not a good idea to expose it without any security.
4. Securing ElasticSearch
Domains and DNS Records
This part of the tutorial will focus on implementing a basic security shell around our platform, that will ensure:
- All the API communication between clients and the ElasticSearch server is secured with HTTPS;
- All the communication between the master and data nodes is secured with SSL/TLS;
- The ElasticSearch cluster can be accessed only with a password (implementing Basic Authentication requiring a user name + password);
- The ElasticSearch cluster is accessible from a static URL even if the external IP address of the VMs changes
The guide at this point requires a domain name and the proper access level to modify the DNS records for that domain. We will use “example.com”, which has to be replaced with your domain name.
Also, we need to be aware of the internal and external IP addresses of both the es-master and es-data VMs.
I will reference them in the examples as:
es-master
- Internal IP: 10.0.0.6
- External IP: 20.185.88.252
es-data:
- Internal IP: 10.0.0.7
- External IP: 20.185.88.255
The very first step of this process is to update the DNS records of our domain to have a dedicated name for both VMs: we will need to add two A records to our DNS pointing to the two external IP addresses. In my example (using Namecheap), the DNS management interface looks like the following:

In the end, the new “A” records should look like this:

Please note that the DNS management interface varies by domain provider (e.g. if you are using GoDaddy or Google Domains).
All providers have their pros and cons; if you haven’t signed up with any provider yet, choose the one which best fits your needs.
What we achieved above is that rather than referencing our VMs by their IPs, we can now do so with domain names. If the external IP address of a VM changes, e.g. because of a restart, all we need to do is update these records with the new IP, and there will be no need to change the ElasticSearch API URLs in our clients.
The next step is to connect to both the es-master and es-data VMs and add this information to their hosts files.
# Run these commands on both es-master and es-data
# Backup the original configuration
sudo cp -f /etc/hosts /etc/hosts.bak
# Update the file
sudo nano /etc/hosts
The new content should look similar to Fig. 4.3.

Please make sure that the only change you make is adding the two last lines to the file and nothing else, and don’t forget to replace example.com with your domain name. Once we change the file, the changes take effect immediately; we don’t need to restart any service.
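A hedged sketch of those two added lines, assuming the names are mapped to the internal IP addresses so that node-to-node traffic stays inside the virtual network (the exact form in Fig. 4.3 may differ):
# Hypothetical lines appended to /etc/hosts (replace example.com with your domain)
10.0.0.6   es-master   es-master.es.example.com
10.0.0.7   es-data     es-data.es.example.com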
To test our changes, let’s see if we can ping these new entries. Execute these tests from either of the VMs:
# Replace example.com with your domain name
ping es-data
ping es-data.es.example.com
Now we should be able to access the VMs both internally and externally (outside the virtual network) using their fully qualified domain names.
Securing Communication Between Nodes
To secure the communication between our nodes, we will use SSL certificates generated with the tools provided by ElasticSearch. To proceed, let’s connect to the es-master VM and run the following commands:
# Create temp folders where to store the generated certificates
mkdir ~/tmp
mkdir ~/tmp/es-certs
cd ~/tmp/es-certs
# Create a file named "instance.yml".
# The ElasticSearch tool we will use to generate the certificates
# will use this file to identify the nodes.
nano ~/tmp/es-certs/instance.yml
The content of the “instance.yml” file should look like Fig. 4.4 (replacing “example.com” with your domain name).

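For reference, a minimal sketch of what an “instance.yml” for elasticsearch-certutil could look like in our setup; the DNS names and IPs are assumptions based on the example values used throughout this guide (replace “example.com” with your domain):
instances:
  - name: "es-master"
    dns:
      - "es-master.es.example.com"
    ip:
      - "10.0.0.6"
  - name: "es-data"
    dns:
      - "es-data.es.example.com"
    ip:
      - "10.0.0.7"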
Once we save the file, let’s proceed with running the rest of the commands:
# Run these commands on es-master
# Generate certificate files to our temporary directory
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --keep-ca-key --pem --in ~/tmp/es-certs/instance.yml --out ~/tmp/es-certs/certs.zip
# Install zip, if it's not installed yet
sudo apt-get install zip
# Unzip the generated certificates
sudo mkdir ~/tmp/es-certs/certs
sudo unzip ~/tmp/es-certs/certs.zip -d ~/tmp/es-certs/certs/
# Copy the certificates into their final location
sudo mkdir /etc/elasticsearch/certs
sudo cp ~/tmp/es-certs/certs/ca/* ~/tmp/es-certs/certs/es-master/* /etc/elasticsearch/certs/
# Validate if we have the ca.crt, ca.key, and rest of the files
sudo dir /etc/elasticsearch/certs/
# Change ElasticSearch configuration (note the change in network.host)
sudo nano /etc/elasticsearch/elasticsearch.yml
The updated “elasticsearch.yml” for the es-master node should look similar to Fig. 4.5:

What has changed:
- network.host was updated to the fully qualified domain name
- we added new lines starting with “xpack” (see the sketch below)
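A hedged sketch of what those added “xpack” lines might look like, assuming SSL is enabled on both the transport (node-to-node) and HTTP layers (which matches the https:// curl calls used later) and that the certificate files keep the names produced by elasticsearch-certutil for the instance names above; the exact lines in Fig. 4.5 may differ:
# Hypothetical security additions on es-master (file names assumed from instance.yml)
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: full
xpack.security.transport.ssl.key: /etc/elasticsearch/certs/es-master.key
xpack.security.transport.ssl.certificate: /etc/elasticsearch/certs/es-master.crt
xpack.security.transport.ssl.certificate_authorities: ["/etc/elasticsearch/certs/ca.crt"]
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: /etc/elasticsearch/certs/es-master.key
xpack.security.http.ssl.certificate: /etc/elasticsearch/certs/es-master.crt
xpack.security.http.ssl.certificate_authorities: ["/etc/elasticsearch/certs/ca.crt"]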
X-Pack is an Elastic Stack extension that provides, among many other things, security capabilities. With ElasticSearch 7.x versions, X-Pack is installed by default when we install ElasticSearch. In prior versions of ES it was a paid extension. It still has many features that need paid licensing (e.g. token-based authentication, row-level security, etc.), but we can enable basic authentication and create as many users / roles as we want with the “free” version.
If you are interested in more detail, a deeper guide is available at: https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-security.html
Let’s finish our activities on the es-master VM with the following commands:
# Run these commands on es-master
# Restart the ES instance on the node
sudo systemctl restart elasticsearch.service
# Copy the auto generated certificate files to our network share
# so that we can access these from es-data VM too
mkdir /mnt/elastic-share/certs-temp
cp -rf ~/tmp/es-certs/certs /mnt/elastic-share/certs-temp
Now let’s continue the installation of the certificates on the es-data VM:
# Run these commands on es-data
# Copy the certificates to their final location
sudo mkdir /etc/elasticsearch/certs
sudo cp /mnt/elastic-share/certs-temp/certs/ca/* /mnt/elastic-share/certs-temp/certs/es-data/* /etc/elasticsearch/certs/
# Edit the ElasticSearch configuration
sudo nano /etc/elasticsearch/elasticsearch.yml
The configuration file should look similar to Fig 4.6:

The changes are very similar to the “es-master” configuration: we changed the “network.host” and added the “xpack” lines. Apply the changes by restarting the ES instance:
# Run these commands on es-data
sudo systemctl restart elasticsearch.service
The next step is to switch back to the es-master VM and setup the passwords for the built-in ElasticSearch users listed below:
- elastic (the built-in ElasticSearch “superuser”)
- kibana_system (the built-in Kibana user)
- logstash_system (the Logstash user which stores monitoring information)
- beats_system (the Beats user which stores monitoring information)
- apm_system (the APM Server user which stores monitoring information)
- remote_monitoring_user (the Metricbeat user which collects and stores monitoring information)
Even though we will use only the “elastic” user, we need to set up passwords for all of them.
# Run this command on es-master
# Note down these passwords as they are very important
# and quite tricky to recover if they are forgotten
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
After running the command above, the only way to manage users and roles will be through the API (secured in our case with Basic Authentication, e.g. with the “elastic” user), as described in more detail here. We can ensure that our authentication works properly using curl (note how the curl command changed to specify the -u switch followed by the “elastic” user):
# Run this command on either es-master or es-data
curl --insecure -u elastic "https://10.0.0.6:9200/_security/_authenticate?pretty"
The expected output should be somewhat similar to Fig. 4.7:

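As an optional illustration of managing users through that API, here is a hedged sketch that creates a read-only role and a user assigned to it; the role name, user name, password and index pattern are placeholders of my own choosing:
# Optional: create a read-only role limited to the sample index
curl --insecure -u elastic -X PUT "https://10.0.0.6:9200/_security/role/read_only_role?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": [ { "names": ["superstore_orders"], "privileges": ["read"] } ]
}'
# Optional: create a user that has only that role
curl --insecure -u elastic -X PUT "https://10.0.0.6:9200/_security/user/readonly_user?pretty" -H 'Content-Type: application/json' -d'
{
  "password": "averystrongpassword",
  "roles": ["read_only_role"]
}'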
To eliminate even the slightest risk of communication between the “es-master” and “es-data” nodes failing because of SSL certificate validation, let’s update our “es-data” node configuration:
# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml
And update the file according to Fig. 4.8:

The changes were made to the following entries (see the sketch below):
- “discovery.seed_hosts”: replace the node references with their fully qualified domain names
- “cluster.initial_master_nodes”: replace the master reference with the fully qualified domain name
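A hedged sketch of those two lines on the es-data node after the change, following the article’s convention of using the fully qualified names in both entries (the exact values in Fig. 4.8 may differ, and “cluster.initial_master_nodes” is only consulted when a cluster bootstraps for the first time):
# Hypothetical discovery settings on es-data after the change
discovery.seed_hosts: ["es-master.es.example.com", "es-data.es.example.com"]
cluster.initial_master_nodes: ["es-master.es.example.com"]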
Finally let’s restart the ElasticSearch instance of our es-data node:
# Run this command on es-data
sudo systemctl restart elasticsearch.service
Securing the Reverse Proxy
The final step to have a working ElasticSearch cluster exposed to the internet through HTTPS is to install free SSL/TLS certificates from Let’s Encrypt on our Nginx server, obtaining them in an automated way using Certbot.
Let’s Encrypt is a non-profit Certificate Authority providing certificates for millions of websites, and it’s sponsored by companies like Mozilla, Cisco, Chrome, Facebook and many others. Certbot is a free, open source software tool for automatically using Let’s Encrypt certificates on manually administered websites to enable HTTPS, and it works with a variety of web servers (e.g. Nginx, Apache, HAProxy, etc.) and operating systems (Linux, macOS, Windows).
# Run these commands on es-master
# Install Certbot
sudo apt-get install certbot python-certbot-nginx
# Generate the certificates
# When prompted, enter "es-master.es.example.com" where
# you replace the "example.com" with your domain
sudo certbot certonly --nginx
# Update Nginx configuration
sudo nano /etc/nginx/sites-enabled/default
The new configuration should look very similar to Fig. 4.9:

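For orientation, a minimal sketch of how the HTTPS part of the Nginx configuration could look, assuming Certbot placed the certificate under /etc/letsencrypt/live/es-master.es.example.com/ and that ElasticSearch itself now answers over HTTPS on port 9200 (adjust the names to your domain; the exact contents of Fig. 4.9 may differ):
server {
    listen 443 ssl;
    server_name es-master.es.example.com;

    ssl_certificate /etc/letsencrypt/live/es-master.es.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/es-master.es.example.com/privkey.pem;

    # Same proxy rule as before, now pointing at the HTTPS-enabled node;
    # by default Nginx does not verify the upstream (self-signed) certificate
    location /elastic/ {
        proxy_pass https://10.0.0.6:9200/;
        proxy_set_header Host $host;
    }
}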
Pay attention to setting the paths properly on the “ssl_certificate” and “ssl_certificate_key” lines so that they point to the proper “fullchain.pem” and “privkey.pem” files (and not to the example.com directory, for example).
As final step, let’s reload our Nginx configuration:
# Run these commands on es-master
# Reload configuration
sudo nginx -s reload
# Test the HTTP proxy
curl -u elastic "https://es-master.es.example.com/elastic/_security/_authenticate?pretty"
Please note how the curl command changed:
- we no longer need the “--insecure” switch, as our site now has a valid and trusted SSL certificate
- we can access our ElasticSearch with the fully qualified domain name without having to worry about the IP address of the VM
Congratulations! At this point we have achieved all our goals, and we now have:
- A fully operational ElasticSearch cluster with master and data nodes
- Sample data loaded in our ElasticSearch cluster through snapshots
- Secure communication between the nodes and secure communication between any external ElasticSearch client and the cluster
From this point, we can use any modern tool, such as Postman, to perform our ElasticSearch experiments from any computer:
