Getting Started with Elasticsearch Installation — The Complete Guide

A complete guide to taking a working Elasticsearch 7.9 server from scratch to a secured, multi-node cluster exposed to the internet, with sample data.

Introduction

Getting started with Elasticsearch is quite easy, as there are several brilliant pre-built solutions offering a quick and convenient way to get a working Elasticsearch instance running in minutes. However, if we would like to achieve something more, things can get really tricky quite quickly.

  • Bitnami Stack: I used this for months for different Proof of Concept purposes and it worked great (running on the Google Cloud stack). You can fire up a working ElasticSearch instance in minutes, with a few clicks and without the need to connect to the VM via SSH. However, when I tried to upgrade ElasticSearch from 6.x to 7.x, or to play around with advanced user management, I ended up spending days of effort without getting closer to either upgrading my cluster or introducing advanced user management (which in my case meant having a dedicated read-only ES user). I believe this wasn’t the fault of the Bitnami stack itself, but rather the fact that it uses a different configuration than the default ElasticSearch, and I found it very hard to adapt any ElasticSearch-related tutorial to Bitnami’s version.
  • ElasticSearch Cloud Stack: This is one of the most convenient options in my opinion, as it requires zero infrastructure knowledge (you don’t even need to connect to the Google Cloud Console / Azure Console). Firing up a fully operational ElasticSearch with Kibana and Logstash takes only a few clicks. However, this approach has only a limited two-week free trial, and after that the lowest option (the Standard package) is $16/month plus infrastructure (VM) costs. I calculated that this would cost somewhere between $30 and $40 per month just for “playing around” with ElasticSearch. It is still a great deal, as it includes many extra modules (Elastic APM, App Search, Workplace Search, Security, and Maps, and for +$6/month we can get advanced security features in place). However, since my main purpose is to run small POCs, I was looking for “free” options where I can leverage the one-year free credits from Google Cloud / Microsoft Azure / Amazon AWS.

In this tutorial, we will:
  1. Reconfigure the two ElasticSearch instances so that they act as a master+data node and a data node;
  2. Configure ElasticSearch snapshots so that they work in a multi-node environment;
  3. Load some sample data by restoring indices from another ElasticSearch cluster using GIT;
  4. Secure the communication between the nodes with SSL;
  5. Introduce user authentication and user management to access the ElasticSearch;
  6. Expose the ElasticSearch cluster to the internet using NGINX as an intermediate reverse proxy and Let’s Encrypt SSL.
Fig. 1: Example ElasticSearch Architecture

Important Side Notes Before We Begin

Please note that while I tried to create as detailed and precise a guide as possible — this infrastructure is not a production-ready configuration and serves educational purposes only.

Prerequisites

Before getting started, let’s make sure that everything we need is in place:

  • Two new VMs with the following specs: 1 CPU, 4 GiB memory, 10 GB Premium SSD, running the Debian 10 “Buster” — Gen1 operating system (any Debian 9+ would do the job). The current tutorial was created using the Azure stack, but any other cloud provider would work;
  • Ability to connect to both VMs via SSH;
  • Dedicated domain — with access to DNS record management (this is an optional step, required to surface the ElasticSearch to the internet).

Getting the environments ready

Let’s open the Azure portal and create two VMs. For the first VM let’s use the name “es-master” and for the second VM “es-data”. This is how the rest of the tutorial will reference them (but you can be creative with how you name them). What is important, however, is that:

  • both VMs are in the same Virtual Network so that they can communicate with each other;
  • both have a public IP address (default Azure setting);
  • VM 1 (es-master) has the internal IP 10.0.0.6, and VM 2 (es-data) has the internal IP 10.0.0.7.

1. Installing ElasticSearch

The very first step of the installation is to connect to both VMs via SSH, using their external IPs. For example, my “es-master” VM has the external IP 20.185.88.1.

# Run this from your local computer
ssh 20.185.88.1
Fig. 1.1: SSH to the VM
# Run these commands on both es-master and es-data
# Install necessary components
sudo apt-get install gnupg2
sudo wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
# Actual ElasticSearch Installation
sudo apt-get update && sudo apt-get install elasticsearch
# Reloading the list of services on the VM
sudo /bin/systemctl daemon-reload
# Configure to run ElasticSearch as a service
sudo /bin/systemctl enable elasticsearch.service
# Launch the ElasticSearch service
sudo systemctl start elasticsearch.service
# Run this command on both es-master and es-data VMs
curl -X GET "localhost:9200/"
Fig. 1.2: Default ElasticSearch response
# Run this command on es-master
sudo nano /etc/elasticsearch/elasticsearch.yml
cluster.name: es-cluster
node.name: es-master
network.host: 10.0.0.6
discovery.seed_hosts: ["es-master", "es-data"]
cluster.initial_master_nodes: ["es-master"]
node.master: true
node.data: true
# Run these commands on es-master
# Restarting the ES Service
sudo systemctl restart elasticsearch.service
# Checking if the restart was successful
curl -X GET "10.0.0.6:9200/"
# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml
cluster.name: es-cluster
node.name: es-data
network.host: 10.0.0.7
discovery.seed_hosts: ["es-master", "es-data"]
cluster.initial_master_nodes: ["es-master"]
node.master: false
node.data: true
# Run this command on es-data
sudo systemctl restart elasticsearch.service
# Run this command on es-data
curl -X GET "10.0.0.7:9200/"
# Run this command on es-data
curl -X GET "10.0.0.7:9200/_cat/nodes?pretty"
# Run this command on es-data
# Wait a couple of seconds until the connection is closed
telnet 10.0.0.6 9200
Fig. 1.3: Successful telnet
# Run this on the es-data VM, and wait a couple of seconds
telnet 10.0.0.6 9300
# Run these on the es-master VM, and wait a couple of seconds
telnet 10.0.0.7 9200
telnet 10.0.0.7 9300
If the nodes fail to discover each other and form a cluster, requests may return an error like this:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
# Run these commands on both es-master and es-data
# Resetting the ElasticSearch data folder clears any stale cluster state
sudo systemctl stop elasticsearch.service
sudo rm -rf /var/lib/elasticsearch
sudo mkdir /var/lib/elasticsearch
sudo chown elasticsearch: /var/lib/elasticsearch
sudo systemctl start elasticsearch.service
# Run this command on es-data
curl -X GET "10.0.0.6:9200/_cat/nodes?pretty"
Fig. 1.4: Two nodes connected to the same cluster
Fig. 1.5: ElasticSearch Cluster Turning Into “green” Status
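To confirm that the cluster actually reaches “green” status, we can also query the cluster health API from either node (a small extra check, not part of the original flow; the IP is the es-master address used above):

```shell
# Query cluster health; in our setup we expect "status" : "green"
# and "number_of_nodes" : 2 in the response
curl -X GET "10.0.0.6:9200/_cluster/health?pretty"
```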

2. Configure ElasticSearch Snapshots

Running ElasticSearch without snapshots is like driving a car without a seatbelt. Our setup consists of two nodes, which means that if Node A is lost, we will still have our data replicated (by default) on Node B. However, if both nodes are lost, our data is lost too.

# Run these commands on es-master
# Install Samba Server (hit "No" when prompted)
sudo apt-get install samba
# Install Samba Client and CIFS
sudo apt install samba-client cifs-utils
# Create a folder where the snapshots will be stored
sudo mkdir /usr/share/es-backups
sudo mkdir /usr/share/es-backups/es-repo
# Create a dedicated "sambauser" who will access the file system
# Replace "mysecretpassword" with a strong password; this will be
# the unix password of the "sambauser" unix user
sudo useradd -m sambauser -p mysecretpassword
# Create a samba password for the newly created "sambauser"
# Note that this is a different password than the unix password.
# When accessing the shared folder, we will use this password
sudo smbpasswd -a sambauser
# Grant the necessary folder permissions for "sambauser"
sudo usermod -a -G sambashare sambauser
sudo chgrp -R sambashare /usr/share/es-backups/es-repo
sudo chmod -R g+w /usr/share/es-backups/es-repo
# Run these commands on es-master
sudo -u sambauser mkdir /usr/share/es-backups/es-repo/todelete
sudo -u sambauser rm -rf /usr/share/es-backups/es-repo/todelete
# Run this command on es-master
sudo nano /etc/samba/smb.conf
[es-share]
browseable = yes
read only = no
writable = yes
valid users = sambauser
path = /usr/share/es-backups/es-repo
comment = Elastic Shared Snapshot Repo
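Before restarting Samba, it may be worth validating the edited configuration with `testparm`, which ships with the samba package (a hedged suggestion, not part of the original walkthrough):

```shell
# Check /etc/samba/smb.conf for syntax errors and
# print the parsed configuration without prompting
testparm -s
```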
# Run these commands on es-master
# Restart the Samba server
sudo systemctl restart smbd
# Create a folder where we will mount the shared folder
sudo mkdir /mnt/elastic-share
# Mount the shared folder
sudo mount -t cifs -o user=sambauser //10.0.0.6/es-share /mnt/elastic-share
# Unmount the shared folder
sudo umount /mnt/elastic-share
# Run this command on es-master
# Replace <loggedin-linux-username>
sudo nano /home/<loggedin-linux-username>/.smbcredentials
username=sambauser
password=mysambapassword
# Run these commands on es-master
# Replace <loggedin-linux-username>
sudo chmod 600 /home/<loggedin-linux-username>/.smbcredentials
# Edit the fstab configuration
sudo nano /etc/fstab
//10.0.0.6/es-share /mnt/elastic-share cifs credentials=/home/<loggedin-linux-username>/.smbcredentials,users,rw,iocharset=utf8,file_mode=0777,dir_mode=0777 0 0
# Run this command on es-master
# Mount the shared file system
sudo mount -a -v
Fig. 2.1: Mounting the shared network folder
# Run these commands on es-master
sudo -u elasticsearch mkdir /mnt/elastic-share/todelete
sudo -u elasticsearch rm -rf /mnt/elastic-share/todelete
# Run this command on es-master
sudo nano /etc/elasticsearch/elasticsearch.yml
Fig. 2.2: Updated ES Configuration for “es-master”
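The configuration change needed at this point is registering the mounted network share as an allowed snapshot repository path. A minimal sketch of the relevant line (the exact file shown in Fig. 2.2 may differ):

```
# Allow ElasticSearch to use the mounted share for snapshot repositories
path.repo: ["/mnt/elastic-share"]
```

The same `path.repo` entry must be present in /etc/elasticsearch/elasticsearch.yml on both nodes, otherwise registering the repository will fail on the node that is missing it.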
# Run this command on es-master
# Restart ElasticSearch
sudo systemctl restart elasticsearch.service
# Run these commands on es-data
# Installing samba-client on the es-data VM
sudo apt install samba-client cifs-utils
# Create a folder where to mount the shared folder
sudo mkdir /mnt/elastic-share
# Test mount the shared folder
sudo mount -t cifs -o user=sambauser //10.0.0.6/es-share /mnt/elastic-share
# Unmount the test run
sudo umount /mnt/elastic-share
# Secure the credentials for "sambauser"
sudo nano /home/<loggedin-linux-username>/.smbcredentials
username=sambauser
password=mysambapassword
# Run these commands on es-data
sudo chmod 600 /home/<loggedin-linux-username>/.smbcredentials
sudo nano /etc/fstab
//10.0.0.6/es-share /mnt/elastic-share cifs credentials=/home/<loggedin-linux-username>/.smbcredentials,users,rw,iocharset=utf8,file_mode=0777,dir_mode=0777 0 0
# Run this command on es-data
sudo mount -a -v
Fig. 2.3: Mounting the Shared Network folder on “es-data”
# Run these commands on es-data
sudo -u elasticsearch mkdir /mnt/elastic-share/todelete
sudo -u elasticsearch rm -rf /mnt/elastic-share/todelete
# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml
Fig. 2.4: Updated ES Configuration for “es-data”
# Run this command on es-data
sudo systemctl restart elasticsearch.service

Registering the ElasticSearch Snapshot

Once we have our snapshot repository location specified on both nodes, the final step in order to have working ElasticSearch snapshots is to tell the ElasticSearch cluster to use the previously created repository to store the snapshots. We can do this easily from either VM using curl:

# Run this command on es-master
curl -X PUT "10.0.0.6:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/elastic-share/es-backups"
  }
}'
{
  "acknowledged": true
}
Fig. 2.5: Snapshot registered in ElasticSearch
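To double-check the registration, we can read the repository definition back and ask the cluster to verify that all nodes can write to it (a hedged extra step, not part of the original flow):

```shell
# Read back the repository definition
curl -X GET "10.0.0.6:9200/_snapshot/my_backup?pretty"
# Verify that every node in the cluster can write to the repository
curl -X POST "10.0.0.6:9200/_snapshot/my_backup/_verify?pretty"
```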

Creating Our First ElasticSearch Snapshot

As we have a working ElasticSearch repository — let’s use it and create our very first snapshot (named “snapshot_1”) via curl:

# Run this command on es-master
curl -X PUT "10.0.0.6:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true&pretty"
Fig. 2.6: Result of creating our first Snapshot
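Restoring from that snapshot works the same way, through the `_restore` endpoint (a hedged sketch; note that restoring over an existing open index requires closing or deleting it first):

```shell
# Restore everything captured in snapshot_1
curl -X POST "10.0.0.6:9200/_snapshot/my_backup/snapshot_1/_restore?pretty"
```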

Bonus: Restoring Snapshot from Another ElasticSearch Cluster

This optional section covers an extra exercise with ElasticSearch snapshots: restoring snapshots stored in GIT, taken from another ES cluster.

Fig. 2.7: ES Repository Content Stored in GIT
# Run these commands on es-master
# Installing GIT
sudo apt-get install git
mkdir ~/git
cd ~/git
# Clone the repository above
sudo git clone https://github.com/botond-kopacz/es-sample-data.git
# Create destination directory on the network share
sudo -u elasticsearch mkdir /mnt/elastic-share/es-sample-data
# Copy the snapshots to the shared drive
# Don't forget to replace <logged-in-linux-username>
cp -R /home/<logged-in-linux-username>/git/es-sample-data/* /mnt/elastic-share/es-sample-data
# Verify if the content appeared properly
ls /mnt/elastic-share/es-sample-data
Fig. 2.8: Content of Shared Network Folder
# Run this command on es-master
sudo nano /etc/elasticsearch/elasticsearch.yml
Fig. 2.9: Updated ElasticSearch Configuration on es-master VM
# Run this command on es-master
sudo systemctl restart elasticsearch.service
# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml
Fig. 2.10: Updated ElasticSearch Configuration on es-data VM
# Run this command on es-data
sudo systemctl restart elasticsearch.service
# Run this command on es-data
curl -X PUT "10.0.0.6:9200/_snapshot/sample_data?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/elastic-share/es-sample-data"
  }
}'
# Run this command on es-data
curl -X GET "10.0.0.6:9200/_snapshot/sample_data/_all?pretty"
Fig. 2.11: Available Snapshots in the GitHub Repository
# Run this command on es-data
curl -X POST "10.0.0.6:9200/_snapshot/sample_data/sample-data-snapshot/_restore?pretty"
# Run this command on es-data
curl -X GET "10.0.0.6:9200/_cat/indices/?pretty"
# Run this command on es-data
curl -X GET "10.0.0.6:9200/superstore_orders/_search?pretty"
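Beyond the match-all search above, a query against a specific field can be sketched like this (hedged: the field name `category` and the value are assumptions about the SuperStore dataset’s schema, not taken from the original article):

```shell
# Search the restored index for documents matching a field value
curl -X GET "10.0.0.6:9200/superstore_orders/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "category": "Furniture" }
  }
}'
```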

Recap & Summary

Before continuing our journey with ElasticSearch, let’s recap what we have achieved so far:

  • Configured snapshots and backups
  • Loaded the SuperStore sample dataset into ElasticSearch
  • Ran our very first “search” query against the cluster

3. Exposing ElasticSearch to Web

When thinking about exposing the ElasticSearch cluster to the internet, we have several options, for example:

  • Exposing it using an HTTP reverse proxy
# Run these commands on es-master
# Installing Nginx
sudo apt-get install nginx
# Enabling and starting the Nginx server
sudo systemctl enable nginx
sudo systemctl start nginx
# Run this command on es-master
# Replace "123.123.123.123" with the external IP of es-master
curl -X GET "123.123.123.123/"
Fig. 3.1: Default Nginx response
# Run this command on es-master
sudo nano /etc/nginx/sites-enabled/default
Fig. 3.2: Nginx Configuration in Reverse Proxy Mode
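The reverse proxy configuration in Fig. 3.2 most likely adds a location block that forwards the `/elastic` path to the ElasticSearch HTTP port; a hedged sketch (the upstream address matches the `network.host` we set earlier):

```
server {
    listen 80 default_server;
    server_name _;

    location /elastic/ {
        # Forward requests to ElasticSearch with the /elastic/ prefix stripped
        proxy_pass http://10.0.0.6:9200/;
    }
}
```

With this layout, requests to http://<external-ip>/elastic/ reach ElasticSearch as requests to its root path.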
# Run these commands on es-master
# Reload the Nginx configuration
sudo nginx -s reload
# Replace "123.123.123.123" with the external IP of es-master
curl -X GET "123.123.123.123/elastic"
Fig. 3.3: ElasticSearch Welcome Message

4. Securing ElasticSearch

Domains and DNS Records

This part of the tutorial will focus on implementing a basic security layer around our platform, which will ensure that:

  • all communication between the master and data nodes is secured with HTTPS;
  • the ElasticSearch cluster can be accessed only with a password (implementing Basic Authentication requiring a user name + password);
  • the ElasticSearch cluster is accessible from a static URL even if the external IP address of the VMs changes.
  • es-master external IP: 20.185.88.252
  • es-data external IP: 20.185.88.255
Fig. 4.1: Adding a new “A” record for es-master (IP 20.185.88.252) | Namecheap.com
Fig. 4.2: DNS Records for the 2 VMs | Namecheap.com
# Run these commands on both es-master and es-data
# Backup the original configuration
sudo cp -f /etc/hosts /etc/hosts.bak
# Update the file
sudo nano /etc/hosts
Fig. 4.3: /etc/hosts content
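The `/etc/hosts` content in Fig. 4.3 presumably maps the internal IPs to both the short and the fully qualified node names (a hedged sketch; `example.com` is a placeholder for your own domain):

```
10.0.0.6    es-master    es-master.es.example.com
10.0.0.7    es-data      es-data.es.example.com
```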
# Replace example.com with your domain name
ping es-data
ping es-data.es.example.com

Securing Communication Between Nodes

To secure the communication between our nodes, we will use SSL certificates generated with the tools provided by ElasticSearch. To proceed, let’s connect to the es-master VM and run the following commands:

# Create temp folders where to store the generated certificates
mkdir ~/tmp
mkdir ~/tmp/es-certs
cd ~/tmp/es-certs
# Create a file named "instance.yml".
# The ElasticSearch tool we will use to generate the certificates
# will use this file to identify the nodes.
nano ~/tmp/es-certs/instance.yml
Fig. 4.4: instance.yml
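The `instance.yml` in Fig. 4.4 likely follows the standard `elasticsearch-certutil` instance-file format; a hedged sketch, using the DNS names and internal IPs from this tutorial’s setup (`example.com` again being a placeholder):

```
instances:
  - name: es-master
    dns:
      - es-master
      - es-master.es.example.com
    ip:
      - 10.0.0.6
  - name: es-data
    dns:
      - es-data
      - es-data.es.example.com
    ip:
      - 10.0.0.7
```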
# Run these commands on es-master
# Generate certificate files into our temporary directory
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --keep-ca-key --pem --in ~/tmp/es-certs/instance.yml --out ~/tmp/es-certs/certs.zip
# Install zip, if it's not installed yet
sudo apt-get install zip
# Unzip the generated certificates
sudo mkdir ~/tmp/es-certs/certs
sudo unzip ~/tmp/es-certs/certs.zip -d ~/tmp/es-certs/certs/
# Copy the certificates into their final location
sudo mkdir /etc/elasticsearch/certs
sudo cp ~/tmp/es-certs/certs/ca/* ~/tmp/es-certs/certs/es-master/* /etc/elasticsearch/certs/
# Validate that we have the ca.crt, ca.key, and the rest of the files
sudo ls /etc/elasticsearch/certs/
# Change the ElasticSearch configuration
# (note the changes in network.host)
sudo nano /etc/elasticsearch/elasticsearch.yml
Fig. 4.5. “elasticsearch.yml” for es-master
  • we added new lines starting with “xpack”
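The xpack lines added in Fig. 4.5 are most plausibly the standard ES 7.x transport and HTTP TLS settings, sketched here under the assumption that the certificate files were copied to /etc/elasticsearch/certs as above (on es-data, the es-data.key / es-data.crt files would be used instead):

```
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: /etc/elasticsearch/certs/es-master.key
xpack.security.transport.ssl.certificate: /etc/elasticsearch/certs/es-master.crt
xpack.security.transport.ssl.certificate_authorities: ["/etc/elasticsearch/certs/ca.crt"]
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: /etc/elasticsearch/certs/es-master.key
xpack.security.http.ssl.certificate: /etc/elasticsearch/certs/es-master.crt
xpack.security.http.ssl.certificate_authorities: ["/etc/elasticsearch/certs/ca.crt"]
```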
# Run these commands on es-master
# Restart the ES instance on the node
sudo systemctl restart elasticsearch.service
# Copy the auto-generated certificate files to our network share
# so that we can access them from the es-data VM too
mkdir /mnt/elastic-share/certs-temp
cp -rf ~/tmp/es-certs/certs /mnt/elastic-share/certs-temp
# Run these commands on es-data
# Copy the certificates to their final location
sudo mkdir /etc/elasticsearch/certs
sudo cp /mnt/elastic-share/certs-temp/certs/ca/* /mnt/elastic-share/certs-temp/certs/es-data/* /etc/elasticsearch/certs/
# Edit the ElasticSearch configuration
sudo nano /etc/elasticsearch/elasticsearch.yml
Fig. 4.6: “elasticsearch.yml” for es-data
# Run these commands on es-data
sudo systemctl restart elasticsearch.service
The next step is to set passwords for ElasticSearch’s built-in users. Running the `elasticsearch-setup-passwords` tool below will prompt for a password for each of them:

  • elastic (the built-in superuser)
  • kibana_system (the built-in Kibana user)
  • logstash_system (the Logstash user which stores monitoring information)
  • beats_system (the Beats user which stores monitoring information)
  • apm_system (the APM Server user which stores monitoring information)
  • remote_monitoring_user (the Metricbeat user which collects and stores monitoring information)
# Run this command on es-master
# Note down these passwords as they are very important
# and quite tricky to recover if they are forgotten
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
# Run this command on either es-master or es-data
curl --insecure -u elastic "https://10.0.0.6:9200/_security/_authenticate?pretty"
Fig. 4.7: Authentication Test Result
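With security enabled, the motivating use case from the introduction (a dedicated read-only ES user) becomes possible through the security API. A hedged sketch — the role name, user name, password, and index pattern are illustrative choices, not from the original article:

```shell
# Create a role that can only read the superstore_orders index
curl --insecure -u elastic -X POST "https://10.0.0.6:9200/_security/role/read_only_role?pretty" \
  -H 'Content-Type: application/json' -d'
{
  "indices": [
    { "names": ["superstore_orders"], "privileges": ["read"] }
  ]
}'
# Create a user with that role (replace the password with your own)
curl --insecure -u elastic -X POST "https://10.0.0.6:9200/_security/user/readonly_user?pretty" \
  -H 'Content-Type: application/json' -d'
{
  "password": "changeme123",
  "roles": ["read_only_role"]
}'
```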
# Run this command on es-data
sudo nano /etc/elasticsearch/elasticsearch.yml
Fig. 4.8: ElasticSearch Configuration for es-data Node
  • “cluster.initial_master_nodes”: Replace master reference with the fully qualified domain name
# Run this command on es-data
sudo systemctl restart elasticsearch.service

Securing the Reverse Proxy

The final step to have a working ElasticSearch cluster exposed to the internet through HTTPS is to install SSL certificates on our Nginx server, using free SSL/TLS certificates from Let’s Encrypt obtained in an automated way using Certbot.

# Run these commands on es-master
# Install Certbot
sudo apt-get install certbot python-certbot-nginx
# Generate the certificates
# When prompted, enter "es-master.es.example.com" where
# you replace the "example.com" with your domain
sudo certbot certonly --nginx
# Update Nginx configuration
sudo nano /etc/nginx/sites-enabled/default
Fig. 4.9: Nginx configuration with SSL
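The updated Nginx configuration in Fig. 4.9 presumably adds an HTTPS server block using the Certbot-issued certificate and proxies `/elastic/` to the now TLS-enabled ElasticSearch; a hedged sketch (the certificate paths follow Certbot’s default layout):

```
server {
    listen 443 ssl;
    server_name es-master.es.example.com;

    ssl_certificate     /etc/letsencrypt/live/es-master.es.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/es-master.es.example.com/privkey.pem;

    location /elastic/ {
        # ElasticSearch itself now serves HTTPS, so proxy over https
        proxy_pass https://10.0.0.6:9200/;
    }
}
```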
# Run these commands on es-master
# Reload the configuration
sudo nginx -s reload
# Test the HTTP proxy
curl -u elastic "https://es-master.es.example.com/elastic/_security/_authenticate?pretty"
  • We can access our ElasticSearch cluster with a fully qualified domain name, without having to worry about the IP address of the VM;
  • Sample data is loaded in our ElasticSearch cluster through snapshots;
  • Communication between the nodes is secure, as is communication between any external ElasticSearch client and the cluster.
Fig. 4.10: Running ElasticSearch Requests using Postman
