What is AWS Elasticsearch
Elasticsearch is an open-source search and analytics engine that is easy to deploy and operate. It is used to search and analyze logs and other data. In essence, it is a NoSQL database that stores unstructured data in document format. AWS Elasticsearch (Amazon Elasticsearch Service) offers Elasticsearch as a managed service, which makes it easier to create a cluster in the cloud. You can use it for more than checking your logs or data: after creating the AWS Elasticsearch domain, you can also connect it to Amazon CloudWatch and use it for monitoring.
There are several ways to add data or connect your logs once the AWS Elasticsearch domain has been created. You can send single documents, bulk data, or files through the API, or connect to it from your own code to automate ingestion. You can also use third-party plugins with AWS Elasticsearch, e.g., the Amazon S3 River plugin. AWS Elasticsearch simplifies things for its users because they do not need to create an Elasticsearch cluster manually. It lets the user visualize, analyze, and search data in real time.
In this chapter, we are going to cover the following topics of AWS Elasticsearch Service –
- What is AWS Elasticsearch
- Concept of AWS Elasticsearch
- Advantages of AWS Elasticsearch
- Limitations of AWS Elasticsearch
- AWS Elasticsearch Architecture
- Features of AWS Elasticsearch
- Getting started with AWS Elasticsearch services
- Supported versions of Elasticsearch
Concept of AWS Elasticsearch
The key concepts of AWS Elasticsearch are as follows –
- An Amazon Elasticsearch domain is synonymous with an Elasticsearch cluster. Domains are clusters with the instance type, instance count, settings, and storage resources that you specify.
- You can create one or more Elasticsearch indices within the same domain.
- Amazon Elasticsearch uses a blue/green deployment process when updating a domain. Blue/green deployment is the practice of running two production environments, one live and one idle, and switching between them.
- AWS automatically updates the service software after a certain timeframe if you do not take any action on the required updates.
Advantages of AWS Elasticsearch
There are several advantages of using AWS Elasticsearch, which are as follows –
1) Easy to use
Amazon Elasticsearch is a fully managed service, which makes it easy to use. It saves the time otherwise spent on backup, failure recovery, software patching, and monitoring. Users can deploy a production-ready Elasticsearch cluster within minutes. They do not need to worry about installing and maintaining the Elasticsearch software.
2) Highly secure
AWS Elasticsearch is highly secure. It is easy to set up secure access to Amazon Elasticsearch Service from within your VPC, and Amazon VPC gives users network isolation for their data in the Elasticsearch service. AWS IAM and Amazon Cognito policies help to manage authentication and access control.
3) Cost-effective
One of the biggest advantages of Amazon Elasticsearch Service is that you pay only for the resources you consume. Users can select on-demand pricing with no upfront costs. Because Amazon Elasticsearch Service is fully managed, it also reduces operating costs by removing the need for a team of Elasticsearch experts to manage and monitor the clusters.
4) Easily scalable and available
AWS Elasticsearch is highly scalable. It lets users store up to 3 PB of data in a single cluster. It also lets users run large log analytics workloads through a user interface such as Kibana. The cluster can be scaled up or down with a single API call or a few clicks in the AWS console.
Multi-AZ deployments replicate data across three Availability Zones in the same Region, which makes the service highly available.
5) Tightly integrated with AWS Services
AWS Elasticsearch has built-in integrations with other AWS services. These include AWS IoT, CloudWatch Logs, and Kinesis Firehose for seamless data ingestion.
6) Support Open Source APIs
AWS Elasticsearch does not require any new software or programming skills and provides direct access to the open-source Elasticsearch APIs. It supports Logstash, an open-source data ingestion tool, as well as Kibana, a data visualization tool. The combination of these three tools (Elasticsearch, Logstash, and Kibana) is known as the ELK Stack.
Limitations of AWS Elasticsearch
Along with several advantages, AWS Elasticsearch has a few limitations, which are as follows –
- Users can launch a domain either within a VPC or with a public endpoint, but not both at the same time.
- AWS Elasticsearch offers a free tier for only 12 months, so it is not free in the long run. Once 12 months have passed since signup, you have to pay to use it.
Architecture of AWS Elasticsearch
The architecture of AWS Elasticsearch gives an idea of the services involved. In this architecture, the Amazon Elasticsearch domain is deployed by an AWS CloudFormation template. The domain comprises the hardware, software, and data exposed by the Amazon Elasticsearch Service endpoints.
The architecture also includes Elastic Load Balancing, whose main objective is to distribute traffic to the proxy servers and enable automatic recovery to maintain instance availability. Elastic Load Balancing uses a highly available design to achieve this. The template launches three Amazon EC2 instances, each in a separate Availability Zone of the Amazon VPC network. Here, VPC means Virtual Private Cloud.
AWS Elasticsearch Features
AWS Elasticsearch has various features, and each of them adds some unique functionality. The features of AWS Elasticsearch are as follows –
a) Security
- It provides access control through AWS Identity and Access Management (IAM).
- It encrypts the data and offers node-to-node encryption.
- AWS Elasticsearch provides security at the field, document, and index level.
- For Kibana (which is a data visualization tool), it offers HTTP basic authentication.
b) Flexibility
- AWS Elasticsearch offers flexibility to its users. For example, it provides custom packages to improve search results.
- AWS Elasticsearch provides SQL support for integration with Business Intelligence (BI) applications.
c) Scalability
- AWS Elasticsearch is highly scalable, providing up to 3 PB of attached storage to hold the data.
- It also supports UltraWarm storage for read-only data. UltraWarm storage is a cost-effective way to store large amounts of data.
- With AWS Elasticsearch, we can choose from various CPU, memory, and storage configurations.
d) Stability
- One of the most important features is the automated snapshot facility, which backs up Amazon ES domains and lets you restore them. The backup and restore process is handled automatically.
- AWS Elasticsearch offers various geographical locations (called Regions and Availability Zones) for your resources.
- It allows allocating the nodes across two or three Availability zones in the same AWS Region.
- To offload the cluster management tasks, it offers dedicated master nodes.
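The last two points map onto cluster settings. The following is a minimal sketch, assuming the field names used by boto3's `create_elasticsearch_domain` call (`ElasticsearchClusterConfig`, `ZoneAwarenessConfig`, and the `DedicatedMaster*` keys); the helper name `build_cluster_config` and the instance types are illustrative choices, not values from this tutorial.

```python
def build_cluster_config(data_nodes=3, zone_count=3, dedicated_masters=3):
    """Build an ElasticsearchClusterConfig dict spread across Availability Zones."""
    if zone_count not in (2, 3):
        raise ValueError("nodes can be allocated across two or three Availability Zones")

    config = {
        "InstanceType": "r5.large.elasticsearch",  # assumed data node type
        "InstanceCount": data_nodes,
        "ZoneAwarenessEnabled": True,
        "ZoneAwarenessConfig": {"AvailabilityZoneCount": zone_count},
        "DedicatedMasterEnabled": dedicated_masters > 0,
    }
    if dedicated_masters > 0:
        # Dedicated master nodes take over cluster management tasks.
        config["DedicatedMasterType"] = "c5.large.elasticsearch"  # assumed type
        config["DedicatedMasterCount"] = dedicated_masters
    return config


cluster_config = build_cluster_config()
print(cluster_config)
```

The resulting dictionary could be passed as the `ElasticsearchClusterConfig` argument when creating or updating a domain.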
e) Integration with popular Services
- AWS Elasticsearch can be integrated with several other popular services, such as Kibana for data visualization.
- To monitor the Amazon ES domain metrics and to set the alarms, it is integrated with Amazon CloudWatch.
- To load the streaming data into Amazon Elasticsearch, it integrates with different Amazon services, which are – Amazon DynamoDB, Amazon S3, and Amazon Kinesis.
- AWS Elasticsearch integrates with AWS CloudTrail for auditing configuration API calls to Amazon Elasticsearch domains.
- If your data exceeds certain thresholds, it alerts users through Amazon SNS.
Getting started with AWS Elasticsearch services
Amazon Elasticsearch Service is a managed service from AWS. It makes it easy to set up, operate, and scale Elasticsearch clusters in the cloud, and it gives direct access to the Elasticsearch APIs. The steps to get started with AWS Elasticsearch are as follows –
- Sign up for an AWS account
- Create an Amazon ES domain
- Upload data to Amazon ES domain for indexing
- Search documents in an Amazon ES domain
- Delete an Amazon ES domain
First of all, to get started with AWS, we need to create an account on AWS.
Step 1: Sign up for an AWS account
Step 1: Sign up with AWS to create a new account. Open the AWS sign-up page and click the Create an AWS Account button at the top right corner.
Step 2: Provide all the required information and click the Continue button.
Step 3: Next, provide your contact information, check the box to agree to the terms and conditions, and then click the Create Account and Continue button.
Here, you can choose the account type, i.e., Professional or Personal. By default, it is Professional.
Step 4: In this step, enter your debit/credit card details, such as card number, expiration date, and billing address, under Payment Information.
Step 2: Create an Amazon ES domain
An Amazon ES domain is synonymous with an Elasticsearch cluster. Once your AWS account is created, you are ready to create an Amazon Elasticsearch domain. In this step, we will create an Amazon ES domain named books. The steps to get the Elasticsearch Service domain up and running are as follows.
- Define your domain
- Configure your cluster
- Set up access policy
- Review
Following are the detailed steps to create an Amazon ES domain.
Define your domain
- Log in to your AWS account with your credentials.
- To open the Elasticsearch Service page, go to the Analytics section and click on Elasticsearch Service.
- Click the Create a new domain button and then choose Development and testing.
- Select the Elasticsearch version and your preferred Deployment type. We use Elasticsearch 7.4 in this tutorial.
Configure your cluster
- Enter the name of the domain you are going to create (e.g., books) and choose the Instance type from the drop-down list.
- Use the default value for Data nodes storage and set the number of instances to 1.
- We choose t2.small.elasticsearch as the instance type, which is eligible for the free tier.
- Ignore the other fields and click Next to move on to the Set up access page.
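The same choices can also be expressed programmatically. Below is a minimal sketch, assuming the parameter names of boto3's `create_elasticsearch_domain` call; the helper `build_domain_request`, the `gp2` volume type, and the 10 GiB volume size are assumptions for illustration rather than settings taken from the console walkthrough.

```python
import json


def build_domain_request(domain_name, version="7.4",
                         instance_type="t2.small.elasticsearch",
                         instance_count=1, volume_size_gib=10):
    """Return keyword arguments shaped for boto3's es.create_elasticsearch_domain."""
    return {
        "DomainName": domain_name,
        "ElasticsearchVersion": version,
        "ElasticsearchClusterConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "DedicatedMasterEnabled": False,
            "ZoneAwarenessEnabled": False,
        },
        "EBSOptions": {
            "EBSEnabled": True,
            "VolumeType": "gp2",            # assumed volume type
            "VolumeSize": volume_size_gib,  # assumed size in GiB
        },
    }


domain_request = build_domain_request("books")
print(json.dumps(domain_request, indent=2))

# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   boto3.client("es").create_elasticsearch_domain(**domain_request)
```

Keeping the request as plain data makes it easy to review or version-control before anything is created in the account.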
Set up access policy
- To access this domain, we have to set up appropriate permissions for it. This is done on the Set up access page.
- For simplicity, we recommend selecting public access for the domain. Alternatively, you can restrict access to a VPC or to an IAM role, so that only a specific set of users can access your Elasticsearch cluster.
- Leave the Amazon Cognito Authentication setting as it is for now.
- Under Access policy, select a template for Set the domain access policy. Choose the Allow open access to the domain policy for this tutorial.
- Leave the encryption settings at their defaults and click Next.
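An open access policy is a resource-based IAM policy document attached to the domain. The sketch below shows roughly what such a document looks like; the Region, the account ID `123456789012`, and the helper name `open_access_policy` are placeholders, and a policy this permissive is only reasonable for a short-lived test domain.

```python
import json


def open_access_policy(region, account_id, domain_name):
    """Return a JSON policy that allows any principal to call the domain."""
    resource = f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},  # anyone; restrict this outside of tests
                "Action": "es:*",
                "Resource": resource,
            }
        ],
    }
    return json.dumps(policy, indent=2)


policy_text = open_access_policy("us-east-1", "123456789012", "books")
print(policy_text)
```

To restrict access instead, the `Principal` would name a specific IAM role or user ARN rather than `*`.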
Review
The last step of domain creation is the review. The review page shows all the settings you chose in the previous steps so that you can check them before finalizing.
- Double-check your configuration and choose Confirm.
- A new domain (cluster) takes around 10-15 minutes to create and initialize. It can take longer depending on the configuration.
Once all these steps are completed, you get a message that “You have successfully created an Elasticsearch domain”.
Your ES domain will then be up and running. You will see the domain status set to Active and the cluster health set to green.
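Cluster health can also be read from the Elasticsearch `_cluster/health` API instead of the console. The following is a small sketch of that check; the endpoint URL and the sample response are made up for illustration, and the real response contains more fields.

```python
import json

# Hypothetical domain endpoint; the real one is shown on the domain dashboard.
ENDPOINT = "https://search-books-example.us-east-1.es.amazonaws.com"


def health_url(endpoint):
    """Return the URL of the cluster health API for a domain endpoint."""
    return endpoint.rstrip("/") + "/_cluster/health"


def is_green(health_response_text):
    """Return True when the health response reports a green cluster."""
    return json.loads(health_response_text).get("status") == "green"


# Illustrative response body, trimmed to a few fields.
sample_response = '{"cluster_name": "123456789012:books", "status": "green", "number_of_nodes": 1}'

print(health_url(ENDPOINT))
print(is_green(sample_response))
```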
Step 3: Uploading data for indexing
The next step is to upload data for indexing. We can upload data to the Amazon ES domain using the command-line interface or a programming language. In this step, we will upload a small amount of test data.
On Windows, you can install curl and use it from the command prompt, although we recommend using a tool like Cygwin. macOS and Linux come with curl pre-installed, so you don’t need to install it there.
Upload a single document via command line
Run the command below on the command line to upload a single document to the Amazon ES domain.
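For readers who prefer a script to curl, here is a minimal Python sketch of a single-document upload. It assumes an open access policy (no request signing), a hypothetical endpoint URL, and an invented sample document that reuses the field names book_name, author, and publisher from this tutorial; the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical domain endpoint; replace it with the endpoint of your own domain.
ENDPOINT = "https://search-books-example.us-east-1.es.amazonaws.com"


def build_index_request(endpoint, index, doc_id, document):
    """Build a PUT request that stores one JSON document under index/_doc/id."""
    url = f"{endpoint.rstrip('/')}/{index}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Invented sample document for illustration only.
document = {"book_name": "Mars Attacks", "author": "Example Author", "publisher": "Example Press"}
index_request = build_index_request(ENDPOINT, "books", 1, document)
print(index_request.get_method(), index_request.full_url)

# Sending it needs network access and a reachable domain:
#   with urllib.request.urlopen(index_request) as response:
#       print(response.read().decode("utf-8"))
```

If the domain uses a master user instead of open access, the request would additionally need an HTTP basic authentication header.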
Upload a JSON file containing multiple documents
1. For this, we will create a JSON file. Copy and paste the following content:
2. Now, run the command below to upload the JSON file to the books domain.
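The Elasticsearch `_bulk` API expects newline-delimited JSON: an action line followed by a document line for every document, with a trailing newline at the end. The sketch below builds such a body in Python under a few assumptions: the endpoint is hypothetical, the two sample documents are invented, and the action lines omit `_type`, which suits Elasticsearch 7.x.

```python
import json
import urllib.request

# Hypothetical domain endpoint; replace it with the endpoint of your own domain.
ENDPOINT = "https://search-books-example.us-east-1.es.amazonaws.com"


def build_bulk_body(index, documents):
    """Return a newline-delimited JSON body for the _bulk API."""
    lines = []
    for doc_id, document in enumerate(documents, start=1):
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(document))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def build_bulk_request(endpoint, body):
    """Build a POST request that sends the bulk body to the domain."""
    return urllib.request.Request(
        endpoint.rstrip("/") + "/_bulk", data=body.encode("utf-8"), method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )


# Invented sample documents for illustration only.
documents = [
    {"book_name": "Mars Attacks", "author": "Example Author", "publisher": "Example Press"},
    {"book_name": "Learning Search", "author": "Another Author", "publisher": "Example Press"},
]
bulk_body = build_bulk_body("books", documents)
bulk_request = build_bulk_request(ENDPOINT, bulk_body)
print(bulk_body)
```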
Step 4: Searching document in Amazon ES domain
The Elasticsearch search APIs let the user search documents in an Amazon Elasticsearch Service domain. Alternatively, you can use Kibana (a data visualization tool) to search documents in the domain. Searching is one of the most important operations in Elasticsearch. When there is a large amount of data, it is a good idea to search it using a specific query string.
Using the below example, we will look for the technical books inside the books domain.
To search documents through the command line
Run the command below on the command line to search the domain you have created.
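A query-string search is an HTTP GET against the index's `_search` API with a `q` parameter. The Python sketch below builds that URL and extracts the scored hits from a response; the endpoint and the sample response are illustrative assumptions, and a real response carries additional metadata.

```python
import json
import urllib.parse

# Hypothetical domain endpoint; replace it with the endpoint of your own domain.
ENDPOINT = "https://search-books-example.us-east-1.es.amazonaws.com"


def build_search_url(endpoint, index, query):
    """Return the URL of a query-string search against one index."""
    params = urllib.parse.urlencode({"q": query, "pretty": "true"})
    return f"{endpoint.rstrip('/')}/{index}/_search?{params}"


def scored_hits(response_text):
    """Return (score, document) pairs from a search response."""
    hits = json.loads(response_text)["hits"]["hits"]
    return [(hit["_score"], hit["_source"]) for hit in hits]


search_url = build_search_url(ENDPOINT, "books", "mars")
print(search_url)

# Invented, trimmed response body used only to show the parsing step.
sample_response = json.dumps({
    "hits": {"hits": [{"_id": "1", "_score": 0.5, "_source": {"book_name": "Mars Attacks"}}]}
})
print(scored_hits(sample_response))

# Fetching real results needs network access and a reachable domain:
#   import urllib.request
#   with urllib.request.urlopen(search_url) as response:
#       print(scored_hits(response.read().decode("utf-8")))
```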
To search documents using the Kibana interface
1. In your browser, navigate to the Kibana plugin for your Amazon ES domain. On the Amazon ES console, you will find the Kibana endpoint on your domain dashboard. The URL format will be like –
2. Log in to the console using your master user name and password.
3. You must configure at least one index pattern to use Kibana, because Kibana uses these patterns to identify which indices you want to analyze. For this tutorial, enter books and then choose Create.
4. The Index Pattern page now shows the various document fields, such as book_name, author, publisher, etc. For now, choose Discover to search your data.
5. Enter Mars in the search bar and press Enter. Note how the similarity score (_score) increases when you search for the phrase mars attacks.
Step 5: Delete an Amazon ES domain
In step 2, we created an Amazon ES domain named books. This domain was created only for test purposes, so we will delete it in this step. To delete an Amazon ES domain, follow the steps below:
- Sign in to the Amazon Elasticsearch Service console using your user name and password.
- In the navigation pane, select the books domain under My domains.
- Select Action, and then Delete domain.
- Finally, check the Delete Domain checkbox and choose Delete.
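Deletion can be scripted as well. The sketch below wraps boto3's `delete_elasticsearch_domain` call in a helper that accepts any client object, and runs it against a stand-in client so that nothing is deleted; `RecordingClient` and its simplified response are hypothetical and exist only for this demonstration.

```python
def delete_domain(es_client, domain_name):
    """Ask the given Elasticsearch Service client to delete a domain."""
    return es_client.delete_elasticsearch_domain(DomainName=domain_name)


class RecordingClient:
    """Stand-in for boto3.client("es") that records calls instead of deleting."""

    def __init__(self):
        self.calls = []

    def delete_elasticsearch_domain(self, **kwargs):
        self.calls.append(kwargs)
        # Simplified imitation of the real response shape.
        return {"DomainStatus": {"DomainName": kwargs["DomainName"], "Deleted": True}}


client = RecordingClient()
result = delete_domain(client, "books")
print(result)

# Against a real account (boto3 installed, credentials configured):
#   import boto3
#   delete_domain(boto3.client("es"), "books")
```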
Supported Elasticsearch Version
Not all versions of Elasticsearch are supported by AWS Elasticsearch. The following versions of Elasticsearch are supported –
- 7.1, 7.4, 7.7
- 6.0, 6.2, 6.3, 6.4, 6.5, 6.7, 6.8
- 5.1, 5.3, 5.5, 5.6
- 2.3
- 1.5
Compared with earlier versions of Elasticsearch, the 7.x and 6.x versions offer more powerful features. These features make AWS Elasticsearch more secure, faster, and easier to use.
Better safeguards – The latest versions of Elasticsearch prevent complex queries from negatively affecting the performance and stability of the cluster.
Higher indexing performance – They provide improved indexing capabilities, which increase the throughput of data updates.
Vega visualization – The latest versions of Elasticsearch support the Vega visualization language. Vega lets users make context-aware queries, combine several data sources into a single graph, add user interactivity to graphs, and more.
Java high-level REST client – The Java high-level REST client offers a simpler development experience than the low-level client. AWS Elasticsearch supports most of the Elasticsearch APIs.