InsideDarkWeb.com

How to update an Elasticsearch document with Python?

I have the code below to add data to Elasticsearch:

from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
 {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
 {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
es.indices.create(index='my-index_1', ignore=400)

for doc in r:
    #es.indices.update(index="my-index_1", body=doc)
    es.index(index="my-index_1", body=doc)

#Retrieve the data
es.search(index = 'my-index_1')['hits']['hits']

Requirement
How to update the document

r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

Here, Dr. Messi and Dr. Christiano should be added to the index, while Dr. Bernard M. Aaron should not be indexed again because that record is already present in the index.

Stack Overflow Asked by user6882757 on November 15, 2021

One Answer

In Elasticsearch, when you index data without providing a custom id, Elasticsearch generates a new id for every document you index.

Hence, in your case, since you are not providing an id, Elasticsearch generates one for you. But you also want to check whether a Name is already present before deciding whether to index the data. There are 2 possible solutions to this.

  1. Index the data without passing an _id for every document. Afterwards, you have to search by Name to check whether the document exists.
  2. Index the data with your own _id for every document. Afterwards, look the document up by _id. This is the faster and easier approach.
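
To illustrate what the 1st approach would involve, a lookup by Name could be built roughly as follows. This is only a minimal sketch: `build_name_query` is a hypothetical helper, and it assumes the index uses the default dynamic mapping, under which a string field such as Name also gets a `Name.keyword` sub-field for exact matching.

```python
def build_name_query(name):
    """Return a search body matching documents whose Name is exactly `name`."""
    # Assumption: default dynamic mapping, so "Name.keyword" exists.
    return {"query": {"term": {"Name.keyword": name}}, "size": 1}


body = build_name_query("Dr. Bernard M. Aaron")
print(body)

# With a running cluster, the check could then look like this:
# hits = es.search(index="my-index_1", body=body)["hits"]["hits"]
# already_indexed = len(hits) > 0
```

Keep in mind that search is near real-time, so a document indexed a moment ago may not show up in the results until the index has been refreshed. That is one more reason to prefer a lookup by _id.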

I'm going ahead with the 2nd approach of creating my own ids. As you are searching on Name, I'll create an _id based on the value of the Name field: the hash of the Name value is the _id. I'll use md5, but you can use any other hash function.
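
As a small sketch of the idea, the id derivation can be wrapped in a function (the name `make_doc_id` is just for illustration). The same Name always produces the same _id, which is what makes the later existence check possible.

```python
import hashlib


def make_doc_id(name):
    """Derive a deterministic document id from the Name value."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


first = make_doc_id("Dr. Bernard M. Aaron")
second = make_doc_id("Dr. Bernard M. Aaron")
other = make_doc_id("Dr. Messi")

print(first == second)  # True: the same name gives the same id
print(first == other)   # False: these two names give different ids
print(len(first))       # 32: an md5 hex digest has 32 characters
```

Note that the hash is taken over the exact string, so names that differ only in case or whitespace would get different ids. Normalising the name first (for example with `strip()` and `lower()`) may be worth considering.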

First Indexing Data:

import hashlib
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

index_name = "my-index_1"
es.indices.create(index=index_name, ignore=400)  # ignore "index already exists"

for doc in r:
    # Use the md5 hash of Name as the document _id
    es.index(index=index_name, body=doc, id=hashlib.md5(doc['Name'].encode()).hexdigest())

Output:

[{'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '1164c423bc4e2fcb75697c3031af9ef1',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christopher DeSimone',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '672ae14197a135c39eab759be8b0597f',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '85702447f9e9ea010054eaf0555ce79c',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Bernard M. Aaron',
   'Specialised and Location': 'Health'}}]

Next Step: Indexing new data

r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]


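from elasticsearch import NotFoundError

for rec in r:
    # Derive the same _id that was used when the document was first indexed
    doc_id = hashlib.md5(rec['Name'].encode()).hexdigest()
    try:
        es.get(index=index_name, id=doc_id)
    except NotFoundError:
        # No document with this _id yet, so index it
        print("Record Not found")
        es.index(index=index_name, body=rec, id=doc_id)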

Output:

[{'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '1164c423bc4e2fcb75697c3031af9ef1',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christopher DeSimone',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '672ae14197a135c39eab759be8b0597f',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '85702447f9e9ea010054eaf0555ce79c',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Bernard M. Aaron',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': 'e2e0f463145568471097ff027b18b40d',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '23bb4f1a3a41efe7f4cab8a80d766708',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'}}]

As you can see, the Dr. Bernard M. Aaron record is not indexed again, as it's already present.
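
If there are many records, one get request per record can become slow. A possible alternative, sketched here rather than tested against your cluster, is to send everything through the bulk helper with the `create` operation type, which makes Elasticsearch itself reject any document whose _id already exists. The function name `create_actions` is made up for this example.

```python
import hashlib


def create_actions(records, index_name):
    """Yield bulk actions that only succeed for _ids not yet in the index."""
    for rec in records:
        yield {
            "_op_type": "create",  # an existing _id results in a 409 conflict
            "_index": index_name,
            "_id": hashlib.md5(rec["Name"].encode()).hexdigest(),
            "_source": rec,
        }


r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

actions = list(create_actions(r, "my-index_1"))
print(len(actions))

# With a running cluster, the actions could be sent like this:
# from elasticsearch import helpers
# created, errors = helpers.bulk(es, actions, raise_on_error=False)
# `errors` would then hold the conflicts for records that already exist.
```

With `raise_on_error=False`, the conflicts are returned instead of raised, so the existing Dr. Bernard M. Aaron document would simply be reported and left untouched.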

Answered by bigbounty on November 15, 2021
