I'm implementing an ecommerce search using NLP. I can't find clear documentation anywhere online about the implementation and flow, so this is what I'm doing.
Search query comes in
- Do spellcheck
- Do lemmatization (for example, use "men" instead of "husband" and "wed" instead of "wedding" or "marriage"; I have modified it to include more than just lexical forms)
- Do named entity recognition (NER):
1. Label brands, measurement units/symbols, and numerics.
2. Recognize facets/filters among them (brand, size, color, length, etc.).
3. Recognize model numbers. This requires indexing all model numbers, so is it a good idea to do it under NER only? Or should I have a separate step to check for them, or just index the model numbers inside the product text and match them there?
4. Handle symptoms and fuzzy search, again using NER only (like "faulty" PC, "living room" furniture).
5. Split at stop words and look for additional facets/filters (laptop "under" 20000).
6. After NER, use the recognized entities for facets/filters and the unrecognized terms for text search.
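To make the labeling steps and the final split into facets and free text concrete, here is a minimal sketch in Python. It uses plain dictionary lookups rather than a trained NER model, and the names `BRANDS`, `COLORS`, `UNITS` and `tag_query` are all hypothetical; in practice the word lists would probably be built from the product catalog.

```python
import re

# Hypothetical gazetteers (assumption): real lists would come from the catalog.
BRANDS = {"nike", "samsung", "dell"}
COLORS = {"red", "blue", "black"}
UNITS = {"gb", "inch", "cm", "kg"}

# Split into numbers and alphabetic words, so "16gb" becomes "16", "gb".
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[a-z]+")


def tag_query(query):
    """Split a query into facet filters and leftover free-text terms."""
    tokens = TOKEN_RE.findall(query.lower())
    filters = {}
    free_text = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in BRANDS:
            filters["brand"] = tok
        elif tok in COLORS:
            filters["color"] = tok
        elif tok[0].isdigit() and nxt in UNITS:
            # A number followed by a known unit becomes a measurement facet.
            filters.setdefault("measurements", []).append((float(tok), nxt))
            i += 1  # consume the unit token as well
        else:
            # Anything not recognized goes to the normal text search.
            free_text.append(tok)
        i += 1
    return filters, free_text


print(tag_query("Black Dell laptop 16GB"))
# ({'color': 'black', 'brand': 'dell', 'measurements': [(16.0, 'gb')]}, ['laptop'])
```

The recognized entities would then be sent as filters and the remaining terms as the text query.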
Another question concerns the steps that I do with manual if/else logic: for example, after the NER step I split at stop words and check the word before and after each one to see whether it is a numeral (then a size or amount) or a color.
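One possible way to keep this kind of manual logic manageable is to move the rules into a lookup table instead of nested if/else branches. The sketch below is only an illustration under assumptions: `RANGE_WORDS`, the `amount` field name and `extract_range_filters` are hypothetical, and it treats any number that follows a trigger word as an amount.

```python
import re

# Hypothetical rule table (assumption): trigger word -> range operator.
RANGE_WORDS = {
    "under": "lte",
    "below": "lte",
    "over": "gte",
    "above": "gte",
}

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_range_filters(tokens):
    """Turn 'under 20000' style phrases into a range filter on 'amount'."""
    filters = {}
    rest = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in RANGE_WORDS and nxt is not None and NUMBER_RE.fullmatch(nxt):
            # Trigger word followed by a number: record the bound, skip both tokens.
            filters.setdefault("amount", {})[RANGE_WORDS[tok]] = float(nxt)
            i += 2
            continue
        # Otherwise keep the token for later steps (including unmatched triggers).
        rest.append(tok)
        i += 1
    return filters, rest


print(extract_range_filters("laptop under 20000".split()))
# ({'amount': {'lte': 20000.0}}, ['laptop'])
```

Adding a new trigger word then means adding one entry to the table rather than another branch.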
Also, how should I implement real-time changes to the model? For lemmatization, for example, is it better to just index the lemmas in Lucene and dynamically add new ones based on user searches?
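As one option for the real-time part, the lemma mapping could live outside the index and be applied to the query, so new entries take effect immediately. The sketch below assumes exactly that; `LemmaDictionary` is a hypothetical helper, not a Lucene API. The trade-off is that if lemmas are applied to the documents at index time, changing a mapping means reindexing the affected documents.

```python
class LemmaDictionary:
    """In-memory surface-form -> lemma map that can be updated at runtime."""

    def __init__(self, initial=None):
        self._lemmas = {}
        for surface, lemma in (initial or {}).items():
            self.add(surface, lemma)

    def add(self, surface, lemma):
        # Store everything lowercased so lookups are case-insensitive.
        self._lemmas[surface.lower()] = lemma.lower()

    def normalize(self, tokens):
        # Unknown tokens pass through unchanged (apart from lowercasing).
        return [self._lemmas.get(t.lower(), t.lower()) for t in tokens]


lemmas = LemmaDictionary({"wedding": "wed", "husband": "men"})
print(lemmas.normalize(["Wedding", "gifts"]))   # ['wed', 'gifts']

# A new mapping learned from user searches is usable right away.
lemmas.add("marriage", "wed")
print(lemmas.normalize(["marriage", "gifts"]))  # ['wed', 'gifts']
```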
Also, I'm not using stemming, since I believe it could lead to quite different results for synonyms that have the same lemma but different stems.