ML Systems — Search Autocomplete

Anirban Sen
2 min readDec 10, 2023
Photo by visuals on Unsplash

The goal of autosuggestion is to predict relevant query completions for incomplete prefix inputs so that users reach their search intent with fewer keystrokes, saving their time and effort.

1. Autosuggestion Services in Web Search

Blog by Microsoft where they summarise fundamentals and key aspects of modern QAC services

  1. Tries and inverted indexes are some favorites for choosing efficient data structures for the storage and lookup of suggestions.
  2. Improving the relevance of autosuggestion services using —
    a. Personalization — Language-specific (for the prefix “a” it might be useful to show “aadhar card” as a suggestion when Hindi is selected as the language, but not when Korean), Time-sensitive: Several events (sports, political, festivals, and others) occur at more or less fixed intervals of time and Based on user behavior and search history (for a user whose search history contains “avengers” and “iron man”, it may be better to display “mark ruffalo” as the topmost suggestion for the prefix “mark ” instead of “mark zuckerberg”)
    b. Diversity — Greedy methods of deduplication while some use algorithms based on dynamic programming and A* search as well (instead of showing all about aadhar cards for prefix aa, also show aaj tak, aakash digital etc)
    c. Freshness — News, forecasts, sentiments becoming viral, and so on
  3. Non-prefix matching (due to misspellings or malformed prefixes)— Language-specific spell correction mechanisms (gogle -> google) or some inverted index based (popular near me -> restaurants popular near me) non-prefix matching
  4. Interesting features in modern QAC services — Rich entity information (One often finds thumbnails and some auxiliary information about entity-type suggestions personalities etc) , Factual answers in suggestion box (small calculation or lookup — prefix 1usd converted in inr as a result), Ghosting (Autosuggest algorithms calculate the likelihood of the first suggestion being clicked by the user. If this likelihood is greater than some threshold, the search box automatically gets pre-populated with this likely suggestion.)
  5. Zero-input scenario — (a) Yahoo shows history search queries, (b) Bing shows region-wise trending queries, and (c) Google shows queries related to the previous search result.
  6. Evaluation of QAC Techniques — Mean reciprocal rank (MRR), Success rate at top K (SR@K), α-nDC, GMinimum keystroke length (MKS) and e-Saved

--

--