Embeddings and Vector Search Exam Question Explained
This AWS AI Practitioner question tests semantic search fundamentals. The clue is that the search must match meaning, not only exact words.
Short answer
The correct answer is B. Vector embeddings.
Practice Question
A team wants to build semantic search over an internal knowledge base so that a query like "how do I reset my laptop" also matches a document titled "device recovery procedure." Which representation should they use to index the documents?
Why B is correct
Vector embeddings are correct because the question asks for semantic search. The query "how do I reset my laptop" and the document title "device recovery procedure" share no keywords, yet they are close in meaning. Embeddings convert text into numeric vectors that place similar meanings near each other in vector space, so a vector search system can retrieve documents by similarity even when the words differ.
The other representations all depend on surface form. A keyword inverted index is useful for exact or lexical search, but it depends on token overlap and can miss paraphrases. One-hot encoded tokens are sparse identifiers that show whether a token is present; they do not capture semantic closeness. Regex pattern dictionaries only match patterns you explicitly write and become brittle as language varies.
On AIF-C01, words such as semantic search, similar meaning, paraphrase, nearest neighbor, vector store, embeddings, or knowledge base point to vector embeddings.
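To make "near each other in vector space" concrete, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three-dimensional vectors are hand-picked for illustration and are not the output of any real embedding model; the "cafeteria lunch menu" document is a hypothetical contrast case.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-picked toy vectors (assumption): a real model would produce
# hundreds or thousands of dimensions.
query = [0.9, 0.8, 0.1]         # "how do I reset my laptop"
recovery_doc = [0.8, 0.9, 0.2]  # "device recovery procedure"
menu_doc = [0.1, 0.0, 0.9]      # "cafeteria lunch menu" (hypothetical)

sim_recovery = cosine_similarity(query, recovery_doc)
sim_menu = cosine_similarity(query, menu_doc)

print(f"query vs recovery doc: {sim_recovery:.2f}")
print(f"query vs menu doc:     {sim_menu:.2f}")
```

With these toy values, the recovery document scores far higher than the menu document even though it shares no words with the query.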
Why the other options are wrong
A keyword inverted index maps words to documents and works well for exact search. It can miss the laptop reset versus device recovery example because those phrases may not share the same tokens.
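The miss is easy to reproduce with a minimal sketch of an inverted index. The whitespace tokenizer and the second document are simplifying assumptions; production search engines add stemming, stop words, and ranking.

```python
from collections import defaultdict

def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption for this sketch).
    return text.lower().split()

def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Return ids of documents sharing at least one token with the query."""
    hits = set()
    for token in tokenize(query):
        hits |= index.get(token, set())
    return hits

docs = {
    "doc1": "device recovery procedure",
    "doc2": "laptop battery replacement guide",  # hypothetical extra document
}
index = build_inverted_index(docs)

reset_hits = keyword_search(index, "how do I reset my laptop")
recovery_hits = keyword_search(index, "recovery procedure")
print(reset_hits, recovery_hits)
```

The reset query returns only the battery guide, because "laptop" is the single overlapping token; the recovery procedure is found only when the query repeats its exact words.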
One-hot encoding marks token identity but does not encode meaning. Every pair of distinct tokens is equally far apart, so two related phrases look completely unrelated whenever their one-hot vectors have different active positions.
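A short sketch shows why. The five-word vocabulary is an assumption chosen for illustration; the point is that the dot product between any two different one-hot vectors is zero, whether or not the words are related.

```python
def one_hot(token, vocab):
    """Return a vector with a single 1 at the token's vocabulary position."""
    vec = [0] * len(vocab)
    vec[vocab.index(token)] = 1
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Tiny illustrative vocabulary (assumption).
vocab = ["reset", "recovery", "laptop", "device", "menu"]

reset_vec = one_hot("reset", vocab)
recovery_vec = one_hot("recovery", vocab)
menu_vec = one_hot("menu", vocab)

related = dot(reset_vec, recovery_vec)  # semantically related pair
unrelated = dot(reset_vec, menu_vec)    # semantically unrelated pair
print(related, unrelated)
```

Both pairs score the same, so one-hot vectors give a search system nothing to rank by meaning.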
Regex can catch known patterns, but semantic search needs to handle unseen paraphrases. Writing enough regex rules for natural language queries is fragile and high-maintenance.
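As a rough illustration of that brittleness, the sketch below uses one hand-written pattern. Both the pattern and the paraphrased query are hypothetical examples, not taken from any real system.

```python
import re

# Hypothetical hand-written rule: a reset-like verb followed later by a device noun.
RESET_PATTERN = re.compile(
    r"\b(reset|restore|recover)\w*\b.*\b(laptop|device|computer)\b",
    re.IGNORECASE,
)

def matches_reset_intent(query):
    """True if the query fits the hand-written reset pattern."""
    return RESET_PATTERN.search(query) is not None

known = matches_reset_intent("how do I reset my laptop")
paraphrase = matches_reset_intent("my notebook needs a factory wipe")
print(known, paraphrase)
```

The anticipated phrasing matches, while the paraphrase slips through until someone adds "notebook" and "wipe" to the rule, and the next paraphrase will need yet another change.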
Why embeddings power semantic search
Embeddings are dense numeric representations of text, images, or other inputs. For text, an embedding model maps words, sentences, or document chunks into vectors. The important property is that semantically similar inputs should land near each other. That makes embeddings useful for semantic search, recommendations, duplicate detection, clustering, and retrieval-augmented generation.
In a knowledge base workflow, documents are split into chunks, an embedding model converts each chunk into a vector, and those vectors are stored in a vector database or vector index. When a user asks a question, the query is embedded too. The search system compares the query vector to document vectors and retrieves the nearest neighbors. Those retrieved chunks can then be shown directly or passed to a foundation model as grounding context. This is the retrieval part of RAG.
On AWS, Amazon Bedrock Knowledge Bases manages much of this pipeline for supported data sources: chunking, embedding, vector storage, retrieval, and cited generation.
The exam may contrast embeddings with keyword search, one-hot encoding, tokenization, or regex. Tokenization breaks text into tokens. One-hot encoding identifies tokens without meaning. Keyword search finds exact terms. Embeddings support meaning-based similarity.
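The knowledge base workflow described above (chunk, embed, store, embed the query, retrieve nearest neighbors) can be sketched in a few lines. Everything here is a toy under stated assumptions: `embed` is a hypothetical stand-in that counts words from a hand-built concept table instead of calling a real embedding model, the "index" is a plain list instead of a vector database, and the documents are invented. It does not use or represent the Amazon Bedrock API.

```python
import math

# Hypothetical stand-in for an embedding model: each known word adds weight
# to one of three hand-chosen concept dimensions (recovery, device, food).
CONCEPTS = {
    "reset": 0, "recovery": 0, "restore": 0,
    "laptop": 1, "device": 1, "computer": 1,
    "lunch": 2, "menu": 2, "cafeteria": 2,
}

def embed(text):
    vec = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        dim = CONCEPTS.get(word.strip(".,?!"))
        if dim is not None:
            vec[dim] += 1.0
    return vec

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def chunk(text, max_words=8):
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def build_index(documents):
    """Embed every chunk and store (title, chunk, vector) entries."""
    return [
        (title, piece, embed(piece))
        for title, body in documents.items()
        for piece in chunk(body)
    ]

def retrieve(index, query, k=1):
    """Embed the query and return the k most similar stored chunks."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda e: cosine_similarity(query_vec, e[2]), reverse=True)
    return ranked[:k]

# Invented documents for illustration.
documents = {
    "Device recovery procedure": "Restore the device from the recovery partition.",
    "Cafeteria info": "The cafeteria lunch menu changes weekly.",
}
index = build_index(documents)
top = retrieve(index, "how do I reset my laptop", k=1)
top_title = top[0][0]
print(top_title)
```

In a RAG setup, the retrieved chunk text would then be passed to a foundation model as grounding context; a managed service would replace the hand-rolled chunking, embedding, and storage steps shown here.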
Quick FAQ
What is the correct answer for this AWS Certified AI Practitioner question?
The correct answer is B. Vector embeddings. Vector embeddings are correct because the question asks for semantic search. Embeddings convert text into numeric vectors that place similar meanings near each other in vector space.
How should I study similar AWS Certified AI Practitioner questions?
Review the explanation, compare every distractor, then practice related questions in the same domain: Fundamentals of GenAI.