
Explanation:
Why Option B is correct:
Amazon Transcribe streaming with partial results enabled: This is crucial for real-time transcription where text fragments are delivered before customers finish speaking, meeting the requirement for "incremental suggestions while a customer is still speaking."
InvokeModelWithResponseStream API: This API enables streaming responses from Amazon Bedrock, allowing the GenAI model to provide incremental suggestions as it processes the incoming text fragments, maintaining low latency.
Amazon API Gateway WebSocket API: This supports bidirectional streaming, enabling real-time updates to call center agents as suggestions are generated.
End-to-end latency under 1 second: The combination of streaming transcription, streaming model inference, and WebSocket delivery creates a pipeline optimized for low latency.
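The partial-results behavior described above can be sketched in code. This is a minimal illustration, not a production consumer: it assumes the documented JSON shape of a Transcribe streaming TranscriptEvent (`Transcript.Results[].IsPartial`, `Alternatives[].Transcript`), which should be verified against the current API reference.

```python
# Sketch: pulling text fragments out of one Amazon Transcribe streaming
# TranscriptEvent. Field names follow the documented event shape
# (Transcript.Results[].IsPartial, Alternatives[].Transcript) and should be
# treated as an assumption to confirm against the current API docs.

def extract_fragments(transcript_event: dict, include_partial: bool = True) -> list[str]:
    """Return text fragments from a single streaming TranscriptEvent.

    With include_partial=True, still-changing partial hypotheses are
    returned as well; forwarding those early fragments is what makes
    sub-second incremental suggestions possible.
    """
    fragments = []
    for result in transcript_event.get("Transcript", {}).get("Results", []):
        if result.get("IsPartial") and not include_partial:
            continue  # skip hypotheses that Transcribe may still revise
        alternatives = result.get("Alternatives", [])
        if alternatives:
            fragments.append(alternatives[0].get("Transcript", ""))
    return fragments
```

Each fragment returned here would be forwarded immediately to the model, rather than waiting for the final (non-partial) result at the end of the utterance.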
Why other options are incorrect:
Option A: Uses the non-streaming InvokeModel API and stores results in Amazon DynamoDB before display, which adds latency and doesn't support real-time incremental updates.
Option C: Uses batch processing (not real-time) and requires complete transcripts, violating the requirement for suggestions while customers are still speaking.
Option D: Uses the Amazon Titan Embeddings model, which produces vector embeddings rather than generated text suggestions, and publishes to Amazon SNS (asynchronous fan-out), which doesn't support real-time bidirectional streaming to agents.
Key requirements met by Option B:
- Incremental suggestions while the customer is still speaking (Transcribe streaming with partial results)
- End-to-end latency under 1 second (streaming at every stage of the pipeline)
- Managed AWS services only (Amazon Transcribe, Amazon Bedrock, Amazon API Gateway)
- Bidirectional streaming to call center agents (API Gateway WebSocket API)
Question:
A financial services company is developing a real-time generative AI (GenAI) assistant to support human call center agents. The GenAI assistant must transcribe live customer speech, analyze context, and provide incremental suggestions to call center agents while a customer is still speaking. To preserve responsiveness, the GenAI assistant must maintain end-to-end latency under 1 second from speech to initial response display. The architecture must use only managed AWS services and must support bidirectional streaming to ensure that call center agents receive updates in real time.
Which solution will meet these requirements?
A
Use Amazon Transcribe streaming to transcribe calls. Pass the text to Amazon Comprehend for sentiment analysis. Feed the results to Anthropic Claude on Amazon Bedrock by using the InvokeModel API. Store results in Amazon DynamoDB. Use a WebSocket API to display the results.
B
Use Amazon Transcribe streaming with partial results enabled to deliver fragments of transcribed text before customers finish speaking. Forward text fragments to Amazon Bedrock by using the InvokeModelWithResponseStream API. Stream responses to call center agents through an Amazon API Gateway WebSocket API.
C
Use Amazon Transcribe batch processing to convert calls to text. Pass complete transcripts to Anthropic Claude on Amazon Bedrock by using the ConverseStream API. Return responses through an Amazon Lex chatbot interface.
D
Use the Amazon Transcribe streaming API with an AWS Lambda function to transcribe each audio segment. Call the Amazon Titan Embeddings model on Amazon Bedrock by using the InvokeModel API. Publish results to Amazon SNS.
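The Bedrock and WebSocket stages of Option B can be sketched as follows. This is a hedged illustration, not a production implementation: `parse_claude_chunk` assumes the Anthropic Messages streaming chunk format used on Bedrock (`content_block_delta` events carrying `delta.text`), and `stream_suggestions`, its parameters, and the WebSocket endpoint URL are hypothetical names; error handling, retries, and auth are omitted.

```python
import json


def parse_claude_chunk(payload: dict) -> str:
    """Extract the text delta from one streamed Bedrock chunk.

    Assumes the Anthropic Messages streaming shape (type ==
    "content_block_delta" with delta.text); verify against the current
    Bedrock model documentation for the model in use.
    """
    if payload.get("type") == "content_block_delta":
        return payload.get("delta", {}).get("text", "")
    return ""


def stream_suggestions(model_id: str, prompt: str,
                       connection_id: str, ws_endpoint: str) -> None:
    """Hedged sketch: stream a Bedrock response to one WebSocket client.

    All four parameters are illustrative; connection_id is the agent's
    API Gateway WebSocket connection and ws_endpoint is the stage's
    management endpoint URL.
    """
    import boto3  # imported lazily so the parser above stays dependency-free

    bedrock = boto3.client("bedrock-runtime")
    # The API Gateway Management API pushes server -> client messages
    # over an open WebSocket connection.
    apigw = boto3.client("apigatewaymanagementapi", endpoint_url=ws_endpoint)

    response = bedrock.invoke_model_with_response_stream(
        modelId=model_id,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    for event in response["body"]:  # EventStream of chunk events
        payload = json.loads(event["chunk"]["bytes"])
        text = parse_claude_chunk(payload)
        if text:
            # Push each token batch to the agent as soon as it arrives,
            # rather than buffering the full completion.
            apigw.post_to_connection(ConnectionId=connection_id,
                                     Data=text.encode())
```

The design point this illustrates is that no stage buffers: transcript fragments enter the model as they arrive, and model tokens leave for the agent's browser as they are generated, which is what keeps first-response latency under the 1-second budget.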