Assignment 8: Integrating the Generation and Retrieval Service

Due Nov 11 by 3am

Points 100

Submitting a website URL

PREREQUISITES: Review the Retrieval-Augmented Generation lectures.

REQUIRED PYTHON PACKAGES (in 'requirements.txt'):

mistralai

transformers

sentence_transformers

nltk

qa_metrics

faiss-cpu

torch

REQUIRED RESOURCES:

qa_resoruces/questions.csv

storage/corpus/*.txt.clean

The student-resources repository provides the dataset we will use for this case study and also contains the code we will use to implement this RAG system.

OBJECTIVES: You will implement the logic to preprocess the corpus and to communicate with the Mistral API. Review the code found in the modules/ directory. The modules include runners (if __name__ == "__main__") as examples of how to run each module. Review the provided datasets. For this assignment, update your repository by adding a directory called 'textwave' in your project root directory.

To complete this assignment, you must install the Python packages in requirements.txt.

The generator/question_answering.py module leverages the Mistral API. You can configure its class by specifying three class arguments:

The api_key argument takes your unique API key (a string), which is provided once you register for a Mistral account.

11/4/24, 12:24 AMAssignment 8: Integrating the Generation and Retrieval Service

https://jhu.instructure.com/courses/82966/assignments/897476?module_item_id=42102981/3

Go to https://mistral.ai/ and register for a new account. You will need to follow the authentication process to complete this.

Once registered, log in to your account and create a new workspace.

Go to the "La Plateforme" menu -> "Billing" -> "Go to billings plans page." Select "Experiment for free" and subscribe to the plan. You will need to complete the authentication process.

In the "La Plateforme" menu, select "API Keys." You can view your API key here.

In a terminal, run the command export MISTRAL_API_KEY= followed by your API key. You must run this command each time you open a terminal to run this code. Optionally, add this line at the bottom of your ~/.bashrc file.

The temperature argument controls the randomness of the model's responses.

The generator_model argument specifies the model (e.g., mistral-{small|medium|large}-latest).
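The three arguments above can be collected in one place before the question-answering class is constructed. The following is only a minimal sketch using the standard library: GeneratorConfig and load_generator_config are hypothetical names that do not come from the provided modules, and the default temperature and model shown are assumptions, not values required by the assignment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    """Hypothetical container for the three arguments described above."""
    api_key: str
    temperature: float = 0.3                       # assumed default, not from the assignment
    generator_model: str = "mistral-small-latest"  # one of the model names the text mentions


def load_generator_config(env=None, **overrides):
    """Read MISTRAL_API_KEY from the environment and build a config object."""
    env = os.environ if env is None else env
    api_key = env.get("MISTRAL_API_KEY", "").strip()
    if not api_key:
        # Fail early with a readable message instead of a later API error.
        raise RuntimeError("MISTRAL_API_KEY is not set in this terminal session")
    return GeneratorConfig(api_key=api_key, **overrides)
```

Calling load_generator_config(temperature=0.0) in a terminal where the variable was exported would return a config whose fields can then be passed to the class in generator/question_answering.py.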

Task 1: Search nearest neighbors

In the pipeline.py module, modify your Pipeline class (from the previous homework) with:

a class method called __encode(query), which will return the embedding vector for a preprocessed user input text query. Define/configure your embedding strategy in Pipeline's __init__().

a class method called search_neighbors(query_embedding, k=10), which will return the k nearest neighbors. Define/configure your index and search strategy in Pipeline's __init__().
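To make the expected behavior of search_neighbors() concrete, here is a minimal brute-force sketch in plain Python. It is a stand-in for whatever index you configure (for example one built with faiss-cpu), not the required implementation. It assumes embeddings are plain lists of floats and takes the corpus embeddings as an extra chunk_embeddings argument, whereas in the assignment they would live on the Pipeline instance.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_neighbors(query_embedding, chunk_embeddings, k=10):
    """Return up to k (chunk_index, score) pairs, most similar first."""
    scored = (
        (cosine_similarity(query_embedding, emb), idx)
        for idx, emb in enumerate(chunk_embeddings)
    )
    # nlargest keeps only the k best scores without sorting the whole corpus.
    top = heapq.nlargest(k, scored)
    return [(idx, score) for score, idx in top]
```

If k exceeds the number of stored chunks, this sketch simply returns every chunk, which is worth keeping in mind when comparing small and large values of k.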

In a notebook called notebooks/context_answering_analysis.ipynb, demonstrate the output of the search_neighbors() function for the following:

query = "Who was Abraham Lincoln?", k = 15

query = "Who was Abraham Adams?", k = 15

query = "Did Abraham Lincoln live in the Frontier?", k = 1

query = "Did Abraham Lincoln live in the Frontier?", k = 10

query = "Did Abraham Lincoln live in the Frontier?", k = 20

query = "Did Abraham Lincoln live in the Frontier?", k = 50

query = "How did Fillmore ascend to the presidency?", k = ?

query = "What is the capital of France?", k = ?

Discuss your observations.

Task 2: Generate answers

In the pipeline.py module, modify your Pipeline class with:

a class method called generate_answer(query, context, rerank=True), which will return an answer given

the query and the retrieved context. Define/configure your re-ranker and question_answering

strategy in Pipeline's __init__().
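One possible shape for generate_answer() is sketched below. Everything here is an assumption made for illustration: rerank_by_overlap is a hypothetical token-overlap stand-in for a real re-ranker, build_prompt uses an invented prompt wording, and generator is any callable that maps a prompt string to an answer string (in the assignment this role is played by the question_answering class), passed in explicitly so the sketch runs without an API key.

```python
def rerank_by_overlap(query, context):
    """Hypothetical re-ranker: order chunks by how many lowercase tokens they share with the query."""
    query_tokens = set(query.lower().split())

    def overlap(chunk):
        return len(query_tokens & set(chunk.lower().split()))

    # sorted() is stable, so chunks with equal overlap keep their retrieval order.
    return sorted(context, key=overlap, reverse=True)


def build_prompt(query, context):
    """Number the context chunks and append the question (prompt wording is an assumption)."""
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}\nAnswer:"
    )


def generate_answer(query, context, generator, rerank=True):
    """Optionally re-rank the retrieved chunks, then ask the generator for an answer."""
    chunks = rerank_by_overlap(query, context) if rerank else list(context)
    return generator(build_prompt(query, chunks))
```

Passing generator=lambda prompt: prompt returns the prompt itself, which is a convenient way to inspect how rerank=True and rerank=False change the order of the context before spending API calls.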

In a notebook called notebooks/context_answering_analysis.ipynb, demonstrate the output of generate_answer() with the following:

query = "Who was Abraham Lincoln?", k = 15, rerank = {True|False}

query = "Who was Abraham Adams?", k = 15, rerank = {True|False}

query = "How did Fillmore ascend to the presidency?", k = {1|5|10|20|...}, rerank = {True|False}

query = "What trail did Lincoln use a Farmers' Almanac in?", k = {1|5|10|20|...}, rerank = {True|False}

query = "What is the capital of France?", k = 15, rerank = {True|False}

Discuss your observations.
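Because several of the queries above are run over multiple values of k and both rerank settings, a small loop can keep the notebook output organized. This is a hedged sketch only: run_grid is a hypothetical helper, and answer_fn stands for any callable taking (query, k, rerank) that wraps your own retrieval and generate_answer() calls.

```python
from itertools import product


def run_grid(answer_fn, queries, ks, rerank_options=(True, False)):
    """Call answer_fn for every (query, k, rerank) combination and collect the results as rows."""
    rows = []
    for query, k, rerank in product(queries, ks, rerank_options):
        rows.append(
            {
                "query": query,
                "k": k,
                "rerank": rerank,
                "answer": answer_fn(query, k, rerank),
            }
        )
    return rows
```

The returned list of dictionaries can be printed directly or loaded into a table in the notebook to support the discussion of your observations.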

SUBMISSION: You will need to check in the following files and any supporting Python modules:

textwave/pipeline.py

textwave/notebooks/context_answering_analysis.ipynb

Provide the GitHub URL link to your textwave/notebooks/context_answering_analysis.ipynb file

via Canvas to get credit for this submission.
