Tutoring case - Q1

Details of the Q1 implementation
1. __init__ function
The four variables tf_tokens, tf_entities, idf_tokens and idf_entities are
each initialized to None so that the later functions can fill them in. The
results that follow are checked against these four values.
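
As a minimal sketch of this constructor: the class name InvertedIndex is an assumption (the text does not name the class); only the four attribute names come from the description above.

```python
class InvertedIndex:
    """Skeleton only; the class name is an assumption for illustration."""

    def __init__(self):
        # Each index starts as None and is expected to be filled in
        # later by index_documents.
        self.tf_tokens = None
        self.tf_entities = None
        self.idf_tokens = None
        self.idf_entities = None


index = InvertedIndex()
```
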
2. index_documents function
First, traverse the documents, take the text of each one, and join it into
a single string. Process it with spaCy and extract the entities and tokens
separately. A dict stores each entity with its doc_id and frequency, and
tokens are stored the same way, subject to extra restrictions: is_stop,
is_punct and the single-word case. Index the entities first, because no
other factors need to be considered. To index the tokens, filter out stop
words and punctuation, and discard tokens that already appear as an entity.
Compute the token TF and entity TF first, then the corresponding IDF.
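
The indexing order described above can be sketched without spaCy by passing in tokens and entities that have already been extracted. Everything here is an assumption for illustration: the helper names filter_tokens, build_tf and build_idf, the (text, is_stop, is_punct) tuple layout standing in for spaCy's token.is_stop and token.is_punct flags, the reading of the entity restriction as "drop a token that equals a single-word entity", raw counts as TF, and the smoothed form 1 + ln(N / (1 + df)) for IDF. The text does not state which TF or IDF formulas were actually used.

```python
import math
from collections import defaultdict


def filter_tokens(tokens, entities):
    """Drop stop words, punctuation, and tokens equal to a single-word entity."""
    single_word_entities = {e for e in entities if len(e.split()) == 1}
    return [text for text, is_stop, is_punct in tokens
            if not is_stop and not is_punct
            and text not in single_word_entities]


def build_tf(items_per_doc):
    """Map each term to {doc_id: raw count in that document}."""
    tf = defaultdict(dict)
    for doc_id, items in items_per_doc.items():
        for item in items:
            tf[item][doc_id] = tf[item].get(doc_id, 0) + 1
    return dict(tf)


def build_idf(tf, n_docs):
    """Assumed smoothing: 1 + ln(N / (1 + document frequency))."""
    return {term: 1.0 + math.log(n_docs / (1.0 + len(postings)))
            for term, postings in tf.items()}


# Hand-made stand-in for spaCy output: (text, is_stop, is_punct) per token.
docs = {
    1: {"tokens": [("The", True, False), ("Trump", False, False),
                   ("visited", False, False), ("New", False, False),
                   ("York", False, False), ("Times", False, False),
                   (".", False, True)],
        "entities": ["Trump", "New York Times"]},
    2: {"tokens": [("Trump", False, False), ("spoke", False, False)],
        "entities": ["Trump"]},
}

# Entities are indexed first, then the filtered tokens, then both IDFs.
tf_entities = build_tf({d: doc["entities"] for d, doc in docs.items()})
tf_tokens = build_tf({d: filter_tokens(doc["tokens"], doc["entities"])
                      for d, doc in docs.items()})
idf_entities = build_idf(tf_entities, len(docs))
idf_tokens = build_idf(tf_tokens, len(docs))
```

In this toy input "Trump" is indexed as an entity in both documents but never as a token, while "New", "York" and "Times" remain tokens because they belong to a multi-word entity.
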
3. split_query function
First, define a queries list to hold the query corresponding to each
split. Step one: select the eligible entities in DoE. Step two: enumerate
all combinations of those entities. Step three: check each combination
against the token frequencies of the query and keep only the eligible
combinations. Finally, for each matching entity combination, derive the
corresponding tokens and build the query.
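
The four steps can be sketched roughly as follows. This is one possible reading, not the original code: it assumes whitespace tokenization, treats an entity as eligible when the query contains all of its words, and represents each split as a dict with "tokens" and "entities" keys. The ids in the sample doe dict are placeholders.

```python
from collections import Counter
from itertools import combinations


def split_query(query, doe):
    query_tokens = query.split()
    query_counts = Counter(query_tokens)

    # Step 1: keep entities whose words are all available in the query.
    eligible = [e for e in doe if not Counter(e.split()) - query_counts]

    queries = []
    # Step 2: enumerate every subset of eligible entities (empty one included).
    for size in range(len(eligible) + 1):
        for combo in combinations(eligible, size):
            used = Counter()
            for entity in combo:
                used.update(entity.split())
            # Step 3: reject subsets that need a word more often than the
            # query supplies it.
            if used - query_counts:
                continue
            # Step 4: words not consumed by the entities stay as free tokens,
            # kept in query order.
            budget = Counter(used)
            tokens = []
            for tok in query_tokens:
                if budget[tok] > 0:
                    budget[tok] -= 1
                else:
                    tokens.append(tok)
            queries.append({"tokens": tokens, "entities": list(combo)})
    return queries


doe = {"New York Times": 0, "New York": 1, "Trump": 2}  # placeholder ids
splits = split_query("New York Times Trump travel", doe)
```

Here "New York Times" and "New York" cannot be selected together, because the query contains "New" and "York" only once, so two of the eight subsets are rejected.
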
4. max_score_query function
For each query obtained in the previous step, the token set and the entity
set are scored separately. For the token set, the TF-IDF of each token is
computed from the corresponding TF and IDF, and the sum is stored in S2.
Likewise, the TF-IDF of each entity is computed, and the sum is stored in
S1. Finally, S1 and S2 are multiplied by their respective weights and added
to give S. The query with the largest S is selected, and that S and query
are saved as the result.
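
A hedged sketch of this scoring step is shown below. The weights w_entity and w_token are placeholders (the text does not give the actual values), the helper tfidf_sum is hypothetical, TF is taken as the raw count stored in the index, and each split is assumed to be a dict with "tokens" and "entities" keys.

```python
def tfidf_sum(terms, tf, idf, doc_id):
    """Sum TF * IDF over the terms that occur in the given document."""
    total = 0.0
    for term in terms:
        count = tf.get(term, {}).get(doc_id, 0)
        if count:
            total += count * idf.get(term, 0.0)
    return total


def max_score_query(splits, doc_id, tf_tokens, idf_tokens,
                    tf_entities, idf_entities,
                    w_entity=1.0, w_token=0.5):  # placeholder weights
    best = None
    for split in splits:
        s1 = tfidf_sum(split["entities"], tf_entities, idf_entities, doc_id)
        s2 = tfidf_sum(split["tokens"], tf_tokens, idf_tokens, doc_id)
        s = w_entity * s1 + w_token * s2
        # Keep the first split that reaches the highest combined score.
        if best is None or s > best[0]:
            best = (s, split)
    return best


# Toy index values, invented for illustration only.
tf_entities = {"New York Times": {1: 1}}
idf_entities = {"New York Times": 2.0}
tf_tokens = {"Trump": {1: 2}, "New": {1: 1}}
idf_tokens = {"Trump": 1.0, "New": 1.0}

candidate_splits = [
    {"tokens": ["New", "York", "Times", "Trump"], "entities": []},
    {"tokens": ["Trump"], "entities": ["New York Times"]},
]
result = max_score_query(candidate_splits, 1, tf_tokens, idf_tokens,
                         tf_entities, idf_entities)
```

With these toy numbers the first split scores 0.5 * 3 = 1.5 and the second scores 2.0 + 0.5 * 2 = 3.0, so the split that uses the entity is returned.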