
Assertions & Relations #8

Open
tanikina wants to merge 10 commits into main from assertions_and_relations


@tanikina (Collaborator) commented on Sep 17, 2025

This PR adds the following functionality:

  1. Collecting abstracts of NLP papers from the ACL Anthology, which can be downloaded with the following script:
    $ python src/prepare_data.py --threshold=50 --keyword_string="semantic parsing" --output_path="data/selected_paper_abstracts.jsonl"
  2. Generating a JSON Lines (.jsonl) file in which each text is split into assertions:
    $ python src/get_assertions_and_relations.py --task=assertions --model_name_or_path=openai-gpt-4o --input_data_path=data/selected_paper_abstracts.jsonl --output_data_path=data/generated_assertions.jsonl
  3. Generating a JSON Lines (.jsonl) file in which each text is annotated with relations:
    $ python src/get_assertions_and_relations.py --task=relations --model_name_or_path=openai-gpt-4o --input_data_path=data/generated_assertions.jsonl --output_data_path=data/generated_relations.jsonl
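Each step reads the JSONL file written by the previous one, so the three commands can be chained. Below is a minimal Python sketch that reuses exactly the flags shown above; the wrapper itself (`build_pipeline_commands`, `run_pipeline` and its `--run` switch) is hypothetical and not part of this PR. By default it only prints the commands.

```python
import shlex
import subprocess
import sys

# Paths copied from the commands in the PR description.
ABSTRACTS = "data/selected_paper_abstracts.jsonl"
ASSERTIONS = "data/generated_assertions.jsonl"
RELATIONS = "data/generated_relations.jsonl"


def build_pipeline_commands(model: str = "openai-gpt-4o") -> list[list[str]]:
    """Return the three pipeline steps as argv lists (hypothetical helper)."""
    return [
        # Step 1: collect abstracts (one argv element per shell argument,
        # so the keyword string keeps its space without extra quoting).
        ["python", "src/prepare_data.py", "--threshold=50",
         "--keyword_string=semantic parsing", f"--output_path={ABSTRACTS}"],
        # Step 2: split the abstracts into assertions.
        ["python", "src/get_assertions_and_relations.py", "--task=assertions",
         f"--model_name_or_path={model}", f"--input_data_path={ABSTRACTS}",
         f"--output_data_path={ASSERTIONS}"],
        # Step 3: annotate the assertions with relations.
        ["python", "src/get_assertions_and_relations.py", "--task=relations",
         f"--model_name_or_path={model}", f"--input_data_path={ASSERTIONS}",
         f"--output_data_path={RELATIONS}"],
    ]


def run_pipeline(dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_pipeline_commands():
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical switch: only execute when "--run" is passed.
    run_pipeline(dry_run="--run" not in sys.argv)
```

Running the steps with `check=True` means a failed step stops the chain instead of letting the next step read a missing or partial file.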

EDIT: I've also added a script that works directly with the OpenAI API; you just need to set OPENAI_API_KEY in your environment. Usage is the same as in the examples above, except that the script name changes to get_assertions_and_relations_openai.py and, as in the command below, the model name is given as gpt-4o rather than openai-gpt-4o. This way you need neither Unsloth + a GPU nor the Grazie-Api-Gateway-Client :)

$ python src/get_assertions_and_relations_openai.py --task=relations --model_name_or_path=gpt-4o --input_data_path=data/generated_assertions.jsonl --output_data_path=data/generated_relations.jsonl
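Since the OpenAI variant only needs `OPENAI_API_KEY` in the environment, a wrapper could check for the key before launching the script, so a missing key fails fast. A minimal sketch under that assumption; `openai_relations_command` is a hypothetical helper, and the arguments are copied from the command above.

```python
import os


def openai_relations_command(env=None) -> list[str]:
    """Build the OpenAI-variant relations command after checking the API key.

    Hypothetical helper: the PR only states that OPENAI_API_KEY must be set
    in the environment; it does not prescribe this check.
    """
    env = os.environ if env is None else env
    # An unset or empty key is treated as missing.
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running the script.")
    return [
        "python", "src/get_assertions_and_relations_openai.py",
        "--task=relations",
        "--model_name_or_path=gpt-4o",
        "--input_data_path=data/generated_assertions.jsonl",
        "--output_data_path=data/generated_relations.jsonl",
    ]
```

Usage, assuming the key is exported: `subprocess.run(openai_relations_command(), check=True)`.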

I have also uploaded samples with three annotated examples (using the --debug option with src/get_assertions_and_relations.py) to the data folder, and can add or annotate more if needed.
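The pipeline's inputs and outputs are JSON Lines files (one JSON object per line). Assuming the sample files in the data folder use the same format, here is a minimal standard-library sketch for inspecting them. The record schema is not described in this PR, so the sketch only parses each line and reports the keys it finds; `read_jsonl` and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the file and line number of the malformed record.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err


def summarize(path, limit=3):
    """Print the keys of the first `limit` records, since the schema is undocumented."""
    for i, record in enumerate(read_jsonl(path)):
        if i >= limit:
            break
        keys = sorted(record) if isinstance(record, dict) else type(record).__name__
        print(f"record {i}: {keys}")
```

For example, `summarize("data/generated_relations.jsonl")` prints the top-level keys of the first three relation records.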
