API Reference
Inference
- class kogito.inference.CommonsenseInference(language: str = 'en_core_web_sm')¶
Main interface for commonsense inference
- __init__(language: str = 'en_core_web_sm') None ¶
Initialize a commonsense inference module
- Parameters:
language (str, optional) – Spacy language pipeline to use. Defaults to “en_core_web_sm”.
- property processors: dict¶
List all processors
- Returns:
List of head and relation processors
- Return type:
dict
- infer(text: str | None = None, model: KnowledgeModel | None = None, heads: List[str] | None = None, model_args: dict | None = None, extract_heads: bool = True, match_relations: bool = True, relations: List[KnowledgeRelation] | None = None, dry_run: bool = False, sample_graph: KnowledgeGraph | None = None, context: List[str] | str | None = None, linker: KnowledgeLinker | None = None, threshold: float = 0.5) KnowledgeGraph ¶
Make commonsense inferences.
- Parameters:
text (Optional[str], optional) – Text to extract commonsense inferences from. If omitted, no head extraction will be performed even if extract_heads is True. If provided and extract_heads is False, the text will be used as a head as is. Defaults to None.
model (Optional[KnowledgeModel], optional) – Knowledge model to use for inference. If omitted, behaviour is equivalent to dry-run mode, i.e. no inference will be performed and the incomplete input graph will be returned. Defaults to None.
heads (Optional[List[str]], optional) – List of custom heads to use for inference. Defaults to None.
model_args (Optional[dict], optional) – Custom arguments to pass to the KnowledgeModel.generate() method. Defaults to None.
extract_heads (bool, optional) – Whether to extract heads from the given text, if any. Defaults to True.
match_relations (bool, optional) – Whether to do smart relation matching. Defaults to True.
relations (Optional[List[KnowledgeRelation]], optional) – Subset of relations to use for direct matching. If match_relations is True, the intersection of matched and given relations will be used. Defaults to None.
dry_run (bool, optional) – Whether to skip actual inference and return the incomplete input graph. Defaults to False.
sample_graph (Optional[KnowledgeGraph], optional) – A knowledge graph containing examples. It can be used to provide examples for GPT-3 inference. If omitted and a GPT-3 model is used, a warning will be raised. Defaults to None.
context (Optional[Union[List[str], str]], optional) – Context text. Can be either given as a list of sentences or as a string, in which case, it will be split into sentences using spacy engine. Defaults to None.
threshold (float, optional) – Relevance probability used for filtering. Defaults to 0.5.
linker (Optional[KnowledgeLinker], optional) – Knowledge linker model used for linking to given context. Defaults to Deberta-based linker.
- Raises:
ValueError – If the relations argument is not a list.
- Returns:
Inferred knowledge graph.
- Return type:
KnowledgeGraph
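The interplay of heads, extract_heads and dry_run can be tried without loading a model. The following is a minimal sketch, assuming `engine` is an already constructed CommonsenseInference instance supplied by the caller. The helper name `preview_input_graph` is hypothetical, and only keyword arguments documented above are passed.

```python
from typing import Any, List, Optional


def preview_input_graph(engine: Any, heads: List[str],
                        relations: Optional[list] = None) -> Any:
    """Return the incomplete input graph for custom heads without inference.

    `engine` is expected to behave like kogito.inference.CommonsenseInference;
    it is passed in by the caller so this helper has no import of its own.
    """
    return engine.infer(
        heads=heads,          # custom heads instead of text
        extract_heads=False,  # no text is given, so nothing to extract
        relations=relations,  # None keeps the library default
        dry_run=True,         # skip inference, return the input graph
    )
```

Per the parameter descriptions, the returned graph should contain the matched heads and relations with no generated tails.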
- add_processor(processor: KnowledgeHeadExtractor | KnowledgeRelationMatcher) None ¶
Add a new head or relation processor to the module
- Parameters:
processor (Union[KnowledgeHeadExtractor, KnowledgeRelationMatcher]) – Head or relation processor.
- Raises:
ValueError – When processor type is not recognized.
- remove_processor(processor_name: str) None ¶
Remove a processor from the module
- Parameters:
processor_name (str) – Name of the processor to remove
Head
- class kogito.core.head.KnowledgeHeadType(value)¶
Type of a Knowledge Head
- class kogito.core.head.KnowledgeHead(text: str, type: KnowledgeHeadType = KnowledgeHeadType.TEXT, entity: Any | None = None, verbalizer: Callable | None = None)¶
Represents a concept of Knowledge Head.
- __init__(text: str, type: KnowledgeHeadType = KnowledgeHeadType.TEXT, entity: Any | None = None, verbalizer: Callable | None = None) None ¶
Initialize a Knowledge Head.
- Parameters:
text (str) – Head text.
type (KnowledgeHeadType, optional) – Type of a Knowledge head. Defaults to KnowledgeHeadType.TEXT.
entity (Any, optional) – External Knowledge head entity. Defaults to None.
verbalizer (Optional[Callable], optional) – Function to convert knowledge head to natural text. Defaults to None.
- verbalize() str | None ¶
Convert head to a meaningful text.
- Returns:
Verbalized head
- Return type:
Optional[str]
- copy() KnowledgeHead ¶
Copy itself
- Returns:
Copied knowledge head
- Return type:
KnowledgeHead
Relation
- class kogito.core.relation.KnowledgeRelationType(value)¶
Represents a Knowledge relation type.
- class kogito.core.relation.KnowledgeRelation(text: str, type: KnowledgeRelationType = KnowledgeRelationType.ATOMIC, verbalizer: Callable | None = None, prompt: str | None = None)¶
Represents a concept of Knowledge Relation.
- __init__(text: str, type: KnowledgeRelationType = KnowledgeRelationType.ATOMIC, verbalizer: Callable | None = None, prompt: str | None = None) None ¶
Initialize a KnowledgeRelation
- Parameters:
text (str) – Relation text.
type (KnowledgeRelationType, optional) – Relation type. Defaults to KnowledgeRelationType.ATOMIC.
verbalizer (Optional[Callable], optional) – Function to convert relation to natural text. Defaults to None.
prompt (Optional[str], optional) – Instructive text used to prompt NLG models. Defaults to None.
- verbalize(head: str, tail: str | None = None, include_tail: bool = False, **kwargs) str | None ¶
Convert knowledge relation into natural text.
- Parameters:
head (str) – Knowledge head to use.
tail (Optional[str], optional) – Knowledge tail to use if any. Defaults to None.
include_tail (bool, optional) – Whether to include tail. Defaults to False.
- Returns:
Verbalized relation.
- Return type:
Optional[str]
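A verbalizer is a callable that turns a relation into natural text. The following is a rough sketch, assuming the callable takes the same arguments as verbalize() above. The calling convention kogito actually uses for custom verbalizers is not stated here and should be checked. The function name and the template wording are hypothetical.

```python
from typing import Optional


def want_verbalizer(head: str, tail: Optional[str] = None,
                    include_tail: bool = False, **kwargs) -> str:
    """Hypothetical template for an xWant-style relation."""
    text = f"{head}. As a result, PersonX wants"
    # Append the tail only when requested and available.
    if include_tail and tail is not None:
        text += f" {tail}"
    return text
```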
- classmethod from_text(text: str, type: KnowledgeRelationType = KnowledgeRelationType.ATOMIC) KnowledgeRelation ¶
Initialize relation from text.
- Parameters:
text (str) – Relation text.
type (KnowledgeRelationType, optional) – Type of relation to use. Defaults to KnowledgeRelationType.ATOMIC.
- Returns:
An instance of KnowledgeRelation
- Return type:
KnowledgeRelation
- copy() KnowledgeRelation ¶
Copy itself
- Returns:
Copied knowledge relation
- Return type:
KnowledgeRelation
- kogito.core.relation.PHYSICAL_RELATIONS = [ObjectUse, CapableOf, MadeUpOf, HasProperty, Desires, NotDesires, AtLocation]¶
ATOMIC 2020 Physical relations
- kogito.core.relation.EVENT_RELATIONS = [Causes, HinderedBy, xReason, isAfter, isBefore, HasSubEvent]¶
ATOMIC 2020 Event relations
- kogito.core.relation.SOCIAL_RELATIONS = [xIntent, xReact, oReact, xAttr, xEffect, xNeed, xWant, oEffect, oWant]¶
ATOMIC 2020 Social relations
- kogito.core.relation.register_relation(relation: KnowledgeRelation, kind: Literal['physical', 'event', 'social'] | None = None) None ¶
Register a new relation based on its kind. By default, it registers the relation as a custom relation without a kind, but optionally it can be registered to be a Physical, Event or a Social relation.
- Parameters:
relation (KnowledgeRelation) – Knowledge relation to register.
kind (Optional[RELATION_KIND], optional) – Relation kind. Available kinds: physical, event, social. Defaults to None.
Knowledge
- class kogito.core.knowledge.Knowledge(head: KnowledgeHead | str | None = None, relation: KnowledgeRelation | str | None = None, tails: List[str] | None = None)¶
Represents a concept of Knowledge
- __init__(head: KnowledgeHead | str | None = None, relation: KnowledgeRelation | str | None = None, tails: List[str] | None = None) None ¶
Initialize a Knowledge instance.
- Parameters:
head (Optional[Union[KnowledgeHead, str]], optional) – Instance of a knowledge head. Defaults to None.
relation (Optional[Union[KnowledgeRelation, str]], optional) – Instance of a knowledge relation. Defaults to None.
tails (Optional[List[str]], optional) – List of knowledge tails. Defaults to None.
- to_prompt(include_tail: bool = False, **kwargs) str ¶
Convert knowledge to a prompt text.
- Parameters:
include_tail (bool, optional) – Whether to include tails in the prompt. Defaults to False.
- Returns:
Prompt text for the knowledge
- Return type:
str
- to_query(decode_method: str = 'greedy') str ¶
Convert knowledge to a query text
- Parameters:
decode_method (str, optional) – Decoding method. Defaults to “greedy”.
- Raises:
ValueError – When decode_method is not recognized.
- Returns:
Query text for the knowledge
- Return type:
str
- to_json(only_one_tail: bool = False) dict ¶
Convert knowledge to dictionary
- Parameters:
only_one_tail (bool, optional) – Include only one tail. Defaults to False.
- Returns:
Jsonified knowledge
- Return type:
dict
- class kogito.core.knowledge.KnowledgeGraph(graph: List[Knowledge])¶
Represents a concept of Knowledge Graph.
- __init__(graph: List[Knowledge]) None ¶
Initialize a knowledge graph
- Parameters:
graph (List[Knowledge]) – List of Knowledge instances
- classmethod from_jsonl(filepath: str, head_attr: str = 'head', relation_attr: str = 'relation', tails_attr: str = 'tails', relation_type: KnowledgeRelationType = KnowledgeRelationType.ATOMIC) KnowledgeGraph ¶
Initialize a knowledge graph from a JSON Lines (.jsonl) file.
- Parameters:
filepath (str) – Path to the graph file.
head_attr (str, optional) – JSON attribute for head. Defaults to “head”.
relation_attr (str, optional) – JSON attribute for relation. Defaults to “relation”.
tails_attr (str, optional) – JSON attribute for tails. Defaults to “tails”.
relation_type (KnowledgeRelationType, optional) – Type of relation to use. Defaults to KnowledgeRelationType.ATOMIC.
- Returns:
An instance of KnowledgeGraph
- Return type:
KnowledgeGraph
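The head_attr, relation_attr and tails_attr parameters name the JSON keys read from each record. The following is a minimal sketch of that mapping using only the standard library, assuming one JSON object per line and treating a missing tails key as an empty list. Both are assumptions, not confirmed kogito behaviour. `parse_jsonl_lines` is a hypothetical helper that returns plain tuples rather than Knowledge instances.

```python
import json
from typing import Iterable, List, Tuple


def parse_jsonl_lines(lines: Iterable[str], head_attr: str = "head",
                      relation_attr: str = "relation",
                      tails_attr: str = "tails") -> List[Tuple[str, str, List[str]]]:
    """Parse JSON Lines text into (head, relation, tails) tuples."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        tails = list(record.get(tails_attr, []))  # assumption: tails may be absent
        triples.append((record[head_attr], record[relation_attr], tails))
    return triples
```

An open file object can be passed as `lines`, since iterating over it yields one line at a time.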
- classmethod from_csv(filepath: str, header: int | List[int] | str | None = 'infer', head_col: str = 'head', relation_col: str = 'relation', tails_col: str = 'tails', sep: str = ',', relation_type: KnowledgeRelationType = KnowledgeRelationType.ATOMIC) KnowledgeGraph ¶
Initialize a knowledge graph from a CSV file.
- Parameters:
filepath (str) – Path to the graph file.
header (Union[int, List[int], str], optional) – Whether to look for header. Defaults to “infer”.
head_col (str, optional) – Head column name. Defaults to “head”.
relation_col (str, optional) – Relation column name. Defaults to “relation”.
tails_col (str, optional) – Tails column name. Defaults to “tails”.
sep (str, optional) – Delimiter to use. Defaults to “,”.
relation_type (KnowledgeRelationType, optional) – Relation type to use. Defaults to KnowledgeRelationType.ATOMIC.
- Returns:
An instance of KnowledgeGraph
- Return type:
KnowledgeGraph
- classmethod from_dataframe(df: DataFrame, head_col: str = 'head', relation_col: str = 'relation', tails_col: str = 'tails', relation_type: KnowledgeRelationType = KnowledgeRelationType.ATOMIC) KnowledgeGraph ¶
Initialize a knowledge graph from a pandas dataframe.
- Parameters:
df (pd.DataFrame) – Graph dataframe.
head_col (str, optional) – Head column name. Defaults to “head”.
relation_col (str, optional) – Relation column name. Defaults to “relation”.
tails_col (str, optional) – Tails column name. Defaults to “tails”.
relation_type (KnowledgeRelationType, optional) – Relation type to use. Defaults to KnowledgeRelationType.ATOMIC.
- Returns:
An instance of KnowledgeGraph
- Return type:
KnowledgeGraph
- to_jsonl(filepath: str) None ¶
Write the knowledge graph to a JSON Lines (.jsonl) file
- Parameters:
filepath (str) – JSON file path
- to_dataframe() DataFrame ¶
Convert knowledge graph to a pandas dataframe
- Returns:
Pandas dataframe of a knowledge graph
- Return type:
pd.DataFrame
- union(other: KnowledgeGraph) KnowledgeGraph ¶
Union two knowledge graphs
- Parameters:
other (KnowledgeGraph) – Knowledge graph to union with.
- Returns:
Merged knowledge graph
- Return type:
KnowledgeGraph
- intersection(other: KnowledgeGraph) KnowledgeGraph ¶
Intersect two knowledge graphs
- Parameters:
other (KnowledgeGraph) – Knowledge graph to intersect with.
- Returns:
Intersection of two graphs
- Return type:
KnowledgeGraph
- difference(other: KnowledgeGraph) KnowledgeGraph ¶
Subtract knowledge graphs
- Parameters:
other (KnowledgeGraph) – Knowledge graph to subtract.
- Returns:
Difference of two graphs
- Return type:
KnowledgeGraph
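The set semantics of union, intersection and difference can be illustrated on plain tuples. This is only a sketch: how kogito decides that two Knowledge items are equal is not stated here, so the example assumes identity by (head, relation, tail). `flatten` is a hypothetical helper.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def flatten(graph: List[Tuple[str, str, List[str]]]) -> Set[Triple]:
    """Expand (head, relation, tails) entries into a set of single-tail triples."""
    return {(head, rel, tail) for head, rel, tails in graph for tail in tails}


a = flatten([("car", "ObjectUse", ["drive", "park"])])
b = flatten([("car", "ObjectUse", ["drive"]), ("car", "AtLocation", ["garage"])])

merged = a | b   # union: triples in either graph
common = a & b   # intersection: triples in both graphs
only_a = a - b   # difference: triples in a but not in b
```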
- sort() KnowledgeGraph ¶
Sort graph based on head text and relation distribution in ATOMIC.
- Returns:
Sorted graph.
- Return type:
KnowledgeGraph
- clean() KnowledgeGraph ¶
Clean graph tails by removing empty generations.
- Returns:
Cleaned graph.
- Return type:
KnowledgeGraph
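As a sketch of what removing empty generations could look like on a plain list of tails: the example assumes "empty" means an empty or whitespace-only string, which the description above does not specify. `drop_empty_tails` is a hypothetical helper, not part of kogito.

```python
from typing import List


def drop_empty_tails(tails: List[str]) -> List[str]:
    """Keep only tails that contain non-whitespace text."""
    return [tail for tail in tails if tail and tail.strip()]
```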
Models
- class kogito.core.model.KnowledgeModel¶
Base class to represent a Knowledge Model.
- abstract train(train_graph: KnowledgeGraph, *args, **kwargs) KnowledgeModel ¶
Train a knowledge model
- Parameters:
train_graph (KnowledgeGraph) – Training dataset
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
Trained knowledge model
- Return type:
KnowledgeModel
- abstract generate(input_graph: KnowledgeGraph, *args, **kwargs) KnowledgeGraph ¶
Generate inferences from knowledge model
- Parameters:
input_graph (KnowledgeGraph) – Input dataset
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
Input graph with tails generated
- Return type:
KnowledgeGraph
- abstract save_pretrained(save_path: str) None ¶
Save model as a pretrained model
- Parameters:
save_path (str) – Directory to save the model to.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- abstract classmethod from_pretrained(model_name_or_path: str) KnowledgeModel ¶
Load model from a pretrained model path. This method can load models either from HuggingFace by model name or from disk by model path.
- Parameters:
model_name_or_path (str) – HuggingFace model name or local model path to load from.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
Loaded knowledge model.
- Return type:
KnowledgeModel
- evaluate(input_graph: KnowledgeGraph, metrics: List[str] = ['bleu', 'meteor', 'rouge', 'cider', 'bert-score'], top_k: int = 1, *args, **kwargs) dict ¶
Evaluate model on various metrics. The input graph should contain the reference tails, so that they can be used to score the model generations on the same input graph. Any arguments provided aside from the ones accepted by this method will be passed on to the KnowledgeModel.generate method.
- Parameters:
input_graph (KnowledgeGraph) – Input graph to evaluate. Should contain the ground truth tails.
metrics (List[str], optional) – Metrics to compute. Defaults to [“bleu”, “meteor”, “rouge”, “cider”, “bert-score”].
top_k (int, optional) – Top k generations to evaluate. Defaults to 1.
*args (optional) – Extra arguments for KnowledgeModel.generate method.
**kwargs (optional) – Extra keyword arguments for the KnowledgeModel.generate method.
- Returns:
Dictionary of scores
- Return type:
dict
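The forwarding of extra keyword arguments to generate can be sketched as follows, assuming `model` is any object exposing the evaluate method documented above. `score_model` is a hypothetical wrapper. `num_generate` is used in the usage note only because COMETBART.generate documents it.

```python
from typing import Any, Dict, Sequence


def score_model(model: Any, graph: Any,
                metrics: Sequence[str] = ("bleu", "rouge"),
                **generate_kwargs: Any) -> Dict[str, Any]:
    """Evaluate `model` on `graph`, forwarding extra kwargs to generate().

    `graph` must already contain the reference tails.
    """
    return model.evaluate(graph, metrics=list(metrics), top_k=1,
                          **generate_kwargs)
```

For example, `score_model(model, graph, num_generate=3)` would ask a COMETBART model for three generations per input while scoring only the top one.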
- class kogito.models.bart.comet.COMETBART(config: COMETBARTConfig, **kwargs)¶
COMET knowledge model based on BART
- __init__(config: COMETBARTConfig, **kwargs) None ¶
Initialize COMET model
- Parameters:
config (COMETBARTConfig) – Config to use
- train(train_graph: KnowledgeGraph, val_graph: KnowledgeGraph, test_graph: KnowledgeGraph | None = None, logger_name: str = 'default') KnowledgeModel ¶
Train a COMET model
- Parameters:
train_graph (KnowledgeGraph) – Training dataset
val_graph (KnowledgeGraph) – Validation dataset
test_graph (KnowledgeGraph, optional) – Test dataset. Defaults to None
logger_name (str, optional) – Logger name to use. Accepted values: [“wandb”, “default”]. Defaults to “default”.
- Raises:
ValueError – When config.task is not recognized
- Returns:
Trained knowledge model
- Return type:
KnowledgeModel
- generate(input_graph: KnowledgeGraph, decode_method: str = 'greedy', num_generate: int = 3, batch_size: int = 64, max_length: int = 24, min_length: int = 1) KnowledgeGraph ¶
Generate inferences from the model
- Parameters:
input_graph (KnowledgeGraph) – Input dataset
decode_method (str, optional) – Decoding method. Accepts [“beam”, “greedy”]. Defaults to “greedy”.
num_generate (int, optional) – Number of inferences to generate. Defaults to 3.
batch_size (int, optional) – Batch size to use. Defaults to 64.
max_length (int, optional) – Maximum output length. Defaults to 24.
min_length (int, optional) – Minimum output length. Defaults to 1.
- Returns:
Complete knowledge graph
- Return type:
KnowledgeGraph
- classmethod from_pretrained(model_name_or_path: str = 'mismayil/comet-bart-ai2', task: str = 'summarization') KnowledgeModel ¶
Load pretrained model
- Parameters:
model_name_or_path (str, optional) – HuggingFace model name or local model path. Defaults to “mismayil/comet-bart-ai2”.
task (str, optional) – Task used in training. Defaults to “summarization”.
- Returns:
Loaded knowledge model
- Return type:
KnowledgeModel
- save_pretrained(save_path: str) None ¶
Save pretrained model
- Parameters:
save_path (str) – Directory path to save model to
- class kogito.models.gpt2.comet.COMETGPT2(model_name_or_path: str = 'gpt2')¶
COMET model based on GPT-2
- __init__(model_name_or_path: str = 'gpt2') None ¶
Initialize COMET model
- Parameters:
model_name_or_path (str, optional) – HuggingFace model name or local model path. Defaults to “gpt2”.
- train(train_graph: KnowledgeGraph, val_graph: KnowledgeGraph, batch_size: int = 8, in_len: int = 16, out_len: int = 34, summary_len: int = 0, epochs: int = 1, lr: float = 5e-05, seed: int = 42, log_wandb: bool = False, output_dir: str | None = None) KnowledgeModel ¶
Train a COMET model
- Parameters:
train_graph (KnowledgeGraph) – Training dataset
val_graph (KnowledgeGraph) – Validation dataset
batch_size (int, optional) – Batch size. Defaults to 8.
in_len (int, optional) – Input length. Defaults to 16.
out_len (int, optional) – Output length. Defaults to 34.
summary_len (int, optional) – Summary length. Defaults to 0.
epochs (int, optional) – Number of epochs. Defaults to 1.
lr (float, optional) – Learning rate. Defaults to 5e-05.
seed (int, optional) – Random seed. Defaults to 42.
log_wandb (bool, optional) – Whether to log to wandb. Defaults to False.
output_dir (Optional[str], optional) – Directory to save intermediate model checkpoints. Defaults to None.
- Returns:
Trained knowledge model
- Return type:
KnowledgeModel
- generate(input_graph: KnowledgeGraph, max_length: int = 34, in_len: int = 16, out_len: int = 34, top_k: int = 1, temperature: float = 0.7, top_p: float = 0.9, repetition_penalty: float = 1.2, num_beams: int = 1, num_return_sequences: int = 1) KnowledgeGraph ¶
Generate inferences from knowledge model
- Parameters:
input_graph (KnowledgeGraph) – Input dataset
max_length (int, optional) – Maximum output length. Defaults to 34.
in_len (int, optional) – Input length. Defaults to 16.
out_len (int, optional) – Output length. Defaults to 34.
top_k (int, optional) – Top k inferences to consider. Defaults to 1.
temperature (float, optional) – GPT-2 temperature parameter. Defaults to 0.7.
top_p (float, optional) – GPT-2 top_p parameter. Defaults to 0.9.
repetition_penalty (float, optional) – GPT-2 repetition_penalty parameter. Defaults to 1.2.
num_beams (int, optional) – GPT-2 num_beams parameter. Defaults to 1.
num_return_sequences (int, optional) – GPT-2 num_return_sequences parameter. Defaults to 1.
- Returns:
Completed knowledge graph
- Return type:
KnowledgeGraph
- save_pretrained(save_path: str) None ¶
Save pretrained model
- Parameters:
save_path (str) – Directory to save model to
- classmethod from_pretrained(model_name_or_path: str = 'mismayil/comet-gpt2-ai2') KnowledgeModel ¶
Load pretrained model
- Parameters:
model_name_or_path (str, optional) – HuggingFace model name or local model path. Defaults to “mismayil/comet-gpt2-ai2”.
- Returns:
Loaded knowledge model
- Return type:
KnowledgeModel
- class kogito.models.gpt2.zeroshot.GPT2Zeroshot(gpt2_model: str = 'gpt2')¶
Zeroshot knowledge model based on GPT-2
- __init__(gpt2_model: str = 'gpt2') None ¶
Initialize GPT-2 model
- Parameters:
gpt2_model (str, optional) – HuggingFace model name for gpt2. Defaults to “gpt2”.
- train()¶
Train a knowledge model
- Parameters:
train_graph (KnowledgeGraph) – Training dataset
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
Trained knowledge model
- Return type:
KnowledgeModel
- save_pretrained(save_path)¶
Save model as a pretrained model
- Parameters:
save_path (str) – Directory to save the model to.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- classmethod from_pretrained(model_name_or_path: str = 'gpt2')¶
Load model from a pretrained model path. This method can load models either from HuggingFace by model name or from disk by model path.
- Parameters:
model_name_or_path (str) – HuggingFace model name or local model path to load from.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
Loaded knowledge model.
- Return type:
KnowledgeModel
- generate(input_graph: KnowledgeGraph, seed: int = 42, top_k: int = 1, top_p: float = 0.9, num_sequences: int = 3, num_beams: int = 3, temperature: float = 0.7, repetition_penalty: float = 1.2, max_length: int = 32) KnowledgeGraph ¶
Generate inferences from GPT2 model
- Parameters:
input_graph (KnowledgeGraph) – Input dataset
seed (int, optional) – Random seed. Defaults to 42.
top_k (int, optional) – GPT-2 top k parameter. Defaults to 1.
top_p (float, optional) – GPT-2 top p parameter. Defaults to 0.9.
num_sequences (int, optional) – GPT-2 num_return_sequences parameter. Defaults to 3.
num_beams (int, optional) – GPT-2 num_beams parameter. Defaults to 3.
temperature (float, optional) – GPT-2 temperature parameter. Defaults to 0.7.
repetition_penalty (float, optional) – GPT-2 repetition_penalty parameter. Defaults to 1.2.
max_length (int, optional) – Max length of generated tokens. Defaults to 32.
- Returns:
Completed knowledge graph
- Return type:
KnowledgeGraph
- class kogito.models.gpt3.zeroshot.GPT3Zeroshot(api_key: str, model_name: str = 'text-davinci-002')¶
Zeroshot knowledge model based on GPT-3
- __init__(api_key: str, model_name: str = 'text-davinci-002') None ¶
Initialize a GPT-3 model
- Parameters:
api_key (str) – OpenAI API Key for GPT-3 model
model_name (str, optional) – Type of GPT-3 model. Defaults to “text-davinci-002”.
- train()¶
Train a knowledge model
- Parameters:
train_graph (KnowledgeGraph) – Training dataset
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
Trained knowledge model
- Return type:
KnowledgeModel
- save_pretrained(save_model_path: str)¶
Save model as a pretrained model
- Parameters:
save_model_path (str) – Directory to save the model to.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- classmethod from_pretrained(model_name_or_path: str)¶
Load model from a pretrained model path. This method can load models either from HuggingFace by model name or from disk by model path.
- Parameters:
model_name_or_path (str) – HuggingFace model name or local model path to load from.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
Loaded knowledge model.
- Return type:
KnowledgeModel
- generate(input_graph: KnowledgeGraph, num_samples: int = 10, max_tokens: int = 16, temperature: float = 0.9, top_p: float = 1, n: int = 1, logprobs: int | None = None, stop: str | None = None, include_task_prompt: bool = True, debug: bool = False) KnowledgeGraph ¶
Generate inferences from GPT-3 model
- Parameters:
input_graph (KnowledgeGraph) – Input dataset
num_samples (int, optional) – Number of samples to use. Defaults to 10.
max_tokens (int, optional) – Max number of tokens. Defaults to 16.
temperature (float, optional) – GPT-3 temperature parameter. Defaults to 0.9.
top_p (float, optional) – GPT-3 top_p parameter. Defaults to 1.
n (int, optional) – Number of generations. Defaults to 1.
logprobs (Optional[int], optional) – GPT-3 logprobs parameter. Defaults to None.
stop (Optional[str], optional) – Stop token to use. Defaults to None.
include_task_prompt (bool, optional) – Whether to include task prompt. Defaults to True.
debug (bool, optional) – Whether to enable debug mode. Defaults to False.
- Returns:
Completed knowledge graph
- Return type:
KnowledgeGraph
Processors
- class kogito.core.processors.head.KnowledgeHeadExtractor(name: str, lang: Language | None = None)¶
Base class for head extraction
- __init__(name: str, lang: Language | None = None) None ¶
Initialize a head extractor
- Parameters:
name (str) – Unique head extractor name
lang (Optional[Language], optional) – Spacy language pipeline to use. Defaults to None.
- abstract extract(text: str, doc: Doc | None = None) List[KnowledgeHead] ¶
Extract heads from text
- Parameters:
text (str) – Text to extract from
doc (Optional[Doc], optional) – Spacy doc to use for extraction. Defaults to None.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
List of extracted knowledge heads.
- Return type:
List[KnowledgeHead]
- class kogito.core.processors.head.SentenceHeadExtractor(name: str, lang: Language | None = None)¶
Extracts sentences as heads from text
- extract(text: str, doc: Doc | None = None) List[KnowledgeHead] ¶
Extract heads from text
- Parameters:
text (str) – Text to extract from
doc (Optional[Doc], optional) – Spacy doc to use for extraction. Defaults to None.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
List of extracted knowledge heads.
- Return type:
List[KnowledgeHead]
- class kogito.core.processors.head.NounPhraseHeadExtractor(name: str, lang: Language | None = None)¶
Extracts noun phrases as heads from text
- extract(text: str, doc: Doc | None = None) List[KnowledgeHead] ¶
Extract heads from text
- Parameters:
text (str) – Text to extract from
doc (Optional[Doc], optional) – Spacy doc to use for extraction. Defaults to None.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
List of extracted knowledge heads.
- Return type:
List[KnowledgeHead]
- class kogito.core.processors.head.VerbPhraseHeadExtractor(name: str, lang: Language | None = None)¶
Extracts verb phrases as heads from text
- extract(text: str, doc: Doc | None = None) List[KnowledgeHead] ¶
Extract heads from text
- Parameters:
text (str) – Text to extract from
doc (Optional[Doc], optional) – Spacy doc to use for extraction. Defaults to None.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
List of extracted knowledge heads.
- Return type:
List[KnowledgeHead]
- class kogito.core.processors.relation.KnowledgeRelationMatcher(name: str, lang: Language | None = None)¶
Base class for relation matching
- __init__(name: str, lang: Language | None = None) None ¶
Initialize relation matcher
- Parameters:
name (str) – Unique relation matcher name
lang (Optional[Language], optional) – Spacy language pipeline to use. Defaults to None.
- abstract match(heads: List[KnowledgeHead], relations: List[KnowledgeRelation] | None = None, **kwargs) List[Tuple[KnowledgeHead, KnowledgeRelation]] ¶
Match relations to given heads
- Parameters:
heads (List[KnowledgeHead]) – List of heads to match for.
relations (Optional[List[KnowledgeRelation]], optional) – Subset of relations to use for matching. Defaults to None.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses
- Returns:
List of matched head, relation tuples
- Return type:
List[Tuple[KnowledgeHead, KnowledgeRelation]]
- class kogito.core.processors.relation.BaseRelationMatcher(name: str, lang: Language | None = None)¶
Matches all relations with all heads
- match(heads: List[KnowledgeHead], relations: List[KnowledgeRelation] | None = None, **kwargs) List[Tuple[KnowledgeHead, KnowledgeRelation]] ¶
Match relations to given heads
- Parameters:
heads (List[KnowledgeHead]) – List of heads to match for.
relations (Optional[List[KnowledgeRelation]], optional) – Subset of relations to use for matching. Defaults to None.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses
- Returns:
List of matched head, relation tuples
- Return type:
List[Tuple[KnowledgeHead, KnowledgeRelation]]
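Matching all relations with all heads amounts to a Cartesian product. The following is a minimal sketch on plain strings rather than KnowledgeHead and KnowledgeRelation instances. `match_all` is a hypothetical stand-in, and the ordering of the real matcher's output is not specified here.

```python
from itertools import product
from typing import List, Tuple


def match_all(heads: List[str], relations: List[str]) -> List[Tuple[str, str]]:
    """Pair every head with every relation."""
    return list(product(heads, relations))
```

With two heads and two relations, this yields four (head, relation) pairs.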
- class kogito.core.processors.relation.SimpleRelationMatcher(name: str, lang: Language | None = None)¶
Matches relation based on simple heuristics
- match(heads: List[KnowledgeHead], relations: List[KnowledgeRelation] | None = None, **kwargs) List[Tuple[KnowledgeHead, KnowledgeRelation]] ¶
Match relations to given heads
- Parameters:
heads (List[KnowledgeHead]) – List of heads to match for.
relations (Optional[List[KnowledgeRelation]], optional) – Subset of relations to use for matching. Defaults to None.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses
- Returns:
List of matched head, relation tuples
- Return type:
List[Tuple[KnowledgeHead, KnowledgeRelation]]
- class kogito.core.processors.relation.ModelBasedRelationMatcher(name: str, dataset_class: Type[Dataset], model_class: Type[LightningModule], model_path: str, batch_size: int = 64, lang: Language | None = None)¶
Matches relations based on relation classifiers
- __init__(name: str, dataset_class: Type[Dataset], model_class: Type[LightningModule], model_path: str, batch_size: int = 64, lang: Language | None = None) None ¶
Initialize a model based relation matcher
- Parameters:
name (str) – Unique relation matcher name
dataset_class (Type[Dataset]) – Dataset class to use
model_class (Type[pl.LightningModule]) – Model class to use
model_path (str) – Model path to load model from
batch_size (int, optional) – Batch size for inference. Defaults to 64.
lang (Optional[Language], optional) – Spacy lang pipeline. Defaults to None.
- match(heads: List[KnowledgeHead], relations: List[KnowledgeRelation] | None = None, **kwargs) List[Tuple[KnowledgeHead, KnowledgeRelation]] ¶
Match relations to given heads
- Parameters:
heads (List[KnowledgeHead]) – List of heads to match for.
relations (Optional[List[KnowledgeRelation]], optional) – Subset of relations to use for matching. Defaults to None.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses
- Returns:
List of matched head, relation tuples
- Return type:
List[Tuple[KnowledgeHead, KnowledgeRelation]]
- class kogito.core.processors.relation.SWEMRelationMatcher(name: str, lang: Language | None = None)¶
Relation matcher based on Simple Word Embeddings (GloVe)
- __init__(name: str, lang: Language | None = None) None ¶
Initialize a model based relation matcher
- Parameters:
name (str) – Unique relation matcher name
lang (Optional[Language], optional) – Spacy lang pipeline. Defaults to None.
- class kogito.core.processors.relation.DistilBERTRelationMatcher(name: str, lang: Language | None = None)¶
Relation matcher based on DistilBERT embeddings
- __init__(name: str, lang: Language | None = None) None ¶
Initialize a model based relation matcher
- Parameters:
name (str) – Unique relation matcher name
lang (Optional[Language], optional) – Spacy lang pipeline. Defaults to None.
- class kogito.core.processors.relation.BERTRelationMatcher(name: str, lang: Language | None = None)¶
Relation matcher based on BERT embeddings
- __init__(name: str, lang: Language | None = None) None ¶
Initialize a model based relation matcher
- Parameters:
name (str) – Unique relation matcher name
lang (Optional[Language], optional) – Spacy lang pipeline. Defaults to None.
- class kogito.core.processors.relation.GraphBasedRelationMatcher(name: str, lang: Language | None = None)¶
Relation matcher based on knowledge graphs
- match(heads: List[KnowledgeHead], relations: List[KnowledgeRelation] | None = None, **kwargs) List[Tuple[KnowledgeHead, KnowledgeRelation]] ¶
Match relations to given heads
- Parameters:
heads (List[KnowledgeHead]) – List of heads to match for.
relations (Optional[List[KnowledgeRelation]], optional) – Subset of relations to use for matching. Defaults to None.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses
- Returns:
List of matched head, relation tuples
- Return type:
List[Tuple[KnowledgeHead, KnowledgeRelation]]
Linkers¶
- class kogito.core.linker.KnowledgeLinker¶
Base Knowledge Linker
- abstract save_pretrained(save_path: str) None ¶
Save linker as a pretrained model
- Parameters:
save_path (str) – Directory to save the linker to.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- abstract classmethod from_pretrained(model_name_or_path: str) KnowledgeLinker ¶
Load a linker from a pretrained model path. This method can load linkers either from HuggingFace by model name or from disk by model path.
- Parameters:
model_name_or_path (str) – HuggingFace model name or local model path to load from.
- Raises:
NotImplementedError – This method has to be implemented by concrete subclasses.
- Returns:
Loaded knowledge linker.
- Return type:
KnowledgeLinker
- abstract link(input_graph: KnowledgeGraph, context: List[str] | str) List[List[float]] ¶
Link the given knowledge graph with the context. This method computes a relevance probability for each knowledge tuple in the graph with respect to the given context and returns these probabilities in the same order as the knowledge tuples appear in the graph. The returned object is a list of lists of numbers because a knowledge tuple may have multiple tails, and a probability is computed for each tail.
- Parameters:
input_graph (KnowledgeGraph) – Input graph to link.
context (Union[List[str], str]) – Context text. Can be given either as a list of sentences or as a single string, in which case it will be split into sentences using spaCy.
- Returns:
List of relevance probabilities for each tail
- Return type:
List[List[float]]
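The nested shape of the return value can be seen in a small sketch. It does not use kogito: `Knowledge` is a hypothetical stand-in for a knowledge tuple, and `toy_link` scores tails by word overlap with the context, which is an assumption made for illustration and not the model a real linker uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Knowledge:
    """Hypothetical stand-in for a knowledge tuple with one or more tails."""
    head: str
    relation: str
    tails: List[str]


def toy_link(graph: List[Knowledge], context: List[str]) -> List[List[float]]:
    """Return one inner list per knowledge tuple and one score per tail.

    The score is a word-overlap ratio (illustrative assumption only).
    """
    context_words = {
        word.strip(".,!?").lower()
        for sentence in context
        for word in sentence.split()
    }
    scores = []
    for kg in graph:
        tail_scores = []
        for tail in kg.tails:
            words = tail.lower().split()
            overlap = sum(word in context_words for word in words)
            tail_scores.append(overlap / len(words) if words else 0.0)
        scores.append(tail_scores)
    return scores


graph = [
    Knowledge("PersonX buys a car", "xNeed", ["to have money", "to visit a dealer"]),
    Knowledge("car", "ObjectUse", ["drive to work"]),
]
context = ["Alex saved money for months.", "Then Alex went to a dealer."]
probs = toy_link(graph, context)
```

Here `probs[i][j]` is the score of the `j`-th tail of the `i`-th knowledge tuple, so the outer list follows the graph order and each inner list follows the tail order.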
- filter(input_graph: KnowledgeGraph, context: List[str] | str, threshold: float = 0.5, return_probs: bool = False) KnowledgeGraph | Tuple[KnowledgeGraph, List[List[float]]] ¶
Filter the given graph based on context relevance. Under the hood, this method links the graph to the context and then removes the knowledge tuples whose relevance probability is lower than the given threshold. The remaining knowledge tuples are returned as a new knowledge graph. If a knowledge tuple has multiple tails, each tail is filtered individually.
- Parameters:
input_graph (KnowledgeGraph) – Input graph to filter.
context (Union[List[str], str]) – Context text. Can be given either as a list of sentences or as a single string, in which case it will be split into sentences using spaCy.
threshold (float, optional) – Relevance probability threshold used for filtering. Defaults to 0.5.
return_probs (bool, optional) – Whether to also return all the relevance probabilities for the input graph. Defaults to False.
- Returns:
Filtered knowledge graph based on the relevancy scores and optionally, the relevancy scores.
- Return type:
Union[KnowledgeGraph, Tuple[KnowledgeGraph, List[List[float]]]]
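The thresholding step can be sketched without kogito as follows. `Knowledge` and `toy_filter` are hypothetical names, and the probabilities are passed in directly instead of being computed by a linker. Two details are assumptions not confirmed by the description above: a probability exactly equal to the threshold is kept, and a knowledge tuple with no surviving tails is dropped entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Knowledge:
    """Hypothetical stand-in for a knowledge tuple with one or more tails."""
    head: str
    relation: str
    tails: List[str]


def toy_filter(graph, probs, threshold=0.5, return_probs=False):
    """Keep only tails whose probability reaches the threshold."""
    filtered = []
    for kg, tail_probs in zip(graph, probs):
        # Assumption: ties (p == threshold) are kept.
        kept = [tail for tail, p in zip(kg.tails, tail_probs) if p >= threshold]
        # Assumption: tuples left without any tail are dropped.
        if kept:
            filtered.append(Knowledge(kg.head, kg.relation, kept))
    # Mirror the documented return shape: graph alone, or (graph, probs).
    return (filtered, probs) if return_probs else filtered


graph = [
    Knowledge("PersonX buys a car", "xNeed", ["to have money", "to be hungry"]),
    Knowledge("car", "ObjectUse", ["bake a cake"]),
]
probs = [[0.9, 0.2], [0.1]]

filtered = toy_filter(graph, probs, threshold=0.5)
filtered_with_probs = toy_filter(graph, probs, threshold=0.5, return_probs=True)
```

With `return_probs=True` the result is a tuple of the filtered graph and the unfiltered probabilities, which matches the `Union[...]` return type listed above.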
- class kogito.linkers.deberta.DebertaLinker(model_name_or_path: str = 'mismayil/comfact-deberta-v2', language: str = 'en_core_web_sm')¶
- __init__(model_name_or_path: str = 'mismayil/comfact-deberta-v2', language: str = 'en_core_web_sm') None ¶
- save_pretrained(save_path: str) None ¶
Save pretrained model
- Parameters:
save_path (str) – Directory to save model to
- classmethod from_pretrained(model_name_or_path: str) KnowledgeLinker ¶
Load pretrained linker
- Parameters:
model_name_or_path (str) – HuggingFace model name or local model path
- Returns:
Loaded knowledge linker
- Return type:
KnowledgeLinker
- link(input_graph: KnowledgeGraph, context: List[str] | str) List[float] ¶
Link the given knowledge graph with the context. This method computes a relevance probability for each knowledge tuple in the graph with respect to the given context and returns these probabilities in the same order as the knowledge tuples appear in the graph. The returned object is a list of lists of numbers because a knowledge tuple may have multiple tails, and a probability is computed for each tail.
- Parameters:
input_graph (KnowledgeGraph) – Input graph to link.
context (Union[List[str], str]) – Context text. Can be given either as a list of sentences or as a single string, in which case it will be split into sentences using spaCy.
- Returns:
List of relevance probabilities for each tail
- Return type:
List[List[float]]
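The two accepted forms of `context` can be illustrated with a small normalization helper. This is only a sketch: `normalize_context` is a hypothetical function, and the regular-expression splitter is a naive stand-in for the spaCy sentence segmentation mentioned above.

```python
import re
from typing import List, Union


def normalize_context(context: Union[List[str], str]) -> List[str]:
    """Return the context as a list of sentences.

    A string is split after '.', '!' or '?' followed by whitespace
    (naive stand-in for spaCy); a list is returned as a copy.
    """
    if isinstance(context, str):
        parts = re.split(r"(?<=[.!?])\s+", context.strip())
        return [part for part in parts if part]
    return list(context)


from_string = normalize_context("Alex saved money. Then Alex bought a car!")
from_list = normalize_context(["Alex saved money.", "Then Alex bought a car!"])
```

Both calls yield the same list of sentences, so downstream code only has to handle one representation.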