Preface
These are my notes from reading the paper "Relation Extraction with Matrix Factorization and Universal Schemas".
Introduction Notes
- What does semantic equivalence between relations mean here?
- What are surface pattern relations?
Model Notes
- Inspired by collaborative filtering
- Matrix Factorization
- $R$ is the set of relations
- $T$ is the set of input tuples (entity pairs)
Predict the probability of a fact $\langle r, t \rangle$ being true using a natural parameter $\theta_{r,t}$ and the logistic function: $p(y_{r,t} = 1) = \sigma(\theta_{r,t}) = 1 / (1 + e^{-\theta_{r,t}})$
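A minimal sketch of this prediction step (the variable names and the example value of $\theta_{r,t}$ are mine, just for illustration):

```python
import numpy as np

def sigmoid(theta):
    return 1.0 / (1.0 + np.exp(-theta))

# hypothetical natural parameter for one (relation, tuple) cell
theta_rt = 0.7
p_true = sigmoid(theta_rt)  # p(y_{r,t} = 1 | theta_{r,t})
print(p_true)               # ~0.668
```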
Latent Feature Model
$\textbf{Model F}$
- Measure compatibility between relation $r$ and tuple $t$ as the dot product of two latent feature representations of dimension $K^F$: $\theta_{r,t} := \sum_{k} a_{r,k} v_{t,k}$
- $a_r$ is the latent vector for relation $r$, and $v_t$ for tuple $t$
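A toy sketch of the Model F score, assuming random parameters and a small latent dimension (names and values are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
K_F = 4                      # latent dimension K^F (toy value)

a_r = rng.normal(size=K_F)   # latent feature vector for relation r
v_t = rng.normal(size=K_F)   # latent feature vector for tuple t

theta_F = a_r @ v_t          # theta_{r,t} = sum_k a_{r,k} * v_{t,k}
```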
Neighborhood Model
$\textbf{Model N}$
- Interpolate the confidence for a given tuple and relation based on the truth of other similar relations for the same tuple: $\theta_{r,t} := \sum_{\langle r', t \rangle \in O,\, r' \neq r} w_{r,r'}$
- The weights $w_{r,r'}$ correspond to a directed association strength between relations $r$ and $r'$
- $O$ is the set of observed true $\langle r, t \rangle$ facts
- This model cannot harness any synergies between textual and pre-existing DB relations.
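A sketch of the Model N score under a toy encoding of my own: $O$ is a set of observed (relation, tuple) facts and `w` holds the learned association weights $w_{r,r'}$:

```python
# Model N sketch: theta_{r,t} sums w[(r, r')] over relations r' != r
# that are observed with the same tuple t (names/structures are mine).
O = {("born-in", ("Obama", "Hawaii")),
     ("place-of-birth", ("Obama", "Hawaii"))}

w = {("born-in", "place-of-birth"): 1.3}  # learned weights w_{r,r'}

def theta_N(r, t, O, w):
    return sum(w.get((r, r2), 0.0)
               for (r2, t2) in O
               if t2 == t and r2 != r)

print(theta_N("born-in", ("Obama", "Hawaii"), O, w))  # 1.3
```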
Entity Model
$\textbf{Model E}$
- Each argument slot $i$ of relation $r$ has a feature vector $d_i$; a binary relation has $d_1$ for argument 1 and $d_2$ for argument 2
- $t_e$ is the latent vector of dimension $K^E$ for entity $e$
- For a tuple $t = (e_1, e_2)$ the score is $\theta_{r,t} := d_1 \cdot t_{e_1} + d_2 \cdot t_{e_2}$
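A toy sketch of the Model E score for a binary relation, with random vectors and a small dimension (names and values are mine): each argument-slot vector is dotted with the corresponding entity embedding and the results are summed:

```python
import numpy as np

rng = np.random.default_rng(0)
K_E = 4                       # entity latent dimension K^E (toy value)

d1 = rng.normal(size=K_E)     # slot vector for argument 1 of relation r
d2 = rng.normal(size=K_E)     # slot vector for argument 2 of relation r
t_e1 = rng.normal(size=K_E)   # embedding of entity e1
t_e2 = rng.normal(size=K_E)   # embedding of entity e2

theta_E = d1 @ t_e1 + d2 @ t_e2   # theta_{r,(e1,e2)}
```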
Combined Model
- Models are combined by summing their natural parameters, e.g. $\theta_{r,t} = \theta^F_{r,t} + \theta^N_{r,t} + \theta^E_{r,t}$, before applying the logistic function
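A sketch of the combination, using made-up per-model scores for a single (relation, tuple) cell:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# toy per-model natural parameters for one (relation, tuple) cell
theta_F, theta_N, theta_E = 0.9, 1.3, -0.2

theta = theta_F + theta_N + theta_E   # combined natural parameter
p_true = sigmoid(theta)               # sigmoid(2.0) ~= 0.881
```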
Parameter Estimation
- Models are parametrized through weights and latent component vectors.
- Since the approach is inspired by collaborative filtering and the training data is positive-only, the score is used to rank rather than to predict directly; a threshold has to be picked to make hard true/false decisions.
Objective
Using Bayesian Personalized Ranking (BPR)
- For each relation $r$ and each observed fact $f^{+} := \langle r, t_{+} \rangle \in O$, consider tuples $t_{-}$ such that $f^{-} := \langle r, t_{-} \rangle \notin O$
- For each pair of facts $f^{+}$ and $f^{-}$ we want $p(f^{+}) > p(f^{-})$ and hence $\theta_{f^{+}} > \theta_{f^{-}}$
- In BPR this is achieved by maximizing a sum of terms of the form $\mathrm{Obj}_{f^{+}, f^{-}} = \log \sigma(\theta_{f^{+}} - \theta_{f^{-}})$
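A minimal sketch of one BPR gradient-ascent step for Model F (the learning rate, sampling, and names are mine; the paper's training additionally involves details like regularization that are omitted here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
K, lr = 4, 0.05                    # toy latent dim and learning rate

a_r   = rng.normal(size=K)         # relation vector (Model F)
v_pos = rng.normal(size=K)         # tuple vector of observed fact f+
v_neg = rng.normal(size=K)         # tuple vector of sampled fact f-

# Obj_{f+,f-} = log sigma(theta_{f+} - theta_{f-}), theta = a_r . v_t
diff = a_r @ v_pos - a_r @ v_neg
g = 1.0 - sigmoid(diff)            # d Obj / d diff

# one gradient-ascent step on the BPR objective
grad_a   = g * (v_pos - v_neg)
grad_pos = g * a_r
grad_neg = -g * a_r
a_r   += lr * grad_a
v_pos += lr * grad_pos
v_neg += lr * grad_neg
```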