Preface
These are my notes from reading the paper "Relation Extraction with Matrix Factorization and Universal Schemas".
Introduction Notes
- What does semantic equivalence between relations mean here?
- What are surface pattern relations?
Model Notes
- Inspired by collaborative filtering
- Matrix Factorization
- $R$ is the set of relations
- $T$ is the set of input tuples (entity pairs)
Predict the probability of a fact $\langle r, t \rangle$ being true using a natural parameter $\theta_{r,t}$ and the logistic function: $p(y_{r,t} = 1) = \sigma(\theta_{r,t}) = 1 / (1 + e^{-\theta_{r,t}})$
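A minimal sketch of this prediction step (the variable names and the example value of $\theta_{r,t}$ are mine, just for illustration):

```python
import numpy as np

def sigmoid(theta):
    return 1.0 / (1.0 + np.exp(-theta))

# hypothetical natural parameter for one (relation, tuple) cell
theta_rt = 0.7
p_true = sigmoid(theta_rt)  # p(y_{r,t} = 1 | theta_{r,t})
print(p_true)               # ~0.668
```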
Latent Feature Model
$\textbf{Model F}$
- Measure compatibility between relation $r$ and tuple $t$ as the dot product of two latent feature representations of dimension $K^F$: $\theta_{r,t} := \sum_{k} a_{r,k} v_{t,k}$
- $a_r$ is the latent vector for relation $r$, and $v_t$ for tuple $t$
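A toy sketch of the Model F score, assuming random parameters and a small latent dimension (names and values are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
K_F = 4                      # latent dimension K^F (toy value)

a_r = rng.normal(size=K_F)   # latent feature vector for relation r
v_t = rng.normal(size=K_F)   # latent feature vector for tuple t

theta_F = a_r @ v_t          # theta_{r,t} = sum_k a_{r,k} * v_{t,k}
```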
Neighborhood Model
$\textbf{Model N}$
- Interpolate the confidence for a given tuple and relation based on the truth of other similar relations for the same tuple: $\theta_{r,t} := \sum_{\langle r', t \rangle \in O,\, r' \neq r} w_{r,r'}$
- The weights $w_{r,r'}$ correspond to a directed association strength between relations $r$ and $r'$
- $O$ is the set of observed true $\langle r, t \rangle$ facts
- This model cannot harness any synergies between textual and pre-existing DB relations.
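A sketch of the Model N score under a toy encoding of my own: $O$ is a set of observed (relation, tuple) facts and `w` holds the learned association weights $w_{r,r'}$:

```python
# Model N sketch: theta_{r,t} sums w[(r, r')] over relations r' != r
# that are observed with the same tuple t (names/structures are mine).
O = {("born-in", ("Obama", "Hawaii")),
     ("place-of-birth", ("Obama", "Hawaii"))}

w = {("born-in", "place-of-birth"): 1.3}  # learned weights w_{r,r'}

def theta_N(r, t, O, w):
    return sum(w.get((r, r2), 0.0)
               for (r2, t2) in O
               if t2 == t and r2 != r)

print(theta_N("born-in", ("Obama", "Hawaii"), O, w))  # 1.3
```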
Entity Model
$\textbf{Model E}$
- Each argument slot $i$ of relation $r$ has a feature vector $d_i$; a binary relation has $d_1$ for argument 1 and $d_2$ for argument 2
- $t_e$ is the latent vector of dimension $K^E$ for entity $e$
- For a tuple $t = (e_1, e_2)$ the score is $\theta_{r,t} := d_1 \cdot t_{e_1} + d_2 \cdot t_{e_2}$
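A toy sketch of the Model E score for a binary relation, with random vectors and a small dimension (names and values are mine): each argument-slot vector is dotted with the corresponding entity embedding and the results are summed:

```python
import numpy as np

rng = np.random.default_rng(0)
K_E = 4                       # entity latent dimension K^E (toy value)

d1 = rng.normal(size=K_E)     # slot vector for argument 1 of relation r
d2 = rng.normal(size=K_E)     # slot vector for argument 2 of relation r
t_e1 = rng.normal(size=K_E)   # embedding of entity e1
t_e2 = rng.normal(size=K_E)   # embedding of entity e2

theta_E = d1 @ t_e1 + d2 @ t_e2   # theta_{r,(e1,e2)}
```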
Combined Model
- Models are combined by summing their natural parameters, e.g. $\theta_{r,t} = \theta^F_{r,t} + \theta^N_{r,t} + \theta^E_{r,t}$, before applying the logistic function
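A sketch of the combination, using made-up per-model scores for a single (relation, tuple) cell:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# toy per-model natural parameters for one (relation, tuple) cell
theta_F, theta_N, theta_E = 0.9, 1.3, -0.2

theta = theta_F + theta_N + theta_E   # combined natural parameter
p_true = sigmoid(theta)               # sigmoid(2.0) ~= 0.881
```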
Parameter Estimation
- Models are parametrized through weights and latent component vectors.
- Since the approach is inspired by collaborative filtering and the training data is positive-only, the score is used to rank rather than to predict directly; a threshold has to be picked to make hard true/false decisions.
Objective
Using Bayesian Personalized Ranking (BPR)
- For each relation $r$ and each observed fact $f^{+} := \langle r, t_{+} \rangle \in O$, consider tuples $t_{-}$ such that $f^{-} := \langle r, t_{-} \rangle \notin O$
- For each pair of facts $f^{+}$ and $f^{-}$ we want $p(f^{+}) > p(f^{-})$ and hence $\theta_{f^{+}} > \theta_{f^{-}}$
- In BPR this is achieved by maximizing a sum of terms of the form $\mathrm{Obj}_{f^{+}, f^{-}} = \log \sigma(\theta_{f^{+}} - \theta_{f^{-}})$
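A minimal sketch of one BPR gradient-ascent step for Model F (the learning rate, sampling, and names are mine; the paper's training additionally involves details like regularization that are omitted here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
K, lr = 4, 0.05                    # toy latent dim and learning rate

a_r   = rng.normal(size=K)         # relation vector (Model F)
v_pos = rng.normal(size=K)         # tuple vector of observed fact f+
v_neg = rng.normal(size=K)         # tuple vector of sampled fact f-

# Obj_{f+,f-} = log sigma(theta_{f+} - theta_{f-}), theta = a_r . v_t
diff = a_r @ v_pos - a_r @ v_neg
g = 1.0 - sigmoid(diff)            # d Obj / d diff

# one gradient-ascent step on the BPR objective
grad_a   = g * (v_pos - v_neg)
grad_pos = g * a_r
grad_neg = -g * a_r
a_r   += lr * grad_a
v_pos += lr * grad_pos
v_neg += lr * grad_neg
```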