PyTorch-BigGraph: From Entity Embeddings to Edge Scores
The goal of training is to embed each entity in ℝ^D so that the embeddings of two entities are a good proxy to predict whether there is a relation of a certain type between them.
To be more precise, the goal is to learn an embedding for each entity and a function for each relation type that takes two entity embeddings and assigns them a score, such that positive relations achieve higher scores than negative ones.
Key point: entity → D-dimensional vector; relation type → function.
All the edges provided in the training set are considered positive instances. In order to perform training, a set of negative edges is needed as well. These are not provided by the user but are instead generated by the system during training (negative sampling), usually by fixing the left-hand side entity and the relation type and sampling a new right-hand side entity, or vice versa. This sampling scheme makes sense for large sparse graphs, where there is a low probability that edges generated this way are true positive edges in the graph.
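A minimal sketch of this sampling scheme (illustrative only, not PBG's internal implementation; the function name and tensor layout are assumptions):

```python
import torch

def sample_negatives(pos_edges: torch.Tensor, num_entities: int,
                     num_neg: int = 10) -> torch.Tensor:
    """Corrupt each positive edge (lhs, rel, rhs) by keeping lhs and rel
    fixed and replacing rhs with `num_neg` uniformly sampled entities.
    On a large sparse graph these are almost always true negatives."""
    lhs, rel, _ = pos_edges.unbind(dim=1)                    # each has shape (B,)
    neg_rhs = torch.randint(num_entities, (pos_edges.size(0), num_neg))
    lhs = lhs.unsqueeze(1).expand(-1, num_neg)               # broadcast over negatives
    rel = rel.unsqueeze(1).expand(-1, num_neg)
    return torch.stack([lhs, rel, neg_rhs], dim=2)           # shape (B, num_neg, 3)

pos = torch.tensor([[0, 1, 2], [3, 0, 4]])                   # two (lhs, rel, rhs) edges
print(sample_negatives(pos, num_entities=100).shape)         # torch.Size([2, 10, 3])
```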
A priori, entity embeddings can take any value in ℝ^D, although in some cases (for example, when restricting them to lie within a certain ball, or when comparing them using cosine distance) their "angle" matters more than their norm.
Per-relation scoring functions, however, must be expressible in a specific form (the most common functions in the literature can be converted to such a representation). In the current implementation, they are only allowed to transform the embedding of one of the two sides, which is then compared to the un-transformed embedding of the other side using a generic symmetric comparator function, which is the same for all relations. Formally, for left- and right-hand side entities 𝑥 and 𝑦 respectively, and for a relation type 𝑟, the score is:

𝑓𝑟(𝜃𝑥, 𝜃𝑦) = 𝑐(𝜃𝑥, 𝑔𝑟(𝜃𝑦))
where 𝜃𝑥 and 𝜃𝑦 are the embeddings of 𝑥 and 𝑦 respectively, 𝑓𝑟 is the scoring function for 𝑟, 𝑔𝑟 is the operator for 𝑟 and 𝑐 is the comparator.
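A minimal sketch of this factorization, using a diagonal operator and a dot-product comparator as stand-ins for 𝑔𝑟 and 𝑐 (all names here are illustrative):

```python
import torch

def score(theta_x: torch.Tensor, theta_y: torch.Tensor,
          operator, comparator) -> torch.Tensor:
    """f_r(x, y) = c(theta_x, g_r(theta_y)): only the right-hand side
    embedding is transformed by the relation-specific operator."""
    return comparator(theta_x, operator(theta_y))

D = 8
diag = torch.randn(D)                           # per-relation parameters (learned in PBG)
g_r = lambda y: diag * y                        # "diagonal" operator
c = lambda a, b: (a * b).sum(-1)                # "dot" comparator

theta_x, theta_y = torch.randn(D), torch.randn(D)
print(score(theta_x, theta_y, g_r, c))          # a single scalar edge score
```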
Under "normal" circumstances (the so-called "standard" relations mode) the operator is applied only to the right-hand side entities. This is not the case when using dynamic relations. Applying the operator to both sides would often be redundant; moreover, preferring one side over the other breaks the symmetry and captures the direction of the edge.
Embeddings
Embeddings live in a 𝐷-dimensional real space, where 𝐷 is determined by the dimension configuration parameter.
Normally, each entity has its own embedding, which is entirely independent of any other entity's embedding. When using featurized entities, however, this works differently: an entity's embedding is the average of the embeddings of its features.
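For example, a featurized entity's embedding could be computed like this (a sketch; the table size and feature ids are made up):

```python
import torch

# Hypothetical featurized entity with three feature ids: its embedding is
# the mean of the corresponding rows of a feature embedding table.
feature_table = torch.nn.Embedding(num_embeddings=1000, embedding_dim=16)
feature_ids = torch.tensor([3, 57, 204])
entity_embedding = feature_table(feature_ids).mean(dim=0)    # shape (16,)
```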
If the max_norm configuration parameter is set, embeddings will be projected onto the ball of radius max_norm after each parameter update.
🤔️ I don't fully understand this part yet.
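What the projection does, as a sketch: any embedding whose L2 norm exceeds max_norm is rescaled back onto the ball of that radius, e.g. with torch.renorm:

```python
import torch

def project_onto_ball(embs: torch.Tensor, max_norm: float) -> torch.Tensor:
    """Rescale any row whose L2 norm exceeds max_norm so it lies on the
    ball of that radius; rows already inside the ball are untouched."""
    return torch.renorm(embs, p=2, dim=0, maxnorm=max_norm)

embs = 3.0 * torch.randn(5, 16)
print(project_onto_ball(embs, max_norm=1.0).norm(dim=1))     # all norms <= 1.0
```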
To add a new type of embedding, one needs to subclass the torchbiggraph.model.AbstractEmbedding class.
🤔️ I don't fully understand this part yet.
Global embeddings
When the global_emb configuration option is active, each entity’s embedding will be translated by a vector that is specific to each entity type (and that is learned at the same time as the embeddings).
🤔️ I don't fully understand this part yet.
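In other words, one translation vector is learned per entity type and added to every embedding of that type. A sketch (the entity-type names are made up):

```python
import torch

D = 16
# One learned translation vector per entity type, trained jointly with
# the embeddings themselves.
global_emb = {
    "user": torch.zeros(D, requires_grad=True),
    "item": torch.zeros(D, requires_grad=True),
}

def with_global_emb(entity_emb: torch.Tensor, entity_type: str) -> torch.Tensor:
    return entity_emb + global_emb[entity_type]
```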
Operators
The operators that are currently provided are:
• none, no-op, which leaves the embeddings unchanged;
• translation, which adds to the embedding a vector of the same dimension;
• diagonal, which multiplies each dimension by a different coefficient (equivalent to multiplying by a diagonal matrix);
• linear, which applies a linear map, i.e., multiplies by a full square matrix;
• affine, which applies an affine transformation, i.e., a linear map followed by a translation;
• complex_diagonal, which interprets the 𝐷-dimensional real vector as a 𝐷/2-dimensional complex vector (𝐷 must be even; the first half of the vector are the real parts, the second half the imaginary parts) and then multiplies each entry by a different complex parameter, just like diagonal.
All the operators’ parameters are learned during training.
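The following sketch spells out each operator as a plain tensor computation on a single embedding 𝑦 (parameters are random here; in PBG they are learned):

```python
import torch

D = 8
y = torch.randn(D)                     # a right-hand side embedding

# translation: add a learned vector of the same dimension.
t = torch.randn(D)
translation = y + t

# diagonal: multiply each dimension by its own coefficient.
d = torch.randn(D)
diagonal = d * y

# linear: multiply by a full square matrix; affine adds a translation.
M, b = torch.randn(D, D), torch.randn(D)
linear = M @ y
affine = M @ y + b

# complex_diagonal: view the D real entries as D/2 complex numbers
# (first half = real parts, second half = imaginary parts) and multiply
# entrywise by learned complex coefficients.
re, im = y[: D // 2], y[D // 2:]
c_re, c_im = torch.randn(D // 2), torch.randn(D // 2)
complex_diagonal = torch.cat([c_re * re - c_im * im, c_re * im + c_im * re])
```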
To define an additional operator, one must subclass the torchbiggraph.model.AbstractOperator class (or the torchbiggraph.model.AbstractDynamicOperator one when using dynamic relations; their docstrings explain what must be implemented) and decorate it with the torchbiggraph.model.OPERATORS.register_as() decorator (respectively the torchbiggraph.model.DYNAMIC_OPERATORS.register_as() one), specifying a new name that can then be used in the config to select that operator. All of the above can be done inside the config file itself.
🤔️ I don't fully understand this part yet.
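A hedged sketch of the registration pattern, assuming AbstractOperator's contract is a constructor taking the embedding dimension and a forward(embeddings) method (check the class docstring for the exact signatures; the operator itself is hypothetical):

```python
import torch
from torchbiggraph.model import AbstractOperator, OPERATORS

@OPERATORS.register_as("scale")        # "scale" then selects this operator in the config
class ScaleOperator(AbstractOperator):
    """Hypothetical operator multiplying embeddings by one learned scalar.
    A sketch only: consult AbstractOperator's docstring for the exact contract."""

    def __init__(self, dim: int):
        super().__init__(dim)
        self.scale = torch.nn.Parameter(torch.ones(()))

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.scale * embeddings
```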
Comparators
The available comparators are:
• dot, the dot-product, which computes the scalar or inner product of the two embedding vectors;
• cos, the cosine distance, which is the cosine of the angle between the two vectors or, equivalently, the dot product divided by the product of the vectors' norms;
• l2, the negative L2 distance, a.k.a. the Euclidean distance (negative because smaller distances should get higher scores);
• squared_l2, the negative squared L2 distance.
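As plain tensor functions, the four comparators look roughly like this (a sketch of the math, not PBG's batched comparator API):

```python
import torch
import torch.nn.functional as F

def dot(a, b):        return (a * b).sum(-1)
def cos(a, b):        return F.cosine_similarity(a, b, dim=-1)
def l2(a, b):         return -(a - b).norm(dim=-1)        # negated: closer => higher score
def squared_l2(a, b): return -((a - b) ** 2).sum(-1)

a, b = torch.randn(4, 8), torch.randn(4, 8)
for name, fn in [("dot", dot), ("cos", cos), ("l2", l2), ("squared_l2", squared_l2)]:
    print(name, fn(a, b))                                  # four scores per comparator
```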
Custom comparators need to extend the torchbiggraph.model.AbstractComparator class (its docstring explains how) and decorate it with the torchbiggraph.model.COMPARATORS.register_as() decorator, specifying a new name that can then be used in the config to select that comparator. All of the above can be done inside the config file itself.
Bias
If the bias configuration key is in use, then the first coordinate of the embeddings acts as a bias in the comparator computation. This means that the comparator is computed on the last 𝐷−1 entries of the vectors only, and then the first entries of both vectors are added to the result.
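A sketch of the bias computation, with a dot-product comparator standing in for 𝑐:

```python
import torch

def biased_score(theta_x: torch.Tensor, theta_y: torch.Tensor, comparator) -> torch.Tensor:
    """Compare only the last D-1 entries, then add both first entries
    (the learned biases) to the result."""
    return comparator(theta_x[..., 1:], theta_y[..., 1:]) + theta_x[..., 0] + theta_y[..., 0]

dot = lambda a, b: (a * b).sum(-1)
x, y = torch.randn(8), torch.randn(8)
print(biased_score(x, y, dot))
```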
Coherent sets of configuration parameters
While the parameters described in this chapter are exposed as uncoupled knobs in the configuration file (to more closely match the implementation, and to allow for more flexible tuning), some combinations of them are more sensible than others.
Apart from the default one, the following configuration has been found to work well: init_scale = 0.1, comparator = dot, bias = true, loss_fn = logistic, lr = 0.1.
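In a PBG config file (a Python module exposing get_torchbiggraph_config), that recipe might look like the sketch below; the paths, schema, and dimension are placeholders:

```python
def get_torchbiggraph_config():
    return dict(
        # Placeholders: paths and entity/relation schema depend on your data.
        entity_path="data/example",
        edge_paths=["data/example/edges"],
        checkpoint_path="model/example",
        entities={"all": {"num_partitions": 1}},
        relations=[{"name": "all_edges", "lhs": "all", "rhs": "all",
                    "operator": "none"}],
        dimension=400,                  # placeholder
        # The recipe from above:
        init_scale=0.1,
        comparator="dot",
        bias=True,
        loss_fn="logistic",
        lr=0.1,
    )
```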
Interpreting the scores
Depending on the loss function used during training, the scores take on different meanings and become more suitable for certain applications. Common uses include ranking which other entities may be related to a given entity, estimating the probability that a certain relation exists between two given entities, etc.
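For instance, with loss_fn = logistic the scores behave like logits, so a sigmoid can turn a raw score into a probability estimate that the edge exists (a sketch, not a PBG API):

```python
import torch

raw_score = torch.tensor(2.3)          # comparator output for a candidate edge
prob = torch.sigmoid(raw_score)        # ~0.909: estimated probability the edge exists
```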
Source: https://torchbiggraph.readthedocs.io/en/latest/scoring.html