Tag: TG_DOC_TEXT
(1203 ranking factors)
Factors |
---|
TR
web_production: 1
Text relevance (MaxFreq, the frequency of the most frequent word, plays the role of document length).
|
PrBonus
web_production: 3
Weight: 0.07124278745128 Priority bonus, priority 7 (text priority). A binary factor: it is 0 for all single-word queries and 1 for almost all queries of two or more words, except for a very small number of results where no link passed the quorum and the text did not pass the quorum either.
|
TRp1
web_production: 4
Strict priority for TR (a text priority): all query words occur somewhere in the document while satisfying the query's contextual restrictions (for example, both words must be in one sentence).
|
TRp2
web_production: 5
Weight: -0.109820338929289 Phrase priority for TR (a text priority): all query words occur consecutively in the document.
|
TRtitle
web_production: 8
Presence of the exact phrase (the query text) in the title (more precisely, in the first sentence of the document). Contextual restrictions and stop words are handled exactly as in TRp2, i.e. factor [8] is bounded above by factor [5].
|
TRhr
web_production: 9
There is a passage that passed the quorum in which all word positions are marked with Best_relev relevance (title or meta keywords).
|
Long
web_production: 15
Weight: -0.084798680877042 Long document (the longer the document, the greater the value of the factor).
|
TRhitw
web_production: 16
HitWeight is a variant of text relevance in which all hits are weighted equally (i.e. bonuses for the title and for word proximity are not taken into account). The corresponding hits must still satisfy the restrictions of the syntactic wizard, so we can assume that the TRhitw factor is 0 if and only if SoftAndOk is 0.
|
PureText
web_production: 18
Long text without links.
|
SubqueryThMatch
web_production: 23
Match between the thematic spectra of the query and the document. The query themes are produced by ((http://wiki.yandex-team.ru/evgenijjkroxalev/subquery the rules of the Subquerysearch wizard)). The document's theme is taken from Yandex-Catalog.
|
TRref
web_production: 25
A factor based on the number of refines. The query language has a user-refine feature (a word preceded by a percent sign). The idea is that it means something like 'it would be good if this word were in the document'. The only valuable use of this feature known to ((http://staff.yandex-team.ru/gulin Andrey Gulin)) is a query of the form [%official %site name of the film]. Users do not know about this feature because it is not described in any documentation. It is planned to remove it from the query language, but words with the User_refine priority will remain in the wizard. The factor shows the maximum number of user_refine words found simultaneously within a single hit in the quorum. The count is taken to range from 0 to 3 (values above 3 are treated as 3). This number is mapped to the half-open interval [0, 1).
|
TRboost
web_production: 26
The multiplier applied to some link factors (namely factors 6, 7, 47 and 66) when text relevance is 0 and there are few links.
|
TRLRlemma
web_production: 27
A lemma matched in text relevance.
|
RelevSentsDssm
web_production: 29
DSSM model trained on reformulations; from the document it uses the sentences relevant to the query.
|
TRUnmapped
web_production: 39
TR divided by the cube of the number of words in the query and transformed by the standard REMAPTR.
|
RusLang
web_production: 40
The language of the document is Russian.
|
TextBM25
web_production: 46
Simple BM25 in text.
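
For orientation, here is a minimal sketch of textbook Okapi BM25 over a tokenized document. It is not the production formula: the function name, the parameters `k1` and `b`, the IDF variant and the corpus statistics (`doc_freq`, `n_docs`, `avg_len`) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25(query, doc, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """Textbook Okapi BM25 of one tokenized document for a tokenized query.

    doc_freq maps a word to the number of documents containing it
    (hypothetical corpus statistics supplied by the caller).
    """
    tf = Counter(doc)
    score = 0.0
    for word in set(query):
        if tf[word] == 0:
            continue  # words absent from the document contribute nothing
        df = doc_freq.get(word, 0)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: longer-than-average documents are discounted.
        norm = tf[word] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[word] * (k1 + 1) / norm
    return score
```

The term-frequency part saturates, so a second occurrence of a word adds less than the first.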
|
TLBM25
web_production: 48
Weight: 0.031399776481102 Simple BM25 in text and links at the same time.
|
TLp1
web_production: 49
All the words of the request are in the text + links.
|
TxtPair
web_production: 53
Weight: -0.020921642736537 Simple BM25 over word pairs: we take all pairs of query words and count their occurrences in the document text. The weight of a pair is the sum of the weights of its words. It does not work if the query contains a stop word.
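
A rough sketch of the pair idea follows. Several details are assumptions, not taken from this description: pairs are consecutive query words, an occurrence means the two words are adjacent in the document in the same order, and the saturation `tf / (tf + k)` stands in for the unspecified BM25 parameters. The names `txt_pair`, `weights` and `stop_words` are hypothetical.

```python
from collections import Counter


def txt_pair(query, doc, weights, stop_words=frozenset(), k=1.0):
    """BM25-style score over pairs of query words (illustrative only)."""
    # Per the description, the factor does not work with stop words in the query.
    if len(query) < 2 or any(w in stop_words for w in query):
        return 0.0
    adjacent = Counter(zip(doc, doc[1:]))  # counts of adjacent word pairs in the text
    score = 0.0
    total_weight = 0.0
    for first, second in zip(query, query[1:]):
        # Pair weight is the sum of the weights of its words.
        pair_weight = weights[first] + weights[second]
        tf = adjacent[(first, second)]
        score += pair_weight * tf / (tf + k)
        total_weight += pair_weight
    return score / total_weight if total_weight else 0.0
```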
|
TxtBreak
web_production: 55
BM25 over the number of sentences in the document in which the word occurs.
|
TxtHead
web_production: 56
Weight: -0.037878046829073 BM25 computed over headings only.
|
TxtHiRel
web_production: 57
BM25 computed only over text with high relevance bits ('significant' text with emphasis, <b> etc.).
|
HasNoTR
web_production: 61
The document has no TR.
|
TxtPairEx
web_production: 67
Weight: -0.00667940021707 the presence of pairs of words in the exact form
|
TxtBreakEx
web_production: 68
Weight: 0.024006117828321 the number of sentences in which there are many words in the exact form
|
TxtHeadEx
web_production: 69
Weight: -0.03957553241619 the presence of words in the header in the exact form
|
TxtHiRelEx
web_production: 70
BM25 in the exact form
|
TxtBm25Ex
web_production: 71
Simple BM25 in the exact form.
|
TxtPairSy
web_production: 72
Weight: -0.022152880819573 The presence of pairs of words taking into account synonyms (>= TxtPair)
|
TxtBreakSy
web_production: 73
Weight: -0.116819481337211 the number of sentences in which there are many words taking into account synonyms
|
TxtHeadSy
web_production: 74
Weight: -0.012919083353605 the presence of words in the header, taking into account synonyms
|
TxtHiRelSy
web_production: 75
Weight: -0.039215257302626 BM25 taking into account synonyms
|
TxtBm25Sy
web_production: 76
Simple BM25 taking into account synonyms.
|
Megafon
web_production: 80
The relative frequency of the query words in the links (1: the query words occur often in links; 0.3: rarely). More precisely, the value of this factor is pessimized when: TR = 0 && LR = 0 && (there is not a single link with all the query words) && (the quorum was not passed) && (at least one pair of query words is found in the text)
|
BFexact
web_production: 91
All query words occur in their exact form in the text/links
|
BFlemma
web_production: 92
All query words occur, up to the lemma, in the text/links
|
SoftAndOk
web_production: 93
The document passed SoftAnd under the restrictions of the syntactic wizard. Only for documents with text relevance. For single-word queries it is always 1.
|
TextFeatures
web_production: 100
Weight: -0.016033504310566 Text quality. Computed by a rather complex formula
|
TextLike
web_production: 101
Weight: -0.094096848692163 Text quality (classifier Alekseeva)
|
DocLen
web_production: 110
Weight: -0.065128132003719 Document length in sentences
|
IsHTML
web_production: 114
Document type - HTML
|
IsPorno
web_production: 131
Document from porn kitski
|
IsComm
web_production: 132
Weight: -0.066463228806236 A document from a commercial clay. Not used (deprecated)
|
IsFake
web_production: 133
Fake document
|
IsSEO
web_production: 134
The page title contains commercial vocabulary. Not used (deprecated)
|
IsEShop
web_production: 136
Commercial page (Classifier Savina)
|
HasNoAllWordsTRSy
web_production: 138
The document does not contain all the query words (up to synonyms)
|
NumWordsTRSy
web_production: 139
The percentage of query words present in the document (up to synonyms)
|
HasAllWordsTRSy
web_production: 140
The document contains all the query words (up to synonyms)
|
TxtInvPair
web_production: 144
TR over pairs of words in reverse order
|
TxtSkipPair
web_production: 146
Weight: -0.077504878926916 TR over pairs of query words separated by one word in the text
|
NumWordsTRFm
web_production: 148
The percentage of query words present in the text (in the exact form)
|
HasAllWordsTRFm
web_production: 149
The document contains all the query words (in the exact form)
|
TLen
web_production: 164
The length of the page text in words: TLen = map(number of words, 1/400), where map(x, y) = x*y / (1 + x*y)
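
The mapping above can be sketched directly. The function names `remap` and `tlen` are hypothetical (`remap` stands for the `map(x, y)` of the description, renamed to avoid shadowing Python's built-in).

```python
def remap(x, y):
    """map(x, y) = x*y / (1 + x*y): squeezes a non-negative value into [0, 1)."""
    return x * y / (1 + x * y)


def tlen(num_words):
    """TLen for a page with the given number of words, using y = 1/400."""
    return remap(num_words, 1 / 400)
```

With this scaling a 400-word page lands at about 0.5, and the value approaches 1 as the page grows.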
|
ExactWordOrderLen
web_production: 180
The length of the longest match of exact forms between the text and the query
|
ExactWordOrderWeight
web_production: 181
The weight of the longest match of exact forms between the text and the query
|
WordOrderLen
web_production: 182
The length of the longest match by lemma between the text and the query
|
WordOrderWeight
web_production: 183
The weight of the longest match by lemma between the text and the query
|
TRp1All
web_production: 185
Variants of the corresponding factors that take stop words into account
|
LRp1All
web_production: 186
Variants of the corresponding factors that take stop words into account
|
TLp1All
web_production: 187
Weight: 0.055767877134775 Variants of the corresponding factors that take stop words into account
|
BFexactAll
web_production: 188
Variants of the corresponding factors that take stop words into account
|
BFlemmaAll
web_production: 189
Weight: 0.059222635368125 Variants of the corresponding factors that take stop words into account
|
PassageLegacyTR
web_production: 190
Weight: 0.038806477920761 TR of the best passage, i.e. how high-quality a snippet can be
|
TxtBM25AttenSyn
web_production: 191
Weight: 0.075434934641649 TR with a discount for synonyms
|
TRWithStops
web_production: 199
Weight of maximum coincidence of forms in the text and request
|
LRWithStops
web_production: 200
Weight of maximum coincidence of forms in the text and request
|
HasPayments
web_production: 201
The page mentions 'payment by SMS'.
|
EshopValue
web_production: 203
Weight: -0.123814718900663 The degree to which the page is an online shop
|
PornoValue
web_production: 204
The degree to which the page is pornographic
|
AuxTextBM25
web_production: 268
BM25 over the name of the user's region for localized queries; for non-localized ones the country is used. The query texts sent for each region can be viewed in Relev_regions.txt in the wizard
|
TRDocQuorum
web_production: 283
The total weight of the query words present in the text
|
TRLRDocQuorum
web_production: 285
The total weight of the query words present in the text and links
|
JokerLen
web_production: 297
We compute text features assuming that the page title is attached to each of its sentences, i.e. the distance between a word from the title and any other word is 1 sentence. Len is the maximum ratio of the number of query words found in some sentence (with the title attached) to the length of the query. Example: [Harms Circus Vertunov] for ((http://wiki.yandex-team.ru//h.yandex.net/?http%3A%2F%2FWWWWIKILIVRES.info%2FWIKI%2F%25D0%25A6%25D %25b8%25D1%2580%25D0%25D0%25A %25BC%25D1%2581%of this document))
|
JokerWeight
web_production: 298
The ratio of the sum of IDFs of the words found in a sentence + title to the sum of IDFs of all words.
|
ExactJokerLen
web_production: 299
The same as JokerLen, for exact forms
|
ExactJokerWeight
web_production: 300
The same as JokerWeight, for exact forms
|
Adultness
web_production: 312
equals 2 * NastyContent
|
Poetry
web_production: 319
The poetry of the document
|
PoetryQuad
web_production: 320
The maximum poetry of the quatrain
|
EngLang
web_production: 321
Document language - English
|
Has2ExactQueryParts
web_production: 322
The query is fully covered by two exact groups, each consisting of exact matches of consecutive query words ((http://wiki.yandex-team.ru/poiskovajaplatform/tr/coveragebygroups more on coverage by groups))
|
HasLevensht1QueryFragment
web_production: 323
There is a group consisting of exact matches of query words that covers the query (possibly with one word skipped, added or replaced)
|
LargestSyInexactGroup
web_production: 324
Weight: -0.067337343351376 The share of the query covered by the longest group consisting of any hits (including word forms and synonyms), possibly with one word skipped, added or replaced
|
CyrLang
web_production: 327
The language of the document uses the Cyrillic script
|
SynS1
web_production: 334
Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text was generated by a synonymizer or automatically. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
SynFLremap1
web_production: 335
Weight: 0.002431406823392 Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text was generated by a synonymizer or automatically. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
SynFLremap2
web_production: 336
Weight: 0.08033186404617 Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text was generated by a synonymizer or automatically. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
PageDate
web_production: 345
Weight: -0.034716206980983 The document date stated on the page, remapped
|
HasTextPos
web_production: 350
The document has textual relevance
|
QSegmentsBM25
web_production: 351
Weight: -0.059299975637935 BM25, where the selected segments of the request act as 'words'
|
QSegmentsWeight
web_production: 352
Weight: -0.057628362537565 'Weight' of the segments of the request in the text
|
SynPercentBadWordPairs
web_production: 353
An indicator of how unnatural the text is from the point of view of the Russian language. The number of bad word pairs in the text, mapped to the interval [0, 1] by the formula Z/(Z+10)
|
SynNumBadWordPairs
web_production: 354
The share of bad pairs among all pairs found in the table: Z/(X+1), where Z is the number of bad pairs in the text and X is the number of ((http://wiki.yandex-team.ru/evgenijgrechnikov/testsynonimizers 2000-navigable)) pairs
|
NumLatinLetters
web_production: 355
Weight: -0.086731079136512 The number of Latin letters in the text (not counting markup), mapped to [0, 1] by the formula n/(n+100)
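
A small sketch of this computation, assuming the input has already been stripped of markup and that 'Latin letters' means ASCII letters a-z and A-Z (both assumptions; the function name is hypothetical):

```python
def num_latin_letters_factor(text):
    """Count ASCII Latin letters and squeeze the count into [0, 1) via n/(n+100)."""
    n = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return n / (n + 100)
```

A text with exactly 100 Latin letters gives 0.5; Cyrillic letters, digits and punctuation do not count.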
|
DocIdfSumFixed
web_production: 357
Previous factors - fixed
|
TitleIdfSumFixed
web_production: 358
Weight: 0.047164043400143 Previous factors - fixed
|
HeadingIdfSumFixed
web_production: 359
Weight: -0.068235863277027 Previous factors - fixed
|
NormalTextIdfSumFixed
web_production: 360
Previous factors - fixed
|
RusWordsInText
web_production: 364
The number of words in the text (a word is whatever the lemmatizer extracted), mapped to [0, 1] by the formula x/(x+a)
|
RusWordsInTitle
web_production: 365
Weight: 0.03118624384934 The number of words of the Russian language in the title
|
MeanWordLength
web_production: 366
Weight: 0.019580616053835 The average length of the word
|
PercentWordsInLinks
web_production: 367
Weight: 0.057053549836014 The percentage of words inside <a> .. </a> tags out of all words
|
PercentVisibleContent
web_production: 368
Weight: -0.032828345615772 The percentage of words outside tags (outside the angle brackets <>) out of all words
|
PercentFreqWords
web_production: 369
Weight: -0.020210221137273 The percentage of words that are among the 200 most frequent words of the language, out of all words of the text
|
PercentUsedFreqWords
web_production: 370
Weight: -0.063976585802142 The number of the language's 500 most frequent words that are used in the text, divided by 500
|
TrigramsProb
web_production: 371
Weight: -0.002170850269151 Logarithm of the geometric mean of the probabilities of trigrams in the text (the probability of a trigram is the number of its occurrences in the text divided by the number of all trigrams), mapped to [0, 1] according to the formula -x (x+a)
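
The quantity before the final remap can be sketched as follows, using the fact that the logarithm of a geometric mean is the mean of the logarithms. Taking the mean over every trigram occurrence in the text is an assumption, the function name is hypothetical, and the final mapping to [0, 1] is left out because its formula is not fully recoverable here.

```python
import math
from collections import Counter


def log_geo_mean_trigram_prob(words):
    """Log of the geometric mean of in-text trigram probabilities."""
    trigrams = list(zip(words, words[1:], words[2:]))
    if not trigrams:
        return 0.0  # texts shorter than three words have no trigrams
    counts = Counter(trigrams)
    total = len(trigrams)
    # P(trigram) = occurrences in the text / number of all trigrams in the text.
    return sum(math.log(counts[t] / total) for t in trigrams) / total
```

A highly repetitive text scores close to 0, while a text in which every trigram is unique scores `-log(number of trigrams)`.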
|
TrigramsCondProb
web_production: 372
Weight: 0.026650508120317 Logarithm of the geometric mean of the conditional probabilities of trigrams. The conditional probability of a trigram is its probability divided by the probability of the bigram formed by its first two words
|
DaterAge
web_production: 380
Weight: -0.207437366708906 The difference between the current date and the document date determined by the dater: 1 means the document date equals the current date, 0 means the document is 10 years old or more; if the date is not determined, the value is 0. Note: ((1 - DaterAge)*60)^2 = age of the page in days.
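
The relation between the factor value and the page age can be sketched in both directions. The function names are hypothetical; clamping at 0 for older documents follows the '10 years or more' rule stated above.

```python
import math


def age_days_from_dater_age(dater_age):
    """Age of the page in days: ((1 - DaterAge) * 60) ** 2."""
    return ((1 - dater_age) * 60) ** 2


def dater_age_from_days(age_days):
    """Inverse mapping, clamped at 0 for documents older than 3600 days."""
    return max(0.0, 1 - math.sqrt(age_days) / 60)
```

A value of 0 corresponds to 60^2 = 3600 days, a little under 10 years, which is consistent with the description.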
|
TextMaxForms
web_production: 385
Weight: -0.015212586791057 The maximum number of forms over all query words: the max over all query words of number_of_forms_of_word/64
|
TextWeightedForms
web_production: 386
Weight: 0.022803839020796 The sum of the numbers of forms weighted by word weights: the sum over all query words of number_of_forms_of_word/64*word_weight; remap of the form x/(1 + x).
|
TextForms
web_production: 387
Weight: -0.008656938143421 The unweighted sum of the numbers of forms: the sum over all query words of number_of_forms_of_word/64/number_of_words
|
TR_W1
web_production: 391
Analogue of the factor of the same name with every word weight set to 1
|
TextBM25_Fm_W1
web_production: 393
Analogue of the factor of the same name with every word weight set to 1
|
TextBM25_Sy_W1
web_production: 394
Analogue of the factor of the same name with every word weight set to 1
|
TLBM25_W1
web_production: 396
Analogue of the factor of the same name with every word weight set to 1
|
NumeralsPortion
web_production: 399
Shares of different parts of speech in the text. The share of numerals (among all words whose part of speech could be recognized)
|
ParticlesPortion
web_production: 400
Weight: -0.012429221647235 The share of particles
|
AdjPronounsPortion
web_production: 401
Weight: -0.005976754416269 The share of pronominal adjectives
|
AdvPronounsPortion
web_production: 402
Weight: -0.001250755074786 The share of pronominal adverbs
|
VerbsPortion
web_production: 403
The share of verbs
|
FemAndMasNounsPortion
web_production: 404
Weight: 0.011650367441796 The share of words that can be either masculine or feminine nouns, but not neuter, among all nouns (examples: 'hummingbird' is a noun of indeterminate gender that can be resolved either way; 'Alexander' is a case of homonymy).
|
LongestText
web_production: 410
Weight: 0.069696682544392 The size of the largest text segment (from the factor [18] puretext)
|
DssmYaMusicASREarlyBindingCe
web_production: 436
DSSM model with early binding, trained on reformulations and fine-tuned on ASR hypotheses of music queries for Alice
|
DssmBertDistillSinsigCeCountryRegChain
web_production: 437
A model trained on a PRS-Law PRS to predict BERT, trained on sinsig_ce with threshold value 0.5, using a chain of regions to the country
|
DssmYaMusicEarlyBindingCe
web_production: 438
DSSM model with early binding, trained on reformulations and fine-tuned on music queries for Alice
|
Swbm25
web_production: 452
Weight: 0.019740981979634 A tricky BM25 over a sliding window. The window size is set in sentences. 'Jokers' are used for headings and the beginning of the document. Morphological proximity and the structure of the text are taken into account. The weight of a window decays with distance from the beginning of the document.
|
PositionLanguageModel
web_production: 453
Weight: -0.032269052994315 A factor indicating whether a good snippet can be produced.
|
TxtPair_W1
web_production: 454
Weight: -0.016932610010322 Simple BM25 over word pairs: we take all pairs of query words and count their occurrences in the document text. Weight = 1. It does not work if the query contains a stop word
|
AuraDocLogShared
web_production: 455
Weight: -0.097686304848915 Logarithm of the number of shingles on which this document is not unique
|
AuraDocLogAuthor
web_production: 456
Weight: -0.097277529611975 Logarithm of the number of shingles on which this owner of the document is recognized as the author
|
AuraDocMeanSharedWeight
web_production: 457
Weight: -0.110593487056685 The average weight of the non-unique shingles of this document
|
LanguageCompliance
web_production: 469
Weight: 0.054576897612176 The language of the document matches the language of the query
|
IsPornoAdvert
web_production: 477
The page contains porn advertising
|
BM25FdPR_obsolete
web_production: 481
Weight: 0.054156294329288 BM25 with different parameters for different fields, including incoming anchor text. The weight of the text of links pointing to the page is normalized depending on the delta PageRank of the links
|
YmwFull
web_production: 492
Weight: -0.044940112806396 The size of the minimum piece of text, including all the words of the request found in the document. Not used now. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAformula/tekushhiekomponenty/ymw Read more))
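
The 'minimum piece of text' idea can be illustrated with the classic sliding-window search for the shortest span covering a set of words. This is only a sketch under assumptions: it returns the raw window size in words, whereas the production factor presumably normalizes the result in some way not described here; the function name is hypothetical.

```python
from collections import Counter


def min_window_size(query, doc):
    """Size in words of the shortest span of doc containing every query word found in doc."""
    targets = set(query) & set(doc)  # only the query words that occur in the document
    if not targets:
        return 0
    seen = Counter()
    covered = 0
    best = len(doc)
    left = 0
    for right, word in enumerate(doc):
        if word in targets:
            seen[word] += 1
            if seen[word] == 1:
                covered += 1
        # Shrink from the left while the window still covers every target word.
        while covered == len(targets):
            best = min(best, right - left + 1)
            dropped = doc[left]
            if dropped in targets:
                seen[dropped] -= 1
                if seen[dropped] == 0:
                    covered -= 1
            left += 1
    return best
```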
|
Bclm
web_production: 493
Weight: 0.030786458206337 Buettcher, Clarke and Lushman factor (modified) ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAformula/tekushichiekomponenty/bclm more))
|
FieldLM
web_production: 495
Weight: 1.36522746e-7 Unigram language model. The language is modeled from the document and smoothed by the general language model. When building the document model, information about which document field the query word occurred in (Title, Head or Plain Text) is used
|
TitleTrigramsQuery
web_production: 501
Weight: 0.112928770384249 Computes the coverage of the query by the letter trigrams of the document title
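
One plausible reading of 'coverage by letter trigrams' is the share of the query's distinct character trigrams that also occur in the title. The sketch below implements that reading; the lowercasing, the treatment of spaces as ordinary characters and the function names are assumptions.

```python
def letter_trigrams(text):
    """Set of distinct character trigrams of a lowercased string."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_coverage(query, title):
    """Share of the query's letter trigrams that also occur in the title."""
    query_trigrams = letter_trigrams(query)
    if not query_trigrams:
        return 0.0  # queries shorter than three characters have no trigrams
    return len(query_trigrams & letter_trigrams(title)) / len(query_trigrams)
```

Swapping the arguments gives the reverse direction, i.e. how much of the title is covered by the query.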
|
TitleTrigramsTitle
web_production: 502
Computes the coverage of the document title by letter trigrams
|
QueryWordSequencesTR
web_production: 504
Weight: -0.11860635115951 Computes a sum over sequences of more than two query words found within one sentence; normalized by the length of the document.
|
DmozThemeMatchAll
web_production: 511
Match between the thematic spectra (according to DMOZ) of the query and the document. The theme of the query is determined by ((http://wiki.yandex-team.ru/jandekspoisk/zarubezhnyjjinternet/dmozqueryClassifier1 the rule of the Dmoztheme wizard))
|
DmozThemeMatchBest
web_production: 512
Match between the thematic spectra (according to DMOZ) of the query and the document. The theme of the query is determined by the best result of ((http://wiki.yandex-team.ru/jandekspoisk/zarubezhnyjjinternet/dmozqueryClassifier1 the rules of the DmozTheme wizard)). The theme of the document is determined by an automatic classifier
|
Mpsa
web_production: 513
Weight: 0.093045433292429 Estimates the minimum distance between pairs of query words, taking into account how far the pair is from the beginning of the document (Minimal Pair Size with Attenuation). Pairs here mean all consecutive bigrams of query words, so the number of pairs equals the number of words in the query minus 1. Accordingly, the factor makes sense only for queries of more than one word. ((Http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayaformula/ Tekushhiekomponenty/MPSA MPSA))
|
Bclm2
web_production: 514
It differs from BCLM in that all word weights are considered equal. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAformula/tekushhiekomponenty/bclm2 BCLM2))
|
AbsolutePLM
web_production: 515
Text relevance based on a language model that takes absolute position into account. We slide a 20-word window along the text, build a language model on each window (that is, a probability distribution over the words of the Russian language) and compute the probability of generating the query. The model is penalized for distance from the beginning of the document.
|
BclmLite
web_production: 522
Modification of the BCLM2 factor, lightweight for use in tulle. The main difference is that BclmLite does not use absolute word offsets relative to the beginning of the document. Instead, the factor works with ordinary positions of the form <sentence number, position in sentence>. Proximity between words is taken into account only within a sentence. ((Http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayaFormula/tekushichiekomponenty/bclmlite bclmlite))
|
YmwFull2
web_production: 527
Weight: -0.044940112806396 Fixed YmwFull. It differs from the previous version only in its behavior on two-word queries. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAformula/tekushhiekomponenty/ymw Read more))
|
FullQuorum
web_production: 528
Binary factor, every word of the request is in the text or in the links
|
AuxCTextBM25
web_production: 529
'Country praets' (AUXQC)
|
AuxCLinkBM25
web_production: 530
'Country praets' (AUXQC)
|
Soft404
web_production: 531
The page is a '404' page (share of '404' tokens relative to the total number of tokens on the page)
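
Read literally, the parenthetical describes a simple token share. A minimal sketch, assuming the page is already tokenized and that only the exact token '404' counts (the function name is hypothetical):

```python
def soft404_share(tokens):
    """Share of tokens equal to '404' among all tokens on the page."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t == "404") / len(tokens)
```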
|
DBM25
web_production: 533
BM25 in which the word weights are machine-learned
|
QueryWordCohesionTR
web_production: 534
Weight: -0.053739168786067 The factor evaluates how closely the query words are grouped together in the text of the document, regardless of their order. ((http://wiki.yandex-team.ru/sergejjkrylov/queryWordCohesionTR Description))
|
SegmentAuxAlphasInText
web_production: 542
Weight: 0.010581678208134 Number of letters in the AUX segment
|
SegmentAuxSpacesInText
web_production: 543
Weight: -0.011681967583253 The number of spaces in the AUX segment
|
SegmentContentCommasInText
web_production: 544
The number of commas in the Content segment
|
IsShop
web_production: 545
Weight: -0.133931985443449 The page is a store. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAformula/tekushhiekomponenty/opisanijafaktorov#SSHOP Description)). Not used (deprecated)
|
AuraDocLogOrigin
web_production: 547
Logarithm of the number of shingles in the document added by the owner of the site as original texts in ((http://wiki.yandex-team.ru/jandekspoisk/jekosistema/marketingPr/webmasters/plan/vtorcontect of originality plugin)). It does not participate in the formula, it is needed to disconnect the takes
|
AuraDocMeanFltAuthorSource
web_production: 548
The average filtered number of sources of authorship of the document. It does not participate in the formula, it is needed to disconnect the takes
|
IdfVariance
web_production: 551
Weight: 0.025691573951246 Variance of the IDFs of the words
|
NationalLanguage
web_production: 553
The language of the document matches the country of the query
|
FiltrationSegments
web_production: 561
The share of the segments of the request present in the text
|
LanguageGoodForTurkey
web_production: 562
The language of the document is one of those permitted for Turkey (Turkish, English, German, French, Arabic, Azerbaijani), or the document has zero length. At the search stage it is computed only for Isrealgeolocal queries.
|
DBM25_2
web_production: 563
Variation on the theme of DBM25, see ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayaformula/tekushhiekomponenty/DBM25 DBM25)).
|
BM25FdPRFixed
web_production: 566
Weight: 0.058870258158539 BM25FdPR normalized by the average document length, which depends on the language of the document. ((http://wiki.yandex-team.ru/bm25frework test results.))
|
LanguagePopularity
web_production: 567
The popularity of the language of the document. A number from 0 to 1. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayaformula/tekushhiekomponenty/languaguaguagepopalarity))
|
QueryDOwnerWeightedSumFRCAndBM25FdPRFixed
web_production: 568
Weight: 0.087850313290757 The sum of the factors QueryDOwnerClicksFRC and BM25FdPRFixed with weights 0.358449 and 0.184922, respectively. The '565' in the name of the factor should not be taken literally; it is legacy or a typo.
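
As a quick illustration of the combination described above (the function and constant names are hypothetical; only the two weights come from the description):

```python
W_QUERY_DOWNER_CLICKS_FRC = 0.358449
W_BM25_FDPR_FIXED = 0.184922


def weighted_sum(query_downer_clicks_frc, bm25_fdpr_fixed):
    """Linear combination of the two input factors with the stated weights."""
    return (W_QUERY_DOWNER_CLICKS_FRC * query_downer_clicks_frc
            + W_BM25_FDPR_FIXED * bm25_fdpr_fixed)
```

Since the weights sum to 0.543371, the result stays below about 0.54 when both inputs lie in [0, 1].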
|
Tocm
web_production: 572
Weight: -0.005028751679547 The factor evaluates how the positions of words in the title differ from their order in the query
|
DBM30Smerch
web_production: 576
Variation on the theme of DBM25, see ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayaformula/tekushhiekomponenty/DBM25 DBM25)).
|
DssmBertDistillL2
web_production: 579
A pool of logs is marked with BERT trained on Sinsig. DSSM model is trained on this pool using BaseregionChain
|
StaticTitleComm
web_production: 583
The degree to which the page title is commercial. Not used (deprecated)
|
StaticTitleBM25Ex
web_production: 584
Weight: 0.016179974819787 BM25 page title by its text
|
StaticTitleLRBM25
web_production: 585
Weight: 0.038263040612831 BM25 page title by texts of links to it
|
TitleInLinksTrigrams
web_production: 597
Weight: -0.076334972364641 The share of unique title trigrams among the trigrams of links
|
LinksInTitleTrigrams
web_production: 598
Weight: 0.019301158836494 The share of unique link trigrams among the trigrams of the title
|
TrashAdv
web_production: 599
The trashiness of the page
|
DBM35
web_production: 606
Weight: 0.046757967567051 BM25 over texts and links with special weights per match level (form, lemma, synonym)
|
TRLRQuorumFm
web_production: 607
Weight: -0.062810308974889 The total weight of the query words present in the text in the exact form
|
TRLRQuorumLemma
web_production: 608
Weight: -0.003021983245146 The total weight of the query words present in the text up to the lemma
|
TRLRQuorumSyn
web_production: 609
The total weight of the query words present in the text
|
SmallWindow
web_production: 621
The maximum total weight of query words within a window of 50 words
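
A minimal sliding-window sketch of this idea. It assumes that every occurrence of a query word contributes its weight (the description does not say whether repeated words count once or each time); the names `small_window` and `weights` are hypothetical.

```python
def small_window(doc, weights, window=50):
    """Maximum total weight of query words inside any `window` consecutive words."""
    best = current = 0.0
    for i, word in enumerate(doc):
        current += weights.get(word, 0.0)  # non-query words weigh nothing
        if i >= window:
            # Drop the word that just slid out of the window on the left.
            current -= weights.get(doc[i - window], 0.0)
        best = max(best, current)
    return best
```

The running sum makes this a single pass over the document instead of recomputing each window from scratch.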
|
FooterInLinksTrigrams
web_production: 648
The share of unique trigrams of a footer fragment in trigrams of links
|
LinksInFooterTrigrams
web_production: 649
The share of unique trigrams of links among a fragment of trigrams of a footer
|
DBM40
web_production: 652
Variation on the theme of DBM25, see ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayaformula/tekushhiekomponenty/DBM25 DBM25)).
|
BM25_0
web_production: 654
Variation on the topic BM25
|
BM25_1
web_production: 655
Variation on the topic BM25
|
BM25_0123
web_production: 656
Variation on the topic BM25
|
DBMNumbers
web_production: 662
DBM separately by numbers
|
DBMGeo
web_production: 663
DBM separately by geo-objects of request
|
DBMSubstantive
web_production: 664
DBM separately on the noun
|
Bocm
web_production: 668
Evaluates how well the positions of words in the sentences of the document correspond to the positions of words in the query.
|
FioMatch
web_production: 670
The document contains a person's full name from the query.
|
HasDownloadLinkOnFile
web_production: 682
The document has a direct link to the file
|
HasDownloadLinkOnFileHosting
web_production: 683
The document has a link to filehosting
|
BclmMax
web_production: 696
The proximity of the query words to the heaviest (highest-weight) word.
|
HasUserReviews
web_production: 698
The document contains user review/comment
|
DBM15Wares
web_production: 703
|
DocCreateMonth
web_production: 705
The creation time of the document with month precision: 1.0 is the current month, 0 is 10 years ago or older. Temporarily disabled
|
DocUpdateMonth
web_production: 706
The update time of the document with month precision: 1.0 is the current month, 0 is 10 years ago or older. Temporarily disabled
|
DaterStatsYearNormLikelihood
web_production: 709
The likelihood function of the distribution of years in the document. Temporarily disabled
|
DaterStatsAverageSourceSegment
web_production: 712
The arithmetic mean position of dates in the document. Temporarily disabled
|
DBM15Wares2
web_production: 713
|
Cabm
web_production: 714
BM with attenuation in the text of catalog links.
|
SegmentWordPortionFromMainContent
web_production: 723
The share of the words of the document from the segments with Score> 2.
|
SmallWindowAttenuation
web_production: 734
|
WeightedSumIsIndexPageBocm
web_production: 762
|
AuxTitleBM25
web_production: 770
TextBM25 computed over the title using the name of the user's region, similar to factor 268.
|
Bclmf
web_production: 771
BCLM for Annotation index, doc text and links.
|
CommercialDssmOddLike
web_production: 812
Finetuned reformulations DSSM to commercial clicked bargain odd-like target from visit log
|
FioFromOriginalRequestBodyChain0Wcm
web_production: 820
Factor for the name from the original request, computed over the document body. Algorithm: Chain0Wcm.
|
DssmNavigationL2
web_production: 859
Request-document navigation model.
|
SmallWindowAttenuationQ
web_production: 865
|
QueryDocTitleRangesMatchingScore
web_production: 866
Factor over the request text and the document title: a score for how well numerical ranges match in marker words.
|
FioFromOriginalRequestBodyMinWindowSize
web_production: 873
Factor for the name from the original request, computed over the document body. The minimum window size that contains all the words of the request, normalized by the number of words in the request.
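The minimum-window idea can be sketched with a sliding window over the document tokens. This is only an illustration: the function names are hypothetical, and the direction of the normalization (number of distinct request words divided by window size, so that a tight match scores 1.0) is an assumption, since the description says only that the value is normalized by the number of request words.

```python
from collections import Counter


def min_window_size(doc_tokens, query_words):
    """Length of the shortest token span containing every query word, or None."""
    needed = set(query_words)
    if not needed:
        return None
    counts = Counter()
    covered = 0  # number of distinct query words inside the current window
    best = None
    left = 0
    for right, tok in enumerate(doc_tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers every query word.
        while covered == len(needed):
            size = right - left + 1
            if best is None or size < best:
                best = size
            left_tok = doc_tokens[left]
            if left_tok in needed:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    covered -= 1
            left += 1
    return best


def normalized_min_window(doc_tokens, query_words):
    """Assumed normalization: distinct query words / window size (0.0 if no window)."""
    size = min_window_size(doc_tokens, query_words)
    if size is None:
        return 0.0
    return len(set(query_words)) / size
```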
|
FioFromOriginalRequestTextCosineMatchMaxPrediction
web_production: 874
Factor for the name from the original request, computed over the document text. Algorithm: CosineMatchMaxPrediction.
|
AllFioFromOriginalRequestAllMaxFBodyChain0Wcm
web_production: 875
Factor for all names from the original request. Aggregation over all extensions; aggregation type: the maximum value of the factor. Computed over the document body. Algorithm: Chain0Wcm.
|
AllFioFromOriginalRequestAllMaxFBodyMinWindowSize
web_production: 876
Factor for all names from the original request. Aggregation over all extensions; aggregation type: the maximum value of the factor. Computed over the document body. The minimum window size that contains all the words of the request, normalized by the number of words in the request.
|
AllFioFromOriginalRequestAllMaxFTextCosineMatchMaxPrediction
web_production: 882
Factor for all names from the original request. Aggregation over all extensions; aggregation type: the maximum value of the factor. Computed over the document text. Algorithm: CosineMatchMaxPrediction.
|
AliceClickDssm
web_production: 900
DSSM click prediction based on Alice-specific data.
|
TelFullAttributeTextBocm15K001
web_production: 901
Factor for the Tel_Full telephone attributes from the original request, computed over the document text. Word-weight aggregation algorithm: Bocm15. Normalization coefficient 0.01.
|
AliceTimespentSuffixSum
web_production: 957
Prediction of the total time spent until the end of the session, given that this request-document pair occurs.
|
AliceTimespent
web_production: 958
Prediction of the contribution of this request-document pair to timespent.
|
AliceMaxPercentPlayed
web_production: 965
Prediction of the percentage of the track's length that will be played, given that this request-document pair occurs.
|
XfDtShowAllMaxFFieldSet2Bm15FLogK0001
web_production: 1025
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: BM15 in the group of streams 2. The maximum value of the factor for extensions.
|
XfDtShowAllMaxFFieldSet3BclmWeightedFLogW0K0001
web_production: 1026
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: BCLMWEIGHTEDFLOGW0 over stream group 3. The maximum value of the factor over extensions.
|
XfDtShowAllMaxFFieldSetUTBm15FLogW0
web_production: 1027
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: BM15FLOGW0 over URL and Title. The maximum value of the factor over extensions.
|
XfDtShowAllMaxFTextCosineMatchMaxPrediction
web_production: 1028
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: CosineMatchMaxPrediction over text and Title. The maximum value of the factor over extensions.
|
XfDtShowAllSumW2FSumWFieldSet1Bm15FLogK0001
web_production: 1032
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: BM15FLOG over stream group 1. The weighted average of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}) over extensions.
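The aggregation formula in this description can be written out directly. In this sketch, `weights` are the extension weights w_i and `values` are the per-extension factor values f_i; the function name and the choice to return 0.0 when the total weight is zero are assumptions.

```python
def sum_w2f_over_sum_w(weights, values):
    """Compute sum(w_i * (w_i * f_i)) / sum(w_i) over all extensions."""
    total_w = sum(weights)
    if total_w == 0:
        return 0.0  # assumption: no extensions (or zero weights) yields 0
    return sum(w * (w * f) for w, f in zip(weights, values)) / total_w
```

Because each factor value is multiplied by its weight twice in the numerator but only once in the denominator, low-weight extensions contribute less than they would in a plain weighted mean.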
|
XfDtShowAllSumW2FSumWFieldSetUTBm15FLogW0
web_production: 1033
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: BM15FLOGW0 over URL and Title. The weighted average of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}) over extensions.
|
XfDtShowAllSumWFSumWBodyMinWindowSize
web_production: 1034
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: MinWindowSize over the document body. The weighted average of the factor over extensions.
|
XfDtShowBagOfWordsFieldSetBagOfWordsOriginalRequestFractionExact
web_production: 1035
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: OriginalRequestFractionExact over the stream group for bag-of-words factors (text, Title, annotation streams).
|
XfDtShowBagOfWordsTitleCosineMaxMatch
web_production: 1039
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: bag-of-words CosineMaxMatch over Title.
|
XfDtShowTopMinWFFieldSet3BclmWeightedFLogW0K0001
web_production: 1040
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: BCLMWEIGHTEDFLOGW0 over stream group 3. The minimum weighted value of the factor over the top extensions.
|
XfDtShowTopSumW2FSumWBodyChain0Wcm
web_production: 1043
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: Chain0Wcm over the document body. The weighted average of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}) over the top extensions.
|
XfDtShowTopSumWFSumWFieldSet3BclmWeightedFLogW0K0001
web_production: 1046
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: BCLMWEIGHTEDFLOGW0 over stream group 3. The weighted average of the factor over the top extensions.
|
TitleBm15K01
web_production: 1092
Bm15K01 factor over hits from Title
|
TitleBocm15K001
web_production: 1093
Bocm15K001 factor over hits from Title
|
TextBm11Norm16384
web_production: 1094
Bm11Norm16384 factor over hits from Text
|
TextBocm11Norm256
web_production: 1095
Bocm11Norm256 factor over hits from Text
|
TextCosineMatchMaxPrediction
web_production: 1096
CosineMatchMaxPrediction factor over hits from Text
|
FieldSet1Bm15FLogK0001
web_production: 1097
Bm15FLogK0001 factor over hits from FieldSet1 stream
|
FieldSet2Bm15FLogK0001
web_production: 1098
Bm15FLogK0001 factor over hits from FieldSet2 stream
|
FieldSet3BclmWeightedFLogW0K0001
web_production: 1099
BclmWeightedFLogW0K0001 factor over hits from FieldSet3 stream
|
FieldSetUTBm15FLogW0K00001
web_production: 1100
Bm15FLogW0K00001 factor over hits from FieldSetUT stream
|
BodyChain0Wcm
web_production: 1101
Chain0Wcm factor over hits from Body
|
BodyPairMinProximity
web_production: 1102
PairMinProximity factor over hits from Body
|
BodyMinWindowSize
web_production: 1103
MinWindowSize factor over hits from Body
|
DssmLongMiddleShortVsHardClicks
web_production: 1219
DSSM model trained on clicks.
|
DssmLongVsMiddleShortNoClicks
web_production: 1220
DSSM model trained on clicks.
|
DssmMiddleVsShortLongHardNoClicks
web_production: 1221
DSSM model trained on clicks.
|
DssmShortVsMiddleLongHardNoClicks
web_production: 1222
DSSM model trained on clicks.
|
DssmNOVsShortMiddleLongHardClicks
web_production: 1223
DSSM model trained on clicks.
|
DssmLongVsShortMiddleHardClicks
web_production: 1224
DSSM model trained on clicks.
|
DssmMiddleLongVsShortHardClicks
web_production: 1225
DSSM model trained on clicks.
|
DssmShortMiddleLongVsHardNoClicks
web_production: 1226
DSSM model trained on clicks.
|
Medical2UrlQuality
web_production: 1227
Neural model of content quality for medical subjects
|
Medical2UrlQualityFresh
web_production: 1244
Neural model of content quality for medical topics (for fresh documents)
|
FinLawUrlQuality
web_production: 1247
Neural model of content quality for financial and legal topics
|
FinLawUrlQualityFresh
web_production: 1249
Neural model of content quality for financial and legal topics (for fresh documents)
|
RequestWithRegionNameTextBm11Norm16384
web_production: 1255
Linguistic boosting factor. Type of extensions: Requestwithregionname. Factor: BM11 over the text and title of the document.
|
RequestWithRegionNameTextCosineMatchMaxPrediction
web_production: 1256
Linguistic boosting factor. Type of extensions: Requestwithregionname. Factor: CosineMatchMaxPrediction over the text and title of the document.
|
RequestWithRegionNameFieldSet1Bm15FLogK0001
web_production: 1263
Linguistic boosting factor. Type of extensions: Requestwithregionname. Factor: BM15 in the group of streams 1.
|
RequestWithRegionNameFieldSet2Bm15FLogK0001
web_production: 1264
Linguistic boosting factor. Type of extensions: Requestwithregionname. Factor: BM15 in the group of streams 2.
|
RequestWithRegionNameFieldSet3BclmWeightedFLogW0K0001
web_production: 1265
Linguistic boosting factor. Type of extensions: Requestwithregionname. Factor: BCLMWEIGHTEDFLOGW0 in the Stream group 3.
|
RequestWithRegionNameBodyChain0Wcm
web_production: 1266
Linguistic boosting factor. Type of extensions: Requestwithregionname. Chain0WCM factor on the text of the document
|
SosUrlQuality
web_production: 1268
Neural model of content quality for SOS topics
|
SosUrlQualityFresh
web_production: 1270
Neural model of content quality for SOS topics (for fresh documents)
|
AliceTimespentSum
web_production: 1273
Prediction of the session time, given that this request-document pair occurs.
|
DssmSinsigL2
web_production: 1278
Request-document Sinsig model.
|
OriginalRequestTitleBclmMixPlainKE5
web_production: 1281
Factor for the original request, computed over the document title. Word-weight aggregation algorithm: BclmMixPlain, a linear mixture of annotation BCLM weights and weighted positionless word weights; the resulting values are then aggregated through BM15. Normalization coefficient 10^(-5).
|
OriginalRequestTitleCMMatchTop5AvgMatchValue
web_production: 1282
Factor for the original request, computed over the document title. Algorithm: CMMatchTop5AvgMatchValue.
|
OriginalRequestTitleWordCoverageForm
web_production: 1283
Factor for the original request, computed over the document title. Coverage of the request words, matched up to word form (without synonyms).
|
OriginalRequestTitleAttenV1Bm15K05
web_production: 1284
Factor for the original request, computed over the document title. The weight of each hit is multiplied by 1 / (1 + the position of the word in the sentence). Word-weight aggregation algorithm: BM15. Normalization coefficient 0.5.
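A rough sketch of how positional attenuation might combine with a BM15-style aggregation. Everything beyond the 1 / (1 + position) multiplier and the coefficient 0.5 is an assumption: the saturation form tf / (tf + k) is the textbook BM15 shape (BM25 without length normalization), and the production formula, the 0-based positions, and the function and parameter names are hypothetical.

```python
def atten_bm15(hits, word_weights, k=0.5):
    """
    hits: iterable of (word, position_in_sentence, hit_weight), positions 0-based.
    word_weights: mapping from request word to its weight.
    Returns a score in [0, 1).
    """
    # Accumulate an attenuated "term frequency" per word.
    tf = {}
    for word, pos, hit_weight in hits:
        tf[word] = tf.get(word, 0.0) + hit_weight / (1 + pos)

    total_weight = sum(word_weights.values())
    if total_weight == 0:
        return 0.0

    # Textbook BM15 saturation, weighted by word weight (an assumption here).
    score = 0.0
    for word, weight in word_weights.items():
        t = tf.get(word, 0.0)
        score += weight * t / (t + k)
    return score / total_weight
```

With these assumptions, a hit at the start of a sentence counts in full, while the same hit one position later counts half as much before saturation.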
|
OriginalRequestBodyBclmMixPlainKE5
web_production: 1285
Factor for the original request, computed over the document body. Word-weight aggregation algorithm: BclmMixPlain, a linear mixture of annotation BCLM weights and weighted positionless word weights; the resulting values are then aggregated through BM15. Normalization coefficient 10^(-5).
|
OriginalRequestBodyCosineMatchMaxPrediction
web_production: 1286
Factor for the original request, computed over the document body. Algorithm: CosineMatchMaxPrediction.
|
OriginalRequestBodyAllWcmWeightedPrediction
web_production: 1287
Factor for the original request, computed over the document body. Algorithm: AllWcmWeightedPrediction.
|
OriginalRequestBodyBocm15K001
web_production: 1288
Factor for the original request, computed over the document body. Word-weight aggregation algorithm: Bocm15. Normalization coefficient 0.01.
|
OriginalRequestBodyQueryPartMatchSumValueAny
web_production: 1289
Factor for the original request, computed over the document body. Algorithm: QueryPartMatchSumValueAny.
|
OriginalRequestBodyWordCoverageForm
web_production: 1290
Factor for the original request, computed over the document body. Coverage of the request words, matched up to word form (without synonyms).
|
OriginalRequestBodyWordCoverageExact
web_production: 1291
Factor for the original request, computed over the document body. Coverage of the request words in their exact form.
|
OriginalRequestBodyBm15MaxAnnotationK001
web_production: 1292
Factor for the original request, computed over the document body. Word-weight aggregation algorithm: Bm15MaxAnnotation. Normalization coefficient 0.01.
|
DssmLogDwellTimeBigrams
web_production: 1338
DSSM model trained on clicks. Takes bigrams into account.
|
XfDtShowTopSumW2FSumWFieldSet5AvgPerTrigramMaxValueAny
web_production: 1352
Linguistic boosting factor. Type of extensions: XFDTSHOW. Factor: AvgPerTrigramMaxValueAny over stream group 5. The weighted average of the factor over the top extensions.
|
DssmLogDwelltimeBigramsL2
web_production: 1354
DSSM model trained on clicks. Takes bigrams into account. Embeddings for documents are computed offline.
|
DssmBigramsQueryDerivativeMin
web_production: 1356
Minimum of the gradients of the bigram LogDwellTime model.
|
DssmBigramsQueryDerivativeMax
web_production: 1357
Maximum of the gradients of the bigram LogDwellTime model.
|
DssmBigramsQueryDerivativeMoment2Central
web_production: 1358
The second central moment (variance) of the gradients of the bigram LogDwellTime model.
|
DssmBigramsQueryDerivativeMoment3Central
web_production: 1359
The third central moment of the gradients of the bigram LogDwellTime model.
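The four DssmBigramsQueryDerivative* factors summarize one set of gradient values with a minimum, a maximum, and two central moments. A minimal sketch of those statistics follows, assuming the gradients are available as a flat list of floats and that the population definition of the moments (dividing by n) is used; the source states neither, and the function names are hypothetical.

```python
def central_moment(values, order):
    """Population central moment: mean of (v - mean) ** order."""
    n = len(values)
    if n == 0:
        raise ValueError("central_moment requires at least one value")
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def derivative_stats(gradients):
    """Summary statistics corresponding to the four factors above."""
    return {
        "min": min(gradients),
        "max": max(gradients),
        "moment2_central": central_moment(gradients, 2),  # variance
        "moment3_central": central_moment(gradients, 3),  # unnormalized skew
    }
```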
|
QfufTopSumWFSumWFieldSet3BclmWeightedFLogW0K0001
web_production: 1390
Linguistic boosting factor. Type of extensions: QFUF. Factor: BCLMWEIGHTEDFLOGW0_K0.001 over FieldSet3. The weighted average of the factor over the top 10 extensions.
|
QueryToTextAllSumWFSumWBodyMinWindowSize
web_production: 1391
Linguistic boosting factor. Type of extensions: Querytotext. Factor: MinWindowSize over the document body. The weighted average of the factor over extensions.
|
QueryToTextTopMinWFBodyMinWindowSize
web_production: 1394
Linguistic boosting factor. Type of extensions: Querytotext. Factor: MinWindowSize over the document body. The weighted average of the factor over the top 10 extensions.
|
QfufAllMaxFFieldSetUTBm15FLogW0K00001
web_production: 1395
Linguistic boosting factor. Type of extensions: QFUF. Factor: BM15FLOGW0_K0.0001 over the URL and the title. The maximum value of the factor over extensions.
|
QfufAllSumWFSumWFieldSet3BclmWeightedFLogW0K0001
web_production: 1396
Linguistic boosting factor. Type of extensions: QFUF. Factor: BCLMWEIGHTEDFLOGW0_K0.001 over FieldSet3. The weighted average of the factor over extensions.
|
QueryToTextAllSumFCountBodyPairMinProximity
web_production: 1398
Linguistic boosting factor. Type of extensions: Querytotext. Factor: PairMinProximity over the document body. The average value of the factor over extensions.
|
QueryToTextAllSumFCountTextBocm11Norm256
web_production: 1400
Linguistic boosting factor. Type of extensions: Querytotext. Factor: Bocm11_norm256 over the document text. The average value of the factor over extensions.
|
QfufAllMaxFTextCosineMatchMaxPrediction
web_production: 1401
Linguistic boosting factor. Type of extensions: QFUF. Factor: CosineMatchMaxPrediction over the document text. The maximum value of the factor over extensions.
|
QfufTopSumW2FSumWFieldSet1Bm15FLogK0001
web_production: 1402
Linguistic boosting factor. Type of extensions: QFUF. Factor: BM15FLOG_K0.001 over FieldSet1. The weighted average of the factor with quadratic weight over the top 10 extensions by factor value.
|
QfufAllMaxFTextBocm11Norm256
web_production: 1403
Linguistic boosting factor. Type of extensions: QFUF. Factor: Bocm11_norm256 over the document text. The maximum value of the factor over extensions.
|
QfufTopSumWFSumWFieldSetUTBm15FLogW0K00001
web_production: 1404
Linguistic boosting factor. Type of extensions: QFUF. Factor: BM15FLOGW0_K0.0001 over the URL and the title. The weighted average of the factor over extensions.
|
DssmOneClickProbability
web_production: 1405
DSSM model trained on clicks, target=OneClicks/Clicks. Takes bigrams into account.
|
DssmQueryDwellTime
web_production: 1406
DSSM model trained on clicks, target=QueryDwellTime stream value. Takes bigrams into account.
|
AllMatchedWordWeightsSum
web_production: 1407
The normalized sum of the weights of the request words that occur in the document text or in links to it.
|
StringMatchedWordWeightsSum
web_production: 1408
The normalized sum of the weights of the request words that match by Equal_by_String in the document text or in links to it.
|
AllMatchedWordWeightsSumText
web_production: 1409
The normalized sum of the weights of the request words that occur in the document text.
|
AllMatchedWordWeightsSumLink
web_production: 1410
The normalized sum of the weights of the request words that occur in links to the document.
|
StringMatchedWordWeightsSumLink
web_production: 1411
The normalized sum of the weights of the request words that match by Equal_by_String in links to the document.
|
AllMatchedWordFiltrationModelWeightsSum
web_production: 1412
The normalized sum of the IFiltrationModel weights of the request words that occur in the document text or in links to it.
|
StringMatchedWordFiltrationModelWeightsSum
web_production: 1413
The normalized sum of the IFiltrationModel weights of the request words that match by Equal_by_String in the document text or in links to it.
|
LemmaMatchedWordFiltrationModelWeightsSum
web_production: 1414
The normalized sum of the IFiltrationModel weights of the request words that match by Equal_by_lemma in the document text or in links to it.
|
AllMatchedWordFiltrationModelWeightsSumLink
web_production: 1415
The normalized sum of the IFiltrationModel weights of the request words that occur in links to the document.
|
StringMatchedWordFiltrationModelWeightsSumLink
web_production: 1416
The normalized sum of the IFiltrationModel weights of the request words that match by Equal_by_String in links to the document.
|
DssmLanguageClassifierRusL2
web_production: 1425
Document DSSM model Language Classifier Rus.
|
DssmLanguageClassifierEngL2
web_production: 1426
Document DSSM model Language Classifier Eng.
|
DssmLanguageClassifierOthL2
web_production: 1427
Document DSSM model Language Classifier Other.
|
alice_aramusic_dssm
web_production: 1430
|
AliceMusicRelevanceDssm
web_production: 1431
DSSM Prediction to determine Alice's irrelevant answers
|
BM25FdPRFixedNoLinks
web_production: 1462
BM25FdPR with normalization by the average document length, which depends on the language of the document. Only texts are used.
|
NoApproxSmallWindowAttenuation
web_production: 1470
|
NoApproxSmallWindowAttenuationQ
web_production: 1471
|
DssmMainContentKeywords
web_production: 1472
Query-MainContentKeywords similarity, target: logDwellTime
|
DssmCtrNoMiner
web_production: 1504
DSSM model trained on CTRs without miner.
|
DssmQueryUrlTitleRegChainClicksOdd
web_production: 1513
DSSM model trained on click odd pool
|
DssmQueryUrlTitleRegChainClicksPers
web_production: 1514
DSSM model trained on click personalization pool
|
DssmQueryUrlTitleRegChainClicksTrFull
web_production: 1515
DSSM model trained on click triangle pool
|
DssmLogDtBigramsAMHardQueriesNoClicks
web_production: 1523
DSSM model trained on clicks without miner (with no-clicks and AM-hard negatives). Takes bigrams into account.
|
XfDtShowKnnAllMaxWFFieldSet3BclmWeightedFLogW0K0001
web_production: 1573
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: BCLMWEIGHTEDFLOGW0 over stream group 3. The maximum weighted value of the factor.
|
XfDtShowKnnAllMaxWFFieldSet2Bm15FLogK0001
web_production: 1574
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: BM15FLOG over stream group 2. The maximum weighted value of the factor.
|
XfDtShowKnnBagOfWordsFieldSetBagOfWordsOriginalRequestFraction
web_production: 1575
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: OriginalRequestFraction over the FieldSetBagOfWords stream.
|
XfDtShowKnnAllMaxWFSumWQueryDwellTimeMixMatchWeightedValue
web_production: 1576
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: MixMatchWeightedValue over the QueryDwellTime stream. The maximum weighted value of the factor, normalized by the total weight.
|
XfDtShowKnnAllSumW2FSumWTitleBm15K01
web_production: 1577
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: BM15 over the Title stream. The weighted sum of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}), normalized by the total weight.
|
XfDtShowKnnTopMinFFieldSet3BclmWeightedFLogW0K0001
web_production: 1578
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: BCLMWEIGHTEDFLOGW0 over stream group 3. The minimum value of the factor over the top extensions.
|
XfDtShowKnnAllSumW2FSumWFieldSet3BclmWeightedFLogW0K0001
web_production: 1579
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: BCLMWEIGHTEDFLOGW0 over stream group 3. The weighted sum of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}), normalized by the total weight.
|
XfDtShowKnnAllMaxWFFieldSet1Bm15FLogK0001
web_production: 1580
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: BM15FLOG over stream group 1. The maximum weighted value of the factor.
|
XfDtShowKnnAllSumWFSumWFieldSet1Bm15FLogK0001
web_production: 1581
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: BM15FLOG over stream group 1. The weighted sum of the factor, normalized by the total weight.
|
XfDtShowKnnBagOfWordsLongClickSPAnnotationMatchAvgValue
web_production: 1582
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: bag-of-words AnnotationMatchAvgValue over the LongClickSP stream.
|
XfDtShowKnnTopSumW2FSumWFieldSet1Bm15FLogK0001
web_production: 1583
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: BM15FLOG over stream group 1. The weighted sum of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}) over the top extensions, normalized by the total weight of the top extensions.
|
XfDtShowKnnTopMinWFMaxWFieldSet1Bm15FLogK0001
web_production: 1584
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: BM15FLOG over stream group 1. The minimum weighted value of the factor over the top extensions, normalized by the maximum weight among the top extensions.
|
XfDtShowKnnAllMaxWFSumWBodyPairMinProximity
web_production: 1585
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: PairMinProximity over the Body stream. The maximum weighted value of the factor, normalized by the total weight.
|
XfDtShowKnnAllSumW2FSumWFieldSet1Bm15FLogK0001
web_production: 1586
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: BM15FLOG over stream group 1. The weighted sum of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}), normalized by the total weight.
|
XfDtShowKnnBagOfWordsSimpleClickAnnotationMatchAvgValue
web_production: 1587
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: bag-of-words AnnotationMatchAvgValue over the SimpleClick stream.
|
XfDtShowKnnBagOfWordsTitleCosineMaxMatch
web_production: 1588
Linguistic boosting factor. Type of extensions: XFDTSHOWKNN. Factor: bag-of-words CosineMaxMatch over the Title stream.
|
DssmLogDtBigramsAMHardQueriesNoClicksMixed
web_production: 1596
DSSM model trained on clicks without miner (with no-clicks and am_hard negatives 50/50 and then on am_hard negatives only). Takes bigrams into account.
|
QueryToTextByXfDtShowKnnAllSumW2FSumWTextBocm11Norm256
web_production: 1615
Linguistic boosting factor. Type of extensions: Querytotextbyxfdtshowknn. Factor: Bocm11Norm256 over the Text stream. The weighted sum of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}).
|
QueryToTextByXfDtShowKnnTopSumW2FSumWBodyMinWindowSize
web_production: 1616
Linguistic boosting factor. Type of extensions: Querytotextbyxfdtshowknn. Factor: MinWindowSize over the Body stream. The weighted sum of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}) over the top extensions, normalized by the total weight of the top extensions.
|
QueryToTextByXfDtShowKnnAllSumW2FSumWBodyMinWindowSize
web_production: 1617
Linguistic boosting factor. Type of extensions: Querytotextbyxfdtshowknn. Factor: MinWindowSize over the Body stream. The weighted sum of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}), normalized by the total weight.
|
QueryToTextByXfDtShowKnnTopSumW2FSumWTextBocm11Norm256
web_production: 1618
Linguistic boosting factor. Type of extensions: Querytotextbyxfdtshowknn. Factor: Bocm11Norm256 over the Text stream. The weighted sum of the factor multiplied by weight (\frac{\sum w_i \cdot (w_i \cdot f_i)}{\sum w_i}) over the top extensions.
|
QueryToTextByXfDtShowKnnAllMinW
web_production: 1619
Linguistic boosting factor. Type of extensions: Querytotextbyxfdtshowknn. The minimum extension weight.
|
QueryToTextByXfDtShowKnnAllAvgW
web_production: 1620
Linguistic boosting factor. Type of extensions: Querytotextbyxfdtshowknn. The arithmetic mean of the extension weights.
|
QueryToTextByXfDtShowKnnAllTotalW
web_production: 1621
Linguistic boosting factor. Type of extensions: Querytotextbyxfdtshowknn. The total weight of the extensions.
|
QueryToTextByXfDtShowKnnBagOfWordsFieldSetBagOfWordsOriginalRequestFraction
web_production: 1622
Linguistic boosting factor. Type of extensions: Querytotextbyxfdtshowknn. Factor: OriginalRequestFraction over the FieldSetBagOfWords stream.
|
UnexpectedTrashUrlQuality
web_production: 1656
Neural document model for detecting unexpected trash content.
|
RequestWithoutVerbsTitleBm15K01
web_production: 1713
The original request with verbs removed. Computed over the document title. Word-weight aggregation algorithm: BM15. Normalization coefficient 0.1.
|
RequestWithoutVerbsFieldSetUTBm15FLogW0K00001
web_production: 1714
The original request with verbs removed. Computed over a composite stream consisting of the tokenized URL and the document title. Word-weight aggregation algorithm: BM15FLOGW0. Normalization coefficient 0.0001.
|
RequestWithoutVerbsSumWBodyMinWindowSize
web_production: 1715
The original request with verbs removed. Computed over the document body. The minimum window size that contains all the words of the request, normalized by the number of words in the request.
|
DssmPantherTerms
web_production: 1773
|
NeuroTextModelLongClickPredictorByWordAndBigramCountersWithSSHards
web_production: 1845
The output of a neural model trained to distinguish long clicks from other events. The model input is per-word and bigram counters computed over the text streams (Title, Body, URL).
|
QfufFilteredByXfOneSeAllMaxFFieldSet2Bm15FLogK0001
web_production: 1847
Linguistic boosting factor. Type of extensions: QfufFilteredByXfOneSe (QFUF, filtered by the XfOneSe DSSM model). Aggregation over all extensions: the maximum value of the factor. Computed over a weighted union of the streams Url, Title, Body, CorrectedCtr, LongClick, OneClick, BrowserPageRank, SplitDwellTime, SamplePeriodDayFrc, SimpleClick, YabarVisits, YabarTime. Word-weight aggregation algorithm: BM15FLOG (BM15 aggregation of the logarithms of word weights). Normalization coefficient 0.001.
|
QfufFilteredByXfOneSeAllMaxFFieldSet3BclmWeightedFLogW0K0001
web_production: 1848
Linguistic boosting factor. Type of extensions: QfufFilteredByXfOneSe (QFUF, filtered by the XfOneSe DSSM model). Aggregation over all extensions: the maximum value of the factor. Computed over a weighted union of the streams Title, Body, LongClick, LongClickSP, OneClick. Word-weight aggregation algorithm: BCLMWEIGHTEDFLOGW0. Normalization coefficient 0.001.
|
QfufFilteredByXfOneSeAllMaxFFieldSetUTBm15FLogW0K00001
web_production: 1849
Linguistic boosting factor. Type of extensions: QfufFilteredByXfOneSe (QFUF, filtered by the XfOneSe DSSM model). Aggregation over all extensions: the maximum value of the factor. Computed over a composite stream consisting of the tokenized URL and the document title. Word-weight aggregation algorithm: BM15FLOGW0. Normalization coefficient 0.0001.
|
QfufFilteredByXfOneSeAllMaxFTitleBm15K01
web_production: 1850
Linguistic boosting factor. Type of extensions: QfufFilteredByXfOneSe (QFUF, filtered by the XfOneSe DSSM model). Aggregation over all extensions: the maximum value of the factor. Computed over the document title. Word-weight aggregation algorithm: BM15. Normalization coefficient 0.1.
|
QfufFilteredByXfOneSeTopSumWFSumWFieldSet2Bm15FLogK0001
web_production: 1851
Linguistic boosting factor. Type of extensions: QfufFilteredByXfOneSe (QFUF, filtered by the XfOneSe DSSM model). Aggregation over the top 10 extensions (by factor value): the weighted sum of factor values, normalized by the total weight of the extensions. Computed over a weighted union of the streams Url, Title, Body, CorrectedCtr, LongClick, OneClick, BrowserPageRank, SplitDwellTime, SamplePeriodDayFrc, SimpleClick, YabarVisits, YabarTime. Word-weight aggregation algorithm: BM15FLOG (BM15 aggregation of the logarithms of word weights). Normalization coefficient 0.001.
|
QfufFilteredByXfOneSeTopSumWFSumWBodyMinWindowSize
web_production: 1852
Linguistic boosting factor. Type of extensions: QfufFilteredByXfOneSe (QFUF, filtered by the XfOneSe DSSM model). Aggregation over the top 10 extensions (by factor value): the weighted sum of factor values, normalized by the total weight of the extensions. Computed over the document body. The minimum window size that contains all the words of the request, normalized by the number of words in the request.
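The top-10 aggregation described here (TopSumWFSumW in the factor name) might be sketched as follows. The input is a list of hypothetical `(weight, factor_value)` pairs, one per extension. Whether the normalizing weight covers only the selected top extensions or all of them is not stated; this sketch assumes the top extensions only.

```python
def top_sum_wf_over_sum_w(extensions, top_n=10):
    """
    extensions: list of (weight, factor_value) pairs, one per extension.
    Keeps the top_n extensions by factor value, then returns
    sum(w * f) / sum(w) over that subset.
    """
    top = sorted(extensions, key=lambda wf: wf[1], reverse=True)[:top_n]
    total_w = sum(w for w, _ in top)
    if total_w == 0:
        return 0.0  # assumption: no usable extensions yields 0
    return sum(w * f for w, f in top) / total_w
```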
|
OriginalRequestWordsFilteredByDssmSSHardFieldSet1Bm15FLogK0001
web_production: 1853
Factor for the filtered original request: for each word, the DSSM distance between the request without that word and the original request is computed, and words are then cut off by a threshold. Computed over a weighted union of the streams Url, Title, Body, Links, CorrectedCtr, LongClick, OneClick, BrowserPageRank, SplitDwellTime, SamplePeriodDayFrc, SimpleClick, YabarVisits, YabarTime. Word-weight aggregation algorithm: BM15FLOG (BM15 aggregation of the logarithms of word weights). Normalization coefficient 0.001.
|
OriginalRequestWordsFilteredByDssmSSHardFieldSetUTBm15FLogW0K00001
web_production: 1854
Factor for the filtered original request: for each word, the DSSM distance between the request without that word and the original request is computed, and words are then cut off by a threshold. Computed over a composite stream consisting of the tokenized URL and the document title. Word-weight aggregation algorithm: BM15FLOGW0. Normalization coefficient 0.0001.
|
DssmCtrEngSsHard
web_production: 1855
DSSM model trained on cross language CTRs using serp similarity hard miner.
|
FractionOfPresentedInTitleWordsWithWeightsByDssmSSHardModel
web_production: 1857
For each word of the request, a weight is computed by the Query-Mutation method (the distance between the request with the word and the request without it). The sum of the weights of the words found in the title is divided by the sum of the weights of all words.
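Given per-word weights, the ratio described above is straightforward. In this sketch the weights are passed in as a plain dictionary; how the Query-Mutation distances are produced is outside its scope, and the function name and exact-string title matching are assumptions.

```python
def fraction_present_in_title(word_weights, title_words):
    """Sum of weights of request words found in the title / sum of all weights."""
    total = sum(word_weights.values())
    if total == 0:
        return 0.0
    title = set(title_words)
    present = sum(w for word, w in word_weights.items() if word in title)
    return present / total
```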
|
MaxWeightOfAbsentInTitleWordsWithWeightsByDssmSSHardModel
web_production: 1858
For each word of the query, a weight is computed by the query-mutation method (the distance between the query with the word and the query without it). The factor is the maximum weight among the words that are absent from the document title.
|
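The two title factors above combine per-word weights in a simple way. The following is a minimal sketch with a hypothetical function name. The weights are passed in as plain numbers, since the DSSM query-mutation model that produces them is not described here, and the value returned for the empty cases is an assumption.

```python
def title_word_weight_factors(query_weights, title_words):
    """query_weights: dict mapping each query word to its weight.
    title_words: iterable of words found in the document title.

    Returns (fraction_present, max_absent):
      fraction_present = sum of weights of words found in the title
                         divided by the sum of all weights;
      max_absent       = largest weight among words missing from the title.
    """
    title = set(title_words)
    total = sum(query_weights.values())
    present = sum(w for word, w in query_weights.items() if word in title)
    absent = [w for word, w in query_weights.items() if word not in title]
    # Assumption: 0.0 when there is nothing to divide by or nothing absent.
    fraction_present = present / total if total > 0 else 0.0
    max_absent = max(absent, default=0.0)
    return fraction_present, max_absent
```

For a query where the only missing word carries little weight, the first value stays close to 1 and the second stays small.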
NeuroTextModelLongClickPredictorByWordAndBigramCountersWithoutTitleWithSSHards
web_production: 1859
Output of a neural model trained to distinguish long clicks from other events. The model input is word and bigram counters computed over text streams (Body, URL).
|
DaterAddTime80Hours
web_production: 1861
Computed as (80 - x), where x is the age of the document in hours (continuous). Uses the Robotaddtime date data.
|
DaterAddTime10Days
web_production: 1862
Computed as (10 - x), where x is the age of the document in days (continuous). Uses the Robotaddtime date data.
|
DaterAge10Days
web_production: 1863
The difference between the current date and the document date as determined by Robotaddtime: 1 means the date equals the current date; 0 means the document is 10 days old or older, or the date is not determined.
|
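The three dater factors above can be written out as small functions. This is a sketch only: the function names are hypothetical, the first two apply the stated formulas literally with no clamping, and the linear falloff between the two stated endpoints of DaterAge10Days is an assumption, since the description gives only the values at 0 and at 10 or more days.

```python
def dater_add_time_80_hours(age_hours):
    """(80 - x), x being the continuous document age in hours."""
    return 80.0 - age_hours


def dater_add_time_10_days(age_days):
    """(10 - x), x being the continuous document age in days."""
    return 10.0 - age_days


def dater_age_10_days(age_days):
    """1.0 for a document dated today, 0.0 for 10 days or older or when
    the date is unknown (None).
    Assumption: values in between fall off linearly."""
    if age_days is None or age_days >= 10:
        return 0.0
    return 1.0 - max(age_days, 0.0) / 10.0
```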
XfOneSeKnnAllMaxWFMaxWFieldSet1Bm15FLogK0001
web_production: 1864
Linguistic boosting factor. Extension type: XfOneSeKnn (nearest neighbours according to a DSSM model trained to predict XfDtShow extensions). Aggregated over all extensions. Maximum weighted factor value, normalized by the maximum extension weight. Computed over a weighted union of the streams Url, Title, Body, Links, Correctedctr, LongClick, OneClick, Browserpagerank, Splitdwelltime, SampleperiodDayFrc, SimpleClick, Yabarvisits, Yabartime. Word-weight aggregation algorithm: Bm15FLog (BM15 aggregation of the logarithm of word occurrence counts). Normalization coefficient 0.001.
|
QueryToTextByXfOneSeKnnTopSumWFSumWBodyMinWindowSize
web_production: 1866
Linguistic boosting factor. Extension type: QueryToTextByXfOneSeKnn (QueryToText extensions of XfOneSeKnn extensions). Aggregated over the top 10 extensions (by factor value). Weighted sum of the factor values, normalized by the total weight of the extensions. Computed over the document body. The minimum window size that contains all the words of the query, normalized by the number of words in the query.
|
QueryToTextByXfOneSeKnnAllSumWFSumWFieldSet3BclmWeightedFLogW0K0001
web_production: 1867
Linguistic boosting factor. Extension type: QueryToTextByXfOneSeKnn (QueryToText extensions of XfOneSeKnn extensions). Aggregated over all extensions. Weighted sum of the factor values, normalized by the total weight of the extensions. Computed over a weighted union of the streams Title, Body, LongClick, LongClicksp, OneClick. Word-weight aggregation algorithm: BclmWeightedFLogW0. Normalization coefficient 0.001.
|
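The linguistic boosting factors above share one pattern: a base factor is computed for each query extension, then the per-extension values are combined using the extension weights. Two of the described combinations might look like the sketch below. The function names and the `(weight, factor_value)` pair layout are hypothetical, and taking the "total weight" over the selected top extensions only is an assumption.

```python
def top_sum_wf_sum_w(extensions, top_n=10):
    """Weighted sum of factor values over the top_n extensions (ranked by
    factor value), normalized by the total weight of those extensions.

    extensions: list of (weight, factor_value) pairs.
    """
    top = sorted(extensions, key=lambda e: e[1], reverse=True)[:top_n]
    total_weight = sum(w for w, _ in top)
    if total_weight == 0:
        return 0.0
    return sum(w * f for w, f in top) / total_weight


def all_max_wf_max_w(extensions):
    """Largest weighted factor value over all extensions, normalized by
    the maximum extension weight."""
    if not extensions:
        return 0.0
    max_weight = max(w for w, _ in extensions)
    if max_weight == 0:
        return 0.0
    return max(w * f for w, f in extensions) / max_weight
```

Both normalizations keep the result on the scale of the base factor, so a single heavy extension cannot push the value above the best per-extension value.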
ReformulationsLongestClickLogDt
web_production: 1885
DSSM model that predicts the logarithm of the longest click on the SERP. Negative examples are URLs from earlier queries by the same user, with at most 7 minutes between queries (super-hard negatives for reformulations).
|
ReformulationsLongestClickLogDtEarlyBindingDssm
web_production: 1892
DSSM model with early binding, trained on reformulations, which predicts the logarithm of the longest click on the SERP.
|
HitContextsDssm
web_production: 1896
Neural network value for contexts of query hits in document text. Predicts relevance-all-8-years. Uses formula ussr-dump-20190719 prs-20190720 all-8-years [t > 0.25] CrossEntropy 20k 0.25 -S 0.8 -Z 1 predictions for learning.
|
DssmReformulationsWithExtensions
web_production: 1898
DSSM model trained on the reformulations pool. On the query side it receives, in addition to the query itself, the 4 XfDt extensions with the largest weights.
|
DssmFomula8YearsCe25Prediction
web_production: 1906
A model trained to predict the value of the formula ussr-dump-20190719 prs-20190720 all-8-years [t > 0.25] CrossEntropy 20k 0.25 -S 0.8 -Z 1.
|
UnexpectedTrashUrlQualityFresh
web_production: 1909
Neural document model for detecting unexpected shock ("trash") content (for ex -)
|
DssmFomula8YearsCe25PredictionRatings
web_production: 1912
A model trained to predict the value of the formula ussr-dump-20190719 prs-20190720 all-8-years [t > 0.25] CrossEntropy 20k 0.25 -S 0.8 -Z 1, and then additionally trained on relevance judgments.
|
QueryInDirectOfferMax
alice_direct_scenario: 0
Max percent of query words in DirectOffer
|
QueryInDirectOfferMean
alice_direct_scenario: 1
Mean percent of query words in DirectOffer
|
QueryInDirectOfferMin
alice_direct_scenario: 2
Min percent of query words in DirectOffer
|
DirectOfferInQueryMax
alice_direct_scenario: 3
Max percent of DirectOffer words in query
|
DirectOfferInQueryMean
alice_direct_scenario: 4
Mean percent of DirectOffer words in query
|
DirectOfferInQueryMin
alice_direct_scenario: 5
Min percent of DirectOffer words in query
|
QueryInDirectOfferPrefixMax
alice_direct_scenario: 6
Max percent of query words in DirectOffer prefix (query length)
|
QueryInDirectOfferPrefixMean
alice_direct_scenario: 7
Mean percent of query words in DirectOffer prefix (query length)
|
QueryInDirectOfferPrefixMin
alice_direct_scenario: 8
Min percent of query words in DirectOffer prefix (query length)
|
QueryInDirectOfferDoublePrefixMax
alice_direct_scenario: 9
Max percent of query words in DirectOffer prefix (2X query length)
|
QueryInDirectOfferDoublePrefixMean
alice_direct_scenario: 10
Mean percent of query words in DirectOffer prefix (2X query length)
|
QueryInDirectOfferDoublePrefixMin
alice_direct_scenario: 11
Min percent of query words in DirectOffer prefix (2X query length)
|
QueryInDirectTitleMax
alice_direct_scenario: 12
Max percent of query words in DirectTitle
|
QueryInDirectTitleMean
alice_direct_scenario: 13
Mean percent of query words in DirectTitle
|
QueryInDirectTitleMin
alice_direct_scenario: 14
Min percent of query words in DirectTitle
|
DirectTitleInQueryMax
alice_direct_scenario: 15
Max percent of DirectTitle words in query
|
DirectTitleInQueryMean
alice_direct_scenario: 16
Mean percent of DirectTitle words in query
|
DirectTitleInQueryMin
alice_direct_scenario: 17
Min percent of DirectTitle words in query
|
QueryInDirectTitlePrefixMax
alice_direct_scenario: 18
Max percent of query words in DirectTitle prefix (query length)
|
QueryInDirectTitlePrefixMean
alice_direct_scenario: 19
Mean percent of query words in DirectTitle prefix (query length)
|
QueryInDirectTitlePrefixMin
alice_direct_scenario: 20
Min percent of query words in DirectTitle prefix (query length)
|
QueryInDirectTitleDoublePrefixMax
alice_direct_scenario: 21
Max percent of query words in DirectTitle prefix (2X query length)
|
QueryInDirectTitleDoublePrefixMean
alice_direct_scenario: 22
Mean percent of query words in DirectTitle prefix (2X query length)
|
QueryInDirectTitleDoublePrefixMin
alice_direct_scenario: 23
Min percent of query words in DirectTitle prefix (2X query length)
|
QueryInDirectInfoMax
alice_direct_scenario: 24
Max percent of query words in DirectInfo
|
QueryInDirectInfoMean
alice_direct_scenario: 25
Mean percent of query words in DirectInfo
|
QueryInDirectInfoMin
alice_direct_scenario: 26
Min percent of query words in DirectInfo
|
DirectInfoInQueryMax
alice_direct_scenario: 27
Max percent of DirectInfo words in query
|
DirectInfoInQueryMean
alice_direct_scenario: 28
Mean percent of DirectInfo words in query
|
DirectInfoInQueryMin
alice_direct_scenario: 29
Min percent of DirectInfo words in query
|
QueryInDirectInfoPrefixMax
alice_direct_scenario: 30
Max percent of query words in DirectInfo prefix (query length)
|
QueryInDirectInfoPrefixMean
alice_direct_scenario: 31
Mean percent of query words in DirectInfo prefix (query length)
|
QueryInDirectInfoPrefixMin
alice_direct_scenario: 32
Min percent of query words in DirectInfo prefix (query length)
|
QueryInDirectInfoDoublePrefixMax
alice_direct_scenario: 33
Max percent of query words in DirectInfo prefix (2X query length)
|
QueryInDirectInfoDoublePrefixMean
alice_direct_scenario: 34
Mean percent of query words in DirectInfo prefix (2X query length)
|
QueryInDirectInfoDoublePrefixMin
alice_direct_scenario: 35
Min percent of query words in DirectInfo prefix (2X query length)
|
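The overlap factors in this group all follow one recipe: a word-share is computed against each text, then reduced to Max, Mean and Min. A rough sketch follows, with several assumptions: the names are hypothetical, words are compared by exact string equality, "percent" is returned as a fraction in [0, 1], and a "prefix (query length)" is taken to mean the first N words of the text, where N is the number of query words (2N for the double prefix).

```python
def share_in(words, container):
    """Fraction of `words` that also occur in `container`."""
    if not words:
        return 0.0
    pool = set(container)
    return sum(1 for w in words if w in pool) / len(words)


def overlap_features(query, texts):
    """query: list of words; texts: list of word lists (e.g. several offers).

    Returns a dict with Max/Mean/Min of each overlap share across texts.
    """
    n = len(query)
    per_text = {
        "QueryInText": [share_in(query, t) for t in texts],
        "TextInQuery": [share_in(t, query) for t in texts],
        # Prefix of the text as long as the query, and twice as long.
        "QueryInTextPrefix": [share_in(query, t[:n]) for t in texts],
        "QueryInTextDoublePrefix": [share_in(query, t[:2 * n]) for t in texts],
    }
    features = {}
    for name, values in per_text.items():
        if values:
            features[name + "Max"] = max(values)
            features[name + "Mean"] = sum(values) / len(values)
            features[name + "Min"] = min(values)
    return features
```

The same helper would apply to the title, info, headline and item-name variants by swapping the text field that is passed in.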
QueryInResultTrackNameRatio
alice_music_scenario: 1
Percent of query words in result track name
|
ResultTrackNameInQueryRatio
alice_music_scenario: 2
Percent of result track name words in query
|
QueryInResultAlbumNameRatio
alice_music_scenario: 3
Percent of query words in result album name
|
ResultAlbumNameInQueryRatio
alice_music_scenario: 4
Percent of result album name words in query
|
QueryInResultArtistNameRatio
alice_music_scenario: 5
Percent of query words in result artist name
|
ResultArtistNameInQueryRatio
alice_music_scenario: 6
Percent of result artist name words in query
|
QueryInWizardTitleRatio
alice_music_scenario: 7
Percent of query words in wizard title
|
WizardTitleInQueryRatio
alice_music_scenario: 8
Percent of wizard title words in query
|
QueryInWizardTrackNameRatio
alice_music_scenario: 9
Percent of query words in wizard track name
|
WizardTrackNameInQueryRatio
alice_music_scenario: 10
Percent of wizard track name words in query
|
QueryInWizardAlbumNameRatio
alice_music_scenario: 11
Percent of query words in wizard album name
|
WizardAlbumNameInQueryRatio
alice_music_scenario: 12
Percent of wizard album name words in query
|
QueryInWizardArtistNameRatio
alice_music_scenario: 13
Percent of query words in wizard artist name
|
WizardArtistNameInQueryRatio
alice_music_scenario: 14
Percent of wizard artist name words in query
|
QueryInWizardTrackLyricsRatio
alice_music_scenario: 15
Percent of query words in wizard track lyrics
|
WizardTrackLyricsInQueryRatio
alice_music_scenario: 16
Percent of wizard track lyrics words in query
|
QueryInDocumentsTitleRatioMin
alice_music_scenario: 17
Min percent of query words in documents title
|
QueryInDocumentsTitleRatioMean
alice_music_scenario: 18
Mean percent of query words in documents title
|
QueryInDocumentsTitleRatioMax
alice_music_scenario: 19
Max percent of query words in documents title
|
DocumentsTitleInQueryRatioMin
alice_music_scenario: 20
Min percent of documents title words in query
|
DocumentsTitleInQueryRatioMean
alice_music_scenario: 21
Mean percent of documents title words in query
|
DocumentsTitleInQueryRatioMax
alice_music_scenario: 22
Max percent of documents title words in query
|
QueryInDocumentsSnippetRatioMin
alice_music_scenario: 23
Min percent of query words in documents snippet
|
QueryInDocumentsSnippetRatioMean
alice_music_scenario: 24
Mean percent of query words in documents snippet
|
QueryInDocumentsSnippetRatioMax
alice_music_scenario: 25
Max percent of query words in documents snippet
|
DocumentsSnippetInQueryRatioMin
alice_music_scenario: 26
Min percent of documents snippet words in query
|
DocumentsSnippetInQueryRatioMean
alice_music_scenario: 27
Mean percent of documents snippet words in query
|
DocumentsSnippetInQueryRatioMax
alice_music_scenario: 28
Max percent of documents snippet words in query
|
QueryInTitleMean
alice_search_scenario: 20
Mean percent of query words in title
|
QueryInTitleMin
alice_search_scenario: 21
Min percent of query words in title
|
TitleInQueryMax
alice_search_scenario: 22
Max percent of title words in query
|
TitleInQueryMean
alice_search_scenario: 23
Mean percent of title words in query
|
TitleInQueryMin
alice_search_scenario: 24
Min percent of title words in query
|
PrefixMax
alice_search_scenario: 25
Max percent of query words in title prefix (query length)
|
PrefixMean
alice_search_scenario: 26
Mean percent of query words in title prefix (query length)
|
PrefixMin
alice_search_scenario: 27
Min percent of query words in title prefix (query length)
|
DoublePrefixMax
alice_search_scenario: 28
Max percent of query words in title prefix (2X query length)
|
DoublePrefixMean
alice_search_scenario: 29
Mean percent of query words in title prefix (2X query length)
|
DoublePrefixMin
alice_search_scenario: 30
Min percent of query words in title prefix (2X query length)
|
QueryInHeadlineMax
alice_search_scenario: 31
Max percent of query words in headline
|
QueryInHeadlineMean
alice_search_scenario: 32
Mean percent of query words in headline
|
QueryInHeadlineMin
alice_search_scenario: 33
Min percent of query words in headline
|
HeadlineInQueryMax
alice_search_scenario: 34
Max percent of headline words in query
|
HeadlineInQueryMean
alice_search_scenario: 35
Mean percent of headline words in query
|
HeadlineInQueryMin
alice_search_scenario: 36
Min percent of headline words in query
|
HeadlinePrefixMax
alice_search_scenario: 37
Max percent of query words in headline prefix (query length)
|
HeadlinePrefixMean
alice_search_scenario: 38
Mean percent of query words in headline prefix (query length)
|
HeadlinePrefixMin
alice_search_scenario: 39
Min percent of query words in headline prefix (query length)
|
DoubleHeadlinePrefixMax
alice_search_scenario: 40
Max percent of query words in headline prefix (2X query length)
|
DoubleHeadlinePrefixMean
alice_search_scenario: 41
Mean percent of query words in headline prefix (2X query length)
|
DoubleHeadlinePrefixMin
alice_search_scenario: 42
Min percent of query words in headline prefix (2X query length)
|
ItemSelectorConfidence
alice_video_scenario: 14
Confidence of item selector in current gallery
|
QueryInItemNameMax
alice_video_scenario: 15
Max percent of query words in ItemName
|
QueryInItemNameMean
alice_video_scenario: 16
Mean percent of query words in ItemName
|
QueryInItemNameMin
alice_video_scenario: 17
Min percent of query words in ItemName
|
ItemNameInQueryMax
alice_video_scenario: 18
Max percent of ItemName words in query
|
ItemNameInQueryMean
alice_video_scenario: 19
Mean percent of ItemName words in query
|
ItemNameInQueryMin
alice_video_scenario: 20
Min percent of ItemName words in query
|
QueryInItemNamePrefixMax
alice_video_scenario: 21
Max percent of query words in ItemName prefix (query length)
|
QueryInItemNamePrefixMean
alice_video_scenario: 22
Mean percent of query words in ItemName prefix (query length)
|
QueryInItemNamePrefixMin
alice_video_scenario: 23
Min percent of query words in ItemName prefix (query length)
|
QueryInItemNameDoublePrefixMax
alice_video_scenario: 24
Max percent of query words in ItemName prefix (2X query length)
|
QueryInItemNameDoublePrefixMean
alice_video_scenario: 25
Mean percent of query words in ItemName prefix (2X query length)
|
QueryInItemNameDoublePrefixMin
alice_video_scenario: 26
Min percent of query words in ItemName prefix (2X query length)
|
QueryInItemDescriptionMax
alice_video_scenario: 27
Max percent of query words in ItemDescription
|
QueryInItemDescriptionMean
alice_video_scenario: 28
Mean percent of query words in ItemDescription
|
QueryInItemDescriptionMin
alice_video_scenario: 29
Min percent of query words in ItemDescription
|
ItemDescriptionInQueryMax
alice_video_scenario: 30
Max percent of ItemDescription words in query
|
ItemDescriptionInQueryMean
alice_video_scenario: 31
Mean percent of ItemDescription words in query
|
ItemDescriptionInQueryMin
alice_video_scenario: 32
Min percent of ItemDescription words in query
|
QueryInItemDescriptionPrefixMax
alice_video_scenario: 33
Max percent of query words in ItemDescription prefix (query length)
|
QueryInItemDescriptionPrefixMean
alice_video_scenario: 34
Mean percent of query words in ItemDescription prefix (query length)
|
QueryInItemDescriptionPrefixMin
alice_video_scenario: 35
Min percent of query words in ItemDescription prefix (query length)
|
QueryInItemDescriptionDoublePrefixMax
alice_video_scenario: 36
Max percent of query words in ItemDescription prefix (2X query length)
|
QueryInItemDescriptionDoublePrefixMean
alice_video_scenario: 37
Mean percent of query words in ItemDescription prefix (2X query length)
|
QueryInItemDescriptionDoublePrefixMin
alice_video_scenario: 38
Min percent of query words in ItemDescription prefix (2X query length)
|
ItemSelectorConfidenceByName
alice_video_scenario: 40
Confidence of item selector by name in current gallery
|
ItemSelectorConfidenceByNumber
alice_video_scenario: 41
Confidence of item selector by number in current gallery
|
AbsolutePLM
collections_production: 3
|
Bclm
collections_production: 4
|
TxtBm25Sy
collections_production: 5
|
DocLen
collections_production: 6
|
Bclm2
collections_production: 7
|
Tocm
collections_production: 8
|
TitleTrigramsTitle
collections_production: 9
|
TextBM25_Fm_W1
collections_production: 10
|
TxtBm25Ex
collections_production: 11
|
TextBM25
collections_production: 12
|
TextBM25_Sy_W1
collections_production: 13
|
TxtHeadSy
collections_production: 14
|
YmwFull2
collections_production: 15
|
TxtHeadEx
collections_production: 16
|
TxtHead
collections_production: 17
|
TxtBreakSy
collections_production: 18
|
TxtBreakEx
collections_production: 19
|
RussianSrcOwnersShare
images_cbir: 8
Fraction of hosts with Russian language
|
GruesomeCombined
images_l1: 93
Output of the aggregated gruesome-content classifier; used at the middle search stage to detect gruesome queries
|
RussianSrcOwnersShare
images_market_l4: 366
Fraction of hosts with Russian language
|
RussianSrcOwnersShare
images_market: 372
Fraction of hosts with Russian language
|
VwChildPorn
images_new_runtime_doc_features: 1
Value of the DP classifier; used for filtering at the middle search stage
|
VwDwellTime
images_new_runtime_doc_features: 23
The result of the text classifier of long views via VowPal Wabbit
|
GruesomeCombined
images_new_runtime_doc_features: 28
Output of the aggregated gruesome-content classifier; used at the middle search stage to detect gruesome queries
|
DocIdfSumFixed
images_new_runtime_doc_features: 49
Previous factors - fixed
|
VwSuggestive
images_new_runtime_doc_features: 50
The result of the Suggestive text classifier via VowPal Wabbit
|
VwPorno2
images_new_runtime_doc_features: 59
The result of the porn text classifier via Vowpal Wabbit
|
VwGruesome2
images_new_runtime_doc_features: 60
The result of the gruesome-content text classifier via Vowpal Wabbit
|
ImagePorno4
images_new_runtime_doc_features: 85
Image porn classifier output
|
TurkishSrcOwnersShare
images_new_runtime_doc_features: 86
Fraction of hosts with Turkish language
|
RussianSrcOwnersShare
images_new_runtime_doc_features: 88
Fraction of hosts with Russian language
|
VwChildPorn
images_production: 1
Value of the DP classifier; used for filtering at the middle search stage
|
ImageLangsFound
images_production: 12
|
ImageNearbyTextBm15MaxK3MaxMeta
images_production: 31
Feature from utracker
|
ImageOwnersWithAllWords
images_production: 35
|
ImageOwnersWithHitsShare
images_production: 36
|
ImageOwnersWithAllWordsShare
images_production: 37
|
VwDwellTime
images_production: 42
The result of the text classifier of long views via VowPal Wabbit
|
BFexact
images_production: 47
The exact forms of all the query words are present in the text/links
|
HasAllWordsTRSy
images_production: 56
The document contains all the words of the query (up to synonyms)
|
GruesomeCombined
images_production: 58
Output of the aggregated gruesome-content classifier; used at the middle search stage to detect gruesome queries
|
ImageLinksWithAllWords
images_production: 85
|
LargestSyInexactGroup
images_production: 99
The fraction of the query covered by the longest group consisting of any hits (including word forms and synonyms), possibly with a word skipped, added, or replaced
|
DocIdfSumFixed
images_production: 101
Previous factors - fixed
|
ImageDBM
images_production: 102
BM25-like factor with custom coefficients
|
VwSuggestive
images_production: 104
The result of the Suggestive text classifier via VowPal Wabbit
|
QfufAllMaxFTitleBclmMixPlainKE5
images_production: 111
Linguistic boosting factor. Extension type: QFUF. Aggregated over all extensions. Maximum factor value. Computed over the document title. Word-weight aggregation algorithm: BclmMixPlain, a linear mix of the annotation BCLM weights and the weighted positionless weights of the word; the per-word counters are then aggregated through BM15. Normalization coefficient 10^(-5).
|
ImageMaxWordsPairShare
images_production: 126
|
ImageMinLemmaWordsShare
images_production: 127
|
ImageMaxLemmaAndSynonymWordsShare
images_production: 128
|
ImageMaxWordsPairSumExpWeight
images_production: 131
|
MaxLeftAndRightBigramsWeight
images_production: 132
|
ImageDBM25
images_production: 133
|
ImageIBocm
images_production: 153
|
NumWordsTRSy
images_production: 164
The percentage of the query words found in the document (up to synonyms)
|
ImagePhraseLRSyn
images_production: 165
|
VwPorno2
images_production: 176
The result of the porn text classifier via Vowpal Wabbit
|
VwGruesome2
images_production: 177
The result of the gruesome-content text classifier via Vowpal Wabbit
|
HasAllWordsTRSyAvgMeta
images_production: 186
The document contains all the words of the query (up to synonyms) - top average
|
TextWord1TFRatioAvgMeta
images_production: 187
Ratio of word_1 hits to all hits - top average
|
TextWord2TFRatioAvgMeta
images_production: 188
Ratio of word_2 hits to all hits - top average
|
ImageIBocmDouble
images_production: 206
|
TextWord3TFRatioAvgMeta
images_production: 216
Ratio of word_3 hits to all hits - top avg
|
WordPairsWeight
images_production: 254
|
TextWord1PosShiftAvgMeta
images_production: 260
Distance from average position of word 1 to position 1 - top avg
|
TextWord1PosShift
images_production: 306
Distance from average position of word 1 to position 1
|
TextWord2PosShift
images_production: 307
Distance from average position of word 2 to position 2
|
TextMinWordPosShift
images_production: 308
Min of distances from the average position of word_i to position i
|
TextMaxWeightedWordPosVariance
images_production: 309
Max weighted variance of position of word_i for all i
|
TextWord1ExactRatio
images_production: 310
Ratio of EQUAL_BY_STRING hits of word_1 to all hits of word_1
|
TextWord2ExactRatio
images_production: 311
Ratio of EQUAL_BY_STRING hits of word_2 to all hits of word_2
|
TextWord3ExactRatio
images_production: 312
Ratio of EQUAL_BY_STRING hits of word_3 to all hits of word_3
|
TextGlobalExactRatio
images_production: 313
Ratio of EQUAL_BY_STRING hits to all hits for all words
|
TextWord1TFRatio
images_production: 314
Ratio of word_1 hits to all hits
|
TextWord2TFRatio
images_production: 315
Ratio of word_2 hits to all hits
|
TextWord3TFRatio
images_production: 316
Ratio of word_3 hits to all hits
|
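The exact-match and term-frequency ratios above can be illustrated over a flat list of hits. This is a sketch under stated assumptions: each hit is modelled as a `(word_index, match_type)` pair, the function names are hypothetical, and `EQUAL_BY_STRING` is the only match-type label taken from the descriptions (any other label in the usage is made up for the example).

```python
EXACT = "EQUAL_BY_STRING"


def exact_ratio(hits, word_index=None):
    """Share of EQUAL_BY_STRING hits among the hits of one query word,
    or among all hits when word_index is None (the global variant)."""
    selected = [m for i, m in hits if word_index is None or i == word_index]
    if not selected:
        return 0.0
    return sum(1 for m in selected if m == EXACT) / len(selected)


def tf_ratio(hits, word_index):
    """Share of all hits that belong to the given query word."""
    if not hits:
        return 0.0
    return sum(1 for i, _ in hits if i == word_index) / len(hits)
```

For example, with hits `[(1, "EQUAL_BY_STRING"), (1, "OTHER"), (2, "EQUAL_BY_STRING"), (2, "EQUAL_BY_STRING")]`, word 1 is matched exactly in half of its hits while word 2 is always matched exactly.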
TextAvgDistForTwoImportantStr
images_production: 317
Average distance between EQUAL_BY_STRING hits of two max-weighted words in one break
|
TextMaxPrefixLenStr
images_production: 318
Max prefix length (in words) of an exact (by position and form) match
|
TextAvgDist12Syn
images_production: 319
Average distance between word_1 and word_2 in a break
|
TextMaxAvgDistSyn
images_production: 320
Max average distance between word_i and word_i+1 in a break
|
TextMaxPrefixLenSyn
images_production: 321
Max prefix length (in words) of exact (by position) matches
|
ImagePorno4
images_production: 338
Image porn classifier output
|
TurkishSrcOwnersShare
images_production: 339
Fraction of hosts with Turkish language
|
RussianSrcOwnersShare
images_production: 341
Fraction of hosts with Russian language
|
ImageAltTitleBocm
images_production: 361
|
ImageAltTitleBocmDouble
images_production: 362
|
ImageNonAltTitleBocm
images_production: 363
|
ImageNonAltTitleBocmDouble
images_production: 364
|
ImageAltTitleValueWcmAvg
images_production: 366
Feature from utracker
|
ImageAltTitleValueWcmPrediction
images_production: 367
Feature from utracker
|
ImageAltTitleBm15MaxK3
images_production: 368
Feature from utracker
|
ImageAltTitleBclmPlainW1K3
images_production: 369
Feature from utracker
|
ImageAltTitleBclmWeightedK3
images_production: 370
Feature from utracker
|
ImageAltTitleBocmWeightedW1K3
images_production: 371
Feature from utracker
|
ImageAltTitleBocmWeightedMaxK1
images_production: 372
Feature from utracker
|
ImageAltTitleBm15CoverageK3
images_production: 373
Feature from utracker
|
ImageAltTitleBclmWeightedV2K3
images_production: 374
Feature from utracker
|
ImageAltTitleBocmDoubleK5
images_production: 375
Feature from utracker
|
ImageNearbyTextValueWcmAvg
images_production: 376
Feature from utracker
|
ImageNearbyTextValueWcmPrediction
images_production: 377
Feature from utracker
|
ImageNearbyTextBm15MaxK3
images_production: 378
Feature from utracker
|
ImageNearbyTextBclmPlainW1K3
images_production: 379
Feature from utracker
|
ImageNearbyTextBclmWeightedK3
images_production: 380
Feature from utracker
|
ImageNearbyTextBocmWeightedW1K3
images_production: 381
Feature from utracker
|
ImageNearbyTextBocmWeightedMaxK1
images_production: 382
Feature from utracker
|
ImageNearbyTextBocmDoubleK5
images_production: 383
Feature from utracker
|
ImageTextAnnotationMatchPredictionWeighted
images_production: 384
Feature from utracker
|
ImageTextValueWcmAvg
images_production: 385
Feature from utracker
|
ImageTextValueWcmPrediction
images_production: 386
Feature from utracker
|
ImageTextBm15StrictK2
images_production: 387
Feature from utracker
|
ImageTextBm15MaxK3
images_production: 388
Feature from utracker
|
ImageTextBclmPlainW1K3
images_production: 389
Feature from utracker
|
ImageTextBclmWeightedK3
images_production: 390
Feature from utracker
|
ImageTextBocmWeightedW1K3
images_production: 391
Feature from utracker
|
ImageTextBocmWeightedMaxK1
images_production: 392
Feature from utracker
|
ImageTextBm15K9
images_production: 393
Feature from utracker
|
ImageTextBm15CoverageK3
images_production: 394
Feature from utracker
|
ImageTextBm15CoverageV2K3
images_production: 395
Feature from utracker
|
ImageTextBm15CoverageV4K3
images_production: 396
Feature from utracker
|
ImageTextBclmPlainK5
images_production: 397
Feature from utracker
|
ImageTextBclmWeightedV2K3
images_production: 398
Feature from utracker
|
ImageTextBclmMixPlainW1K1
images_production: 399
Feature from utracker
|
ImageTextBocmPlain
images_production: 400
Feature from utracker
|
ImageTextBocmWeightedK5
images_production: 401
Feature from utracker
|
ImageTextBocmWeightedK7
images_production: 402
Feature from utracker
|
ImageTextBocmWeightedK9
images_production: 403
Feature from utracker
|
ImageTextBocmWeightedV4W1K2
images_production: 404
Feature from utracker
|
ImageTextBocmDoubleK5
images_production: 405
Feature from utracker
|
AnnL1BocmPlain
images_production: 406
Bocm plain used in fastrank for all ann hits
|
AnnL1BocmDouble
images_production: 407
Bocm double used in fastrank for all ann hits
|
AnnL1BfExact
images_production: 408
BfExact used in fastrank for all ann hits
|
AnnL1ImageDbm25
images_production: 409
ImageDbm25 used in fastrank for all ann hits
|
AnnL1LargestSyInexactGroup
images_production: 410
LargestSyInexactGroup used in fastrank for all ann hits
|
ImageOwnersWithHitsShareAvgMeta
images_production: 425
|
TextWordWeightSum
images_production: 439
Sum of weights for found words in indexkey/inv
|
XfImgClicksAllMaxFTitleWordCoverageForm
images_production: 468
Linguistic boosting factor. Extension type: XfImgClicks. Aggregated over all extensions. Maximum factor value. Computed over the document title. The degree of coverage of the query words, matched up to word form (without synonyms).
|
XfImgClicksAllMaxWFTitleExactQueryMatchAvgValue
images_production: 469
Linguistic boosting factor. Extension type: XfImgClicks. Aggregated over all extensions. Maximum weighted factor value. Computed over the document title. The average weight of the annotations among those that the query matched exactly.
|
QfufAllMaxWFTitleBclmPlaneProximity1Bm15W0Size1K0001
images_production: 631
Linguistic boosting factor. Extension type: QFUF. Aggregated over all extensions. Maximum weighted factor value. Computed over the document title. Algorithm BclmPlaneProximity1Bm15W0Size1: uses BCLM with free weighting when there are several words; when there is a single word, the sum of hits by match type is used. Normalization coefficient 0.001.
|
QfufTopMinWFSumWTitleBclmMixPlainKE5
images_production: 632
Linguistic boosting factor. Extension type: QFUF. Aggregated over the top 10 extensions (by factor value). Minimum weighted factor value, normalized by the total weight of the extensions. Computed over the document title. Word-weight aggregation algorithm: BclmMixPlain, a linear mix of the annotation BCLM weights and the weighted positionless weights of the word; the per-word counters are then aggregated through BM15. Normalization coefficient 10^(-5).
|
QfufAllMaxWFMaxWTitleExactQueryMatchAvgValue
images_production: 634
Linguistic boosting factor. Extension type: QFUF. Aggregated over all extensions. Maximum weighted factor value, normalized by the maximum extension weight. Computed over the document title. The average weight of the annotations among those that the query matched exactly.
|
XfDtShowAllMaxWFMaxWTitleExactQueryMatchAvgValue
images_production: 657
Linguistic boosting factor. Extension type: XfDtShow. Aggregated over all extensions. Maximum weighted factor value, normalized by the maximum extension weight. Computed over the document title. The average weight of the annotations among those that the query matched exactly.
|
AllMatchedWordWeightsSumText
images_production: 686
The normalized sum of the weights of the query words that occur in the text of the document.
|
StringMatchedWordWeightsSumText
images_production: 688
The normalized sum of the weights of the query words that have EQUAL_BY_STRING matches in the text of the document.
|
StringMatchedWordWeightsSumAnn
images_production: 689
The normalized sum of the weights of the query words that have EQUAL_BY_STRING matches in the annotations of the document.
|
AllMatchedWordFiltrationModelWeightsSumText
images_production: 690
The normalized sum of the FiltrationModel weights of the query words that occur in the text of the document.
|
AllMatchedWordFiltrationModelWeightsSumAnn
images_production: 691
The normalized sum of the FiltrationModel weights of the query words that occur in the annotations of the document.
|
StringMatchedWordFiltrationModelWeightsSumText
images_production: 692
The normalized sum of the FiltrationModel weights of the query words that have EQUAL_BY_STRING matches in the text of the document.
|
StringMatchedWordFiltrationModelWeightsSumAnn
images_production: 693
The normalized sum of the FiltrationModel weights of the query words that have EQUAL_BY_STRING matches in the annotations of the document.
|
VwChildPorn
images_recommendations: 1
Value of the DP classifier; used for filtering at the middle search stage
|
VwDwellTime
images_recommendations: 24
The result of the text classifier of long views via VowPal Wabbit
|
GruesomeCombined
images_recommendations: 29
Output of the aggregated gruesome-content classifier; used at the middle search stage to detect gruesome queries
|
DocIdfSumFixed
images_recommendations: 50
Previous factors - fixed
|
VwSuggestive
images_recommendations: 51
The result of the Suggestive text classifier via VowPal Wabbit
|
VwPorno2
images_recommendations: 70
The result of the porn text classifier via Vowpal Wabbit
|
VwGruesome2
images_recommendations: 71
The result of the gruesome-content text classifier via Vowpal Wabbit
|
ImagePorno4
images_recommendations: 114
Image porn classifier output
|
TurkishSrcOwnersShare
images_recommendations: 115
Fraction of hosts with Turkish language
|
RussianSrcOwnersShare
images_recommendations: 117
Fraction of hosts with Russian language
|
KinopoiskSuggestAllMaxWFMaxWTitleExactQueryMatchAvgValue
kp_text_machine: 0
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregated over all extensions. Aggregation over extensions: maximum weighted factor value, normalized by the maximum extension weight. Computed over the document title. The average weight of the annotations among those that the query matched exactly.
|
KinopoiskSuggestTopMinWFMaxWTitleBclmMixPlainKE5
kp_text_machine: 1
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregation over the top 10 extensions (by factor value). Aggregation type over extensions: the smallest weighted value of the factor, normalized by the maximum extension weight. Computed over the title of the document. The word-weight aggregation algorithm is BclmMixPlain: a linear mixture of the annotation BCLM weights and the weighted positionless weights of the word; the resulting values are then aggregated via BM15. Normalization coefficient 10^(-5).
|
KinopoiskSuggestTopSumW2FSumWTitleExactQueryMatchAvgValue
kp_text_machine: 2
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregation over the top 10 extensions (by factor value). Aggregation type over extensions: the sum of squared extension weights multiplied by the factor value, normalized by the total weight of the extensions. Computed over the title of the document. The average weight of the annotations in which the query was an exact match.
|
KinopoiskSuggestAllMaxWFTitleExactQueryMatchAvgValue
kp_text_machine: 3
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregation over all extensions. Aggregation type over extensions: the greatest weighted value of the factor. Computed over the title of the document. The average weight of the annotations in which the query was an exact match.
|
KinopoiskSuggestAllMaxFTitleAttenV1Bm15K001
kp_text_machine: 4
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregation over all extensions. Aggregation type over extensions: the greatest value of the factor. Computed over the title of the document. The hit weight is multiplied by 1/(1 + the position of the word in the sentence). Word-weight aggregation algorithm: BM15. Normalization coefficient 0.01.
|
KinopoiskSuggestAllMaxFTitleWordCoverageExact
kp_text_machine: 5
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregation over all extensions. Aggregation type over extensions: the greatest value of the factor. Computed over the title of the document. The degree to which the query words are covered in exact form.
|
KinopoiskSuggestTopMinWFTitleWordCoverageForm
kp_text_machine: 6
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregation over the top 10 extensions (by factor value). Aggregation type over extensions: the smallest weighted value of the factor. Computed over the title of the document. The degree to which the query words are covered up to word form (without synonyms).
|
KinopoiskSuggestAllMaxWFSumWTitleExactQueryMatchAvgValue
kp_text_machine: 7
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregation over all extensions. Aggregation type over extensions: the greatest weighted value of the factor, normalized by the total weight of the extensions. Computed over the title of the document. The average weight of the annotations in which the query was an exact match.
|
KinopoiskSuggestAllSumW2FSumWTitleExactQueryMatchAvgValue
kp_text_machine: 8
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregation over all extensions. Aggregation type over extensions: the sum of squared extension weights multiplied by the factor value, normalized by the total weight of the extensions. Computed over the title of the document. The average weight of the annotations in which the query was an exact match.
|
KinopoiskSuggestAllMaxWFMaxWTitleCosineMatchMaxPrediction
kp_text_machine: 9
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregation over all extensions. Aggregation type over extensions: the greatest weighted value of the factor, normalized by the maximum extension weight. Computed over the title of the document. Algorithm: CosineMatchMaxPrediction.
|
KinopoiskSuggestTopMinWFSumWTitleExactQueryMatchAvgValue
kp_text_machine: 10
Linguistic boosting factor. Extension type: KinopoiskSuggest (extensions of the textual orgate to text saddles). Aggregation over the top 10 extensions (by factor value). Aggregation type over extensions: the smallest weighted value of the factor, normalized by the total weight of the extensions. Computed over the title of the document. The average weight of the annotations in which the query was an exact match.
|
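The aggregation types named in the kp_text_machine entries above (greatest weighted value normalized by the maximum weight, smallest weighted value over the top 10, sum of squared weights times the factor value normalized by the total weight) can be sketched as follows. This is only an illustrative reading of the descriptions: the function names, the use of plain lists, and the handling of empty input are assumptions, not the production implementation.

```python
from typing import Sequence


def max_wf_max_w(weights: Sequence[float], values: Sequence[float]) -> float:
    """Greatest weight * value, normalized by the maximum extension weight (assumed reading of MaxWF/MaxW)."""
    max_w = max(weights, default=0.0)
    if max_w <= 0.0:
        return 0.0
    return max(w * v for w, v in zip(weights, values)) / max_w


def top_min_wf(weights: Sequence[float], values: Sequence[float], top: int = 10) -> float:
    """Smallest weight * value among the top extensions ranked by factor value (assumed reading of Top/MinWF)."""
    ranked = sorted(zip(weights, values), key=lambda pair: pair[1], reverse=True)[:top]
    return min((w * v for w, v in ranked), default=0.0)


def sum_w2f_sum_w(weights: Sequence[float], values: Sequence[float]) -> float:
    """Sum of weight^2 * value, normalized by the total extension weight (assumed reading of SumW2F/SumW)."""
    total_w = sum(weights)
    if total_w <= 0.0:
        return 0.0
    return sum(w * w * v for w, v in zip(weights, values)) / total_w


# Hypothetical extension weights and per-extension factor values.
ext_weights = [1.0, 0.5]
ext_values = [0.2, 0.8]
print(max_wf_max_w(ext_weights, ext_values))
print(top_min_wf(ext_weights, ext_values))
print(sum_w2f_sum_w(ext_weights, ext_values))
```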
IsPorno
neural_network_over_dssm_factors: 0
Document from porn kitski
|
IsFake
neural_network_over_dssm_factors: 2
Fake document
|
IsEShop
neural_network_over_dssm_factors: 3
Commercial page (Classifier Savina)
|
HasPayments
neural_network_over_dssm_factors: 6
The page contains text about 'payment SMS'.
|
EshopValue
neural_network_over_dssm_factors: 9
Degree to which the page resembles an online shop
|
PornoValue
neural_network_over_dssm_factors: 10
Degree to which the page is pornographic
|
IsPornoAdvert
neural_network_over_dssm_factors: 11
The page contains porn advertising
|
Poetry
neural_network_over_dssm_factors: 12
Degree to which the document is poetry
|
PoetryQuad
neural_network_over_dssm_factors: 13
The maximum poetry score over the quatrains
|
SynS1
neural_network_over_dssm_factors: 14
Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text is to have been generated by a synonymizer or other automatic means. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
SynFLremap1
neural_network_over_dssm_factors: 15
Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text is to have been generated by a synonymizer or other automatic means. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
SynFLremap2
neural_network_over_dssm_factors: 16
Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text is to have been generated by a synonymizer or other automatic means. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
SynPercentBadWordPairs
neural_network_over_dssm_factors: 18
An indicator of how unnatural the text is from the point of view of the Russian language. The number of bad word pairs in the text, mapped into the interval [0, 1] by the formula Z/(Z+10)
|
SynNumBadWordPairs
neural_network_over_dssm_factors: 19
The share of bad pairs among all pairs found in the table: Z/(X+1), where Z is the number of bad pairs in the text and X is the number of ((http://wiki.yandex-team.ru/evgenijgrechnikov/testsynonimizers 2000-navigable)) pairs
|
NumLatinLetters
neural_network_over_dssm_factors: 20
The number of Latin letters in the text (not counting markup), mapped into [0, 1] by the formula n/(n+100)
|
RusWordsInText
neural_network_over_dssm_factors: 22
The number of words in the text (a word is whatever the lemmatizer recognized), mapped into [0, 1] by the formula x/(x+a)
|
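Several entries above squeeze an unbounded count into [0, 1] with the same x/(x+a) shape (Z/(Z+10), n/(n+100), x/(x+a)). A minimal sketch of that mapping, assuming non-negative counts; the function name `saturate` is hypothetical, and the constant a for RusWordsInText is not given in the source:

```python
def saturate(x: float, a: float) -> float:
    """Map a non-negative count x into [0, 1) as x / (x + a); a > 0 sets the half-way point."""
    if x < 0 or a <= 0:
        raise ValueError("expected x >= 0 and a > 0")
    return x / (x + a)


# With a = 10 (the constant quoted for the bad word pair count), 10 pairs map to 0.5.
print(saturate(10, 10))
# With a = 100 (the constant quoted for Latin letters), 300 letters map to 0.75.
print(saturate(300, 100))
```

The mapping is monotonic and reaches 0.5 exactly at x = a, so a acts as the scale at which the factor is half saturated.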
RusWordsInTitle
neural_network_over_dssm_factors: 23
The number of words of the Russian language in the title
|
MeanWordLength
neural_network_over_dssm_factors: 24
The average length of the word
|
PercentWordsInLinks
neural_network_over_dssm_factors: 25
The percentage of words inside <a> .. </a> tags out of all words
|
PercentVisibleContent
neural_network_over_dssm_factors: 26
The percentage of words outside tags (outside the <> brackets) out of all words
|
PercentFreqWords
neural_network_over_dssm_factors: 27
The percentage of words in the text that are among the 200 most frequent words of the language
|
PercentUsedFreqWords
neural_network_over_dssm_factors: 28
The number of the 500 most popular words of the language that are used in the text, divided by 500
|
TrigramsProb
neural_network_over_dssm_factors: 29
Logarithm of the geometric mean of the trigram probabilities in the text (the probability of a trigram is the number of its occurrences in the text divided by the number of all trigrams), mapped into [0, 1] according to the formula -x (x+a)
|
TrigramsCondProb
neural_network_over_dssm_factors: 30
Logarithm of the geometric mean of the conditional probabilities of trigrams. The conditional probability of a trigram is its probability divided by the probability of the bigram formed by its first two words
|
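The two trigram entries above define their quantities in words; the sketch below follows those definitions literally. It assumes the text is already tokenized, that the mean is taken over trigram occurrences, and that bigram probabilities are estimated the same way as trigram ones. The final mapping into [0, 1] is omitted because its formula is not recoverable from the source. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def trigram_log_geo_means(words: Sequence[str]) -> Tuple[float, float]:
    """Return (log geometric mean of trigram probabilities,
    log geometric mean of conditional trigram probabilities)."""
    trigrams = list(zip(words, words[1:], words[2:]))
    bigrams = list(zip(words, words[1:]))
    if not trigrams:
        return 0.0, 0.0
    tri_counts = Counter(trigrams)
    bi_counts = Counter(bigrams)
    log_p = 0.0
    log_cond = 0.0
    for tri in trigrams:
        p_tri = tri_counts[tri] / len(trigrams)
        p_bi = bi_counts[tri[:2]] / len(bigrams)
        log_p += math.log(p_tri)
        # Conditional probability as defined above: P(trigram) / P(first two words).
        log_cond += math.log(p_tri / p_bi)
    # The log of a geometric mean equals the arithmetic mean of the logs.
    return log_p / len(trigrams), log_cond / len(trigrams)


print(trigram_log_geo_means("a b c a b c".split()))
```

Because trigram and bigram probabilities use different denominators, the ratio can exceed 1 on very short texts; on long texts the two denominators are nearly equal.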
NumeralsPortion
neural_network_over_dssm_factors: 31
The share of different parts of speech in the text. The share of numerals (among all words whose part of speech was recognized)
|
ParticlesPortion
neural_network_over_dssm_factors: 32
The share of particles
|
AdjPronounsPortion
neural_network_over_dssm_factors: 33
The share of pronominal adjectives
|
AdvPronounsPortion
neural_network_over_dssm_factors: 34
The share of pronominal adverbs
|
VerbsPortion
neural_network_over_dssm_factors: 35
The share of verbs
|
FemAndMasNounsPortion
neural_network_over_dssm_factors: 36
The share of words that can be both masculine and feminine nouns, but not neuter, among all nouns (examples: 'hummingbird' has an indeterminate gender that can be resolved either way; 'Alexander' is a case of homonymy).
|
LongestText
neural_network_over_dssm_factors: 37
The size of the largest text segment (from factor [18] PureText)
|
SegmentAuxAlphasInText
neural_network_over_dssm_factors: 44
Number of letters in the AUX segment
|
SegmentAuxSpacesInText
neural_network_over_dssm_factors: 45
The number of spaces in the AUX segment
|
SegmentContentCommasInText
neural_network_over_dssm_factors: 46
The number of commas in the Content segment
|
StaticTitleBM25Ex
neural_network_over_dssm_factors: 48
BM25 of the page title against the page text
|
TrashAdv
neural_network_over_dssm_factors: 49
Degree to which the page is cluttered with trash advertising
|
Soft404
neural_network_over_dssm_factors: 55
The page is a '404' page (share of '404' tokens relative to the total number of tokens on the page)
|
PureText
neural_network_over_dssm_factors: 58
Long text without links.
|
RusLang
neural_network_over_dssm_factors: 66
The language of the document is Russian.
|
AuraDocLogShared
neural_network_over_dssm_factors: 77
Logarithm of the number of shingles on which this document is not unique
|
AuraDocLogAuthor
neural_network_over_dssm_factors: 78
Logarithm of the number of shingles on which this owner of the document is recognized as the author
|
AuraDocLogOrigin
neural_network_over_dssm_factors: 79
Logarithm of the number of shingles in the document added by the owner of the site as original texts in ((http://wiki.yandex-team.ru/jandekspoisk/jekosistema/marketingPr/webmasters/plan/vtorcontect of originality plugin)). It does not participate in the formula, it is needed to disconnect the takes
|
AuraDocMeanSharedWeight
neural_network_over_dssm_factors: 80
The average weight of the non-unique shingles of this document
|
AuraDocMeanFltAuthorSource
neural_network_over_dssm_factors: 81
The average filtered number of sources of authorship of the document. It does not participate in the formula, it is needed to disconnect the takes
|
HasUserReviews
neural_network_over_dssm_factors: 82
The document contains user review/comment
|
HasDownloadLinkOnFile
neural_network_over_dssm_factors: 83
The document has a direct link to the file
|
HasDownloadLinkOnFileHosting
neural_network_over_dssm_factors: 84
The document has a link to filehosting
|
SegmentWordPortionFromMainContent
neural_network_over_dssm_factors: 86
The share of the document's words that come from segments with score > 2.
|
TextFeatures
neural_network_over_dssm_factors: 119
The quality of the text. It is computed by a rather complex formula
|
TextLike
neural_network_over_dssm_factors: 120
Text quality (classifier Alekseev)
|
DocLen
neural_network_over_dssm_factors: 121
Document length in sentences
|
IsHTML
neural_network_over_dssm_factors: 123
Document type - HTML
|
EngLang
neural_network_over_dssm_factors: 136
Document language - English
|
CyrLang
neural_network_over_dssm_factors: 137
The document is in a Cyrillic-script language
|
LanguagePopularity
neural_network_over_dssm_factors: 138
The popularity of the language of the document. A number from 0 to 1. (http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayaformula/tekushhiekomponenty/languaguaguagepopalarity)
|
DaterStatsYearNormLikelihood
neural_network_over_dssm_factors: 140
The likelihood function of the distribution of years in the document. Temporarily disabled
|
DaterStatsAverageSourceSegment
neural_network_over_dssm_factors: 141
The arithmetic mean position of dates in the document. Temporarily disabled
|
MaxD30Long
personalization: 0
Max cosine similarity between document and user history with clicks dwelltime > 30sec, by realtime user_history
|
MaxD60Long
personalization: 1
Max cosine similarity between document and user history with clicks dwelltime > 60sec, by realtime user_history
|
MaxD120Long
personalization: 2
Max cosine similarity between document and user history with clicks dwelltime > 120sec, by realtime user_history
|
MaxD180Long
personalization: 3
Max cosine similarity between document and user history with clicks dwelltime > 180sec, by realtime user_history
|
MaxD360Long
personalization: 4
Max cosine similarity between document and user history with clicks dwelltime > 360sec, by realtime user_history
|
MaxD30Short
personalization: 5
Max cosine similarity between document and user history with clicks dwelltime <= 30sec, by realtime user_history
|
MaxD60Short
personalization: 6
Max cosine similarity between document and user history with clicks dwelltime <= 60sec, by realtime user_history
|
MaxD120Short
personalization: 7
Max cosine similarity between document and user history with clicks dwelltime <= 120sec, by realtime user_history
|
MaxD180Short
personalization: 8
Max cosine similarity between document and user history with clicks dwelltime <= 180sec, by realtime user_history
|
MaxD360Short
personalization: 9
Max cosine similarity between document and user history with clicks dwelltime <= 360sec, by realtime user_history
|
TopavgS5D30Long
personalization: 10
Avg by top-5 maximum cosine similarity between document and user history with clicks dwelltime > 30sec, by realtime user_history
|
TopavgS5D60Long
personalization: 11
Avg by top-5 maximum cosine similarity between document and user history with clicks dwelltime > 60sec, by realtime user_history
|
TopavgS5D120Long
personalization: 12
Avg by top-5 maximum cosine similarity between document and user history with clicks dwelltime > 120sec, by realtime user_history
|
TopavgS5D180Long
personalization: 13
Avg by top-5 maximum cosine similarity between document and user history with clicks dwelltime > 180sec, by realtime user_history
|
TopavgS5D360Long
personalization: 14
Avg by top-5 maximum cosine similarity between document and user history with clicks dwelltime > 360sec, by realtime user_history
|
TopavgS10D30Long
personalization: 15
Avg by top-10 maximum cosine similarity between document and user history with clicks dwelltime > 30sec, by realtime user_history
|
TopavgS10D60Long
personalization: 16
Avg by top-10 maximum cosine similarity between document and user history with clicks dwelltime > 60sec, by realtime user_history
|
TopavgS10D120Long
personalization: 17
Avg by top-10 maximum cosine similarity between document and user history with clicks dwelltime > 120sec, by realtime user_history
|
TopavgS10D180Long
personalization: 18
Avg by top-10 maximum cosine similarity between document and user history with clicks dwelltime > 180sec, by realtime user_history
|
TopavgS10D360Long
personalization: 19
Avg by top-10 maximum cosine similarity between document and user history with clicks dwelltime > 360sec, by realtime user_history
|
TopavgS15D30Long
personalization: 20
Avg by top-15 maximum cosine similarity between document and user history with clicks dwelltime > 30sec, by realtime user_history
|
TopavgS15D60Long
personalization: 21
Avg by top-15 maximum cosine similarity between document and user history with clicks dwelltime > 60sec, by realtime user_history
|
TopavgS15D120Long
personalization: 22
Avg by top-15 maximum cosine similarity between document and user history with clicks dwelltime > 120sec, by realtime user_history
|
TopavgS15D180Long
personalization: 23
Avg by top-15 maximum cosine similarity between document and user history with clicks dwelltime > 180sec, by realtime user_history
|
TopavgS15D360Long
personalization: 24
Avg by top-15 maximum cosine similarity between document and user history with clicks dwelltime > 360sec, by realtime user_history
|
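The personalization factors above all follow one pattern: filter the user's click history by a dwell-time threshold, compute cosine similarity between the document and each remaining history item, then take either the maximum or the average of the top-N values. A minimal sketch under stated assumptions: documents and history items are plain embedding vectors, history is a list of (embedding, dwell seconds) pairs, and an empty filtered history yields 0.0. None of these representation details come from the source.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def max_and_topavg(
    doc: Sequence[float],
    history: List[Tuple[Sequence[float], float]],
    min_dwell_sec: float,
    top_n: int,
    long_clicks: bool = True,
) -> Tuple[float, float]:
    """Return (max similarity, average of the top_n similarities).

    long_clicks=True keeps clicks with dwell > min_dwell_sec (the *Long factors);
    long_clicks=False keeps clicks with dwell <= min_dwell_sec (the *Short factors).
    """
    sims = sorted(
        (cosine(doc, emb) for emb, dwell in history
         if (dwell > min_dwell_sec) == long_clicks),
        reverse=True,
    )
    if not sims:
        return 0.0, 0.0
    top = sims[:top_n]
    return sims[0], sum(top) / len(top)


# Hypothetical two-dimensional embeddings: analogue of MaxD30Long and TopavgS5D30Long.
user_history = [([1.0, 0.0], 45.0), ([0.0, 1.0], 90.0), ([1.0, 1.0], 10.0)]
print(max_and_topavg([1.0, 0.0], user_history, min_dwell_sec=30, top_n=5))
```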
DssmHaveShowsUrlTitleKeywordsPrediction
robot_selection_rank: 3
|
DssmHaveClicksUrlTitleKeywordsPrediction
robot_selection_rank: 4
|
DssmLogClicksUrlTitleKeywordsPrediction
robot_selection_rank: 5
|
WebTRp1
video_production: 2
Strict priority for TR, a text priority: all the words of the query occur somewhere in the document (and they satisfy the contextual restrictions of the query, for example that both words must be in one sentence).
|
WebTRtitle
video_production: 3
The presence of the exact phrase (the query text) in the title (more precisely, in the first sentence of the document).
|
WebSoftAndOk
video_production: 7
The document passed SoftAnd under the restrictions of the syntactic sorcerer. Only for documents with textual relevance. For single-word queries, always 1.
|
WebPassageLegacyTR
video_production: 8
Text relevance (maxfreq is the frequency of the most frequent word, which serves as a measure of the document length).
|
WebTRDocQuorum
video_production: 10
The weight of the query words that occur in the text.
|
DssmL2WebReformulationsDt
video_production: 99
LogDwellTime predicted by the web DSSM model trained on reformulations. It is also used in the ranking of ether.
|
DssmL2VideoReformulationsWin
video_production: 103
Win (a click longer than 60 seconds) predicted by the video DSSM model trained on reformulations.
|
QfufAllMaxFBodyWordCoverageExact
video_production: 126
Linguistic boosting factor. Extension type: QFUF. Aggregation over all extensions. The greatest value of the factor. Computed over the body of the document. The degree to which the query words are covered in exact form.
|
QfufTopMinWFSumWBodyBclmMixPlainKE5
video_production: 127
Linguistic boosting factor. Extension type: QFUF. Aggregation over the top 10 extensions (by factor value). The smallest weighted value of the factor. Normalized by the total weight of the extensions. Computed over the body of the document. The word-weight aggregation algorithm is BclmMixPlain: a linear mixture of the annotation BCLM weights and the weighted positionless weights of the word; the resulting values are then aggregated via BM15. Normalization coefficient 10^(-5).
|
Bclm2
video_production: 155
A factor measuring the proximity of the query to the text of the document. It differs from BCLM in that the weights of all words are considered equal. It is also used in the ranking of ether.
|
DBMNumbers
video_production: 157
DBM (BM25 with machine-like words) exclusively in numbers.
|
BocmFull
video_production: 178
Plain BOCM over the glued links.
|
FirstHitSentenceBocmFull
video_production: 179
BOCM over the glued links, computed only on the first sentences with hits; all forms of occurrence are treated as equivalent.
|
BestFirstHitSentenceTocm
video_production: 180
The best BOCM among all links such as Title (an analogue of TOCM), computed as follows: only sentences with hits are considered, and all forms of occurrence are treated as equivalent.
|
DbmVideoNumbers
video_production: 183
The new DBM computed only over the glued links (differs from DBMNumbers [157] only in its constants and completely supersedes it).
|
TitleTrigramsInQuery
video_production: 219
Coverage of the Title trigrams by the query trigrams. It is also used in the ranking of ether.
|
UrlTrigramsInQuery
video_production: 220
Coverage of the URL trigrams by the query trigrams.
|
QfufAllSumW2FSumWTitlePerWordCMMaxPredictionMin
video_production: 248
Linguistic boosting factor. Extension type: QFUF. Aggregation over all extensions. The sum of squared extension weights multiplied by the factor value. Normalized by the total weight of the extensions. Computed over the title of the document. PerwordCmmaxMatchMin algorithm: the minimum over words of the maximum CMMAXMATCH weights.
|
QfufTopSumWFSumWTitleWordCoverageExact
video_production: 253
Linguistic boosting factor. Extension type: QFUF. Aggregation over the top 10 extensions (by factor value). A weighted sum of the factor values. Normalized by the total weight of the extensions. Computed over the title of the document. The degree to which the query words are covered in exact form.
|
QfufAllSumFCountBodyAllWcmMatch80AvgValue
video_production: 280
Linguistic boosting factor. Extension type: QFUF. Aggregation over all extensions. The sum of the factor values, normalized by the number of extensions. Computed over the body of the document. The sum of the word weights, weighted by the annotation weight and normalized by the sum of the word weights. Only annotations in which the sum of the word weights exceeds 80% are counted.
|
QfufTopSumWFSumWBodyQueryPrefixMatchOriginalWordValue
video_production: 323
Linguistic boosting factor. Extension type: QFUF. Aggregation over the top 10 extensions (by factor value). A weighted sum of the factor values. Normalized by the total weight of the extensions. Computed over the body of the document. The maximum weight of an annotation whose prefix contains the query words in the same order (up to word form).
|
MetaWebTRDocQuorum
video_production: 374
The average value of the webtrdocquorum factor is average.
|
MetaBocmFull
video_production: 387
The average value of the Bocmfull factor is average.
|
MetaBestFirstHitSentenceTocm
video_production: 388
The average value of the Bestfirsthitsentencetocm factor is average.
|
MetaAvgBocmFull
video_production: 459
The average value of the BocmFull factor in PRS
|
MetaAvgBestFirstHitSentenceTocm
video_production: 460
The average value of the BestFirstHitSentenceTocm factor in PRS
|
MetaRmsBocmFull
video_production: 471
The root-mean-square (standard) deviation of the BocmFull factor in PRS
|
MetaRmsTitleTrigramsInQuery
video_production: 472
The root-mean-square (standard) deviation of the TitleTrigramsInQuery factor in PRS
|
MetaVarianceTitleCovering
video_production: 485
Pearson's coefficient of variation for the TitleCovering factor in PRS
|
MetaVarianceTitleTrigramsInQuery
video_production: 487
Pearson's coefficient of variation for the TitleTrigramsInQuery factor in PRS
|
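The Meta* entries above summarize one factor over a set of documents (PRS) with an average, a root-mean-square deviation, and a coefficient of variation. The sketch below shows the textbook definitions of these statistics. It assumes the population form (division by n) and returns 0.0 for a zero mean; the source does not say which variant is used in production.

```python
import math
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def rms_deviation(values: Sequence[float]) -> float:
    """Population standard deviation: sqrt of the mean squared deviation from the mean."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Standard deviation divided by the mean; 0.0 when the mean is zero (assumed convention)."""
    m = mean(values)
    return rms_deviation(values) / m if m != 0.0 else 0.0


# Hypothetical values of one factor across the documents of a result set.
factor_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(factor_values), rms_deviation(factor_values), coefficient_of_variation(factor_values))
```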
MetaResidWebTRDocQuorum
video_production: 496
Resid for the WebTRDocQuorum factor in PRS
|
MetaResidTitleCovering
video_production: 500
Resid for the TitleCovering factor in PRS
|
MetaFractTitleTrigramsInQuery
video_production: 503
Fract for the TitleTrigramsInQuery factor in PRS
|
DssmL3WebLogDwellTime
video_production: 559
LogDwellTime predicted by the web DSSM model. It is also used in the ranking of ether.
|
DssmL2WebLogDwellTime
video_production: 585
LogDwellTime predicted by the web DSSM model.
|
TitleQuerySimilarityByClicks
video_production: 593
The similarity of the T2Q vectors of the title and the query, a la Klakhman, trained on clicks
|
DssmL3VideoDeepClickPlayerDepth
video_production: 596
DSSM with the PlayerDepth target, trained on the video deep-click pool. It is also used in the ranking of ether.
|
DssmL2VideoDcPlayerDepth
video_production: 599
DSSM with the PlayerDepth target, trained on the video deep-click pool, at the L2 stage.
|
OriginalRequestBodyAvgPerTrigramAvgValueAny
video_production: 729
A factor for the original query. Computed over the body of the document. Algorithm: AvgPerTrigramAvgValueAny.
|
OriginalRequestBodyBclmPlaneProximity1Bm15W0Size1K001
video_production: 730
A factor for the original query. Computed over the body of the document. The BclmPlaneProximity1Bm15W0Size1 algorithm: uses BCLM with free weighing if there are several words; if there is only one word, the sum of hits is used as the type of match. Normalization coefficient 0.01.
|
OriginalRequestBodyBocm15K001
video_production: 731
A factor for the original query. Computed over the body of the document. Word-weight aggregation algorithm: BOCM15. Normalization coefficient 0.01.
|
OriginalRequestBodyWordCoverageExact
video_production: 732
A factor for the original query. Computed over the body of the document. The degree to which the query words are covered in exact form.
|
OriginalRequestTitleWordCoverageAny
video_production: 733
A factor for the original query. Computed over the title of the document. The degree to which the query words are covered (all types of hits).
|
LeftIsPorno
web_itditp: 0
Document from porn kitski
|
IsPorno
web_itditp: 1
Document from porn kitski
|
LeftIsComm
web_itditp: 4
A document from a commercial clay. Not used (deprecated)
|
IsComm
web_itditp: 5
A document from a commercial clay. Not used (deprecated)
|
LeftIsFake
web_itditp: 6
Fake document
|
IsFake
web_itditp: 7
Fake document
|
LeftIsSEO
web_itditp: 8
The page title contains commercial vocabulary. Not used (deprecated)
|
IsSEO
web_itditp: 9
The page title contains commercial vocabulary. Not used (deprecated)
|
LeftIsEShop
web_itditp: 10
Commercial page (Classifier Savina)
|
IsEShop
web_itditp: 11
Commercial page (Classifier Savina)
|
LeftHasPayments
web_itditp: 16
The page contains text about 'payment SMS'.
|
HasPayments
web_itditp: 17
The page contains text about 'payment SMS'.
|
LeftEshopValue
web_itditp: 22
Degree to which the page resembles an online shop
|
EshopValue
web_itditp: 23
Degree to which the page resembles an online shop
|
LeftPornoValue
web_itditp: 24
Degree to which the page is pornographic
|
PornoValue
web_itditp: 25
Degree to which the page is pornographic
|
LeftIsPornoAdvert
web_itditp: 26
The page contains porn advertising
|
IsPornoAdvert
web_itditp: 27
The page contains porn advertising
|
LeftPoetry
web_itditp: 28
Degree to which the document is poetry
|
Poetry
web_itditp: 29
Degree to which the document is poetry
|
LeftPoetryQuad
web_itditp: 30
The maximum poetry score over the quatrains
|
PoetryQuad
web_itditp: 31
The maximum poetry score over the quatrains
|
LeftSynS1
web_itditp: 32
Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text is to have been generated by a synonymizer or other automatic means. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
SynS1
web_itditp: 33
Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text is to have been generated by a synonymizer or other automatic means. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
LeftSynFLremap1
web_itditp: 34
Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text is to have been generated by a synonymizer or other automatic means. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
SynFLremap1
web_itditp: 35
Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text is to have been generated by a synonymizer or other automatic means. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
LeftSynFLremap2
web_itditp: 36
Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text is to have been generated by a synonymizer or other automatic means. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
SynFLremap2
web_itditp: 37
Shows how unnatural the text is from the point of view of the Russian language: an estimate of how likely the document text is to have been generated by a synonymizer or other automatic means. ((http://wiki.yandex-team.ru/jandekspoisk/kachestvopoiska/obshayAfermula/tekushhiekomponenty/antispam?v=1il#h58953-2 more))
|
LeftSynPercentBadWordPairs
web_itditp: 40
An indicator of how unnatural the text is from the point of view of the Russian language. The number of bad word pairs in the text, mapped into the interval [0, 1] by the formula Z/(Z+10)
|
SynPercentBadWordPairs
web_itditp: 41
An indicator of how unnatural the text is from the point of view of the Russian language. The number of bad word pairs in the text, mapped into the interval [0, 1] by the formula Z/(Z+10)
|
LeftSynNumBadWordPairs
web_itditp: 42
The share of bad pairs among all pairs found in the table: Z/(X+1), where Z is the number of bad pairs in the text and X is the number of ((http://wiki.yandex-team.ru/evgenijjjgrechnikov/testSynonimizers 2000-navigable)) pairs
|
SynNumBadWordPairs
web_itditp: 43
The share of bad pairs among all pairs found in the table: Z/(X+1), where Z is the number of bad pairs in the text and X is the number of ((http://wiki.yandex-team.ru/evgenijjjgrechnikov/testSynonimizers 2000-navigable)) pairs
|
LeftNumLatinLetters
web_itditp: 44
The number of Latin letters in the text (not counting markup), mapped into [0, 1] by the formula n/(n+100)
|
NumLatinLetters
web_itditp: 45
The number of Latin letters in the text (not counting markup), mapped into [0, 1] by the formula n/(n+100)
|
LeftRusWordsInText
web_itditp: 48
The number of words in the text (a word is whatever the lemmatizer recognized), mapped into [0, 1] by the formula x/(x+a)
|
RusWordsInText
web_itditp: 49
The number of words in the text (a word is whatever the lemmatizer recognized), mapped into [0, 1] by the formula x/(x+a)
|
LeftRusWordsInTitle
web_itditp: 50
The number of words of the Russian language in the title
|
RusWordsInTitle
web_itditp: 51
The number of words of the Russian language in the title
|
LeftMeanWordLength
web_itditp: 52
The average length of the word
|
MeanWordLength
web_itditp: 53
The average length of the word
|
LeftPercentWordsInLinks
web_itditp: 54
The percentage of words inside <a> .. </a> tags out of all words
|
PercentWordsInLinks
web_itditp: 55
The percentage of words inside <a> .. </a> tags out of all words
|
LeftPercentVisibleContent
web_itditp: 56
The percentage of words outside tags (outside the <> brackets) out of all words
|
PercentVisibleContent
web_itditp: 57
The percentage of words outside tags (outside the <> brackets) out of all words
|
LeftPercentFreqWords
web_itditp: 58
The percentage of words in the text that are among the 200 most frequent words of the language
|
PercentFreqWords
web_itditp: 59
The percentage of words in the text that are among the 200 most frequent words of the language
|
LeftPercentUsedFreqWords
web_itditp: 60
The number of the 500 most popular words of the language that are used in the text, divided by 500
|
PercentUsedFreqWords
web_itditp: 61
The number of the 500 most popular words of the language that are used in the text, divided by 500
|
LeftTrigramsProb
web_itditp: 62
Logarithm of the geometric mean of the trigram probabilities in the text (the probability of a trigram is the number of its occurrences in the text divided by the number of all trigrams), mapped into [0, 1] according to the formula -x (x+a)
|
TrigramsProb
web_itditp: 63
Logarithm of the geometric mean of the trigram probabilities in the text (the probability of a trigram is the number of its occurrences in the text divided by the number of all trigrams), mapped into [0, 1] according to the formula -x (x+a)
|
LeftTrigramsCondProb
web_itditp: 64
Logarithm of the geometric mean of the conditional probabilities of trigrams. The conditional probability of a trigram is its probability divided by the probability of the bigram formed by its first two words
|
TrigramsCondProb
web_itditp: 65
Logarithm of the geometric mean of the conditional probabilities of trigrams. The conditional probability of a trigram is its probability divided by the probability of the bigram formed by its first two words
|
LeftNumeralsPortion
web_itditp: 66
The share of different parts of speech in the text. The share of numerals (among all words whose part of speech was recognized)
|
NumeralsPortion
web_itditp: 67
The share of different parts of speech in the text. The share of numerals (among all words whose part of speech was recognized)
|
LeftParticlesPortion
web_itditp: 68
The share of particles
|