SlideShare a Scribd company logo
1 of 21
Download to read offline
University of Southern California
Information Sciences Institute
!"#$%&'()$"*)+,
-"$),./0)1,'$),230%$4"54,.%%
!"#$$%&'()*#
!"#$%&'()$"*+,)-",-.*!".()(/(-
0")1-%.)(2*$#*+$/(3-%"*4'5)#$%")'
tg@isi.edu
+%,-,. /(.
6-7(*$#*4$&7/(-%*'"8*!"#$%&'()$"*+,)-",-
0")1-%.)(2*$#*9-"".251'")'
weiqiuy@seas.upenn.edu
0(123#13,1%&4,51(2
:),3($& +,3$$5*$#*4$&7/(-%*+,)-",-
;%'"8-).*0")1-%.)(2
lignos@brandeis.edu
6(1#3"#1&7#8
!"#$%&'()$"*+,)-",-.*!".()(/(-
0")1-%.)(2*$#*+$/(3-%"*4'5)#$%")'
jonmay@isi.edu
!"#$#%&#'()&(*++,-(./.0(
University of Southern California
Information Sciences Institute
!
67087"5,9714$7:;47%5
• !"#$%$&'()*+,-(.
"#$%&'$#()#*+,#-&'$.%&/$.0*&1.$2)-3
456&7.8#*-9&4:;<&7=2#-&
University of Southern California
Information Sciences Institute
>
!%47("47%5
• !"#$%$&'()*+,-(.
• 7#9",1%&:%#;1,15&3%9"1,-.%2&
2.<<%;&<;($&9:#22&,$=#:#19%
– !"#$%&'()&*%)+,)-'&'./)0&.)7?.'?7@#?A.B
• >#;%&38?%2&#;%&,$?(;3#13
12-)$&(34"'$56(7&##'8)%6(.//9:
• @!"#$%&'%$()&*+&,((-(-
#.&/%0.$&#1(&$%$(&#23(+
– 1,%2-3&4$5)62-3&!"#$#%&'()*+,-(.(/-01 23234
– !"#$%&'()*+"*,$-%'!"#$%&'()*+
P(x)
- log2 P(x)
,-%./&(0&".)1&2).34.05$.%&2)(1&,)('0&6()74%8
9:;&"(*.0%<&9=>?&"@7.%&
University of Southern California
Information Sciences Institute
C
<()$(7)=
/0 12$%3$+(*45*6&*7"#$%$&'()*)$+$*3.7&8*4$'96:/
;0 <3.+7=7'$+76&*=69*4$'96:/>*
./ !$#(0,'1"2*%'*33(332(%,'
4/ !-5%3,#(*2',*367'0#-338+$%&"*+'9:
?0 @3$%7+$+72(*)7==(9(&'(.*#(+A((&*B3-(927.()*$&)*
C&.3-(927.()*D45>*
!"#$%&''(')*+,'*-'')+./01+$)2+3$4&567
University of Southern California
Information Sciences Institute
:
>("?;"475*,!.,"1,",!;?47&#?"11,@?"11787)$
%5,23:"?"5#)A,.)14,B)41
University of Southern California
Information Sciences Institute
D
C!.,"1,@?"11787)$
• 7T ∶ (𝑥!𝑥"𝑥# … 𝑥$) → (𝑦!𝑦"𝑦# … 𝑦%)
• !"#: 𝑦!:% $%&'()*'+,-.()*'+, 𝑥!:$//%
– 3$8"("9'+∏!"#
$
𝑃 𝑦! 𝑦%!, 𝑥#:' ; 𝜃)
– 3$8"("9'+∏!"#
$
𝑃 𝑦! 𝑓 𝑦%!, 𝑥#:'; 𝜃# ; 𝜃()
!"#0%123)+'4+'55)+ 6%78955:;:'+%EF.0GH&H*G&6H=9 !;!;I
<= 123)+'4+'55)+0%%h' = 𝑓 𝑦(:)*, 𝑥!:$ ℎ! ∈ 𝑅" ).*,$"()"/$/.
>= 78955:;:'+%%%%%0%𝑃 𝑦* ℎ*) 𝑦! ).*8).,%-(-<*
=;4<)=>?)"@ABCD(E(. ? F>?&GH<?)$$(<?)$$GIG#"(
University of Southern California
Information Sciences Institute
J
@?"11,D)$8%$3"5#)
• #'53%5'3@%𝑇 = ℎ + , 𝑦 + 𝑖 = 1,2,3, … 𝑚} $#*>327$(3-.).?*%-#-%-",-@
• 78955'5%9+'%3A'%B)+*%3CD'5@%9;3'+%3)E'.:F93:).
– ;('"3(',1('3*2(',-6(%$<(#'*3'=>?@A'*3'$2B+(2(%,(C'$%'D*0#(=>?@
• 𝐶(𝑐, 𝑎) ()2.35%3A'%.2GH'+%);%3)E'.5%);%3CD' c :.%5'I2'.('%a
• J+'*5,(/ = ∑+,!
$
𝐶(𝑐, ℎ(+)) ; K';5,(/ = ∑+,!
$
𝐶(𝑐, 𝑦(+))
• "93(A,(/ = ∑+,!
$
min{𝐶 𝑐, ℎ + , 𝐶(𝑐, 𝑦(+))} E/KLM9&NH2,*#*, #7&HO&!;;!I
• J+'(:5:).@%𝑃/ =
01*/2 /
34567 /
L%%%%%%%%%%K'(988@%𝑅/ =
01*/2 /
8597 /
• MNG'952+'%D'+%(8955%(0% 𝐹:;/ = 1 + 𝛽 " 3# × 8#
:$ × 3#=8#
University of Southern California
Information Sciences Institute
P
!"#$%&#'(()*#'((%+%,-)).,-+/-0'1&,
OP'+988%D'+;)+G9.('%$%9.%9P'+94'%);%:.*:P:*298%(8955%D'+;)+G9.('5
<= "9(+)N9P'+94'0%2.B':4A3'*%:='=@%'I298%:GD)+39.('%3)%'9(A%3CD'
Macro𝐹) =
∑!∈# +$;!
|-|
:; 3"4&5<$='&$%'>+-'"%?*'2+';%;@+,A+B&'CD')4A>+'CD$E+"(F5&*$)4'+*5+'$4?+*5G')
3𝑖𝑐𝑟𝑜𝐹) =
∑!∈# .! × +$;!
∑
!&∈#
.!&
7()%)8&7)23('&*$%&96"008&w' = 𝑅𝑒𝑓𝑠 𝑐 + 𝑘 *$%&0$4)&&𝑘 ≥ 1
o J#(>$#(𝑘 = 1 ; *4&#K(GI(𝑘 → ∞, F𝑖𝑐𝑟𝑜𝐹! → F)<"4𝐹!
o J#(>$#( 𝛽 = 1 3#($<)?#(1/L/6(0L/:(&4(1/6(0//:6(M>$&(?GN#(O-PQ
University of Southern California
Information Sciences Institute
Q
E;14787#"47%5,8%$,!"#$%FG,
"1,"5,!.,)("?;"47%5,3)4$7#
èCompare MacroF1 with
Ø BLEU and ChrF1 – popular options
Ø MicroF1 – so we can see the difference micro and macro avg makes
Ø BLEURT – model-based metric based on BERT
* BLEURT has undesirable biases, shown in paper (not talking about here)
University of Southern California
Information Sciences Institute
5;
!"#$%F1 (1,<4H)$1
A:B*CD*6EFEG*G-H.B-.(I*:',%$JC*3'.*-K/'5*H-)L3(*#$%*'55*(27-.
6,+$.?HR#$HS#G&%#7$,+-&.R#$O..8&,%2$.R#%#*7-&'$.%&$H$#&7=2#-9&H'7#$&$.)*G,*S&7.&.*#&.$&70.&G#+,%HO-
6&'CD')4A+H(,$E$)4'
0CD$E+-'"%?*
University of Southern California
Information Sciences Institute
55
I):CJK 9"4"&4%&.)L4,>("?;"47%5
;TCC
;T:;
;T!P
;TJP
;T!!
;TJ!
;T>>
;TD5
;TCC
;TP>
;TD5
;TDJ
H/L0
/L0
/LR
/LS
/LT
/LU
J5/-",2*'"8*M%'&&'% +-&'"(),.
𝜏
3G&V(W>8)%(X>'Y#8#%&$(
;NE0 43%JC :',%$JC :),%$JC ;NE0OB&-'" ;NE0OB&-8)'"
MacroF1 is a poor indicator of Fluency + Grammar,
but one of the strong indicators of Semantics
Correlation with Fluency, Grammar, and Semantics on English only
University of Southern California
Information Sciences Institute
5!
I!.,!)4$7#1+,I751,0)$,!)4$7#
S
0
R
Z
.
R
.
R
[
. .
R
[ [
S
/
0
.
R
Z
S
[
T
!"#$%&#'( !"#)%&#*( !"#+%&#)(
JG%$
O-PQ O-PQ F)<"4]0 FG<"4]0 ,V"]0
• A)".*P*G/&Q-%*$#*()&-.*'*&-(%),*.,$%-8*3)L3-.(*,$%%-5'()$"*H)(3*3/&'"*R/8L-&-"(.
• S;NE0*).*#%$&*(3-*A:B*&-(%),.*7',T'L-?*7%-,$&7/(-8*Q2*('.T*$%L'")U-%.
• :',%$JC*'"8*:),%$JC*/.-*(3-*.'&-*($T-")U-%*'.*;NE0?*$Q(')"-8*/.)"L*+',%-;NE0
MacroF1 has more wins in the recent year -- when systems are mostly fluent, adequacy is a key discriminator
University of Southern California
Information Sciences Institute
5>
9%=514$)"3,@J2-,."1M N@JBB.B,OPOPQ
/L.T
/LS.
/L09
/LZ[
/L[Z
/LZR
/LZ.
/LS.
/L.[
/LZ.
/LZ9
/LRR
/LZ.
/LSZ
/L.U
/LZ/
/LSR
/L.U
/L//
/L./
/LZ/
/L[/
/L9/
-^HP* !7HP* O_HP*
𝜏
3G&V(8+!
O-PQ F)<"4]0 FG<"4]0 ,V"]0 O-PQ`^F#)% O-PQ`^F#'G)%
MacroF1 is the strongest indicator of downstream IR task performance
• IR task with queries and docs in different languages
• Translate source docs to target language, and match queries with docs
• MT metric having strong correlation with IR metric (e.g. mAP) is more useful
University of Southern California
Information Sciences Institute
5C
R;"?74"47(),9788)$)5#),S)4=))5,
B;0)$(71)A,"5A,T51;0)$(71)A,C!.
University of Southern California
Information Sciences Institute
5:
BC!.,(1,TC!.+,SJ>T,"5A,!"#$%FG
32.7
24.0
31.1
25.6
30.8 31.2
33.9
24.0
31.2
27.1
29.6
31.0
38.5
24.0
41.6
31.9
40.3
34.6
33.6
23.5
33.6
27.4
33.0
31.0
20
25
30
35
40
45
DE-EN EN-DE FR-EN EN-FR RO-EN EN-RO
BLEU
and
MacroF1
score
BLEU SNMT BLEU UNMT MacroF1 SNMT MacroF1 UNMT
!"*(-%&.*$#*;NE0?*0G:B*'"8*+G:B*7-%#$%&'",-*).*,$&7'%'Q5-S?*
Q/(*:',%$JC*.3$H.*.)L")#),'"(*8)##-%-",-.*Q-(H--"*+G:B*'"8*0G:B
A&BC;D&%@%".1%&'.).&5#(%.0&"(&1-"5#&,EFG&%5().%&'$"#&GC;D
University of Southern California
Information Sciences Institute
5D
BC!.,(1,TC!.+,9>&>C
Frequent Types:
UNMT > SNMT Rare Types:
SNMT > UNMT
* Only the top 500 types
are shown here
University of Southern California
Information Sciences Institute
5J
BC!.,"5A,TC!.
MacroF1 diff: 0.044, favors SNMT; BLEU diff: -0.00087, favors UNMT
Src "Er ist witzig, er ist sarkastisch, witzig, er liebt es, Leute zu unterhalten, und er ist ein
großartiger Kerl, den er im Teamraum haben kann", erklärte er.
Ref "He's funny, he's sarcastic, witty, likes to poke fun at people, and he's a great guy to have in
the team room," he explained.
SNMT "He is funny, he is sarcastic, funny, he loves to talk people, and he is a great guy he can have
in the team room," he explained
UNMT "He's funny, he's sly, funny, he loves to unterhalten people, and he's a great guy he can have
in the team room," he explained.
Problems SNMT: punctuation, wrong_verb. UNMT: synonym, untranslation
-=4=<0%%"9(+)M< :5 G)+' :.3)8'+9.3%3) 2.3+9.5893:). 3A9. QR-S
University of Southern California
Information Sciences Institute
5P
BC!.,"5A,TC!.
MacroF1 diff: 0.044, favors SNMT; BLEU diff: -0.00087, favors UNMT
Src Vor 32 Jahren schloss ich mich als Schüler, wegen der Vernachlässigung der Thatcher-Regierung, Labour an. Diese
Vernachlässigung hatte dazu geführt, dass mein Klassenzimmer buchstäblich zusammengebrochen war.
Infolgedessen habe ich versucht, mich für bessere öffentliche Dienstleistungen für diejenigen einzusetzen, die sie am
meisten brauchen. Egal ob als Gemeinderat oder Minister.
Ref Ever since I joined Labour 32 years ago as a school pupil, provoked by the Thatcher government‘s neglect that had left
my comprehensive school classroom literally falling down, I’ve sought to champion better public services for those
who need them most whether as a local councillor or government minister.
SNMT 32 years ago, I joined Labour as a student because of the neglect of the Thatcher government, which had led to my
classroom literally collapsed, and as a result I tried to promote better public services for those who need it most,
whether as a local council or ministers.
UNMT Last 32 years ago, as a student, because of the disdain for the Thatcher-era government, Labour joined Labour.
Problems SNMT: synonym, word_order UNMT: subject, truncation, word_order
-=4= >% "9(+)M< :5 G)+' :.3)8'+9.3%3) 3+2.(93:). 3A9. QR-S
University of Southern California
Information Sciences Institute
5Q
ü 45*(2$%3$+76&*$.*"3%+7'%$..*'%$..7=7(9*6&*7"#$%$&'()*+(.+*.(+
ü <3.+7=7()*4$'96:/*$.*$*%(87+7"$+(*(2$%*"(+97'
ü!"#$%&'())$))*$+&,''-$./012''(+3'-45'6789:;9<=
ü!>?+)&#$(*'&()@,'A#>))'B"+CD(B'"+E>#*(&">+'#$&#"$F(B
üBD45*2.*CD45*E3$%7+$+72(*)7==(9(&'(.
!"##$%&
University of Southern California
Information Sciences Institute
!;
References
• Mark Steedman. 2008. On Becoming a Discipline. Computational Linguistics 34, 1 (2008), 137–144. https://doi.org/10.
1162/coli.2008.34.1.137 arXiv: https://doi.org/10.1162/coli.2008.34.1.137
• Gowda, Thamme, and Jonathan May. "Finding the Optimal Vocabulary Size for Neural Machine Translation." Proceedings of the
2020 Conference on Empirical Methods in Natural Language Processing: Findings. 2020.
• Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural Machine Translation of Rare Words with Subword Units. In
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association
for Computational Linguistics, Berlin, Germany, 1715–1725. https://doi.org/10.18653/v1/P16-1162
• Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine
Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for
Computational Linguistics, Philadelphia, Pennsylvania, USA, 311–318. https://doi.org/10.3115/1073083. 1073135
• Maja Popović. 2015. ChrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on
Statistical Machine Translation. Association for Computational Linguistics, Lisbon, Portugal, 392–395. https:
//doi.org/10.18653/v1/W15-3049
• Thibault Sellam, Dipanjan Das, and Ankur Parikh. 2020. BLEURT: Learning Robust Metrics for Text Generation. In Proceedings
of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online,
7881–7892. https://www.aclweb.org/anthology/2020.acl-main.704
• Ilya Zavorin, Aric Bills, Cassian Corey, Michelle Morrison, Audrey Tong, and Richard Tong. 2020. Corpora for Cross-Language
Information Retrieval in Six Less-Resourced Languages. In Proceedings of the workshop on Cross-Language Search and
Summarization of Text and Speech (CLSSTS2020). European Language Resources Association, Marseille, France, 7–13.
https://www.aclweb.org/anthology/2020.clssts-1.2
University of Southern California
Information Sciences Institute
!5
.U'CV,W<T,🙏
• Fork of SacreBLEU: https://github.com/isi-nlp/sacrebleu
• Pull request is under review: https://github.com/mjpost/sacrebleu/pull/153

More Related Content

Similar to Macro average: rare types are important too

η νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμούη νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
Kostas Panagio
 
Courdimanche demain... la suite ! Présentation du 17 octobre 2011
Courdimanche demain... la suite ! Présentation du 17 octobre 2011Courdimanche demain... la suite ! Présentation du 17 octobre 2011
Courdimanche demain... la suite ! Présentation du 17 octobre 2011
courdimanche95
 
G mc ghee data stories
G mc ghee data stories G mc ghee data stories
G mc ghee data stories
padatascience
 

Similar to Macro average: rare types are important too (20)

Where 2.0 -- Get me a mobile strategy or you’re fired!
Where 2.0 -- Get me a mobile strategy or you’re fired!Where 2.0 -- Get me a mobile strategy or you’re fired!
Where 2.0 -- Get me a mobile strategy or you’re fired!
 
Mobile: The Market, The Web and Windows Phone’s Future
Mobile: The Market, The Web and Windows Phone’s Future Mobile: The Market, The Web and Windows Phone’s Future
Mobile: The Market, The Web and Windows Phone’s Future
 
An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...
An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...
An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...
 
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμούη νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
 
Thai Alcoholic Beverages Regulations 2011
Thai Alcoholic Beverages Regulations 2011Thai Alcoholic Beverages Regulations 2011
Thai Alcoholic Beverages Regulations 2011
 
10 in 2010 - How the Internet has changed how kids & tweens consume cultural ...
10 in 2010 - How the Internet has changed how kids & tweens consume cultural ...10 in 2010 - How the Internet has changed how kids & tweens consume cultural ...
10 in 2010 - How the Internet has changed how kids & tweens consume cultural ...
 
Courdimanche demain... la suite ! Présentation du 17 octobre 2011
Courdimanche demain... la suite ! Présentation du 17 octobre 2011Courdimanche demain... la suite ! Présentation du 17 octobre 2011
Courdimanche demain... la suite ! Présentation du 17 octobre 2011
 
The Mythology of Big Data
The Mythology of Big DataThe Mythology of Big Data
The Mythology of Big Data
 
K02-salen: Systems Thinking in Action 2011
K02-salen: Systems Thinking in Action 2011K02-salen: Systems Thinking in Action 2011
K02-salen: Systems Thinking in Action 2011
 
Apostila Ambiente, Energia e Sociedade. Unidade 1.
Apostila Ambiente, Energia e Sociedade. Unidade 1.Apostila Ambiente, Energia e Sociedade. Unidade 1.
Apostila Ambiente, Energia e Sociedade. Unidade 1.
 
2011 ICT &amp; Human Rights Presentation Intro 02
2011 ICT &amp; Human Rights Presentation Intro 022011 ICT &amp; Human Rights Presentation Intro 02
2011 ICT &amp; Human Rights Presentation Intro 02
 
Get Me a Mobile Strategy or You're FIRED!
Get Me a Mobile Strategy or You're FIRED!Get Me a Mobile Strategy or You're FIRED!
Get Me a Mobile Strategy or You're FIRED!
 
G mc ghee data stories
G mc ghee data stories G mc ghee data stories
G mc ghee data stories
 
Ekaw2010 tutorial3 practical
Ekaw2010 tutorial3 practicalEkaw2010 tutorial3 practical
Ekaw2010 tutorial3 practical
 
Personalisation and Coproduction
Personalisation and CoproductionPersonalisation and Coproduction
Personalisation and Coproduction
 
Social media usage change during covid
Social media usage change during covidSocial media usage change during covid
Social media usage change during covid
 
فیزیولوژی انسان – 8
فیزیولوژی انسان – 8فیزیولوژی انسان – 8
فیزیولوژی انسان – 8
 
Skip to main content
Skip to main contentSkip to main content
Skip to main content
 
Tests in vitro de l'action inhibitrice de la croissance de P. falciparum par ...
Tests in vitro de l'action inhibitrice de la croissance de P. falciparum par ...Tests in vitro de l'action inhibitrice de la croissance de P. falciparum par ...
Tests in vitro de l'action inhibitrice de la croissance de P. falciparum par ...
 
Innotech - Get Me a Mobile Strategy or You’re Fired!
Innotech - Get Me a Mobile Strategy or You’re Fired!Innotech - Get Me a Mobile Strategy or You’re Fired!
Innotech - Get Me a Mobile Strategy or You’re Fired!
 

More from Thamme Gowda

Thamme Gowda's PhD dissertation defense slides
Thamme Gowda's PhD dissertation defense slidesThamme Gowda's PhD dissertation defense slides
Thamme Gowda's PhD dissertation defense slides
Thamme Gowda
 

More from Thamme Gowda (9)

Thamme Gowda's PhD dissertation defense slides
Thamme Gowda's PhD dissertation defense slidesThamme Gowda's PhD dissertation defense slides
Thamme Gowda's PhD dissertation defense slides
 
500 languages to English Machine Translation Model
500 languages to English Machine Translation Model500 languages to English Machine Translation Model
500 languages to English Machine Translation Model
 
Large Scale Image Forensics using Tika and Tensorflow [ICMR MFSec 2017]
Large Scale Image Forensics using Tika and Tensorflow [ICMR MFSec 2017]Large Scale Image Forensics using Tika and Tensorflow [ICMR MFSec 2017]
Large Scale Image Forensics using Tika and Tensorflow [ICMR MFSec 2017]
 
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRGData Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
 
Sparkler at spark summit east 2017
Sparkler at spark summit east 2017Sparkler at spark summit east 2017
Sparkler at spark summit east 2017
 
Sparkler - Spark Crawler
Sparkler - Spark Crawler Sparkler - Spark Crawler
Sparkler - Spark Crawler
 
Thamme Gowda's Summer2016- NASA JPL Internship
Thamme Gowda's Summer2016- NASA JPL InternshipThamme Gowda's Summer2016- NASA JPL Internship
Thamme Gowda's Summer2016- NASA JPL Internship
 
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style SimilarityIEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
 
Clustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkClustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache Spark
 

Recently uploaded

Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
Bhagirath Gogikar
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 

Recently uploaded (20)

STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONSTS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 o
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 

Macro average: rare types are important too

  • 1. University of Southern California Information Sciences Institute !"#$%&'()$"*)+, -"$),./0)1,'$),230%$4"54,.%% !"#$$%&'()*# !"#$%&'()$"*+,)-",-.*!".()(/(- 0")1-%.)(2*$#*+$/(3-%"*4'5)#$%")' tg@isi.edu +%,-,. /(. 6-7(*$#*4$&7/(-%*'"8*!"#$%&'()$"*+,)-",- 0")1-%.)(2*$#*9-"".251'")' weiqiuy@seas.upenn.edu 0(123#13,1%&4,51(2 :),3($& +,3$$5*$#*4$&7/(-%*+,)-",- ;%'"8-).*0")1-%.)(2 lignos@brandeis.edu 6(1#3"#1&7#8 !"#$%&'()$"*+,)-",-.*!".()(/(- 0")1-%.)(2*$#*+$/(3-%"*4'5)#$%")' jonmay@isi.edu !"#$#%&#'()&(*++,-(./.0(
  • 2. University of Southern California Information Sciences Institute ! 67087"5,9714$7:;47%5 • !"#$%$&'()*+,-(. "#$%&'$#()#*+,#-&'$.%&/$.0*&1.$2)-3 456&7.8#*-9&4:;<&7=2#-&
  • 3. University of Southern California Information Sciences Institute > !%47("47%5 • !"#$%$&'()*+,-(. • 7#9",1%&:%#;1,15&3%9"1,-.%2& 2.<<%;&<;($&9:#22&,$=#:#19% – !"#$%&'()&*%)+,)-'&'./)0&.)7?.'?7@#?A.B • >#;%&38?%2&#;%&,$?(;3#13 12-)$&(34"'$56(7&##'8)%6(.//9: • @!"#$%&'%$()&*+&,((-(- #.&/%0.$&#1(&$%$(&#23(+ – 1,%2-3&4$5)62-3&!"#$#%&'()*+,-(.(/-01 23234 – !"#$%&'()*+"*,$-%'!"#$%&'()*+ P(x) - log2 P(x) ,-%./&(0&".)1&2).34.05$.%&2)(1&,)('0&6()74%8 9:;&"(*.0%<&9=>?&"@7.%&
  • 4. University of Southern California Information Sciences Institute C <()$(7)= /0 12$%3$+(*45*6&*7"#$%$&'()*)$+$*3.7&8*4$'96:/ ;0 <3.+7=7'$+76&*=69*4$'96:/>* ./ !$#(0,'1"2*%'*33(332(%,' 4/ !-5%3,#(*2',*367'0#-338+$%&"*+'9: ?0 @3$%7+$+72(*)7==(9(&'(.*#(+A((&*B3-(927.()*$&)* C&.3-(927.()*D45>* !"#$%&''(')*+,'*-'')+./01+$)2+3$4&567
  • 5. University of Southern California Information Sciences Institute : >("?;"475*,!.,"1,",!;?47&#?"11,@?"11787)$ %5,23:"?"5#)A,.)14,B)41
  • 6. University of Southern California Information Sciences Institute D C!.,"1,@?"11787)$ • 7T ∶ (𝑥!𝑥"𝑥# … 𝑥$) → (𝑦!𝑦"𝑦# … 𝑦%) • !"#: 𝑦!:% $%&'()*'+,-.()*'+, 𝑥!:$//% – 3$8"("9'+∏!"# $ 𝑃 𝑦! 𝑦%!, 𝑥#:' ; 𝜃) – 3$8"("9'+∏!"# $ 𝑃 𝑦! 𝑓 𝑦%!, 𝑥#:'; 𝜃# ; 𝜃() !"#0%123)+'4+'55)+ 6%78955:;:'+%EF.0GH&H*G&6H=9 !;!;I <= 123)+'4+'55)+0%%h' = 𝑓 𝑦(:)*, 𝑥!:$ ℎ! ∈ 𝑅" ).*,$"()"/$/. >= 78955:;:'+%%%%%0%𝑃 𝑦* ℎ*) 𝑦! ).*8).,%-(-<* =;4<)=>?)"@ABCD(E(. ? F>?&GH<?)$$(<?)$$GIG#"(
  • 7. University of Southern California Information Sciences Institute J @?"11,D)$8%$3"5#) • #'53%5'3@%𝑇 = ℎ + , 𝑦 + 𝑖 = 1,2,3, … 𝑚} $#*>327$(3-.).?*%-#-%-",-@ • 78955'5%9+'%3A'%B)+*%3CD'5@%9;3'+%3)E'.:F93:). – ;('"3(',1('3*2(',-6(%$<(#'*3'=>?@A'*3'$2B+(2(%,(C'$%'D*0#(=>?@ • 𝐶(𝑐, 𝑎) ()2.35%3A'%.2GH'+%);%3)E'.5%);%3CD' c :.%5'I2'.('%a • J+'*5,(/ = ∑+,! $ 𝐶(𝑐, ℎ(+)) ; K';5,(/ = ∑+,! $ 𝐶(𝑐, 𝑦(+)) • "93(A,(/ = ∑+,! $ min{𝐶 𝑐, ℎ + , 𝐶(𝑐, 𝑦(+))} E/KLM9&NH2,*#*, #7&HO&!;;!I • J+'(:5:).@%𝑃/ = 01*/2 / 34567 / L%%%%%%%%%%K'(988@%𝑅/ = 01*/2 / 8597 / • MNG'952+'%D'+%(8955%(0% 𝐹:;/ = 1 + 𝛽 " 3# × 8# :$ × 3#=8#
  • 8. University of Southern California Information Sciences Institute P !"#$%&#'(()*#'((%+%,-)).,-+/-0'1&, OP'+988%D'+;)+G9.('%$%9.%9P'+94'%);%:.*:P:*298%(8955%D'+;)+G9.('5 <= "9(+)N9P'+94'0%2.B':4A3'*%:='=@%'I298%:GD)+39.('%3)%'9(A%3CD' Macro𝐹) = ∑!∈# +$;! |-| :; 3"4&5<$='&$%'>+-'"%?*'2+';%;@+,A+B&'CD')4A>+'CD$E+"(F5&*$)4'+*5+'$4?+*5G') 3𝑖𝑐𝑟𝑜𝐹) = ∑!∈# .! × +$;! ∑ !&∈# .!& 7()%)8&7)23('&*$%&96"008&w' = 𝑅𝑒𝑓𝑠 𝑐 + 𝑘 *$%&0$4)&&𝑘 ≥ 1 o J#(>$#(𝑘 = 1 ; *4&#K(GI(𝑘 → ∞, F𝑖𝑐𝑟𝑜𝐹! → F)<"4𝐹! o J#(>$#( 𝛽 = 1 3#($<)?#(1/L/6(0L/:(&4(1/6(0//:6(M>$&(?GN#(O-PQ
  • 9. University of Southern California Information Sciences Institute Q E;14787#"47%5,8%$,!"#$%FG, "1,"5,!.,)("?;"47%5,3)4$7# èCompare MacroF1 with Ø BLEU and ChrF1 – popular options Ø MicroF1 – so we can see the difference micro and macro avg makes Ø BLEURT – model-based metric based on BERT * BLEURT has undesirable biases, shown in paper (not talking about here)
  • 10. University of Southern California Information Sciences Institute 5; !"#$%F1 (1,<4H)$1 A:B*CD*6EFEG*G-H.B-.(I*:',%$JC*3'.*-K/'5*H-)L3(*#$%*'55*(27-. 6,+$.?HR#$HS#G&%#7$,+-&.R#$O..8&,%2$.R#%#*7-&'$.%&$H$#&7=2#-9&H'7#$&$.)*G,*S&7.&.*#&.$&70.&G#+,%HO- 6&'CD')4A+H(,$E$)4' 0CD$E+-'"%?*
  • 11. University of Southern California Information Sciences Institute 55 I):CJK 9"4"&4%&.)L4,>("?;"47%5 ;TCC ;T:; ;T!P ;TJP ;T!! ;TJ! ;T>> ;TD5 ;TCC ;TP> ;TD5 ;TDJ H/L0 /L0 /LR /LS /LT /LU J5/-",2*'"8*M%'&&'% +-&'"(),. 𝜏 3G&V(W>8)%(X>'Y#8#%&$( ;NE0 43%JC :',%$JC :),%$JC ;NE0OB&-'" ;NE0OB&-8)'" MacroF1 is a poor indicator of Fluency + Grammar, but one of the strong indicators of Semantics Correlation with Fluency, Grammar, and Semantics on English only
  • 12. University of Southern California Information Sciences Institute 5! I!.,!)4$7#1+,I751,0)$,!)4$7# S 0 R Z . R . R [ . . R [ [ S / 0 . R Z S [ T !"#$%&#'( !"#)%&#*( !"#+%&#)( JG%$ O-PQ O-PQ F)<"4]0 FG<"4]0 ,V"]0 • A)".*P*G/&Q-%*$#*()&-.*'*&-(%),*.,$%-8*3)L3-.(*,$%%-5'()$"*H)(3*3/&'"*R/8L-&-"(. • S;NE0*).*#%$&*(3-*A:B*&-(%),.*7',T'L-?*7%-,$&7/(-8*Q2*('.T*$%L'")U-%. • :',%$JC*'"8*:),%$JC*/.-*(3-*.'&-*($T-")U-%*'.*;NE0?*$Q(')"-8*/.)"L*+',%-;NE0 MacroF1 has more wins in the recent year -- when systems are mostly fluent, adequacy is a key discriminator
  • 13. University of Southern California Information Sciences Institute 5> 9%=514$)"3,@J2-,."1M N@JBB.B,OPOPQ /L.T /LS. /L09 /LZ[ /L[Z /LZR /LZ. /LS. /L.[ /LZ. /LZ9 /LRR /LZ. /LSZ /L.U /LZ/ /LSR /L.U /L// /L./ /LZ/ /L[/ /L9/ -^HP* !7HP* O_HP* 𝜏 3G&V(8+! O-PQ F)<"4]0 FG<"4]0 ,V"]0 O-PQ`^F#)% O-PQ`^F#'G)% MacroF1 is the strongest indicator of downstream IR task performance • IR task with queries and docs in different languages • Translate source docs to target language, and match queries with docs • MT metric having strong correlation with IR metric (e.g. mAP) is more useful
  • 14. University of Southern California Information Sciences Institute 5C R;"?74"47(),9788)$)5#),S)4=))5, B;0)$(71)A,"5A,T51;0)$(71)A,C!.
  • 15. University of Southern California Information Sciences Institute 5: BC!.,(1,TC!.+,SJ>T,"5A,!"#$%FG 32.7 24.0 31.1 25.6 30.8 31.2 33.9 24.0 31.2 27.1 29.6 31.0 38.5 24.0 41.6 31.9 40.3 34.6 33.6 23.5 33.6 27.4 33.0 31.0 20 25 30 35 40 45 DE-EN EN-DE FR-EN EN-FR RO-EN EN-RO BLEU and MacroF1 score BLEU SNMT BLEU UNMT MacroF1 SNMT MacroF1 UNMT !"*(-%&.*$#*;NE0?*0G:B*'"8*+G:B*7-%#$%&'",-*).*,$&7'%'Q5-S?* Q/(*:',%$JC*.3$H.*.)L")#),'"(*8)##-%-",-.*Q-(H--"*+G:B*'"8*0G:B A&BC;D&%@%".1%&'.).&5#(%.0&"(&1-"5#&,EFG&%5().%&'$"#&GC;D
  • 16. University of Southern California Information Sciences Institute 5D BC!.,(1,TC!.+,9>&>C Frequent Types: UNMT > SNMT Rare Types: SNMT > UNMT * Only the top 500 types are shown here
  • 17. University of Southern California Information Sciences Institute 5J BC!.,"5A,TC!. MacroF1 diff: 0.044, favors SNMT; BLEU diff: -0.00087, favors UNMT Src "Er ist witzig, er ist sarkastisch, witzig, er liebt es, Leute zu unterhalten, und er ist ein großartiger Kerl, den er im Teamraum haben kann", erklärte er. Ref "He's funny, he's sarcastic, witty, likes to poke fun at people, and he's a great guy to have in the team room," he explained. SNMT "He is funny, he is sarcastic, funny, he loves to talk people, and he is a great guy he can have in the team room," he explained UNMT "He's funny, he's sly, funny, he loves to unterhalten people, and he's a great guy he can have in the team room," he explained. Problems SNMT: punctuation, wrong_verb. UNMT: synonym, untranslation -=4=<0%%"9(+)M< :5 G)+' :.3)8'+9.3%3) 2.3+9.5893:). 3A9. QR-S
  • 18. University of Southern California Information Sciences Institute 5P BC!.,"5A,TC!. MacroF1 diff: 0.044, favors SNMT; BLEU diff: -0.00087, favors UNMT Src Vor 32 Jahren schloss ich mich als Schüler, wegen der Vernachlässigung der Thatcher-Regierung, Labour an. Diese Vernachlässigung hatte dazu geführt, dass mein Klassenzimmer buchstäblich zusammengebrochen war. Infolgedessen habe ich versucht, mich für bessere öffentliche Dienstleistungen für diejenigen einzusetzen, die sie am meisten brauchen. Egal ob als Gemeinderat oder Minister. Ref Ever since I joined Labour 32 years ago as a school pupil, provoked by the Thatcher government‘s neglect that had left my comprehensive school classroom literally falling down, I’ve sought to champion better public services for those who need them most whether as a local councillor or government minister. SNMT 32 years ago, I joined Labour as a student because of the neglect of the Thatcher government, which had led to my classroom literally collapsed, and as a result I tried to promote better public services for those who need it most, whether as a local council or ministers. UNMT Last 32 years ago, as a student, because of the disdain for the Thatcher-era government, Labour joined Labour. Problems SNMT: synonym, word_order UNMT: subject, truncation, word_order -=4= >% "9(+)M< :5 G)+' :.3)8'+9.3%3) 3+2.(93:). 3A9. QR-S
  • 19. University of Southern California Information Sciences Institute 5Q ü 45*(2$%3$+76&*$.*"3%+7'%$..*'%$..7=7(9*6&*7"#$%$&'()*+(.+*.(+ ü <3.+7=7()*4$'96:/*$.*$*%(87+7"$+(*(2$%*"(+97' ü!"#$%&'())$))*$+&,''-$./012''(+3'-45'6789:;9<= ü!>?+)&#$(*'&()@,'A#>))'B"+CD(B'"+E>#*(&">+'#$&#"$F(B üBD45*2.*CD45*E3$%7+$+72(*)7==(9(&'(. !"##$%&
  • 20. University of Southern California Information Sciences Institute !; References • Mark Steedman. 2008. On Becoming a Discipline. Computational Linguistics 34, 1 (2008), 137–144. https://doi.org/10. 1162/coli.2008.34.1.137 arXiv: https://doi.org/10.1162/coli.2008.34.1.137 • Gowda, Thamme, and Jonathan May. "Finding the Optimal Vocabulary Size for Neural Machine Translation." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings. 2020. • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 1715–1725. https://doi.org/10.18653/v1/P16-1162 • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 311–318. https://doi.org/10.3115/1073083. 1073135 • Maja Popović. 2015. ChrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation. Association for Computational Linguistics, Lisbon, Portugal, 392–395. https: //doi.org/10.18653/v1/W15-3049 • Thibault Sellam, Dipanjan Das, and Ankur Parikh. 2020. BLEURT: Learning Robust Metrics for Text Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 7881–7892. https://www.aclweb.org/anthology/2020.acl-main.704 • Ilya Zavorin, Aric Bills, Cassian Corey, Michelle Morrison, Audrey Tong, and Richard Tong. 2020. Corpora for Cross-Language Information Retrieval in Six Less-Resourced Languages. In Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020). European Language Resources Association, Marseille, France, 7–13. https://www.aclweb.org/anthology/2020.clssts-1.2
  • 21. University of Southern California Information Sciences Institute !5 .U'CV,W<T,🙏 • Fork of SacreBLEU: https://github.com/isi-nlp/sacrebleu • Pull request is under review: https://github.com/mjpost/sacrebleu/pull/153