generateWikiConfig won't work for some languages #159

Closed
claeyzre opened this issue Jun 8, 2018 · 17 comments · Fixed by #202

Comments

@claeyzre

claeyzre commented Jun 8, 2018

Hi,

I am trying to parse wikitext from many Wikipedias. For some languages, the configs produced by generateWikiConfig work fine, but for others, for example Japanese, I end up with stacktraces like the ones below. I am using "org.sweble.wikitext" % "swc-engine" % "3.1.7". Judging from the existing issues, the related bugs should have been fixed in 3.1.7.

java.lang.IllegalArgumentException: The name `sub' was already registered by the alias `sub' when trying to register it for alias `img_sub'.
	at org.sweble.wikitext.engine.config.WikiConfigImpl.addI18nAlias(WikiConfigImpl.java:386)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.addi18NAliases(LanguageConfigGenerator.java:204)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:112)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:78)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:63)
	at PageBuilder$.apply(PageBuilder.scala:59)
	at Xml2FasttextDS$.xml2Dataset(Xml2FasttextDS.scala:27)
	at Xml2FasttextDS$.main(Xml2FasttextDS.scala:139)
	at Xml2FasttextDS.main(Xml2FasttextDS.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sbt.Run.invokeMain(Run.scala:67)
	at sbt.Run.run0(Run.scala:61)
	at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
	at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
	at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
	at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
	at sbt.Logger$$anon$4.apply(Logger.scala:84)
	at sbt.TrapExit$App.run(TrapExit.scala:248)
	at java.lang.Thread.run(Thread.java:745)
java.lang.IllegalArgumentException: The name `名前空間:' was already registered by the alias `namespace' when trying to register it for alias `ns'.
	at org.sweble.wikitext.engine.config.WikiConfigImpl.addI18nAlias(WikiConfigImpl.java:386)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.addi18NAliases(LanguageConfigGenerator.java:204)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:112)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:78)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:63)
	at PageBuilder$.apply(PageBuilder.scala:59)
	at Xml2FasttextDS$.xml2Dataset(Xml2FasttextDS.scala:27)
	at Xml2FasttextDS$.main(Xml2FasttextDS.scala:139)
	at Xml2FasttextDS.main(Xml2FasttextDS.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sbt.Run.invokeMain(Run.scala:67)
	at sbt.Run.run0(Run.scala:61)
	at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
	at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
	at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
	at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
	at sbt.Logger$$anon$4.apply(Logger.scala:84)
	at sbt.TrapExit$App.run(TrapExit.scala:248)
	at java.lang.Thread.run(Thread.java:745)
java.lang.IllegalArgumentException: The name `noerror' was already registered by the alias `defaultsort_noerror' when trying to register it for alias `displaytitle_noerror'.
	at org.sweble.wikitext.engine.config.WikiConfigImpl.addI18nAlias(WikiConfigImpl.java:386)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.addi18NAliases(LanguageConfigGenerator.java:204)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:112)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:78)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:63)
	at PageBuilder$.apply(PageBuilder.scala:59)
	at Xml2FasttextDS$.xml2Dataset(Xml2FasttextDS.scala:27)
	at Xml2FasttextDS$.main(Xml2FasttextDS.scala:139)
	at Xml2FasttextDS.main(Xml2FasttextDS.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sbt.Run.invokeMain(Run.scala:67)
	at sbt.Run.run0(Run.scala:61)
	at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
	at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
	at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
	at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
	at sbt.Logger$$anon$4.apply(Logger.scala:84)
	at sbt.TrapExit$App.run(TrapExit.scala:248)
	at java.lang.Thread.run(Thread.java:745)
java.lang.IllegalArgumentException: The name `noreplace' was already registered by the alias `defaultsort_noreplace' when trying to register it for alias `displaytitle_noreplace'.
	at org.sweble.wikitext.engine.config.WikiConfigImpl.addI18nAlias(WikiConfigImpl.java:386)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.addi18NAliases(LanguageConfigGenerator.java:204)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:112)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:78)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:63)
	at PageBuilder$.apply(PageBuilder.scala:59)
	at Xml2FasttextDS$.xml2Dataset(Xml2FasttextDS.scala:27)
	at Xml2FasttextDS$.main(Xml2FasttextDS.scala:139)
	at Xml2FasttextDS.main(Xml2FasttextDS.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sbt.Run.invokeMain(Run.scala:67)
	at sbt.Run.run0(Run.scala:61)
	at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
	at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
	at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
	at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
	at sbt.Logger$$anon$4.apply(Logger.scala:84)
	at sbt.TrapExit$App.run(TrapExit.scala:248)
	at java.lang.Thread.run(Thread.java:745)
java.lang.IllegalArgumentException: No alias registered for parser function `ns'.
	at org.sweble.wikitext.engine.config.WikiConfigImpl.addParserFunction(WikiConfigImpl.java:449)
	at org.sweble.wikitext.engine.config.WikiConfigImpl.addParserFunctionGroup(WikiConfigImpl.java:430)
	at org.sweble.wikitext.engine.utils.DefaultConfig.addParserFunctions(DefaultConfig.java:573)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:114)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:78)
	at org.sweble.wikitext.engine.utils.LanguageConfigGenerator.generateWikiConfig(LanguageConfigGenerator.java:63)
	at PageBuilder$.apply(PageBuilder.scala:59)
	at Xml2FasttextDS$.xml2Dataset(Xml2FasttextDS.scala:27)
	at Xml2FasttextDS$.main(Xml2FasttextDS.scala:139)
	at Xml2FasttextDS.main(Xml2FasttextDS.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sbt.Run.invokeMain(Run.scala:67)
	at sbt.Run.run0(Run.scala:61)
	at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
	at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
	at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
	at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
	at sbt.Logger$$anon$4.apply(Logger.scala:84)
	at sbt.TrapExit$App.run(TrapExit.scala:248)
	at java.lang.Thread.run(Thread.java:745)
[success] Total time: 1 s, completed Jun 8, 2018 11:00:56 AM

Can you tell me whether it is possible to generate a Japanese WikiConfig with this method, or should I create it manually?

Thanks for your help,

@mawiesne
Contributor

I can confirm the stacktrace output for languages other than Japanese: German, French, and others are also affected. It seems that Sweble in version 3.1.7 (or earlier) has issues at WikiConfigImpl.addI18nAlias(WikiConfigImpl.java:386). The code that prints these errors to the standard console even has a documented TODO stating

// TODO resolve conflicts problem

see: LanguageConfigGenerator.addi18NAliases(LanguageConfigGenerator.java:204) and below.

Besides the actual problem with multiple registrations ("was already registered by the alias"), it would be better to use a logging framework here; otherwise the errors will not be captured in log files in a server environment (e.g., application containers).
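
As a rough sketch of that suggestion, a catch block could hand the exception to a logger instead of printing it to the console. The example uses `java.util.logging` only because it ships with the JDK; `AliasLoggingSketch`, `registerQuietly`, and the `Runnable`-based registration step are hypothetical names for illustration and do not reflect Sweble's actual code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; NOT part of Sweble or JWPL.
class AliasLoggingSketch {
    private static final Logger LOG =
        Logger.getLogger(AliasLoggingSketch.class.getName());

    // Runs one registration step. On a conflict, logs a warning together with
    // the exception instead of calling e.printStackTrace().
    // Returns true if the registration succeeded.
    static boolean registerQuietly(String aliasId, Runnable registration) {
        try {
            registration.run();
            return true;
        } catch (IllegalArgumentException e) {
            LOG.log(Level.WARNING, "Skipping alias " + aliasId, e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated conflict, standing in for a failing alias registration.
        registerQuietly("img_sub", () -> {
            throw new IllegalArgumentException(
                "The name `sub' was already registered");
        });
    }
}
```

With a logger, the configured handlers decide where the message ends up (console, file, container log), which is the point raised above.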

Can @ferschke or @reckart have a look at this? Or: contact Samy Ateia, samyateia [at] hotmail.de

@reckart
Member

reckart commented Jun 22, 2018

@mawiesne I'm trying to help out a bit with PRs, build infrastructure, etc., and I could even try running a release, but unfortunately I have no spare resources to actually work on the code.

@tgalery
Contributor

tgalery commented Jun 22, 2018

FYI, I took a look at the code and this is not an error per se. The language configurator tries to register aliases for some specific tags and prints the stacktrace if a name is already in use. The problem lies in Sweble (sweble/sweble-wikitext#72), not in the DKPro code. That being said, you should be able to create a Japanese config even with these stacktraces being printed.

@claeyzre
Author

For some languages, the stacktraces are just printed, the config seems correct, and parsing goes on. For other languages, like Japanese, the result is not just a printed stacktrace but a real exception, and no config is created.
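
The difference between the two outcomes can be illustrated with a toy model. This is one reading of the stacktraces in this issue, not something confirmed against the Sweble sources: alias conflicts appear to be caught and printed, while the later `No alias registered for parser function` error is not caught. `ToyWikiConfig` and `AliasConflictDemo` are hypothetical stand-ins; only the message formats are copied from the traces, and `NS_JA:` stands in for the localized Japanese name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy registry; it is NOT Sweble's WikiConfigImpl.
class ToyWikiConfig {
    // Maps each localized name to the alias that first claimed it.
    private final Map<String, String> nameToAlias = new HashMap<>();

    // Rejects a name that a different alias has already claimed.
    void addI18nAlias(String aliasId, String name) {
        String owner = nameToAlias.get(name);
        if (owner != null && !owner.equals(aliasId)) {
            throw new IllegalArgumentException("The name `" + name
                + "' was already registered by the alias `" + owner
                + "' when trying to register it for alias `" + aliasId + "'.");
        }
        nameToAlias.put(name, aliasId);
    }

    // A parser function needs its alias to exist; otherwise this throws.
    void addParserFunction(String aliasId) {
        if (!nameToAlias.containsValue(aliasId)) {
            throw new IllegalArgumentException(
                "No alias registered for parser function `" + aliasId + "'.");
        }
    }
}

class AliasConflictDemo {
    // Registers {aliasId, name} pairs, swallowing conflicts, then adds the
    // parser function "ns". Returns the messages of the swallowed conflicts.
    static List<String> generate(ToyWikiConfig config, String[][] aliases) {
        List<String> swallowed = new ArrayList<>();
        for (String[] a : aliases) {
            try {
                config.addI18nAlias(a[0], a[1]);
            } catch (IllegalArgumentException e) {
                swallowed.add(e.getMessage()); // reported and ignored
            }
        }
        config.addParserFunction("ns"); // not guarded: propagates to the caller
        return swallowed;
    }

    public static void main(String[] args) {
        // Conflict on an unrelated alias: generation still succeeds.
        String[][] harmless = {{"sub", "sub"}, {"img_sub", "sub"}, {"ns", "ns:"}};
        System.out.println(generate(new ToyWikiConfig(), harmless));

        // Conflict on the only name of "ns": the alias is never registered,
        // so the parser function step fails and no config is produced.
        String[][] fatal = {{"namespace", "NS_JA:"}, {"ns", "NS_JA:"}};
        try {
            generate(new ToyWikiConfig(), fatal);
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model, a conflict is harmless as long as the losing alias is not needed later; it becomes fatal when the alias that lost the conflict is required by a parser function.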

@mawiesne
Contributor

mawiesne commented Jun 22, 2018

@reckart Thanks for your support. Just want to raise awareness that both #159 and #160 should be tackled before a release of 1.2.0 is conducted. IMHO #160 is a blocker atm. For #159 I'd say it could be compensated (by ignoring it) even though @claeyzre might have a point that things can fail later during runtime. @claeyzre Can you please add a full stacktrace of the exceptions you encounter "later on" for a Japanese language/environment?

@claeyzre
Author

claeyzre commented Jun 22, 2018

@mawiesne The stacktrace is the one I gave you. The thing is that sometimes it is just printed and my program carries on; sometimes the program stops right after the stacktrace is printed.

To reproduce, you can execute this (Scala) code:

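import org.sweble.wikitext.engine.config.WikiConfig
import org.sweble.wikitext.engine.WtEngineImpl
import org.sweble.wikitext.engine.utils.LanguageConfigGenerator

// Language prefix of the target Wikipedia, e.g. "ja" for Japanese.
val language = "ja"
// For "ja", this call is where the IllegalArgumentException is thrown.
val config: WikiConfig = LanguageConfigGenerator.generateWikiConfig(language)
val engine = new WtEngineImpl(config)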

with language being a prefix like 'ja'.

@mawiesne mawiesne added the major label Jul 3, 2018
@mawiesne
Contributor

mawiesne commented Jul 17, 2018

@tgalery Could you look into this issue, providing a workaround or fix?

@tgalery
Contributor

tgalery commented Jul 19, 2018

Will try to reproduce the bug first. But probably this would be a fix in sweble.

@mawiesne
Contributor

Sounds reasonable. If you could provide a test case that demonstrates this scenario that would be of great help.

@mawiesne
Contributor

@reckart Why can't I assign @tgalery for this issue? Is he missing some rights?

@reckart
Member

reckart commented Jul 19, 2018

@tgalery I have added you to the JWPL dev team so @mawiesne can assign issues to you. This also allows you to create PRs directly in the JWPL repo. However, the settings (should) prevent pushes directly to the master branch - PRs are required and need to be approved as usual.

@reckart
Member

reckart commented Jul 19, 2018

@tgalery oh, and you can approve PRs by others now.

@tgalery
Contributor

tgalery commented Jul 25, 2018

FYI, I've confirmed the bug. For some languages, e.g. German, we can create the config, whereas for others, like Japanese, we cannot. I still have the feeling that the best place to fix it would be in Sweble, so I will create an issue there and reference it here. If things get nasty too quickly, I might just handle the exception and return the English config for the languages that throw the error.
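
A minimal sketch of that fallback idea, under stated assumptions: `ConfigFallbackSketch` and `generateOrFallback` are hypothetical, and the `generator` argument is a stand-in for whatever call produces the config for a language prefix. It is not the code from the eventual fix.

```java
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; NOT the actual JWPL or Sweble implementation.
class ConfigFallbackSketch {
    private static final Logger LOG =
        Logger.getLogger(ConfigFallbackSketch.class.getName());

    // Tries the requested language first. If generation fails with an
    // IllegalArgumentException, logs the failure and retries with "en".
    static <C> C generateOrFallback(String language, Function<String, C> generator) {
        try {
            return generator.apply(language);
        } catch (IllegalArgumentException e) {
            LOG.log(Level.WARNING,
                "Config generation failed for '" + language + "', using 'en'", e);
            return generator.apply("en");
        }
    }

    public static void main(String[] args) {
        // Toy generator that fails for Japanese, mimicking the reported error.
        Function<String, String> toy = lang -> {
            if (lang.equals("ja")) {
                throw new IllegalArgumentException(
                    "No alias registered for parser function `ns'.");
            }
            return "config[" + lang + "]";
        };
        System.out.println(generateOrFallback("de", toy));
        System.out.println(generateOrFallback("ja", toy));
    }
}
```

The trade-off is that an English config presumably will not recognize localized namespace names or magic words in Japanese wikitext, so this keeps the parser running rather than giving a faithful parse.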

@tgalery
Contributor

tgalery commented Jul 25, 2018

And here's a note to self:

scala> lastException.getClass.getCanonicalName
res5: String = java.lang.IllegalArgumentException

scala> lastException.getCause
res6: Throwable = null

@mawiesne
Contributor

mawiesne commented Aug 2, 2018

@tgalery Could you cross-link the issue in Sweble in here?

@tgalery
Contributor

tgalery commented Aug 3, 2018

Sure, the issue is sweble/sweble-wikitext#72. I've been swamped this week, but should crack on with it next week.

@tgalery
Contributor

tgalery commented Sep 28, 2018

Can someone test PR #202 to confirm this issue is gone?
