
Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE, : 'data' must be of a vector type, was 'NULL' #7

Closed
@MarcinKosinski

Description

@MarcinKosinski

Does this debug=TRUE output help you understand what causes the error?

> tagged.results <- treetag(c("run", "ran", "running"), treetagger="manual", format="obj",
+                           TT.tknz=FALSE , lang="en",
+                           debug = TRUE,
+                           TT.options=list(path="TreeTagger", preset="en"))
split=[[:space:]]
ign.comp=-
heuristics=abbr
heur.fix=c("’", "'"), c("’", "'")
sentc.end=., !, ?, ;, :
detect=FALSE, FALSE
clean.raw=
perl=FALSE
stopwords=
stemmer=
Assuming 'UTF-8' as encoding for the input file. If the results turn out to be erroneous, check the file for invalid characters, e.g. em.dashes or fancy quotes, and/or consider setting 'encoding' manually.
 
        TT.tokenizer:  koRpus::tokenize() 
				tempfile: C:\Users\Marcin\AppData\Local\Temp\Rtmp2PQ5Ts\tokenizef94305e2e24.txt 
        file:  C:\Users\Marcin\AppData\Local\Temp\Rtmp2PQ5Ts\tempTextFromObjectf942ee2415d.txt 
        TT.lookup.command:   
        TT.pre.tagger:   
        TT.tagger:  TreeTagger/bin/tree-tagger.exe 
        TT.opts:  -token -lemma -sgml -pt-with-lemma -quiet 
        TT.params:  TreeTagger/lib/english-utf8.par 
        TT.filter.command:  | perl -pe 's/\tV[BDHV]/\tVB/;s/IN\/that/\tIN/;' 

        sys.tt.call:  type  C:\Users\Marcin\AppData\Local\Temp\Rtmp2PQ5Ts\tokenizef94305e2e24.txt |   TreeTagger/bin/tree-tagger.exe TreeTagger/lib/english-utf8.par -token -lemma -sgml -pt-with-lemma -quiet | perl -pe 's/\tV[BDHV]/\tVB/;s/IN\/that/\tIN/;' 

Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE,  : 
  'data' must be of a vector type, was 'NULL'
In addition: Warning message:
running command 'C:\Windows\system32\cmd.exe /c type  C:\Users\Marcin\AppData\Local\Temp\Rtmp2PQ5Ts\tokenizef94305e2e24.txt |   TreeTagger\bin\tree-tagger.exe TreeTagger\lib\english-utf8.par -token -lemma -sgml -pt-with-lemma -quiet | perl -pe 's\\tV[BDHV]\\tVB\;s\IN\\that\\tIN\;'' had status 9 
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250    LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
[5] LC_TIME=Polish_Poland.1250    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] koRpus_0.10-2       data.table_1.9.6    gradientr_0.0.1     RWeka_0.4-33        tm_0.7-1            NLP_0.1-10         
 [7] stringi_1.1.5       NbClust_3.0         cluster_2.0.5       factoextra_1.0.4    foreach_1.4.3       openxlsx_3.0.0     
[13] networkD3_0.3       VennDiagram_1.6.17  futile.logger_1.4.3 Boruta_5.2.0        ranger_0.6.0        scales_0.4.1       
[19] ggmosaic_0.1.2      productplots_0.1.1  corrplot_0.77       stringr_1.2.0       magrittr_1.5        dplyr_0.5.0        
[25] purrr_0.2.2         readr_1.0.0         tidyr_0.6.1         tibble_1.2          tidyverse_1.0.0     readxl_0.1.1       
[31] haven_1.0.0         plyr_1.8.4          tables_0.8          Hmisc_4.0-2         ggplot2_2.2.1       Formula_1.2-1      
[37] survival_2.40-1     lattice_0.20-34    

loaded via a namespace (and not attached):
 [1] devtools_1.12.0      RColorBrewer_1.1-2   httr_1.2.1           tools_3.3.1          backports_1.0.4      R6_2.2.0            
 [7] rpart_4.1-10         DBI_0.5-1            lazyeval_0.2.0       colorspace_1.3-1     nnet_7.3-12          withr_1.0.2         
[13] sp_1.2-4             gridExtra_2.2.1      compiler_3.3.1       chron_2.3-47         htmlTable_1.9        flashClust_1.01-2   
[19] plotly_4.5.6         labeling_0.3         slam_0.1-40          checkmate_1.8.2      digest_0.6.10        foreign_0.8-67      
[25] ca_0.70              base64enc_0.1-3      jpeg_0.1-8           htmltools_0.3.5      maps_3.1.1           RWekajars_3.9.1-3   
[31] FactoMineR_1.35      htmlwidgets_0.8      jsonlite_1.1         acepack_1.4.1        wordcloud_2.5        leaps_3.0           
[37] geosphere_1.5-5      Matrix_1.2-7.1       Rcpp_0.12.8          munsell_0.4.3        proto_1.0.0          scatterplot3d_0.3-39
[43] MASS_7.3-45          parallel_3.3.1       ggrepel_0.6.5        splines_3.3.1        mapproj_1.2-4        knitr_1.15          
[49] igraph_1.0.1         rjson_0.2.15         reshape2_1.4.2       codetools_0.2-15     futile.options_1.0.0 kohonen_3.0.2       
[55] latticeExtra_0.6-28  lambda.r_1.1.9       spam_1.4-0           png_0.1-7            RgoogleMaps_1.4.1    gtable_0.2.0        
[61] assertthat_0.1       viridisLite_0.1.3    rJava_0.9-8          iterators_1.0.8      memoise_1.0.0        fields_8.10         
[67] ggmap_2.6.1 

Activity

unDocUMeantIt

unDocUMeantIt commented on Apr 14, 2017

@unDocUMeantIt
Owner

the path to TreeTagger is wrong. try an absolute path beginning with the drive letter.
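One quick way to check that an absolute TreeTagger path is complete is to test for the two files the debug output refers to (TT.tagger and TT.params). The helper below is a hypothetical sketch, not a koRpus function. It assumes the Windows layout seen in this thread, with bin/tree-tagger.exe and lib/english-utf8.par below the TreeTagger root.

```r
# Hypothetical helper (not part of koRpus): report which of the files named in
# the treetag() debug output exist below a given TreeTagger root directory.
check_tt_install <- function(path,
                             files = c("bin/tree-tagger.exe",
                                       "lib/english-utf8.par")) {
  # file.path() joins with "/", which R's own file functions accept on Windows
  full <- file.path(path, files)
  setNames(file.exists(full), files)
}

# Usage with an absolute path beginning with the drive letter:
# check_tt_install("C:/TreeTagger")
```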

MarcinKosinski

MarcinKosinski commented on Apr 14, 2017

@MarcinKosinski
Author

Hello, thanks for the fast reply. I have TreeTagger both in my current working directory and in C:/

> list.files('C:/TreeTagger')
[1] "bin"          "cmd"          "INSTALL.txt"  "INSTALL.txt~" "lib"          "README.txt"
> list.files('/TreeTagger')
character(0)
> list.files('TreeTagger')
[1] "bin"          "cmd"          "INSTALL.txt"  "INSTALL.txt~" "lib"          "README.txt"  

With the absolute path, the result is the same:

> tagged.results <- treetag(c("run", "ran", "running"), treetagger="manual", format="obj",
+                           TT.tknz=FALSE , lang="en",
+                           debug = TRUE,
+                           TT.options=list(path="C:/TreeTagger", preset="en"))
split=[[:space:]]
ign.comp=-
heuristics=abbr
heur.fix=c("’", "'"), c("’", "'")
sentc.end=., !, ?, ;, :
detect=FALSE, FALSE
clean.raw=
perl=FALSE
stopwords=
stemmer=
Assuming 'UTF-8' as encoding for the input file. If the results turn out to be erroneous, check the file for invalid characters, e.g. em.dashes or fancy quotes, and/or consider setting 'encoding' manually.
 
        TT.tokenizer:  koRpus::tokenize() 
				tempfile: C:\Users\Marcin\AppData\Local\Temp\Rtmp2PQ5Ts\tokenizef9422ad2921.txt 
        file:  C:\Users\Marcin\AppData\Local\Temp\Rtmp2PQ5Ts\tempTextFromObjectf947302209a.txt 
        TT.lookup.command:   
        TT.pre.tagger:   
        TT.tagger:  C:/TreeTagger/bin/tree-tagger.exe 
        TT.opts:  -token -lemma -sgml -pt-with-lemma -quiet 
        TT.params:  C:/TreeTagger/lib/english-utf8.par 
        TT.filter.command:  | perl -pe 's/\tV[BDHV]/\tVB/;s/IN\/that/\tIN/;' 

        sys.tt.call:  type  C:\Users\Marcin\AppData\Local\Temp\Rtmp2PQ5Ts\tokenizef9422ad2921.txt |   C:/TreeTagger/bin/tree-tagger.exe C:/TreeTagger/lib/english-utf8.par -token -lemma -sgml -pt-with-lemma -quiet | perl -pe 's/\tV[BDHV]/\tVB/;s/IN\/that/\tIN/;' 

Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE,  : 
  'data' must be of a vector type, was 'NULL'
In addition: Warning message:
running command 'C:\Windows\system32\cmd.exe /c type  C:\Users\Marcin\AppData\Local\Temp\Rtmp2PQ5Ts\tokenizef9422ad2921.txt |   C:\TreeTagger\bin\tree-tagger.exe C:\TreeTagger\lib\english-utf8.par -token -lemma -sgml -pt-with-lemma -quiet | perl -pe 's\\tV[BDHV]\\tVB\;s\IN\\that\\tIN\;'' had status 9 

I have installed Perl and downloaded the english-utf8.par file (it is in the lib/ directory).

unDocUMeantIt

unDocUMeantIt commented on Apr 14, 2017

@unDocUMeantIt
Owner

i see (and i was wondering why treetag() didn't complain about missing files, but of course it won't if they're not missing...).

what's inside the tagged.results object? if all goes well, it's a matrix with three columns.

can you open a command line and execute the full line after sys.tt.call:, beginning with type? what does it return? the "was 'NULL'" error usually occurs if TreeTagger doesn't return what koRpus is expecting, which is a character vector with tab-separated values (it should look like three columns in the terminal).

[the command does work on my linux machine. but apart from your actual issue, it seems TT.tknz=FALSE cuts off the last character of the input vector -- i need to investigate this.]
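The "was 'NULL'" message comes from the parsing step quoted in the error itself. When the shell call produces no output, an `intern = TRUE` call typically yields an empty character vector; `strsplit()` turns that into an empty list, `unlist()` turns that into `NULL`, and `matrix()` refuses `NULL`. The sketch below reproduces this and adds a hypothetical defensive parser (`parse_tt_output()` is not a koRpus function) that fails with a clearer message instead.

```r
# Simulated TreeTagger output: an empty character vector, as returned by an
# intern = TRUE shell call that wrote nothing to stdout.
tagged.text <- character(0)

# The expression from the error message fails because
# strsplit() -> list(), unlist(list()) -> NULL, matrix(NULL, ...) -> error.
msg <- tryCatch(
  matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical defensive version: validate the output before reshaping it.
parse_tt_output <- function(tagged.text) {
  if (length(tagged.text) == 0L) {
    stop("TreeTagger returned no output; run the 'sys.tt.call' command ",
         "in a terminal to see why.", call. = FALSE)
  }
  fields <- strsplit(tagged.text, "\t", fixed = TRUE)
  bad <- which(lengths(fields) != 3L)
  if (length(bad) > 0L) {
    stop("expected 3 tab-separated columns; check line(s) ",
         paste(bad, collapse = ", "), call. = FALSE)
  }
  matrix(unlist(fields), ncol = 3, byrow = TRUE,
         dimnames = list(NULL, c("token", "tag", "lemma")))
}

parse_tt_output(c("run\tNN\trun", "ran\tVVD\trun"))
```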

MarcinKosinski

MarcinKosinski commented on Apr 14, 2017

@MarcinKosinski
Author

As usual in R, when a call ends with an error the result is never assigned, so I get: Error: object 'tagged.results' not found.

MarcinKosinski

MarcinKosinski commented on Apr 14, 2017

@MarcinKosinski
Author

I cannot run anything after sys.tt.call: (beginning with type), as it requires temporary files that I no longer have and that are probably generated from the source vector c("run", "ran", "running").


MarcinKosinski

MarcinKosinski commented on Apr 14, 2017

@MarcinKosinski
Author

But it looks like the regular TreeTagger (not invoked from R) works properly (even though I didn't specify the final file to be lemmatized)


unDocUMeantIt

unDocUMeantIt commented on Apr 14, 2017

@unDocUMeantIt
Owner

debug=TRUE should actually keep the temp files as long as the R session is running. did you close your session in the meantime?
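The temp files named in the debug output (tokenize….txt and tempTextFromObject….txt) live in the per-session directory returned by `tempdir()`, so if they were kept they can be listed from the same R session. A sketch, where the file-name pattern is an assumption taken from the debug output above:

```r
# List files in this R session's temporary directory whose names match the
# prefixes seen in the treetag() debug output.
find_tt_tempfiles <- function(dir = tempdir()) {
  list.files(dir, pattern = "^(tokenize|tempTextFromObject).*\\.txt$",
             full.names = TRUE)
}

# Usage: show the first few lines of each kept file
for (f in find_tt_tempfiles()) {
  cat("==>", f, "\n")
  print(head(readLines(f, warn = FALSE)))
}
```

If this returns nothing right after a failed treetag() call with debug=TRUE, that would support the hypothesis that the tempfile was never written.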

MarcinKosinski

MarcinKosinski commented on Apr 14, 2017

@MarcinKosinski
Author

I didn't. Maybe it does not keep them when the error appears? I am lemmatizing from the command line anyway :)

unDocUMeantIt

unDocUMeantIt commented on Apr 14, 2017

@unDocUMeantIt
Owner

no, tempfiles should be kept, at least i'm sure they were in the past, because that's how we've been debugging these issues for a long time.

which brings me to the hypothesis that maybe generating the tempfile doesn't work for you in the first place? if the file can't be written, for whatever reason, then no tagging could be done.

have you successfully used koRpus earlier? just to see if this is something that was introduced with the last release.

unDocUMeantIt

unDocUMeantIt commented on Apr 14, 2017

@unDocUMeantIt
Owner

have you checked that perl is in your path on the command line? even if TreeTagger works, the following perl filter might break the full call. this should also cause an error if you try to use TreeTagger's tokenizer or the batch scripts that TreeTagger is usually run with.
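Whether R itself can see perl on its PATH can be checked from the same session with `Sys.which()`, which returns an empty string for commands it cannot find. Note that this checks the PATH R inherited, which may differ from the PATH in a separately opened terminal (for example if R or RStudio was started before Perl was installed).

```r
# Look up each program used in the sys.tt.call pipeline on R's PATH.
# An empty string means the program was not found.
tools_needed <- c("perl")
found <- Sys.which(tools_needed)
print(found)
missing <- names(found)[found == ""]
if (length(missing) > 0L) {
  message("Not found on R's PATH: ", paste(missing, collapse = ", "))
}

# R's view of PATH, one entry per element (";"-separated on Windows)
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep,
                         fixed = TRUE)[[1]]
print(path_entries)
```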

MarcinKosinski

MarcinKosinski commented on Apr 14, 2017

@MarcinKosinski
Author

@unDocUMeantIt it was the issue of not being able to create a temporary file.

Below is an example of another character string for which treetag() works:

> library(koRpus)
> tagged.results <- treetag(file = c('TreeTagger/texts/T1_to_be_lemmatized.txt'),
+                             treetagger="manual", format="obj",
+                             TT.tknz=FALSE , lang="en",
+                             debug = TRUE,
+                             TT.options=list(path="TreeTagger", preset="en"))
split=[[:space:]]
ign.comp=-
heuristics=abbr
heur.fix=c("’", "'"), c("’", "'")
sentc.end=., !, ?, ;, :
detect=FALSE, FALSE
clean.raw=
perl=FALSE
stopwords=
stemmer=
Assuming 'UTF-8' as encoding for the input file. If the results turn out to be erroneous, check the file for invalid characters, e.g. em.dashes or fancy quotes, and/or consider setting 'encoding' manually.
 
        TT.tokenizer:  koRpus::tokenize() 
				tempfile: C:\Users\Marcin\AppData\Local\Temp\RtmpIxHmph\tokenize20681e628eb.txt 
        file:  C:\Users\Marcin\AppData\Local\Temp\RtmpIxHmph\tempTextFromObject20682a867b18.txt 
        TT.lookup.command:   
        TT.pre.tagger:   
        TT.tagger:  TreeTagger/bin/tree-tagger.exe 
        TT.opts:  -token -lemma -sgml -pt-with-lemma -quiet 
        TT.params:  TreeTagger/lib/english-utf8.par 
        TT.filter.command:  | perl -pe 's/\tV[BDHV]/\tVB/;s/IN\/that/\tIN/;' 

        sys.tt.call:  type  C:\Users\Marcin\AppData\Local\Temp\RtmpIxHmph\tokenize20681e628eb.txt |   TreeTagger/bin/tree-tagger.exe TreeTagger/lib/english-utf8.par -token -lemma -sgml -pt-with-lemma -quiet | perl -pe 's/\tV[BDHV]/\tVB/;s/IN\/that/\tIN/;' 

> tagged.results
      token        tag    lemma      
 [1,] "TreeTagger" "NN"   "<unknown>"
 [2,] "/"          "SYM"  "/"        
 [3,] "texts"      "NNS"  "text"     
 [4,] "/"          "SYM"  "/"        
 [5,] "T1"         "NP"   "<unknown>"
 [6,] "_"          "SYM"  "_"        
 [7,] "to"         "TO"   "to"       
 [8,] "_"          "SYM"  "_"        
 [9,] "be"         "VB"   "be"       
[10,] "_"          "SYM"  "_"        
[11,] "lemmatized" "JJ"   "<unknown>"
[12,] "."          "SENT" "."        
[13,] "txt"        "NN"   "<unknown>"

Maybe treetag() should report it when the temp file couldn't be created?
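One way to do that would be to check the write step explicitly. The sketch below is hypothetical (`write_tt_input()` is not koRpus code): it writes a character vector, one element per line, to a file created with `tempfile()`, and stops with a specific message if the file could not be written.

```r
# Hypothetical helper: write text to a temp file and fail loudly if that did
# not work, instead of letting an empty TreeTagger result surface later.
write_tt_input <- function(txt, dir = tempdir(), prefix = "tokenize") {
  if (length(txt) == 0L) stop("nothing to write", call. = FALSE)
  f <- tempfile(pattern = prefix, tmpdir = dir, fileext = ".txt")
  ok <- tryCatch({
    writeLines(txt, f)
    TRUE
  }, warning = function(w) FALSE, error = function(e) FALSE)
  if (!ok || !file.exists(f) || file.info(f)$size == 0) {
    stop("could not create temporary input file: ", f, call. = FALSE)
  }
  f
}

f <- write_tt_input(c("run", "ran", "running"))
readLines(f)
```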

MarcinKosinski

MarcinKosinski commented on Apr 14, 2017

@MarcinKosinski
Author

Perl adds itself to the PATH during installation.
I did succeed with the tokenize() R function previously.

Thanks for answering and for your time!

unDocUMeantIt

unDocUMeantIt commented on Apr 14, 2017

@unDocUMeantIt
Owner

now, that's odd -- looks like the tempfile is not created only when you use the type="obj" option, because there is successful tempfile creation in your second example. i'll leave this open until i have a clue what's (not) happening there.

unDocUMeantIt

unDocUMeantIt commented on Apr 14, 2017

@unDocUMeantIt
Owner

i've looked at the treetag() code but so far have no clue what could cause this. it doesn't seem to happen on GNU/linux, but that doesn't explain it. it is unlikely that tempfiles are missing, because treetag() checks for their existence.

unDocUMeantIt

unDocUMeantIt commented on Apr 17, 2017

@unDocUMeantIt
Owner

i've installed koRpus in a windows 10 VM and can replicate the problem.

it seems to be caused by inconsistencies between file.path() and shell(), something which used to work for years but now appears to be broken. try

shell(paste("dir", file.path("C:", "Users")))

versus the explicit

shell(paste("dir", file.path("C:", "Users", fsep = "\\")))
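The two calls differ only in the separator: `file.path()` joins with "/" by default, while cmd.exe's `dir` treats a "/" as the start of an option switch. The sketch below shows the two forms as plain string handling, so it runs on any OS; `to_cmd_path()` is a hypothetical helper, and the `shell()` call itself is Windows-only and left as a comment.

```r
# file.path() uses "/" unless fsep is given
p_slash <- file.path("C:", "Users")
p_back  <- file.path("C:", "Users", fsep = "\\")
print(p_slash)      # [1] "C:/Users"
cat(p_back, "\n")   # C:\Users  (print() would show the escaped form "C:\\Users")

# Hypothetical helper: convert an existing "/" path for a cmd.exe command line
to_cmd_path <- function(p) chartr("/", "\\", p)
stopifnot(identical(to_cmd_path(p_slash), p_back))

# On Windows only:
# shell(paste("dir", to_cmd_path(p_slash)))
```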


unDocUMeantIt

unDocUMeantIt commented on May 8, 2017

@unDocUMeantIt
Owner

@jmlehrfeld ah, now i see: your call is incomplete because you only defined the path to the *.exe file but nothing else. please try again with these settings instead:

set.kRp.env(TT.cmd="manual", TT.options=list(path="C:/TreeTagger", preset="en"), lang="en")
# or
set.kRp.env(TT.cmd="manual", TT.options=list(path="C:\\TreeTagger", preset="en"), lang="en")

does at least one of those work?

jmlehrfeld

jmlehrfeld commented on May 8, 2017

@jmlehrfeld

I think so! I set my env as you specified, called the treetag function (without the debug argument), and got no warning or error messages back. I guess I'm all set then. Thanks so much!

trinker

trinker commented on May 10, 2017

@trinker

I have tested this using koRpus ‘0.10.2’ on a Win 7 machine running R 3.4.0 and 3.3.1 and got no error. I have Win 10 at work; I'll try tomorrow. If path normalization is the issue, the normalizePath command is nice:

normalizePath(file.path("C:","Users"))
trinker

trinker commented on May 10, 2017

@trinker

I see I didn't read the last comments here and was late to the party :-)

unDocUMeantIt

unDocUMeantIt commented on Jun 20, 2017

@unDocUMeantIt
Owner

seems to be resolved for the moment.

JingwenRobineau

JingwenRobineau commented on Jun 22, 2017

@JingwenRobineau

I had the same problem. Nothing above worked for me. Finally, I solved the problem by updating the version of R.

eyyarbasi

eyyarbasi commented on Jul 12, 2019

@eyyarbasi

I have a similar problem and tried the aforementioned methods, but I wasn't able to solve it. When I try to run the following code in RStudio, I get the following error.

> system.time(
+   lemma_tagged <- treetag(lemma_unique$word_clean, treetagger="manual", 
+                           format="obj",debug = TRUE, TT.tknz=FALSE , encoding = "UTF-8",lang="en",
+                           TT.options=list(
+                             path="C:\\Treetagger", preset="en")
+   )
+ )
split=[[:space:]]
        ign.comp=-
        heuristics=abbr
        heur.fix=c("’", "'"), c("’", "'")
        sentc.end=., !, ?, ;, :
        detect=FALSE, FALSE
        clean.raw=
        perl=FALSE
        stopwords=
        stemmer=
 
        TT.tokenizer:  koRpus::tokenize() 
				tempfile: C:\Users\EYARBA~1\AppData\Local\Temp\Rtmp2hIXBH\tokenize2c787d7c6eb1.txt 
        file:  C:\Users\EYARBA~1\AppData\Local\Temp\Rtmp2hIXBH\tempTextFromObject2c7867b979a2.txt 
        TT.lookup.command:   
        TT.pre.tagger:   
        TT.tagger:  C:\Treetagger/bin/tree-tagger.exe 
        TT.opts:  -token -lemma -sgml -pt-with-lemma -quiet 
        TT.params:  C:/Treetagger/lib/english-utf8.par 
        TT.filter.command:  | perl -pe 's/\tV[BDHV]/\tVB/;s/IN\/that/\tIN/;' 

        sys.tt.call:  type  C:\Users\EYARBA~1\AppData\Local\Temp\Rtmp2hIXBH\tokenize2c787d7c6eb1.txt |   C:\Treetagger\bin\tree-tagger.exe C:\Treetagger\lib\english-utf8.par -token -lemma -sgml -pt-with-lemma -quiet | perl -pe 's/\tV[BDHV]/\tVB/;s/IN\/that/\tIN/;' 

Error: Awww, this should not happen: TreeTagger didn't return any useful data.
  This can happen if the local TreeTagger setup is incomplete or different from what presets expected.
  You should re-run your command with the option 'debug=TRUE'. That will print all relevant configuration.
  Look for a line starting with 'sys.tt.call:' and try to execute the full command following it in a
  command line terminal. Do not close this R session in the meantime, as 'debug=TRUE' will keep temporary
  files that might be needed.
  If running the command after 'sys.tt.call:' does fail, you'll need to fix the TreeTagger setup.
  If it does *not* fail but produce a table with proper results, please contact the author!
In addition: Warning message:
In system(cmd, intern = intern, wait = wait | intern, show.output.on.console = wait,  :
  running command 'C:\windows\system32\cmd.exe /c type  C:\Users\EYARBA~1\AppData\Local\Temp\Rtmp2hIXBH\tokenize2c787d7c6eb1.txt |   C:\Treetagger\bin\tree-tagger.exe C:\Treetagger\lib\english-utf8.par -token -lemma -sgml -pt-with-lemma -quiet | perl -pe 's\\tV[BDHV]\\tVB\;s\IN\\that\\tIN\;'' had status 255
Assuming 'UTF-8' as encoding for the input file. If the results turn out to be erroneous, check the file for invalid characters, e.g. em.dashes or fancy quotes, and/or consider setting 'encoding' manually.
Timing stopped at: 0.63 0.02 0.84

However, the same thing works on the command prompt. I just can't get it to work in RStudio. Any ideas why this might be happening? Btw, I'm relatively new to this stuff, so I'm sorry if I'm missing something obvious :)

I think I have all the necessary files in the working directory, since I can get some results in cmd. To me, it seems like everything is working, just not on the platform that I want to use. Thanks a lot!


unDocUMeantIt

unDocUMeantIt commented on Jul 13, 2019

@unDocUMeantIt
Owner

@eyyarbasi:

I have a similar problem, tried the aforementioned methods but I wasn't able to solve it. When I try to run the following code in Rstudio, I get the following error.

could you please provide some more information on your system setup?

  • what versions of R and koRpus are you using?
  • what operating system (i take it it's some version of windows)?

just a shot in the dark: can you try to start a plain R session (without RStudio) and run your R code from there? i would like to check if this issue is somehow related to the environment set up by RStudio (i don't use RStudio, it's all RKWard here ;)).
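The setup details asked for above, plus which front end the session runs in, can be collected in one go. This is a sketch: `setup_info()` is a hypothetical helper, `.Platform$GUI` reports the GUI R was started with, and the `RSTUDIO` environment-variable check is an assumption about how RStudio marks its sessions. `packageVersion()` errors if the package is not installed, hence the `tryCatch()`.

```r
# Hypothetical helper: collect the setup details useful for a bug report.
setup_info <- function(pkg = "koRpus") {
  pkg_version <- tryCatch(as.character(packageVersion(pkg)),
                          error = function(e) NA_character_)
  list(
    r_version   = R.version.string,
    os          = utils::sessionInfo()$running,
    gui         = .Platform$GUI,
    package     = pkg,
    pkg_version = pkg_version,
    # assumption: RStudio sets RSTUDIO=1 in the R sessions it starts
    in_rstudio  = identical(Sys.getenv("RSTUDIO"), "1")
  )
}

str(setup_info())
```

Running this once in RStudio and once in plain R gives two directly comparable summaries.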

unDocUMeantIt

unDocUMeantIt commented on Jul 13, 2019

@unDocUMeantIt
Owner

@eyyarbasi does the lemma_tagged object that you tried to create hold any data at all?

eyyarbasi

eyyarbasi commented on Jul 13, 2019

@eyyarbasi

Thanks for the reply! RStudio is v1.2.1335 and R is 3.6.0.
Here's my sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] installr_0.21.3      koRpus.lang.en_0.1-3 koRpus_0.11-5        sylly_0.1-5          SnowballC_0.6.0      topicmodels_0.2-8   
 [7] ldatuning_1.0.0      tidytext_0.2.1       forcats_0.4.0        stringr_1.4.0        dplyr_0.8.2          purrr_0.3.2         
[13] readr_1.3.1          tidyr_0.8.3          tibble_2.1.3         ggplot2_3.2.0        tidyverse_1.2.1      magrittr_1.5        
[19] readxl_1.3.1        

loaded via a namespace (and not attached):
 [1] modeltools_0.2-22 tidyselect_0.2.5  slam_0.1-45       NLP_0.2-0         haven_2.1.0       lattice_0.20-38   vctrs_0.1.0      
 [8] colorspace_1.4-1  generics_0.0.2    stats4_3.6.0      utf8_1.1.4        rlang_0.4.0       pillar_1.4.2      glue_1.3.1       
[15] withr_2.1.2       modelr_0.1.4      munsell_0.5.0     gtable_0.3.0      cellranger_1.1.0  rvest_0.3.4       tm_0.7-6         
[22] parallel_3.6.0    fansi_0.4.0       sylly.en_0.1-3    broom_0.5.2       tokenizers_0.2.1  Rcpp_1.0.1        scales_1.0.0     
[29] backports_1.1.4   jsonlite_1.6      hms_0.4.2         stringi_1.4.3     grid_3.6.0        cli_1.1.0         tools_3.6.0      
[36] lazyeval_0.2.2    janeaustenr_0.1.5 zeallot_0.1.0     crayon_1.3.4      pkgconfig_2.0.2   Matrix_1.2-17     data.table_1.12.2
[43] xml2_1.2.0        lubridate_1.7.4   assertthat_0.2.1  httr_1.4.0        rstudioapi_0.10   R6_2.4.0          nlme_3.1-139     
[50] compiler_3.6.0

And your intuition was right! It's an issue with RStudio. The object lemma_tagged doesn't even get created in RStudio, but the code works as a simple R script without RStudio. Somehow treetag() freaks out in RStudio. Open for further suggestions. Thanks again!

unDocUMeantIt

unDocUMeantIt commented on Jul 13, 2019

@unDocUMeantIt
Owner

And your intuition was right! It's an issue with RStudio. the object lemma_tagged doesn't even get created in RStudio but the code works as a simple R script without RStudio. Somehow treetag() freaks out in RStudio.

that's interesting -- and a bit puzzling...

Open for further suggestions.

during a workshop i gave recently, one windows user ran into a problem with access permissions, i.e., his code would only run if he started RStudio with admin rights. IIRC, the application was unable to run the TreeTagger executable otherwise. running userland software as admin is not a solution, but if you could at least check once whether this makes the problem go away, i'd get a clue where the actual issue lies.

one other hypothesis i have is RStudio's handling of system()/shell() calls. its terminal implementation seems to offer to run a windows version of bash, and i wonder if that could also be the case for shell() calls, because it would render all file paths useless. so it would probably be interesting to have a look at the return values of shell() for the command you successfully ran in cmd.exe. this call seems to fail in RStudio (but not in plain R). if it does, you should try to run it in small units to see at which point in the call chain it actually fails, like

(shell("type C:\\Users\\EYARBA~1\\AppData\\Local\\Temp\\Rtmp2hIXBH\\tokenize2c787d7c6eb1.txt", translate=TRUE, ignore.stderr=TRUE, intern=TRUE))

(shell("type C:\\Users\\EYARBA~1\\AppData\\Local\\Temp\\Rtmp2hIXBH\\tokenize2c787d7c6eb1.txt | C:\\Treetagger\\bin\\tree-tagger.exe C:\\Treetagger\\lib\\english-utf8.par -token -lemma -sgml -pt-with-lemma -quiet", translate=TRUE, ignore.stderr=TRUE, intern=TRUE))

(shell("type C:\\Users\\EYARBA~1\\AppData\\Local\\Temp\\Rtmp2hIXBH\\tokenize2c787d7c6eb1.txt | C:\\Treetagger\\bin\\tree-tagger.exe C:\\Treetagger\\lib\\english-utf8.par -token -lemma -sgml -pt-with-lemma -quiet | perl -pe 's/\\tV[BDHV]/\\tVB/;s/IN\\/that/\\tIN/;'", translate=TRUE, ignore.stderr=TRUE, intern=TRUE))

update the temporary file path, of course ;) this should tell us whether it already fails when accessing the text file, when running tree-tagger.exe, or in perl.
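Building the three command strings from their parts avoids retyping the long temp-file path and the backslash-escaping pitfalls of pasting Windows paths into R strings. `tt_stages()` below is a hypothetical sketch using the components from the debug output above. One caveat: `shell(..., translate = TRUE)` rewrites every "/" in the command as "\", including those inside the perl expression. That appears to match the mangled `s\\tV[BDHV]\\tVB\;` filter in the warning messages earlier in this thread, so the sketch builds the paths with backslashes and leaves `translate` at its default (`FALSE`).

```r
# Hypothetical helper: build the cumulative stages of the TreeTagger pipeline
# so each stage can be run separately.
tt_stages <- function(input, tagger, params,
                      opts = "-token -lemma -sgml -pt-with-lemma -quiet",
                      filter = "perl -pe 's/\\tV[BDHV]/\\tVB/;s/IN\\/that/\\tIN/;'") {
  s1 <- paste("type", input)                 # stage 1: read the token file
  s2 <- paste(s1, "|", tagger, params, opts) # stage 2: + TreeTagger
  s3 <- paste(s2, "|", filter)               # stage 3: + perl filter
  c(read = s1, tag = s2, filter = s3)
}

cmds <- tt_stages(
  input  = "C:\\Users\\EYARBA~1\\AppData\\Local\\Temp\\Rtmp2hIXBH\\tokenize2c787d7c6eb1.txt",
  tagger = "C:\\Treetagger\\bin\\tree-tagger.exe",
  params = "C:\\Treetagger\\lib\\english-utf8.par"
)
cat(cmds, sep = "\n")

# On Windows only: run each stage and keep its output (translate left FALSE)
# results <- lapply(cmds, function(cmd) shell(cmd, intern = TRUE))
```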

XueWenSYan

XueWenSYan commented on Feb 4, 2022

@XueWenSYan

Hi, I encountered the same error as eyyarbasi, and I'm also using Windows. I tried running the code in the base R GUI and with administrative privileges, but the error persists. I could likewise run TreeTagger from the command line. Is there a solution now? Thank you!

unDocUMeantIt

unDocUMeantIt commented on Feb 5, 2022

@unDocUMeantIt
Owner

Hi, I encountered the same error as eyyarbasi, and I'm also using windows. I tried running the code in base R gui and with administrative privileges but the error persists.

in that case it is probably not the same issue. since this issue is already closed, could you please open a new one including info on your system setup (installed software packages with version numbers) and example code to reproduce the error?

thank you!

          Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE, : 'data' must be of a vector type, was 'NULL' · Issue #7 · unDocUMeantIt/koRpus