[R] Reading PDF files with German umlauts using tabulizer
    Wolfgang Grond 
       
    Tue Sep  6 11:39:52 CEST 2022
    
    
  
Dear all,
I have some trouble with reading PDF files in German language.
I want to extract text and tables with the tabulizer package, and
everything works well as long as I read English texts.
When I run the same code
text <- extract_text(file = "Pub_001.pdf")
on documents in German, the umlauts are not recognized.
They are either replaced by a combination of other characters:
Instead of
"Entmischung und Kristallisation in Gläsern des Systems"
                                      --
I get
"Entmischung und Kristallisation in GHisern des Systems"
                                      --
or they are replaced by a plain ASCII letter, like this.
Instead of
"In Gläsern des Systems"
       -
I get
"In Glasern des Systems"
       -
When I open the file in Adobe Reader, it reports the encoding as "Ansi".
Is there a way to read this file correctly?
Thanks in advance for any ideas.
Regards
    
    