[7]:
options(warn=-1)
library(gutenbergr)
library(tidyverse)

1. Zipf's law

1.1. Motivation

A brief history and account of Zipf's law.

1.2. Choose 3 authors writing in 3 different languages

[6]:
# https://www.tidytextmining.com/tfidf.html
# German
german <- gutenberg_metadata %>%
  filter(title == "Die Leiden des jungen Werther") %>% filter(language == "de")
# English
english <- gutenberg_metadata %>%
  filter(title == "Wuthering Heights") %>% filter(language == "en")
# French
french <- gutenberg_metadata %>%
  filter(title == "Madame Bovary") %>% filter(language == "fr")
[19]:
# collect the three Gutenberg ids in the same order as the titles below
books.ids <- c(german$gutenberg_id, english$gutenberg_id, french$gutenberg_id)
names(books.ids) <- c("Die Leiden des jungen Werther","Wuthering Heights","Madame Bovary")
[ ]:
# load the three texts interactively
question1 <- function(ids) {

}
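One possible way to fill in the stub above is sketched here. It is only a sketch under stated assumptions: the helper name `load.texts` and its `download` argument are hypothetical and not part of the original notebook. `gutenberg_download()` and its `meta_fields` argument come from gutenbergr, and the real call needs network access to a Project Gutenberg mirror.

```r
# Hypothetical helper (not from the original notebook): download several
# Project Gutenberg texts at once and keep the title next to each line.
# `download` defaults to gutenbergr::gutenberg_download; it is a parameter
# only so the function can be tried offline with a stand-in.
load.texts <- function(ids, download = gutenbergr::gutenberg_download) {
  stopifnot(is.numeric(ids), length(ids) > 0)
  # unname() drops the title labels so only the bare ids are passed on
  download(unname(ids), meta_fields = "title")
}

# Usage (requires network access):
# texts <- load.texts(books.ids)
# table(texts$title)
```

Keeping the title as a metadata column means the three books can stay in one data frame and still be told apart later.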
[22]:
books <- rbind(english,french,german)
[23]:
books
A tibble: 3 × 8
| gutenberg_id <int> | title <chr> | author <chr> | gutenberg_author_id <int> | language <chr> | gutenberg_bookshelf <chr> | rights <chr> | has_text <lgl> |
|---|---|---|---|---|---|---|---|
| 768 | Wuthering Heights | Brontë, Emily | 405 | en | Gothic Fiction/Movie Books/Best Books Ever Listings | Public domain in the USA. | TRUE |
| 14155 | Madame Bovary | Flaubert, Gustave | 574 | fr | Best Books Ever Listings/FR Littérature/Banned Books from Anne Haight's list | Public domain in the USA. | TRUE |
| 19794 | Die Leiden des jungen Werther | Goethe, Johann Wolfgang von | 586 | de | Harvard Classics | Public domain in the USA. | TRUE |

1.3. Paste the elements of the text vector into a character vector of length 1

[33]:
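# download Wuthering Heights (Gutenberg id 768); requires network access
english.book <- gutenberg_download(768)
# one element per line of the book
english.text <- english.book$text
# join all lines into a single string separated by spaces
english.text <- paste(english.text, collapse = " ")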
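To see what `paste(..., collapse = " ")` does without downloading anything, here is a small illustration on a made-up vector. The names `toy.lines` and `toy.text` and the three lines of text are invented for the example and are not taken from the book.

```r
# Toy stand-in for english.book$text: one element per line of text
# (these three lines are invented for illustration).
toy.lines <- c("It was a dark", "and stormy night;", "the rain fell.")

# collapse = " " joins all elements into one string, separated by spaces
toy.text <- paste(toy.lines, collapse = " ")

length(toy.lines)  # 3 elements before collapsing
length(toy.text)   # 1 element after collapsing
toy.text           # "It was a dark and stormy night; the rain fell."
```

Without `collapse`, `paste()` would return a vector of the same length as its input rather than a single string.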
[3]:
# english.text
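A plausible next step toward Zipf's law, once the text is a single string, is to split it into words and rank them by frequency. The sketch below is an assumption about where the notebook is heading, not the author's method: it uses base R only, a made-up sentence, and hypothetical names (`toy.text`, `words`, `freq`, `zipf`).

```r
# Toy single-string text (invented for illustration)
toy.text <- "the cat and the dog and the bird"

# Lower-case, split on anything that is not a letter, drop empty tokens
words <- strsplit(tolower(toy.text), "[^[:alpha:]]+")[[1]]
words <- words[words != ""]

# Count each word and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)

# rank 1 = most frequent word; Zipf's law predicts a frequency roughly
# proportional to 1 / rank, i.e. a straight line on log-log axes
zipf <- data.frame(word = names(freq),
                   frequency = as.integer(freq),
                   rank = seq_along(freq))
zipf
```

On a real book, plotting `log(frequency)` against `log(rank)` is a common way to check how closely the text follows the law. Note that `[[:alpha:]]` is locale-dependent, which matters for accented letters in the French and German texts.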

|2021-02-20\_18-02-10.png|

|2021-03-08\_16-48-55.png|
