[7]:
options(warn=-1)
library(gutenbergr)
library(tidyverse)
1. Zipf's law
1.1. Motivation
a brief history and account
1.2. Choose 3 authors from 3 different languages
[6]:
# https://www.tidytextmining.com/tfidf.html
# German
german <- gutenberg_metadata %>%
  filter(title == "Die Leiden des jungen Werther", language == "de")
# English
english <- gutenberg_metadata %>%
  filter(title == "Wuthering Heights", language == "en")
# French
french <- gutenberg_metadata %>%
  filter(title == "Madame Bovary", language == "fr")
[19]:
# Collect the three Gutenberg IDs in the same order as the names assigned below
books.ids <- c(german$gutenberg_id, english$gutenberg_id, french$gutenberg_id)
names(books.ids) <- c("Die Leiden des jungen Werther","Wuthering Heights","Madame Bovary")
[ ]:
# load the three texts interactively
question1 <- function(ids) {
}
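The stub above is left empty in the notebook. One possible way to fill it is sketched below, assuming that "loading" means downloading each book and collapsing its lines into a single string. The name `load_books` and its `download` argument are hypothetical and not part of the original notebook. The default downloader is `gutenbergr::gutenberg_download`, which needs network access. Any function that takes an ID and returns a data frame with a `text` column can be substituted.

```r
# Hypothetical helper (not in the original notebook): download each book
# and collapse its lines into one string per book.
# `download` defaults to gutenbergr::gutenberg_download; it is only looked
# up when actually called, so another downloader can be passed in instead.
load_books <- function(ids, download = gutenbergr::gutenberg_download) {
  texts <- lapply(ids, function(id) {
    book <- download(id)              # data frame with a `text` column
    paste(book$text, collapse = " ")  # one string per book
  })
  names(texts) <- names(ids)          # keep the book titles as names
  texts
}
```

With the named vector defined earlier, `load_books(books.ids)` would return a list with one element per title.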
[22]:
books <- rbind(english,french,german)
[23]:
books
| gutenberg_id | title | author | gutenberg_author_id | language | gutenberg_bookshelf | rights | has_text |
|---|---|---|---|---|---|---|---|
| <int> | <chr> | <chr> | <int> | <chr> | <chr> | <chr> | <lgl> |
| 768 | Wuthering Heights | Brontë, Emily | 405 | en | Gothic Fiction/Movie Books/Best Books Ever Listings | Public domain in the USA. | TRUE |
| 14155 | Madame Bovary | Flaubert, Gustave | 574 | fr | Best Books Ever Listings/FR Littérature/Banned Books from Anne Haight's list | Public domain in the USA. | TRUE |
| 19794 | Die Leiden des jungen Werther | Goethe, Johann Wolfgang von | 586 | de | Harvard Classics | Public domain in the USA. | TRUE |
1.3. Paste the elements of the text vector into a character vector of length 1
[33]:
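# Download "Wuthering Heights" (Gutenberg ID 768); requires network access
english.book <- gutenberg_download(768)
english.text <- english.book$text
# Collapse all lines into a single string
english.text <- paste(english.text, collapse = " ")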
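To see what `paste(..., collapse = " ")` does without downloading anything, here is a minimal sketch on a made-up vector. The variable names and sample lines are illustrative only. Empty elements, such as blank lines in a book, leave doubled spaces in the result. The last step removes them with `gsub`, an optional cleanup that the original notebook does not perform.

```r
# Illustrative stand-in for the `text` column of a downloaded book
text_lines <- c("first line of text", "", "second line")

# Collapse the vector into a single string
joined <- paste(text_lines, collapse = " ")

length(text_lines)  # 3 elements before collapsing
length(joined)      # 1 element after collapsing

# Optional cleanup (an assumption, not in the notebook):
# squeeze runs of whitespace left behind by empty lines
joined_clean <- gsub("\\s+", " ", joined)
```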
[3]:
# english.text
[ ]: