{ "cells": [ { "cell_type": "code", "execution_count": 7, "id": "sophisticated-running", "metadata": { "ExecuteTime": { "end_time": "2021-03-08T15:35:16.789575Z", "start_time": "2021-03-08T15:35:16.772Z" }, "lang": "en" }, "outputs": [], "source": [ "options(warn = -1)  # ignore all warnings\n", "library(gutenbergr)\n", "library(tidyverse)" ] }, { "cell_type": "markdown", "id": "accessory-compact", "metadata": {}, "source": [ "# Zipf's law" ] }, { "cell_type": "markdown", "id": "universal-theater", "metadata": {}, "source": [ "## Motivation\n", "\n", "A brief history and account of Zipf's law." ] }, { "cell_type": "markdown", "id": "amazing-peninsula", "metadata": {}, "source": [ "## Choose 3 authors writing in 3 different languages" ] }, { "cell_type": "code", "execution_count": 6, "id": "weekly-validity", "metadata": { "ExecuteTime": { "end_time": "2021-03-08T13:27:16.864486Z", "start_time": "2021-03-08T13:27:16.828Z" } }, "outputs": [], "source": [ "# https://www.tidytextmining.com/tfidf.html\n", "# German\n", "german <- gutenberg_metadata %>%\n", "  filter(title == \"Die Leiden des jungen Werther\", language == \"de\")\n", "# English\n", "english <- gutenberg_metadata %>%\n", "  filter(title == \"Wuthering Heights\", language == \"en\")\n", "# French\n", "french <- gutenberg_metadata %>%\n", "  filter(title == \"Madame Bovary\", language == \"fr\")" ] }, { "cell_type": "code", "execution_count": 19, "id": "looking-excitement", "metadata": { "ExecuteTime": { "end_time": "2021-03-08T13:36:27.864263Z", "start_time": "2021-03-08T13:36:27.849Z" } }, "outputs": [], "source": [ "# Project Gutenberg ids of the three books, named by title\n", "books.ids <- c(german$gutenberg_id, english$gutenberg_id, french$gutenberg_id)\n", "names(books.ids) <- c(\"Die Leiden des jungen Werther\", \"Wuthering Heights\", \"Madame Bovary\")" ] }, { "cell_type": "code", "execution_count": null, "id": "surprising-renaissance", "metadata": {}, "outputs": [], "source": [ "# load the three texts interactively\n", "question1 <- function(ids) {\n", "  \n", "}" ] }, { "cell_type": "code", "execution_count": 22, "id": "declared-reception", "metadata": { "ExecuteTime": { "end_time": "2021-03-08T13:37:45.267351Z", "start_time": "2021-03-08T13:37:45.251Z" } }, "outputs": [], "source": [ "# stack the metadata of the three books into one table\n", "books <- rbind(english, french, german)" ] }, { "cell_type": "code", "execution_count": 23, "id": "permanent-daniel", "metadata": { "ExecuteTime": { "end_time": "2021-03-08T13:37:49.164731Z", "start_time": "2021-03-08T13:37:49.138Z" } }, "outputs": [ { "data": { "text/html": [ "
| gutenberg_id | title | author | gutenberg_author_id | language | gutenberg_bookshelf | rights | has_text |
|---|---|---|---|---|---|---|---|
| <int> | <chr> | <chr> | <int> | <chr> | <chr> | <chr> | <lgl> |
| 768 | Wuthering Heights | Brontë, Emily | 405 | en | Gothic Fiction/Movie Books/Best Books Ever Listings | Public domain in the USA. | TRUE |
| 14155 | Madame Bovary | Flaubert, Gustave | 574 | fr | Best Books Ever Listings/FR Littérature/Banned Books from Anne Haight's list | Public domain in the USA. | TRUE |
| 19794 | Die Leiden des jungen Werther | Goethe, Johann Wolfgang von | 586 | de | Harvard Classics | Public domain in the USA. | TRUE |