From 435988d621d99ff91952440d2f6602e994f7aa93 Mon Sep 17 00:00:00 2001
From: Dorian
Date: Sat, 15 Oct 2022 19:55:14 +0200
Subject: [PATCH] better readme.md

---
 README.md | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 3b4dfe2..0dde071 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,22 @@
-opacity experiment using:
+## opacity experiment
+
+For any Wikipedia page, show the text content, but where every sentence has an opacity inversely proportional to its TextRank score.
+
+This means that sentences considered _"relevant to be included in a summary of the article"_ become invisible, and what becomes visible is what would be considered _"boring and redundant"_.
+
+### using
+
 * textrank python implementation (https://github.com/summanlp/textrank) modified under `summa/` so it gives us all the sentences with their score.
-* wikipedia python module (https://pypi.org/project/wikipedia/)
\ No newline at end of file
+* wikipedia python module (https://pypi.org/project/wikipedia/)
+
+### to use
+
+modify the variable `wikipedia_page` in `make.py` to whatever page you want, then run
+
+    python3 make.py
+
+### technical notes
+
+* **header opacities** were manually recomputed as the average of their section. This is justified because otherwise they break the flow of the document (their shortness seems to make them either nearly full black or nearly white, independently of how textrank ranks the paragraphs in the section)
+* using the `.content` method of python wikipedia, we get **plain text plus headers in wikitext**, but things like `

`, `