From 8ae06f04df2cc70bc004d388b2be917f06829595 Mon Sep 17 00:00:00 2001
From: Dorian
Date: Sun, 16 Oct 2022 11:10:52 +0200
Subject: [PATCH] wikipage getter in a separate folder so its also usable from different experiments

---
 README.md                                    |  16 +++++-
 exp.opacity/make.py                          |  38 ++-----------
 exp.recommanded/make.py                      |  56 +++++++++++++++++++
 summa/__init__.pyc                           | Bin 0 -> 339 bytes
 summa/__pycache__/__init__.cpython-38.pyc    | Bin 342 -> 369 bytes
 summa/__pycache__/edits.cpython-38.pyc       | Bin 0 -> 1589 bytes
 summa/commons.pyc                            | Bin 0 -> 865 bytes
 summa/edits.py                               |  16 +++++-
 wikipage/__pycache__/__init__.cpython-38.pyc | Bin 0 -> 228 bytes
 wikipage/__pycache__/page.cpython-38.pyc     | Bin 0 -> 679 bytes
 wikipage/page.py                             |  25 +++++++++
 11 files changed, 115 insertions(+), 36 deletions(-)
 create mode 100644 exp.recommanded/make.py
 create mode 100644 summa/__init__.pyc
 create mode 100644 summa/__pycache__/edits.cpython-38.pyc
 create mode 100644 summa/commons.pyc
 create mode 100644 wikipage/__pycache__/__init__.cpython-38.pyc
 create mode 100644 wikipage/__pycache__/page.cpython-38.pyc
 create mode 100644 wikipage/page.py

diff --git a/README.md b/README.md
index c61a8bb..50fc032 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,12 @@
-## opacity experiment
+## edited summa (textrank)
+
+summa is a textrank python implementation (https://github.com/summanlp/textrank).
+it was modified under `summa/`, by adding an `summa/edits.py` files to create two new function, to access the internal process steps of textrank:
+1. `scored_sentences`: gives the list of all the sentences with their score.
+2. `similarity_graph`: gives the matrix of similarity of all the sentences in a text.
+
+## [EXP] opacity
 
 For any wikipedia page, show the text content but where every sentences has an opacity inversely proportional to its TextRank score. 
@@ -7,17 +14,20 @@ Meaning sentences considered as _"relevant to be included in a summary of the ar
 
 ### using
 
-* textrank python implementation (https://github.com/summanlp/textrank) modified under `summa/` so it gives us all the sentences with their score.
+* edited summa (https://github.com/summanlp/textrank)
 * wikipedia python module (https://pypi.org/project/wikipedia/)
 
 ### to use
 
 modify the variable `wikipedia_page` in `make.py` to whatever page then
 
+    cd exp.opacity
     python3 make.py
 
 ### technical notes
 
 * **headers opacities** where manually recomputed has average of their section, this is justified because otherwise their break the flow of the document (their shortness seems to either put them nearly full black or white otherwise, independantly of how textrank rank the paragraphs in their associated sections)
 * using the `.content` method of python wikipedia, we get **plain text plus header in wikitext**, but things like `

`, `