thewarehouseandtheforest/README.md


## edited summa (textrank)

summa is a python implementation of textrank (https://github.com/summanlp/textrank).
it was modified under `summa/` by adding a `summa/edits.py` file that defines two new functions, giving access to the internal processing steps of textrank:
1. `scored_sentences`: gives the list of all the sentences with their scores.
2. `similarity_graph`: gives the similarity matrix of all the sentences in a text.

## [EXP] opacity

For any wikipedia page, show the text content with every sentence given an opacity inversely proportional to its TextRank score.

This means that sentences considered _"relevant to be included in a summary of the article"_ become invisible, and what becomes visible is what would be considered _"boring and redundant"_.
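
A minimal sketch of the idea, not the code in `make.py`: it assumes the sentences and their scores are already available as two parallel lists, and it reads "inversely proportional" as a simple linear inversion (highest score gets opacity 0, lowest gets opacity 1). The function names `scores_to_opacities` and `render_html` are hypothetical.

```python
from html import escape


def scores_to_opacities(scores):
    """Linearly invert scores: the top-ranked sentence becomes fully transparent."""
    low, high = min(scores), max(scores)
    if high == low:
        # all sentences rank the same, so keep everything visible
        return [1.0 for _ in scores]
    return [1.0 - (s - low) / (high - low) for s in scores]


def render_html(sentences, scores):
    """Wrap each sentence in a <span> whose CSS opacity reflects its inverted score."""
    spans = []
    for sentence, opacity in zip(sentences, scores_to_opacities(scores)):
        spans.append(
            f'<span style="opacity:{opacity:.2f}">{escape(sentence)}</span>'
        )
    return " ".join(spans)
```

The sentences that a summary would keep end up with the lowest opacity, so only the low-ranked text stays readable.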

### using

* edited summa (https://github.com/summanlp/textrank)
* wikipedia python module (https://pypi.org/project/wikipedia/)

### to use

modify the variable `wikipedia_page` in `make.py` to whatever page you want, then

        cd exp.opacity
        python3 make.py

### technical notes

* **header opacities** were manually recomputed as the average of their section. this is justified because otherwise they break the flow of the document (their shortness seems to make them either nearly full black or nearly white, independently of how textrank ranks the paragraphs in their associated sections).
* using the `.content` property of python wikipedia, we get **plain text plus headers in wikitext**, but things like `<p>`, `<ul>`, `<blockquote>`, etc. all disappear. see if we want to craft a version using the `.html()` method of python wikipedia, but it becomes more complex because of sentence tokenisation: we would probably need an index to keep track of each sentence's original nested div location.
* **opacities were remapped** to add contrast to their curves. still need to experiment with that to find a nice compromise that works on both paper and screen.
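
a rough sketch of the first and last notes, under assumptions: the document is taken to be a flat list of `(kind, opacity)` pairs with `kind` being `"header"` or `"sentence"`, and the contrast remap is an S-shaped curve with a `gamma` parameter. both the data layout and the curve are hypothetical, the actual remapping used in the experiment is not described here.

```python
def average_header_opacities(items):
    """Give each header the mean opacity of the sentences up to the next header."""
    result = [opacity for _, opacity in items]
    for i, (kind, _) in enumerate(items):
        if kind != "header":
            continue
        body = []
        j = i + 1
        while j < len(items) and items[j][0] != "header":
            body.append(items[j][1])
            j += 1
        if body:  # a header with no sentences keeps its own value
            result[i] = sum(body) / len(body)
    return result


def remap_contrast(x, gamma=2.0):
    """S-curve on [0, 1]: pushes values away from 0.5 when gamma > 1."""
    if x < 0.5:
        return 0.5 * (2 * x) ** gamma
    return 1.0 - 0.5 * (2 * (1.0 - x)) ** gamma
```

with `gamma=1.0` the remap is the identity, so that parameter is one knob to try when looking for the paper/screen compromise.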

## [EXP] recommanded

## [EXP] custom similarity

### technical note

* had to build a `similarity_graph` function to get the similarity matrix of a text.
* the computation of those numbers is made in the `_get_similarity` function in `summarizer.py`. it basically counts the words two sentences have in common and divides that count by the sentence lengths. the numbers can vary from approx 3.5 to 0 and are not symmetrized or normalized in any way, so it feels like we can input whatever we want lol
* we want to input our own matrices, so we created `set_graph_custom_edge_weights` and `custom_summarize`.
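
to illustrate why any matrix can be plugged in, here is a generic weighted PageRank by power iteration over an arbitrary weight matrix. this is a sketch, not the code of summa or of `custom_summarize`: the function name `rank_from_matrix`, the damping value and the stopping rule are all assumptions.

```python
def rank_from_matrix(weights, damping=0.85, iterations=100, tol=1e-8):
    """Score nodes from a square matrix where weights[i][j] is the edge i -> j."""
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # each node i passes on its score in proportion to its edge weights
            incoming = sum(
                weights[i][j] / out_sums[i] * scores[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            new_scores.append((1.0 - damping) / n + damping * incoming)
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# sentence 0 is similar to both others, which are not similar to each other
custom = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
ranks = rank_from_matrix(custom)
```

the ranking only depends on the relative weights in each row, which is why unnormalized similarity values, or any hand-made matrix, still produce a usable ordering.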