@ -30,4 +30,12 @@ modify the variable `wikipedia_page` in `make.py` to whatever page then
* using the `.content` attribute of python wikipedia, we get **plain text plus headers in wikitext**, but things like `<p>`, `<ul>`, `<blockquote>`, etc. all disappeared. we could craft a version using the `.html()` method of python wikipedia instead, but it becomes more complex because of sentence tokenisation: we would probably need an index to keep track of each sentence's original nested div location.
* **opacities were remapped** to add contrast to their curves. still need to experiment with that to find a nice compromise between paper and screen.
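
a minimal sketch of what such a remapping could look like, to make the contrast idea concrete. the notes do not say which curve is actually used, so the power curve, the function name `remap_opacity` and the `gamma` / `floor` parameters are all assumptions for illustration:

```python
def remap_opacity(score, gamma=2.0, floor=0.05):
    """Map a score in [0, 1] to an opacity in [floor, 1].

    Hypothetical helper: gamma > 1 pushes mid-range scores down,
    which increases contrast between weak and strong sentences.
    """
    # clamp the input so out-of-range scores cannot produce invalid opacities
    clamped = min(max(score, 0.0), 1.0)
    return floor + (1.0 - floor) * clamped ** gamma


if __name__ == "__main__":
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(s, round(remap_opacity(s), 4))
```

a higher `floor` keeps faint sentences readable on paper, while a higher `gamma` separates them more on screen, so those two knobs are one possible place to look for the paper/screen compromise.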
## [EXP] recommended
## [EXP] custom similarity
### technical note
* had to build a `similarity_graph` function to get the similarity matrix of a text.
* those numbers are computed in the `_get_similarity` function in `summarizer.py`, basically counting the words and dividing by the length of the sentence. the values range from approx 0 to 3.5 and are not symmetrized or normalized in any way, so it feels like we can input whatever we want lol
* we want to input our own matrices, so we created `set_graph_custom_edge_weights` and `custom_summarize`.
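
to illustrate the kind of matrix discussed above, here is a small standalone sketch. it is not the code from `summarizer.py`: the names `toy_similarity` and `toy_similarity_matrix` are hypothetical, and the log-based length term is an assumption (a common TextRank-style choice) rather than something confirmed by these notes.

```python
import math


def toy_similarity(sentence_a, sentence_b):
    """Shared-word count divided by a length term.

    Assumption: the length term is log10(len(a)) + log10(len(b)),
    counted in words. The real `_get_similarity` may differ.
    """
    words_a = sentence_a.lower().split()
    words_b = sentence_b.lower().split()
    if not words_a or not words_b:
        return 0.0
    length_term = math.log10(len(words_a)) + math.log10(len(words_b))
    if length_term == 0:  # both sentences are a single word
        return 0.0
    shared = len(set(words_a) & set(words_b))
    return shared / length_term


def toy_similarity_matrix(sentences):
    """Build an n x n list-of-lists of pairwise scores, with zeros on the diagonal."""
    n = len(sentences)
    return [
        [0.0 if i == j else toy_similarity(sentences[i], sentences[j])
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    sentences = [
        "the cat sat on the mat",
        "the cat ate the fish",
        "dogs bark loudly",
    ]
    for row in toy_similarity_matrix(sentences):
        print([round(value, 3) for value in row])
```

since the scores are unbounded and unnormalized, any other n x n matrix of non-negative weights could in principle be supplied in place of this one, which is the idea behind feeding custom edge weights to the graph.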