Experiments on summarization algorithms with TextRank / PageRank, in the context of an Algolitary Publishing House by Anaïs Berck.
Dorian 435988d621 better readme.md 2 years ago
summa first experiment with opacity 2 years ago
texts first experiment with opacity 2 years ago
www first experiment with opacity 2 years ago
README.md better readme.md 2 years ago
make.py first experiment with opacity 2 years ago
template.html first experiment with opacity 2 years ago

README.md

opacity experiment

For any Wikipedia page, show the text content, but with every sentence given an opacity inversely proportional to its TextRank score.

This means that sentences considered "relevant to be included in a summary of the article" become invisible, and what becomes visible is what would be considered "boring and redundant".
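The mapping from score to opacity can be sketched as follows. This is only an illustration, not the code in make.py: the function names are hypothetical, the input is assumed to be a list of (sentence, score) pairs, and "inversely proportional" is read here as a min-max normalised score subtracted from 1, so the top-ranked sentence gets opacity 0.

```python
from html import escape


def scores_to_opacities(scores):
    """Map TextRank scores to CSS opacities in [0, 1].

    Assumption: min-max normalisation, then inversion, so the highest
    score becomes 0 (invisible) and the lowest becomes 1 (fully visible).
    """
    if not scores:
        return []
    low, high = min(scores), max(scores)
    if high == low:
        # All sentences scored the same: nothing to contrast, show everything.
        return [1.0 for _ in scores]
    return [1.0 - (s - low) / (high - low) for s in scores]


def render_sentences(scored_sentences):
    """Wrap each (sentence, score) pair in a <span> with an inline opacity."""
    opacities = scores_to_opacities([score for _, score in scored_sentences])
    spans = [
        f'<span style="opacity: {opacity:.2f}">{escape(sentence)}</span>'
        for (sentence, _), opacity in zip(scored_sentences, opacities)
    ]
    return " ".join(spans)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    example = [("A filler sentence.", 0.5), ("The key claim.", 1.5)]
    print(render_sentences(example))
```

With this reading, a sentence that TextRank would pick first for a summary is rendered fully transparent, while the lowest-ranked one stays fully black.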

using

to use

modify the variable wikipedia_page in make.py to point to whatever page you want, then run:

    python3 make.py

technical notes

  • header opacities were manually recomputed as the average of their section. This is justified because otherwise they break the flow of the document (their shortness seems to put them either nearly full black or nearly white, independently of how TextRank ranks the paragraphs in their section)
  • using the .content method of python wikipedia, we get plain text plus headers in wikitext, but things like <p>, <ul>, <blockquote>, etc. have all disappeared. See if we want to craft a version using the .html method of python wikipedia, but it becomes more complex because of sentence tokenisation; it would probably need an index to keep track of each sentence's original nested div location.
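The header averaging described in the first note could look roughly like the sketch below. It is an assumption-laden illustration rather than the code in make.py: the function names are hypothetical, the input is assumed to be a list of (text, opacity) pairs in document order, headers are assumed to appear as wikitext lines such as "== History ==", and a "section" is taken to be everything up to the next header of any level.

```python
import re
from statistics import mean

# Wikitext-style header line, e.g. "== History ==" or "=== Early life ===".
HEADER_RE = re.compile(r"^(={2,})\s*(.+?)\s*\1$")


def is_header(text):
    """Return True if the text looks like a wikitext header line."""
    return HEADER_RE.match(text.strip()) is not None


def average_header_opacities(items):
    """Replace each header's opacity with the mean opacity of its section.

    items: list of (text, opacity) pairs in document order.
    Assumption: a section runs from a header to the next header,
    whatever its level. Headers with no sentences keep their own opacity.
    """
    result = list(items)
    sections = {}  # header index -> opacities of the sentences under it
    current = None
    for i, (text, opacity) in enumerate(items):
        if is_header(text):
            current = i
            sections[current] = []
        elif current is not None:
            sections[current].append(opacity)
    for index, opacities in sections.items():
        if opacities:
            result[index] = (items[index][0], mean(opacities))
    return result


if __name__ == "__main__":
    # Made-up opacities, for illustration only.
    page = [
        ("== History ==", 0.0),
        ("First sentence.", 0.25),
        ("Second sentence.", 0.75),
    ]
    print(average_header_opacities(page))
```

Averaging keeps a header visually close to the sentences it introduces, instead of letting its own short-sentence score push it to one extreme.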