* **header opacities** were manually recomputed as the average opacity of their section. This is justified because otherwise they break the flow of the document: headers are so short that their own score puts them at nearly full black or nearly full white, independently of how TextRank ranks the paragraphs in their associated sections.
* using the `.content` property of the Python `wikipedia` package, we get **plain text plus headers in wikitext**, but things like `<p>`, `<ul>`, `<blockquote>`, etc. have all disappeared. We could craft a version using the `.html()` method instead, but that becomes more complex because of sentence tokenisation: we would probably need an index to keep track of each sentence's original nested `div` location.
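
As a rough illustration of working with the `.content` output described above, the sketch below splits plain text with wikitext-style headers (`== Title ==`) into sections. The function name `split_sections`, the regular expression, and the sample string are assumptions made for this example, not part of the project code; real `.content` output may contain cases this does not handle.

```python
import re

# Wikitext-style header lines such as "== History ==" or "=== Early years ===".
# The back-reference \1 requires the same number of "=" on both sides.
HEADER_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def split_sections(content):
    """Split text into (level, title, paragraphs) tuples.

    Hypothetical helper: the lead section (text before the first header)
    gets level 0 and an empty title.
    """
    sections = [(0, "", [])]
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between paragraphs
        match = HEADER_RE.match(line)
        if match:
            # Header level is the number of "=" signs (2 for "== x ==").
            sections.append((len(match.group(1)), match.group(2), []))
        else:
            # Any other non-empty line is treated as one paragraph.
            sections[-1][2].append(line)
    return sections


# Made-up sample text imitating the shape of the `.content` output.
sample = (
    "Intro paragraph.\n\n"
    "== History ==\nFirst.\nSecond.\n\n"
    "=== Early years ===\nThird."
)
sections = split_sections(sample)
```

Each paragraph keeps its position inside its section, which is the kind of index that would also be needed, in a more elaborate form, to map sentences back to nested HTML elements.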
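
The header opacity rule from the first note can be sketched as follows. This is a minimal sketch, assuming each section is available as a header plus a list of per-paragraph opacities in the range 0 to 1; the dictionary layout, the function names, and the fallback value for empty sections are all hypothetical.

```python
from statistics import mean


def header_opacity(paragraph_opacities, default=0.5):
    """Opacity for a header: the mean of its section's paragraph opacities.

    `default` is an assumed fallback for sections without paragraphs.
    """
    return mean(paragraph_opacities) if paragraph_opacities else default


def recompute_header_opacities(sections):
    """Map each header to the average opacity of its section.

    `sections` is a list of dicts with 'header' and 'paragraph_opacities'
    keys (a hypothetical layout chosen for this example).
    """
    return {
        s["header"]: header_opacity(s["paragraph_opacities"])
        for s in sections
    }


# Made-up opacities, standing in for scores derived from TextRank.
example_sections = [
    {"header": "History", "paragraph_opacities": [0.25, 0.5, 0.75]},
    {"header": "See also", "paragraph_opacities": []},
]
opacities = recompute_header_opacities(example_sections)
```

With this rule a header never stands out more or less than the text it introduces, which is the flow problem the note describes.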