<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Levenshtein Distance reads Cortázar {{ edition_count }}</title>
<style>
/* http://meyerweb.com/eric/tools/css/reset/
v2.0 | 20110126
License: none (public domain)
*/
html, body, div, span, applet, object, iframe,
h1, h2, h3, h4, h5, h6, p, blockquote, pre,
a, abbr, acronym, address, big, cite, code,
del, dfn, em, img, ins, kbd, q, s, samp,
small, strike, strong, sub, sup, tt, var,
b, u, i, center,
dl, dt, dd, ol, ul, li,
fieldset, form, label, legend,
table, caption, tbody, tfoot, thead, tr, th, td,
article, aside, canvas, details, embed,
figure, figcaption, footer, header, hgroup,
menu, nav, output, ruby, section, summary,
time, mark, audio, video {
margin: 0;
padding: 0;
border: 0;
font-size: 100%;
font: inherit;
vertical-align: baseline;
}
/* HTML5 display-role reset for older browsers */
article, aside, details, figcaption, figure,
footer, header, hgroup, menu, nav, section {
display: block;
}
body {
line-height: 1;
}
ol, ul {
list-style: none;
}
blockquote, q {
quotes: none;
}
blockquote:before, blockquote:after,
q:before, q:after {
content: '';
content: none;
}
table {
border-collapse: collapse;
border-spacing: 0;
}
</style>
<style>
@font-face {
font-family: XanhMono;
src: url(file://{{ BASEDIR }}/static/fonts/XanhMono-Regular.woff2) format('woff2'),
url(file://{{ BASEDIR }}/static/fonts/XanhMono-Regular.woff) format('woff'),
url(file://{{ BASEDIR }}/static/fonts/XanhMono-Regular.ttf) format('truetype');
font-weight: 400;
font-style: normal;
}
@font-face {
font-family: XanhMono;
src: url(file://{{ BASEDIR }}/static/fonts/XanhMono-Italic.woff2) format('woff2'),
url(file://{{ BASEDIR }}/static/fonts/XanhMono-Italic.woff) format('woff'),
url(file://{{ BASEDIR }}/static/fonts/XanhMono-Italic.ttf) format('truetype');
font-weight: 400;
font-style: italic;
}
html, body {
font-family: XanhMono;
font-size: 8.15pt;
line-height: 12pt;
}
body {
margin-left: 9rem;
}
input, select, option, button {
font: inherit;
}
h1 {
font-size: 18pt;
line-height: 26pt;
margin-bottom: 12pt;
}
h2 {
font-size: 14pt;
line-height: 18pt;
break-before: page;
margin-bottom: 12pt;
}
h3 {
font-size: 10pt;
line-height: 12.5pt;
break-before: page;
margin-top: 6pt;
margin-bottom: 6pt;
}
h3.extra-space {
margin-top: 25pt;
}
h1 + h2,
h2 + h3,
.avoid-break {
break-before: avoid;
}
a {
color: currentColor;
}
p, ul, ol {
max-width: 40rem;
margin-bottom: 12pt;
}
h1, h2, h3 {
max-width: 35rem;
}
pre {
font-style: italic;
margin-left: -9rem;
margin-top: 12pt;
}
blockquote {
font-style: italic;
}
footer {
font-style: normal;
font-size: 7pt
}
pre.normal-flow {
margin-left: 0;
}
.two-col {
columns: 2;
margin-left: -8rem;
column-fill: auto;
}
.two-col pre {
margin-left: 0;
margin-top: 0;
orphans: 3;
widows: 3;
}
ul, ol {
margin-top: 12pt;
}
li {
position: relative;
}
ol {
counter-reset: list-counter 0;
}
ol li {
counter-increment: list-counter 1;
}
ol > li {
margin-bottom: 12pt;
}
ol li:before {
position: absolute;
left: -1.15em;
content: counter(list-counter) '.';
}
ol ol {
margin-left: 2em;
counter-reset: sublist-counter;
}
ol ol li {
counter-increment: sublist-counter 1;
}
ol ol li:before {
left: -2.3em;
content: counter(list-counter) '.' counter(sublist-counter);
}
ul li:before {
content: '';
position: absolute;
left: -1em;
}
@page {
margin: 15mm 15mm;
}
@page:left {
@bottom-left {
content: counter(page);
}
}
@page:right {
@bottom-right {
content: counter(page);
}
}
@page:first {
@bottom-right {
content: '';
}
@bottom-left {
content: '';
}
}
</style>
</head>
<body>
<!--title -->
<pre>
{{ fragment_cover_map }}
</pre>
<h1>Levenshtein Distance<br>reads Cortázar</h1>
<p>
Generated on {{ date }} at {{ time }}, N⁰ {{ edition_count}}
</p>
<!--index -->
<h2>Index</h2>
<ol>
<li>Introduction</li>
<li>Reading Cortázar
<ol>
<li>Original fragment</li>
<li>Adapted fragment</li>
<li>Map of the woods</li>
<li>Table of intermediary species</li>
<li>Repetitive poetry</li>
</ol>
</li>
<li>General description of the Levenshtein Distance</li>
<li>Technical description of the Levenshtein Distance</li>
<li>Code</li>
<li>Credits</li>
</ol>
<!--introduction -->
<h2>1. Introduction</h2>
<p>Levenshtein Distance reads Cortázar is the first version of the first book in the 'Algoliterary Publishing House: making kin with trees'.</p>
<p>The author of this book is the algorithm <a href ="https://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein Distance</a>, the subject is the eucalyptus in "Fama and eucalyptus", a fragment of <a href ="https://es.wikipedia.org/wiki/Historias_de_cronopios_y_de_famas">Cronopios and Famas</a> by <a href ="https://en.wikipedia.org/wiki/Julio_Cort%C3%A1zar">Julio Cortázar</a>.</p>
<p>The versions of the book are infinite by definition and each copy is unique.</p>
<p><a href = "https://www.anaisberck.be">Anaïs Berck</a> is a pseudonym and represents a collaboration between humans, algorithms and trees. <a href = "https://www.anaisberck.be">Anaïs Berck</a> explores the specificities of human intelligence in the company of artificial and plant intelligences. In June 2021, during a residency at Medialab Prado in Madrid, Anaïs Berck will develop a prototype of an Algoliterary Publishing House, in which algorithms are the authors of unusual books. The residency was granted by the "Residency Digital Culture" programme initiated by the Flemish Government.</p>
<p>In this work Anaïs Berck is represented by:</p>
<ul>
<li>the algorithm <a href ="https://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein Distance</a> of which you find a description in this book,</li>
<li>the eucalyptus in <a href ="https://es.wikipedia.org/wiki/Historias_de_cronopios_y_de_famas">Cronopios and Famas</a> by <a href ="https://en.wikipedia.org/wiki/Julio_Cort%C3%A1zar">Julio Cortázar</a>, published in 1962 by Editorial Minotauro, English edition, 1999, New Directions Classic,</li>
<li>the human beings An Mertens and Gijs de Heij. An has published several books, as a fiction writer and as an artist and researcher at <a href="https://constantvzw.org/">Constant</a>, an organisation for experimental art and media in Brussels of which she has been a member since 2008. Gijs is a programmer and designer, part of <a href="http://osp.kitchen/">Open Source Publishing</a>, a collective of designers in Brussels. Both are members of <a href="https://algolit.net/">Algolit</a>, an artistic experimentation group in Brussels around algorithms and free texts.</li>
</ul>
<!--Reading Cortazar -->
<h2>2. Reading Cortázar</h2>
<!--Reading Cortazar -original text -->
<h3>2.1. Original fragment</h3>
<blockquote>
<p>
A fama is walking through a forest, and although he needs no wood he gazes greedily at the trees. The trees are terribly afraid because they are acquainted with the customs of the famas and anticipate the worst. Dead center of the wood there stands a handsome eucalyptus and the fama on seeing it gives a cry of happiness and dances respite and dances Catalan around the disturbed eucalyptus, talking like this:
</p>
<p>
— Antiseptic leaves, winter with health, great sanitation!
</p>
<p>
He fetches an axe and whacks the eucalyptus in the stomach. It doesn't bother the fama at all. The eucalyptus screams, wounded to death, and the other trees hear him say between sighs:
</p>
<p>
— To think that all this imbecile had to do was buy some Valda tablets.
</p>
<footer>Cronopios and Famas by <a href ="https://en.wikipedia.org/wiki/Julio_Cort%C3%A1zar">Julio Cortázar</a>, published in 1962 by Editorial Minotauro, English edition, 1999, New Directions Classic.</footer>
</blockquote>
<!--Reading Cortazar - rewritten text -->
<h3 class="avoid-break extra-space">2.2. Adapted fragment</h3>
<!--OUTPUT SCRIPT -->
<pre class="normal-flow">{{ new_fragment }}</pre>
<!--Reading Cortazar - map of the woods -->
<h3>2.3. Map of the woods</h3>
<p>The distances between the main tree you have chosen for your fragment and the areas of other tree species in the woods, according to Levenshtein Distance:</p>
<!--OUTPUT SCRIPT -->
<pre>
{{ forest_map }}</pre>
<!--Reading Cortazar - table of intermediary species-->
<h3>2.4. Table of intermediary species</h3>
<p>The Levenshtein Distance creates a table with the two species. In this table it calculates for each cell the distance between the distinct elements of the two words.</p>
<p>Usually the table is filled with numbers that count the operations necessary to change one element into another. The possible operations are inserting, deleting or substituting a letter. In this book the numbers are replaced by the various intermediary species that the algorithm creates by inserting, deleting or substituting letters.</p>
<!--OUTPUT SCRIPT -->
<pre>
{{ table_of_intermediary_species }}</pre>
<!--Reading Cortazar - repetitive poetry -->
<h3>2.5. Repetitive poetry</h3>
<!--OUTPUT SCRIPT -->
<div class="two-col">
<pre>{{ repetitive_poetry }}</pre>
</div>
<!--General description algorithm-->
<h2>3. General description of the Levenshtein Distance</h2>
<p>Levenshtein Distance is an algorithm that measures the difference between two words or two groups of letters. It is also called the 'edit distance'. The Levenshtein Distance between two words is the minimum number of actions needed to change one word to another. The different possible actions are insertion, deletion or substitution of a single letter. For example, the Levenshtein Distance between 'end' and 'and' is 1, as 'e' is replaced by 'a'.</p>
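<p>As a minimal sketch of this definition, assuming Python and a function name chosen for this illustration (not necessarily the code printed later in this book), the distance can be written almost word for word as the cheapest of the three possible actions:</p>

```python
from functools import lru_cache


def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-letter insertions, deletions or substitutions."""

    @lru_cache(maxsize=None)
    def distance(i: int, j: int) -> int:
        # Distance between the first i letters of source and the first j of target.
        if i == 0:
            return j  # only insertions remain
        if j == 0:
            return i  # only deletions remain
        substitution_cost = 0 if source[i - 1] == target[j - 1] else 1
        return min(
            distance(i - 1, j) + 1,  # delete a letter
            distance(i, j - 1) + 1,  # insert a letter
            distance(i - 1, j - 1) + substitution_cost,  # substitute or keep
        )

    return distance(len(source), len(target))


print(levenshtein("end", "and"))  # 1
```

<p>The recursion follows the definition literally; because of Python's recursion limit it is only suitable for short words.</p>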
<p>The algorithm was named after its creator, Vladimir Levenshtein, a Russian mathematician and scientist of Jewish origin whose main area of research was information theory and error-correcting codes. He worked at the Keldysh Institute of Applied Mathematics in Moscow. He introduced the algorithm in 1965 'to consider the problem of constructing optimal codes capable of correcting deletions, insertions and inversions'. He passed away in 2017 at the age of 82.</p>
<p>The Levenshtein Distance operates in software such as spell checkers and consequently in computer-assisted translation programs. The Levenshtein Distance can also be found in search engines where it detects the words most similar to the wrongly entered word.</p>
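<p>How a spell checker or search engine might use the distance can be sketched as follows, assuming Python. The vocabulary of tree names and the function names are hypothetical and chosen only for this illustration; real spell checkers use far larger word lists and additional heuristics.</p>

```python
def levenshtein(source: str, target: str) -> int:
    """Edit distance, keeping only one row of the table in memory."""
    previous = list(range(len(target) + 1))
    for i, source_letter in enumerate(source, start=1):
        current = [i]
        for j, target_letter in enumerate(target, start=1):
            cost = 0 if source_letter == target_letter else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def closest_words(typed: str, vocabulary: list[str]) -> list[str]:
    """Return the vocabulary words at the smallest distance from the typed word."""
    distances = {word: levenshtein(typed, word) for word in vocabulary}
    smallest = min(distances.values())
    return [word for word, d in distances.items() if d == smallest]


# Hypothetical vocabulary, chosen only for this illustration.
TREES = ["eucalyptus", "oak", "olive", "elm", "pine"]
print(closest_words("eucaliptus", TREES))  # ['eucalyptus']
```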
<p>Its activity extends to less obvious fields such as plagiarism detection, DNA analysis, automatic voice recognition, optical character recognition in scanned text analysis (OCR), handwriting recognition, hoax email detection or stock market sales and purchase assistance.</p>
<p>Sometimes the Levenshtein Distance leads to surprising discoveries. In 1995, for example, Kessler applied the algorithm to the comparison of Irish dialects. He showed that it was a successful method for measuring phonetic distances between dialects. From the linguistic distances between dialectal varieties, dialectal areas can be found. More innovative was the possibility to draw dialect maps reflecting the fact that dialect areas should be considered as continuous and not as areas separated by sharp boundaries.</p>
<p>Sources:</p>
<ul>
<li>Vladimir Levenshtein, <a href="https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf">Binary codes capable of correcting deletions, insertions, and reversals</a>; Cybernetics and Control Theory, vol. 10 nr. 8, February 1966.</li>
<li><a href = "https://en.wikipedia.org/wiki/Vladimir_Levenshtein">Vladimir Levenshtein</a> + <a href="https://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein Distance</a> on Wikipedia.</li>
<li>RYBN, ADMXI, <a href = "http://www.rybn.org/ANTI/ADMXI/documentation/ALGORITHM_DOCUMENTATION/HARMONY_OF_THE_SPEARS/LEVENSHTEIN_EDIT_DISTANCE/ABOUT/Wikipedia_Levenshtein_Edit_Distance.pdf">Levenshtein Edit Distance</a>.</li>
<li>Abhi Dattasharma, Praveen Kumar Tripathi and Sridhar G, <a href="http://www.rybn.org/ANTI/ADMXI/documentation/ALGORITHM_DOCUMENTATION/HARMONY_OF_THE_SPEARS/LEVENSHTEIN_EDIT_DISTANCE/FINANCIAL_USES/2008_Identifying_Stock_Similarity_Based_on_Multi-event_Episodes.pdf">Identifying Stock Similarity Based on Multi-event Episodes</a>, in Seventh Australasian Data Mining Conference, 2008, Glenelg, Australia.</li>
<li>S. Dutta Chowdhury; U. Bhattacharya; S.K. Parui, Online Handwriting Recognition Using Levenshtein Distance Metric, in 12th International Conference on Document Analysis and Recognition, 2013, USA.</li>
<li>Yoke Yie Chen, Suet-Peng Yong, and Adzlan Ishak, Email Hoax Detection System Using Levenshtein Distance Method, in Journal of Computers, vol. 9, nr 2, February 2014.</li>
<li>Charlotte Gooskens and Wilbert Heeringa, <a href ="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.414.9927&rep=rep1&type=pdf">Perceptive evaluation of Levenshtein dialect distance measurements using Norwegian dialect data</a>, in Language Variation and Change, nr. 16, 2004, 189-207. Cambridge University Press.</li>
</ul>
<!--Technical description algorithm-->
<h2>4. Technical description of the Levenshtein Distance</h2>
<p>Humans can rewrite a word and easily count the number of changes that are necessary to transform one word into another. We invite you to write the word 'machine' on a sheet of paper, followed by the word 'human' on the next line. Knowing that you can only insert, delete or replace one letter, how many operations would you need to do to rewrite the word 'machine' into 'human'?</p>
<p>To give you an idea of how the Levenshtein Distance algorithm works, we describe here the different steps the algorithm takes to transform the word machine into human.</p>
<p>For the word 'machine', the algorithm first analyses the possible elements. They are m, ma, mac, mach, machi, machin and machine for a total of seven elements. For the word human, the elements are h, hu, hum, huma and human for a total of five elements. This creates a matrix with 7 rows and 5 columns. In this distance matrix it will calculate for each cell the distance between the elements of the two words.</p>
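<p>The elements of both words can be listed with a few lines of Python. This is only an illustrative sketch; the function name <code>elements</code> is our own choice.</p>

```python
def elements(word: str) -> list[str]:
    """All leading parts of a word, from its first letter to the whole word."""
    return [word[:length] for length in range(1, len(word) + 1)]


rows = elements("machine")
columns = elements("human")
print(rows)     # ['m', 'ma', 'mac', 'mach', 'machi', 'machin', 'machine']
print(columns)  # ['h', 'hu', 'hum', 'huma', 'human']
print(len(rows), "rows and", len(columns), "columns")  # 7 rows and 5 columns
```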
<p>It starts with the first element of the word machine, which is m, and compares it with the five elements of the word human. The first one is h. What is the Levenshtein distance between m and h? The character m has to be replaced by h, so the distance is 1.</p>
<p>Then it moves on to the next element of the word human, which is hu. What is the Levenshtein distance between m and hu? As m contains one character and hu contains two, at least one character has to be inserted. To transform m into hu, the character m is first replaced by h, and then u is added. The distance is 2.</p>
<p>Now it moves on to the third element. What is the distance between m and hum? This time the element hum already ends in the character m, so nothing has to be replaced: it is enough to insert h and u in front of m. The distance is 2 again.</p>
<p>It continues until it has calculated the distance between the first element of the word machine, m, and the five elements of the second word, human. The distances are 1, 2, 2, 3 and 4: from hum onwards, each extra character costs one more insertion.</p>
<p>After calculating the distances between the first element of the first word and all the elements of the second word, the process continues by calculating the distances between the remaining elements of the first word and the elements of the second word.</p>
<p>The process continues with ma. It compares ma to the five elements of the word human. The first one will be h. What is the Levenshtein distance between ma and h? What it has to do is replace the character m with h and delete the character a. Thus the distance is 2. It moves on to the next element of the word human which is hu. What is the Levenshtein distance between ma and hu? It replaces the character m with h and the letter a with u. The distance is 2.</p>
<p>That is how the table is filled up.</p>
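<p>The whole table for machine and human can be reproduced with the following Python sketch. It uses the common textbook formulation, which adds an extra border row and column for the empty word and removes them again at the end; the function name is chosen for this illustration.</p>

```python
def distance_table(source: str, target: str) -> list[list[int]]:
    """Table of distances between every leading part of source and of target."""
    # The extra first row and column stand for the empty word (the borders).
    table = [[0] * (len(target) + 1) for _ in range(len(source) + 1)]
    for i in range(len(source) + 1):
        table[i][0] = i
    for j in range(len(target) + 1):
        table[0][j] = j
    for i in range(1, len(source) + 1):
        for j in range(1, len(target) + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # vertical: delete
                table[i][j - 1] + 1,         # horizontal: insert
                table[i - 1][j - 1] + cost,  # diagonal: substitute or keep
            )
    # Drop the borders to keep the 7 x 5 table described in the text.
    return [row[1:] for row in table[1:]]


table = distance_table("machine", "human")
for length, row in enumerate(table, start=1):
    print("machine"[:length].ljust(8), row)
```

<p>Each printed row belongs to one element of machine. The last number of the last row, 6, is the distance between machine and human.</p>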
<p>In code terms one could speak of an optimisation: each cell reuses values that were already calculated for its neighbours instead of starting from scratch.</p>
<p>The value of a cell is calculated from its three nearest neighbours in the table: horizontal, vertical and diagonal.</p>
<p>If the letters being compared are the same, the diagonal value is copied without adding anything.</p>
<p>If the letters are different, the lowest value of the three is chosen and 1 is added.</p>
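<p>A minimal Python sketch of the rule for a single cell, written in the usual textbook form: one step is added to the vertical and horizontal neighbours, and to the diagonal neighbour only when the letters differ. The function and parameter names are chosen for this illustration.</p>

```python
def cell_value(above: int, left: int, diagonal: int, same_letter: bool) -> int:
    """Value of one cell, given its three already calculated neighbours."""
    cost = 0 if same_letter else 1
    return min(above + 1, left + 1, diagonal + cost)


# 'ma' against 'hu': the letters a and u differ.
# Neighbours: above = 2 (m/hu), left = 2 (ma/h), diagonal = 1 (m/h).
print(cell_value(above=2, left=2, diagonal=1, same_letter=False))  # 2

# 'ma' against 'huma': both end in a, so the diagonal value (m/hum = 2) is kept.
print(cell_value(above=3, left=3, diagonal=2, same_letter=True))  # 2
```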
<p>The last value in the table of counts is the minimum distance between the 2 words.</p>
<p>In the table it is the value situated in the lower right corner.</p>
<p>Ultimately it is a matter of tracing the shortest path in the transformations from one word to the other:</p>
<ul>
<li>the diagonal path dealing with different characters represents substitution</li>
<li>the diagonal path dealing with similar characters does not represent any change</li>
<li>the path towards the left represents an insertion</li>
<li>the path upwards represents a deletion</li>
</ul>
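<p>Tracing such a path can be sketched in Python as follows. This is an illustration under our own assumptions: the function name is hypothetical, the table includes a border row and column for the empty word, and when several shortest paths exist the sketch simply prefers the diagonal.</p>

```python
def edit_path(source: str, target: str) -> list[str]:
    """One shortest sequence of operations that turns source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i
    for j in range(cols):
        table[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,
                table[i][j - 1] + 1,
                table[i - 1][j - 1] + cost,
            )

    # Walk back from the lower right corner to the upper left corner.
    steps = []
    i, j = len(source), len(target)
    while i != 0 or j != 0:
        cost = 0 if i and j and source[i - 1] == target[j - 1] else 1
        if i and j and table[i][j] == table[i - 1][j - 1] + cost:
            if cost:
                steps.append(f"substitute {source[i - 1]} by {target[j - 1]}")
            else:
                steps.append(f"keep {source[i - 1]}")
            i, j = i - 1, j - 1  # diagonal
        elif j and table[i][j] == table[i][j - 1] + 1:
            steps.append(f"insert {target[j - 1]}")
            j -= 1  # towards the left
        else:
            steps.append(f"delete {source[i - 1]}")
            i -= 1  # upwards
    return steps[::-1]


print(edit_path("end", "and"))  # ['substitute e by a', 'keep n', 'keep d']
```

<p>The number of steps that are not a 'keep' equals the value in the lower right corner of the table.</p>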
<p>Source: <a href = "https://blog.paperspace.com/measuring-text-similarity-using-levenshtein-distance/">Blog Paperspace</a></p>
<!--Code-->
<h2>5. Code</h2>
<!--OUTPUT SCRIPT -->
{% for path, source in sources %}
<h3>{{ path }}</h3>
<pre>{{ source }}</pre>
{% endfor %}
<!--Credits-->
<h2>6. Credits</h2>
<p>This book is a creation by <a href = "https://www.anaisberck.be/">Anaïs Berck</a> for <a href = "https://www.tabakalera.eus/es/agora-cemento-codigo">ÁGORA / CEMENTO / CÓDIGO</a>, a project of <a href ="https://www.tabakalera.eus/es/lekutan">Asociación Cultural LEKUTAN</a> in the <a href = "https://www.tabakalera.eus">International Centre for Contemporary Culture Tabakalera</a>, Donostia / San Sebastián.</p>
<p>The copy of this book is unique and the print run is infinite by definition.</p>
<!--OUTPUT SCRIPT -->
<p>This copy is number {{ edition_count }} of all downloaded copies.</p>
<p>Collective conditions of (re)use (CC4r), 2021</p>
<p>Copyleft with a difference: You are invited to copy, distribute, and modify this work under the terms of the CC4r: <a href = "https://gitlab.constantvzw.org/unbound/cc4r">https://gitlab.constantvzw.org/unbound/cc4r</a></p>
</body>
</html>