{{ fragment_cover_map }}
    
    
    
    
    
    

Levenshtein Distance
reads Cortázar

Generated on {{ date }} at {{ time }}, N⁰ {{ edition_count}}

Index

  1. Introduction
  2. Reading Cortázar
    1. Original fragment
    2. Adapted fragment
    3. Map of the woods
    4. Table of intermediary species
    5. Repetitive poetry
  3. General description of the Levenshtein Distance
  4. Technical description of the Levenshtein Distance
  5. Code
  6. Credits

1. Introduction

Levenshtein Distance reads Cortázar is the first version of the first book in the 'Algoliterary Publishing House: making kin with trees'.

The author of this book is the algorithm Levenshtein Distance. The subject is the eucalyptus in "Fama and eucalyptus", a fragment of Cronopios and Famas by Julio Cortázar.

The versions of the book are infinite by definition and each copy is unique.

Anaïs Berck is a pseudonym and represents a collaboration between humans, algorithms and trees. Anaïs Berck explores the specificities of human intelligence in the company of artificial and plant intelligences. In June 2021, during a residency at Medialab Prado in Madrid, Anaïs Berck developed a prototype of an Algoliterary Publishing House, in which algorithms are the authors of unusual books. The residency was granted by the "Residency Digital Culture" programme initiated by the Flemish Government.

In this work Anaïs Berck is represented by:

2. Reading Cortázar

2.1. Original fragment

A fama is walking through a forest, and although he needs no wood he gazes greedily at the trees. The trees are terribly afraid because they are acquainted with the customs of the famas and anticipate the worst. Dead center of the wood there stands a handsome eucalyptus and the fama on seeing it gives a cry of happiness and dances respite and dances Catalan around the disturbed eucalyptus, talking like this:

— Antiseptic leaves, winter with health, great sanitation!

He fetches an axe and whacks the eucalyptus in the stomach. It doesn’t bother the fama at all. The eucalyptus screams, wounded to death, and the other trees hear him say between sighs:

— To think that all this imbecile had to do was buy some Valda tablets.

2.2. Adapted fragment

{{ new_fragment }}

2.3. Map of the woods

The distances between the main tree you have chosen for your fragment and the areas of other tree species in the woods, according to Levenshtein Distance:




{{ forest_map }}

2.4. Table of intermediary species

The Levenshtein Distance creates a table from the names of the two species. For each cell of this table, it calculates the distance between the beginnings of the two names: their first letter, their first two letters, and so on.

Normally, the table is filled with numbers that count the operations necessary to change one element into another. The possible operations are inserting, deleting or substituting a letter. Here, instead of numbers, the table is filled with the various intermediary species that the algorithm creates by inserting, deleting or substituting letters.
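One way to picture where intermediary species come from is to follow the numeric table back from its lower right corner and apply one operation at a time, keeping every word produced along the way. The Python sketch below is our own illustration under that assumption, not the code that generates this book; `edit_table` and `intermediary_species` are hypothetical names. When several shortest paths exist, it simply prefers substitutions, then deletions, then insertions.

```python
def edit_table(source: str, target: str) -> list[list[int]]:
    """Cell [i][j] holds the distance between source[:i] and target[:j]."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # i deletions
    for j in range(cols):
        table[0][j] = j  # j insertions
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(table[i - 1][j] + 1,         # deletion
                              table[i][j - 1] + 1,         # insertion
                              table[i - 1][j - 1] + cost)  # substitution or match
    return table


def intermediary_species(source: str, target: str) -> list[str]:
    """Words visited on one shortest path of edits from source to target."""
    table = edit_table(source, target)
    word = list(source)
    steps = [source]
    i, j = len(source), len(target)
    # Invariant at the top of the loop: word == source[:i] + target[j:]
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and source[i - 1] == target[j - 1]
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + (0 if same else 1):
            i, j = i - 1, j - 1
            if same:
                continue  # matching letter: no new word
            word[i] = target[j]  # substitution
        elif i > 0 and table[i][j] == table[i - 1][j] + 1:
            i -= 1
            del word[i]  # deletion
        else:
            j -= 1
            word.insert(i, target[j])  # insertion
        steps.append("".join(word))
    return steps


print(intermediary_species("machine", "human"))
# ['machine', 'machin', 'machan', 'macman', 'mauman', 'mhuman', 'human']
```

Each step changes exactly one letter, so the list always holds the distance plus one words: here 6 operations and 7 words.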



{{ table_of_intermediary_species }}

2.5. Repetitive poetry

{{ repetitive_poetry }}

3. General description of the Levenshtein Distance

Levenshtein Distance is an algorithm that measures the difference between two words or two groups of letters. It is also called the 'edit distance'. The Levenshtein Distance between two words is the minimum number of actions needed to change one word into another. The possible actions are the insertion, deletion or substitution of a single letter. For example, the Levenshtein Distance between 'end' and 'and' is 1, as 'e' is replaced by 'a'.
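The definition can be written down almost literally as a recursive function. This is a minimal Python sketch; the name `levenshtein` is our own choice, not taken from the book's code. `functools.lru_cache` remembers earlier results, so each pair of word beginnings is computed only once.

```python
from functools import lru_cache


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions and
    substitutions needed to turn a into b (recursive definition)."""

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        # Distance between the first i letters of a and the first j letters of b.
        if i == 0:
            return j  # insert the j letters of b
        if j == 0:
            return i  # delete the i letters of a
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            dist(i - 1, j) + 1,         # delete a letter of a
            dist(i, j - 1) + 1,         # insert a letter of b
            dist(i - 1, j - 1) + cost,  # substitute (free if the letters match)
        )

    return dist(len(a), len(b))


print(levenshtein("end", "and"))  # 1
```

The recursion follows the definition closely; for long texts, an iterative version that fills a table row by row avoids Python's recursion limit.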

The algorithm was named after its creator, Vladimir Levenshtein, a Russian mathematician and scientist of Jewish origin whose main areas of research were information theory and error-correcting codes. He worked at the Keldysh Institute of Applied Mathematics in Moscow. He introduced the algorithm in 1965 'to consider the problem of constructing optimal codes capable of correcting deletions, insertions and inversions'. He passed away in 2017 at the age of 82.

The Levenshtein Distance is at work in software such as spell checkers and computer-assisted translation programs. It can also be found in search engines, where it finds the words closest to a mistyped word.
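In a spell checker or a search engine, the same distance can rank candidate words. The sketch below assumes a small hypothetical vocabulary and an arbitrary threshold of two edits; `suggest` and `max_distance` are illustrative names, not a real library API, and this is not how any particular spell checker works. The `levenshtein` helper keeps only two rows of the table at a time.

```python
def levenshtein(a: str, b: str) -> int:
    # Two-row version of the table: only the previous row is kept.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Vocabulary words within max_distance edits of word, closest first."""
    scored = sorted((levenshtein(word, candidate), candidate) for candidate in vocabulary)
    return [candidate for distance, candidate in scored if distance <= max_distance]


vocabulary = ["eucalyptus", "oak", "pine", "beech", "birch"]  # hypothetical word list
print(suggest("eucaliptus", vocabulary))  # ['eucalyptus']
print(suggest("beach", vocabulary))       # ['beech', 'birch']
```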

Its activity extends to less obvious fields such as plagiarism detection, DNA analysis, automatic speech recognition, optical character recognition (OCR) of scanned text, handwriting recognition, hoax email detection and assistance with buying and selling on the stock market.

Sometimes the Levenshtein Distance leads to surprising discoveries. In 1995, for example, Kessler applied the algorithm to the comparison of Irish dialects and showed that it was a successful method for measuring phonetic distances between dialects. From the linguistic distances between dialectal varieties, dialect areas can be identified. More innovative still was the possibility of drawing dialect maps reflecting the idea that dialect areas form a continuum rather than areas separated by sharp boundaries.

Sources:

4. Technical description of the Levenshtein Distance

Humans can rewrite a word and easily count the number of changes that are necessary to transform one word into another. We invite you to write the word 'machine' on a sheet of paper, followed by the word 'human' on the next line. Knowing that you can only insert, delete or replace one letter, how many operations would you need to do to rewrite the word 'machine' into 'human'?

To give you an idea of how the Levenshtein Distance algorithm works, we describe here the different steps the algorithm takes to transform the word machine into human.

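For the word 'machine', the algorithm first lists its elements, that is, the successive beginnings of the word: m, ma, mac, mach, machi, machin and machine, seven elements in total. For the word 'human', the elements are h, hu, hum, huma and human, five elements in total. This creates a matrix with 7 rows and 5 columns. In this distance matrix, the algorithm calculates for each cell the distance between an element of the first word and an element of the second. In practice, the matrix also gets an extra first row and column for the empty word, filled with 0, 1, 2, 3 and so on: the number of insertions or deletions needed to build each element from nothing.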
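The elements and the starting table can be sketched as follows. This is a minimal Python illustration with hypothetical names (`prefixes`, `initial_table`); like the usual table-filling formulation, it adds an extra row and column for the empty word, so the table for 'machine' and 'human' has 8 rows and 6 columns rather than 7 and 5.

```python
def prefixes(word: str) -> list[str]:
    """The 'elements' of a word: every beginning, from the first letter to the whole word."""
    return [word[:k] for k in range(1, len(word) + 1)]


def initial_table(source: str, target: str) -> list[list[int]]:
    """Distance table before filling, with an extra row and column for the empty word."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # turning i letters into nothing takes i deletions
    for j in range(cols):
        table[0][j] = j  # building j letters from nothing takes j insertions
    return table


print(prefixes("machine"))  # ['m', 'ma', 'mac', 'mach', 'machi', 'machin', 'machine']
print(prefixes("human"))    # ['h', 'hu', 'hum', 'huma', 'human']
table = initial_table("machine", "human")
print(len(table), len(table[0]))  # 8 6
```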

It starts with the first element of the word machine, m, and compares it with each of the five elements of the word human, beginning with h. What is the Levenshtein distance between m and h? The character m has to be replaced by h, so the distance is 1.

Then it moves on to the next element of the word human, hu. What is the Levenshtein distance between m and hu? Since m contains only one character and hu contains two, at least one character has to be inserted. To transform m into hu, the character m is replaced by h and then u is added, so the distance is 2.

Now it moves on to the third element. What is the distance between m and hum? Replacing m by h and then adding two more characters would take 3 operations, but there is a cheaper way: hum ends with m, so the m can be kept and only h and u have to be inserted in front of it. The distance is 2.

It continues until it has calculated the distance between the first element of the word machine, m, and all five elements of the word human. The distances are 1, 2, 2, 3 and 4.

After calculating the distances between the first element of the first word and all the elements of the second word, the process continues by calculating the distances between the remaining elements of the first word and the elements of the second word.

The process continues with ma. It compares ma to the five elements of the word human. The first one will be h. What is the Levenshtein distance between ma and h? What it has to do is replace the character m with h and delete the character a. Thus the distance is 2. It moves on to the next element of the word human which is hu. What is the Levenshtein distance between ma and hu? It replaces the character m with h and the letter a with u. The distance is 2.
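The values of the first two rows can be checked by computing each distance on its own, without sharing any work between cells. This is a minimal sketch using our own compact `levenshtein` function (a hypothetical helper, not the book's code).

```python
def levenshtein(a: str, b: str) -> int:
    # Two-row version of the distance table.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


human = "human"
for first in ("m", "ma"):
    # Distance from this element of 'machine' to each element of 'human'.
    row = [levenshtein(first, human[:k]) for k in range(1, len(human) + 1)]
    print(first, row)
# m [1, 2, 2, 3, 4]
# ma [2, 2, 3, 2, 3]
```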

That is how the table is filled up.

In code, the table makes the calculation efficient: each distance is built from distances already stored in the table instead of being recomputed from scratch.

Each value is calculated from the three neighbouring cells that have already been filled in: horizontal (the cell to the left), vertical (the cell above) and diagonal (the cell above and to the left).

If the letters are the same, the value of the diagonal cell is copied, since no operation is needed.

If the letters are different, the lowest value of the three is chosen and 1 is added.

The last value in the table, situated in the lower right corner, is the minimum distance between the two words.
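These rules can be sketched in Python as the classic table-filling procedure, often called the Wagner–Fischer algorithm. This is an illustrative version, not the book's own code; `distance_table` is a hypothetical name. It also prints the table for 'machine' and 'human', whose lower right corner answers the question at the start of this section: 6 operations.

```python
def distance_table(source: str, target: str) -> list[list[int]]:
    """Fill the table cell by cell; row 0 and column 0 stand for the empty word."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i
    for j in range(cols):
        table[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1]
            vertical = table[i - 1][j]    # cell above
            horizontal = table[i][j - 1]  # cell to the left
            if source[i - 1] == target[j - 1]:
                # Same letter: no operation needed, copy the diagonal value.
                table[i][j] = diagonal
            else:
                # Different letters: cheapest neighbour plus one operation.
                table[i][j] = min(diagonal, vertical, horizontal) + 1
    return table


source, target = "machine", "human"
table = distance_table(source, target)
print(" " * 6 + "  ".join(target))
for label, row in zip(" " + source, table):
    print(label, " ".join(f"{value:2d}" for value in row))
print("distance:", table[-1][-1])  # lower right corner: 6
```

Copying the diagonal for matching letters is safe because neighbouring cells never differ by more than 1, so the diagonal is never worse than either of the other two plus one. Taking the lowest of the three without adding 1 would go wrong: for 'a' and 'aa' it would give 0 instead of 1.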

Ultimately, it is a matter of tracing the shortest path of transformations from one word to the other:

Source: Blog Paperspace

5. Code

{% for path, source in sources %}

{{ path }}

{{ source }}
{% endfor %}

6. Credits

This book is a creation by Anaïs Berck for ÁGORA / CEMENTO / CÓDIGO, a project of Asociación Cultural LEKUTAN in the International Centre for Contemporary Culture Tabakalera, Donostia / San Sebastián.

Each copy of this book is unique, and the print run is, by definition, infinite.

This copy is number {{ edition_count }} of all downloaded copies.

Collective conditions of (re)use (CC4r), 2021

Copyleft with a difference: You are invited to copy, distribute, and modify this work under the terms of the CC4r: https://gitlab.constantvzw.org/unbound/cc4r