Thursday, June 23, 2011

Mining the City Data: Making Sense of Cities with Self-Organizing Maps

Omar Neme started his presentation with the sentence: "Humans can create cities without the use of reason." At first it may seem that way, but as we saw in the presentation, there may still be some sense in cities.

The distribution of the population over a city is neither random nor completely regular. Attributes of the population, such as age distribution and educational level, are important for city planners who have to decide where to place, for example, schools and hospitals. It is also important to know how a city's neighbourhoods are distributed according to demographic and economic variables. Urban planners, as well as politicians, need to understand those processes and variables in order to do proper planning.

In the paper, 72 neighbourhoods in Mexico City, with a total of 554 500 inhabitants, were analysed in terms of economic, demographic, mobility, air quality and several other variables for the years 2000 and 2010. The data came from public sources, which are more or less reliable. For example, the authors carried out a destination survey of how people move and used free map software to gather data on all the streets. Older data had to be gathered from a library, which was apparently hard work.

Neme explained that they had two major interests in studying the gathered data. First, they sought to identify neighbourhoods with similar urban features and similarities between certain urban regions. Second, they intended to visualise the evolution of neighbourhoods from an urban point of view. Cities are constantly changing. Citizens get older, new ones are born, new streets are built, new school buildings are built. So they compared data from the years 2000 and 2010 and analysed how the variables of each neighbourhood had changed over time. Ten years is a short period of time in which to observe significant changes in a city. In general, the observed neighbourhoods tended to stay more or less the same. There were, however, some neighbourhoods that clearly changed. One of them, a small residential area, shifted its position towards a cluster of neighbourhoods with higher standards.

When analysing cities there are many parameters. To make sense of several aspects of cities, such as traffic flow, mobility, social welfare, social exclusion, and commodities, data mining may be an appropriate technique. In the research a basic SOM was used. The authors defined a neighbourhood as a region of blocks that share the same administrative instance. This is of course an artificial division, but it describes the city well enough. Each neighbourhood is represented by an attribute vector, which allows the use of multiple variables.
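To make the idea of "one attribute vector per neighbourhood" more concrete, here is a minimal sketch of a basic SOM in Python with NumPy. It is not the authors' implementation: the function names (`train_som`, `best_matching_unit`), the map size, the learning-rate and neighbourhood schedules, and the synthetic three-attribute data are all assumptions made for illustration.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) of the map unit whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)

def train_som(data, rows=4, cols=4, n_iter=500, lr0=0.5, sigma0=2.0, seed=0):
    """Train a basic SOM with online updates; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, used by the Gaussian neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly (assumed schedule)
        sigma = sigma0 * (1.0 - frac) + 0.5     # neighbourhood radius shrinks (assumed schedule)
        x = data[rng.integers(len(data))]       # pick one attribute vector at random
        bmu = best_matching_unit(weights, x)
        d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

# Synthetic stand-in data: 40 hypothetical neighbourhoods, 3 attributes scaled to [0, 1].
rng = np.random.default_rng(1)
group_a = rng.normal(0.1, 0.02, size=(20, 3))
group_b = rng.normal(0.9, 0.02, size=(20, 3))
data = np.vstack([group_a, group_b])

weights = train_som(data)
print(best_matching_unit(weights, data[0]), best_matching_unit(weights, data[-1]))
```

With a map trained like this, the vectors of one neighbourhood for two different years could be projected onto the same map, and a change of best matching unit would indicate that the neighbourhood has moved towards a different group.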

So, what could be seen from the maps?

The visual information obtained from the self-organizing map shows interesting and previously unseen relations. For example, households with at least one car were found in richer neighbourhoods. This is quite logical: people who have money can afford a car. But it was also noticed that these households were in places that are badly connected to other neighbourhoods. So they have a lot of cars but fewer streets for leaving and entering the neighbourhood. This of course causes traffic. So do people buy cars because they have money, or because they tend to live in neighbourhoods that have bad connections?

Another interesting thing was noticed. The neighbourhoods with the highest educational level are not the ones whose inhabitants earn the highest salaries. And in neighbourhoods with a lower level of education, people tend to travel more outside their own neighbourhood (for work or social reasons). This may mean that people with a higher level of education tend to live near their jobs.

After the session some questions were asked. One audience member commented that in his country highly educated people earn money by selling drugs, so they appear to earn less than they really do. Neme answered that he doubts this is the case in Mexico City. In his opinion, the reason is that highly educated people and students usually live near the universities and do not really get high salaries.

So the conclusion was that self-organizing maps are a suitable tool for planners who seek correlations in cities. They might help to discover relevant information. Self-organizing maps are a good alternative, at least for visualisation and data inspection, when trying to make sense of the city.


Wednesday, June 22, 2011

A SOM-Based Analysis of Early Prosodic Acquisition of English by Brazilian Learners: Preliminary Results

Silva from the State University of Piauí, Brazil, presented a poster on whether the transfer of word stress from a first language to a second language can be analyzed with a Self-Organizing Map (SOM).
The authors analyzed how stress patterns transfer from the first learned language to the second language among Brazilian students. The first language was Brazilian Portuguese and the foreign language was English. They wanted to see if the SOM would be able to organize the speakers into groups according to how much they transfer the stress pattern and whether they share other linguistic similarities.

The corpus was composed of recordings of interviews with 30 students. The students were asked to utter 30 different English sentences containing words that act sometimes as a verb and sometimes as a noun. In English some such words are stressed differently depending on whether they are a verb (obJECT) or a noun (OBject). Words of this kind were investigated because English and Brazilian Portuguese stress patterns are significantly different in these cases. The poster presented results only for the sentence "I object to going to a bar", where the analyzed word was the verb "obJECT". An error in pronouncing this word occurs when it is stressed as a noun instead of a verb.

To build the network, the SOM used only the information provided by the Linear Predictive Coding (LPC) coefficients. So no prior linguistic knowledge was used, and no label information was used while training the SOM. After the map had been constructed, label information was added for the analysis of the results.
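The post does not say how the LPC coefficients were computed, which order was used, or how the speech was framed. As a rough illustration only, the sketch below estimates LPC coefficients for one signal frame with the textbook autocorrelation method in NumPy. The function name `lpc_coefficients` and the toy input signal are assumptions, not details from the poster.

```python
import numpy as np

def lpc_coefficients(signal, order):
    """Estimate LPC coefficients with the autocorrelation method.

    Returns a vector a such that signal[n] is approximated by
    sum over k of a[k] * signal[n - 1 - k].
    """
    x = np.asarray(signal, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: len(x) - lag], x[lag:]) for lag in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Toy frame (not speech): each sample is 0.9 times the previous one.
frame = 0.9 ** np.arange(200)
coeffs = lpc_coefficients(frame, order=1)
print(coeffs)
```

For this toy frame the single coefficient comes out very close to 0.9, which is the factor used to generate it. For real speech, one such coefficient vector per frame (or per utterance) would be the kind of input vector a SOM can be trained on.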

Analyzing the resulting U-matrix showed two clearly visible clusters. The authors confirmed that the speakers were organized according to similarities in prosodic features. The larger cluster was the group that transfers the Brazilian Portuguese stress pattern into English, and the smaller cluster was the group that does not.
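For readers unfamiliar with the U-matrix: it shows, for each map unit, how far its weight vector is from those of its grid neighbours, so cluster borders appear as ridges of high values. Below is a minimal sketch of a simplified U-matrix (one value per unit, 4-connected rectangular grid). The function name `u_matrix` and the hand-made weight grid are assumptions for illustration, not the setup used in the poster.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each unit's weight vector to its 4-connected grid neighbours."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            u[i, j] = np.mean(dists)
    return u

# Hand-made weight grid with two flat regions: columns 0-2 are all zeros,
# columns 3-5 are all ones, so the border lies between columns 2 and 3.
toy_weights = np.zeros((4, 6, 2))
toy_weights[:, 3:, :] = 1.0

u = u_matrix(toy_weights)
print(np.round(u, 2))
```

In the printed matrix the values are zero inside each flat region and positive only in columns 2 and 3, which is the ridge separating the two clusters.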

Inside the groups there may be smaller subgroups which, when closely examined in isolation, might reveal more about the linguistic properties of the speakers' utterances. Next, Silva and co. will develop experiments to analyze these subgroups. They hope that in the future they could develop a tool for classifying the proficiency level of foreign language learners.

WSOM Banquet

The WSOM conference organized a banquet on Tuesday evening, 14th of June 2011. The weather was rather cold after having been very warm just two days earlier, when the temperature in Helsinki was about +30C. The dinner was held at restaurant Saari on an island. While the participants waited for a boat, some group pictures were taken.


People in the picture include Sami Virpioja, Eero Carlson, Krista Lagus, Mats Sjöberg, Jorma Laaksonen, Mika Sulkava, Guilherme Barreto, Teuvo Kohonen, Anneli Kohonen, Barbara Hammer, Risto Miikkulainen, Marina Resta, Kadim Tasdemir, Erkki Oja, Jean-Charles Lamirel, Pablo Estevez, Ilari Nieminen, Tommi Vatanen and Timo Honkela. (This is to be completed.)


Amaury Lendasse took the first picture, and in the second picture he is on the right. In this picture, Olli Simula is second from the right.

Monday, June 20, 2011

Rudolf Mayer: Analysing the Similarity of Album Art with Self-Organizing Maps

Rudolf Mayer from Vienna University of Technology has examined connections between music genres and album cover art using self-organizing maps (SOMs). In traditional record stores, consumers already search for music with the help of album covers, and a lot of resources are used to design album covers for the right target groups. With the growing selection of digital music it is important to create easier ways to organize and store music. Computer algorithms that can understand some features of music and then recommend new artists to users are already in use. There are studies where music is analyzed not only with audio features but also, for example, with song lyrics and other related texts.

The data used in the research, which included over 900 songs sorted into 7 genres, was gathered from the music store amazon.com, where you can search by genre and download both the music sample and the album cover. Gathering the data was problematic in this study: even though there are some music collections for research use, most of them do not include album covers or enough information to search for them.

Mayer first trained SOMs of 22x18 nodes with three audio feature sets, Rhythm Patterns, Rhythm Histogram and Statistical Spectrum Description (SSD), and analyzed which of these clustered the music best. According to the authors' perception, SSD provided the best arrangement of music. A rough analysis of this data showed that each genre has very recognizable features in its album art. For example, classical music has a simple design with photos of people, only a few colors and a simple background. When maps were trained with image features (Color Histogram, Color Names and Scale Invariant Feature Transform), the idea was that similar music should be located close together on the map. But there were basically no continuous areas of similar music with any of the image features used. An analytic comparison of the SOMs also showed only a small percentage of matches between the two mappings.
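The post names the image features but does not define them. As an illustration of the simplest one, here is a sketch of one common form of color histogram: per-channel histograms of an RGB image, concatenated and normalized. The function name `color_histogram`, the choice of 8 bins per channel, and the tiny all-red test image are assumptions; the descriptor used in the study may differ.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenate per-channel histograms of an RGB image of shape (H, W, 3).

    Pixel values are expected in 0..255. The result has 3 * bins entries and sums to 1.
    """
    image = np.asarray(image)
    hists = [
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0] for c in range(3)
    ]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

# Tiny hypothetical "cover": a 4x4 image that is pure red.
cover = np.zeros((4, 4, 3), dtype=np.uint8)
cover[..., 0] = 255

features = color_histogram(cover)
print(features.shape)
```

A vector like this, computed for every album cover, is the kind of input a SOM can be trained on. It captures only the overall color distribution, which is consistent with the finding that such features were too weak to arrange covers by genre.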

The conclusion was that the image features used are not enough to capture something as complex as album art. There is potential in using album art when analyzing music information, but more powerful image feature descriptors are needed.

Monday, June 13, 2011

Prof. Markku Mattila, President of the Academy of Finland, opened WSOM 2011

In his opening talk, Prof. Markku Mattila gave an overview of scientific research in Finland. For instance, he referred to the fact that the level of investment in science is very high and there are relatively more researchers in Finland than anywhere else. Mattila mentioned that the SOM is a very good example of how scientific results may have an impact in many areas of society. He also emphasized the importance of international collaboration and exchange.


In one study, the most efficient science policy action to support innovation was found to be the provision of funding for young researchers to participate in international conferences. At the end of his presentation, President Mattila welcomed all participants of WSOM 2011 and hoped that some of them would consider a longer research visit to Finland.

WSOM 2011 starts in suitably cool weather

Finland has been suffering from an early heat wave, but between Sunday the 12th and Monday the 13th of June the temperature is falling from +30 to about +15. This makes the weather suitable for intellectual efforts.


Detailed information on the weather in Espoo is available at http://en.ilmatieteenlaitos.fi/weather/Espoo. One fact shown on this page is the length of the day: the sun rises before 4am and sets just before 11pm, and even after that darkness does not fall rapidly.

Tuesday, June 7, 2011

WSOM 2011 blog created

This blog is intended to provide information about and experiences from the Workshop on Self-Organizing Maps, 2011. WSOM 2011 is co-located with the ICANN 2011 conference, and a similar blog has also been created for ICANN at icann2011.blogspot.com.