Special Session on Visual Analytics

This year, we want to highlight the role of Visual Analytics in knowledge acquisition. Visual analytics merges interaction, visual exploration and data analysis techniques to reveal hidden patterns and to derive trends from very large and complex datasets in a sense-making process. The KDML Workshop features a special session on "Visual Analytics". The session comprises invited contributions on highly topical research in the field. The presented topics will cover application-oriented research in geospatial analytics and life sciences as well as fundamental research targeted at supporting feature space analysis and an exhaustive visual search in multi-dimensional data.


The session will span two days and will be divided into two tracks:

  1. September 28th 2011, 4:00p.m.-6:00p.m.
  2. September 29th 2011, 1:00p.m.-2:00p.m.


The invited speakers of the two tracks are:

  • Dr. Hans-Jörg Schulz, University of Rostock
  • Dr. Natalia Andrienko, Fraunhofer IAIS, Sankt Augustin
  • Dr. Alexander Hinneburg, Martin-Luther University Halle/Saale
  • Dr. Gennady Andrienko, Fraunhofer IAIS, Sankt Augustin
  • Jun.-Prof. Tobias Schreck, University of Konstanz
  • Dirk J. Lehmann, University of Magdeburg



The following topics will be covered in this session:

Visual Analytics of Heterogeneous Data in Life Science Applications. Hans-Jörg Schulz

Data from real-world applications is often heterogeneous, exhibiting sparse and non-uniform distributions across a huge, multi-dimensional data space. The main challenge this situation poses is that different subsets of heterogeneous data need to be treated differently: queried differently, analyzed differently, and shown differently. This is therefore an inherent problem for Visual Analytics of such data, one that was already identified in Thomas and Cook's Visual Analytics research agenda in 2005. Especially in the Life Sciences, where data are rarely uniform with respect to their provenance, type, or distribution, application experts are eager to finally get suitable tools to handle such data. This talk presents two such tools, which have been developed in close collaboration with medical experts: VisBricks, a Visual Analytics approach that allows for an intuitive, straightforward exploration of inhomogeneous tabular data, and Stack'n'Flip, a visualization design for realizing Visual Analytics workflows across multiple, heterogeneous data sets. On the basis of these two concrete examples, the talk establishes a number of research hypotheses that seem worthwhile to investigate in future work. These concern a fundamental interrelation between inhomogeneity and heterogeneity, as well as the extension of the concept of heterogeneity from data alone to other domains influencing the visual analysis. Examples are heterogeneous groups of collaborating analysts or heterogeneous hardware/software setups on which the analysis is performed, both of which are commonplace in Life Science applications.


Interactive Visual Clustering of Large Collections of Trajectories. Natalia Andrienko

One of the most common operations in the exploration and analysis of many kinds of data is clustering, i.e., the discovery and interpretation of groups of objects having similar properties and/or behaviors. In clustering, objects are often treated as points in a multi-dimensional space of properties. However, structurally complex objects, such as trajectories of moving entities and other kinds of spatiotemporal data, cannot be adequately represented in this manner. Such data require sophisticated and computationally intensive clustering algorithms, which are very hard to scale to large datasets that do not fit in main memory. This talk presents an approach to extracting meaningful clusters from large databases by combining clustering and classification, both driven by a human analyst through an interactive visual interface.
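To make the combination of clustering and classification more concrete, here is a minimal Python sketch under several assumptions that are not taken from the talk: trajectories are lists of (x, y) points already resampled to equal length, the distance is a simple mean point-wise distance, and a leader-style clustering stands in for whatever clustering algorithm the approach actually uses. Only a sample that fits in memory is clustered; the cluster medoids then serve as prototypes of a classifier that is applied to the remaining trajectories one at a time, for example while streaming them from a database. All function names are hypothetical.

```python
import math
from typing import Iterable, Iterator, List, Optional, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def route_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean Euclidean distance between corresponding points.
    Assumes both trajectories were resampled to the same number of points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def cluster_sample(sample: List[Trajectory], eps: float) -> List[List[Trajectory]]:
    """Leader clustering (a stand-in for the real algorithm): each trajectory
    joins the first cluster whose leader lies within eps, else starts a new one."""
    clusters: List[List[Trajectory]] = []
    for t in sample:
        for c in clusters:
            if route_distance(t, c[0]) <= eps:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters


def medoid(cluster: List[Trajectory]) -> Trajectory:
    """Member with the smallest total distance to all other members."""
    return min(cluster, key=lambda t: sum(route_distance(t, u) for u in cluster))


def build_prototypes(sample: List[Trajectory], eps: float) -> List[Trajectory]:
    """Step 1: cluster an in-memory sample and keep one prototype per cluster."""
    return [medoid(c) for c in cluster_sample(sample, eps)]


def classify_stream(stream: Iterable[Trajectory], prototypes: List[Trajectory],
                    eps: float) -> Iterator[Optional[int]]:
    """Step 2: assign each trajectory to its nearest prototype if that lies
    within eps, otherwise yield None (left unclassified). Trajectories are
    processed one at a time, so the stream may exceed main memory."""
    for t in stream:
        dists = [route_distance(t, p) for p in prototypes]
        best = min(range(len(dists)), key=dists.__getitem__)
        yield best if dists[best] <= eps else None
```

In the interactive setting described in the abstract, the analyst would inspect and refine the prototypes between the two steps; the sketch leaves that step out. The sample itself could be drawn with, e.g., `random.sample`.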


Visually Summarizing Semantic Evolution in Document Streams with Topic Table. Alexander Hinneburg

This talk presents a visualization technique for summarizing the contents of document streams, such as news or scientific archives. The content of streaming documents changes over time, and so do the topics the documents are about. Topic evolution is a relatively new research subject that encompasses the unsupervised discovery of thematic subjects in a document collection and the adaptation of these subjects as new documents arrive: old topics may gain or lose popularity, and new topics may emerge and replace old ones. While there are many powerful topic evolution methods, the combination of learning and visualizing evolving topics has been less explored, although it is indispensable for understanding a dynamic document collection. The presented visualization approach, called Topic Table, builds upon topic modeling to derive a condensed representation of a document collection but is not limited to a specific topic model. Topic Table captures important and intuitively comprehensible aspects of a topic over time: the importance of the topic within the collection, the words characterizing the topic, and the semantic changes of the topic from one time point to the next. As an example, the technique is applied to the NIPS proceedings from 1987 to 1999.
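The three aspects that Topic Table captures can be illustrated with a small Python sketch. It assumes, hypothetically, that a topic model has already been fitted per time slice, yielding per-document topic proportions and per-topic word probabilities. Importance is taken as the mean topic proportion over a slice's documents, the characterizing words as the top-n words, and semantic change as one minus the Jaccard overlap of consecutive top-word sets. These measures are illustrative choices, not necessarily the ones Topic Table uses, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One time slice (hypothetical input format):
#   doc_topics:  per-document topic proportions, each row summing to 1
#   topic_words: per topic, a mapping word -> probability
Slice = Tuple[List[List[float]], List[Dict[str, float]]]


def topic_importance(doc_topics: List[List[float]], k: int) -> float:
    """Mean proportion of topic k over the documents of one slice."""
    return sum(row[k] for row in doc_topics) / len(doc_topics)


def top_words(word_probs: Dict[str, float], n: int) -> List[str]:
    """The n most probable words (ties broken alphabetically)."""
    return sorted(word_probs, key=lambda w: (-word_probs[w], w))[:n]


def semantic_change(prev: List[str], curr: List[str]) -> float:
    """1 - Jaccard overlap of two non-empty top-word lists:
    0 means unchanged, 1 means completely disjoint."""
    a, b = set(prev), set(curr)
    return 1.0 - len(a & b) / len(a | b)


def topic_row(slices: List[Slice], k: int, n: int) -> List[Tuple[float, List[str], float]]:
    """One row of a topic table: (importance, top words, change) per slice.
    The change for the first slice is defined as 0."""
    row = []
    prev: Optional[List[str]] = None
    for doc_topics, topic_words in slices:
        words = top_words(topic_words[k], n)
        change = 0.0 if prev is None else semantic_change(prev, words)
        row.append((topic_importance(doc_topics, k), words, change))
        prev = words
    return row
```

Each tuple corresponds to one cell of a topic's row over time; a renderer would then map importance, words, and change to visual attributes.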


Discovering Bits of Place Histories from People's Activity Traces. Gennady Andrienko

Events that happened in the past are important for understanding ongoing processes, predicting future developments, and making informed decisions. Significant and/or interesting events tend to attract many people. Some people leave traces of their attendance in the form of computer-processable data, such as records in the databases of mobile phone operators or photos on photo sharing web sites. This talk presents a suite of visual analytics methods for reconstructing past events from such activity traces. The developed tools combine geocomputations, interactive geovisualizations, and statistical methods to enable integrated analysis of the spatial, temporal, and thematic components of the data, including numeric attributes and texts. The utility of the approach is demonstrated on two large real-world data sets: mobile phone calls in Milano over 9 days and Flickr photos taken in the British Isles over 5 years.


Visual Feature Space Analysis for Visual Analytics Applications. Tobias Schreck

Large amounts of data are collected and generated in many application areas. Visual Analytics is concerned with researching visual-interactive tools that help the user to understand and analyze data. Often, and particularly for complex data types, the raw data cannot be visualized or analyzed directly; instead, descriptors (or feature vectors) need to be extracted in a preprocessing step. Given the abundance of potentially relevant descriptor extraction methods for many complex data types, the question arises which descriptors should be used for the visual analysis task at hand. In this talk, we will introduce the descriptor overload problem in the context of visual-explorative data analysis and present recent approaches developed within the project Visual Feature Space Analysis. Our approaches are based on comparative visual analysis in multiple descriptor spaces, representing the descriptor spaces by 2D projections and by hierarchical structures. Applications to different data types are presented. We will also discuss research challenges arising in this context.


Features in Continuous Parallel Coordinates. Dirk J. Lehmann

Structures in multivariate and high-dimensional data are mapped onto certain features of the corresponding visualizations. Examples of such visualizations are Continuous Parallel Coordinates (CPC) and Continuous Scatterplots (CSP), which combine several scalar fields defined over a common domain and thus provide a continuous view of the data. The dominant structures in CPCs are feature curves. In this talk, methods to extract and classify them will be presented. Furthermore, it will be shown that these feature curves are related to discontinuities in CSPs. Finally, the usefulness of these CPC features for the visual data analysis process will be discussed.