Visualizing high-dimensional data on a 2D canvas is generally challenging. It becomes significantly more difficult when multiple time-steps are to be presented, as the visual clutter quickly increases. Moreover, the challenge to perceive the significant temporal evolution is even greater. In this paper, we present a method to plot temporal high-dimensional data in a static scatterplot; it uses the established PCA technique to project data from multiple time-steps. The key idea is to extend each individual displacement prior to applying PCA, so as to skew the projection process, and to set a projection plane that balances the directions of temporal change and spatial variance. We present numerous examples and various visual cues to highlight the data trajectories, and demonstrate the effectiveness of the method for visualizing temporal data.

Received: 22 July 2020
Published: 30 November 2020

Fund: Israel Science Foundation (Grant Nos. 2366/16 and 2472/17)

Corresponding Authors:
Min Lu
E-mail: orpatashnik@gmail.com;lumin.vis@gmail.com;amit.bermano@gmail.com;cohenor@gmail.com

About author: Or Patashnik is a computer science M.Sc. student in Tel-Aviv University. She received her B.Sc. cum laude in computer science and mathematics from Tel-Aviv University in 2015.

Min Lu is an assistant professor at Shenzhen University. She received her B.Sc. degree in computer engineering from Beijing Normal University, China, in 2011, and received her Ph.D. degree in computer science from EECS, Peking University in 2017. Her major research interests include visualization methodology and visual analytics. More information can be found at https://deardeer.github.io/.

Amit H. Bermano has been a senior lecturer (assistant professor) in the School of Computer Science in Tel-Aviv University since 2018. Previously, he was a postdoctoral researcher in the Princeton Graphics Group (2016-2018), and a postdoctoral researcher at Disney Research Zurich (2015). He conducted his Ph.D. studies at ETH Zurich in collaboration with Disney Research Zurich (2011-2015). His master and bachelor degrees were obtained at the Technion, Israel Institute of Technology.

Daniel Cohen-Or is a professor in the School of Computer Science. He received his B.Sc. cum laude in mathematics and computer science (1985), and his M.Sc. cum laude in computer science (1986) from Ben-Gurion University, and his Ph.D. degree from the Department of Computer Science (1991) of the State University of New York at Stony Brook. He is on the editorial boards of a number of international journals, and a member of many program committees of international conferences. He was the recipient of a Eurographics Outstanding Technical Contributions Award in 2005, and an ACM SIGGRAPH Computer Graphics Achievement Award in 2018. In 2013 he received the People’s Republic of China Friendship Award. In 2015 he was named a Thomson Reuters Highly Cited Researcher. In 2019 he won the Kadar Family Award for Outstanding Research. In 2020 he received the Eurographics Distinguished Career Award.
His research interests are in computer graphics, in particular, synthesis, processing, and modeling techniques. His current main interests are in image synthesis, motion and transformations, shapes, surfaces, analysis and reconstruction, and information visualization.

Fig. 1 3D temporal synthetic data, in two time-steps, projected into 2D using different approaches. In the first time-step, the data is composed of five groups of normally distributed elements, whose means are co-linear. The colors indicate the different groups. For the second time-step, half of the points of two randomly chosen groups were significantly translated in a similar direction, while the remaining points were only slightly translated. Points in each figure correspond to the points in the first time-step. Strokes connect a point in the first time-step to the corresponding point in the second time-step. Strokes are drawn for a sample of the points which were translated significantly. (a) A scatterplot visualization using PCA projection of the whole data. The same data embedded using (b) t-SNE, and (c) a projection plane computed by our method.

Fig. 2 Embedding example of points in ${\mathbb{R}}^{3}$. The blue points are of the first time-step, and the green ones are of the second one. Two possible projection planes are illustrated. The right plane is selected by applying PCA over the whole data, and best illustrates the spatial arrangement of all the data. The left plane, however, is selected using our method, and focuses the visualization on the temporal evolution.

Fig. 3 Displacement scaling. A trajectory of four time-steps (${({p}_{i}^{t})}_{t=1}^{\tau}$, purple) is transformed into our intermediate data representation (${({q}_{i}^{t})}_{t=1}^{\tau}$, red). The first time-step is kept in place. The rest are moved according to their amplified displacement. For example, the displacement between ${p}_{i}^{2}$ and ${p}_{i}^{3}$ (${\delta}_{3}$) is depicted. After scaling, it remains in the same direction, but is $\alpha $ times larger.
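The displacement scaling described in Fig. 3 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the trajectories are stored in an array of shape $(n, \tau, d)$, and the function name `scale_displacements` is hypothetical. The first time-step is kept in place and every later point accumulates its displacements multiplied by $\alpha $.

```python
import numpy as np

def scale_displacements(P, alpha):
    """Return Q with q^1 = p^1 and q^t = q^(t-1) + alpha * (p^t - p^(t-1)).

    P is assumed to have shape (n, tau, d): n trajectories, tau time-steps,
    d dimensions. This array layout is an assumption of the sketch.
    """
    P = np.asarray(P, dtype=float)
    deltas = np.diff(P, axis=1)              # displacements, shape (n, tau-1, d)
    Q = np.empty_like(P)
    Q[:, 0] = P[:, 0]                        # first time-step is kept in place
    # Accumulating the scaled displacements gives each later position.
    Q[:, 1:] = P[:, :1] + alpha * np.cumsum(deltas, axis=1)
    return Q

# One trajectory with four time-steps in 2D (illustrative values only).
P = np.array([[[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [3.0, 2.0]]])
Q = scale_displacements(P, alpha=3.0)
```

With $\alpha =1$ the function returns the input unchanged, and every consecutive displacement in `Q` has the same direction as in `P` but is $\alpha $ times longer.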

$n$: Number of trajectories

${T}_{i}$: The $i$-th trajectory

$\tau $: Number of time-steps

${p}_{i}^{t}$: The point of the $t$-th time-step in ${T}_{i}$

${q}_{i}^{t}$: The point that ${p}_{i}^{t}$ was translated to

$\alpha $: The scale factor

$\mathrm{length}({T}_{i})$: Sum of Euclidean distances between consecutive points of a trajectory
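The trajectory length defined above, the sum of Euclidean distances between consecutive points, can be computed as in the following sketch. The function name `trajectory_length` and the $(\tau, d)$ array layout are assumptions made for illustration.

```python
import numpy as np

def trajectory_length(T):
    """Sum of Euclidean distances between consecutive points of a trajectory.

    T is assumed to have shape (tau, d): one row per time-step.
    """
    T = np.asarray(T, dtype=float)
    steps = np.diff(T, axis=0)                       # consecutive displacements
    return float(np.linalg.norm(steps, axis=1).sum())

# Two segments of length 5 and one zero-length segment (illustrative values).
example_length = trajectory_length([[0, 0], [3, 4], [3, 4], [6, 8]])
```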

Fig. 4 Demonstration of the scale factor $\alpha $ and its effect. From left to right and from top to bottom the $\alpha $ values are $0,1,4,7,20,40$. The ${\alpha}_{\mathrm{max}}$ value computed for this data is ${\alpha}_{\mathrm{max}}=20$, presented in the red frame. For $\alpha =0$, the projection plane is equivalent to the one found by considering the first time-step alone, and hence the motion is hardly discernible. For $\alpha =1$, the projection plane is equivalent to the one found by considering all the data together, giving the temporal axes no special attention. In general, the higher the $\alpha $, the more closely the projection plane aligns with the temporal motion. Using values higher than ${\alpha}_{\mathrm{max}}$ yields very little change, as can be seen in the last example.
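The effect of $\alpha $ on the projection plane can be illustrated with a small numerical experiment. The sketch below is built on assumptions and does not reproduce the data of Fig. 4: the toy dataset, the helper names `pca_plane` and `motion_captured`, and the choice to measure the fraction of the original displacement lengths that survives projection are all made up for illustration. PCA is computed with a plain SVD on the scaled points.

```python
import numpy as np

def pca_plane(X, k=2):
    """Orthonormal basis (d, k) of the top-k principal directions of X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def motion_captured(P, alpha):
    """Fraction of displacement length that lies in the plane found for alpha.

    P has shape (n, tau, d). The plane is fitted to the displacement-scaled
    points; the original displacements are then projected onto it.
    """
    n, tau, d = P.shape
    Q = P[:, :1] + alpha * (P - P[:, :1])     # q^t = p^1 + alpha * (p^t - p^1)
    basis = pca_plane(Q.reshape(-1, d))
    deltas = np.diff(P, axis=1).reshape(-1, d)
    in_plane = np.linalg.norm(deltas @ basis, axis=1).sum()
    return in_plane / np.linalg.norm(deltas, axis=1).sum()

# Toy data: a wide grid in x and y, with a small common motion along z.
xs = np.linspace(-10.0, 10.0, 5)
ys = np.linspace(-3.0, 3.0, 5)
first = np.array([[x, y, 0.0] for x in xs for y in ys])
second = first + np.array([0.0, 0.0, 1.0])
P = np.stack([first, second], axis=1)          # shape (25, 2, 3)

for alpha in (0.0, 1.0, 20.0):
    print(alpha, motion_captured(P, alpha))
```

On this toy data the plane fitted for $\alpha =0$ and $\alpha =1$ spans the two spatial axes and hides the motion almost entirely, whereas for $\alpha =20$ the scaled motion outweighs the spatial variance and the plane contains the motion direction.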

Fig. 5 Toy example depicting the usage of weighted PCA (see Section 3.1.3). The data is distributed along two spatial lines, where one (red points) is more dominant than the other (green). The temporal motion is mostly orthogonal to both directions, and is significantly larger for the green points. (a) Visualization of the data using our method, with $\alpha =1$, for clarity. (b) Visualization using a weighted PCA method, where the weight of a point is proportional to its trajectory length. As can be seen, the secondary spatial axis has been chosen for visualization, even though it depicts the spatial arrangement less, since the green points receive more weight. (c) Visualization using our method, with $\alpha ={\alpha}_{\mathrm{max}}$. Since only the motion has been amplified, and not the spatial aspect of the green points, the visualization is not biased, and hence is aligned with both the temporal direction and the primary principal direction of the data.
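The bias of the weighted PCA baseline in Fig. 5(b) can be illustrated with a generic weighted PCA. The sketch below is a standard weighted-covariance formulation, not necessarily the variant the authors compared against; the function name `weighted_pca_axes`, the four toy points, and the weights are assumptions. It shows how giving more weight to the less spread-out group flips the leading axis.

```python
import numpy as np

def weighted_pca_axes(X, w, k=1):
    """Top-k eigenvectors (d, k) of the weighted covariance of X."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                            # normalize weights to sum to 1
    Xc = X - w @ X                             # subtract the weighted mean
    C = (Xc * w[:, None]).T @ Xc               # weighted covariance matrix
    vals, vecs = np.linalg.eigh(C)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order]

# "Red" points spread widely along x, "green" points spread less along y.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
uniform_axis = weighted_pca_axes(X, [1, 1, 1, 1])     # leading axis is x
weighted_axis = weighted_pca_axes(X, [1, 1, 10, 10])  # leading axis becomes y
```

With uniform weights the leading axis follows the dominant spatial direction. Once the green points are weighted more heavily, as would happen if their trajectories were longer, the leading axis switches to the secondary direction.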

Fig. 6 To determine whether trajectories ${T}_{i}$ and ${T}_{j}$ are similar, we uniformly re-sample the curves, and sum the distances between the new sample points (blue crosses). These distances are depicted by the dashed lines.
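The similarity measure of Fig. 6 can be sketched as follows. The caption does not say how the uniform re-sampling is parameterized, so this sketch assumes re-sampling at equal arc-length intervals with linear interpolation; the function names and the sample count `m` are hypothetical.

```python
import numpy as np

def resample_uniform(T, m):
    """Re-sample a polyline T of shape (tau, d) at m equally spaced arc lengths."""
    T = np.asarray(T, dtype=float)
    # Drop repeated consecutive points so that arc length strictly increases.
    moved = np.linalg.norm(np.diff(T, axis=0), axis=1) > 0
    T = T[np.concatenate([[True], moved])]
    if len(T) == 1:                            # degenerate, stationary trajectory
        return np.repeat(T, m, axis=0)
    seg = np.linalg.norm(np.diff(T, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])   # arc length at each vertex
    targets = np.linspace(0.0, s[-1], m)
    # Interpolate every coordinate separately along the arc length.
    return np.column_stack(
        [np.interp(targets, s, T[:, j]) for j in range(T.shape[1])]
    )

def trajectory_distance(Ti, Tj, m=16):
    """Sum of distances between corresponding re-sampled points."""
    A = resample_uniform(Ti, m)
    B = resample_uniform(Tj, m)
    return float(np.linalg.norm(A - B, axis=1).sum())

# Two parallel trails one unit apart, sampled at different original vertices.
Ti = [[0.0, 0.0], [4.0, 0.0]]
Tj = [[0.0, 1.0], [1.0, 1.0], [4.0, 1.0]]
d_ij = trajectory_distance(Ti, Tj, m=5)
```

Because both curves are re-sampled to the same number of points, trajectories recorded with different vertex spacing can be compared point by point; a small sum indicates similar trajectories.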

Fig. 7 Synthetic datasets embedded in 2D using four different methods: (a, e) naive PCA, computed over all time-steps together, (b, f) t-SNE, (c, g) UMAP, (d, h) our method. Note how temporal aspects are hardly visible using naive PCA, and how the t-SNE and UMAP methods do not preserve directions, making parallel trails seem dissimilar.

Fig. 8 Hans Rosling’s statistics. Points represent three statistics of countries (population size, life expectancy, and GDP) over six selected years: (a) projected by PCA over the whole data, (b) projected using our method. Marked points represent the last time-step, with the stroke getting wider as time passes. Strokes were selected per class to gain insight into each class separately.

Fig. 9 Synthetic data projected into 2D using our method. In each case the stroke’s design is different: (a) smooth color and width transitions, (b) piece-wise constant coloring with a repeating thickness pattern, (c) repeating coloring pattern with a smooth width transition.

Fig. 10 COVID-19 data in China. Each point is a province in China, with five dimensions (longitude, latitude, immigration ratio from the whole country, immigration ratio from Hubei), from 2020-01-22 to 2020-01-31: (a) applying PCA to the whole data, (b) using our method to project the points.

Fig. 11 COVID-19 in the USA. Each datum represents a state, with three dimensions (longitude, latitude, and confirmed cases) in the second week of March: (a) projected by applying PCA over the whole data; (b) projected using our method.
