People leverage the power of interactive visualization to make sense of, and gain insights from, large and complex datasets. In machine learning (ML), developers need to understand how their models work. Stakeholders who make decisions with, or are affected by, such models also need that understanding (albeit in broader terms), both for legislative reasons and as good business practice. That is the essence of "Explainable AI" (XAI): an approach that allows the reasoning of ML and other types of AI model to be explained.
Visualizations work by showing people graphical patterns from which characteristics "pop out" or can be found by inspection ("visual search"). Although some types of visualization are agnostic to scale (e.g., outliers pop out in a box plot irrespective of the number of values), many break down as data grows (e.g., through overplotting, attempts to use colour to encode categorical variables with dozens of levels, or forcing people to zoom and scroll excessively just to see all the data).
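To illustrate this kind of scale-dependence, the following minimal sketch (our own illustration, assuming only numpy and matplotlib; the sample sizes are arbitrary) contrasts a scale-agnostic encoding with one that breaks down:

```python
# Minimal illustrative sketch: a box plot stays readable as n grows,
# while a scatter plot of the same data overplots into a solid blob.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for col, n in enumerate([200, 200_000]):  # arbitrary small vs. large samples
    x = rng.normal(size=n)
    y = rng.normal(size=n)

    # Box plot: the same outliers "pop out" whether n is 200 or 200,000.
    axes[0, col].boxplot(y)
    axes[0, col].set_title(f"Box plot, n={n}")

    # Scatter plot: at large n the points overplot, hiding structure
    # that was visible at small n.
    axes[1, col].scatter(x, y, s=2, alpha=0.3)
    axes[1, col].set_title(f"Scatter plot, n={n}")

plt.tight_layout()
plt.show()
```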
The Making Visualization Scalable (MAVIS) project will address these deficiencies by bringing together a multidisciplinary team of experts in visualization, user evaluation, visual communication, machine learning and statistics, with a strong track record in fundamental and applied visualization research. The team are all part of the Leeds Institute for Data Analytics (LIDA), where researchers from six faculties work collaboratively to address specialist challenges. The project's aim is to develop and evaluate methods for visually communicating and interacting with data that are effective for the large/complex datasets commonplace in XAI, focussing specifically on the explainability of ML classification models (deep neural decision trees, gradient boosting, random forests, etc.).
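As a hedged sketch of the kind of model and summary view at stake (assuming scikit-learn and matplotlib; the synthetic dataset and global feature importances stand in for the richer XAI visualizations the project will study):

```python
# Illustrative only: a classification model of the kind named above, plus one
# common but limited explainability view (impurity-based feature importances).
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a real transport/health/business dataset.
X, y = make_classification(n_samples=2_000, n_features=20,
                           n_informative=5, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# A simple bar chart of feature importances; with hundreds of features this
# encoding itself starts to break down, which is exactly the scaling problem
# the project addresses.
plt.bar(range(X.shape[1]), clf.feature_importances_)
plt.xlabel("Feature index")
plt.ylabel("Importance")
plt.show()
```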
The project is divided into four work packages (WPs). WP1 will identify fine-grained tasks that developers and stakeholders perform to explain ML models, via a literature review and real-world scenarios. We will also create datasets suitable for investigating those tasks, and publish the annotated/documented datasets and our software as a resource for other researchers and practitioners.
The heart of the project is WP2 (static visualizations) and WP3 (interactive visualizations), where we will answer two questions that are central to the funding call: "how to improve visualizations?" and "how should people interact with data and visualizations?" Our driving hypothesis in WP2 is, counterintuitively, that as data gets more complex, visualizations should be made simpler. To investigate that hypothesis, we will: (a) quantify (response time; error rate) and characterise the scales of data at which visual encodings (colour, etc.) break down and impede people from gaining insights, (b) address the breakdowns by developing and evaluating new encoding simplification methods based on visual mappings and view transformations, and (c) compare and evaluate widely used and hybrid chart types.
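One possible form of encoding simplification in the spirit of (b) is sketched below (our own illustration, assuming pandas and matplotlib; the threshold k=8 and the synthetic data are assumptions for illustration, not project results): a categorical variable with dozens of levels is collapsed to its most frequent levels plus "other" before colour-encoding, so the palette and legend stay readable.

```python
# Minimal sketch of collapsing a 40-level categorical variable to k + 1 levels
# before colour-encoding. Data and threshold are purely illustrative.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "category": rng.choice([f"level_{i}" for i in range(40)], size=5_000),
    "value": rng.normal(size=5_000),
})

k = 8  # assumed cut-off for illustration
top_k = df["category"].value_counts().nlargest(k).index
df["category_simplified"] = df["category"].where(df["category"].isin(top_k), "other")

# Only k + 1 colours are now needed, instead of 40.
for name, group in df.groupby("category_simplified"):
    plt.hist(group["value"], bins=30, alpha=0.4, label=name)
plt.legend()
plt.show()
```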
WP3 has a similar structure, starting by investigating how people interact while performing the WP1 tasks, to identify barriers and inefficiencies. From that analysis, we will derive requirements for new interaction designs, and then develop and evaluate corresponding solutions. By following this rigorous approach, we are confident that our new visualization designs will transform the effectiveness with which people can work (as we have previously shown in genomics, petrophysics and other applications).
WP4 grounds our fundamental research in real-world scenarios from transport, health and business. We will perform two phases of field evaluations to corroborate the benefits of our best visual communication (WP2) and interaction designs (WP3), and to answer the question "how can improving visualizations and interactions improve human-centred decision making?" when people need to understand, diagnose, or explain ML classification models.