Wednesday, April 24, 2019

The Value of Data Visualization


     Does information visualization provide sufficient return on investment (ROI)? Let us examine the investment value—the estimated financial worth—of providing end-user data visualization. For an organization desiring to depict complex or large data sets in various pictorial or graphical formats for 10,000 users, a commercial visualization product subscription model in which each seat license may cost $5 per month equates to $600,000 per year, not including the labor cost to prepare, implement and maintain the tool.
     Consuming the information on a proliferating number of endpoint mobile devices could further increase costs to more than $1 million, a nontrivial amount in any organization’s budget. The potential outlays escalate when spanned across the federal enterprise.
     Data visualization value, whether expressed directly from straightforward monetized return or subjectively derived from intangible benefits, needs to be assessed quantitatively to determine the economic return to permit a comparison with expected losses and gains from other organizational investments. Without an operationally relevant ROI performance metric, any project expense could be justified to counterweigh the risk of loss.
     Extracting value from data typically focuses on the larger and more expensive issues of management and use of big data, where it is assumed that information visualization is a derived byproduct. Yet when tallied as a separate line of investment, the intended scope of graphically depicted data may not provide enough justification for the production cost and potential difficulties. Consequently, as with any significant investment, the chief information officer and the chief financial officer should conduct a timely review of the data-visualization business case for a quantifiable performance measure of success or failure. For example, a good ROI likely would not involve spending more than $1 million on data visualization to save $200,000.
     Investments in data visualization must compete with other organizational priorities. Determining the ROI is a challenging exercise because it requires that the organization quantifiably measure not only the quality of the tool’s functional characteristics (whether it is accessible, accurate and well designed) but also utilization of the produced information—what can be achieved with better, data-driven management decisions? Too often investment decisions are made, and ROI is not measured, because it is considered unrealistic to expect a quantified measurement of less tangible benefits. The abstract goal of loosely defined long-term benefits then underpins the business case: greater business and customer insight, faster decision-to-answer time, or faster response to customers and markets. However, reducing uncertainty for intangible investments is possible, as indicated by Douglas Hubbard’s Rule of Five (How to Measure Anything: Finding the Intangibles in Business, published by John Wiley & Sons, Inc., 2010 and 2014). This was applied in the investment risk simulation example in my article, “How to Improve Communication of Information Technology Investments Risks,” in the November–December 2017 issue of Defense AT&L magazine. Subject-matter expert (SME) knowledge, supplemented with historical and industry statistics, may be a reliable source for accurate numerical value metrics.
     Most organizations produce or consume data for leadership to monitor performance and answer such basic questions as: “Are we accomplishing our objectives? ... Are we using our resources in the most effective manner? ... Are we learning ways to improve our performance?” Some outcomes are relatively straightforward, such as “certifying compliance within a numeric benchmark for system defects that either did or did not decline over time.” For example, the Internal Revenue Service investment in the Return Review Program (RRP) fraud detection system—replacing the Electronic Fraud Detection System that dated from 1994—either does or does not help prevent, detect and resolve criminal and civil noncompliance. A successful system should result in greater success with more revenue returned to the U.S. Treasury to offset the RRP cost.
     But it is more difficult to pinpoint how the data results would reduce risk or improve organizational performance in essential planning, organizing, directing and controlling operations—i.e., identifying the specific business decision problem, the root issue, and how the data visualization investment would help. The answer would then define the metric created to evaluate visualization product cost against expected business results: ROI = Investment Gain / Cost of Investment.
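The ROI formula above can be sketched in a few lines of Python; the dollar figures below are the hypothetical ones already used in this discussion, not measured values.

```python
def roi(investment_gain, cost_of_investment):
    """ROI = Investment Gain / Cost of Investment."""
    return investment_gain / cost_of_investment

# Hypothetical figures echoing the earlier example: a $1 million
# visualization effort that saves only $200,000 versus one that
# yields $1.5 million in gains.
weak_case = roi(200_000, 1_000_000)      # 0.2: gains cover only 20% of cost
strong_case = roi(1_500_000, 1_000_000)  # 1.5: gains exceed cost
print(weak_case, strong_case)
```

A ratio below 1.0 signals the weak business case described above: the investment returns less than it costs.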
     To select the best tool for the job, management must first precisely determine how visualization would support users’ efforts to distinguish between evidence-based reality and unsubstantiated intuitive understanding. The tool must present raw abstract data in a manner that is meaningful to users for improving understanding, discovery, pattern recognition, measurement, analysis, confirmation, effectiveness, speed, efficiency, productivity and decision making, and for reducing redundancy. Classic approaches for extracting information from data include descriptive, predictive and prescriptive analytics. The most common is descriptive analysis, used as a lag metric to review what has already occurred. Predictive analysis also uses existing data, but as the basis for a forecast model. Prescriptive analytics builds on predictive analytics, going a step further by offering greater calculated insight into possible outcomes for selected courses of action, leading to better decision making. Data visualization of these approaches ranges from classic bar and pie charts to complex illustrations.
     The approach selected must align with the expectations of the organization’s senior leaders or else the experiment will be short-lived. The organization may already possess visualization tools that can be leveraged at little or no additional cost. If the organization is just getting started, a proof-of-concept pilot may be best, initiating a seminal demonstration that can be progressively refined until an effective management tool emerges. The starting point could be basic metrics to more accurately measure and assess success against the organizational goals, objectives and performance plan. Basic example performance measurements of services, products and processes include:
• Cost Benefit = (program cost avoided or cost incurred) / (total program cost)
• Productivity = (number of units processed by an employee) / (total employee hours)
• Training Effectiveness = (number of phishing emails clicked) / (total phishing attempts)
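The three ratios above translate directly into code. A minimal sketch follows; the sample inputs are hypothetical.

```python
def cost_benefit(cost_avoided_or_incurred, total_program_cost):
    """Cost Benefit = (program cost avoided or incurred) / (total program cost)."""
    return cost_avoided_or_incurred / total_program_cost

def productivity(units_processed, total_employee_hours):
    """Productivity = (units processed by an employee) / (total employee hours)."""
    return units_processed / total_employee_hours

def training_effectiveness(phishing_emails_clicked, total_phishing_attempts):
    """Training Effectiveness = (phishing emails clicked) / (total attempts)."""
    return phishing_emails_clicked / total_phishing_attempts

# Hypothetical inputs: 27 clicks out of 1,000 simulated phishing emails.
print(training_effectiveness(27, 1_000))  # 0.027 click rate
```

Tracking these ratios over successive reporting periods turns them into the lag metrics the performance plan calls for.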
     Performance metrics enable quantitative analysis of whether the tool investment produces sufficient monetary value, fundamentally a risk decision about business outlays. One common method for quantifying risk is: Annualized Loss Expectancy (ALE) = Single Loss Expectancy (SLE) x Annualized Rate of Occurrence (ARO). For example, if the average cost of a phishing and social engineering attack is $1.6 million (M) for a midsize company and the likelihood of a targeted user clicking on the malicious attachment or link is 0.02736, then the risk value = ($1.6M x 0.02736) = $43,776. After weighing the organization’s cyber defenses and history of cyber-attacks, the business-investment decision makers could better determine if investing in employee anti-phishing training and training data visualization is a reasonable risk-reduction expenditure. After the visualization tool has been purchased and deployed, the value of the insights revealed by the analytics must at that point be substantiated through organizational actions—i.e., cause and effect linkage leading to input/process/output adjustments. As a means of generating business intelligence, the organization is then able to weigh the tool’s value, which should be equal to or greater than the production cost. Generally, a more complex visualization results in higher tool cost. The journey from feasibility determination to requirements refinement and then to operational maturity, should be undertaken with the understanding that the initial investment may not be supported by the magnitude of the early results, but total improvement over time should be greater than total outlay.
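The ALE arithmetic in the paragraph above can be verified with a short script, using the same figures from the phishing example:

```python
# Single loss expectancy: average cost of a phishing and social
# engineering attack on a midsize company (figure from the example above).
SLE = 1_600_000

# Annualized rate of occurrence: likelihood of a targeted user clicking
# on the malicious attachment or link.
ARO = 0.02736

# Annualized Loss Expectancy = SLE x ARO
ALE = SLE * ARO
print(f"Annualized loss expectancy: ${ALE:,.0f}")  # about $43,776
```

That $43,776 figure is the annual risk value the decision makers would weigh against the cost of anti-phishing training and its supporting visualization.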
     In conclusion, managing and mining vast amounts of complex data typically results in the need to view information in ways that are measurably meaningful and actionable to the organization. Added benefits include selective sharing, on-demand viewing and more informed decisions. Information visualization tools range from low cost Microsoft Excel charts to more powerful applications capable of producing relationship and pattern analysis, forecasts, scorecards and performance dashboards from large unstructured data. Organization leaders can then shift from reacting to lag measures towards proactive actions based upon predictive data presentation.
     Data visualization has a potentially significant cost that must be balanced against the payback benefits rather than simply bundled into a data management package. Selecting the best tool for the organization should include basic cost-benefit analysis based upon a performance metric of the value of the decisions made from the information provided.


Better Communications on IT Spending Risks


     Why are million-dollar information technology (IT) investment decisions based on single-point green, yellow, and red visual indicators, which are poorly defined and ineffective abstractions of the fundamental components of risk—probability and impact? Decisions are founded on a weak understanding of the risk without considering a range of possible outcomes for any choice of action.
      IT professionals can significantly improve how they assess and communicate program risk to business investment decision makers, who must allocate funds among competing priorities. We can reform our communication of risk to business leaders so we provide a range of estimated outcome values, within a confidence interval that reflects the inherent uncertainties of large, complex decisions.
      Monte Carlo simulation prepared with standard Microsoft Excel is a low-cost, yet effective, method for quantifiably modeling risk. Displaying the simulation results graphically as a familiar management histogram chart overlaid with a risk expectancy line enables uncertainty to be precisely articulated within a confidence interval for better-informed decision making. Risk variable values can also be changed on the fly to support dynamic what-if analysis. The model presented by the author was developed from material taught by Derek E. Brink, a Certified Information Systems Security Professional, in Harvard University’s Division of Continuing Education course “How to Assess and Communicate Risk in Information Security.”
      The stakes are high. The federal IT dashboard indicates that government-wide IT spending for fiscal year (FY) 2017 totals about $81.6 billion. The site also specifies that for all major IT investments government-wide, 3.4 percent of the projects are considered to be high risk, and 23.2 percent are considered medium risk. The U.S. Government Accountability Office has issued several reports between 2011 and 2015 documenting failed major IT projects, including eight projects valued at more than $8.5 billion. Improved risk analysis and communication would return substantial value. For example, if the cost of failed programs was reduced by merely 1 percent, this would amount to more than $85 million saved on these eight projects alone.
      The greatest facilitator of informed business decisions is communicating data uncertainty as a frequency and impact distribution, overlaid with an exceedance probability (EP) curve at the desired confidence level. The concept may seem complex, but the technique has been widely applied in financial, insurance, actuarial and catastrophe planning to estimate the probability that a certain level of loss will be exceeded over a given time.
I offer three assumptions regarding risk that show why I believe we must improve our assessment and communication of risk. These include:
• Risk is fundamentally determined by the likelihood of an undesirable event, and the impact of such an event.
• Risk in federal IT programs is mostly presented in qualitative terms of colors—red (high), yellow (medium) or green (low).
• Risk assessment and management are important activities for successful project management.

A More Detailed Look
     Risk determination depends upon the type of threat, weakness or vulnerability. However, framing risk based only on potential dangers does very little to enable value-based investment judgments. In fact, using technical jargon to present risk supports poor value judgments because there is no assessment of the odds that something bad actually will happen. As a result, decision makers often are left with only a binary choice of whether to commit resources. For example, the IT professional might describe a cyber-security risk as an unauthorized access breach that could expose employee records to compromise if stronger access management controls are not put into place. In the best-case scenario, the business leader is somewhat better informed and at worst has misleading value information on which to base decisions. Properly framing risk in terms of the probability and associated consequence magnitude allows evaluation of the level of uncertainty. Communicating the same cyber risk as a 10 percent probability that unauthorized access could result in an annual business cost of $2 million enables the organization leaders to determine how much risk they are willing to mitigate at the corresponding cost.
      Again, most risk in federal programs is presented as red, yellow or green. The color scheme is a risk representation convention described by the Department of Defense’s Risk, Issue, and Opportunity Management Guide for Defense Acquisition Programs. The approach to relative risk levels attempts to assess risk based upon Likert scales ranging from “not likely” to “near certainty” and “minimal impact” to “critical impact.” Likert scales are ordinal, meaning the data can be ranked but not accurately interpreted mathematically. In short, risk heat maps should be limited to the most basic risk prioritization. As a business investment decision support tool, the color-coded representation is ineffective for articulating quantified risk probability distributions for a range of possible outcomes for any meaningful choice of action.
Risk management seeks to define uncertainty as the probability of an event—and the business effect, positive or negative, of such an event. In terms of program and project management, risk is most often expressed for individual cost, schedule and performance variables in relationship to delivering the end product. Different disciplines such as research, engineering development, and logistics may each have its own perspective on project risk. But managing activity risk must not be confused with investment decisions that aggregate the effect of all variables to permit best-value business case investment analysis.
      The subject-matter expert (SME) plays an essential role in determining risk. SMEs typically are more knowledgeable than others regarding uncertainty measures within their areas. Using the unauthorized access breach example, the cybersecurity SME might estimate the likelihood that the organization could experience between one and three unauthorized access breaches within the next 12 months, in line with the 2016 Ponemon Institute data breach study reporting about a 26 percent likelihood of a company having one or more data breaches involving at least 10,000 records in the following 24 months. The SME knowledge, supplemented with historical and industry data, provides a reasonable measurement of the factors of risk, while incorporating the inherent uncertainty. Typical—though insufficient—risk representation would then simply apply an annualized loss expectancy (ALE) calculation such as annual loss = (estimated number of breaches per year) x (estimated cost per breach). Given a breach cost estimated at $100,000 and the midpoint estimate of two breaches per year, an ALE statement would quantify the annual potential risk as an average of $200,000. Based on this rudimentary cost analysis, risk then would be conventionally presented as red, yellow or green ordinal choices for the business leader to determine if the potential loss would be worth the financial investment needed to mitigate the risk.
Monte Carlo simulation is an excellent quantitative method for determining the likelihood of a potential loss within any of several designated intervals, over a range of values. Standard Microsoft Excel is more than adequate for creating simulation models and displaying possible scenario impact outcomes graphically as familiar charts. In the simulation model, the SMEs provide their estimates for the risk factors; specifically, providing the values for the upper and lower bounds, with a 90 percent certainty.
      For example, consider a hypothetical software development project for which the business leader wants to assess the risk of the project’s $40 million budget and submits the Business Impact Question: What is the risk that a longer development time will increase the overall project cost? Figure 1 illustrates the project simulation risk model, with four key risk variables that fundamentally determine the overall project duration. The model simulates the number of days to complete each factor. Factors 1, 2 and 3 are accomplished in parallel and must be completed before Factor 4 can begin; Factor 4 is then added to the highest of the three values. Daily cost is then applied to the resulting number of days.
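The Figure 1 model can be sketched in standard-library Python in place of Excel. The factor bounds and daily cost below are hypothetical stand-ins, since the article's actual inputs appear only in the figure; each SME estimate is treated as a 90 percent confidence interval on a normal distribution, following Hubbard's convention.

```python
import random

# Hypothetical SME 90% confidence bounds, in days, for the four risk
# factors (NOT the actual Figure 1 values, which are not reproduced here).
factors = {
    "Factor 1": (100, 180),
    "Factor 2": (90, 200),
    "Factor 3": (120, 160),
    "Factor 4": (60, 120),   # begins only after Factors 1-3 finish
}
DAILY_COST = 150_000  # hypothetical burn rate, dollars per project day
TRIALS = 10_000

def sample_days(lower, upper):
    """Draw from a normal distribution whose 5th/95th percentiles match
    the SME's 90% confidence interval (3.29 standard deviations wide)."""
    mean = (lower + upper) / 2
    sd = (upper - lower) / 3.29
    return max(random.gauss(mean, sd), 0)

costs = []
for _ in range(TRIALS):
    # Factors 1-3 run in parallel, so the slowest one gates the schedule.
    parallel = max(sample_days(*factors[f])
                   for f in ("Factor 1", "Factor 2", "Factor 3"))
    total_days = parallel + sample_days(*factors["Factor 4"])
    costs.append(total_days * DAILY_COST)

costs.sort()
for pct in (10, 50, 90):
    # The P10 value is the cost exceeded with 90% probability, and so on.
    print(f"P{pct}: ${costs[int(TRIALS * pct / 100)] / 1e6:.1f}M")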

[Figure 1. Project simulation risk model: four key risk variables determining overall project duration]

     The probability and impact simulation results for this hypothetical project are displayed in Figure 2, indicating that for 10,000 simulations there is a 90 percent likelihood that the annual cost will exceed about $46 million and a 10 percent probability that the annual cost will exceed about $50 million, with a median (50 percent likelihood) expected annual cost of about $48 million. The values between 90 percent and 10 percent represent an 80 percent confidence interval, but any level of risk can be determined simply by examining the exceedance probability curve.
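The exceedance probability figures quoted above can be read directly off sorted simulation output. The sketch below uses synthetic normally distributed costs as a stand-in for the 10,000 simulated annual costs behind Figure 2; the mean and spread are chosen only to resemble the quoted range.

```python
import random

random.seed(7)
# Stand-in data: normal around $48M with spread chosen so the 10th/90th
# percentiles fall near $46M/$50M (the real values come from the model).
sims = sorted(random.gauss(48e6, 1.56e6) for _ in range(10_000))

def exceedance_cost(sorted_costs, exceed_prob):
    """Cost exceeded with the given probability: the (1 - p) quantile."""
    idx = int(len(sorted_costs) * (1 - exceed_prob))
    return sorted_costs[min(idx, len(sorted_costs) - 1)]

for p in (0.90, 0.50, 0.10):
    print(f"{p:.0%} chance annual cost exceeds "
          f"${exceedance_cost(sims, p) / 1e6:.1f}M")
```

Reading three points off the curve this way yields exactly the style of statement in the text: a 90 percent, median, and 10 percent exceedance cost.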

[Figure 2. Probability and impact simulation results: annual cost histogram overlaid with the exceedance probability curve]

     When communicating with business leaders, the same information could be presented as in Figure 3. Because Excel calculates 10,000 simulations of this model in about 1 second, leaders could quickly receive answers to “what if” sensitivity analysis questions that change the risk simulation variable values such as labor and material costs, purchase versus lease, number of units produced or purchased, workforce size and payment schedules. Creating an initial risk simulation model from existing Monte Carlo modeling templates took about a week, but subsequently building the model used in this example took only about 1 hour. The simulation model is clearly a significant improvement over ALE and red-yellow-green risk communication. First, simulation considers thousands of possible outcomes, not just the average outcome. Second, simulation assesses the likelihood of each outcome. Third, risk analysis can then be communicated as quantified values rather than hunches or guesses.

[Figure 3. Simulation results presented for business leaders]

Conclusions and Recommendations
     Business leaders facing uncertainty for significant investments in complex and expensive IT projects require more than simple risk heat maps to inform their decisions. Accurate and meaningful communication of risk requires a quantitative measurement of business impact. Risk simulation provides an inexpensive yet effective method for reducing uncertainty, by quantifying probability and impact for a possible future event, within a specified time period, over a range of values, with a specified confidence level. Communicating risk as “90 percent likelihood that the annual cost will exceed about $46 million with a median (50 percent likelihood) annual cost of about $48 million” is far more useful for making a better-informed business decision than simply stating that increased project cost is “Very Low, Low, Moderate, High, or Very High.”
To begin transitioning from risk matrix to risk simulation for investment circumstances I recommend the following:
• Schedule FY 2018 and FY 2019 for discussion, publishing guidance and creating training opportunities. Then, beginning in FY 2020, require that Monte Carlo risk simulation be mandatory for all IT investment decisions exceeding $1 million.
• Establish a library of basic simulation models and tutorials to facilitate rapid development for a variety of applications.