Anyone who focuses on the practical uses of information technology, as I do, must consider the data aspects of adopting any new technology to achieve some business purpose. Reliable data must be readily available in the necessary form and format, or that shiny new IT bauble you want to deploy will fall short of expectations. Our research benchmarks cover a range of core business and IT processes, and they regularly demonstrate that data deficiencies are a root cause of issues organizations have in performing core functions; typically the larger the company, the more severe the data issues become.
Today, “big data” is a popular buzzword, connoting the ability of companies to tap into large amounts of structured and unstructured data using advanced data processing technologies and analytics to achieve insights not available using more conventional techniques. Big data is an important phenomenon, but in using it organizations must ensure that it doesn’t become a larger, faster way of experiencing the GIGO effect: Big Garbage In – Big Garbage Out. At IBM’s recent business analytics analyst summit, IBM Fellow and Chief Technology Officer Brenda Dietrich talked about the challenges of dealing with big data. In passing she provided insight into what may be one of the most important ways to address this challenge.
Handling data used for business analytics requires rigor to ensure the quality of the output. For this reason, most of today’s analytics are based on structured data, usually internally held. Our benchmark research confirms that current big-data efforts usually use existing internal structured data, not external and/or unstructured sources of information. The most commonly used source, customer data from CRM and other systems, is employed by two-thirds (65%) of companies, while 60 percent use data from transaction systems such as ERP and CRM. Only one-fifth (21%) use social media, 13 percent use multimedia and a mere 5 percent use data from smart meters. In the future, however, companies will need to exploit a greater amount of external, less structured and more complex data (multimedia, for example), because the most significant opportunities to profitably employ analytics will come from expanding the scope of data applied to business issues.
One illustration of analytics that combine structured internal data with unstructured external data is an application that devises a just-in-time right-of-way maintenance process for an electric utility. Such maintenance is labor-intensive and expensive, but so is dealing with the impact of downed power lines in severe weather. Rather than setting a fixed maintenance schedule, a utility can combine internal data about the right of way, customer density and past incidents with real-time weather sensors (measuring rainfall, for example) and three-dimensional imaging to deploy maintenance crews more effectively. Efforts can be diverted away from less pressing areas to those where vegetation density poses the greatest threat of damage in the near term.
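To make the utility example concrete, here is a minimal sketch of how internal and external data might be combined into a single ranking. Everything in it is an assumption made for illustration: the `Segment` fields, the `risk_score` formula and its weights, and the `prioritize` helper are hypothetical and do not describe any actual utility's or vendor's system.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of right of way (all fields are illustrative)."""
    name: str
    customer_density: float    # internal data: customers served per km
    past_incidents: int        # internal data: outages recorded on this segment
    vegetation_density: float  # external data: 0..1, e.g. derived from 3-D imaging
    rainfall_mm: float         # external data: recent rainfall from weather sensors


def risk_score(seg: Segment) -> float:
    """Hypothetical score: exposure (who is affected) times hazard (how likely)."""
    exposure = seg.customer_density * (1 + seg.past_incidents)
    # Assumed weighting: 50 mm of recent rain doubles the vegetation hazard.
    hazard = seg.vegetation_density * (1 + seg.rainfall_mm / 50.0)
    return exposure * hazard


def prioritize(segments: list[Segment], crews: int) -> list[str]:
    """Return the names of the segments that should get a crew first."""
    ranked = sorted(segments, key=risk_score, reverse=True)
    return [seg.name for seg in ranked[:crews]]


if __name__ == "__main__":
    segments = [
        Segment("ridge line", 100, 2, 0.9, 50.0),
        Segment("town center", 400, 0, 0.2, 0.0),
        Segment("river crossing", 50, 5, 0.8, 25.0),
    ]
    print(prioritize(segments, crews=2))
```

The point of the sketch is the shape of the calculation, not the numbers: a fixed schedule would treat all three segments alike, while a score that is recomputed as sensor readings change lets crews follow the risk.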
Analysts have considerable scope to devise innovative approaches using broad sets of data to address every aspect of managing a business, including operations, risk management, modeling and forecasting, to name just a few. However, broadening the sources of data used in business analytics poses difficult challenges. In today’s analytics, to ensure the quality of the answers, the specific data used in reporting and modeling must be selected and the values slotted into the appropriate pigeonholes. This approach will grow increasingly impractical as businesses try to use larger sets of information, especially when these include an ever-changing set of unstructured information. Unstructured data exists outside of formal databases and includes text in documents, social media and the Web, as well as voice, video and other rich media. A major point of Dietrich’s presentation was that to make sense of this torrent of ever-changing data it will be necessary to develop models that are far more adaptive and flexible than today’s at ingesting data. Analytic modeling must be iterative, to accommodate learning and the evolving sets of data available. And the models must be able to assess the trustworthiness of the information they use and communicate how much confidence consumers of the results should place in them. With so much of the information coming from sources outside the company, the system needs to learn how accurate each source is and how applicable it is to the analysis being performed.
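A toy sketch may help show what "assessing trustworthiness" and "communicating confidence" could mean in practice. It is my own illustration, not something Dietrich presented or a description of any IBM product: the `Source` class, the trust-weighted average in `combine`, the moving-average rule in `update_trust` and the 0.5 threshold in `label` are all assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A data source with a learned trust level between 0 and 1 (illustrative)."""
    name: str
    trust: float = 0.5  # assumed neutral starting point for an unknown source


def combine(readings: dict[str, float], sources: dict[str, Source]) -> tuple[float, float]:
    """Return (estimate, confidence) from readings keyed by source name.

    The estimate is a trust-weighted average; the confidence is simply the
    mean trust of the sources that contributed.
    """
    total_trust = sum(sources[name].trust for name in readings)
    if total_trust == 0:
        raise ValueError("no trusted source contributed a reading")
    estimate = sum(sources[name].trust * value for name, value in readings.items()) / total_trust
    confidence = total_trust / len(readings)
    return estimate, confidence


def update_trust(source: Source, reading: float, actual: float,
                 tolerance: float, rate: float = 0.2) -> None:
    """Nudge trust toward 1 after a reading within tolerance, toward 0 otherwise."""
    hit = 1.0 if abs(reading - actual) <= tolerance else 0.0
    source.trust += rate * (hit - source.trust)


def label(confidence: float, threshold: float = 0.5) -> str:
    """Tell the consumer plainly whether the answer deserves to be trusted."""
    return "usable" if confidence >= threshold else "low quality, do not rely on it"


if __name__ == "__main__":
    sources = {"crm": Source("crm", 0.9), "social": Source("social", 0.3)}
    readings = {"crm": 100.0, "social": 140.0}
    estimate, confidence = combine(readings, sources)
    print(estimate, confidence, label(confidence))
    # Once the true value is known, each source's trust is revised.
    for name, value in readings.items():
        update_trust(sources[name], value, actual=100.0, tolerance=10.0)
```

Each pass through the loop is one iteration of the learning described above: sources that keep agreeing with what later turns out to be true earn more weight, and the confidence reported alongside each answer rises or falls accordingly.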
Over time, these broad data sets can become quite reliable and therefore extremely useful. Meanwhile, the basic techniques of judging the quality of the data are important in themselves. It may be that even using the best available information will result in an answer that is “garbage” (that is, low-quality information that you shouldn’t trust), but at least the system will tell you that’s what it is. Being able to discern the difference will enable organizations to look for answers much more freely without having to spend time developing structure or doing rigorous testing to ensure quality.
I think today’s big-data efforts represent the first small step in what’s likely to be a long process of developing technologies, techniques and practices to use in designing and employing analytics that work with large sets of data of all types. The skeptic in me sees the danger of big data becoming big GIGO. The futurist in me sees the potential to apply information technology intelligently to develop better business practices in a way that is not feasible today. Both of us will be watching closely to see what happens.
Regards,
Robert Kugel – SVP Research

5 comments
June 29, 2012 at 6:19 pm
Patrick Taylor, CEO of Oversight Systems
Robert – Great post. Learning to ask the right questions with Big Data is one of the areas where we’ve seen organizations both struggle the most and achieve remarkable success. The best results come when organizations use unstructured data in the context of daily business decisions to develop richer insights than could be found simply using existing technologies.
For example, unstructured data such as emails can augment analysis of purchase orders to help ensure that items are being secured at the best price possible. Or when predictive analytics recognize unusual patterns of activity in financial transactions that indicate fraud, unstructured data analysis can produce corroborating evidence.
The key is maintaining that focus on daily business decisions and recognizing how Big Data can contribute to analysis of traditional data sources. Big Data is part of a greater whole — a “total data” environment, if you will. Unstructured data is just another part of that whole. It’s a tool and a resource, not the be-all and end-all in and of itself.
August 17, 2012 at 7:02 pm
Making Better Use of Ratios in Analytics «
[...] The use of advanced analytics is growing in importance as technology provides companies another way to achieve an edge on their competitors. At the same time, it’s critical that executives and managers build on the basics. If an organization cannot formulate the most important ratios that define business performance, and if it cannot readily access the data needed to perform this simple division, it’s unlikely to be able to handle large sets of data effectively and benefit from more advanced analytic techniques. Instead it is likely to wind up experiencing the “big garbage in, big garbage out” syndrome. [...]
August 24, 2012 at 7:31 pm
Good Data Stewardship Is Critical for Business Analytics «
[...] trend toward big data puts a spotlight on data quality management. Without quality data, as I’ve noted, organizations will wind up with just a bigger, less manageable version of the [...]
November 30, 2012 at 7:02 pm
What Every CEO Needs to Know About Key Supporting Technologies «
[...] Big data is a term that begs the question, “How big is big?” The answer is: any data set so large and complex that it’s difficult or impossible to process it with existing tools in a reasonable amount of time. There are three dimensions to big: volume (the number of bytes), velocity (the speed with which data must be moved) and variety (the number of types of data). Big data has become an issue because businesses are generating an accelerating amount of usable information, and even more useful information exists outside of a company’s walls than within it. At the same time, data processing systems no longer limit organizations to using mainly (or only) structured data; that is, the type that exists in formal databases. Advanced techniques make it possible to mine social media to gauge sentiment or parse audio files to rapidly uncover unhappy customers. Industrial data from sensors in machinery and at every point along a supply chain gives organizations greater insight into ways to improve efficiency and respond faster to changing environmental and business conditions. Our big data benchmark research found the top benefit of the technology, cited by three-fourths of organizations (74%), is allowing companies to retain and analyze more data. Big data can be an important resource for companies, but it’s equally important to recognize the importance of good data management practices. Failure to institute appropriate practices simply results in “big garbage in, big garbage out.” [...]