DATA, DATABASE, METADATA, BIG DATA, PERSONAL DATA, DATA MINING…
Because we are besieged by terms that include the word “data,” the phrase “big data” does not immediately capture attention. Big Data is composed of little data, but it is more than a database. Data is shorthand for information stored digitally. The world is awash with information, and we have not yet approached the high tide of data that is coming.
Each of us, as individuals, generates and is defined by data. A subset of that data is called personal data: roughly speaking, data identifying you or a device reasonably traceable to you. Societal awareness of personal data is rising, as is legislation governing how that data is generated and managed.
But Big Data creates additional privacy issues because it can take a great many tiny bits of data, none of which identifies you or your device, and combine them. Like Johnny Carson sensing the contents of a sealed envelope on the old U.S. Tonight Show, an analyst can accumulate many bits of data and be led to you anyway. Although those bits often are not personal data on their own, compiling them into Big Data is part of what creates the privacy issues. An oft-quoted example is the teenager who had not told her family of her pregnancy but received advertisements heralding the good news: “Now that you are pregnant …” That incident made the American retailer Target a poster child for Big Data. Target had harnessed Big Data to correlate five discrete and otherwise unrelated and independent data points found in a sea of purchasing data. Finding a pregnancy pattern in that sea was important to Target because it believed that pregnancy is a watershed moment when customers are open to changing their favorite shopping haunts. The data held a potential gateway to knowing when a customer was in that moment of change, and Target’s use of Big Data analytics found the gateway.
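For readers who want to see the mechanics, the following is a minimal sketch of how several individually innocuous purchases might be combined into a single pattern score. It is not Target's method, which is not described here. The item names, point values, threshold, and function names are all invented for illustration.

```python
# Hypothetical signal items and point values, invented for illustration.
# No single purchase on this list identifies a person or a life event.
SIGNAL_POINTS = {
    "unscented lotion": 30,
    "vitamin supplements": 25,
    "cotton balls": 15,
    "large tote bag": 15,
    "hand sanitizer": 15,
}


def pattern_score(purchases):
    """Add up the points for each distinct signal item in a purchase history."""
    return sum(SIGNAL_POINTS.get(item, 0) for item in set(purchases))


def matches_pattern(purchases, threshold=70):
    """Flag a purchase history once the combined signals reach the threshold."""
    return pattern_score(purchases) >= threshold


# One signal alone says little; several together cross the line.
one_signal = ["unscented lotion", "milk"]
several_signals = ["unscented lotion", "vitamin supplements", "cotton balls", "milk"]
print(matches_pattern(one_signal))
print(matches_pattern(several_signals))
```

The point of the sketch is the privacy concern described above: each input is harmless in isolation, and the inference comes only from the combination.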
So what is this Big Data really and why do we care?
1. Definition
Gartner analyst Doug Laney defines Big Data as follows:
“Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
Decoded, Big Data means data (of all types) that is so large in quantity that the prior processing tools and algorithms are not capable of effectively analyzing it. For instance, prior data-crunching strategies marshalled the data into fields and orderly rows arising from hierarchies of homogeneous data. Vast quantities of data include nonhomogeneous and unstructured data: data that is outside the bell curve where fields and orderly rows may be found. Vast quantities of data also include large amounts of outlier and heterodox data that eluded prior analysis. Big Data is not a statistical sample: it is raw, overwhelming data that speaks to patterns even if no one knows why.
2. Why is Big Data Different?
Big Data is here to stay and will affect everyone, companies and individuals alike. Big Data involves computational analysis to reveal patterns, trends, and associations. It allows for inclusive, large-scale processing that cannot be done at small scale. The change is not in the machine that processes the data, but in the nature of the mammoth-sized data accumulations and in how we analyze that data with new and more complex algorithms. Remember, this is not a carefully constructed statistical sampling; it is pattern discernment, garbage in, garbage out included, in which patterns can emerge despite the garbage because of the sheer mass of the data.
When big data is processed and stored, additional dimensions come into play, such as ownership, encumbrances, governance, security and policies. Choosing an architecture and building an appropriate Big Data solution is challenging because so many factors have to be considered.
3. Why is Big Data Important?
Big Data is transformative because it challenges how we interact with the world and how the world interacts with us. It is so radically transformative that legislators in all fifty states, on Capitol Hill and internationally are struggling to craft rules and regulations that balance the benefits and dangers of Big Data. One theme that any reading about Big Data will reveal is that Big Data gives insight into correlation, but not causation. This shift from causation (a statistical data sample showing the why or cause of something) to correlation (Big Data processing showing patterns that may have nothing to do with causation) is a paradigm shift.
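The difference between correlation and causation can be illustrated with a short sketch. The figures below are made up: a hidden factor (temperature) drives two otherwise unrelated measures, which then track each other closely even though neither causes the other. The variable names and the `pearson` helper are assumptions introduced for this example only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: daily temperature is the hidden common cause.
temperature = [15, 18, 21, 24, 27, 30, 33]
ice_cream_sales = [20 + 3 * t for t in temperature]
sunburn_cases = [4, 6, 7, 9, 9, 11, 12]

# Ice cream sales and sunburn cases move together strongly,
# yet neither one causes the other.
r = pearson(ice_cream_sales, sunburn_cases)
print(f"correlation between ice cream sales and sunburn cases: {r:.2f}")
```

A pattern-finding tool would report the strong relationship between the two measures. It would not, on its own, report that temperature explains both.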
4. Coming Up
It is not an overstatement to say that Big Data represents a watershed in how we think of and work with data collection. Some enterprises will be collectors of vast quantities of data via collection methods that might or might not be legal. Some will be generating small or big data that they might or might not want to be part of the Big Data sea. Some will be forbidden to add flow to that sea or required to ensure that what they add is in a particular form. Some enterprises will be contractually obliged to protect their customer data from improper collection or use, or to titrate carefully what they release.
Suffice it to say that the legal issues relevant to Big Data are numerous and material, even if not as voluminous as Big Data itself. This piece is one of a series that will focus upon how we can assist our clients in navigating this emerging technology area. Currently we see great heat being generated from all sides, but a scarcity of light in determining strategies for the future. This series will focus on shedding light on some of those navigation paths through the shifting topography of Big Data.