I collect articles on data integration. I do this for my ongoing education, as well as to gain new and interesting perspectives on a technology that I’ve been involved with since the early 90s.
One of the most thought-provoking pieces I’ve read recently is an article by Anne Buff on the ethics of data, including big data and data integration. “There has been a lot of hype around the introduction of social media data and big data to the worlds of data integration and master data management. After all, isn’t more data – capable of helping us identify and understand our customers better – invaluable to the business? Perhaps, but along with its infinite value could come some highly unexpected, extensive costs and liabilities if not handled appropriately.”
As Anne points out, as we integrate external data sources with other data sources, we must certainly be aware of the governing laws and regulations in how we handle that data. What is missing, however, is an understanding of the responsibility of data management with ethics in mind. We must do no harm as we keep and leverage data from many different sources, including our own. “We are consistently seeing more news reports about brand-damaging situations companies are facing because the ethical implications of their actions were just not considered.”
Most of us don’t think about ethical responsibility around data integration and the general use of data. These days, to have data is to have understanding, to have understanding is to have power, and with power comes the potential for abuse.
Data integration has always been a powerful tool to drive understanding, typically intra-enterprise. When applications and databases share data, enterprises can act on near-perfect information.
For instance, a sales order system can check a customer’s credit rating and dynamically adjust the price based upon the risk of not getting paid. Moreover, the sales information can automatically inform the production systems to begin building and delivering a product; the information then moves to accounting, and perhaps to a data warehouse. All of this occurs within seconds. Back in the day, we called this the “real time enterprise.”
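As a rough illustration of that flow, here is a minimal Python sketch. Everything in it is an assumption made for the example: the credit scores, the score thresholds, the surcharge percentages, and the names `Order`, `price_order`, and `process_order` are hypothetical and do not come from any real system.

```python
from dataclasses import dataclass

# Hypothetical credit scores; a real system would query a credit service.
CREDIT_SCORES = {"acme": 780, "globex": 610}

# Hypothetical downstream systems, in the order the data moves through them.
DOWNSTREAM = ["production", "accounting", "data_warehouse"]


@dataclass
class Order:
    customer_id: str
    list_price: float


def risk_surcharge(score):
    """Map a credit score to a price surcharge (illustrative thresholds)."""
    if score is None:
        return 0.08  # unknown customer: treat as highest risk
    if score >= 740:
        return 0.0
    if score >= 650:
        return 0.03
    return 0.08


def price_order(order, scores):
    """Adjust the list price for the risk of not getting paid."""
    surcharge = risk_surcharge(scores.get(order.customer_id))
    return round(order.list_price * (1 + surcharge), 2)


def process_order(order, scores):
    """Price the order, then record a notification for each downstream system."""
    price = price_order(order, scores)
    notifications = [
        {"system": name, "customer": order.customer_id, "price": price}
        for name in DOWNSTREAM
    ]
    return {"price": price, "notifications": notifications}


if __name__ == "__main__":
    result = process_order(Order("globex", 1000.0), CREDIT_SCORES)
    print(result["price"])
    print([n["system"] for n in result["notifications"]])
```

In a real integration the notifications would be messages on a queue or calls to other systems rather than entries in a list; the list simply makes the hand-offs visible.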
Today we have a few new concepts that make data even more powerful, including:
- The rise of big data systems that allow for the easy analysis of both structured and unstructured data with quick response times.
- Better and more scalable data integration technology that’s able to replicate data across systems and databases.
- The rise of cloud-delivered data from many different sources, including social media, government, and commercial organizations.
- An increasing desire to leverage this information to enhance revenue.
So, a well-integrated enterprise can have access to many different data points, both inside and outside of the enterprise. The ability to see that data, and place it into context, provides insights not available in the past. For example, it becomes possible to infer human attributes even when the direct data is not there: marital status, sexual orientation, income, criminal records, political leanings, credit, hobbies, affiliations, and so on.
Much of this data is derived from seemingly innocuous data, such as posting a picture of your new motorcycle on Facebook, or stating your support for gay marriage on Twitter. Even if you don’t put this information out on social networks, certain conclusions can be reached based upon whom you allow into your virtual social circles, or by tracking your smartphone. The ability to draw these conclusions from data is the core concept behind data science, which is a rising discipline.
Other, more business-oriented analysis can occur as well. One example would be determining whether a company will meet its quarterly numbers based upon thousands of unrelated data points, and trading on that information. Another would be government entities issuing speeding tickets based on data gathered from the GPS systems that motorists use. The list goes on.
So, what are the ethics around using data, and gathering data using data integration technologies and approaches? As we discussed above, with data comes knowledge, and with knowledge comes power, and with power comes responsibility.
As we learn how to leverage data to understand more about the data that we manage, as well as leverage other outside data to define a better pattern of context, we have to ask several questions: What do we really need to understand to support the business? What information is relevant? What are the legalities around management of certain data? What information is too invasive?
This does not mean we just protect the company from criticism, or avoid PR issues. This is about a fundamental set of policies that guide the use of information that has become far more complete and detailed than it was just a few years ago. This is about the ethical use of data, and the continued ability to leverage data integration approaches and technologies for the good of the company.