Taming the Data Maven

Once upon a time, the wise old CEO asked for monthly sales and production figures. Not unreasonable. However, our high-tech, large-scale computing platforms have turned this innocent dream into a nightmare. From spreadmarts to data warehouses, our infrastructure groans under the weight of accumulated data, yet good-quality results are paradoxically harder to achieve than ever.

When Actian recently asked me to speak at Big Data World in London on the challenges facing organisations, I jumped at the chance. Cutting through the hype of big data is a personal crusade right now, as I firmly believe that important ideas and solutions are being obscured by the smoke and mirrors of the big marketing machines. Let’s face it, your data isn’t getting smaller, is it? Every graph and chart you see has an exponential curve, and it points only upward.

On the one hand, this is really not a problem. Moore’s Law continues to help us with both storage and processing speed, and box sizes continue to shrink with every product cycle. On the other hand, what to do with all this data is a harder question. Wikipedia defines a maven as “one who understands, based on an accumulation of knowledge”. I think we’ve got the accumulation piece covered. Certainly, the millions of shining, spinning platters woven into your SAN fabric tell a tale of storage on a massive scale. But does all this data actually give you understanding? That is the nub of the issue.

There are two key topics we must address if we are to find the real value and competitive edge that we know are buried in our data mine: Scope and Question.

Every organisation of any size has a multiplicity of data stores, covering everything from a humble spreadsheet to a strategic data warehouse (DW). But have you noticed that, despite all your efforts, mission-critical data keeps popping up in spreadsheets rather than in the DW? We need to acknowledge that our quest to store All Data in the data warehouse is doomed to failure. Even if we have the resources and the budget (and these days, most of us have neither), by the time we have completed the programme to absorb All Data into the DW, yet more data stores have appeared! This is what it is like to live on an exponential curve: you’re always behind.

In parallel, we have moved from merely storing data so we can create a quarterly report or prove an audit trail to asking questions about our data. Did viewers see our ad campaign and download our apps? When users downloaded our apps, did they take up the trial offer? Were any of these new users also existing customers in another segment? The list of potential user queries is now inexhaustible, and it is overwhelming most IT departments’ ability to produce reports. As if that weren’t enough, deploying software frameworks such as Hadoop, with its MapReduce processing model, lets us manipulate yet more data and opens up yet more lines of analysis!
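To make questions like these concrete, here is a minimal Python sketch. It assumes three hypothetical extracts that share a user identifier: ad viewers, app downloads (with trial uptake), and an existing-customer list. The data, the `Download` type and the `funnel` function are invented for illustration and do not come from any real system. Each question becomes a set intersection across sources, which is exactly why the answers are hard to produce when those sources live in separate stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Download:
    user_id: str
    took_trial: bool


# Hypothetical extracts from three separate data stores, keyed by user_id.
ad_viewers = {"u1", "u2", "u3", "u4"}
downloads = [
    Download("u2", took_trial=True),
    Download("u3", took_trial=False),
    Download("u5", took_trial=True),  # downloaded without seeing the ad
]
existing_customers = {"u2", "u9"}


def funnel(viewers, downloads, customers):
    """Answer the three example questions with simple set operations."""
    downloaded = {d.user_id for d in downloads}
    trial_users = {d.user_id for d in downloads if d.took_trial}
    saw_and_downloaded = viewers & downloaded             # question 1
    then_took_trial = saw_and_downloaded & trial_users    # question 2
    already_customers = then_took_trial & customers       # question 3
    return {
        "saw_ad_and_downloaded": saw_and_downloaded,
        "then_took_trial": then_took_trial,
        "already_customers": already_customers,
    }


for question, users in funnel(ad_viewers, downloads, existing_customers).items():
    print(f"{question}: {sorted(users)}")
# saw_ad_and_downloaded: ['u2', 'u3']
# then_took_trial: ['u2']
# already_customers: ['u2']
```

Three small in-memory sets make this trivial; the difficulty in practice is that each set sits in a different store, in a different format, owned by a different team.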

In a big data context, we also need to consider data sources outside the corporate firewall and in the cloud, such as Salesforce.com, Google Analytics, Twitter and Facebook, among others. If your people are not already using some of these services, they soon will be. And once they do … the next step is analysing that data. At FlyingBinary, we focus on deploying and integrating cloud services and on combining data from all sources into a 360° data and analytics ecosystem.

Resolving the Scope item for your organisation means identifying the key data that the business really wants to access and analyse. This data is likely scattered across all the data stores described above. We need to bring it together, and this is where Actian Vectorwise really shines. It is easy to set up and use, offers fast load-and-go importing, runs on commodity hardware, and delivers best-of-breed query results per dollar invested. Slide it alongside your other data stores and you avoid any disruption or downtime to existing business flows. Oh, did I mention that it’s a true column store database (meaning it’s designed expressly to manage rapid, high-volume analytics)? That too. Hadoop? Plug that right into Vectorwise. No problem.
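Why a column store suits analytics is easier to see with a toy example. The Python sketch below is not how Vectorwise is implemented; it only contrasts a row-oriented layout (one record per row) with a column-oriented layout (one array per attribute) for a hypothetical sales table. An analytic aggregate such as total revenue per region needs only two of the five columns, so the column layout scans just those two arrays and never touches the rest.

```python
from collections import defaultdict

# Hypothetical sales table, stored row by row (one tuple per record).
COLUMNS = ("order_id", "region", "product", "units", "revenue")
rows = [
    (1, "North", "Widget", 10, 250.0),
    (2, "South", "Gadget", 5, 400.0),
    (3, "North", "Gadget", 2, 160.0),
    (4, "East", "Widget", 7, 175.0),
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def revenue_by_region_rows(rows):
    # Row layout: every full record is visited, even though only
    # two of its five fields are needed.
    totals = defaultdict(float)
    for _order_id, region, _product, _units, revenue in rows:
        totals[region] += revenue
    return dict(totals)


def revenue_by_region_columns(cols):
    # Column layout: only the 'region' and 'revenue' arrays are scanned;
    # the other three columns are never read.
    totals = defaultdict(float)
    for region, revenue in zip(cols["region"], cols["revenue"]):
        totals[region] += revenue
    return dict(totals)


cols = to_columns(rows)
print(revenue_by_region_columns(cols))
# {'North': 410.0, 'South': 400.0, 'East': 175.0}
```

Both functions return the same answer; the difference is how much data each has to read. On a table with hundreds of columns and billions of rows, that difference dominates query time. Production column stores typically add compression and vectorised execution on top of this layout; the sketch shows only the layout idea.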

Along the way, we also need to change our thinking and recognise that the true data owner is the business user. In IT, we are merely the custodians. This can be hard to accept, but the benefits are huge.

For the Question piece, we need to provide business users (yes, again) with a self-service capability where they can safely ask ad hoc questions themselves and publish the results to the community. This community can include the wider, web-connected world, but more typically it is the business user community within the organisation. We use Tableau Software for this presentation layer, with its universal data connectors, data blending from multiple sources, one-click publishing, and browser-based consumption with no plugins. Of course, 100% of the visuals it produces are totally aligned with the human visual system. But you knew that.
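Data blending is the least self-explanatory item in that list. The Python sketch below is a loose illustration of the general idea, not Tableau's implementation: a hypothetical primary source of sales rows is combined with a hypothetical secondary source of web sessions (for example, an export from a cloud analytics service) on a shared linking field, here `month`. The secondary measure is first aggregated per linking value and then attached to each primary row. Rows with no match keep a `None`, so the primary source is never filtered by the secondary one. The `blend` function and all field names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical primary source: sales rows from an internal data store.
sales = [
    {"month": "2013-01", "region": "North", "revenue": 1200.0},
    {"month": "2013-01", "region": "South", "revenue": 800.0},
    {"month": "2013-02", "region": "North", "revenue": 1500.0},
    {"month": "2013-03", "region": "South", "revenue": 900.0},
]

# Hypothetical secondary source: web sessions exported from a cloud service.
sessions = [
    {"month": "2013-01", "channel": "search", "sessions": 3000},
    {"month": "2013-01", "channel": "social", "sessions": 1000},
    {"month": "2013-02", "channel": "search", "sessions": 4500},
]


def blend(primary, secondary, link, measure):
    """Sum the secondary measure per link value, then attach it to every
    primary row (a left join: unmatched primary rows get None)."""
    totals = defaultdict(int)
    for row in secondary:
        totals[row[link]] += row[measure]
    # dict.get does not trigger the defaultdict factory, so misses give None.
    return [{**row, measure: totals.get(row[link])} for row in primary]


for row in blend(sales, sessions, "month", "sessions"):
    print(row["month"], row["region"], row["revenue"], row["sessions"])
```

Aggregating the secondary source before joining is the key step: because the session counts are collapsed to one value per month first, each sales row picks up a single total rather than being duplicated once per traffic channel.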

Here too, we need to change our thinking to align with our users’ needs for data consumption and analysis. We need to curate, assist and educate, rather than hoard, obstruct and govern.

By combining innovative technologies such as Actian Vectorwise and Tableau Software with a change in business thinking around data ownership and information access, we can finally achieve the understanding we seek, alongside the accumulation we have already mastered.