Today I am going to give you an overview of my new book, “Data Dynamite: how liberating information will transform our world.” Originally I was to co-author the book with Vivek Kundra, Chief Technology Officer of the District of Columbia, and a true trailblazer in this field. However, fortunately for the US and unfortunately for me, President Obama chose Vivek to become the US’s first CIO.
I’m convinced I was chosen to write this book through some sort of cosmic joke, because I’m the least-likely person to write a book on data. You see, I’m right-brained and intuitive. For me, data used to be good for one thing, and one thing only: figuring the Red Sox’ batting averages. But in reality, that makes me ideally suited to write this book, because it’s time that people like me no longer be disenfranchised when it comes to data. It’s time for data for the rest of us!
When I got interested in data, I found it was pretty hard to get at. We pay taxes so government can collect data, and you can bet companies know all about our shopping habits. Our activities and lives are data’s raw material. But once it’s collected, most citizens -- and a lot of employees for that matter -- don’t have a clue where data is stored or how it’s used. It’s like that last scene in “Raiders of the Lost Ark,” where the Ark is boxed up and stored in a government warehouse: you knew it wouldn’t be found again. Substitute a data warehouse and you’ve got the picture of the too-frequent reality.
Today, there are signs of hope. Closely-controlled and long-lost data is being liberated by the growing demand for transparency. Perhaps the best example is one of Vivek Kundra’s primary accomplishments while he was the U.S. CIO: Data.gov. The government launched it in the spring of 2009 with about 20 data sets. By the end of its first three months in use, more than 100,000 government data sets -- many of them valuable real-time geo-spatial ones -- had been uploaded. Now, nearly 400,000 data sets are hosted on Data.gov, demonstrating how much data has been trapped in data warehouses, waiting only to be liberated to serve the common good.
The time has come to liberate data! “Liberating data makes it automatically available to those who need it (based on their roles and responsibilities), when and where they need it, in forms they can use, and with freedom to use as they choose -- while simultaneously protecting security and privacy.”
The result will be change and benefits in every aspect of our lives, which are particularly critical given the current global challenges. Liberating data will:
• give workforces real-time information
• automate previously manual processes, saving time and increasing efficiency
• improve government regulatory processes by making access to reports instantaneous and shareable by all agencies
• reduce corporate regulatory costs
• restore public confidence through transparency
• empower the public as full partners in government and business.
However, we are a long way from fully realizing these benefits. Data.gov and its counterparts in about 20 other countries notwithstanding, the reality is that, by and large, data has not been liberated by either government or businesses -- and when it has been liberated we’re often unprepared to capitalize on it.
The potential for transformation is not all that different from 1520, when Martin Luther’s translation of the Latin Bible into German, and his decision to print copies instead of hand-copying them, gave most people direct access to the printed word for the first time. They no longer had to rely on the clergy as intermediaries. The results were quick and dramatic: Luther’s works led not only to the Reformation, but also to a tremendous push for literacy and the printed word.
Just as the printing press transformed learning and people’s access to the word, so too the Internet and a handful of new web-based tools, none of them radically innovative by themselves but revolutionary when combined, are making it possible, in many cases for the first time, for workers and the general public to have direct access to actionable, valuable data. I believe the benefits and revolution for numbers will be as dramatic as what Luther set in motion for words.
The first step in this transition is a strategic one: it’s time to switch to data-centric organizations, in which usable data is accessible to all sorts of applications and devices, automatically, and all of the organization’s functions are arranged around the data.
The second step to liberate data is to assure that data is valuable. That means that instead of data becoming captured and altered by applications, it must remain as “data nuggets,” accessible to all applications and machines that can act on it. To create those data nuggets we must “structure” data using XML, KML or other systems that attach “tags,” such as the XBRL ones you see here, to the numbers. This information about information, or metadata, transforms mere numbers into valuable data. In this case, instead of just the number 882,000,000, we now know it refers to the company’s net income. That income data can flow automatically, and in real time, to any place where the same tags are inserted.
These tag systems are universal, open standards, available to all, at no charge. I want to emphasize standards, incidentally: it’s precisely because XML, XBRL and KML are universally recognized and not proprietary that they are valuable: they, and the data tagged by them, can be shared by all. One of the most important aspects of XML and its variants is that once the tags are attached to the data, they remain attached: the package of metadata and data can be automatically shared by other applications as well as devices. That reduces errors because the data doesn’t have to be rekeyed: you get a “single version of the truth.”
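To make the tagging idea concrete, here is a minimal sketch in Python. It assumes a simplified, made-up XML layout: the element names, the `unit` attribute, the company name and the revenue figure are all invented for illustration, and real XBRL uses namespaced taxonomies and contexts that are left out here. The point is only that once a number carries a tag, any program can find it by name instead of by position.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified tagged report. Only the 882,000,000 net income
# figure comes from the talk; everything else is an illustrative assumption.
TAGGED_REPORT = """
<report company="ExampleCo">
  <NetIncome unit="USD">882000000</NetIncome>
  <Revenue unit="USD">5400000000</Revenue>
</report>
""".strip()


def read_facts(xml_text):
    """Return {tag: (value, unit)} for every tagged number in the report."""
    root = ET.fromstring(xml_text)
    return {child.tag: (int(child.text), child.get("unit")) for child in root}


facts = read_facts(TAGGED_REPORT)
value, unit = facts["NetIncome"]
print(f"Net income: {value:,} {unit}")  # prints "Net income: 882,000,000 USD"
```

Because the number travels with its tag, a second application could read the same text and pull out the same figure without anyone rekeying it.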
The third step for effective liberating data programs is to provide users with Web 2.0-based tools, such as Gapminder (shown here), that will make it possible for them to really capitalize on that data. Even for trained statisticians, let alone the rest of us, data visualization tools aid in understanding complex data sets, relationships, and so on, because they take statistics and portray them graphically, which makes it easier to understand trends, possible causality, and other factors. As one of the acknowledged thought leaders in data visualization, Edward Tufte, says, “Graphics reveal data. Indeed, graphics can be more precise and revealing than conventional statistical computations.”
In recent years a number of lower-cost dashboard applications such as Tableau, as well as free web-based data visualization tools such as Many Eyes, have become available, allowing non-statisticians to easily take data and turn it into a wide range of highly informative visual representations, while Web 2.0 tools such as tags, threaded discussions and topic hubs encourage robust discussion of the results. That’s important, too: when data is discussed by people with differing backgrounds, interests and skills, aspects of the data are discovered and explored that even the brightest person, working in isolation, would never uncover.
Curiously, although a growing range of government agencies release public data streams, almost none provide them to their own workforces, to give workers actionable data precisely when and where they need it, to do their work more efficiently.
The fourth element of an effective liberating data strategy is for agencies -- and corporations -- to follow the District of Columbia’s lead, and apply the same strategy behind the firewall first, giving workers access to the same data they disclose in public data feeds. After all, employees may be struggling with incompatible databases, or may need to reach across departmental “silos” to see if there might be synergies between programs, and employees from another department may be able to provide new insights simply because of their differing life experiences and expertise. As more young workers, who have never known life without the Web, join workforces, they’ll naturally ask why tools they’ve used can’t be used in the workplace. A data graphics project can empower them and tap their expertise.
Running your organization on the same data feeds that you furnish to the public and others can be a powerful way of earning public trust. You’re in essence saying, “We stand behind this data: we’re so confident in it that we use the same data to run our daily operations as we furnish to you.”
Finally, the cutting edge of liberating data is using it to invite your customers or citizens to become co-creators of products and services. That’s what Beth Noveck, the former Obama Administration deputy CTO, did prior to joining the Administration, with the Peer-to-Patent program, which allows interested experts and laymen to become active partners in the patent review process. They have already significantly reduced the patent application backlog.
With liberating data, crowdsourcing will become commonplace and will result in both improved services to the public and entrepreneurial opportunities.
But what if you liberate data but nobody comes? We have to realize, and deal with, the reality that a majority of the American population is innumerate, i.e., doesn’t have the skills needed for basic numeric calculations. This problem was probably masked by indifference during the era when data was hard to obtain, but now that data is potentially ubiquitous, such widespread innumeracy is unacceptable.
Fortunately, the same tools that can make data intelligible and interesting to adults can also be used in the classroom to make dry numbers come alive and let students learn by playing with numbers. The private sector should partner with educators to make this transition a reality, to build numeracy and people’s ability to deal with statistical information.
One reason for optimism that a new data-centric society could overcome innumeracy is the way that users of a wide range of social media have been quick to adopt, and have quickly learned to use accurately, tagging data. In this case, use of the #wxreport tag assures that the National Weather Service’s computers will receive Tweets referring to breaking local weather observations, making the public valuable adjuncts to other information sources. If this kind of alteration in user behavior can happen spontaneously, imagine what could happen if there were formal programs designed to increase data numeracy!
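The #wxreport idea is simple enough to sketch in a few lines of Python. This is only an illustration of how a program might pick tagged messages out of a stream, not a description of how the National Weather Service actually processes them; the sample messages and the `has_tag` helper are invented for the example.

```python
import re

# Matches a hashtag and captures the word after the "#".
HASHTAG = re.compile(r"#(\w+)")


def has_tag(message, tag="wxreport"):
    """Return True if the message carries the given hashtag (case-insensitive)."""
    tags = {found.lower() for found in HASHTAG.findall(message)}
    return tag.lower() in tags


# Hypothetical sample messages, made up for this sketch.
messages = [
    "#wxreport Medfield MA: heavy snow, about 4 inches so far",
    "Great game by the Red Sox tonight!",
    "Hail the size of peas in Dover #WXREPORT",
    "Reading old #wxreports from last winter",
]

reports = [m for m in messages if has_tag(m)]
for report in reports:
    print(report)
```

Only the first and third messages pass the filter. The last one is skipped because its tag is #wxreports, not #wxreport, which is exactly why it matters that users apply the tag accurately.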
Thank you. To learn more about liberating data and how to create the processes and policies to make it a reality, contact:
Stephenson Strategies
335 Main Street, Medfield, MA 02052
(617) 314-7858
D.Stephenson@stephensonstrategies.com