Data for the rest of us
W. David Stephenson, Stephenson Strategies
Authors @ Google, October 28, 2011
Today I’m going to give you an overview of my new book, “Data Dynamite: how liberating information will transform our world,” and, equally important, how it relates to Google’s mission. Originally I was to co-author the book with Vivek Kundra, who was chief technology officer of the District of Columbia and a true trailblazer in the field of democratizing data. However, fortunately for the U.S. and unfortunately for me, President Obama chose Vivek to become the U.S.’s first CIO. Much of his thinking can be found in my book.
INTJ + ENFP
But first, there are a couple of things you should know about me that weren’t covered in the kind introduction. I was a religion major in college. I never intended to become a minister, but I did become an evangelist -- an evangelist for change. Second, if you’re familiar with the Myers-Briggs personality test, about 95% of engineers are INTJs -- Introverted, Intuitive, Thinking and Judging -- which makes you almost the polar opposite of me: Extroverted, Intuitive, Feeling, and Perceiving. In my mind, that means that we can each benefit from cross-fertilization with the other’s unique way of looking at the world. Today I want to use that perspective to introduce you to a way of looking at data that you, who work with it so much, may not be aware of.
Finally, I’m convinced I was chosen to write this book through some sort of cosmic joke, because at first blush I’m the least-likely person to write a book on data. You see, I’m right-brained and intuitive. For me, data used to be good for one thing and one thing only: checking the Red Sox’ batting averages. But in reality, that makes me ideally suited to write this book, because it’s time that people like me no longer be disenfranchised when it comes to data. It’s time for data for the rest of us...
But when I got interested in data, I found it was pretty hard to get at. We pay taxes so government can collect data, and you can bet companies know all about our shopping habits. Our activities and lives are data’s raw materials. But once it’s collected, most citizens -- and a lot of employees for that matter -- don’t have a clue where data is stored or how it’s used. It’s like that last scene in “Raiders of the Lost Ark,” where the Ark is boxed up and stored in a government warehouse: you knew it wouldn’t be found again. Substitute data warehouse and you’ve got the picture of the all-too-frequent reality.
Liberate Data!
The time has come to liberate data! “Liberating data makes it automatically available to all who need it, when and where they need it, in forms they can use, and with freedom to use as they choose -- while protecting security and privacy.”
Crucial for today’s challenges
• give workers real-time data
• automate processes
• cut reporting costs
• improve regulation
• restore public trust
• make people partners
The result will be changes and benefits in every aspect of our lives, changes that are particularly critical given the current global challenges. They will give workforces real-time information to help them make better decisions and carry out their responsibilities more effectively. They will automate previously manual processes, saving time and increasing efficiency. They will improve government regulatory processes by making access to reports instantaneous and shareable by all agencies. Seemingly at odds with the previous benefit, they will also reduce companies’ regulatory compliance costs. They will restore badly-eroded confidence in government and industry through transparency. They will empower the public to be full partners in business and government.
Data-centric organization
Most of the steps I am going to tell you about are probably second nature to Google: bear in mind that they are entirely new to most organizations. The first step to begin this transition is a strategic one: it’s time to switch to a data-centric organization, in which usable data is accessible to people, applications and devices, automatically, and all of the organization’s functions are visualized as revolving around the data.
Tag and syndicate data
The second step to liberate data is to assure that the data you have is valuable. That means that, instead of becoming captured and altered by applications, it must remain as “data nuggets,” accessible to all applications, people, and machines that can act on it. To create data nuggets we must structure data using XML, KML or similar formats and taxonomies that attach tags, such as the XBRL-GL ones you see here, to the numbers. This metadata transforms mere numbers into valuable data that can flow anywhere the tags are repeated. As you know, one of the most important aspects of metadata is that tagged data can be shared automatically. That reduces errors because the data doesn’t have to be rekeyed: you get a “single version of the truth.” It also means we can begin to ask a new question: where else can this data create value, within our company or with our customers or suppliers? Copying the tags in these locations lets the data flow automatically to other places it can create value. I can’t emphasize it enough: if data is tagged the first time it occurs in your operations, it doesn’t have to be entered again.
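For those who think in code, here is a minimal sketch of the "tag it once, never rekey it" idea, using Python's standard XML library. The tag names (entry, account, amount) and the helper functions are made up for illustration; they are not actual XBRL-GL element names, and a real taxonomy would be far richer.

```python
import xml.etree.ElementTree as ET


def tag_entry(account: str, amount: str, currency: str) -> str:
    """Wrap raw values in descriptive tags once, at the point of entry.

    The tag names here are hypothetical, not taken from XBRL-GL.
    """
    entry = ET.Element("entry")
    ET.SubElement(entry, "account").text = account
    amount_el = ET.SubElement(entry, "amount", {"currency": currency})
    amount_el.text = amount
    return ET.tostring(entry, encoding="unicode")


def read_amount(xml_text: str) -> tuple[str, float, str]:
    """Any consumer that knows the tags can pull the values back out."""
    entry = ET.fromstring(xml_text)
    amount_el = entry.find("amount")
    return (
        entry.findtext("account"),
        float(amount_el.text),
        amount_el.get("currency"),
    )


# The number is typed in exactly once...
nugget = tag_entry("Office supplies", "1250.00", "USD")

# ...and two different consumers read the same tagged nugget.
ledger_view = read_amount(nugget)
report_view = read_amount(nugget)
```

Because both consumers read the same tagged text, neither has to retype the number, which is where the "single version of the truth" comes from.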
Provide tools
Next, the data is worthless, especially to those among us who are part of the great unwashed, without easy-to-use Web 2.0 based tools that will allow us to collaboratively analyze, interpret, and act on the data. I emphasize collaborative tools, because they introduce the “wisdom of crowds” phenomenon, in which richer, more nuanced interpretations of the data gradually emerge due to back-and-forth between a number of those participating in the discussion. Incidentally, this happens to be the first data visualization that I ever did, using Many Eyes, a tool co-developed by your own Martin Wattenberg when he was at IBM. A rank amateur, I was able to scrape the data, upload it, and create this visualization in less than an hour. Thank you, Martin!
Get workers data
Curiously, although a growing range of government agencies release public data streams, almost none provide them to their own workforces, to give workers actionable data precisely when and where they need it, to do their work more efficiently. The fourth element of an effective liberating data strategy is for agencies -- and corporations -- to follow the District of Columbia’s lead, and apply the same strategy behind the firewall first, giving workers access to the same data they disclose in public data feeds. After all, employees may be struggling with incompatible databases, or may need to reach across departmental “silos” to see if there might be synergies between programs, and employees from another department may be able to provide new insights simply because of their differing life experiences and expertise. As more young workers, who have never known life without the Web, join workforces, they’ll naturally ask why tools they’ve used can’t be used in the workplace. A data graphics project can empower them and tap their expertise. Finally, running your organization on the same data feeds that you furnish to the public and others can be a powerful way of earning public trust: you’re in essence saying we stand behind this data: we’re so confident in it that we use the same data to run our daily operations as we furnish to you.
Crowdsource
Finally, the cutting edge of liberating data is using it to invite customers or citizens to become co-creators of products and services. That’s what Beth Noveck, the former Obama Administration deputy CTO, did prior to joining the Administration, with the Peer-to-Patent program, which allows interested experts and laymen to become active partners in the patent review process. They have already significantly reduced the patent application backlog. With liberating data, crowdsourcing will become commonplace and will result in both improved services to the public and entrepreneurial opportunities.
So, what are some examples of this data-centric approach in action? Let me tell you about two, right in the Boston area. The first is a start-up, Vitality, Inc., which makes something called a pill-bottle cap. It’s not granny’s pill bottle cap -- although she may be using it soon. Each cap has its own IP address and a low-powered transmitter. It is programmed to your prescription, and your schedule for taking it. A while before you’re supposed to take the pill, the cap and a wall-mounted nightlight/transmitter begin to glow. When you take the pill, the transmitter instantly notifies Vitality. That data in turn generates a weekly email update to a friend or relative, harnessing the power of social networks; it automatically reorders refills, and every month you and your doctor get a report, plus incentives if you exceed compliance goals. Untaken prescriptions are a $23 BILLION a year problem, so this system can not only improve the patient’s health, but also dramatically affect soaring health care costs.
The second is the acid test of the data-centric approach, because it is literally a matter of life or death, with elements of security and privacy thrown in to complicate the mix. It comes from across the river, at Beth-Israel Deaconess Hospital, where state-of-the-art online medical records with built-in HIPAA security and privacy safeguards mean that, on one hand, an ER doc treating an unconscious patient will have access to her complete medical records, while at the same time prying eyes are denied access. If a data-centric approach can work for this most critical of needs, why aren’t other organizations following suit?
How can we build a data-centric society?
So how can we build a data-centric society, and what, in particular, can Google do to help in this effort?
Be a model!
First, you can be a model. You’ll remember that I mentioned earlier that to really be fully data-centric, I believe you have to tag all of your data the first time you enter it. If you aren’t doing that yet, and specifically if you’re not tagging with XBRL GL, start, and you can be a model for other organizations. Just do it!
Provide tools
One area where Google can be of particular assistance is by increasing availability of your data-visualization tools, such as Public Data Explorer (shown here in its non-profit incarnation, “Gapminder”), that will make it possible for people to really capitalize on that data. Even for trained statisticians, let alone the rest of us, data visualization tools aid in understanding complex data sets, relationships, and so on, because they take statistics and portray them graphically, which makes it easier to understand trends, possible causality, and other factors. As one of the acknowledged thought leaders in data visualization, Edward Tufte, says, “Graphics reveal data. Indeed, graphics can be more precise and revealing than conventional statistical computations.” In recent years a number of lower-cost dashboard applications such as Tableau, as well as free web-based data visualization tools, have become available, allowing non-statisticians to easily take data and turn it into a wide range of highly informative visual representations, while Web 2.0 tools such as tags, threaded discussions and topic hubs encourage robust discussion of the results.
Promote data
What if, on the Google search results page, “Data” was listed along with Maps, Videos and more?
Overall, I’m optimistic about the possibility of building a data-centric society, because of the work of Steve Martin. Oops, I mean Steve Jobs and Martin Luther.
Luther liberated written word
• 390 editions published in 1523.
• By 1525, 3M copies of pamphlets relating to him printed.
• Transformed scholarship.
The potential for transformation is not all that different from 1520, when Martin Luther’s translation of the Latin Bible into German and decision to print copies, instead of hand-copy them, gave most people direct access to the printed word for the first time. They no longer had to rely on the clergy as intermediaries. The results were quick and dramatic: Luther’s works not only led to the Reformation, but to a tremendous push for literacy and the printed word. Just as the printing press transformed learning and people’s access to the word, so too the Internet and a handful of new web-based tools, none of them radically innovative by themselves but revolutionary when combined, are making it possible, in many cases for the first time, for workers and the general public to have direct access to actionable, valuable data. I believe the benefits and revolution for numbers will be as dramatic as what Luther set in motion for words.
More recently, we can look to the work of Steve Jobs.
“Our goal was to bring a liberal arts perspective and a liberal arts audience to what had traditionally been a very geeky technology and a very geeky audience” --- Steve Jobs
One of the things he was most proud of was the way that, as he put it, the Mac brought “a liberal arts perspective and a liberal arts audience to what had traditionally been a very geeky technology and a very geeky audience.” I’d say that the current reality with data is that it too is a geeky technology with a very geeky audience. I hope that Google will reach out to those of us from that liberal arts background and that, together, we can bring Mac-like simplicity of use to the world’s data. Then, and only then, will Google have achieved the lofty goals of your mission.
To learn more about liberating data: And read...
Thank you, and now I’d like to take some of your questions!
To learn more about liberating data and how to create the processes and policies to make it a reality, contact:
Stephenson Strategies
335 Main Street, Medfield, MA 02052
(617) 314-7858
D.Stephenson@stephensonstrategies.com